Data Processing · Intermediate · ✓ Test-run on a live n8n instance

RAG Eval: Blind A/B Agent Tester

RAG Eval: Blind A/B Agent Tester ships as a ready-to-import n8n automation package: load the bundled workflow JSON, connect the credentials listed on this page, and activate it on self-hosted n8n or n8n Cloud.

Compare two OpenAI models head-to-head on your own test suite. The workflow runs both models, anonymizes their outputs, scores them with a judge model, aggregates the statistics, and logs a markdown findings report to Google Sheets, ready in minutes.

Run any two OpenAI models head-to-head on a configurable set of test questions
Anonymize both responses so the judge scores blind — no model-name bias
Score each response on accuracy, completeness, clarity, and conciseness (1-10)
Aggregate per-dimension averages, win rates, and an overall recommendation automatically
Generate a concise markdown evaluation report ready for your team or stakeholders
Log every run summary to Google Sheets for trend tracking and audit history
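
As an illustration of the aggregation feature, here is a minimal Python sketch of how per-case scores could be rolled up into per-dimension averages, win rates, and a recommendation. It is not the code inside the workflow. The input shape, the "higher total wins the case" rule, and the tie-keeps-baseline recommendation rule are all assumptions made for this example.

```python
from statistics import mean

# The four scoring dimensions named on this page.
DIMENSIONS = ("accuracy", "completeness", "clarity", "conciseness")


def aggregate(cases):
    """Aggregate scored test cases.

    `cases` is an assumed shape: a list of dicts such as
    {"baseline": {dim: score}, "candidate": {dim: score}}.
    """
    if not cases:
        raise ValueError("no scored test cases to aggregate")

    # Per-dimension average for each model.
    averages = {
        model: {d: mean(c[model][d] for c in cases) for d in DIMENSIONS}
        for model in ("baseline", "candidate")
    }

    # Assumed win rule: the higher total across the four dimensions wins the case.
    wins = {"baseline": 0, "candidate": 0, "tie": 0}
    for c in cases:
        b = sum(c["baseline"][d] for d in DIMENSIONS)
        k = sum(c["candidate"][d] for d in DIMENSIONS)
        wins["baseline" if b > k else "candidate" if k > b else "tie"] += 1
    win_rates = {name: count / len(cases) for name, count in wins.items()}

    # Assumed recommendation rule: switch only if the candidate wins more often.
    recommendation = (
        "candidate" if win_rates["candidate"] > win_rates["baseline"] else "baseline"
    )
    return {"averages": averages, "win_rates": win_rates, "recommendation": recommendation}
```

With real data you would likely want a minimum number of test cases before trusting the recommendation, since win rates on a handful of questions are noisy.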

OpenAI · Google Sheets


One-time · $19

~8 hrs / week time back

Setup Service — optional live install and test. Choose your tier at checkout (not tied to this template's price).

Instant download after payment

Refunds are handled as described in the Refund Policy for downloadable templates.

Email Support · 24h SLA

Lifetime updates

APIs required

OpenAI · Google Sheets

Already purchased?

Download RAG Eval: Blind A/B Agent Tester

Paste the license key from your receipt. It must match this template.

How it works

Workflow summary

One workflow handles everything end-to-end.

1. A manual trigger starts a new evaluation run using the models and test cases defined in the Config node
2. Test questions are parsed from the Config node and split into individual evaluation items
3. Each question is sent to the baseline model and the response is captured for comparison
4. The same question is sent to the candidate model to produce the competing response
5. A judge prompt is assembled that presents both responses anonymously (Response A and Response B)
6. A judge model scores both responses 1-10 on accuracy, completeness, clarity, and conciseness
7. Scores are parsed from the judge's JSON output and validated for each test case
8. All per-case scores are aggregated into average scores, win rates, and an overall recommendation
9. A synthesis model writes a concise markdown evaluation report from the aggregated statistics
10. The final run summary (scores, recommendation, and report) is appended to Google Sheets
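
The blind-judging steps (building the anonymous prompt, then parsing and validating the judge's output) can be sketched in Python as follows. This is an illustrative sketch, not the workflow's actual node code. The random A/B ordering, the function names, and the judge JSON shape `{"A": {...}, "B": {...}}` are assumptions made for the example.

```python
import json
import random

DIMENSIONS = ("accuracy", "completeness", "clarity", "conciseness")


def build_blind_pair(baseline_text, candidate_text, rng=random):
    """Assign the two responses to labels A and B.

    Assumption: the order is randomized per test case so that label
    position does not reveal which model wrote which response.
    """
    order = ["baseline", "candidate"]
    rng.shuffle(order)
    texts = {"baseline": baseline_text, "candidate": candidate_text}
    label_to_model = {"A": order[0], "B": order[1]}
    # Only the neutral labels appear in the text shown to the judge.
    prompt = "\n\n".join(
        f"Response {label}:\n{texts[model]}" for label, model in label_to_model.items()
    )
    return prompt, label_to_model


def parse_judge_scores(raw_json, label_to_model):
    """Validate the judge's JSON and map scores back to model roles.

    Assumed judge output shape: {"A": {dim: score}, "B": {dim: score}}.
    A missing label or dimension raises KeyError; an out-of-range or
    non-numeric score raises ValueError.
    """
    data = json.loads(raw_json)
    scores = {}
    for label, model in label_to_model.items():
        entry = data[label]
        for dim in DIMENSIONS:
            value = entry[dim]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not 1 <= value <= 10:
                raise ValueError(f"{label}.{dim} must be a number from 1 to 10, got {value!r}")
        scores[model] = {dim: entry[dim] for dim in DIMENSIONS}
    return scores
```

Keeping the label-to-model mapping outside the prompt is what makes the scoring blind: the judge only ever sees "Response A" and "Response B", and the mapping is applied after the scores come back.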

Why this exists

AI engineers often cannot tell objectively whether a new RAG configuration outperforms the previous one, and fall back on developer instinct and cherry-picked examples instead of systematic scoring.

“We shipped a new retrieval strategy and declared it better because 'it felt better on the examples we tested.' We had no systematic eval. That's embarrassing.”

Source: Hacker News thread on RAG evaluation

Deliverables

What's included with your purchase

Every checkout includes the same practical package so you are never hunting for missing files mid-setup.

01. Workflow JSON exports ready to import into your n8n workspace (self-hosted or n8n Cloud).
02. Step-by-step guide that walks through credential creation, node-by-node checks, and activation.
03. Credential checklist for OpenAI and Google Sheets. Email support@n8ntemplatestore.com if you get stuck; replies arrive within 24 hours (SLA).
04. Perpetual (lifetime) use license: your one-time purchase includes an ongoing right to use this template package on n8n instances you control for your own automation, not for resale or public redistribution of the files as a product. See the Terms & conditions.
05. We keep the copyright: workflow JSON, guides, screenshots, and bundled assets stay our copyrighted works (or our licensors'). Payment grants the limited license in our Terms only; it does not transfer ownership.


Setup

Three steps to a running n8n automation

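1. Import the workflow JSON into n8n

2. Connect your OpenAI and Google Sheets credentials in the credentials panel

3. Set your models and test questions in the Config node, then run the workflow from its manual trigger. The run summary and report are appended to Google Sheets when the run finishes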

Support

Need help setting up this n8n template?

Beginner-friendly guidance is included. Our support team will help you if you get stuck — email support@n8ntemplatestore.com or visit support.

Email support

Replies within 24 hours (SLA)


Setup Service · 4 tiers, your choice

Optional live session: we install, configure, and test on a screen share. Add Setup Service at checkout and pick any tier; pricing is not tied to this template's list price.


FAQ

About this template

How long does it take to set up RAG Eval: Blind A/B Agent Tester?

Most people finish in minutes following the guide. If you’ve never used n8n before, budget an hour — the first workflow is always the slowest.

What if I get stuck?

Email support@n8ntemplatestore.com. Free basic support is included with every purchase, and you’ll get a reply from our team within 24 hours. If setup needs more than email, we can schedule a call.

Do I need to pay for OpenAI?

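Yes. API costs are separate and go directly to OpenAI. Each test question triggers a baseline call, a candidate call, and a judge call, plus one synthesis call per run, so cost scales with the size of your test suite. It is typically a few dollars a month at most for a solo business. The full breakdown is in the setup guide.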

Can I customize the workflow?

Yes, completely. Your license lets you modify your copy of the .json for your own use: edit nodes, swap in your own prompts, re-wire the logic. Our support team can help you plan customizations over email.

What if it doesn’t work for me?

Refunds are issued only as described in our Refund Policy (https://n8ntemplatestore.com/refund-policy). If you would rather have hands-on help before it comes to that, add Setup Service at checkout for a live install and test session, and pick any tier you prefer.