Lifetime license included with every purchase
Data Processing · Intermediate

RAG Eval: Blind A/B Agent Tester

RAG Eval: Blind A/B Agent Tester ships as a ready-to-import n8n automation package: load the bundled workflow JSON, connect the credentials listed on this page, and run it on self-hosted n8n or n8n Cloud.

Compare two AI models head-to-head on any test suite. Runs both models, anonymizes outputs, scores with a judge model, aggregates stats, and delivers a markdown findings report to Google Sheets — ready in minutes.

  • Run any two OpenAI models head-to-head on a configurable set of test questions
  • Anonymize both responses so the judge scores blind — no model-name bias
  • Score each response on accuracy, completeness, clarity, and conciseness (1-10)
  • Aggregate per-dimension averages, win rates, and an overall recommendation automatically
  • Generate a concise markdown evaluation report ready for your team or stakeholders
  • Log every run summary to Google Sheets for trend tracking and audit history
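To show what "scoring blind" means in practice, here is a minimal Python sketch of anonymizing a response pair before it reaches the judge. It is illustrative only, not the template's node code: the helper name `build_blind_prompt`, the prompt wording, and the random A/B ordering (a common way to reduce position bias) are assumptions. The judge sees only "Response A" and "Response B", while a private label-to-model mapping is kept so scores can be attributed back afterwards.

```python
import random


def build_blind_prompt(question: str, baseline: str, candidate: str,
                       rng: random.Random) -> tuple[str, dict[str, str]]:
    """Return a judge prompt with anonymized responses plus a label -> role map.

    Hypothetical helper: the template's real judge prompt may be worded differently.
    """
    pair = [("baseline", baseline), ("candidate", candidate)]
    # Assumption: shuffle so the baseline is not always shown as "Response A".
    rng.shuffle(pair)
    mapping = {"A": pair[0][0], "B": pair[1][0]}
    prompt = (
        "Score each response from 1 to 10 on accuracy, completeness, "
        "clarity, and conciseness. Reply with JSON only.\n\n"
        f"Question: {question}\n\n"
        f"Response A:\n{pair[0][1]}\n\n"
        f"Response B:\n{pair[1][1]}\n"
    )
    return prompt, mapping


if __name__ == "__main__":
    judge_prompt, blind_mapping = build_blind_prompt(
        "What is retrieval-augmented generation?",
        "It augments a model with retrieved documents.",
        "RAG retrieves relevant passages and feeds them to the model.",
        random.Random(42),
    )
    print(judge_prompt)
    print(blind_mapping)  # kept out of the judge's view
```

Because the model names never appear in the prompt text, the only thing the judge can react to is the content of each answer.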
OpenAI · Google Sheets
One-time: $19
Time back: ~8 hrs / week

Setup Service: optional live install and test. Choose your tier at checkout (not tied to this template's price).

Instant download after payment
Refund as per the Refund Policy for downloadable templates.
Email Support · 24h SLA
Lifetime updates

APIs required: OpenAI, Google Sheets

How it works

Workflow summary

One workflow handles everything end-to-end.

  1. A manual trigger starts a new evaluation run using the models and test cases defined in the Config node
  2. Test questions are parsed from the Config node and split into individual evaluation items
  3. Each question is sent to the baseline model and the response is captured for comparison
  4. The same question is sent to the candidate model to produce the competing response
  5. A judge prompt is assembled that presents both responses anonymously (Response A and Response B)
  6. A judge model scores both responses 1-10 on accuracy, completeness, clarity, and conciseness
  7. Scores are parsed from the judge's JSON output and validated for each test case
  8. All per-case scores are aggregated into average scores, win rates, and an overall recommendation
  9. A synthesis model writes a concise markdown evaluation report from the aggregated statistics
  10. The final run summary (scores, recommendation, and report) is appended to Google Sheets
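Steps 7 and 8 are where most of the logic lives. The Python sketch below shows one way to parse and validate a judge reply and then roll per-case scores up into averages, win rates, and a recommendation. Everything here is an assumption for illustration rather than the template's exact code: the JSON shape `{"A": {...}, "B": {...}}`, the function names, the rule that a case is won by the higher total score, and the rule that the recommendation goes to whichever model wins more cases. `mapping` is a label-to-role dict such as `{"A": "candidate", "B": "baseline"}`, recorded when the responses were anonymized.

```python
import json
from statistics import mean

# Assumption: the judge scores these four dimensions on a 1-10 scale.
DIMENSIONS = ("accuracy", "completeness", "clarity", "conciseness")


def parse_judge_output(raw: str, mapping: dict[str, str]) -> dict[str, dict[str, int]]:
    """Parse judge JSON shaped like {"A": {"accuracy": 8, ...}, "B": {...}}
    and re-key the scores from the blind labels to "baseline"/"candidate".

    Raises ValueError if the JSON is not an object or a score is missing/out of range.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("judge output must be a JSON object")
    scores: dict[str, dict[str, int]] = {}
    for label, role in mapping.items():
        entry = data.get(label)
        if not isinstance(entry, dict):
            raise ValueError(f"missing scores for response {label}")
        per_dim: dict[str, int] = {}
        for dim in DIMENSIONS:
            value = entry.get(dim)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 10:
                raise ValueError(f"invalid {dim} score for response {label}: {value!r}")
            per_dim[dim] = value
        scores[role] = per_dim
    return scores


def aggregate(cases: list[dict[str, dict[str, int]]]) -> dict:
    """Average each dimension per model, count per-case wins, and recommend."""
    if not cases:
        raise ValueError("no scored cases to aggregate")
    averages = {
        role: {dim: round(mean(c[role][dim] for c in cases), 2) for dim in DIMENSIONS}
        for role in ("baseline", "candidate")
    }
    wins = {"baseline": 0, "candidate": 0, "tie": 0}
    for c in cases:
        # Assumption: a case is won by the higher total across all four dimensions.
        base_total = sum(c["baseline"].values())
        cand_total = sum(c["candidate"].values())
        if cand_total > base_total:
            wins["candidate"] += 1
        elif base_total > cand_total:
            wins["baseline"] += 1
        else:
            wins["tie"] += 1
    win_rates = {k: v / len(cases) for k, v in wins.items()}
    # Assumption: recommend whichever model wins more cases; otherwise no clear winner.
    if wins["candidate"] > wins["baseline"]:
        recommendation = "candidate"
    elif wins["baseline"] > wins["candidate"]:
        recommendation = "baseline"
    else:
        recommendation = "no clear winner"
    return {"averages": averages, "win_rates": win_rates, "recommendation": recommendation}


if __name__ == "__main__":
    blind_mapping = {"A": "candidate", "B": "baseline"}
    judge_json = json.dumps({
        "A": {"accuracy": 9, "completeness": 8, "clarity": 8, "conciseness": 7},
        "B": {"accuracy": 7, "completeness": 7, "clarity": 8, "conciseness": 8},
    })
    case = parse_judge_output(judge_json, blind_mapping)
    print(aggregate([case])["recommendation"])  # → candidate
```

Validating before aggregating means one malformed judge reply fails loudly instead of silently skewing the averages.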
Deliverables

What's included with your purchase

Every checkout includes the same practical package so you are never hunting for missing files mid-setup.

  • 01 Workflow JSON exports ready to import into your n8n workspace (self-hosted or n8n Cloud).
  • 02 Step-by-step guide that walks through credential creation, node-by-node checks, and activation.
  • 03 Credential checklist for OpenAI and Google Sheets. Email support@n8ntemplatestore.com if you get stuck; replies within 24 hours (SLA).
  • 04 Perpetual (lifetime) use license: your one-time purchase includes an ongoing right to use this template package on n8n instances you control for your own automation, not for resale or public redistribution of the files as a product. See the Terms & Conditions.
  • 05 We keep the copyright: workflow JSON, guides, screenshots, and bundled assets remain our copyrighted works (or our licensors'). Payment grants only the limited license described in our Terms; it does not transfer ownership.

Need a second opinion before buying? Compare the full catalog or read the support policies.

Setup

Three steps to a running n8n automation

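01 · Import the workflow into n8n

02 · Connect your OpenAI API key and Google Sheets account in the credentials panel

03 · Set your models and test questions in the Config node, then run the workflow. It uses a manual trigger, so the run summary and report are appended to Google Sheets as soon as the run finishes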

Support

Need help setting up this n8n template?

Beginner-friendly guidance is included. Our support team will help you if you get stuck — email support@n8ntemplatestore.com or visit support.

Email support
Replies within 24 hours (SLA)
Setup Service: 4 tiers, your choice
Optional live session: we install, configure, and test on a screen share. Add Setup Service at checkout and pick any tier; pricing is not tied to this template's list price.
FAQ

About this template

How long does it take to set up RAG Eval: Blind A/B Agent Tester?
Most people finish in minutes following the guide. If you’ve never used n8n before, budget an hour — the first workflow is always the slowest.
What if I get stuck?
Email support@n8ntemplatestore.com. Free basic support is included with every purchase, and you’ll get a reply from our team within 24 hours. If setup needs more than email, we can schedule a call.
Do I need to pay for OpenAI?
Yes. API usage is billed separately and paid directly to OpenAI. Cost depends on how many test questions you run and which baseline, candidate, judge, and synthesis models you choose. A full breakdown is in the setup guide.
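To estimate usage before a run, you can count model calls. Following the workflow steps, each test question triggers three calls (baseline, candidate, judge) and each run adds one synthesis call for the report. The Python sketch below encodes that count; the function names are hypothetical, and `avg_cost_per_call` is a number you would supply from your own OpenAI usage data, since real per-call cost varies by model and prompt length.

```python
def api_calls_per_run(num_questions: int) -> int:
    """Per question: one baseline call, one candidate call, one judge call.
    Per run: one synthesis call for the markdown report."""
    if num_questions < 0:
        raise ValueError("num_questions must be non-negative")
    return 3 * num_questions + 1


def estimated_run_cost(num_questions: int, avg_cost_per_call: float) -> float:
    """Rough estimate only. avg_cost_per_call is your own figure; using a single
    average ignores price differences between the four model roles."""
    return api_calls_per_run(num_questions) * avg_cost_per_call


if __name__ == "__main__":
    print(api_calls_per_run(20))  # → 61
```

Cost grows linearly with the number of test questions, so trimming the test suite is the most direct lever.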
Can I customize the workflow?
Yes, completely. You get the full workflow .json under the license terms, so you can edit nodes, swap in your own prompts, and re-wire the logic. Our support team can help you plan customizations over email.
What if it doesn’t work for me?
Refunds are issued only as described in our Refund Policy (https://n8ntemplatestore.com/refund-policy). Or add Setup Service at checkout for a live install and test session — pick any tier you prefer.