RAG Eval: Blind A/B Agent Tester
RAG Eval: Blind A/B Agent Tester ships as a ready-to-import n8n automation package: import the bundled workflow JSON, connect the credentials listed on this page, and run it from the manual trigger on self-hosted n8n or n8n Cloud.
Compare two AI models head-to-head on your own set of test questions. The workflow runs both models, anonymizes their outputs, scores them with a judge model, aggregates the statistics, and writes a markdown findings report to Google Sheets, ready in minutes.
- Run any two OpenAI models head-to-head on a configurable set of test questions
- Anonymize both responses so the judge scores blind — no model-name bias
- Score each response on accuracy, completeness, clarity, and conciseness (1-10)
- Aggregate per-dimension averages, win rates, and an overall recommendation automatically
- Generate a concise markdown evaluation report ready for your team or stakeholders
- Log every run summary to Google Sheets for trend tracking and audit history
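The blind-scoring idea can be sketched in a few lines of Python. This is a minimal, standalone illustration, not the template's own node code: `BlindPair`, `anonymize`, and `build_judge_prompt` are hypothetical names, and shuffling which response lands in slot A is our assumption (a common guard against position bias); the template itself may use a fixed order.

```python
import random
from dataclasses import dataclass


@dataclass
class BlindPair:
    """One test case with the two responses relabelled as A and B."""
    question: str
    response_a: str
    response_b: str
    label_to_model: dict  # e.g. {"A": "candidate", "B": "baseline"}; never sent to the judge


def anonymize(question: str, baseline: str, candidate: str, rng: random.Random) -> BlindPair:
    """Randomly place the baseline and candidate responses in slots A and B (assumed shuffle)."""
    if rng.random() < 0.5:
        return BlindPair(question, baseline, candidate, {"A": "baseline", "B": "candidate"})
    return BlindPair(question, candidate, baseline, {"A": "candidate", "B": "baseline"})


def build_judge_prompt(pair: BlindPair) -> str:
    """Build a judge prompt that shows only the blind labels, never model names."""
    return (
        "You are an impartial judge. Score each response from 1 to 10 on "
        "accuracy, completeness, clarity, and conciseness.\n"
        "Reply with a JSON object with keys A and B, each mapping those four "
        "dimension names to integer scores.\n\n"
        f"Question:\n{pair.question}\n\n"
        f"Response A:\n{pair.response_a}\n\n"
        f"Response B:\n{pair.response_b}\n"
    )


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the A/B assignment is reproducible
    pair = anonymize("What does RAG stand for?", "Answer from model 1", "Answer from model 2", rng)
    print(pair.label_to_model)  # keep this mapping to de-anonymize the scores later
    print(build_judge_prompt(pair))
```

Keeping `label_to_model` out of the prompt is what makes the judging blind; the mapping is only used afterwards to attribute each score to the right model.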
Setup Service — optional live install and test. Choose your tier at checkout (not tied to this template's price).
Workflow summary
One workflow handles everything end-to-end.
1. A manual trigger starts a new evaluation run using the models and test cases defined in the Config node
2. Test questions are parsed from the Config node and split into individual evaluation items
3. Each question is sent to the baseline model and the response is captured for comparison
4. The same question is sent to the candidate model to produce the competing response
5. A judge prompt is assembled that presents both responses anonymously (Response A and Response B)
6. A judge model scores both responses 1-10 on accuracy, completeness, clarity, and conciseness
7. Scores are parsed from the judge's JSON output and validated for each test case
8. All per-case scores are aggregated into average scores, win rates, and an overall recommendation
9. A synthesis model writes a concise markdown evaluation report from the aggregated statistics
10. The final run summary (scores, recommendation, and report) is appended to Google Sheets
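Steps 7 and 8 can be sketched in Python as follows. This is a hedged, standalone illustration rather than the template's actual logic: it assumes the judge replies with a JSON object whose keys are the blind labels `A` and `B`, each mapping the four dimension names to numbers, and that a `label_to_model` mapping was kept when the responses were anonymized. The function names, the per-case win rule (higher mean across the four dimensions), and the recommendation rule (whichever model wins more cases) are hypothetical choices.

```python
import json
from statistics import mean

# Dimension names come from the template description; the JSON keys are assumed.
DIMENSIONS = ("accuracy", "completeness", "clarity", "conciseness")


def parse_judge_output(raw: str, label_to_model: dict) -> dict:
    """Validate one judge reply and re-key its scores by model role.

    Raises ValueError if the reply has no JSON object, a label or dimension
    is missing, or a score is not a number between 1 and 10.
    """
    # Keep only the outermost {...} so a reply with surrounding text still parses.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("judge reply contains no JSON object")
    data = json.loads(raw[start:end + 1])  # JSONDecodeError is a ValueError subclass

    scores = {}
    for label, model in label_to_model.items():
        block = data.get(label)
        if not isinstance(block, dict):
            raise ValueError(f"missing scores for response {label}")
        row = {}
        for dim in DIMENSIONS:
            value = block.get(dim)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)) or not 1 <= value <= 10:
                raise ValueError(f"invalid {dim} score for response {label}: {value!r}")
            row[dim] = float(value)
        scores[model] = row
    return scores


def aggregate(cases: list) -> dict:
    """Average each dimension per model, compute win rates, and pick a recommendation."""
    if not cases:
        raise ValueError("no scored cases to aggregate")
    averages = {
        model: {dim: round(mean(c[model][dim] for c in cases), 2) for dim in DIMENSIONS}
        for model in ("baseline", "candidate")
    }
    # Hypothetical rule: a case is won by the model with the higher mean across dimensions.
    wins = {"baseline": 0, "candidate": 0, "tie": 0}
    for c in cases:
        base, cand = mean(c["baseline"].values()), mean(c["candidate"].values())
        if cand > base:
            wins["candidate"] += 1
        elif base > cand:
            wins["baseline"] += 1
        else:
            wins["tie"] += 1
    win_rate = {k: round(v / len(cases), 2) for k, v in wins.items()}
    # Hypothetical rule: recommend whichever model won more cases.
    if wins["candidate"] > wins["baseline"]:
        recommendation = "candidate"
    elif wins["baseline"] > wins["candidate"]:
        recommendation = "baseline"
    else:
        recommendation = "no clear winner"
    return {"averages": averages, "win_rate": win_rate, "recommendation": recommendation}


if __name__ == "__main__":
    reply = ('{"A": {"accuracy": 8, "completeness": 7, "clarity": 9, "conciseness": 6}, '
             '"B": {"accuracy": 6, "completeness": 6, "clarity": 7, "conciseness": 8}}')
    case = parse_judge_output(reply, {"A": "candidate", "B": "baseline"})
    print(json.dumps(aggregate([case]), indent=2))
```

Slicing to the outermost braces tolerates judges that add text around their JSON, and a case that fails validation raises `ValueError`, so it can be logged or retried instead of silently skewing the averages.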
What's included with your purchase
Every checkout includes the same practical package so you are never hunting for missing files mid-setup.
1. Workflow JSON exports ready to import into your n8n workspace (self-hosted or n8n Cloud).
2. A step-by-step guide that walks you through credential creation, node-by-node checks, and activation.
3. A credential checklist for OpenAI and Google Sheets. If you get stuck, email support@n8ntemplatestore.com; replies within 24 hours (SLA).
4. Perpetual (lifetime) use license: your one-time purchase includes an ongoing right to use this template package on n8n instances you control for your own automation, but not to resell or publicly redistribute the files as a product. See the Terms & conditions.
5. We keep the copyright: the workflow JSON, guides, screenshots, and bundled assets remain our copyrighted works (or our licensors'). Payment grants only the limited license described in our Terms; it does not transfer ownership.
Need a second opinion before buying? Browse the full catalog or read the support policies.
Three steps to a running n8n automation
Import the workflow into n8n
Add your OpenAI and Google Sheets credentials in n8n
Run the workflow from its manual trigger; the findings report lands in Google Sheets when the run finishes
Need help setting up this n8n template?
Beginner-friendly guidance is included. If you get stuck, email support@n8ntemplatestore.com or visit the support page.
Related n8n workflow templates
Data Gatekeeper — Preflight & Failure Logging
Pre-built n8n workflow template that automates data processing with Google Sheets. Live in about 10 minutes.
Meta Audience Drift Guard
Pre-built n8n workflow template that automates marketing with Meta Marketing API. Live in about 10 minutes.
Community Launch Accelerator
Pre-built n8n workflow template that automates marketing with OpenAI. Live in about 10 minutes.