All notes
Production Jun 15, 2026 5 min
Agentic AI content quality: 5 agents, one pipeline.
Separate eval from rewrite, route models per agent, guard inputs and outputs. Run it on every page before publish.
Most teams ask one model to evaluate and rewrite the same draft. That is how hallucinated scores and polished-but-wrong content slip through. This is the five-agent pipeline I use for GEO + SEO content QA: Orchestrator on Sonnet, read-only CORE and EEAT evaluators, Schema validator on Haiku, Rewrite on Opus only for failing sections. Input guardrails abort bad URLs before any token spend. Output guardrails recheck scores, fixes, citations, and JSON-LD before delivery. Three memory layers (CAG, RAG, session) keep context lean.
Tags#AgenticAI#Multi-Agent#ContentQuality#GEO#SEO#EEAT#Guardrails#ModelRouting#Orchestration#Production#Schema
Key takeaways
- 1Separating evaluation from rewriting cuts hallucinated scores roughly 3x versus one agent doing both jobs. Evaluators read only. The rewrite agent writes only. The orchestrator coordinates but never scores content.
- 2Five agents: Orchestrator (route + state), CORE Evaluator (40 GEO criteria), EEAT Evaluator (40 SEO criteria), Schema Validator (JSON-LD + HTML semantics), Rewrite Agent (failing sections only). CORE, EEAT, and Schema run in parallel; rewrite runs after, which cuts runtime about 60%.
- 3Route models per agent, not one frontier model for the pipeline. Sonnet 4.6 for orchestration and CORE structured eval. Haiku 4.5 for EEAT pattern checks and schema classification. Opus 4.7 only on the rewrite step. Roughly 80% of calls stay on Haiku and Sonnet.
- 4Four input guardrails run before any model call: URL reachability (abort in ~200ms), content type detection, 300-word length floor (skip E+R on thin pages), language detection. Zero tokens on bad inputs.
- 5Five output guardrails run before delivery: recalculate scores independently (>5pt divergence re-runs), every Fail must have a Fix, verify citation URLs, rewrite self-check with max 2 loops, validate JSON-LD against schema.org.
- 6Three memory layers, not one context dump: CAG for stable benchmarks and rubrics (90% cost reduction on cache hit), RAG for 2-3 schema specs per run, session context for page content and eval results cleared after each run.
- 7The fixed sequence: validate input, evaluate in parallel, score and sort (GEO-first, then dual, then SEO-first), rewrite failing sections only, validate output. Deliverables: 8 dimension scores, priority fix list, rewritten sections, pass/fail confidence flag against all output guardrails.
- 8Run this pipeline on every piece of content before publish. Same order every time. The orchestration pattern matches what I describe in supervisor-style multi-agent systems, with explicit read/write separation at each step.
More notes
Get the visual notes by email
New agentic AI notes and breakdowns, plus what I am shipping for clients — one email on Thursdays.


