Frequently asked questions

What is a router agent?

A cheap model (Haiku or Flash) that classifies the incoming task and routes execution to the appropriate model tier. Build it before tuning prompts on frontier models.

Which tasks belong on Tier 1?

Intent detection, query classification, entity extraction, language detection, topic categorization. Anything that is pattern matching, not reasoning.

When should I use Opus?

Multi-step planning, deep reasoning, hard code generation, and agentic orchestration where fewer steps justify higher per-token price. Not on every sub-task.

How do I know if I have a routing problem?

Audit last month's bill by model ID. If roughly 80% of spend is frontier tier, you are over-routing to expensive models.

Should I promote routes stack-wide?

No. Run shadow mode first, then promote per route with evals. Stack-wide flips hide which routes actually need frontier models.


Architecture | Jun 3, 2026 | Updated Jul 6, 2026 | 9 min

Stop paying frontier prices for classification.

Last updated on Jul 6, 2026

Four model tiers. Build the router agent first. Same quality on classification, up to a 10x cost gap if you route wrong.

Introduction

I open client AWS and Anthropic bills and the pattern repeats: Opus on intent detection, Opus on language classification, Opus on "is this a refund question." Same output quality. Ten times the cost.

Model routing is not a nice-to-have anymore. It is the difference between a demo budget and a production budget. This note is the four-tier map I draw before anyone touches a prompt.

Four model tiers

Task-to-tier routing (June 2026 picks)

Tier	Job	Example models
Tier 1: Small, fast, cheap	Classification, routing, entity extraction	Haiku 4.5, Gemini 3.5 Flash, GPT-5.4 nano
Tier 2: Mid-tier workhorse	Tool use, structured JSON, most agent steps	Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro
Tier 3: Frontier reasoning	Multi-step planning, hard code, orchestration	Opus 4.8, GPT-5.5, Gemini 3.1 Pro Deep Think
Tier 4: Embeddings	Semantic search, document retrieval (not generation)	BGE-M3, Qwen3-Embedding-8B, GTE-Qwen2

Build the router agent first

A cheap model classifies the task, then hands execution to the right tier. The router needs typed output, exit conditions, and an eval suite built from production mis-routes.
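A minimal sketch of what "typed output" and "exit conditions" could look like, assuming the router model replies with a small JSON object. The field names (`route`, `confidence`), the 0.7 threshold, and the choice of the mid-tier route as the fallback are all illustrative assumptions, not something this note prescribes. Tier labels stand in for model IDs so the sketch does not depend on any vendor SDK.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Route(str, Enum):
    CLASSIFICATION = "classification"
    TOOL_CALL = "tool_call"
    REASONING = "reasoning"
    RETRIEVAL = "retrieval"


# Four routes, four tiers. Labels only; plug in your own model IDs.
TIER_FOR_ROUTE = {
    Route.CLASSIFICATION: "tier1",
    Route.TOOL_CALL: "tier2",
    Route.REASONING: "tier3",
    Route.RETRIEVAL: "tier4",
}

MIN_CONFIDENCE = 0.7              # hypothetical threshold, tune on your eval set
FALLBACK_ROUTE = Route.TOOL_CALL  # assumption: mid-tier is the safe default


@dataclass(frozen=True)
class RouteDecision:
    route: Route
    confidence: float
    fallback: bool = False


def parse_decision(raw: str) -> RouteDecision:
    """Parse the router's JSON reply; exit to the fallback on anything odd."""
    try:
        data = json.loads(raw)
        route = Route(data["route"])
        confidence = float(data["confidence"])
    except (ValueError, KeyError, TypeError):
        # Exit condition 1: malformed JSON, unknown route, or missing field.
        return RouteDecision(FALLBACK_ROUTE, 0.0, fallback=True)
    if not 0.0 <= confidence <= 1.0 or confidence < MIN_CONFIDENCE:
        # Exit condition 2: out-of-range or low confidence.
        return RouteDecision(FALLBACK_ROUTE, confidence, fallback=True)
    return RouteDecision(route, confidence)


decision = parse_decision('{"route": "classification", "confidence": 0.93}')
tier = TIER_FOR_ROUTE[decision.route]
```

Every decision with `fallback=True` is a candidate for the mis-route eval suite, which is one reason to keep the flag on the record rather than swallowing the error.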

01. Define route taxonomy
Classification, tool call, reasoning, retrieval. Four routes, four tiers. No fifth category until you have data.

02. Run shadow mode
Log what the router would have picked without changing production routes. Compare against actual outcomes for two weeks.

03. Promote per route
Flip one route at a time, not stack-wide. Measure cost per completed task on your eval set, not list price per token.

04. Audit monthly by model ID
If roughly 80% of spend is frontier, you have a routing problem, not a capability problem.
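The shadow-mode step can be sketched as a per-route summary over logged decisions. This is one possible shape, assuming you log the tier that actually served each request next to the route and tier the router would have picked. The `ShadowRecord` fields and the `change_rate` metric are hypothetical names for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowRecord:
    request_id: str
    production_tier: str  # tier that actually served the request
    shadow_route: str     # route the router would have picked
    shadow_tier: str      # tier that route maps to


def summarize_shadow(records):
    """Per shadow route: request count and share that would change tier."""
    counts = defaultdict(lambda: {"requests": 0, "would_change": 0})
    for record in records:
        row = counts[record.shadow_route]
        row["requests"] += 1
        if record.shadow_tier != record.production_tier:
            row["would_change"] += 1
    return {
        route: {**row, "change_rate": row["would_change"] / row["requests"]}
        for route, row in counts.items()
    }


records = [
    ShadowRecord("a", "tier3", "classification", "tier1"),
    ShadowRecord("b", "tier3", "classification", "tier1"),
    ShadowRecord("c", "tier3", "reasoning", "tier3"),
    ShadowRecord("d", "tier1", "classification", "tier1"),
]
summary = summarize_shadow(records)
```

A route with a high change rate is a promotion candidate, not a promotion. The summary only says where traffic would move; whether quality holds on the cheaper tier still has to come from the eval set, one route at a time.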

Where each tier belongs in agent pipelines

The five-agent content quality pipeline is a worked example: Haiku on pattern checks, Sonnet on structured eval, Opus only on rewrite. Same pattern applies to support triage, document processing, and code review agents.

Tier 4 is a different product job entirely. Do not run a generation model where an embedding model plus vector index is the right tool. See the complete RAG pipeline for where embeddings sit in the stack.
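To make the Tier 4 distinction concrete, here is a toy sketch of retrieval as vector similarity rather than generation. The vectors are hand-written stand-ins for what an embedding model would return, and the brute-force scan stands in for a real vector index; both are simplifying assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=3):
    """index: list of (doc_id, vector). Returns the k most similar doc IDs."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hand-written 2-d vectors; a real embedding model returns hundreds of dimensions.
index = [
    ("refunds", [1.0, 0.0]),
    ("shipping", [0.0, 1.0]),
    ("returns", [0.9, 0.1]),
]
hits = top_k([1.0, 0.0], index, k=2)
```

No tokens are generated at query time beyond the single embedding call for the query, which is why this job does not belong on a generation tier.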

Conclusion

The cost metric that matters is cost per completed task on your eval set. Audit last month's bill by model ID. Build the router before you tune prompts on the frontier model. Same quality on classification. Up to 10x lower cost if you route right.
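Both numbers in this conclusion are simple arithmetic once the bill is exported as line items. A sketch, assuming each line item is a `(model_id, cost_usd)` pair; the model IDs below are placeholders, not real identifiers, and the dollar figures are made up for the example.

```python
from collections import defaultdict


def spend_share_by_model(line_items):
    """line_items: iterable of (model_id, cost_usd). Returns share of total per model."""
    totals = defaultdict(float)
    for model_id, cost_usd in line_items:
        totals[model_id] += cost_usd
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {model_id: cost / grand_total for model_id, cost in totals.items()}


def frontier_share(line_items, frontier_model_ids):
    """Fraction of total spend that went to the models you count as frontier."""
    shares = spend_share_by_model(line_items)
    return sum(share for model_id, share in shares.items()
               if model_id in frontier_model_ids)


def cost_per_completed_task(total_cost_usd, completed_tasks):
    """Total spend on an eval run divided by tasks that actually passed."""
    if completed_tasks <= 0:
        raise ValueError("completed_tasks must be positive")
    return total_cost_usd / completed_tasks


# Placeholder model IDs and made-up costs.
bill = [
    ("frontier-x", 500.0),
    ("frontier-x", 300.0),
    ("mid-y", 150.0),
    ("small-z", 50.0),
]
share = frontier_share(bill, {"frontier-x"})
per_task = cost_per_completed_task(120.0, 400)
```

Dividing by completed tasks rather than tokens is the point: a tier that is cheaper per token but fails more tasks, or needs more steps per task, can still lose on this metric.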

Key takeaways

1. Using one frontier model for every agent step is the most common cost mistake on production stacks. The spread between a classification tier and a frontier tier is often 10x on the same output quality for that step.
2. Tier 1 (small, fast, cheap) is for classification and routing: intent detection, query classification, entity extraction, language detection, topic categorization. Haiku 4.5 and Gemini 3.5 Flash are the usual picks in June 2026.
3. Tier 2 (mid-tier workhorse) is for tool use and structured output: function calling, JSON schemas, most agent loop steps, short reasoning, summarisation, API orchestration. Sonnet 4.6 is the default seat here.
4. Tier 3 (frontier reasoning) is for multi-step planning, deep reasoning, hard code generation, and agentic orchestration where fewer steps justify higher per-token price. Opus 4.8 fast mode belongs here, not on every sub-task.
5. Tier 4 (embeddings) is a different product job: semantic search and document retrieval for RAG. Do not run generation models where an embedding model plus vector index is the right tool.
6. Build the router agent before you tune prompts on the frontier model. A Haiku or Flash classifier picks the execution tier; route classification to Tier 1, tool calls to Tier 2, hard reasoning to Tier 3, retrieval to an embedding index.
7. Run the router in shadow mode first, then promote per route (not stack-wide). The router needs typed output, exit conditions, and an eval suite built from production mis-routes.
8. The cost metric that matters is cost per completed task on your eval set, not list price per million tokens. Audit last month's bill by model ID: if roughly 80 percent of spend is frontier, you have a routing problem, not a capability problem.

