All notes
Architecture Jun 3, 2026 2 min
Part ofAgentic AIClaude APIAI Engineering

Stop paying frontier prices for classification.

Four model tiers. Build the router agent first. Same quality, up to 10x cost spread if you route wrong.

Using one frontier model for every agent step is the most expensive mistake I still see on client bills. Classification belongs on Haiku or Flash. Tool calling belongs on Sonnet. Planning belongs on Opus. Embeddings are a different product entirely. The pattern that fixes the bill: a router agent on a cheap model classifies the task, then hands execution to the right tier. One visual map of all four tiers, the June 2026 model picks, and the bill audit that tells you whether you have a routing problem.

Tags#ModelRouting#CostEngineering#RouterAgent#Claude#Gemini#Haiku#Sonnet#Opus#AgenticAI#Architecture#Embeddings

Key takeaways

  • 1Using one frontier model for every agent step is the most common cost mistake on production stacks. The spread between a classification tier and a frontier tier is often 10x on the same output quality for that step.
  • 2Tier 1 (small, fast, cheap) is for classification and routing: intent detection, query classification, entity extraction, language detection, topic categorization. Haiku 4.5 and Gemini 3.5 Flash are the usual picks in June 2026.
  • 3Tier 2 (mid-tier workhorse) is for tool use and structured output: function calling, JSON schemas, most agent loop steps, short reasoning, summarisation, API orchestration. Sonnet 4.6 is the default seat here.
  • 4Tier 3 (frontier reasoning) is for multi-step planning, deep reasoning, hard code generation, and agentic orchestration where fewer steps justify higher per-token price. Opus 4.8 fast mode belongs here, not on every sub-task.
  • 5Tier 4 (embeddings) is a different product job: semantic search and document retrieval for RAG. Do not run generation models where an embedding model plus vector index is the right tool.
  • 6Build the router agent before you tune prompts on the frontier model. A Haiku or Flash classifier picks the execution tier; route classification to Tier 1, tool calls to Tier 2, hard reasoning to Tier 3, retrieval to an embedding index.
  • 7Run the router in shadow mode first, then promote per route (not stack-wide). The router needs typed output, exit conditions, and an eval suite built from production mis-routes.
  • 8The cost metric that matters is cost per completed task on your eval set, not list price per million tokens. Audit last month's bill by model ID: if roughly 80 percent of spend is frontier, you have a routing problem, not a capability problem.

Get the visual notes by email

New agentic AI notes and breakdowns, plus what I am shipping for clients — one email on Thursdays.