What does a router or dispatcher do in a multi-agent system?

It classifies the incoming request and hands off to the right specialist agent. Because it sits on the critical path of every request, its cost and latency multiply across all traffic, which is why the model choice there matters so much.

Why is Haiku a good fit for routing?

Routing is a classification task: pick one of N categories with a confidence score. That is within a small, fast model's comfort zone, so you get most of the accuracy at a fraction of the cost and latency for the easy majority of requests.

Where does a cheap router go wrong?

In two places: on truly ambiguous requests, where it commits to one path instead of flagging the ambiguity, and on novel requests outside the category set, where it force-fits to the nearest category. Both cause expensive downstream rework by the specialist agent.

What is confidence-tiered model selection?

Route the easy, high-confidence majority through the cheap, fast model, and escalate only the low-confidence slice to a stronger model. The cheap model flags its own uncertainty with an "ambiguous" label or a confidence threshold rather than guessing.

Why measure cost per completed task instead of cost per token?

Because a cheap wrong route is not actually cheap: it triggers a specialist agent to do the wrong work, which costs far more than the routing call saved. Cost per completed task captures that downstream effect; cost per token hides it.

Haiku 4.5 Made Our Router 5x Cheaper: The Trade-off

In a multi-agent system you have a router, sometimes called a dispatcher or supervisor, that classifies the user request and hands off to a specialist agent. Until recently I used Sonnet for this. It was overkill for classification but reliable. With Haiku 4.5, I tried demoting the role to the smaller model. The orchestration context for this is the supervisor pattern versus handoffs; this post is specifically about the model you put in the routing seat.

The wins

Routing cost dropped about 5x. Latency to first handoff dropped meaningfully, which matters because the router sits on the critical path of every single request. Haiku is fast, and the router's job (pick one of N categories with a confidence score) is squarely within its comfort zone. For the easy majority of requests, the cheaper model was indistinguishable from the expensive one.

The two places it cost us

First, ambiguous requests. When a request truly straddles two categories, Sonnet would notice the ambiguity and ask for clarification. Haiku tended to commit to one path, which then went off track downstream. The cost of that wrong commitment was not in the routing call; it was in the specialist agent doing the wrong work, which is far more expensive. We added an explicit "ambiguous" output and rerouted those requests to Sonnet.

Second, novel requests outside our category set. Haiku would force-fit them into the closest category. Sonnet would notice they did not fit and escalate. We added a confidence threshold below which we re-classify with Sonnet. Both fixes share a shape: let the cheap model handle what it is good at, and give it a clean way to say "I am not sure" instead of guessing.
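A minimal sketch of how the two fixes could combine into one escalation rule. Everything named here is an assumption for illustration, not our production code: the `Route` type, the `route_request` function, the stub classifiers and the 0.8 threshold are all hypothetical, and the two classifiers are passed in as plain callables so no particular SDK is implied.

```python
from dataclasses import dataclass
from typing import Callable

AMBIGUOUS = "ambiguous"  # hypothetical escape-hatch label


@dataclass(frozen=True)
class Route:
    category: str              # a known category, or AMBIGUOUS
    confidence: float          # self-reported score in [0.0, 1.0]
    decided_by: str = "cheap"  # which tier made the final call


Classifier = Callable[[str], Route]


def route_request(
    text: str,
    cheap: Classifier,
    strong: Classifier,
    threshold: float = 0.8,  # illustrative value, not from the post
) -> Route:
    """Try the cheap classifier first; escalate when it is unsure."""
    first = cheap(text)
    if first.category == AMBIGUOUS or first.confidence < threshold:
        second = strong(text)
        return Route(second.category, second.confidence, decided_by="strong")
    return first


# Stub classifiers standing in for real model calls.
def stub_cheap(text: str) -> Route:
    if "invoice" in text:
        return Route("billing", 0.95)
    return Route(AMBIGUOUS, 0.30)


def stub_strong(text: str) -> Route:
    return Route("needs_clarification", 0.90)


if __name__ == "__main__":
    print(route_request("where is my invoice?", stub_cheap, stub_strong))
    print(route_request("it is broken and I was charged", stub_cheap, stub_strong))
```

The explicit label and the numeric threshold feed the same branch on purpose: the strong model is only ever called for the slice the cheap model could not settle.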

Sonnet router versus Haiku router with a confidence escalation

Dimension	Sonnet router	Haiku + escalate
Cost per route	Baseline	About 5x cheaper on the easy majority
Latency to first handoff	Higher	Meaningfully lower
Ambiguous requests	Handled well	Flagged "ambiguous", escalated
Novel requests	Recognised, escalated	Below-threshold confidence, escalated
Right metric	Cost per token	Cost per completed task

The pattern: tier by confidence, not request type

The generalisable lesson is to tier model selection by confidence, not by request type. Cheap and fast for the easy 90%, smart and slow for the ambiguous 10% that the cheap model itself flags. The key design move is giving the cheap model an explicit escape hatch, an "ambiguous" label or a confidence score, so it routes its own uncertainty upward instead of masking it with a confident wrong answer.
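The escape hatch is easier to enforce if the router's output is validated before anything acts on it. The sketch below assumes the cheap model is prompted to reply with a small JSON object containing `label` and `confidence`; that contract, the category names and the `parse_router_output` helper are all hypothetical. Anything that does not fit the contract is treated as "ambiguous" rather than trusted.

```python
import json
from typing import Tuple

CATEGORIES = {"billing", "technical", "sales"}  # hypothetical category set
AMBIGUOUS = "ambiguous"                         # hypothetical escape-hatch label


def parse_router_output(raw: str) -> Tuple[str, float]:
    """Return (label, confidence); malformed output becomes (AMBIGUOUS, 0.0)."""
    try:
        data = json.loads(raw)
        label = data["label"]
        confidence = float(data["confidence"])
    except (ValueError, KeyError, TypeError):
        # Not JSON, not an object, or missing / non-numeric fields.
        return AMBIGUOUS, 0.0
    if not isinstance(label, str) or label not in CATEGORIES | {AMBIGUOUS}:
        # A label outside the known set is a force-fit risk, so escalate it.
        return AMBIGUOUS, 0.0
    if not 0.0 <= confidence <= 1.0:
        # Also rejects NaN, because comparisons with NaN are always false.
        return AMBIGUOUS, 0.0
    return label, confidence
```

Failing closed like this means a parsing glitch costs one escalation instead of one misrouted specialist run.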

This only works if you measure the right thing. Cost per token makes Haiku look like a pure win. Cost per completed task tells the truth, because a cheap wrong route triggers expensive downstream rework. Optimise for the task-level number. It is the same discipline I push in "the cheapest LLM call is the one you do not make", and the number I watch on the dashboards from the agent observability stack we ship. The four-tier map I sketch for teams before we wire the router is in "stop paying frontier prices for classification".
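To make the difference between the two metrics concrete, here is a back-of-the-envelope model. It rests on a simplifying assumption that is mine, not a measurement: every misroute wastes exactly one full specialist run before the task is redone correctly. The function name and every number in the example are invented for illustration; only the 5x routing-cost ratio comes from the post.

```python
def cost_per_completed_task(
    route_cost: float,
    specialist_cost: float,
    misroute_rate: float,
    escalation_rate: float = 0.0,
    escalation_cost: float = 0.0,
) -> float:
    """Expected spend to finish one task.

    Assumes each misroute wastes one specialist run, after which the task
    is completed on the second attempt. All costs share one arbitrary unit.
    """
    routing = route_cost + escalation_rate * escalation_cost
    specialist = specialist_cost * (1.0 + misroute_rate)
    return routing + specialist


# Made-up inputs: specialist run = 100 units, strong route = 5, cheap route = 1.
strong_only = cost_per_completed_task(5.0, 100.0, misroute_rate=0.02)
cheap_only = cost_per_completed_task(1.0, 100.0, misroute_rate=0.08)
cheap_tiered = cost_per_completed_task(
    1.0, 100.0, misroute_rate=0.02, escalation_rate=0.10, escalation_cost=5.0
)

if __name__ == "__main__":
    print(strong_only, cheap_only, cheap_tiered)
```

With these invented inputs the un-escalated cheap router comes out more expensive per task than the strong router, despite the routing call being 5x cheaper, and the tiered setup comes out cheapest. Different inputs can change that ordering, which is the reason to measure your own.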

Common mistakes

Swapping in the cheap model everywhere and judging it on token cost alone, missing the downstream rework it causes.
Not giving the cheap model an "I am not sure" output, so its uncertainty shows up as confident wrong routing.
Setting the confidence threshold once and never revisiting it as the category set drifts.
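One way to revisit the threshold is to replay a labelled sample of recent traffic and see how each candidate value trades escalations against wrong routes that slip through. This is a sketch under assumptions: the `Sample` record, the `sweep_thresholds` helper and the idea of a human-labelled sample are hypothetical, not a description of our tooling.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Sample(NamedTuple):
    confidence: float  # cheap model's self-reported confidence
    correct: bool      # did the cheap route match the human label?


def sweep_thresholds(
    samples: Iterable[Sample], thresholds: Iterable[float]
) -> List[Tuple[float, float, float]]:
    """For each threshold return (threshold, escalation_rate, missed_error_rate).

    escalation_rate: share of traffic sent on to the strong model.
    missed_error_rate: share of traffic routed wrongly by the cheap model
    while staying at or above the threshold (so never escalated).
    """
    data = list(samples)
    if not data:
        raise ValueError("need at least one labelled sample")
    rows = []
    for t in thresholds:
        kept = [s for s in data if s.confidence >= t]
        escalated = len(data) - len(kept)
        wrong_kept = sum(1 for s in kept if not s.correct)
        rows.append((t, escalated / len(data), wrong_kept / len(data)))
    return rows
```

Rerunning the sweep whenever categories are added or renamed shows whether the old threshold still sits where escalations are affordable and missed errors are rare.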

Demoting the router was the right call, but only with the escalation path in place. If you are tuning a model-tiering strategy across an agent system, that is exactly the kind of cost-versus-reliability work I do in consulting.

