Haiku 4.5 made our router 5x cheaper. The trade-off matters
Replacing Sonnet with Haiku in the dispatcher role cut our orchestration cost dramatically. It also cost us in two specific places I did not predict.
In a multi-agent system you have a router, sometimes called a dispatcher or supervisor, that classifies the user request and hands off to a specialist agent. Until recently I used Sonnet for this. It was overkill for classification but reliable. With Haiku 4.5, I tried demoting the role. The broader orchestration question is supervisor pattern versus handoffs; this post is specifically about which model you put in the routing seat.
The wins
Routing cost dropped about 5x. Latency to first hand-off dropped meaningfully, which matters because the router sits on the critical path of every single request. Haiku is fast, and the router's job (pick one of N categories with a confidence score) is squarely within its comfort zone. For the easy majority of requests, the cheaper model was indistinguishable from the expensive one.
The two places it cost us
First, ambiguous requests. When a request truly straddles two categories, Sonnet would notice the ambiguity and ask for clarification. Haiku tended to commit to one path, which then went off track downstream. The cost of that wrong commitment was not in the routing call; it was in the specialist agent doing the wrong work, which is far more expensive. We added an explicit "ambiguous" output and rerouted those requests to Sonnet.
Second, novel requests outside our category set. Haiku would force-fit them into the closest category. Sonnet would notice they did not fit and escalate. We added a confidence threshold below which we re-classify with Sonnet. Both fixes share a shape: let the cheap model handle what it is good at, and give it a clean way to say "I am not sure" instead of guessing.
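A minimal sketch of the shape those two fixes share, assuming the router returns a label plus a confidence score. The category names, the 0.75 floor, and the two stub classifiers are hypothetical stand-ins for real model calls, not the production setup.

```python
from dataclasses import dataclass
from typing import Callable

CATEGORIES = {"billing", "tech_support", "sales"}  # hypothetical category set
AMBIGUOUS = "ambiguous"                            # the explicit escape-hatch label
CONFIDENCE_FLOOR = 0.75                            # hypothetical; tune on your own traffic


@dataclass(frozen=True)
class RouteDecision:
    label: str
    confidence: float
    tier: str = "cheap"  # which model tier produced the final decision


Classifier = Callable[[str], RouteDecision]


def needs_escalation(decision: RouteDecision) -> bool:
    """True when the cheap model flagged ambiguity, left the set, or was unsure."""
    return (
        decision.label == AMBIGUOUS
        or decision.label not in CATEGORIES
        or decision.confidence < CONFIDENCE_FLOOR
    )


def route(request: str, cheap: Classifier, strong: Classifier) -> RouteDecision:
    """Try the cheap classifier first; re-classify with the strong one if unsure."""
    first = cheap(request)
    if not needs_escalation(first):
        return first
    second = strong(request)
    return RouteDecision(second.label, second.confidence, tier="strong")


# Stub classifiers standing in for the two model calls (assumptions for the demo).
def fake_cheap(request: str) -> RouteDecision:
    if "refund" in request:
        return RouteDecision("billing", 0.95)
    if "invoice" in request and "error" in request:
        return RouteDecision(AMBIGUOUS, 0.50)
    return RouteDecision("sales", 0.40)  # a low-confidence force-fit


def fake_strong(request: str) -> RouteDecision:
    return RouteDecision("tech_support", 0.90)


if __name__ == "__main__":
    for text in ("refund please", "invoice page throws an error", "something new"):
        print(text, "->", route(text, fake_cheap, fake_strong))
```

Passing the classifiers in as callables keeps the escalation rule testable without any API access. In a real system the strong tier might also return a clarification request rather than a label.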
The pattern: tier by confidence, not request type
The generalisable lesson is to tier model selection by confidence, not by request type. Cheap and fast for the easy 90%, smart and slow for the ambiguous 10% that the cheap model itself flags. The key design move is giving the cheap model an explicit escape hatch, an "ambiguous" label or a confidence score, so it routes its own uncertainty upward instead of masking it with a confident wrong answer.
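One way to build the escape hatch into the router's output contract is sketched below, assuming the model is asked to reply with a small JSON object. The prompt wording, the label set, and the choice to treat any malformed reply as "ambiguous" are illustrative assumptions.

```python
import json

# Hypothetical label set; "ambiguous" is the explicit escape hatch.
ALLOWED_LABELS = ("billing", "tech_support", "sales", "ambiguous")

# Hypothetical prompt text; adapt the wording to your own router.
ROUTER_PROMPT = (
    "Classify the request into exactly one label from: "
    + ", ".join(ALLOWED_LABELS)
    + ". Use 'ambiguous' if it straddles categories or fits none. "
    'Reply with JSON only: {"label": "<label>", "confidence": <0.0-1.0>}'
)


def parse_router_reply(raw: str) -> tuple[str, float]:
    """Parse the router's reply, failing closed to ("ambiguous", 0.0).

    Anything that cannot be read as a known label with a confidence in
    [0, 1] is treated as uncertainty, so it escalates instead of routing.
    """
    try:
        data = json.loads(raw)
        label = data["label"]
        confidence = float(data["confidence"])
    except (ValueError, KeyError, TypeError):
        return ("ambiguous", 0.0)
    if label not in ALLOWED_LABELS or not 0.0 <= confidence <= 1.0:
        return ("ambiguous", 0.0)
    return (label, confidence)
```

Failing closed matters here: a parse error that silently defaulted to a real category would be another confident wrong answer.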
This only works if you measure the right thing. Cost per token makes Haiku look like a pure win. Cost per completed task tells the truth, because a cheap wrong route triggers expensive downstream rework. Optimise for the task-level number. It is the same discipline I push in "the cheapest LLM call is the one you do not make", and the one I watch on the dashboards from the agent observability stack we ship. The four-tier map I sketch for teams before we wire the router is in "stop paying frontier prices for classification".
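To make the difference between the two metrics concrete, here is a small worked calculation. Every number in it is a made-up illustration, not a measurement from our system, and the cost model is deliberately simple: each misroute wastes one routing call and one specialist run, and the retry is assumed to succeed.

```python
def cost_per_completed_task(
    route_cost: float, specialist_cost: float, misroute_rate: float
) -> float:
    """Expected cost of one completed task under the simple model above."""
    return (route_cost + specialist_cost) * (1.0 + misroute_rate)


# All figures below are hypothetical, in dollars per call.
STRONG_ROUTE = 0.005
CHEAP_ROUTE = 0.001      # 5x cheaper per routing call
SPECIALIST = 0.050
ESCALATION_RATE = 0.10   # share of requests the cheap model hands upward

# Strong model routes everything, rarely wrong.
strong_only = cost_per_completed_task(STRONG_ROUTE, SPECIALIST, 0.01)

# Cheap model routes everything with no escape hatch, wrong more often.
cheap_no_escape = cost_per_completed_task(CHEAP_ROUTE, SPECIALIST, 0.10)

# Cheap model first, strong model only on the flagged share.
tiered_route = CHEAP_ROUTE + ESCALATION_RATE * STRONG_ROUTE
tiered = cost_per_completed_task(tiered_route, SPECIALIST, 0.01)

if __name__ == "__main__":
    for name, value in (
        ("strong only", strong_only),
        ("cheap, no escape hatch", cheap_no_escape),
        ("tiered", tiered),
    ):
        print(f"{name}: ${value:.5f} per completed task")
```

With these assumed inputs the unguarded cheap router comes out more expensive per completed task than the strong one, despite the 5x cheaper call, and the tiered setup comes out cheapest. Different misroute rates will move those lines, which is the point of measuring them.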
Common mistakes
- Swapping in the cheap model everywhere and judging it on token cost alone, missing the downstream rework it causes.
- Not giving the cheap model an "I am not sure" output, so its uncertainty shows up as confident wrong routing.
- Setting the confidence threshold once and never revisiting it as the category set drifts.
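Revisiting the threshold can be as simple as re-running a sweep over a fresh labelled sample. The sketch below assumes you have recent routing decisions stored as (confidence, was_correct) pairs; the function name, the 0.05 step, and the target accuracy are hypothetical choices.

```python
from typing import Optional, Sequence, Tuple


def pick_threshold(
    samples: Sequence[Tuple[float, bool]],
    target_accuracy: float = 0.98,
) -> Optional[float]:
    """Return the lowest threshold whose accepted decisions meet the target.

    samples: (confidence, was_correct) pairs from recent cheap-model routes.
    Decisions with confidence >= threshold are accepted; the rest escalate.
    Returns None if no threshold in the sweep meets the target.
    """
    for step in range(21):  # 0.00, 0.05, ..., 1.00
        threshold = step / 20
        accepted = [ok for conf, ok in samples if conf >= threshold]
        if accepted and sum(accepted) / len(accepted) >= target_accuracy:
            return threshold
    return None


if __name__ == "__main__":
    # Tiny made-up sample for illustration only.
    recent = [
        (0.95, True), (0.90, True), (0.85, True),
        (0.60, False), (0.55, True), (0.40, False),
    ]
    print(pick_threshold(recent, target_accuracy=1.0))
```

Picking the lowest qualifying threshold keeps as much traffic as possible on the cheap tier. A None result means the cheap model is not meeting the target at any cut-off on that sample, which is itself a signal that the category set has drifted.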
Demoting the router was the right call, but only with the escalation path in place. If you are tuning a model-tiering strategy across an agent system, that is exactly the kind of cost-versus-reliability work I do in consulting.