All posts
Multi-Agent Published 7 min

Haiku 4.5 made our router 5x cheaper. The trade-off matters

Replacing Sonnet with Haiku in the dispatcher role cut our orchestration cost dramatically. It also cost us in two specific places I did not predict.

Jigar JoshiJigar JoshiAgentic AI Architect and Consultant
In this post (4 sections)

In a multi-agent system you have a router, sometimes called a dispatcher or supervisor, that classifies the user request and hands off to a specialist agent. Until recently I used Sonnet for this. It was overkill for classification but reliable. With Haiku 4.5, I tried demoting the role. The orchestration context for this is supervisor pattern versus handoffs; this post is specifically about the model you put in the routing seat.

The wins

Routing cost dropped about 5x. Latency to first hand-off dropped meaningfully, which matters because the router sits on the critical path of every single request. Haiku is fast, and the router's job (pick one of N categories with a confidence score) is squarely within its comfort zone. For the easy majority of requests, the cheaper model was indistinguishable from the expensive one.

The two places it cost us

First, ambiguous requests. When a request truly straddles two categories, Sonnet would notice the ambiguity and ask for clarification. Haiku tended to commit to one path, which then went off track downstream. The cost of that wrong commitment was not in the routing call, it was in the specialist agent doing the wrong work, which is far more expensive. We added an explicit "ambiguous" output and rerouted those to Sonnet.

Second, novel requests outside our category set. Haiku would force-fit them into the closest category. Sonnet would notice they did not fit and escalate. We added a confidence threshold below which we re-classify with Sonnet. Both fixes share a shape: let the cheap model handle what it is good at, and give it a clean way to say "I am not sure" instead of guessing.

Sonnet router versus Haiku router with a confidence escalation
DimensionSonnet routerHaiku + escalate
Cost per routeBaselineAbout 5x cheaper on the easy majority
Latency to first handoffHigherMeaningfully lower
Ambiguous requestsHandled wellFlagged "ambiguous", escalated
Novel requestsRecognised, escalatedBelow-threshold confidence, escalated
Right metricCost per tokenCost per completed task

The pattern: tier by confidence, not request type

The generalisable lesson is to tier model selection by confidence, not by request type. Cheap and fast for the easy 90%, smart and slow for the ambiguous 10% that the cheap model itself flags. The key design move is giving the cheap model an explicit escape hatch, an "ambiguous" label or a confidence score, so it routes its own uncertainty upward instead of masking it with a confident wrong answer.

This only works if you measure the right thing. Cost per token makes Haiku look like a pure win. Cost per completed task tells the truth, because a cheap wrong route triggers expensive downstream rework. Optimise for the task-level number, which is the same discipline I push in the cheapest LLM call is the one you do not make and watch on the dashboards from the agent observability stack we ship. The four-tier map I sketch for teams before we wire the router is in stop paying frontier prices for classification.

Common mistakes

  • Swapping in the cheap model everywhere and judging it on token cost alone, missing the downstream rework it causes.
  • Not giving the cheap model an "I am not sure" output, so its uncertainty shows up as confident wrong routing.
  • Setting the confidence threshold once and never revisiting it as the category set drifts.

Demoting the router was the right call, but only with the escalation path in place. If you are tuning a model-tiering strategy across an agent system, that is exactly the kind of cost-versus-reliability work I do in consulting.

The weekly take

Agentic AI patterns, delivered Thursdays

What I am shipping, watching, and pruning out of client stacks each week. One email. No fluff.

Shipping an agentic AI project this quarter?
Book a 30-min consult
Frequently asked

Questions readers ask about this post

Share this post
LinkedIn Facebook