The one rule for designing agent tools that actually work
One tool, one purpose. Every tool that does two things will fail you on the third call. I have watched this pattern fail in every team I have trained, and the fix is the same refactor every time.
One tool, one purpose. If you cannot describe a tool's job in a single verb, split it. This rule sounds obvious until you go look at your own tool registry and see how many tools are quietly doing two things. It is the structural complement to the writing advice in tool descriptions are prompts: no amount of good description rescues a tool that has two jobs.
Why multi-purpose tools fail
Multi-purpose tools force the model to predict not just whether to call but which mode to call in. Every additional decision is an additional point of failure, and the two decisions are not independent. The model has to get "is this the right tool" and "which branch of the tool" both right, on the same call, from one description that is trying to cover both branches at once.
They fail in a characteristic way: on the third call, not the first. The first two demos work because context is clean and the example is the one you designed for. Once the session fills up and the inputs blur, the model picks the wrong mode, and a tool that "worked" suddenly does the opposite of what the user wanted. This is the same long-tail failure I describe in why your agent keeps failing after 3 steps: the happy path hides the design flaw.
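To make the hidden second decision concrete, here is a minimal TypeScript sketch of a two-mode tool. Everything in it is hypothetical (the `upsertUser` handler, the `UpsertArgs` shape, the in-memory `users` map); it is not taken from any framework. The only point is that the branch on `id` lives inside the handler, where the model cannot see it.

```typescript
// Hypothetical two-mode tool: the mode is chosen by whether `id` is present.
type User = { id: string; name: string; email: string };

// In-memory store, assumed for illustration only.
const users = new Map<string, User>();
let nextId = 1;

interface UpsertArgs {
  id?: string;
  name: string;
  email: string;
}

function upsertUser(args: UpsertArgs): { mode: "created" | "updated"; user: User } {
  // Update only when `id` is given and known; otherwise fall through to create.
  // Nothing in the tool name tells the caller which branch it will get.
  if (args.id !== undefined && users.has(args.id)) {
    const updated: User = { id: args.id, name: args.name, email: args.email };
    users.set(args.id, updated);
    return { mode: "updated", user: updated };
  }
  const created: User = { id: String(nextId++), name: args.name, email: args.email };
  users.set(created.id, created);
  return { mode: "created", user: created };
}

// First call: the user asks to create Ada. Correct branch.
const first = upsertUser({ name: "Ada", email: "ada@example.com" });

// Later call: the user asks to change Ada's email, but the model omits `id`.
// The tool reports success and quietly creates a second Ada.
const second = upsertUser({ name: "Ada", email: "ada@new.example.com" });
```

Both calls return without error, which is what makes this failure hard to spot: the wrong mode looks like a successful call.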
The refactor
Take a tool like upsert_user, which creates a user or updates one depending on whether an ID is provided. Split it into create_user and update_user. Now the descriptions can be specific, the schemas can be tight, and the model picks the right one because there is only one right one. The branching logic that used to live inside the tool, invisible to the model, becomes an explicit choice between two named capabilities, which is exactly the kind of choice models are good at.
// Before: one tool, two hidden modes
upsert_user(id?: string, name: string, email: string)
// After: two tools, each with one job
create_user(name: string, email: string)
update_user(id: string, fields: object)
The objection: now I have more tools
Yes, and that is a real cost, because every registered tool adds schema overhead and one more candidate to score against. The answer is not to merge tools back together, it is to scope the load: keep the default tool set small and load specialised tools on demand. I cover that trade-off in fix the registry, not the agent and the cost side in the cheapest LLM call is the one you do not make. Splitting for clarity and scoping for cost are not in conflict; they are the two halves of a clean registry.
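One way to picture scoping the load, as a sketch rather than a prescription: tag each tool with a scope, then hand the model only the default set plus whatever scopes the current task asked for. The `ToolSpec` shape, the `scope` field, the `toolsFor` helper, and the `merge_users` and `export_users` tools are all assumptions invented for this illustration, not any framework's API.

```typescript
// Hypothetical registry entry. `scope` is an assumed field:
// "default" means always loaded, anything else is loaded on demand.
interface ToolSpec {
  name: string;
  description: string;
  scope: string;
}

const registry: ToolSpec[] = [
  { name: "create_user", description: "Create a new user.", scope: "default" },
  { name: "update_user", description: "Update fields on an existing user.", scope: "default" },
  { name: "merge_users", description: "Merge two duplicate user records.", scope: "admin" },
  { name: "export_users", description: "Export all users as CSV.", scope: "reporting" },
];

// Return the default tools plus any scopes explicitly requested for this task.
function toolsFor(requestedScopes: string[] = []): ToolSpec[] {
  const active = new Set<string>(["default", ...requestedScopes]);
  return registry.filter((tool) => active.has(tool.scope));
}

// Everyday session: only the two default tools are offered to the model.
const everyday = toolsFor().map((tool) => tool.name);

// Admin task: the specialised tool is added for this task only.
const adminTask = toolsFor(["admin"]).map((tool) => tool.name);
```

The registry still holds four tools, but the model only ever scores against the ones the task needs.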
Common mistakes
- Adding a "mode" or "action" enum parameter to dodge the split. That is the multi-purpose tool wearing a costume; the model still has to pick the branch.
- Splitting by implementation detail instead of by user intent. Two tools that map to the same user request just move the confusion.
- Leaving all the split tools in the default load instead of scoping the rarely-used ones.
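The first mistake is easiest to see in the types. In this sketch the argument shapes are hypothetical extensions of the create_user and update_user signatures earlier in the post, and the two `isValid...` functions are stand-ins for whatever schema validation your stack runs. With a flat enum schema, `id` has to be optional because create has none, so an update with no ID passes validation. The split tool rejects the same call up front.

```typescript
// The "costume": one tool with an action enum. `id` must be optional
// because create has no id, so a flat schema cannot require it for update.
interface UserActionArgs {
  action: "create" | "update";
  id?: string;
  name?: string;
  email?: string;
}

// The split: each tool's arguments say exactly what that one job needs.
interface CreateUserArgs {
  name: string;
  email: string;
}
interface UpdateUserArgs {
  id: string;
  fields: { name?: string; email?: string };
}

// Minimal runtime checks standing in for schema validation (assumed helpers).
function isValidUserAction(a: UserActionArgs): boolean {
  return a.action === "create" || a.action === "update";
}
function isValidCreate(a: Partial<CreateUserArgs>): a is CreateUserArgs {
  return typeof a.name === "string" && typeof a.email === "string";
}
function isValidUpdate(a: Partial<UpdateUserArgs>): a is UpdateUserArgs {
  return typeof a.id === "string" && typeof a.fields === "object" && a.fields !== null;
}

// An update with no id: the enum tool's schema accepts it, so the
// problem only surfaces later, inside the handler.
const sloppy: UserActionArgs = { action: "update", email: "ada@new.example.com" };
const enumAccepts = isValidUserAction(sloppy);

// The same mistake against the split tool is rejected before it runs.
const splitAccepts = isValidUpdate({ fields: { email: "ada@new.example.com" } });

// A well-formed create call passes its own tight check.
const createAccepts = isValidCreate({ name: "Ada", email: "ada@example.com" });
```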
The cost is more tools in the registry. The benefit is fewer wrong tool calls. In every team I have trained, this is the single edit that lifts tool-call accuracy the most, and it is the first thing we look at together in consulting and training.