What does "one tool, one purpose" actually mean?

If you cannot describe a tool's job in a single verb without an "or," it is doing too much. A tool that creates or updates depending on an argument is two tools sharing a name. Split it so each has one job, one description, and one schema.

Why do multi-purpose tools fail on the third call rather than the first?

The first calls happen with clean context and the inputs you designed for, so the model picks the right mode. As the session fills and inputs blur, the second decision (which mode) is the one that slips, and the tool quietly runs the wrong branch.

Doesn't splitting tools just bloat my registry?

It adds entries, which has a real cost in schema overhead and selection noise. The fix is to scope the load: keep a small default tool set and load specialised tools on demand, rather than merging distinct jobs back into one tool.

Can I use a mode or action parameter instead of splitting?

That usually recreates the original problem. An action enum still forces the model to choose the branch, just inside the call instead of at selection time. Prefer separate named tools when the branches represent genuinely different intents.

Which lifts accuracy more, better descriptions or splitting tools?

They compound, but splitting comes first. A clean single-purpose tool is easy to describe well; a two-job tool resists even the best description because it has to hedge across both branches.

The One Rule for Designing Agent Tools

One tool, one purpose. If you cannot describe a tool's job in a single verb, split it. This rule sounds obvious until you go look at your own tool registry and see how many tools are quietly doing two things. It is the structural complement to the writing advice in tool descriptions are prompts: no amount of good description rescues a tool that has two jobs.

Why multi-purpose tools fail

Multi-purpose tools force the model to predict not just whether to call but which mode to call in. Every additional decision is an additional point of failure, and the two decisions are not independent. The model has to get "is this the right tool" and "which branch of the tool" both right, on the same call, from one description that is trying to cover both branches at once.

They fail in a characteristic way: on the third call, not the first. The first two demos work because context is clean and the example is the one you designed for. Once the session fills up and the inputs blur, the model picks the wrong mode, and a tool that "worked" suddenly does the opposite of what the user wanted. This is the same long-tail failure I describe in why your agent keeps failing after 3 steps: the happy path hides the design flaw.
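To make the hidden second decision concrete, here is a minimal TypeScript sketch of a two-mode tool. The handler, the in-memory `users` map, and the ID format are all hypothetical and not taken from any real framework; the only point is that an `id` carried over from earlier context silently flips the branch.

```typescript
// Hypothetical two-mode tool: the branch is chosen by whether `id` is present.
type User = { id: string; name: string; email: string };

const users = new Map<string, User>();
let nextId = 1;

function upsertUser(args: { id?: string; name: string; email: string }): string {
  if (args.id !== undefined) {
    // Update branch: taken whenever an id is supplied, intended or not.
    const existing = users.get(args.id);
    if (existing === undefined) {
      return `error: no user ${args.id}`;
    }
    users.set(args.id, { ...existing, name: args.name, email: args.email });
    return `updated ${args.id}`;
  }
  // Create branch: taken only when id is absent.
  const id = `u${nextId++}`;
  users.set(id, { id, name: args.name, email: args.email });
  return `created ${id}`;
}

// Early call: clean context, the create the user asked for.
const first = upsertUser({ name: "Ada", email: "ada@example.com" });

// Later call: the user asks to add Grace, but the model carries over an id
// it saw earlier in the session, so Ada's record is overwritten instead.
const third = upsertUser({ id: "u1", name: "Grace", email: "grace@example.com" });

console.log(first, "|", third);
```

Both calls return a success string, so nothing in the tool result signals that the later call took the wrong branch.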

The refactor

Take a single upsert_user tool that creates or updates a user depending on whether an ID is provided. Split it into create_user and update_user. Now the descriptions can be specific, the schemas can be tight, and the model picks the right one because there is only one right one. The branching logic that used to live inside the tool, invisible to the model, becomes an explicit choice between two named capabilities, which is exactly the kind of choice models are good at.

// Before: one tool, two hidden modes
upsert_user(id?: string, name: string, email: string)

// After: two tools, each with one job
create_user(name: string, email: string)
update_user(id: string, fields: object)
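A runnable sketch of the "after" shape, assuming a generic registry where each tool carries a name, a description, and a JSON-Schema-style parameter object. The `ToolDef` type, the description wording, and the `checkArgs` helper are illustrative assumptions rather than any particular framework's API, and the helper only checks top-level keys.

```typescript
// Hypothetical tool definition shape; real frameworks differ in detail.
type ToolDef = {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, object>;
    required: string[];
    additionalProperties: boolean;
  };
};

const createUser: ToolDef = {
  name: "create_user",
  description: "Create a new user from a name and an email address.",
  parameters: {
    type: "object",
    properties: { name: { type: "string" }, email: { type: "string" } },
    required: ["name", "email"],
    additionalProperties: false, // no id slot, so a stray id is rejected
  },
};

const updateUser: ToolDef = {
  name: "update_user",
  description: "Change fields on an existing user identified by id.",
  parameters: {
    type: "object",
    properties: {
      id: { type: "string" },
      fields: {
        type: "object",
        properties: { name: { type: "string" }, email: { type: "string" } },
        additionalProperties: false,
      },
    },
    required: ["id", "fields"], // id is mandatory here, never optional
    additionalProperties: false,
  },
};

// Top-level check only: reports missing required keys and unknown keys.
function checkArgs(tool: ToolDef, args: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of tool.parameters.required) {
    if (!(key in args)) problems.push(`missing: ${key}`);
  }
  if (!tool.parameters.additionalProperties) {
    for (const key of Object.keys(args)) {
      if (!(key in tool.parameters.properties)) problems.push(`unexpected: ${key}`);
    }
  }
  return problems;
}

console.log(checkArgs(createUser, { id: "u1", name: "Ada", email: "ada@example.com" }));
console.log(checkArgs(updateUser, { fields: { name: "Ada" } }));
```

With the schemas this tight, a create call that carries a leftover id and an update call that omits one are both rejected before any handler runs, instead of being quietly routed to the other branch.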

One multi-purpose tool versus two single-purpose tools

Property	upsert_user (two modes)	create_user + update_user
Decisions per call	Two: whether, and which mode	One: whether
Description	Hedged to cover both branches	Specific to one job
Schema	Loose (id is optional)	Tight (id required or absent)
Typical failure	Wrong mode on the third call	Wrong tool is rarer and obvious
Registry size	Smaller	One more entry

The objection: now I have more tools

Yes, and that is a real cost, because every registered tool adds schema overhead and one more candidate to score against. The answer is not to merge tools back together but to scope the load: keep the default tool set small and load specialised tools on demand. I cover that trade-off in fix the registry, not the agent and the cost side in the cheapest LLM call is the one you do not make. Splitting for clarity and scoping for cost are not in conflict; they are the two halves of a clean registry.
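One way to picture "scope the load", sketched under assumptions: each tool is tagged with a group, a small default group is always exposed, and other groups become visible only when requested. The `ScopedRegistry` class, the group names, and the `search_docs` tool are hypothetical; only create_user and update_user come from the example above.

```typescript
type Tool = { name: string; group: string };

class ScopedRegistry {
  private readonly all: Tool[];
  private readonly active = new Set<string>();

  constructor(tools: Tool[], defaultGroup: string) {
    this.all = tools;
    this.active.add(defaultGroup);
  }

  // Make a specialised group visible for the rest of the session.
  load(group: string): void {
    this.active.add(group);
  }

  // Only tools in active groups would be sent to the model.
  visible(): string[] {
    return this.all.filter((t) => this.active.has(t.group)).map((t) => t.name);
  }
}

const registry = new ScopedRegistry(
  [
    { name: "search_docs", group: "default" },
    { name: "create_user", group: "user_admin" },
    { name: "update_user", group: "user_admin" },
  ],
  "default",
);

const before = registry.visible();
registry.load("user_admin");
const after = registry.visible();
console.log(before, after);
```

The split tools still exist as separate entries, but they only cost schema tokens and selection noise in sessions that actually need user administration.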

Common mistakes

Adding a "mode" or "action" enum parameter to dodge the split. That is the multi-purpose tool wearing a costume; the model still has to pick the branch.
Splitting by implementation detail instead of by user intent. Two tools that map to the same user request just move the confusion.
Leaving all the split tools in the default load instead of scoping the rarely-used ones.
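The first mistake is easy to see once the argument shape is written down. In this sketch, `manage_user` and its `action` parameter are hypothetical. Because the required fields depend on the action, a flat schema typically has to mark them all optional, and the per-action rules move into handler code the model never sees.

```typescript
// Hypothetical action-enum tool: one flat argument shape for both branches.
type ManageUserArgs = {
  action: "create" | "update";
  id?: string;
  name?: string;
  email?: string;
};

// The rules that separate tools would state in their schemas end up here.
function missingFor(args: ManageUserArgs): string[] {
  const missing: string[] = [];
  if (args.action === "create") {
    if (args.name === undefined) missing.push("name");
    if (args.email === undefined) missing.push("email");
  } else {
    if (args.id === undefined) missing.push("id");
  }
  return missing;
}

// Both calls satisfy the flat shape, yet each is incomplete for its action.
const createWithoutEmail = missingFor({ action: "create", name: "Ada" });
const updateWithoutId = missingFor({ action: "update", name: "Ada" });
console.log(createWithoutEmail, updateWithoutId);
```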

The cost is more tools in the registry. The benefit is fewer wrong tool calls. In every team I have trained, this is the single edit that lifts tool-call accuracy the most, and it is the first thing we look at together in consulting and training.

Read next

Your agents aren't broken, your tools are: three questions to ask before you build one

Tool registry design for agentic AI: how the wrong registry kills accuracy before the prompt is read

Tool descriptions are prompts. Fix the registry, not the agent.