Tool descriptions are prompts. Stop treating them like docstrings
Six concrete edits that lifted tool-call accuracy.
A docstring tells a developer what a function does. A tool description tells a model when to call it. Different audience, different writing. Most teams write the first when they should be writing the second, and then they blame the model when selection goes wrong. It is almost never the model. It is the writing. I argue the structural version of this in "fix the registry, not the agent"; this post is about the prose itself.
Why the audience changes everything
A developer reading a docstring brings enormous context: they opened the file, they know the module, they are already mid-task. The docstring only has to fill the last gap. A model choosing a tool brings none of that. It sees a flat list of candidate tools and a user request, and it has to decide which capability fits. The description is not filling a gap; it is making the entire case for selection. Writing for the first reader when your real reader is the second is the core mistake.
Once you internalise that the description is a prompt, the editing rules fall out of it. You would not write a prompt that says "this function returns a User object." You would write a prompt that says "when the user asks about their account, do this." Same instinct, applied to the registry.
The six edits
1. Lead with the trigger, not the implementation
Open with "Use this when…" rather than "This function…". The first word the model reads should be about the situation, not the mechanism. The mechanism is what the schema is for.
2. Include one concrete example request
Show the kind of user message that should map to this tool. Models match on surface forms more than people expect, and one example does more than a paragraph of abstraction to anchor the mapping.
3. Spell out the exclusions
Name the lookalike traps: the requests that feel like a match but belong to a different tool. Most wrong-tool calls are confusions between near-twins, so an explicit "do NOT use this for…" is often the single highest-leverage sentence in the description.
4. Use the phrasing the user will use
If your users say "invoice" and your tool says "billing document," bridge the gap in the description. The model is matching the request against your words; make your words look like the request.
5. Keep parameter descriptions tight
The schema names should already carry meaning. A parameter called customer_id with a one-line note about the ID format beats a paragraph re-explaining what a customer is. Tight parameter descriptions leave the model less room to fill a field with garbage.
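As a rough sketch of the difference, assume a JSON-Schema-style parameter block, which most tool-calling APIs accept in some form. The field names and the `CUS-` ID format below are hypothetical, invented only to illustrate the contrast.

```python
# Hypothetical parameter schemas for the same tool.
# The "CUS-" ID format is invented for illustration.

# Loose: the name is vague and the prose re-explains the domain.
loose_params = {
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "description": (
                "A customer is a person or organisation that has bought "
                "something from us. This field identifies which one you mean."
            ),
        }
    },
    "required": ["id"],
}

# Tight: the name carries the meaning; the note only pins down the format.
tight_params = {
    "type": "object",
    "properties": {
        "customer_id": {
            "type": "string",
            "description": "Customer ID, e.g. 'CUS-10482'.",
            "pattern": "^CUS-[0-9]+$",
        }
    },
    "required": ["customer_id"],
}
```

The tight version gives the model one thing to get right (the format) instead of a paragraph to interpret.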
6. Test by reading only the description
Read the description with the function name and body hidden, then guess what it does and when you would call it. If you cannot, the model cannot either. This one test catches most weak descriptions before they reach an eval.
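A mechanical check cannot replace that read-through, but it can flag descriptions that are missing the pieces from edits 1 to 3 before a human looks. This is a minimal sketch, assuming descriptions follow the phrasing conventions used in this post. `lint_description` is a hypothetical helper, not part of any library.

```python
def lint_description(description: str) -> list[str]:
    """Return warnings for a tool description (an empty list means none)."""
    lowered = description.strip().lower()
    warnings = []
    if not lowered.startswith("use this when"):
        warnings.append("does not lead with the trigger ('Use this when...')")
    if "example:" not in lowered:
        warnings.append("no example request")
    if "do not use" not in lowered:
        warnings.append("no explicit exclusion ('Do NOT use this for...')")
    return warnings


docstring_style = "Retrieves user data from the database."
trigger_style = (
    "Use this when the user asks about their own account, profile, "
    "settings, or order history. "
    "Do NOT use this for general FAQ-style questions about the product. "
    "Example: 'show me my last three orders' maps to this tool with user_id."
)

print(lint_description(docstring_style))  # three warnings
print(lint_description(trigger_style))    # []
```

String matching like this only checks that the pieces are present, not that they are good. It is a gate before the human read, not a substitute for it.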
Before and after
Before: "Retrieves user data from the database."
After: "Use this when the user asks about their own account, profile, settings, or order history. Do NOT use this for general FAQ-style questions about the product. Example: 'show me my last three orders' maps to this tool with user_id."
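In a registry, the rewritten description might sit in a tool definition like the one below. This is a sketch under assumptions: the `name` / `description` / `parameters` layout mirrors the JSON-Schema-style format common to tool-calling APIs, and the tool name `get_user_account` and the `user_id` note are hypothetical.

```python
# Hypothetical tool definition; adapt the layout to your provider's format.
get_user_account = {
    "name": "get_user_account",  # hypothetical tool name
    "description": (
        # Trigger first, then the exclusion, then one example request.
        "Use this when the user asks about their own account, profile, "
        "settings, or order history. "
        "Do NOT use this for general FAQ-style questions about the product. "
        "Example: 'show me my last three orders' maps to this tool with user_id."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "user_id": {
                "type": "string",
                "description": "ID of the signed-in user.",  # assumed wording
            },
        },
        "required": ["user_id"],
    },
}
```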
Tool-call accuracy on the engagement where I shipped these edits went from 78% to 94% on the eval set. Same model, same tools, different writing. The "do NOT use" line and the example request did most of the lifting, because the failures were lookalike confusions, not genuine unknowns.
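To tell lookalike confusions from genuine unknowns, it helps to score the eval by confusion pair rather than by accuracy alone. The sketch below assumes the eval run yields (expected tool, chosen tool) pairs. `score_tool_calls` and the tool names are hypothetical, and the data is a toy set, not the eval set from the engagement.

```python
from collections import Counter


def score_tool_calls(results):
    """Score an eval run given (expected_tool, chosen_tool) pairs.

    Returns (accuracy, confusions), where confusions counts each
    (expected, chosen) pair the model got wrong.
    """
    results = list(results)
    if not results:
        raise ValueError("no eval results to score")
    correct = sum(1 for expected, chosen in results if expected == chosen)
    confusions = Counter(
        (expected, chosen) for expected, chosen in results if expected != chosen
    )
    return correct / len(results), confusions


# Toy data for illustration only.
toy_results = [
    ("get_user_account", "get_user_account"),
    ("search_faq", "search_faq"),
    ("get_user_account", "search_faq"),
    ("get_user_account", "search_faq"),
    ("search_faq", "search_faq"),
]

accuracy, confusions = score_tool_calls(toy_results)
print(f"accuracy: {accuracy:.0%}")  # accuracy: 60%
print(confusions.most_common(1))    # [(('get_user_account', 'search_faq'), 2)]
```

When one pair dominates the confusion counts, the fix is usually an exclusion line in one or both of those two descriptions.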
Common mistakes
- Auto-generating descriptions from docstrings or type hints. It scales the wrong style across the whole registry.
- Describing the implementation ("wraps the orders API") instead of the trigger ("use this when the user asks about an order").
- Skipping exclusions, then wondering why two similar tools keep getting swapped.
- Writing long parameter prose instead of meaningful parameter names.
None of this requires a stronger model or a longer system prompt. It requires treating each description as the small prompt it already is. Pair these edits with the one structural rule that prevents most confusions in the first place: one tool, one purpose. If you want help running this across a large registry, that is core to my consulting work and our training.