What is the difference between a docstring and a tool description?

A docstring documents a function for a developer who already opened the file and decided to use it. A tool description has to persuade a model, which has no surrounding context, to select this capability at the right moment. Same string position in your code, completely different job.

Why does writing change tool-call accuracy if the model is the same?

Because the model selects against the description text, not the implementation. Sharper triggers, an example request, and explicit exclusions change what the model reads while deciding, and the description is the only thing it knows about the tool. That is how tool-call accuracy went from 78% to 94% on one engagement with no model change.

Should I generate descriptions automatically from code?

Avoid it. Auto-generation copies the docstring style, which is written for the wrong reader, across your entire registry. Hand-write descriptions as selection criteria, at least for the tools that get confused.

What is the single most effective edit?

For most teams it is adding the exclusions: a clear "do NOT use this for…" that names the lookalike tool. Most wrong-tool calls are confusions between near-twins, and the exclusion line resolves them directly.

How do I test a tool description?

Read it with the function name and body hidden, then try to guess what the tool does and when you would call it. If you cannot select it confidently from the description alone, neither can the model.

Tool Descriptions Are Prompts, Not Docstrings

A docstring tells a developer what a function does. A tool description tells a model when to call it. Different audience, different writing. Most teams write the first when they should be writing the second, and then they blame the model when selection goes wrong. It is almost never the model. It is the writing. I make the structural version of this argument in "Fix the registry, not the agent"; this post is about the prose itself.

Why the audience changes everything

A developer reading a docstring brings enormous context: they opened the file, they know the module, they are already mid-task. The docstring only has to fill the last gap. A model choosing a tool brings none of that. It sees a flat list of candidate tools and a user request, and it has to decide which capability fits. The description is not filling a gap; it is making the entire case for selection. Writing for the first reader when your real reader is the second is the core mistake.

Once you internalise that the description is a prompt, the editing rules fall out of it. You would not write a prompt that says "this function returns a User object." You would write a prompt that says "when the user asks about their account, do this." Same instinct, applied to the registry.
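A minimal Python sketch of the same contrast. Everything here is hypothetical: `get_user_account`, its docstring, and the description string are illustrations, not a real API. `inspect.getdoc` stands in for a naive auto-generator; it hands the registry the developer-facing sentence, while the hand-written description is phrased as a selection prompt.

```python
import inspect


def get_user_account(user_id: str) -> dict:
    """Return the User record for user_id from the accounts table."""
    # Hypothetical stub; a real implementation would query a database.
    return {"user_id": user_id}


# What a naive auto-generator would copy into the registry: written for a developer.
AUTO_DESCRIPTION = inspect.getdoc(get_user_account)

# Hand-written for the model: when to pick this tool, not what it returns.
TOOL_DESCRIPTION = (
    "Use this when the user asks about their own account, profile, or settings. "
    "Do NOT use this for general questions about the product."
)

if __name__ == "__main__":
    print("auto: ", AUTO_DESCRIPTION)
    print("model:", TOOL_DESCRIPTION)
```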

The six edits

1. Lead with the trigger, not the implementation

Open with "Use this when…" rather than "This function…". The first word the model reads should be about the situation, not the mechanism. The mechanism is what the schema is for.

2. Include one concrete example request

Show the kind of user message that should map to this tool. Models match on surface forms more than people expect, and one example does more than a paragraph of abstraction to anchor the mapping.
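One way to keep the first three edits from drifting is to store them as separate fields and render the description from them. This is a sketch, not a standard: the `ToolDescription` class, its field names, and the `get_invoice` tool it mentions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToolDescription:
    """Hypothetical template with one field per selection edit."""

    trigger: str  # edit 1: the situation that should select this tool
    example_request: str  # edit 2: one concrete user message that maps here
    exclusions: list[str] = field(default_factory=list)  # edit 3: lookalike traps

    def render(self) -> str:
        # Trigger first, then exclusions, then the example request.
        parts = [f"Use this when {self.trigger}."]
        parts += [f"Do NOT use this for {item}." for item in self.exclusions]
        parts.append(f'Example: "{self.example_request}"')
        return " ".join(parts)


ORDER_STATUS = ToolDescription(
    trigger="the user asks where one of their orders is or when it will arrive",
    example_request="where is my order from last Tuesday?",
    exclusions=["invoices or receipts; use `get_invoice` instead"],
)

if __name__ == "__main__":
    print(ORDER_STATUS.render())
```

Putting the example last is a stylistic choice; what matters is that the trigger comes first.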

3. Spell out the exclusions

Name the lookalike traps: the requests that feel like a match but belong to a different tool. Most wrong-tool calls are confusions between near-twins, so an explicit "do NOT use this for…" is often the single highest-leverage sentence in the description.
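Exclusions only help if they name a tool that actually exists, and a rename can easily leave one pointing nowhere. The sketch below assumes a convention that is not from any library: each exclusion is a sentence beginning "Do NOT use this for" that names the lookalike in backticks. The two tools are hypothetical near-twins.

```python
import re

# Hypothetical registry: two near-twin tools whose exclusions point at each other.
REGISTRY = {
    "get_order_status": (
        "Use this when the user asks where an order is or when it will arrive. "
        "Do NOT use this for invoices or receipts; use `get_invoice` instead."
    ),
    "get_invoice": (
        "Use this when the user asks for an invoice, receipt, or bill for an order. "
        "Do NOT use this for delivery or tracking questions; use `get_order_status` instead."
    ),
}

# Assumed convention: an exclusion is one sentence starting "Do NOT use this for",
# and it names the lookalike tool in backticks.
EXCLUSION = re.compile(r"Do NOT use this for [^.]*", re.IGNORECASE)
TOOL_REF = re.compile(r"`([A-Za-z_][A-Za-z0-9_]*)`")


def check_exclusions(registry: dict[str, str]) -> list[str]:
    """Return problems found; an empty list means every tool passes."""
    problems = []
    for name, description in registry.items():
        exclusions = EXCLUSION.findall(description)
        if not exclusions:
            problems.append(f"{name}: no 'Do NOT use this for' line")
            continue
        for exclusion in exclusions:
            refs = TOOL_REF.findall(exclusion)
            if not refs:
                problems.append(f"{name}: exclusion does not name the lookalike tool")
            for ref in refs:
                if ref == name:
                    problems.append(f"{name}: exclusion points at itself")
                elif ref not in registry:
                    problems.append(f"{name}: exclusion names unknown tool {ref!r}")
    return problems
```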

4. Use the phrasing the user will use

If your users say "invoice" and your tool says "billing document," bridge the gap in the description. The model is matching the request against your words; make your words look like the request.
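A crude way to find the gap is to compare the words in real user requests against the words in the description. This sketch assumes you have a handful of sample requests; the stop-word list and the sample strings are made up, and the matching is exact-word with no stemming, so treat its output as a prompt for review rather than a score.

```python
import re

# Hypothetical stop-word list; extend it for your own traffic.
STOP_WORDS = {"a", "an", "the", "i", "me", "my", "for", "of", "to", "is",
              "can", "get", "send", "last", "please", "you"}


def tokens(text: str) -> set[str]:
    """Lowercased alphabetic tokens; no stemming, so 'invoices' != 'invoice'."""
    return set(re.findall(r"[a-z]+", text.lower()))


def missing_user_terms(description: str, sample_requests: list[str]) -> set[str]:
    """Words users actually type that never appear in the description."""
    user_terms: set[str] = set()
    for request in sample_requests:
        user_terms |= tokens(request) - STOP_WORDS
    return user_terms - tokens(description)


REQUESTS = ["send me the invoice for my last order", "can I get a receipt?"]
BEFORE = "Use this to fetch a billing document for an order."
AFTER = "Use this when the user asks for an invoice, receipt, or billing document for an order."

if __name__ == "__main__":
    print(missing_user_terms(BEFORE, REQUESTS))
    print(missing_user_terms(AFTER, REQUESTS))
```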

5. Keep parameter descriptions tight

The schema names should already carry meaning. A parameter called customer_id with a one-line note about the ID format beats a paragraph re-explaining what a customer is. Tight parameter descriptions leave the model less room to fill a field with garbage.
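A sketch of a tight parameter in JSON Schema form, written as a Python dict. The `customer_id` format `CUS-12345` is a hypothetical example. The `pattern` keyword is standard JSON Schema, but whether a given tool-calling API enforces it varies, so the sketch also validates the model-filled value itself before use.

```python
import re

# Hypothetical input schema (JSON Schema, written as a Python dict).
# The name carries the meaning; the description only pins down the format.
LOOKUP_CUSTOMER_SCHEMA = {
    "type": "object",
    "properties": {
        "customer_id": {
            "type": "string",
            "description": "Customer ID in the form CUS-12345. Never an email address.",
            "pattern": "^CUS-[0-9]{5}$",
        },
    },
    "required": ["customer_id"],
}


def validate_customer_id(value: str) -> bool:
    """Check a model-filled argument against the documented format before use."""
    pattern = LOOKUP_CUSTOMER_SCHEMA["properties"]["customer_id"]["pattern"]
    return re.fullmatch(pattern, value) is not None
```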

6. Test by reading only the description

Read the description with the function name and body hidden, then guess what it does and when you would call it. If you cannot, the model cannot either. This one test catches most weak descriptions before they reach an eval.
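The hiding step is easy to script so a reviewer cannot peek. A minimal sketch, assuming the registry is a plain dict of tool name to description: it shuffles the entries, masks each tool's own name, and returns an answer key to score the reviewer's guesses against. The two tool names in the usage block are hypothetical.

```python
import random


def blind_review_sheet(registry: dict[str, str], seed: int = 0) -> tuple[list[str], dict[int, str]]:
    """Shuffle the descriptions, hide each tool's own name, and keep an answer key."""
    names = sorted(registry)
    random.Random(seed).shuffle(names)  # order should not give the answer away
    sheet: list[str] = []
    key: dict[int, str] = {}
    for number, name in enumerate(names, start=1):
        masked = registry[name].replace(name, "<tool>")
        sheet.append(f"{number}. {masked}")
        key[number] = name
    return sheet, key


if __name__ == "__main__":
    registry = {
        "get_order_status": "Use this when the user asks where an order is.",
        "get_invoice": "Use this when the user asks for an invoice or receipt.",
    }
    sheet, key = blind_review_sheet(registry)
    print("\n".join(sheet))  # hand this to a reviewer
    # Afterwards, compare their guesses against `key`.
```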

Docstring writing versus tool-description writing

Dimension	Docstring (for a developer)	Tool description (for a model)
Reader's context	High: already in the file	None: flat list of candidates
Job of the text	Explain what it does	Justify when to pick it
Opens with	The return type / behaviour	"Use this when…"
Handles lookalikes	Rarely needed	Essential ("do NOT use for…")
Success test	Compiles and reads clearly	Selectable from the text alone

Before and after

Before: "Retrieves user data from the database."

After: "Use this when the user asks about their own account, profile, settings, or order history. Do NOT use this for general FAQ-style questions about the product. Example: 'show me my last three orders' maps to this tool with user_id."
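The same before and after, expressed as tool definitions. The `name`/`description`/`input_schema` layout is an assumption modeled on common tool-calling APIs (adapt it to your provider's format), and `get_user_data` is a hypothetical name. Only the description changes; the name and schema stay identical, which is the point of the comparison.

```python
# Shared input schema: identical before and after.
USER_ID_SCHEMA = {
    "type": "object",
    "properties": {"user_id": {"type": "string", "description": "ID of the signed-in user."}},
    "required": ["user_id"],
}

BEFORE = {
    "name": "get_user_data",  # hypothetical tool name
    "description": "Retrieves user data from the database.",
    "input_schema": USER_ID_SCHEMA,
}

# Copy everything from BEFORE and replace only the description.
AFTER = {
    **BEFORE,
    "description": (
        "Use this when the user asks about their own account, profile, settings, "
        "or order history. Do NOT use this for general FAQ-style questions about "
        "the product. Example: 'show me my last three orders' maps to this tool "
        "with user_id."
    ),
}
```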

Tool-call accuracy on the engagement where I shipped these edits went from 78% to 94% on the eval set. Same model, same tools, different writing. The "do NOT use" line and the example request did most of the lifting, because the failures were lookalike confusions, not genuine unknowns.
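Measuring this needs only the expected and chosen tool per eval case. The sketch below computes accuracy and ranks the wrong-tool pairs, which is where lookalike confusions show up. `EvalCase` is a hypothetical record type, and obtaining `chosen_tool` from your model is left out.

```python
from collections import Counter
from typing import NamedTuple


class EvalCase(NamedTuple):
    request: str
    expected_tool: str
    chosen_tool: str  # what the model actually called in your eval run


def selection_report(cases: list[EvalCase]) -> tuple[float, list[tuple[tuple[str, str], int]]]:
    """Return accuracy and the (expected, chosen) confusions, most frequent first."""
    if not cases:
        raise ValueError("no eval cases")
    correct = sum(c.expected_tool == c.chosen_tool for c in cases)
    confusions = Counter(
        (c.expected_tool, c.chosen_tool) for c in cases if c.expected_tool != c.chosen_tool
    )
    return correct / len(cases), confusions.most_common()
```

If one (expected, chosen) pair dominates the list, that is usually a near-twin, and the fix is an exclusion line naming the other tool.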

Common mistakes

Auto-generating descriptions from docstrings or type hints. It scales the wrong style across the whole registry.
Describing the implementation ("wraps the orders API") instead of the trigger ("use this when the user asks about an order").
Skipping exclusions, then wondering why two similar tools keep getting swapped.
Writing long parameter prose instead of meaningful parameter names.
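Most of these mistakes are mechanical enough to lint. A rough sketch with hypothetical heuristics: the opener word list and the 120-character parameter budget are arbitrary and worth tuning, the tool-definition layout is assumed, and a passing lint is no substitute for the read-only-the-description test.

```python
# Hypothetical heuristics for the common mistakes above; tune the word list and limit.
IMPLEMENTATION_OPENERS = ("this function", "retrieves", "returns", "wraps", "calls")
PARAM_DESCRIPTION_LIMIT = 120


def lint_tool(tool: dict) -> list[str]:
    """Return warnings for one tool definition; an empty list means no findings."""
    warnings = []
    lowered = tool.get("description", "").strip().lower()
    if not lowered.startswith("use this when"):
        warnings.append("does not open with the trigger ('Use this when...')")
    if lowered.startswith(IMPLEMENTATION_OPENERS):
        warnings.append("opens by describing the implementation")
    if "do not use" not in lowered:
        warnings.append("has no exclusions ('Do NOT use this for...')")
    properties = tool.get("input_schema", {}).get("properties", {})
    for name, spec in properties.items():
        if len(spec.get("description", "")) > PARAM_DESCRIPTION_LIMIT:
            warnings.append(f"parameter {name!r} has a long description")
    return warnings
```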

None of this requires a stronger model or a longer system prompt. It requires treating each description as the small prompt it already is. Pair these edits with the one structural rule that prevents most confusions in the first place: one tool, one purpose. If you want help running this across a large registry, that is core to my consulting work and our training.
