I was wrong about JSON mode. Here is what changed my mind
For two years I told teams to avoid forced JSON outputs and use structured tool calls. That was right then and partially wrong now. Schema enforcement got better, latency penalties got smaller.
In this post (4 sections)
This is a public correction post. Through 2024 and most of 2025 I told teams to avoid JSON mode and use structured tool calls instead. Schema enforcement was inconsistent, latency was higher, and the model would happily produce structurally valid garbage: the right shape with the wrong contents. That advice was correct then. It is partially wrong now, and I would rather say so than let an outdated heuristic spread under my name. This is the JSON-mode strand of three patterns I broke in 2025, told from the other side.
What changed
Three things. First, schema-aware decoding has become reliable across the major providers, so the structure is genuinely enforced at generation time rather than hoped for. Second, the latency penalty narrowed enough that most use cases no longer feel it. Third, and most important, the failure mode shifted. Modern JSON mode is more likely to refuse to produce output than to produce structurally valid garbage, and a clean refusal is a far better thing to handle than a well-formed lie. You can catch a refusal; valid-but-wrong slips straight through your validation.
The new rule
- Tool calls when the structure represents a side-effecting action: calling, creating, updating. The schema is a contract for doing something.
- JSON mode when the structure represents pure data extraction: classifying, parsing, or summarising into fields. The schema is a contract for shaping an answer.
- Plain text when the structure is genuinely free-form, and parse it downstream rather than forcing a schema it does not have.
Why the distinction still matters
Tool calls and JSON mode are not interchangeable just because both return structure. A tool call signals intent to act and routes into your side-effecting code; JSON mode shapes a piece of data. Blurring them is how you end up with the one tool, one purpose problem in a different costume, where a single mechanism is asked to both decide and do. Match the mechanism to whether the structure represents an action or an answer.
The meta-lesson
My heuristics had calcified. I should have re-tested earlier. Treating "I tried this six months ago" as authoritative is one of the easiest ways to fall behind in this field, where the ground moves under capabilities every quarter. The practical habit I have adopted is to put a re-test date on my strongest opinions, especially the ones shaped like "never do X." If you want a second opinion on which of your structured-output choices are due for a re-test, that is the kind of review I run in consulting.
Agentic AI patterns, delivered Thursdays
What I am shipping, watching, and pruning out of client stacks each week. One email. No fluff.