Why did you previously recommend against JSON mode?

Because through 2024 and most of 2025 schema enforcement was inconsistent, latency was higher, and the model would produce structurally valid output with garbage in the fields. Structured tool calls were the more reliable way to get dependable structure at the time.

What specifically changed your mind?

Three things: schema-aware decoding became reliable across major providers, the latency penalty narrowed to the point most use cases do not feel it, and the failure mode shifted from valid-but-wrong output to outright refusal, which is much easier and safer to handle.

When should I use JSON mode versus a tool call now?

Use a tool call when the structure represents a side-effecting action such as creating or updating something. Use JSON mode when the structure represents pure data extraction such as classifying or parsing into fields. Use plain text when the output is genuinely free-form.

Is "valid but wrong" output still a risk?

Much less than before, because reliable schema-aware decoding and a refusal-based failure mode reduce the chance of well-formed garbage slipping through. You should still validate field contents, not just the JSON shape, for anything important.

What is the broader takeaway?

Re-test your strong opinions, especially the "never do X" ones. Capabilities in this field move fast enough that an accurate heuristic from six months ago can quietly become wrong, and treating old tests as authoritative is an easy way to fall behind.

I Was Wrong About JSON Mode. What Changed My Mind

In this post (4 sections)

In this post

This is a public correction post. Through 2024 and most of 2025 I told teams to avoid JSON mode and use structured tool calls instead. Schema enforcement was inconsistent, latency was higher, and the model would happily produce structurally valid garbage: the right shape with the wrong contents. That advice was correct then. It is partially wrong now, and I would rather say so than let an outdated heuristic spread under my name. This is the JSON-mode strand of three patterns I broke in 2025, told from the other side.

What changed

Three things. First, schema-aware decoding has become reliable across the major providers, so the structure is genuinely enforced at generation time rather than hoped for. Second, the latency penalty narrowed enough that most use cases no longer feel it. Third, and most important, the failure mode shifted. Modern JSON mode is more likely to refuse to produce output than to produce structurally valid garbage, and a clean refusal is a far better thing to handle than a well-formed lie. You can catch a refusal; valid-but-wrong slips straight through your validation.

The new rule

Tool calls when the structure represents a side-effecting action: calling, creating, updating. The schema is a contract for doing something.
JSON mode when the structure represents pure data extraction: classifying, parsing, or summarising into fields. The schema is a contract for shaping an answer.
Plain text when the structure is genuinely free-form, and parse it downstream rather than forcing a schema it does not have.

JSON mode then versus now

Dimension	Through 2025 (avoid it)	2026 (use it for extraction)
Schema enforcement	Inconsistent across providers	Reliable schema-aware decoding
Latency	Noticeably higher	Narrowed; most cases do not feel it
Worst failure	Valid shape, garbage fields	Refusal to produce output
Best fit	Almost nothing; prefer tool calls	Pure data extraction tasks

Why the distinction still matters

Tool calls and JSON mode are not interchangeable just because both return structure. A tool call signals intent to act and routes into your side-effecting code; JSON mode shapes a piece of data. Blurring them is how you end up with the one tool, one purpose problem in a different costume, where a single mechanism is asked to both decide and do. Match the mechanism to whether the structure represents an action or an answer.

The meta-lesson

My heuristics had calcified. I should have re-tested earlier. Treating "I tried this six months ago" as authoritative is one of the easiest ways to fall behind in this field, where the ground moves under capabilities every quarter. The practical habit I have adopted is to put a re-test date on my strongest opinions, especially the ones shaped like "never do X." If you want a second opinion on which of your structured-output choices are due for a re-test, that is the kind of review I run in consulting.

I was wrong about JSON mode. Here is what changed my mind

What changed

The new rule

Why the distinction still matters

The meta-lesson

Agentic AI patterns, delivered Thursdays

Questions readers ask about this post

Read next

I was wrong about JSON mode. Here is what changed my mind

What changed

The new rule

Why the distinction still matters

The meta-lesson

Agentic AI patterns, delivered Thursdays

Questions readers ask about this post

Read next

Claude Code Artifacts turn terminal output into live review pages: what Team and Enterprise buyers should pilot first

MCP Enterprise-Managed Authorization is stable: how IdP-provisioned connector access replaces per-server OAuth hell

Cursor cloud subagents in 2026: /in-cloud, /babysit, and /automate without losing your local guardrails