Topic Pillar

Claude API. Production agents on Anthropic's model line-up.

Claude is the model behind most of the agents we ship: Opus for the hardest reasoning, Sonnet for the production workhorse, Haiku for dispatching and routing. The API also gives you the production primitives that turn a model into a system — prompt caching, 1M context, tool-use telemetry, structured outputs.

22 cluster pages · 11 posts · 2 notes · 9 updates

Picking a model

Haiku 4.5 for routing, classification, and cheap dispatch. Sonnet 4.6 for the production workhorse — most agent reasoning, tool selection, and code work belong here. Opus 4.7 with its 1M context for the hardest steps and whole-codebase reasoning. Cost per completed task is the metric to track, not the headline token price.
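That metric is just total spend divided by tasks that actually finished, but it can flip a model choice: a sketch with placeholder per-million-token prices and made-up run data (neither is Anthropic's published pricing):

```python
def cost_per_completed_task(runs, price_in, price_out):
    """runs: (input_tokens, output_tokens, completed) per attempt.
    price_in / price_out: dollars per million tokens (placeholders)."""
    spend = sum(i * price_in + o * price_out for i, o, _ in runs) / 1_000_000
    done = sum(1 for *_, ok in runs if ok)
    return spend / done if done else float("inf")

# Hypothetical: the cheap model needs three attempts to complete one task;
# the pricier model completes both of its attempts.
cheap = cost_per_completed_task(
    [(40_000, 2_000, False), (45_000, 2_500, False), (50_000, 3_000, True)],
    price_in=1.0, price_out=5.0)
strong = cost_per_completed_task(
    [(40_000, 1_500, True), (42_000, 1_800, True)],
    price_in=3.0, price_out=15.0)
```

In this toy run the 3x-pricier model is still cheaper per completed task, which is exactly the comparison the headline token price hides.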

Prompt caching is the cost lever

Stable system prompts, retrieval prefixes, and tool registries are all cacheable. A cache hit rate of 60–80% is achievable on most production workflows and translates to a 40–50% token-cost reduction. If you are not measuring cache hit rate per route, you are leaving cost on the table.
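A back-of-envelope model of that claim. The multipliers are assumptions based on Anthropic's published cache pricing at the time of writing (cache reads at roughly 0.1x the base input price, cache writes at roughly 1.25x); plug in your own rates:

```python
def caching_savings(prefix_frac, hit_rate, read_mult=0.10, write_mult=1.25):
    """Fractional input-token cost saved vs. no caching.
    prefix_frac: share of prompt tokens in the stable, cacheable prefix.
    hit_rate:    share of requests that hit the cache."""
    prefix_cost = prefix_frac * (hit_rate * read_mult + (1 - hit_rate) * write_mult)
    dynamic_cost = 1.0 - prefix_frac  # uncacheable suffix, billed at full price
    return 1.0 - (prefix_cost + dynamic_cost)

# 70% of the prompt cacheable at a 75% hit rate -> ~43% cheaper input tokens,
# squarely inside the 40-50% range above.
savings = caching_savings(prefix_frac=0.70, hit_rate=0.75)
```

The same arithmetic also shows why a low hit rate hurts: below roughly a 20% hit rate the 1.25x write surcharge means caching costs more than it saves.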

Tool-use telemetry

Anthropic now exposes per-call selection scores at the model boundary. You can see why the model picked tool A over B before the call lands in your code. This collapses postmortems — the score deltas tell you whether the registry, the description, or the prompt was the weak link.
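If the scores arrive as per-call records, the postmortem check is a one-liner over the top-two margin. The record shape and field names below are hypothetical illustrations, not Anthropic's actual telemetry schema:

```python
def selection_margin(scores):
    """Given a {tool_name: selection_score} mapping for one call
    (hypothetical telemetry shape), return the chosen tool and its
    margin over the runner-up. A thin margin points at ambiguous
    tool descriptions; a wide margin on a wrong pick points at the prompt."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    chosen, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return chosen, top - runner_up

# Example: search_docs barely beat fetch_url -> the two descriptions overlap.
tool, margin = selection_margin(
    {"search_docs": 0.51, "fetch_url": 0.47, "run_sql": 0.02})
```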

11 blog posts

Deep dives on Claude API

Tool Design

Tool descriptions are prompts. Fix the registry, not the agent.

When an agent picks the wrong tool, the registry is broken — not the agent. Three rules I now apply before debugging anything in a multi-tool system: precise names, "when to use" triggers, and a curated load list. Anthropic's new tool-selection telemetry finally puts numbers on what changes accuracy.

May 13, 2026 · 5 min
Read the post
Production

The cheapest LLM call is the one you do not make — GitHub's 19–62% token cut, decoded

GitHub published an instrumented analysis of their agentic CI workflows and reported 19–62% token-cost reductions. The savings are the headline. The technique — pre-agentic data fetching and tool-registry hygiene — is the story most teams will miss.

May 11, 2026 · 5 min
Read the post
Architecture

Claude Opus 4.7's 1M context: when to RAG and when to just stuff it

A reliably usable million-token context is real now, but it does not retire RAG — it changes the calculus. Cost, latency, recency, and the prompt-cache angle nobody is talking about.

May 6, 2026 · 6 min
Read the post
Production

Prompt caching is not optional anymore — measuring a 47% cost drop

A walkthrough from a client engagement: identifying stable prefixes, restructuring the system prompt for cacheability, and the telemetry that proved caching was actually working.

Apr 19, 2026 · 4 min
Read the post
Production

The agent observability stack we ship to every client

Traces, spans, evals, cost-per-completed-task, and the one dashboard panel that catches 80% of regressions. Vendor-agnostic — covers Langfuse, Honeycomb, and rolling your own.

Mar 28, 2026 · 7 min
Read the post
Architecture

Three patterns I broke in 2025 — and what I do instead now

Self-correction loops without budgets, single-agent solutions to multi-domain problems, and using JSON mode to force structure I should have built into the schema. An honest review.

Mar 14, 2026 · 8 min
Read the post
Multi-Agent

Haiku 4.5 made our router 5x cheaper. The trade-off matters

Replacing Sonnet with Haiku in the dispatcher role cut our orchestration cost dramatically. It also cost us in two specific places I did not predict.

Feb 22, 2026 · 5 min
Read the post
Production

Eval datasets: stop testing your agents on the happy path

If your eval set is the demos you showed the client, you are testing the wrong thing. How we build evals from production failures and the minimum viable suite to ship.

Jan 19, 2026 · 6 min
Read the post
Prompt Engineering

I was wrong about JSON mode. Here is what changed my mind

For two years I told teams to avoid forced JSON outputs and use structured tool calls. That was right then and partially wrong now — schema enforcement got better, latency penalties got smaller.

Dec 12, 2025 · 4 min
Read the post
Architecture

Why your agent keeps failing after 3 steps

The exit condition problem nobody talks about. Most agents are built for the happy path — where every tool call succeeds and the task completes cleanly. Real production agents are different.

Nov 8, 2025 · 4 min
Read the post
Architecture

RAG vs CAG: how to actually decide

A decision framework from real implementations. RAG retrieves at query time; CAG preloads knowledge into the cached context. Knowing which to use — and when to combine both — determines whether your agent finds the right answer at the right cost.

Sep 21, 2025 · 5 min
Read the post
9 ship-news updates

Latest in Claude API

Claude

Anthropic ships tool-use telemetry — every selection is scored and logged at the model boundary

May 13, 2026 · via Anthropic
Tools

Claude Code adds parallel sub-agent execution — multi-file refactors land in a single turn

May 13, 2026 · via Anthropic
Claude

Claude Opus 4.7 ships with 1M-token context window in production

May 7, 2026 · via Anthropic
Tools

Claude Code adds project memory — persistent context that survives across CLI sessions

May 5, 2026 · via Anthropic
Architecture

Anthropic publishes "Effective Tool Design" — official guidance for production agents

Apr 28, 2026 · via Anthropic
Claude

Sonnet 4.6 update: cheaper tokens, sharper tool calls, fewer retry loops

Apr 24, 2026 · via Anthropic
Claude

Haiku 4.5 in production — small-model speed, surprising tool-use chops

Apr 22, 2026 · via Anthropic
Research

Anthropic research: when to use supervisor vs. swarm patterns in multi-agent systems

Apr 15, 2026 · via Anthropic Research
OpenAI

OpenAI Agent Builder GA — pricing finally competitive for enterprise tool use

Apr 12, 2026 · via OpenAI
Frequently asked

Claude API — the questions teams actually ask