Production May 11, 2026 5 min

The cheapest LLM call is the one you do not make — GitHub's 19-62% token cut, decoded

GitHub published an instrumented analysis of their agentic CI workflows and reported 19-62% token-cost reductions. The savings are the headline. The technique — pre-agentic data fetching and tool-registry hygiene — is the story most teams will miss.

The report, published last week, covers half a dozen production agents in GitHub's own CI workflows, with per-workflow API token-cost reductions ranging from 19% to 62%. The numbers are good. The technique is better, and most teams running agents in production are not yet doing the thing that produced those savings.

The cheapest LLM call is the one you do not make

That is GitHub's framing and it deserves to be on every wall of every team shipping agents. Their core finding: most "agent turns" in their CI workflows were doing deterministic work — fetching issue metadata, reading file contents, listing branches. Work that does not need a model.

The fix is pre-agentic data fetching: run the deterministic steps first with plain CLI or scripts, hand the assembled context to the agent in one shot, and let the model reason only about what genuinely needs reasoning. Their Auto-Triage workflow cut 62% of token spend doing exactly this — and it runs about 6.8 times a day, so the savings compound to millions of tokens per observation period.
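A minimal sketch of the pattern: do the deterministic lookups in plain code, assemble one context string, and only then make a single model call. The fetchers here are stand-ins for `gh` CLI or REST calls, and every name is illustrative, not GitHub's actual workflow code:

```python
def fetch_issue_metadata(issue_id):
    # Deterministic work: in a real workflow this is `gh issue view`
    # or one REST call. No model needed.
    return {"id": issue_id, "title": "CI flake on main", "labels": ["bug"]}

def fetch_changed_files(pr_number):
    # Also deterministic: plain git/CLI output.
    return ["src/build.py", "tests/test_build.py"]

def build_context(issue_id, pr_number):
    """Assemble everything the model needs BEFORE any LLM call."""
    issue = fetch_issue_metadata(issue_id)
    files = fetch_changed_files(pr_number)
    return (
        f"Issue #{issue['id']}: {issue['title']} "
        f"(labels: {', '.join(issue['labels'])})\n"
        f"Changed files: {', '.join(files)}"
    )

context = build_context(101, 7)
# One reasoning call replaces N agent turns of fetching:
# response = llm.complete(system_prompt, context)   # hypothetical client
print(context)
```

The point of the shape: the agent loop never spends a turn deciding to fetch something a script could have fetched for free.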

Your tool registry is silently expensive

Here is the number from the report that most teams will not have measured: each unused MCP tool registration adds roughly 8-12 KB of schema overhead to every API call. GitHub had about 40 tools registered in one workflow. The cost of "we might use it later" tools shows up as a meaningful bill increase before the agent thinks a single thought.
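A back-of-envelope estimate makes the dormant-registry cost concrete. The 8-12 KB per-tool figure is from the report; the 10 KB midpoint and the rough 4-bytes-per-token conversion are my assumptions, not GitHub's:

```python
TOOLS = 40                # tools registered in the workflow (from the report)
KB_PER_TOOL = 10          # assumed midpoint of the reported 8-12 KB range
BYTES_PER_TOKEN = 4       # rough rule of thumb for English/JSON text

# Schema overhead paid on EVERY call, whether or not a tool is used.
overhead_tokens = TOOLS * KB_PER_TOOL * 1024 // BYTES_PER_TOKEN
print(overhead_tokens)    # on the order of 100k extra input tokens per call
```

Multiply that by a workflow running several times a day and the "dormant" registry stops looking free.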

I have been writing about this for two years from a different angle — one tool, one purpose, descriptions are prompts, schemas matter. The cost data finally makes the case in numbers. If you have not audited your tool registry recently, do it this week.

How to apply this in IT services teams

  • Build a tiny audit script that logs input tokens, output tokens, and cache-hit rate per agent run. You cannot optimise what you cannot see, and most teams cannot see this today.
  • Audit the tool registry. Anything not called in the last 30 days, remove. The schema-overhead cost is real even when the tool is dormant.
  • For every agent loop, ask which of these turns are deterministic. Move those out to a pre-agentic step. The model should reason, not fetch.
  • Add a relevance gate before invoking the model at all. GitHub's Security Guard skips the LLM entirely for PRs that do not touch security-sensitive files. That is one cheap conditional that saves an entire wasted run.
  • Track cache-hit rate per route as a first-class metric. GitHub's Contribution Check workflows hit 82-83% cache reads on input tokens — that is the target shape for stable system prompts.
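The relevance gate is the easiest of these to sketch. A minimal version using Python's `fnmatch`, with a hypothetical pattern list — GitHub has not published Security Guard's actual matching logic:

```python
import fnmatch

# Illustrative patterns for "security-sensitive" paths, not GitHub's list.
SENSITIVE_PATTERNS = ["auth/*", "*.pem", "secrets/*", "Dockerfile"]

def touches_sensitive_files(changed_files):
    return any(
        fnmatch.fnmatch(path, pattern)
        for path in changed_files
        for pattern in SENSITIVE_PATTERNS
    )

def review_pr(changed_files):
    if not touches_sensitive_files(changed_files):
        return "skipped: no security-sensitive files"  # zero tokens spent
    # Only here would the LLM ever be invoked.
    return "invoke security-review agent"

print(review_pr(["docs/README.md"]))   # gate fires, no model call
print(review_pr(["auth/login.py"]))    # gate passes, agent runs
```

The gate is a few microseconds of string matching standing in front of every run the model never needed to make.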

The deeper point

CI workflow cost is not the headline story here. The bigger lesson is that "use an agent" became the default reach when "run a script then call the agent once" is usually cheaper and more reliable. The teams shipping production agents in 2026 are the ones who treat every LLM call as a budget to defend, not a free lookup.
