Production · 14 min read

Governing agent autonomy in 2026: Auto-review, pre-push review, and why approval prompts are not a security model

Cursor made Auto-review the default run mode and shipped /review so Bugbot runs before you push. Together they treat agent autonomy as a dial: low-stakes actions flow, high-stakes actions slow down. Here is how I wire that pattern into local agents, SDK headless runs, and CI without mistaking convenience for a hard security boundary.

Jigar Joshi, Agentic AI Architect and Consultant

Introduction

Background agents stopped being a demo sometime in late 2025. By June 2026 they are how a lot of teams ship refactors, tests, and review comments. The bottleneck moved from "can the agent write code" to "can we let it act without approving every shell command." Cursor's answer in June is two-fold: Auto-review as a contextual autonomy dial, and pre-push /review so code quality gates happen before CI.

I map both onto the guardrail box in the anatomy of an AI agent and the broader agentic AI production patterns I teach. Classifier gates sit where actions leave the agent. Bugbot sits where code leaves the laptop. Neither replaces typed tools or an owner who answers when the agent misbehaves at 3 a.m.

What shipped in June 2026 (release overview)

  • Auto-review is the default run mode for new Cursor users; existing users enable it in Settings > Agents.
  • Shell, MCP, and Fetch tool calls route through allowlist, sandbox, or classifier in that order.
  • Cursor SDK (TypeScript and Python) exposes local.autoReview and permissions.json steering for headless runs.
  • Bugbot on Composer 2.5: ~90 second average review (down from ~5 minutes), ~10% more bugs found, ~22% lower cost.
  • /review, /review-bugbot, and /review-security run Bugbot locally before push; patch IDs sync with GitHub/GitLab.

Auto-review: autonomy as a dial, not a switch

Cursor's Auto-review blog post frames the problem correctly. Agents should move freely when stakes are low and slow down when an action crosses a meaningful boundary. Auto-review applies to Shell, MCP, and Fetch tool calls. Allowlisted calls run immediately. Sandboxable calls run sandboxed. Everything else goes to a classifier subagent that can allow, suggest a safer path, or surface a standard approval prompt.

Cursor reports that roughly 7% of chats in Auto-review mode hit at least one interruption, versus about 40% of actions blocked under some enterprise allowlist setups. The first figure is per chat and the second is per action, so treat the comparison as directional, but that is the throughput win. The caveat, which Cursor states in its forum and docs, is that the classifier is non-deterministic. It can miss in both directions: it can allow a risky call or pause a benign one. I use Auto-review to reduce prompt fatigue on trusted read paths. I do not use it as the only guardrail before an execute-mode agent touches prod.

How the three-step flow works

  • Allowlist match: run immediately (good for known read-only inspections).
  • Sandbox: run with filesystem and network restrictions where the platform supports it.
  • Classifier: everything else gets contextual review with feedback routed back to the parent agent.
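The ordering above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not Cursor's implementation: the `ToolCall` and `Route` types, the allowlist prefixes, and the rule that only shell calls are sandboxable are all hypothetical.

```python
import shlex
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    RUN = "run"            # allowlist match: run immediately
    SANDBOX = "sandbox"    # run with filesystem and network restrictions
    CLASSIFY = "classify"  # hand off for contextual review


@dataclass(frozen=True)
class ToolCall:
    kind: str     # "shell", "mcp", or "fetch"
    command: str


# Hypothetical policy data, hard-coded for illustration only.
ALLOWLIST_ARGV_PREFIXES = (("git", "status"), ("git", "diff"), ("rg",))
SANDBOXABLE_KINDS = frozenset({"shell"})
SHELL_METACHARACTERS = frozenset(";|&$<>`\n")


def is_allowlisted(call: ToolCall) -> bool:
    """True only for plain shell commands whose argv starts with an allowed prefix."""
    if call.kind != "shell":
        return False
    # Refuse chaining and redirection so "git status; rm -rf ." cannot ride along.
    if SHELL_METACHARACTERS & set(call.command):
        return False
    try:
        argv = tuple(shlex.split(call.command))
    except ValueError:  # unbalanced quotes
        return False
    return any(argv[: len(prefix)] == prefix for prefix in ALLOWLIST_ARGV_PREFIXES)


def route(call: ToolCall, sandbox_available: bool) -> Route:
    """Apply the checks in order: allowlist, then sandbox, then classifier."""
    if is_allowlisted(call):
        return Route.RUN
    if sandbox_available and call.kind in SANDBOXABLE_KINDS:
        return Route.SANDBOX
    return Route.CLASSIFY
```

The point of the sketch is the fall-through: a call only reaches the non-deterministic classifier after the two deterministic checks have declined to handle it. Prefix matching on argv is deliberately simplistic here; flags on an otherwise read-only command can still write files.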

The SDK mirror matters for teams running headless agents. The June SDK release exposes local.autoReview and permissions.json steering via autoRun.allow_instructions and autoRun.block_instructions. That is the same pattern I want in CI: not unrestricted yolo mode, not interactive allowlist hell, but a written policy the classifier can apply. If those headless runs authenticate through a Claude subscription seat, read the June 15 Agent SDK billing checklist before you turn autoReview on in cron. Billing and guardrails changed in the same week.

{
  "autoRun": {
    "allow_instructions": [
      "Read-only inspections of ./dist and test output are fine.",
      "git status, git diff, and ripgrep searches may run without prompt."
    ],
    "block_instructions": [
      "Always pause rm, drop, truncate, and any write under ~/.claude or mcp.json.",
      "Pause outbound curl to non-allowlisted domains."
    ]
  }
}
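Because the policy is free-text instructions, it is easy to ship a file that parses but forgets a destructive verb. Here is a hedged sketch of a lint I might run in CI before a headless job starts. It assumes only the two keys shown above; `lint_policy` and `REQUIRED_BLOCK_TERMS` are my own hypothetical names, not part of the Cursor SDK.

```python
import json
import re

# Assumption: destructive verbs this team wants named in block_instructions.
REQUIRED_BLOCK_TERMS = ("rm", "drop", "truncate")


def _is_str_list(value) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def lint_policy(raw: str) -> list[str]:
    """Return human-readable problems; an empty list means the file passed."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    auto_run = doc.get("autoRun") if isinstance(doc, dict) else None
    if not isinstance(auto_run, dict):
        return ["missing autoRun object"]

    problems = []
    for key in ("allow_instructions", "block_instructions"):
        if not _is_str_list(auto_run.get(key)):
            problems.append(f"autoRun.{key} must be a list of strings")

    block = auto_run.get("block_instructions")
    if _is_str_list(block):
        # Whole-word match so "rm" is not satisfied by "terminal".
        words = set(re.findall(r"[a-z]+", " ".join(block).lower()))
        for term in REQUIRED_BLOCK_TERMS:
            if term not in words:
                problems.append(f"block_instructions never mentions '{term}'")
    return problems
```

A lint like this checks that the policy says something about each verb. It cannot check that the classifier will honor it, which is why the deterministic gates still matter.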

How Auto-review differs from a global allowlist

Auto-review vs strict allowlist vs unrestricted

| Mode | Throughput | Risk profile | Best for |
| --- | --- | --- | --- |
| Unrestricted headless | Highest | Highest; no gate on MCP/fetch | Throwaway sandboxes only |
| Strict allowlist | Low; frequent prompts | Low if list maintained | Regulated laptops with creds |
| Auto-review classifier | Medium-high; ~7% chats interrupted | Non-deterministic; convenience not compliance | Daily background agent work |
| Pre-push /review only | N/A at runtime | Catches code issues pre-CI | Pair with any run mode |

Pre-push /review: close the loop before CI

The second half of the June tooling wave is when review happens. Cursor's Bugbot June update cut average review time from about five minutes to about ninety seconds on Composer 2.5, with more bugs found per run at lower cost. The workflow change is /review (or /review-bugbot and /review-security) before you open a PR.

If you run /review locally and then push the same diff, Bugbot on GitHub or GitLab recognizes the patch, skips a redundant scan, and leaves a comment noting it already reviewed that diff. That closes a gap I see constantly: the agent wrote the code, nobody reviewed it until CI, and CI only sees what the agent chose to show. Moving review to pre-push makes agent-generated diffs go through the same gate human diffs should.
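To make the "recognizes the patch" idea concrete, here is a simplified sketch of a content-based patch identifier. I do not know how Bugbot computes its IDs; this only mirrors the general idea behind `git patch-id`, which hashes a diff while ignoring line numbers and whitespace. `simple_patch_id` and `already_reviewed` are hypothetical helpers, and the hash is not compatible with git's output.

```python
import hashlib
import re


def simple_patch_id(diff_text: str) -> str:
    """Hash a unified diff while ignoring line numbers, blob hashes, and whitespace."""
    digest = hashlib.sha256()
    for line in diff_text.splitlines():
        if line.startswith("index "):
            continue  # blob hashes differ between otherwise identical patches
        if line.startswith("@@"):
            continue  # hunk headers carry line numbers, which shift on rebase
        digest.update(re.sub(r"\s+", "", line).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def already_reviewed(diff_text: str, reviewed_ids: set[str]) -> bool:
    """True when this diff matches one that a local review already covered."""
    return simple_patch_id(diff_text) in reviewed_ids
```

The useful property is that the ID survives a rebase that only moves the hunk, but changes as soon as the agent edits the diff after review. A skipped scan is only safe if the thing pushed is the thing reviewed.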

The workflow I recommend for agent-written PRs

  1. Agent completes the task in Cursor
     Let Auto-review handle mixed-risk tool calls during the session. Do not disable the gate to "go faster" on credentialed machines.
  2. Run /review before git push
     Fix Bugbot and Security Review findings in the same session while context is hot.
  3. Push and open PR
     Bugbot on the host recognizes the patch ID and skips duplicate work.
  4. Human owns merge decision
     Review gates catch defects; they do not replace ownership. Someone still signs the merge.

Where this sits in the four-part agent anatomy

Auto-review and pre-push review are guardrail-layer tools. They do not replace memory, tools, or the loop. They sit at the boundary where actions leave the agent and touch your systems. I map them onto the anatomy of an AI agent: memory holds context, tools execute, the loop decides, guardrails vet. Classifier gates are guardrails. Bugbot is a guardrail on code quality and security findings before merge.

They also do not fix bad tools. A classifier that approves a call to a tool that returns null on failure still loses. The tool contract work in "Your agents are not broken, your tools are" comes first. Governance on top of sloppy tools just governs sloppy outcomes faster.

Agent autonomy patterns in 2026: what each is good for

| Pattern | Best for | Not a substitute for |
| --- | --- | --- |
| Global allowlist | Strict internal tooling with human in loop | Production execute-mode agents at scale |
| Auto-review classifier | Long local agent runs with mixed-risk tool calls | Hard security boundary or compliance sign-off |
| Pre-push /review | Catching agent-generated diffs before CI | Runtime guardrails on live tool execution |
| Sandbox + API key isolation | Headless CI agents with bounded blast radius | Tool schema design and eval suites |

How to adopt Auto-review and /review step by step

  1. Turn on Auto-review for local agent work
     Settings > Agents for IDE users. For SDK scripts, set local.autoReview and write block_instructions for destructive shell patterns (rm, drop, truncate, credential paths).
  2. Add /review to the agent workflow before push
     Treat it like lint for agent sessions. If Bugbot flags a security issue, fix it before the PR, not in review-comment tennis.
  3. Keep secret scanning at the commit boundary
     Auto-review does not replace scanning where the agent writes. The supply-chain playbook in "Your agent's supply chain is the attack surface" still applies, and MCP-scoped least privilege is the same story I told in "MCP goes stateless": shrink what the agent can reach, then gate what it can do.
  4. Measure interruptions, not vibes
     Track how often the classifier blocks the high-risk calls you care about versus how often it blocks benign read paths. Tune allow_instructions until read-only work flows and deletes always pause.
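For the measurement step, two numbers are enough to start: how often a high-risk call slipped through, and how often a benign call got paused. The sketch below assumes you have a sample of gate decisions that a human has labeled after the fact. The `Decision` record and its field names are hypothetical, not an export format from Cursor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    command: str
    labeled_risk: str  # "high" or "benign", assigned by a human reviewer
    paused: bool       # True if the gate interrupted the call


def _rate(hits: int, total: int) -> Optional[float]:
    return hits / total if total else None


def gate_metrics(decisions: list[Decision]) -> dict[str, Optional[float]]:
    """Summarize misses and false alarms from a labeled sample of decisions."""
    high = [d for d in decisions if d.labeled_risk == "high"]
    benign = [d for d in decisions if d.labeled_risk == "benign"]
    return {
        # Share of high-risk calls that ran without a pause (the dangerous miss).
        "missed_high_risk_rate": _rate(sum(not d.paused for d in high), len(high)),
        # Share of benign calls that were interrupted (the prompt-fatigue cost).
        "benign_pause_rate": _rate(sum(d.paused for d in benign), len(benign)),
    }
```

Tune toward a missed-high-risk rate of zero first, then bring the benign pause rate down. Optimizing them in the other order is how teams end up with a fast agent and a deleted directory.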

Should you enable Auto-review on production credential machines?

Enable it for throughput on daily agent work if you also run secret scanning at commit time, pin MCP servers, and keep block_instructions aggressive on credential paths. Do not treat the classifier as a compliance control. In high-trust environments I still pair Auto-review with explicit allowlists for production database tools and supply-chain hygiene on extensions and skills.
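As an illustration of the commit-time scan, here is a minimal sketch that checks only the lines a diff adds. The three patterns are examples I chose, not a complete rule set, and `scan_added_lines` is a hypothetical helper. On a real credentialed machine I would run a dedicated secret scanner and treat this as a picture of where the check sits, not a replacement for one.

```python
import re

# Illustrative patterns only; a dedicated scanner covers far more formats.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "quoted_credential": re.compile(
        r"(?i)(?:api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"\s]{16,}['\"]"
    ),
}


def scan_added_lines(diff_text: str) -> list[tuple[int, str]]:
    """Return (diff line number, pattern name) for each suspicious added line."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect additions; skip the "+++ b/file" header line.
        if not line.startswith("+") or line.startswith("+++ "):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings
```

This check is deterministic, which is the property the classifier lacks. That is the reason to keep both.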

Common mistakes

  • Calling Auto-review "zero trust" because marketing language sounded reassuring.
  • Skipping tool registry hygiene because the classifier "usually catches" bad calls.
  • Running unrestricted headless SDK agents in CI because CLI parity for Auto-review was still catching up.
  • Reviewing only in GitHub after the agent already pushed broken security patterns to a branch.
  • No eval set for agent outputs, so you cannot tell whether review gates improved completion rate or just slowed the loop (see "evals beyond happy path").

What comes after June 2026 in agent governance

Cursor has been explicit that Auto-review is early and focused on local desktop agents today, with the same ideas expected to spread to more surfaces. Watch for CLI parity, tighter MCP-scoped classifiers, and enterprise policy packs you can check into the repo. The direction is clear: agents get more autonomy by default, but autonomy becomes policy-as-code rather than a single global toggle.

Best practices for agent autonomy in production

  • Write block_instructions before allow_instructions so destructive patterns always pause.
  • Run /review on every agent-generated diff before push, not only on "big" changes.
  • Keep MCP scopes least-privilege; Auto-review does not fix over-scoped servers.
  • Log classifier blocks to your agent observability stack.
  • Name an owner for agent workflows who tunes policy when false positives spike.
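Logging gate decisions does not need anything exotic. A minimal sketch using Python's standard `logging` module, emitting one JSON line per decision so any log pipeline can index it. The logger name, the field names, and the `log_gate_decision` helper are my assumptions, not a Cursor log format.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("agent.guardrail")  # hypothetical logger name


def log_gate_decision(run_id: str, tool: str, command: str, verdict: str, reason: str) -> dict:
    """Emit one JSON line per gate decision and return the record."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        "tool": tool,
        "command": command[:200],  # truncate so long payloads do not flood the log
        "verdict": verdict,        # e.g. "allow" or "pause"
        "reason": reason,
    }
    # Anything other than a clean allow is logged at WARNING so it is easy to alert on.
    level = logging.INFO if verdict == "allow" else logging.WARNING
    logger.log(level, json.dumps(record, sort_keys=True))
    return record
```

Keeping the run ID on every record is what lets the owner trace a spike in false positives back to a specific agent session and policy version.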

Conclusion

June 2026 agent tooling finally treats autonomy as contextual. That is the right abstraction. Auto-review and pre-push review are worth adopting this week if you run background agents daily. They are not worth treating as the whole governance story. Pair them with typed tools, observable runs, and an owner who answers when the agent misbehaves at 3 a.m.
