Governing agent autonomy in 2026: Auto-review, pre-push review, and why approval prompts are not a security model
Cursor made Auto-review the default run mode and shipped /review so Bugbot runs before you push. Together they treat agent autonomy as a dial: low-stakes actions flow, high-stakes actions slow down. Here is how I wire that pattern into local agents, SDK headless runs, and CI without mistaking convenience for a hard security boundary.
Introduction
Background agents stopped being a demo sometime in late 2025. By June 2026 they are how a lot of teams ship refactors, tests, and review comments. The bottleneck moved from "can the agent write code" to "can we let it act without approving every shell command." Cursor's answer in June is two-fold: Auto-review as a contextual autonomy dial, and pre-push /review so code quality gates happen before CI.
I map both onto the guardrail box in the anatomy of an AI agent and the broader agentic AI production patterns I teach. Classifier gates sit where actions leave the agent. Bugbot sits where code leaves the laptop. Neither replaces typed tools or an owner who answers when the agent misbehaves at 3 a.m.
What shipped in June 2026 (release overview)
- Auto-review is the default run mode for new Cursor users; existing users enable it in Settings > Agents.
- Shell, MCP, and Fetch tool calls route through allowlist, sandbox, or classifier in that order.
- Cursor SDK (TypeScript and Python) exposes local.autoReview and permissions.json steering for headless runs.
- Bugbot on Composer 2.5: ~90 second average review (down from ~5 minutes), ~10% more bugs found, ~22% lower cost.
- /review, /review-bugbot, and /review-security run Bugbot locally before push; patch IDs sync with GitHub/GitLab.
Auto-review: autonomy as a dial, not a switch
Cursor's Auto-review blog post frames the problem correctly. Agents should move freely when stakes are low and slow down when an action crosses a meaningful boundary. Auto-review applies to Shell, MCP, and Fetch tool calls. Allowlisted calls run immediately. Sandboxable calls run sandboxed. Everything else goes to a classifier subagent that can allow, suggest a safer path, or surface a standard approval prompt.
Cursor reports roughly 7% of chats in Auto-review mode hit at least one interruption, versus about 40% of actions blocked under some enterprise allowlist setups. The two figures use different units (chats versus actions), so read the gap as directional, but that is the throughput win. The caveat, which Cursor states in its forum and docs, is that the classifier is non-deterministic. It can miss in both directions. I use Auto-review to reduce prompt fatigue on trusted read paths. I do not use it as the only guardrail before an execute-mode agent touches prod.
How the three-step flow works
- Allowlist match: run immediately (good for known read-only inspections).
- Sandbox: run with filesystem and network restrictions where the platform supports it.
- Classifier: everything else gets contextual review with feedback routed back to the parent agent.
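The ordering of the three steps can be sketched in a few lines. This is a minimal illustration of the routing idea, not Cursor's implementation: `ToolCall`, `Route`, `route_call`, the argv-prefix allowlist, and the `shell_only` sandbox rule are all hypothetical names I made up for the example.

```python
import shlex
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Sequence, Tuple


class Route(Enum):
    ALLOW = "allow"
    SANDBOX = "sandbox"
    CLASSIFIER = "classifier"


@dataclass(frozen=True)
class ToolCall:
    kind: str     # "shell", "mcp", or "fetch"
    command: str  # raw command line, MCP tool name, or URL


# Characters that can chain or substitute commands; never fast-path these.
SHELL_META = set(";|&$<>`\n")


def is_allowlisted(command: str, allowlist: Sequence[Tuple[str, ...]]) -> bool:
    """True only when the parsed argv starts with an allowlisted prefix."""
    if any(ch in SHELL_META for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar parse failures
        return False
    return any(prefix and tuple(argv[: len(prefix)]) == prefix for prefix in allowlist)


def route_call(
    call: ToolCall,
    allowlist: Sequence[Tuple[str, ...]],
    can_sandbox: Callable[[ToolCall], bool],
) -> Route:
    """Return the first gate that applies, checked in a fixed order."""
    if is_allowlisted(call.command, allowlist):
        return Route.ALLOW       # step 1: run immediately
    if can_sandbox(call):
        return Route.SANDBOX     # step 2: run with restrictions
    return Route.CLASSIFIER      # step 3: contextual review


# Hypothetical policy: read-only git and ripgrep invocations.
ALLOWLIST = [("git", "status"), ("git", "diff"), ("rg",)]


def shell_only(call: ToolCall) -> bool:
    # Assumption for the sketch: only shell calls can be sandboxed here.
    return call.kind == "shell"
```

Note that `git diff; rm -rf /` does not take the fast path even though it starts with an allowlisted prefix. A prefix match on the raw string would have let it through, which is the kind of gap that makes a naive allowlist worse than no allowlist.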
The SDK mirror matters for teams running headless agents. The June SDK release exposes local.autoReview and permissions.json steering via autoRun.allow_instructions and autoRun.block_instructions. That is the same pattern I want in CI: not unrestricted yolo mode, not interactive allowlist hell, but a written policy the classifier can apply. If those headless runs authenticate through a Claude subscription seat, read the June 15 Agent SDK billing checklist before you turn autoReview on in cron. Billing and guardrails changed in the same week.
{
  "autoRun": {
    "allow_instructions": [
      "Read-only inspections of ./dist and test output are fine.",
      "git status, git diff, and ripgrep searches may run without prompt."
    ],
    "block_instructions": [
      "Always pause rm, drop, truncate, and any write under ~/.claude or mcp.json.",
      "Pause outbound curl to non-allowlisted domains."
    ]
  }
}
How Auto-review differs from a global allowlist
Pre-push /review: close the loop before CI
The second half of the June tooling wave is when review happens. Cursor's Bugbot June update cut average review time from about five minutes to about ninety seconds on Composer 2.5, with more bugs found per run at lower cost. The workflow change is /review (or /review-bugbot and /review-security) before you open a PR.
If you run /review locally and then push the same diff, Bugbot on GitHub or GitLab recognizes the patch, skips a redundant scan, and leaves a comment noting it already reviewed that diff. That closes a gap I see constantly: the agent wrote the code, nobody reviewed it until CI, and CI only sees what the agent chose to show. Moving review to pre-push makes agent-generated diffs go through the same gate human diffs should.
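To make "recognizes the patch" concrete, here is one way a diff can be fingerprinted so that the same change yields the same ID before and after push. I do not know how Bugbot derives its patch IDs; `patch_fingerprint` is a hypothetical helper that hashes only added and removed lines, in the same spirit as git's own `git patch-id` command.

```python
import hashlib


def patch_fingerprint(diff_text: str) -> str:
    """Hash only the changed lines of a unified diff.

    Hunk headers, "index" lines, and context lines are skipped, so the
    fingerprint survives line-number shifts and blob-hash changes.
    File header lines ("---" / "+++") are included because they also
    start with "-" or "+", which ties the fingerprint to the file path.
    """
    digest = hashlib.sha256()
    for line in diff_text.splitlines():
        if line.startswith(("+", "-")):
            digest.update(line.rstrip().encode("utf-8"))
            digest.update(b"\n")
    return digest.hexdigest()
```

Two diffs that differ only in hunk offsets hash to the same value, so a host-side check could skip a scan when the fingerprint matches one already reviewed locally. Any edit to an added or removed line changes the fingerprint and forces a fresh review.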
The workflow I recommend for agent-written PRs
- 01. Agent completes the task in Cursor: Let Auto-review handle mixed-risk tool calls during the session. Do not disable the gate to "go faster" on credentialed machines.
- 02. Run /review before git push: Fix Bugbot and Security Review findings in the same session while context is hot.
- 03. Push and open PR: Bugbot on the host recognizes the patch ID and skips duplicate work.
- 04. Human owns merge decision: Review gates catch defects; they do not replace ownership. Someone still signs the merge.
Where this sits in the four-part agent anatomy
Auto-review and pre-push review are guardrail-layer tools. They do not replace memory, tools, or the loop. They sit at the boundary where actions leave the agent and touch your systems. I map them onto the anatomy of an AI agent: memory holds context, tools execute, the loop decides, guardrails vet. Classifier gates are guardrails. Bugbot is a guardrail on code quality and security findings before merge.
They also do not fix bad tools. A classifier that approves a call to a tool returning null on failure still loses. The tool contract work in "your agents are not broken, your tools are" comes first. Governance on top of sloppy tools just governs sloppy outcomes faster.
How to adopt Auto-review and /review step by step
- 01. Turn on Auto-review for local agent work: Settings > Agents for IDE users. For SDK scripts, set local.autoReview and write block_instructions for destructive shell patterns (rm, drop, truncate, credential paths).
- 02. Add /review to the agent workflow before push: Treat it like lint for agent sessions. If Bugbot flags a security issue, fix it before the PR, not in review comment thread tennis.
- 03. Keep secret scanning at the commit boundary: Auto-review does not replace scanning where the agent writes. The supply-chain playbook in "your agent's supply chain is the attack surface" still applies, and MCP-scoped least privilege is the same story I told in "MCP goes stateless": shrink what the agent can reach, then gate what it can do.
- 04. Measure interruptions, not vibes: Track how often classifiers block high-risk calls you care about vs. block benign read paths. Tune allow_instructions until read-only work flows and deletes always pause.
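Measuring interruptions comes down to two rates: how often a high-risk call slipped through without a pause, and how often a benign call was paused. A minimal sketch follows, assuming you already export gate decisions and have a human label each call afterwards. `GateEvent` and `gate_metrics` are hypothetical names, not part of any Cursor API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class GateEvent:
    command: str
    paused: bool     # the gate interrupted the agent on this call
    high_risk: bool  # label assigned afterwards by a human reviewer


def gate_metrics(events: Iterable[GateEvent]) -> Dict[str, float]:
    """Compute the two error rates worth tuning against."""
    events = list(events)
    risky = [e for e in events if e.high_risk]
    benign = [e for e in events if not e.high_risk]
    missed = sum(1 for e in risky if not e.paused)
    false_pauses = sum(1 for e in benign if e.paused)
    return {
        # High-risk calls that ran without a pause: this should trend to zero.
        "missed_high_risk_rate": missed / len(risky) if risky else 0.0,
        # Benign calls that were paused: this is the prompt-fatigue number.
        "false_pause_rate": false_pauses / len(benign) if benign else 0.0,
    }
```

Tune allow_instructions against the false pause rate and block_instructions against the missed high-risk rate. If one policy change moves both numbers, revert it and split the change.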
Should you enable Auto-review on production credential machines?
Enable it for throughput on daily agent work if you also run secret scanning at commit time, pin MCP servers, and keep block_instructions aggressive on credential paths. Do not treat the classifier as a compliance control. In high-trust environments I still pair Auto-review with explicit allowlists for production database tools and supply-chain hygiene on extensions and skills.
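One cheap way to keep block_instructions aggressive is to lint the policy file in CI before any headless run starts. The sketch below assumes the `autoRun.block_instructions` shape from the example earlier in this post. `lint_policy` and `REQUIRED_BLOCK_TERMS` are hypothetical, and the check only confirms the words appear in the instructions, not that the classifier will honor them.

```python
import json
import re
from typing import List, Sequence

# Terms I want mentioned in at least one block instruction (an assumption;
# adjust the list to the destructive patterns that matter in your stack).
REQUIRED_BLOCK_TERMS = ("rm", "drop", "truncate", "mcp.json")


def lint_policy(
    policy_text: str, required_terms: Sequence[str] = REQUIRED_BLOCK_TERMS
) -> List[str]:
    """Return a list of problems; an empty list means the policy passes."""
    policy = json.loads(policy_text)
    block = policy.get("autoRun", {}).get("block_instructions", [])
    if not block:
        return ["block_instructions is missing or empty"]

    joined = " ".join(block).lower()
    problems = []
    for term in required_terms:
        # Match the term as a whole token so "rm" does not match "permissions".
        pattern = rf"(?<![\w.]){re.escape(term.lower())}(?!\w)"
        if not re.search(pattern, joined):
            problems.append(f"no block instruction mentions {term!r}")
    return problems
```

Failing the CI job when the returned list is non-empty turns "keep block_instructions aggressive" from a habit into a check someone has to deliberately bypass.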
Common mistakes
- Calling Auto-review "zero trust" because marketing language sounded reassuring.
- Skipping tool registry hygiene because the classifier "usually catches" bad calls.
- Running unrestricted headless SDK agents in CI because CLI parity for Auto-review was still catching up.
- Reviewing only in GitHub after the agent already pushed broken security patterns to a branch.
- No eval set for agent outputs, so you cannot tell if review gates improved completion rate or just slowed the loop (see "evals beyond happy path").
What comes after June 2026 in agent governance
Cursor has been explicit that Auto-review is early and focused on local desktop agents today, with the same ideas expected to spread to more surfaces. Watch for CLI parity, tighter MCP-scoped classifiers, and enterprise policy packs you can check into repo. The direction is clear: agents get more autonomy by default, but autonomy becomes policy-as-code rather than a single global toggle.
Best practices for agent autonomy in production
- Write block_instructions before allow_instructions so destructive patterns always pause.
- Run /review on every agent-generated diff before push, not only on "big" changes.
- Keep MCP scopes least-privilege; Auto-review does not fix over-scoped servers.
- Log classifier blocks to your observability stack (see "agent observability stack").
- Name an owner for agent workflows who tunes policy when false positives spike.
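For the logging practice above, one structured line per gate decision is usually enough for an owner to spot a false-positive spike. This is a minimal sketch using Python's standard logging module. The field names and the `log_gate_decision` helper are my own assumptions, not a schema that Cursor emits.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("agent.gate")


def log_gate_decision(tool: str, command: str, decision: str, reason: str) -> str:
    """Emit one JSON line per gate decision and return it for inspection."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": "gate_decision",
        "tool": tool,          # e.g. "shell", "mcp", "fetch"
        "command": command,    # redact secrets before logging in real use
        "decision": decision,  # e.g. "allow", "sandbox", "pause"
        "reason": reason,
    }
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line
```

Because each line is plain JSON, whatever log pipeline you already run can count "pause" decisions per day without a custom parser.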
Conclusion
June 2026 agent tooling finally treats autonomy as contextual. That is the right abstraction. Auto-review and pre-push review are worth adopting this week if you run background agents daily. They are not worth treating as the whole governance story. Pair them with typed tools, observable runs, and an owner who answers when the agent misbehaves at 3 a.m.