What is Agentjacking?

Agentjacking is an attack where poisoned observability content (fake stack traces, log lines, or ticket bodies) tricks a coding agent into running attacker-controlled commands during normal triage. The repository is not compromised; the agent is manipulated through data it treats as ground truth.

Which coding agents are vulnerable to Agentjacking?

Tenet Threat Labs demonstrated the attack against Cursor, Claude Code, and Codex in controlled tests, with more than 100 agent actions on injected errors and roughly 85% success in their lab. Any agent that ingests external error text and can invoke shell tools is in scope, regardless of vendor.

Does branch protection or code review stop Agentjacking?

No. The attack does not require git write access or merging malicious code. It exploits the agent's incident workflow: read an error, run remediation commands. Repo protections never see the malicious instruction because it lives in the observability feed, not the tree.

How do I harden Sentry-to-agent workflows?

Treat Sentry and similar feeds as untrusted input. Audit public DSN exposure, scope MCP read tools for triage-only agents, split triage from remediation with a human gate, block shell commands parsed from raw stack traces, and log every shell invocation during triage sessions.

Is Auto-review enough protection during on-call triage?

No. Classifier gates optimize throughput for mixed low-stakes actions. Incident remediation commands often look legitimate to a classifier. Pair Auto-review with explicit block_instructions, separate read-only triage agents, and human approval before execute-mode remediation.

What is agent-jackstop?

agent-jackstop is an open-source configuration package from Tenet Security with drop-in hardening rules for coding agents against untrusted telemetry. Use it as a baseline, then layer org-specific MCP scoping and triage/remediation separation on top.

Agentjacking: Fake Sentry Errors Hijack Coding Agents

In this post (8 sections)

In this post

Introduction

On June 17, Tenet Threat Labs published Agentjacking: a demonstration that a fake Sentry error report can redirect coding agents into executing attacker shell commands. They tested Cursor, Claude Code, and Codex. More than 100 agents acted on injected errors in their lab setup. Roughly 85% of attempts succeeded. The attack needs no repository write access, no compromised dependency, and no prompt injection in your codebase. It only needs your agent to ingest observability output you told it to trust.

I have been saying since May that the agent supply chain is the attack surface. Agentjacking is the observability branch of that story. When an engineer asks an agent to "investigate this Sentry issue," the stack trace becomes instructions. If the stack trace is attacker-controlled, the investigation becomes execution. This post is the hardening checklist I would run on any engagement where agents read production errors through MCP.

What Agentjacking is (and why it bypasses repo security)

Classic supply-chain attacks compromise code you pull in: a poisoned npm package, a trojanized VS Code extension, a malicious skill file. Agentjacking skips that entirely. The attacker publishes or injects content into a channel your agent already reads: an error reporting endpoint, a log stream, a ticket body, a CI artifact summary. The agent interprets that content as facts about a failure. Embedded instructions in the fake stack trace or error metadata become the next actions.

Tenet's Sentry vector works because many teams expose a DSN in client-side code or public repos. An attacker who knows the DSN can submit crafted error events. When your on-call workflow pipes Sentry issues into a coding agent ("pull the latest critical error and fix it"), the poisoned event arrives looking identical to a real production failure. Approval UX does not help if the agent believes it is remediating an incident.

Agentjacking vs classic agent supply-chain attacks

Vector	Touches your git repo?	Typical entry	What the agent trusts
Poisoned npm / extension	Often yes	Dependency install	Code in the tree
Malicious SKILL.md	Sometimes	Skill load	Instruction file
Agentjacking (observability)	No	Sentry / logs / tickets	Error text as ground truth
Prompt injection in PR	Yes (content)	Diff or comment	Repository text

How the Sentry injection attack chain works

The chain Tenet documented is short enough to fit in a standup, which is why it scares me.

01
Attacker learns or guesses a public DSN
Client-side Sentry configs, leaked env files, or public frontend bundles often expose project DSNs. DSNs are not secret keys in the way API keys are, but they are write endpoints for your error stream.
02
Attacker submits a crafted error event
The stack trace and exception message contain shell commands or instructions framed as "remediation steps." Tenet embedded directives that looked like debugging guidance.
03
Agent ingests the issue through MCP or API
Your workflow asks the agent to triage open Sentry issues, reproduce locally, or apply a hotfix. The poisoned issue is indistinguishable from a real one at the text layer.
04
Agent executes attacker commands
With Auto-review, headless mode, or permissive allowlists, shell tools run because the agent believes it is fixing production. Tenet reported high success across Cursor, Claude Code, and Codex in their lab.

Tenet open-sourced agent-jackstop drop-in configs to harden agents against untrusted telemetry. I treat that repo as a starting point, not a substitute for org policy. The configs help; the policy is what survives employee turnover.

Why classifier gates alone do not save you

If you read my Auto-review and pre-push review guide, you know I like classifier gates for throughput. Agentjacking is the counterexample. The requested action looks legitimate in context: curl a diagnostic endpoint, run a cleanup script, fetch a "patch" URL. The classifier sees an agent responding to an incident, not an attacker.

Auto-review reduces prompt fatigue on read-only paths. It is not a guarantee that commands sourced from external error text are safe. Pair it with block_instructions that pause any shell invocation whose arguments came from an unverified observability payload, and with human approval for writes during incident mode.

Surfaces most exposed to Agentjacking

MCP servers that wrap Sentry, Datadog, PagerDuty, or Jira issue bodies.
Headless triage loops: "every hour, pull critical Sentry issues and propose fixes."
Auto-review or auto mode during on-call hours when engineers want speed.
Shared DSNs across staging and prod where staging is easier to poison.

The hardening checklist I run after Agentjacking

01
Inventory every observability-to-agent path
List MCP tools, webhooks, and cron jobs where error text enters an agent context. If you cannot draw the path on a whiteboard in two minutes, you do not control it yet.
02
Treat observability output as untrusted input
Same discipline as user-generated content in a RAG corpus. Sanitize, scope, and never pass raw stack traces straight into a shell tool without a human-named approver for execute mode.
03
Audit DSN and webhook exposure
Grep public repos and frontend bundles for Sentry DSNs. Rotate if exposed. Restrict ingest to known environments where the platform allows it.
04
Scope MCP read tools narrowly
An agent that triages production does not need write tools on the same turn. Split read-triage agents from write-fix agents with an explicit handoff and human gate.
05
Add block_instructions for incident-sourced commands
In permissions.json or equivalent, block curl/wget/bash when the parent prompt references external issue IDs unless a human explicitly names the command. Cursor SDK local.autoReview supports this pattern.
06
Log and alert on agent shell from triage workflows
Wire the agent observability stack so any shell invocation during a Sentry-triage session generates a structured event. Review weekly.

Example: split triage from remediation

The pattern I recommend is two agents, not one hero agent. Agent A reads Sentry through MCP and outputs a structured summary: file, line, hypothesis, suggested diff. Agent B only runs after a human approves the summary. Agent A never gets shell write tools.

Triage agent (read-only MCP):
  tools: sentry.list_issues, sentry.get_event, repo.read_file
  output: JSON { hypothesis, proposed_patch, confidence }

Remediation agent (human-gated):
  tools: repo.write, shell.test_only
  input: approved JSON from triage
  rule: no shell args parsed from raw stack trace strings

How Agentjacking fits the broader supply chain

Layer four in my supply chain table was "tool definitions and the data they return." Agentjacking is what happens when you forget that return data is input. A Sentry MCP tool returns text. That text is as executable as a SKILL.md if your agent architecture treats it as instructions.

If you are rolling out MCP Enterprise-Managed Authorization to fix OAuth fatigue, do not stop there. EMA controls who can reach the connector. It does not validate the semantic content of each issue. Content trust is still your problem.

Common mistakes I expect this quarter

Assuming repo branch protection saves you. Agentjacking never needed a commit.
Giving triage agents the same tool bundle as feature agents because "it is faster."
Treating public DSNs as low risk because they are "meant to be client-side."
Relying on Auto-review alone during incident response.
Skipping red-team exercises on observability feeds because "we only read internal Sentry."

Conclusion

Agentjacking is not theoretical. Tenet demonstrated it across the three coding agents most of my clients run. The fix is not a better model. It is architecture: untrusted input boundaries, split triage and remediation, scoped MCP tools, and explicit blocks on executing commands that originated in error text. Run the checklist once on the Sentry-to-agent path and you will find at least one place where ground truth was assumed. That is the hole.

Sources: Tenet Threat Labs Agentjacking report at https://tenetsecurity.ai/blog/agentjacking-coding-agents-with-fake-sentry-errors/; agent-jackstop configs at https://github.com/tenetsecurity/agent-jackstop.

Agentjacking is real: poisoned Sentry errors can hijack Cursor, Claude Code, and Codex without touching your repo

Introduction

What Agentjacking is (and why it bypasses repo security)

How the Sentry injection attack chain works

Why classifier gates alone do not save you

Surfaces most exposed to Agentjacking

The hardening checklist I run after Agentjacking

Example: split triage from remediation

How Agentjacking fits the broader supply chain

Common mistakes I expect this quarter

Conclusion

Agentic AI patterns, delivered Thursdays

Questions readers ask about this post

Read next

Agentjacking is real: poisoned Sentry errors can hijack Cursor, Claude Code, and Codex without touching your repo

Introduction

What Agentjacking is (and why it bypasses repo security)

How the Sentry injection attack chain works

Why classifier gates alone do not save you

Surfaces most exposed to Agentjacking

The hardening checklist I run after Agentjacking

Example: split triage from remediation

How Agentjacking fits the broader supply chain

Common mistakes I expect this quarter

Conclusion

Agentic AI patterns, delivered Thursdays

Questions readers ask about this post

Read next

Claude Code Artifacts turn terminal output into live review pages: what Team and Enterprise buyers should pilot first

The June 15 Claude billing change: Agent SDK credits, model retirement, and the checklist I run before anything breaks

Governing agent autonomy in 2026: Auto-review, pre-push review, and why approval prompts are not a security model