Your Agent's Tool Belt Is Now Your Largest Attack Surface


A security story and a platform story collided this week, and the security one matters more.

If you only have one minute

  • Your agent's most dangerous code may not be the code it writes, but the tools it is allowed to call.
  • Tenet Security and CSA Labs found roughly 2,388 organizations exposed to a Sentry-MCP injection path.
  • Amazon, Cloudflare, and Vercel are all shipping agent harnesses while the MCP threat model is still catching up.
  • Block says Builderbot now ships about 15% of production code and 1,500 PRs per week, which raises the security stakes.
  • The next serious agent incident will probably look less like bad autocomplete and more like trusted tooling gone sideways.

The harness rush has a threat-model problem

Your agent's tool belt is now your largest attack surface. That is the uncomfortable read from a week where the Sentry-MCP "agentjacking" disclosure landed beside Amazon Bedrock AgentCore GA, Cloudflare Flue, and Vercel eve.

I do not read those as separate stories. I read them as one sentence: the industry is professionalizing agent execution before it has finished threat-modeling agent authority.

Last month I would have told you the main risk in AI coding agents was still output quality. Does the patch compile? Does the test pass for the right reason? Did the agent preserve the team's patterns? Those questions still matter. But I changed my mind about the center of gravity.

A good-looking agent session with a poisoned tool call in the middle is now the scarier failure mode.

Tenet Security and CSA Labs described a Sentry-MCP injection path where a public Sentry DSN could become a route into Claude Code, Cursor, and Codex-style workflows. Their estimate was roughly 2,388 exposed organizations at disclosure. That number is not the whole story, but it is enough to make the category real.

The lesson is not "Sentry is bad" or "MCP is doomed." The lesson is that observability, issue trackers, repos, package registries, docs, terminals, and deployment tools are becoming writable parts of the agent's nervous system. The old security model assumed a developer inspected those surfaces. The new one often asks an agent to summarize them, reason over them, and act.

If an agent can read a tool, that tool can prompt the agent. If an agent can act on a tool, that prompt can become authority.

That sentence should be in every MCP rollout RFC.
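One way to make that sentence operational is to treat it as a taint rule: once a session has read text that outsiders can write, any later write action waits for a human. The sketch below is a minimal illustration under that assumption, not something taken from the disclosure. The `Session` class, the `gate` function, and the tool names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical agent session: tracks taint and keeps a tool-call transcript."""
    tainted: bool = False
    transcript: list = field(default_factory=list)


def gate(session: Session, tool: str, action: str, external_input: bool) -> str:
    """Return 'allow' or 'needs_human' for one tool call and record the decision."""
    if action == "read" and external_input:
        # Text that outsiders can write may steer every later step in the session.
        session.tainted = True
    decision = "needs_human" if action == "write" and session.tainted else "allow"
    session.transcript.append((tool, action, decision))
    return decision


session = Session()
print(gate(session, "repo", "write", external_input=False))         # allow
print(gate(session, "error_tracker", "read", external_input=True))  # allow
print(gate(session, "repo", "write", external_input=False))         # needs_human
```

The same write to the same tool gets a different answer once hostile input may have entered the session, which is the point: authority depends on what the agent has read, not only on what it is allowed to call.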

This is not the same as the runtime moat

This edition is adjacent to the runtime conversation, but it is not the same argument.

The runtime question asks: who owns the durable loop, memory, observability, sandbox, and retry path that lets agents keep working after the first prompt? That was the platform fight.

This week's question is different: once that loop exists, what can the agent touch, and who reviews the permissions?

Amazon's AgentCore harness puts managed agent runtime into a familiar cloud shape. Cloudflare's Flue and Vercel's eve point at durable execution as a primitive. That is useful. Engineers should not have to hand-roll every retry loop, sandbox, and session resume path.

But managed execution also makes trust feel finished too early. A polished harness can make a messy permission graph look safe because it is configured in YAML, approved in an RFC, and wrapped in vendor docs.

Ask the boring questions anyway:

  • Which MCP servers can write, not just read?
  • Which tools can see secrets, stack traces, customer data, or private issue text?
  • Which prompts come from external systems the team does not control?
  • Which agent actions require human review during on-call or incident response?
  • Where will the postmortem find the full tool-call transcript?
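Those questions can be turned into a small inventory check. This is a rough sketch, assuming the team keeps a hand-maintained record of each MCP server's capabilities. The `McpServer` fields, the flag wording, and the two example servers are made up for illustration and do not come from any MCP specification or vendor tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McpServer:
    """Hypothetical inventory record, one per configured MCP server."""
    name: str
    can_write: bool
    sees_sensitive_data: bool
    ingests_external_text: bool
    write_needs_approval: bool
    logs_tool_calls: bool


def review_flags(server: McpServer) -> list[str]:
    """Return the review questions this server answers in a worrying way."""
    flags = []
    if server.can_write:
        flags.append("write access")
    if server.sees_sensitive_data:
        flags.append("sensitive data")
    if server.ingests_external_text:
        flags.append("external input")
    if server.can_write and not server.write_needs_approval:
        flags.append("unreviewed writes")
    if not server.logs_tool_calls:
        flags.append("no transcript")
    return flags


inventory = [
    McpServer("error-tracker", can_write=False, sees_sensitive_data=True,
              ingests_external_text=True, write_needs_approval=True,
              logs_tool_calls=True),
    McpServer("deploy", can_write=True, sees_sensitive_data=True,
              ingests_external_text=False, write_needs_approval=False,
              logs_tool_calls=False),
]
for server in inventory:
    print(server.name, review_flags(server))
```

The output is only as honest as the inventory, so the record itself belongs in the same review as the config it describes.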

MCP config review is the new code review.

Throughput makes the blast radius bigger

Block's Builderbot disclosure is the number that makes this more than a security niche. Block says Builderbot is responsible for about 15% of production code and 1,500 PRs per week.

I like that number because it moves the agent debate out of vibes. It asks a practical question: how much merged work is flowing through the agent path?

But the same number changes the risk model. When an agent accounts for 1,500 PRs a week, its tool permissions deserve the same seriousness as CI credentials and deploy tokens.

A staff engineer reviewing an agent rollout should not only ask whether the agent saves time. They should ask what happens when the agent believes the wrong thing from a trusted tool.

That is where MCP config review becomes a real engineering practice. Not a security team's afterthought. Not a checklist after procurement. A normal part of the change: diff the server list, review scopes, test hostile inputs, record tool calls, and decide where human approval sits.
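The first step, diffing the server list and its scopes, fits in a few lines. This is a minimal sketch that assumes the MCP configuration can be reduced to a mapping from server name to a set of scope strings. That shape, the `diff_servers` helper, and the example servers are illustrative assumptions, not a real config format.

```python
def diff_servers(old: dict[str, set[str]], new: dict[str, set[str]]) -> list[str]:
    """Describe added servers, removed servers, and per-server scope changes."""
    changes = []
    for name in sorted(new.keys() - old.keys()):
        changes.append(f"+ server {name} scopes={sorted(new[name])}")
    for name in sorted(old.keys() - new.keys()):
        changes.append(f"- server {name}")
    for name in sorted(old.keys() & new.keys()):
        for scope in sorted(new[name] - old[name]):
            changes.append(f"+ scope {name}:{scope}")
        for scope in sorted(old[name] - new[name]):
            changes.append(f"- scope {name}:{scope}")
    return changes


# Hypothetical before/after snapshots of an agent's tool configuration.
before = {"issues": {"read"}, "repo": {"read"}}
after = {"issues": {"read", "write"}, "repo": {"read"}, "deploy": {"trigger"}}
for line in diff_servers(before, after):
    print(line)
```

Printing the change list into the pull request that alters the config puts new authority in front of a reviewer the same way a code diff does.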

The teams that win with coding agents make authority explicit instead of banning every tool.

⚡ Tech news weekly roundup

  • Agentjacking turns Sentry-MCP into agent attack surface
    Tenet Security's agent-jackstop repo is the first agent-security artifact I would put in an MCP rollout review.

  • Block says Builderbot ships 15% of production code
    Block's post gives engineering leaders a harder metric than demo quality: merged work per week.

  • Amazon makes the agent harness a cloud primitive
    AgentCore harness GA says the managed-agent market is moving from SDKs to an operating surface.

  • Cloudflare and Vercel race toward durable agent execution
    Flue and eve point at the same prize: agents that can pause, resume, and survive messy real workflows.

  • Claude Code Artifacts reframes review as session inspection
    Anthropic's artifacts beta turns the agent transcript into a review object, which may matter as much as the final diff.

💰 Funding & valuation

Company | Round | Editor read
Undo | $37M | Runtime debugger context is becoming agent input; static repo context alone is starting to look thin.
Conduct | $60M | Enterprise buyers are underwriting modernization, not greenfield demos, because old systems hold the budget.
Devplan | $2.5M seed | The bottleneck is moving upstream from code to coordination, where specs and priorities become agent fuel.

History byte: the Morris Worm's warning

In 1988, Robert Tappan Morris released a program that was supposed to measure the size of the internet. It exploited weaknesses in fingerd and sendmail, copied itself across machines, and spread much faster than intended. The network was still small enough that many systems trusted one another by habit. That trust became the bug.

The Morris Worm affected a meaningful slice of the early internet and helped lead to the creation of CERT. More important, it changed the field's posture. Network services were no longer friendly pipes between known machines. They were attack surface.

MCP has a similar innocence right now. It was built so agents could collaborate with tools. Agentjacking is the reminder that collaboration channels carry hostile input too.

The fix was not to stop connecting systems — it was to stop assuming connections were safe.

Every protocol built on trust gets its Morris moment.

Reflection

If your team had to review every MCP server like a production dependency, which one would make the on-call engineer most nervous?

By Neeraj Khandelwal • Zencoder • https://zencoder.ai/newsletter