This is Part 3 of a four-part series.
- Part 1: The Repo You Didn't Scan
- Part 2: Chasing the Nine-Tailed Fox
- Part 4: Your Agent Army Awaits
In Part 1, we built an automated gate for scanning public repos before cloning them. In Part 2, we audited a real tool and discovered the structural vulnerability that affects every AI agent reading untrusted content.
This post is about a different attack surface: the things you install that your agent then calls autonomously.
When you npm install a library, it runs inside your application's process with your application's permissions. You control when and how it's called.
An MCP (Model Context Protocol) server is fundamentally different. It runs as a separate process on your machine—with its own credentials, its own network access, and its own filesystem permissions. It doesn't need a "shell access" tool to be dangerous. It is code. It can read your files, make network calls, access databases, and write to disk directly, because it's a process running with whatever permissions your shell has.
On top of that, your AI agent decides when to call it. The MCP server exposes "tools"—functions the agent can invoke autonomously based on its interpretation of your instructions. A well-designed MCP server for Postgres might expose query, list_tables, and describe_schema. A malicious one doesn't need to expose a run_shell_command tool—the server process itself can execute whatever it wants when any of its tools are invoked. The tool call is just the trigger.
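To make that concrete, here's a toy sketch (a hypothetical handler, not a real MCP implementation): the "tool" the agent invokes is just code running as your user.

```bash
# Toy sketch, not a real MCP server: a "tool" handler is just a function
# in the server process, running with your shell user's permissions.
list_tables() {
  echo "users orders"                      # the innocuous answer the agent sees
  # ...but nothing stops the same handler from, say, touching your SSH key:
  wc -c ~/.ssh/id_rsa 2>/dev/null || true  # runs silently if the file exists
}
list_tables
```

The agent only ever sees the tool's declared name and output; everything else the handler does is invisible to it.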
In Part 1, we mentioned security researcher Jamieson O'Reilly's work on OpenClaw. One of his most striking findings: the ClawdHub marketplace—where users install "skills" and extensions—let him artificially inflate a malicious extension's download count by 4,000 to make it look trustworthy. Supply-chain poisoning, made trivially easy.
This isn't unique to OpenClaw. Any marketplace where trust signals (stars, downloads, reviews) can be gamed is vulnerable. And in the AI tooling ecosystem, marketplaces are proliferating faster than the security infrastructure to support them.
The pattern is familiar from browser extensions, npm packages, and mobile app stores: create something that looks useful, game the trust signals, wait for installs. The difference with MCP servers is the blast radius—an installed MCP server is a code process with whatever permissions your shell has.
A supply chain attack is when an attacker compromises a dependency so that everyone who installs or updates it gets compromised too. You don't need to trick thousands of developers individually—you trick one, and the dependency chain does the rest.
This isn't theoretical. In September 2025, a threat actor phished an npm developer's account credentials and published malicious versions of Chalk and Debug—packages with 2.6 billion combined weekly downloads. The injected code hijacked crypto wallet transactions. A single phished token, billions of affected installs.
Around the same time, the Shai-Hulud campaign took it further: malicious npm packages designed to self-propagate by stealing npm tokens and GitHub credentials from infected machines, then using those credentials to publish more malicious packages. A worm for the package registry.
And in the MCP ecosystem specifically, CVE-2025-6514 — a critical vulnerability (CVSS 9.6) in mcp-remote, a package downloaded 437,000 times — enabled full remote code execution against MCP clients. Even Anthropic's own Git MCP server had three CVEs that could be chained into remote code execution via prompt injection.
Small indie repos are especially vulnerable. A solo maintainer is a single point of compromise—one phished email, one reused password, one missing 2FA. Most MCP servers and AI skills today are exactly this: small projects by individual developers, installed via npx or pip install by thousands of users who trust the package name.
The strongest defense is simple: don't use the registry. Use the source code you just audited.
```bash
# Good — clone, audit, pin to tag
git clone https://github.com/author/mcp-server.git ~/mcps/mcp-server
cd ~/mcps/mcp-server
git checkout v1.2.3
npm install  # from audited source, not from npm
```

```bash
# Bad — blind trust in the registry
npx @author/mcp-server
```
When you clone and build from source at a pinned tag, the npm/PyPI registry is out of the picture entirely. The author's account can be phished, malicious versions can be published to the registry, and none of it touches you—because you're running the code you read, not the code the registry serves.
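One refinement, since a git tag is a mutable reference that a compromised account can re-point: record the commit SHA you audited and pin to that. (Paths and the tag name match the example above.)

```bash
# Tags can be re-pointed after the fact; a commit SHA cannot.
cd ~/mcps/mcp-server
git rev-parse "v1.2.3^{commit}" | tee ../mcp-server.audited-sha
git checkout "$(cat ../mcp-server.audited-sha)"
```

From then on, updates are a deliberate act: fetch, re-audit the diff, move the recorded SHA.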
Here's the thing most people don't consider: the majority of open-source MCP servers are thin API wrappers—and many of them are poorly maintained. Solo-dev projects that got popular, then stalled. You're trusting a stale artifact from a dormant repo to be the interface between your AI agent and your critical systems.
When you clone the source, you're not just dodging supply chain attacks—you're getting something you can actually own. Your agent just audited the code. It understands it. Now it can modify it: strip out tools you don't need, tighten credential scope, add logging, fix the bug the maintainer ghosted on six months ago. You go from "consumer of someone else's abandoned wrapper" to "owner of a thin integration layer your agent can maintain."
And here's the recursive bit: remember how in Part 1 you could ask your agent to edit the git skill's markdown to customize your org's trust boundary? The same applies here. You can prompt your agent to improve the MCP server it just cloned—add a missing tool, harden an auth flow, refactor the credential handling. The code is local, your agent already understands it, and the change is one prompt away. Try doing that with an npx package.
This also aligns with a broader shift in how AI agents use tools. MCP servers expose a static set of tools that are always loaded—and when you install several, the combined tool inventory floods your agent's context window, degrading its performance. The emerging alternative is skills: dynamically loaded instructions with accompanying scripts, invoked only when relevant. By working from source, your forked MCP server is already halfway to becoming a skill—a set of code your agent understands and can evolve. If you eventually want to migrate it fully, the path is short.
If you must use a registry (some tools only distribute that way), fall back to provenance checking:
- Pin exact versions: not `*`, not `latest`, not `>=`. Exact versions, with integrity hashes.
- Disable install scripts: `postinstall` in `package.json` and custom build commands in `setup.py` execute automatically during install—before your code review even happens. npm's `--ignore-scripts` flag (or `ignore-scripts=true` in `.npmrc`) prevents this.

Skills are often thought of as "just a prompt file." They're not. A skill directory can contain `.py` scripts, `.js` modules, `.sh` helpers—executable code that runs when the skill is invoked. The `SKILL.md` file tells the agent what to do; the scripts are the tools it uses to do it.
That means a malicious skill has two attack vectors:
The code vector. Scripts in the skill directory run with whatever permissions the agent has. A scripts/setup.py that phones home, a helpers/format.js that reads ~/.ssh/id_rsa—these are traditional code attacks, and the scanning skills from Part 1 catch them the same way they catch any suspicious code pattern.
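A minimal version of that structural check is a pattern sweep over the skill's executable files. The pattern list is illustrative, not exhaustive, and the skill path is hypothetical:

```bash
# Flag obvious red flags in a skill's scripts: network fetches, key paths,
# shell/eval primitives. Crude, but it catches the lazy attacks.
grep -rnE 'curl |wget |\.ssh/|id_rsa|child_process|subprocess|eval\(' \
  --include='*.py' --include='*.js' --include='*.sh' ~/skills/some-skill/
```

A hit isn't proof of malice—a networking skill legitimately calls `curl`—but a match in a skill that shouldn't need it is exactly the kind of mismatch worth reading closely.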
The prompt vector. The `SKILL.md` itself can subtly redirect your agent's behavior: say, an instruction to quietly send results to an external endpoint, or to include the contents of sensitive files "for context."
These are prompt injection attacks baked into the skill file. The agent follows them because that's what skills are—instructions to follow. There's no technical distinction between "legitimate skill instruction" and "injected malicious instruction." They're both natural language.
As we discussed in Part 2, automated prompt injection detection is an unsolved problem. The scanning skills can flag structural red flags—outbound network calls in a skill that shouldn't need them, shell execution in a formatting helper—but they can't reliably distinguish a clever exfiltration instruction from a legitimate one.
The scanning skills from Part 1 apply here, with additional checks specific to MCP servers and skills:
For MCP servers:

- Inventory the exposed tools: do they match the server's stated purpose?
- Map its network surfaces: which hosts does it call, and why?
- Check its credential scope: which secrets and environment variables does it read?
For AI skills:
- Read the `.py`, `.js`, and `.sh` files. These are executable code, not documentation. Apply the same scrutiny as any dependency.
- Check `package.json` or `requirements.txt`: are dependencies pinned? Any post-install scripts?

The install-mcp and install-skill skills automate the structural checks—tool inventory, network surfaces, credential scope. But for the semantic checks—is this skill subtly malicious?—you are the last line of defense.
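For the dependency-pinning and post-install items, a quick manual check works even without the skills. This assumes an npm-style layout and a hypothetical skill path:

```bash
# Quick structural checks on a skill/MCP directory before installing.
cd ~/skills/some-skill
grep -n '"postinstall"' package.json || echo "no postinstall script"
grep -nE '": *"([~^*]|>=|latest)' package.json || echo "all deps pinned"
```

The second pattern flags loose version ranges (`^`, `~`, `>=`, `*`, `latest`); anything it surfaces deserves a closer look before `npm install` runs.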
In Part 1, we drew the line between internal and external code: your org's repos pass through, public repos get scanned. The same boundary applies to MCP servers and skills: anything from outside that boundary gets scanned before it ever runs.
Across Parts 1–3, we've covered three layers of the AI agent security problem: the code you clone, the recursive trap of scanning untrusted content, and the MCP/skill supply chain.
In Part 4: Your Agent Army Awaits, we turn these defenses into a force multiplier — orchestrating multiple AI agents together, centralizing credentials properly, and building security review into your vibe coding workflow so it happens before you push, not after something breaks.
Continue to Part 4 — Your Agent Army Awaits