This is Part 1 of a four-part series.
Part 2: Chasing the Nine-Tailed Fox
Part 3: That MCP Server You Just Installed
Part 4: Your Agent Army Awaits
This year we are going to see a steady beat of two progressing themes: ever more capable agentic tooling, and attackers learning to exploit it.
A week doesn't pass without a new vibe-coded repo that people plug into AI agents with network and bash access. For some of you, this is brand new territory, so I'll try to explain as we go. Others may think they're already familiar with most of this, but because it's a new muscle, repetition still helps. For example, last week someone dropped a link in my feed for a cool open-source pentesting bot. My immediate reactions: Who maintains this? What does it execute? What can it reach once it's wired into my agent?
Those thoughts—and a string of very real breaches in January—are why I wrote this series.
"Normal" open-source libraries are mostly passive. You import them, they run inside your app, and the blast radius is constrained by your runtime and your code paths.
Agentic tooling flips that: the agent reads untrusted content, runs shell commands, reaches the network, and holds your credentials. The blast radius is everything the agent can touch, not just your code paths.

None of this is a moral failure. It's physics.
Most "getting started" guides optimize for one thing: time-to-wow. Install, paste a token, run with broad permissions. That's fine for a demo. It's not fine as a default.
Late January, security researcher Jamieson O'Reilly demonstrated three attack vectors against OpenClaw (100,000+ GitHub stars). Nearly a thousand instances were found with open gateway ports and no authentication—giving attackers access to every integrated system: Signal messages, credentials, conversation histories.
Zenity Labs then showed the deeper problem: a prompt injection hidden in a document the agent processes during routine work is enough to create a persistent backdoor and escalate to full system compromise. No software vulnerability required—every step abuses the agent's intended capabilities.
On the supply chain side, 1Password's security team discovered that hundreds of OpenClaw skills distributed macOS malware through a coordinated ClickFix-style campaign. A top-downloaded "Twitter" skill directed users to install a fake prerequisite—the install instructions decoded obfuscated payloads, removed macOS Gatekeeper protections, and dropped an infostealer targeting browser sessions, saved credentials, developer tokens, and SSH keys. The skill's SKILL.md file was the attack vector: not code, but instructions that looked like setup steps.
Early February, Wiz security researchers discovered that Moltbook—a popular social network for AI agents—used a Supabase publishable key in its client-side JavaScript (normal for Supabase) but never configured Row Level Security. Without RLS, that publishable key gave unauthenticated, full read/write access to the entire database. 1.5 million API tokens. 35,000 email addresses. 4,060 private conversations between agents, some containing plaintext third-party API credentials. Complete account takeover for any user—not because the key was exposed, but because nothing behind it enforced access control.
And AI pentesting tools like Shannon can now find and exploit these vulnerabilities automatically, at $50 per scan, outperforming human pentesters. Any security weakness in the repos you depend on will be found faster than ever.
The implication: checking your open-source dependencies is no longer optional. It's urgent.
There are many steps we could take to harden our setups, but the more complex they are, the less likely we are to follow them, and security practices only work if they are followed. Could we automate some of it to avoid relying on memory or discipline? Let's see if we can use AI agents to help us secure our AI agents.
Here's the idea: teach your AI coding agent to gate every git clone of external code with a fast security scan, automatically.
You're already using an AI agent to write code. That same agent can search the web for known vulnerabilities, shallow-clone a repo, scan the code for red flags, and give you a summary table in 30-60 seconds. You just need to tell it to do this by default, which is where "AI skills" come in handy (a new way to teach your old AI dogs new tricks, supported by Claude Code, Zen CLI, Codex, and others).
I've published this as an open-source skill you can install in any AI coding CLI (Claude Code, Zen CLI, Codex, and others):
| Skill | What it gates | Link |
|---|---|---|
| git | Every git clone of a public repo | git/SKILL.md |
It delegates to a shared oss-security-check engine that contains the scanning methodology. Once installed, it triggers automatically—the agent does it for you, every time.
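For readers new to skills: a skill is typically a markdown file whose frontmatter tells the agent when to trigger. A minimal, hypothetical shape is sketched below; field names vary across CLIs, and this is not the published skill's actual content:

```markdown
---
name: git-clone-gate
description: Run an OSS security check before cloning any external repository
---

When asked to clone a public repository that is not already trusted,
first run the oss-security-check methodology, present the summary
table, and wait for the user's decision before executing git clone.
```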
(In Part 3, we publish companion skills for MCP server installs and skill imports—same engine, additional checks specific to each attack surface. In Part 4: Your Agent Army Awaits, we put multiple AI agents to work together — multi-agent specs, cross-model code review, and building security into your workflow.)
> Edit the git skill's markdown file so that repos in the `<your-org>` GitHub organization and repos hosted on `git.yourcompany.com` skip the security gate. All other public repos should still be scanned before cloning.
One prompt, and the skill is tailored to your org's trust boundary.
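Under the hood, the resulting gate could look something like this sketch. The org names are placeholders from the prompt, not real defaults, and `run_oss_security_check` is a stand-in for the skill's scan:

```shell
# Hypothetical trust-boundary check; "your-org" and
# "git.yourcompany.com" are placeholders for your real allowlist.
skip_security_gate() {
  case "$1" in
    https://github.com/your-org/*|git@github.com:your-org/*) return 0 ;;
    https://git.yourcompany.com/*|git@git.yourcompany.com:*) return 0 ;;
    *) return 1 ;;  # everything else gets scanned first
  esac
}

# Usage: skip_security_gate "$url" || run_oss_security_check "$url"
# (run_oss_security_check is a stand-in for the skill's engine)
```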
The assessment checks two layers:
Layer 1: Reputation and history (web search). This is the highest-signal check and the one most people skip.
The agent searches the web for `"owner/repo" vulnerability OR CVE OR malware`, checks GitHub Security Advisories, and scans issues for security-related reports. A repo with known incidents should be flagged before you even look at the code.
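The repo-age and activity signals in this layer can also be pulled from the public GitHub REST API. A minimal sketch, assuming only `curl` and `sed` are available (the skill itself uses web search, not this exact call):

```shell
# Fetch repo metadata (Layer 1 signals: age, activity) from the
# public GitHub REST API. Assumes network access; unauthenticated
# calls are rate-limited.
fetch_repo_meta() {
  curl -s "https://api.github.com/repos/$1"
}

# Pull a quoted string field out of the JSON without jq, so the
# sketch has no extra dependencies. Handles string values only.
json_field() {
  sed -n "s/.*\"$2\": *\"\([^\"]*\)\".*/\1/p" <<<"$1" | head -n 1
}

# Usage (requires network):
#   meta=$(fetch_repo_meta "owner/repo")
#   json_field "$meta" "created_at"
```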
Layer 2: Code scan. The agent shallow-clones (--depth 1) to a temp directory and checks:
- Outbound network: where the code sends data.
- Secret handling: whether it reads or logs credentials.
- Action surfaces: shell execution, file writes, dynamic eval.
- Dependencies: `npm audit` / `pip audit` results, `curl | bash` install patterns, unpinned dependencies.
- Prompt injection: suspicious instructions embedded in docs and templates.

Both layers feed into a single summary:
## OSS Security Check: org/repo
| Category | Finding | Risk |
|------------------------|----------------------|------|
| Known issues (web) | No CVEs, author est. | ✅ |
| Repo age & activity | Created 2 weeks ago | ⚠️ |
| Outbound network | Posts to analytics | ⚠️ |
| Secret handling | Reads .env, no logs | ✅ |
| Action surfaces | Shell exec, gated | ✅ |
| Dependencies | 3 high-severity CVEs | ❌ |
| Prompt injection | Automated scan only | ⚠️ |
Recommendation: Medium risk — clone to sandbox, use throwaway credentials
Then you decide. 30-60 seconds of your agent's time, evidence instead of vibes.
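To make the code-scan layer concrete, here is a stripped-down sketch of the kind of greps involved, assuming the repo was already shallow-cloned with `git clone --depth 1 <url> "$dir"`. The patterns are illustrative, not the skill's actual rule set:

```shell
# Crude red-flag scan over an already-cloned directory.
scan_dir() {
  local dir="$1" flags=0
  # curl | bash style install patterns: an immediate red flag
  if grep -rEq 'curl[^|]*\|[[:space:]]*(ba)?sh' "$dir" 2>/dev/null; then
    echo "FLAG: curl|bash install pattern"
    flags=$((flags + 1))
  fi
  # code that touches .env files: worth a look, not automatically bad
  if grep -rq '\.env' "$dir" 2>/dev/null; then
    echo "NOTE: reads .env"
  fi
  # hard-coded outbound URLs: check where data is being sent
  if grep -rEq 'https?://' "$dir" 2>/dev/null; then
    echo "NOTE: hard-coded outbound URLs"
  fi
  echo "Red flags: $flags"
}

# Usage: scan_dir ./cloned-repo
```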
This reminds me of the early days of the internet's widespread adoption (though things move much faster now), when people blissfully created their "123password"s and left admin consoles open on static IP addresses. The world wasn't kind to them: worms, botnets, malware, and other unpleasantries followed, and we were all forced to invest more effort and cognitive cycles in security.
If you want to avoid the 2026 version of "123password," I hope this series gives you some basic knowledge and, more importantly, some basic automated tooling to ease that cognitive load.
This post covered the practical defense: automate the scan, make it a team default. But the scanner itself reads untrusted code—which raises an uncomfortable question.
In Part 2, we audit a 6K-star AI pentesting tool and discover what bypassPermissions means when an agent reads untrusted content. In Part 3, we look at the MCP server and AI skill supply chain—where the "install" command is the attack vector. And in Part 4: Your Agent Army Awaits, we turn these security skills into multi-CLI productivity tools, orchestrate multiple agents together, and build security review into your vibe coding workflow so it happens before you push — not after something breaks.
Continue to Part 2 — Chasing the Nine-Tailed Fox