JSPM

  • ESM via JSPM
  • ES Module Entrypoint
  • Export Map
  • Keywords
  • License
  • Repository URL
  • TypeScript Types
  • README
  • Created
  • Published
  • Downloads 37
  • Score
    100M100P100Q72052F
  • License MIT

AI Agent risk scanner โ€” detect security risks in skills, MCP servers & plugins. 29 rules mapped to OWASP Top 10 LLM, MITRE ATLAS & CWE standards. Offline, open source.

Package Exports

    This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (@elliotllliu/agent-shield) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

    Readme

    ๐Ÿ›ก๏ธ AgentShield

    AI Agent Risk Scanner โ€” Detect security risks before they reach your agents

    AI Agent ้ฃŽ้™ฉๆ‰ซๆๅ™จ โ€” ๅŸบไบŽ OWASP/MITRE ๆ ‡ๅ‡†ๆฃ€ๆต‹ๅฎ‰ๅ…จ้ฃŽ้™ฉ

    npm License: MIT Tests Rules Standards

    Scan skills, MCP servers, and plugins for data exfiltration, backdoors, prompt injection, tool poisoning, and supply chain risks. Every finding is mapped to OWASP Top 10 for LLM, MITRE ATLAS, and CWE โ€” so you're reviewing established standards, not our opinions.

    Offline-first. AST-powered. Open source. Your data never leaves your machine.

    npx @elliotllliu/agent-shield scan ./my-skill/

    ๐Ÿ’ก Example Output

    ๐Ÿ›ก๏ธ  AgentShield Risk Report
    โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
    ๐Ÿ“ Target:  ./my-plugin
    ๐Ÿ“„ Files:   8 files, 262 lines
    โฑ  Time:    245ms
    โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
    
    ๐Ÿ“Š Risk Summary
    
      ๐Ÿ”ด LLM09: Supply Chain Vulnerabilities (3 high, 2 medium)
         https://genai.owasp.org/llmrisk/llm09-supply-chain-vulnerabilities/
      ๐ŸŸก LLM06: Sensitive Information Disclosure (1 medium)
         https://genai.owasp.org/llmrisk/llm06-sensitive-information-disclosure/
      ๐ŸŸข LLM01: Prompt Injection (2 low)
         https://genai.owasp.org/llmrisk/llm01-prompt-injection/
    
    ๐Ÿ“‹ Detailed Findings
    
      [LLM09: Supply Chain Vulnerabilities]
      Skill/plugin behavioral hijacking โ€” patterns that modify agent config,
      inject prompts, or override other skills.
      Standards: OWASP LLM09 ยท CWE-829 ยท ATLAS AML.T0049
      Research: Greshake et al. (2023) "Not what you've signed up for"
    
        โ”œโ”€ plugin.ts:15 โ€” Plugin injects content via before_prompt_build
        โ”œโ”€ install.sh:8 โ€” Config tampering: Modifies agent configuration
        โ””โ”€ SKILL.md:5  โ€” Behavioral override: forced usage directive
    
      [LLM06: Sensitive Information Disclosure]
      Patterns where sensitive data is read and sent via network requests.
      Standards: OWASP LLM06 ยท CWE-200 ยท ATLAS AML.T0048.004
    
        โ””โ”€ media.js:14 โ€” Reads sensitive data and sends HTTP request
    
    โœ… No security risks detected.     โ† Clean projects show this

    We are an X-ray machine, not a doctor. We show what patterns exist and cite established standards โ€” you decide what they mean for your use case.


    ๐Ÿ† What Makes AgentShield Different

    1. ๐Ÿ“š Standards-Based Detection

    Every finding is mapped to authoritative security frameworks:

    Standard Coverage Purpose
    OWASP Top 10 for LLM 26/29 rules Industry standard for LLM application security
    CWE 24/29 rules Common Weakness Enumeration (MITRE)
    MITRE ATLAS 7/29 rules Adversarial Threat Landscape for AI Systems
    Academic papers 4 rules Peer-reviewed research on prompt injection & tool poisoning

    We don't invent risk categories. We map code patterns to standards that industry experts have already established.

    2. ๐Ÿ”’ Runtime MCP Interception

    Other tools only scan source code. AgentShield also sits between your MCP client and server, intercepting every JSON-RPC message in real-time:

    # Insert AgentShield between client and server
    agent-shield proxy node my-mcp-server.js
    
    # Enforce mode: automatically block high-risk tool calls
    agent-shield proxy --enforce python mcp_server.py

    What it catches at runtime:

    • ๐ŸŽญ Tool description injection โ€” hidden instructions in tool descriptions
    • ๐Ÿ’‰ Result injection โ€” malicious content in tool return values
    • ๐Ÿ”‘ Credential leakage โ€” sensitive data in tool call parameters
    • ๐Ÿ“ก Beacon behavior โ€” abnormal periodic callbacks (C2 pattern)
    • ๐Ÿชค Rug-pull attacks โ€” tools changing behavior after initial trust

    3. โ›“๏ธ Cross-File Attack Chain Detection

    Most scanners check one file at a time. AgentShield traces data flow across your entire codebase:

    ๐Ÿ”ด Cross-file data flow (OWASP LLM09 ยท CWE-506):
       config_reader.py reads ~/.ssh/id_rsa โ†’ exfiltrator.py POSTs to external server
    
    ๐Ÿ”ด Kill Chain detected (ATLAS AML.T0049):
       Reconnaissance โ†’ Access โ†’ Collection โ†’ Exfiltration โ†’ Persistence

    4. ๐Ÿง  AST Taint Tracking (Not Regex)

    Uses Python's ast module for precise analysis โ€” dramatically reducing false positives:

    user = input("cmd: ")
    eval(user)          # โ†’ ๐Ÿ”ด Tainted input flows to eval (CWE-94)
    eval("{'a': 1}")    # โ†’ โœ… NOT flagged (safe string literal)
    exec(config_var)    # โ†’ ๐ŸŸก Dynamic, not proven tainted

    5. ๐Ÿ•ต๏ธ Skill Hijack Detection

    Detects multi-layer supply chain attacks targeting AI agent ecosystems:

    [LLM09: Supply Chain Vulnerabilities]
    Standards: OWASP LLM09 ยท CWE-829 ยท ATLAS AML.T0049
    
      ๐Ÿ”ด Plugin prompt injection: before_prompt_build + prependContext
      ๐Ÿ”ด Config tampering: Modifies agent configuration via CLI
      ๐Ÿ”ด Silent OTA: Downloads update then re-executes itself
      ๐ŸŸก Non-standard install source: non-registry domain
      ๐ŸŸก Behavioral override: forced usage directive in SKILL.md

    Real-world case study: detected a 3-layer supply chain attack where a published skill silently installed a CLI tool from a private CDN, which then injected prompts, modified agent config, and auto-updated without user consent.


    โšก Quick Start

    # Scan a skill / MCP server / plugin (29 rules, offline, <1s)
    npx @elliotllliu/agent-shield scan ./my-skill/
    
    # Scan with optional reference score
    npx @elliotllliu/agent-shield scan ./my-skill/ --score
    
    # Scan Dify plugins (.difypkg auto-extraction)
    npx @elliotllliu/agent-shield scan ./plugin.difypkg
    
    # Runtime interception (MCP proxy)
    npx @elliotllliu/agent-shield proxy node my-mcp-server.js
    
    # AI-powered deep analysis (uses YOUR API key)
    npx @elliotllliu/agent-shield scan ./skill/ --ai --provider openai --model gpt-4o
    
    # Discover installed agents on your machine
    npx @elliotllliu/agent-shield discover
    
    # JSON output for programmatic use
    npx @elliotllliu/agent-shield scan ./skill/ --json
    
    # SARIF output for GitHub Code Scanning
    npx @elliotllliu/agent-shield scan ./skill/ --sarif -o results.sarif
    
    # HTML report
    npx @elliotllliu/agent-shield scan ./skill/ --html

    ๐Ÿ” 29 Security Rules (Mapped to Standards)

    Risk Category: Code Execution (OWASP LLM09 ยท CWE-94)

    Rule Detects CWE
    backdoor eval(), exec(), new Function() with dynamic input CWE-94
    reverse-shell Outbound socket connections piped to shell CWE-506
    crypto-mining Mining pool connections, xmrig, coinhive CWE-400
    obfuscation eval(atob(...)), hex chains, packed code CWE-506
    python-security 35 patterns: eval, pickle, subprocess, SQL injection CWE-94
    go-rust-security 22 patterns: command injection, unsafe blocks CWE-676

    Risk Category: Data Safety (OWASP LLM06 ยท CWE-200)

    Rule Detects CWE
    data-exfil Reads sensitive data + sends HTTP requests CWE-200
    env-leak Environment variables + outbound HTTP CWE-526
    sensitive-read Access to ~/.ssh, ~/.aws, ~/.kube CWE-538
    credential-hardcode Hardcoded AWS keys, GitHub PATs, Stripe tokens CWE-798
    phone-home Periodic beacons to external endpoints CWE-200

    Risk Category: Tool Integrity (OWASP LLM07)

    Rule Detects Standard
    tool-shadowing Cross-server tool name conflicts ATLAS AML.T0052
    description-integrity Hidden instructions in tool descriptions OWASP LLM07
    mcp-manifest Wildcard perms, undeclared capabilities OWASP LLM07
    mcp-runtime Missing authorization, debug exposure CWE-862
    network-ssrf User-controlled URLs, SSRF patterns CWE-918

    Risk Category: Prompt Injection (OWASP LLM01 ยท ATLAS AML.T0051)

    Rule Detects Standard
    prompt-injection 55+ patterns: override, identity manipulation, TPA CWE-77
    multilang-injection 8-language injection: ไธญ/ๆ—ฅ/้Ÿ“/ไฟ„/้˜ฟ/่ฅฟ/ๆณ•/ๅพท CWE-77
    prompt-injection-llm LLM-evaluated semantic injection CWE-77

    Risk Category: Supply Chain (OWASP LLM09 ยท ATLAS AML.T0049)

    Rule Detects CWE
    skill-hijack Plugin prompt injection, config tampering, silent OTA CWE-829
    attack-chain Multi-stage kill chains (recon โ†’ exfil) CWE-506
    cross-file Coordinated attacks spanning multiple files CWE-506
    supply-chain Known CVEs in dependencies CWE-829
    typosquatting Package name squatting: 1odash โ†’ lodash CWE-829
    hidden-files .env with secrets, unexpected files CWE-538

    Risk Category: Permissions & Quality

    Rule Detects Standard
    privilege SKILL.md permissions vs actual behavior mismatch CWE-250
    skill-risks Financial ops, external dependencies OWASP LLM07
    toxic-flow Cross-tool data leak patterns CWE-502

    ๐Ÿ“Š AgentShield vs Alternatives

    AgentShield Snyk Agent Scan Tencent AI-Infra-Guard
    Standards mapping โœ… OWASP+CWE+ATLAS Partial โŒ
    Runtime MCP Interception โœ… MCP Proxy โŒ โŒ
    Cross-file Attack Chain โœ… โŒ Partial
    AST Taint Tracking โœ… Python โŒ Unknown
    Skill Hijack Detection โœ… 6 sub-categories โŒ โŒ
    Static Rules 29 6 Many (incl. infra)
    Multi-language Injection โœ… 8 languages โŒ English only Unknown
    100% Offline โœ… โŒ cloud required โœ…
    Zero Install (npx) โœ… โŒ Python + uv โŒ Docker
    VS Code Extension โœ… โŒ โŒ
    GitHub App + Action โœ… โŒ โŒ
    Open Source โœ… MIT โŒ โœ…

    ๐Ÿ“‹ Real-World Validation: 493 Dify Plugins

    We scanned the entire langgenius/dify-plugins repository:

    Metric Value
    Plugins scanned 493
    Files analyzed 9,862
    Lines of code 939,367
    Scan time ~120s

    6 plugins flagged with eval()/exec() executing dynamic code (CWE-94).

    Full report โ†’


    ๐Ÿ”Œ Integrate AgentShield Into Your Platform

    Running a skill marketplace, MCP directory, or plugin registry?

    Your platform lists hundreds of skills and plugins. Users install them into AI agents with access to files, credentials, and shell commands. AgentShield gives you:

    • Risk reports on every submission โ€” based on industry standards, not arbitrary scores
    • CI/CD gates โ€” fail PRs that introduce high-risk patterns
    • SARIF integration โ€” feed results into GitHub Code Scanning

    How to Integrate

    npx @elliotllliu/agent-shield scan ./skill --json
    {
      "totalFindings": 3,
      "summary": { "high": 1, "medium": 1, "low": 1 },
      "findings": [
        {
          "severity": "high",
          "rule": "skill-hijack",
          "file": "plugin.ts",
          "line": 15,
          "message": "Plugin injects content via before_prompt_build",
          "references": {
            "owasp": "LLM09: Supply Chain Vulnerabilities",
            "cwe": "CWE-829"
          }
        }
      ]
    }

    ๐Ÿ“– Full Integration Guide โ†’


    ๐Ÿ“ฆ Ecosystem

    ๐Ÿค– GitHub App

    Auto-scan every PR for security risks. Learn more โ†’

    ๐Ÿ’ป VS Code Extension

    Real-time security diagnostics in your editor. Learn more โ†’

    ๐Ÿ”’ Runtime MCP Proxy

    Monitor MCP server behavior in real-time.

    agent-shield proxy --enforce node my-mcp-server.js

    โš™๏ธ CI Integration

    GitHub Action

    name: Security Scan
    on: [push, pull_request]
    jobs:
      scan:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: elliotllliu/agent-shield@main
            with:
              path: './skills/'

    GitHub Action with SARIF Upload

    name: Security Scan (SARIF)
    on: [push, pull_request]
    jobs:
      scan:
        runs-on: ubuntu-latest
        permissions:
          security-events: write
        steps:
          - uses: actions/checkout@v4
          - uses: elliotllliu/agent-shield@main
            with:
              path: './skills/'
              sarif: 'true'
          - name: Upload SARIF
            if: always()
            uses: github/codeql-action/upload-sarif@v3
            with:
              sarif_file: agent-shield-results.sarif

    npx one-liner

    - name: Security scan
      run: npx -y @elliotllliu/agent-shield scan .

    โš™๏ธ Configuration

    Create .agent-shield.yml (or run agent-shield init):

    rules:
      disable:
        - supply-chain
        - phone-home
    ignore:
      - "tests/**"
      - "*.test.ts"

    ๐Ÿ—‚๏ธ Supported Platforms

    Platform Support
    AI Agent Skills OpenClaw, Codex, Claude Code
    MCP Servers Model Context Protocol tool servers
    Dify Plugins .difypkg archive extraction + scan
    npm Packages Any package with executable code
    Python Projects AST analysis + 35 security patterns
    General Any directory with JS/TS/Python/Go/Rust/Shell code

    ๐Ÿ“š Methodology & References

    AgentShield's detection rules are grounded in established security research:

    • OWASP Top 10 for LLM Applications (2025) โ€” genai.owasp.org
    • MITRE ATLAS โ€” atlas.mitre.org
    • CWE (Common Weakness Enumeration) โ€” cwe.mitre.org
    • NIST AI 100-2 โ€” Adversarial Machine Learning taxonomy
    • Greshake et al. (2023) โ€” "Not what you've signed up for" โ€” arXiv:2302.12173
    • Liu et al. (2024) โ€” "Automatic and Universal Prompt Injection" โ€” arXiv:2403.04957
    • Invariant Labs (2024) โ€” "Tool Poisoning Attacks on MCP Servers" โ€” invariantlabs.ai

    For a detailed mapping of each rule to its standards, see docs/rules.md.


    ๐Ÿค Contributing

    We especially welcome:

    • New detection rules (with CWE/OWASP mapping)
    • False positive / false negative reports
    • Third-party benchmark test results

    See CONTRIBUTING.md

    ๐ŸŒ Community & Partners

    Partner Contribution
    Agent Skills Hub Real-world testing across skill registries, security insights, and feature feedback

    ๐Ÿ“ฆ npm ยท ๐Ÿ“– Rule Docs ยท ๐Ÿค– GitHub App ยท ๐Ÿ’ป VS Code ยท ๐Ÿ”Œ Integration Guide ยท ๐Ÿ‡จ๐Ÿ‡ณ ไธญๆ–‡ README

    License

    MIT