NodeBench MCP
Investigate a topic and return a sourced report fast. NodeBench MCP now defaults to a small workflow facade, with heavier research and admin surfaces moved behind explicit presets.
Default install: 19 visible tools total, including 7 core workflow tools:
investigate, compare, track, summarize, search, report, and ask_context.
# Default v3 core workflow surface
claude mcp add nodebench -- npx -y nodebench-mcp
# Extended workflow surface
claude mcp add nodebench -- npx -y nodebench-mcp --preset power
# Admin/runtime surface
claude mcp add nodebench -- npx -y nodebench-mcp --preset admin --admin

What's New
- v3 default surface — the default preset is now a workflow-first facade instead of a tool warehouse.
- Admin-only runtime gates — dashboards and the observability watchdog no longer start on the default hot path. Use `--admin`, `--dashboards`, or `--watchdog` explicitly.
- Faster health path — `--health` now loads only the active preset instead of eagerly loading every domain.
- Power and admin presets — heavier research and operator lanes are still available, but no longer define the front door.
New here? Read AGENT_LOGIC.md for the complete guide to how NodeBench thinks.
RETHINK REDESIGN APR 2026
This section records a repo-grounded reexamination of NodeBench MCP as of April 2026. It is intentionally blunt. The goal is to shrink the product to clear, measurable workflows and make the runtime and claims trustworthy.
Repo-grounded findings
- Surface area and messaging are out of sync
  - This README currently markets `350+` tools, `founder` under 50 tools, and `full` as 338.
  - The audit that drove this section measured roughly 28 tools in the starter `tools/list` payload, roughly 186 in `founder`, and roughly 546 in `full`, before counting the extra dynamic-loading helpers separately.
- The core executable is overloaded
  - `packages/mcp-local/src/index.ts` currently combines MCP serving, analytics tracking, embedding bootstrapping, profiling hooks, A/B instrumentation, dynamic tool loading, dashboard startup, and engine hosting in one runtime path.
- Performance is not yet credibly measured
  - During this audit, `node packages/mcp-local/dist/index.js --health` took about 2.9s locally because CLI mode eagerly loaded all toolsets before exit.
  - The main performance comparison in the package measures local toolchain overhead, not end-user outcome quality, startup SLOs, workflow completion, or real provider spend.
- Cost reporting is not yet auditable
  - The current profiler relies on heuristic per-tool cost estimates, not authoritative provider billing or exact token accounting.
- Workflow clarity is weak
  - The product currently explains presets, domains, discovery engines, and meta-systems before it proves one concrete job to be done.
Redesign principles
- One dominant workflow per persona
  - A founder should land on one obvious path.
  - A banker should land on one obvious path.
  - The MCP should prove value before it explains its architecture.
- Truthful counts and truthful claims
  - Preset docs, README counts, and `tools/list` output must agree.
- Speed is a product behavior
  - Measure startup time, `tools/list` payload size, per-tool latency, and workflow time-to-first-value.
- Keep optional systems out of the hot path
  - Embeddings, dashboards, profilers, and engine surfaces should not make the default MCP startup slower.
- Prefer workflows over catalogs
  - Opinionated, repeatable outcomes are more valuable than an oversized flat tool inventory.
- Ship process, not prose
  - The right mental model is a small number of strong workflows with evidence and verification, not a giant capabilities brochure.
Execution board
| Ship order | Cause | Symptom in NodeBench | Metric to enforce |
|---|---|---|---|
| 1 | No single dominant job | The package reads like "everything MCP" instead of one clear wedge | Every persona entry point must answer the main job in one sentence |
| 2 | Tool surface is not truthful | README counts and runtime counts disagree | Published counts must match tools/list within 5% |
| 3 | Workflow is hidden behind tool volume | Users see presets, domains, and discovery systems before a clear task path | One canonical workflow per persona, completable in 3 to 5 tool calls |
| 4 | Performance is claimed, not instrumented | Benchmarks focus on harness overhead, not real user value | Track startup p50/p95, tools/list bytes, per-tool latency, workflow success, and real spend |
| 5 | Hot path contains optional systems | Profiling, embeddings, dashboards, and engine behavior sit in the main boot path | --health under 300ms and stdio ready under 500ms on warm start |
| 6 | Benchmarking is inward-facing | The package proves bookkeeping more than business outcomes | Benchmark by artifact quality, time-to-artifact, cost-per-artifact, and reuse rate |
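The p50/p95 startup metric in row 4 can be computed with a simple nearest-rank percentile helper. This is an illustrative sketch with invented sample values, not NodeBench code:

```javascript
// Nearest-rank percentile over raw latency samples (milliseconds).
// Sample values below are invented for illustration.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank method: smallest index covering p% of the sorted samples.
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

const startupMs = [180, 210, 195, 1450, 205, 190, 220, 185, 200, 2900];
console.log(percentile(startupMs, 50)); // median startup
console.log(percentile(startupMs, 95)); // tail startup
```

Tracking p95 alongside p50 matters here because one slow cold start (like the 2.9s `--health` case above) is invisible in the median but dominates the tail.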
Reference models
- agent-skills: process, not prose. Strong lifecycle commands, checkpoints, and evidence requirements.
- GitHub MCP Server: narrower toolsets improve tool choice and reduce context size. The server also exposes a simple `tool-search` utility.
- Google MCP Toolbox: performance and operability are anchored in concrete runtime primitives such as connection pooling and OpenTelemetry, not only internal heuristics.
Bottom line
NodeBench MCP should evolve from a monolithic "hundreds of tools" surface into a smaller number of opinionated workflow products backed by a measurable, trustworthy runtime.
Unified cross-surface spec: docs/architecture/UNIFIED_WORKFLOW_SPEC.md
Production companion spec: docs/architecture/UNIFIED_WEB_MCP_PRODUCTION_SPEC.md
What You Get
NodeBench is now a workflow-first MCP. The default install proves one concrete job quickly, and the heavier surfaces are still available when you explicitly ask for them.
Presets
| Preset | Visible tools | What it is for |
|---|---|---|
| `default` | 19 | v3 core workflow facade: investigate, compare, track, summarize, search, report, ask_context, plus discovery/meta helpers |
| `power` | 203 | Extended research and founder workflow pack without auto-starting admin runtime surfaces |
| `admin` | 106 | Profiling, observability, dashboards, eval, and debug-oriented operator lanes |
| `founder` | 191 | Legacy founder preset kept for compatibility |
| `full` | all domains | Maximum coverage when you explicitly want the warehouse |
Default workflow
ASK -> CHECK -> WRITE -> SAVE

The default preset is optimized for that loop. It does not start local dashboards or the observability watchdog unless you pass admin flags explicitly.
# Claude Code
claude mcp add nodebench -- npx -y nodebench-mcp
# Windsurf / Cursor — add --preset to args in your MCP config when you want more than the default workflow surface

Quick Start
Claude Code (CLI)
claude mcp add nodebench -- npx -y nodebench-mcp

Or add to ~/.claude/settings.json or .mcp.json in your project root:
{
"mcpServers": {
"nodebench": {
"command": "npx",
"args": ["-y", "nodebench-mcp"]
}
}
}

Cursor
Add to .cursor/mcp.json (or Settings > MCP). Use the cursor preset to stay within Cursor's tool limit:
{
"mcpServers": {
"nodebench": {
"command": "npx",
"args": ["-y", "nodebench-mcp", "--preset", "cursor"]
}
}
}

Windsurf
Add to .windsurf/mcp.json (or Settings > MCP > View raw config):
{
"mcpServers": {
"nodebench": {
"command": "npx",
"args": ["-y", "nodebench-mcp", "--preset", "founder"]
}
}
}

Other MCP Clients
Any MCP-compatible client works. Point command to npx, args to ["-y", "nodebench-mcp"]. Add "--preset", "<name>" to the args array for presets.
First Prompts to Try
# Find tools for your task
> Use discover_tools("evaluate this acquisition target") to find relevant tools
# Load a toolset
> Use load_toolset("deep_sim") to activate decision simulation tools
# Run a decision simulation
> Use run_deep_sim_scenario to simulate a business decision with multiple variables
# Generate a decision memo
> Use generate_decision_memo to produce a shareable memo from your analysis
# Weekly founder reset
> Use founder_weekly_reset to review the week's decisions and outcomes
# Pre-delegation briefing
> Use pre_delegation_briefing to prepare context before handing off a task

Optional: API Keys
export GEMINI_API_KEY="your-key" # Web search + vision (recommended)
export GITHUB_TOKEN="your-token"  # GitHub (higher rate limits)

Set these as environment variables, or add them to the env block in your MCP config:
{
"mcpServers": {
"nodebench": {
"command": "npx",
"args": ["-y", "nodebench-mcp"],
"env": {
"GEMINI_API_KEY": "your-key",
"GITHUB_TOKEN": "your-token"
}
}
}
}

Progressive Discovery — How 338 Tools Fit in Any Context Window
The starter preset loads 15 tools. The other 323 are discoverable and loadable on demand.
How it works
1. discover_tools("your task description") → ranked results from all 338 tools
2. load_toolset("deep_sim") → tools activate in your session
3. Use the tools directly → no proxy, native binding
4. unload_toolset("deep_sim") → free context budget when done

Multi-modal search engine
discover_tools scores tools using 14 parallel strategies:
| Strategy | What it does |
|---|---|
| Keyword + TF-IDF | Exact matching, rare tags score higher |
| Fuzzy (Levenshtein) | Tolerates typos |
| Semantic (synonyms) | 30 word families — "check" finds "verify", "validate" |
| N-gram + Bigram | Partial words and phrases |
| Dense (TF-IDF cosine) | Vector-like ranking |
| Embedding (neural) | Agent-as-a-Graph bipartite search |
| Execution traces | Co-occurrence mining from usage logs |
| Intent pre-filter | Narrow to relevant categories before search |
Plus cursor pagination (offset/limit), result expansion (expand: N), and multi-hop BFS traversal (depth: 1-3) via get_tool_quick_ref.
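To make the blending idea concrete, here is a conceptual sketch of how two of the strategies above, keyword matching and Levenshtein fuzziness, can combine into one ranking score. This is illustrative only, not NodeBench's actual scorer:

```javascript
// Classic dynamic-programming edit distance between two strings.
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Score a tool description against a query: each query word contributes
// its best match in the description, exact hits score 1, near-misses
// (typos like "simualte") still contribute a partial score.
function score(query, description) {
  const qWords = query.toLowerCase().split(/\s+/);
  const dWords = description.toLowerCase().split(/\s+/);
  let total = 0;
  for (const q of qWords) {
    const best = Math.max(
      ...dWords.map((d) => 1 - levenshtein(q, d) / Math.max(q.length, d.length))
    );
    total += best;
  }
  return total / qWords.length; // normalize to 0..1
}
```

Running several such scorers in parallel and merging their rankings is what lets typo-laden or loosely phrased task descriptions still surface the right tools.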
Client compatibility
| Client | Dynamic Loading |
|---|---|
| Claude Code, GitHub Copilot | Native — re-fetches tools after list_changed |
| Windsurf, Cursor, Claude Desktop, Gemini CLI | Via call_loaded_tool fallback (always available) |
Key Features
Decision Intelligence (Deep Sim)
Simulate decisions before committing. Run scenarios with multiple variables, score trajectories, generate postmortems, produce decision memos.
Causal Memory
Track actions, paths, and state across sessions. Important-change review surfaces what shifted since your last session.
Artifact Packets
Every analysis produces a shareable artifact — decision memos, delegation briefs, investigation reports. The output is the distribution.
Founder Tools
Weekly reset, pre-delegation briefing, company tracking, important-change review. Built for the founder who needs to make 20 decisions a day with incomplete information.
Knowledge Compounding
record_learning + search_all_knowledge — findings persist across sessions. By session 9, the agent finds 2+ relevant prior findings before writing a single line of code.
Headless Engine API
NodeBench ships a headless, API-first engine for programmatic access.
# Start MCP server with engine API on port 6276
npx nodebench-mcp --engine
# With auth token
npx nodebench-mcp --engine --engine-secret "your-token"

| Method | Path | Purpose |
|---|---|---|
| GET | `/` | Engine status, tool count, uptime |
| GET | `/api/health` | Health check |
| GET | `/api/tools` | List all available tools |
| POST | `/api/tools/:name` | Execute a single tool |
| GET | `/api/workflows` | List workflow chains |
| POST | `/api/workflows/:name` | Execute a workflow (SSE streaming) |
| POST | `/api/sessions` | Create an isolated session |
| GET | `/api/sessions/:id` | Session status + call history |
| GET | `/api/sessions/:id/report` | Conformance report |
| GET | `/api/presets` | List presets with tool counts |
Fine-Grained Control
# Include only specific toolsets
npx nodebench-mcp --toolsets deep_sim,recon,learning
# Exclude heavy toolsets
npx nodebench-mcp --exclude vision,ui_capture,parallel
# Dynamic loading — start minimal, load on demand
npx nodebench-mcp --dynamic
# Smart preset recommendation based on your project
npx nodebench-mcp --smart-preset
# Usage stats
npx nodebench-mcp --stats
# List all presets
npx nodebench-mcp --list-presets
# See all options
npx nodebench-mcp --help

TOON Format — Token Savings
TOON (Token-Oriented Object Notation) is on by default. Every tool response is TOON-encoded for ~40% fewer tokens vs JSON. Disable with --no-toon.
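The mechanism behind the savings is easy to demonstrate with a simplified tabular stand-in. This is not the TOON specification, just the underlying idea: JSON repeats every key on every row of a uniform array, while a tabular encoding declares the keys once:

```javascript
// Simplified tabular encoding for a uniform array of objects.
// NOT real TOON — a toy stand-in to show why declaring keys once saves tokens.
function tabularEncode(rows) {
  const keys = Object.keys(rows[0]);
  const header = keys.join(",");
  const body = rows.map((r) => keys.map((k) => String(r[k])).join(",")).join("\n");
  return `${header}\n${body}`;
}

// Hypothetical tool-usage rows, for illustration only.
const rows = [
  { tool: "investigate", calls: 42, p95_ms: 310 },
  { tool: "compare", calls: 17, p95_ms: 220 },
  { tool: "summarize", calls: 8, p95_ms: 180 },
];

const json = JSON.stringify(rows);
const tabular = tabularEncode(rows);
// The tabular form is shorter because each key appears once, not per row.
console.log(json.length, tabular.length);
```

The gap widens as the array grows, since JSON's per-row key overhead is constant while the tabular header is paid only once.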
Security & Trust Boundaries
NodeBench MCP runs locally on your machine.
- All persistent data is stored in `~/.nodebench/` (SQLite). No data is sent to external servers unless you provide API keys and use tools that call external APIs.
- Analytics data never leaves your machine.
- The `local_file` toolset can read files anywhere your Node.js process has permission. Use the `starter` preset to restrict file system access.
- All API keys are read from environment variables — never hardcoded or logged.
- All database queries use parameterized statements.
Build from Source
git clone https://github.com/HomenShum/nodebench-ai.git
cd nodebench-ai/packages/mcp-local
npm install && npm run build

Then use absolute path:
{
"mcpServers": {
"nodebench": {
"command": "node",
"args": ["/path/to/packages/mcp-local/dist/index.js"]
}
}
}

Troubleshooting
"No search provider available" — Set GEMINI_API_KEY, OPENAI_API_KEY, or PERPLEXITY_API_KEY
"GitHub API error 403" — Set GITHUB_TOKEN for higher rate limits
"Cannot find module" — Run npm run build in the mcp-local directory
MCP not connecting — Check path is absolute, run claude --mcp-debug, ensure Node.js >= 18
Windsurf not finding tools — Verify ~/.codeium/windsurf/mcp_config.json has correct JSON structure
Cursor tools not loading — Ensure .cursor/mcp.json exists in project root. Use --preset cursor to stay within the tool cap. Restart Cursor after config changes.
Dynamic loading not working — Claude Code and GitHub Copilot support native dynamic loading. For Windsurf/Cursor, use call_loaded_tool as a fallback.
License
MIT