CTO — AI Context Selection Engine


The most complete AI context selection engine in open source. Picks the right code chunks (not just files), auto-redacts secrets, learns from feedback. 18 signals. Zero AI dependencies.

```
cto --context "fix the seller info cache invalidation on KVS delete" --stdout | pbcopy
→ 166 relevant chunks from 59 files (26K tokens, 0 secrets)
→ Full chain: DeleteEndpoint → Router → UseCase → CacheService → KvsRepository
```

202KB package · 1,133 tests · 96 source modules · Zero AI dependencies.


The Problem

When developers use AI coding assistants, they need to provide context — the right source files. Today, most teams either:

  • Send everything → expensive, slow, hits token limits
  • Pick files manually → miss dependencies, forget test files, leak secrets

CTO solves both: it automatically selects the most relevant files for any task, sanitizes secrets before they reach any AI provider, and learns from feedback to get better over time.

Quick Demo

```sh
cto --demo   # Run a live showcase on your project
```

This runs a self-contained presentation that shows: project analysis, semantic matching proof, secret sanitization, ROI calculation, and benchmark results.

Benchmark Results

Eval Harness v8.1 — 20-file Java enterprise project, 4 tasks with expert-labeled ground truth:

| Metric | v8.0 | v8.1 |
|---|---|---|
| Must-have recall | 100% | 100% |
| Precision | 38% | 60% (+22pp) |
| F1 | 55% | 74% (+19pp) |
| Noise rate | 11.3% | 5.7% (-5.6pp) |

Real production repos (Java monoliths):

| Repo | Files | Without CTO | With CTO v8.0 |
|---|---|---|---|
| seller-info-service | 219 | 212 files (97%) | 166 chunks from 59 files |
| sizechart-middleend | 1,719 | 230 files | 72 chunks from 37 files |
| charts-backend | 1,261 | 685 files (54%) | 142 chunks from 16 files |

Internal benchmark (8 tasks, own codebase):

| Strategy | Precision | Recall | F1 |
|---|---|---|---|
| CTO + Reranker | 96.9% | 100% | 98.4% |
| TF-IDF only | 54.6% | 87.5% | 62.0% |
| Random | 7.7% | 6.3% | 2.8% |

ROI

On a typical 130-file TypeScript project:

| Metric | Without CTO | With CTO |
|---|---|---|
| Tokens per interaction | 370K (all files) | ~28K (selected) |
| Cost per interaction (Sonnet) | $1.11 | $0.08 |
| Monthly cost (10 devs, 40/day) | $8,880 | $640 |
| Annual savings | | ~$99,000 |
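The table's arithmetic can be reproduced directly. A quick sketch, assuming Sonnet input pricing of roughly $3 per million tokens and 20 working days per month (both figures are assumptions here, not tool output):

```typescript
// Reproduces the ROI table's arithmetic. The $3/M-token input price
// and the 20 working days per month are assumptions, not tool output.
const PRICE_PER_TOKEN = 3 / 1_000_000;

const costPerCall = (tokens: number) => tokens * PRICE_PER_TOKEN;
const monthlyCost = (perCall: number, devs: number, callsPerDay: number) =>
  perCall * devs * callsPerDay * 20;

const without = costPerCall(370_000); // $1.11 per interaction
const withCto = costPerCall(28_000);  // ~$0.08 per interaction
const annualSavings =
  (monthlyCost(without, 10, 40) - monthlyCost(withCto, 10, 40)) * 12;
```

The table rounds the per-call cost to $0.08 before multiplying; without that rounding the savings come out at ≈ $98.5K/year, consistent with the ~$99,000 shown.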

Plus: fewer hallucinations (right context), zero secret leaks, and the learner gets smarter with every --accept / --reject.

How it Works (v8.0 Pipeline)

```
Task → Query Intent Parser → structured action/entities/layers
         │
         ▼
   BM25 (weighted) ──────┐
   TF-IDF Embedding ─────┤──→ RRF Fusion ─→ 8-signal Boosting ─→ Reranker
   Multi-hop (auto) ─────┘          │
                                    ▼
                              Selection ─→ Chunk Extraction ─→ Output
                                              (methods, not files)
```

10-step pipeline:

| # | Step | What it does |
|---|---|---|
| 0 | Query Intent | Parses "fix cache invalidation on delete" → action:fix, entities:[cache,kvs], layers:[cache] |
| 1 | BM25 + Embedding | Lexical matching + TF-IDF cosine vectors, merged via Reciprocal Rank Fusion |
| 2 | Multi-hop | Complex queries auto-detected → iterative BM25 expansion via deps + call graph (2 hops) |
| 3 | Path IDF Boost | Query terms in file paths get boosted |
| 4 | Layer Boost | Architectural layer matching (controller, service, repository) |
| 5 | Import Boost | Dependencies of top-ranked files get pulled in |
| 6 | Call Graph Boost | Cross-file method calls traced (Java/TS/Python/Go) |
| 7 | Git Co-Change | Files frequently modified together (Jaccard similarity from commits) |
| 8 | Reranker | 5-signal quality gate: term coverage, specificity, bigram proximity, deps, path |
| 9 | Chunk Extraction | Extracts relevant functions/methods, not whole files. 10x token efficiency |

No AI is used for selection. Same input → same output. Deterministic.

Install

```sh
npm i -g cto-ai-cli    # global
npx cto-ai-cli         # or one-shot
```

Context Selection

```sh
cto --context "refactor the auth middleware"                 # human-readable summary
cto --context "fix login bug" --stdout | pbcopy              # pipe to clipboard
cto --context "add tests" --output context.md                # save to file
cto --context "fix login" --prompt "Refactor to async/await" # full AI prompt
cto --context "debug scoring" --json                         # JSON for tooling
cto --context "fix auth" --budget 30000                      # custom token budget
```

Output includes full file contents in markdown, ready for Claude, ChatGPT, or any AI. Secrets are automatically redacted — API keys, tokens, passwords, PII are replaced with **** before output.

Feedback Loop

CTO learns from real feedback, not from itself:

```sh
cto --accept                         # last selection was good
cto --reject                         # last selection was bad
cto --reject --missing src/auth.ts   # this file was missing
cto --stats                          # see what CTO has learned
```

On --reject, CTO also detects files you edited after the selection that weren't in the context — those get automatically boosted for next time.
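The learner's effect can be pictured as a per-file weight table. A toy model for illustration only, not CTO's actual learning logic:

```typescript
// Toy feedback learner: files reported missing get a positive boost,
// files in rejected selections get a small penalty. Illustrative only.
const boosts = new Map<string, number>();

function recordFeedback(accepted: boolean, selected: string[], missing: string[] = []) {
  for (const f of missing) boosts.set(f, (boosts.get(f) ?? 0) + 0.5);
  const delta = accepted ? 0.1 : -0.1;
  for (const f of selected) boosts.set(f, (boosts.get(f) ?? 0) + delta);
}

recordFeedback(false, ["src/login.ts"], ["src/auth.ts"]);
// src/auth.ts now carries a positive boost for later selections
```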

Secret Audit

```sh
cto --audit                  # scan all files
cto --audit --init-hook      # install pre-commit hook
cto --audit --full-scan      # ignore cache, scan everything
cto --audit --json           # machine-readable output
```

45+ patterns (AWS, Stripe, GitHub, OpenAI, Slack, Cloudflare...) plus Shannon entropy analysis. The real value: audit protects context — every --stdout, --output, and --prompt auto-sanitizes secrets before output.

```
Before:  OPENAI_KEY = "sk-Rk8bN3xYz2Wq5PmL7jCvT1aBcDe"
After:   OPENAI_KEY = "sk-R********************De"
```
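Both detection ideas (pattern matching plus Shannon entropy) and the masking above fit in a few lines. A minimal sketch; the mask shape and any threshold are illustrative, not CTO's real rules:

```typescript
// Shannon entropy in bits/char: random-looking strings score high,
// which is how entropy analysis flags secrets that no pattern catches.
function entropy(s: string): number {
  const freq = new Map<string, number>();
  for (const c of s) freq.set(c, (freq.get(c) ?? 0) + 1);
  let h = 0;
  for (const n of freq.values()) {
    const p = n / s.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Keep a short prefix/suffix, hide the middle (illustrative mask shape).
const mask = (v: string) => v.slice(0, 4) + "*".repeat(20) + v.slice(-2);

const key = "sk-Rk8bN3xYz2Wq5PmL7jCvT1aBcDe";
mask(key); // "sk-R********************De"
```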

AI Gateway (Enterprise)

A transparent HTTP proxy between your developers and AI providers. Automatically injects optimized context, redacts secrets, and tracks costs — without changing developer workflow.

```sh
cto --gateway                        # Start on port 8787
cto --gateway --port 9000            # Custom port
cto --gateway --block-secrets        # Block requests with critical secrets
cto --gateway --budget-daily 50      # $50/day budget limit
cto --gateway --budget-monthly 500   # $500/month budget limit
```

```
Developer → CTO Gateway → [context injection + sanitization + cost tracking] → AI Provider
                ↓
          Dashboard (http://localhost:8787/__cto)
```

What the gateway does automatically:

  • Injects CTO-selected context into every AI request (TF-IDF + composite scoring)
  • Redacts secrets before they leave the network (45+ patterns)
  • Tracks costs per model, per day, per month with budget alerts
  • Streams responses with zero-copy SSE passthrough
  • Serves a live dashboard at /__cto with real-time metrics

Supports OpenAI, Anthropic, Google, and Azure OpenAI. SSRF protection built-in.
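Budget enforcement can be modeled as a running per-day accumulator. An illustrative sketch of what a --budget-daily check might look like, not the gateway's actual code:

```typescript
// Toy daily budget guard: accumulate cost per day, flag overruns.
// Purely illustrative; the real gateway also tracks monthly budgets.
class BudgetGuard {
  private spentByDay = new Map<string, number>();
  constructor(private dailyLimitUsd: number) {}

  record(day: string, costUsd: number): boolean {
    const spent = (this.spentByDay.get(day) ?? 0) + costUsd;
    this.spentByDay.set(day, spent);
    return spent <= this.dailyLimitUsd; // false → alert or block
  }
}

const guard = new BudgetGuard(50);
guard.record("2025-01-10", 30); // true, within budget
guard.record("2025-01-10", 25); // false, $55 > $50
```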

Cross-Repo Context

When working on a task, CTO can pull relevant files from sibling repositories — not just the current project.

```sh
cto --context "fix payment webhook" --auto-repos   # Auto-discover sibling repos
cto --context "fix payment webhook" --repos shared-types,payment-service
```

How it works:

  1. Discovers sibling repos in parent directory (any dir with package.json, tsconfig.json, Cargo.toml, etc.)
  2. Builds a lightweight TF-IDF index per sibling (reads source files, no full analysis)
  3. Queries each sibling with the task description
  4. Returns ranked matches with repo attribution and content

Real use case: You're fixing a webhook handler in api-gateway — CTO finds the Payment interface in shared-types and the consumer in notification-service automatically.
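Step 1 above boils down to scanning the parent directory for project markers. A minimal sketch; the marker list is illustrative:

```typescript
import { readdirSync } from "node:fs";
import { join, dirname, basename } from "node:path";

// Project markers that qualify a directory as a sibling repo (illustrative list).
const MARKERS = ["package.json", "tsconfig.json", "Cargo.toml", "go.mod"];

const looksLikeRepo = (entries: string[]) =>
  MARKERS.some((m) => entries.includes(m));

function discoverSiblings(projectDir: string): string[] {
  const parent = dirname(projectDir);
  return readdirSync(parent, { withFileTypes: true })
    .filter((e) => e.isDirectory() && e.name !== basename(projectDir))
    .map((e) => join(parent, e.name))
    .filter((dir) => looksLikeRepo(readdirSync(dir)));
}
```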

Cost-Aware Model Routing

CTO analyzes the actual selected context (not just the project) to recommend the cheapest model that can handle the task.

```sh
cto --context "update readme" --route     # → Haiku ($0.08/call, 73% cheaper)
cto --context "fix auth bug" --route      # → Opus ($1.33/call, critical complexity)
cto --context "refactor API" --route      # → Sonnet ($0.30/call, balanced)
```

Complexity is computed from real signals:

  • Token density (% of budget used)
  • Risk concentration (top-5 file avg risk vs project max)
  • Directory diversity (cross-cutting = harder)
  • Dependency density among selected files
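A hypothetical way to fold those signals into one routing decision (the weights and thresholds below are invented for illustration; CTO's actual formula isn't published here):

```typescript
// Toy complexity score over the four signals above; weights are made up.
interface Signals {
  tokenDensity: number; // selected tokens / budget, 0..1
  riskRatio: number;    // top-5 file avg risk / project max risk, 0..1
  dirDiversity: number; // distinct directories / files selected, 0..1
  depDensity: number;   // intra-selection dependencies / possible edges, 0..1
}

function complexity(s: Signals): "simple" | "balanced" | "critical" {
  const score =
    0.35 * s.tokenDensity + 0.25 * s.riskRatio +
    0.2 * s.dirDiversity + 0.2 * s.depDensity;
  return score < 0.3 ? "simple" : score < 0.6 ? "balanced" : "critical";
}

complexity({ tokenDensity: 0.1, riskRatio: 0.2, dirDiversity: 0.1, depDensity: 0.1 }); // "simple"
```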

The gateway also uses this: every proxied request gets a model recommendation in the injected context.

MCP Server

Works as an MCP server for AI editors (Windsurf, Claude Desktop, Cursor).

3 tools: cto_select_context, cto_audit_secrets, cto_explain

```jsonc
// Windsurf: ~/.codeium/windsurf/mcp_config.json
{ "mcpServers": { "cto": { "command": "cto-mcp" } } }

// Claude Desktop
{ "mcpServers": { "cto": { "command": "npx", "args": ["-y", "cto-ai-cli"] } } }
```

MCP output is also auto-sanitized when includeContents: true.

Programmatic API

```ts
import { analyzeProject, selectContext, buildIndex, query } from 'cto-ai-cli';

const analysis = await analyzeProject('./my-project');
const index = buildIndex(files);
const semanticScores = query(index, 'fix auth', 50)
  .map(m => ({ filePath: m.filePath, score: m.score }));

const selection = await selectContext({
  task: 'fix auth',
  analysis,
  budget: 50_000,
  semanticScores,
});
```

v8.0 — What's New

Chunk-Level Retrieval (the big one)

Instead of including entire files, CTO now extracts only the relevant functions and methods. A 2000-line file with 1 relevant method → 50 lines included, not 2000.

### src/main/java/com/example/cache/CacheService.java
```java
// L15-22: method invalidate
public void invalidate(String id) {
    redis.delete("cache:seller:" + id);
}

// ... lines 23-45 omitted ...

// L46-52: method retrieve
public SellerDTO retrieve(String id) {
    return redis.opsForValue().get("cache:seller:" + id);
}
```

Supports Java, TypeScript, Python, Go.
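Regex-based chunking, as opposed to full AST parsing, can be sketched for Java in a few lines. Simplified and illustrative; the real extractor handles far more cases:

```typescript
// Naive Java method chunker: find method signatures by regex, then slice
// to the matching closing brace by counting depth. Illustrative only.
function extractMethods(src: string): { name: string; body: string }[] {
  const sig = /(?:public|private|protected)[^;{=]*?\b(\w+)\s*\([^)]*\)\s*\{/g;
  const out: { name: string; body: string }[] = [];
  for (const m of src.matchAll(sig)) {
    let depth = 1;
    let i = m.index! + m[0].length;
    while (i < src.length && depth > 0) {
      if (src[i] === "{") depth++;
      else if (src[i] === "}") depth--;
      i++;
    }
    out.push({ name: m[1], body: src.slice(m.index!, i) });
  }
  return out;
}

const java = `
public class CacheService {
  public void invalidate(String id) { redis.delete("cache:seller:" + id); }
}`;
extractMethods(java).map((m) => m.name); // ["invalidate"]
```

Brace counting is exactly where this approach gets fragile (strings or comments containing braces), which is the trade-off the Honest Limitations section acknowledges.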

Query Intent Parsing

Before searching, CTO parses your task into structured intent:

"fix the seller cache invalidation on KVS delete"
  → action: fix
  → entities: [seller, kvs] (3× weight)
  → operations: [invalidate, delete] (2× weight)
  → layers: [cache]

Entities get 3× BM25 weight, operations get 2×. Much better precision on enterprise queries.
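The weighting can be sketched as term classification feeding BM25 multipliers. The vocabularies below are illustrative (and there is no stemming here), not CTO's real dictionaries:

```typescript
// Toy intent parser: classify query terms into weight buckets.
// ACTIONS, OPERATIONS, and STOP are invented vocabularies for illustration.
const ACTIONS = new Set(["fix", "add", "refactor", "debug"]);
const OPERATIONS = new Set(["invalidate", "delete", "create", "update"]);
const STOP = new Set(["the", "a", "an", "on", "of", "to"]);

function parseIntent(task: string): Map<string, number> {
  const weights = new Map<string, number>();
  for (const t of task.toLowerCase().match(/\w+/g) ?? []) {
    if (STOP.has(t) || ACTIONS.has(t)) continue; // actions steer boosting, not search
    weights.set(t, OPERATIONS.has(t) ? 2 : 3);   // operations 2x, entities 3x
  }
  return weights;
}

parseIntent("fix the seller cache invalidation on KVS delete").get("delete"); // 2
```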

Embedding Search + RRF Fusion

TF-IDF cosine embedding vectors complement BM25 lexical matching. Merged via Reciprocal Rank Fusion (60/40 BM25/embedding). Catches semantic similarity that BM25 misses.
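Reciprocal Rank Fusion scores documents purely by rank position, which is what lets a lexical ranking and a vector ranking merge cleanly. A weighted sketch matching the 60/40 split described above:

```typescript
// Weighted Reciprocal Rank Fusion: score(d) = Σ w_i / (k + rank_i(d)).
// k = 60 is the conventional RRF constant; the 60/40 split mirrors the text.
function rrfFuse(bm25: string[], embedding: string[], k = 60): [string, number][] {
  const score = new Map<string, number>();
  const add = (ranking: string[], weight: number) =>
    ranking.forEach((doc, i) =>
      score.set(doc, (score.get(doc) ?? 0) + weight / (k + i + 1)));
  add(bm25, 0.6);
  add(embedding, 0.4);
  return [...score.entries()].sort((a, b) => b[1] - a[1]);
}

rrfFuse(["a.ts", "b.ts"], ["b.ts", "c.ts"])[0][0]; // "b.ts" ranks first
```

A document that appears in both rankings (like b.ts here) accumulates score from each, so agreement between the two retrievers beats a single top rank.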

Cross-File Call Graph

Traces method calls across files: cacheService.invalidate() in UseCase → finds CacheService.java. Regex-based, works for Java/TS/Python/Go.
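The receiver-name heuristic can be sketched as a regex plus a naming convention. Illustrative only; the real signal also consults imports and per-language rules:

```typescript
// Toy cross-file call tracer: camelCase receiver → PascalCase class name.
// Heuristic only: cacheService.invalidate() is assumed to live in CacheService.
function calledClasses(src: string): string[] {
  const call = /\b([a-z]\w*)\.(\w+)\s*\(/g;
  const classes = new Set<string>();
  for (const m of src.matchAll(call)) {
    const receiver = m[1];
    classes.add(receiver[0].toUpperCase() + receiver.slice(1));
  }
  return [...classes];
}

calledClasses("cacheService.invalidate(id);"); // ["CacheService"]
```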

Git Co-Change Signal

Files frequently modified together in git history get boosted. Jaccard similarity from commit co-occurrence.
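Jaccard similarity over commit co-occurrence is simple to state: commits touching both files divided by commits touching either. A small sketch with an invented history:

```typescript
// Jaccard co-change: |commits touching both| / |commits touching either|.
// The commit history below is invented for illustration.
function coChange(commits: string[][], a: string, b: string): number {
  const both = commits.filter((c) => c.includes(a) && c.includes(b)).length;
  const either = commits.filter((c) => c.includes(a) || c.includes(b)).length;
  return either === 0 ? 0 : both / either;
}

const history = [
  ["UseCase.java", "CacheService.java"],
  ["UseCase.java", "CacheService.java", "Dto.java"],
  ["Router.java"],
];
coChange(history, "UseCase.java", "CacheService.java"); // 1: always change together
```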

Multi-Hop Reasoning

Complex enterprise queries auto-detected. Iterative BM25: top matches → expand via deps + call graph → re-query. Traces full execution chains (4/4 hops).
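The iterative expansion can be sketched as a bounded graph walk over dependency and call edges. The edges below are illustrative:

```typescript
// Bounded multi-hop expansion: start from top matches, follow edges,
// repeat for a fixed hop count. Graph and seeds are illustrative.
function multiHop(seeds: string[], edges: Map<string, string[]>, hops = 2): Set<string> {
  const seen = new Set(seeds);
  let frontier = seeds;
  for (let h = 0; h < hops; h++) {
    const next: string[] = [];
    for (const f of frontier)
      for (const d of edges.get(f) ?? [])
        if (!seen.has(d)) { seen.add(d); next.push(d); }
    frontier = next;
  }
  return seen;
}

const edges = new Map([
  ["DeleteEndpoint", ["Router"]],
  ["Router", ["UseCase"]],
  ["UseCase", ["CacheService"]],
]);
multiHop(["DeleteEndpoint"], edges); // 2 hops reach Router and UseCase
```

With the hop budget raised the walk keeps going down the chain, which is how the full DeleteEndpoint → CacheService path gets traced.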

Evaluation Harness

Ground truth benchmark with must-have/relevant/noise labels. 100% must-have recall on 4-task Java enterprise benchmark.

Enterprise Features

  • AI Gateway — transparent HTTP proxy with context injection, secret redaction, cost tracking
  • Team Auth — per-team API keys, JWT (HS256/RS256), rate limiting, OIDC discovery
  • Policy Engine — model overrides by task type, cost caps, block rules
  • Metrics — Prometheus, Datadog JSON, StatsD UDP
  • A/B Testing — context strategy experiments with z-test significance
  • LSP Bridge — JSON-RPC 2.0 for VS Code, JetBrains, Neovim
  • Persistent Index Cache — 50K-file repos: 5s → <100ms on warm cache

Competitor Comparison

| Feature | CTO v8 | Cursor | Sourcegraph Cody |
|---|---|---|---|
| BM25 retrieval | ✅ | | |
| Embedding search | ✅ TF-IDF cosine+RRF | | |
| Chunk-level retrieval | ✅ 4 langs | | |
| Multi-signal RRF fusion | ✅ 8-signal | | |
| Cross-file call graph | ✅ | | |
| Git co-change signal | ✅ | | |
| Multi-hop reasoning | ✅ | | |
| Query intent parsing | ✅ | | |
| Feedback learning | ✅ | | |
| Secret redaction | ✅ | | |
| Total signals | 18 | ~3 | ~5 |

Honest Limitations

  • TypeScript/JavaScript gets AST analysis. Python/Go/Java/Rust get regex-based parsing (good for graphs + chunking, not AST-precise).
  • Embeddings are TF-IDF cosine, not neural. ONNX infrastructure is in place; a neural model would likely add ~5-10% recall.
  • Learning needs ~5 feedback cycles to start influencing selection. First runs are pure pipeline.
  • Chunk extraction is regex-based — works for standard methods/functions, may miss DSLs or deeply nested code.
  • Benchmarked against naive baselines. Not compared against Cursor/Copilot internal context engines.

Contributing

```sh
git clone https://github.com/cto-ai/cto-ai-cli.git && cd cto-ai-cli
npm install && npm run build && npm test  # 1,133 tests
```

License

MIT