CTO — AI Context Selection Engine


The most complete AI context selection engine in open source. Picks the right code chunks (not just files), auto-redacts secrets, learns from feedback. 18 signals. Zero AI dependencies.

```
cto --context "fix the seller info cache invalidation on KVS delete" --stdout | pbcopy
→ 166 relevant chunks from 59 files (26K tokens, 0 secrets)
→ Full chain: DeleteEndpoint → Router → UseCase → CacheService → KvsRepository
```

202KB package · 1,133 tests · 96 source modules · Zero AI dependencies.


The Problem

When developers use AI coding assistants, they need to provide context — the right source files. Today, most teams either:

  • Send everything → expensive, slow, hits token limits
  • Pick files manually → miss dependencies, forget test files, leak secrets

CTO solves both: it automatically selects the most relevant files for any task, sanitizes secrets before they reach any AI provider, and learns from feedback to get better over time.

Quick Demo

```sh
cto --demo   # Run a live showcase on your project
```

This runs a self-contained presentation that shows: project analysis, semantic matching proof, secret sanitization, ROI calculation, and benchmark results.

Benchmark Results

Eval Harness v8.1 — 20-file Java enterprise project, 4 tasks with expert-labeled ground truth:

| Metric | v8.0 | v8.1 |
|---|---|---|
| Must-have recall | 100% | 100% |
| Precision | 38% | 60% (+22pp) |
| F1 | 55% | 74% (+19pp) |
| Noise rate | 11.3% | 5.7% (-5.6pp) |

Real production repos (Java monoliths):

| Repo | Files | Without CTO | With CTO v8.0 |
|---|---|---|---|
| seller-info-service | 219 | 212 files (97%) | 166 chunks from 59 files |
| sizechart-middleend | 1,719 | 230 files | 72 chunks from 37 files |
| charts-backend | 1,261 | 685 files (54%) | 142 chunks from 16 files |

Internal benchmark (8 tasks, own codebase):

| Strategy | Precision | Recall | F1 |
|---|---|---|---|
| CTO + Reranker | 96.9% | 100% | 98.4% |
| TF-IDF only | 54.6% | 87.5% | 62.0% |
| Random | 7.7% | 6.3% | 2.8% |

ROI

On a typical 130-file TypeScript project:

| Metric | Without CTO | With CTO |
|---|---|---|
| Tokens per interaction | 370K (all files) | ~28K (selected) |
| Cost per interaction (Sonnet) | $1.11 | $0.08 |
| Monthly cost (10 devs, 40/day) | $8,880 | $640 |
| Annual savings | | ~$99,000 |
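The table's arithmetic can be reproduced directly. A quick sketch, assuming Sonnet input pricing of roughly $3 per million tokens and 20 working days per month (both figures are assumptions here, not tool output):

```typescript
// Reproduces the ROI table's arithmetic. The $3/M-token input price
// and the 20 working days per month are assumptions, not tool output.
const PRICE_PER_TOKEN = 3 / 1_000_000;

const costPerCall = (tokens: number) => tokens * PRICE_PER_TOKEN;
const monthlyCost = (perCall: number, devs: number, callsPerDay: number) =>
  perCall * devs * callsPerDay * 20;

const without = costPerCall(370_000); // $1.11 per interaction
const withCto = costPerCall(28_000);  // ~$0.08 per interaction
const annualSavings =
  (monthlyCost(without, 10, 40) - monthlyCost(withCto, 10, 40)) * 12;
```

The table rounds the per-call cost to $0.08 before multiplying; without that rounding the savings come out at ≈ $98.5K/year, consistent with the ~$99,000 shown.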

Plus: fewer hallucinations (right context), zero secret leaks, and the learner gets smarter with every --accept / --reject.

How it Works (v8.0 Pipeline)

```
Task → Query Intent Parser → structured action/entities/layers
         │
         ▼
   BM25 (weighted) ──────┐
   TF-IDF Embedding ─────┤──→ RRF Fusion ─→ 8-signal Boosting ─→ Reranker
   Multi-hop (auto) ─────┘          │
                                    ▼
                              Selection ─→ Chunk Extraction ─→ Output
                                              (methods, not files)
```

10-step pipeline:

| # | Step | What it does |
|---|---|---|
| 0 | Query Intent | Parses "fix cache invalidation on delete" → action:fix, entities:[cache,kvs], layers:[cache] |
| 1 | BM25 + Embedding | Lexical matching + TF-IDF cosine vectors, merged via Reciprocal Rank Fusion |
| 2 | Multi-hop | Complex queries auto-detected → iterative BM25 expansion via deps + call graph (2 hops) |
| 3 | Path IDF Boost | Query terms in file paths get boosted |
| 4 | Layer Boost | Architectural layer matching (controller, service, repository) |
| 5 | Import Boost | Dependencies of top-ranked files get pulled in |
| 6 | Call Graph Boost | Cross-file method calls traced (Java/TS/Python/Go) |
| 7 | Git Co-Change | Files frequently modified together (Jaccard similarity from commits) |
| 8 | Reranker | 5-signal quality gate: term coverage, specificity, bigram proximity, deps, path |
| 9 | Chunk Extraction | Extracts relevant functions/methods, not whole files. 10x token efficiency |

No AI is used for selection. Same input → same output. Deterministic.

Install

```sh
npm i -g cto-ai-cli    # global
npx cto-ai-cli         # or one-shot
```

Context Selection

```sh
cto --context "refactor the auth middleware"                 # human-readable summary
cto --context "fix login bug" --stdout | pbcopy              # pipe to clipboard
cto --context "add tests" --output context.md                # save to file
cto --context "fix login" --prompt "Refactor to async/await" # full AI prompt
cto --context "debug scoring" --json                         # JSON for tooling
cto --context "fix auth" --budget 30000                      # custom token budget
```

Output includes full file contents in markdown, ready for Claude, ChatGPT, or any AI. Secrets are automatically redacted — API keys, tokens, passwords, PII are replaced with **** before output.

Feedback Loop

CTO learns from real feedback, not from itself:

```sh
cto --accept                         # last selection was good
cto --reject                         # last selection was bad
cto --reject --missing src/auth.ts   # this file was missing
cto --stats                          # see what CTO has learned
```

On --reject, CTO also detects files you edited after the selection that weren't in the context — those get automatically boosted for next time.
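The learner's effect can be pictured as a per-file weight table. A toy model for illustration only, not CTO's actual learning logic:

```typescript
// Toy feedback learner: files reported missing get a positive boost,
// files in rejected selections get a small penalty. Illustrative only.
const boosts = new Map<string, number>();

function recordFeedback(accepted: boolean, selected: string[], missing: string[] = []) {
  for (const f of missing) boosts.set(f, (boosts.get(f) ?? 0) + 0.5);
  const delta = accepted ? 0.1 : -0.1;
  for (const f of selected) boosts.set(f, (boosts.get(f) ?? 0) + delta);
}

recordFeedback(false, ["src/login.ts"], ["src/auth.ts"]);
// src/auth.ts now carries a positive boost for later selections
```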

Secret Audit

```sh
cto --audit                  # scan all files
cto --audit --init-hook      # install pre-commit hook
cto --audit --full-scan      # ignore cache, scan everything
cto --audit --json           # machine-readable output
```

45+ patterns (AWS, Stripe, GitHub, OpenAI, Slack, Cloudflare...) plus Shannon entropy analysis. The real value: audit protects context — every --stdout, --output, and --prompt auto-sanitizes secrets before output.

```
Before:  OPENAI_KEY = "sk-Rk8bN3xYz2Wq5PmL7jCvT1aBcDe"
After:   OPENAI_KEY = "sk-R********************De"
```
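Both detection ideas (pattern matching plus Shannon entropy) and the masking above fit in a few lines. A minimal sketch; the mask shape and any threshold are illustrative, not CTO's real rules:

```typescript
// Shannon entropy in bits/char: random-looking strings score high,
// which is how entropy analysis flags secrets that no pattern catches.
function entropy(s: string): number {
  const freq = new Map<string, number>();
  for (const c of s) freq.set(c, (freq.get(c) ?? 0) + 1);
  let h = 0;
  for (const n of freq.values()) {
    const p = n / s.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Keep a short prefix/suffix, hide the middle (illustrative mask shape).
const mask = (v: string) => v.slice(0, 4) + "*".repeat(20) + v.slice(-2);

const key = "sk-Rk8bN3xYz2Wq5PmL7jCvT1aBcDe";
mask(key); // "sk-R********************De"
```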

AI Gateway (Enterprise)

A transparent HTTP proxy between your developers and AI providers. Automatically injects optimized context, redacts secrets, and tracks costs — without changing developer workflow.

```sh
cto --gateway                        # Start on port 8787
cto --gateway --port 9000            # Custom port
cto --gateway --block-secrets        # Block requests with critical secrets
cto --gateway --budget-daily 50      # $50/day budget limit
cto --gateway --budget-monthly 500   # $500/month budget limit
```

```
Developer → CTO Gateway → [context injection + sanitization + cost tracking] → AI Provider
                ↓
          Dashboard (http://localhost:8787/__cto)
```

What the gateway does automatically:

  • Injects CTO-selected context into every AI request (TF-IDF + composite scoring)
  • Redacts secrets before they leave the network (45+ patterns)
  • Tracks costs per model, per day, per month with budget alerts
  • Streams responses with zero-copy SSE passthrough
  • Serves a live dashboard at /__cto with real-time metrics

Supports OpenAI, Anthropic, Google, and Azure OpenAI. SSRF protection built-in.
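Budget enforcement can be modeled as a running per-day accumulator. An illustrative sketch of what a --budget-daily check might look like, not the gateway's actual code:

```typescript
// Toy daily budget guard: accumulate cost per day, flag overruns.
// Purely illustrative; the real gateway also tracks monthly budgets.
class BudgetGuard {
  private spentByDay = new Map<string, number>();
  constructor(private dailyLimitUsd: number) {}

  record(day: string, costUsd: number): boolean {
    const spent = (this.spentByDay.get(day) ?? 0) + costUsd;
    this.spentByDay.set(day, spent);
    return spent <= this.dailyLimitUsd; // false → alert or block
  }
}

const guard = new BudgetGuard(50);
guard.record("2025-01-10", 30); // true, within budget
guard.record("2025-01-10", 25); // false, $55 > $50
```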

Cross-Repo Context

When working on a task, CTO can pull relevant files from sibling repositories — not just the current project.

```sh
cto --context "fix payment webhook" --auto-repos   # Auto-discover sibling repos
cto --context "fix payment webhook" --repos shared-types,payment-service
```

How it works:

  1. Discovers sibling repos in parent directory (any dir with package.json, tsconfig.json, Cargo.toml, etc.)
  2. Builds a lightweight TF-IDF index per sibling (reads source files, no full analysis)
  3. Queries each sibling with the task description
  4. Returns ranked matches with repo attribution and content

Real use case: You're fixing a webhook handler in api-gateway — CTO finds the Payment interface in shared-types and the consumer in notification-service automatically.
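Step 1 above boils down to scanning the parent directory for project markers. A minimal sketch; the marker list is illustrative:

```typescript
import { readdirSync } from "node:fs";
import { join, dirname, basename } from "node:path";

// Project markers that qualify a directory as a sibling repo (illustrative list).
const MARKERS = ["package.json", "tsconfig.json", "Cargo.toml", "go.mod"];

const looksLikeRepo = (entries: string[]) =>
  MARKERS.some((m) => entries.includes(m));

function discoverSiblings(projectDir: string): string[] {
  const parent = dirname(projectDir);
  return readdirSync(parent, { withFileTypes: true })
    .filter((e) => e.isDirectory() && e.name !== basename(projectDir))
    .map((e) => join(parent, e.name))
    .filter((dir) => looksLikeRepo(readdirSync(dir)));
}
```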

Cost-Aware Model Routing

CTO analyzes the actual selected context (not just the project) to recommend the cheapest model that can handle the task.

```sh
cto --context "update readme" --route     # → Haiku ($0.08/call, 73% cheaper)
cto --context "fix auth bug" --route      # → Opus ($1.33/call, critical complexity)
cto --context "refactor API" --route      # → Sonnet ($0.30/call, balanced)
```

Complexity is computed from real signals:

  • Token density (% of budget used)
  • Risk concentration (top-5 file avg risk vs project max)
  • Directory diversity (cross-cutting = harder)
  • Dependency density among selected files
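A hypothetical way to fold those signals into one routing decision (the weights and thresholds below are invented for illustration; CTO's actual formula isn't published here):

```typescript
// Toy complexity score over the four signals above; weights are made up.
interface Signals {
  tokenDensity: number; // selected tokens / budget, 0..1
  riskRatio: number;    // top-5 file avg risk / project max risk, 0..1
  dirDiversity: number; // distinct directories / files selected, 0..1
  depDensity: number;   // intra-selection dependencies / possible edges, 0..1
}

function complexity(s: Signals): "simple" | "balanced" | "critical" {
  const score =
    0.35 * s.tokenDensity + 0.25 * s.riskRatio +
    0.2 * s.dirDiversity + 0.2 * s.depDensity;
  return score < 0.3 ? "simple" : score < 0.6 ? "balanced" : "critical";
}

complexity({ tokenDensity: 0.1, riskRatio: 0.2, dirDiversity: 0.1, depDensity: 0.1 }); // "simple"
```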

The gateway also uses this: every proxied request gets a model recommendation in the injected context.

MCP Server

Works as an MCP server for AI editors (Windsurf, Claude Desktop, Cursor).

3 tools: cto_select_context, cto_audit_secrets, cto_explain

```jsonc
// Windsurf: ~/.codeium/windsurf/mcp_config.json
{ "mcpServers": { "cto": { "command": "cto-mcp" } } }

// Claude Desktop
{ "mcpServers": { "cto": { "command": "npx", "args": ["-y", "cto-ai-cli"] } } }
```

MCP output is also auto-sanitized when includeContents: true.

Programmatic API

```ts
import { analyzeProject, selectContext, buildIndex, query } from 'cto-ai-cli';

const analysis = await analyzeProject('./my-project');
const index = buildIndex(files);
const semanticScores = query(index, 'fix auth', 50)
  .map(m => ({ filePath: m.filePath, score: m.score }));

const selection = await selectContext({
  task: 'fix auth',
  analysis,
  budget: 50_000,
  semanticScores,
});
```

v8.0 — What's New

Chunk-Level Retrieval (the big one)

Instead of including entire files, CTO now extracts only the relevant functions and methods. A 2000-line file with 1 relevant method → 50 lines included, not 2000.

### src/main/java/com/example/cache/CacheService.java
```java
// L15-22: method invalidate
public void invalidate(String id) {
    redis.delete("cache:seller:" + id);
}

// ... lines 23-45 omitted ...

// L46-52: method retrieve
public SellerDTO retrieve(String id) {
    return redis.opsForValue().get("cache:seller:" + id);
}
```

Supports Java, TypeScript, Python, Go.
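Regex-based chunking, as opposed to full AST parsing, can be sketched for Java in a few lines. Simplified and illustrative; the real extractor handles far more cases:

```typescript
// Naive Java method chunker: find method signatures by regex, then slice
// to the matching closing brace by counting depth. Illustrative only.
function extractMethods(src: string): { name: string; body: string }[] {
  const sig = /(?:public|private|protected)[^;{=]*?\b(\w+)\s*\([^)]*\)\s*\{/g;
  const out: { name: string; body: string }[] = [];
  for (const m of src.matchAll(sig)) {
    let depth = 1;
    let i = m.index! + m[0].length;
    while (i < src.length && depth > 0) {
      if (src[i] === "{") depth++;
      else if (src[i] === "}") depth--;
      i++;
    }
    out.push({ name: m[1], body: src.slice(m.index!, i) });
  }
  return out;
}

const java = `
public class CacheService {
  public void invalidate(String id) { redis.delete("cache:seller:" + id); }
}`;
extractMethods(java).map((m) => m.name); // ["invalidate"]
```

Brace counting is exactly where this approach gets fragile (strings or comments containing braces), which is the trade-off the Honest Limitations section acknowledges.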

Query Intent Parsing

Before searching, CTO parses your task into structured intent:

"fix the seller cache invalidation on KVS delete"
  → action: fix
  → entities: [seller, kvs] (3× weight)
  → operations: [invalidate, delete] (2× weight)
  → layers: [cache]

Entities get 3× BM25 weight, operations get 2×. Much better precision on enterprise queries.
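The weighting can be sketched as term classification feeding BM25 multipliers. The vocabularies below are illustrative (and there is no stemming here), not CTO's real dictionaries:

```typescript
// Toy intent parser: classify query terms into weight buckets.
// ACTIONS, OPERATIONS, and STOP are invented vocabularies for illustration.
const ACTIONS = new Set(["fix", "add", "refactor", "debug"]);
const OPERATIONS = new Set(["invalidate", "delete", "create", "update"]);
const STOP = new Set(["the", "a", "an", "on", "of", "to"]);

function parseIntent(task: string): Map<string, number> {
  const weights = new Map<string, number>();
  for (const t of task.toLowerCase().match(/\w+/g) ?? []) {
    if (STOP.has(t) || ACTIONS.has(t)) continue; // actions steer boosting, not search
    weights.set(t, OPERATIONS.has(t) ? 2 : 3);   // operations 2x, entities 3x
  }
  return weights;
}

parseIntent("fix the seller cache invalidation on KVS delete").get("delete"); // 2
```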

Embedding Search + RRF Fusion

TF-IDF cosine embedding vectors complement BM25 lexical matching. Merged via Reciprocal Rank Fusion (60/40 BM25/embedding). Catches semantic similarity that BM25 misses.
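Reciprocal Rank Fusion scores documents purely by rank position, which is what lets a lexical ranking and a vector ranking merge cleanly. A weighted sketch matching the 60/40 split described above:

```typescript
// Weighted Reciprocal Rank Fusion: score(d) = Σ w_i / (k + rank_i(d)).
// k = 60 is the conventional RRF constant; the 60/40 split mirrors the text.
function rrfFuse(bm25: string[], embedding: string[], k = 60): [string, number][] {
  const score = new Map<string, number>();
  const add = (ranking: string[], weight: number) =>
    ranking.forEach((doc, i) =>
      score.set(doc, (score.get(doc) ?? 0) + weight / (k + i + 1)));
  add(bm25, 0.6);
  add(embedding, 0.4);
  return [...score.entries()].sort((a, b) => b[1] - a[1]);
}

rrfFuse(["a.ts", "b.ts"], ["b.ts", "c.ts"])[0][0]; // "b.ts" ranks first
```

A document that appears in both rankings (like b.ts here) accumulates score from each, so agreement between the two retrievers beats a single top rank.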

Cross-File Call Graph

Traces method calls across files: cacheService.invalidate() in UseCase → finds CacheService.java. Regex-based, works for Java/TS/Python/Go.
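The receiver-name heuristic can be sketched as a regex plus a naming convention. Illustrative only; the real signal also consults imports and per-language rules:

```typescript
// Toy cross-file call tracer: camelCase receiver → PascalCase class name.
// Heuristic only: cacheService.invalidate() is assumed to live in CacheService.
function calledClasses(src: string): string[] {
  const call = /\b([a-z]\w*)\.(\w+)\s*\(/g;
  const classes = new Set<string>();
  for (const m of src.matchAll(call)) {
    const receiver = m[1];
    classes.add(receiver[0].toUpperCase() + receiver.slice(1));
  }
  return [...classes];
}

calledClasses("cacheService.invalidate(id);"); // ["CacheService"]
```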

Git Co-Change Signal

Files frequently modified together in git history get boosted. Jaccard similarity from commit co-occurrence.
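Jaccard similarity over commit co-occurrence is simple to state: commits touching both files divided by commits touching either. A small sketch with an invented history:

```typescript
// Jaccard co-change: |commits touching both| / |commits touching either|.
// The commit history below is invented for illustration.
function coChange(commits: string[][], a: string, b: string): number {
  const both = commits.filter((c) => c.includes(a) && c.includes(b)).length;
  const either = commits.filter((c) => c.includes(a) || c.includes(b)).length;
  return either === 0 ? 0 : both / either;
}

const history = [
  ["UseCase.java", "CacheService.java"],
  ["UseCase.java", "CacheService.java", "Dto.java"],
  ["Router.java"],
];
coChange(history, "UseCase.java", "CacheService.java"); // 1: always change together
```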

Multi-Hop Reasoning

Complex enterprise queries auto-detected. Iterative BM25: top matches → expand via deps + call graph → re-query. Traces full execution chains (4/4 hops).
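The iterative expansion can be sketched as a bounded graph walk over dependency and call edges. The edges below are illustrative:

```typescript
// Bounded multi-hop expansion: start from top matches, follow edges,
// repeat for a fixed hop count. Graph and seeds are illustrative.
function multiHop(seeds: string[], edges: Map<string, string[]>, hops = 2): Set<string> {
  const seen = new Set(seeds);
  let frontier = seeds;
  for (let h = 0; h < hops; h++) {
    const next: string[] = [];
    for (const f of frontier)
      for (const d of edges.get(f) ?? [])
        if (!seen.has(d)) { seen.add(d); next.push(d); }
    frontier = next;
  }
  return seen;
}

const edges = new Map([
  ["DeleteEndpoint", ["Router"]],
  ["Router", ["UseCase"]],
  ["UseCase", ["CacheService"]],
]);
multiHop(["DeleteEndpoint"], edges); // 2 hops reach Router and UseCase
```

With the hop budget raised the walk keeps going down the chain, which is how the full DeleteEndpoint → CacheService path gets traced.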

Evaluation Harness

Ground truth benchmark with must-have/relevant/noise labels. 100% must-have recall on 4-task Java enterprise benchmark.

Enterprise Features

  • AI Gateway — transparent HTTP proxy with context injection, secret redaction, cost tracking
  • Team Auth — per-team API keys, JWT (HS256/RS256), rate limiting, OIDC discovery
  • Policy Engine — model overrides by task type, cost caps, block rules
  • Metrics — Prometheus, Datadog JSON, StatsD UDP
  • A/B Testing — context strategy experiments with z-test significance
  • LSP Bridge — JSON-RPC 2.0 for VS Code, JetBrains, Neovim
  • Persistent Index Cache — 50K-file repos: 5s → <100ms on warm cache

Competitor Comparison

| Feature | CTO v8 | Cursor | Sourcegraph Cody |
|---|---|---|---|
| BM25 retrieval | ✅ | | |
| Embedding search | ✅ TF-IDF cosine+RRF | | |
| Chunk-level retrieval | ✅ 4 langs | | |
| Multi-signal RRF fusion | ✅ 8-signal | | |
| Cross-file call graph | ✅ | | |
| Git co-change signal | ✅ | | |
| Multi-hop reasoning | ✅ | | |
| Query intent parsing | ✅ | | |
| Feedback learning | ✅ | | |
| Secret redaction | ✅ | | |
| Total signals | 18 | ~3 | ~5 |

Honest Limitations

  • TypeScript/JavaScript gets AST analysis. Python/Go/Java/Rust get regex-based parsing (good for graphs + chunking, not AST-precise).
  • Embeddings are TF-IDF cosine, not neural. ONNX infrastructure is in place; a neural model would likely add ~5-10% recall.
  • Learning needs ~5 feedback cycles to start influencing selection. First runs are pure pipeline.
  • Chunk extraction is regex-based — works for standard methods/functions, may miss DSLs or deeply nested code.
  • Benchmarked against naive baselines. Not compared against Cursor/Copilot internal context engines.

Contributing

```sh
git clone https://github.com/cto-ai/cto-ai-cli.git && cd cto-ai-cli
npm install && npm run build && npm test  # 1,133 tests
```

License

MIT