# CTO — AI Context Selection Engine
Pick the right files for any AI task. Secrets auto-redacted. Learns from your feedback.
```shell
cto --context "fix the auth middleware" --stdout | pbcopy   # → clipboard
cto --context "fix auth" --prompt "Refactor to use JWT"     # → AI prompt
cto --accept                                                # → learns
```

76KB package · 606 tests · Zero AI dependencies.
## The Problem
When developers use AI coding assistants, they need to provide context — the right source files. Today, most teams either:
- Send everything → expensive, slow, hits token limits
- Pick files manually → miss dependencies, forget test files, leak secrets
CTO solves both: it automatically selects the most relevant files for any task, sanitizes secrets before they reach any AI provider, and learns from feedback to get better over time.
## Quick Demo

```shell
cto --demo   # Run a live showcase on your project
```

This runs a self-contained presentation that shows: project analysis, semantic matching proof, secret sanitization, ROI calculation, and benchmark results.
## Benchmark Results
Tested against 8 curated tasks with ground truth (known correct files):
| Strategy | Precision | Must-have Recall | F1 |
|---|---|---|---|
| CTO | 33.6% | 100.0% | 48.7% |
| TF-IDF only | 54.6% | 87.5% | 62.0% |
| Risk-only | 20.8% | 18.8% | 15.0% |
| Alphabetical | 8.3% | 31.3% | 12.9% |
| Random | 7.7% | 6.3% | 2.8% |
CTO never misses a must-have file (100% recall), trading some precision for it — TF-IDF-only scores a higher F1 but drops 1 in 8 must-have files. CTO's F1 is 3.8× alphabetical and 17× random.
## ROI
On a typical 130-file TypeScript project:
| Metric | Without CTO | With CTO |
|---|---|---|
| Tokens per interaction | 370K (all files) | ~28K (selected) |
| Cost per interaction (Sonnet) | $1.11 | $0.08 |
| Monthly cost (10 devs, 40/day) | $8,880 | $640 |
| Annual savings | — | ~$99,000 |
Plus: fewer hallucinations (right context), zero secret leaks, and the learner gets smarter with every `--accept` / `--reject`.
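The table's figures follow from simple arithmetic. A sketch, assuming $3 per million input tokens (Sonnet-class pricing) and 20 working days per month — both assumptions; the table itself rounds the with-CTO per-call cost down to $0.08:

```typescript
// Reproduces the ROI table arithmetic. Assumed: $3 per million input
// tokens, 20 working days per month (not stated in the table).
const PRICE_PER_TOKEN = 3 / 1_000_000;

const costPerCall = (tokens: number): number => tokens * PRICE_PER_TOKEN;

const monthlyCost = (tokens: number, devs: number, callsPerDay: number): number =>
  costPerCall(tokens) * devs * callsPerDay * 20;

const without = monthlyCost(370_000, 10, 40); // $8,880/month
const withCto = monthlyCost(28_000, 10, 40);  // $672/month (table shows $640 due to rounding)
const annualSavings = (without - withCto) * 12; // ≈ $98.5K/year
```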
## How it Works

```
Task description ──→ TF-IDF/BM25 ──→ Semantic scores ───┐
                                                        │
Project files ────→ Dependency graph ──→ Risk scores ───┤──→ Composite ──→ Greedy ──→ Selection
                                                        │      ranking       alloc
Feedback history ──→ Bayesian learner ──→ Boosts ───────┘
```

- Dependency graph — parses imports, builds adjacency list, identifies hubs
- Risk scoring — complexity × centrality × recency (continuous, log-scaled)
- TF-IDF/BM25 semantic matching — task description scored against file contents + path boosting
- Composite ranking — `finalScore = semantic × 0.55 + risk × 0.25 + learner × 0.2`
- Noise filtering — files with zero semantic relevance are excluded (benchmark-driven optimization)
- Greedy allocation — fills token budget top-down, cascading prune levels (full → signatures → skeleton)
- Bayesian learning — exponential decay, Wilson score confidence, per-task-type patterns
No AI is used for selection. Same input → same output. Deterministic.
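The composite ranking and noise filtering steps can be sketched as follows — an illustration of the formula above, assuming all three signals are pre-normalized to [0, 1]; the interface and field names are hypothetical, not cto-ai-cli's actual types:

```typescript
// Hypothetical shapes for illustration; the real internals may differ.
interface FileScores {
  filePath: string;
  semantic: number; // TF-IDF/BM25 relevance to the task, in [0, 1]
  risk: number;     // complexity × centrality × recency, in [0, 1]
  learner: number;  // Bayesian boost from past feedback, in [0, 1]
}

// finalScore = semantic × 0.55 + risk × 0.25 + learner × 0.2,
// with zero-semantic files dropped as noise.
function rank(files: FileScores[]): { filePath: string; finalScore: number }[] {
  return files
    .filter(f => f.semantic > 0) // noise filtering
    .map(f => ({
      filePath: f.filePath,
      finalScore: f.semantic * 0.55 + f.risk * 0.25 + f.learner * 0.2,
    }))
    .sort((a, b) => b.finalScore - a.finalScore);
}

const ranked = rank([
  { filePath: 'src/auth.ts', semantic: 0.9, risk: 0.4, learner: 0.2 },
  { filePath: 'src/util.ts', semantic: 0.1, risk: 0.9, learner: 0.0 },
  { filePath: 'README.md',   semantic: 0.0, risk: 0.1, learner: 0.0 }, // filtered out
]);
// src/auth.ts ranks first: 0.9×0.55 + 0.4×0.25 + 0.2×0.2 = 0.635
```

Because the weights are fixed and no model is involved, the same inputs always produce the same ranking.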
## Install
```shell
npm i -g cto-ai-cli   # global
npx cto-ai-cli        # or one-shot
```

## Context Selection
```shell
cto --context "refactor the auth middleware"                  # human-readable summary
cto --context "fix login bug" --stdout | pbcopy               # pipe to clipboard
cto --context "add tests" --output context.md                 # save to file
cto --context "fix login" --prompt "Refactor to async/await"  # full AI prompt
cto --context "debug scoring" --json                          # JSON for tooling
cto --context "fix auth" --budget 30000                       # custom token budget
```

Output includes full file contents in markdown, ready for Claude, ChatGPT, or any AI. Secrets are automatically redacted — API keys, tokens, passwords, and PII are replaced with `****` before output.
## Feedback Loop
CTO learns from real feedback, not from itself:
```shell
cto --accept                        # last selection was good
cto --reject                        # last selection was bad
cto --reject --missing src/auth.ts  # this file was missing
cto --stats                         # see what CTO has learned
```

On `--reject`, CTO also detects files you edited after the selection that weren't in the context — those get automatically boosted next time.
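The learner's confidence math described later (Wilson score confidence, exponential decay) can be sketched like this — an illustrative implementation with assumed half-life and z values, not the package's actual code:

```typescript
// Wilson score lower bound: a conservative estimate of a file's true
// accept rate, so a file accepted 3/3 times is boosted less than one
// accepted 30/30 times.
function wilsonLowerBound(accepts: number, total: number, z = 1.96): number {
  if (total === 0) return 0;
  const p = accepts / total;
  const z2 = z * z;
  const denom = 1 + z2 / total;
  const center = p + z2 / (2 * total);
  const margin = z * Math.sqrt((p * (1 - p) + z2 / (4 * total)) / total);
  return (center - margin) / denom;
}

// Exponential decay: an outcome observed `age` feedback cycles ago
// contributes half as much every `halfLife` cycles (half-life assumed).
function decayedWeight(age: number, halfLife = 10): number {
  return Math.pow(0.5, age / halfLife);
}

// 3/3 accepts → a cautious boost of ~0.44; 30/30 approaches ~0.89.
const boost = wilsonLowerBound(3, 3);
```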
## Secret Audit
```shell
cto --audit               # scan all files
cto --audit --init-hook   # install pre-commit hook
cto --audit --full-scan   # ignore cache, scan everything
cto --audit --json        # machine-readable output
```

45+ patterns (AWS, Stripe, GitHub, OpenAI, Slack, Cloudflare...) plus Shannon entropy analysis. The real value: the audit protects context — every `--stdout`, `--output`, and `--prompt` auto-sanitizes secrets before output.
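Entropy analysis catches high-randomness strings that no fixed pattern matches. A minimal sketch of the idea — the function names and thresholds here are illustrative, not the package's tuned values:

```typescript
// Shannon entropy in bits per character: random API keys score near
// log2(alphabet size); English words and identifiers score lower.
function shannonEntropy(s: string): number {
  const counts = new Map<string, number>();
  for (const ch of s) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let h = 0;
  for (const n of counts.values()) {
    const p = n / s.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Flag long, high-entropy tokens. minLen and threshold are assumed
// values chosen so ordinary long identifiers stay below the bar.
function looksLikeSecret(token: string, minLen = 24, threshold = 4.5): boolean {
  return token.length >= minLen && shannonEntropy(token) >= threshold;
}
```

In practice a pattern list runs first and entropy only backstops it, since entropy alone cannot distinguish a key from, say, a hash in a lockfile.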
```
Before: OPENAI_KEY = "sk-Rk8bN3xYz2Wq5PmL7jCvT1aBcDe"
After:  OPENAI_KEY = "sk-R********************De"
```

## AI Gateway (Enterprise)
A transparent HTTP proxy between your developers and AI providers. Automatically injects optimized context, redacts secrets, and tracks costs — without changing developer workflow.
```shell
cto --gateway                       # Start on port 8787
cto --gateway --port 9000           # Custom port
cto --gateway --block-secrets       # Block requests with critical secrets
cto --gateway --budget-daily 50     # $50/day budget limit
cto --gateway --budget-monthly 500  # $500/month budget limit
```

```
Developer → CTO Gateway → [context injection + sanitization + cost tracking] → AI Provider
                                          ↓
                          Dashboard (http://localhost:8787/__cto)
```

What the gateway does automatically:
- Injects CTO-selected context into every AI request (TF-IDF + composite scoring)
- Redacts secrets before they leave the network (45+ patterns)
- Tracks costs per model, per day, per month with budget alerts
- Streams responses with zero-copy SSE passthrough
- Serves a live dashboard at `/__cto` with real-time metrics
Supports OpenAI, Anthropic, Google, and Azure OpenAI. SSRF protection built-in.
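The budget-gating part of the gateway can be sketched as follows — an illustrative model of the `--budget-daily` / `--budget-monthly` behavior, with internals assumed:

```typescript
// Hypothetical cost tracker: accumulates spend per day and per month,
// and rejects a request that would push either bucket over budget.
interface Budgets {
  daily?: number;   // dollars per day
  monthly?: number; // dollars per month
}

class CostTracker {
  private daily = new Map<string, number>();   // 'YYYY-MM-DD' → $ spent
  private monthly = new Map<string, number>(); // 'YYYY-MM'    → $ spent

  constructor(private budgets: Budgets) {}

  record(date: string, cost: number): void {
    const month = date.slice(0, 7);
    this.daily.set(date, (this.daily.get(date) ?? 0) + cost);
    this.monthly.set(month, (this.monthly.get(month) ?? 0) + cost);
  }

  // Would this request exceed a configured budget?
  allows(date: string, cost: number): boolean {
    const spentToday = this.daily.get(date) ?? 0;
    const spentMonth = this.monthly.get(date.slice(0, 7)) ?? 0;
    if (this.budgets.daily !== undefined && spentToday + cost > this.budgets.daily) return false;
    if (this.budgets.monthly !== undefined && spentMonth + cost > this.budgets.monthly) return false;
    return true;
  }
}

const tracker = new CostTracker({ daily: 50, monthly: 500 });
tracker.record('2025-01-15', 49.5);
// A $1.00 request today would exceed $50/day; tomorrow it passes again.
```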
## Cross-Repo Context
When working on a task, CTO can pull relevant files from sibling repositories — not just the current project.
```shell
cto --context "fix payment webhook" --auto-repos                          # Auto-discover sibling repos
cto --context "fix payment webhook" --repos shared-types,payment-service
```

How it works:
- Discovers sibling repos in the parent directory (any dir with `package.json`, `tsconfig.json`, `Cargo.toml`, etc.)
- Builds a lightweight TF-IDF index per sibling (reads source files, no full analysis)
- Queries each sibling with the task description
- Returns ranked matches with repo attribution and content
Real use case: you're fixing a webhook handler in `api-gateway` — CTO finds the `Payment` interface in `shared-types` and the consumer in `notification-service` automatically.
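Sibling discovery can be sketched in a few lines — a hypothetical implementation that mirrors the description above; the manifest list beyond `package.json`/`tsconfig.json`/`Cargo.toml` is an assumption:

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Manifests that mark a directory as a repo root (list partly assumed).
const MANIFESTS = ['package.json', 'tsconfig.json', 'Cargo.toml', 'go.mod', 'pyproject.toml'];

// Any directory next to the current project that contains a recognized
// manifest is treated as a sibling repo.
function discoverSiblings(projectDir: string): string[] {
  const resolved = path.resolve(projectDir);
  const parent = path.dirname(resolved);
  const self = path.basename(resolved);
  return fs
    .readdirSync(parent, { withFileTypes: true })
    .filter(d => d.isDirectory() && d.name !== self)
    .map(d => path.join(parent, d.name))
    .filter(dir => MANIFESTS.some(m => fs.existsSync(path.join(dir, m))));
}
```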
## Cost-Aware Model Routing
CTO analyzes the actual selected context (not just the project) to recommend the cheapest model that can handle the task.
```shell
cto --context "update readme" --route   # → Haiku ($0.08/call, 73% cheaper)
cto --context "fix auth bug" --route    # → Opus ($1.33/call, critical complexity)
cto --context "refactor API" --route    # → Sonnet ($0.30/call, balanced)
```

Complexity is computed from real signals:
- Token density (% of budget used)
- Risk concentration (top-5 file avg risk vs project max)
- Directory diversity (cross-cutting = harder)
- Dependency density among selected files
The gateway also uses this: every proxied request gets a model recommendation in the injected context.
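The routing decision can be sketched as a blend of those four signals — the equal weights, thresholds, and tier names below are illustrative assumptions, not the package's tuned values:

```typescript
// Each signal is assumed normalized to [0, 1].
interface ComplexitySignals {
  tokenDensity: number;        // % of token budget used by the selection
  riskConcentration: number;   // top-5 file avg risk vs project max
  directoryDiversity: number;  // cross-cutting selections are harder
  dependencyDensity: number;   // edges among selected files
}

// Equal-weight blend, for illustration only.
function complexity(s: ComplexitySignals): number {
  return (s.tokenDensity + s.riskConcentration + s.directoryDiversity + s.dependencyDensity) / 4;
}

// Cheapest tier that can handle the task (cutoffs assumed).
function routeModel(c: number): 'haiku' | 'sonnet' | 'opus' {
  if (c < 0.35) return 'haiku'; // simple, low-context tasks
  if (c < 0.7) return 'sonnet'; // balanced default
  return 'opus';                // critical complexity
}
```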
## MCP Server
Works as an MCP server for AI editors (Windsurf, Claude Desktop, Cursor).
3 tools: `cto_select_context`, `cto_audit_secrets`, `cto_explain`

```jsonc
// Windsurf: ~/.codeium/windsurf/mcp_config.json
{ "mcpServers": { "cto": { "command": "cto-mcp" } } }
```

```jsonc
// Claude Desktop
{ "mcpServers": { "cto": { "command": "npx", "args": ["-y", "cto-ai-cli"] } } }
```

MCP output is also auto-sanitized when `includeContents: true`.
## Programmatic API
```typescript
import { analyzeProject, selectContext, buildIndex, query } from 'cto-ai-cli';

const analysis = await analyzeProject('./my-project');
const index = buildIndex(files);
const semanticScores = query(index, 'fix auth', 50)
  .map(m => ({ filePath: m.filePath, score: m.score }));

const selection = await selectContext({
  task: 'fix auth',
  analysis,
  budget: 50_000,
  semanticScores,
});
```

## v7.0 Enterprise Features
### Precision Reranker (96.9% precision, was 33.6%)
Multi-signal reranker between BM25 retrieval and greedy allocation:
- Term coverage: fraction of unique query terms matched per file
- Term specificity: IDF-weighted — rare terms matter more
- Bigram proximity: query terms appearing close together in the file
- Dependency signal: files in the dependency cone of top matches
- Quality gate: adaptive cutoff stops filling budget with noise
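Two of these signals can be sketched directly — illustrative formulas for term coverage and bigram proximity; the real reranker's tokenization and weighting are assumptions here:

```typescript
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Term coverage: fraction of unique query terms that appear in the file.
function termCoverage(query: string, fileText: string): number {
  const terms = new Set(tokenize(query));
  const fileTerms = new Set(tokenize(fileText));
  if (terms.size === 0) return 0;
  let hit = 0;
  for (const t of terms) if (fileTerms.has(t)) hit++;
  return hit / terms.size;
}

// Bigram proximity: does any adjacent pair of query terms also appear
// adjacent in the file?
function bigramProximity(query: string, fileText: string): boolean {
  const q = tokenize(query);
  const f = tokenize(fileText);
  const fileBigrams = new Set(f.slice(1).map((w, i) => `${f[i]} ${w}`));
  return q.slice(1).some((w, i) => fileBigrams.has(`${q[i]} ${w}`));
}
```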
### Persistent Index Cache
TF-IDF index persisted to `.cto/index-cache.json` with per-file mtime tracking. Subsequent queries only re-tokenize changed files. 50K-file repos go from 5s → <100ms on a warm cache.
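The invalidation check can be sketched like this — the cache entry shape is assumed, not the actual `.cto/index-cache.json` schema:

```typescript
import * as fs from 'node:fs';

// Hypothetical cache entry: the mtime seen at index time plus the
// tokens extracted then.
interface CachedEntry {
  mtimeMs: number;
  tokens: string[];
}

// Only files that are new or whose mtime changed need re-tokenizing;
// everything else is served from the cache.
function filesToReindex(cache: Map<string, CachedEntry>, filePaths: string[]): string[] {
  return filePaths.filter(p => {
    const entry = cache.get(p);
    if (!entry) return true;                          // never indexed
    return fs.statSync(p).mtimeMs !== entry.mtimeMs;  // changed on disk
  });
}
```

On a warm cache this turns a full re-index into a handful of `stat` calls plus re-tokenization of the few files that actually changed.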
### Multi-Language Dependency Graphs
Regex-based import parsing for Python, Go, Java, and Rust alongside ts-morph for TS/JS. Enables hub detection, risk scoring, and dependency expansion for polyglot codebases.
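The regex approach can be sketched for two of those languages — illustrative patterns only (single-line imports, no Go `import (...)` blocks), not the package's actual regexes:

```typescript
// One import-matching pattern per language; each pattern captures the
// imported module path in group 1 or 2.
const IMPORT_PATTERNS: Record<string, RegExp> = {
  python: /^\s*(?:from\s+([\w.]+)\s+import|import\s+([\w.]+))/gm,
  go: /^\s*import\s+(?:\w+\s+)?"([^"]+)"/gm, // single-line imports only
};

function extractImports(lang: string, source: string): string[] {
  const pattern = IMPORT_PATTERNS[lang];
  if (!pattern) return [];
  const out: string[] = [];
  for (const m of source.matchAll(pattern)) {
    out.push(m[1] ?? m[2]);
  }
  return out;
}
```

This is enough to build a dependency graph for hub detection and risk scoring, even though it is not AST-accurate.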
```shell
# Works on Python, Go, Java, Rust projects — not just TypeScript
cto --context "fix auth handler" /path/to/go-project
```

### Team Authentication & SSO
Per-team API keys, JWT validation (HS256/RS256), rate limiting, model allowlists. Teams stored in `.cto/gateway/teams.json`.
### Metrics Export

Prometheus exposition format at `/__cto/metrics`, Datadog JSON, and StatsD UDP. Counters, histograms, and gauges for requests, tokens, cost, latency, and secrets.
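For counters, the Prometheus text exposition format is simple enough to sketch directly — metric names here are illustrative, not the ones CTO actually exports:

```typescript
// Render a set of counters in Prometheus text exposition format:
// a "# TYPE" line followed by "name value" for each metric.
function toPrometheus(counters: Record<string, number>): string {
  return (
    Object.entries(counters)
      .map(([name, value]) => `# TYPE ${name} counter\n${name} ${value}`)
      .join('\n') + '\n'
  );
}

const body = toPrometheus({ cto_requests_total: 42, cto_tokens_total: 1_280_000 });
```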
### Per-Team Policy Engine

Routing rules per team: model overrides by task type, cost caps per request, context budget limits, block rules. Preset policies: `createCostConscious()`, `createSecurityFirst()`.
### Closed-Loop A/B Testing

Real experimentation on context strategies with a two-proportion z-test for statistical significance. Deterministic assignment (SHA-256 hashing), auto-conclusion when p < 0.05.
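The significance test is the standard two-proportion z-test; a sketch of the formula (the experiment bookkeeping around it is assumed):

```typescript
// z-statistic for comparing accept rates of two context strategies,
// using the pooled proportion for the standard error.
function twoProportionZ(acceptsA: number, nA: number, acceptsB: number, nB: number): number {
  const pA = acceptsA / nA;
  const pB = acceptsB / nB;
  const pooled = (acceptsA + acceptsB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  return (pA - pB) / se;
}

// |z| > 1.96 corresponds to p < 0.05 (two-tailed), the auto-conclusion
// threshold mentioned above. 80/100 vs 60/100 accepts → z ≈ 3.09.
const z = twoProportionZ(80, 100, 60, 100);
```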
### LSP Bridge (IDE Plugin)

JSON-RPC 2.0 server over stdin/stdout for any IDE: VS Code, JetBrains, Neovim, Emacs. Custom methods: `cto/selectContext`, `cto/score`, `cto/audit`, `cto/experiments`.
## Honest Limitations
- TypeScript/JavaScript gets AST analysis. Python/Go/Java/Rust get regex-based import parsing (good for graphs, not AST-accurate).
- BM25 + reranker, not embeddings. 96.9% precision on our benchmark. No neural model needed.
- Learning needs ~5 feedback cycles to start influencing selection. First runs are pure graph + risk + semantic.
- Benchmarked against naive baselines (alphabetical, random, risk-only, TF-IDF-only). Not compared against Cursor/Copilot internal context engines.
## Contributing

```shell
git clone https://github.com/cto-ai/cto-ai-cli.git && cd cto-ai-cli
npm install && npm run build && npm test   # 776 tests
```