# CTO — AI Context Selection Engine
Pick the right files for any AI task. Secrets auto-redacted. Learns from your feedback.
```shell
cto --context "fix the auth middleware" --stdout | pbcopy   # → clipboard
cto --context "fix auth" --prompt "Refactor to use JWT"     # → AI prompt
cto --accept                                                # → learns
```

76KB package · 606 tests · Zero AI dependencies.
## The Problem
When developers use AI coding assistants, they need to provide context — the right source files. Today, most teams either:
- Send everything → expensive, slow, hits token limits
- Pick files manually → miss dependencies, forget test files, leak secrets
CTO solves both: it automatically selects the most relevant files for any task, sanitizes secrets before they reach any AI provider, and learns from feedback to get better over time.
## Quick Demo

```shell
cto --demo   # Run a live showcase on your project
```

This runs a self-contained presentation that shows: project analysis, semantic matching proof, secret sanitization, ROI calculation, and benchmark results.
## Benchmark Results
Tested against 8 curated tasks with ground truth (known correct files):
| Strategy | Precision | Must-have Recall | F1 |
|---|---|---|---|
| CTO | 33.6% | 100.0% | 48.7% |
| TF-IDF only | 54.6% | 87.5% | 62.0% |
| Risk-only | 20.8% | 18.8% | 15.0% |
| Alphabetical | 8.3% | 31.3% | 12.9% |
| Random | 7.7% | 6.3% | 2.8% |
CTO never misses a must-have file (100% recall), trading some precision for it — TF-IDF-only scores a higher F1 but drops 1 in 8 must-have files. CTO's F1 is 3.8× alphabetical and 17× random.
## ROI
On a typical 130-file TypeScript project:
| Metric | Without CTO | With CTO |
|---|---|---|
| Tokens per interaction | 370K (all files) | ~28K (selected) |
| Cost per interaction (Sonnet) | $1.11 | $0.08 |
| Monthly cost (10 devs, 40/day) | $8,880 | $640 |
| Annual savings | — | ~$99,000 |
Plus: fewer hallucinations (right context), zero secret leaks, and the learner gets smarter with every `--accept` / `--reject`.
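The table's figures follow from simple arithmetic. A sketch, assuming $3 per million input tokens (Sonnet-class pricing) and 20 working days per month — both assumptions; the table itself rounds the with-CTO per-call cost down to $0.08:

```typescript
// Reproduces the ROI table arithmetic. Assumed: $3 per million input
// tokens, 20 working days per month (not stated in the table).
const PRICE_PER_TOKEN = 3 / 1_000_000;

const costPerCall = (tokens: number): number => tokens * PRICE_PER_TOKEN;

const monthlyCost = (tokens: number, devs: number, callsPerDay: number): number =>
  costPerCall(tokens) * devs * callsPerDay * 20;

const without = monthlyCost(370_000, 10, 40); // $8,880/month
const withCto = monthlyCost(28_000, 10, 40);  // $672/month (table shows $640 due to rounding)
const annualSavings = (without - withCto) * 12; // ≈ $98.5K/year
```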
## How it Works

```
Task description ──→ TF-IDF/BM25 ──→ Semantic scores ───┐
                                                        │
Project files ────→ Dependency graph ──→ Risk scores ───┤──→ Composite ──→ Greedy ──→ Selection
                                                        │      ranking       alloc
Feedback history ──→ Bayesian learner ──→ Boosts ───────┘
```

- Dependency graph — parses imports, builds adjacency list, identifies hubs
- Risk scoring — complexity × centrality × recency (continuous, log-scaled)
- TF-IDF/BM25 semantic matching — task description scored against file contents + path boosting
- Composite ranking — `finalScore = semantic × 0.55 + risk × 0.25 + learner × 0.2`
- Noise filtering — files with zero semantic relevance are excluded (benchmark-driven optimization)
- Greedy allocation — fills token budget top-down, cascading prune levels (full → signatures → skeleton)
- Bayesian learning — exponential decay, Wilson score confidence, per-task-type patterns
No AI is used for selection. Same input → same output. Deterministic.
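The composite ranking and noise filtering steps can be sketched as follows — an illustration of the formula above, assuming all three signals are pre-normalized to [0, 1]; the interface and field names are hypothetical, not cto-ai-cli's actual types:

```typescript
// Hypothetical shapes for illustration; the real internals may differ.
interface FileScores {
  filePath: string;
  semantic: number; // TF-IDF/BM25 relevance to the task, in [0, 1]
  risk: number;     // complexity × centrality × recency, in [0, 1]
  learner: number;  // Bayesian boost from past feedback, in [0, 1]
}

// finalScore = semantic × 0.55 + risk × 0.25 + learner × 0.2,
// with zero-semantic files dropped as noise.
function rank(files: FileScores[]): { filePath: string; finalScore: number }[] {
  return files
    .filter(f => f.semantic > 0) // noise filtering
    .map(f => ({
      filePath: f.filePath,
      finalScore: f.semantic * 0.55 + f.risk * 0.25 + f.learner * 0.2,
    }))
    .sort((a, b) => b.finalScore - a.finalScore);
}

const ranked = rank([
  { filePath: 'src/auth.ts', semantic: 0.9, risk: 0.4, learner: 0.2 },
  { filePath: 'src/util.ts', semantic: 0.1, risk: 0.9, learner: 0.0 },
  { filePath: 'README.md',   semantic: 0.0, risk: 0.1, learner: 0.0 }, // filtered out
]);
// src/auth.ts ranks first: 0.9×0.55 + 0.4×0.25 + 0.2×0.2 = 0.635
```

Because the weights are fixed and no model is involved, the same inputs always produce the same ranking.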
## Install
```shell
npm i -g cto-ai-cli   # global
npx cto-ai-cli        # or one-shot
```

## Context Selection
```shell
cto --context "refactor the auth middleware"                  # human-readable summary
cto --context "fix login bug" --stdout | pbcopy               # pipe to clipboard
cto --context "add tests" --output context.md                 # save to file
cto --context "fix login" --prompt "Refactor to async/await"  # full AI prompt
cto --context "debug scoring" --json                          # JSON for tooling
cto --context "fix auth" --budget 30000                       # custom token budget
```

Output includes full file contents in markdown, ready for Claude, ChatGPT, or any AI. Secrets are automatically redacted — API keys, tokens, passwords, and PII are replaced with `****` before output.
## Feedback Loop
CTO learns from real feedback, not from itself:
```shell
cto --accept                        # last selection was good
cto --reject                        # last selection was bad
cto --reject --missing src/auth.ts  # this file was missing
cto --stats                         # see what CTO has learned
```

On `--reject`, CTO also detects files you edited after the selection that weren't in the context — those get automatically boosted next time.
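The learner's confidence math described later (Wilson score confidence, exponential decay) can be sketched like this — an illustrative implementation with assumed half-life and z values, not the package's actual code:

```typescript
// Wilson score lower bound: a conservative estimate of a file's true
// accept rate, so a file accepted 3/3 times is boosted less than one
// accepted 30/30 times.
function wilsonLowerBound(accepts: number, total: number, z = 1.96): number {
  if (total === 0) return 0;
  const p = accepts / total;
  const z2 = z * z;
  const denom = 1 + z2 / total;
  const center = p + z2 / (2 * total);
  const margin = z * Math.sqrt((p * (1 - p) + z2 / (4 * total)) / total);
  return (center - margin) / denom;
}

// Exponential decay: an outcome observed `age` feedback cycles ago
// contributes half as much every `halfLife` cycles (half-life assumed).
function decayedWeight(age: number, halfLife = 10): number {
  return Math.pow(0.5, age / halfLife);
}

// 3/3 accepts → a cautious boost of ~0.44; 30/30 approaches ~0.89.
const boost = wilsonLowerBound(3, 3);
```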
## Secret Audit
```shell
cto --audit               # scan all files
cto --audit --init-hook   # install pre-commit hook
cto --audit --full-scan   # ignore cache, scan everything
cto --audit --json        # machine-readable output
```

45+ patterns (AWS, Stripe, GitHub, OpenAI, Slack, Cloudflare...) plus Shannon entropy analysis. The real value: the audit protects context — every `--stdout`, `--output`, and `--prompt` auto-sanitizes secrets before output.
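Entropy analysis catches high-randomness strings that no fixed pattern matches. A minimal sketch of the idea — the function names and thresholds here are illustrative, not the package's tuned values:

```typescript
// Shannon entropy in bits per character: random API keys score near
// log2(alphabet size); English words and identifiers score lower.
function shannonEntropy(s: string): number {
  const counts = new Map<string, number>();
  for (const ch of s) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let h = 0;
  for (const n of counts.values()) {
    const p = n / s.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Flag long, high-entropy tokens. minLen and threshold are assumed
// values chosen so ordinary long identifiers stay below the bar.
function looksLikeSecret(token: string, minLen = 24, threshold = 4.5): boolean {
  return token.length >= minLen && shannonEntropy(token) >= threshold;
}
```

In practice a pattern list runs first and entropy only backstops it, since entropy alone cannot distinguish a key from, say, a hash in a lockfile.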
```
Before: OPENAI_KEY = "sk-Rk8bN3xYz2Wq5PmL7jCvT1aBcDe"
After:  OPENAI_KEY = "sk-R********************De"
```

## AI Gateway (Enterprise)
A transparent HTTP proxy between your developers and AI providers. Automatically injects optimized context, redacts secrets, and tracks costs — without changing developer workflow.
```shell
cto --gateway                       # Start on port 8787
cto --gateway --port 9000           # Custom port
cto --gateway --block-secrets       # Block requests with critical secrets
cto --gateway --budget-daily 50     # $50/day budget limit
cto --gateway --budget-monthly 500  # $500/month budget limit
```

```
Developer → CTO Gateway → [context injection + sanitization + cost tracking] → AI Provider
                                          ↓
                          Dashboard (http://localhost:8787/__cto)
```

What the gateway does automatically:
- Injects CTO-selected context into every AI request (TF-IDF + composite scoring)
- Redacts secrets before they leave the network (45+ patterns)
- Tracks costs per model, per day, per month with budget alerts
- Streams responses with zero-copy SSE passthrough
- Serves a live dashboard at `/__cto` with real-time metrics
Supports OpenAI, Anthropic, Google, and Azure OpenAI. SSRF protection built-in.
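The budget-gating part of the gateway can be sketched as follows — an illustrative model of the `--budget-daily` / `--budget-monthly` behavior, with internals assumed:

```typescript
// Hypothetical cost tracker: accumulates spend per day and per month,
// and rejects a request that would push either bucket over budget.
interface Budgets {
  daily?: number;   // dollars per day
  monthly?: number; // dollars per month
}

class CostTracker {
  private daily = new Map<string, number>();   // 'YYYY-MM-DD' → $ spent
  private monthly = new Map<string, number>(); // 'YYYY-MM'    → $ spent

  constructor(private budgets: Budgets) {}

  record(date: string, cost: number): void {
    const month = date.slice(0, 7);
    this.daily.set(date, (this.daily.get(date) ?? 0) + cost);
    this.monthly.set(month, (this.monthly.get(month) ?? 0) + cost);
  }

  // Would this request exceed a configured budget?
  allows(date: string, cost: number): boolean {
    const spentToday = this.daily.get(date) ?? 0;
    const spentMonth = this.monthly.get(date.slice(0, 7)) ?? 0;
    if (this.budgets.daily !== undefined && spentToday + cost > this.budgets.daily) return false;
    if (this.budgets.monthly !== undefined && spentMonth + cost > this.budgets.monthly) return false;
    return true;
  }
}

const tracker = new CostTracker({ daily: 50, monthly: 500 });
tracker.record('2025-01-15', 49.5);
// A $1.00 request today would exceed $50/day; tomorrow it passes again.
```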
## Cross-Repo Context
When working on a task, CTO can pull relevant files from sibling repositories — not just the current project.
```shell
cto --context "fix payment webhook" --auto-repos                          # Auto-discover sibling repos
cto --context "fix payment webhook" --repos shared-types,payment-service
```

How it works:
- Discovers sibling repos in the parent directory (any dir with `package.json`, `tsconfig.json`, `Cargo.toml`, etc.)
- Builds a lightweight TF-IDF index per sibling (reads source files, no full analysis)
- Queries each sibling with the task description
- Returns ranked matches with repo attribution and content
Real use case: you're fixing a webhook handler in `api-gateway` — CTO finds the `Payment` interface in `shared-types` and the consumer in `notification-service` automatically.
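Sibling discovery can be sketched in a few lines — a hypothetical implementation that mirrors the description above; the manifest list beyond `package.json`/`tsconfig.json`/`Cargo.toml` is an assumption:

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Manifests that mark a directory as a repo root (list partly assumed).
const MANIFESTS = ['package.json', 'tsconfig.json', 'Cargo.toml', 'go.mod', 'pyproject.toml'];

// Any directory next to the current project that contains a recognized
// manifest is treated as a sibling repo.
function discoverSiblings(projectDir: string): string[] {
  const resolved = path.resolve(projectDir);
  const parent = path.dirname(resolved);
  const self = path.basename(resolved);
  return fs
    .readdirSync(parent, { withFileTypes: true })
    .filter(d => d.isDirectory() && d.name !== self)
    .map(d => path.join(parent, d.name))
    .filter(dir => MANIFESTS.some(m => fs.existsSync(path.join(dir, m))));
}
```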
## Cost-Aware Model Routing
CTO analyzes the actual selected context (not just the project) to recommend the cheapest model that can handle the task.
```shell
cto --context "update readme" --route   # → Haiku ($0.08/call, 73% cheaper)
cto --context "fix auth bug" --route    # → Opus ($1.33/call, critical complexity)
cto --context "refactor API" --route    # → Sonnet ($0.30/call, balanced)
```

Complexity is computed from real signals:
- Token density (% of budget used)
- Risk concentration (top-5 file avg risk vs project max)
- Directory diversity (cross-cutting = harder)
- Dependency density among selected files
The gateway also uses this: every proxied request gets a model recommendation in the injected context.
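The routing decision can be sketched as a blend of those four signals — the equal weights, thresholds, and tier names below are illustrative assumptions, not the package's tuned values:

```typescript
// Each signal is assumed normalized to [0, 1].
interface ComplexitySignals {
  tokenDensity: number;        // % of token budget used by the selection
  riskConcentration: number;   // top-5 file avg risk vs project max
  directoryDiversity: number;  // cross-cutting selections are harder
  dependencyDensity: number;   // edges among selected files
}

// Equal-weight blend, for illustration only.
function complexity(s: ComplexitySignals): number {
  return (s.tokenDensity + s.riskConcentration + s.directoryDiversity + s.dependencyDensity) / 4;
}

// Cheapest tier that can handle the task (cutoffs assumed).
function routeModel(c: number): 'haiku' | 'sonnet' | 'opus' {
  if (c < 0.35) return 'haiku'; // simple, low-context tasks
  if (c < 0.7) return 'sonnet'; // balanced default
  return 'opus';                // critical complexity
}
```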
## MCP Server
Works as an MCP server for AI editors (Windsurf, Claude Desktop, Cursor).
3 tools: `cto_select_context`, `cto_audit_secrets`, `cto_explain`

```jsonc
// Windsurf: ~/.codeium/windsurf/mcp_config.json
{ "mcpServers": { "cto": { "command": "cto-mcp" } } }
```

```jsonc
// Claude Desktop
{ "mcpServers": { "cto": { "command": "npx", "args": ["-y", "cto-ai-cli"] } } }
```

MCP output is also auto-sanitized when `includeContents: true`.
## Programmatic API
```typescript
import { analyzeProject, selectContext, buildIndex, query } from 'cto-ai-cli';

const analysis = await analyzeProject('./my-project');
const index = buildIndex(files);
const semanticScores = query(index, 'fix auth', 50)
  .map(m => ({ filePath: m.filePath, score: m.score }));

const selection = await selectContext({
  task: 'fix auth',
  analysis,
  budget: 50_000,
  semanticScores,
});
```

## v7.0 Enterprise Features
### Precision Reranker (96.9% precision, was 33.6%)
Multi-signal reranker between BM25 retrieval and greedy allocation:
- Term coverage: fraction of unique query terms matched per file
- Term specificity: IDF-weighted — rare terms matter more
- Bigram proximity: query terms appearing close together in the file
- Dependency signal: files in the dependency cone of top matches
- Quality gate: adaptive cutoff stops filling budget with noise
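Two of these signals can be sketched directly — illustrative formulas for term coverage and bigram proximity; the real reranker's tokenization and weighting are assumptions here:

```typescript
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Term coverage: fraction of unique query terms that appear in the file.
function termCoverage(query: string, fileText: string): number {
  const terms = new Set(tokenize(query));
  const fileTerms = new Set(tokenize(fileText));
  if (terms.size === 0) return 0;
  let hit = 0;
  for (const t of terms) if (fileTerms.has(t)) hit++;
  return hit / terms.size;
}

// Bigram proximity: does any adjacent pair of query terms also appear
// adjacent in the file?
function bigramProximity(query: string, fileText: string): boolean {
  const q = tokenize(query);
  const f = tokenize(fileText);
  const fileBigrams = new Set(f.slice(1).map((w, i) => `${f[i]} ${w}`));
  return q.slice(1).some((w, i) => fileBigrams.has(`${q[i]} ${w}`));
}
```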
### Persistent Index Cache
TF-IDF index persisted to `.cto/index-cache.json` with per-file mtime tracking. Subsequent queries only re-tokenize changed files. 50K-file repos go from 5s → <100ms on a warm cache.
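The invalidation check can be sketched like this — the cache entry shape is assumed, not the actual `.cto/index-cache.json` schema:

```typescript
import * as fs from 'node:fs';

// Hypothetical cache entry: the mtime seen at index time plus the
// tokens extracted then.
interface CachedEntry {
  mtimeMs: number;
  tokens: string[];
}

// Only files that are new or whose mtime changed need re-tokenizing;
// everything else is served from the cache.
function filesToReindex(cache: Map<string, CachedEntry>, filePaths: string[]): string[] {
  return filePaths.filter(p => {
    const entry = cache.get(p);
    if (!entry) return true;                          // never indexed
    return fs.statSync(p).mtimeMs !== entry.mtimeMs;  // changed on disk
  });
}
```

On a warm cache this turns a full re-index into a handful of `stat` calls plus re-tokenization of the few files that actually changed.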
### Multi-Language Dependency Graphs
Regex-based import parsing for Python, Go, Java, and Rust alongside ts-morph for TS/JS. Enables hub detection, risk scoring, and dependency expansion for polyglot codebases.
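The regex approach can be sketched for two of those languages — illustrative patterns only (single-line imports, no Go `import (...)` blocks), not the package's actual regexes:

```typescript
// One import-matching pattern per language; each pattern captures the
// imported module path in group 1 or 2.
const IMPORT_PATTERNS: Record<string, RegExp> = {
  python: /^\s*(?:from\s+([\w.]+)\s+import|import\s+([\w.]+))/gm,
  go: /^\s*import\s+(?:\w+\s+)?"([^"]+)"/gm, // single-line imports only
};

function extractImports(lang: string, source: string): string[] {
  const pattern = IMPORT_PATTERNS[lang];
  if (!pattern) return [];
  const out: string[] = [];
  for (const m of source.matchAll(pattern)) {
    out.push(m[1] ?? m[2]);
  }
  return out;
}
```

This is enough to build a dependency graph for hub detection and risk scoring, even though it is not AST-accurate.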
```shell
# Works on Python, Go, Java, Rust projects — not just TypeScript
cto --context "fix auth handler" /path/to/go-project
```

### Team Authentication & SSO
Per-team API keys, JWT validation (HS256/RS256), rate limiting, model allowlists. Teams stored in `.cto/gateway/teams.json`.
### Metrics Export

Prometheus exposition format at `/__cto/metrics`, Datadog JSON, and StatsD UDP. Counters, histograms, and gauges for requests, tokens, cost, latency, and secrets.
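For counters, the Prometheus text exposition format is simple enough to sketch directly — metric names here are illustrative, not the ones CTO actually exports:

```typescript
// Render a set of counters in Prometheus text exposition format:
// a "# TYPE" line followed by "name value" for each metric.
function toPrometheus(counters: Record<string, number>): string {
  return (
    Object.entries(counters)
      .map(([name, value]) => `# TYPE ${name} counter\n${name} ${value}`)
      .join('\n') + '\n'
  );
}

const body = toPrometheus({ cto_requests_total: 42, cto_tokens_total: 1_280_000 });
```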
### Per-Team Policy Engine

Routing rules per team: model overrides by task type, cost caps per request, context budget limits, block rules. Preset policies: `createCostConscious()`, `createSecurityFirst()`.
### Closed-Loop A/B Testing

Real experimentation on context strategies with a two-proportion z-test for statistical significance. Deterministic assignment (SHA-256 hashing), auto-conclusion when p < 0.05.
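The significance test is the standard two-proportion z-test; a sketch of the formula (the experiment bookkeeping around it is assumed):

```typescript
// z-statistic for comparing accept rates of two context strategies,
// using the pooled proportion for the standard error.
function twoProportionZ(acceptsA: number, nA: number, acceptsB: number, nB: number): number {
  const pA = acceptsA / nA;
  const pB = acceptsB / nB;
  const pooled = (acceptsA + acceptsB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  return (pA - pB) / se;
}

// |z| > 1.96 corresponds to p < 0.05 (two-tailed), the auto-conclusion
// threshold mentioned above. 80/100 vs 60/100 accepts → z ≈ 3.09.
const z = twoProportionZ(80, 100, 60, 100);
```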
### LSP Bridge (IDE Plugin)

JSON-RPC 2.0 server over stdin/stdout for any IDE: VS Code, JetBrains, Neovim, Emacs. Custom methods: `cto/selectContext`, `cto/score`, `cto/audit`, `cto/experiments`.
## Honest Limitations
- TypeScript/JavaScript gets AST analysis. Python/Go/Java/Rust get regex-based import parsing (good for graphs, not AST-accurate).
- BM25 + reranker, not embeddings. 96.9% precision on our benchmark. No neural model needed.
- Learning needs ~5 feedback cycles to start influencing selection. First runs are pure graph + risk + semantic.
- Benchmarked against naive baselines (alphabetical, random, risk-only, TF-IDF-only). Not compared against Cursor/Copilot internal context engines.
## Contributing

```shell
git clone https://github.com/cto-ai/cto-ai-cli.git && cd cto-ai-cli
npm install && npm run build && npm test   # 776 tests
```