CodeMem
AI-agnostic local memory layer for codebases. Index once, remember forever, switch AI freely.
Status: v0.1.0 — Early release. Core is stable and working. Active development.
CodeMem is a local sidecar process that runs alongside your editor and gives any AI assistant (Claude, GPT-4, Cursor, Copilot, etc.) a persistent, semantic memory of your codebase — without sending your code to the cloud, without locking you into one AI provider, and without blowing up your context window.
The Problem
Every time you start a new AI chat, it has zero memory of your project. You paste the same files over and over. You hit the context limit. You switch to a different AI and lose everything again. Large codebases are simply too big to fit in any context window.
The Solution
CodeMem indexes your entire codebase locally into a vector database. When you ask an AI a question, CodeMem retrieves only the most relevant code chunks — typically 3–8 functions — and injects them into the prompt. The AI gets exactly what it needs, nothing more.
Your codebase (10,000+ lines)
↓ codemem init (one time, per project)
Vector Index (.codemem/ local to your project)
↓ on every query
Top 6 relevant chunks (~400 tokens)
↓
Any AI Assistant ← answers with full context, no lock-in

Features
| Feature | Description |
|---|---|
| Semantic search | Find code by meaning, not keywords — "how does auth work?" finds the right file even if it never says "auth" |
| Hybrid ranking | Semantic similarity (55%) + BM25 keyword matching (30%) + recency decay (15%) |
| Structured context assembly | Returns a readable code flow (imports → function → call chain), not just raw chunks — AI answers are noticeably better |
| AI provider independent | Works with GPT-4, Claude, Cursor, Copilot, or any HTTP client — switch providers anytime without losing memory |
| Multi-project support | Each project keeps its own .codemem/ directory — switch projects and memory switches with you, no reindex required |
| Incremental indexing | Only re-indexes files that changed — seconds, not minutes |
| Parallel indexing | Embeds multiple files concurrently (controlled concurrency) for fast initial indexing |
| Live file watcher | Detects file saves and updates the index in the background within ~1 second |
| Fully local & private | Nothing leaves your machine. No API key, no cloud, no telemetry |
| Token savings | Reduces context window usage by ~95% vs. pasting full files |
| Multi-language | TypeScript, JavaScript, Python, Rust, Go, Java, Ruby — with per-language chunking |
| Smart chunking | Splits at function/class boundaries, not arbitrary line counts |
| Lightweight local vector store | Powered by Vectra — no external database process, no Python, no Docker |
Example
$ codemem search "how does auth work"
Results (3 chunks, 31ms):
──────────────────────────────────────
1. src/middleware/auth.ts :: verifyToken [87%]
export async function verifyToken(req, res) {
2. src/routes/login.ts :: loginHandler [81%]
export async function loginHandler(req, reply) {
3. src/models/user.ts :: findByEmail [74%]
export async function findByEmail(email: string) {
→ 83K tokens saved vs full read

The assembled context passed to your AI looks like:
## Project: myapp | Language: typescript | Framework: Fastify
### Relevant files:
1. src/middleware/auth.ts — verifyToken (lines 12–34)
2. src/routes/login.ts — loginHandler (lines 5–48)
3. src/models/user.ts — findByEmail (lines 22–29)
### Code:
--- src/middleware/auth.ts (lines 12–34) [score: 0.87] ---
export async function verifyToken(req, res) {
const token = req.headers.authorization?.split(' ')[1];
const payload = jwt.verify(token, process.env.JWT_SECRET);
req.user = await db.user.findUnique({ where: { id: payload.sub } });
}
--- src/routes/login.ts (lines 5–48) [score: 0.81] ---
...

The AI sees exactly what it needs. Nothing more.
Multi-Project Support
Each project keeps its own independent memory. There is no global index or cross-project state.
cd ~/projects/api-server
codemem init # indexes api-server into ~/projects/api-server/.codemem/
codemem start # serves memory for api-server
cd ~/projects/frontend
codemem init # indexes frontend into ~/projects/frontend/.codemem/
codemem start # serves memory for frontend

Switching projects is instant — the .codemem/ folder travels with the project. You can also run multiple sidecars on different ports simultaneously.
Requirements
| Requirement | Details |
|---|---|
| Node.js | >= 18.0.0 (tested on v24) |
| npm | >= 8 |
| Disk space | ~50 MB (model cache) + ~5 MB per 1,000 files indexed |
| RAM | ~300 MB while running |
| Python | Not required |
| GPU | Not required — CPU-only WASM inference |
Works on Windows, macOS, and Linux.
Installation
Option A — Clone and link (recommended for development)
git clone https://github.com/youruser/codemem.git
cd codemem
npm install
npm run build
npm link

Option B — Install from npm (once published)
npm install -g @mrabhishek1105/codemem

Quick Start
# 1. Go to any project you want to index
cd /path/to/your-project
# 2. Initialize CodeMem (downloads model ~22MB on first run, indexes codebase)
codemem init
# 3. Start the sidecar server
codemem start
# 4. Search from the CLI
codemem search "how does authentication work"
# 5. Or hit the HTTP API directly (works with any AI tool)
curl -X POST http://localhost:8432/api/v1/query \
-H "Content-Type: application/json" \
-d '{"query": "how does the payment flow work", "options": {"top_k": 5}}'

CLI Commands
codemem init
Initialize CodeMem in the current directory. Downloads the embedding model on first run (~22 MB), scans all files, generates embeddings, and stores them in .codemem/.
codemem init # interactive
codemem init --yes # skip prompts
codemem init --debug # verbose logging

What it creates:
your-project/
└── .codemem/
├── config.json ← project settings
├── db/ ← local vector index (Vectra)
├── meta/ ← file hashes, stats, recent changes
└── logs/ ← debug logs

codemem start
Start the HTTP sidecar server. Loads the vector index and embedding model into memory, starts the file watcher, and serves the API on localhost:8432.
codemem start # default port 8432
codemem start --port 9000 # custom port
codemem start --debug # verbose logging

Press Ctrl+C to stop gracefully.
codemem stop
Stop the running sidecar from another terminal window.
codemem stop

codemem status
Show current index health and server state.
codemem status

CodeMem Status
Project
──────────────────────────────────────
Name my-project
Language typescript
Framework Fastify
Root /home/user/my-project
Index
──────────────────────────────────────
Files indexed 47
Chunks 422
Last indexed 2026-03-29T19:08:58Z
Server
──────────────────────────────────────
Status ● Running (localhost:8432)
Stats
──────────────────────────────────────
Queries served 38
Tokens saved 1.2M

codemem stats
Show token savings and estimated cost saved.
codemem stats

codemem search <query>
Search the indexed codebase from the terminal. Requires the sidecar to be running.
codemem search "how does the embedder load the model"
codemem search "error handling in HTTP routes" --top 8codemem reindex
Re-index after significant changes.
codemem reindex # incremental (only changed files — fast)
codemem reindex --full # force full re-index

HTTP API
The sidecar exposes a REST API on http://localhost:8432. Any tool that can make an HTTP POST can use CodeMem.
POST /api/v1/query
Semantic search — the main endpoint.
Request:
{
"query": "how does user authentication work",
"options": {
"top_k": 6,
"token_budget": 4000
}
}

Response:
{
"context": {
"project_summary": "Project: myapp | Language: typescript | ...",
"chunks": [
{
"id": "src/auth/middleware.ts::verifyToken",
"file_path": "src/auth/middleware.ts",
"content": "export async function verifyToken(req, res) { ... }",
"relevance_score": 0.87,
"type": "function",
"lines": [12, 34]
}
],
"assembled_text": "## Project: ...\n\n--- src/auth/middleware.ts ---\n...",
"token_count": 380
},
"stats": {
"chunks_searched": 422,
"chunks_returned": 6,
"tokens_saved_estimate": 82000,
"query_time_ms": 31
}
}
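A minimal TypeScript client for this endpoint could look like the sketch below (the helper name and response types are illustrative, not part of the package; Node 18+ ships a global fetch):

interface QueryResponse {
  context: { assembled_text: string; token_count: number };
  stats: { chunks_returned: number; tokens_saved_estimate: number; query_time_ms: number };
}

// Query the local sidecar and return the parsed response (illustrative helper).
async function queryCodeMem(query: string, topK = 6): Promise<QueryResponse> {
  const res = await fetch('http://localhost:8432/api/v1/query', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, options: { top_k: topK } }),
  });
  if (!res.ok) throw new Error(`CodeMem query failed: ${res.status}`);
  return (await res.json()) as QueryResponse;
}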
POST /api/v1/index
Trigger a full re-index via HTTP.
{ "force": true }POST /api/v1/update
Trigger an incremental update for specific files.
{ "files": ["src/auth/middleware.ts", "src/models/user.ts"] }GET /api/v1/status
Full sidecar status as JSON.
GET /api/v1/stats
Token savings statistics as JSON.
GET /api/v1/health
Health check. Returns {"ok": true}.
GET /api/v1/config and PUT /api/v1/config
Read or update configuration at runtime.
{
"retrieval": { "top_k": 8, "token_budget": 6000 }
}

How It Works
1. Chunking
Every source file is split into semantic chunks at function, class, and block boundaries. Each chunk is wrapped in an envelope that adds rich context before embedding:
[FILE: src/auth/middleware.ts | LANG: typescript | TYPE: function]
[DESC: verifyToken]
[IMPORTS: jsonwebtoken, prisma]
[CALLS: jwt.verify, db.user.findUnique]
[CODE]
export async function verifyToken(req, res) {
const token = req.headers.authorization?.split(' ')[1];
...
}

The vector captures context, not just syntax — so semantically related code is found even when it uses different words.
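A sketch of how such an envelope might be assembled before embedding (the field names follow the example above; the helper itself is illustrative, not the package's internal API):

interface ChunkInfo {
  file: string;
  lang: string;
  type: string;
  name: string;
  imports: string[];
  calls: string[];
  code: string;
}

// Wrap a code chunk in the context envelope format shown above.
function buildEnvelope(c: ChunkInfo): string {
  return [
    `[FILE: ${c.file} | LANG: ${c.lang} | TYPE: ${c.type}]`,
    `[DESC: ${c.name}]`,
    `[IMPORTS: ${c.imports.join(', ')}]`,
    `[CALLS: ${c.calls.join(', ')}]`,
    '[CODE]',
    c.code,
  ].join('\n');
}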
2. Embedding
Each envelope is run through all-MiniLM-L6-v2 — a 22MB quantized ONNX model running entirely on CPU via WebAssembly. Produces a 384-dimensional vector. No GPU, no cloud, no API key needed.
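Conceptually, embedding an envelope with @xenova/transformers looks roughly like the following minimal sketch (not the package's embedder.ts):

import { pipeline } from '@xenova/transformers';

// Load the quantized MiniLM model once; it is cached on disk after the first download.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Mean-pool and normalize to get one 384-dimensional vector per envelope.
async function embed(text: string): Promise<number[]> {
  const output = await extractor(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data as Float32Array);
}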
3. Storage
Vectors are stored in Vectra — a lightweight local vector database backed by plain JSON files in .codemem/db/. No separate database process, no Python, no Docker.
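A minimal sketch of storing and querying vectors with Vectra's LocalIndex, reusing the embed() helper from the sketch above (paths and metadata fields are illustrative, not the package's exact schema):

import { LocalIndex } from 'vectra';

const index = new LocalIndex('.codemem/db');
if (!(await index.isIndexCreated())) {
  await index.createIndex();
}

// Store one chunk: its vector plus enough metadata for the context assembler to cite it.
await index.insertItem({
  vector: await embed(envelope), // envelope: the string built in step 1
  metadata: { file_path: 'src/auth/middleware.ts', name: 'verifyToken', type: 'function' },
});

// Embed the question and fetch the nearest chunks by cosine similarity.
const results = await index.queryItems(await embed('how does auth work'), 10);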
4. Retrieval & Context Assembly
At query time:
- Query text is embedded into a 384-dim vector
- Vectra finds top candidates by cosine similarity
- Candidates are re-ranked with a hybrid score (sketched after this list):
- 55% semantic similarity
- 30% BM25-lite keyword overlap
- 15% recency decay (recently edited files rank slightly higher)
- Top chunks are assembled into a structured, readable context block — with project summary, file references, call flow, and code — within your token budget
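The re-ranking step boils down to a weighted sum. A minimal scoring sketch (the recency decay constant is illustrative, not the package's exact value):

interface Candidate {
  cosine: number;  // semantic similarity from the vector search (0..1)
  bm25: number;    // normalized keyword-overlap score (0..1)
  ageDays: number; // days since the chunk's file was last modified
}

// Combine the three signals with the 55 / 30 / 15 weights described above.
function hybridScore(c: Candidate): number {
  const recency = Math.exp(-c.ageDays / 30); // illustrative decay
  return 0.55 * c.cosine + 0.3 * c.bm25 + 0.15 * recency;
}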
5. File Watching
While codemem start is running, chokidar watches for file changes. A saved file is re-chunked and re-embedded within ~1 second. The index stays fresh automatically.
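The watcher is a thin layer over chokidar. A simplified sketch (the 500 ms debounce mirrors the debounce_ms default shown under Configuration; reindexFiles is a hypothetical stand-in for the incremental indexer):

import chokidar from 'chokidar';

// Hypothetical hook: re-chunk and re-embed the given files.
async function reindexFiles(files: string[]): Promise<void> {
  console.log('re-indexing', files);
}

const watcher = chokidar.watch('.', {
  ignored: ['**/node_modules/**', '**/.codemem/**'],
  ignoreInitial: true,
});

let timer: ReturnType<typeof setTimeout> | undefined;
const pending = new Set<string>();

// Collect changed paths and re-index them in one batch after a short debounce.
watcher.on('change', (filePath: string) => {
  pending.add(filePath);
  if (timer) clearTimeout(timer);
  timer = setTimeout(() => {
    const files = [...pending];
    pending.clear();
    void reindexFiles(files);
  }, 500);
});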
Reliability
- Safe incremental indexing — only changed files are re-processed; the rest of the index is untouched (see the hash sketch after this list)
- Graceful recovery on restart — index is persisted on disk; no rebuild needed after a crash or reboot
- No index corruption — Vectra writes are atomic; a mid-write crash leaves the previous state intact
- Automatic deduplication — re-indexing a file replaces its chunks, never creates duplicates
- Secrets never indexed — .env, *.key, *.pem, and similar files are hard-excluded regardless of .gitignore
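Change detection behind the incremental indexing above is hash-based. A minimal sketch using Node's crypto module (the helper names are illustrative; CodeMem keeps its per-file hashes in .codemem/meta/):

import { createHash } from 'node:crypto';
import { readFileSync } from 'node:fs';

// Hash a file's contents with SHA-256.
function fileHash(path: string): string {
  return createHash('sha256').update(readFileSync(path)).digest('hex');
}

// A file is re-chunked and re-embedded only when its hash differs from the stored one.
function hasChanged(path: string, storedHash: string | undefined): boolean {
  return fileHash(path) !== storedHash;
}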
Configuration
Configuration lives in .codemem/config.json. Edit directly or use PUT /api/v1/config.
{
"project": {
"name": "my-project",
"detected_language": "typescript",
"detected_framework": "Fastify"
},
"server": {
"port": 8432,
"cors_origins": ["http://localhost:3000"]
},
"indexing": {
"include_patterns": ["src/**/*", "lib/**/*"],
"max_file_size_kb": 500,
"debounce_ms": 500
},
"retrieval": {
"top_k": 6,
"token_budget": 4000,
"min_score": 0.15
},
"embedding": {
"model": "Xenova/all-MiniLM-L6-v2",
"dimensions": 384,
"batch_size": 32
}
}

Ignoring Files
CodeMem respects .gitignore automatically. Create .codememignore for extra exclusions:
# .codememignore
tests/fixtures/**
*.generated.ts
docs/**
coverage/**

Always excluded regardless of ignore files:
- node_modules/, dist/, build/, .git/
- Binary files (images, fonts, archives, executables)
- Secret files (.env, *.key, *.pem, *.secret)
- The .codemem/ folder itself
Integrating with AI Tools
Any AI via HTTP
Add this to your AI system prompt or tool definition:
CodeMem is running at http://localhost:8432.
To get relevant codebase context, POST to /api/v1/query:
{ "query": "your question here", "options": { "top_k": 6 } }
Use the returned "assembled_text" field as codebase context.

From a script
curl -s -X POST http://localhost:8432/api/v1/query \
-H "Content-Type: application/json" \
-d '{"query": "database connection pooling", "options": {"top_k": 5}}' \
| jq '.context.assembled_text'
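The same call works from TypeScript as well. A hedged sketch that only assembles the prompt; the actual chat-completion call to your provider is left out:

// Fetch CodeMem context and prepend it to the user's question (illustrative helper).
async function promptWithContext(question: string): Promise<string> {
  const res = await fetch('http://localhost:8432/api/v1/query', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: question, options: { top_k: 6 } }),
  });
  const { context } = (await res.json()) as { context: { assembled_text: string } };
  return `${context.assembled_text}\n\nQuestion: ${question}`;
}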
MCP Server (coming soon)
A Model Context Protocol server is planned for native Claude Desktop and Cursor integration — CodeMem will be called automatically when codebase context is needed.
Performance
| Operation | Typical time |
|---|---|
| codemem init — first run (model download) | 30–90s |
| codemem init — model cached, ~50 files | 60–180s |
| codemem start — load model + index into memory | 5–15s |
| Semantic query | 30–300ms (depends on index size) |
| Incremental re-index — 1 file changed | < 2s |
| Full re-index — 50 files | 60–180s |
Query time stays low regardless of codebase size because search is vector similarity (sub-linear). Indexing time scales linearly with file count.
Supported Languages
| Language | Extensions | Chunk strategy |
|---|---|---|
| TypeScript | .ts, .tsx | Functions, classes, interfaces, arrow functions |
| JavaScript | .js, .jsx, .mjs | Functions, classes, arrow functions |
| Python | .py | Functions (def), classes |
| Rust | .rs | Functions (fn), structs, impl blocks |
| Go | .go | Functions (func) |
| Java | .java | Methods, classes |
| Ruby | .rb | Methods (def), classes |
| Other | .md, .json, .yaml, etc. | Paragraph / section splits |
When NOT to Use CodeMem
CodeMem is designed for real codebases. It is overkill or unhelpful for:
- Very small projects (< 5 files) — just paste the files directly
- One-off scripts — not worth indexing something you run once
- Non-code content — large datasets, binary assets, log files
- Frequently generated files — build artifacts, compiled output (add to .codememignore)
Project Structure
codemem/
├── src/
│ ├── index.ts ← CLI entry point (Commander)
│ ├── types/
│ │ ├── chunk.ts ← Chunk, ChunkHeader, VectraMetadata types
│ │ ├── config.ts ← CodeMemConfig, DEFAULT_CONFIG
│ │ └── query.ts ← QueryOptions, QueryResult, IndexResult
│ ├── utils/
│ │ ├── logger.ts ← Structured JSON logger
│ │ ├── hash.ts ← SHA-256 file change detection
│ │ ├── tokens.ts ← Token estimation + budget trimming
│ │ └── ignore.ts ← .gitignore / .codememignore filter
│ ├── parsers/
│ │ └── regex-parser.ts ← Language-aware semantic chunker
│ ├── core/
│ │ ├── embedder.ts ← @xenova/transformers WASM wrapper
│ │ ├── indexer.ts ← Full + incremental parallel indexing
│ │ ├── retriever.ts ← Hybrid search + structured context assembly
│ │ ├── file-watcher.ts ← chokidar-based live watcher
│ │ └── project-analyzer.ts ← Language/framework/entrypoint detection
│ ├── storage/
│ │ ├── vectra-store.ts ← Vectra LocalIndex wrapper
│ │ ├── config-store.ts ← .codemem/config.json CRUD
│ │ └── meta-store.ts ← File hashes, stats, recent changes
│ ├── server/
│ │ ├── http-server.ts ← Fastify server + watcher integration
│ │ ├── routes/
│ │ │ ├── query.ts ← POST /api/v1/query, /index, /update
│ │ │ └── status.ts ← GET /api/v1/status, /stats, /config, /health
│ │ └── middleware/
│ │ └── error-handler.ts ← Fastify error handler
│ └── cli/
│ ├── ui.ts ← Terminal UI (chalk, boxen, ora)
│ └── commands/
│ ├── init.ts ← codemem init
│ ├── start.ts ← codemem start
│ ├── stop.ts ← codemem stop
│ ├── status.ts ← codemem status
│ ├── stats.ts ← codemem stats
│ ├── search.ts ← codemem search
│ └── reindex.ts ← codemem reindex
├── node_ort_shim/
│ ├── package.json ← Masquerades as onnxruntime-node
│ └── index.js ← Redirects to onnxruntime-web (WASM)
├── dist/ ← Compiled ESM output (tsup)
├── package.json
├── tsconfig.json
└── tsup.config.ts

Troubleshooting
codemem init fails with "Failed to load model"
Windows: Ensure Node.js v18+ is installed. The model runs via WebAssembly — no Visual C++ Redistributable or GPU drivers needed.
All platforms: Check that ~/.codemem/models/ is writable. Delete it and re-run codemem init to re-download the model.
Server won't start — "address already in use"
Another process is using port 8432. Use a different port:
codemem start --port 9000
# or edit .codemem/config.json → "server": { "port": 9000 }

codemem search says "Is the sidecar running?"
The HTTP server must be running before you can search:
codemem start

Search results are irrelevant
Reindex after major refactors:
codemem reindex --full

.codemem/ folder is large
The index grows with your codebase. To reset completely:
rm -rf .codemem/
codemem init

Privacy & Security
- No telemetry — CodeMem never phones home
- No cloud — the embedding model runs locally via WebAssembly
- Your code never leaves your machine — all vectors are stored in .codemem/ inside your project
- Secrets are hard-excluded — .env, *.key, *.pem, and similar files are never indexed
- .codemem/ is gitignored automatically — codemem init adds it for you
Roadmap
- MCP server for native Claude Desktop and Cursor integration
- VS Code extension with inline context panel
- Tree-sitter AST chunker (more precise than regex)
- Dependency graph awareness — index call chains across files
- codemem doctor — diagnose index health issues
- @xenova/transformers v3 upgrade for faster inference
License
MIT