@crewchief/maproom-mcp
Semantic code search powered by PostgreSQL, pgvector, and your choice of embedding provider.
Fast semantic search. One setup command. One-line config.
Features
- ✨ Choice of Providers - OpenAI (recommended), Google Vertex AI, or local Ollama
- 🚀 Fast Hybrid Search - Vector similarity + full-text search with PostgreSQL
- 🎯 Semantic Ranking ✨ NEW - Implementations rank first, not tests or docs
- 🔄 Auto-Sync - Watch mode keeps your index up-to-date automatically
- 🌿 Automatic Branch Detection ✨ NEW - Auto-index branches on switch (no manual scan needed)
- 📦 Fully Containerized - Everything runs in Docker, isolated and clean
- 🌳 Multi-Language - Tree-sitter parsing for TypeScript, JavaScript, Rust, and more
- 🔒 Privacy Options - Use local Ollama for 100% private embeddings (no API keys)
Semantic Ranking
Maproom now uses semantic entry point ranking to prioritize code implementations over tests and documentation in search results.
The Problem: Traditional full-text search ranks results by keyword frequency. When you search for "authenticate", documentation mentioning the word 20+ times ranks higher than the actual authenticate() function.
The Solution: Semantic ranking applies kind multipliers to boost implementations (2.5×) and demote tests (0.6×) and docs (0.3-0.6×). Exact symbol matches get an additional 3.0× bonus.
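The multiplier scheme can be sketched as a small scoring function. This is an illustrative sketch using the multipliers quoted above; `rankScore`, the kind names, and the base scores are hypothetical, not the package's actual code:

```typescript
type ChunkKind = "function" | "class" | "test" | "doc";

// Multipliers quoted above: implementations boosted, tests/docs demoted.
const KIND_MULTIPLIERS: Record<ChunkKind, number> = {
  function: 2.5,
  class: 2.5,
  test: 0.6,
  doc: 0.3, // docs range 0.3-0.6 depending on type; lowest shown here
};

function rankScore(base: number, kind: ChunkKind, exactSymbolMatch: boolean): number {
  let score = base * KIND_MULTIPLIERS[kind];
  if (exactSymbolMatch) score *= 3.0; // exact symbol match bonus
  return score;
}

// A doc chunk mentioning "authenticate" many times still loses to the function itself:
const docScore = rankScore(10, "doc", false);   // high keyword score, demoted
const fnScore = rankScore(4, "function", true); // lower base, boosted + exact match
console.log(fnScore > docScore); // true
```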
Example
Query: authenticate
Before:
1. Documentation: "Authentication Guide" ← Not helpful
2. Documentation: "User Authentication"
...
8. Function: authenticate() ← What you wanted!

After:
1. Function: authenticate() ← Found immediately! ✨
2. Function: authenticate_user()
3. Class: Authenticator

Performance
- 17% faster on average (p95 latency: 48ms → 40ms)
- 55% of queries improved by >10%
- All queries <100ms p95 latency
Debug Mode
Enable debug mode to see how scores are calculated:
```typescript
const results = await search({
  query: 'authenticate',
  debug: true // Shows score breakdown
})
```
Learn more: See docs/search-ranking.md for complete documentation.
Quick Start
1. Run Setup (First Time Only)
Recommended: OpenAI (fast, low cost)
```shell
export OPENAI_API_KEY=sk-...
npx @crewchief/maproom-mcp setup --provider=openai
```
Alternative: Google Vertex AI (fast, low cost)
```shell
export GOOGLE_PROJECT_ID=my-project
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
npx @crewchief/maproom-mcp setup --provider=google
```
Local: Ollama (slower, no API key needed)
```shell
npx @crewchief/maproom-mcp setup --provider=ollama
```
This will (2-5 minutes on first run):
- Download Docker images
- Download embedding model (Ollama only)
- Initialize PostgreSQL with pgvector
- Validate everything works
Devcontainer Support
The maproom-mcp setup command automatically detects Docker-in-Docker environments (devcontainers) and configures the correct workspace path for volume mounting.
How it works:
- Detects if running inside a Docker container
- Discovers the actual host path where `/workspace` is mounted
- Automatically sets `WORKSPACE_HOST_PATH` before starting containers
- No manual configuration required
Supported environments:
- VS Code devcontainers
- GitHub Codespaces
- Cursor devcontainers
- Local Docker Desktop
Manual override (if needed):
```shell
export WORKSPACE_HOST_PATH=/path/to/workspace
npx @crewchief/maproom-mcp setup --provider=openai
```
Troubleshooting:
- If detection fails, manually set `WORKSPACE_HOST_PATH`
- Verify Docker socket access: `docker ps`
- Check container mounts: `docker inspect $(hostname)`
2. Index Your Codebase
Automatic Indexing (Recommended) ✨ NEW
Start the branch watcher to automatically index as you switch branches:
```shell
# Set database URL
export MAPROOM_DATABASE_URL="postgresql://maproom:maproom@localhost:5432/maproom"

# Start watcher (Terminal 1)
maproom branch-watch --repo /path/to/your/repo

# Work normally (Terminal 2) - branches auto-index
git checkout feature-auth  # Automatically indexed in <1 minute
```
The watcher runs continuously and indexes branches automatically when you switch. For more details, see the Automatic Indexing Guide.
Manual Indexing
Alternatively, manually trigger indexing:
With OpenAI:
```shell
MAPROOM_EMBEDDING_PROVIDER=openai npx @crewchief/maproom-mcp scan /path/to/your/repo
```
With Google Vertex AI:
```shell
MAPROOM_EMBEDDING_PROVIDER=google npx @crewchief/maproom-mcp scan /path/to/your/repo
```
With Ollama (local):
```shell
MAPROOM_EMBEDDING_PROVIDER=ollama npx @crewchief/maproom-mcp scan /path/to/your/repo
```
Optional: Auto-sync with watch mode
```shell
MAPROOM_EMBEDDING_PROVIDER=openai npx @crewchief/maproom-mcp watch /path/to/your/repo
```
This keeps your index up-to-date as you edit code. Leave it running in a terminal.
3. Add to MCP Configuration
Claude Code (.claude/mcp.json in your project):
```json
{
  "mcpServers": {
    "maproom": {
      "command": "docker",
      "args": [
        "exec",
        "-i",
        "maproom-mcp",
        "node",
        "/app/dist/index.js"
      ],
      "env": {
        "MAPROOM_EMBEDDING_PROVIDER": "openai",
        "OPENAI_API_KEY": "${OPENAI_API_KEY}"
      }
    }
  }
}
```
Cursor (.cursor/mcp.json in your project):
```json
{
  "mcpServers": {
    "maproom": {
      "command": "docker",
      "args": [
        "exec",
        "-i",
        "maproom-mcp",
        "node",
        "/app/dist/index.js"
      ],
      "env": {
        "MAPROOM_EMBEDDING_PROVIDER": "openai",
        "OPENAI_API_KEY": "${OPENAI_API_KEY}"
      }
    }
  }
}
```
For Google Vertex AI, use:
"env": {
"MAPROOM_EMBEDDING_PROVIDER": "google",
"GOOGLE_PROJECT_ID": "${GOOGLE_PROJECT_ID}",
"GOOGLE_APPLICATION_CREDENTIALS": "${GOOGLE_APPLICATION_CREDENTIALS}"
}For Ollama (local), use:
"env": {
"MAPROOM_EMBEDDING_PROVIDER": "ollama"
}4. Restart Your MCP Client
Restart Claude Code or Cursor to connect to Maproom.
That's it! Use Maproom tools for semantic code search.
Database Setup
Maproom uses a dual-database architecture with separate PostgreSQL instances for development and testing:
- Development Database (port 5433) - For manual work, CLI commands, and MCP operations
- Test Database (port 5434) - Isolated database for automated tests only
Starting Databases
The setup command starts the development database only (automatic via depends_on in docker-compose.yml):
```shell
npx @crewchief/maproom-mcp setup --provider=openai
```
For developers/CI needing test isolation, the test database must be started manually (opt-in):
```shell
cd ~/.maproom-mcp  # or packages/maproom-mcp/config in monorepo
docker compose up -d postgres-test
```
Regular maproom users don't need the test database running.
Running Tests
Tests automatically use the test database (port 5434):
```shell
cd packages/maproom-mcp
pnpm test
```
The test database connection is configured via the TEST_MAPROOM_DATABASE_URL environment variable and defaults to the test database.
Schema Initialization
Both databases require manual schema initialization after first start:
```shell
# Development database
docker exec -i maproom-postgres psql -U maproom -d maproom < ~/.maproom-mcp/init.sql

# Test database
docker exec -i maproom-postgres-test psql -U maproom -d maproom_test < ~/.maproom-mcp/init.sql
```
Need More Details?
See the comprehensive Test Database Setup Guide for:
- Troubleshooting connection issues
- Resetting test database
- Volume management
- CI/CD configuration
- Advanced workflows
System Requirements
- Docker Desktop 4.x+ (Install Docker)
- 4-8 GB RAM available for Docker
- 5 GB disk space (images + model + database)
- Supported OS: macOS, Linux, Windows with WSL2
Verify Docker is running:
```shell
docker --version
docker compose version
```
Provider Comparison
| Provider | Speed | Cost | Setup | Privacy |
|---|---|---|---|---|
| OpenAI | ⚡ Fast | 💵 ~$0.02/1M tokens | API key | ☁️ Cloud |
| Google Vertex AI | ⚡ Fast | 💵 Similar to OpenAI | GCP setup | ☁️ Cloud |
| Ollama | 🐌 Slow* | 💰 Free | None | 🔒 100% Local |
*Ollama is 5-10x slower without GPU. Requires 8GB+ RAM.
Recommendation: Use OpenAI or Google for best performance. Use Ollama only if you need 100% local processing and have good hardware.
Commands
setup
Initial configuration. Required before first use.
```shell
npx @crewchief/maproom-mcp setup --provider=openai
npx @crewchief/maproom-mcp setup --provider=google
npx @crewchief/maproom-mcp setup --provider=ollama
```
scan
Index a repository (run after cloning or major changes).
```shell
npx @crewchief/maproom-mcp scan /path/to/repo
npx @crewchief/maproom-mcp scan .  # Current directory
```
watch
Monitor repository for changes and auto-reindex.
```shell
npx @crewchief/maproom-mcp watch /path/to/repo
npx @crewchief/maproom-mcp watch --debounce=5000  # Custom debounce (ms)
```
Leave running in a terminal. Press Ctrl+C to stop.
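The debounce behavior behind `--debounce` can be sketched as follows: rapid file-change events are coalesced, and reindexing runs only after changes go quiet for the configured window. This is an illustrative helper, not the package's implementation; `reindex` is a hypothetical name:

```typescript
// Generic debounce: restarts the timer on every call, so the wrapped
// function fires once, after `waitMs` of silence.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Usage sketch: many save events within the window trigger one reindex.
let reindexCount = 0;
const reindex = debounce(() => { reindexCount += 1; }, 50); // e.g. 5000 via --debounce
reindex(); reindex(); reindex(); // coalesced; fires once after the quiet period
```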
When to Use Spawning vs Daemon
Maproom uses two execution patterns depending on the operation type:
Use Spawning When:
- One-time operations (scan, upsert single files)
- Startup/initialization tasks
- Operations where spawn overhead (<200ms) is negligible
- Example: Initial workspace scan at startup
Why: Spawning overhead (~100-200ms) is negligible compared to operation time (seconds to minutes for scan).
Use Daemon When:
- Repeated operations (search queries)
- Low-latency requirements (<50ms response time)
- Connection pooling beneficial (reuse database connections)
- Example: MCP server search operations (20-50x faster)
Why: Daemon eliminates spawn overhead for every request, achieving <50ms latency for search.
Current Implementation:
- MCP search tool: Uses daemon (correct - repeated operations)
- MCP upsert tool: Uses spawning (correct - one-time file indexing)
- VSCode scan: Uses spawning (correct - one-time workspace indexing)
- VSCode search (future): Will use MCP daemon via extension API
Performance comparison:
- Spawning: ~100-200ms overhead per operation
- Daemon: <1ms overhead per operation (after initial startup)
When NOT to migrate:
- If operation takes >10 seconds (scan, large upserts), spawn overhead is <2% of total time
- If operation runs once at startup (workspace scan), daemon provides no benefit
Progress Indicators
The scan command now shows real-time progress during indexing, making it easy to track what's happening without slowing down performance.
Scan Command Progress
When you run scan, you'll see:
```
🔍 Scanning worktree: main @ abc12345
Repository: my-repo
Path: /path/to/repo
Processing: 45/100 files (45%)
✅ Completed in 8.3s
📊 Scan Summary:
Files processed: 100
Total chunks: 847
Total size: 2.14 MB
```
Features:
- Real-time progress updates (throttled to every 200-500ms to avoid console flooding)
- File and chunk counts as indexing progresses
- Completion timing prominently displayed
- Works in both TTY (interactive terminal) and non-TTY (CI/logging) environments
Default Directory Behavior:
You don't need to specify . for the current directory - it's the default:
```shell
# These are equivalent:
npx @crewchief/maproom-mcp scan
npx @crewchief/maproom-mcp scan .
npx @crewchief/maproom-mcp scan /path/to/repo  # Or specify a path
```
Verbose Mode
For more detailed output during debugging:
```shell
npx @crewchief/maproom-mcp scan --verbose
```
Currently shows the same output as default mode, but it is reserved for future detailed diagnostics.
Performance
Progress tracking adds minimal overhead (<5%) through:
- Atomic counters for thread-safe updates
- Smart throttling (200ms minimum between updates)
- Efficient TTY detection
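The throttling idea can be sketched as a small reporter that drops updates arriving faster than the minimum interval. An illustrative sketch, not the actual implementation; `makeThrottledReporter` is a hypothetical name:

```typescript
// Progress callbacks are ignored unless at least `minIntervalMs` has
// passed since the last emitted update, preventing console flooding.
function makeThrottledReporter(minIntervalMs: number, emit: (msg: string) => void) {
  let lastEmit = 0;
  return (processed: number, total: number) => {
    const now = Date.now();
    if (now - lastEmit < minIntervalMs) return; // drop noisy updates
    lastEmit = now;
    const pct = total > 0 ? Math.round((processed / total) * 100) : 0;
    emit(`Processing: ${processed}/${total} files (${pct}%)`);
  };
}

const lines: string[] = [];
const report = makeThrottledReporter(200, (m) => lines.push(m));
report(45, 100); // emitted
report(46, 100); // dropped: arrives within 200ms of the previous update
```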
Troubleshooting
"Connection refused" errors to localhost:11434
Problem: OpenAI or Cohere provider attempting to connect to local Ollama endpoint.
Solution: This was a bug in earlier versions (< 1.2.0). Update to the latest version where provider-aware endpoint validation prevents this issue:
```shell
npx @crewchief/maproom-mcp@latest setup --provider=openai
```
The fix ensures cloud providers only use their official endpoints, preventing cross-provider endpoint pollution.
Custom endpoint not used
Problem: You set EMBEDDING_API_ENDPOINT, but the provider uses its default endpoint.
Solution: Ensure the endpoint domain matches your provider:
- OpenAI: Must contain "openai.com"
- Cohere: Must contain "cohere"
- Ollama/Local: Any endpoint accepted
- Google: Ignores `EMBEDDING_API_ENDPOINT` (uses region-based endpoint)
Example of correct custom endpoint:
```shell
# ✅ Correct: OpenAI custom endpoint (contains "openai.com")
export EMBEDDING_API_ENDPOINT=https://api.openai.com/v1/embeddings

# ❌ Wrong: Ollama endpoint for OpenAI provider (ignored)
export EMBEDDING_API_ENDPOINT=http://localhost:11434
```
Database "column updated_at does not exist" errors
Problem: Missing column in database schema.
Solution: Run database migrations. The maproom binary automatically applies migrations on startup:
```shell
npx @crewchief/maproom-mcp setup --provider=<your-provider>
```
Or manually apply migrations by restarting containers:
```shell
docker compose -f ~/.maproom-mcp/docker-compose.yml restart
```
"Setup required!" error
Run the setup command with your chosen provider:
```shell
npx @crewchief/maproom-mcp setup --provider=openai
```
Containers not starting
- Verify Docker is running: `docker info`
- Check for port conflicts: `lsof -i :5433` (PostgreSQL), `lsof -i :11434` (Ollama, if using)
- Re-run setup
Database errors
Reset everything:
```shell
docker compose -f ~/.maproom-mcp/docker-compose.yml down -v
npx @crewchief/maproom-mcp setup --provider=<your-provider>
```
Slow indexing with Ollama
Ollama is CPU-bound without GPU. Consider:
- Using OpenAI or Google instead (much faster)
- Adding a GPU to your system
- Reducing batch size: `EMBEDDING_BATCH_SIZE=10` (slower but lower memory)
Enable diagnostic mode
```shell
MAPROOM_MCP_DEBUG=true npx @crewchief/maproom-mcp setup
```
Data Persistence
All data is stored in Docker volumes:
- `maproom-data` - PostgreSQL database (indexed code + embeddings)
- `ollama-models` - Downloaded Ollama models (if using Ollama)
- `maproom-logs` - MCP server logs
Your indexed code persists between sessions. To completely reset:
```shell
docker volume rm maproom-data ollama-models maproom-logs
```
Database Schema
Core Tables
chunks table - Code chunks with worktree tracking
- `chunk_id` - UUID primary key
- `blob_sha` - Content-addressed SHA (links to embeddings)
- `relpath` - File path relative to repository root
- `symbol_name` - Function/class/symbol name
- `content` - Source code text
- `worktree_ids` - JSONB array of worktree IDs containing this chunk
- `start_line`, `end_line` - Line range in file
- `created_at`, `updated_at` - Timestamps
worktree_index_state table - Tracks last indexed git tree SHA per worktree
- `worktree_id` - Foreign key to worktrees table
- `last_tree_sha` - Git tree SHA from `git rev-parse HEAD^{tree}`
- `last_indexed` - Timestamp of last successful scan
- `chunks_processed` - Cumulative count for monitoring
- `embeddings_generated` - Cost tracking metric
code_embeddings table - Cached embeddings for content deduplication
- `blob_sha` - Primary key (content-addressed)
- `embedding` - Vector embedding (pgvector type)
- `model` - Embedding model name
- `dimension` - Vector dimension
Indexes
GIN index on worktree_ids - Enables efficient worktree filtering
```sql
CREATE INDEX idx_chunks_worktree_ids
ON maproom.chunks USING gin(worktree_ids);
```
Supports JSONB operators:
- `WHERE worktree_ids ? '2'` - Find chunks in worktree 2
- `WHERE worktree_ids ?| ARRAY['2', '5']` - Find chunks in any of multiple worktrees
Branch-Aware Features
Content deduplication: Same code across branches shares single embedding (via blob_sha)
Incremental updates: Tree SHA comparison enables instant "no changes" detection (<100ms)
Worktree filtering: Search code from specific branch/worktree
See also: Branch-Aware Indexing Architecture for complete technical details
Database Connection
The Maproom MCP server uses intelligent connection fallback to detect and connect to the PostgreSQL database automatically.
Connection Priority
The system tries these methods in order:
1. MAPROOM_DATABASE_URL (explicit config) - If set, uses this connection string exactly
```shell
export MAPROOM_DATABASE_URL="postgresql://user:pass@host:port/dbname"
```
2. MAPROOM_DB_HOST (component override) - If MAPROOM_DATABASE_URL is not set, constructs the connection from parts
```shell
export MAPROOM_DB_HOST="custom-host"
export MAPROOM_DB_PORT="5432"  # optional, defaults to 5432
```
3. maproom-postgres (auto-detection) - Attempts to connect to the maproom-postgres hostname
- Works automatically in Docker environments
- No configuration needed if the maproom-postgres container is running (default)
4. localhost:5433 (fallback) - Development fallback for local testing
- Useful for local postgres instances on a non-standard port
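The first two steps of this priority order can be sketched as a pure resolver. This is an illustrative sketch: the environment variable names come from this README, but `resolveDatabaseUrl` and the default credentials used for the constructed URL are assumptions. The last two steps (auto-detection and localhost fallback) involve real connection probing and are shown only as the final candidate:

```typescript
function resolveDatabaseUrl(env: Record<string, string | undefined>): string {
  // 1. Explicit connection string wins outright.
  if (env.MAPROOM_DATABASE_URL) return env.MAPROOM_DATABASE_URL;
  // 2. Component override: build the URL from MAPROOM_DB_* parts.
  if (env.MAPROOM_DB_HOST) {
    const port = env.MAPROOM_DB_PORT ?? "5432"; // optional, defaults to 5432
    return `postgresql://maproom:maproom@${env.MAPROOM_DB_HOST}:${port}/maproom`;
  }
  // 3./4. Auto-detection then localhost:5433 would be probed with real
  // connection attempts; the Docker hostname is the first candidate.
  return "postgresql://maproom:maproom@maproom-postgres:5432/maproom";
}
```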
Troubleshooting Connection Issues
Can't connect to database:
1. Verify maproom-postgres is running: `docker ps | grep maproom-postgres`
2. Start it if needed: `docker compose -f ~/.maproom-mcp/docker-compose.yml up -d`
3. Check the logs: `docker logs maproom-postgres`
Connection refused:
- Verify port 5432 (internal) or 5433 (host) is not blocked
- Check network connectivity: `docker network inspect maproom-network`
Hostname not found:
- Verify you're in correct Docker network
- Try setting MAPROOM_DATABASE_URL explicitly: `export MAPROOM_DATABASE_URL="postgresql://maproom:maproom@127.0.0.1:5433/maproom"`
Custom database setup: If you want to use your own PostgreSQL instance instead of the bundled one:
```shell
export MAPROOM_DATABASE_URL="postgresql://myuser:mypass@myhost:5432/mydb"
npx @crewchief/maproom-mcp scan /path/to/code
```
Advanced Configuration
Custom Database
Override the default database connection:
```json
{
  "mcpServers": {
    "maproom": {
      "command": "docker",
      "args": [
        "exec",
        "-i",
        "maproom-mcp",
        "node",
        "/app/dist/index.js"
      ],
      "env": {
        "MAPROOM_DATABASE_URL": "postgresql://user:pass@custom-host:5432/mydb",
        "MAPROOM_EMBEDDING_PROVIDER": "openai",
        "OPENAI_API_KEY": "${OPENAI_API_KEY}"
      }
    }
  }
}
```
Custom Embedding Models
OpenAI:
"env": {
"MAPROOM_EMBEDDING_PROVIDER": "openai",
"MAPROOM_EMBEDDING_MODEL": "text-embedding-3-large",
"EMBEDDING_DIMENSION": "3072"
}Google:
"env": {
"MAPROOM_EMBEDDING_PROVIDER": "google",
"MAPROOM_EMBEDDING_MODEL": "textembedding-gecko@003"
}Ollama:
"env": {
"MAPROOM_EMBEDDING_PROVIDER": "ollama",
"MAPROOM_EMBEDDING_MODEL": "mxbai-embed-large"
}Batch Size Tuning
Adjust embedding batch size (default: 50):
"env": {
"EMBEDDING_BATCH_SIZE": "100"
}Higher = faster but more memory. Lower = slower but less memory.
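The memory/throughput trade-off comes from grouping chunks into fixed-size batches before each embedding API call. A minimal sketch of that grouping; `toBatches` is an illustrative helper, not the package's code:

```typescript
// Split items into batches of at most `batchSize`; each batch becomes
// one embedding API request, so bigger batches = fewer calls, more memory.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new Error("batchSize must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// 120 chunks with EMBEDDING_BATCH_SIZE=50 → 3 API calls (50, 50, 20).
const batches = toBatches(Array.from({ length: 120 }, (_, i) => i), 50);
```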
Search Tool - Semantic Code Search
New in v2.1.0: The search tool now automatically scopes results to your current git branch, eliminating result duplication and making search results more relevant to your active work.
The search MCP tool performs semantic code search across your indexed codebase using hybrid search (vector similarity + full-text search).
Parameters
| Parameter | Type | Description |
|---|---|---|
| repo | string | Required. Repository name (must match indexed name) |
| query | string | Required. Search query (concept or keywords) |
| worktree | string \| null \| undefined | Optional. Worktree scope: undefined (default) auto-detects the current branch; "branch-name" searches a specific branch; null searches all worktrees |
| limit | number | Optional. Max results (default: 10) |
| mode | string | Optional. Search mode: "vector", "fts", or "hybrid" (default) |
| debug | boolean | Optional. Include ranking details (default: false) |
Search Modes
The search tool supports three modes:
FTS (Full-Text Search): Fast keyword-based search using PostgreSQL FTS
- Best for: Finding specific function names, error messages, exact terms
- Latency: ~50-100ms
- Requires: Indexed repository
Vector (Semantic Search): AI-powered similarity search using embeddings
- Best for: Conceptual queries, "code that does X", finding similar patterns
- Latency: ~100-200ms
- Requires: Indexed repository + generated embeddings (run `generate-embeddings`)
Hybrid (Combined): Merges FTS and vector results with reciprocal rank fusion
- Best for: Most searches - combines precision of FTS with recall of vector
- Latency: ~200-300ms (runs both searches)
- Requires: Same as vector mode
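Reciprocal rank fusion can be sketched as follows. This is a generic RRF sketch, not maproom's implementation: `k = 60` is the conventional RRF constant and an assumption here, and the function name is hypothetical:

```typescript
// Merge two ranked ID lists: each item scores 1/(k + rank) per list it
// appears in, so items ranked well in BOTH lists rise to the top.
function rrfMerge(ftsIds: string[], vectorIds: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [ftsIds, vectorIds]) {
    list.forEach((id, index) => {
      const rank = index + 1; // 1-based rank
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// "b" and "c" appear in both lists, so they outrank single-list hits.
const merged = rrfMerge(["a", "b", "c"], ["b", "c", "d"]);
```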
Examples
```typescript
// Fast keyword search
{ mode: "fts", query: "handleSearch", repo: "crewchief" }

// Semantic similarity search
{ mode: "vector", query: "authentication logic", repo: "crewchief" }

// Best of both worlds
{ mode: "hybrid", query: "error handling patterns", repo: "crewchief" }
```
Mode Selection Guide
- Use FTS when: Looking for specific identifiers, known terms
- Use Vector when: Exploring concepts, finding related code
- Use Hybrid when: Unsure which mode fits, or want comprehensive results
### Worktree-Scoped Search (Auto-Detection)
**Default behavior (v2.1.0+)**: When `worktree` parameter is omitted, the search tool automatically detects your current git branch and scopes results to that branch only.
**Example 1: Auto-detection** (recommended)
```typescript
// In feature-auth branch, searches only feature-auth worktree
const results = await mcp__maproom__search({
repo: "my-repo",
query: "authentication flow"
})
// Returns: { hits: [...], worktree: "feature-auth", auto_detected: true, mode: "auto" }
```

**Example 2: Explicit worktree override**

```typescript
// In feature-auth branch, but search main worktree instead
const results = await mcp__maproom__search({
  repo: "my-repo",
  query: "authentication flow",
  worktree: "main"
})
// Returns: { hits: [...], worktree: "main", auto_detected: false, mode: "explicit" }
```

**Example 3: Search all worktrees**

```typescript
// Search across all indexed branches
const results = await mcp__maproom__search({
  repo: "my-repo",
  query: "authentication flow",
  worktree: null
})
// Returns: { hits: [...], worktree: null, mode: "all" }
```

File Type Filtering
Filter search results by file extension to focus on specific languages or file types.
Single extension:
```typescript
const result = await mcp__maproom__search({
  repo: 'crewchief',
  query: 'authentication',
  filters: { file_type: 'ts' }
})
// Returns only TypeScript (.ts) files
```
Multiple extensions:
```typescript
const result = await mcp__maproom__search({
  repo: 'crewchief',
  query: 'authentication',
  filters: { file_type: 'ts,tsx,js' }
})
// Returns TypeScript or JavaScript files
```
Common patterns:
```typescript
// Search only documentation
filters: { file_type: 'md,mdx' }

// Search Rust code
filters: { file_type: 'rs' }

// Search frontend code
filters: { file_type: 'tsx,jsx,vue,svelte' }

// Combine with recency filter
filters: {
  file_type: 'ts,tsx',
  recency_threshold: '7 days'
}
// Returns recent TypeScript files only
```
Syntax:
- Comma-separated for multiple extensions
- Case insensitive: `"TS"` same as `"ts"`
- With or without dot: `".ts"` same as `"ts"`
- Maximum 20 extensions per filter
Error handling:
- Empty filter (`""`) searches all files (no error)
- Too many extensions (>20) returns an error with a helpful message
- Invalid input normalized or filtered out gracefully
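The normalization rules above can be sketched as a small helper. This is illustrative: `normalizeFileTypes` is a hypothetical name, and only the rules stated in this README (case-insensitive, optional leading dot, max 20, empty entries dropped) are implemented:

```typescript
function normalizeFileTypes(filter: string): string[] {
  const exts = filter
    .split(",")
    .map((e) => e.trim().toLowerCase().replace(/^\./, "")) // ".TS" → "ts"
    .filter((e) => e.length > 0); // empty entries dropped, not an error
  if (exts.length > 20) {
    throw new Error(`Too many extensions (${exts.length}); maximum is 20`);
  }
  return exts;
}
```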
Fallback Behavior
When auto-detection is enabled but the current branch is not indexed, the search tool gracefully falls back to the main worktree with a helpful hint:
```typescript
// In unindexed feature-xyz branch
const results = await mcp__maproom__search({
  repo: "my-repo",
  query: "authentication"
})
// Returns:
{
  hits: [...], // Results from 'main' worktree
  worktree: "main",
  mode: "fallback",
  hint: "Current branch 'feature-xyz' is not indexed.\n\n" +
        "To search your current code:\n" +
        "1. Run: mcp__maproom__scan({repo: \"my-repo\", worktree: \"feature-xyz\"})\n\n" +
        "Searching 'main' worktree instead."
}
```
If the main worktree is also not indexed, the tool falls back to searching all worktrees.
Result Metadata
Search results include metadata about worktree resolution:
| Field | Type | Description |
|---|---|---|
| hits | array | Search results with content, file paths, and scores |
| total | number | Total number of results returned |
| worktree | string \| null | Which worktree was searched |
| auto_detected | boolean | Was the worktree auto-detected from git? |
| mode | string | Resolution mode: "explicit", "auto", "fallback", or "all" |
| hint | string \| undefined | Helpful message when fallback occurs |
| debug | object \| undefined | Ranking details (only if debug: true) |
Performance
- Cache hit rate: >99% for git branch detection (60s TTL)
- Search latency: <10ms with warm cache
- Memory overhead: Minimal (<100 KB for LRU caches)
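A 60-second TTL cache like the branch-detection numbers above imply can be sketched as follows. This is an illustrative data structure, not maproom's code; the class name and the explicit `now` parameter (used to make it testable) are assumptions:

```typescript
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlMs: number) {}

  // Returns the cached value, or undefined when missing or expired.
  get(key: string, now = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= now) return undefined;
    return entry.value;
  }

  set(key: string, value: V, now = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

// Usage sketch: cache the detected branch per repo path, so git is
// consulted at most once per 60-second window.
const branchCache = new TtlCache<string>(60_000);
branchCache.set("/path/to/repo", "main", 0);
```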
Troubleshooting
See Troubleshooting section for common issues.
Open Tool - File Retrieval
The open MCP tool retrieves file contents from your indexed codebase with intelligent path resolution and security validation.
Multi-Candidate Fallback
When multiple worktrees exist with the same name (common after repeated indexing), the open tool automatically tries each candidate in order:
- Queries database for all matching worktrees (ordered by most recent ID first)
- Validates each candidate path against the filesystem
- Returns content from the first valid worktree found
This gracefully handles database pollution from:
- Repeated indexing from different working directories
- Repository moves or renames
- Stale database entries
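The candidate-selection step above can be sketched as a tiny pure function. This is a hypothetical sketch: the `Candidate` shape and `pickFirstValid` name are illustrative, and the filesystem check is injected so the idea stands alone:

```typescript
type Candidate = { worktreeId: number; absPath: string };

// Candidates are assumed pre-sorted newest-first (ORDER BY id DESC);
// the first path that actually exists on disk wins.
function pickFirstValid(
  candidates: Candidate[],
  pathExists: (p: string) => boolean,
): Candidate | undefined {
  return candidates.find((c) => pathExists(c.absPath));
}

// Stale entry (id 9) is skipped; the older-but-valid entry (id 3) is used.
const found = pickFirstValid(
  [{ worktreeId: 9, absPath: "/stale/clone" }, { worktreeId: 3, absPath: "/real/clone" }],
  (p) => p === "/real/clone",
);
```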
Security Features
Path Traversal Protection:
- Validates all relative paths before filesystem access
- Rejects paths containing `../`, absolute paths, or null bytes
- Prevents access outside repository boundaries
Symlink Validation:
- Detects symlinks using `fs.lstat()` before reading
- Resolves symlink targets with `fs.realpath()`
- Blocks symlinks pointing outside repository boundaries
- Allows legitimate internal symlinks (e.g., shared configs)
File Type Checking:
- Only returns content for regular files
- Directories and special files are rejected
- The `fileExists()` helper validates both readability and file type
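The input-validation rules can be sketched as a standalone check. This is illustrative only: `validateRelPath` is a hypothetical name, and the real implementation additionally resolves symlinks and repository boundaries on the filesystem:

```typescript
// Reject the three invalid-input classes described above:
// null bytes, absolute paths, and parent-directory traversal.
function validateRelPath(relpath: string): void {
  if (relpath.includes("\0")) {
    throw new Error("Null bytes not allowed in path");
  }
  if (relpath.startsWith("/")) {
    throw new Error("Absolute paths not allowed");
  }
  if (relpath.split("/").includes("..")) {
    throw new Error(`Path traversal detected: ${relpath}`);
  }
}

validateRelPath("src/index.ts"); // ok: relative, no traversal
```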
Error Messages
| Error Message | Meaning | Recommended Action |
|---|---|---|
| File exists in other worktrees: main, develop | File not found in the specified worktree but exists in others | Check worktree parameter spelling or use a suggested worktree |
| File 'X' not found in worktree 'Y' | No matching database entry | Ensure the repository is indexed and the file path is correct |
| File 'X' not accessible in worktree 'Y'. Tried N candidates... | Database pollution detected - multiple entries but none valid on disk | Run `maproom db cleanup-stale` to remove stale entries |
| Path traversal detected: ../../../etc/passwd | Security violation in input | Use relative paths only, no parent directory references |
| Path is outside repository boundaries | Symlink or resolved path escapes the repo | Check symlink targets or file paths |
| Null bytes not allowed in path | Invalid characters in the path parameter | Remove null bytes from the file path |
Troubleshooting
Issue: "Tried N candidate paths but none exist on disk"
This indicates database pollution - the database has multiple entries for the same worktree name, but none correspond to valid paths on the filesystem.
Diagnosis:
```shell
# Check for duplicate worktree entries
docker exec -it maproom-postgres psql -U maproom -d maproom -c \
  "SELECT w.name, w.abs_path, COUNT(*)
   FROM maproom.worktrees w
   GROUP BY w.name, w.abs_path
   HAVING COUNT(*) > 1;"
```
Solution:
```shell
# Clean up stale database entries
maproom db cleanup-stale
```
Issue: File not found but the file definitely exists
Diagnosis:
- Verify the repository is indexed: check `maproom status` output
- Verify the worktree name: the `worktree` parameter must match the database entry exactly
- Check the file path: it must be relative to the repository root
Issue: Symlink outside repository
Diagnosis:
```shell
# Check where the symlink points
readlink /path/to/symlink

# Verify it's within repo boundaries
# (the target should start with the repository root path)
```
Solution:
- Move symlink target inside repository, or
- Access target file directly instead of via symlink
Path Resolution Flow
```
1. Input Validation
   ├─ Reject path traversal (../)
   ├─ Reject absolute paths (/)
   └─ Reject null bytes (\0)
2. Database Query
   └─ SELECT all matching (worktree, relpath) pairs
      ORDER BY worktree.id DESC
3. Multi-Candidate Validation
   ├─ For each candidate:
   │  ├─ Check filesystem existence
   │  ├─ Validate within repo boundaries
   │  └─ Return if valid
   └─ Error if all candidates fail
4. Security Checks
   ├─ Detect symlinks (fs.lstat)
   ├─ Validate symlink target (validateWithinRepo)
   └─ Verify file type (stats.isFile)
5. Content Retrieval
   └─ Read file with size limit validation
```
Environment Variables
Provider Configuration
- `MAPROOM_EMBEDDING_PROVIDER`: (Required) One of: openai, cohere, google, ollama, local
- `MAPROOM_EMBEDDING_MODEL`: (Required) Model name for the provider
- `EMBEDDING_DIMENSION`: (Required) Vector dimension for embeddings
- `EMBEDDING_API_ENDPOINT`: (Optional) Custom endpoint override
Endpoint Configuration
Cloud Providers (OpenAI, Cohere):
- Use official endpoints by default (https://api.openai.com/v1/embeddings, etc.)
- `EMBEDDING_API_ENDPOINT` is only used if its domain matches the provider
- Example: Setting `EMBEDDING_API_ENDPOINT=http://localhost:11434` for OpenAI is ignored
Ollama:
- Defaults to `http://localhost:11434/api/embed`
- Set `EMBEDDING_API_ENDPOINT` for a custom Ollama server location
Google Vertex AI:
- Endpoint constructed from `GOOGLE_VERTEX_REGION` (e.g., us-west1)
- `EMBEDDING_API_ENDPOINT` is ignored
Local Provider:
- Requires `EMBEDDING_API_ENDPOINT` to be set explicitly
Environment Variable Precedence
1. Explicit configuration in code (if applicable)
2. `EMBEDDING_API_ENDPOINT` environment variable (validated by provider)
3. Provider-specific default endpoint
API Keys
- `OPENAI_API_KEY`: For the OpenAI provider
- `COHERE_API_KEY`: For the Cohere provider
- `GOOGLE_APPLICATION_CREDENTIALS`: For Google Vertex AI
License
MIT - See LICENSE file for details.