Package Exports
- mongodocs-mcp
- mongodocs-mcp/dist/index.js
This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (mongodocs-mcp) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
Readme
mongodocs-mcp
A Model Context Protocol (MCP) server that transforms any GitHub repository into searchable vector embeddings, enabling semantic search across codebases and documentation through IDE integration.
Architecture
The system implements a three-phase indexing pipeline with smart change detection:
Repository → Git Clone → Smart Chunking → Vector Embeddings → MongoDB Atlas
↓ ↓ ↓ ↓
Hash Tracking Semantic Split voyage-context-3 Vector SearchCore Components
- Indexer (
src/core/indexer.ts): Git-based change detection using commit hashes - Semantic Chunker (
src/core/semantic-chunker.ts): Multi-strategy content splitting - Embedding Service (
src/core/embeddings.ts): Voyage AI integration with batching - Storage Service (
src/core/storage.ts): MongoDB Atlas vector operations - Search Service (
src/core/search.ts): Vector, hybrid RRF, and MMR algorithms - MCP Server (
src/index.ts): Protocol implementation for IDE integration
Installation
Global Package
npm install -g mongodocs-mcpFrom Source
git clone https://github.com/yourusername/mongodocs-mcp.git
cd mongodocs-mcp
npm install
npm run build
npm linkSetup
1. MongoDB Atlas
Create free M0 cluster at cloud.mongodb.com:
# Database structure
Database: mongodb_semantic_docs
Collection: documents
# Connection string format
mongodb+srv://username:password@cluster.mongodb.net/?retryWrites=true&w=majorityNetwork Access Configuration:
- Navigate to Network Access → Add IP Address
- Add
0.0.0.0/0for development (restrict in production)
Vector Search Index Creation:
- Go to Atlas Search → Create Index
- Select "JSON Editor"
- Paste configuration:
{
"mappings": {
"dynamic": true,
"fields": {
"embedding": {
"type": "knnVector",
"dimensions": 1024,
"similarity": "cosine"
}
}
}
}Name: vector_index
2. Voyage AI
Get API key from voyageai.com:
- Model:
voyage-context-3 - Dimensions: 1024
- Context window: 32,000 tokens
- Rate limit: 2000 RPM
3. Environment Configuration
Create .env file:
# Required
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/?retryWrites=true&w=majority
VOYAGE_API_KEY=pa-your-api-key
# Optional
GITHUB_TOKEN=ghp_your_token # For private reposUsage
Web Interface
# Start web UI
npm run web
# Opens http://localhost:3000
# 4-step wizard:
# 1. Configure APIs
# 2. Select repositories
# 3. Review MCP setup
# 4. Start processingCommand Line
# Index repositories (smart mode - only changed files)
npm run index
# Force complete rebuild
npm run rebuild
# Monitor indexing progress
npm run progress
# Database statistics
npm run stats
# Clean database
npm run cleanProgrammatic API
import { Indexer } from 'mongodocs-mcp';
const config = {
repositories: [{
name: 'My Documentation',
repo: 'owner/repository',
branch: 'main',
product: 'custom-my-docs'
}],
embedding: {
model: 'voyage-context-3',
dimensions: 1024,
chunkSize: 1000,
chunkOverlap: 200
}
};
const indexer = new Indexer(config);
indexer.onProgress((progress) => {
console.log(`${progress.phase}: ${progress.current}/${progress.total}`);
});
await indexer.index();MCP Integration
Claude Desktop
File: ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"mongodocs": {
"command": "npx",
"args": ["mongodocs-mcp"],
"env": {
"MONGODB_URI": "your-connection-string",
"VOYAGE_API_KEY": "your-api-key"
}
}
}
}Cursor IDE
File: .cursor/mcp_settings.json
{
"mcpServers": {
"mongodocs": {
"command": "npx",
"args": ["mongodocs-mcp"],
"env": {
"MONGODB_URI": "your-connection-string",
"VOYAGE_API_KEY": "your-api-key"
}
}
}
}Restart IDE after configuration.
Search Methods
1. Hybrid RRF Search (Primary)
Reciprocal Rank Fusion combining vector and keyword search:
// Weight configuration
vectorWeight: 0.7
keywordWeight: 0.3
// Ranking formula
score = 1 / (k + rank) where k = 602. MMR Search (Diversity)
Maximum Marginal Relevance for result diversity:
// Parameters
fetchK: 20 // Initial candidates
lambdaMult: 0.7 // Relevance vs diversity
limit: 5 // Final results
// Algorithm
MMR = λ * Sim(Di, Q) - (1-λ) * max Sim(Di, Dj)3. Pure Vector Search
Cosine similarity search:
// Configuration
numCandidates: 40 // 7.5x faster than default 300
limit: 10Technical Implementation
Semantic Chunking
Three-strategy approach with statistical analysis:
1. Interquartile Method
// Calculate sentence distances
distances = sentences.map(embed).map(cosineDistance)
// Find breakpoints at quartile boundaries
Q1, Q3 = quartiles(distances)
threshold = Q3 + 1.5 * (Q3 - Q1)2. Gradient Method
// Identify semantic transitions
gradients = distances.map(derivative)
breakpoints = gradients.filter(g => g > threshold)3. Hybrid Scoring
score = 0.6 * interquartile + 0.4 * gradient
// Adaptive to content typeChunk Optimization
const CHUNK_CONFIG = {
base: 1000, // Target size
min: 100, // Prevent empty
max: 2500, // Respect limits
overlap: 200, // Context preservation
// Token validation
maxTokens: 6000, // voyage-context-3 safety
tokenizer: 'cl100k_base'
};Smart Indexing
Repository state tracking:
// Check existing hash
const existingHash = await storage.getRepositoryHash(repo.name);
const currentHash = await git.getLatestCommit();
if (existingHash === currentHash) {
console.log('✅ Repository up to date, skipping...');
return;
}
// Process only changed files
const changedFiles = await git.diff(existingHash, currentHash);
await processFiles(changedFiles);
await storage.updateRepositoryHash(repo.name, currentHash);Error Handling
// Exponential backoff with jitter
const delay = Math.min(1000 * Math.pow(2, attempt), 30000);
const jitter = Math.random() * 1000;
await sleep(delay + jitter);
// Token limit handling
if (error.message.includes('32000 tokens')) {
// Split chunk and retry
const subChunks = emergencySplit(chunk);
return processSubChunks(subChunks);
}Performance Characteristics
Indexing Metrics
- Processing rate: 100-150 docs/hour (Voyage API limited)
- Batch size: 32 documents optimal
- Memory usage: <500MB peak
- Network bandwidth: ~10MB/hour
Search Performance
- Latency: <100ms p99
- Throughput: 1000+ QPS
- Index size: ~1.5KB per chunk
- Cache TTL: 5 minutes
Storage Efficiency
// Document structure (avg 1.5KB)
{
_id: ObjectId,
title: string, // 50 bytes
content: string, // 1000 bytes
embedding: float[1024], // 4KB compressed
metadata: { // 200 bytes
file: string,
repo: string,
product: string,
indexedAt: Date
}
}Repository Configuration
Default Repositories
const repositories = [
{
name: 'MongoDB Documentation',
repo: 'mongodb/docs',
branch: 'master',
product: 'mongodb-docs',
priority: 10
},
// Add custom repositories...
];Custom Repository
{
name: 'Your Documentation',
repo: 'owner/repository',
branch: 'main',
product: 'custom-your-docs',
// Optional filters
include: ['docs/**/*.md'],
exclude: ['**/node_modules/**'],
// Processing options
chunkSize: 1500,
chunkOverlap: 300
}Development
Build Pipeline
# Development with watch
npm run dev
# Production build
npm run build
# Type checking
npm run typecheck
# Linting
npm run lint
# Testing
npm testProject Structure
src/
├── core/
│ ├── indexer.ts # Orchestration
│ ├── semantic-chunker.ts # Content splitting
│ ├── embeddings.ts # Vector generation
│ ├── storage.ts # Database operations
│ └── search.ts # Query algorithms
├── config/
│ └── index.ts # Repository definitions
├── web/
│ ├── server.ts # Express server
│ ├── coordinator.ts # Web orchestration
│ └── templates/ # HTML interfaces
└── index.ts # MCP server
dist/ # Compiled output
.repos/ # Cloned repositoriesKey Dependencies
{
"mongodb": "^6.10.0", // Native driver
"voyageai": "^0.0.1-5", // Embeddings
"@modelcontextprotocol/sdk": "^1.0.0", // MCP
"js-tiktoken": "^1.0.15", // Tokenization
"simple-git": "^3.27.0" // Repository ops
}Troubleshooting
Connection Issues
# Test MongoDB connection
node -e "
const { MongoClient } = require('mongodb');
MongoClient.connect(process.env.MONGODB_URI)
.then(() => console.log('✅ Connected'))
.catch(err => console.error('❌', err.message));
"
# Test Voyage AI
curl -X POST https://api.voyageai.com/v1/embeddings \
-H "Authorization: Bearer $VOYAGE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"input": ["test"], "model": "voyage-context-3"}'Index Issues
# Verify vector index
mongosh $MONGODB_URI --eval "
db.documents.getSearchIndexes()
"
# Check document structure
mongosh $MONGODB_URI --eval "
db.documents.findOne()
"Performance Tuning
// Adjust for your use case
const tuning = {
// Smaller batches for memory constraints
batchSize: 16,
// More candidates for precision
numCandidates: 100,
// Larger chunks for context
chunkSize: 2000,
// Disable for speed
smartIndexing: false
};Best Practices
Security
- Store credentials in environment variables
- Use least-privilege MongoDB user
- Rotate API keys regularly
- Enable MongoDB audit logging
Optimization
- Index during off-peak hours
- Use incremental updates
- Monitor token usage
- Cache frequent queries
Scaling
- Horizontal sharding for large corpuses
- Read replicas for search traffic
- CDN for static assets
- Queue system for processing
Contributing
Pull requests welcome. Please ensure:
- TypeScript strict mode compliance
- Test coverage >80%
- Conventional commits
- Documentation updates
License
MIT
Support
- Issues: GitHub
- Discussions: GitHub Discussions
- MCP Spec: modelcontextprotocol.io
Built with MongoDB Atlas vector search and Voyage AI embeddings.