MCP Codebase Index Server

AI-powered semantic search for your codebase in GitHub Copilot, Kiro, and other MCP-compatible editors

A Model Context Protocol (MCP) server that enables AI editors to search and understand your codebase using Google's Gemini embeddings and Qdrant vector storage.

Supported Editors:

✅ VS Code with GitHub Copilot
✅ VS Code with Roo Cline
✅ GitHub Copilot CLI
✅ Google Gemini CLI
✅ Kiro AI Editor
✅ Any MCP-compatible editor

🚀 Getting Started

📖 Full Documentation - Complete documentation
⚙️ Setup Guide - VS Code - Installation for VS Code Copilot
🖥️ Setup Guide - CLI - Installation for GitHub Copilot CLI
🤖 Setup Guide - Gemini CLI - Installation for Google Gemini CLI
🎯 Setup Guide - Kiro - Installation for Kiro AI Editor
🦘 Setup Guide - Roo Cline - Installation for Roo Cline (VS Code)
⚡ Quick Reference - Command cheat sheet
🗺️ Navigation Guide - Find any doc quickly

💻 For Developers

Source Code Structure - Code organization
MCP Server Guide - Build your own MCP server
Bootstrap Guide - Auto-generate memory entities
Memory Integration - Memory system overview (v3.2)
Memory Quick Reference - Memory cheat sheet
Roadmap - Future plans

🔧 Resources

Qdrant Setup - Get Qdrant credentials
Testing Guide - Test search functionality
Prompt Enhancement Guide - Use prompt enhancement effectively
Vector Visualization Guide - Visualize your codebase
Changelog - Version history

✨ Features

🔍 Smart Code Search

Semantic Search - Find code by meaning, not just keywords
Multi-language Support - Works with 15+ programming languages
Real-time Watch - Auto-updates index when files change
Incremental Indexing - 90%+ faster by only indexing changed files
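
Incremental indexing comes down to comparing what was indexed last time with what is on disk now. The following is a minimal sketch, assuming each file is summarized by a content fingerprint (for example a hash). The `FileManifest` and `planIncrementalIndex` names and the sample paths are hypothetical, and the package's real change detection may work differently.

```typescript
// Maps a file path to a fingerprint of its content (for example a hash).
type FileManifest = Record<string, string>;

interface IndexPlan {
  toIndex: string[];  // new or modified files that need (re)embedding
  toRemove: string[]; // files that disappeared since the last run
  unchanged: number;  // files skipped entirely
}

// Own-property check, so inherited keys such as "toString" never match.
function hasOwn(obj: FileManifest, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function planIncrementalIndex(previous: FileManifest, current: FileManifest): IndexPlan {
  const result: IndexPlan = { toIndex: [], toRemove: [], unchanged: 0 };
  for (const path of Object.keys(current)) {
    if (hasOwn(previous, path) && previous[path] === current[path]) {
      result.unchanged++;
    } else {
      result.toIndex.push(path);
    }
  }
  for (const path of Object.keys(previous)) {
    if (!hasOwn(current, path)) {
      result.toRemove.push(path);
    }
  }
  return result;
}

// Illustrative data: one untouched file, one edit, one deletion, one new file.
const lastRun: FileManifest = { "src/a.ts": "f1", "src/b.ts": "f2", "src/old.ts": "f3" };
const thisRun: FileManifest = { "src/a.ts": "f1", "src/b.ts": "f2-edited", "src/new.ts": "f4" };
const plan = planIncrementalIndex(lastRun, thisRun);
console.log(plan);
```

Only the files in `toIndex` would need new embeddings, which is where the time saving on later runs comes from.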

🧠 Memory System

Auto-Bootstrap - Generate 50+ entities in 3-5 minutes from your codebase
Web UI - Interactive D3.js graph visualization at localhost:3001
5 MCP Tools - bootstrap_memory, search_memory, open_memory_ui, close_memory_ui, check_memory_sync
Health Monitoring - Automatic sync checks and orphaned vector cleanup
Fast & Efficient - 2.8-6.0x speedup with parallel processing, <$0.01 per project

🎯 Advanced Capabilities

Vector Visualization - See your codebase in 2D/3D space
Prompt Enhancement - AI-powered query improvement (optional)
Simple Setup - Just 4 environment variables to get started

🚀 Quick Start

Prerequisites

Gemini API Key - Get free at Google AI Studio
Qdrant Cloud Account - Sign up free at cloud.qdrant.io

Installation

Choose your environment:

VS Code Users: Follow the steps below or see Roo Cline Setup

Copilot CLI Users: See Copilot CLI Setup Guide

Gemini CLI Users: See Gemini CLI Setup Guide

Kiro Users: See Kiro Setup Guide

Step 1: Open MCP Configuration in VS Code

Open GitHub Copilot Chat (Ctrl+Alt+I / Cmd+Alt+I)
Click Settings icon → MCP Servers → MCP Configuration (JSON)

Step 2: Add this configuration to mcp.json:

{
  "servers": {
    "codebase": {
      "command": "npx",
      "args": ["-y", "@ngotaico/mcp-codebase-index"],
      "env": {
        "REPO_PATH": "/absolute/path/to/your/project",
        "GEMINI_API_KEY": "AIzaSyC...",
        "QDRANT_URL": "https://your-cluster.gcp.cloud.qdrant.io:6333",
        "QDRANT_API_KEY": "eyJhbGci..."
      },
      "type": "stdio"
    }
  }
}

Step 3: Restart VS Code

The server will automatically:

Connect to Qdrant Cloud
Index your codebase
Watch for file changes

📖 Detailed instructions:

📖 Usage

Search Your Codebase

Ask GitHub Copilot:

"Find the authentication logic"
"Show me how database connections are handled"
"Where is error logging implemented?"

Visualize Your Codebase

Ask GitHub Copilot:

"Visualize my codebase"
"Show me how my code is organized"
"Visualize authentication code"

📖 Complete guide: Vector Visualization Guide

Memory Management (AI Chat + Web UI Only)

Bootstrap via AI:

"Bootstrap memory for this codebase"

Auto-generates 50+ entities in 3-5 minutes via MCP tool.

Search via AI:

"Search memory for authentication entities"
"Find recent bugfixes in memory"

Visual exploration:

"Open memory UI"

Opens Web UI at http://localhost:3001 with:

📊 D3.js graph visualization
🔍 Real-time search & filters
📈 Statistics dashboard
🖱️ Click nodes for details

What it does:

✅ Extracts code structure via AST parsing (0 tokens, 549 files/sec)
✅ Detects patterns via clustering (0 tokens, 464 vectors/sec)
✅ Analyzes complex code with Gemini AI (95.6% confidence)
✅ Generates 50+ entities in 3-5 minutes for large projects
✅ Token efficient: <100k tokens (~$0.01 cost)

Performance:

Fast: 549 files/sec AST, 464 vectors/sec clustering
Quality: 95.6% AI confidence average
Cheap: <100k tokens for 500-file project

📖 Complete guide: Bootstrap Guide

Check Indexing Status

"Check indexing status"
"Show me detailed indexing progress"

📖 More examples: Testing Guide

📊 Vector Visualization

See your codebase in 2D/3D space - Understand semantic relationships and code organization visually.

What is Vector Visualization?

Vector visualization transforms your codebase's 768-dimensional embeddings into interactive 2D or 3D visualizations using UMAP dimensionality reduction. This allows you to:

🎨 Explore semantic relationships - Similar code clusters together
🔍 Understand architecture - See your codebase structure at a glance
🎯 Debug search results - Visualize why certain code was retrieved
📈 Track code organization - Identify modules, patterns, and outliers

Quick Start

Visualize entire codebase:

User: "Visualize my codebase"

Result: Interactive clusters showing:
- API Controllers & Routes (28%)
- Database Models (23%)
- Authentication (19%)
- Business Logic (18%)
- Test Suites (12%)

Export as HTML:

User: "Export visualization as HTML"

Result: Standalone HTML file with:
- Interactive hover, zoom, pan
- Click clusters to highlight
- Modern gradient UI
- Works offline

Understanding the Visualization

Colors and Clusters:

Each color represents a semantic cluster (module/functionality)
Points close together = similar in meaning
Distance reflects semantic similarity
Outliers indicate unique/specialized code
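
The idea that distance reflects semantic similarity can be made concrete with a small sketch. It assumes cosine similarity as the metric, which is a common choice for embedding search but is not confirmed by this README. The function name and the toy 3-dimensional vectors are illustrative only; the real embeddings are 768-dimensional.

```typescript
// Cosine similarity between two embedding vectors.
// Assumption: cosine is the metric; the README does not say which one is used.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("a zero vector has no direction");
  }
  return dot / Math.sqrt(normA * normB);
}

// Toy vectors standing in for real embeddings (hypothetical values).
const loginHandler = [0.9, 0.1, 0.0];
const tokenCheck = [0.8, 0.2, 0.1];
const cssReset = [0.0, 0.1, 0.9];

const near = cosineSimilarity(loginHandler, tokenCheck);
const far = cosineSimilarity(loginHandler, cssReset);
console.log(near > far); // related code scores higher than unrelated code
```

A score close to 1 corresponds to points that sit close together in the plot; a score near 0 corresponds to points in different clusters.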

Common Cluster Patterns:

Blue: Frontend/UI components
Orange: API endpoints and routes
Green: Database models and queries
Red: Authentication and security
Purple: Tests and validation
Gray: Utilities and helpers

Use Cases

🏗️ Architecture Understanding
- Visualize to see module boundaries
- Identify tightly coupled code
- Find opportunities for refactoring
🔍 Code Discovery
- Locate related functionality visually
- Find all code touching a feature
- Discover cross-cutting concerns
🐛 Search Debugging
- Understand why results were retrieved
- See semantic relationships
- Refine queries based on visualization
👥 Team Onboarding
- Export HTML for new developers
- Visual guide to codebase structure
- Interactive exploration tool
✅ Refactoring Validation
- Visualize before/after refactoring
- Verify improved code organization
- Track architecture evolution

Performance

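| Collection Size | Processing Time | Recommended maxVectors |
|-----------------|-----------------|------------------------|
| Small (<500 vectors) | ~1s | 500 |
| Medium (500-2K) | ~4s | 1000 |
| Large (2K-10K) | ~15s | 2000 |
| Very Large (>10K) | ~30s | 3000 |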

Tips:

Use 2D for faster processing (40% faster than 3D)
Limit maxVectors for large codebases
Export HTML for offline exploration
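
The sizing table above can be expressed as a small lookup. This is a sketch only: `recommendedMaxVectors` is a hypothetical helper, not part of the package, and the handling of the exact boundary values is an assumption because the table leaves them ambiguous.

```typescript
// Maps a collection size to the maxVectors value suggested in the table.
// Assumption: 500 and 2,000 count as Medium, and 10,000 counts as Large.
function recommendedMaxVectors(vectorCount: number): number {
  if (!isFinite(vectorCount) || vectorCount < 0 || Math.floor(vectorCount) !== vectorCount) {
    throw new Error("vectorCount must be a non-negative integer");
  }
  if (vectorCount < 500) return 500;     // Small (<500 vectors)
  if (vectorCount <= 2000) return 1000;  // Medium (500-2K)
  if (vectorCount <= 10000) return 2000; // Large (2K-10K)
  return 3000;                           // Very Large (>10K)
}

console.log(recommendedMaxVectors(7500)); // a Large collection
```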

📖 Learn More

For detailed documentation including:

Complete tool reference
Interpretation guide
Technical details (UMAP, clustering)
Troubleshooting
Best practices
Advanced use cases

See: Vector Visualization Guide

🎯 Prompt Enhancement (Optional)

TL;DR: Prompt enhancement is a transparent background tool that automatically improves search quality. Just ask naturally - no need to mention "enhance" in your prompts.

Quick Overview

When enabled (PROMPT_ENHANCEMENT=true), the AI automatically:

Enhances your search query with codebase context
Searches with the improved query
Continues with your original request (implement, fix, explain, etc.)
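
The three steps above can be pictured as a small pipeline. This is a sketch under stated assumptions: every type and function name here is hypothetical, the enhancer and searcher are stubs, and the sample file path is invented for illustration. It only shows the order of operations, not how the package implements them.

```typescript
// Hypothetical shapes; none of these names come from the package.
type Enhancer = (query: string) => string;
type Searcher = (query: string) => string[];

interface PipelineResult {
  originalRequest: string; // passed through untouched for the follow-up action
  searchedWith: string;    // the query actually sent to search
  hits: string[];
}

function runWithEnhancement(
  request: string,
  enabled: boolean,
  enhance: Enhancer,
  search: Searcher
): PipelineResult {
  // Step 1: enhance only when the feature is switched on.
  const searchedWith = enabled ? enhance(request) : request;
  // Step 2: search with the (possibly) improved query.
  const hits = search(searchedWith);
  // Step 3: hand the original request back so the AI can continue with it.
  return { originalRequest: request, searchedWith: searchedWith, hits: hits };
}

// Stub implementations for illustration only.
const stubEnhance: Enhancer = (q) => q + " (auth, login, session, token)";
const stubSearch: Searcher = (q) =>
  q.indexOf("session") !== -1 ? ["src/auth/session.ts"] : [];

const enhanced = runWithEnhancement(
  "Find authentication logic and add 2FA support",
  true,
  stubEnhance,
  stubSearch
);
console.log(enhanced.hits);
```

Note that the original request is never replaced by the enhanced query, which is why the prompt should state the action you want rather than ask for enhancement itself.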

Good Prompts ✅

✅ "Find authentication logic and add 2FA support"
✅ "Locate payment flow and fix the timeout issue"
✅ "Search for profile feature and add bio field"

Why these work: Clear goal (find + action) → AI knows what to do

Bad Prompts ❌

❌ "Enhance and search for authentication"
❌ "Use prompt enhancement to find profile"

Why these fail: No clear action → AI stops after search

Key Principle

Prompt enhancement is invisible infrastructure.

Just tell the AI what you want to accomplish. It will automatically use enhancement to improve search quality behind the scenes.

Think of it like autocomplete: You don't say "use autocomplete" - you just type and it helps automatically.

📖 Learn More

For a detailed guide including:

Technical details and architecture
Configuration options
Real-world examples (TypeScript, Python, Dart, etc.)
Performance tips and optimization
Troubleshooting and FAQ
Advanced use cases

See: Prompt Enhancement Guide

🎛️ Configuration

Required Variables

{
  "env": {
    "REPO_PATH": "/Users/you/Projects/myapp",
    "GEMINI_API_KEY": "AIzaSyC...",
    "QDRANT_URL": "https://xxx.gcp.cloud.qdrant.io:6333",
    "QDRANT_API_KEY": "eyJhbGci..."
  }
}

Optional Variables

{
  "env": {
    "QDRANT_COLLECTION": "my_project",
    "WATCH_MODE": "true",
    "BATCH_SIZE": "50",
    "EMBEDDING_MODEL": "text-embedding-004",
    "PROMPT_ENHANCEMENT": "true",
    "ENABLE_INTERNAL_MEMORY": "true"
  }
}

| Variable | Description | Default |
|----------|-------------|---------|
| `QDRANT_COLLECTION` | Collection name in Qdrant | `codebase` |
| `WATCH_MODE` | Auto-reindex on file changes | `true` |
| `BATCH_SIZE` | Embedding batch size | `50` |
| `EMBEDDING_MODEL` | Gemini embedding model | `text-embedding-004` |
| `PROMPT_ENHANCEMENT` | Enable AI query enhancement | `false` |
| `ENABLE_INTERNAL_MEMORY` | Use internal Qdrant memory (vs external MCP Memory) | `false` |
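
Because every value in `env` is a string, the defaults in the table above have to be applied and converted somewhere. The sketch below shows one way that could look. The `ServerOptions` shape and the helper names are hypothetical, and treating only the string `true` as enabled is an assumption rather than documented behavior.

```typescript
// Hypothetical typed view of the optional variables.
interface ServerOptions {
  qdrantCollection: string;
  watchMode: boolean;
  batchSize: number;
  embeddingModel: string;
  promptEnhancement: boolean;
  enableInternalMemory: boolean;
}

type EnvVars = Record<string, string | undefined>;

// Assumption: only "true" (case-insensitive) enables a flag.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === "") return fallback;
  return value.trim().toLowerCase() === "true";
}

// Falls back to the default when the value is missing or not a positive number.
function parsePositiveInt(value: string | undefined, fallback: number): number {
  if (value === undefined) return fallback;
  const n = parseInt(value, 10);
  return isFinite(n) && n > 0 ? n : fallback;
}

// Defaults mirror the table of optional variables.
function resolveOptions(env: EnvVars): ServerOptions {
  return {
    qdrantCollection: env["QDRANT_COLLECTION"] ?? "codebase",
    watchMode: parseBool(env["WATCH_MODE"], true),
    batchSize: parsePositiveInt(env["BATCH_SIZE"], 50),
    embeddingModel: env["EMBEDDING_MODEL"] ?? "text-embedding-004",
    promptEnhancement: parseBool(env["PROMPT_ENHANCEMENT"], false),
    enableInternalMemory: parseBool(env["ENABLE_INTERNAL_MEMORY"], false),
  };
}

const resolved = resolveOptions({ PROMPT_ENHANCEMENT: "true", BATCH_SIZE: "25" });
console.log(resolved);
```

Anything left out of `env` keeps its documented default, so a configuration only needs to list the values it changes.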

Memory Options

You have 2 choices for memory:

Option 1: Internal Qdrant-based Memory (Recommended)

{
  "env": {
    "ENABLE_INTERNAL_MEMORY": "true"
  }
}

✅ Fast semantic search (50-150ms)
✅ Auto-bootstrap from codebase
✅ 5 MCP Tools for complete management
✅ Health monitoring and orphan cleanup
✅ 2.8-6.0x faster batch operations

Setup:

# 1. Enable in config
ENABLE_INTERNAL_MEMORY=true

# 2. Bootstrap via AI chat
"Bootstrap memory for this codebase"

Option 2: External MCP Memory Server (Advanced)

{
  "env": {
    "ENABLE_INTERNAL_MEMORY": "false"
  }
}

✅ Graph-based relations
✅ Custom storage (SQLite, Neo4j, etc.)
✅ MCP protocol standard
⚠️ User-managed: you must provide your own MCP Memory Server

Examples:

@modelcontextprotocol/server-memory
Your custom graph database

When to use each:

| Use Case | Internal Memory | External Memory |
|----------|-----------------|-----------------|
| Large codebase (500+ files) | ✅ Best | ❌ Too slow |
| Semantic code search | ✅ Perfect | ⚠️ Limited |
| Complex relations | ⚠️ Basic | ✅ Excellent |
| Easy setup | ✅ One command | ❌ Manual |
| Custom logic | ❌ Fixed | ✅ Flexible |


**📖 Full configuration guide:** [Setup Guide](./docs/SETUP.md)

---

## 🌍 Supported Languages

Python • TypeScript • JavaScript • Dart • Go • Rust • Java • Kotlin • Swift • Ruby • PHP • C • C++ • C# • Shell • SQL • HTML • CSS

---

## 📊 Performance

### Codebase Search & Indexing

| Metric | Value |
|--------|-------|
| **Indexing Speed** | ~25 files/min |
| **Search Latency** | <100ms |
| **Incremental Savings** | 90%+ time reduction |
| **Parallel Processing** | 25 chunks/sec |

### Memory System (v3.2)

| Metric | Value | Notes |
|--------|-------|-------|
| **Memory Search Speed** | 50-150ms | Qdrant vector search |
| **Memory Search Accuracy** | 88% | Semantic similarity |
| **Bootstrap Speed** | 3-5 min | For 500-file project |
| **Batch Store Speed** | 2.8-6.0x faster | Parallel processing |
| **Health Check** | <200ms | Entity + orphan check |

**v3.2 Features:**
- ✅ 5 MCP tools: `bootstrap_memory`, `search_memory`, `open_memory_ui`, `close_memory_ui`, `check_memory_sync`
- ✅ Entity validation prevents corruption
- ✅ Auto orphan cleanup
- ✅ Health monitoring every 5 minutes
- ✅ 2.8-6.0x faster batch operations

**📖 Memory docs:** [Memory User Guide](./docs/memory/MEMORY_USER_GUIDE.md) | [Memory Quick Reference](./docs/memory/MEMORY_QUICK_REFERENCE.md)

---

## 🐛 Troubleshooting

### Server not appearing?
1. Check Copilot Chat → Settings → MCP Servers → Show Output
2. Verify all 4 env variables are set
3. Ensure `REPO_PATH` is an absolute path
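
Checks 2 and 3 can be scripted. The following is a minimal sketch, not part of the package: `checkRequiredEnv` is a hypothetical helper, the sample values are placeholders, and the absolute-path test is a simple pattern match that accepts POSIX paths and Windows drive paths.

```typescript
type RequiredEnv = Record<string, string | undefined>;

// The four required variables from the configuration section.
const REQUIRED_VARS = ["REPO_PATH", "GEMINI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY"];

// Returns a list of human-readable problems; an empty list means both checks passed.
function checkRequiredEnv(env: RequiredEnv): string[] {
  const problems: string[] = [];
  for (const key of REQUIRED_VARS) {
    const value = env[key];
    if (value === undefined || value.trim() === "") {
      problems.push(key + " is not set");
    }
  }
  const repoPath = env["REPO_PATH"];
  // Absolute means POSIX-style (/home/...) or Windows drive-style (C:\...).
  const absolutePattern = /^(\/|[A-Za-z]:[\\/])/;
  if (repoPath !== undefined && repoPath.trim() !== "" && !absolutePattern.test(repoPath.trim())) {
    problems.push("REPO_PATH must be an absolute path");
  }
  return problems;
}

const problems = checkRequiredEnv({
  REPO_PATH: "./myapp",
  GEMINI_API_KEY: "placeholder-key",
  QDRANT_URL: "https://your-cluster.gcp.cloud.qdrant.io:6333",
});
console.log(problems);
// problems holds: "QDRANT_API_KEY is not set", "REPO_PATH must be an absolute path"
```

In a Node.js script you would pass `process.env` instead of the literal object.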

### Can't connect to Qdrant?
```bash
curl -H "api-key: YOUR_KEY" \
  https://YOUR_CLUSTER.gcp.cloud.qdrant.io:6333/collections
```

### Indexing too slow?

Large repos take 5-10 minutes initially
Subsequent runs only index changed files (90%+ faster)

📖 More troubleshooting: Main Documentation

📁 Project Structure

mcp-codebase-index/
├── docs/                    # All documentation
│   ├── README.md           # Main documentation
│   ├── SETUP.md            # Setup guide
│   ├── CHANGELOG.md        # Version history
│   ├── NAVIGATION.md       # Navigation guide
│   ├── PHASE_1_SUMMARY.md  # Memory Vector Store (v3.1)
│   ├── PHASE_2_SUMMARY.md  # Memory Sync System (v3.1)
│   ├── guides/             # Detailed guides
│   └── planning/           # Development planning
│
├── src/                     # Source code
│   ├── core/               # Core business logic
│   ├── storage/            # Data persistence
│   ├── memory/             # Memory system (v3.2)
│   │   ├── vector-store.ts    # Memory vector storage
│   │   ├── types.ts           # Memory type definitions
│   │   └── sync/              # Health monitoring
│   ├── enhancement/        # Prompt enhancement
│   ├── visualization/      # Vector visualization
│   ├── bootstrap/          # Smart Bootstrap
│   │   ├── orchestrator.ts    # Main orchestrator
│   │   ├── ast-parser.ts      # Code structure extraction
│   │   ├── index-analyzer.ts  # Pattern detection
│   │   └── gemini-analyzer.ts # Semantic analysis
│   ├── mcp/                # MCP server
│   │   ├── server.ts          # Server orchestration
│   │   └── handlers/          # 5 memory + other handlers
│   ├── types/              # Type definitions
│   └── index.ts            # Entry point
│
├── test/                    # Tests
│   ├── memory-flow/           # Memory system tests
│   └── todo-tests/            # Feature implementation tests
│
├── config/                  # Configuration files
├── .data/                   # Runtime data (gitignored)
├── package.json
└── README.md               # This file

📖 Detailed structure: Project Structure | Source Code Structure

🔧 Development

Build

npm run build

Run Locally

npm run dev

Test

npm test

📖 Development guide: Source Code Structure

🤝 Contributing

Contributions welcome! Check out:

Improvement Plan - Roadmap
Issues - Detailed feature docs
Source Code - Code structure

📄 License

MIT © NgoTaiCo

📞 Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: ngotaico.flutter@gmail.com

⭐ If you find this useful, please star the repo!
