Query Markup Documents - On-device hybrid search for markdown files with BM25, vector search, and LLM reranking

Package Exports

    This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (@tobilu/qmd) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

    QMD - Query Markup Documents

    An on-device search engine for everything you need to remember. Index your markdown notes, meeting transcripts, documentation, and knowledge bases. Search with keywords or natural language. Ideal for your agentic flows.

    QMD combines BM25 full-text search, vector semantic search, and LLM re-ranking—all running locally via node-llama-cpp with GGUF models.

    You can read more about QMD's progress in the CHANGELOG.

    Quick Start

    # Install globally (Node or Bun)
    npm install -g @tobilu/qmd
    # or
    bun install -g @tobilu/qmd
    
    # Or run directly
    npx @tobilu/qmd ...
    bunx @tobilu/qmd ...
    
    # Create collections for your notes, docs, and meeting transcripts
    qmd collection add ~/notes --name notes
    qmd collection add ~/Documents/meetings --name meetings
    qmd collection add ~/work/docs --name docs
    
    # Add context to improve search results. Each piece of context is returned
    # whenever a matching sub-document is returned, and contexts nest like a tree.
    # This is the key feature of QMD: it lets LLMs make much better contextual
    # choices when selecting documents. Don't sleep on it!
    qmd context add qmd://notes "Personal notes and ideas"
    qmd context add qmd://meetings "Meeting transcripts and notes"
    qmd context add qmd://docs "Work documentation"
    
    # Generate embeddings for semantic search
    qmd embed
    
    # Search across everything
    qmd search "project timeline"           # Fast keyword search
    qmd vsearch "how to deploy"             # Semantic search
    qmd query "quarterly planning process"  # Hybrid + reranking (best quality)
    
    # Get a specific document
    qmd get "meetings/2024-01-15.md"
    
    # Get a document by docid (shown in search results)
    qmd get "#abc123"
    
    # Get multiple documents by glob pattern
    qmd multi-get "journals/2025-05*.md"
    
    # Search within a specific collection
    qmd search "API" -c notes
    
    # Export all matches for an agent
    qmd search "API" --all --files --min-score 0.3

    Using with AI Agents

    QMD's --json and --files output formats are designed for agentic workflows:

    # Get structured results for an LLM
    qmd search "authentication" --json -n 10
    
    # List all relevant files above a threshold
    qmd query "error handling" --all --files --min-score 0.4
    
    # Retrieve full document content
    qmd get "docs/api-reference.md" --full
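    When an agent harness shells out to qmd, it helps to assemble the argument list in one place. The sketch below is illustrative only: the QmdSearchOptions type and the buildSearchArgs helper are hypothetical names invented for this example, and it uses only flags documented in this README.

```typescript
// Hypothetical helper: builds an argv array for a qmd search command.
// The option names are invented for this sketch; the flags they map to
// (--json, --files, --all, -n, -c, --min-score) come from this README.
type QmdCommand = "search" | "vsearch" | "query";

interface QmdSearchOptions {
  format?: "json" | "files" | "csv" | "md" | "xml"; // output format flag
  all?: boolean;       // --all
  limit?: number;      // -n <num>
  collection?: string; // -c <name>
  minScore?: number;   // --min-score <num>
}

function buildSearchArgs(
  command: QmdCommand,
  query: string,
  opts: QmdSearchOptions = {},
): string[] {
  const args: string[] = [command, query];
  if (opts.all) args.push("--all");
  if (opts.format !== undefined) args.push(`--${opts.format}`);
  if (opts.limit !== undefined) args.push("-n", String(opts.limit));
  if (opts.collection !== undefined) args.push("-c", opts.collection);
  if (opts.minScore !== undefined) args.push("--min-score", String(opts.minScore));
  return args;
}
```

    Passing the result as an argument array to a process spawner (rather than joining it into a shell string) avoids quoting problems with queries that contain spaces or a leading #.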

    MCP Server

    QMD works well when you simply tell your agent to call it on the command line. For tighter integration, it also exposes an MCP (Model Context Protocol) server.

    Tools exposed:

    • qmd_search - Fast BM25 keyword search (supports collection filter)
    • qmd_vector_search - Semantic vector search (supports collection filter)
    • qmd_deep_search - Deep search with query expansion and reranking (supports collection filter)
    • qmd_get - Retrieve document by path or docid (with fuzzy matching suggestions)
    • qmd_multi_get - Retrieve multiple documents by glob pattern, list, or docids
    • qmd_status - Index health and collection info

    Claude Desktop configuration (~/Library/Application Support/Claude/claude_desktop_config.json):

    {
      "mcpServers": {
        "qmd": {
          "command": "qmd",
          "args": ["mcp"]
        }
      }
    }

    Claude Code — Install the plugin (recommended):

    claude marketplace add tobi/qmd
    claude plugin add qmd@qmd

    Or configure MCP manually in ~/.claude/settings.json:

    {
      "mcpServers": {
        "qmd": {
          "command": "qmd",
          "args": ["mcp"]
        }
      }
    }

    HTTP Transport

    By default, QMD's MCP server uses stdio (launched as a subprocess by each client). For a shared, long-lived server that avoids repeated model loading, use the HTTP transport:

    # Foreground (Ctrl-C to stop)
    qmd mcp --http                    # localhost:8181
    qmd mcp --http --port 8080        # custom port
    
    # Background daemon
    qmd mcp --http --daemon           # start, writes PID to ~/.cache/qmd/mcp.pid
    qmd mcp stop                      # stop via PID file
    qmd status                        # shows "MCP: running (PID ...)" when active

    The HTTP server exposes two endpoints:

    • POST /mcp — MCP Streamable HTTP (JSON responses, stateless)
    • GET /health — liveness check with uptime

    Models stay loaded in VRAM across requests. Embedding and reranking contexts are disposed after 5 minutes of idle time and transparently recreated on the next request (about a 1 s penalty; the models themselves remain loaded).

    Point any MCP client at http://localhost:8181/mcp to connect.
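    As a rough illustration of what a client sends to these endpoints, the sketch below builds plain request descriptions for GET /health and for a JSON-RPC tools/list call against POST /mcp. The HttpRequestSpec type and the helper names are hypothetical. The headers follow the general MCP Streamable HTTP convention, and whether QMD's stateless server accepts tools/list without a prior initialize call is an assumption that has not been verified here.

```typescript
// Hypothetical request builders for the two HTTP endpoints listed above.
// Nothing here performs network I/O; the specs can be handed to fetch().
interface HttpRequestSpec {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Default port 8181 matches the README; the host default is an assumption.
function mcpBaseUrl(port = 8181, host = "localhost"): string {
  return `http://${host}:${port}`;
}

function healthRequest(port?: number): HttpRequestSpec {
  return { url: `${mcpBaseUrl(port)}/health`, method: "GET", headers: {} };
}

function toolsListRequest(id: number, port?: number): HttpRequestSpec {
  return {
    url: `${mcpBaseUrl(port)}/mcp`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP clients advertise both response content types.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify({ jsonrpc: "2.0", id, method: "tools/list" }),
  };
}
```

    A caller would typically pass a spec to fetch(spec.url, { method: spec.method, headers: spec.headers, body: spec.body }) and read the JSON response.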

    Architecture

    ┌─────────────────────────────────────────────────────────────────────────────┐
    │                         QMD Hybrid Search Pipeline                          │
    └─────────────────────────────────────────────────────────────────────────────┘
    
                                  ┌─────────────────┐
                                  │   User Query    │
                                  └────────┬────────┘
                                           │
                            ┌──────────────┴──────────────┐
                            ▼                             ▼
                   ┌────────────────┐            ┌────────────────┐
                   │ Query Expansion│            │  Original Query│
                   │  (fine-tuned)  │            │   (×2 weight)  │
                   └───────┬────────┘            └───────┬────────┘
                           │                             │
                           │ 2 alternative queries       │
                           └──────────────┬──────────────┘
                                          │
                  ┌───────────────────────┼───────────────────────┐
                  ▼                       ▼                       ▼
         ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
         │ Original Query  │     │ Expanded Query 1│     │ Expanded Query 2│
         └────────┬────────┘     └────────┬────────┘     └────────┬────────┘
                  │                       │                       │
          ┌───────┴───────┐       ┌───────┴───────┐       ┌───────┴───────┐
          ▼               ▼       ▼               ▼       ▼               ▼
      ┌───────┐       ┌───────┐ ┌───────┐     ┌───────┐ ┌───────┐     ┌───────┐
      │ BM25  │       │Vector │ │ BM25  │     │Vector │ │ BM25  │     │Vector │
      │(FTS5) │       │Search │ │(FTS5) │     │Search │ │(FTS5) │     │Search │
      └───┬───┘       └───┬───┘ └───┬───┘     └───┬───┘ └───┬───┘     └───┬───┘
          │               │         │             │         │             │
          └───────┬───────┘         └──────┬──────┘         └──────┬──────┘
                  │                        │                       │
                  └────────────────────────┼───────────────────────┘
                                           │
                                           ▼
                              ┌───────────────────────┐
                              │   RRF Fusion + Bonus  │
                              │  Original query: ×2   │
                              │  Top-rank bonus: +0.05│
                              │     Top 30 Kept       │
                              └───────────┬───────────┘
                                          │
                                          ▼
                              ┌───────────────────────┐
                              │    LLM Re-ranking     │
                              │  (qwen3-reranker)     │
                              │  Yes/No + logprobs    │
                              └───────────┬───────────┘
                                          │
                                          ▼
                              ┌───────────────────────┐
                              │  Position-Aware Blend │
                              │  Top 1-3:  75% RRF    │
                              │  Top 4-10: 60% RRF    │
                              │  Top 11+:  40% RRF    │
                              └───────────────────────┘

    Score Normalization & Fusion

    Search Backends

    Backend    | Raw Score        | Conversion         | Range
    FTS (BM25) | SQLite FTS5 BM25 | Math.abs(score)    | 0 to ~25+
    Vector     | Cosine distance  | 1 / (1 + distance) | 0.0 to 1.0
    Reranker   | LLM 0-10 rating  | score / 10         | 0.0 to 1.0
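    The conversions in the table can be written as a single function. This is a minimal sketch rather than QMD's actual code: the Backend type and the normalizeScore name are made up for illustration. SQLite's FTS5 bm25() function reports better matches as more negative numbers, which is why the absolute value is taken.

```typescript
// Illustrative only: applies the per-backend conversions from the table.
type Backend = "fts" | "vector" | "reranker";

function normalizeScore(backend: Backend, raw: number): number {
  switch (backend) {
    case "fts":
      // FTS5 bm25() is negative for matches; larger magnitude is better.
      return Math.abs(raw);
    case "vector":
      // Cosine distance 0 (identical) maps to 1.0; larger distances shrink.
      return 1 / (1 + raw);
    case "reranker":
      // A 0-10 rating is scaled to 0.0-1.0.
      return raw / 10;
  }
}
```

    Note that only the vector and reranker conversions land in 0.0 to 1.0; the FTS value stays on its own open-ended scale.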

    Fusion Strategy

    The query command uses Reciprocal Rank Fusion (RRF) with position-aware blending:

    1. Query Expansion: Original query (×2 for weighting) + 2 LLM variations
    2. Parallel Retrieval: Each query searches both FTS and vector indexes
    3. RRF Fusion: Combine all result lists using score = Σ(1/(k+rank+1)) where k=60
    4. Top-Rank Bonus: Documents ranking #1 in any list get +0.05, #2-3 get +0.02
    5. Top-K Selection: Take top 30 candidates for reranking
    6. Re-ranking: LLM scores each document (yes/no with logprobs confidence)
    7. Position-Aware Blending:
      • RRF rank 1-3: 75% retrieval, 25% reranker (preserves exact matches)
      • RRF rank 4-10: 60% retrieval, 40% reranker
      • RRF rank 11+: 40% retrieval, 60% reranker (trust reranker more)
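    Steps 3 to 5 above can be sketched as follows. This is an approximation under stated assumptions, not QMD's implementation: the RankedList type and rrfFuse name are hypothetical, rank is taken to be zero-based (which is what the +1 in the formula suggests), the ×2 weighting is applied as a multiplier on each contribution from the original query's lists, and the bonus is applied once per document based on its best rank in any list.

```typescript
// Hypothetical sketch of weighted RRF with a top-rank bonus.
interface RankedList {
  docIds: string[]; // best match first
  weight: number;   // assumed: 2 for lists from the original query, else 1
}

function rrfFuse(
  lists: RankedList[],
  k = 60,
  topK = 30,
): Array<{ docId: string; score: number }> {
  const scores = new Map<string, number>();
  const bestRank = new Map<string, number>();

  for (const list of lists) {
    list.docIds.forEach((docId, rank) => {
      // score = Σ weight / (k + rank + 1), with zero-based rank
      scores.set(docId, (scores.get(docId) ?? 0) + list.weight / (k + rank + 1));
      bestRank.set(docId, Math.min(bestRank.get(docId) ?? Infinity, rank));
    });
  }

  const fused = Array.from(scores, ([docId, score]) => {
    const best = bestRank.get(docId) ?? Infinity;
    // +0.05 for a #1 placement in any list, +0.02 for #2-3
    const bonus = best === 0 ? 0.05 : best <= 2 ? 0.02 : 0;
    return { docId, score: score + bonus };
  });

  fused.sort((a, b) => b.score - a.score);
  return fused.slice(0, topK); // candidates handed to the reranker
}
```

    For example, with an original-query list ["a", "b"] (weight 2) and an expanded-query list ["b", "c"] (weight 1), document b ends up first because it appears in both lists and is #1 in one of them.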

    Why this approach: Pure RRF can dilute exact matches when expanded queries don't match. The top-rank bonus preserves documents that score #1 for the original query. Position-aware blending prevents the reranker from destroying high-confidence retrieval results.
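    The position-aware blend in step 7 amounts to a rank-dependent weighted average. In the sketch below the function names are hypothetical, the RRF rank is one-based, and both input scores are assumed to already be on a 0.0 to 1.0 scale; how QMD normalizes the retrieval score before blending is not stated in this README.

```typescript
// Illustrative blend: the retrieval share depends on the RRF rank tier.
function retrievalWeight(rrfRank: number): number {
  if (rrfRank <= 3) return 0.75; // ranks 1-3: protect strong retrieval hits
  if (rrfRank <= 10) return 0.6; // ranks 4-10
  return 0.4;                    // ranks 11+: trust the reranker more
}

function blendScore(
  rrfRank: number,        // one-based position after RRF fusion
  retrievalScore: number, // assumed normalized to 0.0-1.0
  rerankScore: number,    // reranker score in 0.0-1.0
): number {
  const w = retrievalWeight(rrfRank);
  return w * retrievalScore + (1 - w) * rerankScore;
}
```

    With these weights, a document at RRF rank 2 that the reranker scores at 0.0 still keeps 75% of its retrieval score, while a document at rank 15 is dominated by the reranker's opinion.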

    Score Interpretation

    Score     | Meaning
    0.8 - 1.0 | Highly relevant
    0.5 - 0.8 | Moderately relevant
    0.2 - 0.5 | Somewhat relevant
    0.0 - 0.2 | Low relevance
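    A consumer of --json output might bucket scores with a small helper like the one below. The relevanceLabel name is hypothetical, and because the table's ranges share their endpoints, treating each lower bound as inclusive is an assumption.

```typescript
// Illustrative mapping from a final score to the labels in the table.
// Assumption: lower bounds are inclusive (0.8 counts as "Highly relevant").
function relevanceLabel(score: number): string {
  if (score >= 0.8) return "Highly relevant";
  if (score >= 0.5) return "Moderately relevant";
  if (score >= 0.2) return "Somewhat relevant";
  return "Low relevance";
}
```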

    Requirements

    System Requirements

    • Node.js >= 22 or Bun >= 1.0.0
    • macOS: Homebrew SQLite (for extension support)
      brew install sqlite

    GGUF Models (via node-llama-cpp)

    QMD uses three local GGUF models (auto-downloaded on first use):

    Model                           | Purpose                      | Size
    embeddinggemma-300M-Q8_0        | Vector embeddings            | ~300MB
    qwen3-reranker-0.6b-q8_0        | Re-ranking                   | ~640MB
    qmd-query-expansion-1.7B-q4_k_m | Query expansion (fine-tuned) | ~1.1GB

    Models are downloaded from HuggingFace and cached in ~/.cache/qmd/models/.

    Installation

    npm install -g @tobilu/qmd
    # or
    bun install -g @tobilu/qmd

    Development

    git clone https://github.com/tobi/qmd
    cd qmd
    npm install
    npm link

    Usage

    Collection Management

    # Create a collection from current directory
    qmd collection add . --name myproject
    
    # Create a collection with explicit path and custom glob mask
    qmd collection add ~/Documents/notes --name notes --mask "**/*.md"
    
    # List all collections
    qmd collection list
    
    # Remove a collection
    qmd collection remove myproject
    
    # Rename a collection
    qmd collection rename myproject my-project
    
    # List files in a collection
    qmd ls notes
    qmd ls notes/subfolder

    Generate Vector Embeddings

    # Embed all indexed documents (900 tokens/chunk, 15% overlap)
    qmd embed
    
    # Force re-embed everything
    qmd embed -f

    Context Management

    Context adds descriptive metadata to collections and paths, helping search understand your content.

    # Add context to a collection (using qmd:// virtual paths)
    qmd context add qmd://notes "Personal notes and ideas"
    qmd context add qmd://docs/api "API documentation"
    
    # Add context from within a collection directory
    cd ~/notes && qmd context add "Personal notes and ideas"
    cd ~/notes/work && qmd context add "Work-related notes"
    
    # Add global context (applies to all collections)
    qmd context add / "Knowledge base for my projects"
    
    # List all contexts
    qmd context list
    
    # Remove context
    qmd context rm qmd://notes/old

    Search Commands

    ┌──────────────────────────────────────────────────────────────────┐
    │                        Search Modes                              │
    ├──────────┬───────────────────────────────────────────────────────┤
    │ search   │ BM25 full-text search only                           │
    │ vsearch  │ Vector semantic search only                          │
    │ query    │ Hybrid: FTS + Vector + Query Expansion + Re-ranking  │
    └──────────┴───────────────────────────────────────────────────────┘
    # Full-text search (fast, keyword-based)
    qmd search "authentication flow"
    
    # Vector search (semantic similarity)
    qmd vsearch "how to login"
    
    # Hybrid search with re-ranking (best quality)
    qmd query "user authentication"

    Options

    # Search options
    -n <num>           # Number of results (default: 5, or 20 for --files/--json)
    -c, --collection   # Restrict search to a specific collection
    --all              # Return all matches (use with --min-score to filter)
    --min-score <num>  # Minimum score threshold (default: 0)
    --full             # Show full document content
    --line-numbers     # Add line numbers to output
    --index <name>     # Use named index
    
    # Output formats (for search and multi-get)
    --files            # Output: docid,score,filepath,context
    --json             # JSON output with snippets
    --csv              # CSV output
    --md               # Markdown output
    --xml              # XML output
    
    # Get options
    qmd get <file>[:line]  # Get document, optionally starting at line
    -l <num>               # Maximum lines to return
    --from <num>           # Start from line number
    
    # Multi-get options
    -l <num>           # Maximum lines per file
    --max-bytes <num>  # Skip files larger than N bytes (default: 10KB)

    Output Format

    Default output is colorized CLI format (respects NO_COLOR env):

    docs/guide.md:42 #a1b2c3
    Title: Software Craftsmanship
    Context: Work documentation
    Score: 93%
    
    This section covers the **craftsmanship** of building
    quality software with attention to detail.
    See also: engineering principles
    
    
    notes/meeting.md:15 #d4e5f6
    Title: Q4 Planning
    Context: Personal notes and ideas
    Score: 67%
    
    Discussion about code quality and craftsmanship
    in the development process.

    • Path: Collection-relative path (e.g., docs/guide.md)
    • Docid: Short hash identifier (e.g., #a1b2c3); use it with qmd get "#a1b2c3" (quoted, so the shell does not treat # as a comment)
    • Title: Extracted from document (first heading or filename)
    • Context: Path context if configured via qmd context add
    • Score: Color-coded (green >70%, yellow >40%, dim otherwise)
    • Snippet: Context around match with query terms highlighted
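    If you need to post-process the default CLI output rather than --json, the header line and the score colors can be handled with something like the sketch below. The names parseResultHeader and scoreColor are hypothetical, the regular expression is inferred from the two sample results above, and it assumes uncolored text (for example with NO_COLOR set).

```typescript
// Hypothetical parser for a header line such as "docs/guide.md:42 #a1b2c3".
interface ResultHeader {
  path: string;  // collection-relative path
  line: number;  // line number after the colon
  docid: string; // short hash without the leading '#'
}

function parseResultHeader(headerLine: string): ResultHeader | null {
  const m = /^(.+):(\d+)\s+#(\w+)$/.exec(headerLine.trim());
  if (m === null) return null;
  const [, path, line, docid] = m;
  if (path === undefined || line === undefined || docid === undefined) return null;
  return { path, line: Number(line), docid };
}

// Color thresholds as listed above: green >70%, yellow >40%, dim otherwise.
function scoreColor(percent: number): "green" | "yellow" | "dim" {
  if (percent > 70) return "green";
  if (percent > 40) return "yellow";
  return "dim";
}
```

    For structured consumption, the --json format is the safer choice; this parser is only a convenience for quick scripts.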

    Examples

    # Get 10 results with minimum score 0.3
    qmd query -n 10 --min-score 0.3 "API design patterns"
    
    # Output as markdown for LLM context
    qmd search --md --full "error handling"
    
    # JSON output for scripting
    qmd query --json "quarterly reports"
    
    # Use separate index for different knowledge base
    qmd --index work search "quarterly reports"

    Index Maintenance

    # Show index status and collections with contexts
    qmd status
    
    # Re-index all collections
    qmd update
    
    # Re-index with git pull first (for remote repos)
    qmd update --pull
    
    # Get document by filepath (with fuzzy matching suggestions)
    qmd get notes/meeting.md
    
    # Get document by docid (from search results)
    qmd get "#abc123"
    
    # Get document starting at line 50, max 100 lines
    qmd get notes/meeting.md:50 -l 100
    
    # Get multiple documents by glob pattern
    qmd multi-get "journals/2025-05*.md"
    
    # Get multiple documents by comma-separated list (supports docids)
    qmd multi-get "doc1.md, doc2.md, #abc123"
    
    # Limit multi-get to files under 20KB
    qmd multi-get "docs/*.md" --max-bytes 20480
    
    # Output multi-get as JSON for agent processing
    qmd multi-get "docs/*.md" --json
    
    # Clean up cache and orphaned data
    qmd cleanup

    Data Storage

    Index stored in: ~/.cache/qmd/index.sqlite

    Schema

    collections     -- Indexed directories with name and glob patterns
    path_contexts   -- Context descriptions by virtual path (qmd://...)
    documents       -- Markdown content with metadata and docid (6-char hash)
    documents_fts   -- FTS5 full-text index
    content_vectors -- Embedding chunks (hash, seq, pos, 900 tokens each)
    vectors_vec     -- sqlite-vec vector index (hash_seq key)
    llm_cache       -- Cached LLM responses (query expansion, rerank scores)

    Environment Variables

    Variable       | Default  | Description
    XDG_CACHE_HOME | ~/.cache | Cache directory location
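    Putting the default index location and XDG_CACHE_HOME together, the path lookup presumably works along these lines. This is a guess at the behavior, not QMD's code: resolveIndexPath is a hypothetical name, and the assumption is that the qmd/index.sqlite suffix is simply appended to whichever cache directory applies.

```typescript
// Hypothetical resolution of the default index path.
// Assumption: $XDG_CACHE_HOME replaces ~/.cache when it is set and non-empty.
function resolveIndexPath(
  env: Record<string, string | undefined>,
  homeDir: string,
): string {
  const xdg = env["XDG_CACHE_HOME"];
  const cacheRoot = xdg !== undefined && xdg !== "" ? xdg : `${homeDir}/.cache`;
  return `${cacheRoot}/qmd/index.sqlite`;
}
```

    In a Node or Bun script this could be called as resolveIndexPath(process.env, os.homedir()).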

    How It Works

    Indexing Flow

    Collection ──► Glob Pattern ──► Markdown Files ──► Parse Title ──► Hash Content
        │                                                   │              │
        │                                                   │              ▼
        │                                                   │         Generate docid
        │                                                   │         (6-char hash)
        │                                                   │              │
        └──────────────────────────────────────────────────►└──► Store in SQLite
                                                                           │
                                                                           ▼
                                                                      FTS5 Index

    Embedding Flow

    Documents are chunked into ~900-token pieces with 15% overlap using smart boundary detection:

    Document ──► Smart Chunk (~900 tokens) ──► Format each chunk ──► node-llama-cpp ──► Store Vectors
                    │                           "title | text"        embedBatch()
                    │
                    └─► Chunks stored with:
                        - hash: document hash
                        - seq: chunk sequence (0, 1, 2...)
                        - pos: character position in original
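    To make the numbers concrete: a 15% overlap on 900-token chunks means about 135 shared tokens, so each chunk starts roughly 765 tokens after the previous one. The sketch below computes fixed windows on that basis and formats a chunk with the document template from the EmbeddingGemma Prompt Format section. It is a simplification: chunkWindows and formatChunkForEmbedding are hypothetical names, offsets are in tokens rather than the character positions QMD stores in pos, and smart boundary detection is ignored.

```typescript
// Simplified fixed-window chunking (no smart boundary detection).
interface ChunkWindow {
  seq: number;   // chunk sequence: 0, 1, 2...
  start: number; // token offset, inclusive
  end: number;   // token offset, exclusive
}

function chunkWindows(
  totalTokens: number,
  chunkSize = 900,
  overlapRatio = 0.15,
): ChunkWindow[] {
  if (totalTokens <= 0) return [];
  const overlap = Math.round(chunkSize * overlapRatio); // 135 by default
  const stride = chunkSize - overlap;                   // 765 by default
  if (stride <= 0) throw new Error("overlapRatio must be below 1");

  const windows: ChunkWindow[] = [];
  let start = 0;
  for (let seq = 0; ; seq++) {
    const end = Math.min(start + chunkSize, totalTokens);
    windows.push({ seq, start, end });
    if (end >= totalTokens) break; // last chunk reached the document end
    start += stride;
  }
  return windows;
}

// Document-side template as given in the EmbeddingGemma Prompt Format section.
function formatChunkForEmbedding(title: string, text: string): string {
  return `title: ${title} | text: ${text}`;
}
```

    A 2000-token document would yield three windows under these defaults, starting at tokens 0, 765, and 1530.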

    Smart Chunking

    Instead of cutting at hard token boundaries, QMD uses a scoring algorithm to find natural markdown break points. This keeps semantic units (sections, paragraphs, code blocks) together.

    Break Point Scores:

    Pattern          | Score | Description
    # Heading        | 100   | H1 - major section
    ## Heading       | 90    | H2 - subsection
    ### Heading      | 80    | H3
    #### Heading     | 70    | H4
    ##### Heading    | 60    | H5
    ###### Heading   | 50    | H6
    ```              | 80    | Code block boundary
    --- / ***        | 60    | Horizontal rule
    Blank line       | 20    | Paragraph boundary
    - item / 1. item | 5     | List item
    Line break       | 1     | Minimal break

    Algorithm:

    1. Scan document for all break points with scores
    2. When approaching the 900-token target, search a 200-token window before the cutoff
    3. Score each break point: finalScore = baseScore × (1 - (distance/window)² × 0.7)
    4. Cut at the highest-scoring break point

    The squared distance decay means an H1 heading 200 tokens back (100 × 0.3 = 30) still beats a simple line break at the target (score 1), but a closer heading wins over a distant one.
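    The scoring rule can be checked with a few lines of code. The following is a sketch of the formula as described here, not the project's source: BreakPoint, finalScore, and pickBreakPoint are hypothetical names, positions are token offsets, and only break points inside the window before the target are considered.

```typescript
// Illustrative break-point selection using the decay formula above.
interface BreakPoint {
  tokenPos: number;  // where the break occurs, in tokens
  baseScore: number; // from the pattern table (e.g. 100 for an H1)
}

// finalScore = baseScore × (1 - (distance/window)² × 0.7)
function finalScore(baseScore: number, distance: number, window = 200): number {
  const ratio = distance / window;
  return baseScore * (1 - ratio * ratio * 0.7);
}

function pickBreakPoint(
  breakPoints: BreakPoint[],
  target: number, // token offset where the chunk would otherwise be cut
  window = 200,
): BreakPoint | null {
  let best: BreakPoint | null = null;
  let bestScore = -Infinity;
  for (const bp of breakPoints) {
    const distance = target - bp.tokenPos;
    if (distance < 0 || distance > window) continue; // outside the search window
    const score = finalScore(bp.baseScore, distance, window);
    if (score > bestScore) {
      best = bp;
      bestScore = score;
    }
  }
  return best;
}
```

    With a target of 900, an H1 at token 700 scores about 30 and beats a plain line break at 900 (score 1), while an H1 at token 850 scores about 95.6 and would beat both.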

    Code Fence Protection: Break points inside code blocks are ignored—code stays together. If a code block exceeds the chunk size, it's kept whole when possible.

    Query Flow (Hybrid)

    Query ──► LLM Expansion ──► [Original, Variant 1, Variant 2]
                    │
          ┌─────────┴─────────┐
          ▼                   ▼
       For each query:     FTS (BM25)
          │                   │
          ▼                   ▼
       Vector Search      Ranked List
          │
          ▼
       Ranked List
          │
          └─────────┬─────────┘
                    ▼
             RRF Fusion (k=60)
             Original query ×2 weight
             Top-rank bonus: +0.05/#1, +0.02/#2-3
                    │
                    ▼
             Top 30 candidates
                    │
                    ▼
             LLM Re-ranking
             (yes/no + logprob confidence)
                    │
                    ▼
             Position-Aware Blend
             Rank 1-3:  75% RRF / 25% reranker
             Rank 4-10: 60% RRF / 40% reranker
             Rank 11+:  40% RRF / 60% reranker
                    │
                    ▼
             Final Results

    Model Configuration

    Models are configured in src/llm.ts as HuggingFace URIs:

    const DEFAULT_EMBED_MODEL = "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf";
    const DEFAULT_RERANK_MODEL = "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf";
    const DEFAULT_GENERATE_MODEL = "hf:tobil/qmd-query-expansion-1.7B-gguf/qmd-query-expansion-1.7B-q4_k_m.gguf";

    EmbeddingGemma Prompt Format

    // For queries
    "task: search result | query: {query}"
    
    // For documents
    "title: {title} | text: {content}"

    Qwen3-Reranker

    Uses node-llama-cpp's createRankingContext() and rankAndSort() API for cross-encoder reranking. Returns documents sorted by relevance score (0.0 - 1.0).

    Qwen3 (Query Expansion)

    Used for generating query variations via LlamaChatSession.

    License

    MIT