Package Exports

  • @theglitchking/semantic-pages
  • @theglitchking/semantic-pages/mcp

Semantic Pages

Semantic search + knowledge graph MCP server for any folder of markdown files.


[!IMPORTANT] Semantic Pages downloads a local embedding model (~80MB) on first launch. The download happens once and is cached at `~/.semantic-pages/models/`. No API key required. No data leaves your machine.


Summary

When you have markdown notes scattered across a project — a vault/, docs/, notes/, or wiki — your AI assistant can't search them by meaning, traverse their connections, or help you maintain them. Semantic Pages fixes this by indexing your markdown files into a vector database and knowledge graph, then exposing 21 MCP tools that let Claude (or any MCP-compatible client) search semantically, traverse wikilinks, manage frontmatter, and perform full CRUD operations. No Docker, no Python, no Obsidian required — just npx.


Operational Summary

The server indexes all .md files in a directory you point it at. Each file is parsed for YAML frontmatter, [[wikilinks]], #tags, and headings. The text content is split into ~512-token chunks and embedded locally using the nomic-embed-text-v1.5 model running via WebAssembly in Node.js. These embeddings are stored in an HNSW index for fast approximate nearest neighbor search. Simultaneously, a directed graph is built from wikilinks and shared tags using graphology.

When Claude calls search_semantic, the query is embedded and compared against all chunks via cosine similarity. When Claude calls search_graph, it does a breadth-first traversal from matching nodes. search_hybrid combines both — semantic results re-ranked by graph proximity. Beyond search, Claude can create, read, update, delete, and move notes, manage YAML frontmatter fields, add/remove/rename tags vault-wide, and query the knowledge graph for backlinks, forwardlinks, shortest paths, and connectivity statistics.

The index is stored in .semantic-pages-index/ alongside your notes (gitignore it). A file watcher detects changes and re-indexes incrementally. Everything runs locally over stdio — no network, no server, no background processes beyond the MCP connection itself.
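The watcher's incremental re-indexing amounts to a per-file debounce. Here is a dependency-free sketch of that logic (the real server uses chokidar; `reindexFile` is a hypothetical callback standing in for the parse/embed/index step):

```typescript
// Simplified sketch of debounced incremental re-indexing.
// Models only the debounce; file-system watching itself is chokidar's job.
type ReindexFn = (path: string) => void;

function makeDebouncedReindexer(reindexFile: ReindexFn, delayMs = 250) {
  const pending = new Map<string, ReturnType<typeof setTimeout>>();
  return (changedPath: string) => {
    const existing = pending.get(changedPath);
    if (existing !== undefined) clearTimeout(existing); // coalesce rapid saves
    pending.set(changedPath, setTimeout(() => {
      pending.delete(changedPath);
      reindexFile(changedPath); // re-parse, re-embed, re-index just this file
    }, delayMs));
  };
}
```

Coalescing per path means a burst of editor auto-saves triggers one re-index of the affected file rather than one per keystroke.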


Features

  • Semantic Search: Find notes by meaning, not just keywords, using local vector embeddings
  • Knowledge Graph: Traverse [[wikilinks]] and shared #tags as a directed graph
  • Hybrid Search: Combined vector + graph search with re-ranking
  • Full-Text Search: Keyword and regex search with path, tag, and case filters
  • Full CRUD: Create, read, update (overwrite/append/prepend/patch-by-heading), delete, and move notes
  • Frontmatter Management: Get and set YAML frontmatter fields atomically
  • Tag Management: Add, remove, list, and rename tags vault-wide (frontmatter + inline)
  • Graph Queries: Backlinks, forwardlinks, shortest path, graph statistics (orphans, density, most connected)
  • File Watcher: Incremental re-indexing on file changes with debounce
  • Local Embeddings: No API key, no network after first model download
  • Zero Dependencies Beyond Node: No Docker, no Python, no Obsidian, no GUI

Quick Start

1. Installation Methods

Method A: NPX (No installation needed)

This lets you run the server without installing it permanently.

Step 1: Open your terminal in your project folder

Step 2: Run:

npx -y @theglitchking/semantic-pages --notes ./vault --stats

Step 3: The first time you run it, NPX downloads the package and the embedding model (~80MB). This takes 1-2 minutes.

Step 4: After that, it runs instantly.

Use this method when: You want to try it out, or you're adding it to a project's .mcp.json config.

Method B: Global Install (Available in any project)

This installs the tool on your computer so you can use it in any project.

Step 1: Open your terminal

Step 2: Type this command and press Enter:

npm install -g @theglitchking/semantic-pages

Step 3: Test that it worked:

semantic-pages --version

Step 4: You should see a version number. If you do, it's installed correctly!

Method C: MCP Configuration (For Claude Code)

Add to your project's .mcp.json so Claude has automatic access:

{
  "mcpServers": {
    "semantic-pages": {
      "command": "npx",
      "args": ["-y", "@theglitchking/semantic-pages", "--notes", "./vault"]
    }
  }
}

Point --notes at any folder of .md files: ./vault, ./docs, ./notes, or . for the whole repo.

What to expect: Next time you run claude in that project, Claude will have 21 new tools for searching, reading, writing, and traversing your notes.

Method D: Project Installation (For team projects)

This installs the tool only for one specific project.

Step 1: Open your terminal in your project folder

Step 2: Type this command:

npm install --save-dev @theglitchking/semantic-pages

Step 3: Add a script to your package.json file:

{
  "scripts": {
    "notes": "semantic-pages --notes ./vault",
    "notes:stats": "semantic-pages --notes ./vault --stats",
    "notes:reindex": "semantic-pages --notes ./vault --reindex"
  }
}

2. How to Use

CLI Commands

These commands run in your terminal and manage your notes index.

| Command | Description |
| --- | --- |
| `semantic-pages --notes <path>` | Start MCP server (default mode) |
| `semantic-pages --notes <path> --stats` | Show vault statistics and exit |
| `semantic-pages --notes <path> --reindex` | Force full reindex and exit |
| `semantic-pages --notes <path> --no-watch` | Start server without file watcher |
| `semantic-pages tools` | List all 21 MCP tools with descriptions |
| `semantic-pages tools <name>` | Show arguments and examples for a specific tool |
| `semantic-pages --version` | Show version number |
| `semantic-pages --help` | Show all options |

Built-in Tool Help

Every MCP tool has built-in documentation accessible from the CLI:

# List all 21 tools organized by category
semantic-pages tools
Semantic Pages — 21 MCP Tools

  Search:
    search_semantic          Vector similarity search — find notes by meaning, not just keywords
    search_text              Full-text keyword or regex search with optional filters
    search_graph             Graph traversal — find notes connected to a concept via wikilinks and tags
    search_hybrid            Combined semantic + graph search — vector results re-ranked by graph proximity

  Read:
    read_note                Read the full content of a specific note by path
    read_multiple_notes      Batch read multiple notes in one call
    list_notes               List all indexed notes with metadata (title, tags, link count)
    ...
# Get detailed help for a specific tool — arguments, types, and examples
semantic-pages tools search_semantic
  search_semantic
  ───────────────
  Vector similarity search — find notes by meaning, not just keywords

  Arguments:
    { "query": "string", "limit?": 10 }

  Examples:
    { "query": "microservices architecture", "limit": 5 }
    { "query": "how to deploy to production" }
# More examples
semantic-pages tools update_note      # See all 4 editing modes
semantic-pages tools move_note        # See wikilink-aware rename
semantic-pages tools manage_tags      # See add/remove/list actions
semantic-pages tools rename_tag       # See vault-wide tag rename
Command Examples and Details

--stats - Check your vault

How to use it:

semantic-pages --notes ./vault --stats

When to use it: Quick check to see what's in your vault.

What to expect:

Notes: 47
Chunks: 312
Wikilinks: 89
Tags: 23 unique

--reindex - Rebuild the index

How to use it:

semantic-pages --notes ./vault --reindex

When to use it:

  • After bulk-adding or modifying notes outside of the MCP tools
  • If the index seems stale or corrupted
  • After changing the embedding model

What to expect: Full re-parse, re-embed, and re-index of all markdown files. Takes 10-60 seconds depending on vault size and whether the model is cached.


MCP Tools

When the server is running (via .mcp.json or CLI), Claude has access to these 21 tools:

Search Tools
| Tool | Description |
| --- | --- |
| `search_semantic` | Vector similarity search — "find notes similar to this idea" |
| `search_text` | Full-text keyword/regex search with path, tag, and case filters |
| `search_graph` | Graph traversal — "find notes connected to this concept" |
| `search_hybrid` | Combined — semantic results re-ranked by graph proximity |

search_semantic - Find notes by meaning

When Claude uses it: When you ask things like "find notes about deployment strategies" or "what have I written about authentication?"

What to expect: Returns notes ranked by semantic similarity to your query, with relevance scores and text snippets. Works even if the exact words don't appear in the notes.

Example conversation:

You: What notes do I have about scaling microservices?
Claude: [calls search_semantic with query "scaling microservices"]
Claude: I found 4 relevant notes:
1. architecture/scaling-patterns.md (0.87 similarity) — discusses horizontal vs vertical scaling
2. devops/kubernetes-autoscaling.md (0.82 similarity) — HPA and VPA configuration
3. architecture/service-mesh.md (0.71 similarity) — mentions scaling in the context of Istio
4. meeting-notes/2024-03-15.md (0.65 similarity) — team discussion about scaling concerns

search_text - Find exact matches

When Claude uses it: When you need exact keyword or regex matches, not semantic similarity.

What to expect: Returns notes containing the exact pattern, with snippets showing context. Supports:

  • Case-sensitive/insensitive search
  • Regex patterns
  • Path glob filters (e.g., only search in notes/)
  • Tag filters (e.g., only search notes tagged #architecture)

search_graph - Traverse connections

When Claude uses it: When you want to explore how notes are connected — "what's related to this concept?"

What to expect: Starting from notes matching your concept, does a breadth-first traversal through wikilinks and shared tags, returning all connected notes within the specified depth.
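That traversal is a depth-limited BFS over the wikilink/tag graph. A dependency-free sketch (the server itself uses graphology-traversal; the adjacency-map shape here is illustrative):

```typescript
// Depth-limited BFS over a note graph, returning each reachable note
// together with its distance (in hops) from the start node.
function bfsWithinDepth(
  adjacency: Map<string, string[]>, // note path -> linked note paths
  start: string,
  maxDepth: number
): Map<string, number> {
  const visited = new Map<string, number>([[start, 0]]);
  let frontier = [start];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of adjacency.get(node) ?? []) {
        if (!visited.has(neighbor)) {
          visited.set(neighbor, depth);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return visited;
}
```

With `maxDepth = 1` you get only direct neighbors; raising the depth widens the neighborhood one hop at a time.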


search_hybrid - Best of both

When Claude uses it: When you want comprehensive results — semantic matches boosted by graph proximity.

What to expect: Semantic search results re-ranked so that notes which are also graph-connected score higher. Best for "find everything relevant to X."
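The exact weighting is internal to the server, but the idea can be sketched with an illustrative boost formula (the `boost` constant and the 1/(1+distance) decay are assumptions, not the actual implementation):

```typescript
interface SemanticHit { path: string; score: number } // cosine similarity

// Re-rank semantic hits: notes near the query's graph neighborhood get a bonus.
// graphDistance maps note path -> hops from a seed node; absent = unconnected.
function hybridRerank(
  hits: SemanticHit[],
  graphDistance: Map<string, number>,
  boost = 0.15
): SemanticHit[] {
  return hits
    .map(h => {
      const d = graphDistance.get(h.path);
      const bonus = d === undefined ? 0 : boost / (1 + d); // closer -> bigger boost
      return { path: h.path, score: h.score + bonus };
    })
    .sort((a, b) => b.score - a.score);
}
```

A note that scores slightly lower semantically but sits inside the query's link neighborhood can overtake a semantically stronger but isolated note.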


Read Tools
| Tool | Description |
| --- | --- |
| `read_note` | Read full content of a specific note |
| `read_multiple_notes` | Batch read multiple notes in one call |
| `list_notes` | List all indexed notes with metadata (title, tags, link count) |

Write Tools
| Tool | Description |
| --- | --- |
| `create_note` | Create a new markdown note with optional frontmatter |
| `update_note` | Edit note content (overwrite, append, prepend, or patch by heading) |
| `delete_note` | Delete a note (requires explicit confirmation) |
| `move_note` | Move/rename a note — automatically updates wikilinks across the vault |

update_note - Four editing modes

Modes:

  • overwrite — replace entire content
  • append — add to the end
  • prepend — add after frontmatter, before existing content
  • patch-by-heading — replace the content under a specific heading (preserves other sections)

Example:

You: Add a "Rollback" section to the deployment guide
Claude: [calls update_note with mode "patch-by-heading", heading "Rollback"]
Claude: Updated deployment-guide.md — added Rollback section with kubectl rollback instructions.
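A line-based sketch of how patch-by-heading can work (the real tool likely operates on the remark AST; this simplified version replaces everything under a heading, up to the next heading of the same or higher level):

```typescript
// Replace the body under a given heading, preserving all other sections.
// Simplified line-based sketch, not the server's actual AST-based logic.
function patchByHeading(markdown: string, heading: string, newBody: string): string {
  const lines = markdown.split("\n");
  const headingRe = /^(#{1,6})\s+(.*)$/;
  let start = -1;
  let level = 0;
  for (let i = 0; i < lines.length; i++) {
    const m = lines[i].match(headingRe);
    if (m && m[2].trim() === heading) {
      start = i;
      level = m[1].length;
      break;
    }
  }
  if (start === -1) throw new Error(`Heading not found: ${heading}`);
  // The section ends at the next heading of the same or higher level.
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    const m = lines[i].match(headingRe);
    if (m && m[1].length <= level) { end = i; break; }
  }
  return [...lines.slice(0, start + 1), newBody, ...lines.slice(end)].join("\n");
}
```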

move_note - Smart rename

What makes it special: When you move user-service.md to auth-service.md, every [[user-service]] wikilink in every other note gets updated to [[auth-service]] automatically.
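The rewrite can be sketched as a regex pass over every note's content, covering both `[[name]]` and `[[name|alias]]` forms (illustrative only; the actual tool also refreshes the vector and graph indexes):

```typescript
// Rewrite [[old-name]] and [[old-name|alias]] wikilinks after a rename.
// Sketch of the text transformation only.
function rewriteWikilinks(content: string, oldName: string, newName: string): string {
  // Escape regex metacharacters in the note name before building the pattern.
  const escaped = oldName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(`\\[\\[${escaped}(\\|[^\\]]*)?\\]\\]`, "g");
  return content.replace(re, (_match, alias: string | undefined) =>
    `[[${newName}${alias ?? ""}]]`);
}
```

Display aliases survive the rename because only the target half of the link is rewritten.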


Metadata Tools
| Tool | Description |
| --- | --- |
| `get_frontmatter` | Read parsed YAML frontmatter as JSON |
| `update_frontmatter` | Set or delete frontmatter keys atomically (pass null to delete) |
| `manage_tags` | Add, remove, or list tags on a note (frontmatter + inline) |
| `rename_tag` | Rename a tag across all notes in the vault |

rename_tag - Vault-wide tag rename

When Claude uses it: When you want to rename #architecture to #arch everywhere — in frontmatter tags: arrays and inline #tags across every file.

What to expect: Returns the count of files modified.


Graph Tools
| Tool | Description |
| --- | --- |
| `backlinks` | All notes that link TO a given note via `[[wikilinks]]` |
| `forwardlinks` | All notes linked FROM a given note |
| `graph_path` | Shortest path between two notes in the knowledge graph |
| `graph_statistics` | Most connected nodes, orphan count, graph density |

graph_path - Find connections between notes

Example conversation:

You: How are the deployment guide and the user service connected?
Claude: [calls graph_path from "deployment-guide.md" to "user-service.md"]
Claude: Path: deployment-guide.md → microservices.md → user-service.md
The deployment guide links to the microservices overview, which links to the user service.

graph_statistics - Vault health overview

What to expect:

{
  "totalNodes": 47,
  "totalEdges": 89,
  "orphanCount": 3,
  "mostConnected": [
    { "path": "project-overview.md", "connections": 12 },
    { "path": "microservices.md", "connections": 9 }
  ],
  "density": 0.04
}
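The `density` field is consistent with the standard directed-graph formula, edges over the n·(n−1) possible ordered pairs (assuming `graph_statistics` uses that formula; the sample numbers above do round to 0.04 under it):

```typescript
// Directed-graph density: the fraction of possible ordered edges that exist.
function directedDensity(totalNodes: number, totalEdges: number): number {
  if (totalNodes < 2) return 0;
  return totalEdges / (totalNodes * (totalNodes - 1));
}

// With the sample stats above: 89 edges across 47 notes ≈ 0.041, shown as 0.04.
const sampleDensity = directedDensity(47, 89);
```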

System Tools
| Tool | Description |
| --- | --- |
| `get_stats` | Vault stats — total notes, chunks, embeddings, graph density, model info |
| `reindex` | Force full reindex of the vault |

Common Workflows

Quick Vault Check (10 seconds)

semantic-pages --notes ./vault --stats

Adding Semantic Pages to a Project (2 minutes)

# Step 1: Create .mcp.json in your project root
echo '{
  "mcpServers": {
    "semantic-pages": {
      "command": "npx",
      "args": ["-y", "@theglitchking/semantic-pages", "--notes", "./notes"]
    }
  }
}' > .mcp.json

# Step 2: Add index to .gitignore
echo ".semantic-pages-index/" >> .gitignore

# Step 3: Start Claude — it now has 21 note tools
claude

Asking Claude About Your Notes

You: What have I written about authentication?
Claude: [calls search_semantic] I found 3 notes about authentication...

You: What links to the API gateway doc?
Claude: [calls backlinks] 4 notes link to api-gateway.md...

You: Create a new note summarizing today's meeting
Claude: [calls create_note] Created meeting-2024-03-15.md with frontmatter...

You: Rename the #backend tag to #server across all notes
Claude: [calls rename_tag] Renamed #backend to #server in 12 files.

Per-Repo Pattern

any-repo/
├── notes/                      # your markdown files
├── .mcp.json                   # point semantic-pages at ./notes
├── .semantic-pages-index/      # gitignored, auto-rebuilt
└── .gitignore                  # add .semantic-pages-index/

Each repo gets its own independent knowledge base. No shared state between projects.


Technical Details

Architecture Overview

Semantic Pages is built with TypeScript and organized into a core library with thin transport layers:

src/
├── core/                        # Pure library — no transport assumptions
│   ├── index.ts                # Core exports
│   ├── types.ts                # Shared type definitions
│   ├── indexer.ts              # Markdown parser (unified + remark)
│   ├── embedder.ts             # Local embedding model (@huggingface/transformers)
│   ├── graph.ts                # Knowledge graph (graphology)
│   ├── vector.ts               # HNSW vector index (hnswlib-node)
│   ├── search-text.ts          # Full-text / regex search
│   ├── crud.ts                 # Create/update/delete/move notes
│   ├── frontmatter.ts          # Frontmatter + tag management
│   └── watcher.ts              # File watcher (chokidar)
│
├── mcp/                         # MCP stdio server (thin wrapper over core)
│   └── server.ts               # Server setup + 21 tool definitions
│
└── cli/                         # CLI entrypoint
    └── index.ts                # commander-based CLI

Tech Stack

| Concern | Package | Why |
| --- | --- | --- |
| Markdown parsing | unified + remark-parse | AST-based, handles wikilinks |
| Frontmatter | gray-matter | YAML/TOML frontmatter extraction |
| Wikilinks | remark-wiki-link | `[[note-name]]` extraction from AST |
| Embeddings | @huggingface/transformers | WASM runtime, no Python, no API key |
| Embedding model | nomic-embed-text-v1.5 | High quality, ~80MB, runs locally |
| Vector index | hnswlib-node | HNSW algorithm, same as production vector DBs |
| Knowledge graph | graphology | Directed graph, serializable, rich algorithms |
| Graph algorithms | graphology-traversal + graphology-shortest-path | BFS, shortest path |
| File watching | chokidar | Cross-platform, debounced |
| MCP server | @modelcontextprotocol/sdk | Official MCP TypeScript SDK |
| CLI | commander | Standard Node.js CLI framework |

Index Layout

.semantic-pages-index/           # gitignored, rebuilt on demand
├── embeddings.json              # serialized chunk vectors
├── hnsw.bin                     # HNSW vector index
├── hnsw-meta.json               # chunk → document mapping
├── graph.json                   # knowledge graph (graphology format)
└── meta.json                    # index metadata (vault path, model, timestamp)

Document Processing Pipeline

Step 1: Parse

.md file → gray-matter (frontmatter) → remark (AST) → extract:
  - title (frontmatter > first heading > filename)
  - wikilinks ([[note-name]])
  - tags (frontmatter tags: + inline #tags)
  - headers (H1-H6)
  - plain text (markdown stripped)
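The title precedence in step 1 (frontmatter > first heading > filename) can be sketched as:

```typescript
// Resolve a note's title: frontmatter `title`, else first heading, else filename.
// Illustrative sketch of the precedence rule; field names are from the pipeline above.
function resolveTitle(
  frontmatter: { title?: string },
  headings: string[],
  filePath: string
): string {
  if (frontmatter.title) return frontmatter.title;
  if (headings.length > 0) return headings[0];
  // Fall back to the filename without its .md extension.
  const base = filePath.split("/").pop() ?? filePath;
  return base.replace(/\.md$/, "");
}
```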

Step 2: Chunk

Plain text → split at sentence boundaries → ~512 token chunks
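A greedy sketch of that sentence-boundary chunking, approximating tokens as characters divided by four (a rough heuristic, not necessarily the tokenizer the server actually uses):

```typescript
// Greedy sentence-boundary chunking to roughly maxTokens per chunk.
// Token count is approximated as chars / 4 — a common rough heuristic.
function chunkText(text: string, maxTokens = 512): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed the budget.
    if (current && (current.length + sentence.length) / 4 > maxTokens) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Splitting at sentence boundaries keeps each embedded chunk semantically coherent instead of cutting mid-thought.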

Step 3: Embed

Each chunk → nomic-embed-text-v1.5 (WASM) → normalized Float32Array
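Because the stored vectors are normalized, cosine similarity reduces to a plain dot product, which is what keeps the nearest-neighbor comparisons cheap. A minimal sketch of that math:

```typescript
// L2-normalize a vector; for unit vectors, cosine similarity is just a dot product.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1;
  return v.map(x => x / norm);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

// cosine(a, b) === dot(l2Normalize(a), l2Normalize(b))
```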

Step 4: Index

Embeddings → HNSW index (hnswlib-node)
Wikilinks + tags → directed graph (graphology)

Step 5: Serve

MCP tools → query embeddings / graph / files → return results

Using as a Library

The core library is importable independently of the MCP server:

import { Indexer, Embedder, GraphBuilder, VectorIndex, TextSearch } from "@theglitchking/semantic-pages";

// Index all notes
const indexer = new Indexer("./vault");
const docs = await indexer.indexAll();

// Build embeddings
const embedder = new Embedder();
await embedder.init();
const chunks = docs.flatMap(d => d.chunks);
const vecs = await embedder.embedBatch(chunks);

// Build vector index
const vectorIndex = new VectorIndex(embedder.getDimensions());
// Map each chunk back to its source document (order matches the flatMap above)
vectorIndex.build(vecs, docs.flatMap(d =>
  d.chunks.map((text, chunkIndex) => ({ docPath: d.path, chunkIndex, text }))
));

// Search
const queryVec = await embedder.embed("microservices architecture");
const results = vectorIndex.search(queryVec, 5);

// Build knowledge graph
const graph = new GraphBuilder();
graph.buildFromDocuments(docs);
const backlinks = graph.backlinks("project-overview.md");
const path = graph.findPath("overview.md", "auth.md");

Performance

| Metric | Value |
| --- | --- |
| Index 100 notes | ~5 seconds |
| Index 1,000 notes | ~30 seconds |
| Semantic search latency | <100ms |
| Text search latency | <10ms |
| Graph traversal latency | <5ms |
| Model download (first run) | 80MB, cached at `~/.semantic-pages/models/` |
| Index size (100 notes) | ~10MB |
| npm package size | 85.7 kB |

Requirements

  • Node.js: Version 18.0.0 or higher
  • Operating System: Linux, macOS, or Windows (with WSL2)
  • Disk Space: ~80MB for the embedding model (downloaded once)

Troubleshooting

Installation Issues

Problem: npx semantic-pages fails or shows "not found"

Solution:

# Retry with the full package name
npx --yes @theglitchking/semantic-pages --notes ./vault --stats

# Or install globally
npm install -g @theglitchking/semantic-pages

Problem: Model download fails

Solution:

# Check internet connection, then retry
# The model is cached at ~/.semantic-pages/models/
# Delete and re-download if corrupted:
rm -rf ~/.semantic-pages/models/
semantic-pages --notes ./vault --reindex

Usage Issues

Problem: Search returns no results

Solution:

# Force reindex
semantic-pages --notes ./vault --reindex

# Check that .md files exist in the path
ls ./vault/*.md

Problem: Index seems stale after editing files externally

Solution: The file watcher should catch changes, but if it misses some:

# Force reindex
semantic-pages --notes ./vault --reindex

Problem: hnswlib-node fails to install (native addon)

Solution:

# Install build tools
# On Ubuntu/Debian:
sudo apt install build-essential python3

# On macOS:
xcode-select --install

# Then retry
npm install -g @theglitchking/semantic-pages

Contributing

Contributions are welcome! The project uses:

  • TypeScript with strict mode
  • tsup for bundling (ESM)
  • vitest for testing (123 tests across 11 suites)

# Clone and install
git clone https://github.com/TheGlitchKing/semantic-pages.git
cd semantic-pages
npm install

# Run tests
npm test

# Build
npm run build

# Type check
npm run lint

License

MIT License - see LICENSE file for details.


Support

NPM | GitHub | Issues

Made with care by TheGlitchKing