Package Exports

  • @theglitchking/semantic-pages
  • @theglitchking/semantic-pages/mcp

Semantic Pages

Semantic search + knowledge graph MCP server for any folder of markdown files.


[!IMPORTANT] Semantic Pages downloads a local embedding model (~80MB) on first launch. The download happens once and is cached at `~/.semantic-pages/models/`. No API key required. No data leaves your machine.


Summary

When you have markdown notes scattered across a project — a vault/, docs/, notes/, or wiki — your AI assistant can't search them by meaning, traverse their connections, or help you maintain them. Semantic Pages fixes this by indexing your markdown files into a vector database and knowledge graph, then exposing 21 MCP tools that let Claude (or any MCP-compatible client) search semantically, traverse wikilinks, manage frontmatter, and perform full CRUD operations. No Docker, no Python, no Obsidian required — just npx.


Operational Summary

The server indexes all .md files in a directory you point it at. Each file is parsed for YAML frontmatter, [[wikilinks]], #tags, and headings. The text content is split into ~512-token chunks and embedded locally using the nomic-embed-text-v1.5 model running via WebAssembly in Node.js. These embeddings are stored in an HNSW index for fast approximate nearest neighbor search. Simultaneously, a directed graph is built from wikilinks and shared tags using graphology.

When Claude calls search_semantic, the query is embedded and compared against all chunks via cosine similarity. When Claude calls search_graph, it does a breadth-first traversal from matching nodes. search_hybrid combines both — semantic results re-ranked by graph proximity. Beyond search, Claude can create, read, update, delete, and move notes, manage YAML frontmatter fields, add/remove/rename tags vault-wide, and query the knowledge graph for backlinks, forwardlinks, shortest paths, and connectivity statistics.

The index is stored in .semantic-pages-index/ alongside your notes (gitignore it). A file watcher detects changes and re-indexes incrementally. Everything runs locally over stdio — no network, no server, no background processes beyond the MCP connection itself.
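The watcher's incremental re-indexing amounts to a per-file debounce. Here is a dependency-free sketch of that logic (the real server uses chokidar; `reindexFile` is a hypothetical callback standing in for the parse/embed/index step):

```typescript
// Simplified sketch of debounced incremental re-indexing.
// Models only the debounce; file-system watching itself is chokidar's job.
type ReindexFn = (path: string) => void;

function makeDebouncedReindexer(reindexFile: ReindexFn, delayMs = 250) {
  const pending = new Map<string, ReturnType<typeof setTimeout>>();
  return (changedPath: string) => {
    const existing = pending.get(changedPath);
    if (existing !== undefined) clearTimeout(existing); // coalesce rapid saves
    pending.set(changedPath, setTimeout(() => {
      pending.delete(changedPath);
      reindexFile(changedPath); // re-parse, re-embed, re-index just this file
    }, delayMs));
  };
}
```

Coalescing per path means a burst of editor auto-saves triggers one re-index of the affected file rather than one per keystroke.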


Features

  • Semantic Search: Find notes by meaning, not just keywords, using local vector embeddings
  • Knowledge Graph: Traverse [[wikilinks]] and shared #tags as a directed graph
  • Hybrid Search: Combined vector + graph search with re-ranking
  • Full-Text Search: Keyword and regex search with path, tag, and case filters
  • Full CRUD: Create, read, update (overwrite/append/prepend/patch-by-heading), delete, and move notes
  • Frontmatter Management: Get and set YAML frontmatter fields atomically
  • Tag Management: Add, remove, list, and rename tags vault-wide (frontmatter + inline)
  • Graph Queries: Backlinks, forwardlinks, shortest path, graph statistics (orphans, density, most connected)
  • File Watcher: Incremental re-indexing on file changes with debounce
  • Local Embeddings: No API key, no network after first model download
  • Zero Dependencies Beyond Node: No Docker, no Python, no Obsidian, no GUI

Quick Start

1. Installation Methods

Method A: NPX (No installation needed)

This lets you run the server without installing it permanently.

Step 1: Open your terminal in your project folder

Step 2: Run:

npx -y @theglitchking/semantic-pages --notes ./vault --stats

Step 3: The first time you run it, NPX downloads the package and the embedding model (~80MB). This takes 1-2 minutes.

Step 4: After that, it runs instantly.

Use this method when: You want to try it out, or you're adding it to a project's .mcp.json config.

Method B: Global Install (Available in any project)

This installs the tool on your computer so you can use it in any project.

Step 1: Open your terminal

Step 2: Type this command and press Enter:

npm install -g @theglitchking/semantic-pages

Step 3: Test that it worked:

semantic-pages --version

Step 4: You should see a version number. If you do, it's installed correctly!

Method C: MCP Configuration (For Claude Code)

Add to your project's .mcp.json so Claude has automatic access:

{
  "mcpServers": {
    "semantic-pages": {
      "command": "npx",
      "args": ["-y", "@theglitchking/semantic-pages", "--notes", "./vault"]
    }
  }
}

Point --notes at any folder of .md files: ./vault, ./docs, ./notes, or . for the whole repo.

What to expect: Next time you run claude in that project, Claude will have 21 new tools for searching, reading, writing, and traversing your notes.

Method D: Project Installation (For team projects)

This installs the tool only for one specific project.

Step 1: Open your terminal in your project folder

Step 2: Type this command:

npm install --save-dev @theglitchking/semantic-pages

Step 3: Add a script to your package.json file:

{
  "scripts": {
    "notes": "semantic-pages --notes ./vault",
    "notes:stats": "semantic-pages --notes ./vault --stats",
    "notes:reindex": "semantic-pages --notes ./vault --reindex"
  }
}

2. How to Use

CLI Commands

These commands run in your terminal and manage your notes index.

| Command | Description |
| --- | --- |
| `semantic-pages --notes <path>` | Start MCP server (default mode) |
| `semantic-pages --notes <path> --stats` | Show vault statistics and exit |
| `semantic-pages --notes <path> --reindex` | Force full reindex and exit |
| `semantic-pages --notes <path> --no-watch` | Start server without file watcher |
| `semantic-pages tools` | List all 21 MCP tools with descriptions |
| `semantic-pages tools <name>` | Show arguments and examples for a specific tool |
| `semantic-pages --version` | Show version number |
| `semantic-pages --help` | Show all options |

Built-in Tool Help

Every MCP tool has built-in documentation accessible from the CLI:

# List all 21 tools organized by category
semantic-pages tools
Semantic Pages — 21 MCP Tools

  Search:
    search_semantic          Vector similarity search — find notes by meaning, not just keywords
    search_text              Full-text keyword or regex search with optional filters
    search_graph             Graph traversal — find notes connected to a concept via wikilinks and tags
    search_hybrid            Combined semantic + graph search — vector results re-ranked by graph proximity

  Read:
    read_note                Read the full content of a specific note by path
    read_multiple_notes      Batch read multiple notes in one call
    list_notes               List all indexed notes with metadata (title, tags, link count)
    ...
# Get detailed help for a specific tool — arguments, types, and examples
semantic-pages tools search_semantic
  search_semantic
  ───────────────
  Vector similarity search — find notes by meaning, not just keywords

  Arguments:
    { "query": "string", "limit?": 10 }

  Examples:
    { "query": "microservices architecture", "limit": 5 }
    { "query": "how to deploy to production" }
# More examples
semantic-pages tools update_note      # See all 4 editing modes
semantic-pages tools move_note        # See wikilink-aware rename
semantic-pages tools manage_tags      # See add/remove/list actions
semantic-pages tools rename_tag       # See vault-wide tag rename
Command Examples and Details

--stats - Check your vault

How to use it:

semantic-pages --notes ./vault --stats

When to use it: Quick check to see what's in your vault.

What to expect:

Notes: 47
Chunks: 312
Wikilinks: 89
Tags: 23 unique

--reindex - Rebuild the index

How to use it:

semantic-pages --notes ./vault --reindex

When to use it:

  • After bulk-adding or modifying notes outside of the MCP tools
  • If the index seems stale or corrupted
  • After changing the embedding model

What to expect: Full re-parse, re-embed, and re-index of all markdown files. Takes 10-60 seconds depending on vault size and whether the model is cached.


MCP Tools

When the server is running (via .mcp.json or CLI), Claude has access to these 21 tools:

Search Tools
| Tool | Description |
| --- | --- |
| `search_semantic` | Vector similarity search — "find notes similar to this idea" |
| `search_text` | Full-text keyword/regex search with path, tag, and case filters |
| `search_graph` | Graph traversal — "find notes connected to this concept" |
| `search_hybrid` | Combined — semantic results re-ranked by graph proximity |

search_semantic - Find notes by meaning

When Claude uses it: When you ask things like "find notes about deployment strategies" or "what have I written about authentication?"

What to expect: Returns notes ranked by semantic similarity to your query, with relevance scores and text snippets. Works even if the exact words don't appear in the notes.

Example conversation:

You: What notes do I have about scaling microservices?
Claude: [calls search_semantic with query "scaling microservices"]
Claude: I found 4 relevant notes:
1. architecture/scaling-patterns.md (0.87 similarity) — discusses horizontal vs vertical scaling
2. devops/kubernetes-autoscaling.md (0.82 similarity) — HPA and VPA configuration
3. architecture/service-mesh.md (0.71 similarity) — mentions scaling in the context of Istio
4. meeting-notes/2024-03-15.md (0.65 similarity) — team discussion about scaling concerns

search_text - Find exact matches

When Claude uses it: When you need exact keyword or regex matches, not semantic similarity.

What to expect: Returns notes containing the exact pattern, with snippets showing context. Supports:

  • Case-sensitive/insensitive search
  • Regex patterns
  • Path glob filters (e.g., only search in notes/)
  • Tag filters (e.g., only search notes tagged #architecture)

search_graph - Traverse connections

When Claude uses it: When you want to explore how notes are connected — "what's related to this concept?"

What to expect: Starting from notes matching your concept, does a breadth-first traversal through wikilinks and shared tags, returning all connected notes within the specified depth.
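That traversal is a depth-limited BFS over the wikilink/tag graph. A dependency-free sketch (the server itself uses graphology-traversal; the adjacency-map shape here is illustrative):

```typescript
// Depth-limited BFS over a note graph, returning each reachable note
// together with its distance (in hops) from the start node.
function bfsWithinDepth(
  adjacency: Map<string, string[]>, // note path -> linked note paths
  start: string,
  maxDepth: number
): Map<string, number> {
  const visited = new Map<string, number>([[start, 0]]);
  let frontier = [start];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of adjacency.get(node) ?? []) {
        if (!visited.has(neighbor)) {
          visited.set(neighbor, depth);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return visited;
}
```

With `maxDepth = 1` you get only direct neighbors; raising the depth widens the neighborhood one hop at a time.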


search_hybrid - Best of both

When Claude uses it: When you want comprehensive results — semantic matches boosted by graph proximity.

What to expect: Semantic search results re-ranked so that notes which are also graph-connected score higher. Best for "find everything relevant to X."
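The exact weighting is internal to the server, but the idea can be sketched with an illustrative boost formula (the `boost` constant and the 1/(1+distance) decay are assumptions, not the actual implementation):

```typescript
interface SemanticHit { path: string; score: number } // cosine similarity

// Re-rank semantic hits: notes near the query's graph neighborhood get a bonus.
// graphDistance maps note path -> hops from a seed node; absent = unconnected.
function hybridRerank(
  hits: SemanticHit[],
  graphDistance: Map<string, number>,
  boost = 0.15
): SemanticHit[] {
  return hits
    .map(h => {
      const d = graphDistance.get(h.path);
      const bonus = d === undefined ? 0 : boost / (1 + d); // closer -> bigger boost
      return { path: h.path, score: h.score + bonus };
    })
    .sort((a, b) => b.score - a.score);
}
```

A note that scores slightly lower semantically but sits inside the query's link neighborhood can overtake a semantically stronger but isolated note.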


Read Tools
| Tool | Description |
| --- | --- |
| `read_note` | Read full content of a specific note |
| `read_multiple_notes` | Batch read multiple notes in one call |
| `list_notes` | List all indexed notes with metadata (title, tags, link count) |

Write Tools
| Tool | Description |
| --- | --- |
| `create_note` | Create a new markdown note with optional frontmatter |
| `update_note` | Edit note content (overwrite, append, prepend, or patch by heading) |
| `delete_note` | Delete a note (requires explicit confirmation) |
| `move_note` | Move/rename a note — automatically updates wikilinks across the vault |

update_note - Four editing modes

Modes:

  • overwrite — replace entire content
  • append — add to the end
  • prepend — add after frontmatter, before existing content
  • patch-by-heading — replace the content under a specific heading (preserves other sections)

Example:

You: Add a "Rollback" section to the deployment guide
Claude: [calls update_note with mode "patch-by-heading", heading "Rollback"]
Claude: Updated deployment-guide.md — added Rollback section with kubectl rollback instructions.
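A line-based sketch of how patch-by-heading can work (the real tool likely operates on the remark AST; this simplified version replaces everything under a heading, up to the next heading of the same or higher level):

```typescript
// Replace the body under a given heading, preserving all other sections.
// Simplified line-based sketch, not the server's actual AST-based logic.
function patchByHeading(markdown: string, heading: string, newBody: string): string {
  const lines = markdown.split("\n");
  const headingRe = /^(#{1,6})\s+(.*)$/;
  let start = -1;
  let level = 0;
  for (let i = 0; i < lines.length; i++) {
    const m = lines[i].match(headingRe);
    if (m && m[2].trim() === heading) {
      start = i;
      level = m[1].length;
      break;
    }
  }
  if (start === -1) throw new Error(`Heading not found: ${heading}`);
  // The section ends at the next heading of the same or higher level.
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    const m = lines[i].match(headingRe);
    if (m && m[1].length <= level) { end = i; break; }
  }
  return [...lines.slice(0, start + 1), newBody, ...lines.slice(end)].join("\n");
}
```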

move_note - Smart rename

What makes it special: When you move user-service.md to auth-service.md, every [[user-service]] wikilink in every other note gets updated to [[auth-service]] automatically.
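The rewrite can be sketched as a regex pass over every note's content, covering both `[[name]]` and `[[name|alias]]` forms (illustrative only; the actual tool also refreshes the vector and graph indexes):

```typescript
// Rewrite [[old-name]] and [[old-name|alias]] wikilinks after a rename.
// Sketch of the text transformation only.
function rewriteWikilinks(content: string, oldName: string, newName: string): string {
  // Escape regex metacharacters in the note name before building the pattern.
  const escaped = oldName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp(`\\[\\[${escaped}(\\|[^\\]]*)?\\]\\]`, "g");
  return content.replace(re, (_match, alias: string | undefined) =>
    `[[${newName}${alias ?? ""}]]`);
}
```

Display aliases survive the rename because only the target half of the link is rewritten.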


Metadata Tools
| Tool | Description |
| --- | --- |
| `get_frontmatter` | Read parsed YAML frontmatter as JSON |
| `update_frontmatter` | Set or delete frontmatter keys atomically (pass null to delete) |
| `manage_tags` | Add, remove, or list tags on a note (frontmatter + inline) |
| `rename_tag` | Rename a tag across all notes in the vault |

rename_tag - Vault-wide tag rename

When Claude uses it: When you want to rename #architecture to #arch everywhere — in frontmatter tags: arrays and inline #tags across every file.

What to expect: Returns the count of files modified.


Graph Tools
| Tool | Description |
| --- | --- |
| `backlinks` | All notes that link TO a given note via `[[wikilinks]]` |
| `forwardlinks` | All notes linked FROM a given note |
| `graph_path` | Shortest path between two notes in the knowledge graph |
| `graph_statistics` | Most connected nodes, orphan count, graph density |

graph_path - Find connections between notes

Example conversation:

You: How are the deployment guide and the user service connected?
Claude: [calls graph_path from "deployment-guide.md" to "user-service.md"]
Claude: Path: deployment-guide.md → microservices.md → user-service.md
The deployment guide links to the microservices overview, which links to the user service.

graph_statistics - Vault health overview

What to expect:

{
  "totalNodes": 47,
  "totalEdges": 89,
  "orphanCount": 3,
  "mostConnected": [
    { "path": "project-overview.md", "connections": 12 },
    { "path": "microservices.md", "connections": 9 }
  ],
  "density": 0.04
}
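The `density` field is consistent with the standard directed-graph formula, edges over the n·(n−1) possible ordered pairs (assuming `graph_statistics` uses that formula; the sample numbers above do round to 0.04 under it):

```typescript
// Directed-graph density: the fraction of possible ordered edges that exist.
function directedDensity(totalNodes: number, totalEdges: number): number {
  if (totalNodes < 2) return 0;
  return totalEdges / (totalNodes * (totalNodes - 1));
}

// With the sample stats above: 89 edges across 47 notes ≈ 0.041, shown as 0.04.
const sampleDensity = directedDensity(47, 89);
```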

System Tools
| Tool | Description |
| --- | --- |
| `get_stats` | Vault stats — total notes, chunks, embeddings, graph density, model info |
| `reindex` | Force full reindex of the vault |

Common Workflows

Quick Vault Check (10 seconds)

semantic-pages --notes ./vault --stats

Adding Semantic Pages to a Project (2 minutes)

# Step 1: Create .mcp.json in your project root
echo '{
  "mcpServers": {
    "semantic-pages": {
      "command": "npx",
      "args": ["-y", "@theglitchking/semantic-pages", "--notes", "./notes"]
    }
  }
}' > .mcp.json

# Step 2: Add index to .gitignore
echo ".semantic-pages-index/" >> .gitignore

# Step 3: Start Claude — it now has 21 note tools
claude

Asking Claude About Your Notes

You: What have I written about authentication?
Claude: [calls search_semantic] I found 3 notes about authentication...

You: What links to the API gateway doc?
Claude: [calls backlinks] 4 notes link to api-gateway.md...

You: Create a new note summarizing today's meeting
Claude: [calls create_note] Created meeting-2024-03-15.md with frontmatter...

You: Rename the #backend tag to #server across all notes
Claude: [calls rename_tag] Renamed #backend to #server in 12 files.

Per-Repo Pattern

any-repo/
├── notes/                      # your markdown files
├── .mcp.json                   # point semantic-pages at ./notes
├── .semantic-pages-index/      # gitignored, auto-rebuilt
└── .gitignore                  # add .semantic-pages-index/

Each repo gets its own independent knowledge base. No shared state between projects.


Technical Details

Architecture Overview

Semantic Pages is built with TypeScript and organized into a core library with thin transport layers:

src/
├── core/                        # Pure library — no transport assumptions
│   ├── index.ts                # Core exports
│   ├── types.ts                # Shared type definitions
│   ├── indexer.ts              # Markdown parser (unified + remark)
│   ├── embedder.ts             # Local embedding model (@huggingface/transformers)
│   ├── graph.ts                # Knowledge graph (graphology)
│   ├── vector.ts               # HNSW vector index (hnswlib-node)
│   ├── search-text.ts          # Full-text / regex search
│   ├── crud.ts                 # Create/update/delete/move notes
│   ├── frontmatter.ts          # Frontmatter + tag management
│   └── watcher.ts              # File watcher (chokidar)
│
├── mcp/                         # MCP stdio server (thin wrapper over core)
│   └── server.ts               # Server setup + 21 tool definitions
│
└── cli/                         # CLI entrypoint
    └── index.ts                # commander-based CLI

Tech Stack

| Concern | Package | Why |
| --- | --- | --- |
| Markdown parsing | unified + remark-parse | AST-based, handles wikilinks |
| Frontmatter | gray-matter | YAML/TOML frontmatter extraction |
| Wikilinks | remark-wiki-link | `[[note-name]]` extraction from AST |
| Embeddings | @huggingface/transformers | WASM runtime, no Python, no API key |
| Embedding model | nomic-embed-text-v1.5 | High quality, ~80MB, runs locally |
| Vector index | hnswlib-node | HNSW algorithm, same as production vector DBs |
| Knowledge graph | graphology | Directed graph, serializable, rich algorithms |
| Graph algorithms | graphology-traversal + graphology-shortest-path | BFS, shortest path |
| File watching | chokidar | Cross-platform, debounced |
| MCP server | @modelcontextprotocol/sdk | Official MCP TypeScript SDK |
| CLI | commander | Standard Node.js CLI framework |

Index Layout

.semantic-pages-index/           # gitignored, rebuilt on demand
├── embeddings.json              # serialized chunk vectors
├── hnsw.bin                     # HNSW vector index
├── hnsw-meta.json               # chunk → document mapping
├── graph.json                   # knowledge graph (graphology format)
└── meta.json                    # index metadata (vault path, model, timestamp)

Document Processing Pipeline

Step 1: Parse

.md file → gray-matter (frontmatter) → remark (AST) → extract:
  - title (frontmatter > first heading > filename)
  - wikilinks ([[note-name]])
  - tags (frontmatter tags: + inline #tags)
  - headers (H1-H6)
  - plain text (markdown stripped)
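The title precedence in step 1 (frontmatter > first heading > filename) can be sketched as:

```typescript
// Resolve a note's title: frontmatter `title`, else first heading, else filename.
// Illustrative sketch of the precedence rule; field names are from the pipeline above.
function resolveTitle(
  frontmatter: { title?: string },
  headings: string[],
  filePath: string
): string {
  if (frontmatter.title) return frontmatter.title;
  if (headings.length > 0) return headings[0];
  // Fall back to the filename without its .md extension.
  const base = filePath.split("/").pop() ?? filePath;
  return base.replace(/\.md$/, "");
}
```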

Step 2: Chunk

Plain text → split at sentence boundaries → ~512 token chunks
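A greedy sketch of that sentence-boundary chunking, approximating tokens as characters divided by four (a rough heuristic, not necessarily the tokenizer the server actually uses):

```typescript
// Greedy sentence-boundary chunking to roughly maxTokens per chunk.
// Token count is approximated as chars / 4 — a common rough heuristic.
function chunkText(text: string, maxTokens = 512): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed the budget.
    if (current && (current.length + sentence.length) / 4 > maxTokens) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Splitting at sentence boundaries keeps each embedded chunk semantically coherent instead of cutting mid-thought.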

Step 3: Embed

Each chunk → nomic-embed-text-v1.5 (WASM) → normalized Float32Array
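Because the stored vectors are normalized, cosine similarity reduces to a plain dot product, which is what keeps the nearest-neighbor comparisons cheap. A minimal sketch of that math:

```typescript
// L2-normalize a vector; for unit vectors, cosine similarity is just a dot product.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1;
  return v.map(x => x / norm);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

// cosine(a, b) === dot(l2Normalize(a), l2Normalize(b))
```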

Step 4: Index

Embeddings → HNSW index (hnswlib-node)
Wikilinks + tags → directed graph (graphology)

Step 5: Serve

MCP tools → query embeddings / graph / files → return results

Using as a Library

The core library is importable independently of the MCP server:

import { Indexer, Embedder, GraphBuilder, VectorIndex, TextSearch } from "@theglitchking/semantic-pages";

// Index all notes
const indexer = new Indexer("./vault");
const docs = await indexer.indexAll();

// Build embeddings
const embedder = new Embedder();
await embedder.init();
const chunks = docs.flatMap(d => d.chunks);
const vecs = await embedder.embedBatch(chunks);

// Build vector index
const vectorIndex = new VectorIndex(embedder.getDimensions());
// Map each chunk back to its source document (order matches the flatMap above)
vectorIndex.build(vecs, docs.flatMap(d =>
  d.chunks.map((text, chunkIndex) => ({ docPath: d.path, chunkIndex, text }))
));

// Search
const queryVec = await embedder.embed("microservices architecture");
const results = vectorIndex.search(queryVec, 5);

// Build knowledge graph
const graph = new GraphBuilder();
graph.buildFromDocuments(docs);
const backlinks = graph.backlinks("project-overview.md");
const path = graph.findPath("overview.md", "auth.md");

Performance

| Metric | Value |
| --- | --- |
| Index 100 notes | ~5 seconds |
| Index 1,000 notes | ~30 seconds |
| Semantic search latency | <100ms |
| Text search latency | <10ms |
| Graph traversal latency | <5ms |
| Model download (first run) | 80MB, cached at `~/.semantic-pages/models/` |
| Index size (100 notes) | ~10MB |
| npm package size | 85.7 kB |

Requirements

  • Node.js: Version 18.0.0 or higher
  • Operating System: Linux, macOS, or Windows (with WSL2)
  • Disk Space: ~80MB for the embedding model (downloaded once)

Troubleshooting

Installation Issues

Problem: npx semantic-pages fails or shows "not found"

Solution:

# Retry with the full package name
npx --yes @theglitchking/semantic-pages --notes ./vault --stats

# Or install globally
npm install -g @theglitchking/semantic-pages

Problem: Model download fails

Solution:

# Check internet connection, then retry
# The model is cached at ~/.semantic-pages/models/
# Delete and re-download if corrupted:
rm -rf ~/.semantic-pages/models/
semantic-pages --notes ./vault --reindex

Usage Issues

Problem: Search returns no results

Solution:

# Force reindex
semantic-pages --notes ./vault --reindex

# Check that .md files exist in the path
ls ./vault/*.md

Problem: Index seems stale after editing files externally

Solution: The file watcher should catch changes, but if it misses some:

# Force reindex
semantic-pages --notes ./vault --reindex

Problem: hnswlib-node fails to install (native addon)

Solution:

# Install build tools
# On Ubuntu/Debian:
sudo apt install build-essential python3

# On macOS:
xcode-select --install

# Then retry
npm install -g @theglitchking/semantic-pages

Contributing

Contributions are welcome! The project uses:

  • TypeScript with strict mode
  • tsup for bundling (ESM)
  • vitest for testing (123 tests across 11 suites)

# Clone and install
git clone https://github.com/TheGlitchKing/semantic-pages.git
cd semantic-pages
npm install

# Run tests
npm test

# Build
npm run build

# Type check
npm run lint

License

MIT License - see LICENSE file for details.


Support

NPM | GitHub | Issues

Made with care by TheGlitchKing