Package Exports
- @mdxdb/payload
- @mdxdb/payload/embedding/processor
- @mdxdb/payload/git/metadata
- @mdxdb/payload/node
- @mdxdb/payload/search/queries
- @mdxdb/payload/utilities/isFileCollection
@mdxdb/payload
A hybrid database adapter for Payload CMS that combines human-readable MDX files with powerful ClickHouse querying. Version your content with Git while enjoying full-text search, vector similarity, and SQL-level query performance.
Why mdxdb?
| Traditional DB | mdxdb |
|---|---|
| Opaque binary storage | Human-readable .mdx files |
| Vendor lock-in | Plain files, zero lock-in |
| Complex backup/restore | Git push/pull |
| Limited history | Full Git history with blame |
| Review in custom UI | Review in GitHub PRs |
Best of both worlds: Content lives as files you can read, edit, and version with Git. Queries run on ClickHouse for SQL-level performance with full-text and vector search.
Features
Storage & Versioning
- Human-readable MDX - Documents stored as `.mdx` files with YAML frontmatter
- Git-native - Full history, branching, blame, and PR workflows
- GitHub-friendly - Edit content directly on GitHub, review changes in PRs
Query Performance
- ClickHouse-powered - Automatic local server for fast SQL queries
- Full-text search - Inverted indexes with relevance scoring
- Vector similarity - HNSW indexes for semantic search
- Smart chunking - Markdown-aware content splitting for search
AI-Ready
- Embedding pipeline - Generate embeddings with Workers AI (`@cf/baai/bge-m3`)
- 1024-dimension vectors - State-of-the-art semantic search
- Background processing - Non-blocking embedding generation
Architecture
- Dual storage - Content collections are written as MDX files and indexed in ClickHouse
- Database-only mode - Auth & internal collections skip files entirely
- Namespace support - Multi-tenant and cross-app search
- Git metadata - Track author, commit hash, and message per document
Installation
```bash
npm install @mdxdb/payload
# or
pnpm add @mdxdb/payload
# or
yarn add @mdxdb/payload
```
ClickHouse is downloaded automatically on first run (50MB, cached in `~/.mdxdb`).
Quick Start
```typescript
// payload.config.ts
import { buildConfig } from 'payload'
import { lexicalEditor } from '@payloadcms/richtext-lexical'
import { mdxdbAdapter } from '@mdxdb/payload'

export default buildConfig({
  // Payload requires a secret, and an editor when richText fields are used
  secret: process.env.PAYLOAD_SECRET || '',
  editor: lexicalEditor(),
  db: mdxdbAdapter({
    basePath: './content',
  }),
  collections: [
    {
      slug: 'posts',
      fields: [
        { name: 'title', type: 'text', required: true },
        { name: 'status', type: 'select', options: ['draft', 'published'] },
        { name: 'content', type: 'richText' },
      ],
    },
  ],
})
```
This creates documents like:
```
content/
  posts/
    my-first-post.mdx
    another-post.mdx
```
Configuration
```typescript
mdxdbAdapter({
  // Required: Base directory for content files
  basePath: './content',
  // Optional: ClickHouse data directory (default: ~/.mdxdb/data)
  clickhousePath: '~/.mdxdb/data',
  // Optional: ClickHouse server port (default: 9000)
  clickhousePort: 9000,
  // Optional: Database name (default: 'mdxdb')
  database: 'mdxdb',
  // Optional: Namespace for multi-tenancy (default: 'default')
  ns: 'my-app',
  // Optional: Table name prefix (default: 'mdxdb')
  tablePrefix: 'mdxdb',
  // Optional: Embedding configuration
  embedding: {
    // Workers AI account ID
    accountId: process.env.CLOUDFLARE_ACCOUNT_ID,
    // Workers AI API token
    apiToken: process.env.CLOUDFLARE_API_TOKEN,
    // Model (default: '@cf/baai/bge-m3')
    model: '@cf/baai/bge-m3',
    // Batch size (default: 100)
    batchSize: 100,
  },
})
```
Storage Model
Content Collections (Default)
Content collections use dual storage:
- MDX files - Source of truth, human-readable, Git-versioned
- ClickHouse - Query index for fast search and filtering
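Because the data table (see ClickHouse Schema) uses `ReplacingMergeTree(v)` ordered by `(ns, collection, id)`, ClickHouse keeps, per key, the row with the highest version `v`, and it discards older rows only during background merges (or at query time with `FINAL`). The sketch below restates that resolution rule in plain TypeScript so the behavior is easy to reason about. It is illustrative only: `DataRow` mirrors a subset of the schema's columns with `v` simplified to epoch milliseconds, `latestVersions` is a hypothetical name, and treating a non-null `deletedAt` as a soft delete is an assumption based on the column name.

```typescript
// Subset of the mdxdb_data columns; `v` and `deletedAt` simplified to epoch ms.
interface DataRow {
  ns: string
  collection: string
  id: string
  data: string // JSON document data
  v: number // version timestamp
  deletedAt: number | null
}

// Resolve rows the way ReplacingMergeTree(v) eventually does: one row per
// sorting key (ns, collection, id), keeping the highest version.
// Ties go to the later row, approximating "last inserted wins".
function latestVersions(rows: DataRow[]): DataRow[] {
  const latest = new Map<string, DataRow>()
  for (const row of rows) {
    // NUL separator keeps composite keys unambiguous
    const key = `${row.ns}\u0000${row.collection}\u0000${row.id}`
    const current = latest.get(key)
    if (!current || row.v >= current.v) latest.set(key, row)
  }
  // Assumption: a non-null deletedAt marks a soft-deleted document
  return Array.from(latest.values()).filter((row) => row.deletedAt === null)
}

const rows: DataRow[] = [
  { ns: 'default', collection: 'posts', id: 'a', data: '{"title":"old"}', v: 1, deletedAt: null },
  { ns: 'default', collection: 'posts', id: 'a', data: '{"title":"new"}', v: 2, deletedAt: null },
  { ns: 'default', collection: 'posts', id: 'b', data: '{"title":"b"}', v: 1, deletedAt: null },
  { ns: 'default', collection: 'posts', id: 'b', data: '{"title":"b"}', v: 2, deletedAt: 2 },
]
const current = latestVersions(rows)
```

Until a merge runs, a plain `SELECT` may still see the superseded rows, which is why readers that need exactly one row per document either apply this rule themselves or query with `FINAL`.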
Database-Only Collections
For collections that don't need file storage (e.g., analytics, logs, sessions), use `dbOnly: true`:
```typescript
mdxdbAdapter({
  basePath: './content',
  collections: {
    // These collections are stored in ClickHouse only, no MDX files
    analytics: { dbOnly: true },
    sessions: { dbOnly: true },
    logs: { dbOnly: true },
  },
})
```
Automatically database-only:
- Auth collections (with `auth: true`)
- Internal Payload collections (`payload-*`)
Content collections such as `posts` and `pages` are still written as files:
```
content/
  posts/
    hello-world.mdx       # File storage
    my-second-post.mdx
  pages/
    about.mdx
```
Auth & Internal Collections
Auth collections (`users`, etc.) and internal Payload collections (`payload-*`) are stored in ClickHouse only and are never written to files:
- Passwords are hashed, not stored in plaintext
- Verification tokens and reset tokens are database-only
- No sensitive data in your Git history
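The storage rule described in this section (auth collections, `payload-*` internals, and anything marked `dbOnly` skip files) fits in a small predicate. This is a minimal sketch assuming a simplified collection shape; the package's exported `isFileCollection` may take different arguments, and `CollectionLike`, `CollectionOptions`, and `shouldWriteFiles` are hypothetical names used only for illustration.

```typescript
// Hypothetical, simplified shapes: only the fields the rule needs.
interface CollectionLike {
  slug: string
  auth?: boolean | object // Payload accepts `auth: true` or an options object
}

interface CollectionOptions {
  dbOnly?: boolean
}

// Returns true when documents in this collection should be written as .mdx files.
function shouldWriteFiles(
  collection: CollectionLike,
  options: Record<string, CollectionOptions> = {},
): boolean {
  // Auth collections never touch disk, so password hashes and tokens stay out of Git
  if (collection.auth) return false
  // Internal Payload collections (payload-preferences, payload-migrations, ...)
  if (collection.slug.startsWith('payload-')) return false
  // Explicit per-collection opt-out via `dbOnly: true`
  if (options[collection.slug]?.dbOnly) return false
  return true
}
```

Checking `auth` first means an auth collection stays database-only even if someone forgets to mark it `dbOnly`.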
MDX File Format
Documents are stored as MDX files with YAML frontmatter:
````mdx
---
id: my-post
status: published
publishedAt: 2024-01-15T10:30:00Z
views: 1234
createdAt: 2024-01-10T08:00:00Z
updatedAt: 2024-01-15T10:30:00Z
---

# My Post Title

This is the main content of the post. It supports **markdown** formatting,
including code blocks, lists, and more.

## Code Example

```typescript
export function hello() {
  console.log('Hello, world!')
}
```
````
Field Mapping
| Payload Field Type | MDX Representation |
|-------------------|-------------------|
| text, number, email, etc. | YAML frontmatter |
| richText | Markdown content after `# Title` |
| date, point, json | YAML frontmatter (serialized) |
| relationship | ID reference in frontmatter |
| array, blocks | YAML array in frontmatter |
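To make the mapping concrete, here is a minimal sketch of producing the file layout shown above: scalar fields become YAML frontmatter and the rich-text body (already converted to Markdown) follows the closing `---`. It assumes a hypothetical `toMdx` helper that handles only strings, numbers, booleans, null, and dates; a real serializer would presumably use a full YAML library and also cover arrays, blocks, and JSON fields.

```typescript
type Scalar = string | number | boolean | Date | null

// Conservative check for strings that are safe as plain (unquoted) YAML scalars
const PLAIN = /^[A-Za-z](?:[\w .\/-]*[\w.\/-])?$/
// Words YAML parsers may read as booleans or null if left unquoted
const RESERVED = new Set(['true', 'false', 'null', 'yes', 'no', 'on', 'off'])

function yamlValue(value: Scalar): string {
  if (value === null) return 'null'
  if (value instanceof Date) return value.toISOString() // ISO-8601 timestamp
  if (typeof value === 'string') {
    // JSON strings are valid YAML double-quoted scalars, so quote anything risky
    return PLAIN.test(value) && !RESERVED.has(value.toLowerCase()) ? value : JSON.stringify(value)
  }
  return String(value) // numbers and booleans
}

// Hypothetical helper: frontmatter from scalar fields, then the Markdown body
function toMdx(fields: Record<string, Scalar>, body: string): string {
  const frontmatter = Object.entries(fields)
    .map(([key, value]) => `${key}: ${yamlValue(value)}`)
    .join('\n')
  return `---\n${frontmatter}\n---\n${body.trim()}\n`
}

const example = toMdx(
  {
    id: 'my-post',
    status: 'published',
    publishedAt: new Date('2024-01-15T10:30:00Z'),
    views: 1234,
  },
  '# My Post Title\n\nThis is the main content of the post.',
)
```

Quoting anything outside a narrow safe set is deliberately cautious: a value like `true` or `a: b` round-trips as a string instead of being reinterpreted by the YAML parser.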
Querying
Standard Payload Queries
All Payload query operators work as expected:
```typescript
// Find published posts
const posts = await payload.find({
  collection: 'posts',
  where: {
    status: { equals: 'published' },
    views: { greater_than: 100 },
  },
  sort: '-publishedAt',
  limit: 10,
})
```
Full-Text Search
Search across document content:
```typescript
import { fullTextSearch } from '@mdxdb/payload/search/queries'

// `adapter`: the initialized mdxdb database adapter (assumed to be `payload.db`)
const results = await fullTextSearch({
  client: adapter.client,
  database: adapter.database,
  tableName: adapter.tables.search,
  query: 'typescript react hooks',
  collection: 'posts', // optional: filter by collection
  ns: 'default', // optional: filter by namespace
  limit: 20,
})
// Returns: [{ id, docId, collection, content, path, score }]
```
Vector Similarity Search
Find semantically similar content:
```typescript
import { vectorSearch } from '@mdxdb/payload/search/queries'

const results = await vectorSearch({
  client: adapter.client,
  database: adapter.database,
  tableName: adapter.tables.search,
  embedding: queryVector, // 1024-dimension float array
  collection: 'posts',
  limit: 10,
})
// Returns: [{ id, docId, collection, content, path, score }]
```
Search & Embeddings
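Semantic search ranks chunks by the cosine distance between the query embedding and each chunk's embedding; the search table's HNSW index (`vector_similarity('hnsw', 'cosineDistance')`) approximates that ranking without scanning every row. For intuition, or for checking results on a small set, the exact computation is short. This brute-force sketch is an illustration, not how the adapter executes queries, and `EmbeddedChunk` and `rankByCosineDistance` are hypothetical names.

```typescript
interface EmbeddedChunk {
  id: string
  embedding: number[]
}

// Same definition as ClickHouse's cosineDistance: 1 - (a . b) / (|a| * |b|)
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`)
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) throw new Error('cosine distance is undefined for a zero vector')
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Exact k-nearest-neighbour search over every chunk; an HNSW index approximates this order
function rankByCosineDistance(query: number[], chunks: EmbeddedChunk[], limit = 10) {
  return chunks
    .map((chunk) => ({ id: chunk.id, distance: cosineDistance(query, chunk.embedding) }))
    .sort((x, y) => x.distance - y.distance) // smaller distance = more similar
    .slice(0, limit)
}
```

The exact version costs one pass over all 1024-dimension vectors per query, which is the work the HNSW index exists to avoid on large tables.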
How It Works
- Document Creation/Update - Content is chunked at heading boundaries
- Chunk Storage - Each chunk is stored with its heading path (e.g., `posts > Introduction > Getting Started`)
- Embedding Generation - A background process generates vectors via Workers AI
- Search - Full-text via inverted index, semantic via HNSW vector index
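The chunking step can be pictured as one pass over the Markdown that starts a new chunk at every heading and keeps a heading stack to build the `path`. This is a simplified sketch under stated assumptions: it splits only on ATX headings (`#` to `######`), skips headings inside fenced code, ignores the ~1500-token size cap, and `chunkMarkdown` is a hypothetical name rather than the package's API.

```typescript
interface Chunk {
  id: string // "{docId}_{chunkIndex}", as in SearchChunk
  docId: string
  collection: string
  chunkIndex: number
  path: string // e.g. "posts > Introduction > Getting Started"
  content: string
}

function chunkMarkdown(collection: string, docId: string, markdown: string): Chunk[] {
  const chunks: Chunk[] = []
  const headings: string[] = [] // headings[level - 1] = current heading at that level
  let buffer: string[] = []
  let inFence = false

  // Emit the buffered lines as a chunk (if non-empty) and start a new buffer
  const flush = () => {
    const content = buffer.join('\n').trim()
    if (content) {
      const chunkIndex = chunks.length
      chunks.push({
        id: `${docId}_${chunkIndex}`,
        docId,
        collection,
        chunkIndex,
        path: [collection, ...headings.filter(Boolean)].join(' > '),
        content,
      })
    }
    buffer = []
  }

  for (const line of markdown.split('\n')) {
    // A "#" line inside a code fence is code, not a heading
    if (/^(```|~~~)/.test(line)) inFence = !inFence
    const match = inFence ? null : /^(#{1,6})\s+(.*)$/.exec(line)
    if (match) {
      flush() // close the previous section
      const level = match[1].length
      headings.length = level - 1 // drop same-level and deeper headings
      headings[level - 1] = match[2].trim()
    }
    buffer.push(line)
  }
  flush()
  return chunks
}
```

Each heading line stays at the top of its own chunk, so a search hit still shows the reader which section it came from.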
Chunk Structure
```typescript
interface SearchChunk {
  id: string           // "{docId}_{chunkIndex}"
  ns: string           // Namespace
  collection: string   // Collection slug
  docId: string        // Parent document ID
  chunkIndex: number   // Position in document
  path: string         // Heading hierarchy
  content: string      // Chunk text (max ~1500 tokens)
  embedding: number[]  // 1024-dim vector (when ready)
  status: 'pending' | 'ready' | 'failed'
}
```
Processing Pending Embeddings
```typescript
import { processPendingChunks } from '@mdxdb/payload/embedding/processor'

// Process all pending chunks
await processPendingChunks({
  client: adapter.client,
  database: adapter.database,
  tableName: adapter.tables.search,
  embeddingConfig: adapter.embeddingConfig,
  batchSize: 50,
})
```
ClickHouse Schema
Data Table
```sql
CREATE TABLE mdxdb_data (
  ns String,                        -- Namespace
  collection String,                -- Collection slug
  id String,                        -- Document ID
  data String,                      -- JSON document data
  filepath Nullable(String),        -- File path (null for DB-only)
  gitHash Nullable(String),         -- Last commit hash
  gitAuthor Nullable(String),       -- Last commit author
  gitDate Nullable(DateTime64),     -- Last commit date
  gitMessage Nullable(String),      -- Last commit message
  v DateTime64(3),                  -- Version timestamp
  deletedAt Nullable(DateTime64(3))
) ENGINE = ReplacingMergeTree(v)
ORDER BY (ns, collection, id)
```
Search Table
```sql
CREATE TABLE mdxdb_search (
  -- ... chunk fields ...
  embedding Array(Float32),         -- 1024-dim vector
  INDEX idx_fts content TYPE full_text,
  INDEX idx_vec embedding TYPE vector_similarity('hnsw', 'cosineDistance')
) ENGINE = ReplacingMergeTree(updatedAt)
ORDER BY (ns, collection, docId, chunkIndex)
```
Migrations
Create data transformation migrations (MongoDB-style, not SQL schema):
```typescript
// migrations/20240115_add_default_status.ts
import type { MigrateUpArgs, MigrateDownArgs } from '@mdxdb/payload'

export async function up({ payload }: MigrateUpArgs): Promise<void> {
  const { docs } = await payload.find({
    collection: 'posts',
    where: { status: { exists: false } },
    limit: 0,
  })
  for (const doc of docs) {
    await payload.update({
      collection: 'posts',
      id: doc.id,
      data: { status: 'draft' },
    })
  }
}

export async function down({ payload }: MigrateDownArgs): Promise<void> {
  // Reverse the migration if needed
}
```
Run migrations:
```bash
payload migrate          # Run pending migrations
payload migrate:down     # Roll back last batch
payload migrate:status   # Check migration status
payload migrate:fresh    # Reset and re-run all
```
CLI
```bash
# Sync MDX files to ClickHouse
npx mdxdb sync --path ./content
# Watch for changes and auto-sync
npx mdxdb watch --path ./content
# Rebuild search index
npx mdxdb rebuild --path ./content
# Process pending embeddings
npx mdxdb embed --batch-size 50
```
Directory Structure
Default organization by collection slug:
```
content/
  posts/
    hello-world.mdx
    my-second-post.mdx
  pages/
    about.mdx
    contact.mdx
  site-settings.mdx       # Global
```
With admin groups:
```
content/
  blog/                   # admin.group: 'Blog'
    posts/
    categories/
  settings/               # admin.group: 'Settings'
    navigation.mdx
```
Git Integration
Automatic Metadata
Every document tracks its Git history:
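```typescript
// Payload's Local API fetches a single document by ID with findByID
const post = await payload.findByID({
  collection: 'posts',
  id: 'my-post',
})
// The Git fields are stored as columns of the data table and are read via a ClickHouse query:
// gitHash, gitAuthor, gitDate, gitMessage
```
Best Practices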
- Commit frequently - Small, focused commits for better history
- Use branches - Feature branches for content changes
- Review in PRs - Human-readable diffs in GitHub
- Automate deploys - Push to main triggers site rebuild
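The per-document Git fields listed under Automatic Metadata (`gitHash`, `gitAuthor`, `gitDate`, `gitMessage`) correspond to what `git log -1 -- <file>` reports for the document's file. A minimal sketch of deriving them, assuming the output of `git log -1 --format=%H%x1f%an%x1f%aI%x1f%s -- content/posts/my-post.mdx` (fields separated by the ASCII unit separator). `parseGitLogLine` is a hypothetical name, and the package's `getGitMetadata` may work differently.

```typescript
interface GitMetadata {
  gitHash: string
  gitAuthor: string
  gitDate: Date
  gitMessage: string
}

const UNIT_SEPARATOR = '\x1f' // matches %x1f in the --format string

// Parse one line of `git log -1 --format=%H%x1f%an%x1f%aI%x1f%s -- <file>` output.
// Returns null when the file has no commits yet (git prints nothing).
function parseGitLogLine(output: string): GitMetadata | null {
  const line = output.trim()
  if (!line) return null
  const [gitHash, gitAuthor, isoDate, ...subject] = line.split(UNIT_SEPARATOR)
  if (!gitHash || !gitAuthor || !isoDate) throw new Error(`unexpected git log output: ${line}`)
  return {
    gitHash,
    gitAuthor,
    gitDate: new Date(isoDate), // %aI is strict ISO-8601
    gitMessage: subject.join(UNIT_SEPARATOR), // rejoin in case the subject contained the separator
  }
}
```

In Node.js the input could come from `execFileSync('git', [...])` in `node:child_process` with `{ encoding: 'utf8' }`; passing arguments as an array avoids shell quoting issues with file paths.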
Node.js Client
For Node.js environments, use the `@mdxdb/payload/node` export for the native ClickHouse client with automatic binary installation and server management:
```typescript
import { connect, startServer, stopServer, isServerRunning } from '@mdxdb/payload/node'

// Automatically downloads ClickHouse binary and starts server
const client = await connect({
  database: 'mydb',
  // Optional: custom install directory (default: ~/.mdxdb)
  installDir: '~/.mdxdb',
  // Optional: custom port (default: 8123)
  port: 8123,
})

// Query using native client
const result = await client.query({
  query: 'SELECT * FROM mydb.data LIMIT 10',
  format: 'JSONEachRow',
})
```
Server Management
Control the ClickHouse server lifecycle:
```typescript
import {
  ensureBinary,
  getInstallPaths,
  isServerRunning,
  startServer,
  stopServer,
  waitForReady,
} from '@mdxdb/payload/node'

// Check if binary is installed, download if not
const binaryPath = await ensureBinary({ installDir: '~/.mdxdb' })

// Get installation paths
const paths = getInstallPaths({ installDir: '~/.mdxdb' })
// { binaryPath, binDir, dataDir, logDir, installDir }

// Check server status
const status = await isServerRunning({ port: 8123 })
if (!status.isRunning) {
  // Start as daemon
  await startServer({
    binaryPath: paths.binaryPath,
    dataDir: paths.dataDir,
    logDir: paths.logDir,
    httpPort: 8123,
  })
}

// Wait for server to be ready (with timeout)
await waitForReady({ port: 8123, timeout: 30000 })

// Gracefully stop server
await stopServer({ port: 8123 })
```
Installation Paths
ClickHouse is installed to ~/.mdxdb by default:
```
~/.mdxdb/
  bin/
    clickhouse                  # Binary (~500MB)
  data/
    config.xml                  # Auto-generated config
    ...                         # ClickHouse data files
  logs/
    clickhouse-server.log
    clickhouse-server.err.log
```
API Reference
Adapter Options
| Option | Type | Default | Description |
|---|---|---|---|
| `basePath` | `string` | required | Base directory for content files |
| `collections` | `object` | `{}` | Per-collection config (see below) |
| `clickhousePath` | `string` | `~/.mdxdb/data` | ClickHouse data directory |
| `clickhousePort` | `number` | `9000` | ClickHouse server port |
| `database` | `string` | `'mdxdb'` | Database name |
| `ns` | `string` | `'default'` | Namespace for multi-tenancy |
| `tablePrefix` | `string` | `'mdxdb'` | Table name prefix |
Collection Options:
| Option | Type | Default | Description |
|---|---|---|---|
| `dbOnly` | `boolean` | `false` | Store in ClickHouse only, no MDX files |
| `pathPattern` | `string` | - | Custom file path pattern |
| `template` | `string` | - | Custom MDX template |
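Putting the defaults from the Adapter Options table together, option resolution amounts to filling in each missing value. A minimal sketch, assuming exactly the documented defaults; the `resolveOptions` name and deriving table names as `${tablePrefix}_data` / `${tablePrefix}_search` (inferred from the `mdxdb_data` and `mdxdb_search` tables in ClickHouse Schema) are assumptions, not the adapter's confirmed behavior.

```typescript
interface CollectionConfig {
  dbOnly?: boolean
  pathPattern?: string
  template?: string
}

interface AdapterArgs {
  basePath: string
  collections?: Record<string, CollectionConfig>
  clickhousePath?: string
  clickhousePort?: number
  database?: string
  ns?: string
  tablePrefix?: string
}

// Defaults as listed in the Adapter Options table
const DEFAULTS = {
  clickhousePath: '~/.mdxdb/data',
  clickhousePort: 9000,
  database: 'mdxdb',
  ns: 'default',
  tablePrefix: 'mdxdb',
}

// Hypothetical resolver; `??` keeps defaults even when a key is present but undefined
function resolveOptions(args: AdapterArgs) {
  if (!args.basePath) throw new Error('basePath is required')
  const tablePrefix = args.tablePrefix ?? DEFAULTS.tablePrefix
  return {
    basePath: args.basePath,
    collections: args.collections ?? {},
    clickhousePath: args.clickhousePath ?? DEFAULTS.clickhousePath,
    clickhousePort: args.clickhousePort ?? DEFAULTS.clickhousePort,
    database: args.database ?? DEFAULTS.database,
    ns: args.ns ?? DEFAULTS.ns,
    tablePrefix,
    // Assumed naming scheme, matching mdxdb_data / mdxdb_search
    tables: { data: `${tablePrefix}_data`, search: `${tablePrefix}_search` },
  }
}
```

Using `??` rather than object spread matters when options come from environment variables: `{ ns: process.env.NS }` with `NS` unset would otherwise overwrite the default with `undefined`.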
Exported Types
```typescript
import type { MdxdbAdapter, MdxdbAdapterArgs, MigrateUpArgs, MigrateDownArgs } from '@mdxdb/payload'
```
Exported Functions
```typescript
// Node.js client with auto-install (recommended for Node.js)
import {
  connect,
  createClickHouseClient,
  ensureBinary,
  getInstallPaths,
  isServerRunning,
  startServer,
  stopServer,
  waitForReady,
} from '@mdxdb/payload/node'
// Search
import { fullTextSearch, vectorSearch } from '@mdxdb/payload/search/queries'
// Embedding
import { processPendingChunks } from '@mdxdb/payload/embedding/processor'
// Git metadata
import { getGitMetadata } from '@mdxdb/payload/git/metadata'
// Utilities
import { isFileCollection } from '@mdxdb/payload/utilities/isFileCollection'
```
Performance
When to Use mdxdb
Ideal for:
- Content-heavy sites (blogs, docs, marketing pages)
- Git-based content workflows
- Teams comfortable with GitHub
- Projects needing human-readable content storage
- Sites with semantic search requirements
Consider alternatives for:
- High-frequency writes (>100/sec)
- Complex multi-document transactions
- Real-time collaborative editing
- Very large collections (>100k documents without pagination)
Optimization Tips
- Use pagination - Don't fetch all documents at once
- Index strategically - ClickHouse handles most queries well
- Batch embeddings - Process in batches of 50-100
- Namespace separation - Use namespaces for multi-tenant isolation
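The batching tip works the same way whatever size you pick: pending chunks are grouped, each group goes to the embedding model in one request, and each chunk ends up `ready` or `failed` (the `status` values of `SearchChunk`). A minimal sketch, assuming an injected `embed` function standing in for the Workers AI call; `embedPending`, `PendingChunk`, and the `EmbedFn` signature are hypothetical, not the internals of `processPendingChunks`.

```typescript
type ChunkStatus = 'pending' | 'ready' | 'failed'

interface PendingChunk {
  id: string
  content: string
  status: ChunkStatus
  embedding?: number[]
}

// Stand-in for an embedding model call: one vector per input text, same order
type EmbedFn = (texts: string[]) => Promise<number[][]>

async function embedPending(chunks: PendingChunk[], embed: EmbedFn, batchSize = 50): Promise<PendingChunk[]> {
  if (batchSize < 1) throw new Error('batchSize must be at least 1')
  const pending = chunks.filter((c) => c.status === 'pending')
  const results: PendingChunk[] = []
  for (let start = 0; start < pending.length; start += batchSize) {
    const batch = pending.slice(start, start + batchSize)
    try {
      const vectors = await embed(batch.map((c) => c.content))
      if (vectors.length !== batch.length) throw new Error('embedding count mismatch')
      batch.forEach((c, i) => results.push({ ...c, status: 'ready', embedding: vectors[i] }))
    } catch {
      // A failed request marks only its own batch as failed; later batches still run
      batch.forEach((c) => results.push({ ...c, status: 'failed' }))
    }
  }
  return results
}
```

Larger batches mean fewer requests but a bigger blast radius when one request fails, which is one reason to stay in a moderate range such as 50-100.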
Limitations
- No transactions - ClickHouse and files don't share transactional boundaries
- No Payload versions - Use Git history instead of Payload's version system
- Eventual consistency - File writes may briefly lag behind ClickHouse
- No real-time sync - Changes require explicit sync for multi-instance setups
Troubleshooting
ClickHouse won't start
```bash
# Check if port is in use
lsof -i :9000
# Remove a stale PID file
rm -f ~/.mdxdb/data/clickhouse-server.pid
```
Embeddings not generating
```bash
# Check for pending chunks
npx mdxdb status
# Verify API credentials
echo $CLOUDFLARE_ACCOUNT_ID
echo $CLOUDFLARE_API_TOKEN
```
File/database mismatch
```bash
# Full resync from files
npx mdxdb rebuild --path ./content
```
Contributing
```bash
# Clone the repo
git clone https://github.com/mdxdb/payload.git
# Install dependencies
pnpm install
# Run tests
pnpm test
# Build
pnpm build
# Lint
pnpm lint
```
License
MIT
Built with love for content creators who believe in open, readable, versionable content.