Package Exports
- @mdxdb/payload
- @mdxdb/payload/embedding/processor
- @mdxdb/payload/git/metadata
- @mdxdb/payload/search/queries
- @mdxdb/payload/utilities/isFileCollection
Readme
@mdxdb/payload
A hybrid database adapter for Payload CMS that combines human-readable MDX files with powerful ClickHouse querying. Version your content with Git while enjoying full-text search, vector similarity, and SQL-level query performance.
Why mdxdb?
| Traditional DB | mdxdb |
|---|---|
| Opaque binary storage | Human-readable .mdx files |
| Vendor lock-in | Plain files, zero lock-in |
| Complex backup/restore | Git push/pull |
| Limited history | Full Git history with blame |
| Review in custom UI | Review in GitHub PRs |
Best of both worlds: Content lives as files you can read, edit, and version with Git. Queries run on your choice of database (ClickHouse, RPC, etc.) for powerful querying with full-text and vector search.
Features
Storage & Versioning
- Human-readable MDX - Documents stored as .mdx files with YAML frontmatter
- Git-native - Full history, branching, blame, and PR workflows
- GitHub-friendly - Edit content directly on GitHub, review changes in PRs
Query Performance
- Pluggable database - Use any Payload database adapter (ClickHouse, RPC, etc.)
- Full-text search - Inverted indexes with relevance scoring (when using ClickHouse)
- Vector similarity - HNSW indexes for semantic search (when using ClickHouse)
- Smart chunking - Markdown-aware content splitting for search
AI-Ready
- Embedding pipeline - Generate embeddings with Workers AI (@cf/baai/bge-m3)
- 1024-dimension vectors - State-of-the-art semantic search
- Background processing - Non-blocking embedding generation
Architecture
- Dual storage - Content collections: files + remote database
- Database-only mode - Auth & internal collections skip files entirely
- Sync control - syncToRemote option controls MDX→database sync
- Namespace support - Multi-tenant and cross-app search
- Git metadata - Track author, commit hash, and message per document
Installation
npm install @mdxdb/payload @dotdo/db-clickhouse
# or
pnpm add @mdxdb/payload @dotdo/db-clickhouse
# or
yarn add @mdxdb/payload @dotdo/db-clickhouse
For local development, ClickHouse can be auto-downloaded and managed using the daemon module (see Local Development below).
Quick Start
// payload.config.ts
import { buildConfig } from 'payload'
import { mdxdb } from '@mdxdb/payload'
import { clickhouseAdapter } from '@dotdo/db-clickhouse'
export default buildConfig({
db: mdxdb({
basePath: './content',
db: clickhouseAdapter({
url: process.env.CLICKHOUSE_URL || 'http://localhost:8123',
namespace: 'my-app',
}),
}),
collections: [
{
slug: 'posts',
fields: [
{ name: 'title', type: 'text', required: true },
{ name: 'status', type: 'select', options: ['draft', 'published'] },
{ name: 'content', type: 'richText' },
],
},
],
})
This creates documents like:
content/
posts/
my-first-post.mdx
another-post.mdx
Configuration
Database Adapters
mdxdb accepts any Payload database adapter via the db option. Common options:
- ClickHouse (@dotdo/db-clickhouse) - High-performance columnar database with vector search
- RPC (@dotdo/db-rpc) - Connect to remote database via RPC protocol
- Any other Payload-compatible database adapter
Options
import { mdxdb } from '@mdxdb/payload'
import { clickhouseAdapter } from '@dotdo/db-clickhouse'
// OR
import { rpcAdapter } from '@dotdo/db-rpc'
mdxdb({
// Required: Base directory for content files
basePath: './content',
// Required: Payload database adapter (ClickHouse, RPC, etc.)
db: clickhouseAdapter({
url: process.env.CLICKHOUSE_URL || 'http://localhost:8123',
namespace: 'my-app',
}),
// OR use RPC adapter for remote database
// db: rpcAdapter({
// url: process.env.DB_RPC_URL,
// }),
// Optional: Collections to store in database only (no MDX files)
// Useful for analytics, sessions, logs, etc.
dbOnly: ['analytics', 'sessions'],
// Optional: Whether to sync MDX file writes to remote database (default: true)
// Set to false to only write MDX files, skipping database sync
syncToRemote: true,
// Optional: Per-collection configuration for path patterns and templates
collections: {
posts: {
pathPattern: 'blog/{slug}',
template: 'post-template.mdx',
},
},
// Optional: Namespace for multi-tenancy (default: 'default')
ns: 'my-app',
// Optional: Git configuration for auto-commits
git: {
autoCommit: true,
debounceMs: 5000,
messageTemplate: '{action}({collection}): {id}',
},
})
Storage Model
Content Collections (Default)
Content collections use dual storage:
- MDX files - Source of truth, human-readable, Git-versioned
- Remote database - Query index for fast search and filtering (ClickHouse, RPC, etc.)
Database-Only Collections
For collections that don't need file storage (e.g., analytics, logs, sessions), use the dbOnly array:
mdxdb({
basePath: './content',
db: clickhouseAdapter({ url: process.env.CLICKHOUSE_URL }),
// These collections are stored in database only, no MDX files
dbOnly: ['analytics', 'sessions', 'logs'],
})
Automatically database-only:
- Auth collections (with auth: true)
- Internal Payload collections (payload-*)
content/
posts/
hello-world.mdx # File storage
my-second-post.mdx
pages/
about.mdx
Auth & Internal Collections
Auth collections (users, etc.) and internal Payload collections (payload-*) are stored in the database only - never written to files:
- Passwords are hashed, not stored in plaintext
- Verification tokens and reset tokens are database-only
- No sensitive data in your Git history
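
The storage rules in this section can be summarized as a small decision function. The sketch below only restates the documented behavior and is not the adapter's actual code: `CollectionInfo`, `StorageOptions`, and `storageFor` are hypothetical names introduced here, and the package's own `isFileCollection` utility may have a different signature.

```typescript
// Hypothetical types for illustration; not part of @mdxdb/payload.
interface CollectionInfo {
  slug: string
  auth?: boolean
}

interface StorageOptions {
  dbOnly?: string[]
  syncToRemote?: boolean // documented default: true
}

type Storage = 'files+database' | 'files-only' | 'database-only'

// Decide where a collection's documents would live under the rules above.
function storageFor(collection: CollectionInfo, options: StorageOptions = {}): Storage {
  const { dbOnly = [], syncToRemote = true } = options

  const isDatabaseOnly =
    collection.auth === true || // auth collections never touch files
    collection.slug.startsWith('payload-') || // internal Payload collections
    dbOnly.includes(collection.slug) // explicitly listed in dbOnly

  if (isDatabaseOnly) return 'database-only'
  return syncToRemote ? 'files+database' : 'files-only'
}

console.log(storageFor({ slug: 'posts' }))
console.log(storageFor({ slug: 'users', auth: true }))
console.log(storageFor({ slug: 'analytics' }, { dbOnly: ['analytics'] }))
```

Keeping the decision in one pure function makes it easy to check that no auth or internal collection can end up in the Git-tracked content directory.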
MDX File Format
Documents are stored as MDX files with YAML frontmatter:
---
id: my-post
status: published
publishedAt: 2024-01-15T10:30:00Z
views: 1234
createdAt: 2024-01-10T08:00:00Z
updatedAt: 2024-01-15T10:30:00Z
---
# My Post Title
This is the main content of the post. It supports **markdown** formatting,
including code blocks, lists, and more.
## Code Example
```typescript
export function hello() {
console.log('Hello, world!')
}
```
Field Mapping
| Payload Field Type | MDX Representation |
|-------------------|-------------------|
| text, number, email, etc. | YAML frontmatter |
| richText | Markdown content after `# Title` |
| date, point, json | YAML frontmatter (serialized) |
| relationship | ID reference in frontmatter |
| array, blocks | YAML array in frontmatter |
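
As a rough illustration of the mapping in the table above, the sketch below turns a flat document into MDX text. It is not the adapter's serializer: `toMdx` and `MdxDoc` are hypothetical names, only scalar fields and simple arrays are handled, and values are emitted in JSON style (which YAML accepts) rather than in the unquoted style of the sample file.

```typescript
// Hypothetical shape for illustration; real documents carry more field types.
type Scalar = string | number | boolean | null

interface MdxDoc {
  title: string // rendered as the "# Title" heading
  content: string // richText, assumed to be converted to Markdown already
  fields: Record<string, Scalar | Scalar[]> // everything else goes to frontmatter
}

// JSON scalars and flow arrays are valid YAML, so JSON.stringify is a safe,
// if verbose, way to emit frontmatter values without a YAML library.
function toMdx(doc: MdxDoc): string {
  const frontmatter = Object.entries(doc.fields)
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`)
    .join('\n')

  return `---\n${frontmatter}\n---\n# ${doc.title}\n\n${doc.content}\n`
}

const mdx = toMdx({
  title: 'My Post Title',
  content: 'This is the main content of the post.',
  fields: { id: 'my-post', status: 'published', views: 1234, tags: ['a', 'b'] },
})
console.log(mdx)
```

A production serializer would more likely use a YAML library so that dates and nested blocks round-trip cleanly.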
Querying
Standard Payload Queries
All Payload query operators work as expected:
```typescript
// Find published posts
const posts = await payload.find({
collection: 'posts',
where: {
status: { equals: 'published' },
views: { greater_than: 100 },
},
sort: '-publishedAt',
limit: 10,
})
```
Full-Text Search
Search across document content:
import { fullTextSearch } from '@mdxdb/payload/search/queries'
const results = await fullTextSearch({
client: adapter.client,
database: adapter.database,
tableName: adapter.tables.search,
query: 'typescript react hooks',
collection: 'posts', // optional: filter by collection
ns: 'default', // optional: filter by namespace
limit: 20,
})
// Returns: [{ id, docId, collection, content, path, score }]
Vector Similarity Search
Find semantically similar content:
import { vectorSearch } from '@mdxdb/payload/search/queries'
const results = await vectorSearch({
client: adapter.client,
database: adapter.database,
tableName: adapter.tables.search,
embedding: queryVector, // 1024-dimension float array
collection: 'posts',
limit: 10,
})
// Returns: [{ id, docId, collection, content, path, score }]
Search & Embeddings
How It Works
- Document Creation/Update - Content is chunked at heading boundaries
- Chunk Storage - Each chunk is stored with its path hierarchy (e.g., posts > Introduction > Getting Started)
- Embedding Generation - Background process generates vectors via Workers AI
- Search - Full-text via inverted index, semantic via HNSW vector index
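
The first two steps can be pictured with a small heading-based splitter. This is a sketch under assumptions, not the package's chunker: `chunkByHeadings` and `Chunk` are hypothetical names, the ~1500-token size limit is ignored, and only ATX-style (`#`) headings outside fenced code blocks are recognized.

```typescript
interface Chunk {
  chunkIndex: number
  path: string // heading hierarchy, e.g. "posts > Introduction > Getting Started"
  content: string
}

// Built with repeat() so this example never contains a literal code fence.
const FENCE = '`'.repeat(3)

function chunkByHeadings(markdown: string, root: string): Chunk[] {
  const chunks: Chunk[] = []
  const headings: string[] = [] // headings[0] is the current H1, headings[1] the H2, ...
  let buffer: string[] = []
  let inFence = false

  // Emit the buffered text as one chunk under the current heading path.
  const flush = (): void => {
    const content = buffer.join('\n').trim()
    if (content) {
      const path = [root, ...headings.filter(Boolean)].join(' > ')
      chunks.push({ chunkIndex: chunks.length, path, content })
    }
    buffer = []
  }

  for (const line of markdown.split('\n')) {
    if (line.trimStart().startsWith(FENCE)) inFence = !inFence
    // Lines inside a code fence are never treated as headings.
    const match = inFence ? null : /^(#{1,6})\s+(.*)$/.exec(line)
    if (match) {
      flush() // a heading closes the previous chunk
      const depth = (match[1] ?? '').length
      headings.length = depth - 1 // drop deeper and sibling headings
      headings[depth - 1] = (match[2] ?? '').trim()
    } else {
      buffer.push(line)
    }
  }
  flush()
  return chunks
}

const sample = [
  '# Introduction',
  'Welcome.',
  '## Getting Started',
  'Install the package.',
  FENCE + 'sh',
  '# not a heading',
  FENCE,
  '## Next Steps',
  'Read on.',
].join('\n')

const chunks = chunkByHeadings(sample, 'posts')
console.log(chunks.map((chunk) => chunk.path))
```

Splitting on headings keeps each chunk topically coherent, and the path gives search results a readable location inside the document.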
Chunk Structure
interface SearchChunk {
id: string // "{docId}_{chunkIndex}"
ns: string // Namespace
collection: string // Collection slug
docId: string // Parent document ID
chunkIndex: number // Position in document
path: string // Heading hierarchy
content: string // Chunk text (max ~1500 tokens)
embedding: number[] // 1024-dim vector (when ready)
status: 'pending' | 'ready' | 'failed'
}
Processing Pending Embeddings
import { processPendingChunks } from '@mdxdb/payload/embedding/processor'
// Process all pending chunks
await processPendingChunks({
client: adapter.client,
database: adapter.database,
tableName: adapter.tables.search,
embeddingConfig: adapter.embeddingConfig,
batchSize: 50,
})
Database Schema
When using ClickHouse as the database adapter, mdxdb creates the following schema:
Data Table
CREATE TABLE mdxdb_data (
ns String, -- Namespace
collection String, -- Collection slug
id String, -- Document ID
data String, -- JSON document data
filepath Nullable(String), -- File path (null for DB-only)
gitHash Nullable(String), -- Last commit hash
gitAuthor Nullable(String), -- Last commit author
gitDate Nullable(DateTime64), -- Last commit date
gitMessage Nullable(String), -- Last commit message
v DateTime64(3), -- Version timestamp
deletedAt Nullable(DateTime64(3))
) ENGINE = ReplacingMergeTree(v)
ORDER BY (ns, collection, id)
Search Table
CREATE TABLE mdxdb_search (
-- ... chunk fields ...
embedding Array(Float32), -- 1024-dim vector
INDEX idx_fts content TYPE full_text,
INDEX idx_vec embedding TYPE vector_similarity('hnsw', 'cosineDistance')
) ENGINE = ReplacingMergeTree(updatedAt)
ORDER BY (ns, collection, docId, chunkIndex)
Migrations
Create data transformation migrations (MongoDB-style, not SQL schema):
// migrations/20240115_add_default_status.ts
import type { MigrateUpArgs, MigrateDownArgs } from '@mdxdb/payload'
export async function up({ payload }: MigrateUpArgs): Promise<void> {
const { docs } = await payload.find({
collection: 'posts',
where: { status: { exists: false } },
limit: 0,
})
for (const doc of docs) {
await payload.update({
collection: 'posts',
id: doc.id,
data: { status: 'draft' },
})
}
}
export async function down({ payload }: MigrateDownArgs): Promise<void> {
// Reverse the migration if needed
}
Run migrations:
payload migrate # Run pending migrations
payload migrate:down # Roll back last batch
payload migrate:status # Check migration status
payload migrate:fresh # Reset and re-run all
CLI
# Sync MDX files to ClickHouse
npx mdxdb sync --path ./content
# Watch for changes and auto-sync
npx mdxdb watch --path ./content
# Rebuild search index
npx mdxdb rebuild --path ./content
# Process pending embeddings
npx mdxdb embed --batch-size 50
Directory Structure
Default organization by collection slug:
content/
posts/
hello-world.mdx
my-second-post.mdx
pages/
about.mdx
contact.mdx
site-settings.mdx # Global
With admin groups:
content/
blog/ # admin.group: 'Blog'
posts/
categories/
settings/ # admin.group: 'Settings'
navigation.mdx
Git Integration
Automatic Metadata
Every document tracks its Git history:
// Payload's Local API looks up a single document with findByID
const post = await payload.findByID({
  collection: 'posts',
  id: 'my-post',
})
// Access via ClickHouse query
// gitHash, gitAuthor, gitDate, gitMessage
Best Practices
- Commit frequently - Small, focused commits for better history
- Use branches - Feature branches for content changes
- Review in PRs - Human-readable diffs in GitHub
- Automate deploys - Push to main triggers site rebuild
Local Development
For local development, @dotdo/db-clickhouse provides a daemon module that automatically downloads and manages a local ClickHouse server:
import { buildConfig } from 'payload'
import { mdxdb } from '@mdxdb/payload'
import { clickhouseAdapter } from '@dotdo/db-clickhouse'
import { connect } from '@dotdo/db-clickhouse/local'
// Auto-download binary and start daemon if needed
const client = await connect({
database: 'mydb',
installDir: '~/.clickhouse', // default
port: 8123,
})
// Use in mdxdb config
export default buildConfig({
db: mdxdb({
basePath: './content',
db: clickhouseAdapter({
url: 'http://localhost:8123',
namespace: 'my-app',
}),
}),
})
Daemon Management
Control the ClickHouse server lifecycle:
import {
ensureBinary,
getInstallPaths,
isServerRunning,
startServer,
stopServer,
waitForReady,
} from '@dotdo/db-clickhouse/local'
// Check if binary is installed, download if not
const binaryPath = await ensureBinary({ installDir: '~/.clickhouse' })
// Get installation paths
const paths = getInstallPaths({ installDir: '~/.clickhouse' })
// { binaryPath, binDir, dataDir, logDir, installDir }
// Check server status
const status = await isServerRunning({ port: 8123 })
if (!status.isRunning) {
// Start as daemon
await startServer({
binaryPath: paths.binaryPath,
dataDir: paths.dataDir,
logDir: paths.logDir,
httpPort: 8123,
})
}
// Wait for server to be ready (with timeout)
await waitForReady({ port: 8123, timeout: 30000 })
// Gracefully stop server
await stopServer({ port: 8123 })
Installation Paths
ClickHouse is installed to ~/.clickhouse by default:
~/.clickhouse/
bin/
clickhouse # Binary (~500MB)
data/
config.xml # Auto-generated config
... # ClickHouse data files
logs/
clickhouse-server.log
clickhouse-server.err.log
API Reference
Adapter Options
| Option | Type | Default | Description |
|---|---|---|---|
| basePath | string | required | Base directory for content files |
| db | DatabaseAdapterObj | required | Payload database adapter (ClickHouse, RPC, etc.) |
| dbOnly | string[] | [] | Collections to store in database only (no MDX files) |
| syncToRemote | boolean | true | Whether MDX writes sync to remote database |
| collections | object | {} | Per-collection config (see below) |
| ns | string | 'default' | Namespace for multi-tenancy |
| git | object | see below | Git auto-commit configuration |
Collection Options:
| Option | Type | Default | Description |
|---|---|---|---|
| pathPattern | string | - | Custom file path pattern |
| template | string | - | Custom MDX template |
Git Options:
| Option | Type | Default | Description |
|---|---|---|---|
| autoCommit | boolean | false | Enable automatic Git commits |
| debounceMs | number | 5000 | Debounce delay for auto-commit, in milliseconds |
| messageTemplate | string | '{action}({collection}): {id}' | Template for commit messages |
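
The `{action}`, `{collection}`, and `{id}` placeholders in `messageTemplate` suggest plain string substitution. Below is a minimal sketch of how such a template could be expanded; `renderTemplate` is a hypothetical helper rather than an export of this package, and the `'update'` action value is an assumption.

```typescript
// Replace each {name} placeholder with the matching value. Unknown
// placeholders are left untouched so template mistakes stay visible.
function renderTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (placeholder: string, name: string) => {
    const known = Object.prototype.hasOwnProperty.call(values, name)
    const value = known ? values[name] : undefined
    return value ?? placeholder
  })
}

const message = renderTemplate('{action}({collection}): {id}', {
  action: 'update', // assumed action name
  collection: 'posts',
  id: 'my-post',
})
console.log(message) // update(posts): my-post
```

The same substitution would presumably cover a `pathPattern` such as `'blog/{slug}'`, although how the adapter resolves path patterns is not documented here.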
Exported Types
import type { MdxdbAdapter, MdxdbAdapterArgs, MigrateUpArgs, MigrateDownArgs } from '@mdxdb/payload'
Exported Functions
// Main adapter
import { mdxdb } from '@mdxdb/payload'
// ClickHouse adapter with daemon for local development
import { clickhouseAdapter } from '@dotdo/db-clickhouse'
import {
connect,
createClickHouseClient,
ensureBinary,
getInstallPaths,
isServerRunning,
startServer,
stopServer,
waitForReady,
} from '@dotdo/db-clickhouse/local'
// RPC adapter for remote database
import { rpcAdapter } from '@dotdo/db-rpc'
// Search
import { fullTextSearch, vectorSearch } from '@mdxdb/payload/search/queries'
// Embedding
import { processPendingChunks } from '@mdxdb/payload/embedding/processor'
// Git metadata
import { getGitMetadata } from '@mdxdb/payload/git/metadata'
// Utilities
import { isFileCollection } from '@mdxdb/payload/utilities/isFileCollection'
Performance
When to Use mdxdb
Ideal for:
- Content-heavy sites (blogs, docs, marketing pages)
- Git-based content workflows
- Teams comfortable with GitHub
- Projects needing human-readable content storage
- Sites with semantic search requirements
Consider alternatives for:
- High-frequency writes (>100/sec)
- Complex multi-document transactions
- Real-time collaborative editing
- Very large collections (>100k documents without pagination)
Optimization Tips
- Use pagination - Don't fetch all documents at once
- Index strategically - ClickHouse handles most queries well
- Batch embeddings - Process in batches of 50-100
- Namespace separation - Use namespaces for multi-tenant isolation
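
The batching tip can be applied with a small helper that walks a list in fixed-size slices. This is a generic sketch: `inBatches` is a hypothetical name, and the work done per batch (embedding calls, paged reads) is left to the caller.

```typescript
// Split items into consecutive batches of at most batchSize elements.
function inBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer')
  }
  const batches: T[][] = []
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize))
  }
  return batches
}

// 120 pending chunk IDs processed 50 at a time: two full batches and a remainder.
const pendingIds = Array.from({ length: 120 }, (_, i) => `chunk_${i}`)
const sizes = inBatches(pendingIds, 50).map((batch) => batch.length)
console.log(sizes) // [ 50, 50, 20 ]
```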
Limitations
- No cross-boundary transactions - Database and files don't share transactional boundaries
- No Payload versions - Use Git history instead of Payload's version system
- Eventual consistency - The database index may briefly lag behind file writes
- No real-time sync - Changes require explicit sync for multi-instance setups
Troubleshooting
ClickHouse won't start (local development)
# Check if port is in use
lsof -i :8123
# Remove a stale PID file
rm -f ~/.clickhouse/data/clickhouse-server.pid
# Check logs
tail -f ~/.clickhouse/logs/clickhouse-server.log
Embeddings not generating
# Check for pending chunks
npx mdxdb status
# Verify API credentials
echo $CLOUDFLARE_ACCOUNT_ID
echo $CLOUDFLARE_API_TOKEN
File/database mismatch
# Full resync from files
npx mdxdb rebuild --path ./content
Contributing
# Clone the repo
git clone https://github.com/mdxdb/payload.git
# Install dependencies
pnpm install
# Run tests
pnpm test
# Build
pnpm build
# Lint
pnpm lint
License
MIT
Built with love for content creators who believe in open, readable, versionable content.