@mdxdb/payload

A hybrid database adapter for Payload CMS that combines human-readable MDX files with powerful ClickHouse querying. Version your content with Git while enjoying full-text search, vector similarity, and SQL-level query performance.

Why mdxdb?

Traditional DB	mdxdb
Opaque binary storage	Human-readable `.mdx` files
Vendor lock-in	Plain files, zero lock-in
Complex backup/restore	Git push/pull
Limited history	Full Git history with blame
Review in custom UI	Review in GitHub PRs

Best of both worlds: Content lives as files you can read, edit, and version with Git. Queries run on your choice of database (ClickHouse, RPC, etc.), and with ClickHouse you also get full-text and vector search.

Features

Storage & Versioning

Human-readable MDX - Documents stored as .mdx files with YAML frontmatter
Git-native - Full history, branching, blame, and PR workflows
GitHub-friendly - Edit content directly on GitHub, review changes in PRs

Query Performance

Pluggable database - Use any Payload database adapter (ClickHouse, RPC, etc.)
Full-text search - Inverted indexes with relevance scoring (when using ClickHouse)
Vector similarity - HNSW indexes for semantic search (when using ClickHouse)
Smart chunking - Markdown-aware content splitting for search

AI-Ready

Embedding pipeline - Generate embeddings with Workers AI (@cf/baai/bge-m3)
1024-dimension vectors - State-of-the-art semantic search
Background processing - Non-blocking embedding generation

Architecture

Dual storage - Content collections: files + remote database
Database-only mode - Auth & internal collections skip files entirely
Sync control - syncToRemote option controls MDX→database sync
Namespace support - Multi-tenant and cross-app search
Git metadata - Track author, commit hash, and message per document

Installation

npm install @mdxdb/payload @dotdo/db-clickhouse
# or
pnpm add @mdxdb/payload @dotdo/db-clickhouse
# or
yarn add @mdxdb/payload @dotdo/db-clickhouse

For local development, ClickHouse can be auto-downloaded and managed using the daemon module (see Local Development below).

Quick Start

// payload.config.ts
import { buildConfig } from 'payload'
import { mdxdb } from '@mdxdb/payload'
import { clickhouseAdapter } from '@dotdo/db-clickhouse'

export default buildConfig({
  secret: process.env.PAYLOAD_SECRET || '',
  db: mdxdb({
    basePath: './content',
    db: clickhouseAdapter({
      url: process.env.CLICKHOUSE_URL || 'http://localhost:8123',
      namespace: 'my-app',
    }),
  }),
  collections: [
    {
      slug: 'posts',
      fields: [
        { name: 'title', type: 'text', required: true },
        { name: 'status', type: 'select', options: ['draft', 'published'] },
        { name: 'content', type: 'richText' },
      ],
    },
  ],
})

This creates documents like:

content/
  posts/
    my-first-post.mdx
    another-post.mdx

Configuration

Database Adapters

mdxdb accepts any Payload database adapter via the db option. Common options:

ClickHouse (@dotdo/db-clickhouse) - High-performance columnar database with vector search
RPC (@dotdo/db-rpc) - Connect to remote database via RPC protocol
Any other Payload-compatible database adapter

Options

import { mdxdb } from '@mdxdb/payload'
import { clickhouseAdapter } from '@dotdo/db-clickhouse'
// OR
import { rpcAdapter } from '@dotdo/db-rpc'

mdxdb({
  // Required: Base directory for content files
  basePath: './content',

  // Required: Payload database adapter (ClickHouse, RPC, etc.)
  db: clickhouseAdapter({
    url: process.env.CLICKHOUSE_URL || 'http://localhost:8123',
    namespace: 'my-app',
  }),
  // OR use RPC adapter for remote database
  // db: rpcAdapter({
  //   url: process.env.DB_RPC_URL,
  // }),

  // Optional: Collections to store in database only (no MDX files)
  // Useful for analytics, sessions, logs, etc.
  dbOnly: ['analytics', 'sessions'],

  // Optional: Whether to sync MDX file writes to remote database (default: true)
  // Set to false to only write MDX files, skipping database sync
  syncToRemote: true,

  // Optional: Per-collection configuration for path patterns and templates
  collections: {
    posts: {
      pathPattern: 'blog/{slug}',
      template: 'post-template.mdx',
    },
  },

  // Optional: Namespace for multi-tenancy (default: 'default')
  ns: 'my-app',

  // Optional: Git configuration for auto-commits
  git: {
    autoCommit: true,
    debounceMs: 5000,
    messageTemplate: '{action}({collection}): {id}',
  },
})
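Both `pathPattern` and `messageTemplate` use `{placeholder}` tokens. The exact substitution rules are not documented here, so the sketch below assumes a simple scheme: each `{name}` is replaced by the matching value, and unknown tokens are left untouched. `fillTemplate` is a hypothetical helper, not an export of the package, and the `update` action name is an assumption.

```typescript
// Hypothetical helper: illustrates {placeholder} substitution as used by
// `pathPattern` and `messageTemplate`. Not an @mdxdb/payload export.
type TemplateValues = Record<string, string | number | undefined>

function fillTemplate(template: string, values: TemplateValues): string {
  // Replace every {name} token; unknown tokens stay as-is so mistakes remain visible.
  return template.replace(/\{(\w+)\}/g, (token: string, name: string) => {
    const value = values[name]
    return value === undefined ? token : String(value)
  })
}

// pathPattern 'blog/{slug}' for a post with slug 'hello-world'
// (assumes the adapter appends the .mdx extension)
const filePath = `${fillTemplate('blog/{slug}', { slug: 'hello-world' })}.mdx`

// messageTemplate '{action}({collection}): {id}' for an update to posts/my-post
const commitMessage = fillTemplate('{action}({collection}): {id}', {
  action: 'update',
  collection: 'posts',
  id: 'my-post',
})

console.log(filePath) // blog/hello-world.mdx
console.log(commitMessage) // update(posts): my-post
```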

Storage Model

Content Collections (Default)

Content collections use dual storage:

MDX files - Source of truth, human-readable, Git-versioned
Remote database - Query index for fast search and filtering (ClickHouse, RPC, etc.)
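The write order implied by this model (the file first, because it is the source of truth, then the query index) can be sketched with two in-memory stand-ins. `FileStore`, `RemoteIndex` and `saveDocument` are hypothetical names used only for illustration; the adapter's real internals and error handling may differ.

```typescript
// Hypothetical stand-ins for the two storage layers (not @mdxdb/payload APIs).
interface Doc {
  id: string
  [field: string]: unknown
}

class FileStore {
  files = new Map<string, Doc>() // relative path -> document
  write(path: string, doc: Doc): void {
    this.files.set(path, doc)
  }
}

class RemoteIndex {
  rows = new Map<string, Doc>() // "collection/id" -> document
  upsert(collection: string, doc: Doc): void {
    this.rows.set(`${collection}/${doc.id}`, doc)
  }
}

interface SaveOptions {
  syncToRemote: boolean // mirrors the adapter option of the same name
}

function saveDocument(files: FileStore, index: RemoteIndex, collection: string, doc: Doc, opts: SaveOptions): void {
  // 1. The MDX file is the source of truth, so it is written first.
  files.write(`${collection}/${doc.id}.mdx`, doc)
  // 2. The remote database is only a query index; skip it when sync is off.
  if (opts.syncToRemote) {
    index.upsert(collection, doc)
  }
}

const files = new FileStore()
const index = new RemoteIndex()
saveDocument(files, index, 'posts', { id: 'hello-world', title: 'Hello' }, { syncToRemote: true })
saveDocument(files, index, 'posts', { id: 'draft-only', title: 'Draft' }, { syncToRemote: false })

console.log([...files.files.keys()].join(', ')) // posts/hello-world.mdx, posts/draft-only.mdx
console.log([...index.rows.keys()].join(', ')) // posts/hello-world
```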

Database-Only Collections

For collections that don't need file storage (e.g., analytics, logs, sessions), use the dbOnly array:

mdxdb({
  basePath: './content',
  db: clickhouseAdapter({ url: process.env.CLICKHOUSE_URL }),
  // These collections are stored in database only, no MDX files
  dbOnly: ['analytics', 'sessions', 'logs'],
})

Automatically database-only:

Auth collections (with auth: true)
Internal Payload collections (payload-*)
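Taken together, these rules decide whether a collection gets MDX files. The package exports `isFileCollection` from `@mdxdb/payload/utilities/isFileCollection`, but its signature is not documented here, so the sketch below is a hypothetical restatement of the rules (`writesMdxFiles`), not that function. Treating any truthy `auth` value as an auth collection is an assumption, since Payload accepts either `auth: true` or an auth options object.

```typescript
// Hypothetical restatement of the storage rules; not the package's
// isFileCollection implementation. Only the fields used here are modelled.
interface CollectionLike {
  slug: string
  auth?: boolean | object // Payload accepts `auth: true` or an options object
}

function writesMdxFiles(collection: CollectionLike, dbOnly: readonly string[]): boolean {
  if (collection.auth) return false // auth collections: database only
  if (collection.slug.startsWith('payload-')) return false // internal Payload collections
  if (dbOnly.includes(collection.slug)) return false // explicitly listed in dbOnly
  return true // everything else: MDX file + database index
}

const dbOnly = ['analytics', 'sessions', 'logs']
const examples: CollectionLike[] = [
  { slug: 'posts' },
  { slug: 'users', auth: true },
  { slug: 'payload-preferences' },
  { slug: 'analytics' },
]

for (const c of examples) {
  console.log(`${c.slug}: ${writesMdxFiles(c, dbOnly) ? 'files + database' : 'database only'}`)
}
// posts: files + database
// users: database only
// payload-preferences: database only
// analytics: database only
```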

Only content collections are written to disk:

content/
  posts/
    hello-world.mdx      # File storage
    my-second-post.mdx
  pages/
    about.mdx

Auth & Internal Collections

Auth collections (users, etc.) and internal Payload collections (payload-*) are stored in the database only - never written to files:

Passwords are hashed, not stored in plaintext
Verification tokens and reset tokens are database-only
No sensitive data in your Git history

MDX File Format

Documents are stored as MDX files with YAML frontmatter:

---
id: my-post
status: published
publishedAt: 2024-01-15T10:30:00Z
views: 1234
createdAt: 2024-01-10T08:00:00Z
updatedAt: 2024-01-15T10:30:00Z
---

# My Post Title

This is the main content of the post. It supports **markdown** formatting,
including code blocks, lists, and more.

## Code Example

```typescript
export function hello() {
  console.log('Hello, world!')
}
```


Field Mapping

Payload Field Type	MDX Representation
text, number, email, etc.	YAML frontmatter
richText	Markdown content after `# Title`
date, point, json	YAML frontmatter (serialized)
relationship	ID reference in frontmatter
array, blocks	YAML array in frontmatter
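A minimal sketch of this mapping: scalar fields become YAML frontmatter, and the rich-text body follows as Markdown under the title heading. It assumes the rich text has already been converted to a Markdown string and handles only flat scalars and arrays of scalars; real serialization (nested blocks, relationships, YAML edge cases) is more involved. `toMdx` is a hypothetical helper, and unlike the example file above it always double-quotes strings (JSON strings are valid YAML).

```typescript
// Hypothetical serializer illustrating the field mapping; not the adapter's code.
// Handles strings, numbers, booleans, Dates and arrays of those only.
type Scalar = string | number | boolean | Date
type FrontmatterValue = Scalar | Scalar[]

function yamlScalar(value: Scalar): string {
  if (value instanceof Date) return value.toISOString() // ISO 8601 timestamp
  if (typeof value === 'string') return JSON.stringify(value) // JSON strings are valid YAML
  return String(value)
}

function toMdx(frontmatter: Record<string, FrontmatterValue>, title: string, body: string): string {
  const lines: string[] = ['---']
  for (const [key, value] of Object.entries(frontmatter)) {
    if (Array.isArray(value)) {
      // arrays become YAML block sequences
      lines.push(`${key}:`, ...value.map((item) => `  - ${yamlScalar(item)}`))
    } else {
      lines.push(`${key}: ${yamlScalar(value)}`)
    }
  }
  lines.push('---', '', `# ${title}`, '', body)
  return lines.join('\n') + '\n'
}

const mdx = toMdx(
  {
    id: 'my-post',
    status: 'published',
    views: 1234,
    publishedAt: new Date('2024-01-15T10:30:00Z'),
    tags: ['payload', 'mdx'],
  },
  'My Post Title',
  'This is the main content of the post.',
)
console.log(mdx)
```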

Querying

Standard Payload Queries

All Payload query operators work as expected:

// Find published posts
const posts = await payload.find({
  collection: 'posts',
  where: {
    status: { equals: 'published' },
    views: { greater_than: 100 },
  },
  sort: '-publishedAt',
  limit: 10,
})

Full-Text Search

Search across document content:

import { fullTextSearch } from '@mdxdb/payload/search/queries'

const results = await fullTextSearch({
  client: adapter.client,
  database: adapter.database,
  tableName: adapter.tables.search,
  query: 'typescript react hooks',
  collection: 'posts', // optional: filter by collection
  ns: 'default', // optional: filter by namespace
  limit: 20,
})

// Returns: [{ id, docId, collection, content, path, score }]
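ClickHouse's full-text index does the real work here, and this README does not say how `score` is computed. To show the idea behind an inverted index with relevance scoring, here is a toy in-memory version that scores chunks with plain TF-IDF (term count weighted by how rare the term is). The scoring formula and all names are illustrative choices, not the library's.

```typescript
// Toy inverted index with TF-IDF scoring, for illustration only.
// Not how ClickHouse or @mdxdb/payload computes `score`.
interface Chunk {
  id: string
  content: string
}

const tokenize = (text: string): string[] => text.toLowerCase().match(/[a-z0-9]+/g) ?? []

function buildIndex(chunks: Chunk[]): Map<string, Map<string, number>> {
  // term -> (chunk id -> occurrences of the term in that chunk)
  const index = new Map<string, Map<string, number>>()
  for (const chunk of chunks) {
    for (const term of tokenize(chunk.content)) {
      const postings = index.get(term) ?? new Map<string, number>()
      postings.set(chunk.id, (postings.get(chunk.id) ?? 0) + 1)
      index.set(term, postings)
    }
  }
  return index
}

function search(
  index: Map<string, Map<string, number>>,
  totalChunks: number,
  query: string,
  limit: number,
): { id: string; score: number }[] {
  const scores = new Map<string, number>()
  for (const term of tokenize(query)) {
    const postings = index.get(term)
    if (!postings) continue
    const idf = Math.log(1 + totalChunks / postings.size) // rarer terms weigh more
    for (const [id, tf] of postings) {
      scores.set(id, (scores.get(id) ?? 0) + tf * idf)
    }
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
}

const chunks: Chunk[] = [
  { id: 'post-a_0', content: 'React hooks in TypeScript: useState and useEffect' },
  { id: 'post-b_0', content: 'Deploying a TypeScript API to Cloudflare Workers' },
  { id: 'post-c_0', content: 'Baking sourdough bread at home' },
]
const idx = buildIndex(chunks)
// post-a_0 ranks first, then post-b_0; post-c_0 matches no query term
console.log(search(idx, chunks.length, 'typescript react hooks', 20).map((r) => r.id))
```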

Vector Similarity Search

Find semantically similar content:

import { vectorSearch } from '@mdxdb/payload/search/queries'

const results = await vectorSearch({
  client: adapter.client,
  database: adapter.database,
  tableName: adapter.tables.search,
  embedding: queryVector, // 1024-dimension float array
  collection: 'posts',
  limit: 10,
})

// Returns: [{ id, docId, collection, content, path, score }]
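The HNSW index answers this query approximately. For small data sets, or to sanity-check results, the exact equivalent is a brute-force scan that ranks chunks by cosine distance (1 minus cosine similarity), the same `cosineDistance` function named in the search table's index definition under Database Schema. Whether `score` is a distance or a similarity is not stated in this README; the sketch returns the distance, so lower means closer.

```typescript
// Exact (brute-force) nearest-neighbour search by cosine distance.
// HNSW approximates this ranking; this is an illustration, not the adapter's query.
interface EmbeddedChunk {
  id: string
  embedding: number[] // 1024 dims in mdxdb; 3 dims here to keep the example small
}

function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

function nearest(query: number[], chunks: EmbeddedChunk[], limit: number) {
  return chunks
    .map((c) => ({ id: c.id, distance: cosineDistance(query, c.embedding) }))
    .sort((x, y) => x.distance - y.distance) // smallest distance = most similar
    .slice(0, limit)
}

const chunks: EmbeddedChunk[] = [
  { id: 'post-a_0', embedding: [1, 0, 0] },
  { id: 'post-b_0', embedding: [0.9, 0.1, 0] },
  { id: 'post-c_0', embedding: [0, 0, 1] },
]
console.log(nearest([1, 0, 0], chunks, 2).map((r) => r.id).join(', ')) // post-a_0, post-b_0
```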

Search & Embeddings

How It Works

Document Creation/Update - Content is chunked at heading boundaries
Chunk Storage - Each chunk stored with path hierarchy (e.g., posts > Introduction > Getting Started)
Embedding Generation - Background process generates vectors via Workers AI
Search - Full-text via inverted index, semantic via HNSW vector index
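Steps 1 and 2 can be sketched as a small heading-aware splitter: it walks the Markdown line by line, keeps a stack of the current headings, starts a new chunk at every heading, and records the heading path plus a `{docId}_{chunkIndex}` id as in the chunk structure below. This is a simplified assumption of the behaviour: `chunkMarkdown` is hypothetical, it ignores the ~1500-token size cap, and it only handles backtick code fences.

```typescript
// Simplified heading-aware chunker (illustrative assumption, not the adapter's code).
// Splits at every Markdown heading and records the heading hierarchy as `path`.
interface DraftChunk {
  id: string // "{docId}_{chunkIndex}"
  docId: string
  chunkIndex: number
  path: string // e.g. "posts > Introduction > Getting Started"
  content: string
}

const FENCE = '`'.repeat(3) // headings inside code fences are not split on

function chunkMarkdown(collection: string, docId: string, markdown: string): DraftChunk[] {
  const chunks: DraftChunk[] = []
  const headings: string[] = [] // headings[level - 1] = current heading at that level
  let buffer: string[] = []
  let inFence = false

  const flush = () => {
    const content = buffer.join('\n').trim()
    buffer = []
    if (!content) return
    const chunkIndex = chunks.length
    chunks.push({
      id: `${docId}_${chunkIndex}`,
      docId,
      chunkIndex,
      path: [collection, ...headings.filter(Boolean)].join(' > '),
      content,
    })
  }

  for (const line of markdown.split('\n')) {
    if (line.trimStart().startsWith(FENCE)) inFence = !inFence
    const match = inFence ? null : /^(#{1,6})\s+(.*)$/.exec(line)
    if (match) {
      flush() // close the previous section before the path changes
      const level = match[1].length
      headings.length = level - 1 // drop headings at this level and deeper
      headings[level - 1] = match[2].trim()
    }
    buffer.push(line)
  }
  flush()
  return chunks
}

const doc = [
  '## Introduction',
  'Why mdxdb exists.',
  '### Getting Started',
  'Install the package.',
  '## Usage',
  'Configure the adapter.',
].join('\n')

for (const c of chunkMarkdown('posts', 'my-post', doc)) console.log(c.id, '|', c.path)
// my-post_0 | posts > Introduction
// my-post_1 | posts > Introduction > Getting Started
// my-post_2 | posts > Usage
```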

Chunk Structure

interface SearchChunk {
  id: string // "{docId}_{chunkIndex}"
  ns: string // Namespace
  collection: string // Collection slug
  docId: string // Parent document ID
  chunkIndex: number // Position in document
  path: string // Heading hierarchy
  content: string // Chunk text (max ~1500 tokens)
  embedding: number[] // 1024-dim vector (when ready)
  status: 'pending' | 'ready' | 'failed'
}

Processing Pending Embeddings

import { processPendingChunks } from '@mdxdb/payload/embedding/processor'

// Process all pending chunks
await processPendingChunks({
  client: adapter.client,
  database: adapter.database,
  tableName: adapter.tables.search,
  embeddingConfig: adapter.embeddingConfig,
  batchSize: 50,
})
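Conceptually, processing pending embeddings means taking chunks with `status: 'pending'` in batches of `batchSize`, embedding each batch, and marking the chunks `ready` or `failed`. The sketch shows that loop against an in-memory array with an injected `embed` function; in the real pipeline the vectors come from Workers AI (`@cf/baai/bge-m3`), and how `processPendingChunks` handles retries or partial failures is not documented here. `processPending`, `EmbedFn` and `fakeEmbed` are hypothetical.

```typescript
// Illustrative batch loop; not the implementation of processPendingChunks.
type ChunkStatus = 'pending' | 'ready' | 'failed'

interface PendingChunk {
  id: string
  content: string
  embedding: number[]
  status: ChunkStatus
}

// Hypothetical embedding function: one vector per input text, same order.
type EmbedFn = (texts: string[]) => Promise<number[][]>

async function processPending(chunks: PendingChunk[], embed: EmbedFn, batchSize: number): Promise<number> {
  const pending = chunks.filter((c) => c.status === 'pending')
  for (let start = 0; start < pending.length; start += batchSize) {
    const batch = pending.slice(start, start + batchSize)
    try {
      const vectors = await embed(batch.map((c) => c.content))
      if (vectors.length !== batch.length) throw new Error('embedding count mismatch')
      batch.forEach((c, i) => {
        c.embedding = vectors[i]
        c.status = 'ready'
      })
    } catch {
      // A failed batch is marked failed so it can be inspected or retried later.
      for (const c of batch) c.status = 'failed'
    }
  }
  return pending.length // number of chunks attempted
}

// Fake embedder for the example: 2-dim vectors derived from the text length.
const fakeEmbed: EmbedFn = async (texts) => texts.map((t) => [t.length, 1])

const chunks: PendingChunk[] = ['alpha', 'beta', 'gamma'].map(
  (content, i): PendingChunk => ({ id: `doc_${i}`, content, embedding: [], status: 'pending' }),
)

processPending(chunks, fakeEmbed, 2).then((count) => {
  console.log(count, chunks.map((c) => c.status).join(',')) // 3 ready,ready,ready
})
```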

Database Schema

When using ClickHouse as the database adapter, mdxdb creates the following schema:

Data Table

CREATE TABLE mdxdb_data (
  ns String,                    -- Namespace
  collection String,            -- Collection slug
  id String,                    -- Document ID
  data String,                  -- JSON document data
  filepath Nullable(String),    -- File path (null for DB-only)
  gitHash Nullable(String),     -- Last commit hash
  gitAuthor Nullable(String),   -- Last commit author
  gitDate Nullable(DateTime64), -- Last commit date
  gitMessage Nullable(String),  -- Last commit message
  v DateTime64(3),              -- Version timestamp
  deletedAt Nullable(DateTime64(3))
) ENGINE = ReplacingMergeTree(v)
ORDER BY (ns, collection, id)
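`ReplacingMergeTree(v)` keeps, for each `(ns, collection, id)` sorting key, only the row with the highest `v` once data parts are merged. Merges run in the background, so until then a query can still see several versions unless it uses `FINAL` or deduplicates itself. The sketch reproduces that read-side rule in TypeScript: latest version wins, then rows with `deletedAt` set are hidden. Treating `deletedAt` as a soft-delete marker is an assumption based on the column name, and `latestVisible` is a hypothetical helper.

```typescript
// Read-side view of ReplacingMergeTree(v): per (ns, collection, id) keep the row
// with the highest version, then hide soft-deleted rows. Illustration only.
interface DataRow {
  ns: string
  collection: string
  id: string
  data: string // JSON document
  v: number // version timestamp in ms (DateTime64(3) in ClickHouse)
  deletedAt: number | null
}

function latestVisible(rows: DataRow[]): DataRow[] {
  const latest = new Map<string, DataRow>()
  for (const row of rows) {
    const key = `${row.ns}\u0000${row.collection}\u0000${row.id}` // sorting key
    const current = latest.get(key)
    // On equal versions ClickHouse keeps the last inserted row; `>=` mirrors
    // that as long as `rows` is in insertion order.
    if (!current || row.v >= current.v) latest.set(key, row)
  }
  return [...latest.values()].filter((row) => row.deletedAt === null)
}

const rows: DataRow[] = [
  { ns: 'my-app', collection: 'posts', id: 'a', data: '{"title":"v1"}', v: 1, deletedAt: null },
  { ns: 'my-app', collection: 'posts', id: 'a', data: '{"title":"v2"}', v: 2, deletedAt: null },
  { ns: 'my-app', collection: 'posts', id: 'b', data: '{"title":"b"}', v: 1, deletedAt: null },
  { ns: 'my-app', collection: 'posts', id: 'b', data: '{"title":"b"}', v: 3, deletedAt: 3 },
]

console.log(latestVisible(rows).map((r) => `${r.id}: ${r.data}`).join('\n'))
// a: {"title":"v2"}
```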

Search Table

CREATE TABLE mdxdb_search (
  -- ... chunk fields ...
  embedding Array(Float32),     -- 1024-dim vector
  INDEX idx_fts content TYPE full_text,
  INDEX idx_vec embedding TYPE vector_similarity('hnsw', 'cosineDistance')
) ENGINE = ReplacingMergeTree(updatedAt)
ORDER BY (ns, collection, docId, chunkIndex)

Migrations

Create data transformation migrations (MongoDB-style, not SQL schema):

// migrations/20240115_add_default_status.ts
import type { MigrateUpArgs, MigrateDownArgs } from '@mdxdb/payload'

export async function up({ payload }: MigrateUpArgs): Promise<void> {
  const { docs } = await payload.find({
    collection: 'posts',
    where: { status: { exists: false } },
    limit: 0,
  })

  for (const doc of docs) {
    await payload.update({
      collection: 'posts',
      id: doc.id,
      data: { status: 'draft' },
    })
  }
}

export async function down({ payload }: MigrateDownArgs): Promise<void> {
  // Reverse the migration if needed
}

Run migrations:

payload migrate           # Run pending migrations
payload migrate:down      # Roll back last batch
payload migrate:status    # Check migration status
payload migrate:fresh     # Reset and re-run all

CLI

# Sync MDX files to ClickHouse
npx mdxdb sync --path ./content

# Watch for changes and auto-sync
npx mdxdb watch --path ./content

# Rebuild search index
npx mdxdb rebuild --path ./content

# Process pending embeddings
npx mdxdb embed --batch-size 50

Directory Structure

Default organization by collection slug:

content/
  posts/
    hello-world.mdx
    my-second-post.mdx
  pages/
    about.mdx
    contact.mdx
  site-settings.mdx      # Global

With admin groups:

content/
  blog/                   # admin.group: 'Blog'
    posts/
    categories/
  settings/               # admin.group: 'Settings'
    navigation.mdx
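The two layouts above suggest a simple rule: each collection gets a folder named after its slug, each global is a single `{slug}.mdx` file, and an `admin.group` adds a parent folder. The sketch encodes that reading of the examples. How the adapter normalises group labels (case, spaces, localized labels) is an assumption here, and `contentPath` and `groupDir` are hypothetical helpers; only plain string groups are modelled.

```typescript
// Hypothetical path resolver inferred from the directory examples above.
interface EntityConfig {
  slug: string
  kind: 'collection' | 'global'
  admin?: { group?: string }
}

// Assumption: group labels are lowercased and whitespace becomes hyphens.
const groupDir = (group: string): string => group.trim().toLowerCase().replace(/\s+/g, '-')

function contentPath(basePath: string, entity: EntityConfig, docSlug?: string): string {
  const parts = [basePath]
  const group = entity.admin?.group
  if (group) parts.push(groupDir(group))
  if (entity.kind === 'global') {
    parts.push(`${entity.slug}.mdx`) // globals are single files
  } else {
    if (!docSlug) throw new Error('collection documents need a slug')
    parts.push(entity.slug, `${docSlug}.mdx`) // one file per document
  }
  return parts.join('/')
}

console.log(contentPath('content', { slug: 'posts', kind: 'collection' }, 'hello-world'))
// content/posts/hello-world.mdx
console.log(contentPath('content', { slug: 'site-settings', kind: 'global' }))
// content/site-settings.mdx
console.log(contentPath('content', { slug: 'posts', kind: 'collection', admin: { group: 'Blog' } }, 'hello-world'))
// content/blog/posts/hello-world.mdx
console.log(contentPath('content', { slug: 'navigation', kind: 'global', admin: { group: 'Settings' } }))
// content/settings/navigation.mdx
```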

Git Integration

Automatic Metadata

Every file-backed document records metadata from its last Git commit:

const post = await payload.findByID({
  collection: 'posts',
  id: 'my-post',
})

// The Git metadata (gitHash, gitAuthor, gitDate, gitMessage) is stored
// in the mdxdb_data table and can be read with a ClickHouse query
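The package exports `getGitMetadata` from `@mdxdb/payload/git/metadata`, but its signature is not documented here. As an illustration of where these four values can come from, the sketch reads the last commit that touched a file with the standard `git log` command and parses the output. `readLastCommit` and `parseCommitLine` are hypothetical helpers, not the package's API.

```typescript
import { execFileSync } from 'node:child_process'

// Hypothetical helpers (not @mdxdb/payload's getGitMetadata).
interface GitMetadata {
  gitHash: string
  gitAuthor: string
  gitDate: Date
  gitMessage: string
}

// Fields are separated by the ASCII unit separator (0x1f), which does not
// normally appear in author names or commit subjects.
const SEP = '\x1f'
const FORMAT = ['%H', '%an', '%aI', '%s'].join('%x1f') // hash, author, ISO date, subject

function parseCommitLine(line: string): GitMetadata | null {
  const [gitHash, gitAuthor, date, gitMessage] = line.trim().split(SEP)
  if (!gitHash || gitMessage === undefined) return null // no commit touches the file yet
  return { gitHash, gitAuthor, gitDate: new Date(date), gitMessage }
}

function readLastCommit(filePath: string, cwd = process.cwd()): GitMetadata | null {
  const out = execFileSync('git', ['log', '-1', `--format=${FORMAT}`, '--', filePath], {
    cwd,
    encoding: 'utf8',
  })
  return parseCommitLine(out)
}

// Example (must run inside a Git repository):
// console.log(readLastCommit('content/posts/my-post.mdx'))
```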

Best Practices

Commit frequently - Small, focused commits for better history
Use branches - Feature branches for content changes
Review in PRs - Human-readable diffs in GitHub
Automate deploys - Push to main triggers site rebuild

Local Development

For local development, @dotdo/db-clickhouse provides a daemon module that automatically downloads and manages a local ClickHouse server:

import { buildConfig } from 'payload'
import { mdxdb } from '@mdxdb/payload'
import { clickhouseAdapter } from '@dotdo/db-clickhouse'
import { connect } from '@dotdo/db-clickhouse/local'

// Auto-download binary and start daemon if needed
const client = await connect({
  database: 'mydb',
  installDir: '~/.clickhouse', // default
  port: 8123,
})

// Use in mdxdb config (the adapter URL points at the same port)
export default buildConfig({
  secret: process.env.PAYLOAD_SECRET || '',
  db: mdxdb({
    basePath: './content',
    db: clickhouseAdapter({
      url: 'http://localhost:8123',
      namespace: 'my-app',
    }),
  }),
})

Daemon Management

Control the ClickHouse server lifecycle:

import {
  ensureBinary,
  getInstallPaths,
  isServerRunning,
  startServer,
  stopServer,
  waitForReady,
} from '@dotdo/db-clickhouse/local'

// Check if binary is installed, download if not
const binaryPath = await ensureBinary({ installDir: '~/.clickhouse' })

// Get installation paths
const paths = getInstallPaths({ installDir: '~/.clickhouse' })
// { binaryPath, binDir, dataDir, logDir, installDir }

// Check server status
const status = await isServerRunning({ port: 8123 })
if (!status.isRunning) {
  // Start as daemon
  await startServer({
    binaryPath: paths.binaryPath,
    dataDir: paths.dataDir,
    logDir: paths.logDir,
    httpPort: 8123,
  })
}

// Wait for server to be ready (with timeout)
await waitForReady({ port: 8123, timeout: 30000 })

// Gracefully stop server
await stopServer({ port: 8123 })

Installation Paths

ClickHouse is installed to ~/.clickhouse by default:

~/.clickhouse/
  bin/
    clickhouse          # Binary (~500MB)
  data/
    config.xml          # Auto-generated config
    ...                 # ClickHouse data files
  logs/
    clickhouse-server.log
    clickhouse-server.err.log

API Reference

Adapter Options

Option	Type	Default	Description
`basePath`	`string`	required	Base directory for content files
`db`	`DatabaseAdapterObj`	required	Payload database adapter (ClickHouse, RPC, etc.)
`dbOnly`	`string[]`	`[]`	Collections to store in database only (no MDX files)
`syncToRemote`	`boolean`	`true`	Whether MDX writes sync to remote database
`collections`	`object`	`{}`	Per-collection config (see below)
`ns`	`string`	`'default'`	Namespace for multi-tenancy
`git`	`object`	see below	Git auto-commit configuration

Collection Options:

Option	Type	Default	Description
`pathPattern`	`string`	-	Custom file path pattern
`template`	`string`	-	Custom MDX template

Git Options:

Option	Type	Default	Description
`autoCommit`	`boolean`	`false`	Enable automatic Git commits
`debounceMs`	`number`	`5000`	Debounce delay for auto-commit
`messageTemplate`	`string`	`'{action}({collection}): {id}'`	Template for commit messages
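With `autoCommit` enabled, `debounceMs` presumably groups a burst of writes into a single commit: each write restarts the timer, and a commit happens only after `debounceMs` of quiet. Assuming that standard trailing-debounce behaviour (this README does not spell it out), the sketch computes which writes would share a commit from their timestamps. `groupIntoCommits` is a hypothetical helper.

```typescript
// Trailing-debounce grouping of write timestamps (ms) into commits.
// Assumes each write restarts the timer; this is not the adapter's code.
function groupIntoCommits(writeTimes: number[], debounceMs: number): number[][] {
  const sorted = [...writeTimes].sort((a, b) => a - b)
  const commits: number[][] = []
  let current: number[] = []
  for (const t of sorted) {
    // A gap of at least debounceMs means the timer fired before this write.
    if (current.length > 0 && t - current[current.length - 1] >= debounceMs) {
      commits.push(current)
      current = []
    }
    current.push(t)
  }
  if (current.length > 0) commits.push(current)
  return commits
}

// Three quick edits, a pause, then one more edit (default debounceMs: 5000)
const commits = groupIntoCommits([0, 1200, 3000, 9500], 5000)
console.log(commits.length) // 2
console.log(JSON.stringify(commits)) // [[0,1200,3000],[9500]]
```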

Exported Types

import type { MdxdbAdapter, MdxdbAdapterArgs, MigrateUpArgs, MigrateDownArgs } from '@mdxdb/payload'

Exported Functions

// Main adapter
import { mdxdb } from '@mdxdb/payload'

// ClickHouse adapter with daemon for local development
import { clickhouseAdapter } from '@dotdo/db-clickhouse'
import {
  connect,
  createClickHouseClient,
  ensureBinary,
  getInstallPaths,
  isServerRunning,
  startServer,
  stopServer,
  waitForReady,
} from '@dotdo/db-clickhouse/local'

// RPC adapter for remote database
import { rpcAdapter } from '@dotdo/db-rpc'

// Search
import { fullTextSearch, vectorSearch } from '@mdxdb/payload/search/queries'

// Embedding
import { processPendingChunks } from '@mdxdb/payload/embedding/processor'

// Git metadata
import { getGitMetadata } from '@mdxdb/payload/git/metadata'

// Utilities
import { isFileCollection } from '@mdxdb/payload/utilities/isFileCollection'

Performance

When to Use mdxdb

Ideal for:

Content-heavy sites (blogs, docs, marketing pages)
Git-based content workflows
Teams comfortable with GitHub
Projects needing human-readable content storage
Sites with semantic search requirements

Consider alternatives for:

High-frequency writes (>100/sec)
Complex multi-document transactions
Real-time collaborative editing
Very large collections (>100k documents without pagination)

Optimization Tips

Use pagination - Don't fetch all documents at once
Index strategically - ClickHouse handles most queries well
Batch embeddings - Process in batches of 50-100
Namespace separation - Use namespaces for multi-tenant isolation

Limitations

No cross-boundary transactions - Database and files don't share transactional boundaries
No Payload versions - Use Git history instead of Payload's version system
Eventual consistency - File writes may briefly lag behind database sync
No real-time sync - Changes require explicit sync for multi-instance setups

Troubleshooting

ClickHouse won't start (local development)

# Check if port is in use
lsof -i :8123

# Remove a stale PID file
rm -f ~/.clickhouse/data/clickhouse-server.pid

# Check logs
tail -f ~/.clickhouse/logs/clickhouse-server.log

Embeddings not generating

# Check for pending chunks
npx mdxdb status

# Verify API credentials
echo $CLOUDFLARE_ACCOUNT_ID
echo $CLOUDFLARE_API_TOKEN

File/database mismatch

# Full resync from files
npx mdxdb rebuild --path ./content

Contributing

# Clone the repo
git clone https://github.com/mdxdb/payload.git

# Install dependencies
pnpm install

# Run tests
pnpm test

# Build
pnpm build

# Lint
pnpm lint

License

MIT

Built with love for content creators who believe in open, readable, versionable content.