PDF MCP Server
A high-performance MCP server for PDF processing, built in Rust.
Give your AI agents powerful PDF capabilities: extract text, search, split, merge, encrypt, and more. All dependencies are Apache 2.0 licensed, keeping your project clean and permissive.
Features
| Category | Tools |
|---|---|
| Reading | extract_text · extract_metadata · extract_outline · extract_annotations · extract_links · extract_form_fields |
| Search & Discovery | search · list_pdfs · get_page_info · summarize_structure |
| Media | Image extraction (via extract_text) · convert_page_to_image |
| Manipulation | split_pdf · merge_pdfs · compress_pdf · fill_form |
| Security | protect_pdf · unprotect_pdf · Password-protected PDF support |
| Resources | Expose PDFs as MCP Resources for direct client access |
| Performance | Batch processing · LRU caching · Operation chaining via cache keys |
Installation
npm (Recommended)
npm install -g @paradyno/pdf-mcp-server
Pre-built Binaries
Download from GitHub Releases:
| Platform | x86_64 | ARM64 |
|---|---|---|
| Linux | pdf-mcp-server-linux-x64 | pdf-mcp-server-linux-arm64 |
| macOS | pdf-mcp-server-darwin-x64 | pdf-mcp-server-darwin-arm64 |
| Windows | pdf-mcp-server-windows-x64.exe | - |
From Source
cargo install --git https://github.com/paradyno/pdf-mcp-server
Configuration
Claude Desktop
Add to your claude_desktop_config.json:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"pdf": {
"command": "npx",
"args": ["@paradyno/pdf-mcp-server"]
}
}
}
Claude Code
claude mcp add pdf -- npx @paradyno/pdf-mcp-server
VS Code
{
"mcp.servers": {
"pdf": {
"command": "npx",
"args": ["@paradyno/pdf-mcp-server"]
}
}
}
Tools
Source Types
All tools accept PDF sources in multiple formats:
{ "path": "/documents/file.pdf" }
{ "base64": "JVBERi0xLjQK..." }
{ "url": "https://example.com/document.pdf" }
{ "cache_key": "abc123" }
extract_text
Extract text content with LLM-optimized formatting (paragraph detection, multi-column reordering, watermark removal).
Example & Parameters
{
"sources": [{ "path": "/documents/report.pdf" }],
"pages": "1-10",
"include_metadata": true
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| sources | array | Yes | - | PDF sources |
| pages | string | No | all | Page selection (e.g., "1-5,10,15-20") |
| include_metadata | boolean | No | true | Include PDF metadata |
| include_images | boolean | No | false | Include extracted images (base64 PNG) |
| password | string | No | - | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |
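Every tool is invoked through MCP's standard `tools/call` JSON-RPC method, with the tool name and an `arguments` object like the example above. The sketch below builds such a request for extract_text and serializes it as a single line, since the MCP stdio transport exchanges newline-delimited JSON messages. It assumes the client has already completed the MCP `initialize` handshake; `make_tool_call` is a hypothetical helper, not part of this package.

```python
import json
from typing import Any


def make_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as one newline-terminated line.

    Hypothetical client-side helper. The stdio transport expects one JSON-RPC
    message per line, so the serialized message must not contain raw newlines.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps without indent emits no raw newlines (newlines inside strings are escaped).
    return json.dumps(message) + "\n"


line = make_tool_call(
    1,
    "extract_text",
    {
        "sources": [{"path": "/documents/report.pdf"}],
        "pages": "1-10",
        "include_metadata": True,
    },
)
```

Writing `line` to the server's stdin should yield a JSON-RPC response carrying the same `id`; the same helper works for any tool in this section by changing the name and arguments.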
extract_outline
Extract PDF bookmarks / table of contents.
Example, Parameters & Response
{
"sources": [{ "path": "/documents/book.pdf" }]
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| sources | array | Yes | - | PDF sources |
| password | string | No | - | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |
Response:
{
"results": [{
"source": "/documents/book.pdf",
"outline": [
{
"title": "Chapter 1: Introduction",
"page": 1,
"children": [
{ "title": "1.1 Background", "page": 3, "children": [] }
]
}
]
}]
}
extract_metadata
Extract PDF metadata (author, title, dates, etc.) without loading full content.
Example & Parameters
{
"sources": [{ "path": "/documents/report.pdf" }]
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| sources | array | Yes | - | PDF sources |
| password | string | No | - | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |
extract_annotations
Extract highlights, comments, underlines, and other annotations.
Example & Parameters
{
"sources": [{ "path": "/documents/report.pdf" }],
"annotation_types": ["highlight", "text"],
"pages": "1-5"
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| sources | array | Yes | - | PDF sources |
| annotation_types | array | No | all | Filter by types (highlight, underline, text, etc.) |
| pages | string | No | all | Page selection |
| password | string | No | - | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |
extract_links
Extract hyperlinks and internal page navigation links.
Example, Parameters & Response
{
"sources": [{ "path": "/documents/paper.pdf" }],
"pages": "1-10"
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| sources | array | Yes | - | PDF sources |
| pages | string | No | all | Page selection |
| password | string | No | - | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |
Response:
{
"results": [{
"source": "/documents/paper.pdf",
"links": [
{ "page": 1, "url": "https://example.com", "text": "Click here" },
{ "page": 3, "dest_page": 10, "text": "See Chapter 5" }
],
"total_count": 2
}]
}
extract_form_fields
Read form field names, types, current values, and properties from PDF forms.
Example, Parameters & Response
{
"sources": [{ "path": "/documents/form.pdf" }],
"pages": "1"
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| sources | array | Yes | - | PDF sources |
| pages | string | No | all | Page selection |
| password | string | No | - | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |
Response:
{
"results": [{
"source": "/documents/form.pdf",
"fields": [
{
"page": 1,
"name": "full_name",
"field_type": "text",
"value": "John Doe",
"is_read_only": false,
"is_required": true,
"properties": { "is_multiline": false, "is_password": false }
},
{
"page": 1,
"name": "agree_terms",
"field_type": "checkbox",
"is_checked": true,
"is_read_only": false,
"is_required": false,
"properties": {}
}
],
"total_fields": 2
}]
}
convert_page_to_image
Render PDF pages as PNG images (base64), so vision-capable LLMs can interpret visual layouts, charts, and diagrams.
Example, Parameters & Response
{
"sources": [{ "path": "/documents/chart.pdf" }],
"pages": "1-3",
"width": 1200
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| sources | array | Yes | - | PDF sources |
| pages | string | No | all | Page selection |
| width | integer | No | 1200 | Target width in pixels |
| height | integer | No | - | Target height in pixels |
| scale | float | No | - | Scale factor (overrides width/height) |
| password | string | No | - | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |
Response:
{
"results": [{
"source": "/documents/chart.pdf",
"pages": [
{
"page": 1,
"width": 1200,
"height": 1553,
"data_base64": "iVBORw0KGgo...",
"mime_type": "image/png"
}
]
}]
}
search
Full-text search within PDFs with surrounding context.
Example & Parameters
{
"sources": [{ "path": "/documents/manual.pdf" }],
"query": "error handling",
"context_chars": 100
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| sources | array | Yes | - | PDF sources |
| query | string | Yes | - | Search query |
| case_sensitive | boolean | No | false | Case-sensitive search |
| max_results | integer | No | 100 | Maximum results to return |
| context_chars | integer | No | 50 | Characters of context around match |
| password | string | No | - | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |
get_page_info
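To make the interplay of query, case_sensitive, context_chars and max_results concrete, here is a minimal sketch of literal substring search with a context window over already-extracted text. It is an illustration of the parameters in the table, not the server's implementation; how the server treats matches that span lines or pages is not covered here, and `search_with_context` is a hypothetical name.

```python
import re


def search_with_context(
    text: str,
    query: str,
    context_chars: int = 50,
    case_sensitive: bool = False,
    max_results: int = 100,
) -> list[dict]:
    """Find literal matches of `query` and return each with surrounding context.

    Illustrative sketch only; it mirrors the tool's parameters, not its code.
    """
    if not query:
        return []
    flags = 0 if case_sensitive else re.IGNORECASE
    results = []
    # re.escape makes the query a literal string rather than a regex pattern.
    for m in re.finditer(re.escape(query), text, flags):
        if len(results) >= max_results:
            break
        # Clamp the context window to the bounds of the text.
        lo = max(0, m.start() - context_chars)
        hi = min(len(text), m.end() + context_chars)
        results.append({"offset": m.start(), "match": m.group(0), "context": text[lo:hi]})
    return results
```

For example, `search_with_context("Error handling matters. See error codes.", "error", context_chars=5)` finds both occurrences, while passing `case_sensitive=True` finds only the lowercase one.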
Get page dimensions, word/char counts, token estimates, and file sizes. Useful for planning LLM context usage.
Example, Parameters & Response
{
"sources": [{ "path": "/documents/report.pdf" }]
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| sources | array | Yes | - | PDF sources |
| password | string | No | - | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |
| skip_file_sizes | boolean | No | false | Skip file size calculation (faster) |
Response:
{
"results": [{
"source": "/documents/report.pdf",
"pages": [{
"page": 1,
"width": 612.0, "height": 792.0,
"rotation": 0, "orientation": "portrait",
"char_count": 2500, "word_count": 450,
"estimated_token_count": 625,
"file_size": 102400
}],
"total_pages": 10,
"total_chars": 25000,
"total_words": 4500,
"total_estimated_token_count": 6250
}]
}
Note: Token counts are model-dependent approximations (~4 chars/token for Latin text, ~2 tokens/char for CJK). Use them as rough guidance only.
summarize_structure
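The heuristic in the note can be written out directly. This is a sketch of the stated ratios, not the server's code: the CJK Unicode ranges and rounding up are assumptions, and everything that is not CJK (including digits, punctuation and whitespace) is counted at ~4 characters per token. With 2,500 Latin characters it yields 625, matching the example response above.

```python
import math


def is_cjk(ch: str) -> bool:
    """Rough CJK check over a few common Unicode blocks (an assumption, not exhaustive)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul Syllables
    )


def estimate_tokens(text: str) -> int:
    """Approximate tokens: ~4 chars per token for non-CJK text, ~2 tokens per CJK char."""
    cjk = sum(1 for ch in text if is_cjk(ch))
    other = len(text) - cjk
    return math.ceil(other / 4) + 2 * cjk
```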
One-call comprehensive overview of a PDF's structure. Helps LLMs decide how to process a document.
Example, Parameters & Response
{
"sources": [{ "path": "/documents/report.pdf" }]
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| sources | array | Yes | - | PDF sources |
| password | string | No | - | PDF password if encrypted |
| cache | boolean | No | false | Enable caching |
Response:
{
"results": [{
"source": "/documents/report.pdf",
"page_count": 25,
"file_size": 1048576,
"metadata": { "title": "Annual Report", "author": "Acme Corp" },
"has_outline": true,
"outline_items": 12,
"total_chars": 50000,
"total_words": 9000,
"total_estimated_tokens": 12500,
"pages": [
{ "page": 1, "width": 612.0, "height": 792.0, "char_count": 2000, "word_count": 360, "has_images": true, "has_links": false, "has_annotations": false }
],
"total_images": 5,
"total_links": 3,
"total_annotations": 2,
"has_form": false,
"form_field_count": 0,
"form_field_types": {},
"is_encrypted": false
}]
}
list_pdfs
Discover PDF files in a directory with optional filtering.
Example & Parameters
{
"directory": "/documents",
"recursive": true,
"pattern": "invoice*.pdf"
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| directory | string | Yes | - | Directory to search |
| recursive | boolean | No | false | Search subdirectories |
| pattern | string | No | - | Filename pattern (e.g., "report*.pdf") |
split_pdf
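A rough local equivalent of the directory walk can help check what a pattern will match before calling the tool. Assumptions: the pattern is a shell-style glob applied to the file name only, matching is case-sensitive, and only files with a .pdf extension are returned. The function is a hypothetical re-implementation; the server's exact rules may differ.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Optional


def list_pdfs(directory: str, recursive: bool = False, pattern: Optional[str] = None) -> list[Path]:
    """Return PDF files under `directory`, optionally filtered by a glob-style name pattern.

    Hypothetical sketch for illustration, not the server's implementation.
    """
    root = Path(directory)
    # rglob("*") walks subdirectories; glob("*") stays in the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    found = []
    for path in candidates:
        # Keep regular files whose extension is .pdf (extension check is case-insensitive).
        if not path.is_file() or path.suffix.lower() != ".pdf":
            continue
        # The pattern is matched against the file name only, case-sensitively.
        if pattern is not None and not fnmatchcase(path.name, pattern):
            continue
        found.append(path)
    return sorted(found)
```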
Extract specific pages from a PDF to create a new PDF.
Example, Parameters & Page Range Syntax
{
"source": { "path": "/documents/book.pdf" },
"pages": "1-10,15,20-z",
"output_path": "/output/excerpt.pdf"
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| source | object | Yes | - | PDF source |
| pages | string | Yes | - | Page range (see syntax below) |
| output_path | string | No | - | Save output to file |
| password | string | No | - | PDF password if encrypted |
Page Range Syntax:
| Syntax | Description |
|---|---|
| 1-5 | Pages 1 through 5 |
| 1,3,5 | Specific pages |
| z | Last page |
| r1 | Last page (reverse) |
| 5-z | Page 5 to end |
| z-1 | All pages reversed |
| 1-z:odd | Odd pages only |
| 1-z:even | Even pages only |
| 1-10,x5 | Pages 1 through 10 except page 5 |
merge_pdfs
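This syntax closely resembles qpdf's page-range notation. The parser below sketches how these forms could expand into an ordered page list for a document with `total` pages. It assumes qpdf-style semantics for :odd/:even (odd or even positions within the range, which coincides with odd/even page numbers for 1-z) and treats an x term as removing pages from everything selected before it; the server's own parser may differ in edge cases, and all function names here are hypothetical.

```python
def resolve_page(token: str, total: int) -> int:
    """Map one token to a 1-based page: N, z (last page), or rN (N-th page from the end)."""
    if token == "z":
        page = total
    elif token.startswith("r"):
        page = total - int(token[1:]) + 1
    else:
        page = int(token)
    if not 1 <= page <= total:
        raise ValueError(f"page {token!r} is outside 1..{total}")
    return page


def expand_range(spec: str, total: int) -> list[int]:
    """Expand 'a-b' (ascending or descending) or a single page, with optional :odd/:even."""
    spec, _, parity = spec.partition(":")
    if "-" in spec:
        first, last = spec.split("-", 1)
        start, end = resolve_page(first, total), resolve_page(last, total)
    else:
        start = end = resolve_page(spec, total)
    step = 1 if end >= start else -1
    pages = list(range(start, end + step, step))
    # Assumption (qpdf-style): :odd/:even keep odd/even *positions* within the range.
    if parity == "odd":
        pages = pages[0::2]
    elif parity == "even":
        pages = pages[1::2]
    elif parity:
        raise ValueError(f"unknown modifier {parity!r}")
    return pages


def parse_page_range(expr: str, total: int) -> list[int]:
    """Parse a comma-separated selection; an 'x' term removes pages selected so far."""
    selected: list[int] = []
    for part in (p.strip() for p in expr.split(",")):
        if part.startswith("x"):
            excluded = set(expand_range(part[1:], total))
            selected = [p for p in selected if p not in excluded]
        else:
            selected.extend(expand_range(part, total))
    return selected
```

For the split_pdf example on a 20-page document, `parse_page_range("1-10,15,20-z", 20)` gives pages 1 through 10, then 15 and 20.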
Merge multiple PDFs into a single file.
Example & Parameters
{
"sources": [
{ "path": "/documents/chapter1.pdf" },
{ "path": "/documents/chapter2.pdf" }
],
"output_path": "/output/complete-book.pdf"
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| sources | array | Yes | - | PDF sources to merge (in order) |
| output_path | string | No | - | Save output to file |
compress_pdf
Reduce PDF file size using stream optimization, object deduplication, and compression.
Example, Parameters & Response
{
"source": { "path": "/documents/large-report.pdf" },
"compression_level": 9,
"output_path": "/output/compressed.pdf"
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| source | object | Yes | - | PDF source |
| object_streams | string | No | "generate" | "generate" (best) · "preserve" · "disable" |
| compression_level | integer | No | 9 | 1 to 9 (higher = better compression) |
| output_path | string | No | - | Save output to file |
| password | string | No | - | PDF password if encrypted |
Response:
{
"results": [{
"source": "/documents/large-report.pdf",
"original_size": 5242880,
"compressed_size": 2097152,
"compression_ratio": 0.4,
"bytes_saved": 3145728
}]
}
fill_form
Write values into existing PDF form fields and produce a new PDF.
Example, Parameters & Limitations
{
"source": { "path": "/documents/form.pdf" },
"field_values": [
{ "name": "full_name", "value": "Jane Smith" },
{ "name": "agree_terms", "checked": true }
],
"output_path": "/output/filled-form.pdf"
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| source | object | Yes | - | PDF source |
| field_values | array | Yes | - | Fields to fill (see below) |
| output_path | string | No | - | Save output to file |
| password | string | No | - | PDF password if encrypted |
Field value format:
| Field | Type | Description |
|---|---|---|
| name | string | Field name (use extract_form_fields to discover names) |
| value | string | Text value (for text fields) |
| checked | boolean | Checked state (for checkbox/radio fields) |
Supported field types: Text fields, checkboxes, radio buttons. ComboBox/ListBox selection is read-only.
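Because field names must match those reported by extract_form_fields, a client often keeps a simple name-to-value mapping and converts it into the field_values format described above. `build_fill_form_args` is a hypothetical client-side helper: strings become `value` entries for text fields, booleans become `checked` entries for checkbox/radio fields, and anything else is rejected.

```python
from typing import Optional, Union


def build_fill_form_args(
    pdf_path: str,
    values: dict[str, Union[str, bool]],
    output_path: Optional[str] = None,
) -> dict:
    """Build fill_form arguments from a {field_name: value} mapping (hypothetical helper)."""
    field_values = []
    for name, value in values.items():
        if isinstance(value, bool):
            # Checkbox/radio fields take a boolean "checked" state.
            field_values.append({"name": name, "checked": value})
        elif isinstance(value, str):
            # Text fields take a string "value".
            field_values.append({"name": name, "value": value})
        else:
            raise TypeError(f"unsupported value for field {name!r}: {value!r}")
    args: dict = {"source": {"path": pdf_path}, "field_values": field_values}
    if output_path is not None:
        args["output_path"] = output_path
    return args
```

Calling it with `{"full_name": "Jane Smith", "agree_terms": True}` reproduces the field_values array from the example above.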
protect_pdf
Add password protection using 256-bit AES encryption.
Example & Parameters
{
"source": { "path": "/documents/confidential.pdf" },
"user_password": "secret123",
"allow_print": "none",
"allow_copy": false
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| source | object | Yes | - | PDF source |
| user_password | string | Yes | - | Password to open the PDF |
| owner_password | string | No | user_password | Password to change permissions |
| allow_print | string | No | "full" | "full" · "low" · "none" |
| allow_copy | boolean | No | true | Allow copying text/images |
| allow_modify | boolean | No | true | Allow modifying the document |
| output_path | string | No | - | Save output to file |
| password | string | No | - | Password for source PDF if encrypted |
unprotect_pdf
Remove password protection from an encrypted PDF.
Example & Parameters
{
"source": { "path": "/documents/protected.pdf" },
"password": "secret123",
"output_path": "/output/unprotected.pdf"
}
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| source | object | Yes | - | PDF source |
| password | string | Yes | - | Password for the encrypted PDF |
| output_path | string | No | - | Save output to file |
MCP Resources
Expose PDFs from configured directories as MCP Resources for direct client discovery and reading.
Configuration & Details
Enabling Resources
# Command line
pdf-mcp-server --resource-dir /documents --resource-dir /data/pdfs
# Short form
pdf-mcp-server -r /documents -r /data/pdfs
# Environment variable (colon-separated)
PDF_RESOURCE_DIRS=/documents:/data/pdfs pdf-mcp-server
Claude Desktop with resources:
{
"mcpServers": {
"pdf": {
"command": "npx",
"args": ["@paradyno/pdf-mcp-server", "--resource-dir", "/documents"],
"env": {
"PDF_RESOURCE_DIRS": "/data/pdfs:/shared/documents"
}
}
}
}
Both methods can be combined: command-line arguments are added to the paths from the environment variable.
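As a sketch of how the two sources could be merged, the function below splits PDF_RESOURCE_DIRS on colons, as documented, and appends the --resource-dir values. Assumptions: environment paths come first, empty segments are ignored, and duplicates are kept; the server's actual ordering and deduplication are not documented here. Splitting on ":" would also not handle Windows drive letters.

```python
import os
from typing import Optional


def resolve_resource_dirs(cli_dirs: list[str], env_value: Optional[str]) -> list[str]:
    """Combine colon-separated PDF_RESOURCE_DIRS with --resource-dir values (hypothetical)."""
    # Empty segments (e.g. from "a::b" or a trailing colon) are skipped.
    env_dirs = [d for d in (env_value or "").split(":") if d]
    return env_dirs + list(cli_dirs)


# For the Claude Desktop config above, this would cover /data/pdfs, /shared/documents and /documents.
dirs = resolve_resource_dirs(["/documents"], os.environ.get("PDF_RESOURCE_DIRS"))
```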
Resource URIs
PDFs are exposed with file:// URIs:
file:///documents/report.pdf
file:///documents/2024/invoice.pdf
Operations
- resources/list: Returns all PDFs with URI, name, MIME type, size, and description
- resources/read: Returns extracted text content, formatted for LLM consumption
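Clients reach both operations through standard MCP JSON-RPC methods. The sketch below builds the two request messages; the method names and the `uri` parameter follow the MCP specification, while the helper names and the percent-encoding of unusual characters in paths are assumptions. In practice a client should reuse the exact `uri` values returned by resources/list.

```python
from urllib.parse import quote


def pdf_uri(path: str) -> str:
    """Build a file:// URI for an absolute POSIX path, percent-encoding unsafe characters."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    return "file://" + quote(path)  # quote keeps "/" unescaped by default


def list_resources_request(request_id: int) -> dict:
    """MCP resources/list request (first page, no pagination cursor)."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "resources/list"}


def read_resource_request(request_id: int, path: str) -> dict:
    """MCP resources/read request for the PDF at `path`."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": pdf_uri(path)},
    }
```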
Resources vs Tools vs Caching
| Feature | Purpose | Use Case |
|---|---|---|
| Resources | Passive file discovery | Browse and preview available PDFs |
| Tools | Active PDF processing | Extract, search, manipulate PDFs |
| CacheRef | Tool chaining | Pass output between operations |
Caching
When cache: true is specified, the server returns a cache_key for use in subsequent requests:
// Step 1: Extract with caching
{ "sources": [{ "path": "/documents/large.pdf" }], "cache": true }
// Step 2: Use cache_key from response
{ "sources": [{ "cache_key": "a1b2c3d4" }], "pages": "50-60" }
Architecture
block-beta
columns 1
block:server["MCP Server (rmcp)"]
columns 3
extract_text search split_pdf
end
block:common["Common Layer"]
columns 3
Cache["Cache Manager"] Source["Source Resolver"] Batch["Batch Executor"]
end
block:pdf["PDF Processing"]
columns 2
PDFium["pdfium-render\n(reading)"] qpdf["qpdf FFI\n(manipulation)"]
end
server --> common --> pdf
Performance
Benchmarked with a 14-page technical paper (tracemonkey.pdf, ~1 MB) on Docker (Apple Silicon):
| Operation | Time | What it means |
|---|---|---|
| Extract text (14 pages) | 170 ms | Process ~350 documents per minute |
| Metadata only | 0.26 ms | ~4,000 documents per second |
| Search | 0.01 ms | Instant results on extracted text |
| 100 files batch | 4.8 s | ~21 documents per second |
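The "What it means" column converts latency into throughput, assuming documents are processed one after another on a single worker. The arithmetic behind each row can be checked with a few lines; `per_second` is a hypothetical helper and the inputs are the measured values from the table.

```python
def per_second(latency_ms: float, items: int = 1) -> float:
    """Sequential throughput: items completed per second for a given total latency."""
    return items * 1000.0 / latency_ms


extract_per_min = per_second(170) * 60       # text extraction: ~353 documents per minute
metadata_per_s = per_second(0.26)            # metadata only: ~3,846 documents per second
batch_per_s = per_second(4800, items=100)    # 100-file batch: ~20.8 documents per second
```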
Key takeaways
- Fast enough for interactive use: text extraction completes in under 200 ms.
- Metadata is nearly instant: use extract_metadata or summarize_structure to quickly assess documents before full processing.
- Search is blazing fast: once text is extracted, searching is essentially free.
- Batch processing scales linearly: no significant overhead when processing many files.
Run benchmarks yourself:
docker compose --profile dev run --rm bench
Development
Docker (Recommended)
# Build
docker compose --profile dev run --rm dev cargo build
# Run tests
docker compose --profile dev run --rm test
# Run tests with coverage
docker compose --profile dev run --rm coverage
# Format code
docker compose --profile dev run --rm dev cargo fmt --all
# Lint
docker compose --profile dev run --rm clippy
# Performance benchmarks
docker compose --profile dev run --rm bench
# Build production image (~120MB)
docker compose --profile prod build production
# Clean up
docker compose --profile dev down --rmi local
Native Development
Requires PDFium installed locally. Download from pdfium-binaries and set PDFIUM_PATH.
cargo build --release
cargo test
cargo bench
cargo llvm-cov --html
Project Structure
src/
├── main.rs # Entry point, CLI args
├── lib.rs # Library root
├── server.rs # MCP server & tool handlers
├── error.rs # Error types
├── pdf/
│   ├── reader.rs # PDFium wrapper (text, metadata, outline)
│   ├── annotations.rs # Annotation extraction
│   ├── images.rs # Image extraction
│   └── qpdf.rs # qpdf FFI (split, merge, encrypt)
└── source/
    ├── resolver.rs # Path/URL/Base64 resolution
    └── cache.rs # LRU caching layer
Roadmap
Completed Phases
Phase 1: Core Reading
extract_text · extract_outline · search · extract_metadata · extract_annotations · Image extraction · Batch processing · Caching
Phase 2: PDF Manipulation
split_pdf · merge_pdfs · protect_pdf · unprotect_pdf · compress_pdf · extract_links · get_page_info
Phase 2.5: LLM-Optimized Text
Dynamic thresholds · Paragraph detection · Multi-column layout · Watermark removal
Phase 2.6: Discovery & Resources
list_pdfs · MCP Resources · Resource directory configuration
Phase 2.7: Vision & Forms
convert_page_to_image · extract_form_fields · fill_form · summarize_structure
Phase 3: Advanced Features (Planned)
- rotate_pages: Rotate specific pages
- extract_tables: Structured table extraction
- add_watermark: Text/image watermarks
- linearize_pdf: Web optimization
- OCR support · PDF/A validation · Digital signature verification
Waiting for MCP Protocol
- Large file upload: MCP lacks a standard API for uploading large files (>20 MB). Discussed in #1197, #1220, #1659.
- Chunked file transfer: No standard mechanism exists yet.
Current workarounds: shared filesystem (path), object storage with pre-signed URLs (url), or base64 encoding.
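Of these workarounds, base64 needs no shared storage but inflates the payload: standard padded base64 emits 4 characters per 3-byte group, so a 20 MiB PDF becomes roughly 28 million characters of JSON string. The sketch below builds a `{"base64": ...}` source object as described under Source Types; the helper names are hypothetical.

```python
import base64
from pathlib import Path


def base64_source(data: bytes) -> dict:
    """Wrap raw PDF bytes as a {"base64": ...} source object."""
    return {"base64": base64.b64encode(data).decode("ascii")}


def base64_source_from_file(path: str) -> dict:
    """Read a local PDF and wrap it, for servers that cannot see the client's filesystem."""
    return base64_source(Path(path).read_bytes())


def encoded_size(num_bytes: int) -> int:
    """Length of padded base64 output: 4 characters per started 3-byte group."""
    return 4 * ((num_bytes + 2) // 3)
```

The first bytes of any PDF (`%PDF-1.4` plus a newline) encode to `JVBERi0xLjQK`, the prefix shown in the Source Types example.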
Deferred Features
These provide limited value for LLM use cases:
- Hyphenation merging: LLMs understand hyphenated words
- Fixed-pitch mode: Limited use cases
- Bounding box output: LLMs don't need coordinates
- Invisible text removal: Not supported by pdfium-render API
License
Apache License 2.0
Acknowledgments
- PDFium: PDF rendering engine (Apache 2.0)
- pdfium-render: Rust PDFium bindings (Apache 2.0)
- qpdf: PDF transformation library, vendored via FFI (Apache 2.0)
- rmcp: Rust MCP SDK