PDFtotext MCP Server

A reliable Model Context Protocol (MCP) server for PDF text extraction using the proven pdftotext utility from poppler-utils.

🚀 Why This Server?

Unlike other PDF MCP servers that suffer from logging interference, complex dependencies, and reliability issues, pdftotext-mcp is:

✅ Actually works - Clean JSON-RPC communication without stdout pollution
✅ Reliable - Built on mature pdftotext from poppler-utils (used by millions)
✅ Lightweight - Minimal dependencies, maximum compatibility
✅ Production tested - Successfully tested with Claude Desktop and other MCP clients
✅ Feature complete - Page-specific extraction, layout preservation, encoding options
✅ Error handling - Comprehensive validation and helpful error messages

📋 Features

📄 Extract text from entire PDF documents or specific pages
🎨 Preserve original layout formatting (optional)
🔤 Multiple text encoding support (UTF-8, Latin1, ASCII)
📊 Comprehensive metadata in responses (word count, file info, etc.)
🛡️ File validation and security checks
⚡ Fast processing with configurable timeouts
🔍 Detailed error reporting with troubleshooting hints

🔧 Prerequisites

You must have pdftotext installed on your system:

Ubuntu/Debian

sudo apt update
sudo apt install poppler-utils

macOS

brew install poppler

Windows

# Using Chocolatey
choco install poppler

# Using Scoop
scoop install poppler

Verify Installation

pdftotext -v
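
The same check can be made programmatically, for example before launching a client. This is a minimal sketch assuming Node.js; the helpers `parsePdftotextVersion` and `checkPdftotext` are hypothetical and not part of pdftotext-mcp. Poppler's pdftotext prints its version banner to stderr, so the sketch reads both output streams.

```javascript
const { spawnSync } = require("node:child_process");

// Pull a version string such as "22.02.0" out of the banner text.
// Returns null when no version is found.
function parsePdftotextVersion(output) {
  const match = /pdftotext version (\d+(?:\.\d+)*)/.exec(output);
  return match ? match[1] : null;
}

// Run `pdftotext -v` and report whether the binary is on PATH.
function checkPdftotext() {
  const result = spawnSync("pdftotext", ["-v"], { encoding: "utf8" });
  if (result.error) {
    // Typically ENOENT: the executable was not found.
    return { available: false, version: null };
  }
  const banner = `${result.stdout || ""}${result.stderr || ""}`;
  return { available: true, version: parsePdftotextVersion(banner) };
}

console.log(checkPdftotext());
```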

📦 Installation

Option 1: Global Installation (Recommended)

npm install -g pdftotext-mcp

Option 2: Use with npx (No Installation)

npx pdftotext-mcp

Option 3: Local Development

git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install
npm start

⚙️ Configuration

Add to your MCP client configuration:

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "pdftotext": {
      "command": "pdftotext-mcp"
    }
  }
}

Or with npx:

{
  "mcpServers": {
    "pdftotext": {
      "command": "npx",
      "args": ["pdftotext-mcp"]
    }
  }
}

Other MCP Clients

The server works with any MCP-compatible client. Use pdftotext-mcp as the command.

🎯 Usage

The server provides a single, powerful tool: read_pdf_text
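
MCP clients invoke tools over JSON-RPC 2.0 using the `tools/call` method. For orientation, here is a minimal sketch of how a client might build that request for `read_pdf_text`. The helper `buildToolCall` and the request id are hypothetical, and most MCP clients construct this message for you.

```javascript
// Build a JSON-RPC 2.0 request that invokes an MCP tool.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = buildToolCall(1, "read_pdf_text", { path: "./document.pdf" });
// The stdio transport sends one JSON message per line.
console.log(JSON.stringify(request));
```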

Basic Usage

Extract entire document

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf"
  }
}

Extract specific page

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "page": 2
  }
}

Preserve layout formatting

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "layout": true
  }
}

Custom encoding

{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "encoding": "Latin1"
  }
}
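
To see how these arguments relate to the underlying command, here is a rough sketch of a mapping onto pdftotext flags. It is an assumption about how such a server could work, not the actual code in pdftotext-mcp, and `buildPdftotextArgs` is a hypothetical helper. The flags themselves (`-f`, `-l`, `-layout`, `-enc`, and `-` for stdout) are standard pdftotext options.

```javascript
// Translate tool arguments into a pdftotext argument list (illustrative only).
function buildPdftotextArgs({ path, page, layout = false, encoding = "UTF-8" }) {
  const args = [];
  if (page !== undefined) {
    // -f and -l set the first and last page; equal values select one page.
    args.push("-f", String(page), "-l", String(page));
  }
  if (layout) {
    args.push("-layout");
  }
  args.push("-enc", encoding);
  // Input file, then "-" to write the extracted text to stdout.
  args.push(path, "-");
  return args;
}

console.log(buildPdftotextArgs({ path: "./document.pdf", page: 2, layout: true }));
```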

Response Format

Success Response

{
  "success": true,
  "file": "document.pdf",
  "path": "/absolute/path/to/document.pdf",
  "extractedText": "Full text content...",
  "pageSpecific": "all",
  "layoutPreserved": false,
  "encoding": "UTF-8",
  "fileSize": 1048576,
  "lastModified": "2024-01-15T10:30:00.000Z",
  "extractedAt": "2024-01-15T10:35:00.000Z",
  "textLength": 5234,
  "wordCount": 892
}

Error Response

{
  "success": false,
  "error": "File not found: ./nonexistent.pdf",
  "errorType": "FILE_NOT_FOUND",
  "file": "./nonexistent.pdf",
  "timestamp": "2024-01-15T10:35:00.000Z"
}
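
A client can branch on the `success` field to handle both shapes. This is a minimal sketch, assuming the response has already been parsed into an object with the fields shown above; `describeResult` is a hypothetical helper.

```javascript
// Produce a one-line summary for either response shape.
function describeResult(result) {
  if (result.success) {
    return `${result.file}: ${result.wordCount} words (textLength ${result.textLength})`;
  }
  return `${result.errorType}: ${result.error}`;
}

console.log(describeResult({
  success: false,
  error: "File not found: ./nonexistent.pdf",
  errorType: "FILE_NOT_FOUND",
}));
```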

📚 API Reference

Tool: `read_pdf_text`

Extracts text content from PDF files using pdftotext.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `path` | string | ✅ | - | Path to PDF file (relative or absolute) |
| `page` | number | ❌ | all pages | Specific page to extract (1-based) |
| `layout` | boolean | ❌ | `false` | Preserve original text layout |
| `encoding` | string | ❌ | `"UTF-8"` | Output text encoding |

Supported Encodings

UTF-8 (default)
Latin1
ASCII
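
The parameter rules above can be checked before a request is sent. This is a sketch under stated assumptions: `validateArgs` is a hypothetical client-side helper, and the server's own validation may differ.

```javascript
const SUPPORTED_ENCODINGS = ["UTF-8", "Latin1", "ASCII"];

// Return a list of problems; an empty list means the arguments look valid.
function validateArgs(args) {
  const problems = [];
  if (typeof args.path !== "string" || args.path.length === 0) {
    problems.push("path must be a non-empty string");
  }
  if (args.page !== undefined && (!Number.isInteger(args.page) || args.page < 1)) {
    problems.push("page must be an integer >= 1");
  }
  if (args.layout !== undefined && typeof args.layout !== "boolean") {
    problems.push("layout must be a boolean");
  }
  if (args.encoding !== undefined && !SUPPORTED_ENCODINGS.includes(args.encoding)) {
    problems.push(`encoding must be one of ${SUPPORTED_ENCODINGS.join(", ")}`);
  }
  return problems;
}

console.log(validateArgs({ path: "./document.pdf", page: 0 }));
```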

Error Types

FILE_NOT_FOUND - PDF file doesn't exist
PERMISSION_DENIED - Cannot read the file
INVALID_PDF - File is not a valid PDF
PDFTOTEXT_ERROR - pdftotext utility error
UNKNOWN_ERROR - Unexpected error
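
As an illustration of how some of these categories might arise, the sketch below maps Node.js file-system error codes to the error types listed above. The mapping is an assumption and is not taken from the server's source; `classifyError` is a hypothetical name.

```javascript
const fs = require("node:fs");

// Map a Node.js error to one of the documented error types (illustrative).
function classifyError(err) {
  if (err && err.code === "ENOENT") return "FILE_NOT_FOUND";
  if (err && (err.code === "EACCES" || err.code === "EPERM")) return "PERMISSION_DENIED";
  return "UNKNOWN_ERROR";
}

try {
  // Throws if the file is missing or unreadable.
  fs.accessSync("./nonexistent.pdf", fs.constants.R_OK);
} catch (err) {
  console.log(classifyError(err));
}
```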

🔧 Troubleshooting

"pdftotext is not available"

Solution: Install poppler-utils on Ubuntu/Debian, or poppler on macOS and Windows (see Prerequisites)

"File not found"

Solutions:

Use absolute paths: /home/user/document.pdf
Check file exists: ls -la /path/to/file.pdf
Verify MCP server working directory
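
Relative paths are typically resolved against the server's working directory, which may not be the directory you expect when a client launches the server. One way to sidestep this is to resolve the path on the client side before sending it. A minimal sketch; `toAbsolutePdfPath` is a hypothetical helper.

```javascript
const path = require("node:path");

// Resolve a possibly relative path against a known base directory.
function toAbsolutePdfPath(inputPath, baseDir = process.cwd()) {
  return path.resolve(baseDir, inputPath);
}

console.log(toAbsolutePdfPath("./document.pdf"));
```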

"Permission denied"

Solutions:

Make the file readable: chmod 644 document.pdf
Ensure directory is readable: chmod 755 /path/to/directory/

"File is not a valid PDF"

Solutions:

Verify file is actually a PDF: file document.pdf
Check for file corruption
Try with a different PDF file
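
A quick programmatic check is to look for the `%PDF-` signature that PDF files normally begin with. This is a minimal sketch; `looksLikePdf` and `fileLooksLikePdf` are hypothetical helpers and not necessarily how pdftotext-mcp validates files. A file can pass this check and still be corrupt.

```javascript
const fs = require("node:fs");

// True when the buffer starts with the "%PDF-" signature.
function looksLikePdf(buffer) {
  return buffer.subarray(0, 5).toString("latin1") === "%PDF-";
}

// Read only the first 5 bytes of a file and test them.
function fileLooksLikePdf(filePath) {
  const fd = fs.openSync(filePath, "r");
  try {
    const header = Buffer.alloc(5);
    fs.readSync(fd, header, 0, 5, 0);
    return looksLikePdf(header);
  } finally {
    fs.closeSync(fd);
  }
}

console.log(looksLikePdf(Buffer.from("%PDF-1.7\n")));
```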

MCP Connection Issues

Solutions:

Restart your MCP client completely
Check configuration syntax in config file
Verify pdftotext-mcp is accessible in PATH
Check MCP client logs for detailed errors

🧪 Testing

# Run tests
npm test

# Run tests with watch mode
npm run test:watch

# Run linter
npm run lint

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install

Running Locally

npm start

Code Style

This project uses ESLint. Run `npm run lint` to check code style.

📄 License

MIT - see LICENSE file for details.

🙏 Acknowledgments

Built for the Model Context Protocol ecosystem
Uses poppler-utils pdftotext utility
Inspired by the need for reliable PDF processing in MCP environments

Made for the MCP community
