PDFtotext MCP Server
A reliable Model Context Protocol (MCP) server for PDF text extraction using the proven pdftotext utility from poppler-utils.
Why This Server?
Unlike other PDF MCP servers that suffer from logging interference, complex dependencies, and reliability issues, pdftotext-mcp is:
- Actually works - Clean JSON-RPC communication without stdout pollution
- Reliable - Built on the mature pdftotext from poppler-utils (used by millions)
- Lightweight - Minimal dependencies, maximum compatibility
- Production tested - Successfully tested with Claude Desktop and other MCP clients
- Feature complete - Page-specific extraction, layout preservation, encoding options
- Error handling - Comprehensive validation and helpful error messages
Features
- Extract text from entire PDF documents or specific pages
- Preserve original layout formatting (optional)
- Multiple text encoding support (UTF-8, Latin1, ASCII)
- Comprehensive metadata in responses (word count, file info, etc.)
- File validation and security checks
- Fast processing with configurable timeouts
- Detailed error reporting with troubleshooting hints
Prerequisites
You must have pdftotext installed on your system:
Ubuntu/Debian
sudo apt update
sudo apt install poppler-utils
macOS
brew install poppler
Windows
# Using Chocolatey
choco install poppler
# Using Scoop
scoop install poppler
Verify Installation
pdftotext -v
Installation
Option 1: Global Installation (Recommended)
npm install -g pdftotext-mcp
Option 2: Use with npx (No Installation)
npx pdftotext-mcp
Option 3: Local Development
git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install
npm start
Configuration
Add to your MCP client configuration:
Claude Desktop
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "pdftotext": {
      "command": "pdftotext-mcp"
    }
  }
}
Or with npx:
{
  "mcpServers": {
    "pdftotext": {
      "command": "npx",
      "args": ["pdftotext-mcp"]
    }
  }
}
Other MCP Clients
The server works with any MCP-compatible client. Use pdftotext-mcp as the command.
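For clients that speak MCP directly over stdio, a tool invocation is a JSON-RPC 2.0 request using the `tools/call` method. The sketch below shows the shape of such a request; `buildToolCall` is a hypothetical helper, and a real client also performs the MCP `initialize` handshake first, which is omitted here.

```javascript
// Hypothetical helper (not part of pdftotext-mcp): build a JSON-RPC 2.0
// request that asks an MCP server to run the read_pdf_text tool.
function buildToolCall(id, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "read_pdf_text", arguments: args },
  };
}

// The stdio transport sends one JSON message per line, so the request is
// serialized without embedded newlines.
const request = buildToolCall(1, { path: "./document.pdf", page: 2 });
console.log(JSON.stringify(request));
```

Most users never write this by hand, since MCP clients such as Claude Desktop construct the request themselves.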
Usage
The server provides a single, powerful tool: read_pdf_text
Basic Usage
Extract entire document
{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf"
  }
}
Extract specific page
{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "page": 2
  }
}
Preserve layout formatting
{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "layout": true
  }
}
Custom encoding
{
  "tool": "read_pdf_text",
  "arguments": {
    "path": "./document.pdf",
    "encoding": "Latin1"
  }
}
Response Format
Success Response
{
  "success": true,
  "file": "document.pdf",
  "path": "/absolute/path/to/document.pdf",
  "extractedText": "Full text content...",
  "pageSpecific": "all",
  "layoutPreserved": false,
  "encoding": "UTF-8",
  "fileSize": 1048576,
  "lastModified": "2024-01-15T10:30:00.000Z",
  "extractedAt": "2024-01-15T10:35:00.000Z",
  "textLength": 5234,
  "wordCount": 892
}
Error Response
{
  "success": false,
  "error": "File not found: ./nonexistent.pdf",
  "errorType": "FILE_NOT_FOUND",
  "file": "./nonexistent.pdf",
  "timestamp": "2024-01-15T10:35:00.000Z"
}
API Reference
Tool: read_pdf_text
Extracts text content from PDF files using pdftotext.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| path | string | Yes | - | Path to PDF file (relative or absolute) |
| page | number | No | all pages | Specific page to extract (1-based) |
| layout | boolean | No | false | Preserve original text layout |
| encoding | string | No | "UTF-8" | Output text encoding |
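These parameters map naturally onto pdftotext command-line options. The sketch below shows one plausible translation. The flags `-f`, `-l`, `-layout`, and `-enc` are standard pdftotext options, but `buildPdftotextArgs` is a hypothetical helper, and how the package actually assembles its command line is an assumption.

```javascript
// Hypothetical helper (not part of pdftotext-mcp): translate read_pdf_text
// arguments into an argument list for the pdftotext binary.
const SUPPORTED_ENCODINGS = ["UTF-8", "Latin1", "ASCII"];

function buildPdftotextArgs({ path, page, layout = false, encoding = "UTF-8" }) {
  if (typeof path !== "string" || path.length === 0) {
    throw new TypeError("path is required");
  }
  if (!SUPPORTED_ENCODINGS.includes(encoding)) {
    throw new RangeError(`Unsupported encoding: ${encoding}`);
  }
  const args = [];
  if (page !== undefined) {
    if (!Number.isInteger(page) || page < 1) {
      throw new RangeError("page must be a positive integer (1-based)");
    }
    // -f and -l set the first and last page; equal values select one page.
    args.push("-f", String(page), "-l", String(page));
  }
  if (layout) args.push("-layout");
  args.push("-enc", encoding);
  // A trailing "-" tells pdftotext to write the text to stdout.
  args.push(path, "-");
  return args;
}

console.log(buildPdftotextArgs({ path: "./document.pdf", page: 2, layout: true }));
```

Passing the resulting array to `child_process.execFile("pdftotext", args, ...)` rather than building a shell string avoids shell interpolation of the file path.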
Supported Encodings
- UTF-8 (default)
- Latin1
- ASCII
Error Types
- FILE_NOT_FOUND - PDF file doesn't exist
- PERMISSION_DENIED - Cannot read the file
- INVALID_PDF - File is not a valid PDF
- PDFTOTEXT_ERROR - pdftotext utility error
- UNKNOWN_ERROR - Unexpected error
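As an illustration of how file-system failures could be mapped onto these error types, here is a minimal sketch. `classifyError` and `toErrorResponse` are hypothetical names, the mapping from Node.js error codes is an assumption rather than the package's actual logic, and detection of INVALID_PDF and PDFTOTEXT_ERROR is left out because it depends on pdftotext's own output.

```javascript
const fs = require("node:fs");

// Hypothetical mapping from Node.js file-system error codes to the
// error types listed above. Anything unrecognized becomes UNKNOWN_ERROR.
function classifyError(err) {
  if (err && err.code === "ENOENT") return "FILE_NOT_FOUND";
  if (err && (err.code === "EACCES" || err.code === "EPERM")) return "PERMISSION_DENIED";
  return "UNKNOWN_ERROR";
}

// Build an object with the same fields as the documented error response.
function toErrorResponse(err, file) {
  return {
    success: false,
    error: String(err && err.message ? err.message : err),
    errorType: classifyError(err),
    file,
    timestamp: new Date().toISOString(),
  };
}

try {
  // Throws if the file is missing or unreadable.
  fs.accessSync("./nonexistent.pdf", fs.constants.R_OK);
} catch (err) {
  console.log(JSON.stringify(toErrorResponse(err, "./nonexistent.pdf"), null, 2));
}
```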
Troubleshooting
"pdftotext is not available"
Solution: Install poppler-utils (see Prerequisites)
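To confirm from Node.js that the binary can be found on PATH, a check along the following lines may help. This is a sketch under the assumption that being able to spawn `pdftotext -v` is a sufficient test; `isCommandAvailable` is a hypothetical helper, not part of the package.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical check: returns true when the command can be spawned at all.
// The exit code is deliberately ignored, since only presence on PATH matters.
function isCommandAvailable(command = "pdftotext") {
  const result = spawnSync(command, ["-v"], { stdio: "ignore" });
  // result.error is set (for example with code ENOENT) when the binary
  // cannot be found or started.
  return !result.error;
}

console.log(isCommandAvailable() ? "pdftotext found" : "pdftotext is not available");
```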
"File not found"
Solutions:
- Use absolute paths: /home/user/document.pdf
- Check file exists: ls -la /path/to/file.pdf
- Verify MCP server working directory
"Permission denied"
Solutions:
- Check file permissions: chmod 644 document.pdf
- Ensure directory is readable: chmod 755 /path/to/directory/
"File is not a valid PDF"
Solutions:
- Verify file is actually a PDF: file document.pdf
- Check for file corruption
- Try with a different PDF file
MCP Connection Issues
Solutions:
- Restart your MCP client completely
- Check configuration syntax in config file
- Verify pdftotext-mcp is accessible in PATH
- Check MCP client logs for detailed errors
Testing
# Run tests
npm test
# Run tests with watch mode
npm run test:watch
# Run linter
npm run lint
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Development Setup
git clone https://github.com/jpwebb/pdftotext-mcp.git
cd pdftotext-mcp
npm install
Running Locally
npm start
Code Style
This project uses ESLint. Run npm run lint to check code style.
License
MIT - see LICENSE file for details.
Acknowledgments
- Built for the Model Context Protocol ecosystem
- Uses the poppler-utils pdftotext utility
- Inspired by the need for reliable PDF processing in MCP environments
Made for the MCP community