ParseFlow MCP Server

Model Context Protocol (MCP) server for comprehensive PDF parsing and analysis.

🚀 Features

Text Extraction: Extract text from PDF files with multiple formatting strategies
Metadata Retrieval: Get PDF document information (title, author, pages, etc.)
Keyword Search: Search for specific text within PDF documents
Image Extraction: Extract images from PDF files (requires poppler-utils)
Table of Contents: Get bookmarks and navigation structure

📦 Installation

Global Installation (Recommended for MCP)

npm install -g parseflow-mcp-server

Local Installation

npm install parseflow-mcp-server

🔧 Usage

With Claude Desktop

Add to your Claude Desktop configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "parseflow": {
      "command": "parseflow"
    }
  }
}
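If the configuration file already lists other MCP servers, the `parseflow` entry should be merged in rather than pasted over them. The following is a minimal sketch of that merge. The `addParseflowServer` helper, the type names, and the `other-server` entry are hypothetical and exist only for illustration; only the `mcpServers.parseflow.command` shape comes from the snippet above.

```typescript
// Hypothetical types describing only the part of the config this README uses.
interface McpServerEntry {
  command: string;
  args?: string[];
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // any other settings are passed through untouched
}

// Returns a new config object with the parseflow entry added.
// Existing servers and unrelated settings are preserved; the input is not mutated.
function addParseflowServer(config: ClaudeDesktopConfig): ClaudeDesktopConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      parseflow: { command: "parseflow" },
    },
  };
}

// Example: a config that already contains another (made-up) server.
const existing: ClaudeDesktopConfig = {
  mcpServers: { other: { command: "other-server" } },
};
const merged = addParseflowServer(existing);

// Serialized form that could be written back to claude_desktop_config.json.
const mergedJson = JSON.stringify(merged, null, 2);
```

Reading and writing the file itself is left out so the sketch stays independent of the operating system.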

With Windsurf / Cursor

Add to your MCP settings:

{
  "mcpServers": {
    "parseflow": {
      "command": "parseflow",
      "args": []
    }
  }
}

Standalone Usage

# Run the server
parseflow

# Or run the built entry point directly with Node.js
node /path/to/parseflow-mcp-server/dist/index.js

🛠️ Available Tools

When connected via MCP, the following tools are available:

1. `extract_text`

Extract text content from PDF files.

Parameters:

path (string, required): Absolute path to PDF file
page (number, optional): Extract a single page
range (string, optional): Extract a range of pages (e.g., "1-10")
strategy (string, optional): Extraction strategy - raw, formatted, or clean
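
MCP clients invoke tools through a JSON-RPC 2.0 `tools/call` request. As a rough sketch, a request for `extract_text` using the parameters above might be assembled like this. The `buildExtractTextCall` helper, the type names, and the example file path are hypothetical; the absolute-path check reflects the "absolute path" requirement stated above, not a documented server behavior.

```typescript
// Parameter names and allowed strategy values are taken from the list above.
interface ExtractTextArgs {
  path: string;
  page?: number;
  range?: string;
  strategy?: "raw" | "formatted" | "clean";
}

// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest<A> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: A };
}

// Accepts POSIX absolute paths ("/...") and Windows drive paths ("C:\..." or "C:/...").
function isAbsolutePath(p: string): boolean {
  return /^(\/|[A-Za-z]:[\\/])/.test(p);
}

// Hypothetical helper: wraps the arguments in a tools/call envelope.
function buildExtractTextCall(id: number, args: ExtractTextArgs): ToolCallRequest<ExtractTextArgs> {
  if (!isAbsolutePath(args.path)) {
    throw new Error(`path must be absolute, got "${args.path}"`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "extract_text", arguments: args },
  };
}

// Example: pages 1 to 10 with the "clean" strategy (the file path is made up).
const extractRequest = buildExtractTextCall(1, {
  path: "/absolute/path/to/report.pdf",
  range: "1-10",
  strategy: "clean",
});
const extractRequestJson = JSON.stringify(extractRequest);
```

In practice an MCP client such as Claude Desktop builds this envelope itself; the sketch only shows what the arguments object looks like on the wire.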

2. `get_metadata`

Get PDF document metadata and properties.

Parameters:

path (string, required): Absolute path to PDF file

3. `search_pdf`

Search for keywords or phrases within a PDF.

Parameters:

path (string, required): Absolute path to PDF file
query (string, required): Search term or phrase
caseSensitive (boolean, optional): Case-sensitive search (default: false)
maxResults (number, optional): Maximum results to return (default: 10)
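
To make the optional parameters concrete, here is a small sketch that fills in the documented defaults (`caseSensitive: false`, `maxResults: 10`) on the caller's side. The `withSearchDefaults` helper and the type name are hypothetical; the server applies its own defaults, so this only mirrors what an omitted field is documented to mean.

```typescript
// Parameter names come from the list above.
interface SearchPdfArgs {
  path: string;
  query: string;
  caseSensitive?: boolean;
  maxResults?: number;
}

// Hypothetical helper: resolves omitted optional fields to the documented defaults.
function withSearchDefaults(args: SearchPdfArgs): Required<SearchPdfArgs> {
  return {
    path: args.path,
    query: args.query,
    caseSensitive: args.caseSensitive ?? false, // documented default: false
    maxResults: args.maxResults ?? 10, // documented default: 10
  };
}

// Example: only the required fields are supplied (the file path is made up).
const searchArgs = withSearchDefaults({
  path: "/absolute/path/to/report.pdf",
  query: "quarterly revenue",
});

// Example: explicit values override the defaults.
const strictSearchArgs = withSearchDefaults({
  path: "/absolute/path/to/report.pdf",
  query: "Revenue",
  caseSensitive: true,
  maxResults: 3,
});
```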

4. `extract_images`

Extract images from PDF files (requires poppler-utils).

Parameters:

path (string, required): Absolute path to PDF file
outputDir (string, required): Directory to save extracted images
format (string, optional): Output format - png or jpg (default: png)
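
Because `format` accepts only `png` or `jpg`, a caller that takes the value from user input may want to check it before sending the request. The sketch below does that with a type guard. The `buildExtractImagesArgs` and `isImageFormat` names and the example paths are hypothetical, and how the server itself reacts to an unsupported format is not documented here.

```typescript
// Allowed output formats, as listed above.
type ImageFormat = "png" | "jpg";

interface ExtractImagesArgs {
  path: string;
  outputDir: string;
  format: ImageFormat;
}

// Type guard: narrows an arbitrary string to one of the documented formats.
function isImageFormat(value: string): value is ImageFormat {
  return value === "png" || value === "jpg";
}

// Hypothetical helper: applies the documented default (png) and rejects anything else.
function buildExtractImagesArgs(path: string, outputDir: string, format?: string): ExtractImagesArgs {
  const chosen = format ?? "png";
  if (!isImageFormat(chosen)) {
    throw new Error(`Unsupported format "${chosen}"; expected png or jpg`);
  }
  return { path, outputDir, format: chosen };
}

// Example: format omitted, so the default applies (paths are made up).
const imageArgs = buildExtractImagesArgs(
  "/absolute/path/to/report.pdf",
  "/absolute/path/to/images",
);
```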

5. `get_toc`

Get the table of contents (bookmarks) from a PDF.

Parameters:

path (string, required): Absolute path to PDF file

📋 Requirements

Node.js: >= 18.0.0
poppler-utils (optional, for image extraction):
- macOS: brew install poppler
- Ubuntu/Debian: apt-get install poppler-utils
- Windows: Download from poppler releases
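
The Node.js requirement can be checked up front. This is a minimal sketch of a version comparison against the `>= 18.0.0` minimum; the `meetsMinimumNode` helper is hypothetical and is not part of this package.

```typescript
// Hypothetical helper: compares a Node.js version string such as "v18.17.1"
// against a minimum, using the major, minor, and patch numbers only.
function meetsMinimumNode(version: string, minimum: string = "18.0.0"): boolean {
  const parse = (v: string): number[] =>
    v.replace(/^v/, "").split(".").map((part) => parseInt(part, 10));

  const actual = parse(version);
  const required = parse(minimum);

  for (let i = 0; i < 3; i++) {
    const a = actual[i] ?? 0;
    const r = required[i] ?? 0;
    if (a !== r) {
      return a > r; // the first differing component decides
    }
  }
  return true; // all three components are equal
}

// Example values; in a real script the input would be process.version.
const okOn18 = meetsMinimumNode("v18.0.0");
const okOn16 = meetsMinimumNode("v16.20.2");
```

Here `okOn18` is `true` and `okOn16` is `false`.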

Related package: parseflow-core, the core PDF parsing library.
Use parseflow-core directly if you want to integrate PDF parsing into your own Node.js application.

📖 Documentation

Full documentation: https://github.com/Libres-coder/ParseFlow

🐛 Bug Reports

Report issues: https://github.com/Libres-coder/ParseFlow/issues

📄 License

🌟 MCP Registry

Find this server on the official MCP Registry:
https://registry.modelcontextprotocol.io/

Search for: parseflow