MCP Server for PDF parsing with text extraction, metadata retrieval, keyword search, and more

Package Exports

  • parseflow-mcp-server
  • parseflow-mcp-server/dist/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (parseflow-mcp-server) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

ParseFlow MCP Server

Model Context Protocol (MCP) server for comprehensive PDF parsing and analysis.

🚀 Features

  • Text Extraction: Extract text from PDF files with multiple formatting strategies
  • Metadata Retrieval: Get PDF document information (title, author, pages, etc.)
  • Keyword Search: Search for specific text within PDF documents
  • Image Extraction: Extract images from PDF files (requires poppler-utils)
  • Table of Contents: Get bookmarks and navigation structure

📦 Installation

Global Installation

npm install -g parseflow-mcp-server

Local Installation

npm install parseflow-mcp-server

🔧 Usage

With Claude Desktop

Add to your Claude Desktop configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "parseflow": {
      "command": "parseflow"
    }
  }
}
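If the configuration file already defines other servers, the `parseflow` entry should sit alongside them rather than replace the whole file. The sketch below performs that merge on the parsed JSON. `addParseflowServer` and the `McpConfig` type are hypothetical helpers, not part of this package.

```typescript
// Hypothetical shape of one server entry in claude_desktop_config.json.
interface McpServerEntry {
  command: string;
  args?: string[];
}

// Only "mcpServers" is modelled; any other top-level keys are kept as-is.
interface McpConfig {
  mcpServers?: { [serverName: string]: McpServerEntry };
  [otherKey: string]: unknown;
}

// Returns a new config with the "parseflow" entry added, leaving every
// other key and every other server untouched.
function addParseflowServer(config: McpConfig): McpConfig {
  const servers = config.mcpServers || {};
  return {
    ...config,
    mcpServers: {
      ...servers,
      parseflow: { command: "parseflow" },
    },
  };
}

// Usage: parse the existing file contents, merge, then write the result back.
const existingJson = '{"mcpServers":{"other":{"command":"other-server"}}}';
const merged = addParseflowServer(JSON.parse(existingJson));
const mergedJson = JSON.stringify(merged, null, 2);
```

Building a new object instead of mutating the parsed one means that if anything fails, the original file contents are still available.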

With Windsurf / Cursor

Add to your MCP settings:

{
  "mcpServers": {
    "parseflow": {
      "command": "parseflow",
      "args": []
    }
  }
}
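After a local installation, the `parseflow` command may not be on your PATH. The package's entry point is `dist/index.js` (see Package Exports and Standalone Usage). One option, assumed here rather than documented, is to launch that file with `node`. The sketch below builds such an entry. `localParseflowEntry` is a hypothetical helper, and the directory is only an example.

```typescript
// Shape of one entry under "mcpServers": a command plus its arguments.
interface LocalServerEntry {
  command: string;
  args: string[];
}

// Hypothetical helper: launches the package's dist/index.js with Node.js
// instead of relying on a globally installed `parseflow` command.
function localParseflowEntry(packageDir: string): LocalServerEntry {
  // Drop trailing slashes or backslashes so the joined path has one separator.
  const dir = packageDir.replace(/[\\/]+$/, "");
  return { command: "node", args: [dir + "/dist/index.js"] };
}

// Example directory; substitute the absolute path of your own node_modules copy.
const localEntry = localParseflowEntry("/home/me/project/node_modules/parseflow-mcp-server/");
const localConfig = { mcpServers: { parseflow: localEntry } };
```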

Standalone Usage

# Run the server
parseflow

# Or run the built entry point directly with Node.js
node /path/to/parseflow-mcp-server/dist/index.js
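When launched with a command like this, an MCP server talks to its client over stdin and stdout. Under the MCP stdio transport, each JSON-RPC 2.0 message is one line of JSON terminated by a newline, with no embedded newlines. The sketch below shows that framing without any SDK. `encodeFrame` and `decodeFrames` are hypothetical helper names.

```typescript
// Minimal JSON-RPC 2.0 message shape (requests, responses, and notifications).
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number | string;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: unknown;
}

// Serializes one message as a single newline-terminated line. Without an indent
// argument, JSON.stringify escapes newlines inside strings, so none are embedded.
function encodeFrame(message: JsonRpcMessage): string {
  return JSON.stringify(message) + "\n";
}

// Splits buffered output into complete messages. The trailing partial line, if
// any, is returned as `rest` so it can be prepended to the next chunk.
function decodeFrames(buffer: string): { messages: JsonRpcMessage[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() || "";
  const messages = lines
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as JsonRpcMessage);
  return { messages: messages, rest: rest };
}

// A "tools/list" request; a real client sends it only after the initialize handshake.
const listRequest = encodeFrame({ jsonrpc: "2.0", id: 1, method: "tools/list" });
```

A real client first performs the `initialize` handshake. That step is omitted here to keep the focus on framing.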

🛠️ Available Tools

When connected via MCP, the following tools are available:

1. extract_text

Extract text content from PDF files.

Parameters:

  • path (string, required): Absolute path to PDF file
  • page (number, optional): Extract a single page
  • range (string, optional): Extract a page range (e.g., "1-10")
  • strategy (string, optional): Extraction strategy - raw, formatted, or clean
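These parameters can be checked on the client side before calling the tool. The sketch below makes two assumptions. First, `page` and `range` are treated as alternatives, since the list does not say what happens if both are given. Second, a range is assumed to use the `start-end` form from the example. `checkExtractTextArgs` is a hypothetical helper.

```typescript
type ExtractStrategy = "raw" | "formatted" | "clean";

interface ExtractTextArgs {
  path: string;
  page?: number;
  range?: string;
  strategy?: ExtractStrategy;
}

// Hypothetical client-side check for extract_text arguments; returns them unchanged.
function checkExtractTextArgs(args: ExtractTextArgs): ExtractTextArgs {
  // Assumption: page and range are alternatives, so reject calls that set both.
  if (args.page !== undefined && args.range !== undefined) {
    throw new Error("Pass either page or range, not both");
  }
  if (args.page !== undefined && (args.page < 1 || Math.floor(args.page) !== args.page)) {
    throw new Error("page must be a positive integer");
  }
  if (args.range !== undefined) {
    // Assumption: ranges use the "start-end" form from the example, e.g. "1-10".
    const match = /^(\d+)-(\d+)$/.exec(args.range);
    if (!match || Number(match[1]) < 1 || Number(match[1]) > Number(match[2])) {
      throw new Error('range must look like "1-10", with 1 <= start <= end');
    }
  }
  // Runtime guard for callers that bypass the type (e.g. plain JavaScript).
  if (args.strategy !== undefined && ["raw", "formatted", "clean"].indexOf(args.strategy) === -1) {
    throw new Error("strategy must be raw, formatted, or clean");
  }
  return args;
}

const firstTenPages = checkExtractTextArgs({
  path: "/home/me/docs/report.pdf",
  range: "1-10",
  strategy: "clean",
});
```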

2. get_metadata

Get PDF document metadata and properties.

Parameters:

  • path (string, required): Absolute path to PDF file
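Each tool is invoked with a JSON-RPC `tools/call` request. Its `params` carry the tool name and arguments, and the result comes back as a `content` array of typed items with an optional `isError` flag. The exact text layout returned by `get_metadata` is not documented here, so the sketch below simply collects the text items. `buildGetMetadataCall`, `collectText`, and the sample result are hypothetical.

```typescript
// JSON-RPC 2.0 request for the MCP "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { [key: string]: unknown } };
}

// One item of a tool result's content array (only text items are used here).
interface ContentItem {
  type: string;
  text?: string;
}

interface ToolResult {
  content: ContentItem[];
  isError?: boolean;
}

// Builds the request that asks the server to run get_metadata on one file.
function buildGetMetadataCall(id: number, pdfPath: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: "get_metadata", arguments: { path: pdfPath } },
  };
}

// Joins all text items of a result; throws if the server flagged the call as failed.
function collectText(result: ToolResult): string {
  const texts = result.content
    .filter((item) => item.type === "text" && typeof item.text === "string")
    .map((item) => item.text as string);
  if (result.isError) {
    throw new Error("Tool reported an error: " + texts.join("\n"));
  }
  return texts.join("\n");
}

const metadataRequest = buildGetMetadataCall(1, "/home/me/docs/report.pdf");
// Hypothetical result, for illustration only; the real text layout may differ.
const metadataText = collectText({ content: [{ type: "text", text: "Title: Example" }] });
```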

3. search_pdf

Search for keywords or phrases within a PDF.

Parameters:

  • path (string, required): Absolute path to PDF file
  • query (string, required): Search term or phrase
  • caseSensitive (boolean, optional): Case-sensitive search (default: false)
  • maxResults (number, optional): Maximum results to return (default: 10)
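The documented defaults can be filled in explicitly when building the arguments, so logged requests show exactly what was asked for. The sketch below mirrors those defaults (`caseSensitive: false`, `maxResults: 10`). `withSearchDefaults` is a hypothetical helper, and rejecting an empty query is a client-side choice, not documented server behaviour.

```typescript
interface SearchPdfArgs {
  path: string;
  query: string;
  caseSensitive?: boolean;
  maxResults?: number;
}

// Fills in the documented defaults without overriding explicit values.
function withSearchDefaults(args: SearchPdfArgs): Required<SearchPdfArgs> {
  // Client-side choice: an empty query is almost certainly a mistake.
  if (args.query.trim() === "") {
    throw new Error("query must not be empty");
  }
  return {
    path: args.path,
    query: args.query,
    caseSensitive: args.caseSensitive !== undefined ? args.caseSensitive : false,
    maxResults: args.maxResults !== undefined ? args.maxResults : 10,
  };
}

const searchArgs = withSearchDefaults({ path: "/home/me/docs/report.pdf", query: "revenue" });
```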

4. extract_images

Extract images from PDF files (requires poppler-utils).

Parameters:

  • path (string, required): Absolute path to PDF file
  • outputDir (string, required): Directory to save extracted images
  • format (string, optional): Output format - png or jpg (default: png)
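Because extract_images requires `outputDir` and accepts only two formats, a client can validate both before making the call. The sketch below applies the documented `png` default. `buildExtractImagesArgs` and `isPdfImageFormat` are hypothetical helpers, and lowercasing inputs such as "JPG" is an assumption of this sketch.

```typescript
type PdfImageFormat = "png" | "jpg";

interface ExtractImagesArgs {
  path: string;
  outputDir: string;
  format: PdfImageFormat;
}

// Type guard: narrows an arbitrary string to one of the two documented formats.
function isPdfImageFormat(value: string): value is PdfImageFormat {
  return value === "png" || value === "jpg";
}

// Hypothetical helper: validates extract_images arguments and applies the png default.
function buildExtractImagesArgs(pdfPath: string, outputDir: string, format?: string): ExtractImagesArgs {
  if (outputDir.trim() === "") {
    throw new Error("outputDir is required");
  }
  // Assumption: lowercasing "PNG"/"JPG" is a client-side convenience only.
  const normalized = (format || "png").toLowerCase();
  if (!isPdfImageFormat(normalized)) {
    throw new Error("format must be png or jpg, got: " + format);
  }
  return { path: pdfPath, outputDir: outputDir, format: normalized };
}

const imageArgs = buildExtractImagesArgs("/home/me/docs/report.pdf", "/home/me/docs/images");
```

Remember that this tool also needs poppler-utils installed on the machine running the server.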

5. get_toc

Get table of contents (bookmarks) from PDF.

Parameters:

  • path (string, required): Absolute path to PDF file
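Every tool, get_toc included, expects an absolute `path`, so a client can reject relative paths before sending a request. The sketch below covers POSIX paths plus Windows drive-letter and UNC paths. `isAbsoluteFilePath` and `buildGetTocArgs` are hypothetical helpers. Inside Node.js, the built-in `path.isAbsolute` is the more thorough choice.

```typescript
// Accepts "/home/me/a.pdf", "C:\\docs\\a.pdf", "C:/docs/a.pdf", and "\\\\server\\share\\a.pdf".
function isAbsoluteFilePath(candidate: string): boolean {
  const posixAbsolute = candidate.charAt(0) === "/";
  const windowsDrive = /^[A-Za-z]:[\\/]/.test(candidate);
  const windowsUnc = /^\\\\[^\\]+\\/.test(candidate);
  return posixAbsolute || windowsDrive || windowsUnc;
}

// Hypothetical helper: builds get_toc arguments, refusing relative paths.
function buildGetTocArgs(pdfPath: string): { path: string } {
  if (!isAbsoluteFilePath(pdfPath)) {
    throw new Error("get_toc expects an absolute path, got: " + pdfPath);
  }
  return { path: pdfPath };
}

const tocArgs = buildGetTocArgs("/home/me/docs/report.pdf");
```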

📋 Requirements

  • Node.js: >= 18.0.0
  • poppler-utils (optional, for image extraction):
    • macOS: brew install poppler
    • Ubuntu/Debian: apt-get install poppler-utils
    • Windows: Download from poppler releases
  • parseflow-core: Core PDF parsing library; use it directly if you want to integrate PDF parsing into your own Node.js applications
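The Node.js requirement can be checked against a version string such as the one printed by `node --version`. For a `>= 18.0.0` requirement, comparing the major version alone is enough, because every 18.x.y release satisfies it. `meetsMinimumNode` is a hypothetical helper.

```typescript
// Returns true if a version string like "v18.17.1" or "20.11.0" has a major version >= minMajor.
function meetsMinimumNode(version: string, minMajor: number): boolean {
  const match = /^v?(\d+)(\.|$)/.exec(version.trim());
  if (!match) {
    // Unparseable input is treated as not meeting the requirement.
    return false;
  }
  return Number(match[1]) >= minMajor;
}

const node18Ok = meetsMinimumNode("v18.0.0", 18);
```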

📖 Documentation

Full documentation: https://github.com/Libres-coder/ParseFlow

🐛 Bug Reports

Report issues: https://github.com/Libres-coder/ParseFlow/issues

📄 License

MIT © Libres-coder

🌟 MCP Registry

Find this server on the official MCP Registry:
https://registry.modelcontextprotocol.io/

Search for: parseflow