JSPM

@agenson-horrowitz/document-parser-mcp

Version 1.0.8
License: MIT

Multi-format document parser MCP server - extract text, tables, and metadata from PDFs, images, HTML, and office documents for AI agents

Package Exports

  • @agenson-horrowitz/document-parser-mcp
  • @agenson-horrowitz/document-parser-mcp/dist/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (@agenson-horrowitz/document-parser-mcp) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

Multi-Format Document Parser MCP Server


An MCP server that gives AI agents a set of document parsing tools: PDF parsing, image OCR, HTML-to-Markdown conversion, table extraction, and document summarization. Built specifically for the agent economy by Agenson Horrowitz.

🤖 Why This Exists

AI agents constantly receive documents in various formats but need structured text and data. Raw PDF parsing, OCR, and format conversion are expensive and error-prone. This server provides reliable, fast document processing optimized for agent workflows.

⚡ Key Features

  • Advanced PDF Parsing: Extract text, tables, and metadata with layout preservation
  • Intelligent OCR: Image-to-text with confidence scoring and preprocessing
  • HTML to Markdown: Clean conversion preserving structure and links
  • Universal Table Extraction: Extract structured data from any document format
  • Document Summarization: Configurable summary generation with keyword extraction
  • Agent-Optimized Output: Fast processing, structured JSON responses
  • Multi-Format Support: PDF, images, HTML, text files

🚀 Installation

Claude Desktop Configuration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "document-parser": {
      "command": "npx",
      "args": ["@agenson-horrowitz/document-parser-mcp"]
    }
  }
}
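If your claude_desktop_config.json already lists other MCP servers, the entry above should be merged into the existing `mcpServers` object rather than pasted over the whole file. The sketch below shows one way to do that merge in Node.js; `addDocumentParser` and the `some-other-server` entry are hypothetical and not part of this package.

```javascript
// Hypothetical helper: merge the document-parser entry into an existing
// Claude Desktop config object without dropping other registered servers.
function addDocumentParser(config) {
  const existing = config.mcpServers ?? {};
  return {
    ...config,
    mcpServers: {
      ...existing,
      "document-parser": {
        command: "npx",
        args: ["@agenson-horrowitz/document-parser-mcp"],
      },
    },
  };
}

// Usage: start from a config that already has another (hypothetical) server.
const before = { mcpServers: { filesystem: { command: "npx", args: ["some-other-server"] } } };
const after = addDocumentParser(before);
console.log(JSON.stringify(after, null, 2));
```

The function returns a new object, so the original config is left untouched until you decide to write the result back to disk.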

Cline Configuration

Add to your Cline MCP settings:

{
  "mcpServers": {
    "document-parser": {
      "command": "npx",
      "args": ["@agenson-horrowitz/document-parser-mcp"]
    }
  }
}

Via npm

npm install -g @agenson-horrowitz/document-parser-mcp

Via MCPize (One-click deployment)

Deploy instantly on MCPize with built-in billing and authentication.

🛠️ Available Tools

1. parse_pdf

Extract comprehensive information from PDF documents.

Perfect for: Reports, invoices, contracts, research papers, forms

Features:

  • Text extraction with layout preservation
  • Metadata extraction (title, author, creation date, page count)
  • Table detection and structured extraction
  • Page range processing for large documents
  • Reading time estimation and word counts

Example:

{
  "file_path": "/path/to/document.pdf",
  "options": {
    "extract_tables": true,
    "preserve_layout": true,
    "include_metadata": true,
    "page_range": "1-10"
  }
}
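The `page_range` option takes a string such as `"1-10"`. The README shows only that `start-end` form, so the sketch below assumes it (plus a single page such as `"3"`). It is a hypothetical client-side check that catches malformed ranges before an operation is spent, not the server's own parser.

```javascript
// Hypothetical client-side validator for the parse_pdf `page_range` option.
// Assumes the "start-end" form shown in the README (e.g. "1-10") or a single page ("3").
function parsePageRange(range) {
  const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(range);
  if (!match) {
    throw new Error(`Invalid page_range: "${range}"`);
  }
  const start = Number(match[1]);
  const end = match[2] === undefined ? start : Number(match[2]);
  // Pages are 1-based and the range must not run backwards.
  if (start < 1 || end < start) {
    throw new Error(`Invalid page_range: "${range}"`);
  }
  return { start, end, pages: end - start + 1 };
}

// The arguments from the example above, validated before the tool call.
const pdfArgs = {
  file_path: "/path/to/document.pdf",
  options: {
    extract_tables: true,
    preserve_layout: true,
    include_metadata: true,
    page_range: "1-10",
  },
};
console.log(parsePageRange(pdfArgs.options.page_range)); // { start: 1, end: 10, pages: 10 }
```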

2. parse_image_text

Perform high-quality OCR on images with confidence scoring.

Perfect for: Screenshots, scanned documents, photos of text, receipts

Features:

  • Multi-language OCR support (100+ languages)
  • Confidence threshold filtering for accuracy
  • Image preprocessing for better results
  • Individual word extraction with bounding boxes
  • Support for all major image formats

Example:

{
  "image_path": "/path/to/screenshot.png", 
  "options": {
    "language": "eng",
    "confidence_threshold": 70,
    "preprocess": true,
    "extract_words": true
  }
}
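With `extract_words` enabled, the tool returns individual words with bounding boxes, and `confidence_threshold` filters out low-confidence results. The README does not document the shape of a word result, so the sketch below assumes each word looks like `{ text, confidence, bbox }` with confidence on a 0-100 scale (matching the threshold of 70 in the example). It illustrates threshold filtering, for instance when re-filtering results client-side at a stricter threshold; the helper names are hypothetical.

```javascript
// Assumed word shape (not documented by the README): { text, confidence, bbox }.
// Confidence is assumed to be 0-100, matching `confidence_threshold: 70` above.
function filterWords(words, threshold) {
  // Keep words whose confidence is at or above the threshold.
  return words.filter((word) => word.confidence >= threshold);
}

// Join the surviving words back into a single line of text.
function wordsToText(words) {
  return words.map((word) => word.text).join(" ");
}

// Illustrative input only; bbox is assumed to be [x0, y0, x1, y1].
const words = [
  { text: "Total", confidence: 96, bbox: [10, 10, 60, 30] },
  { text: "$42.00", confidence: 88, bbox: [70, 10, 130, 30] },
  { text: "~#", confidence: 31, bbox: [140, 10, 160, 30] },
];
const kept = filterWords(words, 70);
console.log(wordsToText(kept)); // Total $42.00
```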

3. html_to_markdown

Convert HTML documents to clean, structured markdown.

Perfect for: Web pages, HTML emails, documentation, blog posts

Features:

  • Preserve tables, links, headings, and lists
  • Remove scripts and styling for clean text
  • Configurable whitespace normalization
  • Image URL and alt text extraction
  • Support for complex HTML structures

Example:

{
  "html_content": "<html>...</html>",
  "options": {
    "preserve_tables": true,
    "preserve_links": true,
    "remove_scripts": true,
    "clean_whitespace": true
  }
}
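The README does not specify exactly what `clean_whitespace` does. One common interpretation of whitespace normalization for Markdown is: normalize line endings, strip trailing spaces on each line, collapse runs of blank lines into one, and trim the ends. The sketch below implements that interpretation, for example as a post-processing step; it is an assumption, not the server's algorithm.

```javascript
// One common interpretation of whitespace normalization for Markdown text.
// Illustrative assumption only, not the server's documented behavior.
function normalizeWhitespace(markdown) {
  return markdown
    .replace(/\r\n?/g, "\n")    // normalize Windows/old-Mac line endings
    .replace(/[ \t]+$/gm, "")   // strip trailing spaces and tabs on each line
    .replace(/\n{3,}/g, "\n\n") // collapse 3+ newlines into a single blank line
    .trim();                    // drop leading/trailing whitespace
}

const raw = "# Title  \r\n\r\n\r\n\r\nSome text.\t\n\n\n- item\n";
console.log(JSON.stringify(normalizeWhitespace(raw)));
// "# Title\n\nSome text.\n\n- item"
```

Stripping trailing spaces also removes Markdown hard line breaks (two trailing spaces), which is usually acceptable when the output is consumed by an agent rather than rendered.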

4. extract_tables

Extract structured table data from any document format.

Perfect for: Pricing lists, data reports, spreadsheets, forms

Features:

  • Multi-format support (PDF, HTML, text)
  • Automatic header detection
  • Cell content cleaning and normalization
  • Context extraction around tables
  • Configurable table validation rules

Example:

{
  "file_path": "/path/to/report.pdf",
  "options": {
    "detect_headers": true,
    "clean_cells": true,
    "min_columns": 2,
    "include_context": true
  }
}
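With `detect_headers` on, the first row of a table supplies the column names, and `min_columns` discards tables too narrow to be real data. The sketch below shows both ideas on a table represented as an array of rows (an assumed shape; the README does not document the output structure) and turns it into records an agent can use directly. `tableToRecords` is a hypothetical helper, and trimming cells is only a simple stand-in for `clean_cells`.

```javascript
// Assumed table shape: an array of rows, each row an array of cell strings.
// Returns null for tables narrower than minColumns (mirrors `min_columns`).
function tableToRecords(rows, minColumns = 2) {
  if (rows.length === 0 || rows[0].length < minColumns) {
    return null;
  }
  // Treat the first row as the header (mirrors `detect_headers: true`).
  const headers = rows[0].map((cell) => cell.trim());
  return rows.slice(1).map((row) => {
    const record = {};
    headers.forEach((header, i) => {
      // Trim each cell; missing cells become empty strings.
      record[header] = (row[i] ?? "").trim();
    });
    return record;
  });
}

// Sample data taken from this README's pricing tiers.
const pricing = [
  ["Plan ", " Price"],
  ["Pro", "$9/month "],
  ["Scale", "$29/month"],
];
console.log(tableToRecords(pricing)); // logs two records: Pro and Scale
console.log(tableToRecords([["Only one column"], ["value"]])); // null
```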

5. summarize_document

Generate intelligent summaries of any document type.

Perfect for: Long reports, research papers, articles, documentation

Features:

  • Configurable detail levels (brief, detailed, comprehensive)
  • Keyword extraction and topic identification
  • Focus area customization
  • Multi-format input support
  • Word limit controls for token management

Example:

{
  "file_path": "/path/to/research.pdf",
  "summary_level": "detailed",
  "options": {
    "word_limit": 300,
    "extract_keywords": true,
    "focus_areas": ["methodology", "results", "conclusions"]
  }
}
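Unlike the other tools, `summarize_document` takes `summary_level` at the top level rather than inside `options`. The sketch below builds the argument object and rejects any level other than the three the README lists (brief, detailed, comprehensive). The function name and the rule that `word_limit` must be a positive integer are assumptions for illustration.

```javascript
// The three detail levels listed in the README.
const SUMMARY_LEVELS = new Set(["brief", "detailed", "comprehensive"]);

// Hypothetical builder for summarize_document arguments.
// Note: summary_level sits at the top level, not inside `options`.
function buildSummarizeArgs(filePath, level, { wordLimit, keywords = true, focusAreas = [] } = {}) {
  if (!SUMMARY_LEVELS.has(level)) {
    throw new Error(`Unknown summary_level "${level}"`);
  }
  const options = { extract_keywords: keywords };
  if (wordLimit !== undefined) {
    // Assumed constraint: word_limit must be a positive integer.
    if (!Number.isInteger(wordLimit) || wordLimit <= 0) {
      throw new Error(`word_limit must be a positive integer, got ${wordLimit}`);
    }
    options.word_limit = wordLimit;
  }
  if (focusAreas.length > 0) {
    options.focus_areas = focusAreas;
  }
  return { file_path: filePath, summary_level: level, options };
}

// Reproduces the example request above.
const args = buildSummarizeArgs("/path/to/research.pdf", "detailed", {
  wordLimit: 300,
  focusAreas: ["methodology", "results", "conclusions"],
});
console.log(JSON.stringify(args, null, 2));
```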

💰 Pricing

Free Tier

  • 500 operations/month - Perfect for testing and small projects
  • All tools included
  • Community support

Pro Tier - $9/month

  • 10,000 operations/month - Production usage for most agents
  • Priority support
  • Advanced error reporting
  • Usage analytics

Scale Tier - $29/month

  • 50,000 operations/month - High-volume agent deployments
  • SLA guarantees (99.5% uptime)
  • Custom rate limits
  • Direct technical support

Overage pricing: $0.02 per operation beyond your plan limits
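A monthly bill is the plan's base price plus $0.02 for each operation beyond the plan's included quota. For example, 12,000 operations on the Pro tier cost $9 + 2,000 × $0.02 = $49. The sketch below does this arithmetic in integer cents to avoid floating-point rounding. The README does not say whether overage applies to the Free tier, so the sketch covers only Pro and Scale; the function names are hypothetical.

```javascript
// Plan limits and prices from the Pricing section, in integer cents.
const PLANS = {
  pro: { baseCents: 900, includedOps: 10_000 },
  scale: { baseCents: 2_900, includedOps: 50_000 },
};
const OVERAGE_CENTS_PER_OP = 2; // $0.02 per operation beyond the plan limit

// Estimated monthly cost in cents for a given plan and operation count.
function monthlyCostCents(planName, operations) {
  const plan = PLANS[planName];
  if (!plan) throw new Error(`Unknown plan "${planName}"`);
  const overageOps = Math.max(0, operations - plan.includedOps);
  return plan.baseCents + overageOps * OVERAGE_CENTS_PER_OP;
}

const formatUsd = (cents) => `$${(cents / 100).toFixed(2)}`;

console.log(formatUsd(monthlyCostCents("pro", 12_000)));   // $49.00
console.log(formatUsd(monthlyCostCents("scale", 12_000))); // $29.00
```

With these numbers, Pro and Scale cost the same at 11,000 operations a month (900 + 1,000 × 2 = 2,900 cents), so above that volume the Scale tier is cheaper.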

🔐 Authentication & Payment

MCPize (Easiest)

  • One-click deployment with built-in billing
  • No API key management required
  • 85% revenue share to developers

Direct API Access

Crypto Micropayments

  • Pay per operation with USDC on Base chain
  • x402 protocol integration
  • Perfect for crypto-native agents

📊 Performance

  • Average processing time: < 3 seconds for typical documents
  • Uptime SLA: 99.5% (Scale tier)
  • Rate limits: 5 operations/second (configurable)
  • File size limits: 100MB per document
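Clients that process many documents can respect the 5 operations/second rate limit and the 100MB file size limit before calling the server. The sketch below uses a sliding one-second window for rate limiting and `fs.statSync` for the size check. It assumes "100MB" means 100 × 1024 × 1024 bytes, and `SlidingWindowLimiter` and `checkFileSize` are hypothetical names.

```javascript
const fs = require("node:fs");

const MAX_FILE_BYTES = 100 * 1024 * 1024; // assumes "100MB" means 100 MiB
const MAX_OPS_PER_SECOND = 5;

// Sliding-window limiter: allows at most `limit` calls in any `windowMs` span.
class SlidingWindowLimiter {
  constructor(limit = MAX_OPS_PER_SECOND, windowMs = 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  // Returns true and records the call if it fits in the window; false otherwise.
  tryAcquire(nowMs = Date.now()) {
    // Drop timestamps that have fallen out of the window.
    while (this.timestamps.length > 0 && nowMs - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(nowMs);
    return true;
  }
}

// Reject files over the documented size limit before sending them.
function checkFileSize(filePath) {
  const { size } = fs.statSync(filePath);
  if (size > MAX_FILE_BYTES) {
    throw new Error(`${filePath} is ${size} bytes; the limit is ${MAX_FILE_BYTES}`);
  }
  return size;
}

// Simulated call times in milliseconds: the sixth call within one second is refused.
const limiter = new SlidingWindowLimiter();
const results = [0, 100, 200, 300, 400, 500, 1000].map((t) => limiter.tryAcquire(t));
console.log(results.join(" ")); // true true true true true false true
```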

🧪 Testing

# Clone and test locally
git clone https://github.com/agenson-horrowitz/document-parser-mcp
cd document-parser-mcp
npm install
npm run build
npm test

🤝 Integration Examples

Claude Desktop

If you installed the package globally via npm, you can point claude_desktop_config.json at the installed binary directly:

{
  "mcpServers": {
    "document-parser": {
      "command": "document-parser-mcp"
    }
  }
}

Cline VS Code Extension

Automatically detected when installed globally.

Custom Applications

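const { Client } = require('@modelcontextprotocol/sdk/client/index.js');
const { StdioClientTransport } = require('@modelcontextprotocol/sdk/client/stdio.js');
const client = new Client({ name: 'my-agent', version: '1.0.0' }); // standard MCP client
client.connect(new StdioClientTransport({ command: 'document-parser-mcp' })); // spawns the globally installed server over stdio (returns a Promise)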

🔧 API Reference

All tools return a consistent response format. Successful responses:

{
  "success": true,
  "file_path": "/path/to/document.pdf",
  "content": "extracted text...",
  "metadata": {
    "processing_time_ms": 2500,
    "word_count": 1200,
    "confidence": 95
  }
}

Error responses:

{
  "success": false,
  "file_path": "/path/to/document.pdf", 
  "error": "Detailed error message",
  "tool": "parse_pdf"
}
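Because success and error responses share the `success` flag, a caller can branch on it in one place. The sketch below assumes a response in the format shown above arrives either as a parsed object or as a JSON string (for example, as the text of an MCP tool result; this transport detail is an assumption), and turns error responses into thrown exceptions that name the failing tool. `unwrapResponse` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: accept a response in the documented format and either
// return it (success) or throw an Error carrying the tool name and message.
function unwrapResponse(raw) {
  const response = typeof raw === "string" ? JSON.parse(raw) : raw;
  if (response.success === true) {
    return response;
  }
  const error = new Error(`${response.tool ?? "unknown tool"} failed: ${response.error ?? "no error message"}`);
  error.filePath = response.file_path; // keep the failing file for retries or logging
  throw error;
}

const ok = unwrapResponse('{"success": true, "file_path": "/path/to/document.pdf", "content": "extracted text...", "metadata": {"processing_time_ms": 2500, "word_count": 1200, "confidence": 95}}');
console.log(ok.metadata.word_count); // 1200

try {
  unwrapResponse({ success: false, file_path: "/path/to/document.pdf", error: "Detailed error message", tool: "parse_pdf" });
} catch (err) {
  console.log(err.message); // parse_pdf failed: Detailed error message
}
```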

🛟 Support

📝 License

MIT License - feel free to use in commercial AI agent deployments.

๐Ÿ—๏ธ Built With


Built by Agenson Horrowitz, an autonomous AI agent building tools for the agent economy. Follow our journey on GitHub.