MCP-Upstage-Server
A Node.js/TypeScript implementation of an MCP (Model Context Protocol) server for Upstage AI services.
Features
- Document Parsing: Extract structure and content from various document types (PDF, images, Office files)
- Information Extraction: Extract structured information using custom or auto-generated schemas
- Schema Generation: Automatically generate extraction schemas from document analysis
- Document Classification: Classify documents into predefined categories (invoice, receipt, contract, etc.)
- Built with TypeScript for type safety
- Dual transport support: stdio (default) and HTTP Streamable
- Async/await pattern throughout
- Comprehensive error handling and retry logic
- Progress reporting support
Installation
Prerequisites
- Node.js 18.0.0 or higher
- Upstage API key from Upstage Console
Install from npm
# Install globally
npm install -g mcp-upstage-server
# Or use with npx (no installation required)
npx mcp-upstage-server
Install from source
# Clone the repository
git clone https://github.com/UpstageAI/mcp-upstage.git
cd mcp-upstage/mcp-upstage-node
# Install dependencies
npm install
# Build the project
npm run build
# Set up environment variables
cp .env.example .env
# Edit .env and add your UPSTAGE_API_KEY
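A minimal .env needs only the API key, for example:
UPSTAGE_API_KEY=your-api-key-here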
Usage
Running the server
# With stdio transport (default)
UPSTAGE_API_KEY=your-api-key npx mcp-upstage-server
# With HTTP Streamable transport
UPSTAGE_API_KEY=your-api-key npx mcp-upstage-server --http
# With HTTP transport on custom port
UPSTAGE_API_KEY=your-api-key npx mcp-upstage-server --http --port 8080
# Show help
npx mcp-upstage-server --help
# Development mode (from source)
npm run dev
# Production mode (from source)
npm start
Integration with Claude Desktop
Option 1: stdio transport (default)
{
  "mcpServers": {
    "upstage": {
      "command": "npx",
      "args": ["mcp-upstage-server"],
      "env": {
        "UPSTAGE_API_KEY": "your-api-key-here"
      }
    }
  }
}
Option 2: HTTP Streamable transport
{
  "mcpServers": {
    "upstage-http": {
      "command": "npx",
      "args": ["mcp-upstage-server", "--http", "--port", "3000"],
      "env": {
        "UPSTAGE_API_KEY": "your-api-key-here"
      }
    }
  }
}
Transport Options
stdio Transport (Default)
- Pros: Simple setup, direct process communication
- Cons: Single client connection only
- Usage: Default mode, no additional configuration needed
HTTP Streamable Transport
- Pros: Multiple client support, network accessible, RESTful API
- Cons: Requires port management, network configuration
- Endpoints:
  - POST /mcp - Main MCP communication endpoint
  - GET /mcp - Server-Sent Events stream
  - GET /health - Health check endpoint
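For example, with the server started via --http, the health endpoint can be checked with a few lines of Node.js. This is a minimal sketch: it assumes port 3000 (as in the HTTP example configuration above) and simply prints whatever the endpoint returns.
// health-check.mjs — assumes the server was started with --http --port 3000
const response = await fetch('http://localhost:3000/health');
console.log('status:', response.status);      // expect 200 when the server is up
console.log('body:', await response.text());  // response payload is printed as-is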
Available Tools
parse_document
Parse a document using Upstage AI's document digitization API.
Parameters:
- file_path (required): Path to the document file
- output_formats (optional): Array of output formats (e.g., ['html', 'text', 'markdown'])
Supported formats: PDF, JPEG, PNG, TIFF, BMP, GIF, WEBP
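To illustrate how a client might invoke this tool, here is a minimal sketch using the @modelcontextprotocol/sdk client over stdio. The client-side code, the file path, and the chosen output formats are illustrative assumptions, not part of this package.
// call-parse-document.mjs — hypothetical MCP client sketch (not part of this package)
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Launch the server over stdio, passing the API key through the environment
const transport = new StdioClientTransport({
  command: 'npx',
  args: ['mcp-upstage-server'],
  env: { ...process.env, UPSTAGE_API_KEY: 'your-api-key-here' },
});

const client = new Client({ name: 'example-client', version: '1.0.0' });
await client.connect(transport);

// Invoke parse_document with a placeholder path and optional output formats
const result = await client.callTool({
  name: 'parse_document',
  arguments: {
    file_path: '/path/to/document.pdf',
    output_formats: ['html', 'markdown'],
  },
});
console.log(result);

await client.close();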
extract_information
Extract structured information from documents using Upstage Universal Information Extraction.
Parameters:
- file_path (required): Path to the document file
- schema_path (optional): Path to JSON schema file
- schema_json (optional): JSON schema as string
- auto_generate_schema (optional, default: true): Auto-generate schema if none provided
Supported formats: JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX
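As a sketch, reusing the hypothetical MCP client from the parse_document example above, an extraction call that supplies its own schema might look like this (the schema fields and file path are placeholders):
// Hypothetical call: provide an explicit schema and disable auto-generation
const schemaJson = JSON.stringify({
  type: 'json_schema',
  json_schema: {
    name: 'document_schema',
    schema: {
      type: 'object',
      properties: {
        invoice_number: { type: 'string', description: 'Invoice number' },
        total_amount: { type: 'number', description: 'Total invoice amount' },
      },
    },
  },
});

const extraction = await client.callTool({
  name: 'extract_information',
  arguments: {
    file_path: '/path/to/invoice.pdf',  // placeholder path
    schema_json: schemaJson,
    auto_generate_schema: false,        // use the supplied schema instead of generating one
  },
});
console.log(extraction);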
generate_schema
Generate an extraction schema for a document using Upstage AI's schema generation API.
Parameters:
- file_path (required): Path to the document file to analyze
Supported formats: JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX
This tool analyzes a document and automatically generates a JSON schema that defines the structure and fields that can be extracted from similar documents. The generated schema can then be used with the extract_information tool when auto_generate_schema is set to false.
Use cases:
- Create reusable schemas for multiple similar documents
- Have more control over extraction fields
- Ensure consistent field naming across extractions
The tool returns both a readable schema object and a schema_json string that can be directly copied and used with the extract_information tool.
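For example, a two-step workflow could generate a schema from one representative document and reuse it for similar ones. This sketch again assumes the hypothetical MCP client from the earlier examples; extractSchemaJson is a placeholder for however you pull the schema_json string out of the tool's response.
// Step 1: generate a schema from a representative document (placeholder path)
const generated = await client.callTool({
  name: 'generate_schema',
  arguments: { file_path: '/path/to/sample-invoice.pdf' },
});

// Hypothetical helper: extract the schema_json string from the tool response
const schemaJson = extractSchemaJson(generated);

// Step 2: reuse the schema for similar documents, with auto-generation turned off
const extraction = await client.callTool({
  name: 'extract_information',
  arguments: {
    file_path: '/path/to/another-invoice.pdf',
    schema_json: schemaJson,
    auto_generate_schema: false,
  },
});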
classify_document
Classify a document into predefined categories using Upstage AI's document classification API.
Parameters:
- file_path (required): Path to the document file to classify
- schema_path (optional): Path to JSON file containing custom classification schema
- schema_json (optional): JSON string containing custom classification schema
Supported formats: JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX
This tool analyzes a document and classifies it into categories. By default, it uses a comprehensive set of document types, but you can provide custom classification categories.
Default categories:
- invoice, receipt, contract, cv, bank_statement, tax_document, insurance, business_card, letter, form, certificate, report, others
Use cases:
- Automatically sort and organize documents by type
- Filter documents for specific processing workflows
- Build document management systems with automatic categorization
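As an illustration, a classification call with custom categories (again using the hypothetical client from the earlier sketches) might look like:
// Hypothetical call: classify a document against custom categories
const categories = JSON.stringify([
  { const: 'purchase_order', description: 'Purchase order document' },
  { const: 'quotation', description: 'Price quotation or estimate' },
  { const: 'others', description: 'Other business documents' },
]);

const classification = await client.callTool({
  name: 'classify_document',
  arguments: {
    file_path: '/path/to/scan.pdf',  // placeholder path
    schema_json: categories,         // omit to use the default categories
  },
});
console.log(classification);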
Schema Guide for Information Extraction
When auto_generate_schema is false, you need to provide a custom schema. Here's how to format it correctly:
📋 Basic Schema Structure
The schema must follow this exact structure:
{
  "type": "json_schema",
  "json_schema": {
    "name": "document_schema",
    "schema": {
      "type": "object",
      "properties": {
        "field_name": {
          "type": "string|number|array|object",
          "description": "Description of what to extract"
        }
      }
    }
  }
}
❌ Common Mistakes
Wrong: Missing nested structure
{
  "company_name": {
    "type": "string"
  }
}
Wrong: Incorrect response_format
{
  "schema": {
    "company_name": "string"
  }
}
Wrong: Missing properties wrapper
{
  "type": "json_schema",
  "json_schema": {
    "name": "document_schema",
    "schema": {
      "type": "object",
      "company_name": {
        "type": "string"
      }
    }
  }
}
✅ Correct Examples
Simple schema:
{
  "type": "json_schema",
  "json_schema": {
    "name": "document_schema",
    "schema": {
      "type": "object",
      "properties": {
        "company_name": {
          "type": "string",
          "description": "Name of the company"
        },
        "invoice_number": {
          "type": "string",
          "description": "Invoice number"
        },
        "total_amount": {
          "type": "number",
          "description": "Total invoice amount"
        }
      }
    }
  }
}
Complex schema with arrays and objects:
{
  "type": "json_schema",
  "json_schema": {
    "name": "document_schema",
    "schema": {
      "type": "object",
      "properties": {
        "company_info": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "address": {"type": "string"},
            "phone": {"type": "string"}
          },
          "description": "Company information"
        },
        "items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "item_name": {"type": "string"},
              "quantity": {"type": "number"},
              "price": {"type": "number"}
            }
          },
          "description": "List of invoice items"
        },
        "invoice_date": {
          "type": "string",
          "description": "Invoice date in YYYY-MM-DD format"
        }
      }
    }
  }
}
🛠️ Schema Creation Helper
You can create schemas programmatically:
function createSchema(fields) {
  return JSON.stringify({
    "type": "json_schema",
    "json_schema": {
      "name": "document_schema",
      "schema": {
        "type": "object",
        "properties": fields
      }
    }
  });
}

// Usage example:
const schema = createSchema({
  "company_name": {
    "type": "string",
    "description": "Company name"
  },
  "total": {
    "type": "number",
    "description": "Total amount"
  }
});
💡 Data Types
"string"
: Text data (names, addresses, etc.)"number"
: Numeric data (amounts, quantities, etc.)"boolean"
: True/false values"array"
: Lists of items"object"
: Nested structures"null"
: Null values
📝 Best Practices
- Always include descriptions: They help the AI understand what to extract
- Use specific field names: invoice_date instead of date
- Nest related fields: Group related information in objects
- Validate your JSON: Use a JSON validator before using the schema
- Test with simple schemas first: Start with basic fields before adding complexity
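To follow the "Validate your JSON" practice above, a quick structural check before calling extract_information might look like this sketch; the checks simply mirror the required structure described in this guide.
// Minimal structural validation of a schema_json string (a sketch, not part of this package)
function validateSchemaJson(schemaJson) {
  let parsed;
  try {
    parsed = JSON.parse(schemaJson);  // must be valid JSON
  } catch (err) {
    throw new Error(`Schema is not valid JSON: ${err.message}`);
  }
  if (parsed.type !== 'json_schema') {
    throw new Error('Top-level "type" must be "json_schema"');
  }
  const inner = parsed.json_schema && parsed.json_schema.schema;
  if (!inner || inner.type !== 'object' || typeof inner.properties !== 'object') {
    throw new Error('json_schema.schema must be an object with a "properties" map');
  }
  return parsed;
}

// Usage: throws on the common mistakes shown above, returns the parsed schema otherwise
validateSchemaJson(createSchema({ "company_name": { "type": "string", "description": "Company name" } }));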
Classification Schema Guide
The classify_document tool uses a different schema format optimized for classification tasks. Here's how to create custom classification schemas:
📋 Simple Classification Categories
For custom categories, just provide an array of category objects:
[
{"const": "category1", "description": "Description of category 1"},
{"const": "category2", "description": "Description of category 2"},
{"const": "others", "description": "Fallback category"}
]
The tool automatically wraps this in the proper schema structure for the API.
✅ Correct Classification Examples
Medical document classifier:
[
{"const": "prescription", "description": "Medical prescription document"},
{"const": "lab_result", "description": "Laboratory test results"},
{"const": "medical_record", "description": "Patient medical record"},
{"const": "insurance_claim", "description": "Medical insurance claim"},
{"const": "others", "description": "Other medical documents"}
]
Business document classifier:
[
{"const": "purchase_order", "description": "Purchase order document"},
{"const": "delivery_note", "description": "Delivery or shipping note"},
{"const": "quotation", "description": "Price quotation or estimate"},
{"const": "meeting_minutes", "description": "Meeting minutes or notes"},
{"const": "others", "description": "Other business documents"}
]
❌ Common Classification Mistakes
Wrong: Missing description field
[
{"const": "invoice"},
{"const": "receipt"}
]
Wrong: Missing const field
[
{"description": "Invoice document"},
{"description": "Receipt document"}
]
Wrong: Using different field names
[
{"value": "invoice", "label": "Invoice document"},
{"type": "receipt", "desc": "Receipt document"}
]
💡 Classification Best Practices
- Always include "others" category: Provides fallback for unexpected document types
- Use descriptive const values: Clear category names like "medical_prescription" vs "doc1"
- Add meaningful descriptions: Help the AI understand what each category represents
- Keep categories mutually exclusive: Avoid overlapping categories that could confuse classification
- Limit category count: Too many categories can reduce accuracy (recommended: 3-10 categories)
- Use consistent naming: Stick to snake_case or kebab-case throughout
🛠️ Classification Categories Helper
function createClassificationCategories(categories) {
  return JSON.stringify(categories.map(cat => ({
    "const": cat.value,
    "description": cat.description
  })));
}

// Usage example:
const categoriesJson = createClassificationCategories([
  {value: "legal_contract", description: "Legal contracts and agreements"},
  {value: "financial_report", description: "Financial statements and reports"},
  {value: "others", description: "Other document types"}
]);
// Result: Ready to use as schema_json parameter
// [{"const":"legal_contract","description":"Legal contracts and agreements"},{"const":"financial_report","description":"Financial statements and reports"},{"const":"others","description":"Other document types"}]
Development
# Run tests
npm test
# Run tests in watch mode
npm run test:watch
# Lint code
npm run lint
# Format code
npm run format
# Clean build artifacts
npm run clean
Project Structure
mcp-upstage-node/
├── src/
│ ├── index.ts # Entry point
│ ├── server.ts # MCP server implementation
│ ├── tools/ # Tool implementations
│ │ ├── documentParser.ts
│ │ └── informationExtractor.ts
│ └── utils/ # Utility modules
│ ├── apiClient.ts # HTTP client with retry
│ ├── fileUtils.ts # File operations
│ ├── validators.ts # Input validation
│ └── constants.ts # Configuration constants
├── dist/ # Compiled JavaScript (generated)
├── package.json
├── tsconfig.json
└── README.md
Output Files
Results are saved to:
- Document parsing: ~/.mcp-upstage/outputs/document_parsing/
- Information extraction: ~/.mcp-upstage/outputs/information_extraction/
- Generated schemas: ~/.mcp-upstage/outputs/information_extraction/schemas/
- Document classification: ~/.mcp-upstage/outputs/document_classification/
License
MIT