Package Exports

mcp-upstage-server
mcp-upstage-server/dist/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (mcp-upstage-server) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

MCP-Upstage-Server

Node.js/TypeScript implementation of the MCP server for Upstage AI services.

Features

Document Parsing: Extract structure and content from various document types (PDF, images, Office files)
Information Extraction: Extract structured information using custom or auto-generated schemas
Built with TypeScript for type safety
Async/await pattern throughout
Comprehensive error handling and retry logic
Progress reporting support

Installation

Prerequisites

Node.js 18.0.0 or higher
Upstage API key from Upstage Console

Install from npm

# Install globally
npm install -g mcp-upstage-server

# Or use with npx (no installation required)
npx mcp-upstage-server

Install from source

# Clone the repository
git clone https://github.com/UpstageAI/mcp-upstage.git
cd mcp-upstage/mcp-upstage-node

# Install dependencies
npm install

# Build the project
npm run build

# Set up environment variables
cp .env.example .env
# Edit .env and add your UPSTAGE_API_KEY

Usage

Running the server

# With stdio transport (default)
UPSTAGE_API_KEY=your-api-key npx mcp-upstage-server

# With HTTP Streamable transport
UPSTAGE_API_KEY=your-api-key npx mcp-upstage-server --http

# With HTTP transport on custom port
UPSTAGE_API_KEY=your-api-key npx mcp-upstage-server --http --port 8080

# Show help
npx mcp-upstage-server --help

# Development mode (from source)
npm run dev

# Production mode (from source)
npm start

Integration with Claude Desktop

Option 1: stdio transport (default)

{
  "mcpServers": {
    "upstage": {
      "command": "npx",
      "args": ["mcp-upstage-server"],
      "env": {
        "UPSTAGE_API_KEY": "your-api-key-here"
      }
    }
  }
}

Option 2: HTTP Streamable transport

{
  "mcpServers": {
    "upstage-http": {
      "command": "npx",
      "args": ["mcp-upstage-server", "--http", "--port", "3000"],
      "env": {
        "UPSTAGE_API_KEY": "your-api-key-here"
      }
    }
  }
}

Transport Options

stdio Transport (Default)

Pros: Simple setup, direct process communication
Cons: Single client connection only
Usage: Default mode, no additional configuration needed

HTTP Streamable Transport

Pros: Multiple client support, network accessible, RESTful API
Cons: Requires port management, network configuration
Endpoints:
- POST /mcp - Main MCP communication endpoint
- GET /mcp - Server-Sent Events stream
- GET /health - Health check endpoint

Available Tools

parse_document

Parse a document using Upstage AI's document digitization API.

Parameters:

file_path (required): Path to the document file
output_formats (optional): Array of output formats (e.g., ['html', 'text', 'markdown'])

Supported formats: PDF, JPEG, PNG, TIFF, BMP, GIF, WEBP

extract_information

Extract structured information from documents using Upstage Universal Information Extraction.

Parameters:

file_path (required): Path to the document file
schema_path (optional): Path to JSON schema file
schema_json (optional): JSON schema as string
auto_generate_schema (optional, default: true): Auto-generate schema if none provided

Supported formats: JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX

Schema Guide for Information Extraction

When auto_generate_schema is false, you need to provide a custom schema. Here's how to format it correctly:

📋 Basic Schema Structure

The schema must follow this exact structure:

{
  "type": "json_schema",
  "json_schema": {
    "name": "document_schema",
    "schema": {
      "type": "object",
      "properties": {
        "field_name": {
          "type": "string|number|array|object",
          "description": "Description of what to extract"
        }
      }
    }
  }
}

❌ Common Mistakes

Wrong: Missing nested structure

{
  "company_name": {
    "type": "string"
  }
}

Wrong: Incorrect response_format

{
  "schema": {
    "company_name": "string"
  }
}

Wrong: Missing properties wrapper

{
  "type": "json_schema",
  "json_schema": {
    "name": "document_schema", 
    "schema": {
      "type": "object",
      "company_name": {
        "type": "string"
      }
    }
  }
}

✅ Correct Examples

Simple schema:

{
  "type": "json_schema",
  "json_schema": {
    "name": "document_schema",
    "schema": {
      "type": "object",
      "properties": {
        "company_name": {
          "type": "string",
          "description": "Name of the company"
        },
        "invoice_number": {
          "type": "string",
          "description": "Invoice number"
        },
        "total_amount": {
          "type": "number",
          "description": "Total invoice amount"
        }
      }
    }
  }
}

Complex schema with arrays and objects:

{
  "type": "json_schema",
  "json_schema": {
    "name": "document_schema",
    "schema": {
      "type": "object",
      "properties": {
        "company_info": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "address": {"type": "string"},
            "phone": {"type": "string"}
          },
          "description": "Company information"
        },
        "items": {
          "type": "array",
          "items": {
            "type": "object", 
            "properties": {
              "item_name": {"type": "string"},
              "quantity": {"type": "number"},
              "price": {"type": "number"}
            }
          },
          "description": "List of invoice items"
        },
        "invoice_date": {
          "type": "string",
          "description": "Invoice date in YYYY-MM-DD format"
        }
      }
    }
  }
}

🛠️ Schema Creation Helper

You can create schemas programmatically:

function createSchema(fields) {
  return JSON.stringify({
    "type": "json_schema",
    "json_schema": {
      "name": "document_schema",
      "schema": {
        "type": "object",
        "properties": fields
      }
    }
  });
}

// Usage example:
const schema = createSchema({
  "company_name": {
    "type": "string",
    "description": "Company name"
  },
  "total": {
    "type": "number", 
    "description": "Total amount"
  }
});

💡 Data Types

"string": Text data (names, addresses, etc.)
"number": Numeric data (amounts, quantities, etc.)
"boolean": True/false values
"array": Lists of items
"object": Nested structures
"null": Null values

📝 Best Practices

Always include descriptions: They help the AI understand what to extract
Use specific field names: invoice_date instead of date
Nest related fields: Group related information in objects
Validate your JSON: Use a JSON validator before using the schema
Test with simple schemas first: Start with basic fields before adding complexity

Development

# Run tests
npm test

# Run tests in watch mode
npm run test:watch

# Lint code
npm run lint

# Format code
npm run format

# Clean build artifacts
npm run clean

Project Structure

mcp-upstage-node/
├── src/
│   ├── index.ts           # Entry point
│   ├── server.ts          # MCP server implementation
│   ├── tools/             # Tool implementations
│   │   ├── documentParser.ts
│   │   └── informationExtractor.ts
│   └── utils/             # Utility modules
│       ├── apiClient.ts   # HTTP client with retry
│       ├── fileUtils.ts   # File operations
│       ├── validators.ts  # Input validation
│       └── constants.ts   # Configuration constants
├── dist/                  # Compiled JavaScript (generated)
├── package.json
├── tsconfig.json
└── README.md

Output Files

Results are saved to:

Document parsing: ~/.mcp-upstage/outputs/document_parsing/
Information extraction: ~/.mcp-upstage/outputs/information_extraction/
Generated schemas: ~/.mcp-upstage/outputs/information_extraction/schemas/

License

MIT