@lutery/vision-mcp

1.0.0

MCP Server providing vision capabilities for LLMs via GLM-4.6V, SiliconFlow, and ModelScope

    Vision MCP

    MCP Server providing vision capabilities for LLMs via GLM-4.6V, SiliconFlow, and ModelScope. This server enables LLMs without native vision support or with expensive vision models to access cost-effective visual analysis capabilities.

    Features

    • 🤖 Multiple Model Support: GLM-4.6V, SiliconFlow, and ModelScope vision models
    • 🖼️ Flexible Image Input: URL, base64 data URL, or local file paths
    • 📊 Multiple Analysis Types: Image description, UI analysis, object detection, OCR, and structured extraction
    • 🔧 System Prompt Templates: Built-in templates for common vision tasks
    • 📦 Easy Deployment: STDIO MCP Server, runs with npx
    • 🔒 Secure: Environment-based configuration, sensitive data masking in logs

    Streaming Response Support

    Current adapters explicitly disable streaming responses (stream: false) and are designed for complete JSON responses. This ensures compatibility with both GLM-4.6V and SiliconFlow APIs.

    Note: Streaming-only providers are not currently supported. If a provider only supports streaming responses (Server-Sent Events/text/event-stream format), the adapter will fail as it expects a complete JSON response. To add support for streaming providers, a streaming response parser would need to be implemented.
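The failure mode in the note above can be sketched as follows, assuming an adapter inspects the Content-Type header before parsing. The names classifyContentType and parseCompleteResponse are hypothetical and are not taken from the package source.

```typescript
// Hypothetical helpers, not the package's actual adapter code.
type ResponseKind = "json" | "event-stream" | "unknown";

function classifyContentType(contentType: string | null): ResponseKind {
  // Drop parameters such as "; charset=utf-8" and compare case-insensitively.
  const [mediaType = ""] = (contentType ?? "").toLowerCase().split(";");
  const normalized = mediaType.trim();
  if (normalized === "application/json") return "json";
  if (normalized === "text/event-stream") return "event-stream";
  return "unknown";
}

function parseCompleteResponse(contentType: string | null, body: string): unknown {
  // A Server-Sent Events body is a sequence of "data:" lines, not one JSON document.
  if (classifyContentType(contentType) === "event-stream") {
    throw new Error("Streaming (SSE) responses are not supported; expected a complete JSON body");
  }
  return JSON.parse(body);
}
```

Rejecting the stream up front gives a clearer error than letting JSON.parse fail on the first "data:" line.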

    Quick Start

    Installation

    1. Clone or download this repository
    2. Install dependencies:
    cd vision_mcp
    npm install

    Configuration

    Create a .env file in the project root:

    Option 1: GLM-4.6V

    VISION_MODEL_TYPE=glm-4.6v
    VISION_MODEL_NAME=glm-4.6v
    VISION_API_BASE_URL=https://open.bigmodel.cn/api/paas/v4
    VISION_API_KEY=your-glm-api-key

    Option 2: SiliconFlow

    VISION_MODEL_TYPE=siliconflow
    VISION_MODEL_NAME=Qwen/Qwen2-VL-72B-Instruct
    VISION_API_BASE_URL=https://api.siliconflow.cn/v1
    VISION_API_KEY=your-siliconflow-api-key

    Option 3: ModelScope API-Inference

    VISION_MODEL_TYPE=modelscope
    VISION_MODEL_NAME=ZhipuAI/GLM-4.6V
    VISION_API_BASE_URL=https://api-inference.modelscope.cn/v1
    VISION_API_KEY=your-modelscope-token

    Note: ModelScope requires:

    • Real-name authentication on your ModelScope account
    • Aliyun account binding
    • API usage limits apply (see API Limits)

    Build

    npm run build

    Run (local)

    node dist/index.js

    If successful, the server prints "Vision MCP Server is running on stdio" to stderr.

    Run (npx)

    # Local package (requires build first)
    npx .
    
    # Published package
    npx -y @lutery/vision-mcp

    MCP Client Configuration

    Claude Desktop

    Add to your Claude Desktop configuration:

    {
      "mcpServers": {
        "vision-mcp": {
          "command": "npx",
          "args": ["-y", "@lutery/vision-mcp"],
          "env": {
            "VISION_MODEL_TYPE": "glm-4.6v",
            "VISION_MODEL_NAME": "glm-4.6v",
            "VISION_API_BASE_URL": "https://open.bigmodel.cn/api/paas/v4",
            "VISION_API_KEY": "your-api-key"
          }
        }
      }
    }

    Or with a local installation:

    {
      "mcpServers": {
        "vision-mcp": {
          "command": "node",
          "args": ["/path/to/vision_mcp/dist/index.js"],
          "env": {
            "VISION_MODEL_TYPE": "glm-4.6v",
            "VISION_MODEL_NAME": "glm-4.6v",
            "VISION_API_BASE_URL": "https://open.bigmodel.cn/api/paas/v4",
            "VISION_API_KEY": "your-api-key"
          }
        }
      }
    }

    Cursor/Codex CLI

    Other MCP-compatible clients use a similar configuration: the same command, args, and env values, in whatever format the client expects.

    Using the Tools

    1. Analyze Image

    Main tool for image analysis:

    // Tool: analyze_image
    // Parameters:
    {
      "image": "https://example.com/image.jpg",        // Image URL, base64, or local path
      "prompt": "Describe this UI design in detail",   // Analysis prompt
      "output_format": "text",                          // Optional: "text" or "json"
      "template": "ui-analysis"                         // Optional: see templates below
    }
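As a rough illustration, the parameter listing above could be typed and checked like this. The field names come from the listing and the template names from the List Templates section of this README. The validator itself (validateAnalyzeImageArgs) is a hypothetical sketch, and the server's real validation may differ, for example in how it treats unknown templates.

```typescript
// Argument shape taken from the analyze_image parameter listing.
type OutputFormat = "text" | "json";

interface AnalyzeImageArgs {
  image: string;                // URL, base64 data URL, or local path
  prompt: string;               // analysis prompt
  output_format?: OutputFormat; // optional
  template?: string;            // optional template name
}

// Template names as listed in this README.
const KNOWN_TEMPLATES: readonly string[] = [
  "general-description",
  "ui-analysis",
  "object-detection",
  "ocr",
  "structured-extraction",
];

// Hypothetical validator: returns a list of problems, empty when the arguments look valid.
function validateAnalyzeImageArgs(args: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const image = args["image"];
  const prompt = args["prompt"];
  const outputFormat = args["output_format"];
  const template = args["template"];
  if (typeof image !== "string" || image.length === 0) {
    problems.push("image must be a non-empty string");
  }
  if (typeof prompt !== "string" || prompt.length === 0) {
    problems.push("prompt must be a non-empty string");
  }
  if (outputFormat !== undefined && outputFormat !== "text" && outputFormat !== "json") {
    problems.push('output_format must be "text" or "json"');
  }
  if (template !== undefined && !(typeof template === "string" && KNOWN_TEMPLATES.indexOf(template) !== -1)) {
    problems.push("template is not one of the listed templates");
  }
  return problems;
}
```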

    Example Prompts

    UI Analysis:

    {
      "image": "./screenshot.png",
      "prompt": "Analyze this UI design and extract all UI components with their positions and styles",
      "template": "ui-analysis"
    }

    Object Detection:

    {
      "image": "https://example.com/photo.jpg",
      "prompt": "Detect all objects and provide their coordinates",
      "template": "object-detection"
    }

    OCR:

    {
      "image": "data:image/png;base64,iVBORw0KGgo...",
      "prompt": "Extract all text from this image",
      "template": "ocr"
    }

    Structured Extraction:

    {
      "image": "./form.jpg",
      "prompt": "Extract all form fields and values as JSON",
      "output_format": "json"
    }

    2. List Templates

    List available system prompt templates:

    // Tool: list_templates
    // Parameters: none

    Available templates:

    • general-description - General image description
    • ui-analysis - UI prototype and interface analysis
    • object-detection - Object detection and localization
    • ocr - Text extraction (OCR)
    • structured-extraction - Structured data extraction

    3. Get Config

    Get current model configuration:

    // Tool: get_config
    // Parameters: none

    Image Input Formats

    1. URL

    https://example.com/image.jpg

    2. Base64 Data URL

    data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...

    3. Local File Path

    /path/to/image.png
    ./relative/path/image.jpg

    Note: Local paths only work if the MCP server has access to the filesystem.

    Note: URL validation is strict by default (see VISION_STRICT_URL_VALIDATION).
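The three input formats can be told apart from the string alone. The sketch below shows one way to do that. classifyImageInput is a hypothetical name, and treating everything that is neither a data URL nor an http(s) URL as a local path is an assumption, not documented behavior.

```typescript
type ImageInputKind = "url" | "data-url" | "local-path";

// Hypothetical classifier for the three documented input formats.
function classifyImageInput(input: string): ImageInputKind {
  const trimmed = input.trim();
  // Base64 data URL, e.g. data:image/png;base64,....
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(trimmed)) return "data-url";
  // Remote image over HTTP or HTTPS.
  if (/^https?:\/\//i.test(trimmed)) return "url";
  // Assumption: anything else is treated as a path on the server's filesystem.
  return "local-path";
}
```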

    Environment Variables

    | Variable | Description | Default | Required |
    | --- | --- | --- | --- |
    | VISION_MODEL_TYPE | Model type: glm (alias for glm-4.6v), glm-4.6v, siliconflow, or modelscope | - | Yes |
    | VISION_MODEL_NAME | Model name for the API | See defaults below | Yes |
    | VISION_API_BASE_URL | API base URL (must be base path, no /chat/completions) | See defaults below | Yes |
    | VISION_API_KEY | API key for authentication | - | Yes |
    | VISION_API_TIMEOUT | Request timeout in milliseconds | 60000 | No |
    | VISION_MAX_RETRIES | Maximum retry attempts | 2 | No |
    | VISION_STRICT_URL_VALIDATION | Enforce strict image URL validation | true | No |
    | LOG_LEVEL | Log level: debug, info, warn, error | info | No |

    Notes:

    • VISION_STRICT_URL_VALIDATION defaults to true, enforcing strict validation that URLs must end with supported image extensions (.jpg, .jpeg, .png, .webp). Set to false to allow non-image URLs with a warning only.
    • For GLM-4.6V provider, both glm and glm-4.6v values work for VISION_MODEL_TYPE. glm is provided as a convenient alias.
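The strict and non-strict behavior described in the first note might look roughly like this. checkImageUrl is a hypothetical name, and ignoring the query string and fragment before checking the extension is an assumption that the README does not confirm.

```typescript
// Extensions listed in the note above.
const SUPPORTED_IMAGE_EXTENSION = /\.(jpg|jpeg|png|webp)$/i;

interface UrlCheckResult {
  ok: boolean;
  warning?: string;
}

// Hypothetical check: strict mode rejects, non-strict mode accepts with a warning.
function checkImageUrl(rawUrl: string, strict: boolean): UrlCheckResult {
  // Assumption: the query string and fragment are ignored when checking the extension.
  const [pathPart = ""] = rawUrl.split(/[?#]/);
  if (SUPPORTED_IMAGE_EXTENSION.test(pathPart)) return { ok: true };
  if (strict) return { ok: false };
  return { ok: true, warning: "URL does not end with a supported image extension: " + rawUrl };
}
```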

    Model Defaults

    GLM-4.6V:

    VISION_MODEL_NAME=glm-4.6v
    VISION_API_BASE_URL=https://open.bigmodel.cn/api/paas/v4

    SiliconFlow:

    VISION_MODEL_NAME=Qwen/Qwen2-VL-72B-Instruct
    VISION_API_BASE_URL=https://api.siliconflow.cn/v1
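Putting the variable table and the defaults together, configuration loading could be sketched as below. loadConfig and VisionConfig are hypothetical names. Falling back to the listed defaults when VISION_MODEL_NAME or VISION_API_BASE_URL is unset is an assumption, since the table marks both as required. No defaults are listed for modelscope, so none are assumed here.

```typescript
type ModelType = "glm-4.6v" | "siliconflow" | "modelscope";

interface VisionConfig {
  modelType: ModelType;
  modelName: string;
  apiBaseUrl: string;
  apiKey: string;
  timeoutMs: number;
  maxRetries: number;
}

// Defaults copied from the "Model Defaults" listing above.
const MODEL_DEFAULTS: Partial<Record<ModelType, { modelName: string; apiBaseUrl: string }>> = {
  "glm-4.6v": { modelName: "glm-4.6v", apiBaseUrl: "https://open.bigmodel.cn/api/paas/v4" },
  siliconflow: { modelName: "Qwen/Qwen2-VL-72B-Instruct", apiBaseUrl: "https://api.siliconflow.cn/v1" },
};

function isModelType(value: string | undefined): value is ModelType {
  return value === "glm-4.6v" || value === "siliconflow" || value === "modelscope";
}

// Hypothetical loader; pass process.env (or any plain object) as `env`.
function loadConfig(env: Record<string, string | undefined>): VisionConfig {
  const rawType = env["VISION_MODEL_TYPE"];
  const modelType = rawType === "glm" ? "glm-4.6v" : rawType; // "glm" is an alias
  if (!isModelType(modelType)) {
    throw new Error("Unsupported VISION_MODEL_TYPE: " + String(rawType));
  }
  const defaults = MODEL_DEFAULTS[modelType];
  const modelName = env["VISION_MODEL_NAME"] ?? defaults?.modelName;
  const apiBaseUrl = env["VISION_API_BASE_URL"] ?? defaults?.apiBaseUrl;
  const apiKey = env["VISION_API_KEY"];
  if (!modelName || !apiBaseUrl || !apiKey) {
    throw new Error("Missing required configuration (model name, base URL, or API key)");
  }
  return {
    modelType,
    modelName,
    apiBaseUrl,
    apiKey,
    timeoutMs: Number(env["VISION_API_TIMEOUT"] ?? 60000),
    maxRetries: Number(env["VISION_MAX_RETRIES"] ?? 2),
  };
}
```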

    API Keys

    GLM-4.6V

    Get your API key from: Zhipu AI Open Platform (智谱 AI 开放平台)

    Format: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.xxxxxxxxxxxxxxxxxxxx

    SiliconFlow

    Get your API key from: SiliconFlow

    Format: sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    MCP Protocol Note

    IMPORTANT: This is a STDIO-based MCP Server. According to the MCP protocol:

    • DO NOT use console.log() or write to stdout
    • USE ONLY console.error() for logging (stderr)
    • stdout is reserved for JSON-RPC communication

    The server handles this automatically. If you fork this project, ensure you follow this rule.
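For forks, a logger that respects this rule can be very small. The sketch below is an illustration only: createLogger and maskSecret are hypothetical names, and the masking rule (keep the first four characters) is an assumption rather than the package's actual behavior.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

const LEVEL_ORDER: Record<LogLevel, number> = { debug: 0, info: 1, warn: 2, error: 3 };

// Hypothetical masking rule: keep the first four characters and hide the rest.
function maskSecret(secret: string): string {
  return secret.length <= 4 ? "****" : secret.slice(0, 4) + "****";
}

// The default sink is console.error, so nothing is ever written to stdout,
// which stays reserved for JSON-RPC messages.
function createLogger(
  minLevel: LogLevel,
  write: (line: string) => void = (line) => console.error(line),
): (level: LogLevel, message: string) => void {
  return (level, message) => {
    if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return;
    write("[" + level + "] " + message);
  };
}
```

A call such as `createLogger("info")("info", "Using key " + maskSecret(apiKey))` would then log only the masked form of the key.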

    Development

    Project Structure

    vision_mcp/
    ├── src/
    │   ├── index.ts              # MCP Server entry point
    │   ├── config/
    │   │   └── model-config.ts   # Configuration management
    │   ├── tools/
    │   │   └── vision-tool.ts    # Vision analysis tool
    │   ├── adapters/
    │   │   ├── base-adapter.ts   # Base adapter class
    │   │   ├── glm-adapter.ts    # GLM-4.6V adapter
    │   │   └── siliconflow-adapter.ts  # SiliconFlow adapter
    │   ├── prompts/
    │   │   └── system.ts         # System prompt templates
    │   └── utils/
    │       ├── errors.ts         # Error handling
    │       ├── logger.ts         # Logging utilities
    │       └── image-input.ts    # Image input normalization
    ├── package.json
    ├── tsconfig.json
    └── README.md

    Building

    # Install dependencies
    npm install
    
    # Build TypeScript
    npm run build
    
    # Run tests
    npm test

    Testing Notes

    • npm test uses VISION_API_KEY (default) or provider-specific keys in the test script:
      • SILICONFLOW_API_KEY
      • GLM_API_KEY
    • If no API key is set, the tests will exit with a clear error message.

    Troubleshooting

    1. "Failed to load model configuration"

    • Check all required environment variables are set
    • Verify VISION_MODEL_TYPE is one of glm, glm-4.6v, siliconflow, or modelscope

    2. "API Key not found"

    • Set VISION_API_KEY in your environment
    • Verify the API key format matches the model requirements

    3. "Connection timeout"

    • Increase VISION_API_TIMEOUT value
    • Check network connectivity to the API endpoint
    • Verify API endpoint URL is correct

    4. "Invalid image URL"

    • Ensure URL is publicly accessible
    • Check URL format (http:// or https://)
    • Verify image format is supported

    5. "Permission denied reading file"

    • MCP server needs filesystem access for local files
    • Use absolute paths or ensure relative paths are accessible
    • Check file permissions

    6. "Invalid API endpoint" or "404 Not Found"

    • Ensure VISION_API_BASE_URL is the base path only, without /chat/completions
    • Correct: https://api.siliconflow.cn/v1
    • Incorrect: https://api.siliconflow.cn/v1/chat/completions
    • Check the error details for the full request URL to diagnose endpoint issues
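The base-path rule suggests that the request path is appended to VISION_API_BASE_URL. A minimal sketch of that joining step, under that assumption, is shown below. buildChatCompletionsUrl is a hypothetical name, not a function from the package.

```typescript
const CHAT_COMPLETIONS_PATH = "/chat/completions";

// Hypothetical helper: joins the base URL and the request path,
// and fails early when the base URL already contains the path.
function buildChatCompletionsUrl(baseUrl: string): string {
  const trimmed = baseUrl.trim().replace(/\/+$/, ""); // drop trailing slashes
  if (/\/chat\/completions$/.test(trimmed)) {
    throw new Error("VISION_API_BASE_URL must be the base path only, without " + CHAT_COMPLETIONS_PATH);
  }
  return trimmed + CHAT_COMPLETIONS_PATH;
}
```

Failing early on a doubled path turns a confusing 404 from the provider into a configuration error that names the variable to fix.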

    Security Notes

    • API keys are loaded from environment variables, never hardcoded
    • API keys are masked in logs
    • Images are not persisted by default
    • MCP server should run in trusted environments only (no built-in auth)
    • Thinking/Reasoning Content Filtering: Model thinking/reasoning content is automatically filtered from responses to prevent exposing internal reasoning to MCP clients. This filtering is unconditional and applied to all supported models regardless of configuration.
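The README does not say how reasoning content is represented or removed. Purely as an illustration, the sketch below assumes two common conventions: a separate reasoning_content field on the message, and inline `<think>...</think>` blocks in the text. Both conventions and the name stripReasoning are assumptions, not details confirmed by the package.

```typescript
// Assumed message shape; reasoning_content is an assumption, not a documented field.
interface ModelMessage {
  role: string;
  content: string;
  reasoning_content?: string;
}

interface FilteredMessage {
  role: string;
  content: string;
}

// Hypothetical filter: copies only role and content, so any separate reasoning
// field is dropped, and removes inline <think>...</think> blocks (also assumed).
function stripReasoning(message: ModelMessage): FilteredMessage {
  const content = message.content.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
  return { role: message.role, content };
}
```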

    Security Best Practices

    ⚠️ IMPORTANT: Never commit API keys or credentials to the repository!

    • Use environment variables for sensitive data (.env file)
    • Keep local test credentials in .gitignore'd files (e.g., test_key.local.md)
    • Rotate keys immediately if accidentally exposed or committed
    • See doc/test_key.example.md for test setup template
    • Never copy real API keys into documentation, code comments, or issue trackers

    Key Protection Checklist:

    • .env is in .gitignore
    • .env.local is in .gitignore
    • No real keys in test_key.md (use test_key.example.md instead)
    • No keys in documentation or comments
    • Review git history for accidental key commits (git log --all --full-history --source -- "*secret*" "*key*" "*password*" "test_key.md")

    License

    MIT

    Contributing

    1. Fork the repository
    2. Create a feature branch
    3. Make your changes
    4. Add tests
    5. Submit a pull request

    Support

    For issues and questions:

    TODO