Vision MCP

An MCP server that provides vision capabilities to LLMs via GLM-4.6V, SiliconFlow, and ModelScope. It lets LLMs that lack native vision support, or whose own vision models are expensive, access cost-effective image analysis.

Features

🤖 Multiple Model Support: GLM-4.6V, SiliconFlow, and ModelScope vision models
🖼️ Flexible Image Input: URL, base64 data URL, or local file paths
📊 Multiple Analysis Types: Image description, UI analysis, object detection, OCR, and structured extraction
🔧 System Prompt Templates: Built-in templates for common vision tasks
📦 Easy Deployment: STDIO MCP Server, runs with npx
🔒 Secure: Environment-based configuration, sensitive data masking in logs

Streaming Response Support

Current adapters explicitly disable streaming responses (stream: false) and are designed for complete JSON responses. This ensures compatibility with both GLM-4.6V and SiliconFlow APIs.

Note: Streaming-only providers are not currently supported. If a provider only supports streaming responses (Server-Sent Events/text/event-stream format), the adapter will fail as it expects a complete JSON response. To add support for streaming providers, a streaming response parser would need to be implemented.
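As an illustration of the failure mode described above, an adapter could check the response Content-Type before parsing and raise a descriptive error when it receives an event stream. This is a minimal sketch; the helper name `assertCompleteJsonResponse` is hypothetical and is not taken from the package.

```typescript
// Hypothetical helper, not part of the published package.
// Rejects Server-Sent Events responses so the caller gets a clear error
// instead of a JSON parse failure.
function assertCompleteJsonResponse(contentType: string | null): void {
  // Drop parameters such as "; charset=utf-8" before comparing.
  const [mediaType = ""] = (contentType ?? "").split(";");
  const type = mediaType.trim().toLowerCase();
  if (type === "text/event-stream") {
    throw new Error(
      "Provider returned a streaming (SSE) response; only complete JSON responses are supported."
    );
  }
  if (type !== "application/json") {
    throw new Error(`Unexpected Content-Type: ${type || "(missing)"}`);
  }
}
```

A caller would pass the value of the response's Content-Type header and only parse the body as JSON if the helper returns without throwing.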

Quick Start

Installation

Clone or download this repository
Install dependencies:

cd vision_mcp
npm install

Configuration

Create a .env file in the project root:

Option 1: GLM-4.6V

VISION_MODEL_TYPE=glm-4.6v
VISION_MODEL_NAME=glm-4.6v
VISION_API_BASE_URL=https://open.bigmodel.cn/api/paas/v4
VISION_API_KEY=your-glm-api-key

Option 2: SiliconFlow

VISION_MODEL_TYPE=siliconflow
VISION_MODEL_NAME=Qwen/Qwen2-VL-72B-Instruct
VISION_API_BASE_URL=https://api.siliconflow.cn/v1
VISION_API_KEY=your-siliconflow-api-key

Option 3: ModelScope API-Inference

VISION_MODEL_TYPE=modelscope
VISION_MODEL_NAME=ZhipuAI/GLM-4.6V
VISION_API_BASE_URL=https://api-inference.modelscope.cn/v1
VISION_API_KEY=your-modelscope-token

Note: ModelScope requires:

Real-name authentication on your ModelScope account
Aliyun account binding

API usage limits also apply (see API Limits).

Build

npm run build

Run (local)

node dist/index.js

If the server starts successfully, it prints "Vision MCP Server is running on stdio" to stderr.

Run (npx)

# Local package (requires build first)
npx .

# Published package
npx -y @lutery/vision-mcp

MCP Client Configuration

Claude Desktop

Add to your Claude Desktop configuration:

{
  "mcpServers": {
    "vision-mcp": {
      "command": "npx",
      "args": ["-y", "@lutery/vision-mcp"],
      "env": {
        "VISION_MODEL_TYPE": "glm-4.6v",
        "VISION_MODEL_NAME": "glm-4.6v",
        "VISION_API_BASE_URL": "https://open.bigmodel.cn/api/paas/v4",
        "VISION_API_KEY": "your-api-key"
      }
    }
  }
}

Or with a local installation:

{
  "mcpServers": {
    "vision-mcp": {
      "command": "node",
      "args": ["/path/to/vision_mcp/dist/index.js"],
      "env": {
        "VISION_MODEL_TYPE": "glm-4.6v",
        "VISION_MODEL_NAME": "glm-4.6v",
        "VISION_API_BASE_URL": "https://open.bigmodel.cn/api/paas/v4",
        "VISION_API_KEY": "your-api-key"
      }
    }
  }
}

Cursor/Codex CLI

Other MCP-compatible clients use a similar configuration.

Using the Tools

1. Analyze Image

Main tool for image analysis:

// Tool: analyze_image
// Parameters:
{
  "image": "https://example.com/image.jpg",        // Image URL, base64, or local path
  "prompt": "Describe this UI design in detail",   // Analysis prompt
  "output_format": "text",                          // Optional: "text" or "json"
  "template": "ui-analysis"                         // Optional: see templates below
}
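The parameter listing above can be expressed as a type plus a small validator. This is a sketch inferred from the documented parameters only; the names `AnalyzeImageArgs` and `parseAnalyzeImageArgs` are hypothetical, and the package's real validation may differ.

```typescript
// Assumed shape of the analyze_image arguments, derived from the README.
type OutputFormat = "text" | "json";

interface AnalyzeImageArgs {
  image: string; // URL, base64 data URL, or local path
  prompt: string;
  output_format?: OutputFormat;
  template?: string;
}

function isOutputFormat(value: unknown): value is OutputFormat {
  return value === "text" || value === "json";
}

// Hypothetical validator: checks required fields and optional field types.
function parseAnalyzeImageArgs(raw: Record<string, unknown>): AnalyzeImageArgs {
  const { image, prompt, output_format, template } = raw;
  if (typeof image !== "string" || image.length === 0) {
    throw new Error("image must be a non-empty string");
  }
  if (typeof prompt !== "string" || prompt.length === 0) {
    throw new Error("prompt must be a non-empty string");
  }
  const args: AnalyzeImageArgs = { image, prompt };
  if (output_format !== undefined) {
    if (!isOutputFormat(output_format)) {
      throw new Error('output_format must be "text" or "json"');
    }
    args.output_format = output_format;
  }
  if (template !== undefined) {
    if (typeof template !== "string") {
      throw new Error("template must be a string");
    }
    args.template = template;
  }
  return args;
}
```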

Example Prompts

UI Analysis:

{
  "image": "./screenshot.png",
  "prompt": "Analyze this UI design and extract all UI components with their positions and styles",
  "template": "ui-analysis"
}

Object Detection:

{
  "image": "https://example.com/photo.jpg",
  "prompt": "Detect all objects and provide their coordinates",
  "template": "object-detection"
}

OCR:

{
  "image": "data:image/png;base64,iVBORw0KGgo...",
  "prompt": "Extract all text from this image",
  "template": "ocr"
}

Structured Extraction:

{
  "image": "./form.jpg",
  "prompt": "Extract all form fields and values as JSON",
  "output_format": "json"
}

2. List Templates

List available system prompt templates:

// Tool: list_templates
// Parameters: none

Available templates:

general-description - General image description
ui-analysis - UI prototype and interface analysis
object-detection - Object detection and localization
ocr - Text extraction (OCR)
structured-extraction - Structured data extraction

3. Get Config

Get current model configuration:

// Tool: get_config
// Parameters: none

Image Input Formats

1. URL

https://example.com/image.jpg

2. Base64 Data URL

data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...

3. Local File Path

/path/to/image.png
./relative/path/image.jpg

Note: Local paths only work if the MCP server has access to the filesystem.

Note: URL validation is strict by default (see VISION_STRICT_URL_VALIDATION).
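The three input formats can be told apart by their prefix. The sketch below shows one plausible way to do that; `classifyImageInput` is a hypothetical name, and treating anything that is neither a data URL nor an http(s) URL as a local path is an assumption rather than documented behavior.

```typescript
type ImageInputKind = "url" | "data-url" | "local-path";

// Hypothetical classifier for the image parameter.
function classifyImageInput(input: string): ImageInputKind {
  const value = input.trim();
  // Base64 data URLs start with "data:image/<subtype>;base64,".
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(value)) return "data-url";
  // Remote images use http:// or https://.
  if (/^https?:\/\//i.test(value)) return "url";
  // Assumption: everything else is treated as a filesystem path.
  return "local-path";
}
```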

Environment Variables

Variable	Description	Default	Required
`VISION_MODEL_TYPE`	Model type: `glm` (alias for `glm-4.6v`), `glm-4.6v`, `siliconflow`, or `modelscope`	-	Yes
`VISION_MODEL_NAME`	Model name for the API	See defaults below	Yes
`VISION_API_BASE_URL`	API base URL (must be base path, no `/chat/completions`)	See defaults below	Yes
`VISION_API_KEY`	API key for authentication	-	Yes
`VISION_API_TIMEOUT`	Request timeout in milliseconds	60000	No
`VISION_MAX_RETRIES`	Maximum retry attempts	2	No
`VISION_STRICT_URL_VALIDATION`	Enforce strict image URL validation	`true`	No
`LOG_LEVEL`	Log level: `debug`, `info`, `warn`, `error`	`info`	No

Notes:

VISION_STRICT_URL_VALIDATION defaults to true, enforcing strict validation that URLs must end with supported image extensions (.jpg, .jpeg, .png, .webp). Set to false to allow non-image URLs with a warning only.
For GLM-4.6V provider, both glm and glm-4.6v values work for VISION_MODEL_TYPE. glm is provided as a convenient alias.
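The strict and lenient behaviors described in these notes could look roughly like the following. This is a sketch under assumptions: `checkImageUrl` is a hypothetical name, and ignoring the query string and fragment before checking the extension is a choice made here, not something the README specifies.

```typescript
const SUPPORTED_IMAGE_EXTENSIONS = [".jpg", ".jpeg", ".png", ".webp"];

interface ImageUrlCheck {
  ok: boolean;
  warning?: string;
}

// Hypothetical validator mirroring VISION_STRICT_URL_VALIDATION.
function checkImageUrl(url: string, strict: boolean): ImageUrlCheck {
  // Assumption: drop "?query" and "#fragment" before looking at the extension.
  const [path = ""] = url.split(/[?#]/, 1);
  const lowerPath = path.toLowerCase();
  const hasImageExtension = SUPPORTED_IMAGE_EXTENSIONS.some((ext) => lowerPath.endsWith(ext));
  if (hasImageExtension) return { ok: true };
  if (strict) return { ok: false };
  // Lenient mode accepts the URL but reports a warning.
  return { ok: true, warning: `URL does not end with a supported image extension: ${url}` };
}
```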

Model Defaults

GLM-4.6V:

VISION_MODEL_NAME=glm-4.6v
VISION_API_BASE_URL=https://open.bigmodel.cn/api/paas/v4

SiliconFlow:

VISION_MODEL_NAME=Qwen/Qwen2-VL-72B-Instruct
VISION_API_BASE_URL=https://api.siliconflow.cn/v1
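To show how the variables and defaults above fit together, here is a minimal configuration loader. It is a sketch, not the package's `model-config.ts`: the names `loadVisionConfig` and `VisionConfig` are hypothetical, and falling back to the listed defaults when a name or base URL is unset is an assumption. No default is listed for ModelScope, so the sketch requires explicit values there.

```typescript
// Hypothetical configuration loader; names and fallback behavior are assumptions.
const MODEL_TYPES = ["glm-4.6v", "siliconflow", "modelscope"] as const;
type ModelType = (typeof MODEL_TYPES)[number];

interface VisionConfig {
  modelType: ModelType;
  modelName: string;
  apiBaseUrl: string;
  apiKey: string;
  timeoutMs: number;
  maxRetries: number;
}

// Defaults copied from the "Model Defaults" listing; none is listed for ModelScope.
const MODEL_DEFAULTS: Partial<Record<ModelType, { modelName: string; apiBaseUrl: string }>> = {
  "glm-4.6v": { modelName: "glm-4.6v", apiBaseUrl: "https://open.bigmodel.cn/api/paas/v4" },
  siliconflow: { modelName: "Qwen/Qwen2-VL-72B-Instruct", apiBaseUrl: "https://api.siliconflow.cn/v1" },
};

function isModelType(value: string | undefined): value is ModelType {
  return MODEL_TYPES.some((t) => t === value);
}

function loadVisionConfig(env: Record<string, string | undefined>): VisionConfig {
  // "glm" is documented as an alias for "glm-4.6v".
  const rawType = env.VISION_MODEL_TYPE === "glm" ? "glm-4.6v" : env.VISION_MODEL_TYPE;
  if (!isModelType(rawType)) {
    throw new Error(`Unsupported VISION_MODEL_TYPE: ${rawType ?? "(unset)"}`);
  }
  const defaults = MODEL_DEFAULTS[rawType];
  const modelName = env.VISION_MODEL_NAME ?? defaults?.modelName;
  const apiBaseUrl = env.VISION_API_BASE_URL ?? defaults?.apiBaseUrl;
  const apiKey = env.VISION_API_KEY;
  if (!modelName || !apiBaseUrl || !apiKey) {
    throw new Error("VISION_MODEL_NAME, VISION_API_BASE_URL, and VISION_API_KEY must be set");
  }
  return {
    modelType: rawType,
    modelName,
    apiBaseUrl,
    apiKey,
    // Documented defaults: 60000 ms timeout, 2 retries.
    timeoutMs: Number(env.VISION_API_TIMEOUT ?? 60000),
    maxRetries: Number(env.VISION_MAX_RETRIES ?? 2),
  };
}
```

In a Node.js process the function would typically be called with `process.env`; taking the environment as a parameter keeps the sketch easy to test.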

API Keys

GLM-4.6V

Get your API key from: Zhipu AI Open Platform (智谱 AI 开放平台)

Format: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.xxxxxxxxxxxxxxxxxxxx

SiliconFlow

Get your API key from: SiliconFlow

Format: sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

MCP Protocol Note

IMPORTANT: This is a STDIO-based MCP server. According to the MCP protocol:

DO NOT use console.log() or write to stdout
USE ONLY console.error() for logging (stderr)
stdout is reserved for JSON-RPC communication

The server handles this automatically. If you fork this project, ensure you follow this rule.
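For forks, a stderr-only logger is a simple way to follow this rule. The sketch below is an assumption about how such a logger could look, not the contents of `logger.ts`; `createLogger` and the `[LEVEL] message` line format are hypothetical.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

const LEVEL_ORDER: Record<LogLevel, number> = { debug: 0, info: 1, warn: 2, error: 3 };

// Hypothetical logger factory. The default sink is console.error, which writes
// to stderr and leaves stdout free for JSON-RPC messages.
function createLogger(
  minLevel: LogLevel,
  sink: (line: string) => void = (line) => console.error(line),
): (level: LogLevel, message: string) => void {
  return (level, message) => {
    // Skip messages below the configured LOG_LEVEL.
    if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return;
    sink(`[${level.toUpperCase()}] ${message}`);
  };
}
```

The injectable sink exists only to make the logger testable; production code would use the default.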

Development

Project Structure

vision_mcp/
├── src/
│   ├── index.ts              # MCP Server entry point
│   ├── config/
│   │   └── model-config.ts   # Configuration management
│   ├── tools/
│   │   └── vision-tool.ts    # Vision analysis tool
│   ├── adapters/
│   │   ├── base-adapter.ts   # Base adapter class
│   │   ├── glm-adapter.ts    # GLM-4.6V adapter
│   │   └── siliconflow-adapter.ts  # SiliconFlow adapter
│   ├── prompts/
│   │   └── system.ts         # System prompt templates
│   └── utils/
│       ├── errors.ts         # Error handling
│       ├── logger.ts         # Logging utilities
│       └── image-input.ts    # Image input normalization
├── package.json
├── tsconfig.json
└── README.md

Building

# Install dependencies
npm install

# Build TypeScript
npm run build

# Run tests
npm test

Testing Notes

npm test uses VISION_API_KEY (default) or provider-specific keys in the test script:
- SILICONFLOW_API_KEY
- GLM_API_KEY
If no API key is set, the tests will exit with a clear error message.
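The key lookup described above might be sketched as follows. The function name `resolveTestApiKey` is hypothetical, and checking `VISION_API_KEY` before the provider-specific key is an assumed precedence; the actual test script may order them differently.

```typescript
// Hypothetical key lookup for the test script.
function resolveTestApiKey(
  provider: "glm" | "siliconflow",
  env: Record<string, string | undefined>,
): string {
  const providerKey = provider === "glm" ? env.GLM_API_KEY : env.SILICONFLOW_API_KEY;
  // Assumption: the generic key wins when both are set.
  const key = env.VISION_API_KEY || providerKey;
  if (!key) {
    throw new Error("No API key set: define VISION_API_KEY, GLM_API_KEY, or SILICONFLOW_API_KEY");
  }
  return key;
}
```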

Troubleshooting

1. "Failed to load model configuration"

Check all required environment variables are set
Verify VISION_MODEL_TYPE is one of glm, glm-4.6v, siliconflow, or modelscope

2. "API Key not found"

Set VISION_API_KEY in your environment
Verify the API key format matches the model requirements

3. "Connection timeout"

Increase VISION_API_TIMEOUT value
Check network connectivity to the API endpoint
Verify API endpoint URL is correct

4. "Invalid image URL"

Ensure URL is publicly accessible
Check URL format (http:// or https://)
Verify image format is supported

5. "Permission denied reading file"

MCP server needs filesystem access for local files
Use absolute paths or ensure relative paths are accessible
Check file permissions

6. "Invalid API endpoint" or "404 Not Found"

Ensure VISION_API_BASE_URL is the base path only, without /chat/completions
Correct: https://api.siliconflow.cn/v1
Incorrect: https://api.siliconflow.cn/v1/chat/completions
Check the error details for the full request URL to diagnose endpoint issues
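The base-path rule exists because the server appends the endpoint path itself. A sketch of that step, assuming the appended path is `/chat/completions` as the examples above imply; `buildChatCompletionsUrl` is a hypothetical name.

```typescript
// Hypothetical helper: derives the request URL from VISION_API_BASE_URL.
function buildChatCompletionsUrl(baseUrl: string): string {
  // Tolerate trailing slashes such as "https://host/v1/".
  const trimmed = baseUrl.trim().replace(/\/+$/, "");
  if (/\/chat\/completions$/i.test(trimmed)) {
    throw new Error("VISION_API_BASE_URL must be the base path only, without /chat/completions");
  }
  return `${trimmed}/chat/completions`;
}
```

Rejecting a base URL that already contains the endpoint avoids requests to a doubled path such as `/v1/chat/completions/chat/completions`, which is a likely cause of the 404 described in this item.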

Security Notes

API keys are loaded from environment variables, never hardcoded
API keys are masked in logs
Images are not persisted by default
MCP server should run in trusted environments only (no built-in auth)
Thinking/Reasoning Content Filtering: Model thinking/reasoning content is automatically filtered from responses to prevent exposing internal reasoning to MCP clients. This filtering is unconditional and applied to all supported models regardless of configuration.
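The README does not document the exact masking format used in logs. As a sketch of the idea, the hypothetical `maskSecret` helper below keeps a few leading and trailing characters and replaces the rest with asterisks; short values are masked entirely.

```typescript
// Hypothetical masking helper; the package's real format may differ.
function maskSecret(secret: string, visible = 4): string {
  // Too short to reveal anything safely: mask every character.
  if (secret.length <= visible * 2) return "*".repeat(secret.length);
  const hidden = "*".repeat(secret.length - visible * 2);
  return `${secret.slice(0, visible)}${hidden}${secret.slice(-visible)}`;
}
```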

Security Best Practices

⚠️ IMPORTANT: Never commit API keys or credentials to the repository!

Use environment variables for sensitive data (.env file)
Keep local test credentials in .gitignore'd files (e.g., test_key.local.md)
Rotate keys immediately if accidentally exposed or committed
See doc/test_key.example.md for test setup template
Never copy real API keys into documentation, code comments, or issue trackers

Key Protection Checklist:

.env is in .gitignore
.env.local is in .gitignore
No real keys in test_key.md (use test_key.example.md instead)
No keys in documentation or comments
Review git history for accidental key commits (git log --all --full-history --source -- "*secret*" "*key*" "*password*" "test_key.md")

License

MIT

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

Support

For issues and questions:

Open an issue on the repository
Check model documentation:
- GLM-4.6V Docs
- SiliconFlow Docs

TODO

Adapt the request handling to ModelScope's vision model API: https://www.modelscope.cn/docs/model-service/API-Inference/intro