# Vision MCP
MCP server providing vision capabilities for LLMs via GLM-4.6V, SiliconFlow, and ModelScope. It gives LLMs without native vision support, or with expensive vision models, access to cost-effective visual analysis.
## Features
- **Multiple Model Support**: GLM-4.6V, SiliconFlow, and ModelScope vision models
- **Flexible Image Input**: URL, base64 data URL, or local file path
- **Multiple Analysis Types**: image description, UI analysis, object detection, OCR, and structured extraction
- **System Prompt Templates**: built-in templates for common vision tasks
- **Easy Deployment**: STDIO MCP server, runs with npx
- **Secure**: environment-based configuration, sensitive data masked in logs
## Streaming Response Support
The current adapters explicitly disable streaming (`stream: false`) and expect a complete JSON response. This keeps them compatible with both the GLM-4.6V and SiliconFlow APIs.

**Note:** Streaming-only providers are not currently supported. If a provider only returns Server-Sent Events (`text/event-stream`), the adapter will fail because it expects one complete JSON body. Supporting such providers would require implementing a streaming response parser.
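To make the gap concrete, here is a minimal sketch of what such a parser could look like, assuming the common OpenAI-style SSE convention (`data: {json}` lines terminated by `data: [DONE]`). The function name is hypothetical; nothing like it exists in the package today.

```typescript
// Sketch only: accumulate `delta.content` from OpenAI-style SSE lines.
// The package's adapters currently use `stream: false` and never call this.
function parseSseChunks(raw: string): string {
  let text = "";
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") break;
    try {
      const event = JSON.parse(payload);
      text += event.choices?.[0]?.delta?.content ?? "";
    } catch {
      // Ignore keep-alive comments and partial lines.
    }
  }
  return text;
}
```

A real implementation would additionally have to buffer partial lines across network chunks.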
## Quick Start

### Installation

1. Clone or download this repository
2. Install dependencies:

```shell
cd vision_mcp
npm install
```

### Configuration

Create a `.env` file in the project root:
**Option 1: GLM-4.6V**

```shell
VISION_MODEL_TYPE=glm-4.6v
VISION_MODEL_NAME=glm-4.6v
VISION_API_BASE_URL=https://open.bigmodel.cn/api/paas/v4
VISION_API_KEY=your-glm-api-key
```

**Option 2: SiliconFlow**

```shell
VISION_MODEL_TYPE=siliconflow
VISION_MODEL_NAME=Qwen/Qwen2-VL-72B-Instruct
VISION_API_BASE_URL=https://api.siliconflow.cn/v1
VISION_API_KEY=your-siliconflow-api-key
```

**Option 3: ModelScope API-Inference**

```shell
VISION_MODEL_TYPE=modelscope
VISION_MODEL_NAME=ZhipuAI/GLM-4.6V
VISION_API_BASE_URL=https://api-inference.modelscope.cn/v1
VISION_API_KEY=your-modelscope-token
```

**Note:** ModelScope requires:

- Real-name authentication on your ModelScope account
- Aliyun account binding

API usage limits also apply (see API Limits).
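These variables are validated when the server starts. The following is an illustrative sketch of that kind of check, not the package's actual code (the real logic lives in `src/config/model-config.ts` and its names may differ):

```typescript
// Illustrative startup validation for the environment variables above.
const VALID_TYPES = ["glm", "glm-4.6v", "siliconflow", "modelscope"];

function validateVisionEnv(env: Record<string, string | undefined>): string[] {
  const errors: string[] = [];
  const type = env.VISION_MODEL_TYPE;
  if (!type || !VALID_TYPES.includes(type)) {
    errors.push(`VISION_MODEL_TYPE must be one of: ${VALID_TYPES.join(", ")}`);
  }
  for (const key of ["VISION_MODEL_NAME", "VISION_API_BASE_URL", "VISION_API_KEY"]) {
    if (!env[key]) errors.push(`${key} is required`);
  }
  return errors;
}
```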
### Build

```shell
npm run build
```

### Run (local)

```shell
node dist/index.js
```

If successful, you'll see `Vision MCP Server is running on stdio` on stderr.

### Run (npx)

```shell
# Local package (requires a build first)
npx .

# Published package
npx -y @lutery/vision-mcp
```

## MCP Client Configuration
### Claude Desktop
Add to your Claude Desktop configuration:

```json
{
  "mcpServers": {
    "vision-mcp": {
      "command": "npx",
      "args": ["-y", "@lutery/vision-mcp"],
      "env": {
        "VISION_MODEL_TYPE": "glm-4.6v",
        "VISION_MODEL_NAME": "glm-4.6v",
        "VISION_API_BASE_URL": "https://open.bigmodel.cn/api/paas/v4",
        "VISION_API_KEY": "your-api-key"
      }
    }
  }
}
```

Or with a local installation:
```json
{
  "mcpServers": {
    "vision-mcp": {
      "command": "node",
      "args": ["/path/to/vision_mcp/dist/index.js"],
      "env": {
        "VISION_MODEL_TYPE": "glm-4.6v",
        "VISION_MODEL_NAME": "glm-4.6v",
        "VISION_API_BASE_URL": "https://open.bigmodel.cn/api/paas/v4",
        "VISION_API_KEY": "your-api-key"
      }
    }
  }
}
```

### Cursor / Codex CLI

Use a similar configuration for other MCP-compatible clients.
## Using the Tools
### 1. Analyze Image

The main tool for image analysis:

```jsonc
// Tool: analyze_image
// Parameters:
{
  "image": "https://example.com/image.jpg",       // Image URL, base64 data URL, or local path
  "prompt": "Describe this UI design in detail",  // Analysis prompt
  "output_format": "text",                        // Optional: "text" or "json"
  "template": "ui-analysis"                       // Optional: see templates below
}
```

#### Example Prompts
**UI Analysis:**

```json
{
  "image": "./screenshot.png",
  "prompt": "Analyze this UI design and extract all UI components with their positions and styles",
  "template": "ui-analysis"
}
```

**Object Detection:**

```json
{
  "image": "https://example.com/photo.jpg",
  "prompt": "Detect all objects and provide their coordinates",
  "template": "object-detection"
}
```

**OCR:**

```json
{
  "image": "data:image/png;base64,iVBORw0KGgo...",
  "prompt": "Extract all text from this image",
  "template": "ocr"
}
```

**Structured Extraction:**

```json
{
  "image": "./form.jpg",
  "prompt": "Extract all form fields and values as JSON",
  "output_format": "json"
}
```

### 2. List Templates
List available system prompt templates:

```jsonc
// Tool: list_templates
// Parameters: none
```

Available templates:

- `general-description` - General image description
- `ui-analysis` - UI prototype and interface analysis
- `object-detection` - Object detection and localization
- `ocr` - Text extraction (OCR)
- `structured-extraction` - Structured data extraction
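Conceptually, a template name is just a key into a map of system prompts. The sketch below illustrates that idea; the prompt wording is invented here, and the real templates live in `src/prompts/system.ts`:

```typescript
// Illustrative template registry; the actual prompt text differs.
const TEMPLATES: Record<string, string> = {
  "general-description": "Describe the image in detail.",
  "ui-analysis": "Analyze the UI: list components, their positions, and styles.",
  "object-detection": "Detect objects and report their locations.",
  "ocr": "Extract all visible text, preserving reading order.",
  "structured-extraction": "Extract the requested fields as strict JSON.",
};

function resolveTemplate(name?: string): string | undefined {
  return name ? TEMPLATES[name] : undefined;
}
```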
### 3. Get Config

Get the current model configuration:

```jsonc
// Tool: get_config
// Parameters: none
```

## Image Input Formats

### 1. URL

```
https://example.com/image.jpg
```

### 2. Base64 Data URL

```
data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...
```

### 3. Local File Path

```
/path/to/image.png
./relative/path/image.jpg
```

**Note:** Local paths only work if the MCP server has access to the filesystem.

**Note:** URL validation is strict by default (see `VISION_STRICT_URL_VALIDATION`).
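The three formats can be told apart with a simple classifier, roughly the kind of decision `src/utils/image-input.ts` has to make (the function name here is hypothetical):

```typescript
type ImageInputKind = "url" | "data-url" | "file-path";

// Classify the three supported input formats listed above.
function classifyImageInput(input: string): ImageInputKind {
  if (input.startsWith("data:image/")) return "data-url";
  if (/^https?:\/\//i.test(input)) return "url";
  return "file-path"; // absolute or relative local path
}
```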
## Environment Variables

| Variable | Description | Default | Required |
|---|---|---|---|
| `VISION_MODEL_TYPE` | Model type: `glm` (alias for `glm-4.6v`), `glm-4.6v`, `siliconflow`, or `modelscope` | - | Yes |
| `VISION_MODEL_NAME` | Model name for the API | See defaults below | Yes |
| `VISION_API_BASE_URL` | API base URL (base path only, no `/chat/completions`) | See defaults below | Yes |
| `VISION_API_KEY` | API key for authentication | - | Yes |
| `VISION_API_TIMEOUT` | Request timeout in milliseconds | `60000` | No |
| `VISION_MAX_RETRIES` | Maximum retry attempts | `2` | No |
| `VISION_STRICT_URL_VALIDATION` | Enforce strict image URL validation | `true` | No |
| `LOG_LEVEL` | Log level: `debug`, `info`, `warn`, `error` | `info` | No |
**Notes:**

- `VISION_STRICT_URL_VALIDATION` defaults to `true`, enforcing that URLs end with a supported image extension (`.jpg`, `.jpeg`, `.png`, `.webp`). Set it to `false` to allow non-image URLs with only a warning.
- For the GLM-4.6V provider, both `glm` and `glm-4.6v` work for `VISION_MODEL_TYPE`; `glm` is provided as a convenient alias.
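Assuming strict mode is a plain extension check on the URL path (the actual implementation may differ), it behaves roughly like this:

```typescript
const IMAGE_EXTENSIONS = [".jpg", ".jpeg", ".png", ".webp"];

// Strict mode sketch: the URL's pathname must end with a supported extension.
function passesStrictUrlValidation(url: string): boolean {
  try {
    const pathname = new URL(url).pathname.toLowerCase();
    return IMAGE_EXTENSIONS.some((ext) => pathname.endsWith(ext));
  } catch {
    return false; // not a parseable URL at all
  }
}
```

Note that under such a check, URLs serving images without an extension (e.g. dynamic endpoints) would be rejected unless strict mode is turned off.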
## Model Defaults

**GLM-4.6V:**

```shell
VISION_MODEL_NAME=glm-4.6v
VISION_API_BASE_URL=https://open.bigmodel.cn/api/paas/v4
```

**SiliconFlow:**

```shell
VISION_MODEL_NAME=Qwen/Qwen2-VL-72B-Instruct
VISION_API_BASE_URL=https://api.siliconflow.cn/v1
```

## API Keys
### GLM-4.6V

Get your API key from the Zhipu AI Open Platform (open.bigmodel.cn).

Format: `xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.xxxxxxxxxxxxxxxxxxxx`

### SiliconFlow

Get your API key from SiliconFlow.

Format: `sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`
## MCP Protocol Note

**IMPORTANT:** This is a STDIO-based MCP server. Per the MCP protocol:

- DO NOT use `console.log()` or write to stdout
- Use ONLY `console.error()` for logging (stderr)
- stdout is reserved for JSON-RPC communication

The server handles this automatically. If you fork this project, make sure you follow this rule.
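In the spirit of `src/utils/logger.ts`, a minimal stderr-only logger with key masking might look like this (names and masking format are illustrative, not the package's actual API):

```typescript
// Mask all but the first and last 4 characters of a secret before logging.
function maskSecret(secret: string): string {
  if (secret.length <= 8) return "****";
  return `${secret.slice(0, 4)}...${secret.slice(-4)}`;
}

// Write log lines to stderr only; stdout must stay reserved for JSON-RPC.
function logInfo(message: string): void {
  process.stderr.write(`[info] ${message}\n`);
}
```

Usage: `logInfo(`using key ${maskSecret(apiKey)}`)` logs enough to identify a key without exposing it.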
## Development

### Project Structure

```
vision_mcp/
├── src/
│   ├── index.ts                   # MCP Server entry point
│   ├── config/
│   │   └── model-config.ts        # Configuration management
│   ├── tools/
│   │   └── vision-tool.ts         # Vision analysis tool
│   ├── adapters/
│   │   ├── base-adapter.ts        # Base adapter class
│   │   ├── glm-adapter.ts         # GLM-4.6V adapter
│   │   └── siliconflow-adapter.ts # SiliconFlow adapter
│   ├── prompts/
│   │   └── system.ts              # System prompt templates
│   └── utils/
│       ├── errors.ts              # Error handling
│       ├── logger.ts              # Logging utilities
│       └── image-input.ts         # Image input normalization
├── package.json
├── tsconfig.json
└── README.md
```

### Building
```shell
# Install dependencies
npm install

# Build TypeScript
npm run build

# Run tests
npm test
```

### Testing Notes

- `npm test` uses `VISION_API_KEY` (default) or provider-specific keys in the test script: `SILICONFLOW_API_KEY`, `GLM_API_KEY`
- If no API key is set, the tests exit with a clear error message.
## Troubleshooting

### 1. "Failed to load model configuration"

- Check that all required environment variables are set
- Verify `VISION_MODEL_TYPE` is one of `glm`, `glm-4.6v`, `siliconflow`, or `modelscope`

### 2. "API Key not found"

- Set `VISION_API_KEY` in your environment
- Verify the API key format matches the model's requirements

### 3. "Connection timeout"

- Increase the `VISION_API_TIMEOUT` value
- Check network connectivity to the API endpoint
4. "Invalid image URL"
- Ensure URL is publicly accessible
- Check URL format (http:// or https://)
- Verify image format is supported
5. "Permission denied reading file"
- MCP server needs filesystem access for local files
- Use absolute paths or ensure relative paths are accessible
- Check file permissions
6. "Invalid API endpoint" or "404 Not Found"
- Ensure
VISION_API_BASE_URLis the base path only, without/chat/completions - Correct:
https://api.siliconflow.cn/v1 - Incorrect:
https://api.siliconflow.cn/v1/chat/completions - Check the error details for the full request URL to diagnose endpoint issues
## Security Notes

- API keys are loaded from environment variables, never hardcoded
- API keys are masked in logs
- Images are not persisted by default
- The MCP server should run in trusted environments only (it has no built-in auth)
- **Thinking/Reasoning Content Filtering:** Model thinking/reasoning content is automatically filtered from responses so internal reasoning is never exposed to MCP clients. This filtering is unconditional and applies to all supported models regardless of configuration.
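One common way such filtering is implemented is by stripping delimited reasoning blocks. The sketch below assumes a `<think>...</think>` convention; the actual markers (and whether the filtering works on tags at all) vary by model and may differ from the package's real implementation:

```typescript
// Sketch: remove <think>...</think> spans from a model response.
// The delimiter choice is an assumption, not the package's documented behavior.
function stripThinking(text: string): string {
  return text.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
}
```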
## Security Best Practices

⚠️ **IMPORTANT:** Never commit API keys or credentials to the repository!

- Use environment variables for sensitive data (`.env` file)
- Keep local test credentials in `.gitignore`'d files (e.g., `test_key.local.md`)
- Rotate keys immediately if they are accidentally exposed or committed
- See `doc/test_key.example.md` for a test setup template
- Never copy real API keys into documentation, code comments, or issue trackers

**Key Protection Checklist:**

- `.env` is in `.gitignore`
- `.env.local` is in `.gitignore`
- No real keys in `test_key.md` (use `test_key.example.md` instead)
- No keys in documentation or comments
- Review git history for accidental key commits (`git log --all --full-history -S --source --all -- "*secret*" "*key*" "*password*" "test_key.md"`)
## License

MIT
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request
## Support

For issues and questions:

- Open an issue on the repository
- Check the relevant model's documentation
## TODO

- Adapt to ModelScope's vision model API requests: https://www.modelscope.cn/docs/model-service/API-Inference/intro