# Qwen Vision MCP
MCP server that adds image analysis capabilities to AI models that don't natively support vision.
## Why?
Models like MiMo V2 Pro, MiniMax M2.1, Nemotron 3, and many others are powerful for text but cannot read or analyze images. This MCP server bridges that gap by routing image tasks to Qwen 3.5 via Ollama — giving any model vision capabilities through tool calls.
## How It Works

```
Your model (MiMo V2 Pro, etc.) → calls qwen-vision tool → Qwen 3.5 analyzes image → returns result
```

The vision model can be:

- Ollama Cloud (`qwen3.5:cloud`) — fast, inference on Ollama's servers (requires `ollama serve` as gateway)
- Local Ollama (`qwen3.5:9b`, `qwen3.5:4b`) — private, runs on your machine (requires `ollama serve`)
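The forwarding step above can be sketched as building a request body for Ollama's `/api/chat` endpoint, which accepts base64-encoded images on a message's `images` field. This is an illustrative sketch under that assumption; `buildVisionRequest` is not the package's actual API.

```typescript
// Illustrative sketch only — buildVisionRequest is not the package's real code.
// Ollama's /api/chat accepts base64 images on a message's `images` field.
interface OllamaChatRequest {
  model: string;
  stream: boolean;
  messages: { role: "user" | "assistant"; content: string; images?: string[] }[];
}

function buildVisionRequest(
  model: string,
  prompt: string,
  imageBase64: string
): OllamaChatRequest {
  return {
    model,
    stream: false, // the MCP tool needs one complete answer, not a token stream
    messages: [{ role: "user", content: prompt, images: [imageBase64] }],
  };
}

// The server would POST this to `${OLLAMA_BASE_URL}/api/chat` and return the
// assistant message's content as the tool result.
const req = buildVisionRequest("qwen3.5:cloud", "What is in this image?", "aGVsbG8=");
console.log(req.messages[0].content);
```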
## Prerequisites

- Ollama installed (`brew install ollama` or `curl -fsSL https://ollama.ai/install.sh | sh`)
- Ollama authentication — sign in with `ollama` and authenticate your account
- `ollama serve` running — required for both cloud and local models (Ollama acts as the gateway to cloud inference)
## Tools

| Tool | Description |
|---|---|
| `analyze_image` | Ask any question about an image |
| `extract_text` | OCR: extract all visible text |
| `describe_image` | Detailed image description |
| `compare_images` | Compare two images side-by-side |
| `analyze_screenshot` | UI screenshot analysis with structured output |
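Over MCP, each of these surfaces as a standard `tools/call` JSON-RPC request. A hypothetical call to `analyze_image` might look like the sketch below; the argument names `image` and `question` are assumptions, not taken from the package's published schema.

```typescript
// Hypothetical MCP tools/call payload; argument names are illustrative.
function makeAnalyzeImageCall(image: string, question: string) {
  return {
    jsonrpc: "2.0" as const,
    id: 1,
    method: "tools/call",
    params: {
      name: "analyze_image",
      arguments: { image, question },
    },
  };
}

const call = makeAnalyzeImageCall("/path/to/image.png", "What does the chart show?");
console.log(call.params.name);
```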
## Input Formats

- File paths: `/path/to/image.png`
- URLs: `https://example.com/image.jpg`
- Base64 data URIs
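A server accepting all three formats has to tell them apart before reading or fetching any bytes. A minimal classifier might look like this (a sketch, not the package's actual logic):

```typescript
// Sketch: classify a tool's image argument into one of the three accepted formats.
type ImageSource =
  | { kind: "base64"; payload: string } // base64 payload from a data URI
  | { kind: "url"; href: string }       // to be fetched over HTTP(S)
  | { kind: "path"; path: string };     // to be read from disk

function classifyImageInput(input: string): ImageSource {
  if (input.startsWith("data:")) {
    // Data URI: keep only the base64 payload after the comma
    return { kind: "base64", payload: input.slice(input.indexOf(",") + 1) };
  }
  if (/^https?:\/\//i.test(input)) {
    return { kind: "url", href: input };
  }
  return { kind: "path", path: input };
}

console.log(classifyImageInput("https://example.com/image.jpg").kind); // "url"
```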
## Setup

### 1. Pull the model

```sh
# Cloud (recommended — no local resources needed)
ollama pull qwen3.5:cloud

# Or local (requires ~6GB RAM for 9B)
ollama pull qwen3.5:9b
```

### 2. Add to opencode config (`~/.config/opencode/opencode.json`)

#### Option A: Ollama Cloud (recommended)

`ollama serve` must be running — it acts as the gateway to Ollama's cloud inference.
```json
{
  "qwen-vision": {
    "type": "local",
    "command": ["npx", "-y", "qwen-vision-mcp"],
    "environment": {
      "QWEN_VISION_MODEL": "qwen3.5:cloud",
      "OLLAMA_BASE_URL": "http://localhost:11434",
      "QWEN_VISION_THINK": "false"
    },
    "enabled": true,
    "timeout": 30000
  }
}
```

#### Option B: Local Ollama

Requires `ollama serve` running and ~6GB RAM for `qwen3.5:9b`.
```json
{
  "qwen-vision": {
    "type": "local",
    "command": ["npx", "-y", "qwen-vision-mcp"],
    "environment": {
      "QWEN_VISION_MODEL": "qwen3.5:9b",
      "OLLAMA_BASE_URL": "http://localhost:11434",
      "QWEN_VISION_THINK": "false"
    },
    "enabled": true,
    "timeout": 30000
  }
}
```

Other local options: `qwen3.5:4b` (3GB RAM), `qwen3.5:2b` (1.5GB RAM).
### 3. Restart opencode

```sh
opencode
```

The `qwen-vision_*` tools will be available to any model — even ones without native vision.
## Configuration

| Variable | Default | Description |
|---|---|---|
| `QWEN_VISION_MODEL` | `qwen3.5:cloud` | Ollama model name |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `QWEN_VISION_THINK` | `false` | Enable thinking/reasoning mode |
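These three variables fully determine the server's Ollama connection. Resolving them against their defaults might look like the following sketch (the internal names are illustrative, not the package's real code):

```typescript
// Sketch: resolve the documented environment variables with their defaults.
interface VisionConfig {
  model: string;   // QWEN_VISION_MODEL
  baseUrl: string; // OLLAMA_BASE_URL
  think: boolean;  // QWEN_VISION_THINK
}

// Accepts an env map so it can be exercised without touching process.env.
function loadConfig(env: Record<string, string | undefined>): VisionConfig {
  return {
    model: env.QWEN_VISION_MODEL ?? "qwen3.5:cloud",
    baseUrl: env.OLLAMA_BASE_URL ?? "http://localhost:11434",
    think: (env.QWEN_VISION_THINK ?? "false") === "true",
  };
}

console.log(loadConfig({}).model); // "qwen3.5:cloud"
```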
## Development

```sh
git clone https://github.com/sahilchouksey/qwen-vision-mcp.git
cd qwen-vision-mcp
bun install
bun run dev
```

## Plugin Mode (Non-Multimodal Models)
For models that cannot call tools or don't support MCP, use the OpenCode Plugin in .opencode/plugin.ts. This plugin intercepts chat messages containing images, analyzes them automatically via Qwen 3.5, and injects the analysis into the conversation — no tool calls needed.
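The interception step can be sketched as a pure transform over message parts. The `Part` shape and function name below are hypothetical; the real opencode plugin API will differ.

```typescript
// Hypothetical message-part shapes; the real opencode plugin API differs.
type Part =
  | { type: "text"; text: string }
  | { type: "image"; data: string }; // base64 image data

// Replace image parts with their text analysis: append to the user's text
// part when one exists, otherwise emit a standalone text part (fallback).
function injectAnalyses(parts: Part[], analyze: (img: string) => string): Part[] {
  const images = parts.filter(
    (p): p is Extract<Part, { type: "image" }> => p.type === "image"
  );
  if (images.length === 0) return parts; // nothing to do

  const summary = images
    .map((p, i) => `[Image ${i + 1} analysis]: ${analyze(p.data)}`)
    .join("\n");

  const text = parts.find((p) => p.type === "text");
  if (text && text.type === "text") {
    // Transparent injection: the model sees the analysis as part of the message
    return [{ type: "text", text: `${text.text}\n\n${summary}` }];
  }
  // Fallback: no text part found
  return [{ type: "text", text: summary }];
}
```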
### How Plugin Mode Works

```
User pastes image → Plugin detects image → Qwen 3.5 analyzes it → Analysis injected as text → Model responds using analysis
```

### When to Use Plugin vs MCP
| Approach | Best For | Setup |
|---|---|---|
| MCP | Models that support tool calling (Claude, GPT-4, etc.) | Add to mcp section in opencode.json |
| Plugin | Models without tool support or vision (MiMo V2 Pro, MiniMax, Nemotron 3, etc.) | Add to plugin section in opencode.json |
### Plugin Setup

1. Copy the plugin file to your opencode config:

   ```sh
   cp .opencode/plugin.ts ~/.config/opencode/plugins/qwen-vision.ts
   ```

2. Add to `~/.config/opencode/opencode.json`:

   ```json
   {
     "plugin": [
       "file:///Users/YOUR_USERNAME/.config/opencode/plugins/qwen-vision.ts"
     ]
   }
   ```

3. Ensure Ollama is running:

   ```sh
   ollama serve
   ```

4. Restart opencode. The plugin will:
   - Detect when you paste an image
   - Automatically analyze it with Qwen 3.5
   - Inject the analysis into your message text
   - The model will respond as if it can see the image
### Plugin Features

- Automatic image detection — works on paste/upload
- No tool calls required — models don't need tool support
- Transparent injection — analysis appended to user message
- Fallback mode — injects as system message if text part not found
- Built-in tools — `vision` and `vision_text` tools available for explicit calls
### Plugin Configuration

Same environment variables as MCP mode:

| Variable | Default | Description |
|---|---|---|
| `QWEN_VISION_MODEL` | `qwen3.5:cloud` | Ollama model name |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
## License
MIT