Gemini MCP Server
A TypeScript-based Model Context Protocol (MCP) server for Google Gemini AI media operations, specifically designed for coding agents and AI development workflows. Optimized for programmatic consumption with comprehensive prompt engineering best practices.
Key Features for Coding Agents
- AI-Guided Content Generation: Generate detailed image/video specifications for UI mockups, design documentation, and automated content creation
- Intelligent Media Analysis: Extract structured metadata, accessibility descriptions, OCR data, and technical specifications for content management systems
- Smart Manipulation Instructions: Create programmatic workflows for automated media processing pipelines and batch operations
- File System Integration: Read media from filesystem, write structured outputs to /tmp/gemini_mcp for seamless workflow integration
- Gemini 1.5 Multimodal: Powered by Google's latest vision and language models with structured output optimization
- AI Agent Optimized: Extensive prompt engineering guidance built into tool descriptions for maximum effectiveness
- Multiple Media Types: Support for images, videos, and audio files with format-specific optimizations
Designed for Coding Agents
This MCP server is specifically optimized for coding agents building media-aware applications:
Content Generation for Development
- Generate detailed image specifications for UI/UX mockups and design systems
- Create comprehensive video storyboards for automated video generation tools
- Produce structured media descriptions for database seeding and CMS integration
- Generate prompts for AI image generators (DALL-E, Midjourney, Stable Diffusion)
Media Analysis for Applications
- Extract actionable insights for automated testing of UI components
- Generate structured metadata for content management and search systems
- Create accessibility descriptions for WCAG compliance automation
- Extract OCR data for document processing and form automation workflows
Processing Pipeline Development
- Generate programmatic workflows for automated media processing
- Create detailed specifications for image manipulation APIs and services
- Develop quality assurance checklists for media processing validation
- Generate configuration files for batch processing operations
Each tool includes extensive documentation on effective prompting strategies, structured output formats, and integration patterns optimized for programmatic consumption.
Quick Start
Configuration
Create a .env file in your project root:
GEMINI_API_KEY=your-api-key-here
GEMINI_MODEL=gemini-2.5-flash
GEMINI_OUTPUT_DIR=/tmp/gemini_mcp
Get your API key from Google AI Studio.
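As a rough sketch of how a wrapper script might consume these settings, the helper below validates an environment-like object and fills in defaults. The `loadConfig` name and the fallback model are assumptions for illustration (the README only documents /tmp/gemini_mcp as the default output directory), so the server's own loading logic may differ.

```typescript
// Shape of the settings read from the .env file shown above.
interface GeminiConfig {
  apiKey: string;
  model: string;
  outputDir: string;
}

// Assumed fallback: the README only shows this model name as an example value.
const DEFAULT_MODEL = "gemini-2.5-flash";
// Default output directory documented in this README.
const DEFAULT_OUTPUT_DIR = "/tmp/gemini_mcp";

// Hypothetical helper: validates an environment-like object and applies defaults.
function loadConfig(env: Record<string, string | undefined>): GeminiConfig {
  const apiKey = env["GEMINI_API_KEY"]?.trim();
  if (!apiKey) {
    throw new Error("GEMINI_API_KEY is not set");
  }
  return {
    apiKey,
    model: env["GEMINI_MODEL"]?.trim() || DEFAULT_MODEL,
    outputDir: env["GEMINI_OUTPUT_DIR"]?.trim() || DEFAULT_OUTPUT_DIR,
  };
}

// In a Node.js process this would typically be called as loadConfig(process.env).
const exampleConfig = loadConfig({ GEMINI_API_KEY: "your-api-key-here" });
console.log(exampleConfig.model, exampleConfig.outputDir);
```

Failing fast on a missing key surfaces configuration mistakes before any tool call is attempted.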
Adding the MCP Server to Your Client
The Gemini MCP server works with any MCP client that supports standard I/O (stdio) as the transport medium.
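For orientation, MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method; over stdio, each message travels as a single line of JSON. The sketch below shows roughly what one such request looks like on the wire. `buildToolCall` and `frameForStdio` are hypothetical helper names, and a real session first completes the MCP initialization handshake, which is omitted here.

```typescript
// JSON-RPC 2.0 request shape that MCP uses for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Hypothetical helper: builds the request a client sends for one tool call.
function buildToolCall(toolName: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// The stdio transport is newline-delimited, so a message must not contain raw
// newlines. JSON.stringify without indentation escapes them inside strings.
function frameForStdio(request: ToolCallRequest): string {
  return JSON.stringify(request) + "\n";
}

const framedRequest = frameForStdio(
  buildToolCall("analyze_media", {
    filePath: "/path/to/product-photo.jpg",
    prompt: "List all visible objects\nand their positions",
  }),
);
console.log(framedRequest);
```

In practice an MCP client library handles this framing, so the sketch is mainly useful when debugging the raw stdio traffic.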
Claude Desktop
Edit the claude_desktop_config.json file:
{
  "mcpServers": {
    "gemini": {
      "command": "npx",
      "args": ["-y", "awesome-gemini-image-mcp@latest"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
Cline
Edit the cline_mcp_settings.json file:
{
  "mcpServers": {
    "gemini": {
      "command": "npx",
      "args": ["-y", "awesome-gemini-image-mcp@latest"],
      "disabled": false,
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
Visual Studio Code Copilot
Edit .vscode/mcp.json in your workspace:
{
  "servers": {
    "gemini": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "awesome-gemini-image-mcp@latest"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
MCP Tools
generate_media
Generate actual images using Gemini's AI image generation capabilities.
Best for: Creating visual assets, UI/UX mockups, product photography, illustrations, graphics, placeholder images
await mcpClient.callTool('generate_media', {
  prompt: 'A photorealistic sunset over a mountain lake, golden hour lighting, reflection on calm water, 4K quality, dramatic sky with orange and purple hues',
  outputFile: 'sunset-landscape'
});
Prompt Engineering Best Practices:
- Be Specific: Include details about style, composition, lighting, colors, mood, and subject
- Photorealistic: Use photography terms (camera angles, lens types, lighting setups)
- Artistic Styles: Specify watercolor, sketch, digital art, impressionism, etc.
- Technical Details: Mention quality level, rendering style, aspect ratios
- Composition: Specify framing, rule of thirds, focal points, depth of field
- Mood: Describe the emotional tone and atmosphere you want to convey
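The checklist above can be turned into a small prompt builder so that agents fill in each aspect deliberately. This is only a sketch: the `ImagePromptSpec` fields and the `buildImagePrompt` helper are hypothetical names, and the comma-separated layout is one reasonable convention rather than anything the server requires.

```typescript
// Fields mirror the checklist above; everything except the subject is optional.
interface ImagePromptSpec {
  subject: string;
  style?: string;       // e.g. "photorealistic", "watercolor"
  composition?: string; // framing, focal point, depth of field
  lighting?: string;
  mood?: string;
  technical?: string;   // quality level, aspect ratio
}

// Hypothetical helper: joins the non-empty fields into one comma-separated prompt.
function buildImagePrompt(spec: ImagePromptSpec): string {
  const parts = [spec.style, spec.subject, spec.composition, spec.lighting, spec.mood, spec.technical];
  return parts
    .map((part) => part?.trim())
    .filter((part): part is string => Boolean(part))
    .join(", ");
}

const sunsetPrompt = buildImagePrompt({
  subject: "sunset over a mountain lake",
  style: "photorealistic",
  composition: "reflection on calm water",
  lighting: "golden hour lighting",
  mood: "dramatic sky with orange and purple hues",
  technical: "4K quality",
});
console.log(sunsetPrompt);
```

The resulting string can be passed as the `prompt` argument of `generate_media`.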
analyze_media
Analyze images, videos, or audio files using Gemini's multimodal capabilities with intelligent prompting.
Best for: Object detection, scene understanding, OCR, quality assessment, composition analysis, video summarization
// Image analysis
await mcpClient.callTool('analyze_media', {
  filePath: '/path/to/product-photo.jpg',
  prompt: 'Analyze this product photo: 1) List all visible objects and their positions, 2) Evaluate composition using rule of thirds, 3) Assess lighting quality and suggest improvements, 4) Identify any quality issues (blur, noise, exposure)'
});
// Video analysis
await mcpClient.callTool('analyze_media', {
  filePath: '/path/to/promo-video.mp4',
  prompt: 'Break down this promotional video into scenes with timestamps. For each scene, describe: camera movement, subject actions, lighting conditions, and transition type. Suggest pacing improvements.'
});
Analysis Categories:
Images:
- Object detection and positioning
- Composition and framing analysis
- Color palette and harmony
- Quality assessment (sharpness, exposure, noise)
- Text extraction (OCR)
- Emotion and expression detection
- Accessibility descriptions
Videos:
- Content summarization and scene breakdown
- Action recognition and progression
- Cinematography analysis (shots, movements, lighting)
- Audio description (dialogue, music, effects)
- Production quality assessment
- Accessibility caption generation
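Accessibility captions are often delivered as SRT subtitles. As a hedged sketch of that layout (the server is not documented to emit SRT on its own; the format has to be requested in the prompt), the snippet below renders caption cues as numbered blocks with `HH:MM:SS,mmm` timestamps. The `CaptionCue` type and both helper names are hypothetical.

```typescript
// One caption cue; times are in milliseconds from the start of the video.
interface CaptionCue {
  startMs: number;
  endMs: number;
  text: string;
}

// Formats milliseconds as the SRT timestamp HH:MM:SS,mmm.
function formatSrtTime(totalMs: number): string {
  const ms = Math.floor(totalMs % 1000);
  const totalSeconds = Math.floor(totalMs / 1000);
  const seconds = totalSeconds % 60;
  const minutes = Math.floor(totalSeconds / 60) % 60;
  const hours = Math.floor(totalSeconds / 3600);
  const pad = (value: number, width: number): string => String(value).padStart(width, "0");
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(ms, 3)}`;
}

// Numbers the cues from 1 and separates them with a blank line.
function toSrt(cues: CaptionCue[]): string {
  return (
    cues
      .map((cue, index) => `${index + 1}\n${formatSrtTime(cue.startMs)} --> ${formatSrtTime(cue.endMs)}\n${cue.text}`)
      .join("\n\n") + "\n"
  );
}

const sampleSrt = toSrt([
  { startMs: 0, endMs: 2500, text: "[door closes]" },
  { startMs: 2500, endMs: 6000, text: "INTERVIEWER: Thanks for joining us." },
]);
console.log(sampleSrt);
```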
Supported formats:
- Images: JPEG, PNG, GIF, WebP
- Videos: MP4, MPEG, MOV, AVI, WebM
- Audio: MP3, WAV, AAC
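A caller may want to check a file against this list before sending it to `analyze_media`. The sketch below classifies a path by its extension. The extension spellings (for example `jpg` alongside `jpeg`, or `mpg` for MPEG) are assumptions, since the README lists format names rather than file extensions, and the server may apply its own detection.

```typescript
type MediaKind = "image" | "video" | "audio";

// Extension lists follow the supported formats above; exact spellings are assumed.
const MEDIA_EXTENSIONS: Record<MediaKind, readonly string[]> = {
  image: ["jpg", "jpeg", "png", "gif", "webp"],
  video: ["mp4", "mpeg", "mpg", "mov", "avi", "webm"],
  audio: ["mp3", "wav", "aac"],
};

// Hypothetical helper: returns the media kind for a path, or undefined if unsupported.
function classifyMedia(filePath: string): MediaKind | undefined {
  // Take the last path segment, accepting both "/" and "\" separators.
  const fileName = filePath.split(/[\\/]/).pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension, or a dotfile such as ".env"
  const ext = fileName.slice(dot + 1).toLowerCase();
  const kinds = Object.keys(MEDIA_EXTENSIONS) as MediaKind[];
  return kinds.find((kind) => MEDIA_EXTENSIONS[kind].includes(ext));
}

console.log(classifyMedia("/path/to/product-photo.jpg"));
console.log(classifyMedia("/path/to/notes.txt"));
```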
manipulate_media
Generate intelligent instructions for transforming, editing, or enhancing images and videos.
Note: This tool creates INSTRUCTIONS for manipulation, not the actual edited media. Output is actionable guidance for editing tools or human editors.
await mcpClient.callTool('manipulate_media', {
  inputFile: '/path/to/portrait.jpg',
  prompt: 'Create detailed retouching instructions for this portrait: 1) Professional skin retouching while maintaining texture, 2) Eye enhancement and teeth whitening, 3) Background blur to f/2.8 equivalent, 4) Color grading for warm, natural look, 5) Export settings for web use (1200px width, 85% JPEG quality)',
  outputFile: 'portrait_editing_plan.txt'
});
Manipulation Types:
Image Editing:
- Enhancement instructions (exposure, contrast, color)
- Crop and composition suggestions
- Style transfer guidance
- Object removal strategies
- Color grading plans
- Restoration instructions
- Background replacement guidance
Video Editing:
- Shot-by-shot editing blueprints
- Color correction across scenes
- Audio sync instructions
- Special effects guidance
- Title and text overlay placement
- Transition recommendations
- Pacing and rhythm optimization
Example Use Cases
Content Generation for AI Agents
// Generate SaaS landing page hero image
await mcpClient.callTool('generate_media', {
  prompt: 'Modern, minimalist SaaS landing page hero image with gradient background (blue to purple), floating UI elements showing analytics dashboard, subtle glow effects, isometric perspective, high-tech feel, suitable for dark mode website, professional and clean aesthetic',
  outputFile: 'saas-hero-image'
});
// Generate product photography
await mcpClient.callTool('generate_media', {
  prompt: 'Professional product photography of a smartwatch on white background, studio lighting with soft shadows, 45-degree angle, macro lens detail showing screen interface, reflective surface, commercial quality, clean and minimal composition',
  outputFile: 'smartwatch-product-shot'
});
Advanced Image Analysis
// Comprehensive image quality assessment
await mcpClient.callTool('analyze_media', {
  filePath: '/path/to/marketing-photo.jpg',
  prompt: 'Perform a professional quality assessment: 1) Technical quality (sharpness, noise, exposure, white balance), 2) Composition analysis (rule of thirds, leading lines, balance), 3) Color theory evaluation (harmony, contrast, mood), 4) Subject assessment (focus, expression, positioning), 5) Background evaluation (distractions, bokeh, context), 6) Overall rating (1-10) with specific improvement recommendations ranked by impact'
});
// OCR with structure preservation
await mcpClient.callTool('analyze_media', {
  filePath: '/path/to/document-scan.jpg',
  prompt: 'Extract all text from this document maintaining the original layout structure. Format as markdown with headings, bullet points, and tables preserved. Identify the document type (invoice, contract, form, etc.) and highlight any important dates, numbers, or signatures.'
});
// Brand and logo detection
await mcpClient.callTool('analyze_media', {
  filePath: '/path/to/street-photo.jpg',
  prompt: 'Identify all visible brands, logos, and trademarks in this image. For each, provide: brand name, approximate position in frame, visibility level (prominent/subtle), and potential trademark concerns for commercial use.'
});
Video Analysis for Production
// Scene-by-scene breakdown
await mcpClient.callTool('analyze_media', {
  filePath: '/path/to/raw-footage.mp4',
  prompt: 'Create a detailed scene breakdown: For each distinct scene provide timestamp, duration, shot type (wide/medium/close-up), camera movement (static/pan/tilt/dolly), subject actions, lighting quality, audio description, and transition recommendation to next scene. Flag any technical issues (focus problems, exposure issues, audio glitches).'
});
// Accessibility caption generation
await mcpClient.callTool('analyze_media', {
  filePath: '/path/to/interview.mp4',
  prompt: 'Generate comprehensive accessibility captions: Include speaker identification, dialogue transcription, sound effect descriptions [door closes], music cues [upbeat jazz playing], and relevant visual descriptions for context. Format as SRT subtitle file structure with timestamps.'
});
Professional Editing Instructions
// Portrait retouching workflow
await mcpClient.callTool('manipulate_media', {
  inputFile: '/path/to/headshot.jpg',
  prompt: 'Create a professional retouching workflow for this corporate headshot: Step 1: Skin retouching (frequency separation, opacity 60%, preserve texture), Step 2: Eye enhancement (sharpen iris, brighten catchlights, reduce redness), Step 3: Teeth whitening (hue shift, lightness +15%, saturation -20%), Step 4: Hair refinement (flyaway removal, add definition), Step 5: Background (gaussian blur 25px, vignette subtle), Step 6: Color grade (slight warm tone, +5% saturation), Step 7: Export (JPEG 90% quality, sRGB, 2000px longest edge). Include specific tool names and parameter values.'
});
// Video color grading plan
await mcpClient.callTool('manipulate_media', {
  inputFile: '/path/to/footage.mp4',
  prompt: 'Design a cinematic color grading plan: Analyze current color issues, then provide scene-by-scene color correction instructions including: primary color wheels (lift/gamma/gain values), secondary color isolation (skin tones, sky, foliage), HSL adjustments, film grain settings (amount, size), vignette parameters, LUT recommendations, and final export settings for YouTube (h.264, 4K, 60fps, 40Mbps).'
});
// Batch editing instructions
await mcpClient.callTool('manipulate_media', {
  inputFile: '/path/to/sample-product.jpg',
  prompt: 'Create a batch editing workflow applicable to 50 similar product photos: Develop consistent crop ratio, white balance correction method, exposure adjustment formula, background removal technique, shadow/highlight recovery, color consistency approach, and watermark placement. Provide as step-by-step instructions compatible with Photoshop actions or Lightroom presets.'
});
Project Structure
Gemini-mcp/
├── src/
│   ├── index.ts              # Entry point
│   ├── client/
│   │   └── gemini-client.ts  # Gemini API client
│   ├── mcp/
│   │   ├── server.ts         # MCP server
│   │   └── tools/            # Tool implementations
│   │       ├── generate-media.ts
│   │       ├── analyze-media.ts
│   │       └── manipulate-media.ts
│   └── types/
│       └── index.ts          # TypeScript types
├── dist/                     # Compiled JavaScript
├── .github/                  # GitHub configurations
├── package.json
└── tsconfig.json
Security
- API Key Storage: Store your API key in the .env file (never in code)
- Git Ignore: .env is in .gitignore to prevent committing secrets
- Rate Limits: Be aware of Gemini API rate limits and quotas
- File Access: The server reads files from the filesystem, so ensure proper permissions
Testing
The server includes comprehensive tests to verify MCP protocol compliance and tool functionality:
Quick Testing
Unix/Linux/macOS:
./scripts/quick-test.sh
Windows:
scripts\quick-test.bat
Manual Testing
# Run all tests (files cleaned up after)
npm test
# Run tests with verbose output
npm run test:verbose
# Run tests and preserve generated files
npm run test:keep-files
# Build and test
npm run build && npm test
Preserve Generated Files:
Use npm run test:keep-files or ./scripts/quick-test.sh --keep-files to keep the AI-generated content after testing. This is useful for:
- Examining the actual AI outputs
- Testing with real generated content
- Debugging and development
- Showcasing the tool capabilities
The test suite:
- Validates MCP protocol handshake
- Tests all tool endpoints with real API calls
- Verifies error handling and validation
- Generates sample output files with proper extensions
- Checks file system integration
Test output files are saved to /tmp/gemini_mcp_test/ (or %TEMP%\gemini_mcp_test\ on Windows) and include:
- creative-description.txt - AI-generated scene descriptions
- image-generation-prompt.txt - Detailed prompts for image generators
- video-script.txt - Professional video scripts
- editing-instructions.md - Step-by-step editing guidelines
Output Directory
All generated and manipulated content is saved to /tmp/gemini_mcp by default. You can customize this by setting the GEMINI_OUTPUT_DIR environment variable.
export GEMINI_OUTPUT_DIR=/your/custom/path
Troubleshooting
API Key Errors
- Verify your API key is correct
- Check that the key is properly set in .env or environment variables
- Ensure you have API access enabled in Google AI Studio
File Not Found Errors
- Check that file paths are absolute
- Verify file permissions allow reading
- Ensure the file format is supported
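The first item in this checklist can be verified before a tool call is made. The sketch below is a minimal, string-based check under the assumption that both POSIX and Windows paths may appear; `isAbsolutePath` and `checkInputPath` are hypothetical helpers, and in a Node.js program `path.isAbsolute` from `node:path` is the more usual choice.

```typescript
// Hypothetical helper: true for POSIX absolute paths, Windows drive paths, and UNC paths.
function isAbsolutePath(filePath: string): boolean {
  const posix = filePath.startsWith("/");
  const windowsDrive = /^[A-Za-z]:[\\/]/.test(filePath);
  const windowsUnc = filePath.startsWith("\\\\");
  return posix || windowsDrive || windowsUnc;
}

// Returns a list of human-readable problems; an empty list means the path looks usable.
function checkInputPath(filePath: string): string[] {
  const problems: string[] = [];
  if (filePath.trim() === "") {
    problems.push("path is empty");
  } else if (!isAbsolutePath(filePath)) {
    problems.push(`path is not absolute: ${filePath}`);
  }
  return problems;
}

console.log(checkInputPath("photos/portrait.jpg"));
console.log(checkInputPath("/path/to/portrait.jpg"));
```

Read permissions and file existence still need a filesystem check, which this string-only sketch does not attempt.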
Output Directory Issues
- Verify /tmp/gemini_mcp exists and is writable
- Check disk space
- Set a custom output directory if needed
License
MIT License - See LICENSE file for details
Version
Current version: 1.0.0
- Full MCP protocol support
- Content generation
- Media analysis (images, videos, audio)
- Media manipulation with prompts
- File system integration