
CLI tool for Genspark Tool API - search, crawl, analyze images, generate media


Readme

Genspark CLI (gsk)

One CLI. Every AI capability. Search, generate, analyze, communicate — all from your terminal.

gsk is the command-line interface for the Genspark AI platform. It unifies 90+ AI tools behind a single binary: web search, image/video/audio generation with 40+ models, document analysis, media transcription, cloud file management, email (Gmail & Outlook), calendar, GitHub, Slack, Notion, Microsoft Teams, OneDrive, SharePoint, AI phone calls, stock data, social media data (Twitter, Instagram, Reddit), and autonomous AI agents — all with clean JSON output for seamless integration with AI coding assistants, automation pipelines, and scripts.

Capability Map

Category What You Get
🔍 Search Web search, image search
📄 Documents Crawl pages, summarize PDFs/docs
🎨 Images 16 models: GPT Image, Gemini, Flux 2, Imagen 4, Recraft, Ideogram, Seedream ...
🎬 Videos 14 models: Kling V3, Veo 3.1, Sora 2, Hailuo, Wan, Runway, PixVerse, Seedance ...
🎵 Audio 14 models: Gemini TTS, ElevenLabs, MiniMax, Mureka, CassetteAI, Lyria 2 ...
🧠 Analysis Image/video/audio understanding, OCR, video style replication
📝 Transcribe Whisper, Gemini, ElevenLabs Scribe
☁️ AI Drive Cloud file storage, download, compress
📧 Email Gmail & Outlook: read, search, send, reply, forward, archive, labels, attachments
📅 Calendar Google & Outlook: list, create, delete events
💬 Collaboration Slack, Microsoft Teams, Notion — send messages, search, manage channels/pages
📂 Cloud Storage Google Drive, OneDrive, SharePoint, Google Sheets, Google Docs, Google Contacts
🐙 GitHub List repos, search/create/update issues
📞 Phone AI-powered phone calls to businesses
📈 Stocks Real-time stock prices
📱 Social Media Twitter/X, Instagram, Reddit — search posts/users, get comments, connections, and more (30 APIs)
🤖 Agents Podcasts, docs, slides, deep research, fact-checking, websites, batch media generation
🔊 Voice Voice cloning, voice changer


Installation

npm install -g @genspark/cli

Requires Node.js >= 18.

Quick Start

# Log in via browser
gsk login

# Search the web
gsk search "latest AI news"

# Generate an image
gsk img "A beautiful sunset over mountains" -o ./sunset.png

# Crawl a webpage
gsk crawl "https://example.com/article"

Authentication

Log in with your Genspark account:

gsk login

This opens a browser for authentication and saves the API key to ~/.genspark-tool-cli/config.json.

Alternatively, provide an API key directly:

# Via environment variable
export GSK_API_KEY="gsk_..."

# Via CLI option
gsk search "query" --api-key "gsk_..."

To check your current identity:

gsk login-info
gsk me          # shorthand

To log out:

gsk logout

Configuration

Config is loaded from three sources (highest priority first):

  1. CLI options: --api-key, --base-url, etc.
  2. Environment variables: GSK_API_KEY, GSK_BASE_URL, GSK_PROJECT_ID
  3. Config file: ~/.genspark-tool-cli/config.json, for example:
{
  "api_key": "gsk_...",
  "base_url": "https://www.genspark.ai",
  "project_id": "project_abc123",
  "debug": false,
  "timeout": 300000
}
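The lookup order above can be read as "the first non-empty value wins." The sketch below is a hypothetical POSIX shell helper (`resolve_setting` is not part of gsk) that illustrates that precedence. It does not reproduce the CLI's actual implementation, and it treats an empty string as "not set."

```shell
# Hypothetical helper, not part of gsk: print the first non-empty value in
# the documented priority order (CLI option, environment variable, config
# file), falling back to a built-in default as the fourth argument.
resolve_setting() {
  for candidate in "$1" "$2" "$3" "$4"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

cli_base_url=""                          # no --base-url on the command line
file_base_url="https://www.genspark.ai"  # value read from config.json
# Prints https://www.genspark.ai unless GSK_BASE_URL is set in the environment.
resolve_setting "$cli_base_url" "${GSK_BASE_URL:-}" "$file_base_url" "https://www.genspark.ai"
```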

Global Options

Option | Env Var | Default | Description
--api-key <key> | GSK_API_KEY | | API key (required)
--base-url <url> | GSK_BASE_URL | https://www.genspark.ai | API base URL
--project-id <id> | GSK_PROJECT_ID | | Project ID for access control
--debug | | false | Enable debug output
--timeout <ms> | | 300000 (5 min) | Request timeout in milliseconds
--output <format> | | json | Output format: json or text
--refresh | | | Force refresh cached tool schemas
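Because --timeout is given in milliseconds, shell arithmetic helps avoid unit mistakes. This is a small sketch: `minutes_to_ms` is a hypothetical helper, and passing --timeout to a particular command assumes the global options apply to every command, as the table suggests.

```shell
# Hypothetical helper: convert whole minutes to the milliseconds --timeout expects.
# The 300000 ms default corresponds to 5 minutes.
minutes_to_ms() {
  printf '%s\n' "$(( $1 * 60 * 1000 ))"
}

timeout_ms=$(minutes_to_ms 10)
echo "$timeout_ms"   # 600000
# With the real CLI (not run here):
# gsk search "latest AI news" --timeout "$timeout_ms"
```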

Commands

list-tools (alias: ls)

List all available tools.

gsk list-tools
gsk ls

login-info (alias: me)

Show your current account info — email, name, and membership plan.

gsk login-info
gsk me

init-opencode

Generate an .opencode.json config file for OpenCode, pre-configured to use Genspark's LLM proxy with your API key.

# Generate with default model (claude-opus-4-6-1m)
gsk init-opencode

# Specify a different default model
gsk init-opencode --model claude-sonnet-4-6

# Write to a custom path
gsk init-opencode -o ./my-project/.opencode.json
Option Default Description
--model <name> claude-opus-4-6-1m Default model for OpenCode
-o, --out <path> .opencode.json (cwd) Output file path

init-skills

Sync GSK skill documents into the current project for AI agent discovery. Copies all skill docs and generates a CONTEXT.md entry point that AI agents (Claude Code, Gemini, etc.) can load automatically.

# Copy skills to .gsk/skills/ and generate CONTEXT.md
gsk init-skills

# Also generate .claude/ config for Claude Code
gsk init-skills --agent claude

# Generate config for all supported agents (Claude, Gemini)
gsk init-skills --agent all

# Custom output directory
gsk init-skills -o ./docs/gsk-skills

Option | Default | Description
-o, --out <dir> | .gsk/skills (cwd) | Output directory for skills
--agent <type> | | Generate agent config: claude, gemini, or all

Search & Crawl

Search the web.

gsk search "latest AI news"

crawler (alias: crawl)

Extract content from a web page or document.

gsk crawl "https://example.com/article"

summarize_large_document (alias: summarize)

Analyze a document and answer questions about it.

gsk summarize "https://example.com/report.pdf" --question "What are the key findings?"
Option Description
<url> Document URL (required, positional)
--question <text> Question about the document

Search for images on the web.

gsk img-search "modern architecture"

Media Analysis & Transcription

understand_images (alias: analyze)

Analyze images with an AI vision model.

gsk analyze "Describe this image" -i "https://example.com/image.jpg"
gsk analyze "Extract all text" -i "https://img1.jpg" "https://img2.jpg"
gsk analyze "What's in this photo?" -i ./photo.jpg

Option | Default | Description
-i, --image_urls <url...> | | Image URL(s) or local file path(s) to analyze (required)
-r, --instruction <text> | | Custom analysis instruction

Image Generation

image_generation (alias: img)

Generate images using AI. Supports text-to-image and image-to-image.

# Text-to-image
gsk img "A beautiful sunset over mountains" -r "16:9" -o ./sunset.png
gsk img "Modern office at night" -s "4k" -r "1:1"

# Image-to-image (reference-based)
gsk img "A portrait in similar style" -i ./reference.png

Option | Default | Description
-r, --aspect_ratio <ratio> | 1:1 | Aspect ratio (1:1, 16:9, 9:16)
-s, --image_size <size> | auto | Image size: auto, 2k, 4k
-m, --model <name> | | Model to use (optional)
-i, --image_urls <url...> | | Reference image URL(s) or local file(s) for image-to-image
-o, --output-file <path> | | Download the generated file to a local path

Video Generation

video_generation (alias: video)

Generate videos using AI.

gsk video "A cat playing with yarn" -m "kling/v1.6/standard" -d 5 -o ./cat.mp4
gsk video "Sunrise over a beach" -m "minimax/hailuo-02/standard" -r "16:9" -d 8

# Image-to-video
gsk video "Camera pan around the subject" -m "kling/v1.6/standard" -i ./photo.jpg

Option | Default | Description
-m, --model <name> | | Model (required), e.g., kling/v1.6/standard, minimax/hailuo-02/standard
-r, --aspect_ratio <ratio> | 16:9 | Aspect ratio
-d, --duration <sec> | 5 | Duration in seconds (2-15)
-i, --image_urls <url...> | | Reference image URL(s) or local file(s)
-a, --audio_url <url> | | Audio URL for soundtrack
-o, --output-file <path> | | Download the generated file to a local path
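A pre-flight check against the 2-15 second range from the option table can catch bad input before a request is sent. `valid_video_duration` is a hypothetical helper for illustration; individual models accept narrower ranges (see the video model table under Available Models), and gsk may enforce its own limits.

```shell
# Hypothetical check: accept only integers from 2 to 15 (the documented -d range).
valid_video_duration() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-integer input
  esac
  [ "$1" -ge 2 ] && [ "$1" -le 15 ]
}

duration=8
if valid_video_duration "$duration"; then
  # Printed rather than executed; drop "echo" to submit the job.
  echo gsk video "Sunrise over a beach" -m "minimax/hailuo-02/standard" -r "16:9" -d "$duration"
else
  echo "duration must be an integer from 2 to 15" >&2
fi
```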

Audio Generation

audio_generation (alias: audio)

Generate audio: TTS, music, or sound effects.

# Text-to-speech
gsk audio "Hello, welcome to Genspark!" -m "google/gemini-2.5-pro-preview-tts" -r "professional female voice"
gsk audio "Hello, welcome to Genspark!" -m "google/gemini-2.5-pro-preview-tts" -o ./hello.mp3

# Music with lyrics
gsk audio "A pop song" -m "fal-ai/minimax-music/v2.6" -l "Verse 1: ..." -d 120

# Sound effect
gsk audio "Door creaking slowly open" -m "elevenlabs/sound-effects"

Option | Default | Description
-m, --model <name> | | Model (required), e.g., elevenlabs/v3-tts, fal-ai/minimax/speech-2.6-hd
-d, --duration <sec> | 0 (auto) | Duration in seconds
-r, --requirements <text> | | Voice requirements for TTS
-l, --lyrics <text> | | Lyrics for song generation
-o, --output-file <path> | | Download the generated file to a local path

File Transfer

upload

Upload a local file and get a URL for use in other commands.

gsk upload "./image.png"
gsk upload "./document.pdf"

download

Download a file from a file wrapper URL.

# Get download URL only
gsk download "/api/files/s/abc123"

# Download and save to local file
gsk download "/api/files/s/abc123" -s "./downloaded.png"
Option Description
-s, --save <path> Download and save to local file path

analyze_media (alias: media-analyze)

Analyze various types of media content including images, audio, and video.

gsk media-analyze -i "https://example.com/image.jpg" -r "Describe the content"
gsk media-analyze -i "https://example.com/video.mp4" -r "Summarize the video"

Option | Default | Description
-i, --media_urls <urls> | | Media URL(s) to analyze (required)
-r, --requirements <text> | | Analysis instructions

audio_transcribe (alias: transcribe)

Transcribe audio files to text.

gsk transcribe -i "https://example.com/audio.mp3"
gsk transcribe -i ./meeting.wav -m "whisper-large-v3"

Option | Default | Description
-i, --audio_urls <url...> | | Audio URL(s) or local file(s) to transcribe (required)
-m, --model <name> | | Transcription model to use

AI Drive (Cloud Storage)

aidrive (alias: drive)

AI-Drive file storage and management. List, create, delete, move files and directories. Download videos, audio, and files from URLs directly to AI-Drive.

# List files in root directory
gsk drive ls
gsk drive ls -p "/documents" -f file

# Create directory
gsk drive mkdir -p "/my-folder"

# Move file
gsk drive move -p "/old-path/file.txt" --target_path "/new-path/file.txt"

# Download video/audio/file to AI-Drive
gsk drive download_video --video_url "https://example.com/video.mp4" --target_folder "/videos"
gsk drive download_file --file_url "https://example.com/doc.pdf" --target_folder "/docs"

# Upload inline text content to AI-Drive
gsk drive upload --file_content "Hello World" --upload_path "/notes/hello.txt"

# Upload a local file directly to AI-Drive (streaming, supports 100MB+ files)
gsk drive upload --local_file ./report.pdf --upload_path /docs/report.pdf
gsk drive upload --local_file ./video.mp4 --upload_path /videos/demo.mp4
gsk drive upload --local_file ./photo.png              # upload_path defaults to /photo.png
gsk drive upload --local_file ./doc.pdf --upload_path /docs/doc.pdf --override  # overwrite existing

# Get readable URL for a file
gsk drive get_readable_url -p "/documents/report.pdf"

Option | Default | Description
-p, --path <path> | | File or directory path in AI-Drive
-f, --filter_type <type> | all | Filter: all, file, directory
--file_type <type> | all | File type filter: all, audio, video, image
--target_path <path> | | Target path for move operations
--target_folder <path> | | Target folder for downloads
--video_url <url> | | Video URL for the download_video action
--audio_url <url> | | Audio URL for the download_audio action
--file_url <url> | | File URL for the download_file action
--file_name <name> | | Custom file name for downloads
--file_content <text> | | Inline text content to upload
--local_file <path> | | Local file path to upload directly to AI-Drive (streaming, no size limit)
--upload_path <path> | | Destination path for upload (defaults to /<filename> for --local_file)
--override | false | Overwrite an existing file at the destination path
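The default destination for --local_file is documented as /<filename>. The sketch below reproduces that rule with a hypothetical `drive_dest` helper and uses it to print one `gsk drive upload` command per PDF in a hypothetical ./reports directory. It prints the commands instead of running them, so you can review them first.

```shell
# Hypothetical helper: an explicit upload path wins; otherwise fall back to
# "/<filename>", mirroring the documented --upload_path default.
drive_dest() {
  if [ -n "$2" ]; then
    printf '%s\n' "$2"
  else
    printf '/%s\n' "$(basename "$1")"
  fi
}

# Print (rather than run) one upload per PDF; remove "echo" to upload.
for f in ./reports/*.pdf; do
  [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
  echo gsk drive upload --local_file "$f" --upload_path "/docs$(drive_dest "$f" "")"
done
```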

AI Agents & Tasks

create_task (alias: task)

Create and execute tasks using specialized AI agents.

# Create a podcast
gsk task podcasts --task_name "AI Trends" --query "Create a podcast about AI trends" --instructions "Focus on practical applications"

# Create a document
gsk task docs --task_name "Quantum Report" --query "Write a report on quantum computing" --instructions "Include recent breakthroughs"

# Create slides
gsk task slides --task_name "Q4 Results" --query "Create a Q4 results presentation" --instructions "Use charts and data"

# Create a spreadsheet (returns file wrapper URL, use `gsk download` to save)
gsk task sheets --task_name "Sales Report" --query "Create a quarterly sales report with formulas" --instructions "Use formulas and formatting"

# Deep research
gsk task deep_research --task_name "Fusion Energy" --query "Research fusion energy advances" --instructions "Cover public and private sector"

# Fact-check a claim
gsk task cross_check --task_name "Earth shape" --query "The Earth is flat" --instructions "Verify this claim with evidence"

Option | Default | Description
--task_name <name> | | Name for the task (required)
--query <text> | | Query describing what to create (required)
--instructions <text> | | Detailed instructions (required)
--acp | false | Start as an ACP (Agent Client Protocol) stdio agent for multi-turn use with Genspark Claw

Supported task types: super_agent, podcasts, docs, slides, sheets, deep_research, website, video_generation, audio_generation, meeting_notes, cross_check

ACP Mode

Use --acp to start a task agent as an Agent Client Protocol stdio server. This enables AI agent platforms like Genspark Claw to natively discover and interact with GSK agents, with multi-turn conversation support.

# Start an ACP agent for slides (used by acpx, not typically run manually)
gsk task slides --acp

# Start an ACP agent for documents
gsk task docs --acp

acpx configuration (~/.acpx/config.json):

{
  "agents": {
    "gsk-slides": { "command": "gsk task slides --acp" },
    "gsk-docs":   { "command": "gsk task docs --acp" },
    "gsk-sheets": { "command": "gsk task sheets --acp" }
  }
}

Then, in Genspark Claw, run /acp spawn gsk-slides to create and iterate on presentations using natural language.

Stock Prices

stock_price (alias: stock)

Retrieve stock price information and financial data.

gsk stock AAPL
gsk stock MSFT

Service-Level Tools

External service integrations are available as service-level tools — each service is a single command with an action parameter that dispatches to the underlying operation.

Requirements: Connect services in Genspark Account Settings → Integrations.

gmail

Gmail operations: search, read, send, reply, forward, delete, archive, move, mark_as_read, add_label, remove_label, create_label, get_attachment, list_send_as.

gsk gmail search --query "from:boss subject:report"
gsk gmail read --id 19cbfecd7fb14d46
gsk gmail send --to user@example.com --subject "Hello" --body "<p>Hi!</p>"
gsk gmail forward --message_id 19cbfecd7fb14d46 --to colleague@example.com
gsk gmail archive --message_id 19cbfecd7fb14d46

outlook_email

Outlook Email operations: search, read, send, reply, reply_draft, forward, delete, archive, move, mark_as_read, add_category, remove_category, get_attachment, group_list, group_search, group_read, group_reply.

gsk outlook_email search --queryString "quarterly report"
gsk outlook_email read --messageId AAMkAG...
gsk outlook_email send --to user@example.com --subject "Update" --body "Hi!"

google_calendar

Google Calendar operations: list, create, delete.

gsk google_calendar list
gsk google_calendar create --summary "Team Sync" --start_time "2026-04-20T10:00:00Z" --end_time "2026-04-20T11:00:00Z"

outlook_calendar

Outlook Calendar operations: list, create, delete.

gsk outlook_calendar list

meeting

Meeting notes operations: list, search, get.

gsk meeting list
gsk meeting search --keyword "quarterly planning"
gsk meeting get --task_id "e02fd0f1-..."

google_drive

Google Drive file operations: search, read, upload.

gsk google_drive search --query "budget 2026"
gsk google_drive read --file_id 1hq9kH63sc...

google_sheets

Google Sheets operations: create, read, write, append, search, export.

gsk google_sheets search --query "sales report"
gsk google_sheets read --spreadsheet_id 1ABC... --range "Sheet1!A1:D10"

google_docs

Google Docs operations: create, read, append, search.

gsk google_docs search --query "meeting notes"
gsk google_docs read --document_id 1ABC...

google_contacts

Google Contacts operations: search, get, create, update.

gsk google_contacts search --query "John"

github

GitHub operations: list_repos, search_issues, create_issue, update_issue.

gsk github list_repos
gsk github search_issues --q "repo:owner/repo is:open label:bug"
gsk github create_issue --owner myorg --repo myrepo --title "Bug report" --body "Description..."

slack

Slack messaging operations: send, search, lookup.

gsk slack search --query "deployment update"
gsk slack lookup --lookup_type channels --search_query "engineering"
gsk slack send --recipient "#general" --message "Hello team!"

notion

Notion page operations: search, read, create.

gsk notion search --query "project roadmap"
gsk notion read --page_id 2ce8b6a5-...

microsoft_teams

Microsoft Teams operations: send, list_channels, list_chats, list_teams, search, search_users, create_chat.

gsk microsoft_teams list_teams
gsk microsoft_teams list_channels --team_id 6c0db3a9-...
gsk microsoft_teams search --query "release notes"

onedrive

OneDrive file operations: list, search, read.

gsk onedrive search --query "presentation"
gsk onedrive list --folder_path "/Documents"

sharepoint

SharePoint operations: list, search, read_content, read_file.

gsk sharepoint search --query "company wiki"
gsk sharepoint list --site_id abc123

outlook_contacts

Outlook Contacts operations: search.

gsk outlook_contacts search --query "John"

AI Phone Calls

phone-call (alias: call-for-me)

Make an AI phone call on your behalf. The AI validates prerequisites, resolves contact info, and initiates the call.

# Call a business by phone number
gsk phone-call "Pizza Hut" -c "+1-555-123-4567" -p "Check if they deliver to my area"

# Call a business by Google Maps place_id
gsk phone-call "Joe's Pizza" -c "ChIJxxxxxxxx" --is_place_id -p "Reserve a table for 4"

# Dry run: validate and resolve contact info without initiating the call
gsk phone-call "Pizza Hut" -c "+1-555-123-4567" -p "Check hours" --dry-run

Option | Default | Description
<recipient> | | Name of the person or business to call (required, positional)
-c, --contact_info <value> | | Phone number or Google Maps place_id (required)
--is_place_id | false | Treat contact_info as a Google Maps place_id
-p, --purpose <value> | | Purpose of the call (required)
--dry-run | | Only validate and resolve contact info; do not initiate the call

Social Media

Retrieve data from Twitter/X, Instagram, and Reddit. All social commands are grouped under gsk social.

social twitter

Search and retrieve data from Twitter/X. 12 actions available.

# Search tweets
gsk social twitter search_posts -q "artificial intelligence" --start_date 2026-03-01 --language en

# Search users
gsk social twitter search_users -q "openai" --limit 5

# Get tweets by a specific author
gsk social twitter get_posts_by_author -q "elonmusk" --start_date 2026-01-01

# Get tweets by IDs
gsk social twitter get_posts_by_ids --post_ids "123456789,987654321"

# Get user profile
gsk social twitter get_user -q "elonmusk"

# Get followers or following
gsk social twitter get_user_connections -q "elonmusk" --connection_type followers

# Get users by keywords (mentioned in tweets)
gsk social twitter get_users_by_keywords -q "machine learning" --start_date 2026-01-01

# Get comments on a tweet
gsk social twitter get_comments -p "123456789" --start_date 2026-03-01

# Get quotes of a tweet
gsk social twitter get_quotes -p "123456789"

# Get retweets of a tweet
gsk social twitter get_retweets -p "123456789"

# Get users who interacted with a tweet
gsk social twitter get_post_interacting_users -p "123456789" --interaction_type retweeters

# Count posts matching a query
gsk social twitter count_posts -q "AI" --start_date 2026-03-01 --end_date 2026-03-10

Option | Default | Description
<action> | | Action to perform (required, positional)
-q, --query <text> | | Search query, username, or identifier
-p, --post_id <id> | | Tweet/post ID
--post_ids <ids> | | Comma-separated tweet IDs
--connection_type <type> | followers | followers or following
--interaction_type <type> | retweeters | commenters, quoters, or retweeters
--start_date <YYYY-MM-DD> | | Start date filter
--end_date <YYYY-MM-DD> | | End date filter
--language <code> | | Language filter (e.g., en, zh)
--limit <n> | | Max number of results

Actions: search_posts, search_users, get_posts_by_author, get_posts_by_ids, get_user, get_user_connections, get_users_by_keywords, get_comments, get_quotes, get_retweets, get_post_interacting_users, count_posts
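--post_ids (for both Twitter and Instagram) takes a single comma-separated string. If the IDs are already separate shell words, a small hypothetical helper can join them. Running the function body in a subshell, `( ... )`, keeps the IFS change from leaking into the caller.

```shell
# Hypothetical helper: join the arguments with commas.
# "$*" joins positional parameters using the first character of IFS.
join_ids() (
  IFS=,
  printf '%s\n' "$*"
)

ids=$(join_ids 123456789 987654321)
echo "$ids"   # 123456789,987654321
# gsk social twitter get_posts_by_ids --post_ids "$ids"
```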

social instagram

Search and retrieve data from Instagram. 9 actions available.

# Search posts
gsk social instagram search_posts -q "travel photography" --start_date 2026-01-01

# Search users
gsk social instagram search_users -q "natgeo" --limit 5

# Get posts by a specific user
gsk social instagram get_posts_by_user -q "natgeo" --start_date 2026-03-01

# Get posts by IDs
gsk social instagram get_posts_by_ids --post_ids "abc123,def456"

# Get user profile
gsk social instagram get_user -q "natgeo"

# Get followers or following
gsk social instagram get_user_connections -q "natgeo" --connection_type following

# Get users by keywords
gsk social instagram get_users_by_keywords -q "landscape photographer"

# Get comments on a post
gsk social instagram get_comments -p "abc123" --start_date 2026-03-01

# Get users who liked or commented on a post
gsk social instagram get_post_interacting_users -p "abc123" --interaction_type likers

Option | Default | Description
<action> | | Action to perform (required, positional)
-q, --query <text> | | Search query, username, or identifier
-p, --post_id <id> | | Post ID
--post_ids <ids> | | Comma-separated post IDs
--connection_type <type> | followers | followers or following
--interaction_type <type> | likers | likers or commenters
--start_date <YYYY-MM-DD> | | Start date filter
--end_date <YYYY-MM-DD> | | End date filter
--limit <n> | | Max number of results

Actions: search_posts, search_users, get_posts_by_user, get_posts_by_ids, get_user, get_user_connections, get_users_by_keywords, get_comments, get_post_interacting_users

social reddit

Search and retrieve data from Reddit. 9 actions available.

# Search posts (with sort and time filters)
gsk social reddit search_posts -q "rust programming" --sort top --time week -s "programming"

# Search comments
gsk social reddit search_comments -q "async await" -s "rust"

# Search users
gsk social reddit search_users -q "spez" --limit 5

# Search subreddits
gsk social reddit search_subreddits -q "machine learning" --limit 10

# Get a post with its comments
gsk social reddit get_post_with_comments -p "1abc2de"

# Get subreddit info with recent posts
gsk social reddit get_subreddit_with_posts -q "programming"

# Get subreddits by keywords
gsk social reddit get_subreddits_by_keywords -q "artificial intelligence"

# Get user profile
gsk social reddit get_user -q "spez"

# Get users by keywords (active in discussions)
gsk social reddit get_users_by_keywords -q "neural networks" -s "MachineLearning"

Option | Default | Description
<action> | | Action to perform (required, positional)
-q, --query <text> | | Search query, username, or subreddit name
-p, --post_id <id> | | Post ID
-s, --subreddit <name> | | Subreddit name filter
--sort <order> | | Sort: relevance, hot, top, new, comments
--time <range> | | Time filter: hour, day, week, month, year, all
--start_date <YYYY-MM-DD> | | Start date filter
--end_date <YYYY-MM-DD> | | End date filter
--limit <n> | | Max number of results

Actions: search_posts, search_comments, search_users, search_subreddits, get_post_with_comments, get_subreddit_with_posts, get_subreddits_by_keywords, get_user, get_users_by_keywords

Local File Handling

Most commands that accept URLs also accept local file paths. The CLI automatically uploads local files before passing them to the API:

# Local file paths are uploaded automatically:
gsk analyze "Describe this" -i ./photo.jpg
gsk img "Enhance this" -i ./photo.png -o ./result.png
gsk video "Animate this" -i ./frame.jpg -o ./video.mp4

Use -o / --output-file to save generated results directly to a local file.
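How gsk decides what to upload is not documented beyond the statement that local files are uploaded automatically. As an assumption for illustration only, the sketch below classifies an argument as an http(s) URL or an existing local file. A similar check in your own scripts can catch a mistyped path before a command runs. `classify_input` is hypothetical.

```shell
# Hypothetical classifier: http(s) URLs pass through as-is; existing local
# files would need uploading; anything else is flagged.
classify_input() {
  case $1 in
    http://*|https://*) echo url ;;
    *)
      if [ -f "$1" ]; then echo local-file; else echo unknown; fi
      ;;
  esac
}

classify_input "https://example.com/image.jpg"   # url
classify_input ./photo.jpg                       # local-file if ./photo.jpg exists, otherwise unknown
```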

Auto-Update

The CLI checks for updates every 4 hours and installs new versions in the background.

To disable auto-update:

# Via environment variable
export GSK_NO_AUTO_UPDATE=1

# Via config file
# Add "auto_update": false to ~/.genspark-tool-cli/config.json
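A 4-hour interval is 4 × 60 × 60 = 14,400 seconds. The sketch below is a hypothetical illustration of such a check, not gsk's actual updater. It includes the GSK_NO_AUTO_UPDATE opt-out; treating only the value 1 as "disabled" is an assumption.

```shell
# Hypothetical update-check logic (not gsk's code): a check is due once
# 4 hours have passed, unless GSK_NO_AUTO_UPDATE=1 is set.
CHECK_INTERVAL=$((4 * 60 * 60))   # 14400 seconds

update_check_due() {
  last_check=$1   # epoch seconds of the previous check
  now=$2          # current epoch seconds
  [ "${GSK_NO_AUTO_UPDATE:-}" = "1" ] && return 1
  [ $((now - last_check)) -ge "$CHECK_INTERVAL" ]
}

if update_check_due 0 "$(date +%s)"; then echo "check due"; fi
```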

Output Conventions

Stream | Content | Consumer
stdout | JSON result | Programs / AI agents
stderr | Progress, debug, error messages | Humans / logs

This separation allows programs to parse clean JSON from stdout while humans can follow progress on stderr.
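Command substitution captures only stdout, which is what makes this split useful in scripts. Running gsk requires an account, so the sketch below uses a hypothetical stand-in function, `fake_gsk`, that follows the documented convention. Its JSON shape is invented for illustration and is not gsk's real output format.

```shell
# Stand-in for gsk, for illustration only: progress goes to stderr and the
# JSON result goes to stdout, matching the documented convention.
fake_gsk() {
  echo "Searching: $1" >&2
  printf '{"query":"%s"}\n' "$1"
}

log=$(mktemp)
# $(...) captures stdout only; stderr is redirected into the log file.
result=$(fake_gsk "latest AI news" 2>"$log")
echo "$result"   # {"query":"latest AI news"}
cat "$log"       # Searching: latest AI news
```

With the real CLI, the same pattern is `result=$(gsk search "latest AI news" 2>gsk-progress.log)`. Redirecting with `2>/dev/null` discards the progress output entirely.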

Available Models

Image Generation Models — gsk img -m <model>
Model Description
nano-banana-2 Gemini 3.1 Flash Image - Fast and efficient with advanced reasoning. Multi-image fusion with up to 14 references. Supports 0.5K-4K resolution
fal-ai/gpt-image-1.5 GPT Image 1.5 - Supports text-to-image and image editing with multi-image input
imagen4 Latest high quality image generation model, upgrade from Imagen 3
recraft-v3 Realistic image generation model
fal-ai/bytedance/seedream/v5/lite Bytedance Seedream v5 Lite - Text-to-image and image editing with native 2K resolution and excellent text layout
fal-ai/flux-2 Flux 2 - Text-to-image and image editing with enhanced realism and crisp text generation. Supports up to 3 images for edit mode
fal-ai/flux-2-pro Flux 2 Pro - Higher quality version of Flux 2 with professional-grade output
fal-ai/z-image/turbo Z-Image Turbo - Optimized for speed. Good for quick iterations, bulk generation, and style transfer
ideogram/V_3 Ideogram V3 - Character reference specialist with superior facial feature preservation and character consistency
qwen-image Chinese poster specialist with outstanding Chinese text rendering and cultural context mastery
bbox-segment Extract subjects from images based on bounding box region
fal-bria-rmbg Remove background from image
fal-ai/recraft-clarity-upscale Upscale image
fal-ai/image-editing/text-removal Remove text and watermarks from images while preserving background
flux-pro/outpaint Expand image to a specific aspect ratio
Video Generation Models — gsk video -m <model>
Model | Capabilities | Aspect Ratios | Duration | Notes
kling/v3 | Text/Image-to-video | 16:9, 9:16, 1:1 | 3-15s | Latest Kling V3 with audio. Pro/Standard quality modes
gemini/veo3.1 | Text/Image-to-video | 16:9, 9:16 | 8s | Latest Veo with enhanced quality. Supports fast_mode and hd_mode (1080p)
gemini/veo3.1/reference-to-video | Reference-to-video | 16:9, 9:16 | 8s | Generate video using 1+ reference images. Supports fast_mode and hd_mode
gemini/veo3.1/first-last-frame-to-video | Frame transition | 16:9, 9:16 | 8s | Precise transitions from first to last frame. Requires exactly 2 images
minimax/hailuo-2.3/standard | Text/Image-to-video | 16:9, 9:16 | 6s, 10s | Fast (~4 min), cost-effective. Supports first & last frame control
wan/v2.6 | Text/Image/Video-to-video | 16:9, 9:16, 1:1, 4:3, 3:4 | 5s, 10s, 15s | 1080p with audio. Supports reference-to-video with 1-3 reference videos
vidu/q3 | Text/Image-to-video | 16:9, 9:16, 4:3, 3:4, 1:1 | 1-16s | Enhanced quality with audio generation. Resolution: 720p, 1080p
runway/gen4_turbo | Image-to-video | 5:3, 3:5 | 5s, 10s | Fast, high quality. Requires reference image
pixverse/v5 | Text/Image-to-video | 16:9, 9:16, 4:3, 1:1, 3:4 | 5s | Fast (~30s). Supports start/end frame transitions
fal-ai/bytedance/seedance/v1.5/pro | Text/Image-to-video | 21:9, 16:9, 4:3, 1:1, 3:4, 9:16 | 4-12s | Seedance v1.5 Pro with native audio support. Supports first & last frame control
sora-2 | Text/Image/Video-to-video | 16:9, 9:16 | 4s, 8s, 12s | OpenAI Sora 2 for fast, creative videos. Supports video remixing
sora-2-pro | Text/Image-to-video | 16:9, 9:16 | 4s, 8s | Sora 2 Pro: higher fidelity, cinematic quality. 720p and 1080p
fal-ai/bytedance-upscaler/upscale/video | Video upscaling | | | Upscale existing videos to 2K. Requires video_url parameter
xai/grok-imagine-video | Text/Image-to-video | 16:9, 4:3, 1:1, 3:4, 9:16, 21:9, 9:21 | 1-15s | xAI Grok Imagine Video. 720p HD output
Audio Generation Models — gsk audio -m <model>

Text-to-Speech (TTS)

Model Description
google/gemini-2.5-pro-preview-tts Best, high-quality, realistic TTS. Supports one or multiple speakers with speaker prefixes (e.g., Speaker1: text, Speaker2: text)
elevenlabs/v3-tts Advanced multilingual TTS with multi-speaker dialogue support. Supports emotional tags like [excited], [whispers], [laughs]
fal-ai/elevenlabs/tts/multilingual-v2 High-quality multilingual TTS. Preferred for English
fal-ai/minimax/speech-2.8-hd High-quality multilingual TTS. Preferred for Chinese, Cantonese, Japanese, Korean. One speaker per generation

Sound Effects

Model Description
elevenlabs/sound-effects Sound effect generation. Duration: 0.1-22 seconds

Music Generation

Model Description
elevenlabs/music ElevenLabs music generation with vocals/singing. Lyrics auto-generated (no custom lyrics). Duration: 10s-5min
CassetteAI/music-generator Background music generation. Duration: 10-180 seconds
mureka/song-generator Professional song generation with lyrics. Supports style prompts, reference tracks, vocal and melody inputs. Max: 180s
mureka/instrumental-generator Instrumental music generation without vocals. Supports style prompts and reference tracks. Max: 180s
fal-ai/lyria2 Google Lyria 2 text-to-music. Good for sound effects and lyrics-free music. Max: 30 seconds
fal-ai/minimax-music/v2.6 Song generation with lyrics using MiniMax Music 2.6. Supports markers (Verse), (Chorus), (Bridge), etc. Requires style prompt and lyrics

Voice Cloning & Transformation

Model Description
elevenlabs/voice-clone Clone a voice from audio samples. Returns voice ID for use in TTS generation
elevenlabs/voice-changer Transform audio from one voice to another. Requires source audio and target voice ID
fal-ai/minimax/voice-clone Clone a voice from a sample audio and generate speech from text prompts (gated feature)

License

MIT