CLI tool for Genspark Tool API - search, crawl, analyze images, generate media

Package Exports

  • @genspark/cli
  • @genspark/cli/dist/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (@genspark/cli) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

Genspark CLI (gsk)

One CLI. Every AI capability. Search, generate, analyze, communicate — all from your terminal.

gsk is the command-line interface for the Genspark AI platform. It unifies 30+ AI tools behind a single binary: web search, image/video/audio generation with 40+ models, document analysis, media transcription, cloud file management, email, calendar, AI phone calls, stock data, social media data (Twitter, Instagram, Reddit), and autonomous AI agents — all with clean JSON output for seamless integration with AI coding assistants, automation pipelines, and scripts.

Capability Map

Category What You Get
🔍 Search Web search, image search
📄 Documents Crawl pages, summarize PDFs/docs
🎨 Images 16 models: GPT Image, Gemini, Flux 2, Imagen 4, Recraft, Ideogram, Seedream ...
🎬 Videos 14 models: Kling V3, Veo 3.1, Sora 2, Hailuo, Wan, Runway, PixVerse, Seedance ...
🎵 Audio 14 models: Gemini TTS, ElevenLabs, MiniMax, Mureka, CassetteAI, Lyria 2 ...
🧠 Analysis Image/video/audio understanding, OCR, video style replication
📝 Transcribe Whisper, Gemini, ElevenLabs Scribe
☁️ AI Drive Cloud file storage, download, compress
📧 Email Gmail & Outlook: read, search, send
📅 Calendar Google & Outlook: list, create events
📞 Phone AI-powered phone calls to businesses
📈 Stocks Real-time stock prices
📱 Social Media Twitter/X, Instagram, Reddit — search posts/users, get comments, connections, and more (30 APIs)
🤖 Agents Podcasts, docs, slides, deep research, fact-checking, websites, batch media generation
🔊 Voice Voice cloning, voice changer

Installation

npm install -g @genspark/cli

Requires Node.js >= 18.

Quick Start

# Log in via browser
gsk login

# Search the web
gsk search "latest AI news"

# Generate an image
gsk img "A beautiful sunset over mountains" -o ./sunset.png

# Crawl a webpage
gsk crawl "https://example.com/article"

Authentication

Log in with your Genspark account:

gsk login

This opens a browser for authentication and saves the API key to ~/.genspark-tool-cli/config.json.

Alternatively, provide an API key directly:

# Via environment variable
export GSK_API_KEY="gsk_..."

# Via CLI option
gsk search "query" --api-key "gsk_..."

To check your current identity:

gsk login-info
gsk me          # shorthand

To log out:

gsk logout

Configuration

Config is loaded from three sources (highest priority first):

  1. CLI options: --api-key, --base-url, etc.
  2. Environment variables: GSK_API_KEY, GSK_BASE_URL, GSK_PROJECT_ID
  3. Config file: ~/.genspark-tool-cli/config.json, for example:
{
  "api_key": "gsk_...",
  "base_url": "https://www.genspark.ai",
  "project_id": "project_abc123",
  "debug": false,
  "timeout": 300000
}
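A script can check which of these sources will supply the API key before it calls gsk by mirroring the same order in a few lines of shell. This is a minimal sketch for a POSIX shell. The function name gsk_key_source is hypothetical, and the config-file test is a plain grep for an "api_key" entry rather than real JSON parsing, so it can disagree with the CLI on a malformed file.

```shell
# Hypothetical helper: print which source would supply the API key, following the
# documented order: CLI option, then GSK_API_KEY, then the config file.
# Usage: gsk_key_source [key-you-plan-to-pass-with---api-key]
# Prints "none" and returns non-zero when no source is found.
gsk_key_source() {
  cli_key=${1:-}
  config_file="$HOME/.genspark-tool-cli/config.json"

  if [ -n "$cli_key" ]; then
    echo "cli"
  elif [ -n "${GSK_API_KEY:-}" ]; then
    echo "env"
  elif [ -f "$config_file" ] && grep -q '"api_key"' "$config_file"; then
    # Heuristic only: checks that an "api_key" entry is present, not that it is valid.
    echo "config"
  else
    echo "none"
    return 1
  fi
}
```

For example, gsk_key_source >/dev/null || gsk login starts the browser login only when no key is configured.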

Global Options

Option Env Var Default Description
--api-key <key> GSK_API_KEY API key (required)
--base-url <url> GSK_BASE_URL https://www.genspark.ai API base URL
--project-id <id> GSK_PROJECT_ID Project ID for access control
--debug false Enable debug output
--timeout <ms> 300000 (5 min) Request timeout
--output <format> json Output format: json or text
--refresh Force refresh cached tool schemas

Commands

list-tools (alias: ls)

List all available tools.

gsk list-tools
gsk ls

login-info (alias: me)

Show your current account info — email, name, and membership plan.

gsk login-info
gsk me

init-opencode

Generate an .opencode.json config file for OpenCode, pre-configured to use Genspark's LLM proxy with your API key.

# Generate with default model (claude-opus-4-6-1m)
gsk init-opencode

# Specify a different default model
gsk init-opencode --model claude-sonnet-4-6

# Write to a custom path
gsk init-opencode -o ./my-project/.opencode.json
Option Default Description
--model <name> claude-opus-4-6-1m Default model for OpenCode
-o, --out <path> .opencode.json (cwd) Output file path

init-skills

Sync GSK skill documents into the current project for AI agent discovery. Copies all skill docs and generates a CONTEXT.md entry point that AI agents (Claude Code, Gemini, etc.) can load automatically.

# Copy skills to .gsk/skills/ and generate CONTEXT.md
gsk init-skills

# Also generate .claude/ config for Claude Code
gsk init-skills --agent claude

# Generate config for all supported agents (Claude, Gemini)
gsk init-skills --agent all

# Custom output directory
gsk init-skills -o ./docs/gsk-skills
Option Default Description
-o, --out <dir> .gsk/skills (cwd) Output directory for skills
--agent <type> Generate agent config: claude, gemini, or all

Search & Crawl

search

Search the web.

gsk search "latest AI news"

crawler (alias: crawl)

Extract content from a web page or document.

gsk crawl "https://example.com/article"

summarize_large_document (alias: summarize)

Analyze a document and answer questions about it.

gsk summarize "https://example.com/report.pdf" --question "What are the key findings?"
Option Description
<url> Document URL (required, positional)
--question <text> Question about the document

img-search

Search for images on the web.

gsk img-search "modern architecture"

Media Analysis & Transcription

understand_images (alias: analyze)

Analyze images with an AI vision model.

gsk analyze "Describe this image" -i "https://example.com/image.jpg"
gsk analyze "Extract all text" -i "https://img1.jpg" "https://img2.jpg"
gsk analyze "What's in this photo?" -i ./photo.jpg
Option Default Description
-i, --image_urls <url...> Image URL(s) or local file path(s) to analyze (required)
-r, --instruction <text> Custom analysis instruction

Image Generation

image_generation (alias: img)

Generate images using AI. Supports text-to-image and image-to-image.

# Text-to-image
gsk img "A beautiful sunset over mountains" -r "16:9" -o ./sunset.png
gsk img "Modern office at night" -s "4k" -r "1:1"

# Image-to-image (reference-based)
gsk img "A portrait in similar style" -i ./reference.png
Option Default Description
-r, --aspect_ratio <ratio> 1:1 Aspect ratio (1:1, 16:9, 9:16)
-s, --image_size <size> auto Image size: auto, 2k, 4k
-m, --model <name> Model to use (optional)
-i, --image_urls <url...> Reference image URL(s) or local file(s) for image-to-image
-o, --output-file <path> Download the generated file to a local path

Video Generation

video_generation (alias: video)

Generate videos using AI.

gsk video "A cat playing with yarn" -m "kling/v1.6/standard" -d 5 -o ./cat.mp4
gsk video "Sunrise over a beach" -m "minimax/hailuo-02/standard" -r "16:9" -d 8

# Image-to-video
gsk video "Camera pan around the subject" -m "kling/v1.6/standard" -i ./photo.jpg
Option Default Description
-m, --model <name> Model (required). e.g., kling/v1.6/standard, minimax/hailuo-02/standard
-r, --aspect_ratio <ratio> 16:9 Aspect ratio
-d, --duration <sec> 5 Duration in seconds (2-15)
-i, --image_urls <url...> Reference image URL(s) or local file(s)
-a, --audio_url <url> Audio URL for soundtrack
-o, --output-file <path> Download the generated file to a local path

Audio Generation

audio_generation (alias: audio)

Generate audio: TTS, music, or sound effects.

# Text-to-speech
gsk audio "Hello, welcome to Genspark!" -m "google/gemini-2.5-pro-preview-tts" -r "professional female voice"
gsk audio "Hello, welcome to Genspark!" -m "google/gemini-2.5-pro-preview-tts" -o ./hello.mp3

# Music with lyrics
gsk audio "A pop song" -m "fal-ai/minimax-music/v2.5" -l "Verse 1: ..." -d 120

# Sound effect
gsk audio "Door creaking slowly open" -m "elevenlabs/sound-effects"
Option Default Description
-m, --model <name> Model (required). e.g., elevenlabs/v3-tts, fal-ai/minimax/speech-2.6-hd
-d, --duration <sec> 0 (auto) Duration in seconds
-r, --requirements <text> Voice requirements for TTS
-l, --lyrics <text> Lyrics for song generation
-o, --output-file <path> Download the generated file to a local path

File Transfer

upload

Upload a local file and get a URL for use in other commands.

gsk upload "./image.png"
gsk upload "./document.pdf"

download

Download a file from a file wrapper URL.

# Get download URL only
gsk download "/api/files/s/abc123"

# Download and save to local file
gsk download "/api/files/s/abc123" -s "./downloaded.png"
Option Description
-s, --save <path> Download and save to local file path
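Some commands, such as gsk task sheets, return a file wrapper URL rather than the file itself. A small filter can pick the first wrapper path out of that output and pass it to gsk download -s. This is a minimal sketch for a POSIX shell with a grep that supports -o (GNU and BSD grep both do). The name save_first_file is hypothetical, and it assumes the wrapper appears in the output in the /api/files/s/<id> form shown above, with IDs made of letters, digits, underscores, or hyphens.

```shell
# Hypothetical helper: read a command's output on stdin, find the first
# /api/files/s/<id> path in it, and save that file with gsk download -s.
# Usage: gsk task sheets --task_name ... --query ... --instructions ... | save_first_file ./report.xlsx
save_first_file() {
  dest=$1
  # Assumption: wrapper IDs use only letters, digits, "_" and "-".
  path=$(grep -o '/api/files/s/[A-Za-z0-9_-][A-Za-z0-9_-]*' | head -n 1)
  if [ -z "$path" ]; then
    echo "no file wrapper URL found in input" >&2
    return 1
  fi
  gsk download "$path" -s "$dest"
}
```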

analyze_media (alias: media-analyze)

Analyze various types of media content including images, audio, and video.

gsk media-analyze -i "https://example.com/image.jpg" -r "Describe the content"
gsk media-analyze -i "https://example.com/video.mp4" -r "Summarize the video"
Option Default Description
-i, --media_urls <urls> Media URL(s) to analyze (required)
-r, --requirements <text> Analysis instructions

audio_transcribe (alias: transcribe)

Transcribe audio files to text.

gsk transcribe -i "https://example.com/audio.mp3"
gsk transcribe -i ./meeting.wav -m "whisper-large-v3"
Option Default Description
-i, --audio_urls <url...> Audio URL(s) or local file(s) to transcribe (required)
-m, --model <name> Transcription model to use

AI Drive (Cloud Storage)

aidrive (alias: drive)

AI-Drive file storage and management. List, create, delete, move files and directories. Download videos, audio, and files from URLs directly to AI-Drive.

# List files in root directory
gsk drive ls
gsk drive ls -p "/documents" -f file

# Create directory
gsk drive mkdir -p "/my-folder"

# Move file
gsk drive move -p "/old-path/file.txt" --target_path "/new-path/file.txt"

# Download video/audio/file to AI-Drive
gsk drive download_video --video_url "https://example.com/video.mp4" --target_folder "/videos"
gsk drive download_file --file_url "https://example.com/doc.pdf" --target_folder "/docs"

# Upload inline text content to AI-Drive
gsk drive upload --file_content "Hello World" --upload_path "/notes/hello.txt"

# Upload a local file directly to AI-Drive (streaming, supports 100MB+ files)
gsk drive upload --local_file ./report.pdf --upload_path /docs/report.pdf
gsk drive upload --local_file ./video.mp4 --upload_path /videos/demo.mp4
gsk drive upload --local_file ./photo.png              # upload_path defaults to /photo.png
gsk drive upload --local_file ./doc.pdf --upload_path /docs/doc.pdf --override  # overwrite existing

# Get readable URL for a file
gsk drive get_readable_url -p "/documents/report.pdf"
Option Default Description
-p, --path <path> File or directory path in AI-Drive
-f, --filter_type <type> all Filter: all, file, directory
--file_type <type> all File type filter: all, audio, video, image
--target_path <path> Target path for move operations
--target_folder <path> Target folder for downloads
--video_url <url> Video URL for download_video action
--audio_url <url> Audio URL for download_audio action
--file_url <url> File URL for download_file action
--file_name <name> Custom file name for downloads
--file_content <text> Inline text content to upload
--local_file <path> Local file path to upload directly to AI-Drive (streaming, no size limit)
--upload_path <path> Destination path for upload (defaults to /<filename> for --local_file)
--override false Overwrite an existing file at the destination path
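Each --local_file upload handles one file, so copying a whole folder to AI-Drive is a loop over its files. This is a minimal sketch for a POSIX shell. The helper name drive_upload_dir is hypothetical; it uses only the --local_file and --upload_path options documented above and places each file at <target>/<filename> without descending into subdirectories.

```shell
# Hypothetical helper: upload every regular file in a local directory to one
# AI-Drive folder, keeping the file names. Subdirectories are skipped, and
# dotfiles are not matched by "*".
# Usage: drive_upload_dir ./reports /docs/reports
drive_upload_dir() {
  src_dir=$1
  target=${2%/}                  # drop a trailing slash to avoid "//" in paths
  failed=0

  for f in "$src_dir"/*; do
    [ -f "$f" ] || continue      # skips subdirectories and an unmatched "*"
    name=$(basename "$f")
    if ! gsk drive upload --local_file "$f" --upload_path "$target/$name"; then
      echo "upload failed: $f" >&2
      failed=1
    fi
  done
  return "$failed"
}
```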

AI Agents & Tasks

create_task (alias: task)

Create and execute tasks using specialized AI agents.

# Create a podcast
gsk task podcasts --task_name "AI Trends" --query "Create a podcast about AI trends" --instructions "Focus on practical applications"

# Create a document
gsk task docs --task_name "Quantum Report" --query "Write a report on quantum computing" --instructions "Include recent breakthroughs"

# Create slides
gsk task slides --task_name "Q4 Results" --query "Create a Q4 results presentation" --instructions "Use charts and data"

# Create a spreadsheet (returns file wrapper URL, use `gsk download` to save)
gsk task sheets --task_name "Sales Report" --query "Create a quarterly sales report with formulas" --instructions "Use formulas and formatting"

# Deep research
gsk task deep_research --task_name "Fusion Energy" --query "Research fusion energy advances" --instructions "Cover public and private sector"

# Fact-check a claim
gsk task cross_check --task_name "Earth shape" --query "The Earth is flat" --instructions "Verify this claim with evidence"
Option Default Description
--task_name <name> Name for the task (required)
--query <text> Query describing what to create (required)
--instructions <text> Detailed instructions (required)

Supported task types: super_agent, podcasts, docs, slides, sheets, deep_research, website, video_generation, audio_generation, meeting_notes, cross_check
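When the task type comes from user input or another program, checking it against this list first gives a clearer error than a failed request. This is a minimal sketch for a POSIX shell. The helpers is_task_type and run_task are hypothetical, and the list is copied from this README, so it may fall behind newer CLI releases.

```shell
# Hypothetical helper: succeed only for the task types listed in this README.
is_task_type() {
  case $1 in
    super_agent|podcasts|docs|slides|sheets|deep_research) return 0 ;;
    website|video_generation|audio_generation|meeting_notes|cross_check) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: validate the type, then pass the three required options through.
# Usage: run_task docs "Quantum Report" "Write a report on quantum computing" "Include recent breakthroughs"
run_task() {
  if ! is_task_type "$1"; then
    echo "unsupported task type: $1" >&2
    return 2
  fi
  gsk task "$1" --task_name "$2" --query "$3" --instructions "$4"
}
```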

Stock Prices

stock_price (alias: stock)

Retrieve stock price information and financial data.

gsk stock AAPL
gsk stock MSFT
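Because gsk writes its JSON result to stdout and its progress to stderr (see Output Conventions), fetching several tickers is a loop that redirects each result to its own file. This is a minimal sketch for a POSIX shell. The helper name fetch_quotes is hypothetical, and the <TICKER>.json file naming is an arbitrary choice.

```shell
# Hypothetical helper: save one JSON file per ticker into an output directory.
# Usage: fetch_quotes ./quotes AAPL MSFT GOOG
fetch_quotes() {
  out_dir=$1
  shift
  mkdir -p "$out_dir" || return 1
  status=0

  for ticker in "$@"; do
    # stdout (JSON) goes to the file; stderr (progress) stays on the terminal.
    if ! gsk stock "$ticker" > "$out_dir/$ticker.json"; then
      echo "failed: $ticker" >&2
      rm -f "$out_dir/$ticker.json"   # don't leave a partial or empty file behind
      status=1
    fi
  done
  return "$status"
}
```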

email

Read and send emails from your connected Gmail or Outlook account.

Requirements: Connect Gmail or Outlook in Genspark Account Settings → Integrations.

email list

List emails from a folder.

# List inbox (default)
gsk email list

# List sent folder, limit to 5 emails
gsk email list sent -n 5

# List only unread emails
gsk email list --unread_only

# List emails after a specific date
gsk email list --after_date 2026-03-01

# Use a specific account (when multiple accounts are connected)
gsk email list -a user@gmail.com
Option Default Description
[folder] inbox Folder: inbox, sent, drafts, trash, spam, archive, or a custom label
-n, --limit <n> 20 Maximum number of emails to return
--unread_only false Return only unread emails
--after_date <YYYY-MM-DD> Return emails after this date
--before_date <YYYY-MM-DD> Return emails before this date
-a, --from_account <email> Account email address (for multi-account)

email read

Read a specific email by ID.

# Read a specific email (get ID from email list output)
gsk email read 19cbfecd7fb14d46

# Ask a specific question about the email
gsk email read 19cbfecd7fb14d46 "What action is required?"

# Specify account for multi-account setups
gsk email read 19cbfecd7fb14d46 -a user@gmail.com
Option Default Description
<id> Email ID from email list or email search (required)
[question] Question to ask about the email (optional, positional)
-a, --from_account <email> Account email address (for multi-account)

email search

Search emails using a query string.

# Search by subject
gsk email search "meeting agenda"

# Gmail query syntax
gsk email search "from:boss@company.com subject:budget"

# Search with date range
gsk email search "invoice" --after_date 2026-01-01 --before_date 2026-03-01

# Limit results
gsk email search "quarterly report" -n 5
Option Default Description
<query> Search query (Gmail GQL or Outlook KQL) (required)
-n, --limit <n> 20 Maximum number of results
--after_date <YYYY-MM-DD> Return emails after this date
--before_date <YYYY-MM-DD> Return emails before this date
-a, --from_account <email> Account email address (for multi-account)

email send

Send a new email.

# Send a plain text email
gsk email send --to recipient@example.com --subject "Hello" --body "Hi there!"

# Send to multiple recipients
gsk email send --to "alice@example.com,bob@example.com" --subject "Team Update" --body "..."

# Send HTML email
gsk email send --to user@example.com --subject "Newsletter" --body "<h1>Title</h1><p>...</p>" --content_type text/html

# Send with CC and BCC
gsk email send --to recipient@example.com --subject "Meeting" --body "..." --cc manager@company.com --bcc archive@company.com

# Send from a specific account
gsk email send --to user@example.com --subject "Hi" --body "..." -a sender@gmail.com
Option Default Description
--to <addresses> Recipient(s), comma-separated (required)
--subject <text> Email subject (required)
--body <text> Email body (required)
--cc <addresses> CC recipient(s), comma-separated
--bcc <addresses> BCC recipient(s), comma-separated
--content_type <type> text/plain Body content type: text/plain or text/html
-a, --from_account <email> Account to send from (for multi-account)

calendar

List and create calendar events from your connected Google Calendar or Outlook account.

Requirements: Connect Google Calendar or Outlook in Genspark Account Settings → Integrations.

calendar list

List upcoming calendar events.

# List events in the next 7 days (default)
gsk calendar list

# List events in a specific time range
gsk calendar list --time_min 2026-03-10T00:00:00Z --time_max 2026-03-15T23:59:59Z

# Search for events by keyword
gsk calendar list --filter_query "standup"

# Use a specific calendar account
gsk calendar list -a user@gmail.com
Option Default Description
--time_min <ISO8601> now Start of time range (e.g., 2026-03-10T00:00:00Z)
--time_max <ISO8601> +7 days End of time range
--filter_query <text> Filter events by title/subject keyword
-a, --from_account <email> Calendar account email (for multi-account)
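--time_min and --time_max take ISO 8601 timestamps, so listing the events of a single day needs only string formatting. This is a minimal sketch for a POSIX shell. The helper name calendar_day is hypothetical, and it treats the day as UTC (the Z suffix used in the example above), so events near midnight in your local time zone may land on the adjacent day.

```shell
# Hypothetical helper: list calendar events for one UTC day given as YYYY-MM-DD.
# Extra arguments (for example -a user@gmail.com) are passed through unchanged.
# Usage: calendar_day 2026-03-10 --filter_query "standup"
calendar_day() {
  day=$1
  shift
  # Checks the YYYY-MM-DD shape only; it does not reject dates like 2026-02-31.
  case $day in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "expected YYYY-MM-DD, got: $day" >&2; return 2 ;;
  esac
  gsk calendar list --time_min "${day}T00:00:00Z" --time_max "${day}T23:59:59Z" "$@"
}
```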

calendar create

Create a new calendar event.

# Create a simple event
gsk calendar create --summary "Team Meeting" --start_time "2026-03-10T14:00:00-08:00" --end_time "2026-03-10T15:00:00-08:00"

# Create an event with details
gsk calendar create \
  --summary "Q1 Planning" \
  --start_time "2026-03-15T09:00:00-08:00" \
  --end_time "2026-03-15T11:00:00-08:00" \
  --description "Quarterly planning session" \
  --location "Conference Room A" \
  --time_zone "America/Los_Angeles"

# Create an event with attendees (sends invitations)
gsk calendar create \
  --summary "Interview" \
  --start_time "2026-03-12T10:00:00-08:00" \
  --end_time "2026-03-12T11:00:00-08:00" \
  --attendees candidate@example.com interviewer@company.com

# Create on a specific calendar account
gsk calendar create --summary "Personal Event" --start_time "..." --end_time "..." -a user@gmail.com
Option Default Description
--summary <title> Event title (required)
--start_time <ISO8601> Start time with timezone offset (required)
--end_time <ISO8601> End time with timezone offset (required)
--description <text> Event description
--location <text> Event location
--attendees <emails...> Attendee email address(es) (sends invitations)
--time_zone <tz> UTC Timezone name (e.g., America/Los_Angeles)
-a, --from_account <email> Calendar account email (for multi-account)

AI Phone Calls

call

Make an AI phone call on your behalf. The AI validates prerequisites, resolves contact info, and initiates the call.

# Call a business by phone number
gsk call "Pizza Hut" -c "+1-555-123-4567" -p "Check if they deliver to my area"

# Call a business by Google Maps place_id
gsk call "Joe's Pizza" -c "ChIJxxxxxxxx" --is_place_id -p "Reserve a table for 4"

# Dry run: validate and resolve contact info without initiating the call
gsk call "Pizza Hut" -c "+1-555-123-4567" -p "Check hours" --dry-run
Option Default Description
<recipient> Name of the person or business to call (required, positional)
-c, --contact_info <value> Phone number or Google Maps place_id (required)
--is_place_id false Treat contact_info as a Google Maps place_id
-p, --purpose <value> Purpose of the call (required)
--dry-run Only validate and resolve contact info, do not initiate the call
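--dry-run makes it possible to validate and resolve the contact info first and place the real call only if that step succeeds. This is a minimal sketch for a POSIX shell. The helper name call_checked is hypothetical, and it assumes gsk exits with a non-zero status when the dry run fails, which this README does not state explicitly. The dry-run output is sent to stderr so that stdout carries only the JSON of the real call.

```shell
# Hypothetical helper: run the documented --dry-run check, then place the call.
# Assumes a failed validation produces a non-zero exit status.
# Usage: call_checked "Pizza Hut" "+1-555-123-4567" "Check if they deliver to my area"
call_checked() {
  recipient=$1
  contact=$2
  purpose=$3

  # Show the dry-run result on stderr so stdout stays a single JSON document.
  if ! gsk call "$recipient" -c "$contact" -p "$purpose" --dry-run >&2; then
    echo "dry run failed; not calling $recipient" >&2
    return 1
  fi
  gsk call "$recipient" -c "$contact" -p "$purpose"
}
```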

Social Media

Retrieve data from Twitter/X, Instagram, and Reddit. All social commands are grouped under gsk social.

social twitter

Search and retrieve data from Twitter/X. 12 actions available.

# Search tweets
gsk social twitter search_posts -q "artificial intelligence" --start_date 2026-03-01 --language en

# Search users
gsk social twitter search_users -q "openai" --limit 5

# Get tweets by a specific author
gsk social twitter get_posts_by_author -q "elonmusk" --start_date 2026-01-01

# Get tweets by IDs
gsk social twitter get_posts_by_ids --post_ids "123456789,987654321"

# Get user profile
gsk social twitter get_user -q "elonmusk"

# Get followers or following
gsk social twitter get_user_connections -q "elonmusk" --connection_type followers

# Get users by keywords (mentioned in tweets)
gsk social twitter get_users_by_keywords -q "machine learning" --start_date 2026-01-01

# Get comments on a tweet
gsk social twitter get_comments -p "123456789" --start_date 2026-03-01

# Get quotes of a tweet
gsk social twitter get_quotes -p "123456789"

# Get retweets of a tweet
gsk social twitter get_retweets -p "123456789"

# Get users who interacted with a tweet
gsk social twitter get_post_interacting_users -p "123456789" --interaction_type retweeters

# Count posts matching a query
gsk social twitter count_posts -q "AI" --start_date 2026-03-01 --end_date 2026-03-10
Option Default Description
<action> Action to perform (required, positional)
-q, --query <text> Search query, username, or identifier
-p, --post_id <id> Tweet/post ID
--post_ids <ids> Comma-separated tweet IDs
--connection_type <type> followers followers or following
--interaction_type <type> retweeters commenters, quoters, or retweeters
--start_date <YYYY-MM-DD> Start date filter
--end_date <YYYY-MM-DD> End date filter
--language <code> Language filter (e.g., en, zh)
--limit <n> Max number of results

Actions: search_posts, search_users, get_posts_by_author, get_posts_by_ids, get_user, get_user_connections, get_users_by_keywords, get_comments, get_quotes, get_retweets, get_post_interacting_users, count_posts
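--post_ids expects one comma-separated string. When IDs are kept one per line, for example collected from earlier output, paste can join them. This is a minimal sketch for a POSIX shell. The helper name posts_from_file is hypothetical, and any limit on how many IDs a single request accepts is not documented here.

```shell
# Hypothetical helper: read tweet IDs (one per line) from a file and fetch them
# in a single get_posts_by_ids call. Blank lines are dropped before joining.
# Usage: posts_from_file ids.txt
posts_from_file() {
  ids=$(grep -v '^[[:space:]]*$' "$1" | paste -s -d ',' -)
  if [ -z "$ids" ]; then
    echo "no IDs found in $1" >&2
    return 1
  fi
  gsk social twitter get_posts_by_ids --post_ids "$ids"
}
```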

social instagram

Search and retrieve data from Instagram. 9 actions available.

# Search posts
gsk social instagram search_posts -q "travel photography" --start_date 2026-01-01

# Search users
gsk social instagram search_users -q "natgeo" --limit 5

# Get posts by a specific user
gsk social instagram get_posts_by_user -q "natgeo" --start_date 2026-03-01

# Get posts by IDs
gsk social instagram get_posts_by_ids --post_ids "abc123,def456"

# Get user profile
gsk social instagram get_user -q "natgeo"

# Get followers or following
gsk social instagram get_user_connections -q "natgeo" --connection_type following

# Get users by keywords
gsk social instagram get_users_by_keywords -q "landscape photographer"

# Get comments on a post
gsk social instagram get_comments -p "abc123" --start_date 2026-03-01

# Get users who liked or commented on a post
gsk social instagram get_post_interacting_users -p "abc123" --interaction_type likers
Option Default Description
<action> Action to perform (required, positional)
-q, --query <text> Search query, username, or identifier
-p, --post_id <id> Post ID
--post_ids <ids> Comma-separated post IDs
--connection_type <type> followers followers or following
--interaction_type <type> likers likers or commenters
--start_date <YYYY-MM-DD> Start date filter
--end_date <YYYY-MM-DD> End date filter
--limit <n> Max number of results

Actions: search_posts, search_users, get_posts_by_user, get_posts_by_ids, get_user, get_user_connections, get_users_by_keywords, get_comments, get_post_interacting_users

social reddit

Search and retrieve data from Reddit. 9 actions available.

# Search posts (with sort and time filters)
gsk social reddit search_posts -q "rust programming" --sort top --time week -s "programming"

# Search comments
gsk social reddit search_comments -q "async await" -s "rust"

# Search users
gsk social reddit search_users -q "spez" --limit 5

# Search subreddits
gsk social reddit search_subreddits -q "machine learning" --limit 10

# Get a post with its comments
gsk social reddit get_post_with_comments -p "1abc2de"

# Get subreddit info with recent posts
gsk social reddit get_subreddit_with_posts -q "programming"

# Get subreddits by keywords
gsk social reddit get_subreddits_by_keywords -q "artificial intelligence"

# Get user profile
gsk social reddit get_user -q "spez"

# Get users by keywords (active in discussions)
gsk social reddit get_users_by_keywords -q "neural networks" -s "MachineLearning"
Option Default Description
<action> Action to perform (required, positional)
-q, --query <text> Search query, username, or subreddit name
-p, --post_id <id> Post ID
-s, --subreddit <name> Subreddit name filter
--sort <order> Sort: relevance, hot, top, new, comments
--time <range> Time filter: hour, day, week, month, year, all
--start_date <YYYY-MM-DD> Start date filter
--end_date <YYYY-MM-DD> End date filter
--limit <n> Max number of results

Actions: search_posts, search_comments, search_users, search_subreddits, get_post_with_comments, get_subreddit_with_posts, get_subreddits_by_keywords, get_user, get_users_by_keywords

Local File Handling

Most commands that accept URLs also accept local file paths. The CLI automatically uploads local files before passing them to the API:

# Each of these passes a local file with -i:
gsk analyze "Describe this" -i ./photo.jpg
gsk img "Enhance this" -i ./photo.png -o ./result.png
gsk video "Animate this" -i ./frame.jpg -o ./video.mp4

Use -o / --output-file to save generated results directly to a local file.
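Since local paths are uploaded automatically, a folder of images can be processed with a plain loop that pairs each input with an output file. This is a minimal sketch for a POSIX shell. The helper name restyle_dir is hypothetical, and it uses only the documented -i and -o options of gsk img. If a folder holds both photo.jpg and photo.png, both results are written to photo.png and the second overwrites the first.

```shell
# Hypothetical helper: run image-to-image on every .jpg/.png in a folder,
# writing <name>.png results into an output folder.
# Usage: restyle_dir ./frames ./styled "A watercolor painting of this scene"
restyle_dir() {
  in_dir=$1
  out_dir=$2
  prompt=$3
  mkdir -p "$out_dir" || return 1
  status=0

  for f in "$in_dir"/*.jpg "$in_dir"/*.png; do
    [ -f "$f" ] || continue              # skip a pattern that matched nothing
    name=$(basename "$f")
    name=${name%.*}                      # drop the extension
    if ! gsk img "$prompt" -i "$f" -o "$out_dir/$name.png"; then
      echo "failed: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```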

Auto-Update

The CLI checks for updates every 4 hours and installs new versions in the background.

To disable auto-update:

# Via environment variable
export GSK_NO_AUTO_UPDATE=1

# Via config file
# Add "auto_update": false to ~/.genspark-tool-cli/config.json

Output Conventions

Stream Content Consumer
stdout JSON result Programs / AI agents
stderr Progress, debug, error messages Human / logs

This separation allows programs to parse clean JSON from stdout while humans can follow progress on stderr.
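A script can rely on this split by redirecting the two streams separately and checking the exit status before trusting the JSON. This is a minimal sketch for a POSIX shell. The helper name gsk_json is hypothetical, it assumes gsk exits non-zero on failure, and writing stderr to <output>.log is an arbitrary choice.

```shell
# Hypothetical helper: run any gsk command, keep the JSON result (stdout) in a file,
# and append progress/debug/error messages (stderr) to "<file>.log".
# Usage: gsk_json result.json search "latest AI news" || exit 1
gsk_json() {
  out_file=$1
  shift
  log_file="$out_file.log"

  gsk "$@" > "$out_file" 2>> "$log_file"
  status=$?                      # capture before any other command overwrites $?
  if [ "$status" -ne 0 ]; then
    echo "gsk ${1:-} failed (exit $status); see $log_file" >&2
  fi
  return "$status"
}
```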

Available Models

Image Generation Models — gsk img -m <model>
Model Description
nano-banana-2 Gemini 3.1 Flash Image - Fast and efficient with advanced reasoning. Multi-image fusion with up to 14 references. Supports 0.5K-4K resolution
fal-ai/gpt-image-1.5 GPT Image 1.5 - Supports text-to-image and image editing with multi-image input
imagen4 Latest high quality image generation model, upgrade from Imagen 3
recraft-v3 Realistic image generation model
fal-ai/bytedance/seedream/v5/lite Bytedance Seedream v5 Lite - Text-to-image and image editing with native 2K resolution and excellent text layout
fal-ai/flux-2 Flux 2 - Text-to-image and image editing with enhanced realism and crisp text generation. Supports up to 3 images for edit mode
fal-ai/flux-2-pro Flux 2 Pro - Higher quality version of Flux 2 with professional-grade output
fal-ai/z-image/turbo Z-Image Turbo - Optimized for speed. Good for quick iterations, bulk generation, and style transfer
ideogram/V_3 Ideogram V3 - Character reference specialist with superior facial feature preservation and character consistency
qwen-image Chinese poster specialist with outstanding Chinese text rendering and cultural context mastery
bbox-segment Extract subjects from images based on bounding box region
fal-bria-rmbg Remove background from image
fal-ai/recraft-clarity-upscale Upscale image
fal-ai/image-editing/text-removal Remove text and watermarks from images while preserving background
flux-pro/outpaint Expand image to a specific aspect ratio
Video Generation Models — gsk video -m <model>
Model Capabilities Aspect Ratios Duration Notes
kling/v3 Text/Image-to-video 16:9, 9:16, 1:1 3-15s Latest Kling V3 with audio. Pro/Standard quality modes
gemini/veo3.1 Text/Image-to-video 16:9, 9:16 8s Latest Veo with enhanced quality. Supports fast_mode and hd_mode (1080p)
gemini/veo3.1/reference-to-video Reference-to-video 16:9, 9:16 8s Generate video using 1+ reference images. Supports fast_mode and hd_mode
gemini/veo3.1/first-last-frame-to-video Frame transition 16:9, 9:16 8s Precise transitions from first to last frame. Requires exactly 2 images
minimax/hailuo-2.3/standard Text/Image-to-video 16:9, 9:16 6s, 10s Fast (~4min), cost-effective. Supports first & last frame control
wan/v2.6 Text/Image/Video-to-video 16:9, 9:16, 1:1, 4:3, 3:4 5s, 10s, 15s 1080p with audio. Supports reference-to-video with 1-3 reference videos
vidu/q3 Text/Image-to-video 16:9, 9:16, 4:3, 3:4, 1:1 1-16s Enhanced quality with audio generation. Resolution: 720p, 1080p
runway/gen4_turbo Image-to-video 5:3, 3:5 5s, 10s Fast, high quality. Requires reference image
pixverse/v5 Text/Image-to-video 16:9, 9:16, 4:3, 1:1, 3:4 5s Fast (~30s). Supports start/end frame transitions
fal-ai/bytedance/seedance/v1.5/pro Text/Image-to-video 21:9, 16:9, 4:3, 1:1, 3:4, 9:16 4-12s Seedance v1.5 Pro with native audio support. Supports first & last frame control
sora-2 Text/Image/Video-to-video 16:9, 9:16 4s, 8s, 12s OpenAI Sora 2 for fast, creative videos. Supports video remixing
sora-2-pro Text/Image-to-video 16:9, 9:16 4s, 8s Sora 2 Pro - Higher fidelity, cinematic quality. 720p and 1080p
fal-ai/bytedance-upscaler/upscale/video Video upscaling Upscale existing videos to 2K. Requires video_url parameter
xai/grok-imagine-video Text/Image-to-video 16:9, 4:3, 1:1, 3:4, 9:16, 21:9, 9:21 1-15s xAI Grok Imagine Video. 720p HD output
Audio Generation Models — gsk audio -m <model>

Text-to-Speech (TTS)

Model Description
google/gemini-2.5-pro-preview-tts Best, high-quality, realistic TTS. Supports one or multiple speakers with speaker prefixes (e.g., Speaker1: text, Speaker2: text)
elevenlabs/v3-tts Advanced multilingual TTS with multi-speaker dialogue support. Supports emotional tags like [excited], [whispers], [laughs]
fal-ai/elevenlabs/tts/multilingual-v2 High-quality multilingual TTS. Preferred for English
fal-ai/minimax/speech-2.8-hd High-quality multilingual TTS. Preferred for Chinese, Cantonese, Japanese, Korean. One speaker per generation

Sound Effects

Model Description
elevenlabs/sound-effects Sound effect generation. Duration: 0.1-22 seconds

Music Generation

Model Description
elevenlabs/music ElevenLabs music generation with vocals/singing. Lyrics auto-generated (no custom lyrics). Duration: 10s-5min
CassetteAI/music-generator Background music generation. Duration: 10-180 seconds
mureka/song-generator Professional song generation with lyrics. Supports style prompts, reference tracks, vocal and melody inputs. Max: 180s
mureka/instrumental-generator Instrumental music generation without vocals. Supports style prompts and reference tracks. Max: 180s
fal-ai/lyria2 Google Lyria 2 text-to-music. Good for sound effects and lyrics-free music. Max: 30 seconds
fal-ai/minimax-music/v2.5 Song generation with lyrics using MiniMax Music 2.5. Supports markers (Verse), (Chorus), (Bridge), etc. Requires style prompt and lyrics

Voice Cloning & Transformation

Model Description
elevenlabs/voice-clone Clone a voice from audio samples. Returns voice ID for use in TTS generation
elevenlabs/voice-changer Transform audio from one voice to another. Requires source audio and target voice ID
fal-ai/minimax/voice-clone Clone a voice from a sample audio and generate speech from text prompts (gated feature)

License

MIT