Video intelligence distiller — extract structured notes, transcripts, and insights from any video using Gemini

    vidistill

    Video intelligence distiller — turn any video or audio file into structured notes, transcripts, and insights using Gemini.

    Feed it a YouTube URL, local video, or audio file. It analyzes the content through multiple AI passes (scene analysis, transcript, visuals, code extraction, people, chat, implicit signals) and synthesizes everything into organized markdown output.

    Install

    npm install -g vidistill

    Requires Node.js 22+ and ffmpeg.
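
Both prerequisites can be checked before installing. The following is a minimal Python sketch, assuming `node --version` prints a string such as `v22.3.0`; `parse_node_major` and `check_prerequisites` are hypothetical helper names and are not part of vidistill.

```python
import re
import shutil
import subprocess


def parse_node_major(version_text: str) -> int:
    """Extract the major version from text such as 'v22.3.0'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version: {version_text!r}")
    return int(match.group(1))


def check_prerequisites(min_node_major: int = 22) -> list[str]:
    """Return a list of problems; an empty list means both tools were found."""
    problems = []
    node = shutil.which("node")
    if node is None:
        problems.append("node not found on PATH")
    else:
        out = subprocess.run(
            [node, "--version"], capture_output=True, text=True
        ).stdout
        try:
            if parse_node_major(out) < min_node_major:
                problems.append(
                    f"Node.js {out.strip()} is older than {min_node_major}"
                )
        except ValueError as err:
            problems.append(str(err))
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("prerequisite problem:", problem)
```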

    Usage

    vidistill [input] [options]

    Arguments:

    • input — YouTube URL, local video, or audio file path (prompted interactively if omitted)

    Options:

    • -c, --context — context about the video (e.g. "CS lecture", "product demo")
    • -o, --output — output directory (default: ./vidistill-output/)
    • -l, --lang <code> — output language (e.g. zh, ja, ko, es, fr, de, pt, ru, ar, hi)

    Examples:

    # Interactive mode — prompts for everything
    vidistill
    
    # YouTube video
    vidistill "https://youtube.com/watch?v=dQw4w9WgXcQ"
    
    # Local file with context
    vidistill ./lecture.mp4 --context "distributed systems lecture"
    
    # Audio file
    vidistill ./podcast.mp3
    
    # Custom output directory
    vidistill ./demo.mp4 -o ./notes/
    
    # Output in another language
    vidistill ./lecture.mp4 --lang zh
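
To process several inputs in a batch, the documented options can be assembled from a script. This is a sketch in Python, assuming `vidistill` is on the PATH and that `GEMINI_API_KEY` is already set so the API key prompt is not triggered; `build_command` is a hypothetical helper that uses only the flags listed above.

```python
import subprocess
from typing import Optional


def build_command(
    source: str,
    context: Optional[str] = None,
    output_dir: Optional[str] = None,
    lang: Optional[str] = None,
) -> list[str]:
    """Assemble a vidistill argument list from the documented options."""
    cmd = ["vidistill", source]
    if context:
        cmd += ["--context", context]
    if output_dir:
        cmd += ["--output", output_dir]
    if lang:
        cmd += ["--lang", lang]
    return cmd


if __name__ == "__main__":
    for path in ["./lecture.mp4", "./podcast.mp3"]:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(build_command(path, output_dir="./notes/"), check=True)
```

Passing the arguments as a list avoids shell quoting problems with context strings that contain spaces.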

    API Key

    vidistill needs a Gemini API key. It checks these sources in order:

    1. GEMINI_API_KEY environment variable
    2. ~/.vidistill/config.json
    3. Interactive prompt (with option to save for next time)

    Get a key at ai.google.dev.
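
The lookup order above can be sketched as follows. This is an illustration, not vidistill's actual code: the field name inside `config.json` is not documented here, so `apiKey` is a placeholder assumption, and `resolve_api_key` is a hypothetical function name.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

# ASSUMPTION: the real field name in config.json is not documented;
# "apiKey" is a placeholder used only for this illustration.
CONFIG_FIELD = "apiKey"


def resolve_api_key(
    env: Optional[Mapping[str, str]] = None,
    config_path: Optional[Path] = None,
) -> Optional[str]:
    """Return the first key found, or None when a prompt would be needed."""
    env = os.environ if env is None else env
    if config_path is None:
        config_path = Path.home() / ".vidistill" / "config.json"

    # 1. The environment variable takes priority.
    key = env.get("GEMINI_API_KEY")
    if key:
        return key

    # 2. Fall back to the saved config file, if it exists and parses.
    try:
        data = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None
    if not isinstance(data, dict):
        return None
    value = data.get(CONFIG_FIELD)

    # 3. None signals that an interactive prompt is the remaining option.
    return value if isinstance(value, str) and value else None
```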

    Output

    vidistill creates a folder per video with structured files:

    vidistill-output/my-video/
    ├── guide.md           # overview and navigation
    ├── transcript.md      # full timestamped transcript
    ├── combined.md        # transcript + visual notes merged
    ├── notes.md           # notes, implicit questions/decisions, recurring themes
    ├── code/              # extracted and reconstructed source files
    │   ├── *.ext          # individual source files
    │   └── code-timeline.md  # code evolution timeline
    ├── people.md          # speakers and participants
    ├── chat.md            # chat messages and links
    ├── action-items.md    # tasks and follow-ups
    ├── links.md           # all URLs mentioned
    ├── timeline.html      # interactive visual timeline
    ├── metadata.json      # processing metadata
    └── raw/               # raw pass outputs

    Which files are generated depends on the video content — a coding tutorial gets code/, a meeting gets people.md and action-items.md, etc.
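
Because the set of files varies per video, a script that consumes the output should check what exists before reading it. A small Python sketch, using only the names from the tree above; `present_outputs` is a hypothetical helper.

```python
from pathlib import Path

# File and folder names taken from the output tree above.
KNOWN_OUTPUTS = [
    "guide.md", "transcript.md", "combined.md", "notes.md", "code",
    "people.md", "chat.md", "action-items.md", "links.md",
    "timeline.html", "metadata.json", "raw",
]


def present_outputs(output_dir: Path) -> dict[str, bool]:
    """Map each known output name to whether it exists in output_dir."""
    return {name: (output_dir / name).exists() for name in KNOWN_OUTPUTS}


if __name__ == "__main__":
    found = present_outputs(Path("vidistill-output/my-video"))
    for name, exists in found.items():
        print(f"{'yes' if exists else 'no':<4}{name}")
```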

    Speaker Naming

    When multiple speakers are detected, use the rename-speakers command to assign real names. The assigned names replace the generic labels (SPEAKER_00, SPEAKER_01) in all output files:

    # Interactive rename — prompts for each speaker
    vidistill rename-speakers ./vidistill-output/my-meeting/
    
    # List current speaker state
    vidistill rename-speakers ./vidistill-output/my-meeting/ --list
    
    # Quick rename a single speaker
    vidistill rename-speakers ./vidistill-output/my-meeting/ --rename "Steven Kang" "Steven K."
    
    # Merge two speakers (e.g. same person on different devices)
    vidistill rename-speakers ./vidistill-output/my-meeting/ --merge "K Iphone" "Kristian"
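
Conceptually, renaming is a label-to-name substitution, and merging maps two labels to the same name. The Python sketch below only illustrates that idea and is not how rename-speakers is implemented; the `LABEL: text` line format in the example is an assumption, and `apply_names` is a hypothetical helper. In practice, use the command above so every output file stays consistent.

```python
import re

# Matches whole labels only, so SPEAKER_1 never matches inside SPEAKER_10.
LABEL = re.compile(r"\bSPEAKER_\d+\b")


def apply_names(text: str, names: dict[str, str]) -> str:
    """Replace each generic label that has a name; leave the rest as-is."""
    return LABEL.sub(lambda m: names.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    transcript = "SPEAKER_00: hello\nSPEAKER_01: hi again\nSPEAKER_02: welcome"
    # Two labels mapped to one name models a merge.
    names = {"SPEAKER_00": "Kristian", "SPEAKER_01": "Kristian"}
    print(apply_names(transcript, names))
```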

    MCP Server

    vidistill can run as an MCP server, letting AI coding tools (Claude Code, Cursor, etc.) analyze videos and read output directly.

    vidistill mcp

    To configure in Claude Code:

    claude mcp add vidistill -- npx vidistill mcp

    Tools exposed:

    • analyze_video — run the full pipeline on a URL or file, returns output dir + summary
    • get_transcript — read transcript from an existing output dir, with optional time range filtering
    • get_code — read extracted code files from an existing output dir

    Requires GEMINI_API_KEY, set either as an environment variable or in ~/.vidistill/config.json.
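
Time range filtering of the kind get_transcript offers can be pictured as follows. This Python sketch assumes timestamps written as `MM:SS` or `HH:MM:SS` and entries already parsed into `(timestamp, text)` pairs; the tool's real parameter names and transcript format are not documented here, so `to_seconds` and `filter_range` are hypothetical.

```python
from typing import Optional


def to_seconds(stamp: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' to whole seconds."""
    parts = [int(p) for p in stamp.split(":")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"unsupported timestamp: {stamp!r}")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + part
    return seconds


def filter_range(
    entries: list[tuple[str, str]],
    start: Optional[str] = None,
    end: Optional[str] = None,
) -> list[tuple[str, str]]:
    """Keep entries whose timestamp lies in the inclusive range."""
    lo = to_seconds(start) if start else 0
    hi = to_seconds(end) if end else float("inf")
    return [(s, t) for s, t in entries if lo <= to_seconds(s) <= hi]


if __name__ == "__main__":
    entries = [("00:05", "intro"), ("02:30", "demo"), ("01:00:10", "wrap-up")]
    print(filter_range(entries, start="01:00", end="10:00"))
```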

    How It Works

    Supported video formats: MP4, MOV, WebM, MKV, AVI, MPEG, FLV, WMV, 3GPP. Supported audio formats: MP3, AAC, WAV, FLAC, OGG, M4A.
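
A script can pre-screen inputs against these formats before invoking the tool. The Python sketch below is an assumption-laden illustration: the mapping from format names to file extensions (for example `.mpg` for MPEG and `.3gp` for 3GPP) and the list of YouTube hostnames are guesses, and `classify_input` is a hypothetical helper.

```python
from pathlib import Path
from urllib.parse import urlparse

# ASSUMPTION: extensions inferred from the format names listed above.
VIDEO_EXTS = {
    ".mp4", ".mov", ".webm", ".mkv", ".avi",
    ".mpeg", ".mpg", ".flv", ".wmv", ".3gp",
}
AUDIO_EXTS = {".mp3", ".aac", ".wav", ".flac", ".ogg", ".m4a"}
# ASSUMPTION: hostnames treated as YouTube for this illustration.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_input(source: str) -> str:
    """Return 'youtube', 'video', 'audio', or 'unsupported'."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        return "youtube" if parsed.hostname in YOUTUBE_HOSTS else "unsupported"
    ext = Path(source).suffix.lower()
    if ext in VIDEO_EXTS:
        return "video"
    if ext in AUDIO_EXTS:
        return "audio"
    return "unsupported"


if __name__ == "__main__":
    for item in ["https://youtube.com/watch?v=dQw4w9WgXcQ", "./podcast.mp3"]:
        print(item, "->", classify_input(item))
```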

    1. Input — accepts YouTube URL directly or reads local file (video or audio), compresses if over 2GB
    2. Pass 0 — scene analysis to classify video type and determine processing strategy
    3. Pass 1a — pure verbatim transcription (timestamps, tone, emphasis — no speaker labels), runs 3x with consensus alignment
    4. Pass 1b — speaker diarization (assigns SPEAKER_XX labels to transcript entries using voice and visual cues, then merged with 1a), runs 3x with majority voting
    5. Pass 2 — visual content extraction (screen states, diagrams, slides)
    6. Pass 3 — specialist passes based on video type:
      • 3c: chat and links (live streams) — per segment, runs 3x with consensus voting
      • 3d: implicit signals (all types) — per segment
      • 3b: people and social dynamics (meetings) — whole video, anchored to transcript speakers
      • 3a: code reconstruction (coding videos) — whole video, runs 3x with consensus voting and validation
    7. Synthesis — cross-references all passes into unified analysis
    8. Output — generates structured markdown files
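
The "runs 3x with majority voting" step in Pass 1b can be pictured with a per-entry vote. This Python sketch is not vidistill's implementation: it assumes the three runs produce one label per transcript entry, in the same order, with labels already aligned across runs. `majority_vote` is a hypothetical name, and the tie rule (the first run wins) is an arbitrary choice for the example.

```python
from collections import Counter


def majority_vote(runs: list[list[str]]) -> list[str]:
    """Pick the most common label at each position across equal-length runs."""
    if not runs or any(len(run) != len(runs[0]) for run in runs):
        raise ValueError("runs must be non-empty and of equal length")
    merged = []
    for labels in zip(*runs):
        # most_common lists ties in first-seen order, so when all runs
        # disagree the label from the first run is kept.
        merged.append(Counter(labels).most_common(1)[0][0])
    return merged


if __name__ == "__main__":
    runs = [
        ["SPEAKER_00", "SPEAKER_01", "SPEAKER_00"],
        ["SPEAKER_00", "SPEAKER_00", "SPEAKER_01"],
        ["SPEAKER_01", "SPEAKER_01", "SPEAKER_02"],
    ]
    print(majority_vote(runs))
```

Running each pass an odd number of times means a two-label disagreement always has a clear winner; only a three-way split needs a tie rule.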

    Audio files skip visual passes and go straight to transcript, people, implicit signals, and synthesis.

    Long videos are segmented automatically. Passes that fail are skipped gracefully.
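
Pass selection and graceful skipping can be sketched together. This Python example is one reading of the list above, not the tool's code: the type identifiers (`live_stream`, `meeting`, `coding`) are hypothetical, and treating audio as skipping both Pass 0 and Pass 2 is an assumption drawn from "go straight to transcript".

```python
from typing import Callable


def plan_passes(video_type: str, is_audio: bool = False) -> list[str]:
    """List pass names for an input, following the steps described above."""
    if is_audio:
        # ASSUMPTION: audio skips scene analysis (0) and visuals (2).
        return ["1a", "1b", "3b", "3d", "synthesis"]
    passes = ["0", "1a", "1b", "2"]
    if video_type == "live_stream":
        passes.append("3c")  # chat and links
    passes.append("3d")  # implicit signals run for all types
    if video_type == "meeting":
        passes.append("3b")  # people and social dynamics
    if video_type == "coding":
        passes.append("3a")  # code reconstruction
    passes.append("synthesis")
    return passes


def run_passes(
    passes: dict[str, Callable[[], object]],
) -> tuple[dict[str, object], list[str]]:
    """Run each pass in order; record failures and continue with the rest."""
    results: dict[str, object] = {}
    skipped: list[str] = []
    for name, run in passes.items():
        try:
            results[name] = run()
        except Exception as err:
            skipped.append(name)
            print(f"pass {name} failed, skipping: {err}")
    return results, skipped
```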

    License

    MIT