vidistill

Video intelligence distiller — turn any video or audio file into structured notes, transcripts, and insights using Gemini.

Feed it a YouTube URL, local video, or audio file. It analyzes the content through multiple AI passes (scene analysis, transcript, visuals, code extraction, people, chat, implicit signals) and synthesizes everything into organized markdown output.

Install

npm install -g vidistill

Requires Node.js 22+ and ffmpeg.

Usage

vidistill [input] [options]

Arguments:

input — YouTube URL, local video, or audio file path (prompted interactively if omitted)

Options:

-c, --context — context about the video (e.g. "CS lecture", "product demo")
-o, --output — output directory (default: ./vidistill-output/)
-l, --lang <code> — output language (e.g. zh, ja, ko, es, fr, de, pt, ru, ar, hi)

Examples:

# Interactive mode — prompts for everything
vidistill

# YouTube video
vidistill "https://youtube.com/watch?v=dQw4w9WgXcQ"

# Local file with context
vidistill ./lecture.mp4 --context "distributed systems lecture"

# Audio file
vidistill ./podcast.mp3

# Custom output directory
vidistill ./demo.mp4 -o ./notes/

# Output in another language
vidistill ./lecture.mp4 --lang zh
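
When vidistill is driven from a script rather than typed by hand, the flags above can be assembled into an argument list. The following is a minimal sketch: `VidistillOptions` and `buildArgs` are hypothetical names and are not part of vidistill; only the flags themselves come from the option list above. The resulting array could be handed to a process spawner such as Node's `child_process.spawn`, which avoids shell quoting problems with values like the context string.

```typescript
// Hypothetical helper, not part of vidistill. It only turns the documented
// flags into an argument list; it does not run anything.
interface VidistillOptions {
  input?: string;   // YouTube URL or local file path; omitted means interactive mode
  context?: string; // passed as --context
  output?: string;  // passed as --output
  lang?: string;    // passed as --lang
}

function buildArgs(opts: VidistillOptions): string[] {
  const args: string[] = [];
  if (opts.input !== undefined) args.push(opts.input);
  if (opts.context !== undefined) args.push("--context", opts.context);
  if (opts.output !== undefined) args.push("--output", opts.output);
  if (opts.lang !== undefined) args.push("--lang", opts.lang);
  return args;
}

const exampleArgs = buildArgs({
  input: "./lecture.mp4",
  context: "distributed systems lecture",
  lang: "zh",
});
// exampleArgs is ["./lecture.mp4", "--context", "distributed systems lecture", "--lang", "zh"]
```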

Extract

Pull specific data from a previously processed video, or re-run a targeted pass on a video or audio file.

vidistill extract <type> <source>

Arguments:

type — what to extract: code, links, people, transcript, or commands
source — path to a vidistill output directory or a video/audio file

Examples:

# Extract code from existing output (no API calls)
vidistill extract code ./vidistill-output/my-video/

# Extract links from a video file (runs targeted pipeline)
vidistill extract links ./lecture.mp4

When pointed at an output directory, extract reads from already-generated files with zero API calls. When pointed at a video file, it runs a minimal pipeline with only the passes needed for the requested data type.
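
The two modes described above can be pictured as a small dispatch on the kind of source. This is an illustrative sketch only: `planExtract`, `ExtractPlan`, and the mode labels are hypothetical names, and the caller is assumed to have already checked whether the source path is a directory. It is not how vidistill is known to implement the command.

```typescript
// Illustrative only: these names are hypothetical, not vidistill internals.
type ExtractType = "code" | "links" | "people" | "transcript" | "commands";

const EXTRACT_TYPES: readonly ExtractType[] = [
  "code", "links", "people", "transcript", "commands",
];

type ExtractPlan =
  | { mode: "read-existing-output"; apiCalls: false }
  | { mode: "targeted-pipeline"; apiCalls: true };

function isExtractType(value: string): value is ExtractType {
  return (EXTRACT_TYPES as readonly string[]).indexOf(value) !== -1;
}

// sourceIsDirectory: true for a vidistill output directory,
// false for a video or audio file.
function planExtract(type: string, sourceIsDirectory: boolean): ExtractPlan {
  if (!isExtractType(type)) {
    throw new Error(`unknown extract type: ${type}`);
  }
  return sourceIsDirectory
    ? { mode: "read-existing-output", apiCalls: false }
    : { mode: "targeted-pipeline", apiCalls: true };
}
```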

API Key

vidistill needs a Gemini API key. It checks these sources in order:

1. GEMINI_API_KEY environment variable
2. ~/.vidistill/config.json
3. Interactive prompt (with option to save for next time)

Get a key at ai.google.dev.
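
The lookup order can be sketched as a simple fallback chain. This is a sketch under assumptions: `KeySources` and `resolveApiKey` are hypothetical names, the layout of `config.json` is not documented here so the saved key is passed in as an already-extracted value, and the prompt is modeled as a synchronous callback for brevity.

```typescript
// Hypothetical sketch of the lookup order; vidistill's real code may differ.
interface KeySources {
  env: Record<string, string | undefined>; // for example process.env
  configKey?: string;                      // key previously saved in ~/.vidistill/config.json, if any
  promptForKey: () => string;              // interactive fallback (assumed synchronous here)
}

interface ResolvedKey {
  key: string;
  source: "env" | "config" | "prompt";
}

function resolveApiKey(sources: KeySources): ResolvedKey {
  // 1. The environment variable wins when it is set and non-empty.
  const fromEnv = sources.env["GEMINI_API_KEY"];
  if (fromEnv) return { key: fromEnv, source: "env" };

  // 2. Otherwise fall back to the saved config value.
  if (sources.configKey) return { key: sources.configKey, source: "config" };

  // 3. Finally ask the user.
  return { key: sources.promptForKey(), source: "prompt" };
}
```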

Output

vidistill creates a folder per video with structured files:

vidistill-output/my-video/
├── guide.md           # overview and navigation
├── transcript.md      # full timestamped transcript
├── combined.md        # transcript + visual notes merged
├── notes.md           # meeting/lecture notes
├── code/              # extracted and reconstructed source files
│   ├── *.ext          # individual source files
│   └── code-timeline.md  # code evolution timeline
├── people.md          # speakers and participants
├── chat.md            # chat messages and links
├── action-items.md    # tasks and follow-ups
├── insights.md        # implicit signals and analysis
├── links.md           # all URLs mentioned
├── metadata.json      # processing metadata
└── raw/               # raw pass outputs

Which files are generated depends on the video content — a coding tutorial gets code/, a meeting gets people.md and action-items.md, etc.

How It Works

Supported video formats: MP4, MOV, WebM, MKV, AVI, MPEG, FLV, WMV, 3GPP. Supported audio formats: MP3, AAC, WAV, FLAC, OGG, M4A.
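
A script that batches files through vidistill may want to filter inputs up front. The sketch below classifies a path by file extension. The extension lists are an assumption mapped from the format names above (for example `.3gp` for 3GPP and both `.mpeg` and `.mpg` for MPEG), and `classifyMedia` is a hypothetical helper; vidistill itself may detect formats differently.

```typescript
type MediaKind = "video" | "audio" | "unsupported";

// Assumed file extensions for the formats listed above.
const VIDEO_EXTENSIONS = ["mp4", "mov", "webm", "mkv", "avi", "mpeg", "mpg", "flv", "wmv", "3gp"];
const AUDIO_EXTENSIONS = ["mp3", "aac", "wav", "flac", "ogg", "m4a"];

// Hypothetical helper: classifies a local path by its extension only.
function classifyMedia(path: string): MediaKind {
  const dot = path.lastIndexOf(".");
  if (dot === -1 || dot === path.length - 1) return "unsupported";
  const ext = path.slice(dot + 1).toLowerCase();
  if (VIDEO_EXTENSIONS.indexOf(ext) !== -1) return "video";
  if (AUDIO_EXTENSIONS.indexOf(ext) !== -1) return "audio";
  return "unsupported";
}
```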

Input — downloads YouTube video via yt-dlp or reads local file (video or audio), compresses if over 2GB
Pass 0 — scene analysis to classify video type and determine processing strategy
Pass 1 — transcript extraction with speaker identification
Pass 2 — visual content extraction (screen states, diagrams, slides)
Pass 3 — specialist passes based on video type:
- 3c: chat and links (live streams) — per segment
- 3d: implicit signals (all types) — per segment
- 3b: people and social dynamics (meetings) — whole video
- 3a: code reconstruction (coding videos) — whole video, runs 3x with consensus voting and validation
Synthesis — cross-references all passes into unified analysis
Output — generates structured markdown files
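
Pass 3a is described as running three times with consensus voting. How vidistill compares the runs is not documented here, so the following is only a generic illustration of the idea: a strict majority vote over whole outputs after light whitespace normalization. `majorityVote` and `normalize` are hypothetical names.

```typescript
// Generic majority vote, shown as an illustration of consensus voting.
// This is an assumption, not vidistill's actual algorithm.
function normalize(text: string): string {
  return text.replace(/\r\n/g, "\n").trim();
}

// Returns the normalized output produced by more than half of the runs,
// or null when no output reaches a strict majority.
function majorityVote(candidates: string[]): string | null {
  const counts = new Map<string, number>();
  for (const candidate of candidates) {
    const key = normalize(candidate);
    const next = (counts.get(key) || 0) + 1;
    counts.set(key, next);
    // A strict majority can be reported as soon as it is reached.
    if (next * 2 > candidates.length) return key;
  }
  return null;
}
```

With three runs, two matching outputs are enough to win; three different outputs yield `null`, which a caller could treat as a signal to fall back to further validation.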

Audio files skip visual passes and go straight to transcript, people, implicit signals, and synthesis.

Long videos are segmented automatically. If a pass fails, it is skipped and processing continues with the remaining passes.
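
The pass selection and failure handling described above can be sketched as follows. Everything here is hypothetical naming: the `VideoType` labels, the `PassId` values, `planPasses`, and `runPasses` are illustrative and not vidistill internals. The sketch also simplifies in three ways: it assumes a video has a single type, it takes the type as an input even though Pass 0 is what determines it, and it runs passes synchronously, whereas real passes are API calls. Segmentation is not modeled.

```typescript
type InputKind = "video" | "audio";
type VideoType = "coding" | "meeting" | "live-stream" | "other"; // hypothetical labels
type PassId = "pass0" | "pass1" | "pass2" | "3a" | "3b" | "3c" | "3d" | "synthesis";

// Chooses which passes to run, following the list above.
function planPasses(kind: InputKind, videoType: VideoType = "other"): PassId[] {
  if (kind === "audio") {
    // Audio skips the visual passes: transcript, people, implicit signals, synthesis.
    return ["pass1", "3b", "3d", "synthesis"];
  }
  const plan: PassId[] = ["pass0", "pass1", "pass2"];
  if (videoType === "live-stream") plan.push("3c"); // chat and links
  plan.push("3d");                                  // implicit signals, all types
  if (videoType === "meeting") plan.push("3b");     // people and social dynamics
  if (videoType === "coding") plan.push("3a");      // code reconstruction
  plan.push("synthesis");
  return plan;
}

interface PassResult {
  id: PassId;
  ok: boolean;
  output?: string;
}

// Runs each pass in order; a pass that throws is recorded and skipped,
// and the remaining passes still run.
function runPasses(plan: PassId[], run: (id: PassId) => string): PassResult[] {
  const results: PassResult[] = [];
  for (const id of plan) {
    try {
      results.push({ id, ok: true, output: run(id) });
    } catch (err) {
      results.push({ id, ok: false });
    }
  }
  return results;
}
```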

License

MIT