framecap

YouTube videos → structured markdown with visual frame captures.

Takes a YouTube URL and outputs a clean markdown document with a structured transcript (chapter headers, speaker labels, paragraph breaks) and frame captures at key moments — embedded as images in the markdown.

Why

YouTube videos contain valuable knowledge, but it's trapped in a format you can't search, reference, or link to. Transcripts alone miss the visual context. framecap gives you both: a readable document with visual bookmarks.

Install

# Prerequisites (macOS with Homebrew; on other platforms, install yt-dlp and ffmpeg with your package manager)
brew install yt-dlp ffmpeg

# Install framecap
npm install -g framecap

Usage

# Single video (quote the URL so shells such as zsh do not treat "?" as a glob character)
framecap "https://youtube.com/watch?v=abc123"

# Custom output directory
framecap "https://youtube.com/watch?v=abc" -o ./notes/

# Capture frames at specific timestamps
framecap "https://youtube.com/watch?v=abc" --capture-at 1:30,5:00,12:45

# Hint speaker names for interviews
framecap "https://youtube.com/watch?v=abc" --speakers "Lex Fridman,Andrej Karpathy"

# Skip LLM structuring (free mode: raw transcript + frames only)
framecap "https://youtube.com/watch?v=abc" --no-structure

# Obsidian-compatible output (wikilink image syntax)
framecap "https://youtube.com/watch?v=abc" --format obsidian

# Preview plan and cost (fetches metadata + transcript, skips video download and LLM)
framecap "https://youtube.com/watch?v=abc" --dry-run
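
To illustrate the `--capture-at` value format, here is a minimal TypeScript sketch that turns a list like `1:30,5:00,12:45` into seconds. It is not framecap's own parser: the helper names `timestampToSeconds` and `parseCaptureAt` are hypothetical, and support for `h:mm:ss` is an assumption, since the README only shows `m:ss` values.

```typescript
// Hypothetical helper (not part of framecap): "m:ss" or "h:mm:ss" -> seconds.
function timestampToSeconds(stamp: string): number {
  const parts = stamp.trim().split(":");
  if (parts.length < 2 || parts.length > 3) {
    throw new Error(`Unsupported timestamp: "${stamp}"`);
  }
  let seconds = 0;
  for (const part of parts) {
    if (!/^\d+$/.test(part)) {
      throw new Error(`Unsupported timestamp: "${stamp}"`);
    }
    // Each field is worth 60x the field to its right.
    seconds = seconds * 60 + Number(part);
  }
  return seconds;
}

// Hypothetical helper: parse a comma-separated --capture-at value.
function parseCaptureAt(value: string): number[] {
  return value.split(",").map((stamp) => timestampToSeconds(stamp));
}

const captureSeconds = parseCaptureAt("1:30,5:00,12:45"); // [90, 300, 765]
```

Rejecting malformed fields up front keeps a typo such as `1:3o` from silently turning into `NaN` further down the pipeline.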

Output

./how-karpathy-builds-software.md
./frames/how-karpathy-builds-software/
├── frame-0001-00m00s.jpg
├── frame-0002-01m45s.jpg
├── frame-0003-05m30s.jpg
└── ...
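
The frame names in the listing above appear to combine a four-digit sequence number with the capture time. A minimal sketch of that naming pattern follows; `frameFileName` and `zeroPad` are hypothetical names, and how framecap names frames past the one-hour mark is not shown in the README, so this sketch simply lets the minutes field keep counting.

```typescript
// Hypothetical helper: left-pad a non-negative integer with zeros.
function zeroPad(value: number, width: number): string {
  let text = String(value);
  while (text.length < width) {
    text = "0" + text;
  }
  return text;
}

// Hypothetical helper mirroring names such as "frame-0002-01m45s.jpg".
// Assumption: minutes are not wrapped into hours for long videos.
function frameFileName(index: number, seconds: number): string {
  const total = Math.floor(seconds);
  const minutes = Math.floor(total / 60);
  const rest = total % 60;
  return `frame-${zeroPad(index, 4)}-${zeroPad(minutes, 2)}m${zeroPad(rest, 2)}s.jpg`;
}

const secondFrame = frameFileName(2, 105); // "frame-0002-01m45s.jpg"
```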

The markdown file includes:

- YAML frontmatter: title, channel, URL, duration, upload date, auto-generated tags
- Structured transcript: organized by chapters (from video description), with speaker labels and natural paragraph breaks
- Embedded frames: images at key moments, with timestamps and captions
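
As a rough illustration of the two embed styles, the sketch below builds an image reference for each `--format` value. Standard markdown images use `![alt](path)` and Obsidian embeds use `![[path]]`; the exact lines framecap writes (including where it places the caption and timestamp) are not shown in this README, so `embedFrame` is a hypothetical helper and the layout is an assumption.

```typescript
type OutputFormat = "markdown" | "obsidian";

// Hypothetical helper: build an image embed for one frame.
// Assumption: the caption becomes alt text in markdown mode and is omitted
// from the wikilink in Obsidian mode.
function embedFrame(path: string, caption: string, format: OutputFormat): string {
  if (format === "obsidian") {
    return `![[${path}]]`;
  }
  return `![${caption}](${path})`;
}

const markdownEmbed = embedFrame("frames/demo/frame-0001-00m00s.jpg", "Intro", "markdown");
const obsidianEmbed = embedFrame("frames/demo/frame-0001-00m00s.jpg", "Intro", "obsidian");
```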

Options

| Flag | Default | Description |
| --- | --- | --- |
| `-o, --output` | `./` | Output directory |
| `--interval` | auto | Force fixed-interval frame capture (seconds) |
| `--max-frames` | `50` | Maximum frames to extract |
| `--dedup-threshold` | `0.85` | Frame similarity filter (0.0-1.0) |
| `--no-dedup` | off | Keep all frames |
| `--format` | `markdown` | `markdown` or `obsidian` (wikilinks) |
| `--capture-at` | none | Capture at specific timestamps (e.g. `1:30,5:00`) |
| `--speakers` | auto | Comma-separated speaker names |
| `--no-structure` | off | Skip LLM pass (free mode) |
| `--no-frames` | off | Transcript only |
| `--language` | `en` | Transcript language |
| `--keep-video` | off | Retain downloaded video file |
| `--cookies-from-browser` | none | Use cookies from browser (chrome, firefox, edge) |
| `--model` | `claude-sonnet-latest` | LLM model for structuring |
| `--dry-run` | off | Preview plan and cost (skips video download and LLM) |
| `--quiet` | off | Suppress all output except the final path (for piping) |
| `-v, --verbose` | off | Detailed logging |
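
To give a feel for what a similarity threshold such as `--dedup-threshold 0.85` could mean, here is a minimal sketch of threshold-based deduplication. It is not framecap's algorithm. The `Frame` shape, the pixel-difference metric, and the choice to compare each frame against the last kept frame are all assumptions made for illustration.

```typescript
// Hypothetical frame shape: a timestamp plus a small grayscale thumbnail.
interface Frame {
  seconds: number;
  pixels: number[]; // values 0-255, same non-zero length for every frame
}

// Stand-in metric (an assumption): 1.0 = identical, 0.0 = maximally different.
function similarity(a: Frame, b: Frame): number {
  let diff = 0;
  for (let i = 0; i < a.pixels.length; i++) {
    diff += Math.abs(a.pixels[i] - b.pixels[i]);
  }
  return 1 - diff / (255 * a.pixels.length);
}

// Drop a frame when it is at least `threshold` similar to the last kept frame.
function dedupFrames(frames: Frame[], threshold: number): Frame[] {
  const kept: Frame[] = [];
  for (const frame of frames) {
    const last = kept[kept.length - 1];
    if (last !== undefined && similarity(last, frame) >= threshold) {
      continue; // near-duplicate of the previous kept frame
    }
    kept.push(frame);
  }
  return kept;
}

const sampleFrames: Frame[] = [
  { seconds: 0, pixels: [0, 0, 0, 0] },
  { seconds: 30, pixels: [0, 0, 0, 10] }, // almost identical to the first
  { seconds: 60, pixels: [255, 255, 255, 255] },
];
const keptFrames = dedupFrames(sampleFrames, 0.85); // keeps the frames at 0s and 60s
```

Under this reading, a higher threshold keeps more frames, because only closer matches count as duplicates.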

Requirements

- Node.js 18+
- yt-dlp: video/transcript download
- ffmpeg: frame extraction
- Anthropic API key (optional, for transcript structuring; set ANTHROPIC_API_KEY)

How It Works

1. Fetch metadata: yt-dlp gets title, channel, duration, chapters, description
2. Extract transcript: yt-dlp pulls auto/manual captions, parses VTT
3. Capture frames: ffmpeg extracts frames at intervals or chapter boundaries
4. Deduplicate frames: removes visually similar frames (configurable threshold)
5. Structure transcript (optional): an LLM adds chapter headers, speaker labels, and paragraph breaks. All words stay verbatim; only whitespace and labels are added.
6. Assemble markdown: combines metadata, structured transcript, and frame references into the output file
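
The verbatim guarantee in the structuring step can be checked mechanically. The sketch below strips the added structure and compares word sequences. It assumes chapter headers are lines starting with `#` and speaker labels look like `**Name:**` at the start of a paragraph; neither shape is confirmed by this README, and `wordsOf`, `stripStructure`, and `isVerbatim` are hypothetical helpers, not framecap functions.

```typescript
// Split text into words, ignoring all whitespace differences.
function wordsOf(text: string): string[] {
  return text.split(/\s+/).filter((word) => word.length > 0);
}

// Remove the assumed structural additions:
//   - heading lines such as "## Intro"
//   - a leading speaker label such as "**Lex Fridman:** "
function stripStructure(structured: string): string {
  return structured
    .split("\n")
    .filter((line) => !/^#{1,6}\s/.test(line))
    .map((line) => line.replace(/^\*\*[^*]+:\*\*\s*/, ""))
    .join("\n");
}

// True when the structured text contains exactly the raw words, in order.
function isVerbatim(raw: string, structured: string): boolean {
  const rawWords = wordsOf(raw);
  const structuredWords = wordsOf(stripStructure(structured));
  return (
    rawWords.length === structuredWords.length &&
    rawWords.every((word, i) => word === structuredWords[i])
  );
}

const rawTranscript = "welcome to the show thanks for having me";
const structuredTranscript = [
  "## Intro",
  "",
  "**Lex Fridman:** welcome to the show",
  "",
  "**Andrej Karpathy:** thanks for having me",
].join("\n");
const unchanged = isVerbatim(rawTranscript, structuredTranscript); // true
```

A check like this is useful as a guard after any LLM pass: if it fails, falling back to the raw transcript avoids publishing paraphrased text.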

Cost

The LLM structuring pass is the only part that costs money (requires Anthropic API key):

| Video length | Approximate cost (Sonnet) |
| --- | --- |
| 15 minutes | ~$0.05–0.08 |
| 1 hour | ~$0.20–0.35 |
| 2 hours | ~$0.40–0.70 |

Use --no-structure for completely free operation (raw transcript + frames).
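
The figures in the table scale roughly linearly with video length, so a back-of-the-envelope estimate can be derived from the 1-hour row. The sketch below does that; the per-hour constants are taken from the table, while the linear model itself and the `estimateCost` helper are assumptions for illustration, not how `--dry-run` computes its number.

```typescript
// Per-hour bounds taken from the 1-hour row of the cost table (USD).
const LOW_PER_HOUR = 0.2;
const HIGH_PER_HOUR = 0.35;

// Hypothetical helper: linear estimate, rounded to whole cents.
function estimateCost(durationMinutes: number): { low: number; high: number } {
  const hours = durationMinutes / 60;
  const toCents = (usd: number): number => Math.round(usd * 100) / 100;
  return {
    low: toCents(LOW_PER_HOUR * hours),
    high: toCents(HIGH_PER_HOUR * hours),
  };
}

const twoHourEstimate = estimateCost(120); // matches the 2-hour row: 0.40 to 0.70
```

For short videos the linear upper bound lands slightly above the table's 15-minute figure, so treat the result as a rough ceiling rather than a quote.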

License

MIT