framecap
YouTube videos → structured markdown with visual frame captures.
Takes a YouTube URL and outputs a clean markdown document with a structured transcript (chapter headers, speaker labels, paragraph breaks) and frame captures at key moments — embedded as images in the markdown.
Why
YouTube videos contain valuable knowledge, but it's trapped in a format you can't search, reference, or link to. Transcripts alone miss the visual context. framecap gives you both: a readable document with visual bookmarks.
Install
# Prerequisites
brew install yt-dlp ffmpeg
# Install framecap
npm install -g framecap
Usage
# Single video
framecap https://youtube.com/watch?v=abc123
# Custom output directory
framecap https://youtube.com/watch?v=abc -o ./notes/
# Capture frames at specific timestamps
framecap https://youtube.com/watch?v=abc --capture-at 1:30,5:00,12:45
# Hint speaker names for interviews
framecap https://youtube.com/watch?v=abc --speakers "Lex Fridman,Andrej Karpathy"
# Skip LLM structuring (free mode — raw transcript + frames only)
framecap https://youtube.com/watch?v=abc --no-structure
# Obsidian-compatible output (wikilink image syntax)
framecap https://youtube.com/watch?v=abc --format obsidian
# Preview plan and cost (fetches metadata + transcript, skips video download and LLM)
framecap https://youtube.com/watch?v=abc --dry-run
Output
./how-karpathy-builds-software.md
./frames/how-karpathy-builds-software/
├── frame-0001-00m00s.jpg
├── frame-0002-01m45s.jpg
├── frame-0003-05m30s.jpg
└── ...
The markdown file includes:
- YAML frontmatter — title, channel, URL, duration, upload date, auto-generated tags
- Structured transcript — organized by chapters (from video description), with speaker labels and natural paragraph breaks
- Embedded frames — images at key moments, with timestamps and captions
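The Output example follows a visible naming pattern. Each frame file is named with a four-digit index plus the capture time as minutes and seconds. All frames sit in a folder named after a slug of the video title. The sketch below reproduces that pattern and the two image-embed styles selected by --format. It is inferred from the example only: the helper names, the slug rules, and the exact embed strings are assumptions, not framecap's actual code.

```typescript
// Hypothetical helpers reproducing the naming shown in the Output example.

// "How Karpathy Builds Software" -> "how-karpathy-builds-software" (assumed slug rules).
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into one hyphen
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
}

// frame-<4-digit index>-<MM>m<SS>s.jpg, e.g. index 2 at 105 s -> frame-0002-01m45s.jpg
function frameFileName(index: number, seconds: number): string {
  const minutes = Math.floor(seconds / 60);
  const secs = Math.floor(seconds % 60);
  const idx = String(index).padStart(4, "0");
  const mm = String(minutes).padStart(2, "0");
  const ss = String(secs).padStart(2, "0");
  return `frame-${idx}-${mm}m${ss}s.jpg`;
}

type OutputFormat = "markdown" | "obsidian";

// Standard markdown image syntax vs. Obsidian wikilink embed syntax.
function frameEmbed(slug: string, fileName: string, format: OutputFormat, caption: string): string {
  const path = `frames/${slug}/${fileName}`;
  return format === "obsidian" ? `![[${path}]]` : `![${caption}](${path})`;
}
```

For example, frameEmbed(slugify("How Karpathy Builds Software"), frameFileName(2, 105), "markdown", "01:45") produces a standard image link pointing into ./frames/how-karpathy-builds-software/.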
Options
| Flag | Default | Description |
|---|---|---|
| -o, --output | ./ | Output directory |
| --interval | auto | Force fixed-interval frame capture (seconds) |
| --max-frames | 50 | Maximum frames to extract |
| --dedup-threshold | 0.85 | Frame similarity filter (0.0-1.0) |
| --no-dedup | off | Keep all frames |
| --format | markdown | markdown or obsidian (wikilinks) |
| --capture-at | none | Capture at specific timestamps (e.g. 1:30,5:00) |
| --speakers | auto | Comma-separated speaker names |
| --no-structure | off | Skip LLM pass (free mode) |
| --no-frames | off | Transcript only |
| --language | en | Transcript language |
| --keep-video | off | Retain downloaded video file |
| --cookies-from-browser | none | Use cookies from browser (chrome, firefox, edge) |
| --model | claude-sonnet-latest | LLM model for structuring |
| --dry-run | off | Preview plan and cost (skips video download and LLM) |
| --quiet | off | Suppress all output except the final path (for piping) |
| -v, --verbose | off | Detailed logging |
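--capture-at takes comma-separated m:ss timestamps, and --interval forces evenly spaced captures, capped by --max-frames. The following minimal sketch shows how those three options could become a list of capture times. The function names are hypothetical. Sorting, accepting h:mm:ss, and rejecting malformed input are also assumptions; the README does not say framecap does any of these.

```typescript
// Hypothetical helpers mirroring --capture-at, --interval and --max-frames.

// Parse "1:30" (m:ss) or "1:02:03" (h:mm:ss) into seconds.
function parseTimestamp(stamp: string): number {
  const parts = stamp.trim().split(":");
  if (parts.length > 3 || parts.some((p) => !/^\d+$/.test(p))) {
    throw new Error(`invalid timestamp: ${stamp}`);
  }
  // Fold left: each earlier component is worth 60x the next one.
  return parts.map(Number).reduce((acc, part) => acc * 60 + part, 0);
}

// Parse a --capture-at list such as "1:30,5:00,12:45" into sorted seconds.
function parseCaptureAt(list: string): number[] {
  return list.split(",").map(parseTimestamp).sort((a, b) => a - b);
}

// Evenly spaced capture times for a fixed --interval, capped at maxFrames.
function intervalTimes(durationSec: number, intervalSec: number, maxFrames = 50): number[] {
  if (intervalSec <= 0) throw new Error("interval must be positive");
  const times: number[] = [];
  for (let t = 0; t < durationSec && times.length < maxFrames; t += intervalSec) {
    times.push(t);
  }
  return times;
}
```

With these helpers, "1:30,5:00,12:45" maps to 90, 300 and 765 seconds. A one-hour video at a 10-second interval stops at the default cap of 50 frames.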
Requirements
- Node.js 18+
- yt-dlp — video/transcript download
- ffmpeg — frame extraction
- Anthropic API key (optional, for transcript structuring; set ANTHROPIC_API_KEY)
How It Works
- Fetch metadata — yt-dlp gets title, channel, duration, chapters, description
- Extract transcript — yt-dlp pulls auto/manual captions, parses VTT
- Capture frames — ffmpeg extracts frames at intervals or chapter boundaries
- Deduplicate frames — removes visually similar frames (configurable threshold)
- Structure transcript (optional) — LLM adds chapter headers, speaker labels, paragraph breaks. All words stay verbatim — only whitespace and labels are added.
- Assemble markdown — combines metadata, structured transcript, and frame references into the output file
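Step 2 parses the WebVTT captions that yt-dlp downloads. Below is a minimal sketch of that parsing, assuming plain cue blocks separated by blank lines. It strips inline tags such as <c> and merges consecutive cues with identical text. YouTube's auto-generated captions repeat text across rolling cues in ways this sketch only partly handles, so treat it as an illustration, not framecap's parser.

```typescript
// Minimal WebVTT cue parser (a sketch; not framecap's actual implementation).
interface Cue {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Parse "HH:MM:SS.mmm" or "MM:SS.mmm" into seconds.
function parseVttTime(stamp: string): number {
  return stamp.trim().split(":").map(Number).reduce((acc, part) => acc * 60 + part, 0);
}

function parseVtt(vtt: string): Cue[] {
  const cues: Cue[] = [];
  // Cues are separated by blank lines; normalize Windows line endings first.
  const blocks = vtt.replace(/\r\n/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // WEBVTT header, NOTE or STYLE block
    const [startRaw, rest] = lines[timingIndex].split("-->");
    // Cue settings (e.g. "align:start position:0%") may follow the end time.
    const endRaw = rest.trim().split(/\s+/)[0];
    const text = lines
      .slice(timingIndex + 1)
      .map((line) => line.replace(/<[^>]+>/g, "").trim()) // strip inline timing/style tags
      .filter((line) => line.length > 0)
      .join(" ");
    if (text.length === 0) continue;
    const previous = cues[cues.length - 1];
    if (previous !== undefined && previous.text === text) {
      // Same text as the previous cue: extend it instead of duplicating it.
      previous.end = parseVttTime(endRaw);
      continue;
    }
    cues.push({ start: parseVttTime(startRaw), end: parseVttTime(endRaw), text });
  }
  return cues;
}
```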
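Step 4 drops frames that look too similar to their neighbors, controlled by --dedup-threshold (default 0.85). The README does not say which similarity metric is used, so the sketch below assumes a simple one: 1 minus the normalized mean absolute pixel difference between small grayscale thumbnails. A frame is kept only when its similarity to the last kept frame falls below the threshold. Both the metric and the Frame shape are hypothetical.

```typescript
// Sketch of threshold-based frame de-duplication (assumed metric, hypothetical types).
interface Frame {
  timestamp: number; // seconds into the video
  pixels: number[];  // grayscale thumbnail, values 0-255, same size for every frame
}

// Similarity in [0, 1]: 1 means identical thumbnails, 0 means maximally different.
function similarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("thumbnails must be non-empty and the same size");
  }
  let totalDiff = 0;
  for (let i = 0; i < a.length; i++) {
    totalDiff += Math.abs(a[i] - b[i]);
  }
  return 1 - totalDiff / (a.length * 255);
}

// Keep a frame only if it differs enough from the last frame that was kept.
function dedupFrames(frames: Frame[], threshold = 0.85): Frame[] {
  const kept: Frame[] = [];
  for (const frame of frames) {
    const last = kept[kept.length - 1];
    if (last === undefined || similarity(last.pixels, frame.pixels) < threshold) {
      kept.push(frame);
    }
  }
  return kept;
}
```

Comparing against the last kept frame, rather than the immediately preceding one, stops a slow fade from slipping through as many near-identical frames. A higher threshold keeps more frames; at 1.0, only exact duplicates are removed.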
Cost
The LLM structuring pass is the only part that costs money (it requires an Anthropic API key):
| Video Length | Approximate Cost (Sonnet) |
|---|---|
| 15 minutes | ~$0.05–0.08 |
| 1 hour | ~$0.20–0.35 |
| 2 hours | ~$0.40–0.70 |
Use --no-structure for completely free operation (raw transcript + frames).
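The cost table is consistent with a back-of-envelope model: the transcript is sent once as input and returned once, verbatim, as output. The sketch below assumes about 150 spoken words per minute and about 1.33 tokens per word. It also assumes Sonnet pricing of $3 per million input tokens and $15 per million output tokens. All four numbers are assumptions, so check Anthropic's current pricing before relying on them.

```typescript
// Back-of-envelope cost model for the structuring pass. Every constant is an assumption.
const WORDS_PER_MINUTE = 150;   // assumed average speaking rate
const TOKENS_PER_WORD = 1.33;   // rough English tokens-per-word ratio
const INPUT_USD_PER_MTOK = 3;   // assumed Sonnet input price per million tokens
const OUTPUT_USD_PER_MTOK = 15; // assumed Sonnet output price per million tokens

function estimateCostUsd(minutes: number): number {
  const transcriptTokens = minutes * WORDS_PER_MINUTE * TOKENS_PER_WORD;
  // The structured output repeats every word, so output size ~= input size.
  const inputCost = (transcriptTokens / 1_000_000) * INPUT_USD_PER_MTOK;
  const outputCost = (transcriptTokens / 1_000_000) * OUTPUT_USD_PER_MTOK;
  return inputCost + outputCost;
}
```

Under these assumptions, a 15-minute video comes to roughly $0.05 and a 1-hour video to roughly $0.22. Both sit near the low end of the table's ranges. Prompt overhead, chapter metadata, and faster speakers would push the real figure up.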
License
MIT