framecap

YouTube videos → structured markdown with visual frame captures.

framecap takes a YouTube URL and outputs a clean markdown document containing a structured transcript (chapter headers, speaker labels, paragraph breaks) plus frame captures from key moments, embedded as images.

Why

YouTube videos contain valuable knowledge, but it's trapped in a format you can't search, reference, or link to. Transcripts alone miss the visual context. framecap gives you both: a readable document with visual bookmarks.

Install

# Prerequisites
brew install yt-dlp ffmpeg

# Install framecap
npm install -g framecap

Usage

# Single video
framecap https://youtube.com/watch?v=abc123

# Custom output directory
framecap https://youtube.com/watch?v=abc -o ./notes/

# Capture frames at specific timestamps
framecap https://youtube.com/watch?v=abc --capture-at 1:30,5:00,12:45

# Hint speaker names for interviews
framecap https://youtube.com/watch?v=abc --speakers "Lex Fridman,Andrej Karpathy"

# Skip LLM structuring (free mode — raw transcript + frames only)
framecap https://youtube.com/watch?v=abc --no-structure

# Obsidian-compatible output (wikilink image syntax)
framecap https://youtube.com/watch?v=abc --format obsidian

# Preview plan and cost (fetches metadata + transcript, skips video download and LLM)
framecap https://youtube.com/watch?v=abc --dry-run
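The --capture-at value is a comma-separated list of timestamps such as 1:30,5:00,12:45. As a minimal sketch of how such a value can be turned into seconds, assuming each entry is ss, m:ss, or h:mm:ss (the helper names parseTimestamp and parseCaptureAt are hypothetical, not framecap's API):

```typescript
// Hypothetical helper: convert one timestamp ("90", "1:30", "1:02:03") to seconds.
// Accepted forms (ss, m:ss, h:mm:ss) are an assumption for illustration.
function parseTimestamp(ts: string): number {
  const parts = ts.trim().split(":");
  if (parts.length > 3 || parts.some((p) => !/^\d+$/.test(p))) {
    throw new Error(`Invalid timestamp: "${ts}"`);
  }
  // Each colon shifts the accumulated value by a factor of 60.
  return parts.reduce((acc, p) => acc * 60 + Number(p), 0);
}

// Hypothetical helper: parse a full --capture-at value into sorted seconds.
function parseCaptureAt(value: string): number[] {
  return value
    .split(",")
    .filter((s) => s.trim() !== "")
    .map(parseTimestamp)
    .sort((a, b) => a - b);
}

console.log(parseCaptureAt("1:30,5:00,12:45")); // [ 90, 300, 765 ]
```

Sorting keeps the capture points in chronological order, which matches how frames are numbered in the output folder.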

Output

./how-karpathy-builds-software.md
./frames/how-karpathy-builds-software/
├── frame-0001-00m00s.jpg
├── frame-0002-01m45s.jpg
├── frame-0003-05m30s.jpg
└── ...
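The frame file names above follow a visible pattern: a zero-padded index, then the capture time as minutes and seconds. A small sketch that reproduces that pattern, assuming minutes keep counting past 59 (no hour field is shown, so long-video handling here is a guess; frameFileName and framePath are hypothetical names):

```typescript
// Hypothetical: build "frame-<4-digit index>-<MM>m<SS>s.jpg" from an index and a time in seconds.
// Assumes minutes are not wrapped into hours for long videos.
function frameFileName(index: number, seconds: number): string {
  const total = Math.floor(seconds);
  const mm = String(Math.floor(total / 60)).padStart(2, "0");
  const ss = String(total % 60).padStart(2, "0");
  return `frame-${String(index).padStart(4, "0")}-${mm}m${ss}s.jpg`;
}

// Frames sit in a folder named after the markdown file's slug.
function framePath(slug: string, index: number, seconds: number): string {
  return `./frames/${slug}/${frameFileName(index, seconds)}`;
}

console.log(framePath("how-karpathy-builds-software", 2, 105));
// ./frames/how-karpathy-builds-software/frame-0002-01m45s.jpg
```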

The markdown file includes:

  • YAML frontmatter — title, channel, URL, duration, upload date, auto-generated tags
  • Structured transcript — organized by chapters (taken from the video description), with speaker labels and natural paragraph breaks
  • Embedded frames — images at key moments, with timestamps and captions
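To make the frontmatter bullet concrete, here is a sketch of emitting the listed fields as YAML. The key names, the duration-in-seconds unit, and the VideoMeta shape are assumptions for illustration, not framecap's actual schema. Strings are quoted with JSON.stringify because a JSON string is also a valid YAML double-quoted scalar, so titles containing colons, quotes, or # stay safe:

```typescript
// Hypothetical metadata shape; field names and units are illustrative only.
interface VideoMeta {
  title: string;
  channel: string;
  url: string;
  durationSeconds: number;
  uploadDate: string; // ISO date, e.g. "2024-01-31"
  tags: string[];
}

// JSON string quoting doubles as YAML double-quoted scalar quoting.
const q = (s: string): string => JSON.stringify(s);

function frontmatter(meta: VideoMeta): string {
  const tagLines =
    meta.tags.length > 0
      ? ["tags:", ...meta.tags.map((t) => `  - ${q(t)}`)]
      : ["tags: []"]; // an empty block list would parse as null
  return [
    "---",
    `title: ${q(meta.title)}`,
    `channel: ${q(meta.channel)}`,
    `url: ${q(meta.url)}`,
    `duration: ${meta.durationSeconds}`,
    `upload_date: ${q(meta.uploadDate)}`,
    ...tagLines,
    "---",
    "",
  ].join("\n");
}

// Placeholder values, not real video data.
console.log(frontmatter({
  title: "Example: a talk",
  channel: "Example Channel",
  url: "https://youtube.com/watch?v=abc123",
  durationSeconds: 5400,
  uploadDate: "2024-01-31",
  tags: ["software", "llm"],
}));
```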

Options

| Flag | Default | Description |
|---|---|---|
| -o, --output | ./ | Output directory |
| --interval | auto | Force fixed-interval frame capture (seconds) |
| --max-frames | 50 | Maximum frames to extract |
| --dedup-threshold | 0.85 | Frame similarity filter (0.0–1.0) |
| --no-dedup | off | Keep all frames |
| --format | markdown | markdown or obsidian (wikilinks) |
| --capture-at | | Capture at specific timestamps (e.g. 1:30,5:00) |
| --speakers | auto | Comma-separated speaker names |
| --no-structure | off | Skip LLM pass (free mode) |
| --no-frames | off | Transcript only |
| --language | en | Transcript language |
| --keep-video | off | Retain downloaded video file |
| --cookies-from-browser | | Use cookies from browser (chrome, firefox, edge) |
| --model | claude-sonnet-latest | LLM model for structuring |
| --dry-run | off | Preview plan and cost (skips video download and LLM) |
| --quiet | off | Suppress all output except the final path (for piping) |
| -v, --verbose | off | Detailed logging |
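The --format flag switches between standard markdown image syntax and Obsidian's wikilink embed syntax (![[file]]). A sketch of the difference, assuming the caption is used as alt text in the markdown case; embedFrame is a hypothetical name, and how framecap places captions is not specified here:

```typescript
type OutputFormat = "markdown" | "obsidian";

// Hypothetical: render one frame reference in the chosen output format.
function embedFrame(format: OutputFormat, path: string, caption: string): string {
  if (format === "obsidian") {
    // Obsidian embeds a file from the vault with ![[path]].
    return `![[${path}]]`;
  }
  // Standard markdown: escape brackets so the caption cannot break the alt text.
  const alt = caption.replace(/[[\]]/g, "\\$&");
  return `![${alt}](${path})`;
}

console.log(embedFrame("markdown", "frames/demo/frame-0002-01m45s.jpg", "Slide at 1:45"));
// ![Slide at 1:45](frames/demo/frame-0002-01m45s.jpg)
console.log(embedFrame("obsidian", "frames/demo/frame-0002-01m45s.jpg", "Slide at 1:45"));
// ![[frames/demo/frame-0002-01m45s.jpg]]
```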

Requirements

  • Node.js 18+
  • yt-dlp — video/transcript download
  • ffmpeg — frame extraction
  • Anthropic API key (optional, for transcript structuring — set ANTHROPIC_API_KEY)
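Before a first run it can help to confirm the prerequisites are in place. A minimal preflight sketch using Node's child_process, assuming each tool answers a version flag with exit status 0 (yt-dlp uses --version, ffmpeg uses -version); hasCommand is a hypothetical helper, not part of framecap:

```typescript
import { spawnSync } from "node:child_process";

// Returns true if `cmd` can be launched and exits with status 0.
// A missing binary sets result.error (ENOENT) and leaves status null.
function hasCommand(cmd: string, versionFlag: string): boolean {
  const result = spawnSync(cmd, [versionFlag], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

const checks = {
  "yt-dlp": hasCommand("yt-dlp", "--version"),
  ffmpeg: hasCommand("ffmpeg", "-version"),
  // Only the structuring pass needs the key; --no-structure runs without it.
  ANTHROPIC_API_KEY: Boolean(process.env.ANTHROPIC_API_KEY),
};

for (const [name, ok] of Object.entries(checks)) {
  console.log(`${ok ? "ok     " : "missing"} ${name}`);
}
```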

How It Works

  1. Fetch metadata — yt-dlp gets title, channel, duration, chapters, description
  2. Extract transcript — yt-dlp downloads manual or auto-generated captions, and the resulting VTT file is parsed into timed text
  3. Capture frames — ffmpeg extracts frames at intervals or chapter boundaries
  4. Deduplicate frames — removes visually similar frames (configurable threshold)
  5. Structure transcript (optional) — LLM adds chapter headers, speaker labels, paragraph breaks. All words stay verbatim — only whitespace and labels are added.
  6. Assemble markdown — combines metadata, structured transcript, and frame references into the output file
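Step 2 parses VTT captions. YouTube's auto-generated captions typically contain inline timing tags (such as <c> and <00:00:01.000>) and repeat the previous line at the start of the next cue. A simplified sketch of turning such a file into timed lines, assuming that rolling-duplicate pattern; parseVtt and Cue are hypothetical names, and this is a swapped-in illustration rather than framecap's parser:

```typescript
interface Cue {
  start: number; // seconds
  text: string;
}

// "00:01:02.500" or "01:02.500" -> seconds
function vttTime(t: string): number {
  return t.split(":").map(Number).reduce((acc, p) => acc * 60 + p, 0);
}

// Decode the few entities common in captions; &amp; last to avoid double decoding.
const decodeEntities = (s: string): string =>
  s.replace(/&lt;/g, "<").replace(/&gt;/g, ">").replace(/&nbsp;/g, " ").replace(/&amp;/g, "&");

function parseVtt(vtt: string): Cue[] {
  const cues: Cue[] = [];
  let lastLine = "";
  for (const block of vtt.replace(/\r\n/g, "\n").split(/\n{2,}/)) {
    const lines = block.split("\n");
    const timingIdx = lines.findIndex((l) => l.includes("-->"));
    if (timingIdx === -1) continue; // WEBVTT header, NOTE or STYLE blocks
    const start = vttTime(lines[timingIdx].split("-->")[0].trim());
    for (const raw of lines.slice(timingIdx + 1)) {
      // Strip inline tags first, then decode entities.
      const text = decodeEntities(raw.replace(/<[^>]+>/g, "")).trim();
      // Rolling captions repeat the previous line; keep each line once.
      if (text === "" || text === lastLine) continue;
      cues.push({ start, text });
      lastLine = text;
    }
  }
  return cues;
}
```

Consecutive cues can then be joined into paragraphs before the optional structuring pass.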
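Step 4 drops visually similar frames using a threshold (0.85 by default). The README does not say which similarity metric is used, so the sketch below swaps in a simple one: mean absolute difference between small grayscale thumbnails, mapped to 0.0 (completely different) through 1.0 (identical). A frame is kept only if it is less similar than the threshold to the last kept frame. Frame, similarity, and dedupFrames are hypothetical names:

```typescript
// A frame reduced to a small grayscale thumbnail (e.g. 32x32); all thumbnails share one size.
interface Frame {
  seconds: number;
  pixels: Uint8Array;
}

// 1.0 = identical; 0.0 = every pixel at opposite extremes (black vs. white).
function similarity(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("thumbnail sizes differ");
  if (a.length === 0) return 1;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff += Math.abs(a[i] - b[i]);
  return 1 - diff / (a.length * 255);
}

// Keep a frame only if it differs enough from the most recently kept frame.
function dedupFrames(frames: Frame[], threshold = 0.85): Frame[] {
  const kept: Frame[] = [];
  for (const f of frames) {
    const last = kept[kept.length - 1];
    if (!last || similarity(last.pixels, f.pixels) < threshold) kept.push(f);
  }
  return kept;
}
```

Comparing against the last kept frame (rather than every kept frame) keeps the pass linear and still collapses long static stretches such as a slide held on screen.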

Cost

The LLM structuring pass is the only part that costs money (it requires an Anthropic API key):

| Video length | Approximate cost (Sonnet) |
|---|---|
| 15 minutes | ~$0.05–0.08 |
| 1 hour | ~$0.20–0.35 |
| 2 hours | ~$0.40–0.70 |
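The costs scale roughly linearly with video length because the transcript is sent in and returned verbatim with labels added. A back-of-the-envelope sketch; every number in DEFAULTS is an assumption (about 150 spoken words per minute, about 1.3 tokens per word, output roughly equal to input, and $3 / $15 per million input/output tokens as assumed Sonnet pricing), and prompt overhead is ignored:

```typescript
// All defaults are assumptions for a rough estimate, not framecap's internal numbers.
interface CostAssumptions {
  wordsPerMinute: number;     // typical speaking rate
  tokensPerWord: number;      // rough English tokenization ratio
  outputToInputRatio: number; // structured output is roughly the verbatim transcript
  inputUsdPerMTok: number;
  outputUsdPerMTok: number;
}

const DEFAULTS: CostAssumptions = {
  wordsPerMinute: 150,
  tokensPerWord: 1.3,
  outputToInputRatio: 1.0,
  inputUsdPerMTok: 3,
  outputUsdPerMTok: 15,
};

function estimateCostUsd(minutes: number, a: CostAssumptions = DEFAULTS): number {
  const inputTokens = minutes * a.wordsPerMinute * a.tokensPerWord;
  const outputTokens = inputTokens * a.outputToInputRatio;
  return (inputTokens * a.inputUsdPerMTok + outputTokens * a.outputUsdPerMTok) / 1_000_000;
}

for (const m of [15, 60, 120]) {
  console.log(`${m} min ~ $${estimateCostUsd(m).toFixed(2)}`);
}
// 15 min ~ $0.05
// 60 min ~ $0.21
// 120 min ~ $0.42
```

Under these assumptions the estimates fall at the low end of each range in the table; anything not modeled here, such as system-prompt tokens, would push the real figure higher.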

Use --no-structure for completely free operation (raw transcript + frames).

License

MIT