YouTube videos → structured markdown with visual frame captures

Package Exports

  • framecap
  • framecap/dist/index.js

This package does not declare an exports field, so JSPM has automatically detected and optimized the exports above. If a package subpath is missing, consider opening an issue with the original package (framecap) asking it to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

framecap

YouTube videos → structured markdown with visual frame captures.

framecap takes a YouTube URL and outputs a clean markdown document containing a structured transcript (chapter headers, speaker labels, paragraph breaks) and frame captures at key moments, embedded as images in the markdown.

Why

YouTube videos contain valuable knowledge, but it's trapped in a format you can't search, reference, or link to. Transcripts alone miss the visual context. framecap gives you both: a readable document with visual bookmarks.

Install

# Prerequisites (shown with Homebrew; use your platform's package manager otherwise)
brew install yt-dlp ffmpeg

# Install framecap
npm install -g framecap

Usage

# Quote URLs: ? and & are special characters in shells such as zsh and bash

# Single video
framecap "https://youtube.com/watch?v=abc123"

# Multiple videos
framecap "https://youtube.com/watch?v=abc" "https://youtube.com/watch?v=def"

# Full playlist
framecap "https://youtube.com/playlist?list=PLxyz"

# Custom output directory
framecap "https://youtube.com/watch?v=abc" -o ./notes/

# Hint speaker names for interviews
framecap "https://youtube.com/watch?v=abc" --speakers "Lex Fridman,Andrej Karpathy"

# Skip LLM structuring (free mode: raw transcript + frames only)
framecap "https://youtube.com/watch?v=abc" --no-structure

# Obsidian-compatible output (wikilink image syntax)
framecap "https://youtube.com/watch?v=abc" --format obsidian
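
For batch jobs it can be convenient to drive the CLI from a script. The following is a minimal sketch, assuming framecap is installed and on PATH; build_command and run_framecap are hypothetical helper names, and only flags documented in this README are used. Passing arguments as a list sidesteps shell quoting entirely.

```python
import subprocess


def build_command(urls, output_dir=None, speakers=None, structure=True, fmt=None):
    """Assemble a framecap argument list from documented flags only."""
    cmd = ["framecap", *urls]
    if output_dir:
        cmd += ["-o", output_dir]
    if speakers:
        # --speakers takes one comma-separated string
        cmd += ["--speakers", ",".join(speakers)]
    if not structure:
        cmd.append("--no-structure")
    if fmt:
        cmd += ["--format", fmt]
    return cmd


def run_framecap(urls, **options):
    """Run framecap and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(urls, **options), check=True)


cmd = build_command(
    ["https://youtube.com/watch?v=abc"],
    output_dir="./notes/",
    speakers=["Lex Fridman", "Andrej Karpathy"],
)
print(cmd)
```

Calling run_framecap with the same arguments would execute the command; it is left uncalled here so the sketch runs without framecap installed.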

Output

./how-karpathy-builds-software.md
./frames/how-karpathy-builds-software/
├── frame-0001-00m00s.jpg
├── frame-0002-01m45s.jpg
├── frame-0003-05m30s.jpg
└── ...
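
The frame names above appear to combine a zero-padded sequence number with the capture time in minutes and seconds. A small sketch of that naming scheme, inferred only from the three example names: frame_filename is a hypothetical helper, and how framecap names frames past the 60-minute mark is not shown here, so this version simply keeps counting minutes.

```python
def frame_filename(index: int, seconds: int) -> str:
    """Build a name like frame-0002-01m45s.jpg from a 1-based index and an offset in seconds."""
    minutes, secs = divmod(seconds, 60)
    return f"frame-{index:04d}-{minutes:02d}m{secs:02d}s.jpg"


print(frame_filename(1, 0))    # frame-0001-00m00s.jpg
print(frame_filename(2, 105))  # frame-0002-01m45s.jpg
print(frame_filename(3, 330))  # frame-0003-05m30s.jpg
```

Zero-padding both parts means a plain alphabetical sort of the directory also orders the frames chronologically.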

The markdown file includes:

  • YAML frontmatter — title, channel, URL, duration, upload date, auto-generated tags
  • Structured transcript — organized by chapters (from video description), with speaker labels and natural paragraph breaks
  • Embedded frames — images at chapter boundaries or fixed intervals, with timestamps and captions
  • Quotes section — notable quotes extracted during structuring
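
To illustrate the difference between the two output formats, here is a hedged sketch of how a frame reference could be written in each. Standard markdown image syntax and Obsidian's wikilink embed syntax are well defined, but the exact lines framecap emits (caption placement, relative paths) are assumptions, and embed_frame is a hypothetical helper.

```python
def embed_frame(path: str, caption: str, fmt: str = "markdown") -> str:
    """Return an image reference for the given output format."""
    if fmt == "obsidian":
        # Obsidian wikilink embed; the caption is omitted in this sketch
        return f"![[{path}]]"
    if fmt == "markdown":
        # Standard markdown image: ![alt text](path)
        return f"![{caption}]({path})"
    raise ValueError(f"unknown format: {fmt}")


path = "frames/how-karpathy-builds-software/frame-0002-01m45s.jpg"
print(embed_frame(path, "01:45"))
print(embed_frame(path, "01:45", fmt="obsidian"))
```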

Options

| Flag | Default | Description |
|------|---------|-------------|
| -o, --output | ./ | Output directory |
| --interval | auto | Force fixed-interval frame capture (seconds) |
| --max-frames | 50 | Maximum frames to extract |
| --dedup-threshold | 0.85 | Frame similarity filter (0.0-1.0) |
| --no-dedup | off | Keep all frames |
| --format | markdown | markdown or obsidian (wikilinks) |
| --capture-at | | Capture at specific timestamps (e.g. 1:30,5:00) |
| --speakers | auto | Comma-separated speaker names |
| --no-structure | off | Skip LLM pass (free mode) |
| --no-frames | off | Transcript only |
| --language | en | Transcript language |
| --keep-video | off | Retain downloaded video file |
| --cookies-from-browser | | Use cookies from browser (chrome, firefox, edge) |
| -v, --verbose | off | Detailed logging |
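
The --capture-at value is a comma-separated list of timestamps such as 1:30,5:00. One way such a value could be turned into second offsets is sketched below. This is not framecap's actual parser: parse_timestamp and parse_capture_at are hypothetical names, and accepting plain seconds or H:MM:SS is an assumption beyond the M:SS form shown in the example.

```python
def parse_timestamp(text: str) -> int:
    """Convert 'SS', 'M:SS' or 'H:MM:SS' into a number of seconds."""
    parts = text.strip().split(":")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"bad timestamp: {text!r}")
    seconds = 0
    for part in parts:
        # Each additional field shifts the running total up by a factor of 60
        seconds = seconds * 60 + int(part)
    return seconds


def parse_capture_at(value: str) -> list[int]:
    """Parse a comma-separated list into sorted, de-duplicated second offsets."""
    return sorted({parse_timestamp(item) for item in value.split(",")})


print(parse_capture_at("1:30,5:00"))  # [90, 300]
```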

Configuration

Defaults can be set in ~/.framecap.yml:

interval: 15
max_frames: 50
dedup_threshold: 0.85
format: markdown
language: en
output: ~/Notes/Videos/

CLI flags override config file values.
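
The precedence rule can be pictured as three layers merged in order: built-in defaults, then the config file, then CLI flags. A minimal sketch, assuming the config file has already been parsed into a dictionary and that unset CLI flags are represented as None; resolve_settings is a hypothetical helper, not part of framecap.

```python
# Defaults taken from the Options table; keys follow the config file's naming
BUILT_IN_DEFAULTS = {
    "max_frames": 50,
    "dedup_threshold": 0.85,
    "format": "markdown",
    "language": "en",
    "output": "./",
}


def resolve_settings(defaults: dict, file_config: dict, cli_flags: dict) -> dict:
    """Merge settings so that later layers win: defaults < config file < CLI."""
    settings = dict(defaults)
    settings.update(file_config)
    # Only flags the user actually passed (non-None) may override
    settings.update({k: v for k, v in cli_flags.items() if v is not None})
    return settings


file_config = {"interval": 15, "output": "~/Notes/Videos/", "format": "markdown"}
cli_flags = {"format": "obsidian", "output": None}

settings = resolve_settings(BUILT_IN_DEFAULTS, file_config, cli_flags)
print(settings["format"])  # obsidian (CLI wins)
print(settings["output"])  # ~/Notes/Videos/ (config file wins over the default)
```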

Requirements

  • Node.js 18+
  • yt-dlp — video/transcript download
  • ffmpeg — frame extraction
  • Anthropic API key (optional, for transcript structuring — set ANTHROPIC_API_KEY)

How It Works

  1. Fetch metadata — yt-dlp gets title, channel, duration, chapters, description
  2. Extract transcript — yt-dlp pulls auto/manual captions, parses VTT
  3. Capture frames — ffmpeg extracts frames at intervals or chapter boundaries
  4. Deduplicate frames — removes visually similar frames (configurable threshold)
  5. Structure transcript (optional) — LLM adds chapter headers, speaker labels, paragraph breaks. All words stay verbatim — only whitespace and labels are added.
  6. Assemble markdown — combines metadata, structured transcript, and frame references into the output file
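
Step 4 can be illustrated with a toy deduplication pass. This README does not say which similarity metric framecap uses or which frames are compared, so everything below is an assumption made for illustration: frames are modelled as tiny grayscale thumbnails (flat lists of 0-255 values), similarity is one minus the normalized mean absolute difference, and each frame is compared only with the most recently kept one.

```python
def similarity(a: list[int], b: list[int]) -> float:
    """Return 1.0 for identical thumbnails and 0.0 for maximally different ones."""
    if len(a) != len(b) or not a:
        raise ValueError("thumbnails must be non-empty and the same size")
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 - mean_abs_diff / 255.0


def deduplicate(frames: list[list[int]], threshold: float = 0.85) -> list[list[int]]:
    """Keep a frame only if it is less similar than `threshold` to the last kept frame."""
    kept: list[list[int]] = []
    for frame in frames:
        if not kept or similarity(kept[-1], frame) < threshold:
            kept.append(frame)
    return kept


slide_a = [10, 10, 200, 200]
slide_a_again = [12, 9, 198, 201]   # nearly identical to slide_a
slide_b = [200, 200, 10, 10]        # a clearly different frame

kept = deduplicate([slide_a, slide_a_again, slide_b])
print(len(kept))  # 2
```

Under these assumptions, raising the threshold toward 1.0 keeps more frames, and lowering it filters more aggressively.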

Cost

The LLM structuring pass is the only part that costs money (requires Anthropic API key):

| Video Length | Approximate Cost |
|--------------|------------------|
| 15 minutes | ~$0.02 |
| 1 hour | ~$0.10 |
| 2 hours | ~$0.20 |

Use --no-structure for completely free operation (raw transcript + frames).
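
The figures in the table are roughly linear in video length, at about $0.10 per hour. A back-of-the-envelope estimator for a batch of videos could look like the sketch below. The rate is read off the table rather than taken from any published pricing, real costs depend on transcript length, and estimate_cost is a hypothetical helper.

```python
COST_PER_HOUR_USD = 0.10  # approximate rate inferred from the table; an assumption


def estimate_cost(duration_minutes: float, structure: bool = True) -> float:
    """Estimate the LLM structuring cost in USD for one video."""
    if not structure:
        # --no-structure skips the LLM pass, so nothing is billed
        return 0.0
    return duration_minutes / 60 * COST_PER_HOUR_USD


playlist_minutes = [15, 60, 120]
for minutes in playlist_minutes:
    print(f"{minutes} min: ~${estimate_cost(minutes):.3f}")

total = sum(estimate_cost(m) for m in playlist_minutes)
print(f"total: ~${total:.3f}")
```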

License

MIT