# vidclaude
Multimodal video understanding for Claude Code. Extract frames, transcribe audio in 90+ languages, build temporal timelines — all from a single command. No API key needed.
```bash
pip install vidclaude
vidclaude video.mp4 --mode standard --verbose
```

## What it does
Drop a video in, get structured evidence out. Claude in your conversation does the thinking.
```
Video File
  │
  ├─ ffmpeg ──────────► Frames (adaptive, shot-aware sampling)
  ├─ faster-whisper ──► Transcript with timestamps (large-v3, 90+ languages)
  ├─ pytesseract ─────► On-screen text / OCR (optional)
  ├─ scene detection ─► Shot boundaries
  │
  └─► Timeline ──► evidence.md ──► Claude reasons over it
```

Works with Hindi, English, Spanish, Japanese, Arabic — any of the 90+ languages Whisper supports. The language is auto-detected.
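The shot-detection step in the diagram boils down to thresholding per-frame scene-change scores, in the spirit of ffmpeg's `select='gt(scene,T)'` filter. A minimal sketch of that idea (illustrative only, not vidclaude's actual code; the threshold and scores here are made up):

```python
def shot_boundaries(scores, threshold=0.3):
    """Return frame indices whose scene-change score exceeds the threshold.

    scores[i] is a 0..1 dissimilarity between frame i and frame i-1,
    as an ffmpeg-style scene filter would report it. Frame 0 always
    starts the first shot.
    """
    cuts = [0]
    for i, score in enumerate(scores[1:], start=1):
        if score > threshold:
            cuts.append(i)
    return cuts

# Three stable runs separated by sharp changes at frames 3 and 6.
scores = [0.0, 0.02, 0.05, 0.72, 0.04, 0.01, 0.55, 0.03]
print(shot_boundaries(scores))  # → [0, 3, 6]
```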
## Prerequisites
| Requirement | Install |
|---|---|
| Python 3.10+ | python.org |
| ffmpeg | Windows: `winget install ffmpeg` / macOS: `brew install ffmpeg` / Linux: `sudo apt install ffmpeg` |
## Install

```bash
pip install vidclaude
```

That's it. The first run downloads the Whisper model (~3 GB, one time).
## Usage

### With Claude Code (recommended)
Set up the skill once:
```bash
vidclaude --install-skill
```

Then in Claude Code, just say:
- "analyze the video at C:/Users/me/Videos/meeting.mp4"
- "what does the speaker say about the budget in presentation.mp4?"
- "when does the logo appear in intro.mov?"
Claude runs the extraction, reads the evidence report + frames, and answers your question. Your Max/Pro plan covers everything — no API key needed.
Follow-up questions about the same video are instant (cached).
### From the command line
```bash
# Standard analysis — good for most videos
vidclaude video.mp4 --mode standard --verbose

# Quick — fewer frames, faster
vidclaude video.mp4 --mode quick

# Deep — dense frames, full OCR, detailed
vidclaude video.mp4 --mode deep --verbose

# Batch process a folder
vidclaude ./videos/ --verbose

# Skip audio transcription
vidclaude video.mp4 --no-audio

# Force fresh extraction (ignore cache)
vidclaude video.mp4 --no-cache
```

## Output

Every run creates a `.vidcache/` directory next to the video:
```
.vidcache/a3f7b2c1/
  evidence.md      ← Human-readable report (Claude reads this)
  frames/          ← Extracted JPEG frames
  transcript.json  ← Timestamped transcript
  timeline.json    ← Unified event timeline
  meta.json        ← Video metadata
  shots.json       ← Shot boundaries
  ocr.json         ← On-screen text
  summaries.json   ← Scene/chapter summaries
```

## Modes
| Mode | Frames | Whisper model | OCR | Best for |
|---|---|---|---|---|
| `quick` | ~20, uniform | base | skip | Short clips, fast overview |
| `standard` | ~60, shot-aware | large-v3 | keyframes | General use |
| `deep` | ~150, burst sampling | large-v3 | all frames | Long videos, detailed analysis |
**Smart frame budget:** If a video is too long for the frame limit, FPS is automatically reduced. Shot boundary frames are always prioritized.
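The budget rule amounts to capping the effective sampling rate so the frame count never exceeds the mode's limit. A sketch of the arithmetic (the defaults here are illustrative, not vidclaude's exact values):

```python
def effective_fps(duration_s: float, base_fps: float, max_frames: int) -> float:
    """Reduce the sampling rate so duration * fps never exceeds max_frames."""
    if duration_s * base_fps <= max_frames:
        return base_fps
    return max_frames / duration_s

# A 10-minute video with a ~60-frame budget: 1 fps would yield 600 frames,
# so the rate drops to 0.1 fps (one frame every 10 seconds).
print(effective_fps(600, 1.0, 60))  # → 0.1
```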
## CLI Reference
```
vidclaude [input] [options]

positional:
  input                         Video file or folder

setup:
  --install-skill               Copy SKILL.md to current dir for Claude Code

processing:
  --mode {quick,standard,deep}  Processing mode (default: standard)
  -f, --fps N                   Override frames per second
  -m, --max-frames N            Override max frame count
  --no-audio                    Skip audio transcription
  --no-ocr                      Skip OCR extraction
  --no-cache                    Force re-extraction

output:
  --extract                     Print cache path summary
  -o, --output FILE             Write output to file
  --verbose                     Show detailed progress
  --batch-summary               Cross-video summary for folders
```

## How it works
Built on a multi-layer video understanding architecture:
- Layer A — Ingestion: Validates format (MP4, MOV, MKV, WebM, AVI), extracts metadata via ffprobe
- Layer B — Segmentation: Detects shot boundaries using ffmpeg's scene change filter
- Layer C — Adaptive Sampling: Content-aware frame selection — more frames at scene transitions, smart frame budgets per mode
- Layer D — Audio: faster-whisper with large-v3 for multilingual transcription with word-level timestamps and VAD filtering
- Layer E — OCR: pytesseract text extraction from key frames (optional)
- Layer G — Timeline: Merges speech, OCR, and scene events into a single time-sorted list
- Layer I — Memory: Hierarchical summaries for longer videos (scene → chapter → global)
- Layer J — Evidence Assembly: Generates `evidence.md` with frame references, transcript, and timeline for Claude to reason over
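Layer G's merge can be pictured as tagging each source's events and sorting by start time. A conceptual sketch (not vidclaude's internal data model):

```python
def build_timeline(speech, ocr, scenes):
    """Merge per-source events into one time-sorted list.

    Each input is a list of (start_seconds, text) tuples; the output tags
    every event with its source so downstream reasoning can weigh them.
    """
    events = (
        [(t, "speech", txt) for t, txt in speech]
        + [(t, "ocr", txt) for t, txt in ocr]
        + [(t, "scene", txt) for t, txt in scenes]
    )
    return sorted(events, key=lambda e: e[0])

timeline = build_timeline(
    speech=[(1.2, "welcome everyone"), (8.0, "next slide")],
    ocr=[(7.5, "Q3 Budget")],
    scenes=[(0.0, "shot 1"), (7.4, "shot 2")],
)
for t, src, txt in timeline:
    print(f"{t:6.1f}s  [{src}]  {txt}")
```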
### Intent-aware processing
The tool classifies your question and adjusts the pipeline:
| Question type | Example | What happens |
|---|---|---|
| Description | "What happens in this video?" | Balanced extraction |
| Moment retrieval | "When does the person stand up?" | Prioritizes transcript + timeline |
| Temporal ordering | "Does X happen before Y?" | Prioritizes timeline events |
| Counting | "How many cars appear?" | Denser frame sampling |
| OCR / text | "What text is on the slide?" | Prioritizes OCR extraction |
| Speech | "What did they say about revenue?" | Prioritizes transcript |
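A keyword-based classifier is enough to sketch the idea behind that table (the keywords and labels here are illustrative; the package's real classifier may differ):

```python
# Ordered rules: first matching intent wins; keywords are hypothetical.
RULES = [
    ("counting", ("how many", "count")),
    ("moment", ("when does", "at what time")),
    ("ordering", ("before", "after", "order")),
    ("ocr", ("text", "slide", "written")),
    ("speech", ("say", "said", "mention", "talk")),
]

def classify_intent(question: str) -> str:
    q = question.lower()
    for intent, keywords in RULES:
        if any(k in q for k in keywords):
            return intent
    return "description"  # default: balanced extraction

print(classify_intent("How many cars appear?"))           # → counting
print(classify_intent("What did they say about revenue?"))  # → speech
print(classify_intent("What happens in this video?"))     # → description
```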
## Optional extras
```bash
# OCR support (also needs the Tesseract binary installed)
pip install pytesseract

# Standalone API mode (use outside Claude Code)
pip install anthropic
export ANTHROPIC_API_KEY=sk-...
vidclaude video.mp4 --api -q "What happens in this video?"
```

## Examples
Analyze a meeting recording:

```bash
vidclaude meeting.mp4 --mode standard --verbose
# → 55 frames, full transcript, timeline
# → In Claude Code: "summarize the key decisions from this meeting"
```

Review security footage:

```bash
vidclaude ./cameras/ --mode deep --verbose
# → Batch processes all videos in the folder
# → Dense frame extraction catches fast events
```

Extract text from a lecture:

```bash
pip install pytesseract
vidclaude lecture.mp4 --mode standard --verbose
# → Captures slide text via OCR + speaker transcript
```

Quick check on a short clip:

```bash
vidclaude clip.mp4 --mode quick
# → 20 frames, fast whisper, done in seconds
```

## Troubleshooting
| Problem | Fix |
|---|---|
| `ffmpeg` not found | Install: `winget install ffmpeg` (Windows) / `brew install ffmpeg` (macOS) |
| `No module named 'faster_whisper'` | `pip install faster-whisper` |
| Slow first run | Normal — downloading the Whisper large-v3 model (~3 GB, one time) |
| Wrong language detected | Whisper auto-detects and is usually correct; for edge cases, the transcript still captures phonetics |
| Large `.vidcache` folder | Delete it to free space: `rm -rf .vidcache/` |
| Want fresh extraction | Use the `--no-cache` flag |
| OCR not working | Install `pytesseract` and the Tesseract binary |
## Project structure
```
vidclaude/
  cli.py       Argument parsing, orchestration, caching
  models.py    Data model (VideoMeta, Frame, Shot, TranscriptChunk, etc.)
  ingest.py    Video validation + ffprobe metadata
  segment.py   Shot detection + adaptive frame sampling
  audio.py     faster-whisper transcription (large-v3)
  ocr.py       pytesseract text extraction
  intent.py    Question intent classification
  timeline.py  Temporal event merging
  memory.py    Hierarchical summaries
  reason.py    Evidence assembly + optional API mode
  util.py      ffmpeg helpers, image encoding, caching
  SKILL.md     Claude Code skill definition (bundled)
```

## License
MIT