ytranscript
Extract transcripts from your entire YouTube watch history in minutes. Build AI-powered video summaries, searchable archives, or feed transcripts directly to Claude, Cursor, and other AI assistants via the built-in MCP server.
Read the blog post: "Automating My Second Brain with YouTube Transcripts"
Why ytranscript?
- No API keys required - Uses YouTube's public innertube API directly
- Works with AI assistants - Built-in MCP server for Claude, Cursor, and others
- Bulk processing - Process thousands of videos from Google Takeout exports
- Resume-safe - Automatically skips already-processed videos
- Multiple formats - JSON, JSONL, CSV, SRT, VTT, plain text
Quick Start
# Get a transcript in 10 seconds
npx @nadimtuhin/ytranscript get dQw4w9WgXcQ
# Output: "We're no strangers to love, you know the rules..."
Installation
# Global install (recommended for CLI usage)
npm install -g @nadimtuhin/ytranscript
# Or use with npx (no install)
npx @nadimtuhin/ytranscript get VIDEO_ID
# Add to a project (for library usage)
npm add @nadimtuhin/ytranscript
Runtimes supported: Node.js 18+ and Bun 1.0+
MCP Server (AI Assistant Integration)
ytranscript includes an MCP (Model Context Protocol) server that lets Claude, Cursor, and other AI assistants fetch YouTube transcripts directly.
Available Tools
| Tool | Description |
|---|---|
| get_transcript | Fetch transcript with format options (text, segments, srt, vtt) |
| get_transcript_languages | List available caption languages for a video |
| extract_video_id | Extract video ID from various YouTube URL formats |
| get_transcripts_bulk | Fetch transcripts for multiple videos at once |
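As a rough illustration of what extract_video_id has to cope with, the sketch below pulls an ID out of a few common URL shapes. It is not the package's implementation: the function name extractVideoId, the list of URL patterns, and the assumption that IDs are 11 characters from [A-Za-z0-9_-] are all choices made for this example.

```typescript
// Hypothetical helper for illustration; not the package's implementation.
// Assumption: video IDs are 11 characters drawn from [A-Za-z0-9_-].
const ID = '([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])';

const URL_PATTERNS: RegExp[] = [
  // Standard links: youtube.com/watch?v=<id> (v may follow other params)
  new RegExp(`youtube\\.com/watch\\?(?:.*&)?v=${ID}`),
  // Short links: youtu.be/<id>
  new RegExp(`youtu\\.be/${ID}`),
  // Path-based links: /embed/<id>, /shorts/<id>, /live/<id>
  new RegExp(`youtube\\.com/(?:embed|shorts|live)/${ID}`),
];

function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  // Accept a bare ID unchanged.
  if (/^[A-Za-z0-9_-]{11}$/.test(trimmed)) return trimmed;
  for (const pattern of URL_PATTERNS) {
    const id = trimmed.match(pattern)?.[1];
    if (id) return id;
  }
  return null; // Neither a bare ID nor a recognised URL shape.
}
```

Returning null rather than throwing keeps the helper easy to use when filtering a mixed list of IDs and URLs, such as a videos.txt file.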
Setup with Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
  "mcpServers": {
    "ytranscript": {
      "command": "npx",
      "args": ["-y", "@nadimtuhin/ytranscript", "mcp"]
    }
  }
}
Or if installed globally:
{
  "mcpServers": {
    "ytranscript": {
      "command": "ytranscript-mcp"
    }
  }
}
Example Prompts for Claude
Once configured, you can ask Claude:
- "Get the transcript for this YouTube video: https://youtube.com/watch?v=dQw4w9WgXcQ"
- "Summarize the key points from this video"
- "What languages are available for this video's captions?"
- "Get transcripts for these 5 videos and compare their content"
CLI Usage
Single Video
# Basic usage (outputs plain text)
ytranscript get dQw4w9WgXcQ
# From URL
ytranscript get "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
# With specific language
ytranscript get dQw4w9WgXcQ --lang es
# Output as SRT subtitles
ytranscript get dQw4w9WgXcQ --format srt -o video.srt
# Output as JSON with timestamps
ytranscript get dQw4w9WgXcQ --format json
Check Available Languages
ytranscript info dQw4w9WgXcQ
# Output:
# en English (auto-generated)
# es Spanish
# fr French
Bulk Processing
# From Google Takeout exports
ytranscript bulk \
  --history "Takeout/YouTube and YouTube Music/history/watch-history.json" \
  --watch-later "Takeout/YouTube and YouTube Music/playlists/Watch later-videos.csv" \
  --out-jsonl transcripts.jsonl \
  --out-csv transcripts.csv
# From a list of video IDs
ytranscript bulk --videos "dQw4w9WgXcQ,jNQXAC9IVRw,9bZkp7q19f0"
# From a file (one ID or URL per line)
ytranscript bulk --file videos.txt
# Resume a previous run (skips already-processed videos)
ytranscript bulk --history watch-history.json --resume
Rate Limiting
YouTube may rate-limit requests. Use these flags to control pacing:
# --concurrency: max concurrent requests (default: 4, safe: 1-8)
# --pause-after: pause after N requests (default: 10)
# --pause-ms:    pause duration in ms (default: 5000)
ytranscript bulk \
  --history watch-history.json \
  --concurrency 4 \
  --pause-after 10 \
  --pause-ms 5000
Recommended for large batches: --concurrency 2 --pause-after 10 --pause-ms 5000
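To get a feel for what these flags cost on a large batch, the sketch below estimates the total time spent paused. It assumes one pause after every N completed requests and no pause after the final request; the CLI's exact pausing behaviour may differ, and estimatePauseMs is a hypothetical helper, not part of the package.

```typescript
// Hypothetical helper: rough lower bound on time spent paused in a bulk run.
// Assumption: one pause after every `pauseAfter` requests, none after the last.
function estimatePauseMs(
  totalVideos: number,
  pauseAfter: number,
  pauseMs: number,
): number {
  if (totalVideos <= 0 || pauseAfter <= 0) return 0;
  // Number of full groups of `pauseAfter` requests that are followed by more work.
  const pauses = Math.floor((totalVideos - 1) / pauseAfter);
  return pauses * pauseMs;
}

// With the defaults, 1000 videos would spend 99 pauses * 5000 ms paused.
const pausedMinutes = estimatePauseMs(1000, 10, 5000) / 60000;
```

Under these assumptions a 1000-video run with the default pacing spends about eight minutes paused, on top of the time taken by the requests themselves.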
Proxy Support
Route requests through an HTTP proxy to reduce the chance of rate limiting or to reach YouTube from a restricted network:
# CLI with proxy
ytranscript get dQw4w9WgXcQ --proxy http://localhost:8080
# Bulk with proxy
ytranscript bulk --history watch-history.json --proxy http://user:pass@proxy.example.com:8080
# With authentication
ytranscript get dQw4w9WgXcQ --proxy http://username:password@proxy:8080
Programmatic usage:
import { fetchTranscript } from '@nadimtuhin/ytranscript';
const transcript = await fetchTranscript('dQw4w9WgXcQ', {
  proxy: {
    url: 'http://localhost:8080',
  },
});
Proxy support inspired by ytfetcher
Programmatic API
Fetch a Single Transcript
import { fetchTranscript } from '@nadimtuhin/ytranscript';
try {
  const transcript = await fetchTranscript('dQw4w9WgXcQ', {
    languages: ['en', 'es'], // Preference order
    includeAutoGenerated: true,
  });
  console.log(transcript.text); // Full transcript text
  console.log(transcript.segments); // Array of { text, start, duration }
  console.log(transcript.language); // 'en'
  console.log(transcript.isAutoGenerated); // true/false
} catch (error) {
  // See "Error Handling" section below
  // Narrow first: caught values are `unknown` under strict TypeScript
  console.error(error instanceof Error ? error.message : error);
}
Bulk Processing
import {
  loadWatchHistory,
  loadWatchLater,
  mergeVideoSources,
  processVideos,
} from '@nadimtuhin/ytranscript';
// Load from Google Takeout
const history = await loadWatchHistory('./watch-history.json');
const watchLater = await loadWatchLater('./watch-later.csv');
// Merge and deduplicate
const videos = mergeVideoSources(history, watchLater);
// Process with progress callback
const results = await processVideos(videos, {
  concurrency: 4,
  pauseAfter: 10,
  pauseDuration: 5000,
  onProgress: (completed, total, result) => {
    const status = result.transcript ? 'OK' : 'FAIL';
    console.log(`[${completed}/${total}] ${result.meta.videoId}: ${status}`);
  },
});
// Filter successful results
const transcripts = results.filter((r) => r.transcript);
Streaming for Large Datasets
import { streamVideos, appendJsonl } from '@nadimtuhin/ytranscript';
// `videos` is the merged list from the bulk-processing example above
for await (const result of streamVideos(videos, { concurrency: 4 })) {
  // Write each result immediately (resume-safe)
  await appendJsonl(result, 'output.jsonl');
}
Output Formatting
import { fetchTranscript, formatSrt, formatVtt, formatText } from '@nadimtuhin/ytranscript';
import { writeFile } from 'fs/promises';
const transcript = await fetchTranscript('dQw4w9WgXcQ');
// SRT subtitles
const srt = formatSrt(transcript);
await writeFile('video.srt', srt);
// VTT subtitles
const vtt = formatVtt(transcript);
await writeFile('video.vtt', vtt);
// Plain text with timestamps
const text = formatText(transcript, true);
// [0:00] First line of transcript
// [0:05] Second line...
Error Handling
The library throws errors for various failure cases:
| Error Message | Cause | Solution |
|---|---|---|
| No captions available for this video | Video has no captions/subtitles | Check with ytranscript info first |
| No suitable caption track found | Requested language not available | Use includeAutoGenerated: true or a different language |
| Caption track is empty | Captions exist but have no content | Rare; try a different language |
| HTTP 429 | Rate limited by YouTube | Reduce concurrency, add pauses |
| HTTP 403 | Video is private or region-locked | Cannot access this video |
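Only some of the failures in the table are worth retrying. The sketch below maps an error message to a category and computes a backoff delay for rate-limited requests. The message substrings come from the table; the category names, the decision to retry only HTTP 429, and the backoff values are assumptions of this example, and none of these helpers are exported by the package.

```typescript
// Hypothetical helpers; the substrings come from the error table above,
// while the retry policy itself is an assumption of this sketch.
type FailureKind =
  | 'no_captions'
  | 'no_matching_track'
  | 'empty_track'
  | 'rate_limited'
  | 'forbidden'
  | 'unknown';

function classifyError(message: string): FailureKind {
  if (message.includes('No captions available')) return 'no_captions';
  if (message.includes('No suitable caption track')) return 'no_matching_track';
  if (message.includes('Caption track is empty')) return 'empty_track';
  if (message.includes('429')) return 'rate_limited';
  if (message.includes('403')) return 'forbidden';
  return 'unknown';
}

// Only rate limiting is treated as transient here.
function isRetryable(kind: FailureKind): boolean {
  return kind === 'rate_limited';
}

// Exponential backoff: 5s, 10s, 20s, 40s, then capped at 60s.
// `attempt` is zero-based.
function backoffMs(attempt: number, baseMs = 5000, capMs = 60000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

A caller would typically wait backoffMs(attempt) before retrying a video whose failure is retryable, and record the others as permanent failures.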
try {
  const transcript = await fetchTranscript(videoId);
} catch (error) {
  // Narrow first: caught values are `unknown` under strict TypeScript
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('No captions available')) {
    console.log('This video has no subtitles');
  } else if (message.includes('429')) {
    console.log('Rate limited - slow down requests');
  }
}
Limitations
| Scenario | Supported |
|---|---|
| Public videos with captions | ✅ Yes |
| Auto-generated captions | ✅ Yes |
| Manual/community captions | ✅ Yes |
| Private videos | ❌ No |
| Age-restricted videos | ❌ No |
| Live streams (while live) | ❌ No |
| Premiere videos (before premiere) | ❌ No |
| Region-locked videos | ❌ No (unless you're in the allowed region) |
Google Takeout
To export your YouTube data:
- Go to Google Takeout
- Deselect all, then select only "YouTube and YouTube Music"
- Click "All YouTube data included" and select:
  - History → Watch history
  - Playlists (includes Watch Later)
- Export and download
- Extract the archive
The relevant files are:
Takeout/YouTube and YouTube Music/history/watch-history.json
Takeout/YouTube and YouTube Music/playlists/Watch later-videos.csv
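If you want to inspect the export before running a bulk job, the sketch below reads watch-history.json content and reduces it to unique video IDs. The entry field names (title, titleUrl, time, subtitles) reflect what Takeout exports have typically contained and should be treated as assumptions to verify against your own file. parseWatchHistory is a hypothetical helper; the package's loadWatchHistory may parse the file differently.

```typescript
// Assumed shape of one watch-history.json entry; verify against your export.
interface TakeoutEntry {
  title?: string;
  titleUrl?: string;
  time?: string;
  subtitles?: { name?: string; url?: string }[];
}

// Local, trimmed copy of the WatchHistoryMeta shape used in this README.
interface HistoryMeta {
  videoId: string;
  title?: string;
  url?: string;
  channel?: { name?: string; url?: string };
  watchedAt?: string;
  source: 'history';
}

// Hypothetical helper: parse the JSON text and keep one entry per video.
function parseWatchHistory(json: string): HistoryMeta[] {
  const entries = JSON.parse(json) as TakeoutEntry[];
  const seen = new Set<string>();
  const videos: HistoryMeta[] = [];
  for (const entry of entries) {
    // Entries without a watch URL (ads, removed videos, etc.) are skipped.
    const videoId = entry.titleUrl?.match(/[?&]v=([A-Za-z0-9_-]{11})/)?.[1];
    if (!videoId || seen.has(videoId)) continue; // keep first occurrence only
    seen.add(videoId);
    videos.push({
      videoId,
      title: entry.title,
      url: entry.titleUrl,
      channel: entry.subtitles?.[0],
      watchedAt: entry.time,
      source: 'history',
    });
  }
  return videos;
}
```

Deduplicating up front gives a realistic count of how many transcripts a bulk run will actually request, since rewatched videos appear in the history more than once.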
API Reference
Types
interface Transcript {
  videoId: string;
  text: string;
  segments: TranscriptSegment[];
  language: string;
  isAutoGenerated: boolean;
}
interface TranscriptSegment {
  text: string;
  start: number; // seconds
  duration: number; // seconds
}
interface WatchHistoryMeta {
  videoId: string;
  title?: string;
  url?: string;
  channel?: { name?: string; url?: string };
  watchedAt?: string;
  source: 'history' | 'watch_later' | 'manual';
}
interface TranscriptResult {
  meta: WatchHistoryMeta;
  transcript: Transcript | null;
  error?: string; // Present when transcript is null
}
interface FetchOptions {
  languages?: string[]; // Default: ['en']
  timeout?: number; // Default: 30000 (ms)
  includeAutoGenerated?: boolean; // Default: true
  proxy?: ProxyConfig; // Optional proxy configuration
}
interface ProxyConfig {
  url: string; // HTTP proxy URL (e.g., "http://user:pass@host:port")
}
interface BulkOptions extends FetchOptions {
  concurrency?: number; // Default: 4
  pauseAfter?: number; // Default: 10
  pauseDuration?: number; // Default: 5000 (ms)
  skipIds?: Set<string>; // Videos to skip
  onProgress?: (completed: number, total: number, result: TranscriptResult) => void;
}
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
- Report bugs via GitHub Issues
- Security issues: see SECURITY.md
License
MIT