ytranscript

Extract transcripts from your entire YouTube watch history in minutes. Build AI-powered video summaries, searchable archives, or feed transcripts directly to Claude, Cursor, and other AI assistants via the built-in MCP server.

Read the blog post: "Automating My Second Brain with YouTube Transcripts"

Why ytranscript?

No API keys required - Uses YouTube's public innertube API directly
Works with AI assistants - Built-in MCP server for Claude, Cursor, and others
Bulk processing - Process thousands of videos from Google Takeout exports
Resume-safe - Automatically skips already-processed videos
Multiple formats - JSON, JSONL, CSV, SRT, VTT, plain text

Quick Start

# Get a transcript in 10 seconds
npx @nadimtuhin/ytranscript get dQw4w9WgXcQ

# Output: "We're no strangers to love, you know the rules..."

Installation

# Global install (recommended for CLI usage)
npm install -g @nadimtuhin/ytranscript

# Or use with npx (no install)
npx @nadimtuhin/ytranscript get VIDEO_ID

# Add to a project (for library usage)
npm add @nadimtuhin/ytranscript

Runtimes supported: Node.js 18+ and Bun 1.0+

MCP Server (AI Assistant Integration)

ytranscript includes an MCP (Model Context Protocol) server that lets Claude, Cursor, and other AI assistants fetch YouTube transcripts directly.

Available Tools

Tool	Description
`get_transcript`	Fetch transcript with format options (text, segments, srt, vtt)
`get_transcript_languages`	List available caption languages for a video
`extract_video_id`	Extract video ID from various YouTube URL formats
`get_transcripts_bulk`	Fetch transcripts for multiple videos at once
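
To illustrate what "various YouTube URL formats" can mean for `extract_video_id`, here is a minimal sketch of ID extraction. It is not the package's implementation: `parseVideoId` is a hypothetical name, and the sketch assumes video IDs are 11 characters drawn from letters, digits, `_` and `-`.

```typescript
// Hypothetical helper, not part of @nadimtuhin/ytranscript.
// Assumes video IDs are 11 characters from A-Z, a-z, 0-9, "_" and "-".
const ID = "[A-Za-z0-9_-]{11}";

const PATTERNS: RegExp[] = [
  new RegExp(`^(${ID})$`),                                  // bare ID
  new RegExp(`[?&]v=(${ID})(?:[&#]|$)`),                    // watch?v=ID
  new RegExp(`youtu\\.be/(${ID})(?:[?&#/]|$)`),             // short link
  new RegExp(`/(?:embed|shorts|live)/(${ID})(?:[?&#/]|$)`), // embed, shorts, live
];

// Returns the first ID matched by any pattern, or null if none matches.
function parseVideoId(input: string): string | null {
  const value = input.trim();
  for (const pattern of PATTERNS) {
    const match = pattern.exec(value);
    if (match && match[1]) return match[1];
  }
  return null;
}
```

Anything that matches none of the patterns yields `null`, so a caller can reject bad input before making a network request.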

Setup with Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

{
  "mcpServers": {
    "ytranscript": {
      "command": "npx",
      "args": ["-y", "@nadimtuhin/ytranscript", "mcp"]
    }
  }
}

Or if installed globally:

{
  "mcpServers": {
    "ytranscript": {
      "command": "ytranscript-mcp"
    }
  }
}

Example Prompts for Claude

Once configured, you can ask Claude:

"Get the transcript for this YouTube video: https://youtube.com/watch?v=dQw4w9WgXcQ"
"Summarize the key points from this video"
"What languages are available for this video's captions?"
"Get transcripts for these 5 videos and compare their content"

CLI Usage

Single Video

# Basic usage (outputs plain text)
ytranscript get dQw4w9WgXcQ

# From URL
ytranscript get "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# With specific language
ytranscript get dQw4w9WgXcQ --lang es

# Output as SRT subtitles
ytranscript get dQw4w9WgXcQ --format srt -o video.srt

# Output as JSON with timestamps
ytranscript get dQw4w9WgXcQ --format json

Check Available Languages

ytranscript info dQw4w9WgXcQ
# Output:
#   en     English (auto-generated)
#   es     Spanish
#   fr     French

Bulk Processing

# From Google Takeout exports
ytranscript bulk \
  --history "Takeout/YouTube and YouTube Music/history/watch-history.json" \
  --watch-later "Takeout/YouTube and YouTube Music/playlists/Watch later-videos.csv" \
  --out-jsonl transcripts.jsonl \
  --out-csv transcripts.csv

# From a list of video IDs
ytranscript bulk --videos "dQw4w9WgXcQ,jNQXAC9IVRw,9bZkp7q19f0"

# From a file (one ID or URL per line)
ytranscript bulk --file videos.txt

# Resume a previous run (skips already-processed videos)
ytranscript bulk --history watch-history.json --resume

Rate Limiting

YouTube may rate-limit requests. Use these flags to control pacing:

# --concurrency: max concurrent requests (default: 4, safe: 1-8)
# --pause-after: pause after N requests (default: 10)
# --pause-ms:    pause duration in ms (default: 5000)
ytranscript bulk \
  --history watch-history.json \
  --concurrency 4 \
  --pause-after 10 \
  --pause-ms 5000

Recommended for large batches: --concurrency 2 --pause-after 10 --pause-ms 5000
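
To get a feel for what these settings cost on a large batch, the time spent pausing can be estimated with simple arithmetic. This sketch assumes one pause of `--pause-ms` after every full group of `--pause-after` requests, which is a reading of the flag descriptions rather than confirmed behavior. `estimatePauseMs` is a hypothetical helper.

```typescript
// Rough estimate of the time spent in pauses during a bulk run.
// Assumption: one pause of `pauseMs` follows every full group of
// `pauseAfter` requests; the real scheduler may differ.
function estimatePauseMs(totalVideos: number, pauseAfter: number, pauseMs: number): number {
  if (pauseAfter <= 0) return 0; // no pausing configured
  return Math.floor(totalVideos / pauseAfter) * pauseMs;
}

// 5000 videos with the recommended settings:
// 500 pauses x 5 s = 2500 s, roughly 41.7 minutes of pausing alone.
const pauseMinutes = estimatePauseMs(5000, 10, 5000) / 60000;
```

The estimate excludes the time taken by the requests themselves, so it is a lower bound on the total run time under that assumption.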

Proxy Support

Route requests through an HTTP proxy to reduce rate limiting or to reach YouTube from a restricted network:

# CLI with proxy
ytranscript get dQw4w9WgXcQ --proxy http://localhost:8080

# Bulk with proxy
ytranscript bulk --history watch-history.json --proxy http://user:pass@proxy.example.com:8080

# With authentication
ytranscript get dQw4w9WgXcQ --proxy http://username:password@proxy:8080

Programmatic usage:

import { fetchTranscript } from '@nadimtuhin/ytranscript';

const transcript = await fetchTranscript('dQw4w9WgXcQ', {
  proxy: {
    url: 'http://localhost:8080',
  },
});
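
Credentials embedded in a proxy URL must be percent-encoded if they contain reserved characters such as `@`, `:` or `/`, otherwise the URL is ambiguous. A minimal sketch using the standard `encodeURIComponent` follows. `buildProxyUrl` is a hypothetical helper, not a package export.

```typescript
// Hypothetical helper: builds an HTTP proxy URL with percent-encoded credentials.
function buildProxyUrl(host: string, port: number, user?: string, password?: string): string {
  if (user === undefined) return `http://${host}:${port}`;
  const auth =
    password === undefined
      ? encodeURIComponent(user)
      : `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `http://${auth}@${host}:${port}`;
}

const proxyUrl = buildProxyUrl("proxy.example.com", 8080, "user", "p@ss:word");
// "http://user:p%40ss%3Aword@proxy.example.com:8080"
```

The resulting string has the same shape as the `--proxy` values and the `proxy.url` option shown above.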

Proxy support was inspired by ytfetcher.

Programmatic API

Fetch a Single Transcript

import { fetchTranscript } from '@nadimtuhin/ytranscript';

try {
  const transcript = await fetchTranscript('dQw4w9WgXcQ', {
    languages: ['en', 'es'], // Preference order
    includeAutoGenerated: true,
  });

  console.log(transcript.text);           // Full transcript text
  console.log(transcript.segments);       // Array of { text, start, duration }
  console.log(transcript.language);       // 'en'
  console.log(transcript.isAutoGenerated); // true/false
} catch (error) {
  // See "Error Handling" section below
  console.error(error.message);
}
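
The `languages` option is a preference order. One plausible way such a selection could work is sketched below. `CaptionTrack` and `pickTrack` are hypothetical, the package's internals may differ, and the rule that a manual track beats an auto-generated one within the same language is an assumption.

```typescript
// Hypothetical types and selection logic; not the package's internals.
interface CaptionTrack {
  languageCode: string;
  isAutoGenerated: boolean;
}

function pickTrack(
  tracks: CaptionTrack[],
  languages: string[] = ["en"],
  includeAutoGenerated = true,
): CaptionTrack | null {
  // Drop auto-generated tracks up front when they are not wanted.
  const allowed = tracks.filter((t) => includeAutoGenerated || !t.isAutoGenerated);
  for (const lang of languages) {
    const matches = allowed.filter((t) => t.languageCode === lang);
    // Assumption: within one language, prefer a manual track.
    const best = matches.find((t) => !t.isAutoGenerated) ?? matches[0];
    if (best) return best;
  }
  return null; // corresponds to the "no suitable caption track" case
}
```

With tracks for auto-generated English and manual Spanish, `['en', 'es']` selects English, while the same list with `includeAutoGenerated` set to `false` falls through to Spanish.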

Bulk Processing

import {
  loadWatchHistory,
  loadWatchLater,
  mergeVideoSources,
  processVideos,
} from '@nadimtuhin/ytranscript';

// Load from Google Takeout
const history = await loadWatchHistory('./watch-history.json');
const watchLater = await loadWatchLater('./watch-later.csv');

// Merge and deduplicate
const videos = mergeVideoSources(history, watchLater);

// Process with progress callback
const results = await processVideos(videos, {
  concurrency: 4,
  pauseAfter: 10,
  pauseDuration: 5000,
  onProgress: (completed, total, result) => {
    const status = result.transcript ? 'OK' : 'FAIL';
    console.log(`[${completed}/${total}] ${result.meta.videoId}: ${status}`);
  },
});

// Filter successful results
const transcripts = results.filter((r) => r.transcript);

Streaming for Large Datasets

import { streamVideos, appendJsonl } from '@nadimtuhin/ytranscript';

// `videos` is the merged list from mergeVideoSources() in the previous example
for await (const result of streamVideos(videos, { concurrency: 4 })) {
  // Write each result immediately (resume-safe)
  await appendJsonl(result, 'output.jsonl');
}
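
Writing each result as it arrives makes it possible to resume an interrupted run by collecting the IDs already on disk. The sketch below assumes each line of the JSONL file is one JSON-serialized `TranscriptResult` with a `meta.videoId` field. `collectProcessedIds` is a hypothetical helper, not a package export.

```typescript
// Collects the video IDs already present in the contents of a JSONL file.
// Assumes one JSON-serialized TranscriptResult (with meta.videoId) per line.
function collectProcessedIds(jsonl: string): Set<string> {
  const ids = new Set<string>();
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const record = JSON.parse(trimmed) as { meta?: { videoId?: unknown } } | null;
      const id = record?.meta?.videoId;
      if (typeof id === "string") ids.add(id);
    } catch {
      // Ignore a partially written line left behind by an interrupted run.
    }
  }
  return ids;
}
```

The returned `Set<string>` has the same type as the `skipIds` option in `BulkOptions`. The file contents can be read with `readFile` from `node:fs/promises`.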

Output Formatting

import { fetchTranscript, formatSrt, formatVtt, formatText } from '@nadimtuhin/ytranscript';
import { writeFile } from 'fs/promises';

const transcript = await fetchTranscript('dQw4w9WgXcQ');

// SRT subtitles
const srt = formatSrt(transcript);
await writeFile('video.srt', srt);

// VTT subtitles
const vtt = formatVtt(transcript);
await writeFile('video.vtt', vtt);

// Plain text with timestamps
const text = formatText(transcript, true);
// [0:00] First line of transcript
// [0:05] Second line...
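
For readers unfamiliar with the SRT layout, the sketch below shows how segments with `start` and `duration` in seconds map onto numbered cues with `HH:MM:SS,mmm` timestamps. It is an illustration of the format, not the package's `formatSrt`. `Segment`, `srtTimestamp` and `toSrt` are hypothetical names.

```typescript
// Mirrors the documented TranscriptSegment shape (times in seconds).
interface Segment {
  text: string;
  start: number;
  duration: number;
}

// Formats a time in seconds as an SRT timestamp: HH:MM:SS,mmm.
function srtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Builds numbered cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => {
      const end = seg.start + seg.duration;
      return `${i + 1}\n${srtTimestamp(seg.start)} --> ${srtTimestamp(end)}\n${seg.text}\n`;
    })
    .join("\n");
}
```

WebVTT cues look similar but use a period instead of a comma before the milliseconds and start with a `WEBVTT` header line.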

Error Handling

fetchTranscript throws on failure, and the error message identifies the cause:

Error Message	Cause	Solution
`No captions available for this video`	Video has no captions/subtitles	Check with `ytranscript info` first
`No suitable caption track found`	Requested language not available	Use `includeAutoGenerated: true` or different language
`Caption track is empty`	Captions exist but have no content	Rare; try a different language
`HTTP 429`	Rate limited by YouTube	Reduce concurrency, add pauses
`HTTP 403`	Video is private or region-locked	Cannot access this video

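import { fetchTranscript } from '@nadimtuhin/ytranscript';

const videoId = 'dQw4w9WgXcQ';

try {
  const transcript = await fetchTranscript(videoId);
} catch (error) {
  // `error` is `unknown` in TypeScript, so normalize it before reading the message
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('No captions available')) {
    console.log('This video has no subtitles');
  } else if (message.includes('429')) {
    console.log('Rate limited - slow down requests');
  } else {
    throw error; // Unexpected failure: rethrow instead of swallowing it
  }
}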
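
When many videos are processed, it can help to turn the message checks into one reusable classifier. The sketch below relies only on the message fragments listed in the table above. `FailureKind`, `classifyFailure` and `isRetryable` are hypothetical names, not package exports.

```typescript
type FailureKind =
  | "no_captions"
  | "no_track"
  | "empty"
  | "rate_limited"
  | "forbidden"
  | "unknown";

// Maps a thrown value to a coarse category using the documented message text.
function classifyFailure(error: unknown): FailureKind {
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes("No captions available")) return "no_captions";
  if (message.includes("No suitable caption track")) return "no_track";
  if (message.includes("Caption track is empty")) return "empty";
  if (message.includes("429")) return "rate_limited";
  if (message.includes("403")) return "forbidden";
  return "unknown";
}

// Rate limiting is the one case here where waiting and retrying is likely to help.
function isRetryable(kind: FailureKind): boolean {
  return kind === "rate_limited";
}
```

Matching on message text is brittle if the wording changes between versions, so the `unknown` branch should be logged rather than ignored.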

Limitations

Scenario	Supported
Public videos with captions	✅ Yes
Auto-generated captions	✅ Yes
Manual/community captions	✅ Yes
Private videos	❌ No
Age-restricted videos	❌ No
Live streams (while live)	❌ No
Premiere videos (before premiere)	❌ No
Region-locked videos	❌ No (unless you're in the allowed region)

Google Takeout

To export your YouTube data:

1. Go to Google Takeout
2. Deselect all, then select only "YouTube and YouTube Music"
3. Click "All YouTube data included" and select:
   - History → Watch history
   - Playlists (includes Watch Later)
4. Export and download
5. Extract the archive

The relevant files are:

Takeout/YouTube and YouTube Music/history/watch-history.json
Takeout/YouTube and YouTube Music/playlists/Watch later-videos.csv
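
For orientation, this is roughly how the watch history file could be reduced to a list of video IDs. The entry shape (`title`, `titleUrl`, `time`, `subtitles`) is an assumption about the Takeout export format, which Google may change, and `parseHistory` is a hypothetical stand-in rather than the package's `loadWatchHistory`.

```typescript
// Assumed shape of one entry in Takeout's watch-history.json.
interface TakeoutEntry {
  title?: string;
  titleUrl?: string;
  time?: string;
  subtitles?: { name?: string; url?: string }[];
}

// Mirrors the documented WatchHistoryMeta fields for history entries.
interface VideoMeta {
  videoId: string;
  title?: string;
  url?: string;
  channel?: { name?: string; url?: string };
  watchedAt?: string;
  source: "history";
}

function parseHistory(json: string): VideoMeta[] {
  const entries = JSON.parse(json) as TakeoutEntry[];
  const seen = new Set<string>();
  const videos: VideoMeta[] = [];
  for (const entry of entries) {
    const match = /[?&]v=([A-Za-z0-9_-]{11})/.exec(entry.titleUrl ?? "");
    const videoId = match ? match[1] : undefined;
    // Skip entries without a watch URL and videos already seen.
    if (!videoId || seen.has(videoId)) continue;
    seen.add(videoId);
    videos.push({
      videoId,
      title: entry.title,
      url: entry.titleUrl,
      channel: entry.subtitles?.[0],
      watchedAt: entry.time,
      source: "history",
    });
  }
  return videos;
}
```

Deduplicating here keeps the first occurrence of each video, so a video watched several times is fetched only once.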

API Reference

Types

interface Transcript {
  videoId: string;
  text: string;
  segments: TranscriptSegment[];
  language: string;
  isAutoGenerated: boolean;
}

interface TranscriptSegment {
  text: string;
  start: number;    // seconds
  duration: number; // seconds
}

interface WatchHistoryMeta {
  videoId: string;
  title?: string;
  url?: string;
  channel?: { name?: string; url?: string };
  watchedAt?: string;
  source: 'history' | 'watch_later' | 'manual';
}

interface TranscriptResult {
  meta: WatchHistoryMeta;
  transcript: Transcript | null;
  error?: string;  // Present when transcript is null
}

interface FetchOptions {
  languages?: string[];          // Default: ['en']
  timeout?: number;              // Default: 30000 (ms)
  includeAutoGenerated?: boolean; // Default: true
  proxy?: ProxyConfig;           // Optional proxy configuration
}

interface ProxyConfig {
  url: string;        // HTTP proxy URL (e.g., "http://user:pass@host:port")
}

interface BulkOptions extends FetchOptions {
  concurrency?: number;    // Default: 4
  pauseAfter?: number;     // Default: 10
  pauseDuration?: number;  // Default: 5000 (ms)
  skipIds?: Set<string>;   // Videos to skip
  onProgress?: (completed: number, total: number, result: TranscriptResult) => void;
}

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Report bugs via GitHub Issues
Security issues: see SECURITY.md

License

MIT