Fast YouTube transcript extraction with bulk processing, Google Takeout support, MCP server, and multiple output formats

ytranscript

Extract transcripts from your entire YouTube watch history in minutes. Build AI-powered video summaries, searchable archives, or feed transcripts directly to Claude, Cursor, and other AI assistants via the built-in MCP server.

Read the blog post: "Automating My Second Brain with YouTube Transcripts"

Why ytranscript?

  • No API keys required - Uses YouTube's public innertube API directly
  • Works with AI assistants - Built-in MCP server for Claude, Cursor, and others
  • Bulk processing - Process thousands of videos from Google Takeout exports
  • Resume-safe - Automatically skips already-processed videos
  • Multiple formats - JSON, JSONL, CSV, SRT, VTT, plain text

Quick Start

# Get a transcript in 10 seconds
npx @nadimtuhin/ytranscript get dQw4w9WgXcQ

# Output: "We're no strangers to love, you know the rules..."

Installation

# Global install (recommended for CLI usage)
npm install -g @nadimtuhin/ytranscript

# Or use with npx (no install)
npx @nadimtuhin/ytranscript get VIDEO_ID

# Add to a project (for library usage)
npm add @nadimtuhin/ytranscript

Runtimes supported: Node.js 18+ and Bun 1.0+

MCP Server (AI Assistant Integration)

ytranscript includes an MCP (Model Context Protocol) server that lets Claude, Cursor, and other AI assistants fetch YouTube transcripts directly.

Available Tools

| Tool | Description |
| --- | --- |
| get_transcript | Fetch transcript with format options (text, segments, srt, vtt) |
| get_transcript_languages | List available caption languages for a video |
| extract_video_id | Extract video ID from various YouTube URL formats |
| get_transcripts_bulk | Fetch transcripts for multiple videos at once |
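
To make "various YouTube URL formats" concrete, here is a minimal sketch of how a video ID could be pulled out of common link shapes. It is not the library's implementation: the function name `extractVideoIdSketch`, the set of supported URL shapes, and the assumption that IDs are 11 characters from `A-Z a-z 0-9 _ -` are all illustrative.

```typescript
// Hypothetical helper for illustration; not part of @nadimtuhin/ytranscript.
// Assumption: video IDs are 11 characters drawn from [A-Za-z0-9_-].
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

function extractVideoIdSketch(input: string): string | null {
  const trimmed = input.trim();
  // A bare ID is returned unchanged.
  if (VIDEO_ID.test(trimmed)) return trimmed;

  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return null; // Neither a bare ID nor a parseable URL
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, '');
  let candidate: string | null = null;

  if (host === 'youtu.be') {
    // Short links: https://youtu.be/<id>
    candidate = url.pathname.split('/')[1] ?? null;
  } else if (host === 'youtube.com') {
    if (url.pathname === '/watch') {
      // Standard links: https://www.youtube.com/watch?v=<id>
      candidate = url.searchParams.get('v');
    } else {
      // Path-based links: /embed/<id> and /shorts/<id>
      const [, kind, id] = url.pathname.split('/');
      if (kind === 'embed' || kind === 'shorts') candidate = id ?? null;
    }
  }

  // Reject anything that does not look like a video ID.
  return candidate !== null && VIDEO_ID.test(candidate) ? candidate : null;
}
```

Returning `null` rather than throwing keeps the helper easy to use when filtering a mixed list of IDs and URLs.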

Setup with Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

{
  "mcpServers": {
    "ytranscript": {
      "command": "npx",
      "args": ["-y", "@nadimtuhin/ytranscript", "mcp"]
    }
  }
}

Or if installed globally:

{
  "mcpServers": {
    "ytranscript": {
      "command": "ytranscript-mcp"
    }
  }
}

Example Prompts for Claude

Once configured, you can ask Claude:

  • "Get the transcript for this YouTube video: https://youtube.com/watch?v=dQw4w9WgXcQ"
  • "Summarize the key points from this video"
  • "What languages are available for this video's captions?"
  • "Get transcripts for these 5 videos and compare their content"

CLI Usage

Single Video

# Basic usage (outputs plain text)
ytranscript get dQw4w9WgXcQ

# From URL
ytranscript get "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# With specific language
ytranscript get dQw4w9WgXcQ --lang es

# Output as SRT subtitles
ytranscript get dQw4w9WgXcQ --format srt -o video.srt

# Output as JSON with timestamps
ytranscript get dQw4w9WgXcQ --format json

Check Available Languages

ytranscript info dQw4w9WgXcQ
# Output:
#   en     English (auto-generated)
#   es     Spanish
#   fr     French

Bulk Processing

# From Google Takeout exports
ytranscript bulk \
  --history "Takeout/YouTube and YouTube Music/history/watch-history.json" \
  --watch-later "Takeout/YouTube and YouTube Music/playlists/Watch later-videos.csv" \
  --out-jsonl transcripts.jsonl \
  --out-csv transcripts.csv

# From a list of video IDs
ytranscript bulk --videos "dQw4w9WgXcQ,jNQXAC9IVRw,9bZkp7q19f0"

# From a file (one ID or URL per line)
ytranscript bulk --file videos.txt

# Resume a previous run (skips already-processed videos)
ytranscript bulk --history watch-history.json --resume
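
One plausible way to get resume behavior in your own scripts is to read the IDs already present in a previous JSONL output and skip them. The sketch below assumes each line is a JSON object with a `meta.videoId` field, matching the `TranscriptResult` type in the API Reference; the helper name `collectProcessedIds` is hypothetical, and this is not necessarily how `--resume` works internally.

```typescript
// Minimal shape of one JSONL record (assumed to mirror TranscriptResult).
interface ResultLine {
  meta: { videoId: string };
  transcript: unknown;
  error?: string;
}

// Hypothetical helper: collect the IDs of videos already written to a JSONL file.
function collectProcessedIds(jsonl: string): Set<string> {
  const ids = new Set<string>();
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue; // Skip blank lines
    try {
      const record = JSON.parse(line) as ResultLine;
      if (typeof record?.meta?.videoId === 'string') ids.add(record.meta.videoId);
    } catch {
      // Ignore a partially written line, e.g. after an interrupted run
    }
  }
  return ids;
}
```

The resulting set has the shape expected by the `skipIds` field of `BulkOptions`.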

Rate Limiting

YouTube may rate-limit requests. Use these flags to control pacing:

# --concurrency: max concurrent requests (default: 4, safe: 1-8)
# --pause-after: pause after N requests (default: 10)
# --pause-ms:    pause duration in ms (default: 5000)
ytranscript bulk \
  --history watch-history.json \
  --concurrency 4 \
  --pause-after 10 \
  --pause-ms 5000

Recommended for large batches: --concurrency 2 --pause-after 10 --pause-ms 5000
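
To show how the three pacing settings interact, here is a minimal sketch of a worker pool that caps in-flight tasks and pauses after every N started tasks. It is an illustration under assumptions, not the library's scheduler: `runPaced`, `PacingOptions`, and the gate mechanism are hypothetical, and the pause is approximate because tasks already past the gate are not interrupted.

```typescript
// Hypothetical option names for this sketch; they mirror the CLI flags.
interface PacingOptions {
  concurrency: number; // Max tasks in flight
  pauseAfter: number;  // Pause after every N started tasks
  pauseMs: number;     // Pause length in milliseconds
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runPaced<T, R>(
  items: T[],
  task: (item: T) => Promise<R>,
  { concurrency, pauseAfter, pauseMs }: PacingOptions,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;    // Index of the next item to start
  let started = 0; // Total tasks started so far
  let gate: Promise<void> = Promise.resolve(); // Resolves when the current pause ends

  async function worker(): Promise<void> {
    while (true) {
      await gate; // Wait out a pause triggered by any worker
      const index = next++;
      if (index >= items.length) return;
      started++;
      // Every pauseAfter-th start closes the gate for pauseMs.
      if (started % pauseAfter === 0) gate = sleep(pauseMs);
      results[index] = await task(items[index]);
    }
  }

  const workerCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: workerCount }, worker));
  return results; // Same order as the input items
}
```

Lower concurrency and longer pauses trade throughput for a smaller chance of hitting HTTP 429.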

Proxy Support

Route requests through an HTTP proxy to reduce rate limiting or to reach YouTube from restricted networks:

# CLI with proxy
ytranscript get dQw4w9WgXcQ --proxy http://localhost:8080

# Bulk with proxy
ytranscript bulk --history watch-history.json --proxy http://user:pass@proxy.example.com:8080

# With authentication
ytranscript get dQw4w9WgXcQ --proxy http://username:password@proxy:8080

Programmatic usage:

import { fetchTranscript } from '@nadimtuhin/ytranscript';

const transcript = await fetchTranscript('dQw4w9WgXcQ', {
  proxy: {
    url: 'http://localhost:8080',
  },
});

Proxy support inspired by ytfetcher

Programmatic API

Fetch a Single Transcript

import { fetchTranscript } from '@nadimtuhin/ytranscript';

try {
  const transcript = await fetchTranscript('dQw4w9WgXcQ', {
    languages: ['en', 'es'], // Preference order
    includeAutoGenerated: true,
  });

  console.log(transcript.text);           // Full transcript text
  console.log(transcript.segments);       // Array of { text, start, duration }
  console.log(transcript.language);       // 'en'
  console.log(transcript.isAutoGenerated); // true/false
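} catch (error) {
  // See "Error Handling" section below.
  // In TypeScript the caught value is `unknown`, so narrow it before reading `.message`.
  console.error(error instanceof Error ? error.message : error);
}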

Bulk Processing

import {
  loadWatchHistory,
  loadWatchLater,
  mergeVideoSources,
  processVideos,
} from '@nadimtuhin/ytranscript';

// Load from Google Takeout
const history = await loadWatchHistory('./watch-history.json');
const watchLater = await loadWatchLater('./watch-later.csv');

// Merge and deduplicate
const videos = mergeVideoSources(history, watchLater);

// Process with progress callback
const results = await processVideos(videos, {
  concurrency: 4,
  pauseAfter: 10,
  pauseDuration: 5000,
  onProgress: (completed, total, result) => {
    const status = result.transcript ? 'OK' : 'FAIL';
    console.log(`[${completed}/${total}] ${result.meta.videoId}: ${status}`);
  },
});

// Filter successful results
const transcripts = results.filter((r) => r.transcript);

Streaming for Large Datasets

import { streamVideos, appendJsonl } from '@nadimtuhin/ytranscript';

for await (const result of streamVideos(videos, { concurrency: 4 })) {
  // Write each result immediately (resume-safe)
  await appendJsonl(result, 'output.jsonl');
}

Output Formatting

import { fetchTranscript, formatSrt, formatVtt, formatText } from '@nadimtuhin/ytranscript';
import { writeFile } from 'fs/promises';

const transcript = await fetchTranscript('dQw4w9WgXcQ');

// SRT subtitles
const srt = formatSrt(transcript);
await writeFile('video.srt', srt);

// VTT subtitles
const vtt = formatVtt(transcript);
await writeFile('video.vtt', vtt);

// Plain text with timestamps
const text = formatText(transcript, true);
// [0:00] First line of transcript
// [0:05] Second line...
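
For readers curious what SRT output involves, the sketch below converts `{ text, start, duration }` segments into SRT cues with `HH:MM:SS,mmm` timestamps. It is a minimal illustration, not the library's `formatSrt`; the names `toSrtTime` and `segmentsToSrt` are hypothetical.

```typescript
// Same fields as TranscriptSegment in the API Reference.
interface Segment {
  text: string;
  start: number;    // seconds
  duration: number; // seconds
}

// Convert seconds to the SRT timestamp form HH:MM:SS,mmm.
function toSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  const pad = (n: number, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Build numbered cues; cues are separated by a blank line.
function segmentsToSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => {
      const end = seg.start + seg.duration;
      return `${i + 1}\n${toSrtTime(seg.start)} --> ${toSrtTime(end)}\n${seg.text}\n`;
    })
    .join('\n');
}
```

WebVTT cues are similar but use a `.` before the milliseconds and start with a `WEBVTT` header line.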

Error Handling

The library throws an error whose message identifies the failure. Common cases:

| Error Message | Cause | Solution |
| --- | --- | --- |
| No captions available for this video | Video has no captions/subtitles | Check with ytranscript info first |
| No suitable caption track found | Requested language not available | Use includeAutoGenerated: true or a different language |
| Caption track is empty | Captions exist but have no content | Rare; try a different language |
| HTTP 429 | Rate limited by YouTube | Reduce concurrency, add pauses |
| HTTP 403 | Video is private or region-locked | Cannot access this video |

try {
  const transcript = await fetchTranscript(videoId);
} catch (error) {
  // In TypeScript the caught value is `unknown`, so narrow it before reading `.message`
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('No captions available')) {
    console.log('This video has no subtitles');
  } else if (message.includes('429')) {
    console.log('Rate limited - slow down requests');
  } else {
    throw error; // Don't swallow unexpected failures
  }
}
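
Building on the message matching above, a caller might classify failures once and retry only when rate limited. The following is a sketch under assumptions: `classifyFailure` and `withRetry` are hypothetical helpers, the substrings come from the error messages listed in this section, and exponential backoff is a common choice rather than something the library prescribes.

```typescript
type FailureKind =
  | 'no_captions'
  | 'no_matching_track'
  | 'empty_track'
  | 'rate_limited'
  | 'forbidden'
  | 'unknown';

// Map an error message to a category using the documented substrings.
function classifyFailure(message: string): FailureKind {
  if (message.includes('No captions available')) return 'no_captions';
  if (message.includes('No suitable caption track')) return 'no_matching_track';
  if (message.includes('Caption track is empty')) return 'empty_track';
  if (message.includes('429')) return 'rate_limited';
  if (message.includes('403')) return 'forbidden';
  return 'unknown';
}

// Retry only rate-limited calls; other failures are unlikely to change on a second attempt.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      const retryable = classifyFailure(message) === 'rate_limited';
      if (!retryable || attempt >= maxAttempts) throw error;
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

In practice `fn` would wrap a call such as `() => fetchTranscript(videoId)`.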

Limitations

| Scenario | Supported |
| --- | --- |
| Public videos with captions | ✅ Yes |
| Auto-generated captions | ✅ Yes |
| Manual/community captions | ✅ Yes |
| Private videos | ❌ No |
| Age-restricted videos | ❌ No |
| Live streams (while live) | ❌ No |
| Premiere videos (before premiere) | ❌ No |
| Region-locked videos | ❌ No (unless you're in the allowed region) |

Google Takeout

To export your YouTube data:

  1. Go to Google Takeout
  2. Deselect all, then select only "YouTube and YouTube Music"
  3. Click "All YouTube data included" and select:
    • History → Watch history
    • Playlists (includes Watch Later)
  4. Export and download
  5. Extract the archive

The relevant files are:

  • Takeout/YouTube and YouTube Music/history/watch-history.json
  • Takeout/YouTube and YouTube Music/playlists/Watch later-videos.csv
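
For orientation, here is a rough sketch of turning watch-history.json content into deduplicated video records. The entry shape (`title`, `titleUrl`, `subtitles`, `time`) and the English "Watched " title prefix are assumptions about the Takeout export that may vary by locale or change over time; `parseWatchHistory` is a hypothetical stand-in, not the library's `loadWatchHistory`.

```typescript
// Assumed shape of one watch-history.json entry.
interface TakeoutEntry {
  title?: string;
  titleUrl?: string;
  subtitles?: { name?: string; url?: string }[];
  time?: string;
}

// Subset of WatchHistoryMeta from the API Reference.
interface HistoryItem {
  videoId: string;
  title?: string;
  url?: string;
  channel?: { name?: string; url?: string };
  watchedAt?: string;
  source: 'history';
}

function parseWatchHistory(json: string): HistoryItem[] {
  const entries = JSON.parse(json) as TakeoutEntry[];
  const seen = new Set<string>();
  const items: HistoryItem[] = [];

  for (const entry of entries) {
    if (!entry.titleUrl) continue; // Entries without a URL cannot be fetched
    let videoId: string | null = null;
    try {
      videoId = new URL(entry.titleUrl).searchParams.get('v');
    } catch {
      continue; // Malformed URL
    }
    if (!videoId || seen.has(videoId)) continue; // Keep the first occurrence only
    seen.add(videoId);
    items.push({
      videoId,
      title: entry.title?.replace(/^Watched /, ''), // Assumes English-locale exports
      url: entry.titleUrl,
      channel: entry.subtitles?.[0],
      watchedAt: entry.time,
      source: 'history',
    });
  }
  return items;
}
```

Deduplicating at load time matters because a watch history often contains the same video many times.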

API Reference

Types

interface Transcript {
  videoId: string;
  text: string;
  segments: TranscriptSegment[];
  language: string;
  isAutoGenerated: boolean;
}

interface TranscriptSegment {
  text: string;
  start: number;    // seconds
  duration: number; // seconds
}

interface WatchHistoryMeta {
  videoId: string;
  title?: string;
  url?: string;
  channel?: { name?: string; url?: string };
  watchedAt?: string;
  source: 'history' | 'watch_later' | 'manual';
}

interface TranscriptResult {
  meta: WatchHistoryMeta;
  transcript: Transcript | null;
  error?: string;  // Present when transcript is null
}

interface FetchOptions {
  languages?: string[];          // Default: ['en']
  timeout?: number;              // Default: 30000 (ms)
  includeAutoGenerated?: boolean; // Default: true
  proxy?: ProxyConfig;           // Optional proxy configuration
}

interface ProxyConfig {
  url: string;        // HTTP proxy URL (e.g., "http://user:pass@host:port")
}

interface BulkOptions extends FetchOptions {
  concurrency?: number;    // Default: 4
  pauseAfter?: number;     // Default: 10
  pauseDuration?: number;  // Default: 5000 (ms)
  skipIds?: Set<string>;   // Videos to skip
  onProgress?: (completed: number, total: number, result: TranscriptResult) => void;
}
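
Because `transcript` is `null` on failure, a type guard is a convenient way to split bulk results. The sketch below redeclares trimmed-down copies of the types so it stands alone; `hasTranscript` and `summarize` are hypothetical helpers, not exports of the package.

```typescript
// Trimmed-down copies of the types above, so the sketch is self-contained.
interface Transcript {
  videoId: string;
  text: string;
  language: string;
}

interface TranscriptResult {
  meta: { videoId: string };
  transcript: Transcript | null;
  error?: string;
}

// Type guard: narrows a result to one whose transcript is present.
function hasTranscript(
  r: TranscriptResult,
): r is TranscriptResult & { transcript: Transcript } {
  return r.transcript !== null;
}

function summarize(results: TranscriptResult[]) {
  const ok = results.filter(hasTranscript);
  const failed = results.filter((r) => !hasTranscript(r));
  // Count failures per error message to spot systematic problems such as rate limiting.
  const errorCounts = new Map<string, number>();
  for (const r of failed) {
    const key = r.error ?? 'unknown';
    errorCounts.set(key, (errorCounts.get(key) ?? 0) + 1);
  }
  return { ok, failed, errorCounts };
}
```

After filtering with the guard, `ok[0].transcript.text` type-checks without a null check.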

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT