Package Exports

  • @nadimtuhin/ytranscript

ytranscript

Fast YouTube transcript extraction with bulk processing, Google Takeout support, MCP server, and multiple output formats.

Built with Bun.

Features

  • Direct YouTube API - No third-party services, uses YouTube's innertube API
  • MCP Server - Use with Claude, Cursor, and other AI assistants via Model Context Protocol
  • Bulk processing - Process thousands of videos with concurrency control
  • Google Takeout support - Import from watch history JSON and watch-later CSV
  • Resume-safe - Automatically skips already-processed videos
  • Multiple output formats - JSON, JSONL, CSV, SRT, VTT, plain text
  • Language selection - Choose preferred transcript languages
  • Programmatic API - Use as a library in your TypeScript/JavaScript projects

Installation

# Install globally
bun install -g ytranscript

# Or use locally in a project
bun add ytranscript

CLI Usage

Fetch a single transcript

# Basic usage (outputs plain text)
ytranscript get dQw4w9WgXcQ

# From URL
ytranscript get "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# With specific language
ytranscript get dQw4w9WgXcQ --lang es

# Output as SRT subtitles
ytranscript get dQw4w9WgXcQ --format srt -o video.srt

# Output as JSON with timestamps
ytranscript get dQw4w9WgXcQ --format json
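`get` accepts either a bare video ID or a URL. As an illustration only, here is a minimal sketch of how such input might be normalized to the 11-character video ID. `extractVideoId` and `parseUrl` are hypothetical helpers, and the URL forms handled are assumptions rather than ytranscript's exact rules.

```typescript
// Hypothetical helper: normalizes a bare ID or a YouTube URL to a video ID.
// The URL forms handled here are assumptions, not ytranscript's exact rules.
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

function parseUrl(input: string): URL | null {
  try {
    return new URL(input);
  } catch {
    return null; // not a parseable absolute URL
  }
}

function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (VIDEO_ID.test(trimmed)) return trimmed; // already a bare ID

  const url = parseUrl(trimmed);
  if (!url) return null;

  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let candidate: string | null = null;
  if (host === "youtu.be") {
    // Short links: https://youtu.be/<id>
    candidate = url.pathname.split("/")[1] ?? null;
  } else if (host === "youtube.com" || host === "music.youtube.com") {
    if (url.pathname === "/watch") {
      // Standard links: https://www.youtube.com/watch?v=<id>
      candidate = url.searchParams.get("v");
    } else {
      // Path forms such as /shorts/<id>, /embed/<id> and /live/<id>
      const match = url.pathname.match(/^\/(?:shorts|embed|live)\/([^/]+)/);
      candidate = match?.[1] ?? null;
    }
  }
  // Only accept candidates that look like real IDs
  return candidate !== null && VIDEO_ID.test(candidate) ? candidate : null;
}

console.log(extractVideoId("https://www.youtube.com/watch?v=dQw4w9WgXcQ")); // dQw4w9WgXcQ
console.log(extractVideoId("https://youtu.be/jNQXAC9IVRw?t=42")); // jNQXAC9IVRw
```

Validating the final candidate against the ID pattern means a malformed link yields `null` instead of a bogus ID being sent to YouTube.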

Check available languages

ytranscript info dQw4w9WgXcQ

Bulk processing

# From Google Takeout exports
ytranscript bulk \
  --history "Takeout/YouTube and YouTube Music/history/watch-history.json" \
  --watch-later "Takeout/YouTube and YouTube Music/playlists/Watch later-videos.csv" \
  --out-jsonl transcripts.jsonl \
  --out-csv transcripts.csv

# From a list of video IDs
ytranscript bulk --videos "dQw4w9WgXcQ,jNQXAC9IVRw,9bZkp7q19f0"

# From a file (one ID or URL per line)
ytranscript bulk --file videos.txt

# Resume a previous run
ytranscript bulk --history watch-history.json --resume

# Control concurrency and rate limiting
ytranscript bulk \
  --history watch-history.json \
  --concurrency 8 \
  --pause-after 20 \
  --pause-ms 3000
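The `--concurrency`, `--pause-after` and `--pause-ms` flags suggest a bounded worker pool with periodic pauses. The sketch below shows one way such a scheme could work. It is not ytranscript's implementation, and `runPool` and `PoolOptions` are hypothetical names.

```typescript
// Hypothetical worker pool: at most `concurrency` tasks run at once, and after
// every `pauseAfter` completions no new task starts for `pauseMs` milliseconds.
interface PoolOptions {
  concurrency: number;
  pauseAfter: number; // 0 disables pausing
  pauseMs: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runPool<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,
  opts: PoolOptions,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next item to start
  let completed = 0; // number of finished items
  let pauseUntil = 0; // epoch ms before which no new task may start

  async function lane(): Promise<void> {
    while (true) {
      // Honour a pause triggered by this or any other lane
      while (pauseUntil > Date.now()) await sleep(pauseUntil - Date.now());
      if (next >= items.length) return;
      const index = next++;
      results[index] = await worker(items[index] as T);
      completed++;
      if (opts.pauseAfter > 0 && completed % opts.pauseAfter === 0) {
        pauseUntil = Date.now() + opts.pauseMs;
      }
    }
  }

  // Start up to `concurrency` lanes that pull items from a shared cursor
  const laneCount = Math.max(1, Math.min(opts.concurrency, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

runPool(["a", "b", "c"], async (id) => id.toUpperCase(), {
  concurrency: 2,
  pauseAfter: 2,
  pauseMs: 100,
}).then((out) => console.log(out.join(","))); // A,B,C
```

Results keep input order because each lane writes to the index it claimed. In this sketch a throwing worker rejects the whole pool; the bulk API shown in this README instead reports per-video failures through `transcript: null` and `error`.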

Programmatic API

Fetch a single transcript

import { fetchTranscript } from 'ytranscript';

const transcript = await fetchTranscript('dQw4w9WgXcQ', {
  languages: ['en', 'es'], // Preference order
  includeAutoGenerated: true,
});

console.log(transcript.text); // Full transcript text
console.log(transcript.segments); // Array of { text, start, duration }
console.log(transcript.language); // Language actually returned, e.g. 'en'
console.log(transcript.isAutoGenerated); // true/false
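The `languages` array is a preference order and `includeAutoGenerated` controls whether machine captions count. A minimal sketch of that selection rule, assuming a hypothetical `CaptionTrack` listing and a `pickTrack` helper (not part of the package), and assuming that a manual track beats an auto-generated one in the same language:

```typescript
// Hypothetical caption-track listing; ytranscript's internal types may differ.
interface CaptionTrack {
  languageCode: string;
  isAutoGenerated: boolean;
}

// Walk the preference list in order; within a language, prefer a manual track.
function pickTrack(
  tracks: CaptionTrack[],
  languages: string[],
  includeAutoGenerated: boolean,
): CaptionTrack | null {
  const usable = tracks.filter((t) => includeAutoGenerated || !t.isAutoGenerated);
  for (const lang of languages) {
    const match =
      usable.find((t) => t.languageCode === lang && !t.isAutoGenerated) ??
      usable.find((t) => t.languageCode === lang);
    if (match) return match;
  }
  return null; // no preferred language available (fallback policy is an assumption)
}

const tracks: CaptionTrack[] = [
  { languageCode: "es", isAutoGenerated: false },
  { languageCode: "en", isAutoGenerated: true },
];
console.log(pickTrack(tracks, ["en", "es"], true)?.languageCode); // en
console.log(pickTrack(tracks, ["en", "es"], false)?.languageCode); // es
```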

Bulk processing

import {
  loadWatchHistory,
  loadWatchLater,
  mergeVideoSources,
  processVideos,
} from 'ytranscript';

// Load from Google Takeout
const history = await loadWatchHistory('./watch-history.json');
const watchLater = await loadWatchLater('./watch-later.csv');

// Merge and deduplicate
const videos = mergeVideoSources(history, watchLater);

// Process with progress callback
const results = await processVideos(videos, {
  concurrency: 4,
  pauseAfter: 10,
  pauseDuration: 5000,
  onProgress: (completed, total, result) => {
    const status = result.transcript ? 'OK' : 'FAIL';
    console.log(`[${completed}/${total}] ${result.meta.videoId}: ${status}`);
  },
});

// Filter successful results
const transcripts = results.filter((r) => r.transcript);
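`mergeVideoSources` is described as merging and deduplicating. One plausible strategy is first-occurrence-wins keyed on video ID, followed by dropping IDs a previous run already handled (compare the `skipIds` option of `BulkOptions`). The sketch uses hypothetical `VideoEntry`, `mergeEntries` and `pending` names and does not claim to match the package's behaviour.

```typescript
// Hypothetical minimal entry shape; the package's WatchHistoryMeta may hold more fields.
interface VideoEntry {
  videoId: string;
  source: "history" | "watch-later";
}

// Merge several lists, keeping the first entry seen for each video ID.
function mergeEntries(...lists: VideoEntry[][]): VideoEntry[] {
  const byId = new Map<string, VideoEntry>();
  for (const list of lists) {
    for (const entry of list) {
      if (!byId.has(entry.videoId)) byId.set(entry.videoId, entry);
    }
  }
  return Array.from(byId.values()); // Map iteration follows insertion order
}

// Drop IDs that a previous run already handled.
function pending(entries: VideoEntry[], skipIds: Set<string>): VideoEntry[] {
  return entries.filter((e) => !skipIds.has(e.videoId));
}
```

Keying a `Map` on the ID keeps the merge linear in the number of entries while preserving the order of the first source, so watch history keeps priority over Watch Later.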

Streaming for large datasets

import { streamVideos, appendJsonl } from 'ytranscript';

// `videos` is the merged list from the bulk processing example above
for await (const result of streamVideos(videos, { concurrency: 4 })) {
  // Write each result immediately (resume-safe)
  await appendJsonl(result, 'output.jsonl');
}
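Appending one JSON object per line is what makes streaming output resume-safe: every completed video is a self-contained line, and a restart only needs to read back the IDs already written. A sketch of both halves, working on strings rather than files so it stays self-contained; `ResultRecord`, `toJsonlLine` and `writtenIds` are hypothetical names, not the package's `appendJsonl` internals.

```typescript
// Hypothetical minimal result shape; mirrors TranscriptResult loosely.
interface ResultRecord {
  meta: { videoId: string };
  transcript: unknown;
  error?: string;
}

// One record per line; JSON.stringify without indentation escapes any newlines
// inside strings, so each record occupies exactly one line.
function toJsonlLine(record: ResultRecord): string {
  return JSON.stringify(record) + "\n";
}

// Rebuild the set of already-written video IDs from existing JSONL text.
// Failed attempts are included too; whether to retry them is a policy choice.
function writtenIds(jsonl: string): Set<string> {
  const ids = new Set<string>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const record = JSON.parse(line) as Partial<ResultRecord> | null;
      const id = record?.meta?.videoId;
      if (typeof id === "string") ids.add(id);
    } catch {
      // A crash mid-write can leave a truncated last line; skipping it means
      // that video is simply processed again on the next run.
    }
  }
  return ids;
}
```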

Output formatting

import { fetchTranscript, formatSrt, formatVtt, formatText } from 'ytranscript';

const transcript = await fetchTranscript('dQw4w9WgXcQ');

// SRT subtitles
const srt = formatSrt(transcript);
await Bun.write('video.srt', srt);

// VTT subtitles
const vtt = formatVtt(transcript);
await Bun.write('video.vtt', vtt);

// Plain text with timestamps
const text = formatText(transcript, true);
// [0:00] First line of transcript
// [0:05] Second line...
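Subtitle formats are mostly about timestamps. SRT cues are numbered and timed as `HH:MM:SS,mmm --> HH:MM:SS,mmm` (WebVTT uses a dot before the milliseconds and a `WEBVTT` header). The sketch below derives cue times from `start` and `duration` as defined by `TranscriptSegment`; `toSrt`, `srtTime` and `textTime` are hypothetical helpers, and the real `formatSrt`/`formatText` output may differ in details such as line wrapping or hour display.

```typescript
interface Segment {
  text: string;
  start: number; // seconds
  duration: number; // seconds
}

// SRT timestamp: HH:MM:SS,mmm (comma before the milliseconds)
function srtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Numbered cues separated by blank lines; each cue ends at start + duration
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.start + seg.duration)}\n${seg.text}\n`)
    .join("\n");
}

// "[m:ss]" prefix in the style of the plain-text example above
// (minutes are not rolled over into hours in this sketch)
function textTime(seconds: number): string {
  const total = Math.floor(seconds);
  return `[${Math.floor(total / 60)}:${String(total % 60).padStart(2, "0")}]`;
}

console.log(srtTime(3661.5)); // 01:01:01,500
console.log(textTime(5)); // [0:05]
```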

Google Takeout

To export your YouTube data:

  1. Go to Google Takeout (https://takeout.google.com)
  2. Deselect all, then select only "YouTube and YouTube Music"
  3. Click "All YouTube data included" and select:
    • History → Watch history
    • Playlists (includes Watch Later)
  4. Export and download
  5. Extract the archive

The relevant files are:

  • Takeout/YouTube and YouTube Music/history/watch-history.json
  • Takeout/YouTube and YouTube Music/playlists/Watch later-videos.csv
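To show what the loader has to work with, here is a sketch that pulls unique video IDs out of `watch-history.json`. It assumes each entry is an object whose `titleUrl` holds a `watch?v=` link (verify against your own export); such URLs may contain `=` escaped as `\u003d`, which `JSON.parse` decodes. `TakeoutEntry` and `idsFromWatchHistory` are hypothetical names, not `loadWatchHistory` itself.

```typescript
// Assumed entry shape of Takeout's watch-history.json (check your own export).
interface TakeoutEntry {
  title?: string;
  titleUrl?: string;
  time?: string;
}

// Collect unique video IDs in file order from the raw JSON text.
function idsFromWatchHistory(json: string): string[] {
  const entries = JSON.parse(json) as TakeoutEntry[];
  const ids = new Set<string>();
  for (const entry of entries) {
    if (!entry.titleUrl) continue; // entries without a video link are skipped
    try {
      const id = new URL(entry.titleUrl).searchParams.get("v");
      if (id) ids.add(id);
    } catch {
      // Malformed URL: skip the entry
    }
  }
  return Array.from(ids);
}
```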

API Reference

Types

interface Transcript {
  videoId: string;
  text: string;
  segments: TranscriptSegment[];
  language: string;
  isAutoGenerated: boolean;
}

interface TranscriptSegment {
  text: string;
  start: number;  // seconds
  duration: number;  // seconds
}

interface TranscriptResult {
  meta: WatchHistoryMeta;
  transcript: Transcript | null;
  error?: string;
}

interface FetchOptions {
  languages?: string[];
  timeout?: number;
  includeAutoGenerated?: boolean;
}

interface BulkOptions extends FetchOptions {
  concurrency?: number;
  pauseAfter?: number;
  pauseDuration?: number;
  skipIds?: Set<string>;
  onProgress?: (completed: number, total: number, result: TranscriptResult) => void;
}
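Because `TranscriptResult.transcript` is `Transcript | null`, a type predicate lets TypeScript narrow successful results after filtering, so no further null checks are needed. The types are repeated so the sketch compiles on its own. `WatchHistoryMeta` is not spelled out above, so its one-field stand-in is an assumption, and `isSuccess` and `summarize` are hypothetical helpers.

```typescript
interface TranscriptSegment {
  text: string;
  start: number; // seconds
  duration: number; // seconds
}

interface Transcript {
  videoId: string;
  text: string;
  segments: TranscriptSegment[];
  language: string;
  isAutoGenerated: boolean;
}

// Assumed minimal stand-in; the real WatchHistoryMeta likely has more fields.
interface WatchHistoryMeta {
  videoId: string;
}

interface TranscriptResult {
  meta: WatchHistoryMeta;
  transcript: Transcript | null;
  error?: string;
}

type SuccessfulResult = TranscriptResult & { transcript: Transcript };

// Type predicate: after filtering, TypeScript knows `transcript` is non-null.
function isSuccess(r: TranscriptResult): r is SuccessfulResult {
  return r.transcript !== null;
}

function summarize(results: TranscriptResult[]) {
  const ok = results.filter(isSuccess);
  const failed = results.filter((r) => !isSuccess(r));
  return {
    ok: ok.length,
    failed: failed.map((r) => `${r.meta.videoId}: ${r.error ?? "unknown error"}`),
    // No null check needed here thanks to the predicate
    totalWords: ok.reduce((n, r) => n + r.transcript.text.split(/\s+/).filter(Boolean).length, 0),
  };
}
```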

License

MIT


MCP Server (Model Context Protocol)

ytranscript includes an MCP server that allows AI assistants like Claude to fetch YouTube transcripts directly.

Available Tools

  • get_transcript - Fetch the transcript for a YouTube video, with format options (text, segments, srt, vtt)
  • get_transcript_languages - List the caption languages available for a video
  • extract_video_id - Extract the video ID from various YouTube URL formats
  • get_transcripts_bulk - Fetch transcripts for multiple videos at once
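To show what an assistant sends when it uses one of these tools, here is the shape of an MCP `tools/call` request, which the Model Context Protocol frames as JSON-RPC 2.0. The tool name comes from the list above; the argument names `video` and `format` are hypothetical, since the real ones are published in each tool's `inputSchema` through a `tools/list` request. `buildToolCall` is likewise a hypothetical helper.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "get_transcript", {
  video: "https://youtube.com/watch?v=dQw4w9WgXcQ", // hypothetical argument name
  format: "srt", // hypothetical argument name; "srt" is one of the listed formats
});

// Over the stdio transport each message is one line of JSON, so no embedded newlines
console.log(JSON.stringify(request));
```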

Setup with Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "ytranscript": {
      "command": "npx",
      "args": ["-y", "ytranscript-mcp"]
    }
  }
}

Or if installed globally:

{
  "mcpServers": {
    "ytranscript": {
      "command": "ytranscript-mcp"
    }
  }
}

Setup with Cursor

Add to your Cursor MCP settings:

{
  "mcpServers": {
    "ytranscript": {
      "command": "npx",
      "args": ["-y", "ytranscript-mcp"]
    }
  }
}

Example Usage in Claude

Once configured, you can ask Claude:

  • "Get the transcript for this YouTube video: https://youtube.com/watch?v=dQw4w9WgXcQ"
  • "What languages are available for this video?"
  • "Summarize the transcript of this video"
  • "Get transcripts for these 5 videos and compare their content"

Running the MCP Server Manually

# Via npx
npx ytranscript-mcp

# Or if installed globally
ytranscript-mcp

# For development
bun run dev:mcp