MCP Headless YouTube Transcript
An MCP (Model Context Protocol) server that extracts YouTube video transcripts using the headless-youtube-captions library.
Features
- Extract transcripts from YouTube videos using video ID or full URL
- Support for multiple languages
- Automatic pagination for large transcripts (98k character chunks)
- Clean text output optimized for LLM consumption
- Built with TypeScript and the MCP SDK
Installation
Install via npm:
npm install -g mcp-headless-youtube-transcript
Or use directly with npx:
npx mcp-headless-youtube-transcript
MCP Configuration
Add this server to your MCP settings:
{
  "mcpServers": {
    "youtube-transcript": {
      "command": "npx",
      "args": ["-y", "mcp-headless-youtube-transcript"]
    }
  }
}
Tools Available
get_youtube_transcript
Extracts transcript/captions from a YouTube video with automatic pagination for large transcripts.
Parameters:
- videoId (required): YouTube video ID or full URL
- lang (optional): Language code for captions (e.g., "en", "es", "ko"). Defaults to "en"
- segment (optional): Segment number to retrieve (1-based). Each segment is ~98k characters. Defaults to 1
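For readers wiring this tool into their own client code, the parameters above could be modelled along these lines. This is only a sketch: the `TranscriptArgs` type and the `normalizeArgs` helper are hypothetical names that do not come from the package, and the validation rules are assumptions based on the defaults described above.

```typescript
// Hypothetical shape of the tool arguments, based on the parameter list above.
interface TranscriptArgs {
  videoId: string;   // video ID or full URL
  lang?: string;     // caption language code
  segment?: number;  // 1-based segment number
}

// Applies the documented defaults ("en" and segment 1) and rejects
// obviously invalid input. The error messages are illustrative only.
function normalizeArgs(args: TranscriptArgs): Required<TranscriptArgs> {
  const videoId = args.videoId.trim();
  if (videoId === "") {
    throw new Error("videoId is required");
  }
  const segment = args.segment ?? 1;
  if (!Number.isInteger(segment) || segment < 1) {
    throw new RangeError("segment must be a positive integer");
  }
  return { videoId, lang: args.lang ?? "en", segment };
}
```

Calling `normalizeArgs({ videoId: "dQw4w9WgXcQ" })` fills in `lang` and `segment` with the documented defaults.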
Examples:
Basic usage:
{
  "name": "get_youtube_transcript",
  "arguments": {
    "videoId": "dQw4w9WgXcQ"
  }
}
With language:
{
  "name": "get_youtube_transcript",
  "arguments": {
    "videoId": "dQw4w9WgXcQ",
    "lang": "es"
  }
}
With pagination:
{
  "name": "get_youtube_transcript",
  "arguments": {
    "videoId": "dQw4w9WgXcQ",
    "segment": 2
  }
}
Response Format
The tool returns the raw transcript text. For large transcripts, the response includes pagination information:
[Segment 1 of 3]
this is the actual transcript text content...
When multiple segments are available, you can retrieve subsequent segments by incrementing the segment parameter.
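The segmenting described above can be illustrated with a small sketch. It assumes fixed-size cuts at exactly 98,000 characters and a header that appears only when there is more than one segment; the real server may split on different boundaries. `SEGMENT_SIZE` and `getSegment` are hypothetical names, not part of the package.

```typescript
// Assumed segment size; the README only says "~98k characters".
const SEGMENT_SIZE = 98_000;

interface SegmentResult {
  text: string;
  segment: number;
  totalSegments: number;
}

// Returns the requested 1-based segment of a transcript. A
// "[Segment X of Y]" header is prepended only when Y is greater than 1.
function getSegment(
  transcript: string,
  segment: number = 1,
  size: number = SEGMENT_SIZE
): SegmentResult {
  const totalSegments = Math.max(1, Math.ceil(transcript.length / size));
  if (!Number.isInteger(segment) || segment < 1 || segment > totalSegments) {
    throw new RangeError(`segment must be between 1 and ${totalSegments}`);
  }
  const start = (segment - 1) * size;
  const body = transcript.slice(start, start + size);
  const text =
    totalSegments > 1
      ? `[Segment ${segment} of ${totalSegments}]\n${body}`
      : body;
  return { text, segment, totalSegments };
}
```

A client would typically read `totalSegments` from the first response and then request segments 2 through N in turn.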
Caching
The server includes built-in caching to improve performance for paginated requests. The cache behavior can be configured with an environment variable:
TRANSCRIPT_CACHE_TTL: Cache duration in seconds (default: 300 = 5 minutes)
Cache Features:
- Full transcripts are cached on first fetch
- Cache expiration time is updated on each read or write
- Expired entries are automatically cleaned up after each request
- Each video+language combination is cached separately
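The cache behavior listed above resembles a sliding-expiration cache. The following is a minimal sketch of that idea, not the server's actual implementation: `TranscriptCache` and `parseTtlSeconds` are hypothetical names, and the key format and the handling of invalid TTL values are assumptions.

```typescript
interface CacheEntry {
  transcript: string;
  expiresAt: number; // epoch milliseconds
}

class TranscriptCache {
  private entries = new Map<string, CacheEntry>();
  private ttlMs: number;

  constructor(ttlSeconds: number) {
    this.ttlMs = ttlSeconds * 1000;
  }

  // Each video+language combination gets its own entry.
  private key(videoId: string, lang: string): string {
    return `${videoId}:${lang}`;
  }

  set(videoId: string, lang: string, transcript: string, now: number = Date.now()): void {
    this.entries.set(this.key(videoId, lang), { transcript, expiresAt: now + this.ttlMs });
  }

  get(videoId: string, lang: string, now: number = Date.now()): string | undefined {
    const entry = this.entries.get(this.key(videoId, lang));
    if (entry === undefined || entry.expiresAt <= now) return undefined;
    entry.expiresAt = now + this.ttlMs; // a read pushes the expiry forward
    return entry.transcript;
  }

  // Intended to be called after each request to drop expired entries.
  cleanup(now: number = Date.now()): void {
    this.entries.forEach((entry, key) => {
      if (entry.expiresAt <= now) this.entries.delete(key);
    });
  }

  get size(): number {
    return this.entries.size;
  }
}

// Parses a TRANSCRIPT_CACHE_TTL-style value, falling back to 300 seconds
// when the value is missing, non-numeric, or not positive (an assumption).
function parseTtlSeconds(raw: string | undefined, fallback: number = 300): number {
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}
```

In a Node.js process the TTL would be read with something like `parseTtlSeconds(process.env.TRANSCRIPT_CACHE_TTL)`. Because paginated requests for the same video each refresh the expiry, a long transcript stays cached while a client is still paging through it.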
Setting Cache Duration:
# Set cache to 10 minutes
TRANSCRIPT_CACHE_TTL=600 npx mcp-headless-youtube-transcript
Supported URL Formats
- Video ID: dQw4w9WgXcQ
- YouTube URLs:
  - https://www.youtube.com/watch?v=dQw4w9WgXcQ
  - https://youtu.be/dQw4w9WgXcQ
  - https://www.youtube.com/embed/dQw4w9WgXcQ
  - https://www.youtube.com/v/dQw4w9WgXcQ
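To show how the formats above map to a single video ID, here is a rough sketch of a parser. It is not the package's own helper: `extractVideoId` is a hypothetical name, and the check that an ID is 11 characters from `A-Z`, `a-z`, `0-9`, `_` and `-` is an assumption based on how YouTube IDs currently look.

```typescript
// Assumed ID shape: 11 URL-safe characters.
const VIDEO_ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

// Returns the video ID for a bare ID or one of the listed URL formats,
// or null when the input matches none of them.
function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (VIDEO_ID_PATTERN.test(trimmed)) return trimmed;

  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return null; // not a bare ID and not a parseable URL
  }

  const host = url.hostname.replace(/^www\./, "");
  let candidate: string | null = null;
  if (host === "youtu.be") {
    // https://youtu.be/<id>
    candidate = url.pathname.split("/")[1] ?? null;
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      // https://www.youtube.com/watch?v=<id>
      candidate = url.searchParams.get("v");
    } else {
      // https://www.youtube.com/embed/<id> or /v/<id>
      const match = url.pathname.match(/^\/(?:embed|v)\/([^/]+)/);
      candidate = match?.[1] ?? null;
    }
  }
  return candidate !== null && VIDEO_ID_PATTERN.test(candidate) ? candidate : null;
}
```

Other hosts and paths (for example mobile or Shorts URLs) are deliberately not handled here, since the list above does not mention them.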
Development
# Install dependencies
npm install
# Run in development mode
npm run dev
# Build for production
npm run build
# Start the server
npm start
# Run tests
npm test
# Run tests once (CI mode)
npm run test:run
Testing
The project includes comprehensive tests:
- Unit tests: Test helper functions like URL parsing and time formatting
- Integration tests: Test the core transcript extraction logic with mocked APIs
- Manual tests: Optional tests that call real YouTube APIs (skipped by default)
All tests use Vitest and include mocking of the headless-youtube-captions library to ensure reliable testing without external API dependencies.
Dependencies
- @modelcontextprotocol/sdk: MCP SDK for building servers
- headless-youtube-captions: Library for extracting YouTube captions
License
MIT