@ignidor/web-search-mcp

Local, unlimited web-search MCP server with BM25 ranking, Playwright crawling, and YouTube transcripts.

🔍 No API keys - Uses free DuckDuckGo HTML search
🚀 No rate limits - Unlimited searches, 24/7
🐳 No Docker - Direct Playwright integration (optional)
📊 Smart ranking - BM25 + hybrid scoring with freshness
📄 Full extraction - 1000+ words per page (not 200-word snippets)
🎬 YouTube transcripts - Fast, robust extraction with yt-dlp
💰 100% Free - Positioned as an alternative to Brave Search, Tavily, and other commercial search APIs

Features

Tool	Description
`search`	Fast web search with BM25 ranking (DuckDuckGo)
`crawl_and_extract`	Extract full content from URLs using Playwright
`search_and_crawl`	Search + extract top results (one-stop research)
`get_youtube_transcript`	Get YouTube video transcript (yt-dlp, 1-5sec) ⭐ NEW
`capture_screenshot`	Screenshot any webpage (base64 PNG)
`generate_pdf`	Convert webpage to PDF (base64)
`extract_structured`	CSS selector-based data extraction
`execute_js`	Run custom JavaScript on webpages
`extract_regex`	Extract emails, phones, URLs, dates (21 patterns)

Quick Start

Installation (via npx)

npx @ignidor/web-search-mcp

Claude Desktop / Cursor / Windsurf Config

For npx usage (recommended):

{
  "mcpServers": {
    "web-search": {
      "command": "npx",
      "args": ["-y", "@ignidor/web-search-mcp"]
    }
  }
}

For local/SSH usage:

{
  "mcpServers": {
    "web-search": {
      "command": "node",
      "args": ["/path/to/dist/index.js"]
    }
  }
}

Tool Examples
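
Each example in this section shows only a tool name and its arguments. On the wire, an MCP client wraps that pair in a JSON-RPC 2.0 request with the `tools/call` method. The sketch below illustrates the wrapping; the `ToolCall` and `JsonRpcRequest` types and the `toToolsCallRequest` helper are hypothetical names introduced here, not part of this package.

```typescript
// Shape of the examples in this section: a tool name plus its arguments.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// JSON-RPC 2.0 envelope used by MCP for tool invocations.
interface JsonRpcRequest {
  jsonrpc: string;
  id: number;
  method: string;
  params: ToolCall;
}

// Wrap a tool call in a "tools/call" request.
// `id` is any request identifier chosen by the client.
function toToolsCallRequest(id: number, call: ToolCall): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: call };
}

const searchRequest = toToolsCallRequest(1, {
  name: "search",
  arguments: { query: "Rust programming language tutorial", limit: 10, rankingMode: "hybrid" },
});

console.log(JSON.stringify(searchRequest, null, 2));
```

Clients such as Claude Desktop build this envelope themselves, so it only matters when driving the server from a custom script.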

1. Search with BM25 Ranking

// Search for anything - unlimited queries, no API key
{
  "name": "search",
  "arguments": {
    "query": "Rust programming language tutorial",
    "limit": 10,
    "rankingMode": "hybrid"  // 'bm25' or 'hybrid'
  }
}
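
For readers unfamiliar with BM25, the following is a minimal sketch of Okapi BM25 scoring over a handful of result snippets. It is not the code this package runs (the README says scoring is delegated to the fast-bm25 package); the tokenizer, the function names, and the defaults `k1 = 1.2` and `b = 0.75` are assumptions chosen for illustration.

```typescript
// Lowercase and split on non-alphanumeric characters (a deliberately naive tokenizer).
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Score each document against the query with Okapi BM25.
// k1 and b are conventional defaults, assumed here.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const n = tokenized.length;
  if (n === 0) return [];
  // Average document length; fall back to 1 if every document is empty.
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / n || 1;
  // De-duplicate query terms so each contributes once.
  const queryTerms = tokenize(query).filter((t, i, all) => all.indexOf(t) === i);

  return tokenized.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      // Document frequency: how many documents contain the term.
      const df = tokenized.filter((d) => d.indexOf(term) !== -1).length;
      // Smoothed IDF; the "1 +" keeps it positive even for very common terms.
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
      const lengthNorm = 1 - b + b * (doc.length / avgLen);
      score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
    }
    return score;
  });
}

const docs = [
  "Rust programming language tutorial for beginners",
  "Cooking pasta at home",
  "A tutorial on the Rust borrow checker",
];
const scores = bm25Scores("rust tutorial", docs);
console.log(scores);
```

In this sketch a document that contains none of the query terms scores exactly 0, and of two documents with the same term counts the shorter one ranks higher because of the length normalization.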

2. Search + Extract Full Content

// Best for deep research - gets full articles, not snippets
{
  "name": "search_and_crawl",
  "arguments": {
    "query": "AWS DynamoDB batchWrite bug fix",
    "extractTopN": 5,
    "rerankAfterExtract": true
  }
}

Result: 8,000+ words of detailed content including:

Root cause analysis
Step-by-step fixes
Complete code examples
Common pitfalls

3. Get YouTube Transcript ⭐ NEW

// Fast, reliable transcript extraction (1-5 seconds)
{
  "name": "get_youtube_transcript",
  "arguments": {
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "language": "en",
    "includeTimestamps": false,
    "includeMetadata": true
  }
}

Features:

Works with any video length (1 min or 10 hours - same speed!)
Fetches existing captions (no audio processing)
Multiple language support (en, es, fr, de, ja, ko, etc.)
Optional timestamps: [00:15] Text here
Metadata: title, duration, word count
Uses yt-dlp (gold standard, 85k+ GitHub stars)
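
The timestamp format listed above (`[00:15] Text here`) can be illustrated with a small formatter. This is a sketch only: `formatTimestamp` and `formatLine` are hypothetical helpers, and how the server renders videos longer than an hour is not stated in this README, so the sketch simply lets the minutes field grow.

```typescript
// Render a start offset in seconds as "[MM:SS]".
function formatTimestamp(totalSeconds: number): string {
  const whole = Math.max(0, Math.floor(totalSeconds));
  const minutes = Math.floor(whole / 60);
  const seconds = whole % 60;
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return "[" + pad(minutes) + ":" + pad(seconds) + "]";
}

// Prefix a caption line with its timestamp.
function formatLine(startSeconds: number, text: string): string {
  return formatTimestamp(startSeconds) + " " + text;
}

console.log(formatLine(15, "Text here")); // prints "[00:15] Text here"
```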

Requirements:

Install yt-dlp: brew install yt-dlp (macOS) or pip install yt-dlp

Supported URL formats:

Full URL: https://www.youtube.com/watch?v=VIDEO_ID
Short URL: https://youtu.be/VIDEO_ID
Shorts: https://www.youtube.com/shorts/VIDEO_ID
Video ID only: VIDEO_ID
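
The four input formats above all reduce to a single video ID. The following sketch shows one way that normalization could be done with regular expressions. `extractVideoId` is a hypothetical helper, and the 11-character ID shape reflects how YouTube IDs look today rather than anything this README specifies.

```typescript
// Video IDs are assumed to be 11 characters of letters, digits, "-" and "_".
const VIDEO_ID = "[A-Za-z0-9_-]{11}";

// One pattern per supported input format, tried in order.
const URL_PATTERNS: RegExp[] = [
  // Full URL: https://www.youtube.com/watch?v=VIDEO_ID
  new RegExp("^(?:https?://)?(?:www\\.)?youtube\\.com/watch\\?(?:.*&)?v=(" + VIDEO_ID + ")(?:[&#].*)?$"),
  // Short URL: https://youtu.be/VIDEO_ID
  new RegExp("^(?:https?://)?youtu\\.be/(" + VIDEO_ID + ")(?:[?#].*)?$"),
  // Shorts: https://www.youtube.com/shorts/VIDEO_ID
  new RegExp("^(?:https?://)?(?:www\\.)?youtube\\.com/shorts/(" + VIDEO_ID + ")(?:[?#/].*)?$"),
  // Bare video ID
  new RegExp("^(" + VIDEO_ID + ")$"),
];

// Return the video ID, or null when the input matches none of the formats.
function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  for (const pattern of URL_PATTERNS) {
    const match = pattern.exec(trimmed);
    if (match) return match[1];
  }
  return null;
}

console.log(extractVideoId("https://youtu.be/dQw4w9WgXcQ"));
```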

4. Extract Structured Data

// Scrape product listings, articles, etc.
{
  "name": "extract_structured",
  "arguments": {
    "url": "https://example.com/products",
    "baseSelector": ".product",
    "fields": [
      { "name": "title", "selector": "h2", "type": "text" },
      { "name": "price", "selector": ".price", "type": "text" },
      { "name": "link", "selector": "a", "type": "attribute", "attribute": "href" }
    ]
  }
}

5. Execute JavaScript

// Great for dynamic content, debugging
{
  "name": "execute_js",
  "arguments": {
    "url": "https://example.com",
    "scripts": [
      "return document.title",
      "return document.links.length",
      "return document.URL"
    ]
  }
}

6. Screenshot

{
  "name": "capture_screenshot",
  "arguments": {
    "url": "https://example.com",
    "waitFor": 2  // seconds
  }
}

7. Regex Extraction

// Extract emails, phones, URLs, etc.
{
  "name": "extract_regex",
  "arguments": {
    "url": "https://example.com/contact",
    "patterns": ["email", "phone_intl", "url"]
  }
}

21 built-in patterns: email, phone_intl, phone_us, url, ipv4, ipv6, uuid, currency, percentage, number, date_iso, date_us, time_24h, postal_us, postal_uk, hex_color, twitter_handle, hashtag, mac_addr, iban, credit_card. The keyword all is also accepted.
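
To make the idea of named patterns concrete, here is a minimal sketch of a pattern registry applied to plain text. The regular expressions are illustrative approximations written for this example; the package's actual expressions are not shown in this README and may be stricter or looser. `PATTERNS` and `extractPatterns` are hypothetical names.

```typescript
// Illustrative expressions for four of the pattern names; assumptions, not the package's own.
const PATTERNS: Record<string, RegExp> = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  ipv4: /\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b/g,
  uuid: /\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b/g,
  hex_color: /#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/g,
};

// Collect every match for each requested pattern name.
function extractPatterns(text: string, names: string[]): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const name of names) {
    const pattern = PATTERNS[name];
    if (!pattern) continue; // unknown names are skipped in this sketch
    const matches = text.match(pattern);
    result[name] = matches ? matches.slice() : [];
  }
  return result;
}

const sampleText = "Contact ops@example.com or admin@example.org from 192.168.0.1, theme #ff8800";
const found = extractPatterns(sampleText, ["email", "ipv4", "hex_color", "uuid"]);
console.log(found);
```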

Playwright Setup (Optional but Recommended)

For full functionality (crawling, screenshots, PDFs, JS execution), install Playwright browsers:

npx playwright install chromium

Without Playwright: Only the `search` tool works (DuckDuckGo results only).

With Playwright: All Playwright-based tools (crawling, screenshots, PDFs, structured and regex extraction, JS execution) work with full content extraction.

Why This Over Brave Search?

Feature	Brave Free	This MCP
Cost	Free tier only	100% Free
Rate Limits	2,000 requests/month	Unlimited
Content Depth	~200 words snippet	1,000+ words
Ranking	Black-box	Transparent BM25
Infrastructure	Cloud API	Local control
API Key	Required	Not needed

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Claude Desktop / Cursor                      │
└───────────────────────────────┬─────────────────────────────────┘
                                │ MCP (stdio)
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                   @ignidor/web-search-mcp                       │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Tool Router                                              │  │
│  │  • search              → DuckDuckGo + BM25 ranking         │  │
│  │  • crawl_and_extract   → Playwright → Markdown            │  │
│  │  • search_and_crawl     → Combined (search + extract)     │  │
│  │  • capture_screenshot  → Playwright → base64 PNG          │  │
│  │  • generate_pdf        → Playwright → base64 PDF          │  │
│  │  • extract_structured  → Playwright → CSS extraction      │  │
│  │  • execute_js          → Playwright → JS results          │  │
│  │  • extract_regex       → Playwright → 21 patterns         │  │
│  └───────────────────────────┬───────────────────────────────┘  │
│                              │                                  │
│  ┌───────────────────────────▼───────────────────────────────┐  │
│  │              Ranking Engine (BM25 + Hybrid)                │  │
│  │  • fast-bm25 package for scoring                           │  │
│  │  • Freshness scoring (exponential decay)                   │  │
│  │  • Domain authority heuristics                             │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                  Playwright (optional)                         │
│  • Chromium browser for dynamic content                         │
│  • Screenshot, PDF generation                                    │
│  • JavaScript execution                                          │
└─────────────────────────────────────────────────────────────────┘
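
The ranking engine in the diagram combines BM25 with freshness (exponential decay) and domain authority heuristics. The sketch below shows one plausible way to blend those signals. The weights, the 30-day half-life, the `Candidate` fields, and the function names are all assumptions made for illustration; the README does not document the actual formula.

```typescript
// Freshness in (0, 1]: 1 for a page published today, 0.5 after one half-life.
function freshness(ageDays: number, halfLifeDays = 30): number {
  return Math.pow(0.5, Math.max(0, ageDays) / halfLifeDays);
}

interface Candidate {
  url: string;
  bm25: number;      // raw BM25 score for the query
  ageDays: number;   // days since publication
  authority: number; // heuristic domain score in [0, 1] (assumed scale)
}

interface Ranked {
  url: string;
  score: number;
}

// Weighted blend of normalized BM25, freshness, and authority (weights are assumptions).
function hybridRank(
  candidates: Candidate[],
  weights = { bm25: 0.6, freshness: 0.25, authority: 0.15 }
): Ranked[] {
  // Normalize BM25 by the best score so all three signals share a 0..1 scale.
  const maxBm25 = candidates.reduce((max, c) => Math.max(max, c.bm25), 0) || 1;
  return candidates
    .map((c) => ({
      url: c.url,
      score:
        weights.bm25 * (c.bm25 / maxBm25) +
        weights.freshness * freshness(c.ageDays) +
        weights.authority * c.authority,
    }))
    .sort((a, b) => b.score - a.score);
}

const ranked = hybridRank([
  { url: "https://example.com/old-but-relevant", bm25: 2.0, ageDays: 720, authority: 0.5 },
  { url: "https://example.com/recent", bm25: 1.6, ageDays: 3, authority: 0.5 },
]);
console.log(ranked);
```

With these assumed weights, the recent page overtakes the older one despite a lower BM25 score, which is the trade-off a hybrid mode makes relative to plain `bm25` ranking.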

Development

# Clone repo
git clone https://github.com/JayaBigDataIsCool/ignidor-web-search-mcp.git
cd ignidor-web-search-mcp

# Install dependencies
npm install

# Install Playwright (optional but recommended)
npx playwright install chromium

# Build
npm run build

# Run locally
npm start
