JSPM – webpeel@0.13.3

Package Exports

webpeel

Readme

Reliable web access for AI agents.
Fetch any page · Extract structured data · Crawl entire sites · Deep research — one tool, three interfaces.

Website · Docs · Playground · Dashboard · Discussions

What is WebPeel?

WebPeel gives your AI agent reliable access to the web. Fetch any page, extract structured data, crawl entire sites, and research topics — all through a single CLI, API, or MCP server.

It automatically handles the hard parts: JavaScript rendering, bot detection, Cloudflare challenges, infinite scroll, pagination, and content noise. Your agent gets clean markdown. You don't think about the plumbing.

🚀 Quick Start

Three paths in, all free to try:

CLI

npx webpeel "https://news.ycombinator.com"

No install needed. First 25 fetches work without signup. Get 500/week free →

MCP Server (for Claude, Cursor, VS Code, Windsurf)

Add to your MCP config (~/Library/Application Support/Claude/claude_desktop_config.json on Mac):

{
  "mcpServers": {
    "webpeel": {
      "command": "npx",
      "args": ["-y", "webpeel", "mcp"]
    }
  }
}

REST API

curl "https://api.webpeel.dev/v1/fetch?url=https://example.com" \
  -H "Authorization: Bearer wp_YOUR_KEY"

✨ Features

Core

Feature	Description
Web Fetching	Any URL → clean markdown, text, HTML, or JSON
Smart Escalation	Auto-upgrades: HTTP → Browser → Stealth. Uses the fastest method, escalates only when needed
Content Pruning	2-pass HTML reduction — strips nav/footer/sidebar/ads automatically
Token Budget	Hard-cap output to N tokens. No surprises in your LLM bill
Screenshots	Full-page or viewport screenshots with a single flag
Batch Mode	Process multiple URLs concurrently

AI Agent

Feature	Description
MCP Server	13 tools for Claude Desktop, Cursor, VS Code, and Windsurf
Deep Research	Multi-hop agent: search → fetch → analyze → follow leads → synthesize
Search	Web search across 27+ structured sources
Hotel Search	Kayak, Booking.com, Google Travel, Expedia — in parallel
Browser Profiles	Persistent sessions for sites that require login
Infinite Scroll	Auto-scrolls lazy-loaded feeds until stable
Actions	Click, type, fill, select, hover, press, scroll — full browser automation

Extraction

Feature	Description
CSS Schema Extraction	7 built-in schemas (Amazon, Booking.com, eBay, Expedia, Hacker News, Walmart, Yelp) — auto-detected by domain
JSON Schema Extraction	Pass any JSON Schema and get back typed, structured data
LLM Extraction (BYOK)	Natural language → structured data using your own OpenAI-compatible key
BM25 Filtering	Query-focused content: only the parts relevant to your question
Links / Images / Meta	Extract just the links, images, or metadata from any page

Anti-Bot

Feature	Description
Stealth Mode	Bypasses Cloudflare, PerimeterX, DataDome, Akamai, and more
28 Auto-Stealth Domains	Amazon, LinkedIn, Glassdoor, Zillow, and 24 more — stealth kicks in automatically
Challenge Detection	7 bot-protection vendors detected and handled automatically
Browser Fingerprinting	Masks WebGL, navigator properties, canvas fingerprint

Advanced

Feature	Description
Crawl + Sitemap	BFS/DFS crawling, sitemap discovery, robots.txt compliance, deduplication
Site Map	Map all URLs on a domain up to any depth
Pagination	Follow "Next" links automatically for N pages
Chunking	Split long content into LLM-sized pieces (fixed, semantic, or paragraph)
Caching	Local result cache with configurable TTL (`5m`, `1h`, `1d`)
Geo-targeting	ISO country code + language preferences per request
Change Tracking	Detect what changed between two fetches of the same page
Brand Extraction	Pull logo, colors, fonts, and social links from any site
PDF Extraction	Extract text from PDF documents
Self-Hostable	Docker Compose for full on-premise deployment
Python SDK	Sync + async client, `pip install webpeel`

🤖 MCP Integration

WebPeel exposes 13 tools to your AI coding assistant:

Tool	What it does
`webpeel_fetch`	Fetch any URL → markdown (smart escalation built in)
`webpeel_search`	Web search with structured results
`webpeel_batch`	Fetch multiple URLs concurrently
`webpeel_crawl`	Crawl a site with depth/page limits
`webpeel_map`	Discover all URLs on a domain
`webpeel_extract`	Structured extraction (CSS, JSON Schema, or LLM)
`webpeel_screenshot`	Screenshot any page
`webpeel_research`	Deep multi-hop research on a topic
`webpeel_summarize`	AI summary of any URL
`webpeel_answer`	Ask a question about a URL's content
`webpeel_change_track`	Detect changes between two fetches
`webpeel_brand`	Extract branding assets from a site
`webpeel_deep_fetch`	Search + batch fetch + merge — comprehensive research in one call, no LLM key needed

Setup for each editor

Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "webpeel": { "command": "npx", "args": ["-y", "webpeel", "mcp"] }
  }
}

Cursor (Settings → MCP Servers):

{
  "mcpServers": {
    "webpeel": { "command": "npx", "args": ["-y", "webpeel", "mcp"] }
  }
}

VS Code (~/.vscode/mcp.json):

{
  "servers": {
    "webpeel": { "command": "npx", "args": ["-y", "webpeel", "mcp"] }
  }
}

Windsurf (~/.codeium/windsurf/mcp_config.json):

{
  "mcpServers": {
    "webpeel": { "command": "npx", "args": ["-y", "webpeel", "mcp"] }
  }
}

Docker (stdio):

{
  "mcpServers": {
    "webpeel": { "command": "docker", "args": ["run", "-i", "--rm", "webpeel/mcp"] }
  }
}

🔬 Deep Research

Multi-hop research that thinks like a researcher, not a search engine:

# Sources only — no API key needed
npx webpeel research "best practices for rate limiting APIs" --max-sources 8

# Full synthesis with LLM (BYOK)
npx webpeel research "compare Firecrawl vs Crawl4AI vs WebPeel" --llm-key sk-...

How it works: Search → fetch top results → extract key passages (BM25) → follow the most relevant links → synthesize across sources. No circular references, no duplicate content.

📦 Extraction

Three ways to get structured data out of any page:

CSS Schema (zero config, auto-detected)

# Auto-detects Amazon and applies the built-in schema
npx webpeel "https://www.amazon.com/s?k=mechanical+keyboard" --json

# Force a specific schema
npx webpeel "https://www.booking.com/searchresults.html?city=Paris" --schema booking --json

# List all built-in schemas
npx webpeel --list-schemas

Built-in schemas: amazon · booking · ebay · expedia · hackernews · walmart · yelp

JSON Schema (type-safe structured extraction)

npx webpeel "https://example.com/product" \
  --extract-schema '{"type":"object","properties":{"title":{"type":"string"},"price":{"type":"number"}}}' \
  --llm-key sk-...

LLM Extraction (natural language, BYOK)

npx webpeel "https://hn.algolia.com" \
  --llm-extract "top 10 posts with title, score, and comment count" \
  --llm-key $OPENAI_API_KEY \
  --json

Node.js extraction example

import { peel } from 'webpeel';

// CSS selector extraction
const result = await peel('https://news.ycombinator.com', {
  extract: {
    selectors: {
      titles: '.titleline > a',
      scores: '.score',
    }
  }
});
console.log(result.extracted); // { titles: [...], scores: [...] }

// LLM extraction with JSON Schema
const product = await peel('https://example.com/product', {
  llmExtract: 'title, price, rating, availability',
  llmKey: process.env.OPENAI_API_KEY,
});

🛡️ Stealth & Anti-Bot

WebPeel detects 7 bot-protection vendors and handles them automatically:

Cloudflare (JS challenge, Turnstile, Bot Management)
PerimeterX / HUMAN (behavioral analysis)
DataDome (ML-based bot detection)
Akamai Bot Manager
Distil Networks
reCAPTCHA / hCaptcha (page-level detection)
Generic challenge pages

28 high-protection domains (Amazon, LinkedIn, Glassdoor, Zillow, Ticketmaster, and more) automatically route through stealth mode — no flags needed.

# Explicitly enable stealth
npx webpeel "https://glassdoor.com/jobs" --stealth

# Auto-escalation (stealth triggers automatically on challenge detection)
npx webpeel "https://amazon.com/dp/ASIN"

⚡ Benchmark

Evaluated on 30 real-world URLs across 6 categories (static, dynamic, SPA, protected, documents, international):

	WebPeel	Next best
Success rate	100% (30/30)	93.3%
Content quality	92.3%	83.2%

WebPeel is the only tool that extracted content from all 30 test URLs. Full methodology →

🆚 Comparison

Feature	WebPeel	Firecrawl	Jina Reader	ScrapingBee	Tavily
Free tier	✅ 500/wk recurring	⚠️ 500 one-time	❌	❌	⚠️ 1,000 one-time
Smart escalation	✅ auto HTTP→browser→stealth	❌ manual	❌	❌	❌
Stealth mode	✅ all plans	✅	❌	✅ paid	❌
Challenge detection	✅ 7 vendors	❌	❌	❌	❌
MCP tools	✅ 13 tools	⚠️ ~6	❌	❌	✅
Deep research	✅ multi-hop + BM25	⚠️ cloud only	❌	❌	✅
CSS schema extraction	✅ 7 bundled	❌	❌	❌	❌
LLM extraction (BYOK)	✅	⚠️ cloud only	❌	❌	❌
Site search (27+ sites)	✅	❌	❌	❌	⚠️ web only
Hotel search	✅ 4 sources parallel	❌	❌	❌	❌
Browser profiles	✅ persistent sessions	❌	❌	❌	❌
Self-hosting	✅ Docker Compose	⚠️ complex	❌	❌	❌
Python SDK	✅ `pip install`	✅	❌	✅	✅
Firecrawl-compatible API	✅ drop-in	✅ native	❌	❌	❌
License	AGPL-3.0	AGPL-3.0	Proprietary	Proprietary	Proprietary
Price	$0 / $9 / $29	$0 / $16 / $83	custom	$49 / $149	$0 / $99

💳 Pricing

Plan	Price	Weekly Fetches	Burst
Free	$0/mo	500/wk	50/hr
Pro	$9/mo	1,250/wk	100/hr
Max	$29/mo	6,250/wk	500/hr

All features on all plans. Pro/Max add pay-as-you-go extra usage (fetch $0.002, search $0.001, stealth $0.01). Quota resets every Monday.

🐍 Python SDK

pip install webpeel

from webpeel import WebPeel

client = WebPeel(api_key="wp_...")  # or use WEBPEEL_API_KEY env var

# Fetch a page
result = client.scrape("https://example.com")
print(result.content)    # Clean markdown
print(result.metadata)   # title, description, author, ...

# Search the web
results = client.search("latest AI research papers")

# Crawl a site
job = client.crawl("https://docs.example.com", limit=100)

# With browser + stealth
result = client.scrape("https://protected-site.com", render=True, stealth=True)

Sync and async clients. Pure Python 3.8+, zero dependencies. Full SDK docs →

🐳 Self-Hosting

git clone https://github.com/webpeel/webpeel.git
cd webpeel && docker compose up

Full REST API available at http://localhost:3000. AGPL-3.0 licensed. Self-hosting guide →

Just the MCP server:

docker run -i webpeel/mcp

Just the API server:

docker run -p 3000:3000 webpeel/api

📖 API Reference

Full OpenAPI spec at openapi.yaml and api.webpeel.dev.

# Fetch
GET  /v1/fetch?url=<url>

# Search
GET  /v1/search?q=<query>

# Crawl
POST /v1/crawl  { "url": "...", "limit": 100 }

# Map
GET  /v1/map?url=<url>

# Extract
POST /v1/extract  { "url": "...", "schema": { ... } }

Full API reference →

🤝 Contributing

git clone https://github.com/webpeel/webpeel.git
cd webpeel
npm install && npm run build
npm test

Bug reports: Open an issue
Feature requests: Start a discussion
Code: See CONTRIBUTING.md for guidelines

The project has a comprehensive test suite. Please add tests for new features.

Star History

License

AGPL-3.0 — free to use, modify, and distribute. If you run a modified version as a network service, you must release your source under AGPL-3.0.

Need a commercial license? support@webpeel.dev

Versions 0.7.1 and earlier were released under MIT and remain MIT-licensed.

If WebPeel saves you time, ⭐ star the repo — it helps others find it.