Found 379 results for web-scraping

rebrowser-puppeteer-core

A drop-in replacement for puppeteer-core patched with rebrowser-patches. It allows passing modern automation detection tests.

defuddle

Extract article content and metadata from web pages.

MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, search, batch processing, structured data extraction, and LLM-powered content analysis.

rebrowser-puppeteer

A drop-in replacement for puppeteer patched with rebrowser-patches. It allows passing modern automation detection tests.

@mzxrai/mcp-webresearch

MCP server for web research

googlethis

A simple yet powerful module to retrieve organic search results and much more from Google.

fetcher-mcp

MCP server for fetching web content using Playwright browser

brave-search

A fully typed Brave Search API wrapper, providing easy access to web search, local POI search, and automatic polling for web search summary feature.

rebrowser-playwright-core

A drop-in replacement for playwright-core patched with rebrowser-patches. It allows passing modern automation detection tests.

rebrowser-playwright

A drop-in replacement for playwright patched with rebrowser-patches. It allows passing modern automation detection tests.

@yigitkonur/scrape-do-mcp-server

MCP server for web scraping using Scrape.do API

telegram-scraper

A simple Telegram channel scraper

@promptbook/website-crawler

Promptbook: Turn your company's scattered knowledge into AI ready books

brave-real-browser-mcp-server

MCP server for brave-real-browser

d-scrape

A scraper library for WhatsApp bots or RESTful APIs

@bonginkan/maria

🚀 MARIA v4.4.1 - Enterprise AI Development Platform with identity system and character voice implementation. Features 74 production-ready commands with comprehensive fallback implementation, local LLM support, and zero external dependencies. Includes nat

@wordbricks/fetch-mcp

Model Context Protocol (MCP) server for fetching data from the web

rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

anchorbrowser

The official TypeScript library for the Anchorbrowser API

@fanboynz/network-scanner

A Puppeteer-based network scanner for analyzing web traffic, generating adblock filter rules, and identifying third-party requests. Features include fingerprint spoofing, Cloudflare bypass, content analysis with curl/grep, and multiple output formats.

@gravityai-dev/ingest

Data ingestion nodes for Gravity workflow system

kalpana-agent

Kalpana (कल्पना) - AI development assistant with multi-runtime containerized execution, web automation, multi-modal analysis, error checking, and intelligent context management

n8n-nodes-headlessx

n8n community node for HeadlessX API integration - web scraping, screenshots, and PDF generation

deepcrawl

JavaScript/TypeScript SDK for Deepcrawl API - A powerful web scraping and crawling service

node-libcurl-ja3

Node.js native bindings for libcurl-impersonate. Impersonate Chrome, Edge, Firefox and Safari TLS fingerprints.

@agentic-intelligence/dom-engine

Agentic DOM Intelligence - A lightweight TypeScript library for DOM analysis and manipulation, designed for web automation and AI agents

hyperbrowser-mcp

Hyperbrowser Model Context Protocol Server

h2m-parser

LLM-ready HTML to Markdown pipeline with Readability, htmlparser2, and post-processing utilities.

google-search-ts

A TypeScript library for performing Google searches with support for proxy, pagination, and customization

crawl4ai

TypeScript SDK for Crawl4AI REST API - Bun & Node.js compatible

xyz-scraper

A web scraping framework for various websites using Playwright.

n8n-nodes-stagehand-browser

n8n node for integrating Stagehand browser automation with Browserless support

n8n-nodes-scraping-dog

A custom n8n node for integrating with ScrapingDog to perform web scraping tasks.

@faouzkk/tiktok-dl

A module for downloading TikTok videos by the URL

email-scrape

Toolkit for extracting email addresses from HTML and remote websites

raggle-js

JavaScript client for Raggle API

searxng

A TypeScript service to interact with the SearXNG search engine API, enabling customizable searches and result retrieval.

n8n-nodes-browser-use

n8n node to control browser-use AI-powered browser automation with Nodes-as-Tools support

@pinkpixel/web-scout-mcp

MCP server for web search and content extraction with multiple URL support and memory optimizations

crawlee-storage-extensions

Package for Apify/Crawlee that allows storing encrypted text values in the Storages

qa-agent

AI-powered QA agent using LLM models for automated testing and web interaction

article-summarizer-jp

CLI tool for summarizing web articles in Japanese using Anthropic Claude API. Fetches content from URLs and generates both 3-line summaries and full translations in polite Japanese.

@lmcc-dev/mult-fetch-mcp-server

An MCP protocol-based web content fetching tool that supports multiple modes and formats and can be integrated with AI assistants like Claude

umbrellamode

UmbrellaMode shared library

scrapedo-mcp-server

Web scraping for Claude Desktop, Codex, and Gemini using Scrapedo API. Simple setup with npx.

@monostate/node-scraper

Intelligent web scraping with AI Q&A, PDF support and multi-level fallback system - 11x faster than traditional scrapers

@michaelvanlaar/n8n-nodes-defuddle

n8n node to extract main content from webpages using Defuddle library

mcp-web-scrape

Clean, cached web content for agents—Markdown + citations

@qnaplus/node-curl-impersonate

A TypeScript wrapper around cURL-impersonate.

@_brcode/mcp-browser-inspector

MCP server for browser inspection with Puppeteer - network monitoring and console error tracking

crawl4ai-mcp-sse-stdio

MCP (Model Context Protocol) server for Crawl4AI - Universal web crawling and data extraction. Supports STDIO, SSE, and HTTP transports.

@victorsouzaleal/googlethis

A simple yet powerful module to retrieve organic search results and much more from Google.

@hanivanrizky/nestjs-html-parser

A powerful NestJS HTML parsing service with XPath and CSS selector support, proxy configuration, random user agents, and rich response metadata including headers and status codes

rag-system-pgvector

A complete Retrieval-Augmented Generation system using pgvector, LangChain, and LangGraph for Node.js applications with dynamic embedding and model providers - supports OpenAI, Anthropic, HuggingFace, Azure, Google AI, and more

anycrawl-mcp-server

AnyCrawl MCP Server - Adds powerful web scraping and crawling to Cursor, Claude and any other LLM clients

crawlforge-mcp-server

CrawlForge MCP Server - Professional Model Context Protocol server with 19 comprehensive web scraping, crawling, and content processing tools.

playwright-cache

Efficient response caching for Playwright automation scripts.

ethos-crawler

Web crawler and API for aggregating and serving digital rights organizations' publications.

n8n-nodes-smart-web-scraper

Smart web scraper node for n8n with automatic failover and content extraction

selenium-mcp-server-agbobli

Comprehensive Selenium MCP Server with full WebDriver functionality for browser automation and testing

n8n-nodes-anchorbrowser

n8n node for Anchor Browser API - browser automation and control

doc-ops-mcp

MCP Document Converter Server — A Model Context Protocol server for seamless document format conversion and processing

@aduptive/instagram-scraper

Modern TypeScript library for collecting public Instagram content with smart delays, mobile-first approach, and media support

@dimitrk/mcp-search

MCP server for web search and semantic page content retrieval with local caching

agentql-mcp

Model Context Protocol (MCP) server that integrates AgentQL data extraction capabilities.

@sashbot/uibridge

🤖 AI-friendly live session automation with REAL screenshot backgrounds (no transparency issues!) - control your EXISTING browser with visual debug panel. Perfect for AI agents!

@kirkdeam/puppeteer-mcp-server

Puppeteer MCP Server for browser automation via Model Context Protocol

@clado-ai/mcp-router

Clado Model Context Protocol Server

web-developer-mcp

A Model Context Protocol (MCP) server that provides web development tools for AI assistants. Enables browser automation, DOM inspection, network monitoring, and console analysis through Playwright.

@daidaitw/twitter-scraper

A clean and powerful Twitter/X scraping library with CycleTLS support, proxy configuration, and full TypeScript type definitions

xnxx-scraper

Xnxx Search and information scraper

@ducksguse/scraper-sdk

TypeScript SDK for Scraper Microservice - Server-side only

moshai-cli

A modern, fast Node.js CLI powered by arasadrahman

mcp-search-tools

MCP server and client for web search and page viewing tools - DuckDuckGo search and web scraping

@elizaos/plugin-browser

Plugin for browser actions and web scraping

@adobe/spacecat-shared-html-analyzer

Analyze HTML content visibility for AI crawlers and citations - compare static HTML vs fully rendered content

estudante-sei-api

A non-official API to interact with UniRV's SEI system, focused on student functionalities.

n8n-nodes-url-to-html

n8n node for converting URLs to HTML using pdfmunk API

koffi-curl

Node.js libcurl bindings using koffi with browser fingerprint capabilities

@scrapeops/n8n-nodes-scrapeops

n8n community node for ScrapeOps Proxy, Parser, and Data APIs for web scraping and data extraction

@lightfeed/extractor

Use LLMs to robustly extract and enrich structured data from HTML and markdown

@suzakuteam/scraper-node

A scraper module created by Sxyz and SuzakuTeam to make it easier to use scrapers in both ESM and CJS projects.

web-fetch-mcp

MCP server for web content fetching, summarizing, comparing, and extracting information

aluvia-ts-sdk

Official Aluvia proxy management SDK for Node.js and modern JavaScript environments

oembedder

Library to oEmbed a resource

n8n-nodes-playwright-captcha

n8n node for web automation with Playwright, captcha solving with 2captcha, and proxy support

@browserbasehq/orca

An AI web browsing framework focused on simplicity and extensibility.

js-harvester

Harvester is a lightweight and highly optimized JavaScript library for extracting data from the DOM tree. It supports extraction of tag texts with specified types and attributes. It's tiny, has no dependencies, and also works with Puppeteer.

gpt-research

Autonomous AI research agent that conducts comprehensive research on any topic and generates detailed reports with citations

waterfall-fetch

Utility for web scraping that fetches the HTML from a URL or uses Puppeteer to interact with the page. getHtml uses various strategies in a 'waterfall' approach to get the content of the URL, depending on priorities such as stealth, speed, and freshness.

mcp-fetch

A Model Context Protocol server providing tools for HTTP requests, GraphQL queries, WebSocket connections, and browser automation

scraperis-mcp

Model Context Protocol (MCP) integration for Scraper.is - A web scraping tool for AI assistants

n8n-nodes-playwright-mcp

Complete n8n Playwright node with all Microsoft Playwright MCP tools and AI assistant support for advanced browser automation

panini-scraper

A Node.js TypeScript API for scraping Panini Brasil product information following Clean Architecture principles

curl-cffi

A powerful HTTP client for Node.js based on libcurl with browser fingerprinting capabilities.

node-curl-impersonate

A wrapper around cURL-impersonate, a binary which can be used to bypass TLS fingerprinting.

n8n-nodes-browser-use-cloud

n8n node for Browser Use Cloud API - Automate web tasks with AI agents

plugin-books-pro

octagon-deep-research-mcp

MCP server for Deep Research. Provides specialized AI-powered deep research capabilities with no rate limits - faster than ChatGPT Deep Research, more thorough than Grok DeepSearch or Perplexity Deep Research.

@akukral/site-comparator

A sophisticated website comparison tool with intelligent content analysis and offset-aware difference detection

mcp-crawl4ai-ts

TypeScript MCP server for Crawl4AI - web crawling and content extraction

sigaa-api

Unofficial high performance API for SIGAA IFSC using web scraping.

web-parser-mcp

🚀 MCP SERVER FIXED v3.7.9! Resolved import errors, middleware conflicts, type hints - NOW WORKING PERFECTLY!

@devnaneve/googlethis

A simple yet powerful module to retrieve organic search results and much more from Google.

@actionbase/web-action-sdk

Work with the internet as if it were your own API. Automate web interactions across popular internet platforms.

puremd-mcp

Model Context Protocol (MCP) server for pure.md, the markdown delivery network for LLMs

jann-scraper

A scraper library for WhatsApp bots or RESTful APIs

@webagent-cloud/n8n-nodes-webagent

Webagent n8n nodes package

@6digit/silktext

Lightweight, runtime-safe crawling → clean Markdown

olostep-mcp

Olostep MCP server for web scraping, Google search, and website URL search.

chrome-automation-mcp

MCP server for browser automation with custom scripts

@speed_of/movietorrent-scraper

movietorrent scraper for extracting movie news from all pages.

@rpidanny/odysseus

Odysseus is a web scraping library built on top of Playwright, designed to handle dynamic web pages and CAPTCHA challenges with ease.

hunterbot-sdk

HunterBot Actor SDK - Official SDK for building web scraping actors on HunterBot platform

ts-curl-impersonate

A TypeScript wrapper around cURL-impersonate.

mcp-web-content-pick

A tool for extracting structured content from web pages with customizable selectors and crawling options

@ejazullah/playwright-mcp-server

A Model Context Protocol (MCP) server for Playwright browser automation with dynamic CDP endpoint support

@fastmcp-me/fetchserp-mcp-server-node

A Model Context Protocol (MCP) server that provides access to FetchSERP API for SEO analysis, SERP data, web scraping, and keyword research. Supports both stdio and HTTP transport modes.

@boboiboyturuuu/xgrovy-scrape

Scraper for adult content from the xgrovy site.

mcp-jinaai-grounding

MCP server for JinaAI grounding

@tomisakae/syosetu-api

Enterprise-grade Fastify TypeScript API for Syosetu.com data extraction using official API and web scraping. Run instantly with 'npx @tomisakae/syosetu-api'

fetchserp-mcp-server

A Model Context Protocol (MCP) server that provides access to FetchSERP API for SEO analysis, SERP data, web scraping, and keyword research. Supports both stdio and HTTP transport modes.

@webstandard/robots

A standards-compliant generator for producing robots.txt files

mcpxbridge

MCP server for browser automation using McpXbridge

@suthio/brave-deep-research-mcp

DeepSearch MCP Server with Brave Search API and Puppeteer content extraction

@jtsang/fetcher-mcp

MCP server for fetching web content using Playwright browser

@monostate/browsernative-client

Browser Native client SDK for web scraping and content extraction API

n8n-nodes-firecrawl-tool

n8n node for Firecrawl v2 API - Web scraping, crawling, and data extraction tool for workflows and AI agents

undetected-chromedriver-js

Node.js wrapper for undetected-chromedriver with automatic setup and cross-platform support

@iflow-mcp/firecrawl-mcp

MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, batch processing, structured data extraction, and LLM-powered content analysis.

@iflow-mcp/agentql-mcp

Model Context Protocol (MCP) server that integrates AgentQL data extraction capabilities.

@crawlbase/mcp

MCP server for Crawlbase API - enables web scraping through Model Context Protocol

@fastmcp-me/mcp-jinaai-search

MCP server for JinaAI search

playwright-vision-mcp

n8n MCP server with Playwright browser automation capabilities

mcp-server-image-extractor

MCP server for extracting and categorizing images from web pages with intelligent classification

@micrawl/mcp-server

Model Context Protocol (MCP) server for web scraping with Micrawl - exposes scraping capabilities to AI assistants

@bcoders.gr/eth-scrapper

Electron web scraper for Etherscan transactions - External and Internal transaction hash extractor

@fastmcp-me/linkd-mcp

Linkd Model Context Protocol Server

cparse

A Cheerio-based utility library for HTML parsing and data extraction

sigaa-api-cefetmg

API for the CEFETMG-SIGAA platform, forked from the sigaa-api project.

@cityssm/wsib-clearance-check

A tool to scrape the clearance certificate status from the WSIB Online Services website.

@micrawl/core

Core scraping engine for Micrawl - supports Playwright and HTTP drivers with multi-format output

web-feed-mcp

Local Browser MCP Server for web automation with Playwright integration

@langgraph-js/crawler

A powerful web crawler designed specifically for LLM applications, capable of extracting clean, readable content from various web pages and converting it to Markdown format.

devchrome-mcp

MCP (Model Context Protocol) server for working with the browser via Puppeteer

linktree-parser

linktree-parser is a TypeScript library for scraping and extracting account, links, banners, and metadata from Linktree profiles.

mcp-jinaai-search

MCP server for JinaAI search

mcp-jinaai-reader

MCP server for JinaAI reader

@armand1m/papercut

Papercut is a scraping/crawling library for Node.js, written in TypeScript.

@speed_of/imdbscraper

IMDb scraper for extracting movie reviews from IMDb pages.

webscraping-ai-mcp

Model Context Protocol server for WebScraping.AI API. Provides LLM-powered web scraping tools with Chromium JavaScript rendering, rotating proxies, and HTML parsing.

@watercrawl/mcp

A Model Context Protocol (MCP) server for WaterCrawl, enabling AI systems to perform web crawling and search operations

@nathanclevenger/googlethis

A simple yet powerful module to retrieve organic search results and much more from Google.

@mcprelay/client

MCP client for MCPRelay proxy service - provides web access for AI agents

@supadata/mcp

MCP server for Supadata video & web scraping integration. Features include YouTube, TikTok, Instagram, Twitter, and file video transcription, web scraping, batch processing and structured data extraction.

osu-droid-scraping

Gather information of an osu!droid user via web scraping.

@fwdslsh/inform

A high-performance web crawler powered by Bun that downloads pages and converts them to Markdown

multi-dictionary-scraper

Professional multi-dictionary scraper supporting WordReference and Linguee with unified API, TypeScript definitions, and comprehensive language coverage for 1000+ language pairs.

googlethis-augmented

A fork of googlethis to get specific data for different needs.

web-structure

A powerful and flexible web scraping library with concurrent processing and DOM hierarchy awareness

@clado-ai/mcp

Clado Model Context Protocol Server

novel-scraper

Made for scraping novels with Puppeteer

undetected-puppeteer

1:1 (sorta) replacement for Puppeteer, but undetected

olx-mcp

OLX MCP server that enables Claude Desktop to browse and search OLX listings across multiple domains (PT, PL, BG, RO, UA)

finview

A command-line tool for monitoring financial data and market trends in real-time directly from your terminal.

auto-captcha-solver

Automatically detect and solve various captcha types in Playwright & Puppeteer with 2Captcha/CapMonster Cloud integration

google-reviews-api

A simple Node.js library to fetch Google Maps reviews

web-page-analyzer-cli

A powerful website link crawling tool that supports deep crawling, authentication, and page analysis

@pricething/curl

A TypeScript wrapper around cURL-impersonate.

@decodo/langchain-ts

LangChain tools for Decodo's Scraper API

puppeteer-dsl

An intuitive DSL for Puppeteer, simplifying web automation and testing. Currently in alpha, subject to changes.

docs-to-markdown

Tool for automatically analyzing and summarizing library documentation for use with LLMs

linkd-mcp

Linkd Model Context Protocol Server

mcp-url-to-text

MCP server for converting URLs to text using urltoany.com

@rpidanny/google-scholar

A minimal TypeScript library for fetching and parsing Google Scholar pages.

crawlee-scraper-toolkit

A comprehensive TypeScript toolkit for building robust web scrapers with Crawlee, featuring maximum configurability and CLI generator

mcp-crawl4ai

MCP server for advanced web scraping with Crawl4AI - supports authentication, dynamic content, and AI extraction

@crawlus/cheerio

Cheerio-based crawler for server-side HTML parsing and extraction

ts-jobspy

TypeScript job scraper for LinkedIn, Indeed, Glassdoor, ZipRecruiter & more - rewritten from python-jobspy

doc-to-readable

Universal document-to-markdown and section splitter for HTML, URLs, and PDFs.

@nrjdalal/google-parser

Google parser is a lightweight yet powerful HTTP-client-based Google Search result scraper/parser that sends browser-like requests out of the box. This is essential in web scraping for blending in with regular website traffic.

@bluggie/nodescrapy

Web crawler in NodeJS

puppeteer-server

Custom MCP server for browser automation using Puppeteer

@crawlus/puppeteer

Puppeteer-based crawler for Chrome automation and dynamic content scraping

@crawlus/utils

Utility functions for web crawling - sitemap processing, link extraction, system info

@crawlus/api

API crawler for REST and GraphQL endpoint crawling with auto-detection

n8n-nodes-dumplingai

n8n community node for Dumpling AI integration

fdy-scraping

`fdy-scraping` is a versatile HTTP client designed for making API requests with support for proxy configuration, debugging, and detailed error handling. It utilizes the [`got-scraping`](https://github.com/apify/got-scraping) library for HTTP operations.

bitbuffet

TypeScript SDK for the Structured Scraper API - BitBuffet

node-wreq

Browser fingerprint bypass library using Rust for TLS/HTTP2 impersonation

@crawlus/http

HTTP crawler for basic web scraping without JavaScript execution

braid-video-downloader

A powerful TypeScript library for downloading videos from web pages, including M3U8/HLS streams, with browser automation and intelligent stream detection

crunchyroll-toolkit

Complete Node.js toolkit for extracting anime data, metadata, and thumbnails from Crunchyroll with 2024/2025 anti-detection techniques

web-scraper-pro

Professional web scraper with Puppeteer & Mozilla Readability. Extract clean content from any website with full TypeScript support.

ayakashi

The next generation web scraping framework

@crawlus/playwright

Playwright-based crawler for full browser automation and JavaScript rendering

@lyuboslavlyubenov/se-scraper

A module using Puppeteer to scrape several search engines such as Google, Bing, and DuckDuckGo

@crawlus/core

Core crawler framework functionality - TypeScript web crawling library

url-to-json-markdown

A TypeScript library that fetches URLs and converts them to structured JSON and Markdown format.

llm-gen

A CLI tool to extract text from a static Next.js export and generate llm.txt for LLM ingestion.

markdown-crawler

A powerful web crawler that extracts content from web pages and converts them to clean Markdown format, with support for code blocks and GitHub Flavored Markdown

mirror-web-cli

Professional website mirroring tool with intelligent framework preservation, AI-powered analysis, and comprehensive asset optimization

n8n-nodes-exa-websets

n8n node for Exa Websets API - Create, manage, and query structured datasets from web sources

@juhomat/hexagonal-ai-framework

AI-powered hexagonal framework with OpenAI integration, database adapters, web scraping, and Next.js demo application

@zentus/googlethis

A simple yet powerful module to retrieve organic search results and much more from Google.

@mseep/firecrawl-mcp

anil-brd-typescript-sdk

TypeScript SDK for Bright Data APIs - Web Unlocker, SERP, and Scraper APIs

supapup

⚡ Lightning-fast MCP browser dev tool. Navigate → Get instant structured data. No screenshots needed! Puppeteer: 📸 → CSS selectors → JS eval. Supapup: semantic IDs ready to use. 10x faster, 90% fewer tokens.

cheerio-mcp

MCP server for web scraping with Cheerio

@crawlus/cli

Command-line interface for creating and managing crawler projects

@subtitles/providers

Providers are the core of applications, where the subtitles are collected. Each provider exports a unique strategy for gathering data. From legendastv's web scraping to opensubtitle API usage, you can collect subtitles from your favorite tv shows and mo