JSPM

Found 379 results for web-scraping

rebrowser-puppeteer-core

A drop-in replacement for puppeteer-core patched with rebrowser-patches. It allows you to pass modern automation detection tests.

  • v24.8.1
  • 61.83
  • Published

defuddle

Extract article content and metadata from web pages.

  • v0.6.6
  • 60.25
  • Published

firecrawl-mcp

MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, search, batch processing, structured data extraction, and LLM-powered content analysis.

  • v3.4.0
  • 59.81
  • Published

rebrowser-puppeteer

A drop-in replacement for puppeteer patched with rebrowser-patches. It allows you to pass modern automation detection tests.

  • v24.8.1
  • 58.68
  • Published

googlethis

A simple yet powerful module to retrieve organic search results and much more from Google.

  • v1.8.0
  • 52.67
  • Published

fetcher-mcp

MCP server for fetching web content using Playwright browser

  • v0.3.5
  • 49.88
  • Published

brave-search

A fully typed Brave Search API wrapper, providing easy access to web search, local POI search, and automatic polling for the web search summary feature.

  • v0.9.0
  • 47.56
  • Published

rebrowser-playwright-core

A drop-in replacement for playwright-core patched with rebrowser-patches. It allows you to pass modern automation detection tests.

  • v1.52.0
  • 46.24
  • Published

rebrowser-playwright

A drop-in replacement for playwright patched with rebrowser-patches. It allows you to pass modern automation detection tests.

  • v1.52.0
  • 46.16
  • Published

d-scrape

A scraper library for WhatsApp bots or RESTful APIs

  • v1.2.0
  • 41.32
  • Published

@bonginkan/maria

🚀 MARIA v4.4.1 - Enterprise AI Development Platform with identity system and character voice implementation. Features 74 production-ready commands with comprehensive fallback implementation, local LLM support, and zero external dependencies. Includes nat

  • v4.4.1
  • 40.72
  • Published

@wordbricks/fetch-mcp

Model Context Protocol (MCP) server for fetching data from the web

  • v1.3.0
  • 39.62
  • Published

rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

  • v1.0.19
  • 39.59
  • Published

anchorbrowser

The official TypeScript library for the Anchorbrowser API

  • v0.6.0
  • 38.74
  • Published

@fanboynz/network-scanner

A Puppeteer-based network scanner for analyzing web traffic, generating adblock filter rules, and identifying third-party requests. Features include fingerprint spoofing, Cloudflare bypass, content analysis with curl/grep, and multiple output formats.

  • v2.0.28
  • 38.74
  • Published

kalpana-agent

Kalpana (कल्पना) - AI development assistant with multi-runtime containerized execution, web automation, multi-modal analysis, error checking, and intelligent context management

  • v1.1.16
  • 38.12
  • Published

n8n-nodes-headlessx

n8n community node for HeadlessX API integration - web scraping, screenshots, and PDF generation

  • v1.2.2
  • 37.71
  • Published

deepcrawl

JavaScript/TypeScript SDK for Deepcrawl API - A powerful web scraping and crawling service

  • v0.5.2
  • 36.45
  • Published

node-libcurl-ja3

Node.js native bindings for libcurl-impersonate. Impersonate Chrome, Edge, Firefox and Safari TLS fingerprints.

  • v5.0.3
  • 36.11
  • Published

@agentic-intelligence/dom-engine

Agentic DOM Intelligence - A lightweight TypeScript library for DOM analysis and manipulation, designed for web automation and AI agents

  • v1.2.0-dev.6
  • 35.82
  • Published

hyperbrowser-mcp

Hyperbrowser Model Context Protocol Server

  • v1.0.25
  • 35.67
  • Published

h2m-parser

LLM-ready HTML to Markdown pipeline with Readability, htmlparser2, and post-processing utilities.

  • v1.0.0
  • 35.21
  • Published

google-search-ts

A TypeScript library for performing Google searches with support for proxies, pagination, and customization

  • v1.0.1
  • 35.02
  • Published

crawl4ai

TypeScript SDK for Crawl4AI REST API - Bun & Node.js compatible

  • v1.0.1
  • 34.69
  • Published

xyz-scraper

A web scraping framework for various websites using Playwright.

  • v0.0.14
  • 34.13
  • Published

n8n-nodes-stagehand-browser

n8n node for integrating Stagehand browser automation with Browserless support

  • v0.2.11
  • 33.81
  • Published

n8n-nodes-scraping-dog

A custom n8n node for integrating with ScrapingDog to perform web scraping tasks.

  • v0.4.2
  • 33.41
  • Published

@faouzkk/tiktok-dl

A module for downloading TikTok videos by URL

  • v1.0.1
  • 33.28
  • Published

email-scrape

Toolkit for extracting email addresses from HTML and remote websites

  • v0.7.1
  • 33.01
  • Published
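The email-scrape entry does not show the package's API, so the following is only a rough, hypothetical illustration of what "extracting email addresses from HTML" involves: a simple regular expression run over raw markup. The function name `extractEmails` and the pattern are assumptions, not the package's interface, and the pattern deliberately ignores obfuscated or entity-encoded addresses.

```typescript
// Hypothetical sketch: NOT the email-scrape API.
// A deliberately simple address pattern; it does not cover every RFC 5322 form.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function extractEmails(html: string): string[] {
  // Matching the raw markup catches both visible text and mailto: href values.
  const matches: string[] = html.match(EMAIL_PATTERN) ?? [];
  // Lowercasing is a simplification (the local part may legally be
  // case-sensitive); it lets "Info@Example.com" and "info@example.com" collapse.
  // The Set de-duplicates while keeping first-seen order.
  return Array.from(new Set(matches.map((m) => m.toLowerCase())));
}

const sample =
  '<a href="mailto:Info@Example.com">Info@Example.com</a> <p>Write to sales@shop.co.uk.</p>';
console.log(extractEmails(sample));
```

A production extractor would typically also decode HTML entities and strip `<script>`/`<style>` content before matching; both are left out here to keep the sketch short.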

raggle-js

JavaScript client for Raggle API

  • v0.2.55
  • 32.59
  • Published

searxng

A TypeScript service to interact with the SearXNG search engine API, enabling customizable searches and result retrieval.

  • v0.0.5
  • 32.22
  • Published

n8n-nodes-browser-use

n8n node to control browser-use AI-powered browser automation with Nodes-as-Tools support

  • v0.1.6
  • 32.19
  • Published

@pinkpixel/web-scout-mcp

MCP server for web search and content extraction with multiple URL support and memory optimizations

  • v1.5.5
  • 32.09
  • Published

crawlee-storage-extensions

Package for Apify/Crawlee that allows storing encrypted text values in Crawlee storages

  • v1.0.11
  • 31.99
  • Published

qa-agent

AI-powered QA agent using LLM models for automated testing and web interaction

  • v2.2.3-beta.0
  • 31.70
  • Published

article-summarizer-jp

CLI tool for summarizing web articles in Japanese using Anthropic Claude API. Fetches content from URLs and generates both 3-line summaries and full translations in polite Japanese.

  • v1.5.23
  • 31.05
  • Published

@lmcc-dev/mult-fetch-mcp-server

An MCP-based web content fetching tool that supports multiple modes and formats and can be integrated with AI assistants like Claude

  • v1.3.2
  • 30.85
  • Published

umbrellamode

UmbrellaMode shared library

  • v0.0.8
  • 30.60
  • Published

scrapedo-mcp-server

Web scraping for Claude Desktop, Codex, and Gemini using Scrapedo API. Simple setup with npx.

  • v1.2.5
  • 30.37
  • Published

@monostate/node-scraper

Intelligent web scraping with AI Q&A, PDF support and multi-level fallback system - 11x faster than traditional scrapers

  • v1.8.1
  • 30.29
  • Published

mcp-web-scrape

Clean, cached web content for agents: Markdown + citations

  • v1.0.7
  • 29.90
  • Published

@_brcode/mcp-browser-inspector

MCP server for browser inspection with Puppeteer - network monitoring and console error tracking

  • v1.1.3
  • 29.57
  • Published

crawl4ai-mcp-sse-stdio

MCP (Model Context Protocol) server for Crawl4AI - Universal web crawling and data extraction. Supports STDIO, SSE, and HTTP transports.

  • v1.2.1
  • 29.51
  • Published

@victorsouzaleal/googlethis

A simple yet powerful module to retrieve organic search results and much more from Google.

  • v1.8.1
  • 29.45
  • Published

@hanivanrizky/nestjs-html-parser

A powerful NestJS HTML parsing service with XPath and CSS selector support, proxy configuration, random user agents, and rich response metadata including headers and status codes

  • v2.3.0
  • 29.26
  • Published

rag-system-pgvector

A complete Retrieval-Augmented Generation system using pgvector, LangChain, and LangGraph for Node.js applications with dynamic embedding and model providers - supports OpenAI, Anthropic, HuggingFace, Azure, Google AI, and more

  • v2.2.0
  • 28.80
  • Published

anycrawl-mcp-server

AnyCrawl MCP Server - Adds powerful web scraping and crawling to Cursor, Claude and any other LLM clients

  • v1.0.1
  • 28.74
  • Published

crawlforge-mcp-server

CrawlForge MCP Server - Professional Model Context Protocol server with 19 comprehensive web scraping, crawling, and content processing tools.

  • v3.0.3
  • 28.67
  • Published

playwright-cache

Efficient response caching for Playwright automation scripts.

  • v1.0.1
  • 28.60
  • Published

ethos-crawler

Web crawler and API for aggregating and serving digital rights organizations' publications.

  • v1.3.0
  • 28.21
  • Published

selenium-mcp-server-agbobli

Comprehensive Selenium MCP Server with full WebDriver functionality for browser automation and testing

  • v1.0.3
  • 28.09
  • Published

n8n-nodes-anchorbrowser

n8n node for Anchor Browser API - browser automation and control

  • v0.1.4
  • 27.99
  • Published

doc-ops-mcp

MCP Document Converter Server: A Model Context Protocol server for seamless document format conversion and processing

  • v0.3.8
  • 27.76
  • Published

@aduptive/instagram-scraper

Modern TypeScript library for collecting public Instagram content with smart delays, mobile-first approach, and media support

  • v1.0.3
  • 27.68
  • Published

@dimitrk/mcp-search

MCP server for web search and semantic page content retrieval with local caching

  • v0.1.6
  • 27.36
  • Published

agentql-mcp

Model Context Protocol (MCP) server that integrates AgentQL data extraction capabilities.

  • v1.0.0
  • 27.15
  • Published

@sashbot/uibridge

🤖 AI-friendly live session automation with REAL screenshot backgrounds (no transparency issues!) - control your EXISTING browser with visual debug panel. Perfect for AI agents!

  • v1.6.0
  • 26.86
  • Published

web-developer-mcp

A Model Context Protocol (MCP) server that provides web development tools for AI assistants. Enables browser automation, DOM inspection, network monitoring, and console analysis through Playwright.

  • v1.6.0
  • 26.39
  • Published

@daidaitw/twitter-scraper

A clean and powerful Twitter/X scraping library with CycleTLS support, proxy configuration, and full TypeScript type definitions

  • v1.0.6
  • 26.34
  • Published

xnxx-scraper

Xnxx search and information scraper

  • v1.0.4
  • 26.34
  • Published

moshai-cli

A modern, fast Node.js CLI powered by arasadrahman

  • v1.0.0
  • 26.14
  • Published

mcp-search-tools

MCP server and client for web search and page viewing tools - DuckDuckGo search and web scraping

  • v1.1.5
  • 26.14
  • Published

estudante-sei-api

An unofficial API for interacting with UniRV's SEI system, focused on student features.

  • v1.2.0
  • 25.90
  • Published

koffi-curl

Node.js libcurl bindings using koffi with browser fingerprint capabilities

  • v0.1.23
  • 25.73
  • Published

@scrapeops/n8n-nodes-scrapeops

n8n community node for ScrapeOps Proxy, Parser, and Data APIs for web scraping and data extraction

  • v0.2.6
  • 25.56
  • Published

@lightfeed/extractor

Use LLMs to robustly extract and enrich structured data from HTML and markdown

  • v0.2.1
  • 25.49
  • Published

@suzakuteam/scraper-node

A scraper module created by Sxyz and SuzakuTeam to make scrapers easier to use in both ESM and CJS projects.

  • v1.3.0
  • 25.42
  • Published

web-fetch-mcp

MCP server for web content fetching, summarizing, comparing, and extracting information

  • v0.1.1
  • 25.28
  • Published

aluvia-ts-sdk

Official Aluvia proxy management SDK for Node.js and modern JavaScript environments

  • v2.0.0
  • 25.11
  • Published

oembedder

Library to oEmbed a resource

  • v2.1.1
  • 25.08
  • Published
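"oEmbedding" a resource means asking a provider's oEmbed endpoint for embed data about a URL. The sketch below is not oembedder's API; it is a minimal, hypothetical illustration of the consumer side of the oEmbed 1.0 spec, which defines the `url` (required), `maxwidth`, `maxheight`, and `format` query parameters and requires every response to carry `type` and `version`. The host `provider.example` and both function names are placeholders.

```typescript
// Hypothetical sketch: NOT the oembedder API.
// Builds an oEmbed 1.0 consumer request and sanity-checks a parsed response.
interface OEmbedOptions {
  maxwidth?: number;
  maxheight?: number;
}

function buildOEmbedRequestUrl(endpoint: string, resourceUrl: string, opts: OEmbedOptions = {}): string {
  const u = new URL(endpoint);
  // URLSearchParams percent-encodes the resource URL for us.
  u.searchParams.set("url", resourceUrl);
  u.searchParams.set("format", "json");
  if (opts.maxwidth !== undefined) u.searchParams.set("maxwidth", String(opts.maxwidth));
  if (opts.maxheight !== undefined) u.searchParams.set("maxheight", String(opts.maxheight));
  return u.toString();
}

// The spec's four response types; type-specific fields (html, url, width, height)
// are only partly modelled here.
type OEmbedType = "photo" | "video" | "link" | "rich";
interface OEmbedResponse {
  type: OEmbedType;
  version: string;
  title?: string;
  html?: string;
  url?: string;
}

function isOEmbedResponse(value: unknown): value is OEmbedResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const types = ["photo", "video", "link", "rich"];
  return v.version === "1.0" && typeof v.type === "string" && types.indexOf(v.type) !== -1;
}

console.log(buildOEmbedRequestUrl("https://provider.example/oembed", "https://provider.example/video/1", { maxwidth: 640 }));
```

Real providers are usually discovered either from the published oEmbed provider registry or from a `<link rel="alternate" type="application/json+oembed">` tag on the resource page; the sketch assumes the endpoint is already known.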

n8n-nodes-playwright-captcha

n8n node for web automation with Playwright, CAPTCHA solving with 2captcha, and proxy support

  • v1.0.2
  • 24.98
  • Published

@browserbasehq/orca

An AI web browsing framework focused on simplicity and extensibility.

  • v3.0.0-preview.2
  • 24.88
  • Published

js-harvester

Harvester is a lightweight and highly optimized JavaScript library for extracting data from the DOM tree. It supports extraction of tag texts with specified types and attributes. It's tiny, has no dependencies, and also works with Puppeteer

  • v0.3.14
  • 24.67
  • Published

gpt-research

Autonomous AI research agent that conducts comprehensive research on any topic and generates detailed reports with citations

  • v1.0.1
  • 24.63
  • Published

waterfall-fetch

Utility for web scraping: fetches the HTML from a URL, or uses Puppeteer to interact with the page. getHtml uses various strategies in a 'waterfall' approach to get the content of the URL, depending on priorities such as stealth, speed, and freshness.

  • v1.0.11
  • 24.50
  • Published
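The "waterfall" idea in the waterfall-fetch description can be illustrated generically: try strategies in priority order and return the first usable result. This is a hypothetical sketch, not waterfall-fetch's `getHtml` implementation; the `Strategy` shape, the function name `waterfallGetHtml`, and the rule that an empty body counts as a failure are all assumptions.

```typescript
// Hypothetical sketch of a waterfall fetch: NOT the waterfall-fetch API.
// Strategies are tried in the order given; the first one that yields
// non-empty HTML wins, and later (usually slower) strategies are skipped.
type Strategy = {
  name: string;
  run: (url: string) => Promise<string>;
};

interface WaterfallResult {
  html: string;
  strategy: string;
}

async function waterfallGetHtml(url: string, strategies: Strategy[]): Promise<WaterfallResult> {
  const errors: string[] = [];
  for (const s of strategies) {
    try {
      const html = await s.run(url);
      // An empty body is treated as a failure so that, for example, a blocked
      // plain HTTP request falls through to a browser-based strategy.
      if (html.trim().length > 0) return { html, strategy: s.name };
      errors.push(`${s.name}: empty response`);
    } catch (err) {
      errors.push(`${s.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  // Every strategy failed: surface all reasons at once for easier debugging.
  throw new Error(`All strategies failed for ${url}: ${errors.join("; ")}`);
}
```

In practice the list would be ordered by the priority that matters for the call, e.g. a plain HTTP request first when speed matters and a headless-browser strategy (such as one driven by Puppeteer) first when stealth matters.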

mcp-fetch

A Model Context Protocol server providing tools for HTTP requests, GraphQL queries, WebSocket connections, and browser automation

  • v0.1.6
  • 24.32
  • Published

scraperis-mcp

Model Context Protocol (MCP) integration for Scraper.is - A web scraping tool for AI assistants

  • v0.1.22
  • 24.22
  • Published

n8n-nodes-playwright-mcp

Complete n8n Playwright node with all Microsoft Playwright MCP tools and AI assistant support for advanced browser automation

  • v1.0.0
  • 24.21
  • Published

panini-scraper

A Node.js TypeScript API for scraping Panini Brasil product information following Clean Architecture principles

  • v1.0.3
  • 24.14
  • Published

curl-cffi

A powerful HTTP client for Node.js based on libcurl with browser fingerprinting capabilities.

  • v0.1.41
  • 24.11
  • Published

node-curl-impersonate

A wrapper around cURL-impersonate, a binary which can be used to bypass TLS fingerprinting.

  • v1.5.4
  • 23.96
  • Published

plugin-books-pro

  • v0.0.11
  • 23.55
  • Published

octagon-deep-research-mcp

MCP server for Deep Research. Provides specialized AI-powered deep research capabilities with no rate limits - faster than ChatGPT Deep Research, more thorough than Grok DeepSearch or Perplexity Deep Research.

  • v1.0.18
  • 23.35
  • Published

@akukral/site-comparator

A sophisticated website comparison tool with intelligent content analysis and offset-aware difference detection

  • v1.4.0
  • 23.17
  • Published

mcp-crawl4ai-ts

TypeScript MCP server for Crawl4AI - web crawling and content extraction

  • v3.0.2
  • 23.09
  • Published

sigaa-api

Unofficial high performance API for SIGAA IFSC using web scraping.

  • v1.0.34
  • 22.92
  • Published

web-parser-mcp

🚀 MCP SERVER FIXED v3.7.9! Resolved import errors, middleware conflicts, type hints - NOW WORKING PERFECTLY!

  • v3.7.9
  • 22.89
  • Published

@devnaneve/googlethis

A simple yet powerful module to retrieve organic search results and much more from Google.

  • v1.8.1
  • 22.59
  • Published

@actionbase/web-action-sdk

Work with the internet as if it were your own API. Automate web interactions across popular internet platforms.

  • v0.1.9
  • 22.52
  • Published

puremd-mcp

Model Context Protocol (MCP) server for pure.md, the markdown delivery network for LLMs

  • v1.0.3
  • 21.78
  • Published

jann-scraper

A scraper library for WhatsApp bots or RESTful APIs

  • v0.0.6
  • 21.68
  • Published

@6digit/silktext

Lightweight, runtime-safe crawling → clean Markdown

  • v0.1.5
  • 21.58
  • Published

olostep-mcp

Olostep MCP server for web scraping, Google search, and website URL search.

  • v1.0.7
  • 21.46
  • Published

@rpidanny/odysseus

Odysseus is a web scraping library built on top of Playwright, designed to handle dynamic web pages and CAPTCHA challenges with ease.

  • v2.6.0
  • 21.08
  • Published

hunterbot-sdk

HunterBot Actor SDK - Official SDK for building web scraping actors on HunterBot platform

  • v1.0.0
  • 21.08
  • Published

ts-curl-impersonate

A TypeScript wrapper around cURL-impersonate.

  • v1.0.3
  • 20.95
  • Published

mcp-web-content-pick

A tool for extracting structured content from web pages with customizable selectors and crawling options

  • v0.0.25
  • 20.87
  • Published

@ejazullah/playwright-mcp-server

A Model Context Protocol (MCP) server for Playwright browser automation with dynamic CDP endpoint support

  • v1.88.0
  • 20.82
  • Published

@fastmcp-me/fetchserp-mcp-server-node

A Model Context Protocol (MCP) server that provides access to FetchSERP API for SEO analysis, SERP data, web scraping, and keyword research. Supports both stdio and HTTP transport modes.

  • v1.0.5
  • 20.81
  • Published

@tomisakae/syosetu-api

Enterprise-grade Fastify TypeScript API for Syosetu.com data extraction using official API and web scraping. Run instantly with 'npx @tomisakae/syosetu-api'

  • v0.0.2
  • 20.40
  • Published

fetchserp-mcp-server

A Model Context Protocol (MCP) server that provides access to FetchSERP API for SEO analysis, SERP data, web scraping, and keyword research. Supports both stdio and HTTP transport modes.

  • v1.0.5
  • 20.39
  • Published

@webstandard/robots

A standards-compliant generator for producing robots.txt files

  • v1.0.0
  • 20.37
  • Published
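As a rough illustration of what a robots.txt generator produces, here is a minimal, hypothetical sketch. It is not the @webstandard/robots API; the `RobotsGroup` shape and `generateRobotsTxt` name are assumptions. The layout follows RFC 9309 (Robots Exclusion Protocol): each group is one or more `User-agent` lines followed by `Allow`/`Disallow` rules. `Sitemap` lines are a widely supported extra record that is not tied to any group.

```typescript
// Hypothetical sketch of a robots.txt generator: NOT the @webstandard/robots API.
// Each group is one or more User-agent lines followed by its Allow/Disallow
// rules; groups are separated by blank lines.
interface RobotsGroup {
  userAgents: string[];
  allow?: string[];
  disallow?: string[];
}

function generateRobotsTxt(groups: RobotsGroup[], sitemaps: string[] = []): string {
  const blocks = groups.map((g) => {
    if (g.userAgents.length === 0) {
      throw new Error("Each group needs at least one User-agent line");
    }
    return [
      ...g.userAgents.map((ua) => `User-agent: ${ua}`),
      ...(g.allow ?? []).map((path) => `Allow: ${path}`),
      ...(g.disallow ?? []).map((path) => `Disallow: ${path}`),
    ].join("\n");
  });
  // Sitemap records sit outside the groups, so they go in a trailing block.
  if (sitemaps.length > 0) {
    blocks.push(sitemaps.map((s) => `Sitemap: ${s}`).join("\n"));
  }
  return blocks.join("\n\n") + "\n";
}

console.log(
  generateRobotsTxt([{ userAgents: ["*"], disallow: ["/admin/"] }], ["https://example.com/sitemap.xml"]),
);
```

Rule order inside a group matters little to RFC 9309 crawlers, which apply the most specific (longest) matching path. A group with `Allow: /public/` and `Disallow: /` therefore still permits `/public/` pages.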

                        mcpxbridge

                        MCP server for browser automation using McpXbridge

                        • v0.1.0
                        • 20.26
                        • Published

                        @jtsang/fetcher-mcp

                        MCP server for fetching web content using Playwright browser

                          • v0.3.3
                          • 20.12
                          • Published

                          n8n-nodes-firecrawl-tool

                          n8n node for Firecrawl v2 API - Web scraping, crawling, and data extraction tool for workflows and AI agents

                          • v0.1.2
                          • 19.84
                          • Published

                          undetected-chromedriver-js

                          Node.js wrapper for undetected-chromedriver with automatic setup and cross-platform support

                          • v1.2.2
                          • 19.76
                          • Published

                          @iflow-mcp/firecrawl-mcp

                          MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, batch processing, structured data extraction, and LLM-powered content analysis.

                          • v1.12.0
                          • 19.70
                          • Published

                          @iflow-mcp/agentql-mcp

                          Model Context Protocol (MCP) server that integrates AgentQL data extraction capabilities.

                          • v1.0.0
                          • 19.70
                          • Published

                          @crawlbase/mcp

                          MCP server for Crawlbase API - enables web scraping through Model Context Protocol

                          • v1.0.3
                          • 19.67
                          • Published

                          playwright-vision-mcp

                          n8n MCP server with Playwright browser automation capabilities

                          • v2.0.0
                          • 19.49
                          • Published

                          mcp-server-image-extractor

                          MCP server for extracting and categorizing images from web pages with intelligent classification

                            • v1.0.8
                            • 19.42
                            • Published

                            @micrawl/mcp-server

                            Model Context Protocol (MCP) server for web scraping with Micrawl - exposes scraping capabilities to AI assistants

                            • v0.1.0
                            • 19.34
                            • Published

                            @bcoders.gr/eth-scrapper

                            Electron web scraper for Etherscan transactions - External and Internal transaction hash extractor

                            • v1.4.0
                            • 19.34
                            • Published

                            cparse

                            一个基于 Cheerio 的 HTML 解析和数据提取工具库

                            • v2.2.0
                            • 19.27
                            • Published

                            sigaa-api-cefetmg

                            API for CEFETMG-SIGAA plataform, forked from sigaa-api project.

                            • v1.0.35
                            • 19.21
                            • Published

                            @micrawl/core

                            Core scraping engine for Micrawl - supports Playwright and HTTP drivers with multi-format output

                            • v0.1.0
                            • 19.16
                            • Published

                            web-feed-mcp

                            Local Browser MCP Server for web automation with Playwright integration

                            • v1.0.1-beta.4
                            • 19.12
                            • Published

                            @langgraph-js/crawler

                            A powerful web crawler designed specifically for LLM applications, capable of extracting clean, readable content from various web pages and converting it to Markdown format.

                            • v1.7.0
                            • 18.97
                            • Published

                            devchrome-mcp

                            MCP (Model Context Protocol) сервер для работы с браузером через Puppeteer

                            • v1.8.1
                            • 18.96
                            • Published

                            linktree-parser

                            linktree-parser is a TypeScript library for scraping and extracting account, links, banners, and metadata from Linktree profiles.

                            • v1.5.0
                            • 18.66
                            • Published

                            @armand1m/papercut

                            Papercut is a scraping/crawling library for Node.js, written in Typescript.

                            • v2.0.5
                            • 18.50
                            • Published

                            @speed_of/imdbscraper

                            IMDb scraper for extracting movie reviews from IMDb pages.

                            • v1.0.7
                            • 18.50
                            • Published

                            webscraping-ai-mcp

                            Model Context Protocol server for WebScraping.AI API. Provides LLM-powered web scraping tools with Chromium JavaScript rendering, rotating proxies, and HTML parsing.

                            • v1.0.2
                            • 18.49
                            • Published

                            @watercrawl/mcp

                            A Model Context Protocol (MCP) server for WaterCrawl, enabling AI systems to perform web crawling and search operations

                            • v1.2.0
                            • 18.48
                            • Published

                            @nathanclevenger/googlethis

                            A simple yet powerful module to retrieve organic search results and much more from Google.

                            • v1.8.3
                            • 18.47
                            • Published

                            @mcprelay/client

                            MCP client for MCPRelay proxy service - provides web access for AI agents

                            • v1.0.7
                            • 18.43
                            • Published

                            @supadata/mcp

                            MCP server for Supadata video & web scraping integration. Features include YouTube, TikTok, Instagram, Twitter, and file video transcription, web scraping, batch processing and structured data extraction.

                            • v1.0.1
                            • 18.43
                            • Published

                            osu-droid-scraping

                            Gather information of an osu!droid user via web scraping.

                            • v0.0.2
                            • 18.36
                            • Published

                            @fwdslsh/inform

                            A high-performance web crawler powered by Bun that downloads pages and converts them to Markdown

                              • v0.1.4
                              • 18.35
                              • Published

                              multi-dictionary-scraper

                              Professional multi-dictionary scraper supporting WordReference and Linguee with unified API, TypeScript definitions, and comprehensive language coverage for 1000+ language pairs.

                              • v1.1.6
                              • 18.24
                              • Published

                              googlethis-augmented

                              A fork of googlethis to get specific data for different needs.

                              • v0.0.2--canary.1.4307381028.0
                              • 18.03
                              • Published

                              web-structure

                              A powerful and flexible web scraping library with concurrent processing and DOM hierarchy awareness

                              • v1.0.2
                              • 18.02
                              • Published

                              @clado-ai/mcp

                              Clado Model Context Protocol Server

                              • v1.0.34
                              • 17.95
                              • Published

                              novel-scraper

                              Made to scraping novels with Puppeter

                              • v9.0.0
                              • 17.93
                              • Published

                              undetected-puppeteer

                              1:1 (sorta) replacement for Puppeteer, but undetected

                              • v1.0.1
                              • 17.72
                              • Published

                              olx-mcp

                              OLX MCP server that enables Claude Desktop to browse and search OLX listings across multiple domains (PT, PL, BG, RO, UA)

                              • v1.0.9
                              • 17.69
                              • Published

                              finview

                              A command-line tool for monitoring financial data and market trends in real-time directly from your terminal.

                              • v1.0.5
                              • 17.67
                              • Published

                              auto-captcha-solver

                              Automatically detect and solve various captcha types in Playwright & Puppeteer with 2Captcha/CapMonster Cloud integration

                              • v1.3.7
                              • 17.52
                              • Published

                              google-reviews-api

                              A simple Node.js library to fetch Google Maps reviews

                              • v1.0.6
                              • 17.49
                              • Published

                              web-page-analyzer-cli

                              A powerful website link crawler with support for deep crawling, authentication, and page analysis

                              • v1.0.19
                              • 17.48
                              • Published

                              @pricething/curl

                              A TypeScript wrapper around cURL-impersonate.

                                • v1.1.6
                                • 17.31
                                • Published

                                puppeteer-dsl

                                An intuitive DSL for Puppeteer, simplifying web automation and testing. Currently in alpha, subject to changes.

                                • v0.0.17
                                • 17.22
                                • Published

                                docs-to-markdown

                                Tool for automatically analyzing and summarizing library documentation for use with LLMs

                                  • v1.0.0
                                  • 17.22
                                  • Published

                                  linkd-mcp

                                  Linkd Model Context Protocol Server

                                  • v1.0.25
                                  • 17.20
                                  • Published

                                  mcp-url-to-text

                                  MCP server for converting URLs to text using urltoany.com

                                  • v1.0.0
                                  • 17.10
                                  • Published

                                  @rpidanny/google-scholar

                                  A minimal TypeScript library for fetching and parsing Google Scholar pages.

                                  • v3.3.0
                                  • 16.93
                                  • Published

                                  crawlee-scraper-toolkit

                                  A comprehensive TypeScript toolkit for building robust web scrapers with Crawlee, featuring maximum configurability and a CLI generator

                                  • v2.0.2
                                  • 16.86
                                  • Published

                                  mcp-crawl4ai

                                  MCP server for advanced web scraping with Crawl4AI - supports authentication, dynamic content, and AI extraction

                                  • v1.0.6
                                  • 16.74
                                  • Published

                                  @crawlus/cheerio

                                  Cheerio-based crawler for server-side HTML parsing and extraction

                                  • v0.6.0
                                  • 16.65
                                  • Published

                                  ts-jobspy

                                  TypeScript job scraper for LinkedIn, Indeed, Glassdoor, ZipRecruiter & more - rewritten from python-jobspy

                                  • v1.3.1
                                  • 16.64
                                  • Published

                                  doc-to-readable

                                  Universal document-to-markdown and section splitter for HTML, URLs, and PDFs.

                                  • v1.5.3
                                  • 16.56
                                  • Published

                                  @nrjdalal/google-parser

                                  Google parser is a lightweight yet powerful Google Search results scraper/parser built on an HTTP client and designed to send browser-like requests out of the box. This is essential in web scraping for blending in with regular website traffic.

                                  • v2.3.0
                                  • 16.32
                                  • Published

                                  puppeteer-server

                                  Custom MCP server for browser automation using Puppeteer

                                  • v2.1.1
                                  • 16.10
                                  • Published

                                  @crawlus/puppeteer

                                  Puppeteer-based crawler for Chrome automation and dynamic content scraping

                                  • v0.6.0
                                  • 16.04
                                  • Published

                                  @crawlus/utils

                                  Utility functions for web crawling - sitemap processing, link extraction, system info

                                  • v0.9.0
                                  • 15.99
                                  • Published

                                  @crawlus/api

                                  API crawler for REST and GraphQL endpoint crawling with auto-detection

                                  • v0.9.0
                                  • 15.97
                                  • Published

                                  fdy-scraping

                                  `fdy-scraping` is a versatile HTTP client designed for making API requests with support for proxy configuration, debugging, and detailed error handling. It utilizes the [`got-scraping`](https://github.com/apify/got-scraping) library for HTTP operations.

                                  • v1.0.3
                                  • 15.96
                                  • Published

                                  bitbuffet

                                  TypeScript SDK for the Structured Scraper API - BitBuffet

                                  • v1.0.2
                                  • 15.86
                                  • Published

                                  node-wreq

                                  Browser fingerprint bypass library using Rust for TLS/HTTP2 impersonation

                                  • v0.2.0
                                  • 15.72
                                  • Published

                                  @crawlus/http

                                  HTTP crawler for basic web scraping without JavaScript execution

                                  • v0.9.0
                                  • 15.69
                                  • Published

                                  braid-video-downloader

                                  A powerful TypeScript library for downloading videos from web pages, including M3U8/HLS streams, with browser automation and intelligent stream detection

                                    • v1.0.2
                                    • 15.56
                                    • Published

                                    crunchyroll-toolkit

                                    A complete Node.js toolkit for extracting anime data, metadata, and thumbnails from Crunchyroll, using 2024/2025 anti-detection techniques

                                    • v1.1.1
                                    • 15.51
                                    • Published

                                    web-scraper-pro

                                    Professional web scraper with Puppeteer & Mozilla Readability. Extract clean content from any website with full TypeScript support.

                                    • v1.1.1
                                    • 15.41
                                    • Published

                                    ayakashi

                                    The next generation web scraping framework

                                    • v1.0.0-beta8.4
                                    • 15.39
                                    • Published

                                    @crawlus/playwright

                                    Playwright-based crawler for full browser automation and JavaScript rendering

                                    • v0.6.0
                                    • 15.36
                                    • Published

                                    @lyuboslavlyubenov/se-scraper

                                    A module using puppeteer to scrape several search engines such as Google, Bing and Duckduckgo

                                    • v1.9.12
                                    • 15.36
                                    • Published

                                    @crawlus/core

                                    Core crawler framework functionality - TypeScript web crawling library

                                    • v0.9.0
                                    • 15.28
                                    • Published

                                    url-to-json-markdown

                                    A TypeScript library that fetches URLs and converts them to structured JSON and Markdown format.

                                    • v1.0.7
                                    • 15.25
                                    • Published

                                    llm-gen

                                    A CLI tool to extract text from a static Next.js export and generate llm.txt for LLM ingestion.

                                    • v1.0.3
                                    • 15.19
                                    • Published

                                    markdown-crawler

                                    A powerful web crawler that extracts content from web pages and converts them to clean Markdown format, with support for code blocks and GitHub Flavored Markdown

                                    • v1.0.18
                                    • 15.19
                                    • Published

                                    mirror-web-cli

                                    Professional website mirroring tool with intelligent framework preservation, AI-powered analysis, and comprehensive asset optimization

                                    • v1.1.3
                                    • 15.09
                                    • Published

                                    n8n-nodes-exa-websets

                                    n8n node for Exa Websets API - Create, manage, and query structured datasets from web sources

                                    • v1.0.2
                                    • 15.02
                                    • Published

                                    @juhomat/hexagonal-ai-framework

                                    AI-powered hexagonal framework with OpenAI integration, database adapters, web scraping, and Next.js demo application

                                    • v0.2.2
                                    • 15.02
                                    • Published

                                    @zentus/googlethis

                                    A simple yet powerful module to retrieve organic search results and much more from Google.

                                    • v1.7.1
                                    • 15.00
                                    • Published

                                    @mseep/firecrawl-mcp

                                    MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, batch processing, structured data extraction, and LLM-powered content analysis.

                                    • v1.9.0
                                    • 14.99
                                    • Published

                                    anil-brd-typescript-sdk

                                    TypeScript SDK for Bright Data APIs - Web Unlocker, SERP, and Scraper APIs

                                    • v1.0.1
                                    • 14.92
                                    • Published

                                    supapup

                                    ⚡ Lightning-fast MCP browser dev tool. Navigate → Get instant structured data. No screenshots needed! Puppeteer: 📸 → CSS selectors → JS eval. Supapup: semantic IDs ready to use. 10x faster, 90% fewer tokens.

                                    • v0.1.31
                                    • 14.87
                                    • Published

                                    cheerio-mcp

                                    MCP server for web scraping with Cheerio

                                      • v1.2.2
                                      • 14.74
                                      • Published

                                      @crawlus/cli

                                      Command-line interface for creating and managing crawler projects

                                      • v0.6.0
                                      • 14.63
                                      • Published

                                      @subtitles/providers

                                      Providers are the core of applications: they are where the subtitles are collected. Each provider exports a unique strategy for gathering data. From legendastv's web scraping to opensubtitle API usage, you can collect subtitles from your favorite TV shows and mo

                                      • v0.3.0-beta.2
                                      • 14.38
                                      • Published

                                      web2os

                                      Scrape the web asynchronously and live, reusing Node.js, all in one file, with just a few lines!

                                      • v1.1.0
                                      • 14.38
                                      • Published

                                      url-to-markdown-cli-tool

                                      CLI tool for converting web pages to clean, LLM-friendly markdown. Fetches content from URLs and converts HTML to optimized markdown format perfect for LLM training, RAG systems, and AI applications.

                                      • v1.1.0
                                      • 14.37
                                      • Published

                                      link-view

                                      A Node.js package to generate link previews from URLs

                                        • v1.0.3
                                        • 14.21
                                        • Published

                                        anna-archieve

                                        A powerful Node.js tool for searching and downloading books from Anna's Archive with Cloudflare bypass

                                        • v1.0.0
                                        • 14.20
                                        • Published

                                        @cifumo/scraper-node

                                        A scraper module created by Sxyz and SuzakuTeam to make using scrapers easier in both ESM and CJS projects.

                                          • v1.1.0
                                          • 14.16
                                          • Published

                                          search-engine-scraper

                                          A module using puppeteer to scrape several search engines such as Google, Bing

                                          • v1.0.0
                                          • 14.10
                                          • Published

                                          n8n-nodes-my-browserless

                                          n8n node to interact with a Browserless instance for web scraping

                                            • v1.0.5
                                            • 14.09
                                            • Published

                                            xscrape

                                            A flexible and powerful library designed to extract and transform data from HTML documents using user-defined schemas

                                            • v3.0.4
                                            • 14.00
                                            • Published

                                            apify-sdk-legacy

                                            Package for Crawlee that should allow you to import and use packages that depend on an older version of the Apify SDK.

                                            • v1.0.5
                                            • 13.99
                                            • Published

                                            alou-fetch-mcp

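                                            MCP server for fetching web page content in HTML, Markdown, plain text, and JSON formats, with special optimizations for fetching WeChat Official Account articles and academic papers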

                                            • v1.0.0
                                            • 13.84
                                            • Published

                                            stepwright

                                            A powerful web scraping library built with Playwright

                                            • v1.0.2
                                            • 13.69
                                            • Published

                                            tester-scraper

                                            A scraper module created by Sxyz and SuzakuTeam to make using scrapers easier in both ESM and CJS projects.

                                              • v1.1.7
                                              • 13.68
                                              • Published

                                              selektra

                                              Easily generate unique and optimized CSS or XPath selectors for any DOM element.

                                              • v1.0.5
                                              • 13.44
                                              • Published

                                              stonkinator

                                              A low-level stock data aggregation tool: a boring lib for others to build upon

                                              • v1.0.0
                                              • 13.29
                                              • Published

                                              nepali-news-scraper

                                              A powerful Node.js package to scrape news from popular Nepali news portals including Kathmandu Post and Kantipur

                                              • v1.0.1
                                              • 13.23
                                              • Published

                                              @iflow-mcp/logo-mcp

                                              An MCP server for intelligent logo extraction and processing that automatically detects and extracts logo icons from website URLs

                                              • v1.0.0
                                              • 13.04
                                              • Published

                                              @imaginerlabs/user-agent-generator

                                              High-performance, configurable, batch-generating User-Agent spoofing library. Supports multiple browsers, devices, and returns detailed meta information. Perfect for web scraping, automated testing, proxy pools and more.

                                              • v1.0.2
                                              • 13.04
                                              • Published

                                              @darkbing/knowledge-retrieval

                                              A powerful web crawler and knowledge processing toolkit for extracting and managing web content

                                              • v1.0.2
                                              • 13.03
                                              • Published

                                              extreme-scrap

                                              Convert any webpage to markdown using headless Chrome

                                              • v1.0.10
                                              • 12.99
                                              • Published

                                              crawltojson

                                              Crawl websites and convert them to JSON with ease

                                                • v1.11.11
                                                • 12.91
                                                • Published

                                                notice-alert-cli

                                                A Node.js application that fetches examination results and other notices from the IOE and IOM websites and sends desktop notifications.

                                                • v2.0.0
                                                • 12.89
                                                • Published

                                                @md-anas-sabah/async-task-runner

                                                Powerful async task runner for Node.js with concurrency control, smart retries, timeouts & comprehensive reporting. Perfect for web scraping, API processing, file operations & bulk async operations.

                                                • v1.0.2
                                                • 12.80
                                                • Published

                                                html-content-processor

                                                A professional library for processing, cleaning, filtering, and converting HTML content to Markdown. Features advanced customization options, presets, plugin support, fluent API, and TypeScript integration for reliable content extraction.

                                                • v1.0.5
                                                • 12.64
                                                • Published

                                                @sxyzdev/scrapers

                                                A simple all-in-one scrapers module to meet your data collection needs from 50+ websites!

                                                • v0.0.5
                                                • 12.62
                                                • Published

                                                @langgraph-js/crawler-mcp

                                                A powerful web crawler designed specifically for LLM applications, capable of extracting clean, readable content from various web pages and converting it to Markdown format.

                                                • v1.5.3
                                                • 12.34
                                                • Published

                                                scrape-them-all

                                                🚀 An easy-to-handle Node.js scraper that allows you to scrape them all in record time.

                                                • v2.0.0
                                                • 12.20
                                                • Published

                                                proxy-auto-ts

                                                A comprehensive TypeScript library for automatic proxy management with validation, rotation, and intelligent selection

                                                • v1.1.2
                                                • 12.17
                                                • Published

                                                pathik

                                                High-performance web crawler implemented in Go with JavaScript bindings

                                                  • v0.3.11
                                                  • 12.17
                                                  • Published

                                                  @mseep/puremd-mcp

                                                  Model Context Protocol (MCP) server for pure.md, the markdown delivery network for LLMs

                                                  • v1.0.3
                                                  • 12.12
                                                  • Published

                                                  @jambudipa/spider

                                                  A comprehensive web scraping library with resumable operations, middleware support, and built-in rate limiting

                                                  • v0.2.1
                                                  • 12.08
                                                  • Published

                                                  hyperscraped-follower

                                                  Lightweight scraper written in TypeScript using ES6 generators.

                                                  • v1.0.6
                                                  • 12.00
                                                  • Published

                                                  qserp

                                                  Robust Node.js module for Google Custom Search with rate limiting, error handling, and offline testing capabilities. Supports parallel searches and comprehensive result formatting.

                                                  • v1.0.9
                                                  • 11.90
                                                  • Published