JSPM

Found 1180 results for crawler

npm-license-crawler

Analyzes license information for multiple node.js modules (package.json files) as part of your software project.

  • v0.2.1
  • 50.71
  • Published

isbot-fast

JavaScript module detecting bots/crawlers/spiders via user-agent

  • v1.2.0
  • 50.29
  • Published

notion-md-crawler

A library to recursively retrieve and serialize Notion pages with customization for machine learning applications.

  • v1.0.2
  • 49.85
  • Published

apify

The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

  • v3.4.4
  • 49.12
  • Published

web-auto-extractor

Automatically extracts structured information from webpages

  • v1.0.17
  • 47.51
  • Published

spider-detector

A tiny node module to detect spiders/crawlers quickly and comes with optional middleware for ExpressJS

  • v2.1.0
  • 46.55
  • Published

es6-crawler-detect

An ES6 adaptation of the original PHP library CrawlerDetect that helps you detect bots/crawlers/spiders via the user agent.

  • v4.0.2
  • 45.25
  • Published

pdfdataextract

Extract data from a pdf with pure javascript

  • v4.0.0
  • 44.88
  • Published

sitemap-generator

Easily create XML sitemaps for your website.

  • v8.5.1
  • 44.81
  • Published

puppeteer-afp

Stop website fingerprinting techniques

  • v1.1.6
  • 43.38
  • Published

crawler

Crawler is a ready-to-use web spider that works with proxies, asynchrony, rate limit, configurable request pools, jQuery, and HTTP/2 support.

  • v2.0.2
  • 43.14
  • Published

@nodebb/spider-detector

A tiny node module to detect spiders/crawlers quickly and comes with optional middleware for ExpressJS

  • v2.0.3
  • 42.66
  • Published

robots-txt-parser

A lightweight robots.txt parser for Node.js with support for wildcards, caching and promises.

  • v2.0.3
  • 42.07
  • Published

firecrawl

JavaScript SDK for Firecrawl API

  • v4.3.1
  • 41.68
  • Published

node-scrapy

Simple, lightweight and expressive web scraping with Node.js

  • v0.5.0
  • 41.21
  • Published

sqreen

Node.js agent for Sqreen, please see https://www.sqreen.io/

  • v2.0.2
  • 40.47
  • Published

crawler-request

HTTP request module customized for crawlers.

  • v1.2.2
  • 38.69
  • Published

beautiful-dom

Beautiful-dom is a lightweight library that mirrors the capabilities of the HTML DOM API needed for parsing crawled HTML/XML pages. It models the methods and properties of HTML nodes that are relevant for extracting data from HTML nodes. It is written in

  • v1.0.9
  • 36.88
  • Published

recrawl

  • v2.2.1
  • 36.73
  • Published

crawlab-sdk

Node.js SDK for Crawlab

  • v0.6.0-12
  • 36.65
  • Published

playwright-afp

Stop website fingerprinting techniques playwright edition

  • v0.0.3
  • 36.60
  • Published

hyperbrowser-mcp

Hyperbrowser Model Context Protocol Server

  • v1.0.25
  • 35.44
  • Published

@algolia/netlify-plugin-crawler

This plugin links your Netlify site with Algolia's Crawler. It will trigger a crawl on each successful build.

  • v1.0.15
  • 35.27
  • Published

rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

  • v1.0.19
  • 35.23
  • Published

crawlbase

Dependency-free module for scraping and crawling websites using the [Crawlbase](https://crawlbase.com) API

  • v1.0.2
  • 34.80
  • Published

websearch-mcp

A Model Context Protocol (MCP) server implementation that provides real-time web search capabilities through a simple API

  • v1.0.3
  • 34.21
  • Published

@brightsec/cli

Bright CLI is a CLI tool that can initialize, stop, poll and maintain scans in Bright solutions.

  • v13.7.0
  • 33.92
  • Published

better-fetch-mcp

Advanced MCP server for web scraping with nested URL fetching and intelligent markdown formatting

    • v1.0.0
    • 33.36
    • Published

    grunt-link-checker

    Finds broken links and resources on websites

    • v0.2.0
    • 33.28
    • Published

    simple-headless-chrome

    Headless Chrome abstraction to simplify the interaction with the browser. It may be used for crawling sites, test automation, etc

    • v4.3.10
    • 33.25
    • Published

    node-html-crawler

    Crawler (spider) of site web pages by domain name

    • v1.2.3
    • 33.12
    • Published

    chowdown

    A JavaScript library that allows for the quick transformation of DOM documents into useful formats.

    • v1.2.6
    • 32.96
    • Published

    puremd-mcp

    Model Context Protocol (MCP) server for pure.md, the markdown delivery network for LLMs

    • v1.0.3
    • 32.82
    • Published

    web-structure

    A powerful and flexible web scraping library with concurrent processing and DOM hierarchy awareness

    • v1.0.2
    • 32.74
    • Published

    linkd-mcp

    Linkd Model Context Protocol Server

    • v1.0.25
    • 32.74
    • Published

    @turingnova/robots

    Next.js robots.tsx generator - Automatically create and serve robots.txt for Next.js applications

      • v1.0.21
      • 32.32
      • Published

      fiftyone.devicedetection

      Parse HTTP headers to detect the device type, model, operating system, browser, and crawler information

      • v4.4.210
      • 31.79
      • Published

      cheerio-httpcli

      http client module with cheerio & iconv(-lite) & promise

      • v0.8.3
      • 31.71
      • Published

      webhead

      An easy-to-use Node web crawler storing cookies, following redirects, traversing pages and submitting forms.

      • v1.1.3
      • 31.45
      • Published

      web-parser-mcp

      🚀 MCP SERVER FIXED v3.7.9! Resolved import errors, middleware conflicts, type hints - NOW WORKING PERFECTLY!

      • v3.7.9
      • 30.99
      • Published

      web-page-analyzer-cli

      A powerful website link crawling tool that supports deep crawling, authentication, and page analysis

      • v1.0.19
      • 30.33
      • Published

      usetube

      Crawl YouTube without an API key (search videos and channels, or get all videos of a channel/playlist)

      • v2.2.7
      • 30.25
      • Published

      @crawlee/impit-client

      impit-based HTTP client implementation for Crawlee. Impersonates browser requests to avoid bot detection.

      • v3.14.1
      • 30.10
      • Published

      taki

      Take a snapshot of any website.

      • v3.0.0
      • 29.53
      • Published

      @folder/readdir

      Recursively read a directory, blazing fast.

      • v3.1.0
      • 29.50
      • Published

      torrent-search-api

      Yet another node torrent scraper based on x-ray. (Support iptorrents, torrentleech, torrent9, Yyggtorrent, ThePiratebay, torrentz2, 1337x, KickassTorrent, Rarbg, TorrentProject, Yts, Limetorrents, Eztv)

      • v2.1.4
      • 28.85
      • Published

      fakebrowser

      🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and keyboard operations to behave like a real person.

      • v0.0.66
      • 28.70
      • Published

      @fwdslsh/inform

      A high-performance web crawler powered by Bun that downloads pages and converts them to Markdown

        • v0.1.3
        • 28.51
        • Published

        @just-every/crawl

        Fast, token-efficient web content extraction - fetch web pages and convert to clean Markdown

        • v1.0.8
        • 27.39
        • Published

        node-site-downloader

        An easy to use CLI for downloading websites for offline usage

        • v1.3.0
        • 27.36
        • Published

        gulp-license-crawler

        Analyzes license information for multiple node.js modules (package.json files) as part of your software project.

        • v0.0.10
        • 27.36
        • Published

        @spider-rs/spider-rs

        The [spider](https://github.com/spider-rs/spider) project ported to Node.js

        • v0.0.157
        • 27.07
        • Published

        extract-email

        A simple email extractor for obfuscated emails.

        • v1.1.3
        • 27.00
        • Published

        hltv

        The unofficial HLTV Node.js API

        • v3.5.0
        • 26.68
        • Published

        node-webcrawler

        Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously

        • v0.8.0
        • 26.60
        • Published

        googlebot-verify

        Verify that a request is from Google using Google's recommended DNS verification steps

        • v0.1.3
        • 26.49
        • Published

        node-spider

        Generic web crawler powered by Node.js

        • v1.4.1
        • 26.26
        • Published

        roboto

        A web crawler for Nodejs.

        • v0.8.2
        • 26.24
        • Published

        express-nobots

        Keep Bots Away From Your Express App

        • v1.0.5
        • 26.13
        • Published

        @kdinisv/sql-scanner

        Smart SQL injection scanner with crawler and optional Playwright capture.

        • v0.2.4
        • 25.91
        • Published

        novel-downloader

        Novel downloader for node-novel style; includes sites (dmzj / wenku8 / syosetu / ...etc)

        • v2.0.40
        • 25.75
        • Published

        @akukral/site-comparator

        A sophisticated website comparison tool with intelligent content analysis and offset-aware difference detection

        • v1.2.2
        • 25.74
        • Published

        nintendo-switch-eshop

        Unofficial API lib for Nintendo Switch eShop game listing and pricing information.

        • v8.0.1
        • 25.54
        • Published

        license-crawler

        Crawls an npm package and its dependencies for their licenses

        • v0.0.5
        • 25.50
        • Published

        js-crawler

        Web crawler for Node.js

        • v0.3.21
        • 25.30
        • Published

        @6digit/silktext

        Lightweight, runtime-safe crawling → clean Markdown

        • v0.1.5
        • 25.20
        • Published

        crawl-server

        Efficient SEO-focused server for Wasm-generated pages

        • v1.8.2
        • 25.12
        • Published

        funnelweb

        Detect search engine crawlers by their User-Agent strings.

        • v0.0.1
        • 24.99
        • Published

        x-crawl

        x-crawl is a flexible Node.js AI-assisted crawler library.

        • v10.1.0
        • 24.99
        • Published

        @letsscrapedata/controller

        Unified browser / HTML controller interfaces that support patchright, camoufox, playwright, puppeteer and cheerio

        • v0.0.68
        • 24.88
        • Published

        crawl-cli

        A Node crawler/scraper for retrieving data from websites

          • v0.2.0
          • 24.60
          • Published

          semantic-crawler

          Priority based Semantic Web Crawler.

          • v0.0.2
          • 24.49
          • Published

          osmosis

          Web scraper for NodeJS

          • v1.1.10
          • 24.24
          • Published

          linkedin-jobs-scraper

          Scrape publicly available jobs on LinkedIn using a headless browser

          • v18.0.1
          • 24.22
          • Published

          crawler-ninja

          A plugin-based web crawler made for SEO. Please wait or contribute ... still in beta

          • v0.2.7
          • 23.69
          • Published

          mcp-smart-crawler

          A command-line tool acting as an MCP (ModelContextProtocol) server, using Playwright to crawl web content for AI models.

          • v1.0.10
          • 23.67
          • Published

          seo-checker

          A library for checking basic SEO signals of a website

          • v0.3.2
          • 23.63
          • Published

          @letsscrapedata/scraper

          Web scraper that scrapes web pages using LetsScrapeData XML templates

          • v0.0.87
          • 23.63
          • Published

          jopi-crawler

          A crawler for downloading websites

          • v1.0.4
          • 23.57
          • Published

          ghcrawler

          A robust GitHub API crawler that walks a queue of GitHub entities retrieving and storing their contents.

          • v0.2.23
          • 23.43
          • Published

          tse-client

          A client for fetching stock data from the Tehran Stock Exchange (TSETMC). Works in Browser, Node and as CLI.

          • v2.27.6
          • 23.06
          • Published

          @adncorp/apify

          The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

          • v2.7.6
          • 22.44
          • Published

          @morioh/is-bot

          Detect whether a user agent is a bot/spider/crawler

          • v1.1.2
          • 22.21
          • Published

          geoforge-cli

          Generate AI-ready optimization files for websites, including robots.txt, sitemaps, and AI manifests

          • v0.1.2
          • 22.13
          • Published

          website-crawler-sdk

          Node.js SDK for interacting with WebsiteCrawler.org API

          • v1.0.4
          • 22.13
          • Published

          @jambudipa/spider

          A comprehensive web scraping library with resumable operations, middleware support, and built-in rate limiting

          • v0.2.1
          • 22.03
          • Published

          udger-nodejs

          NodeJS User-Agent String Parser based on Udger SQLite databases https://udger.com/products/local_parser

          • v1.5.0
          • 21.70
          • Published

          puppeteer-cloak

          Secure your puppeteer for scraping

          • v1.0.6
          • 21.57
          • Published

          sauron-crawler

          Basic page crawler written in nodejs

          • v4.0.1
          • 21.47
          • Published

          huntsman

          Super configurable async web spider

          • v0.3.0
          • 21.35
          • Published

          scrapedin

          LinkedIn scraper for the 2020 website

          • v1.0.21
          • 21.33
          • Published

          @warren-bank/node-request-cli

          An extremely lightweight HTTP request client for the command-line. Supports: http, https, proxy, redirects, cookies, content-encoding, multipart/form-data, multi-threading, recursive website crawling and mirroring.

          • v4.0.25
          • 21.27
          • Published

          nest-crawler

          The easiest crawling and scraping module for NestJS

          • v1.9.0
          • 21.03
          • Published

          nodespider

          Simple, flexible, delightful web crawler/spider package

          • v0.11.4
          • 20.66
          • Published

          crawler-links

          Node.js web crawler to get all internal links from a website.

          • v1.0.1
          • 20.63
          • Published

          reporter-cli

          Crawler queue creation tool for paging

          • v0.2.5
          • 20.62
          • Published

          backstop-crawl

          Crawl a site to generate a backstopjs config

          • v2.3.1
          • 20.51
          • Published

          @crawlbyte/crawlbyte-sdk-ts

          Official TypeScript SDK for Crawlbyte – create tasks, poll results, and integrate data scraping into your JavaScript/TypeScript applications.

          • v1.0.1
          • 20.42
          • Published

          @supacrawler/js

          Typed TypeScript/JavaScript SDK for Supacrawler API (scrape, jobs, screenshots, watch)

          • v0.1.2
          • 20.36
          • Published

          @supadata/mcp

          MCP server for Supadata video & web scraping integration. Features include YouTube, TikTok, Instagram, Twitter, and file video transcription, web scraping, batch processing and structured data extraction.

          • v1.0.1
          • 20.29
          • Published

          nuxt3-bot-handler

          🛡️ Nuxt 3 middleware to block suspicious bots, protect SEO crawlers with reverse DNS checks, and enforce User-Agent rules.

          • v1.0.7-beta
          • 19.96
          • Published

          snapcrawl-vercel-ssr

          Vercel integration for SnapCrawl. Serve pre-rendered HTML to crawlers in Next.js middleware or Edge Functions for static SPAs and Express apps.

            • v1.3.7
            • 19.84
            • Published

            @dotsur/link-harvest

            Deterministic link harvesting for QA and website migration testing

            • v1.0.1
            • 19.68
            • Published

            flysh

            DOM Document Object Artifact Collector

            • v1.2.0
            • 19.43
            • Published

            images-downloader

            A Node.js module for downloading a single image or multiple images to disk from a given URL (checks that the URL exists and detects the image type)

            • v1.0.3
            • 19.42
            • Published

            anydownload

            A powerful website downloader with GUI support

            • v1.2.0
            • 19.33
            • Published

            supercrawler

            A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

            • v2.0.0
            • 19.28
            • Published

            spankbang

            spankbang.com api implementation

            • v0.0.9
            • 19.06
            • Published

            site-audit-seo

            Web service and CLI tool for SEO site audit: crawl site, lighthouse all pages, view public reports in browser. Also output to console, json, csv.

            • v6.0.1
            • 19.05
            • Published

            advanced-seo-checker

            A library for checking basic SEO signals of a website

            • v3.2.0
            • 19.02
            • Published

            crawlee-one

            CrawleeOne is a framework built on top of Crawlee and Apify for writing robust and highly configurable web scrapers

            • v2.0.4
            • 18.97
            • Published

            get-all-links

            A Node crawler that returns all links/hrefs from a website

            • v1.0.2
            • 18.68
            • Published

            @devjoyvn/fakebrowser

            🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and keyboard operations to behave like a real person.

            • v0.0.67
            • 18.46
            • Published

            goose-parser

            Multi environment web page parser

            • v0.6.1
            • 18.35
            • Published

            express-bot

            Crawler(robots) decision middleware for Express

            • v1.0.7
            • 18.28
            • Published

            @upstash/search-crawler

            A CLI tool to crawl documentation sites and create a search index for Upstash Search.

            • v0.2.0
            • 18.16
            • Published

            site2pdf-cli

            Generate comprehensive PDFs of entire websites, ideal for RAG.

            • v0.1.10
            • 18.12
            • Published

            @hardbulls/wbsc-crawler

            Tool to crawl events, leagues and statistics from WBSC based websites.

            • v0.6.1
            • 18.01
            • Published

            bas

            Behaviour Assertion Sheets: CSS-like declarative syntax for client-side integration testing and quality assurance.

            • v0.1.1
            • 18.01
            • Published

            syphonx

            SyphonX is a tool that extracts data from HTML data, transforming it into JSON of any shape or size. It combines the power of CSS Selectors and jQuery, Regular Expressions, and Javascript into a declarative template format to elegantly solve the simplest

            • v1.2.66
            • 17.89
            • Published

            email-extractor

            Extracts email addresses from a website by following links

            • v0.2.9
            • 17.75
            • Published

            salticidae

            A utility library to make downloading & extracting specific content from a URL easy

            • v0.10.0
            • 17.70
            • Published

            @iflow-mcp/firecrawl-mcp

            MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, batch processing, structured data extraction, and LLM-powered content analysis.

            • v1.12.0
            • 17.59
            • Published

            puppeteer-prerender

            Fetch the pre-rendered content, meta, links and Open Graph of a webpage, especially Single-Page Application (SPA)

            • v0.14.0
            • 17.59
            • Published

            aliexpress-product-scraper

            Get AliExpress product details as a JSON response, including feedback, variants, description, images, etc.

              • v2.0.2
              • 17.54
              • Published

              @justcooldev/slwcrawl

              Crawl and download Snap Lenses from *lens.snapchat.com* with ease.

              • v1.2.4
              • 17.38
              • Published

              directory-crawler

              The directory crawler library for Node.JS

              • v0.0.6
              • 17.20
              • Published

              vue-seo-helper

              A Vue3 plugin to improve SEO and crawler accessibility

                • v1.0.0
                • 16.85
                • Published

                silktext

                Lightweight, runtime-safe crawling → clean Markdown

                • v0.1.0
                • 16.71
                • Published

                gin-downloader

                Simple manga scraper for famous online manga websites.

                • v2.0.0-beta.6
                • 16.57
                • Published

                flixhq-core

                Node.js library that provides an API for obtaining movie information from the FlixHQ website.

                • v1.1.1
                • 16.50
                • Published

                page-scraper

                Web page scraper with a jQuery-like syntax for Node.

                • v2.0.5
                • 16.49
                • Published

                schabbi-webscraper

                Lightweight and easy to use crawling solution for websites.

                • v1.2.2
                • 16.27
                • Published

                axe-crawler

                A highly configurable website crawler for automatically testing a website for accessibility issues using the axe-core library. Uses selenium and headless Chrome to load pages, inject axe-core, and run tests. Generates an html summary report in addition

                • v0.5.5
                • 16.26
                • Published

                @acq/environ

                Environment variable collector

                • v0.4.0
                • 16.26
                • Published

                kaiser-crawler

                Node.js module for crawling the web

                • v1.0.5
                • 16.19
                • Published

                headless-crawler

                A crawler implemented using a headless browser (Chrome).

                • v1.4.0
                • 15.95
                • Published

                @acq/acq

                A util tool

                • v0.4.0
                • 15.93
                • Published

                @acwink/movies-search-mcp

                Smart MCP tool to find and validate movie/tv-show resources with multiple sources support

                • v1.0.18
                • 15.86
                • Published

                gpapi

                Use the Google Play protobuf API in Node

                • v4.5.0
                • 15.78
                • Published

                @duyquangnvx/story-spider

                A TypeScript library for scraping stories from various Vietnamese websites

                • v2.0.2
                • 15.60
                • Published

                @langgraph-js/crawler

                A powerful web crawler designed specifically for LLM applications, capable of extracting clean, readable content from various web pages and converting it to Markdown format.

                • v1.7.0
                • 15.52
                • Published

                hydris

                Generic node service to handle SSR for SPA made with any kind of frontend framework

                • v1.3.0
                • 15.51
                • Published

                @crawlus/core

                Core crawler framework functionality - TypeScript web crawling library

                • v0.9.0
                • 15.43
                • Published

                @crawlus/utils

                Utility functions for web crawling - sitemap processing, link extraction, system info

                • v0.9.0
                • 15.43
                • Published

                xvideosx

                xvideos.com api implementation.

                • v1.6.4
                • 15.42
                • Published

                eztv-crawler

                A promise-based Node module to scrape TV shows, episodes and torrent info from EZTV.

                • v1.3.6
                • 15.42
                • Published

                crawlyx

                Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.

                • v2.2.5
                • 15.32
                • Published

                fakebrowser-dev

                🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and keyboard operations to behave like a real person.

                • v0.0.69-dev
                • 15.26
                • Published

                @crawlus/api

                API crawler for REST and GraphQL endpoint crawling with auto-detection

                • v0.9.0
                • 15.23
                • Published

                cardinalis

                A SOCKS and HTTP proxy written in Node.js for getting over the GWF

                  • v3.2.4
                  • 15.15
                  • Published

                  @crawlus/http

                  HTTP crawler for basic web scraping without JavaScript execution

                  • v0.9.0
                  • 14.82
                  • Published

                  scrapefrom

                  Scrape data from any webpage.

                  • v2.6.7
                  • 14.77
                  • Published

                  flexible

                  Easily build flexible, scalable, and distributed, web crawlers.

                  • v0.1.20
                  • 14.77
                  • Published

                  crawlercore

                  crawler with nodejs

                  • v1.5.51
                  • 14.71
                  • Published

                  crawlkit

                  A crawler based on Phantom. Allows discovery of dynamic content and supports custom scrapers.

                  • v2.0.2
                  • 14.64
                  • Published

                  easydomscrapper

                  An extremely simple module for scraping DOM element(s) from the web

                  • v0.1.0
                  • 14.61
                  • Published

                  @gonetone/google-play-api

                  Access Google Play by logging in and making requests as an Android device!

                  • v1.3.1
                  • 14.61
                  • Published

                  floodesh

                  Floodesh is a distributed web spider/crawler written with Nodejs.

                  • v0.8.19
                  • 14.55
                  • Published

                  absolut-crawler

                  Functions for the ABSOLUT Mobile crawlers

                    • v1.0.20
                    • 14.55
                    • Published

                    @vjlanguage/mcp-vj-docs

                    MCP server for documentation crawling, indexing, and retrieval

                    • v0.1.72
                    • 14.55
                    • Published

                    googlebot

                    Express middleware that returns the resulting html after executing javascript, allowing crawlers to read on the page

                    • v0.1.41
                    • 14.49
                    • Published

                    bauer-crawler

                    Multi-thread crawler engine.

                    • v0.2.9
                    • 14.24
                    • Published

                    aliexpress-product-scraper-ts

                    Get AliExpress product details as a JSON response, including feedback, variants, description, images, etc.

                      • v2.0.25
                      • 14.14
                      • Published

                      @botmation/twitter

                      Auxiliary package of functions for the TypeScript framework Botmation

                      • v1.0.2
                      • 14.13
                      • Published

                      browser-bot-detector

                      A TypeScript library for detecting and categorizing bots from user agent strings

                        • v1.0.0
                        • 13.94
                        • Published

                        csdn-crawler

                        A JS library dedicated to crawling CSDN articles.

                        • v1.0.8
                        • 13.81
                        • Published

                        @citoyasha/yt-search

                        YouTube crawler with no API that returns the first 3 videos.

                        • v1.0.1
                        • 13.76
                        • Published

                        robotto

                        A robots.txt reader, parser and matcher.

                        • v1.0.16
                        • 13.76
                        • Published

                        @crawlus/playwright

                        Playwright-based crawler for full browser automation and JavaScript rendering

                        • v0.6.0
                        • 13.69
                        • Published

                        @crawlus/puppeteer

                        Puppeteer-based crawler for Chrome automation and dynamic content scraping

                        • v0.6.0
                        • 13.66
                        • Published

                        @crawlbase/mcp

                        MCP server for Crawlbase API - enables web scraping through Model Context Protocol

                        • v1.0.3
                        • 13.65
                        • Published

                        @opd/crawler

                        web crawler based on Puppeteer

                        • v1.7.0
                        • 13.63
                        • Published

                        godless

                        crawler

                        • v0.1.69
                        • 13.63
                        • Published

                        dbcrawler

                        Crawls a MySQL database and creates insert queries or returns data from multiple tables, depending on the relationship information of the tables provided

                        • v0.0.42
                        • 13.63
                        • Published

                        btcnews

                        A bitcoin news crawler

                        • v17.1.21
                        • 13.63
                        • Published

                        yggtorrent

                        Web crawler to use as API

                        • v2.0.3
                        • 13.63
                        • Published

                        wenku8

                        Light Novel Library (wenku8) downloader

                        • v4.0.0
                        • 13.56
                        • Published

                        scraply

                        A simple, configurable and functional content scraper

                          • v1.0.25
                          • 13.49
                          • Published

                          fits-api

                          Fast Implemented TikTok Scraping API

                          • v0.2.0
                          • 13.49
                          • Published

                          turbocrawl

                          The simple and fast crawling framework. So you can focus on scraping.

                          • v0.4.1
                          • 13.40
                          • Published

                          xscrape

                          A flexible and powerful library designed to extract and transform data from HTML documents using user-defined schemas

                          • v3.0.4
                          • 13.35
                          • Published

                          webscraping-ai-mcp

                          Model Context Protocol server for WebScraping.AI API. Provides LLM-powered web scraping tools with Chromium JavaScript rendering, rotating proxies, and HTML parsing.

                          • v1.0.2
                          • 13.30
                          • Published

                          @crawlus/cli

                          Command-line interface for creating and managing crawler projects

                          • v0.6.0
                          • 13.28
                          • Published

                          web-link-collector

                          A library and CLI tool to recursively collect links from a given initial URL and output them as structured data

                          • v1.0.10
                          • 13.27
                          • Published

                          mcp-jobs

                          A job search and crawling tool built with Model Context Protocol

                            • v1.4.0
                            • 13.24
                            • Published

                            graceful-playwright

                            Gracefully handle timeout and network error with auto retry.

                            • v1.5.1
                            • 13.20
                            • Published

                            icrawler

                            Tool for easy scraping data from websites

                            • v2.6.5
                            • 13.15
                            • Published

                            speedwalk

                            Walk an entire directory. Fast, simple, and asynchronous.

                            • v0.1.0
                            • 13.05
                            • Published

                            ppspider

                            Web spider supporting puppeteer, cheerio, and so on; includes a task queue and dispatcher

                            • v2.2.4-preview.1607350101966
                            • 12.98
                            • Published