Found 1180 results for crawler

@nodelib/fs.walk

A library for efficiently walking a directory recursively

fdir

The fastest directory crawler & globbing alternative to glob, fast-glob, & tiny-glob. Crawls 1m files in < 1s

puppeteer-extra-plugin-stealth

Stealth mode: Applies various techniques to make detection of headless puppeteer harder.

recrawl-sync

json-crawl

Async and sync crawler for JSON objects

apify-client

Apify API client for JavaScript

@crawlee/core

The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

simplecrawler

Very straightforward, event driven web crawler. Features a flexible queue interface and a basic cache mechanism with extensible backend.

@crawlee/browser

The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

@crawlee/playwright

The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

@crawlee/puppeteer

The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

@crawlee/jsdom

The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

@crawlee/templates

Templates for the crawlee projects

@crawlee/http

The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

@crawlee/cheerio

The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

crawlee

The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

@crawlee/cli

The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

@crawlee/linkedom

The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

firecrawl-mcp

MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, search, batch processing, structured data extraction, and LLM-powered content analysis.

npm-license-crawler

Analyzes license information for multiple node.js modules (package.json files) as part of your software project.

isbot-fast

JavaScript module detecting bots/crawlers/spiders via user-agent

notion-md-crawler

A library to recursively retrieve and serialize Notion pages with customization for machine learning applications.

apify

The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

web-auto-extractor

Automatically extracts structured information from webpages

spider-detector

A tiny node module to detect spiders/crawlers quickly and comes with optional middleware for ExpressJS

es6-crawler-detect

This is an ES6 adaptation of the original PHP library CrawlerDetect. This library will help you detect bots/crawlers/spiders via the user agent.

sitemap-generator

Easily create XML sitemaps for your website.

pdfdataextract

Extract data from a pdf with pure javascript

puppeteer-afp

Stop website fingerprinting techniques

crawler

Crawler is a ready-to-use web spider that works with proxies, asynchrony, rate limit, configurable request pools, jQuery, and HTTP/2 support.

@nodebb/spider-detector

A tiny node module to detect spiders/crawlers quickly and comes with optional middleware for ExpressJS

robots-txt-parser

A lightweight robots.txt parser for Node.js with support for wildcards, caching and promises.

firecrawl

JavaScript SDK for Firecrawl API

@the-convocation/twitter-scraper

A port of n0madic/twitter-scraper to Node.js.

node-scrapy

Simple, lightweight and expressive web scraping with Node.js

x-ray-crawler

x-ray's crawler

sqreen

Node.js agent for Sqreen, please see https://www.sqreen.io/

nodejs-web-scraper

A web scraper for NodeJs

limit-request-promise

http request for web scraping

crawler-request

HTTP request module customized for crawlers.

playwright-afp

Stop website fingerprinting techniques playwright edition

sitemap-generator-cli

Create xml sitemaps from the command line.

beautiful-dom

Beautiful-dom is a lightweight library that mirrors the capabilities of the HTML DOM API needed for parsing crawled HTML/XML pages. It models the methods and properties of HTML nodes that are relevant for extracting data from HTML nodes. It is written in

recrawl

crawlab-sdk

Node.js SDK for Crawlab

hyperbrowser-mcp

Hyperbrowser Model Context Protocol Server

@blocklet/crawler

blocklet crawler lib

rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

@algolia/netlify-plugin-crawler

This plugin links your Netlify site with Algolia's Crawler. It will trigger a crawl on each successful build.

crawlbase

Dependency free module for scraping and crawling websites using [Crawlbase](https://crawlbase.com) API

websearch-mcp

A Model Context Protocol (MCP) server implementation that provides real-time web search capabilities through a simple API

@brightsec/cli

Bright CLI is a CLI tool that can initialize, stop, poll and maintain scans in Bright solutions.

fiftyone.devicedetection.shared

Shared functionality for implementing device detection engine for the 51Degrees Pipeline API

fiftyone.devicedetection.onpremise

Device detection on-premise services for the 51Degrees Pipeline API

better-fetch-mcp

Advanced MCP server for web scraping with nested URL fetching and intelligent markdown formatting

grunt-link-checker

Finds broken links and resources on websites

simple-headless-chrome

Headless Chrome abstraction to simplify the interaction with the browser. It may be used for crawling sites, test automation, etc

node-html-crawler

Crawler (spider) of site web pages by domain name

chowdown

A JavaScript library that allows for the quick transformation of DOM documents into useful formats.

puremd-mcp

Model Context Protocol (MCP) server for pure.md, the markdown delivery network for LLMs

web-structure

A powerful and flexible web scraping library with concurrent processing and DOM hierarchy awareness

linkd-mcp

Linkd Model Context Protocol Server

fiftyone.devicedetection.cloud

Device detection cloud services for the 51Degrees Pipeline API

@turingnova/robots

Next.js robots.tsx generator - Automatically create and serve robots.txt for Next.js applications

@marbec/web-auto-extractor

Automatically extracts structured information from webpages

fiftyone.devicedetection

Parse HTTP headers to detect the device type, model, operating system, browser, and crawler information

cheerio-httpcli

http client module with cheerio & iconv(-lite) & promise

webhead

An easy-to-use Node web crawler storing cookies, following redirects, traversing pages and submitting forms.

web-parser-mcp

🚀 MCP SERVER FIXED v3.7.9! Resolved import errors, middleware conflicts, type hints - NOW WORKING PERFECTLY!

web-page-analyzer-cli

A powerful website link-crawling tool that supports deep crawling, authentication, and page analysis

usetube

Crawl YouTube without an API key (search videos and channels, or get all videos of a channel/playlist)

@crawlee/impit-client

impit-based HTTP client implementation for Crawlee. Impersonates browser requests to avoid bot detection.

@rane/web-auto-extractor

Automatically extracts structured information from webpages

@openactive/harvesting-utils

Utils library for harvesting RPDE feeds

taki

Take a snapshot of any website.

torrent-search-api

Yet another Node torrent scraper based on x-ray (supports iptorrents, torrentleech, torrent9, Yggtorrent, ThePiratebay, torrentz2, 1337x, KickassTorrent, Rarbg, TorrentProject, Yts, Limetorrents, Eztv)

@folder/readdir

Recursively read a directory, blazing fast.

fakebrowser

🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and keyboard operations to make behavior like a real person.

@fwdslsh/inform

A high-performance web crawler powered by Bun that downloads pages and converts them to Markdown

headless-chrome-crawler

Distributed web crawler powered by Headless Chrome

webcrawlerapi-js

JS client for WebcrawlerAPI

better-sitemap-crawler

To install:

@just-every/crawl

Fast, token-efficient web content extraction - fetch web pages and convert to clean Markdown

google-news-scraper

Lightweight async scraper for Google News

node-site-downloader

An easy to use CLI for downloading websites for offline usage

gulp-license-crawler

Analyzes license information for multiple node.js modules (package.json files) as part of your software project.

n8n-nodes-noticrawlee

a Test Node

@spider-rs/spider-rs

The [spider](https://github.com/spider-rs/spider) project ported to Node.js

extract-email

A simple email extractor for obfuscated emails.

hltv

The unofficial HLTV Node.js API

node-webcrawler

Crawler is a web spider written with Node.js. It gives you the full power of jQuery on the server to parse a large number of pages asynchronously as they are downloaded.

googlebot-verify

Verify that a request is from Google using Google's recommended DNS verification steps

@mendable/firecrawl

JavaScript SDK for Firecrawl API

roboto

A web crawler for Nodejs.

node-spider

Generic web crawler powered by Node.js

@qualweb/crawler

Webpage crawler for qualweb

express-nobots

Keep Bots Away From Your Express App

@crawlus/core

Core crawler framework functionality - TypeScript web crawling library

@attestate/crawler-call-block-logs

An attestate crawler strategy to download and transform Ethereum block event logs

@kdinisv/sql-scanner

Smart SQL injection scanner with crawler and optional Playwright capture.

@crawlus/utils

Utility functions for web crawling - sitemap processing, link extraction, system info

novel-downloader

Novel downloader for node-novel style; includes sites (dmzj / wenku8 / syosetu / etc.)

@gurumnyang/dcinside.js

Node.js library for crawling dcinside galleries

nintendo-switch-eshop

Unofficial API lib for Nintendo Switch eShop game listing and pricing information.

license-crawler

Crawls an npm package and its dependencies for their licenses

x-crawl

x-crawl is a flexible Node.js AI-assisted crawler library.

js-crawler

Web crawler for Node.js

node-email-extractor

Extract emails from text and also from a site page

@6digit/silktext

Lightweight, runtime-safe crawling → clean Markdown

crawl-server

Efficient SEO-focused server for Wasm-generated pages

funnelweb

Detect search engine crawlers by their User-Agent strings.

@waynechang65/ptt-crawler

A web crawler module designed to scrape data from Ptt.

@letsscrapedata/controller

Unified browser / HTML controller interfaces that support patchright, camoufox, playwright, puppeteer and cheerio

@ptrumpis/snap-lens-web-crawler

Crawl and download Snap Lenses from *lens.snapchat.com* with ease.

@rekttdoteth/agent-twitter-client

A twitter client for agents with notifications support

crawler-url-parser

A `URL` parser for crawling purposes.

crawl-cli

A Node crawler/scraper for retrieving data from websites

@crawlus/api

API crawler for REST and GraphQL endpoint crawling with auto-detection

semantic-crawler

Priority based Semantic Web Crawler.

crawler-toolbox

crawler toolbox

linkedin-jobs-scraper

Scrape publicly available jobs on LinkedIn using a headless browser

osmosis

Web scraper for NodeJS

seo-bot-detect

Detect SEO Bot Crawler

seo-checker

A library for checking basic SEO signals of a website

crawler-ninja

A plugin-based web crawler made for SEO. Still in beta: please wait or contribute.

mcp-smart-crawler

A command-line tool acting as an MCP (ModelContextProtocol) server, using Playwright to crawl web content for AI models.

@letsscrapedata/scraper

Web scraper that scrapes web pages using LetsScrapeData XML templates

jopi-crawler

A crawler to download websites

ghcrawler

A robust GitHub API crawler that walks a queue of GitHub entities retrieving and storing their contents.

web-link-collector

A library and CLI tool to recursively collect links from a given initial URL and output them as structured data

tse-client

A client for fetching stock data from the Tehran Stock Exchange (TSETMC). Works in Browser, Node and as CLI.

iobroker.netatmo-crawler

Crawls information from public netatmo stations

@web-master/node-web-crawler

Crawl web as easy as possible

@deepcrawl/custom-metric-types

TypeScript types for defining custom metrics

@web-master/node-web-fetch

Fetch web data as easy as possible

@a11ywatch/crawler

gRPC tokio based web crawler

@crawlus/http

HTTP crawler for basic web scraping without JavaScript execution

@adncorp/apify

The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

@beermonster/hltv

The unofficial HLTV Node.js API

@algolia/algoliasearch-netlify-frontend

`algoliasearch-netlify-frontend` is the front-end bundle we recommend to use with our Netlify plugin.

geoforge-cli

Generate AI-ready optimization files for websites, including robots.txt, sitemaps, and AI manifests

@jambudipa/spider

A comprehensive web scraping library with resumable operations, middleware support, and built-in rate limiting

@deepcrawl/oreo

deepcrawl cli

@morioh/is-bot

Detect user-agent is a bot/spider/crawler

website-crawler-sdk

Node.js SDK for interacting with WebsiteCrawler.org API

udger-nodejs

NodeJS User-Agent String Parser based on Udger SQLite databases https://udger.com/products/local_parser

puppeteer-cloak

Secure your puppeteer for scraping

sauron-crawler

Basic page crawler written in nodejs

scrapedin

LinkedIn scraper for the 2020 website

@warren-bank/node-request-cli

An extremely lightweight HTTP request client for the command-line. Supports: http, https, proxy, redirects, cookies, content-encoding, multipart/form-data, multi-threading, recursive website crawling and mirroring.

nest-crawler

The easiest crawling and scraping module for NestJS

@popstas/headless-chrome-crawler

Distributed web crawler powered by Headless Chrome

nodespider

Simple, flexible, delightful web crawler/spider package

reporter-cli

Crawler queue creation tool for paging

crawler-links

Node.js web crawler to get all internal links from a website.

backstop-crawl

Crawl a site to generate a backstopjs config

html-article-extractor

A web page content extractor for News websites

huntsman

Super configurable async web spider

@crawlbyte/crawlbyte-sdk-ts

Official TypeScript SDK for Crawlbyte – create tasks, poll results, and integrate data scraping into your JavaScript/TypeScript applications.

@supadata/mcp

MCP server for Supadata video & web scraping integration. Features include YouTube, TikTok, Instagram, Twitter, and file video transcription, web scraping, batch processing and structured data extraction.

@supacrawler/js

Typed TypeScript/JavaScript SDK for Supacrawler API (scrape, jobs, screenshots, watch)

nuxt3-bot-handler

🛡️ Nuxt 3 middleware to block suspicious bots, protect SEO crawlers with reverse DNS checks, and enforce User-Agent rules.

snapcrawl-vercel-ssr

Vercel integration for SnapCrawl. Serve pre-rendered HTML to crawlers in Next.js middleware or Edge Functions for static SPAs and Express apps.

@letsscrapedata/proxy

proxy manager used to scrape data

@dotsur/link-harvest

Deterministic link harvesting for QA and website migration testing

images-downloader

A Node.js module for downloading a single image or multiple images to disk from a given URL (checking whether the URL exists and detecting the image type)

flysh

DOM Document Object Artifact Collector

anydownload

A powerful website downloader with GUI support

spankbang

spankbang.com api implementation

site-audit-seo

Web service and CLI tool for SEO site audit: crawl site, lighthouse all pages, view public reports in browser. Also output to console, json, csv.

advanced-seo-checker

A library for checking basic SEO signals of a website

crawlee-one

CrawleeOne is a framework built on top of Crawlee and Apify for writing robust and highly configurable web scrapers

get-all-links

A Node crawler that returns all links/hrefs from a website

@devjoyvn/fakebrowser

🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and keyboard operations to make behavior like a real person.

goose-parser

Multi environment web page parser

express-bot

Crawler (robots) decision middleware for Express

@upstash/search-crawler

A CLI tool to crawl documentation sites and create a search index for Upstash Search.

@hardbulls/wbsc-crawler

Tool to crawl events, leagues and statistics from WBSC based websites.

bas

Behaviour Assertion Sheets: CSS-like declarative syntax for client-side integration testing and quality assurance.

site2pdf-cli

Generate comprehensive PDFs of entire websites, ideal for RAG.

puppeteer-extra-plugin-notbody-stealth

Stealth mode: Applies various techniques to make detection of headless puppeteer harder.

syphonx

SyphonX is a tool that extracts data from HTML data, transforming it into JSON of any shape or size. It combines the power of CSS Selectors and jQuery, Regular Expressions, and Javascript into a declarative template format to elegantly solve the simplest

email-extractor

Extract email addresses from a website by following links

salticidae

A utility library to make downloading & extracting specific content from a URL easy

@konker.dev/tiny-treecrawler-fp

A library for crawling a filesystem tree, based on Effect-ts

puppeteer-prerender

Fetch the pre-rendered content, meta, links and Open Graph of a webpage, especially Single-Page Application (SPA)

@iflow-mcp/firecrawl-mcp

MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, batch processing, structured data extraction, and LLM-powered content analysis.

aliexpress-product-scraper

Get AliExpress product details as a JSON response, including feedback, variants, description, images, etc.

@justcooldev/slwcrawl

Crawl and download Snap Lenses from *lens.snapchat.com* with ease.

crawlyx

Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.

@knowcode/screenshotfetch

Web application spider with screenshot capture and customer journey documentation. Automate user flow documentation with authentication support.

directory-crawler

The directory crawler library for Node.JS

advanced-sitemap-generator

Easily create XML sitemaps for your website.

@acwink/movies-search-mcp

Smart MCP tool to find and validate movie/tv-show resources with multiple sources support

website-scraper-cli

Web Download CLI

@elchika-inc/open-crawler-mcp-server

Web crawler MCP server for extracting text content from web pages

vue-seo-helper

A Vue3 plugin to improve SEO and crawler accessibility

puppeteer-extra-plugin-stealth-lp

Stealth mode: Applies various techniques to make detection of headless puppeteer harder.

silktext

Lightweight, runtime-safe crawling → clean Markdown

gin-downloader

Simple manga scraper for popular online manga websites.

flixhq-core

Node.js library that provides an API for obtaining movie information from the FlixHQ website.

page-scraper

Web page scraper with a jQuery-like syntax for Node.

undetectable

axe-crawler

A highly configurable website crawler for automatically testing a website for accessibility issues using the axe-core library. Uses selenium and headless Chrome to load pages, inject axe-core, and run tests. Generates an html summary report in addition