Found 1180 results for crawler

npm-license-crawler

Analyzes license information for multiple node.js modules (package.json files) as part of your software project.

isbot-fast

JavaScript module detecting bots/crawlers/spiders via user-agent

notion-md-crawler

A library to recursively retrieve and serialize Notion pages with customization for machine learning applications.

apify

The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

web-auto-extractor

Automatically extracts structured information from webpages

spider-detector

A tiny node module to detect spiders/crawlers quickly and comes with optional middleware for ExpressJS

es6-crawler-detect

An ES6 adaptation of the original PHP library CrawlerDetect. It helps you detect bots/crawlers/spiders via the user agent.

pdfdataextract

Extract data from a pdf with pure javascript

sitemap-generator

Easily create XML sitemaps for your website.

crawler

Crawler is a ready-to-use web spider that works with proxies, asynchrony, rate limit, configurable request pools, jQuery, and HTTP/2 support.

@nodebb/spider-detector

A tiny node module to detect spiders/crawlers quickly and comes with optional middleware for ExpressJS

robots-txt-parser

A lightweight robots.txt parser for Node.js with support for wildcards, caching and promises.

@the-convocation/twitter-scraper

A port of n0madic/twitter-scraper to Node.js.

node-scrapy

Simple, lightweight and expressive web scraping with Node.js

sqreen

Node.js agent for Sqreen, please see https://www.sqreen.io/

crawler-request

HTTP request module customized for crawlers.

sitemap-generator-cli

Create xml sitemaps from the command line.

beautiful-dom

Beautiful-dom is a lightweight library that mirrors the capabilities of the HTML DOM API needed for parsing crawled HTML/XML pages. It models the methods and properties of HTML nodes that are relevant for extracting data from HTML nodes. It is written in

recrawl

crawlab-sdk

Node.js SDK for Crawlab

playwright-afp

Stop website fingerprinting techniques playwright edition

@blocklet/crawler

blocklet crawler lib

hyperbrowser-mcp

Hyperbrowser Model Context Protocol Server

@algolia/netlify-plugin-crawler

This plugin links your Netlify site with Algolia's Crawler. It will trigger a crawl on each successful build.

rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

crawlbase

Dependency free module for scraping and crawling websites using [Crawlbase](https://crawlbase.com) API

websearch-mcp

A Model Context Protocol (MCP) server implementation that provides real-time web search capabilities through a simple API

@brightsec/cli

Bright CLI is a CLI tool that can initialize, stop, poll and maintain scans in Bright solutions.

fiftyone.devicedetection.shared

Shared functionality for implementing device detection engine for the 51Degrees Pipeline API

fiftyone.devicedetection.onpremise

Device detection on-premise services for the 51Degrees Pipeline API

better-fetch-mcp

Advanced MCP server for web scraping with nested URL fetching and intelligent markdown formatting

grunt-link-checker

Finds broken links and resources on websites

simple-headless-chrome

Headless Chrome abstraction to simplify the interaction with the browser. It may be used for crawling sites, test automation, etc

node-html-crawler

Crawler (spider) of site web pages by domain name

chowdown

A JavaScript library that allows for the quick transformation of DOM documents into useful formats.

puremd-mcp

Model Context Protocol (MCP) server for pure.md, the markdown delivery network for LLMs

web-structure

A powerful and flexible web scraping library with concurrent processing and DOM hierarchy awareness

linkd-mcp

Linkd Model Context Protocol Server

fiftyone.devicedetection.cloud

Device detection cloud services for the 51Degrees Pipeline API

@turingnova/robots

Next.js robots.tsx generator - Automatically create and serve robots.txt for Next.js applications

fiftyone.devicedetection

Parse HTTP headers to detect the device type, model, operating system, browser, and crawler information

@marbec/web-auto-extractor

Automatically extracts structured information from webpages

cheerio-httpcli

http client module with cheerio & iconv(-lite) & promise

webhead

An easy-to-use Node web crawler storing cookies, following redirects, traversing pages and submitting forms.

web-parser-mcp

🚀 MCP SERVER FIXED v3.7.9! Resolved import errors, middleware conflicts, type hints - NOW WORKING PERFECTLY!

web-page-analyzer-cli

A powerful website link crawling tool that supports deep crawling, authentication, and page analysis

usetube

Crawl YouTube without an API key (search videos and channels, or get all of a channel's or playlist's videos)

webcrawlerapi-js

JS client for WebcrawlerAPI

@rane/web-auto-extractor

Automatically extracts structured information from webpages

@crawlee/impit-client

impit-based HTTP client implementation for Crawlee. Impersonates browser requests to avoid bot detection.

@openactive/harvesting-utils

Utils library for harvesting RPDE feeds

taki

Take a snapshot of any website.

@folder/readdir

Recursively read a directory, blazing fast.

torrent-search-api

Yet another node torrent scraper based on x-ray. (Support iptorrents, torrentleech, torrent9, Yyggtorrent, ThePiratebay, torrentz2, 1337x, KickassTorrent, Rarbg, TorrentProject, Yts, Limetorrents, Eztv)

fakebrowser

🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and keyboard operations to make behavior like a real person.

@fwdslsh/inform

A high-performance web crawler powered by Bun that downloads pages and converts them to Markdown

headless-chrome-crawler

Distributed web crawler powered by Headless Chrome

better-sitemap-crawler

google-news-scraper

Lightweight async scraper for Google News

@just-every/crawl

Fast, token-efficient web content extraction - fetch web pages and convert to clean Markdown

node-site-downloader

An easy to use CLI for downloading websites for offline usage

gulp-license-crawler

Analyzes license information for multiple node.js modules (package.json files) as part of your software project.

@mendable/firecrawl

JavaScript SDK for Firecrawl API

n8n-nodes-noticrawlee

a Test Node

@spider-rs/spider-rs

The [spider](https://github.com/spider-rs/spider) project ported to Node.js

extract-email

A simple email extractor for obfuscated emails.

hltv

The unofficial HLTV Node.js API

node-webcrawler

Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously

googlebot-verify

Verify that a request is from Google using Google's recommended DNS verification steps

@attestate/crawler-call-block-logs

An attestate crawler strategy to download and transform Ethereum block event logs

node-spider

Generic web crawler powered by Node.js

roboto

A web crawler for Nodejs.

@qualweb/crawler

Webpage crawler for qualweb

express-nobots

Keep Bots Away From Your Express App

@kdinisv/sql-scanner

Smart SQL injection scanner with crawler and optional Playwright capture.

novel-downloader

Novel downloader in node-novel style, including sites (dmzj / wenku8 / syosetu / ...etc)

@akukral/site-comparator

A sophisticated website comparison tool with intelligent content analysis and offset-aware difference detection

@gurumnyang/dcinside.js

Node.js library for crawling dcinside galleries

nintendo-switch-eshop

Unofficial API lib for Nintendo Switch eShop game listing and pricing information.

license-crawler

Crawls an npm package and its dependencies for their licenses

node-email-extractor

Extract emails from text and also from a site page

js-crawler

Web crawler for Node.js

@6digit/silktext

Lightweight, runtime-safe crawling → clean Markdown

crawl-server

Efficient SEO-focused server for Wasm-generated pages

funnelweb

Detect search engine crawlers by their User-Agent strings.

@ptrumpis/snap-lens-web-crawler

Crawl and download Snap Lenses from *lens.snapchat.com* with ease.

x-crawl

x-crawl is a flexible Node.js AI-assisted crawler library.

@letsscrapedata/controller

Unified browser / HTML controller interfaces that support patchright, camoufox, playwright, puppeteer and cheerio

@rekttdoteth/agent-twitter-client

A twitter client for agents with notifications support

crawler-url-parser

A `URL` parser for crawling purposes.

crawl-cli

A Node crawler/scraper for retrieving data from websites

semantic-crawler

Priority based Semantic Web Crawler.

crawler-toolbox

crawler toolbox

seo-bot-detect

Detect SEO Bot Crawler

osmosis

Web scraper for NodeJS

linkedin-jobs-scraper

Scrape publicly available jobs on LinkedIn using a headless browser

crawler-ninja

A plugin-based web crawler made for SEO. Please wait or contribute ... still in beta

mcp-smart-crawler

A command-line tool acting as an MCP (ModelContextProtocol) server, using Playwright to crawl web content for AI models.

seo-checker

A library for checking basic SEO signals of a website

@letsscrapedata/scraper

Web scraper that scrapes web pages using LetsScrapeData XML templates

jopi-crawler

A crawler for downloading websites

ghcrawler

A robust GitHub API crawler that walks a queue of GitHub entities retrieving and storing their contents.

tse-client

A client for fetching stock data from the Tehran Stock Exchange (TSETMC). Works in Browser, Node and as CLI.

iobroker.netatmo-crawler

Crawls information from public netatmo stations

@web-master/node-web-fetch

Fetch web data as easy as possible

@web-master/node-web-crawler

Crawl web as easy as possible

@a11ywatch/crawler

gRPC tokio based web crawler

@adncorp/apify

The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.

@waynechang65/ptt-crawler

A web crawler module designed to scrape data from Ptt.

@beermonster/hltv

The unofficial HLTV Node.js API

@deepcrawl/custom-metric-types

TypeScript types for defining custom metrics

@algolia/algoliasearch-netlify-frontend

`algoliasearch-netlify-frontend` is the front-end bundle we recommend to use with our Netlify plugin.

@morioh/is-bot

Detect whether a user-agent is a bot/spider/crawler

geoforge-cli

Generate AI-ready optimization files for websites, including robots.txt, sitemaps, and AI manifests

website-crawler-sdk

Node.js SDK for interacting with WebsiteCrawler.org API

@jambudipa/spider

A comprehensive web scraping library with resumable operations, middleware support, and built-in rate limiting

udger-nodejs

NodeJS User-Agent String Parser based on Udger SQLite databases https://udger.com/products/local_parser

puppeteer-cloak

Secure your puppeteer for scraping

@deepcrawl/oreo

deepcrawl cli

sauron-crawler

Basic page crawler written in nodejs

huntsman

Super configurable async web spider

scrapedin

linkedin scraper for 2020 website

@warren-bank/node-request-cli

An extremely lightweight HTTP request client for the command-line. Supports: http, https, proxy, redirects, cookies, content-encoding, multipart/form-data, multi-threading, recursive website crawling and mirroring.

nest-crawler

The easiest crawling and scraping module for NestJS

@popstas/headless-chrome-crawler

Distributed web crawler powered by Headless Chrome

nodespider

Simple, flexible, delightful web crawler/spider package

crawler-links

Node.js web crawler to get all internal links from a website.

reporter-cli

Crawler queue creation tool for paging

backstop-crawl

Crawl a site to generate a backstopjs config

html-article-extractor

A web page content extractor for News websites

@crawlbyte/crawlbyte-sdk-ts

Official TypeScript SDK for Crawlbyte – create tasks, poll results, and integrate data scraping into your JavaScript/TypeScript applications.

@supacrawler/js

Typed TypeScript/JavaScript SDK for Supacrawler API (scrape, jobs, screenshots, watch)

@supadata/mcp

MCP server for Supadata video & web scraping integration. Features include YouTube, TikTok, Instagram, Twitter, and file video transcription, web scraping, batch processing and structured data extraction.

nuxt3-bot-handler

🛡️ Nuxt 3 middleware to block suspicious bots, protect SEO crawlers with reverse DNS checks, and enforce User-Agent rules.

snapcrawl-vercel-ssr

Vercel integration for SnapCrawl. Serve pre-rendered HTML to crawlers in Next.js middleware or Edge Functions for static SPAs and Express apps.

@letsscrapedata/proxy

proxy manager used to scrape data

@dotsur/link-harvest

Deterministic link harvesting for QA and website migration testing

flysh

DOM Document Object Artifact Collector

images-downloader

A Node.js module for downloading a single image or multiple images to disk from a given URL (checking that the URL exists and detecting the image type)

anydownload

A powerful website downloader with GUI support

supercrawler

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

spankbang

spankbang.com api implementation

site-audit-seo

Web service and CLI tool for SEO site audit: crawl site, lighthouse all pages, view public reports in browser. Also output to console, json, csv.

advanced-seo-checker

A library for checking basic SEO signals of a website

crawlee-one

CrawleeOne is a framework built on top of Crawlee and Apify for writing robust and highly configurable web scrapers

get-all-links

A Node crawler that returns all links/hrefs from a website

@devjoyvn/fakebrowser

🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and keyboard operations to make behavior like a real person.

goose-parser

Multi environment web page parser

express-bot

Crawler(robots) decision middleware for Express

@upstash/search-crawler

A CLI tool to crawl documentation sites and create a search index for Upstash Search.

@konker.dev/tiny-treecrawler-fp

A library for crawling a filesystem tree, based on Effect-ts

site2pdf-cli

Generate comprehensive PDFs of entire websites, ideal for RAG.

@hardbulls/wbsc-crawler

Tool to crawl events, leagues and statistics from WBSC based websites.

bas

Behaviour Assertion Sheets: CSS-like declarative syntax for client-side integration testing and quality assurance.

syphonx

SyphonX is a tool that extracts data from HTML data, transforming it into JSON of any shape or size. It combines the power of CSS Selectors and jQuery, Regular Expressions, and Javascript into a declarative template format to elegantly solve the simplest

puppeteer-extra-plugin-notbody-stealth

Stealth mode: Applies various techniques to make detection of headless puppeteer harder.

email-extractor

Extract email addresses from a website by following links

salticidae

A utility library to make downloading & extracting specific content from a URL easy

@iflow-mcp/firecrawl-mcp

MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, batch processing, structured data extraction, and LLM-powered content analysis.

puppeteer-prerender

Fetch the pre-rendered content, meta, links and Open Graph of a webpage, especially Single-Page Application (SPA)

aliexpress-product-scraper

Get AliExpress product details as a JSON response, including feedback, variants, description, images, etc.

@justcooldev/slwcrawl

Crawl and download Snap Lenses from *lens.snapchat.com* with ease.

directory-crawler

The directory crawler library for Node.JS

advanced-sitemap-generator

Easily create XML sitemaps for your website.

website-scraper-cli

Web Download CLI

vue-seo-helper

A Vue3 plugin to improve SEO and crawler accessibility

@elchika-inc/open-crawler-mcp-server

Web crawler MCP server for extracting text content from web pages

puppeteer-extra-plugin-stealth-lp

Stealth mode: Applies various techniques to make detection of headless puppeteer harder.

silktext

Lightweight, runtime-safe crawling → clean Markdown

gin-downloader

Simple manga scraper for famous online manga websites.

flixhq-core

Node.js library that provides an API for obtaining movie information from the FlixHQ website.

undetectable

page-scraper

Web page scraper with a jQuery-like syntax for Node.

schabbi-webscraper

Lightweight and easy to use crawling solution for websites.

axe-crawler

A highly configurable website crawler for automatically testing a website for accessibility issues using the axe-core library. Uses selenium and headless Chrome to load pages, inject axe-core, and run tests. Generates an html summary report in addition