@nodelib/fs.walk
A library for efficiently walking a directory recursively
Found 1180 results for crawler
The fastest directory crawler & globbing alternative to glob, fast-glob, & tiny-glob. Crawls 1m files in < 1s
Stealth mode: Applies various techniques to make detection of headless puppeteer harder.
JavaScript SDK for Firecrawl API
Async and sync crawler for JSON objects
Apify API client for JavaScript
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Very straightforward, event driven web crawler. Features a flexible queue interface and a basic cache mechanism with extensible backend.
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Templates for the crawlee projects
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, search, batch processing, structured data extraction, and LLM-powered content analysis.
Analyzes license information for multiple node.js modules (package.json files) as part of your software project.
JavaScript module detecting bots/crawlers/spiders via user-agent
A library to recursively retrieve and serialize Notion pages with customization for machine learning applications.
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Automatically extracts structured information from webpages
A tiny node module to detect spiders/crawlers quickly and comes with optional middleware for ExpressJS
An ES6 adaptation of the original PHP library CrawlerDetect. This library helps you detect bots/crawlers/spiders via the user agent.
Easily create XML sitemaps for your website.
Extract data from a pdf with pure javascript
Stop website fingerprinting techniques
Crawler is a ready-to-use web spider that works with proxies, asynchrony, rate limit, configurable request pools, jQuery, and HTTP/2 support.
A tiny node module to detect spiders/crawlers quickly and comes with optional middleware for ExpressJS
A lightweight robots.txt parser for Node.js with support for wildcards, caching and promises.
JavaScript SDK for Firecrawl API
Simple, lightweight and expressive web scraping with Node.js
A port of n0madic/twitter-scraper to Node.js.
Node.js agent for Sqreen, please see https://www.sqreen.io/
x-ray's crawler
A web scraper for NodeJs
http request for web scraping
HTTP request module customized for crawlers.
Stop website fingerprinting techniques playwright edition
Create xml sitemaps from the command line.
Beautiful-dom is a lightweight library that mirrors the capabilities of the HTML DOM API needed for parsing crawled HTML/XML pages. It models the methods and properties of HTML nodes that are relevant for extracting data from HTML nodes. It is written in
Node.js SDK for Crawlab
A Model Context Protocol (MCP) server implementation that provides real-time web search capabilities through a simple API
Bright CLI is a CLI tool that can initialize, stop, poll and maintain scans in Bright solutions.
Shared functionality for implementing device detection engine for the 51Degrees Pipeline API
Device detection on-premise services for the 51Degrees Pipeline API
Advanced MCP server for web scraping with nested URL fetching and intelligent markdown formatting
Headless Chrome abstraction to simplify the interaction with the browser. It may be used for crawling sites, test automation, etc
Finds broken links and resources on websites
Crawler (spider) of site web pages by domain name
A JavaScript library that allows for the quick transformation of DOM documents into useful formats.
Model Context Protocol (MCP) server for pure.md, the markdown delivery network for LLMs
A powerful and flexible web scraping library with concurrent processing and DOM hierarchy awareness
Linkd Model Context Protocol Server
Device detection cloud services for the 51Degrees Pipeline API
Next.js robots.tsx generator - Automatically create and serve robots.txt for Next.js applications
Parse HTTP headers to detect the device type, model, operating system, browser, and crawler information
Automatically extracts structured information from webpages
http client module with cheerio & iconv(-lite) & promise
An easy-to-use Node web crawler storing cookies, following redirects, traversing pages and submitting forms.
🚀 MCP SERVER FIXED v3.7.9! Resolved import errors, middleware conflicts, type hints - NOW WORKING PERFECTLY!
A powerful website link crawling tool that supports deep crawling, authentication, and page analysis
crawl youtube without api key (search videos channels or get all channel/playlist's videos)
impit-based HTTP client implementation for Crawlee. Impersonates browser requests to avoid bot detection.
Utils library for harvesting RPDE feeds
Automatically extracts structured information from webpages
Take a snapshot of any website.
Yet another node torrent scraper based on x-ray. (Support iptorrents, torrentleech, torrent9, Yyggtorrent, ThePiratebay, torrentz2, 1337x, KickassTorrent, Rarbg, TorrentProject, Yts, Limetorrents, Eztv)
Recursively read a directory, blazing fast.
🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and keyboard operations to make behavior like a real person.
A high-performance web crawler powered by Bun that downloads pages and converts them to Markdown
Distributed web crawler powered by Headless Chrome
JS client for WebcrawlerAPI
To install:
Fast, token-efficient web content extraction - fetch web pages and convert to clean Markdown
Lightweight async scraper for Google News
An easy to use CLI for downloading websites for offline usage
Analyzes license information for multiple node.js modules (package.json files) as part of your software project.
a Test Node
The [spider](https://github.com/spider-rs/spider) project ported to Node.js
A simple email extractor for obfuscated emails.
The unofficial HLTV Node.js API
Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously
Verify that a request is from Google using Google's recommended DNS verification steps
JavaScript SDK for Firecrawl API
A web crawler for Nodejs.
Generic web crawler powered by Node.js
Webpage crawler for qualweb
Core crawler framework functionality - TypeScript web crawling library
Keep Bots Away From Your Express App
Smart SQL injection scanner with crawler and optional Playwright capture.
An attestate crawler strategy to download and transform Ethereum block event logs
Utility functions for web crawling - sitemap processing, link extraction, system info
Novel downloader for node-novel style; includes sites (dmzj / wenku8 / syosetu / etc.)
Unofficial API lib for Nintendo Switch eShop game listing and pricing information.
Crawls an npm package and its dependencies for their licenses
A Node.js library for crawling dcinside galleries
x-crawl is a flexible Node.js AI-assisted crawler library.
Web crawler for Node.js
Extract emails from text and also from a site page
Lightweight, runtime-safe crawling → clean Markdown
Efficient SEO-focused server for Wasm-generated pages
A web crawler module designed to scrape data from Ptt.
Detect search engine crawlers by their User-Agent strings.
Unified browser / HTML controller interfaces that support patchright, camoufox, playwright, puppeteer and cheerio
Crawl and download Snap Lenses from *lens.snapchat.com* with ease.
A twitter client for agents with notifications support
A `URL` parser for crawling purposes.
A Node crawler/scraper for retrieving data from websites
crawler toolbox
API crawler for REST and GraphQL endpoint crawling with auto-detection
Priority based Semantic Web Crawler.
Web scraper for NodeJS
Scrape publicly available jobs on LinkedIn using a headless browser
Detect SEO Bot Crawler
A library for checking basic SEO signals of a website
A web crawler made for the SEO based on plugins. Please wait or contribute ... still in beta
A command-line tool acting as an MCP (ModelContextProtocol) server, using Playwright to crawl web content for AI models.
Web scraper that scrapes web pages using a LetsScrapeData XML template
A crawler for downloading websites
A robust GitHub API crawler that walks a queue of GitHub entities retrieving and storing their contents.
A client for fetching stock data from the Tehran Stock Exchange (TSETMC). Works in Browser, Node and as CLI.
Crawls information from public netatmo stations
TypeScript types for defining custom metrics
Crawl the web as easily as possible
Fetch web data as easily as possible
gRPC tokio based web crawler
HTTP crawler for basic web scraping without JavaScript execution
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
The unofficial HLTV Node.js API
`algoliasearch-netlify-frontend` is the front-end bundle we recommend to use with our Netlify plugin.
Generate AI-ready optimization files for websites, including robots.txt, sitemaps, and AI manifests
Node.js SDK for interacting with WebsiteCrawler.org API
Detect whether a user agent is a bot/spider/crawler
deepcrawl cli
A comprehensive web scraping library with resumable operations, middleware support, and built-in rate limiting
NodeJS User-Agent String Parser based on Udger SQLite databases https://udger.com/products/local_parser
Secure your puppeteer for scraping
Basic page crawler written in nodejs
linkedin scraper for 2020 website
An extremely lightweight HTTP request client for the command-line. Supports: http, https, proxy, redirects, cookies, content-encoding, multipart/form-data, multi-threading, recursive website crawling and mirroring.
An easy-to-use crawling and scraping module for NestJS
Distributed web crawler powered by Headless Chrome
Simple, flexible, delightful web crawler/spider package
Node.js web crawler to get all internal links from a website.
Crawler queue creation tool for paging
Crawl a site to generate a backstopjs config
Super configurable async web spider
A web page content extractor for News websites
Official TypeScript SDK for Crawlbyte – create tasks, poll results, and integrate data scraping into your JavaScript/TypeScript applications.
Typed TypeScript/JavaScript SDK for Supacrawler API (scrape, jobs, screenshots, watch)
MCP server for Supadata video & web scraping integration. Features include YouTube, TikTok, Instagram, Twitter, and file video transcription, web scraping, batch processing and structured data extraction.
🛡️ Nuxt 3 middleware to block suspicious bots, protect SEO crawlers with reverse DNS checks, and enforce User-Agent rules.
Vercel integration for SnapCrawl. Serve pre-rendered HTML to crawlers in Next.js middleware or Edge Functions for static SPAs and Express apps.
proxy manager used to scrape data
Deterministic link harvesting for QA and website migration testing
DOM Document Object Artifact Collector
A Node.js module for downloading a single image or multiple images to disk from a given Url (checking if url exist and detecting image type)
A powerful website downloader with GUI support
spankbang.com api implementation
Web service and CLI tool for SEO site audit: crawl site, lighthouse all pages, view public reports in browser. Also output to console, json, csv.
A library for checking basic SEO signals of a website
CrawleeOne is a framework built on top of Crawlee and Apify for writing robust and highly configurable web scrapers
A Node crawler that returns all links/hrefs from a website
🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and keyboard operations to make behavior like a real person.
Multi environment web page parser
Crawler(robots) decision middleware for Express
A CLI tool to crawl documentation sites and create a search index for Upstash Search.
A library for crawling a filesystem tree, based on Effect-ts
Tool to crawl events, leagues and statistics from WBSC based websites.
Behaviour Assertion Sheets: CSS-like declarative syntax for client-side integration testing and quality assurance.
Generate comprehensive PDFs of entire websites, ideal for RAG.
SyphonX is a tool that extracts data from HTML data, transforming it into JSON of any shape or size. It combines the power of CSS Selectors and jQuery, Regular Expressions, and Javascript into a declarative template format to elegantly solve the simplest
Stealth mode: Applies various techniques to make detection of headless puppeteer harder.
Extract email addresses from a website by following links
A utility library to make downloading & extracting specific content from a URL easy
MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, batch processing, structured data extraction, and LLM-powered content analysis.
Fetch the pre-rendered content, meta, links and Open Graph of a webpage, especially Single-Page Application (SPA)
Get AliExpress product details as a JSON response, including feedback, variants, description, images, etc.
Crawl and download Snap Lenses from *lens.snapchat.com* with ease.
Web application spider with screenshot capture and customer journey documentation. Automate user flow documentation with authentication support.
The directory crawler library for Node.JS
Easily create XML sitemaps for your website.
Smart MCP tool to find and validate movie/tv-show resources with multiple sources support
Web Download CLI
A Vue3 plugin to improve SEO and crawler accessibility
Web crawler MCP server for extracting text content from web pages
Stealth mode: Applies various techniques to make detection of headless puppeteer harder.
Lightweight, runtime-safe crawling → clean Markdown
Simple manga scraper for famous online manga websites.
Web page scraper with a jQuery-like syntax for Node.
Nodejs library that provides an Api for obtaining the movies information from FlixHQ website.
A highly configurable website crawler for automatically testing a website for accessibility issues using the axe-core library. Uses selenium and headless Chrome to load pages, inject axe-core, and run tests. Generates an html summary report in addition
Environment variable collector
Lightweight and easy to use crawling solution for websites.
Node.js module for crawling the web
A crawler implemented using a headless browser (Chrome).
xvideos.com api implementation.
A util tool
use google play protobuf api in node
A Node.JS Web crawler using the API Fetch
A TypeScript library for scraping stories from various Vietnamese websites
Generic node service to handle SSR for SPA made with any kind of frontend framework
A powerful web crawler designed specifically for LLM applications, capable of extracting clean, readable content from various web pages and converting it to Markdown format.
A promise-based Node module to scrape TV shows, episodes and torrent info from EZTV.
xvideos.com api implementation.
Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.
🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and keyboard operations to make behavior like a real person.
Scrape data from any webpage.
A SOCKS and HTTP proxy written in Node.js for getting over the GFW
A util tool
crawls mysql database and creates insert queries or returns data from multiple table depending on the relationship information of the tables provided
Easily build flexible, scalable, and distributed, web crawlers.
crawler with nodejs
Crawler download plugin
A crawler based on Phantom. Allows discovery of dynamic content and supports custom scrapers.
A crawler to retrieve store data from the Epic Games store
An extremely simple module to scrape DOM element(s) from the web
Access Google Play by logging in and making requests as an Android device!
Floodesh is a distributed web spider/crawler written with Nodejs.
self twitter scraper
Express middleware that returns the resulting html after executing javascript, allowing crawlers to read on the page
Botify Rest API SDK middlewares
Multi-thread crawler engine.
Get AliExpress product details as a JSON response, including feedback, variants, description, images, etc.
Auxiliary package of functions for the TypeScript framework Botmation
A TypeScript library for detecting and categorizing bots from user agent strings
A simple license crawler for crediting open source work
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
A JS library for crawling CSDN articles.
Get information using the string of the specified rule
Youtube Crawler with no API that returns 3 first videos.
A robots.txt reader, parser and matcher.
google search crawler
Playwright-based crawler for full browser automation and JavaScript rendering
A bitcoin news crawler
MCP server for Crawlbase API - enables web scraping through Model Context Protocol
Puppeteer-based crawler for Chrome automation and dynamic content scraping
A util tool
Web crawler to use as API