@nodelib/fs.walk
A library for efficiently walking a directory recursively
Found 1180 results for crawler
The fastest directory crawler & globbing alternative to glob, fast-glob, & tiny-glob. Crawls 1m files in < 1s
Stealth mode: Applies various techniques to make detection of headless puppeteer harder.
JavaScript SDK for Firecrawl API
Async and sync crawler for JSON objects
Apify API client for JavaScript
The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Very straightforward, event driven web crawler. Features a flexible queue interface and a basic cache mechanism with extensible backend.
Templates for the crawlee projects
MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, search, batch processing, structured data extraction, and LLM-powered content analysis.
Analyzes license information for multiple node.js modules (package.json files) as part of your software project.
JavaScript module detecting bots/crawlers/spiders via user-agent
A library to recursively retrieve and serialize Notion pages with customization for machine learning applications.
Automatically extracts structured information from webpages
A tiny node module to detect spiders/crawlers quickly and comes with optional middleware for ExpressJS
This is an ES6 adaptation of the original PHP library CrawlerDetect; it helps you detect bots/crawlers/spiders via the user agent.
Easily create XML sitemaps for your website.
Extract data from a pdf with pure javascript
Stop website fingerprinting techniques
Crawler is a ready-to-use web spider that works with proxies, asynchrony, rate limit, configurable request pools, jQuery, and HTTP/2 support.
A lightweight robots.txt parser for Node.js with support for wildcards, caching and promises.
A port of n0madic/twitter-scraper to Node.js.
Simple, lightweight and expressive web scraping with Node.js
x-ray's crawler
Node.js agent for Sqreen; see https://www.sqreen.io/
A web scraper for NodeJs
HTTP requests for web scraping
HTTP request module customized for crawlers.
Stop website fingerprinting techniques playwright edition
Create xml sitemaps from the command line.
Beautiful-dom is a lightweight library that mirrors the capabilities of the HTML DOM API needed for parsing crawled HTML/XML pages. It models the methods and properties of HTML nodes that are relevant for extracting data from HTML nodes. It is written in
A Model Context Protocol (MCP) server implementation that provides real-time web search capabilities through a simple API
Bright CLI is a CLI tool that can initialize, stop, poll and maintain scans in Bright solutions.
Shared functionality for implementing device detection engine for the 51Degrees Pipeline API
Device detection on-premise services for the 51Degrees Pipeline API
Advanced MCP server for web scraping with nested URL fetching and intelligent markdown formatting
Finds broken links and resources on websites
Headless Chrome abstraction to simplify the interaction with the browser. It may be used for crawling sites, test automation, etc
Crawler (spider) for a site's web pages, given its domain name
A JavaScript library that allows for the quick transformation of DOM documents into useful formats.
Model Context Protocol (MCP) server for pure.md, the markdown delivery network for LLMs
A powerful and flexible web scraping library with concurrent processing and DOM hierarchy awareness
Linkd Model Context Protocol Server
Device detection cloud services for the 51Degrees Pipeline API
Next.js robots.tsx generator - Automatically create and serve robots.txt for Next.js applications
Parse HTTP headers to detect the device type, model, operating system, browser, and crawler information
http client module with cheerio & iconv(-lite) & promise
An easy-to-use Node web crawler storing cookies, following redirects, traversing pages and submitting forms.
MCP server v3.7.9: resolves import errors, middleware conflicts, and type-hint issues.
A powerful website link-crawling tool that supports deep crawling, authentication, and page analysis
Crawl YouTube without an API key (search videos and channels, or get all videos of a channel/playlist)
impit-based HTTP client implementation for Crawlee. Impersonates browser requests to avoid bot detection.
Utils library for harvesting RPDE feeds
Take a snapshot of any website.
Yet another Node torrent scraper based on x-ray. Supports iptorrents, torrentleech, torrent9, Yyggtorrent, ThePiratebay, torrentz2, 1337x, KickassTorrent, Rarbg, TorrentProject, Yts, Limetorrents, and Eztv.
Recursively read a directory, blazing fast.
🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and keyboard operations to make behavior like a real person.
A high-performance web crawler powered by Bun that downloads pages and converts them to Markdown
Distributed web crawler powered by Headless Chrome
JS client for WebcrawlerAPI
Fast, token-efficient web content extraction - fetch web pages and convert to clean Markdown
Lightweight async scraper for Google News
An easy to use CLI for downloading websites for offline usage
a Test Node
The [spider](https://github.com/spider-rs/spider) project ported to Node.js
A simple email extractor for obfuscated emails.
The unofficial HLTV Node.js API
Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a large number of pages asynchronously as they are downloaded.
Verify that a request is from Google using Google's recommended DNS verification steps
A web crawler for Nodejs.
Generic web crawler powered by Node.js
Webpage crawler for qualweb
Keep Bots Away From Your Express App
Core crawler framework functionality - TypeScript web crawling library
An attestate crawler strategy to download and transform Ethereum block event logs
Smart SQL injection scanner with crawler and optional Playwright capture.
Utility functions for web crawling - sitemap processing, link extraction, system info
Novel downloader in node-novel style; supported sites include dmzj, wenku8, syosetu, etc.
A Node.js library for crawling dcinside galleries
Unofficial API lib for Nintendo Switch eShop game listing and pricing information.
Crawls an npm package and its dependencies for their licenses
x-crawl is a flexible Node.js AI-assisted crawler library.
Web crawler for Node.js
Extract emails from text and also from a site page
Lightweight, runtime-safe crawling → clean Markdown
Efficient SEO-focused server for Wasm-generated pages
Detect search engine crawlers by their User-Agent strings.
A web crawler module designed to scrape data from Ptt.
Unified browser / HTML controller interfaces that support patchright, camoufox, playwright, puppeteer and cheerio
Crawl and download Snap Lenses from *lens.snapchat.com* with ease.
A twitter client for agents with notifications support
A `URL` parser for crawling purposes.
A Node crawler/scraper for retrieving data from websites
API crawler for REST and GraphQL endpoint crawling with auto-detection
Priority based Semantic Web Crawler.
crawler toolbox
Scrape publicly available jobs on LinkedIn using a headless browser
Web scraper for NodeJS
Detect SEO Bot Crawler
A library for checking basic SEO signals of a website
A plugin-based web crawler made for SEO. Still in beta; please wait or contribute.
A command-line tool acting as an MCP (ModelContextProtocol) server, using Playwright to crawl web content for AI models.
Web scraper that scrapes web pages using LetsScrapeData XML templates
A crawler for downloading websites
A robust GitHub API crawler that walks a queue of GitHub entities retrieving and storing their contents.
A library and CLI tool to recursively collect links from a given initial URL and output them as structured data
A client for fetching stock data from the Tehran Stock Exchange (TSETMC). Works in Browser, Node and as CLI.
Crawls information from public netatmo stations
Crawl the web as easily as possible
TypeScript types for defining custom metrics
Fetch web data as easily as possible
gRPC tokio based web crawler
HTTP crawler for basic web scraping without JavaScript execution
`algoliasearch-netlify-frontend` is the front-end bundle we recommend to use with our Netlify plugin.
Generate AI-ready optimization files for websites, including robots.txt, sitemaps, and AI manifests
A comprehensive web scraping library with resumable operations, middleware support, and built-in rate limiting
deepcrawl cli
Detect whether a user agent is a bot/spider/crawler
Node.js SDK for interacting with WebsiteCrawler.org API
NodeJS User-Agent String Parser based on Udger SQLite databases https://udger.com/products/local_parser
Secure your puppeteer for scraping
Basic page crawler written in nodejs
LinkedIn scraper for the 2020 website
An extremely lightweight HTTP request client for the command-line. Supports: http, https, proxy, redirects, cookies, content-encoding, multipart/form-data, multi-threading, recursive website crawling and mirroring.
An easy-to-use crawling and scraping module for NestJS
Simple, flexible, delightful web crawler/spider package
Crawler queue creation tool for paging
Node.js web crawler to get all internal links from a website.
Crawl a site to generate a backstopjs config
A web page content extractor for News websites
Super configurable async web spider
Official TypeScript SDK for Crawlbyte – create tasks, poll results, and integrate data scraping into your JavaScript/TypeScript applications.
MCP server for Supadata video & web scraping integration. Features include YouTube, TikTok, Instagram, Twitter, and file video transcription, web scraping, batch processing and structured data extraction.
Typed TypeScript/JavaScript SDK for Supacrawler API (scrape, jobs, screenshots, watch)
🛡️ Nuxt 3 middleware to block suspicious bots, protect SEO crawlers with reverse DNS checks, and enforce User-Agent rules.
Vercel integration for SnapCrawl. Serve pre-rendered HTML to crawlers in Next.js middleware or Edge Functions for static SPAs and Express apps.
proxy manager used to scrape data
Deterministic link harvesting for QA and website migration testing
A Node.js module for downloading a single image or multiple images to disk from a given URL (checking whether the URL exists and detecting the image type)
DOM Document Object Artifact Collector
A powerful website downloader with GUI support
spankbang.com api implementation
Web service and CLI tool for SEO site audit: crawl site, lighthouse all pages, view public reports in browser. Also output to console, json, csv.
CrawleeOne is a framework built on top of Crawlee and Apify for writing robust and highly configurable web scrapers
A Node crawler that returns all links (hrefs) from a website
Multi environment web page parser
Crawler (robots) decision middleware for Express
A CLI tool to crawl documentation sites and create a search index for Upstash Search.
Tool to crawl events, leagues and statistics from WBSC based websites.
Behaviour Assertion Sheets: CSS-like declarative syntax for client-side integration testing and quality assurance.
Generate comprehensive PDFs of entire websites, ideal for RAG.
SyphonX is a tool that extracts data from HTML data, transforming it into JSON of any shape or size. It combines the power of CSS Selectors and jQuery, Regular Expressions, and Javascript into a declarative template format to elegantly solve the simplest
Extract email addresses from a website by following links
A utility library to make downloading & extracting specific content from a URL easy
A library for crawling a filesystem tree, based on Effect-ts
Fetch the pre-rendered content, meta, links and Open Graph of a webpage, especially Single-Page Application (SPA)
MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, batch processing, structured data extraction, and LLM-powered content analysis.
Get AliExpress product details as a JSON response, including feedback, variants, description, images, etc.
Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.
Web application spider with screenshot capture and customer journey documentation. Automate user flow documentation with authentication support.
The directory crawler library for Node.JS
Smart MCP tool to find and validate movie/tv-show resources with multiple sources support
Web Download CLI
Web crawler MCP server for extracting text content from web pages
A Vue3 plugin to improve SEO and crawler accessibility
Simple manga scraper for popular online manga websites.
Node.js library that provides an API for obtaining movie information from the FlixHQ website.
Web page scraper with a jQuery-like syntax for Node.
A highly configurable website crawler for automatically testing a website for accessibility issues using the axe-core library. Uses selenium and headless Chrome to load pages, inject axe-core, and run tests. Generates an html summary report in addition
Lightweight and easy to use crawling solution for websites.
Environment variable collector
Node.js module for crawling the web
A crawler implemented using a headless browser (Chrome).
A util tool
xvideos.com api implementation.
Use the Google Play protobuf API in Node
A Node.js web crawler using the Fetch API
A TypeScript library for scraping stories from various Vietnamese websites
Generic node service to handle SSR for SPA made with any kind of frontend framework
A powerful web crawler designed specifically for LLM applications, capable of extracting clean, readable content from various web pages and converting it to Markdown format.
A promise-based Node module to scrape TV shows, episodes and torrent info from EZTV.
A SOCKS and HTTP proxy written in Node.js to get over the GWF
Scrape data from any webpage.
Crawls a MySQL database and creates INSERT queries, or returns data from multiple tables based on the relationship information provided for those tables
Easily build flexible, scalable, and distributed, web crawlers.
A crawler download plugin
crawler with nodejs
A crawler to retrieve store data from the Epic Games store
A crawler based on Phantom. Allows discovery of dynamic content and supports custom scrapers.
An extremely simple module to scrape one or more DOM elements from a web page
Access Google Play by logging in and making requests as an Android device!
Floodesh is a distributed web spider/crawler written with Nodejs.
Express middleware that returns the resulting HTML after executing JavaScript, allowing crawlers to read the rendered page
Botify Rest API SDK middlewares
self twitter scraper
Multi-thread crawler engine.
Auxiliary package of functions for the TypeScript framework Botmation
A TypeScript library for detecting and categorizing bots from user agent strings
A simple license crawler for crediting open source work
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
A JS library dedicated to crawling CSDN articles.
Get information using the string of the specified rule
A YouTube crawler that needs no API and returns the first 3 videos.
A robots.txt reader, parser and matcher.
Google Search crawler
Playwright-based crawler for full browser automation and JavaScript rendering
MCP server for Crawlbase API - enables web scraping through Model Context Protocol
Web crawler to use as API
Puppeteer-based crawler for Chrome automation and dynamic content scraping
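Several packages in this list (the CrawlerDetect port, the spider/crawler detectors, the search-engine crawler detector, and the bot-categorizing TypeScript library) classify requests by matching the User-Agent header against known crawler patterns. A minimal JavaScript sketch of that general technique follows. It assumes a small, hypothetical pattern list and a hypothetical `isCrawler` helper; it does not reproduce the API or data of any listed package.

```javascript
// Illustrative pattern list (hypothetical): a tiny sample of well-known crawler
// tokens, NOT the list shipped by any package mentioned above.
const CRAWLER_PATTERNS = [
  /googlebot/i,
  /bingbot/i,
  /duckduckbot/i,
  /baiduspider/i,
  /yandexbot/i,
  // Generic fallback: a token ending in "bot", "crawler" or "spider"
  // (e.g. "AhrefsBot/7.0"). Heuristic; may produce false positives.
  /(bot|crawler|spider)\b/i,
];

/**
 * Returns true if the User-Agent string matches any crawler pattern.
 * A missing or empty User-Agent is treated as "not a crawler" here; this is an
 * assumption, and some detectors treat it as suspicious instead.
 * @param {string | undefined} userAgent
 * @returns {boolean}
 */
function isCrawler(userAgent) {
  if (typeof userAgent !== "string" || userAgent.length === 0) {
    return false;
  }
  // No /g flag on the patterns, so .test() has no lastIndex state between calls.
  return CRAWLER_PATTERNS.some((pattern) => pattern.test(userAgent));
}

// Example: classify a couple of sample User-Agent strings.
const samples = [
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
];
for (const ua of samples) {
  console.log(isCrawler(ua), ua);
}
```

User-Agent matching is only a heuristic: a bot can send a browser User-Agent, and a generic `bot` suffix can also match legitimate clients. To confirm that a request really comes from Googlebot, the DNS-based verification mentioned in one of the entries above is the more reliable check.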