@crawlee/utils
A set of shared utilities that can be used by crawlers
Found 181 results for crawling
A set of shared utilities that can be used by crawlers
A library to recursively retrieve and serialize Notion pages with customization for machine learning applications.
Node SDK for Hyperbrowser API
Promptbook: Turn your company's scattered knowledge into AI ready books
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.
Hyperbrowser's Web Agent
Dependency free module for scraping and crawling websites using [Crawlbase](https://crawlbase.com) API
Real transparent HTTP-Proxy-Server. Upstream your requests whatever you want!
JavaScript/TypeScript SDK for Deepcrawl API - A powerful web scraping and crawling service
A web crawler for Nodejs.
Distributed web crawler powered by Headless Chrome
Web crawler for Node.js
Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously
Local browser automation for no-code tools like n8n or make
Distributed web crawler powered by Headless Chrome
JS client for WebcrawlerAPI
Intelligent web scraping with AI Q&A, PDF support and multi-level fallback system - 11x faster than traditional scrapers
A damn simple tool to extract JSON-LD metadata from a webpage using a jQuery-like API (jQuery, Cheerio, CashDOM, ...).
A `URL` parser for crawling purposes.
AnyCrawl MCP Server - Adds powerful web scraping and crawling to Cursor, Claude and any other LLM clients
Priority based Semantic Web Crawler.
Crawler (spider) for a site's web pages by domain name
Collects torrents from various sources (dump, RSS, HTML pages) and associates the video files within with IMDB ID
Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.
Twitter API tools
A web scraping tool that extracts any data from the web.
Node.js SDK for interacting with WebsiteCrawler.org API
Extraction of text and related metadata.
TypeScript API Documentation Processor with Real LangGraph Workflow - Automates API integration research and planning
An extremely fast web scraper that parses megabytes of HTML in a blink of an eye. No dependencies. PHP5+
Tool for easy scraping data from websites
Multi environment web page parser
Serverless browser agent
Official JavaScript/TypeScript SDK for the Friday API
Sample website text content over time.
Serpstat SERP Crawling API MCP Server
A Node.js scraping framework built on puppeteer (to use a headless Chrome/Chromium browser)
Minimalist Node.js web scraper and crawler working with under-the-hood JSDOM
A @0y0/scraper expansion pack.
Quick Scraper SDK NodeJS APIs
A small package to crawl a site and return a redirect template. This is helpful for migration from one to another website with different url schemes.
MCP server for Crawlbase API - enables web scraping through Model Context Protocol
n8n node for Firecrawl v2 API - Web scraping, crawling, and data extraction tool for workflows and AI agents
Distributed web crawler powered by Headless Chrome
Plugin for goldwasher to add needle for easy HTTP requests.
Node.js module for crawling the web
simple polite crawling of the web.
Fast asynchronous Node.js module for crawling/scraping the web through worker_threads.
Scheduled goldwasher requests, using goldwasher-needle and node-schedule.
A way to make your web application crawlable, so it can be well referenced on the web.
Scrapoxy is a proxy for scrapers
Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.
plosone.org scraper
This script analyzes console errors on your website.
SyphonX is a tool that extracts data from HTML data, transforming it into JSON of any shape or size. It combines the power of CSS Selectors and jQuery, Regular Expressions, and Javascript into a declarative template format to elegantly solve the simplest
Easily scrape the web for torrent and media files.
A version of goldwasher that runs as a module on AWS Lambda.
Lightweight crawler written in TypeScript using ES6 generators.
High-performance, configurable, batch-generating User-Agent spoofing library. Supports multiple browsers, devices, and returns detailed meta information. Perfect for web scraping, automated testing, proxy pools and more.
Easily crawl your public notion pages
A simple crawler made in JavaScript for Node.
Lightweight crawler written in TypeScript using ES6 generators.
Gracefully handle timeout and network error with auto retry.
Build web scraping agents using AI to auto-extract the data from websites
A simple web scraping tool built for developers that can be utilized on both the client and server.
Generate a sitemap javascript object from the folder structure crawling HTML files only.
keyword mention crawler
Node Web Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!
Streaming pdf fetcher for academic papers.
Real Fish Youtube Video Crawling Module
Providers are the core of applications, where the subtitles are collected. Each provider exports a unique strategy for gathering data. From legendastv's web scraping to opensubtitle API usage, you can collect subtitles from your favorite tv shows and mo
A set of shared utilities that can be used by crawlers
easily create crawlers based on self-replicated scrapers
Environment for Goose Parser which allows to run it in Chrome headless via Puppeteer API
Simple and powerful crawler. It scrapes content and collects links from websites using request or phantomjs. The whole magic and simplicity is behind configuration.
A React component for detecting crawling
Web scraping/crawling framework built on top of headless Chrome
PhantomJS/Browser lib which allows to parse a webpage
This extracts the top five news metadata from NAVER headlines.
crawler for single page applications
PhantomJS sitemap generator
tiny-crawler is a web crawler.
Site content parser for popular websites with fallback to Open Graph and Twitter Cards
BotWall SDK for site protection and bot crawling
Paginator enriches ability to paginate over the pages in Goose Parser
🚀 An easy-to-handle Node.js scraper that allows you to scrape them all in record time.
Scalable, extensible, web crawler framework.
Crawling Udemy course info and save into JSON format.
Model Context Protocol (MCP) server for Firecrawl Simple - provides web scraping and crawling capabilities to LLMs
🤛🏻 Regular Expression Data Grabber
A web-crawler and scraper that extracts data from a family of nested dynamic webpages with added enhancements to assist in knowledge mining applications.
An interactive command-line interface built in Node.js for downloading a single image or multiple images to disk from a URL
spamlet is an efficient and simple crawler for playwright
Makes your ajax web application indexable by search engines by generating html snapshots on the fly. Caches results for blazing fast responses and better page ranking.
A 2nd generation spider to crawl any article site, automatically reading the title and content.
proxidoor helps you make HTTP requests through a rotating proxy; you can use it for services such as web scraping, web crawling and more.
Web crawler for Node.js
Lightweight crawler written in TypeScript using ES6 generators.
crawler service
Data extraction tools.
MCP server for Firecrawl Simple — a web scraping and site mapping tool enabling LLMs to access and process web content
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.
Some tools to help you to render your application as a static web site using the crawlable module.
a headless browser automation library with easy-use API
Node.js crawling & scraping framework heavily inspired by Scrapy (Python)
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.
A tool to get sitemaps from websites and crawl them
Distributed web crawler powered by Headless Chrome
Fast and lightweight web crawler with built-in cheerio, xml and json parser.
Scrapy framework implemented in Node.js.
Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!
Node.js client for the CloudCrawler.io API
Datasco API SDK for Node.js to collect any data from any website
Lightweight crawler written in TypeScript using ES6 generators.
Firecrawl API tools for OpenAI, Anthropic, and AI SDK
robin web crawling engine with nodejs
A simple command-line tool to crawl and test your website
Environment for Goose Parser which allows to run it using JsDOM
billboard chart crawling module
Package to find style links from the site you want
WebCreeper easy web crawler
One API to scrape All the Web.
Aragog web scraping framework client
Fork of headless-chrome-crawler with puppeteer updated to the latest version
Transform your text with dynamic typing animations! crawling-typer lets you display an array of strings one at a time, each with its own color. Customize typing speed, delete speed, and pauses between strings. Enjoy full control with loop counts, post-loo
Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!
Simple website crawler and scraper
Simple Instagram Crawling without using public API
Distributed web crawler powered by Headless Chrome
A lightweight and modular web crawling framework built with Puppeteer.
SoongSil UniverSity U-saint Score Crawling
A tool for getting public website content using a browser engine or http get.
Easily create a scraper api with the @web/scrapper library, which includes a scraper and advanced events for your website.
PhantomJS and JSDOM based crawling tool. Used PhantomJS for full load of asynchronously-loaded resources and JSDOM for quick crawls. Allows custom [tough-cookie](https://www.npmjs.com/package/tough-cookie) insertion. Refer to [cheerio](https://www.npmj
Moving or backing up your Wordpress site to Blogger
An API to get data off of IMDB using Puppeteer.
Crawler made simple
Set of utils and queues to make web scraping easy.
make web scraping easy
NodeJS Crawler for Twitter
web scraper for album reviews from pitchfork
Easy To Use Web Crawler
A Simple Job Manager
Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!
Aims to run CRAWLING routines from a JSON file using XPath, accepting for each step a callback function that receives the value and can pass that value on to the next step.
Harvesting data at the <html> mine.
Environment for Goose Parser which allows to run it in a common browser
Single Page App SER
StackSleuth in-house browser automation agent for debugging and user simulation
Scrape Instagram and cache the results using Redis
An API to get magnet links using Puppeteer.
Daily use crawling methods for puppeteer
A Node.js scraping framework built on puppeteer-extra (to use a headless Chrome/Chromium browser). Has the ability to solve reCaptcha. The core module without browser installation
A simple web crawler
Automated scraping module using patterns generated by the userscript Scrapeasy.
Distributed web crawler powered by Headless Chrome
A lightweight and simple API for web crawling built on chromium puppeteer
Easily scrape web pages by providing JSON recipes
A Node.js scraping framework built on puppeteer-core (to use a headless Chrome/Chromium browser). The core module without browser installation
The error crawler that powers http://plucky.io/
NodeCraw is a web crawling application that allows you to crawl specified URLs and extract information from web pages. It utilizes various modules and libraries to perform crawling and save the results.
A Node.js scraping framework built on puppeteer-extra (to use a headless Chrome/Chromium browser). Has the ability to solve reCaptcha
The most advanced web crawler for JavaScript
Helper to extract confessions from webpages
A Wight backend for fetching static web pages
Crawler Second-system effect, the second development
Web crawler
Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously
Node.js web scraping utility powered by puppeteer pool
Real Fish Youtube Trend Video Crawling
based on node-crawler
DCrawler is a distributed web spider written in Node.js and queued with MongoDB. It gives you the full power of jQuery to parse big pages as they are downloaded, asynchronously. Simplifying distributed crawler!
A straightforward sitemap generator written in TypeScript.
Simple & Human-Friendly HTML Scraper with Json-ld support
Net Crawler is a web spider written with Nodejs
Distributed web crawler powered by Headless Chrome
Simple scraper for imitating browsing sessions
Parkour the web like a yamakazi
naver stock data crawler
A plugin for Hapi.js to run goldwasher as a scraping API on the web.