Found 181 results for crawling

@crawlee/utils

A set of shared utilities that can be used by crawlers

notion-md-crawler

A library to recursively retrieve and serialize Notion pages with customization for machine learning applications.

@promptbook/website-crawler

Promptbook: Turn your company's scattered knowledge into AI ready books

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

@hyperbrowser/agent

Hyperbrowser's Web Agent

crawlbase

Dependency free module for scraping and crawling websites using [Crawlbase](https://crawlbase.com) API

transparent-proxy

Real transparent HTTP-Proxy-Server. Upstream your requests whatever you want!

deepcrawl

JavaScript/TypeScript SDK for Deepcrawl API - A powerful web scraping and crawling service

roboto

A web crawler for Node.js.

headless-chrome-crawler

Distributed web crawler powered by Headless Chrome

js-crawler

Web crawler for Node.js

node-webcrawler

Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously

bromato

Local browser automation for no-code tools like n8n or make

@popstas/headless-chrome-crawler

Distributed web crawler powered by Headless Chrome

webcrawlerapi-js

JS client for WebcrawlerAPI

@monostate/node-scraper

Intelligent web scraping with AI Q&A, PDF support and multi-level fallback system - 11x faster than traditional scrapers

jsonld-extract

A damn simple tool to extract json-ld metadata from webpage using jquery like api (jQuery, Cheerio, CashDOM, ...).

crawler-url-parser

A `URL` parser for crawling purposes.

anycrawl-mcp-server

AnyCrawl MCP Server - Adds powerful web scraping and crawling to Cursor, Claude and any other LLM clients

semantic-crawler

Priority based Semantic Web Crawler.

node-html-crawler

Crawler (spider) of site web pages by domain name

multipass-torrent

Collects torrents from various sources (dump, RSS, HTML pages) and associates the video files within with IMDB ID

crawlyx

Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.

twilight

Twitter API tools

@0y0/scraper

A web scraping tool that extracts any data from the web.

website-crawler-sdk

Node.js SDK for interacting with WebsiteCrawler.org API

goldwasher

Extraction of text and related metadata.

langgraph-api-doc-processor

TypeScript API Documentation Processor with Real LangGraph Workflow - Automates API integration research and planning

hquery.php

An extremely fast web scraper that parses megabytes of HTML in a blink of an eye. No dependencies. PHP5+

icrawler

Tool for easy scraping data from websites

goose-parser

Multi environment web page parser

@lightfeed/browser-agent

Serverless browser agent

friday-sdk

Official JavaScript/TypeScript SDK for the Friday API

sitesampler

Sample website text content over time.

serpstat-crawling

Serpstat SERP Crawling API MCP Server

enispider

A Node.js scraping framework built on puppeteer (to use a headless Chrome/Chromium browser)

miniscraper

Minimalist Node.js web scraper and crawler working with under-the-hood JSDOM

@0y0/scraper-extensions

A @0y0/scraper expansion pack.

quickscraper-sdk

Quick Scraper SDK NodeJS APIs

@crstn/redirect

A small package to crawl a site and return a redirect template. This is helpful for migration from one to another website with different url schemes.

@crawlbase/mcp

MCP server for Crawlbase API - enables web scraping through Model Context Protocol

n8n-nodes-firecrawl-tool

n8n node for Firecrawl v2 API - Web scraping, crawling, and data extraction tool for workflows and AI agents

node-headless-crawler

Distributed web crawler powered by Headless Chrome

goldwasher-needle

Plugin for goldwasher to add needle for easy HTTP requests.

kaiser-crawler

Node.js module for crawling the web

mrspider

simple polite crawling of the web.

@abilashinamdar/node-crawler

Fast asynchronous NodeJS module for crawling/scraping a web through worker_threads.

goldwasher-schedule

Scheduled goldwasher requests, using goldwasher-needle and node-schedule.

crawlable

A way to make your web application crawlable, so it can be well referenced on the web.

@jnv/scrapoxy

Scrapoxy is a proxy for scrapers

sasori-crawl

Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.

papermonk-downloader-plosone

plosone.org scraper

console-tourist

This script analyzes console errors on your website.

syphonx

SyphonX is a tool that extracts data from HTML data, transforming it into JSON of any shape or size. It combines the power of CSS Selectors and jQuery, Regular Expressions, and Javascript into a declarative template format to elegantly solve the simplest

node-raspar

Easily scrape the web for torrent and media files.

goldwasher-aws-lambda

A version of goldwasher that runs as a module on AWS Lambda.

crawler-ts

Lightweight crawler written in TypeScript using ES6 generators.

@imaginerlabs/user-agent-generator

High-performance, configurable, batch-generating User-Agent spoofing library. Supports multiple browsers, devices, and returns detailed meta information. Perfect for web scraping, automated testing, proxy pools and more.

notion-crawler

Easily crawl your public notion pages

crawling

A simple crawler made in JavaScript for Node.

crawler-ts-htmlparser2

Lightweight crawler written in TypeScript using ES6 generators.

graceful-playwright

Gracefully handle timeout and network error with auto retry.

scrapingai

Build web scraping agents using AI to auto-extract the data from websites

beautifulstew

A simple web scraping tool built for developers that can be utilized on both the client and server.

sitemap-js-obj

Generate a sitemap javascript object from the folder structure crawling HTML files only.

keyworm

keyword mention crawler

node-web-crawler

Node Web Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!

@a-parser/webperl

papermonk

Streaming pdf fetcher for academic papers.

realfish-yc

Real Fish Youtube Video Crawling Module

@subtitles/providers

Providers are the core of applications, where the subtitles are collected. Each provider exports a unique strategy for gathering data. From legendastv's web scraping from opensubtitle API usage, you can collect subtitles from your favorite tv shows and mo

@vladfrangu-dev/crawlee-utils

A set of shared utilities that can be used by crawlers

earthworm

easily create crawlers based on self-replicated scrapers

goose-chrome-environment

Environment for Goose Parser that allows running it in headless Chrome via the Puppeteer API

node-crawler-scraper

Simple and powerful crawler. It scrapes content and collects links from websites using request or phantomjs. The whole magic and simplicity is behind configuration.

detect-crawling-react

This is the React Component for Detect Crawling

scrapyteer

Web scraping/crawling framework built on top of headless Chrome

@jifeon/goose-parser

PhantomJS/Browser lib which allows to parse a webpage

headline-news-naver

This extracts the top five news metadata from NAVER headlines.

htcrawl

crawler for single page applications

phantomjs-sitemap-generator

PhantomJS sitemap generator

tiny-crawler

tiny-crawler is a web crawler.

siter

Site content parser for popular websites with fallback to Open Graph and Twitter Cards

@botwall/sdk

BotWall SDK for site protection and bot crawling

goose-paginator

Paginator enriches ability to paginate over the pages in Goose Parser

scrape-them-all

🚀 An easy-to-handle Node.js scraper that allows you to scrape them all in record time.

web-crawler

Scalable, extensible, web crawler framework.

udemy-crawler

Crawling Udemy course info and save into JSON format.

firecrawl-simple-mcp

Model Context Protocol (MCP) server for Firecrawl Simple - provides web scraping and crawling capabilities to LLMs

pattern-grab

🤛🏻 Regular Expression Data Grabber

gumo

A web-crawler and scraper that extracts data from a family of nested dynamic webpages with added enhancements to assist in knowledge mining applications.

img-cli

An interactive Command-Line Interface built in Node.js for downloading a single image or multiple images to disk from a URL

spamlet

spamlet is an efficient and simple crawler for playwright

crawlme

Makes your ajax web application indexable by search engines by generating html snapshots on the fly. Caches results for blazing fast responses and better page ranking.

spider2

A 2nd generation spider to crawl any article site, automatically reading the title and content.

proxidoor

proxidoor helps you make HTTP requests through a rotating proxy, you can use it for services such as web scraping, web crawling and more.

commodidolores

Web crawler for Node.js

crawler-ts-fetch

Lightweight crawler written in TypeScript using ES6 generators.

crawler-find-word

crawler service

xstruct

Data extraction tools.

@mseep/firecrawl-simple-mcp

MCP server for Firecrawl Simple — a web scraping and site mapping tool enabling LLMs to access and process web content

rebrowser-patches-fadi-patch

crawlable-solidify

Some tools to help you to render your application as a static web site using the crawlable module.

doffy

a headless browser automation library with easy-use API

node-crawling-framework

Node.js crawling & scraping framework heavily inspired by Scrapy (Python)

fadi-rebrowser-patches

sitemaps-getter

A tool to get sitemaps from websites and crawl them

headless-chrome-crawler-x

Distributed web crawler powered by Headless Chrome

krawler

Fast and lightweight web crawler with built-in cheerio, xml and json parser.

tai-spider

Scrapy framework implemented in Node.js.

nocrawler

Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!

crawl-client

Node.js client for the CloudCrawler.io API

@datasco/sdk

Datasco API SDK for Node.js to collect any data from any website

crawler-ts-fs

Lightweight crawler written in TypeScript using ES6 generators.

@tooly/firecrawl

Firecrawl API tools for OpenAI, Anthropic, and AI SDK

robinbot

robin web crawling engine with nodejs

ccht

A simple command-line tool to crawl and test your website

goose-jsdom-environment

Environment for Goose Parser that allows running it using JSDOM

billboard-chart-api

billboard chart crawling module

style-crawl

Package to find style links from the site you want

webcreeper

WebCreeper easy web crawler

scrapingapi

One API to scrape All the Web.

aragog-client

Aragog web scraping framework client

hylsplider

fork from headless-chrome-crawler and update puppeteer to the latest version

crawling-typer

Transform your text with dynamic typing animations! crawling-typer lets you display an array of strings one at a time, each with its own color. Customize typing speed, delete speed, and pauses between strings. Enjoy full control with loop counts, post-loo

crawler2

sitescrapr

Simple website crawler and scraper

instagram-crawling

Simple Instagram Crawling without using public API

@jonnyprof/headless-chrome-crawler

Distributed web crawler powered by Headless Chrome

crawly-automation

A lightweight and modular web crawling framework built with Puppeteer.

saintjs-score

SoongSil UniverSity U-saint Score Crawling

scrapr

A tool for getting public website content using a browser engine or http get.

crt-scrapper

Easily create a scraper api with the @web/scrapper library, which includes a scraper and advanced events for your website.

cookied-phantom-crawler

PhantomJS and JSDOM based crawling tool. Used PhantomJS for full load of asynchronously-loaded resources and JSDOM for quick crawls. Allows custom [tough-cookie](https://www.npmjs.com/package/tough-cookie) insertion. Refer to [cheerio](https://www.npmj