Found 177 results for crawling

@popstas/headless-chrome-crawler

Distributed web crawler powered by Headless Chrome

goldwasher

Extraction of text and related metadata.

SyphonX is a tool that extracts data from HTML, transforming it into JSON of any shape or size. It combines the power of CSS selectors, jQuery, regular expressions, and JavaScript into a declarative template format to elegantly solve the simplest

kaiser-crawler

Node.js module for crawling the web

crawlyx

Crawlyx is an open-source, command-line (CLI) web crawler built with Node.js. It is designed to crawl websites and extract useful information such as links, images, and text. It is lightweight, fast, and easy to use.

goldwasher-schedule

Scheduled goldwasher requests, using goldwasher-needle and node-schedule.

goldwasher-aws-lambda

A version of goldwasher that runs as a module on AWS Lambda.

quickscraper-sdk

Node.js SDK for the Quick Scraper APIs

sitesampler

Sample website text content over time.

goldwasher-needle

Plugin for goldwasher that adds needle for easy HTTP requests.

@crawlbase/mcp

MCP server for the Crawlbase API; enables web scraping through the Model Context Protocol

node-web-crawler

Node Web Crawler is a web spider written in Node.js. It gives you the full power of jQuery on the server to parse large numbers of pages asynchronously as they are downloaded. Scraping should be simple and fun!

graceful-playwright

Gracefully handles timeouts and network errors with automatic retries.

@monostate/node-scraper

Intelligent web scraping with AI Q&A, PDF support and multi-level fallback system - 11x faster than traditional scrapers

@jnv/scrapoxy

Scrapoxy is a proxy for scrapers

icrawler

Tool for easily scraping data from websites

hquery.php

An extremely fast web scraper that parses megabytes of HTML in the blink of an eye. No dependencies. PHP5+

siter

Site content parser for popular websites with fallback to Open Graph and Twitter Cards

crawlable

A way to make your web application crawlable, so that search engines can index it well.

htcrawl

Crawler for single-page applications

goose-paginator

Paginator extends Goose Parser with the ability to paginate over pages

@imaginerlabs/user-agent-generator

High-performance, configurable, batch-generating User-Agent spoofing library. Supports multiple browsers, devices, and returns detailed meta information. Perfect for web scraping, automated testing, proxy pools and more.

session-scraper

Simple scraper for imitating browsing sessions

mrspider

Simple, polite crawling of the web.

web-crawler

Scalable, extensible, web crawler framework.

xstruct

Data extraction tools.

sitemap-js-obj

Generates a sitemap JavaScript object from a folder structure, crawling HTML files only.

@0y0/scraper

A web scraping tool that extracts any data from the web.

@botwall/sdk

BotWall SDK for site protection and bot crawling

udemy-crawler

Crawls Udemy course information and saves it in JSON format.

webcreeper

WebCreeper: an easy web crawler

pattern-grab

🤛🏻 Regular Expression Data Grabber

goose-chrome-environment

Environment for Goose Parser that allows it to run in headless Chrome via the Puppeteer API

crawl-client

Node.js client for the CloudCrawler.io API

node-headless-crawler

Distributed web crawler powered by Headless Chrome

crawler-ts-fetch

Lightweight crawler written in TypeScript using ES6 generators.

fadi-rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

@tooly/firecrawl

Firecrawl API tools for OpenAI, Anthropic, and AI SDK

tiny-crawler

tiny-crawler is a web crawler.

scrapr

A tool for getting public website content using a browser engine or an HTTP GET request.

enispider

A Node.js scraping framework built on puppeteer (to use a headless Chrome/Chromium browser)

console-tourist

This script analyzes console errors on your website.

node-raspar

Easily scrape the web for torrent and media files.

notion-crawler

Easily crawl your public Notion pages

beautifulstew

A simple web scraping tool built for developers that can be utilized on both the client and server.

crawling

A simple crawler made in JavaScript for Node.

realfish-yct

Real Fish YouTube trending video crawling

papermonk

Streaming PDF fetcher for academic papers.

node-crawler-scraper

Simple and powerful crawler. It scrapes content and collects links from websites using request or PhantomJS. All of the magic and simplicity lies in the configuration.

earthworm

Easily create crawlers based on self-replicating scrapers

img-cli

An interactive command-line interface built in Node.js for downloading a single image or multiple images from a URL to disk

papermonk-downloader-plosone

plosone.org scraper

scrapingai

Build web scraping agents using AI to auto-extract the data from websites

scrapingapi

One API to scrape All the Web.

crawler-hq

Crawler is a web spider written in Node.js. It gives you the full power of jQuery on the server to parse large numbers of pages asynchronously as they are downloaded. Scraping should be simple and fun!

@abilashinamdar/node-crawler

Fast asynchronous Node.js module for crawling/scraping the web using worker_threads.

firecrawl-simple-mcp

Model Context Protocol (MCP) server for Firecrawl Simple - provides web scraping and crawling capabilities to LLMs

realfish-yc

Real Fish YouTube video crawling module

@0y0/scraper-extensions

An @0y0/scraper expansion pack.

hapi-goldwasher

A plugin for Hapi.js to run goldwasher as a scraping API on the web.

scrapeasy

Automated scraping module using patterns generated by the userscript Scrapeasy.

sasori-crawl

Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.

nodecraw

NodeCraw is a web crawling application that allows you to crawl specified URLs and extract information from web pages. It utilizes various modules and libraries to perform crawling and save the results.

spider2

A second-generation spider that crawls any article site, automatically reading the title and content.

aragog-client

Aragog web scraping framework client

tai-spider

The Scrapy framework implemented in Node.js.

scrape-them-all

🚀 An easy-to-handle Node.js scraper that allows you to scrape them all in record time.

@leoko/crawler

Crawler is a web spider written in Node.js. It gives you the full power of jQuery on the server to parse large numbers of pages asynchronously as they are downloaded.

crawley

A simple web crawler

simplecrawling

Crawler made simple

doffy

A headless browser automation library with an easy-to-use API

@jifeon/goose-parser

PhantomJS/browser library for parsing web pages

@mseep/firecrawl-simple-mcp

MCP server for Firecrawl Simple: a web scraping and site-mapping tool that enables LLMs to access and process web content

phantomjs-sitemap-generator

PhantomJS sitemap generator

crawlme

Makes your AJAX web application indexable by search engines by generating HTML snapshots on the fly. Caches results for fast responses and better page ranking.

crawler-ts-fs

Lightweight crawler written in TypeScript using ES6 generators.

crawler-ts

Lightweight crawler written in TypeScript using ES6 generators.

crawlable-solidify

Tools to help you render your application as a static website using the crawlable module.

sitescrapr

Simple website crawler and scraper

cookied-phantom-crawler

PhantomJS- and JSDOM-based crawling tool. Uses PhantomJS to fully load asynchronously loaded resources and JSDOM for quick crawls. Allows custom [tough-cookie](https://www.npmjs.com/package/tough-cookie) insertion. Refer to [cheerio](https://www.npmj

crawly-automation

A lightweight and modular web crawling framework built with Puppeteer.

netcrawler

Net Crawler is a web spider written in Node.js

spider-core

A Node.js scraping framework built on puppeteer-core (to use a headless Chrome/Chromium browser). This is the core module, without a bundled browser installation.

headline-news-naver

Extracts metadata for the top five news stories in the NAVER headlines.

spa-seo

Single Page App SER

crawler-ts-htmlparser2

Lightweight crawler written in TypeScript using ES6 generators.

hylsplider

Fork of headless-chrome-crawler with Puppeteer updated to the latest version

crawler2

scrapyteer

Web scraping/crawling framework built on top of headless Chrome

dcrawler

DCrawler is a distributed web spider written in Node.js, with its queue stored in MongoDB. It gives you the full power of jQuery to parse large pages asynchronously as they are downloaded. Distributed crawling made simple!

plucky-crawler

The error crawler that powers http://plucky.io/

headless-chrome-crawler-x

Distributed web crawler powered by Headless Chrome

saintjs-score

Soongsil University U-saint score crawling

krawler

Fast and lightweight web crawler with built-in cheerio, xml and json parser.

rebrowser-patches-fadi-patch

confession

Helper to extract confessions from webpages

@crstn/redirect

A small package that crawls a site and returns a redirect template. This is helpful when migrating from one website to another with a different URL scheme.

goose-jsdom-environment

Environment for Goose Parser that allows it to run using JSDOM

puppeteer-for-crawling

Everyday crawling methods for Puppeteer

@datasco/sdk

Datasco API SDK for Node.js to collect any data from any website

proxidoor

proxidoor helps you make HTTP requests through a rotating proxy; you can use it for tasks such as web scraping, web crawling, and more.

crawler-find-word

crawler service

friday-sdk

Official JavaScript/TypeScript SDK for the Friday API

cspider

Distributed web crawler powered by Headless Chrome

goose-browser-environment

Environment for Goose Parser that allows it to run in a common browser

spider-stealth

A Node.js scraping framework built on puppeteer-extra (to use a headless Chrome/Chromium browser). Has the ability to solve reCAPTCHA.

p4k-api

Web scraper for album reviews from Pitchfork

nocrawler

crawling-typer

Transform your text with dynamic typing animations! crawling-typer lets you display an array of strings one at a time, each with its own color. Customize typing speed, delete speed, and pauses between strings. Enjoy full control with loop counts, post-loo

magnet-getter

An API to get magnet links using Puppeteer.

spider-stealth-core

A Node.js scraping framework built on puppeteer-extra (to use a headless Chrome/Chromium browser). Has the ability to solve reCAPTCHA. This is the core module, without a bundled browser installation.

@jonnyprof/headless-chrome-crawler

Distributed web crawler powered by Headless Chrome

dynamic-crawling

Runs crawling routines defined in a JSON file using XPath, accepting for each step a callback function that receives the value and can pass it on to the next step.

planisphere

A straightforward sitemap generator written in TypeScript.

robinbot

Robin web crawling engine built with Node.js

@a-parser/webperl

imdb-scrapi

An API to get data off of IMDB using Puppeteer.

declarative-scraper

Simple, human-friendly HTML scraper with JSON-LD support

miniscraper

Minimalist Node.js web scraper and crawler that uses JSDOM under the hood

@satankebab/scraping-utils

Set of utils and queues to make web scraping easy.

twitter-crawler

Node.js crawler for Twitter

fiend

The most advanced web crawler for JavaScript

@stacksleuth/browser-agent

StackSleuth in-house browser automation agent for debugging and user simulation

crawler-by-sunbirder

Crawler Second-system effect, the second development

keyworm

Keyword mention crawler

billboard-chart-api

Billboard chart crawling module

style-crawl

Package that finds stylesheet links on the site of your choice

@karthikmam/job-manager

A Simple Job Manager

crawler-mod

based on node-crawler

instagram-crawling

Simple Instagram crawling without using the public API

jason-the-miner

Harvesting data at the <html> mine.

skrap

Easily scrape web pages by providing JSON recipes

crawline

Web crawler

node-pool-scraper

Node.js web scraping utility powered by a Puppeteer pool

node-crawling-framework

Node.js crawling and scraping framework heavily inspired by Scrapy (Python)

ccht

A simple command-line tool to crawl and test your website

kick-off-crawling

Makes web scraping easy

wight-backend-web

A Wight backend for fetching static web pages

spamlet

spamlet is an efficient and simple crawler for Playwright

crt-scrapper

Easily create a scraper API with the @web/scrapper library, which includes a scraper and advanced events for your website.

detect-crawling-react

React component for Detect Crawling

gumo

A web-crawler and scraper that extracts data from a family of nested dynamic webpages with added enhancements to assist in knowledge mining applications.

commodidolores

Web crawler for Node.js

@vladfrangu-dev/crawlee-utils

A set of shared utilities that can be used by crawlers

hcr

Easy-to-use web crawler

@subtitles/providers

Providers are the core of applications; they are where the subtitles are collected. Each provider exports a unique strategy for gathering data. From web scraping of legendastv to use of the opensubtitle API, you can collect subtitles from your favorite TV shows and mo