JSPM

Found 181 results for crawling

@crawlee/utils

A set of shared utilities that can be used by crawlers

  • v3.15.1
  • 67.62
  • Published

notion-md-crawler

A library to recursively retrieve and serialize Notion pages with customization for machine learning applications.

  • v1.0.2
  • 62.08
  • Published

rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

  • v1.0.19
  • 42.62
  • Published

crawlbase

Dependency free module for scraping and crawling websites using [Crawlbase](https://crawlbase.com) API

  • v1.0.2
  • 42.01
  • Published

transparent-proxy

Real transparent HTTP-Proxy-Server. Upstream your requests whatever you want!

  • v1.15.3
  • 39.51
  • Published

deepcrawl

JavaScript/TypeScript SDK for Deepcrawl API - A powerful web scraping and crawling service

  • v0.5.2
  • 39.05
  • Published

roboto

A web crawler for Nodejs.

  • v0.8.2
  • 38.87
  • Published

js-crawler

Web crawler for Node.js

  • v0.3.21
  • 36.04
  • Published

node-webcrawler

Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously

  • v0.8.0
  • 35.63
  • Published

bromato

Local browser automation for no-code tools like n8n or make

  • v0.0.8
  • 35.40
  • Published

@monostate/node-scraper

Intelligent web scraping with AI Q&A, PDF support and multi-level fallback system - 11x faster than traditional scrapers

  • v1.8.1
  • 33.80
  • Published

jsonld-extract

A damn simple tool to extract json-ld metadata from webpage using jquery like api (jQuery, Cheerio, CashDOM, ...).

  • v0.0.8
  • 33.14
  • Published

anycrawl-mcp-server

AnyCrawl MCP Server - Adds powerful web scraping and crawling to Cursor, Claude and any other LLM clients

    • v1.0.1
    • 30.80
    • Published

    semantic-crawler

    Priority based Semantic Web Crawler.

    • v0.0.2
    • 30.61
    • Published

    node-html-crawler

    Crawler (spider) of site web pages by domain name

    • v1.2.3
    • 30.43
    • Published

    multipass-torrent

    Collects torrents from various sources (dump, RSS, HTML pages) and associates the video files within with IMDB ID

    • v0.8.6
    • 29.65
    • Published

    crawlyx

    Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.

    • v2.2.5
    • 28.66
    • Published

    twilight

    Twitter API tools

    • v1.0.5
    • 27.97
    • Published

    @0y0/scraper

    A web scraping tool that extracts any data from the web.

    • v1.0.0
    • 26.79
    • Published

    website-crawler-sdk

    Node.js SDK for interacting with WebsiteCrawler.org API

    • v1.0.6
    • 26.39
    • Published

    goldwasher

    Extraction of text and related metadata.

    • v7.0.0
    • 25.92
    • Published

    langgraph-api-doc-processor

    TypeScript API Documentation Processor with Real LangGraph Workflow - Automates API integration research and planning

    • v0.1.1
    • 25.79
    • Published

    hquery.php

    An extremely fast web scraper that parses megabytes of HTML in a blink of an eye. No dependencies. PHP5+

    • v3.3.0
    • 25.59
    • Published

    icrawler

    Tool for easy scraping data from websites

    • v2.6.5
    • 24.46
    • Published

    goose-parser

    Multi environment web page parser

    • v0.6.1
    • 24.31
    • Published

    friday-sdk

    Official JavaScript/TypeScript SDK for the Friday API

    • v0.3.0
    • 23.27
    • Published

    sitesampler

    Sample website text content over time.

    • v4.0.5
    • 22.91
    • Published

    serpstat-crawling

    Serpstat SERP Crawling API MCP Server

      • v0.1.0
      • 22.66
      • Published

      enispider

      A Node.js scraping framework built on puppeteer (to use a headless Chrome/Chromium browser)

      • v1.2.5
      • 22.47
      • Published

      miniscraper

      Minimalist Node.js web scraper and crawler working with under-the-hood JSDOM

      • v0.3.2
      • 22.18
      • Published

      @crstn/redirect

      A small package to crawl a site and return a redirect template. This is helpful for migration from one to another website with different url schemes.

      • v1.2.0
      • 22.00
      • Published

      @crawlbase/mcp

      MCP server for Crawlbase API - enables web scraping through Model Context Protocol

      • v1.0.3
      • 21.95
      • Published

      n8n-nodes-firecrawl-tool

      n8n node for Firecrawl v2 API - Web scraping, crawling, and data extraction tool for workflows and AI agents

      • v0.1.2
      • 21.68
      • Published

      goldwasher-needle

      Plugin for goldwasher to add needle for easy HTTP requests.

      • v2.1.0
      • 21.05
      • Published

      kaiser-crawler

      Node.js module for crawling the web

      • v1.0.5
      • 20.88
      • Published

      mrspider

      simple polite crawling of the web.

      • v5.1.2
      • 20.37
      • Published

      goldwasher-schedule

      Scheduled goldwasher requests, using goldwasher-needle and node-schedule.

      • v6.0.1
      • 19.19
      • Published

      crawlable

      A way to make your web application crawlable, so it can be well referenced on the web.

      • v0.4.13
      • 19.07
      • Published

      @jnv/scrapoxy

      Scrapoxy is a proxy for scrapers

      • v2.5.0
      • 18.90
      • Published

      sasori-crawl

      Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.

      • v1.0.0
      • 18.35
      • Published

      console-tourist

      This script analyzes console errors on your website.

      • v1.2.0
      • 18.06
      • Published

      syphonx

      SyphonX is a tool that extracts data from HTML data, transforming it into JSON of any shape or size. It combines the power of CSS Selectors and jQuery, Regular Expressions, and Javascript into a declarative template format to elegantly solve the simplest

      • v1.2.66
      • 17.77
      • Published

      node-raspar

      Easily scrape the web for torrent and media files.

      • v1.2.6
      • 17.62
      • Published

      goldwasher-aws-lambda

      A version of goldwasher that runs as a module on AWS Lambda.

      • v1.0.3
      • 17.32
      • Published

      crawler-ts

      Lightweight crawler written in TypeScript using ES6 generators.

      • v1.1.1
      • 17.31
      • Published

      @imaginerlabs/user-agent-generator

      High-performance, configurable, batch-generating User-Agent spoofing library. Supports multiple browsers, devices, and returns detailed meta information. Perfect for web scraping, automated testing, proxy pools and more.

      • v1.0.2
      • 17.08
      • Published

      notion-crawler

      Easily crawl your public notion pages

      • v0.0.9
      • 16.81
      • Published

      crawling

      A simple crawler made in JavaScript for Node.

      • v1.0.1
      • 16.64
      • Published

      crawler-ts-htmlparser2

      Lightweight crawler written in TypeScript using ES6 generators.

      • v1.1.1
      • 16.59
      • Published

      graceful-playwright

      Gracefully handle timeout and network error with auto retry.

      • v1.5.1
      • 16.46
      • Published

      scrapingai

      Build web scraping agents using AI to auto-extract the data from websites

      • v1.0.1
      • 16.46
      • Published

      beautifulstew

      A simple web scraping tool built for developers that can be utilized on both the client and server.

        • v1.1.4
        • 16.27
        • Published

        sitemap-js-obj

        Generate a sitemap javascript object from the folder structure crawling HTML files only.

        • v0.0.3
        • 16.26
        • Published

        keyworm

        Keyword mention crawler

        • v0.1.1
        • 16.23
        • Published

        node-web-crawler

        Node Web Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!

        • v0.0.6
        • 16.08
        • Published

        papermonk

        Streaming pdf fetcher for academic papers.

        • v0.0.3
        • 15.90
        • Published

        realfish-yc

        Real Fish Youtube Video Crawling Module

        • v0.1.8
        • 15.83
        • Published

        @subtitles/providers

        Providers are the core of applications, where the subtitles are collected. Each provider exports a unique strategy for gathering data. From legendastv's web scraping from opensubtitle API usage, you can collect subtitles from your favorite tv shows and mo

        • v0.3.0-beta.2
        • 15.60
        • Published

        earthworm

        easily create crawlers based on self-replicated scrapers

        • v1.0.4
        • 15.36
        • Published

        goose-chrome-environment

        Environment for Goose Parser that allows running it in headless Chrome via the Puppeteer API

        • v1.1.4
        • 15.30
        • Published

        node-crawler-scraper

        Simple and powerful crawler. It scrapes content and collects links from websites using request or phantomjs. The whole magic and simplicity is behind configuration.

          • v1.0.1
          • 15.11
          • Published

          scrapyteer

          Web scraping/crawling framework built on top of headless Chrome

          • v1.4.0
          • 14.61
          • Published

          @jifeon/goose-parser

          PhantomJS/Browser lib for parsing a webpage

          • v0.2.0-alpha.3
          • 14.59
          • Published

          headline-news-naver

          This extracts the top five news metadata from NAVER headlines.

          • v1.0.5
          • 14.59
          • Published

          htcrawl

          crawler for single page applications

          • v1.2.1
          • 14.58
          • Published

          tiny-crawler

          tiny-crawler is a web crawler.

          • v0.0.5
          • 14.23
          • Published

          siter

          Site content parser for popular websites with fallback to Open Graph and Twitter Cards

          • v0.0.16
          • 13.96
          • Published

          @botwall/sdk

          BotWall SDK for site protection and bot crawling

          • v1.1.1
          • 13.77
          • Published

          goose-paginator

          Paginator enriches ability to paginate over the pages in Goose Parser

          • v1.0.2
          • 13.53
          • Published

          scrape-them-all

          🚀 An easy-to-handle Node.js scraper that allows you to scrape them all in record time.

          • v2.0.0
          • 13.47
          • Published

          web-crawler

          Scalable, extensible, web crawler framework.

          • v0.0.0
          • 13.37
          • Published

          udemy-crawler

          Crawling Udemy course info and save into JSON format.

          • v1.1.1
          • 13.22
          • Published

          firecrawl-simple-mcp

          Model Context Protocol (MCP) server for Firecrawl Simple - provides web scraping and crawling capabilities to LLMs

          • v1.0.2
          • 12.81
          • Published

          pattern-grab

          🤛🏻 Regular Expression Data Grabber

            • v1.0.1
            • 12.70
            • Published

            gumo

            A web-crawler and scraper that extracts data from a family of nested dynamic webpages with added enhancements to assist in knowledge mining applications.

            • v1.0.7
            • 12.70
            • Published

            img-cli

            An interactive command-line interface built in Node.js for downloading a single image or multiple images to disk from a URL

            • v1.2.0
            • 12.65
            • Published

            spamlet

            spamlet is an efficient and simple crawler for playwright

              • v0.1.6
              • 12.59
              • Published

              crawlme

              Makes your ajax web application indexable by search engines by generating html snapshots on the fly. Caches results for blazing fast responses and better page ranking.

                • v0.0.7
                • 12.53
                • Published

                spider2

                A 2nd generation spider to crawl any article site, automatically reading the title and content.

                • v0.0.7
                • 12.53
                • Published

                proxidoor

                proxidoor helps you make HTTP requests through a rotating proxy, you can use it for services such as web scraping, web crawling and more.

                • v1.0.3
                • 12.25
                • Published

                crawler-ts-fetch

                Lightweight crawler written in TypeScript using ES6 generators.

                • v1.1.1
                • 11.96
                • Published

                xstruct

                Data extraction tools.

                • v0.7.9
                • 11.95
                • Published

                @mseep/firecrawl-simple-mcp

                MCP server for Firecrawl Simple — a web scraping and site mapping tool enabling LLMs to access and process web content

                • v1.0.2
                • 11.83
                • Published

                rebrowser-patches-fadi-patch

                Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

                • v1.0.18
                • 11.80
                • Published

                crawlable-solidify

                Some tools to help you to render your application as a static web site using the crawlable module.

                • v1.0.2
                • 11.53
                • Published

                doffy

                a headless browser automation library with easy-use API

                • v0.0.7
                • 10.99
                • Published

                node-crawling-framework

                NodeJs crawling & scraping framework heavily inspired by Scrapy (Python)

                • v0.0.1-alpha.2
                • 10.96
                • Published

                fadi-rebrowser-patches

                Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

                • v0.0.6
                • 10.73
                • Published

                sitemaps-getter

                A tool to get sitemaps from websites and crawl them

                • v1.0.3
                • 10.64
                • Published

                krawler

                Fast and lightweight web crawler with built-in cheerio, xml and json parser.

                • v0.3.3
                • 10.52
                • Published

                tai-spider

                Scrapy Framework implemented by nodejs.

                • v0.1.21
                • 10.50
                • Published

                nocrawler

                Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!

                • v0.0.1
                • 10.27
                • Published

                crawl-client

                Node.js client for the CloudCrawler.io API

                • v1.0.3
                • 10.11
                • Published

                @datasco/sdk

                Datasco API SDK for Node.js to collect any data from any website

                • v1.0.4
                • 9.79
                • Published

                crawler-ts-fs

                Lightweight crawler written in TypeScript using ES6 generators.

                • v1.1.1
                • 9.77
                • Published

                @tooly/firecrawl

                Firecrawl API tools for OpenAI, Anthropic, and AI SDK

                • v0.0.3
                • 9.63
                • Published

                robinbot

                robin web crawling engine with nodejs

                • v0.9.0
                • 9.63
                • Published

                ccht

                A simple command-line tool to crawl and test your website

                • v0.1.2
                • 9.42
                • Published

                style-crawl

                Package to find style links from the site you want

                • v1.1.2
                • 9.22
                • Published

                webcreeper

                WebCreeper easy web crawler

                • v0.0.51
                • 9.22
                • Published

                scrapingapi

                One API to scrape All the Web.

                • v0.3.1
                • 8.93
                • Published

                aragog-client

                Aragog web scraping framework client

                • v1.0.3
                • 8.81
                • Published

                hylsplider

                fork from headless-chrome-crawler and update puppeteer to the latest version

                • v1.0.0
                • 8.62
                • Published

                crawling-typer

                Transform your text with dynamic typing animations! crawling-typer lets you display an array of strings one at a time, each with its own color. Customize typing speed, delete speed, and pauses between strings. Enjoy full control with loop counts, post-loo

                • v1.1.1
                • 8.62
                • Published

                crawler2

                Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!

                • v0.0.2
                • 8.54
                • Published

                sitescrapr

                Simple website crawler and scraper

                • v0.0.1
                • 8.54
                • Published

                instagram-crawling

                Simple Instagram Crawling without using public API

                • v1.1.2
                • 8.43
                • Published

                crawly-automation

                A lightweight and modular web crawling framework built with Puppeteer.

                  • v1.0.4
                  • 8.23
                  • Published

                  saintjs-score

                  SoongSil UniverSity U-saint Score Crawling

                  • v2.0.1
                  • 8.22
                  • Published

                  scrapr

                  A tool for getting public website content using a browser engine or http get.

                  • v0.0.15
                  • 8.22
                  • Published

                  crt-scrapper

                  Easily create a scraper api with the @web/scrapper library, which includes a scraper and advanced events for your website.

                  • v1.0.4
                  • 7.94
                  • Published

                  cookied-phantom-crawler

                  PhantomJS and JSDOM based crawling tool. Used PhantomJS for full load of asynchronously-loaded resources and JSDOM for quick crawls. Allows custom [tough-cookie](https://www.npmjs.com/package/tough-cookie) insertion. Refer to [cheerio](https://www.npmj

                    • v1.0.1
                    • 7.62
                    • Published

                    press2blogger

                    Moving or backing up your Wordpress site to Blogger

                    • v1.0.3
                    • 7.62
                    • Published

                    imdb-scrapi

                    An API to get data off of IMDB using Puppeteer.

                    • v1.0.2
                    • 7.43
                    • Published

                    p4k-api

                    web scraper for album reviews from pitchfork

                    • v1.4.3
                    • 7.18
                    • Published

                    hcr

                    Easy To Use Web Crawler

                    • v1.4.1
                    • 6.68
                    • Published

                    crawler-hq

                    Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!

                    • v0.2.7
                    • 6.57
                    • Published

                    dynamic-crawling

                    Runs crawling routines defined in a JSON file using XPath, accepting for each step a callback function that receives the value and can pass it on to the next step.

                    • v1.0.2
                    • 6.52
                    • Published

                    jason-the-miner

                    Harvesting data at the <html> mine.

                    • v1.1.1
                    • 6.52
                    • Published

                    goose-browser-environment

                    Environment for Goose Parser that allows running it in a common browser

                    • v1.0.4
                    • 6.43
                    • Published

                    spa-seo

                    Single Page App SEO

                    • v0.0.3
                    • 6.43
                    • Published

                    @stacksleuth/browser-agent

                    StackSleuth in-house browser automation agent for debugging and user simulation

                    • v0.2.1
                    • 6.29
                    • Published

                    ig-scrap-cache

                    Scrapes Instagram and caches the results using Redis

                    • v3.0.0
                    • 6.29
                    • Published

                    magnet-getter

                    An API to get magnet links using Puppeteer.

                    • v1.1.0
                    • 5.64
                    • Published

                    spider-stealth-core

                    A Node.js scraping framework built on puppeteer-extra (to use a headless Chrome/Chromium browser). Has the ability to solve reCaptcha. The core module without browser installation

                    • v1.3.4
                    • 5.57
                    • Published

                    crawley

                    A simple web crawler

                    • v1.0.2
                    • 5.48
                    • Published

                    scrapeasy

                    Automated scraping module using patterns generated by the userscript Scrapeasy.

                    • v0.4.2
                    • 5.48
                    • Published

                    malkovich-malkovich

                    A lightweight and simple API for web crawling built on chromium puppeteer

                    • v0.0.1
                    • 5.48
                    • Published

                    skrap

                    Easily scrape web pages by providing JSON recipes

                    • v0.1.1
                    • 5.44
                    • Published

                    spider-core

                    A Node.js scraping framework built on puppeteer-core (to use a headless Chrome/Chromium browser). The core module without browser installation

                    • v1.3.11
                    • 5.44
                    • Published

                    plucky-crawler

                    The error crawler that powers http://plucky.io/

                    • v0.0.1
                    • 5.44
                    • Published

                    nodecraw

                    NodeCraw is a web crawling application that allows you to crawl specified URLs and extract information from web pages. It utilizes various modules and libraries to perform crawling and save the results.

                      • v1.0.7
                      • 5.38
                      • Published

                      spider-stealth

                      A Node.js scraping framework built on puppeteer-extra (to use a headless Chrome/Chromium browser). Has the ability to solve reCaptcha

                      • v1.2.2
                      • 5.37
                      • Published

                      fiend

                      The most advanced web crawler for JavaScript

                      • v0.1.0
                      • 5.37
                      • Published

                      confession

                      Helper to extract confessions from webpages

                      • v3.1.0
                      • 5.37
                      • Published

                      wight-backend-web

                      A Wight backend for fetching static web pages

                      • v0.1.0
                      • 5.25
                      • Published

                      crawline

                      Web crawler

                      • v0.0.0
                      • 4.15
                      • Published

                      @leoko/crawler

                      Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously

                      • v1.3.1
                      • 4.15
                      • Published

                      node-pool-scraper

                      Node.js web scraping utility powered by puppeteer pool

                      • v0.1.6
                      • 4.07
                      • Published

                      realfish-yct

                      Real Fish Youtube Trend Video Crawling

                      • v0.3.0
                      • 3.97
                      • Published

                      crawler-mod

                      based on node-crawler

                      • v0.0.1
                      • 2.47
                      • Published

                      dcrawler

                      DCrawler is a distributed web spider written in Nodejs and queued with Mongodb. It gives you the full power of jQuery to parse big pages as they are downloaded, asynchronously. Simplifying distributed crawling!

                      • v0.0.8
                      • 2.43
                      • Published

                      planisphere

                      A straightforward sitemap generator written in TypeScript.

                      • v1.0.1
                      • 2.41
                      • Published

                      declarative-scraper

                      Simple & Human-Friendly HTML Scraper with Json-ld support

                      • v0.1.1
                      • 2.41
                      • Published

                      netcrawler

                      Net Crawler is a web spider written with Nodejs

                        • v0.8.6
                        • 2.41
                        • Published

                        cspider

                        Distributed web crawler powered by Headless Chrome

                        • v0.0.6
                        • 2.37
                        • Published

                        session-scraper

                        Simple scraper for imitating browsing sessions

                        • v0.0.2
                        • 2.37
                        • Published

                        parkour

                        Parkour the web like a yamakazi

                        • v1.0.0
                        • 2.32
                        • Published

                        nstock

                        naver stock data crawler

                        • v0.1.0-beta
                        • 0.00
                        • Published

                        hapi-goldwasher

                        A plugin for Hapi.js to run goldwasher as a scraping API on the web.

                        • v1.0.4
                        • 0.00
                        • Published