JSPM

Found 177 results for crawling

goldwasher

Extraction of text and related metadata.

  • v7.0.0
  • 23.93
  • Published

twilight

Twitter API tools

  • v1.0.5
  • 23.20
  • Published

goose-parser

Multi environment web page parser

  • v0.6.1
  • 22.66
  • Published

syphonx

SyphonX is a tool that extracts data from HTML data, transforming it into JSON of any shape or size. It combines the power of CSS Selectors and jQuery, Regular Expressions, and Javascript into a declarative template format to elegantly solve the simplest

  • v1.2.66
  • 22.39
  • Published

kaiser-crawler

Node.js module for crawling the web

  • v1.0.5
  • 19.88
  • Published

crawlyx

Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.

  • v2.2.5
  • 19.28
  • Published

goldwasher-schedule

Scheduled goldwasher requests, using goldwasher-needle and node-schedule.

  • v6.0.1
  • 18.90
  • Published

goldwasher-aws-lambda

A version of goldwasher that runs as a module on AWS Lambda.

  • v1.0.3
  • 18.47
  • Published

sitesampler

Sample website text content over time.

  • v4.0.5
  • 17.56
  • Published

goldwasher-needle

Plugin for goldwasher to add needle for easy HTTP requests.

  • v2.1.0
  • 17.34
  • Published

@crawlbase/mcp

MCP server for Crawlbase API - enables web scraping through Model Context Protocol

  • v1.0.3
  • 16.86
  • Published

node-web-crawler

Node Web Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!

  • v0.0.6
  • 16.22
  • Published

graceful-playwright

Gracefully handle timeout and network error with auto retry.

  • v1.5.1
  • 16.20
  • Published

@monostate/node-scraper

Intelligent web scraping with AI Q&A, PDF support and multi-level fallback system - 11x faster than traditional scrapers

  • v1.8.1
  • 16.20
  • Published

@jnv/scrapoxy

Scrapoxy is a proxy for scrapers

  • v2.5.0
  • 16.02
  • Published

icrawler

Tool for easy scraping data from websites

  • v2.6.5
  • 15.99
  • Published

hquery.php

An extremely fast web scraper that parses megabytes of HTML in a blink of an eye. No dependencies. PHP5+

  • v3.3.0
  • 15.83
  • Published

siter

Site content parser for popular websites with fallback to Open Graph and Twitter Cards

  • v0.0.16
  • 15.32
  • Published

crawlable

A way to make your web application crawlable, so it can be well referenced on the web.

  • v0.4.13
  • 14.73
  • Published

htcrawl

crawler for single page applications

  • v1.2.1
  • 14.45
  • Published

goose-paginator

Paginator enriches ability to paginate over the pages in Goose Parser

  • v1.0.2
  • 14.40
  • Published

@imaginerlabs/user-agent-generator

High-performance, configurable, batch-generating User-Agent spoofing library. Supports multiple browsers, devices, and returns detailed meta information. Perfect for web scraping, automated testing, proxy pools and more.

  • v1.0.2
  • 14.40
  • Published

session-scraper

Simple scraper for imitating browsing sessions

  • v0.0.2
  • 14.36
  • Published

mrspider

simple polite crawling of the web.

  • v5.1.2
  • 14.32
  • Published

web-crawler

Scalable, extensible, web crawler framework.

  • v0.0.0
  • 13.66
  • Published

xstruct

Data extraction tools.

  • v0.7.9
  • 13.50
  • Published

sitemap-js-obj

Generate a sitemap javascript object from the folder structure crawling HTML files only.

  • v0.0.3
  • 13.50
  • Published

@0y0/scraper

A web scraping tool that extracts any data from the web.

  • v1.0.0
  • 13.38
  • Published

@botwall/sdk

BotWall SDK for site protection and bot crawling

  • v1.1.1
  • 13.30
  • Published

udemy-crawler

Crawling Udemy course info and save into JSON format.

  • v1.1.1
  • 13.16
  • Published

webcreeper

WebCreeper easy web crawler

  • v0.0.51
  • 12.34
  • Published

pattern-grab

🤛🏻 Regular Expression Data Grabber

    • v1.0.1
    • 12.19
    • Published

    goose-chrome-environment

    Environment for Goose Parser which allows to run it in Chrome headless via Puppeteer API

    • v1.1.4
    • 12.17
    • Published

    crawl-client

    Node.js client for the CloudCrawler.io API

    • v1.0.3
    • 11.82
    • Published

    crawler-ts-fetch

    Lightweight crawler written in TypeScript using ES6 generators.

    • v1.1.1
    • 11.35
    • Published

    fadi-rebrowser-patches

    Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

    • v0.0.6
    • 11.22
    • Published

    @tooly/firecrawl

    Firecrawl API tools for OpenAI, Anthropic, and AI SDK

    • v0.0.3
    • 11.21
    • Published

    tiny-crawler

    tiny-crawler is a web crawler.

    • v0.0.5
    • 10.98
    • Published

    scrapr

    A tool for getting public website content using a browser engine or http get.

    • v0.0.15
    • 10.98
    • Published

    enispider

    A Node.js scraping framework built on puppeteer (to use a headless Chrome/Chromium browser)

    • v1.2.5
    • 10.87
    • Published

    console-tourist

    This script analyzes console errors on your website.

    • v1.2.0
    • 10.51
    • Published

    node-raspar

    Easily scrape the web for torrent and media files.

    • v1.2.6
    • 10.51
    • Published

    notion-crawler

    Easily crawl your public notion pages

    • v0.0.9
    • 10.47
    • Published

    beautifulstew

    A simple web scraping tool built for developers that can be utilized on both the client and server.

      • v1.1.4
      • 10.34
      • Published

      crawling

      A simple crawler made in JavaScript for Node.

      • v1.0.1
      • 10.34
      • Published

      realfish-yct

      Real Fish Youtube Trend Video Crawling

      • v0.3.0
      • 10.33
      • Published

      papermonk

      Streaming pdf fetcher for academic papers.

      • v0.0.3
      • 9.96
      • Published

      node-crawler-scraper

      Simple and powerful crawler. It scrapes content and collects links from websites using request or phantomjs. The whole magic and simplicity is behind configuration.

        • v1.0.1
        • 9.89
        • Published

        earthworm

        easily create crawlers based on self-replicated scrapers

        • v1.0.4
        • 9.89
        • Published

        img-cli

        An interactive command-line interface built in Node.js for downloading a single image or multiple images to disk from a URL

        • v1.2.0
        • 9.89
        • Published

        scrapingai

        Build web scraping agents using AI to auto-extract the data from websites

        • v1.0.1
        • 9.76
        • Published

        scrapingapi

        One API to scrape All the Web.

        • v0.3.1
        • 9.76
        • Published

        crawler-hq

        Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!

        • v0.2.7
        • 9.64
        • Published

        firecrawl-simple-mcp

        Model Context Protocol (MCP) server for Firecrawl Simple - provides web scraping and crawling capabilities to LLMs

        • v1.0.2
        • 9.41
        • Published

        realfish-yc

        Real Fish Youtube Video Crawling Module

        • v0.1.8
        • 9.24
        • Published

        hapi-goldwasher

        A plugin for Hapi.js to run goldwasher as a scraping API on the web.

        • v1.0.4
        • 9.01
        • Published

        scrapeasy

        Automated scraping module using patterns generated by the userscript Scrapeasy.

        • v0.4.2
        • 9.01
        • Published

        sasori-crawl

        Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.

        • v1.0.0
        • 9.01
        • Published

        nodecraw

        NodeCraw is a web crawling application that allows you to crawl specified URLs and extract information from web pages. It utilizes various modules and libraries to perform crawling and save the results.

          • v1.0.7
          • 8.79
          • Published

          spider2

          A 2nd generation spider to crawl any article site, automatically reading the title and content.

          • v0.0.7
          • 8.69
          • Published

          aragog-client

          Aragog web scraping framework client

          • v1.0.3
          • 8.69
          • Published

          tai-spider

          Scrapy Framework implemented by nodejs.

          • v0.1.21
          • 8.67
          • Published

          scrape-them-all

          🚀 An easy-to-handle Node.js scraper that allows you to scrape them all in record time.

          • v2.0.0
          • 8.31
          • Published

          @leoko/crawler

          Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously

          • v1.3.1
          • 8.31
          • Published

          crawley

          A simple web crawler

          • v1.0.2
          • 8.31
          • Published

          doffy

          a headless browser automation library with easy-use API

          • v0.0.7
          • 8.10
          • Published

          @jifeon/goose-parser

          PhantomJS/Browser lib which allows to parse a webpage

          • v0.2.0-alpha.3
          • 8.10
          • Published

          @mseep/firecrawl-simple-mcp

          MCP server for Firecrawl Simple — a web scraping and site mapping tool enabling LLMs to access and process web content

          • v1.0.2
          • 8.10
          • Published

          crawlme

          Makes your ajax web application indexable by search engines by generating html snapshots on the fly. Caches results for blazing fast responses and better page ranking.

            • v0.0.7
            • 8.01
            • Published

            crawler-ts-fs

            Lightweight crawler written in TypeScript using ES6 generators.

            • v1.1.1
            • 8.00
            • Published

            crawler-ts

            Lightweight crawler written in TypeScript using ES6 generators.

            • v1.1.1
            • 8.00
            • Published

            crawlable-solidify

            Some tools to help you to render your application as a static web site using the crawlable module.

            • v1.0.2
            • 7.83
            • Published

            sitescrapr

            Simple website crawler and scraper

            • v0.0.1
            • 7.83
            • Published

            cookied-phantom-crawler

            PhantomJS and JSDOM based crawling tool. Used PhantomJS for full load of asynchronously-loaded resources and JSDOM for quick crawls. Allows custom [tough-cookie](https://www.npmjs.com/package/tough-cookie) insertion. Refer to [cheerio](https://www.npmj

              • v1.0.1
              • 7.70
              • Published

              crawly-automation

              A lightweight and modular web crawling framework built with Puppeteer.

                • v1.0.4
                • 7.60
                • Published

                netcrawler

                Net Crawler is a web spider written with Nodejs

                  • v0.8.6
                  • 7.35
                  • Published

                  spider-core

                  A Node.js scraping framework built on puppeteer-core (to use a headless Chrome/Chromium browser). The core module without browser installation

                  • v1.3.11
                  • 7.35
                  • Published

                  headline-news-naver

                  This extracts the top five news metadata from NAVER headlines.

                  • v1.0.5
                  • 7.32
                  • Published

                  spa-seo

                  Single Page App SEO

                  • v0.0.3
                  • 7.24
                  • Published

                  crawler-ts-htmlparser2

                  Lightweight crawler written in TypeScript using ES6 generators.

                  • v1.1.1
                  • 7.24
                  • Published

                  hylsplider

                  fork from headless-chrome-crawler and update puppeteer to the latest version

                  • v1.0.0
                  • 7.23
                  • Published

                  crawler2

                  Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!

                  • v0.0.2
                  • 6.87
                  • Published

                  scrapyteer

                  Web scraping/crawling framework built on top of headless Chrome

                  • v1.4.0
                  • 6.58
                  • Published

                  dcrawler

                  DCrawler is a distributed web spider written in Node.js and queued with MongoDB. It gives you the full power of jQuery to parse big pages as they are downloaded, asynchronously. Simplifying distributed crawling!

                  • v0.0.8
                  • 6.58
                  • Published

                  plucky-crawler

                  The error crawler that powers http://plucky.io/

                  • v0.0.1
                  • 6.44
                  • Published

                  saintjs-score

                  SoongSil UniverSity U-saint Score Crawling

                  • v2.0.1
                  • 6.44
                  • Published

                  krawler

                  Fast and lightweight web crawler with built-in cheerio, xml and json parser.

                  • v0.3.3
                  • 6.42
                  • Published

                  rebrowser-patches-fadi-patch

                  Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

                  • v1.0.18
                  • 6.35
                  • Published

                  confession

                  Helper to extract confessions from webpages

                  • v3.1.0
                  • 6.35
                  • Published

                  @crstn/redirect

                  A small package to crawl a site and return a redirect template. This is helpful for migration from one to another website with different url schemes.

                  • v1.2.0
                  • 6.34
                  • Published

                  @datasco/sdk

                  Datasco API SDK for Node.js to collect any data from any website

                  • v1.0.4
                  • 5.63
                  • Published

                  proxidoor

                  proxidoor helps you make HTTP requests through a rotating proxy, you can use it for services such as web scraping, web crawling and more.

                  • v1.0.3
                  • 5.63
                  • Published

                  friday-sdk

                  Official JavaScript/TypeScript SDK for the Friday API

                  • v0.2.2
                  • 5.37
                  • Published

                  cspider

                  Distributed web crawler powered by Headless Chrome

                  • v0.0.6
                  • 5.29
                  • Published

                  goose-browser-environment

                  Environment for Goose Parser which allows running it in a common browser

                  • v1.0.4
                  • 5.29
                  • Published

                  spider-stealth

                  A Node.js scraping framework built on puppeteer-extra (to use a headless Chrome/Chromium browser). Has the ability to solve reCaptcha

                  • v1.2.2
                  • 5.29
                  • Published

                  p4k-api

                  web scraper for album reviews from pitchfork

                  • v1.4.3
                  • 5.29
                  • Published

                  nocrawler

                  Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!

                  • v0.0.1
                  • 5.29
                  • Published

                  crawling-typer

                  Transform your text with dynamic typing animations! crawling-typer lets you display an array of strings one at a time, each with its own color. Customize typing speed, delete speed, and pauses between strings. Enjoy full control with loop counts, post-loo

                  • v1.1.1
                  • 5.29
                  • Published

                  magnet-getter

                  An API to get magnet links using Puppeteer.

                  • v1.1.0
                  • 4.33
                  • Published

                  spider-stealth-core

                  A Node.js scraping framework built on puppeteer-extra (to use a headless Chrome/Chromium browser). Has the ability to solve reCaptcha. The core module without browser installation

                  • v1.3.4
                  • 4.26
                  • Published

                  dynamic-crawling

                  Runs crawling routines from a JSON file using XPath, accepting for each step a callback function that receives the value and can pass it on to the next step.

                  • v1.0.2
                  • 4.20
                  • Published

                  planisphere

                  A straightforward sitemap generator written in TypeScript.

                  • v1.0.1
                  • 4.20
                  • Published

                  robinbot

                  robin web crawling engine with nodejs

                  • v0.9.0
                  • 4.15
                  • Published

                  imdb-scrapi

                  An API to get data off of IMDB using Puppeteer.

                  • v1.0.2
                  • 4.06
                  • Published

                  declarative-scraper

                  Simple & Human-Friendly HTML Scraper with Json-ld support

                  • v0.1.1
                  • 4.06
                  • Published

                  miniscraper

                  Minimalist Node.js web scraper and crawler working with under-the-hood JSDOM

                  • v0.3.2
                  • 4.06
                  • Published

                  fiend

                  The most advanced web crawler for JavaScript

                  • v0.1.0
                  • 4.00
                  • Published

                  @stacksleuth/browser-agent

                  StackSleuth in-house browser automation agent for debugging and user simulation

                  • v0.2.1
                  • 4.00
                  • Published

                  keyworm

                  Keyword mention crawler

                  • v0.1.1
                  • 4.00
                  • Published

                  style-crawl

                  Package to find style links from the site you want

                  • v1.1.2
                  • 4.00
                  • Published

                  crawler-mod

                  based on node-crawler

                  • v0.0.1
                  • 2.49
                  • Published

                  instagram-crawling

                  Simple Instagram Crawling without using public API

                  • v1.1.2
                  • 2.49
                  • Published

                  jason-the-miner

                  Harvesting data at the <html> mine.

                  • v1.1.1
                  • 2.46
                  • Published

                  skrap

                  Easily scrape web pages by providing JSON recipes

                  • v0.1.1
                  • 2.46
                  • Published

                  crawline

                  Web crawler

                  • v0.0.0
                  • 2.43
                  • Published

                  node-pool-scraper

                  Node.js web scraping utility powered by puppeteer pool

                  • v0.1.6
                  • 2.37
                  • Published

                  node-crawling-framework

                  Node.js crawling & scraping framework heavily inspired by Scrapy (Python)

                  • v0.0.1-alpha.2
                  • 2.34
                  • Published

                  ccht

                  A simple command-line tool to crawl and test your website

                  • v0.1.2
                  • 2.34
                  • Published

                  wight-backend-web

                  A Wight backend for fetching static web pages

                  • v0.1.0
                  • 2.34
                  • Published

                  spamlet

                  spamlet is an efficient and simple crawler for playwright

                    • v0.1.6
                    • 2.34
                    • Published

                    crt-scrapper

                    Easily create a scraper api with the @web/scrapper library, which includes a scraper and advanced events for your website.

                    • v1.0.4
                    • 2.34
                    • Published

                    gumo

                    A web-crawler and scraper that extracts data from a family of nested dynamic webpages with added enhancements to assist in knowledge mining applications.

                    • v1.0.7
                    • 0.00
                    • Published

                    hcr

                    Easy To Use Web Crawler

                    • v1.4.1
                    • 0.00
                    • Published

                    @subtitles/providers

                    Providers are the core of applications, where the subtitles are collected. Each provider exports a unique strategy for gathering data. From legendastv's web scraping from opensubtitle API usage, you can collect subtitles from your favorite tv shows and mo

                    • v0.3.0-beta.2
                    • 0.00
                    • Published

                    press2blogger

                    Moving or backing up your Wordpress site to Blogger

                    • v1.0.3
                    • 0.00
                    • Published

                    parkour

                    Parkour the web like a yamakazi

                    • v1.0.0
                    • 0.00
                    • Published

                    ig-scrap-cache

                    Scrape Instagram and cache the results using Redis

                    • v3.0.0
                    • 0.00
                    • Published

                    n8n-nodes-firecrawl-tool

                    n8n node for Firecrawl v2 API - Web scraping, crawling, and data extraction tool for workflows and AI agents

                    • v0.1.2
                    • 0.00
                    • Published

                    sitemaps-getter

                    A tool to get sitemaps from websites and crawl them

                    • v1.0.3
                    • 0.00
                    • Published

                    nstock

                    naver stock data crawler

                    • v0.1.0-beta
                    • 0.00
                    • Published

                    malkovich-malkovich

                    A lightweight and simple API for web crawling built on chromium puppeteer

                    • v0.0.1
                    • 0.00
                    • Published