JSPM

Found 177 results for crawling

siter

Site content parser for popular websites with fallback to Open Graph and Twitter Cards

  • v0.0.16
  • 15.32
  • Published

crawlable

A way to make your web application crawlable, so it can be well referenced on the web.

  • v0.4.13
  • 14.73
  • Published

htcrawl

crawler for single page applications

  • v1.2.1
  • 14.45
  • Published

goose-paginator

Paginator enriches ability to paginate over the pages in Goose Parser

  • v1.0.2
  • 14.40
  • Published

@imaginerlabs/user-agent-generator

High-performance, configurable, batch-generating User-Agent spoofing library. Supports multiple browsers, devices, and returns detailed meta information. Perfect for web scraping, automated testing, proxy pools and more.

  • v1.0.2
  • 14.40
  • Published

session-scraper

Simple scraper for imitating browsing sessions

  • v0.0.2
  • 14.36
  • Published

mrspider

simple polite crawling of the web.

  • v5.1.2
  • 14.32
  • Published

web-crawler

Scalable, extensible, web crawler framework.

  • v0.0.0
  • 13.66
  • Published

xstruct

Data extraction tools.

  • v0.7.9
  • 13.50
  • Published

sitemap-js-obj

Generate a sitemap javascript object from the folder structure crawling HTML files only.

  • v0.0.3
  • 13.50
  • Published

@0y0/scraper

A web scraping tool that extracts any data from the web.

  • v1.0.0
  • 13.38
  • Published

@botwall/sdk

BotWall SDK for site protection and bot crawling

  • v1.1.1
  • 13.30
  • Published

udemy-crawler

Crawling Udemy course info and save into JSON format.

  • v1.1.1
  • 13.16
  • Published

webcreeper

WebCreeper easy web crawler

  • v0.0.51
  • 12.34
  • Published

pattern-grab

🤛🏻 Regular Expression Data Grabber

    • v1.0.1
    • 12.19
    • Published

    goose-chrome-environment

    Environment for Goose Parser that allows running it in headless Chrome via the Puppeteer API

    • v1.1.4
    • 12.17
    • Published

    crawl-client

    Node.js client for the CloudCrawler.io API

    • v1.0.3
    • 11.82
    • Published

    crawler-ts-fetch

    Lightweight crawler written in TypeScript using ES6 generators.

    • v1.1.1
    • 11.35
    • Published

    fadi-rebrowser-patches

    Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

    • v0.0.6
    • 11.22
    • Published

    @tooly/firecrawl

    Firecrawl API tools for OpenAI, Anthropic, and AI SDK

    • v0.0.3
    • 11.21
    • Published

    tiny-crawler

    tiny-crawler is a web crawler.

    • v0.0.5
    • 10.98
    • Published

    scrapr

    A tool for getting public website content using a browser engine or http get.

    • v0.0.15
    • 10.98
    • Published

    enispider

    A Node.js scraping framework built on puppeteer (to use a headless Chrome/Chromium browser)

    • v1.2.5
    • 10.87
    • Published

    console-tourist

    This script analyzes console errors on your website.

    • v1.2.0
    • 10.51
    • Published

    node-raspar

    Easily scrape the web for torrent and media files.

    • v1.2.6
    • 10.51
    • Published

    notion-crawler

    Easily crawl your public notion pages

    • v0.0.9
    • 10.47
    • Published

    beautifulstew

    A simple web scraping tool built for developers that can be utilized on both the client and server.

      • v1.1.4
      • 10.34
      • Published

      crawling

      A simple crawler made in JavaScript for Node.

      • v1.0.1
      • 10.34
      • Published

      realfish-yct

      Real Fish Youtube Trend Video Crawling

      • v0.3.0
      • 10.33
      • Published

      papermonk

      Streaming pdf fetcher for academic papers.

      • v0.0.3
      • 9.96
      • Published

      node-crawler-scraper

      Simple and powerful crawler. It scrapes content and collects links from websites using request or phantomjs. The whole magic and simplicity is behind configuration.

        • v1.0.1
        • 9.89
        • Published

        earthworm

        easily create crawlers based on self-replicated scrapers

        • v1.0.4
        • 9.89
        • Published

        img-cli

        An interactive command-line interface built in Node.js for downloading a single image or multiple images to disk from a URL

        • v1.2.0
        • 9.89
        • Published

        scrapingai

        Build web scraping agents using AI to auto-extract the data from websites

        • v1.0.1
        • 9.76
        • Published

        scrapingapi

        One API to scrape All the Web.

        • v0.3.1
        • 9.76
        • Published

        crawler-hq

        Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!

        • v0.2.7
        • 9.64
        • Published

        firecrawl-simple-mcp

        Model Context Protocol (MCP) server for Firecrawl Simple - provides web scraping and crawling capabilities to LLMs

        • v1.0.2
        • 9.41
        • Published

        realfish-yc

        Real Fish Youtube Video Crawling Module

        • v0.1.8
        • 9.24
        • Published

        hapi-goldwasher

        A plugin for Hapi.js to run goldwasher as a scraping API on the web.

        • v1.0.4
        • 9.01
        • Published

        scrapeasy

        Automated scraping module using patterns generated by the userscript Scrapeasy.

        • v0.4.2
        • 9.01
        • Published

        sasori-crawl

        Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.

        • v1.0.0
        • 9.01
        • Published

        nodecraw

        NodeCraw is a web crawling application that allows you to crawl specified URLs and extract information from web pages. It utilizes various modules and libraries to perform crawling and save the results.

          • v1.0.7
          • 8.79
          • Published

          spider2

          A 2nd generation spider to crawl any article site, automatically reading the title and content.

          • v0.0.7
          • 8.69
          • Published

          aragog-client

          Aragog web scraping framework client

          • v1.0.3
          • 8.69
          • Published

          tai-spider

          Scrapy Framework implemented by nodejs.

          • v0.1.21
          • 8.67
          • Published

          scrape-them-all

          🚀 An easy-to-handle Node.js scraper that allows you to scrape them all in record time.

          • v2.0.0
          • 8.31
          • Published

          @leoko/crawler

          Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously

          • v1.3.1
          • 8.31
          • Published

          crawley

          A simple web crawler

          • v1.0.2
          • 8.31
          • Published

          doffy

          a headless browser automation library with easy-use API

          • v0.0.7
          • 8.10
          • Published

          @jifeon/goose-parser

          PhantomJS/Browser lib which allows to parse a webpage

          • v0.2.0-alpha.3
          • 8.10
          • Published

          @mseep/firecrawl-simple-mcp

          MCP server for Firecrawl Simple — a web scraping and site mapping tool enabling LLMs to access and process web content

          • v1.0.2
          • 8.10
          • Published

          crawlme

          Makes your ajax web application indexable by search engines by generating html snapshots on the fly. Caches results for blazing fast responses and better page ranking.

            • v0.0.7
            • 8.01
            • Published

            crawler-ts-fs

            Lightweight crawler written in TypeScript using ES6 generators.

            • v1.1.1
            • 8.00
            • Published

            crawler-ts

            Lightweight crawler written in TypeScript using ES6 generators.

            • v1.1.1
            • 8.00
            • Published

            crawlable-solidify

            Some tools to help you to render your application as a static web site using the crawlable module.

            • v1.0.2
            • 7.83
            • Published

            sitescrapr

            Simple website crawler and scraper

            • v0.0.1
            • 7.83
            • Published

            cookied-phantom-crawler

            PhantomJS and JSDOM based crawling tool. Used PhantomJS for full load of asynchronously-loaded resources and JSDOM for quick crawls. Allows custom [tough-cookie](https://www.npmjs.com/package/tough-cookie) insertion. Refer to [cheerio](https://www.npmj

              • v1.0.1
              • 7.70
              • Published

              crawly-automation

              A lightweight and modular web crawling framework built with Puppeteer.

                • v1.0.4
                • 7.60
                • Published

                netcrawler

                Net Crawler is a web spider written with Nodejs

                  • v0.8.6
                  • 7.35
                  • Published

                  spider-core

                  A Node.js scraping framework built on puppeteer-core (to use a headless Chrome/Chromium browser). The core module without browser installation

                  • v1.3.11
                  • 7.35
                  • Published

                  headline-news-naver

                  This extracts the top five news metadata from NAVER headlines.

                  • v1.0.5
                  • 7.32
                  • Published

                  spa-seo

                  Single Page App SER

                  • v0.0.3
                  • 7.24
                  • Published

                  crawler-ts-htmlparser2

                  Lightweight crawler written in TypeScript using ES6 generators.

                  • v1.1.1
                  • 7.24
                  • Published

                  hylsplider

                  fork from headless-chrome-crawler and update puppeteer to the latest version

                  • v1.0.0
                  • 7.23
                  • Published

                  crawler2

                  Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!

                  • v0.0.2
                  • 6.87
                  • Published

                  scrapyteer

                  Web scraping/crawling framework built on top of headless Chrome

                  • v1.4.0
                  • 6.58
                  • Published

                  dcrawler

                  DCrawler is a distributed web spider written in Node.js and queued with MongoDB. It gives you the full power of jQuery to parse big pages as they are downloaded, asynchronously. Simplifying distributed crawling!

                  • v0.0.8
                  • 6.58
                  • Published

                  plucky-crawler

                  The error crawler that powers http://plucky.io/

                  • v0.0.1
                  • 6.44
                  • Published

                  saintjs-score

                  SoongSil UniverSity U-saint Score Crawling

                  • v2.0.1
                  • 6.44
                  • Published

                  krawler

                  Fast and lightweight web crawler with built-in cheerio, xml and json parser.

                  • v0.3.3
                  • 6.42
                  • Published

                  rebrowser-patches-fadi-patch

                  Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

                  • v1.0.18
                  • 6.35
                  • Published

                  confession

                  Helper to extract confessions from webpages

                  • v3.1.0
                  • 6.35
                  • Published

                  @crstn/redirect

                  A small package to crawl a site and return a redirect template. This is helpful for migration from one to another website with different url schemes.

                  • v1.2.0
                  • 6.34
                  • Published

                  @datasco/sdk

                  Datasco API SDK for Node.js to collect any data from any website

                  • v1.0.4
                  • 5.63
                  • Published

                  proxidoor

                  proxidoor helps you make HTTP requests through a rotating proxy, you can use it for services such as web scraping, web crawling and more.

                  • v1.0.3
                  • 5.63
                  • Published

                  friday-sdk

                  Official JavaScript/TypeScript SDK for the Friday API

                  • v0.2.2
                  • 5.37
                  • Published

                  cspider

                  Distributed web crawler powered by Headless Chrome

                  • v0.0.6
                  • 5.29
                  • Published

                  goose-browser-environment

                  Environment for Goose Parser that allows running it in a common browser

                  • v1.0.4
                  • 5.29
                  • Published

                  spider-stealth

                  A Node.js scraping framework built on puppeteer-extra (to use a headless Chrome/Chromium browser). Has the ability to solve reCaptcha

                  • v1.2.2
                  • 5.29
                  • Published

                  p4k-api

                  web scraper for album reviews from pitchfork

                  • v1.4.3
                  • 5.29
                  • Published

                  nocrawler

                  Crawler is a web spider written with Nodejs. It gives you the full power of jQuery on the server to parse a big number of pages as they are downloaded, asynchronously. Scraping should be simple and fun!

                  • v0.0.1
                  • 5.29
                  • Published

                  crawling-typer

                  Transform your text with dynamic typing animations! crawling-typer lets you display an array of strings one at a time, each with its own color. Customize typing speed, delete speed, and pauses between strings. Enjoy full control with loop counts, post-loo

                  • v1.1.1
                  • 5.29
                  • Published

                  magnet-getter

                  An API to get magnet links using Puppeteer.

                  • v1.1.0
                  • 4.33
                  • Published

                  spider-stealth-core

                  A Node.js scraping framework built on puppeteer-extra (to use a headless Chrome/Chromium browser). Has the ability to solve reCaptcha. The core module without browser installation

                  • v1.3.4
                  • 4.26
                  • Published

                  dynamic-crawling

                  Aims to run crawling routines from a JSON file using XPath, while accepting for each step a callback function that receives the value and can pass it on to the next step.

                  • v1.0.2
                  • 4.20
                  • Published

                  planisphere

                  A straightforward sitemap generator written in TypeScript.

                  • v1.0.1
                  • 4.20
                  • Published

                  robinbot

                  robin web crawling engine with nodejs

                  • v0.9.0
                  • 4.15
                  • Published

                  imdb-scrapi

                  An API to get data off of IMDB using Puppeteer.

                  • v1.0.2
                  • 4.06
                  • Published

                  declarative-scraper

                  Simple & Human-Friendly HTML Scraper with Json-ld support

                  • v0.1.1
                  • 4.06
                  • Published

                  miniscraper

                  Minimalist Node.js web scraper and crawler working with under-the-hood JSDOM

                  • v0.3.2
                  • 4.06
                  • Published

                  fiend

                  The most advanced web crawler for JavaScript

                  • v0.1.0
                  • 4.00
                  • Published

                  @stacksleuth/browser-agent

                  StackSleuth in-house browser automation agent for debugging and user simulation

                  • v0.2.1
                  • 4.00
                  • Published

                  keyworm

                  keyword mention crawler

                  • v0.1.1
                  • 4.00
                  • Published

                  style-crawl

                  Package to find style links from the site you want

                  • v1.1.2
                  • 4.00
                  • Published

                  crawler-mod

                  based on node-crawler

                  • v0.0.1
                  • 2.49
                  • Published

                  instagram-crawling

                  Simple Instagram Crawling without using public API

                  • v1.1.2
                  • 2.49
                  • Published

                  jason-the-miner

                  Harvesting data at the <html> mine.

                  • v1.1.1
                  • 2.46
                  • Published

                  skrap

                  Easily scrape web pages by providing JSON recipes

                  • v0.1.1
                  • 2.46
                  • Published

                  crawline

                  Web crawler

                  • v0.0.0
                  • 2.43
                  • Published

                  node-pool-scraper

                  Node.js web scraping utility powered by puppeteer pool

                  • v0.1.6
                  • 2.37
                  • Published

                  node-crawling-framework

                  NodeJs crawling & scraping framework heavily inspired by Scrapy (Python)

                  • v0.0.1-alpha.2
                  • 2.34
                  • Published

                  ccht

                  A simple command-line tool to crawl and test your website

                  • v0.1.2
                  • 2.34
                  • Published

                  wight-backend-web

                  A Wight backend for fetching static web pages

                  • v0.1.0
                  • 2.34
                  • Published

                  spamlet

                  spamlet is an efficient and simple crawler for playwright

                    • v0.1.6
                    • 2.34
                    • Published

                    crt-scrapper

                    Easily create a scraper api with the @web/scrapper library, which includes a scraper and advanced events for your website.

                    • v1.0.4
                    • 2.34
                    • Published

                    gumo

                    A web-crawler and scraper that extracts data from a family of nested dynamic webpages with added enhancements to assist in knowledge mining applications.

                    • v1.0.7
                    • 0.00
                    • Published

                    hcr

                    Easy To Use Web Crawler

                    • v1.4.1
                    • 0.00
                    • Published

                    @subtitles/providers

                    Providers are the core of applications, where the subtitles are collected. Each provider exports a unique strategy for gathering data. From legendastv's web scraping from opensubtitle API usage, you can collect subtitles from your favorite tv shows and mo

                    • v0.3.0-beta.2
                    • 0.00
                    • Published

                    press2blogger

                    Moving or backing up your Wordpress site to Blogger

                    • v1.0.3
                    • 0.00
                    • Published

                    parkour

                    Parkour the web like a yamakazi

                    • v1.0.0
                    • 0.00
                    • Published

                    ig-scrap-cache

                    Scrape Instagram and cache the results using Redis

                    • v3.0.0
                    • 0.00
                    • Published

                    n8n-nodes-firecrawl-tool

                    n8n node for Firecrawl v2 API - Web scraping, crawling, and data extraction tool for workflows and AI agents

                    • v0.1.2
                    • 0.00
                    • Published

                    sitemaps-getter

                    A tool to get sitemaps from websites and crawl them

                    • v1.0.3
                    • 0.00
                    • Published

                    nstock

                    naver stock data crawler

                    • v0.1.0-beta
                    • 0.00
                    • Published

                    malkovich-malkovich

                    A lightweight and simple API for web crawling built on chromium puppeteer

                    • v0.0.1
                    • 0.00
                    • Published