@yeskiy/sitemapper 5.0.0
Parser for XML sitemaps, to be used with robots.txt files and web crawlers


Sitemap-parser

NOTE: This is a fork of the original sitemapper package, fully migrated to ESM and TypeScript. The original package can be found here

Parse a sitemap's XML to get all the URLs for your crawler.

Installation

npm install @yeskiy/sitemapper --save

Simple Example

import Sitemapper from '@yeskiy/sitemapper';

const sitemap = new Sitemapper();

sitemap.fetch('https://www.google.com/work/sitemap.xml').then((sites) => {
    console.log(sites);
});

Options

You can pass options to the Sitemapper constructor when instantiating it.

  • requestHeaders: (Object) - Additional request headers (e.g. User-Agent)
  • timeout: (Number) - Maximum timeout in ms for a single URL. Default: 15000 (15 seconds)
  • url: (String) - Sitemap URL to crawl
  • debug: (Boolean) - Enables/disables debug console logging. Default: false
  • concurrency: (Number) - Maximum number of sitemaps crawled concurrently. Default: 10
  • retries: (Number) - Maximum number of retries to attempt on an error response (e.g. 404 or timeout). Default: 0
  • rejectUnauthorized: (Boolean) - If true, throws on invalid certificates, such as expired or self-signed ones. Default: true
  • lastmod: (Number) - Minimum lastmod timestamp allowed for returned URLs
  • gotParams: (GotOptions) - Additional options to pass to the got library. See Got Options

License

MIT