Sitemap-parser
NOTE: This is a fork of the original sitemapper package, fully migrated to ESM and TypeScript. The original package can be found here
Parse a sitemap's XML to collect all the URLs for your crawler.
Installation
npm install @yeskiy/sitemapper --save
Simple Example
import Sitemapper from '@yeskiy/sitemapper';
const sitemap = new Sitemapper();
sitemap.fetch('https://www.google.com/work/sitemap.xml').then((sites) => {
console.log(sites);
});
Options
You can add options on the initial Sitemapper object when instantiating it.
- requestHeaders: (Object) - Additional request headers (e.g. User-Agent)
- timeout: (Number) - Maximum timeout in ms for a single URL. Default: 15000 (15 seconds)
- url: (String) - Sitemap URL to crawl
- debug: (Boolean) - Enables/disables debug console logging. Default: False
- concurrency: (Number) - Sets the maximum number of concurrent sitemap crawling threads. Default: 10
- retries: (Number) - Sets the maximum number of retries to attempt in case of an error response (e.g. 404 or timeout). Default: 0
- rejectUnauthorized: (Boolean) - If true, it will throw on invalid certificates, such as expired or self-signed ones. Default: True
- lastmod: (Number) - Timestamp of the minimum lastmod value allowed for returned URLs
- gotParams: (GotOptions) - Additional options to pass to the got library. See Got Options
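As a sketch of how these options fit together, the snippet below builds an options object matching the list above and passes it to the constructor. The `SitemapperOptions` interface here is hypothetical (written out for illustration; it is not necessarily the name the package exports), and the URL and header values are placeholders.

```typescript
// Hypothetical interface mirroring the documented options list above.
interface SitemapperOptions {
  url?: string;
  timeout?: number;
  debug?: boolean;
  concurrency?: number;
  retries?: number;
  rejectUnauthorized?: boolean;
  lastmod?: number;
  requestHeaders?: Record<string, string>;
}

const options: SitemapperOptions = {
  url: 'https://www.example.com/sitemap.xml', // placeholder sitemap URL
  timeout: 15000,          // 15 s per URL (the documented default)
  concurrency: 5,          // crawl at most 5 sitemaps at once
  retries: 1,              // retry once on an error response
  rejectUnauthorized: true, // reject expired/self-signed certificates
  requestHeaders: { 'User-Agent': 'my-crawler/1.0' }, // placeholder UA
};

console.log(options.timeout);
```

The object would then be passed when instantiating, e.g. `new Sitemapper(options)`, as described at the top of this section.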