JSPM – @mastixmc/sitemapper@3.2.0

Package Exports

@mastixmc/sitemapper

This package does not declare an exports field, so the exports above were detected and optimized automatically by JSPM. If a package subpath is missing, consider opening an issue on the original package (@mastixmc/sitemapper) asking it to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

Sitemapper - Extended version

This is a fork of https://github.com/cabbiepete/sitemapper that adds the following features:

Supports loading gzipped sitemaps (sitemap.xml.gz)
Increases the default timeout
Allows filtering by lastmod date
Adds a URL filter (a regular expression) applied to all returned URLs
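Gzipped sitemaps can be recognized by the two-byte gzip magic number (0x1f 0x8b) at the start of the body. The sketch below shows one way a fetcher could decode such a body with Node's built-in zlib module. `decodeSitemapBody` is a hypothetical helper, not part of this package's API, and the package itself may detect .gz files differently (for example, by file extension or response headers).

```javascript
const zlib = require('zlib');

// Hypothetical helper (not part of @mastixmc/sitemapper): turns a raw
// response body into XML text, gunzipping it first if it is gzip-compressed.
function decodeSitemapBody(body) {
  const buf = Buffer.isBuffer(body) ? body : Buffer.from(body);
  // Every gzip stream starts with the magic bytes 0x1f 0x8b.
  const isGzip = buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b;
  return (isGzip ? zlib.gunzipSync(buf) : buf).toString('utf8');
}

// Usage: a plain body and a gzipped body decode to the same XML string.
const xml = '<urlset><url><loc>https://example.com/</loc></url></urlset>';
console.log(decodeSitemapBody(Buffer.from(xml)) === xml); // true
console.log(decodeSitemapBody(zlib.gzipSync(xml)) === xml); // true
```

Checking the magic bytes rather than the file name means a server that serves a compressed file under a .xml URL is still handled.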

Original description

Parse through a sitemap's XML to get all the URLs for your crawler.
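A sitemap is an XML document whose <url> entries each carry a <loc> element holding a page URL (and optionally a <lastmod> date); a sitemap index instead uses <sitemap> entries whose <loc> points to further sitemaps. The sketch below pulls those values out with regular expressions. `parseSitemap` is a hypothetical name, and regex matching is a simplification: a proper XML parser is more robust against CDATA sections, entities such as &amp;, and unusual formatting.

```javascript
// Hypothetical, regex-based sketch: extracts <loc> and optional <lastmod>
// values from a sitemap or sitemap index. Assumes well-formed XML without
// CDATA sections and does not decode entities such as &amp;.
function parseSitemap(xml) {
  const isIndex = /<sitemapindex[\s>]/.test(xml);
  const entryTag = isIndex ? 'sitemap' : 'url';
  const entryRe = new RegExp(`<${entryTag}>([\\s\\S]*?)</${entryTag}>`, 'g');
  const entries = [];
  for (const [, body] of xml.matchAll(entryRe)) {
    const loc = /<loc>\s*([^<]+?)\s*<\/loc>/.exec(body);
    const lastmod = /<lastmod>\s*([^<]+?)\s*<\/lastmod>/.exec(body);
    if (loc) {
      entries.push({ loc: loc[1], lastmod: lastmod ? lastmod[1] : undefined });
    }
  }
  return { isIndex, entries };
}

// Usage with a small inline sitemap.
const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>`;
console.log(JSON.stringify(parseSitemap(xml).entries.map(e => e.loc)));
// ["https://example.com/a","https://example.com/b"]
```

When `isIndex` is true, a crawler would fetch each returned `loc` as another sitemap rather than treating it as a page URL.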

Version 3

Installation

npm install @mastixmc/sitemapper

Simple Example

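const Sitemapper = require('@mastixmc/sitemapper');

const sitemap = new Sitemapper();

// fetch() resolves to an object holding the sitemap `url` and the `sites` found in it
sitemap.fetch('http://wp.seantburke.com/sitemap.xml').then(function (data) {
  console.log(data.sites);
});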

Examples in ES5

const Sitemapper = require('@mastixmc/sitemapper');

const Google = new Sitemapper({
  url: 'https://www.google.com/work/sitemap.xml',
  timeout: 15000, // 15 seconds
  lastmod: { // filter based on lastmod (here: only get links updated in the last 5 days)
    duration: '5',
    measurement: 'days' // years, months, weeks, days, hours, minutes, or seconds
  },
  urlFilter: '^https://www\\.mysite\\.com/somepath/' // regex, passed as a string
});

Google.fetch()
  .then(function (data) {
    console.log(data);
  })
  .catch(function (error) {
    console.log(error);
  });


// or


const sitemapper = new Sitemapper();

sitemapper.timeout = 5000;
sitemapper.fetch('http://wp.seantburke.com/sitemap.xml')
  .then(function (data) {
    console.log(data);
  })
  .catch(function (error) {
    console.log(error);
  });
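The lastmod option keeps only entries modified within a window that ends now. The sketch below shows one way to turn a { duration, measurement } pair into a cutoff date and filter entries against it, using plain Date arithmetic. `lastmodCutoff` and `filterByLastmod` are hypothetical helpers; the package's own date handling may differ, for example in how it counts months or treats entries without a lastmod. Here, entries lacking a lastmod are kept, which is an assumption.

```javascript
// Hypothetical sketch: compute the earliest lastmod date that passes a
// { duration, measurement } filter, counting back from `now`.
function lastmodCutoff({ duration, measurement }, now = new Date()) {
  const n = Number(duration); // the README passes duration as a string, e.g. '5'
  const d = new Date(now.getTime());
  switch (measurement) {
    case 'years':   d.setFullYear(d.getFullYear() - n); break;
    case 'months':  d.setMonth(d.getMonth() - n); break;
    case 'weeks':   d.setDate(d.getDate() - 7 * n); break;
    case 'days':    d.setDate(d.getDate() - n); break;
    case 'hours':   d.setHours(d.getHours() - n); break;
    case 'minutes': d.setMinutes(d.getMinutes() - n); break;
    case 'seconds': d.setSeconds(d.getSeconds() - n); break;
    default: throw new Error(`Unknown measurement: ${measurement}`);
  }
  return d;
}

// Keeps entries whose lastmod is on or after the cutoff. Entries without a
// lastmod are kept (assumption, not documented package behavior).
function filterByLastmod(entries, lastmod, now = new Date()) {
  const cutoff = lastmodCutoff(lastmod, now).getTime();
  return entries.filter(e => !e.lastmod || new Date(e.lastmod).getTime() >= cutoff);
}

// Usage with a fixed "now" so the result is reproducible.
const now = new Date('2024-01-20T00:00:00Z');
const entries = [
  { loc: 'https://example.com/new', lastmod: '2024-01-18' },
  { loc: 'https://example.com/old', lastmod: '2023-12-01' },
];
const recent = filterByLastmod(entries, { duration: '5', measurement: 'days' }, now);
console.log(JSON.stringify(recent.map(e => e.loc))); // ["https://example.com/new"]
```

Date setters roll over month and year boundaries on their own, so subtracting past the start of a month needs no special handling.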

Examples in ES6

import Sitemapper from '@mastixmc/sitemapper';

const Google = new Sitemapper({
  url: 'https://www.google.com/work/sitemap.xml',
  timeout: 15000, // 15 seconds
  lastmod: { // filter based on lastmod (here: only get links updated in the last 3 days)
    duration: '3',
    measurement: 'days' // years, months, weeks, days, hours, minutes, or seconds
  },
  urlFilter: '^https://www\\.mysite\\.com/somepath/' // regex, passed as a string
});

Google.fetch()
  .then(data => console.log(data.sites))
  .catch(error => console.log(error));


// or


const sitemapper = new Sitemapper();
sitemapper.timeout = 5000;
sitemapper.lastmod = { // filter based on lastmod (here: only get links updated in the last 14 days)
  duration: '14',
  measurement: 'days' // years, months, weeks, days, hours, minutes, or seconds
};
sitemapper.fetch('http://wp.seantburke.com/sitemap.xml')
  .then(({ url, sites }) => console.log(`url:${url}`, 'sites:', sites))
  .catch(error => console.log(error));
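The urlFilter option is given as a string containing a regular expression. Because it is a string literal, every backslash meant for the regex engine must itself be escaped ('\\.' for a literal dot); a single '\.' in a JavaScript string collapses to '.', which matches any character. The sketch below shows one way such a pattern might be compiled and applied to the returned URLs. `applyUrlFilter` is a hypothetical helper, and matching with RegExp.prototype.test is an assumption about how the package applies the filter.

```javascript
// Hypothetical sketch: compile a urlFilter string once and keep only the
// URLs it matches. RegExp#test matches anywhere in the string unless the
// pattern is anchored with ^ or $.
function applyUrlFilter(urls, urlFilter) {
  if (!urlFilter) return urls; // no filter configured: keep everything
  const re = new RegExp(urlFilter);
  return urls.filter(url => re.test(url));
}

const urls = [
  'https://www.mysite.com/somepath/page-1',
  'https://www.mysite.com/other/page-2',
  'https://wwwXmysite.com/somepath/page-3', // matches only if dots are unescaped
];

// Correctly escaped: '\\.' in the string becomes \. in the regex (a literal dot).
console.log(applyUrlFilter(urls, '^https://www\\.mysite\\.com/somepath/').length); // 1

// Under-escaped: '\.' in the string is just '.', so the dot matches any character.
console.log(applyUrlFilter(urls, '^https:\/\/www\.mysite\.com\/somepath\/').length); // 2
```

Alternatively, building the pattern with a regex literal and passing `/^https:\/\/www\.mysite\.com\/somepath\//.source` avoids the double escaping, assuming the option accepts any string that `new RegExp` understands.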