Package Exports
- @mastixmc/sitemapper
This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (@mastixmc/sitemapper) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
Readme
Sitemapper - Extended version
This is a fork from https://github.com/cabbiepete/sitemapper, but adds the following features:
- Allows loading of sitemap.xml.gz files
- Increases default timeout
- Allows to filter by lastmod date
- Added URL filter to filter all returned URLs
Original description
Parse through a sitemaps xml to get all the urls for your crawler.
Version 3
Installation
npm install @mastixmc/sitemapperSimple Example
const Sitemapper = require('sitemapper');
const sitemap = new Sitemapper();
sitemap.fetch('http://wp.seantburke.com/sitemap.xml').then(function(sites) {
console.log(sites);
});
Examples in ES5
const Sitemapper = require('sitemapper');
const Google = new Sitemapper({
url: 'https://www.google.com/work/sitemap.xml',
timeout: 15000, //15 seconds
lastmod: { //filter based on lastmod (here: only get updated links from one week ago)
duration: '5',
measurement: 'days' // years, months, weeks, days, hours, minutes, and seconds
},
urlFilter: '^https:\/\/www\.mysite\.com\/somepath\/' // REGEX
});
Google.fetch()
.then(function (data) {
console.log(data);
})
.catch(function (error) {
console.log(error);
});
// or
const sitemap = new Sitemapper();
sitemapper.timeout = 5000;
sitemapper.fetch('http://wp.seantburke.com/sitemap.xml')
.then(function (data) {
console.log(data);
})
.catch(function (error) {
console.log(error);
});
Examples in ES6
import Sitemapper from 'sitemapper';
const Google = new Sitemapper({
url: 'https://www.google.com/work/sitemap.xml',
timeout: 15000, // 15 seconds
lastmod: { //filter based on lastmod (here: only get updated links from one week ago)
duration: '3',
measurement: 'days' // years, months, weeks, days, hours, minutes, and seconds
},
urlFilter: '^https:\/\/www\.mysite\.com\/somepath\/' // REGEX
});
Google.fetch()
.then(data => console.log(data.sites))
.catch(error => console.log(error));
// or
const sitemapper = new Sitemapper();
sitemapper.timeout = 5000;
sitemapper.lastmod = { //filter based on lastmod (here: only get updated links from one week ago)
duration: '14',
measurement: 'days' // years, months, weeks, days, hours, minutes, and seconds
};
sitemapper.fetch('http://wp.seantburke.com/sitemap.xml')
.then(({ url, sites }) => console.log(`url:${url}`, 'sites:', sites))
.catch(error => console.log(error));