JSPM

  • License MIT

Simple, polite crawling of the web.

Package Exports

  • mrspider

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to open an issue on the original package (mrspider) asking it to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme


Mr Spider

Crawl the web politely.
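
The README does not define what "politely" means. One common reading is per-host rate limiting: never request pages from the same host more often than once every delayMs milliseconds. The sketch below illustrates that idea under this assumption; it is not mrspider's own implementation. The PoliteScheduler class, its reserve method and the delayMs parameter are hypothetical names invented for this example, and it assumes a current Node.js runtime (io.js has since merged back into Node.js) with the global URL class.

```javascript
// Hypothetical sketch, NOT mrspider's implementation: it only illustrates one
// common meaning of "polite", i.e. at most one request per host every delayMs.
class PoliteScheduler {
  constructor(delayMs) {
    this.delayMs = delayMs;    // minimum gap between requests to one host
    this.nextSlot = new Map(); // host -> earliest time (ms) of its next request
  }

  // Returns how many ms the caller should wait before fetching `url`,
  // and reserves that time slot for the URL's host.
  reserve(url, now = Date.now()) {
    const host = new URL(url).host;
    const earliest = this.nextSlot.get(host) ?? now;
    const start = Math.max(earliest, now);
    this.nextSlot.set(host, start + this.delayMs);
    return start - now;
  }
}

const scheduler = new PoliteScheduler(1000);
console.log(scheduler.reserve('http://blog.scrapinghub.com/', 0));          // 0
console.log(scheduler.reserve('http://blog.scrapinghub.com/category/', 0)); // 1000
console.log(scheduler.reserve('http://example.com/', 0));                   // 0
```

Keeping the next free slot per host means requests to different hosts do not slow each other down, while repeated requests to one host are spaced delayMs apart.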

For use with io.js > 2.0

var Spider = require('mrspider');

var s = new Spider();

s.addUrl('http://blog.scrapinghub.com')
    .addLevel({
        /*
         *   Deal with all links on this domain.
         * */
        pattern: /http:\/\/blog\.scrapinghub\.com/,
        action: function (webpage) {
            // webpage.dom is a jQuery-like object
            var $ = webpage.dom;
            $('ul li a').each(function() {
                var link = $(this).attr('href');
                console.log(link);
                // add the url to be crawled.
                s.addUrl(link);
            });
        }
    })
    .addLevel({
        /*
         * For the categories do this.
         * */
        pattern: /http:\/\/blog\.scrapinghub\.com\/category/,
        action: function (webpage) {
            // here you could do something useful or ...
            console.log('do something for the categories.');
        }
    })
    /*
     * We have the flexibility to do what we like.
     * So here we match all urls and just log the url to the console.
     * */
    .addLevel({
        pattern: /.*/,
        action: function(webpage) {
            console.log('crawling %s', webpage.url);
        }
    })
    .start();
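
Each addLevel call pairs a regular expression with an action. The hypothetical helper below (matchingLevels is not part of mrspider) checks which of the example's patterns, written with escaped dots, match a given URL. It lists every matching level; whether mrspider runs all matching levels or only the first one is an assumption to verify against the library's source. The category URL is made up for illustration.

```javascript
// Hypothetical helper, not part of mrspider: report which level patterns
// from the usage example match a URL. It returns ALL matches; mrspider's
// actual dispatch rule (all matches vs. first match) is not documented here.
const levels = [
  { name: 'domain links', pattern: /http:\/\/blog\.scrapinghub\.com/ },
  { name: 'categories', pattern: /http:\/\/blog\.scrapinghub\.com\/category/ },
  { name: 'catch-all', pattern: /.*/ },
];

function matchingLevels(url) {
  return levels
    .filter((level) => level.pattern.test(url))
    .map((level) => level.name);
}

console.log(matchingLevels('http://blog.scrapinghub.com/category/scrapy'));
// [ 'domain links', 'categories', 'catch-all' ]
console.log(matchingLevels('http://example.com/'));
// [ 'catch-all' ]

// Escaping the dots matters: an unescaped "." matches any character.
console.log(/http:\/\/blog.scrapinghub.com/.test('http://blogXscrapinghubYcom'));   // true
console.log(/http:\/\/blog\.scrapinghub\.com/.test('http://blogXscrapinghubYcom')); // false
```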

Installation

$ npm i mrspider --save

Features

  • Super simple API.
  • Use jQuery selectors to scrape links and other information.
  • Use the full power of JavaScript giving you great flexibility.
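
Links scraped with a selector such as $('ul li a') are raw href values, which may be relative, carry #fragments, or use schemes like mailto:. The README does not say whether addUrl resolves or filters them, so a defensive option is to normalise them first. normaliseLinks is a hypothetical helper, not part of mrspider; it relies only on the URL class built into current Node.js.

```javascript
// Hypothetical helper, not part of mrspider: turn raw href values into
// absolute, de-duplicated http(s) URLs, resolved against the page's URL.
function normaliseLinks(pageUrl, hrefs) {
  const seen = new Set();
  for (const href of hrefs) {
    if (!href) continue;                 // skip anchors without an href
    let absolute;
    try {
      absolute = new URL(href, pageUrl); // resolve relative hrefs
    } catch {
      continue;                          // skip malformed hrefs
    }
    if (absolute.protocol !== 'http:' && absolute.protocol !== 'https:') {
      continue;                          // skip mailto:, javascript:, etc.
    }
    absolute.hash = '';                  // "#section" links point to the same page
    seen.add(absolute.href);             // the Set drops duplicates
  }
  return [...seen];
}

console.log(normaliseLinks('http://blog.scrapinghub.com/2015/', [
  '/category/scrapy',
  'post.html#comments',
  'post.html',
  'mailto:someone@example.com',
  undefined,
]));
```

Inside an action you could collect each $(this).attr('href') into an array, pass it to normaliseLinks together with webpage.url, and call s.addUrl for each result.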

Tests

To run the test suite, first install the dependencies, then run npm test:

$ npm install
$ npm test