hapi-goldwasher

1.0.4
  • License MIT

A plugin for Hapi.js to run goldwasher as a scraping API on the web.

hapi-goldwasher

A plugin for hapi to run goldwasher as a scraping API on the web. In effect, it is a scraping proxy: it fetches the requested page, extracts content with goldwasher and returns the result in the selected format, JSON by default.

Installation

npm install hapi-goldwasher

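If you are not already running a hapi server, you also need to install hapi to run the example. The example uses the callback-based API of hapi 16 and earlier (server.connection() and callback-style server.register()), which was removed in hapi 17, so install a compatible version:

npm install hapi@16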

Options

When registering the plugin with hapi, you can pass several options, none of which are required:

  • path - the endpoint you mount the plugin on. Defaults to /goldwasher.
  • maxRedirects - the maximum number of redirects the scraper will accept before giving up. Defaults to 5.
  • cors - a CORS object. Defaults to false. See the hapi documentation for details.
  • raw - enable raw output mode. This makes output=raw available, which returns the raw scraped result, usually HTML.
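A minimal sketch of how the documented defaults combine with the options you pass at registration. DEFAULT_OPTIONS and resolveOptions are hypothetical names used only for illustration, not part of hapi-goldwasher's API, and the raw: false default is an assumption (the README only says raw mode has to be enabled).

```javascript
// Hypothetical helper (not part of hapi-goldwasher): fills in the documented
// defaults for any option the caller leaves out.
var DEFAULT_OPTIONS = Object.freeze({
  path: '/goldwasher', // endpoint the plugin is mounted on
  maxRedirects: 5,     // redirects followed before the scraper gives up
  cors: false,         // no CORS unless a CORS object is supplied
  raw: false           // ASSUMPTION: output=raw is disabled unless enabled
});

function resolveOptions(userOptions) {
  // Object.assign copies left to right, so user-supplied values override defaults.
  return Object.assign({}, DEFAULT_OPTIONS, userOptions || {});
}

// Example: enable raw output and CORS for any origin, keep the other defaults.
var options = resolveOptions({ raw: true, cors: { origin: ['*'] } });
console.log(options);
```

Keeping the defaults in one frozen object makes it easy to see at a glance which values apply when an option is omitted.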

Parameters

  • url - the URL to scrape. Required.
  • selector - a cheerio (jQuery-style) selector for the target tags. Defaults to goldwasher's default, usually 'h1, h2, h3, h4, h5, h6, p'.
  • search - only return results containing these terms. Matching is case-insensitive and ignores special characters.
  • limit - the maximum number of results.
  • output - output format (json, xml, atom, rss or - if enabled - raw).
  • filterTexts - stop texts to exclude from the results.
  • filterKeywords - stop words to exclude from the extracted keywords.
  • filterLocale - stop words loaded from an external JSON file (see the goldwasher documentation).
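A client-side sketch of building a request URL from these parameters. It assumes the endpoint accepts them as query-string parameters on a GET request and that the example server is listening on http://localhost:7979 with the default path; buildScrapeUrl is a hypothetical helper, not part of the plugin.

```javascript
// Hypothetical client helper: builds a request URL for the scraping endpoint.
// ASSUMPTION: parameters are passed in the query string.
function buildScrapeUrl(baseUrl, params) {
  var endpoint = new URL(baseUrl);
  Object.keys(params).forEach(function(key) {
    var value = params[key];
    // Skip parameters that were not set so goldwasher's defaults apply.
    if (value !== undefined && value !== null) {
      endpoint.searchParams.set(key, String(value)); // percent-encodes the value
    }
  });
  return endpoint.toString();
}

var requestUrl = buildScrapeUrl('http://localhost:7979/goldwasher', {
  url: 'https://example.com',
  selector: 'h1, h2, p',
  limit: 5,
  output: 'json'
});
console.log(requestUrl);
```

Using URLSearchParams avoids hand-encoding values such as the target URL or a selector containing commas and spaces.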

Example

// Uses the hapi 16 (or earlier) API: server.connection() and callbacks
var Hapi = require('hapi');
var HapiGoldwasher = require('hapi-goldwasher');

var server = new Hapi.Server();
server.connection({ port: 7979 });

// Register the plugin; every option is optional
server.register({
  register: HapiGoldwasher,
  options: {
    path: '/goldwasher',
    cors: {
      origin: ['*']
    }
  }
}, function(err) {
  if (err) {
    throw err;
  }

  server.start(function(err) {
    if (err) {
      throw err;
    }
    console.log('Server running at: ' + server.info.uri);
  });
});

Open the server URI in a browser and you will get a JSON response containing documentation. A JSON viewer such as the JSON Formatter extension for Chrome makes the response easier to read.
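The documentation can also be fetched programmatically. This is a minimal sketch assuming Node.js 18 or later (global fetch) and assuming the documentation is returned at the plugin path; getDocumentation is a hypothetical helper, and the optional fetchImpl argument exists only so a stub can be passed in for testing.

```javascript
// Hypothetical helper: fetches the JSON documentation from a running server.
// ASSUMPTION: Node.js 18+ (global fetch) and docs served at the plugin path.
async function getDocumentation(uri, fetchImpl) {
  var doFetch = fetchImpl || globalThis.fetch; // allow a stub for testing
  var res = await doFetch(uri);
  if (!res.ok) {
    throw new Error('Request failed with status ' + res.status);
  }
  return res.json();
}

// Usage against the example server (it must be running):
// getDocumentation('http://localhost:7979/goldwasher').then(function(docs) {
//   console.log(JSON.stringify(docs, null, 2));
// });
```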