License: MIT

A URL parser for crawling purposes.

Package Exports

  • crawler-url-parser

This package does not declare an "exports" field, so JSPM detected and optimized the exports above automatically. If a package subpath is missing, consider opening an issue on the original package (crawler-url-parser) asking it to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

crawler-url-parser

A URL parser for crawling purposes.

Installation

npm install crawler-url-parser

Usage

Parse

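const cup = require('crawler-url-parser');
// Resolve the relative link "../ddd" against the page it was found on.
const url = cup.parse("../ddd", "http://question.stackoverflow.com/aaa/bbb/ccc/");
console.log(url.normalized); // normalized absolute URL
console.log(url.host);       // host name
console.log(url.domain);     // domain without the subdomain
console.log(url.subdomain);  // subdomain part of the host
console.log(url.protocol);   // scheme
console.log(url.path);       // path component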
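To show roughly what these fields correspond to, here is a minimal sketch that approximates them with Node's built-in WHATWG `URL` class. It is not crawler-url-parser's implementation: `parseSketch` is a hypothetical helper, and it assumes the domain is simply the last two host labels, which breaks for suffixes such as `co.uk` (a robust version would consult the Public Suffix List).

```javascript
// Sketch only: approximates the parse() fields with Node's built-in URL class.
// This is NOT crawler-url-parser's code; parseSketch is a hypothetical helper.
function parseSketch(link, baseUrl) {
  // Resolve the (possibly relative) link against the page it appeared on.
  const u = new URL(link, baseUrl);
  u.hash = ""; // a fragment does not identify a distinct resource for a crawler

  const labels = u.hostname.split(".");
  // Naive assumption: the domain is the last two labels of the host name.
  // Hosts like "news.bbc.co.uk" would need the Public Suffix List instead.
  const domain = labels.slice(-2).join(".");
  const subdomain = labels.slice(0, -2).join(".") || null;

  return {
    normalized: u.href,
    host: u.host,
    domain,
    subdomain,
    protocol: u.protocol,
    path: u.pathname,
  };
}

const parsed = parseSketch("../ddd", "http://question.stackoverflow.com/aaa/bbb/ccc/");
console.log(parsed);
```

Because the base URL ends in a slash, "../ddd" climbs out of `ccc/` and resolves to `/aaa/bbb/ddd` on the same host.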

Extract

const cup = require('crawler-url-parser');

// Sample page with internal, subdomain and external links, some repeated.
const htmlStr = `<html>
    <body>
        <a href="http://www.stackoverflow.com/internal-1">test-link-4</a><br />
        <a href="http://www.stackoverflow.com/internal-2">test-link-5</a><br />
        <a href="http://www.stackoverflow.com/internal-2">test-link-6</a><br />
        <a href="http://faq.stackoverflow.com/subdomain-1">test-link-7</a><br />
        <a href="http://faq.stackoverflow.com/subdomain-2">test-link-8</a><br />
        <a href="http://faq.stackoverflow.com/subdomain-2">test-link-9</a><br />
        <a href="http://www.google.com/external-1">test-link-10</a><br />
        <a href="http://www.google.com/external-2">test-link-11</a><br />
        <a href="http://www.google.com/external-2">test-link-12</a><br />
    </body>
</html>`;
// URL of the page the HTML was fetched from.
const currentUrl = "http://www.stackoverflow.com/aaa/bbb/ccc";
const result = cup.extract(htmlStr, currentUrl);
console.log(result.length); // number of extracted links
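Conceptually, extraction means finding the links in a page, resolving each one against the current URL, and dropping duplicates; the sample HTML above repeats some links and mixes same-host, subdomain and external targets. The sketch below illustrates that idea with Node's built-in `URL`. It is not the library's code: `extractSketch`, `classify` and the `internal`/`subdomain`/`external` labels are hypothetical, and a regular expression stands in for a real HTML parser, so unquoted attributes and `<base>` tags are not handled.

```javascript
// Sketch only: a naive link extractor, NOT crawler-url-parser's implementation.
function extractSketch(html, currentUrl) {
  const base = new URL(currentUrl);
  const seen = new Set();
  const links = [];
  // Match href="..." or href='...' inside <a> tags (regex, not a real parser).
  const hrefRe = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(hrefRe)) {
    let u;
    try {
      u = new URL(match[1], base); // resolve relative links against the page
    } catch {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    u.hash = "";
    if (seen.has(u.href)) continue; // drop duplicate links
    seen.add(u.href);
    links.push({ url: u.href, type: classify(u, base) });
  }
  return links;
}

// Hypothetical categories, using the naive "last two labels" domain rule.
function classify(u, base) {
  const lastTwo = (host) => host.split(".").slice(-2).join(".");
  if (u.hostname === base.hostname) return "internal";
  if (lastTwo(u.hostname) === lastTwo(base.hostname)) return "subdomain";
  return "external";
}

const html = `<html><body>
  <a href="http://www.stackoverflow.com/internal-1">a</a>
  <a href="http://www.stackoverflow.com/internal-2">b</a>
  <a href="http://www.stackoverflow.com/internal-2">c</a>
  <a href="http://faq.stackoverflow.com/subdomain-1">d</a>
  <a href="http://www.google.com/external-1">e</a>
</body></html>`;
const found = extractSketch(html, "http://www.stackoverflow.com/aaa/bbb/ccc");
console.log(found);
```

With the repeated `internal-2` link removed, this shortened sample yields four entries.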

Test

  • See the test folder for further usage examples. Run the tests with mocha or npm test.

API