JSPM

License: MIT
A `URL` parser for crawling purposes.

Package Exports

  • crawler-url-parser

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (crawler-url-parser) asking for "exports" field support. If that is not possible, create a JSPM override to customize the exports field for this package.
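If the package did declare an exports field, it could look something like the following sketch in package.json (the `./index.js` entry point here is an assumption for illustration, not the package's confirmed file layout):

```json
{
  "name": "crawler-url-parser",
  "main": "index.js",
  "exports": {
    ".": "./index.js"
  }
}
```

With such a field present, JSPM would use the declared subpaths directly instead of auto-detecting them.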

Readme

crawler-url-parser

A URL parser for crawling purposes



Installation

npm install crawler-url-parser

Usage

Parse

const cup = require('crawler-url-parser');
// Resolve a relative URL against the current (base) URL.
let url = cup.parse("../ddd", "http://question.stackoverflow.com/aaa/bbb/ccc/");
console.log(url.normalized); // resolved absolute URL
console.log(url.host);       // full hostname
console.log(url.domain);     // registered domain
console.log(url.subdomain);  // subdomain part of the host
console.log(url.protocol);   // URL scheme
console.log(url.path);       // resolved path

Extract

const cup = require('crawler-url-parser');
let htmlStr =
    '<html> \
        <body> \
            <a href="http://www.stackoverflow.com/internal-1">test-link-4</a><br /> \
            <a href="http://www.stackoverflow.com/internal-2">test-link-5</a><br /> \
            <a href="http://www.stackoverflow.com/internal-2">test-link-6</a><br /> \
            <a href="http://faq.stackoverflow.com/subdomain-1">test-link-7</a><br /> \
            <a href="http://faq.stackoverflow.com/subdomain-2">test-link-8</a><br /> \
            <a href="http://faq.stackoverflow.com/subdomain-2">test-link-9</a><br /> \
            <a href="http://www.google.com/external-1">test-link-10</a><br /> \
            <a href="http://www.google.com/external-2">test-link-11</a><br /> \
            <a href="http://www.google.com/external-2">test-link-12</a><br /> \
        </body> \
    </html>';
let currentUrl = "http://www.stackoverflow.com/aaa/bbb/ccc";
let result = cup.extract(htmlStr, currentUrl);
console.log(result.length); // number of links extracted from the HTML

Level

const cup = require('crawler-url-parser');

Query

const cup = require('crawler-url-parser');

Test

mocha or npm test

More than 200 unit test cases. Check the test folder and quickstart.js for additional usage examples.