JSPM

  • Downloads 654
  • License MIT

Get all links from HTML markup.

Package Exports

  • html-urls
  • html-urls/src/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (html-urls) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

html-urls

Get all URLs from HTML markup. It's based on the W3C Link Checker.

Install

$ npm install html-urls --save

Usage

const got = require('got')
const htmlUrls = require('html-urls')

;(async () => {
  const url = process.argv[2]
  if (!url) throw new TypeError('Need to provide a URL as the first argument.')
  const { body: html } = await got(url)
  const links = htmlUrls({ html, url })

  links.forEach(({ url }) => console.log(url))

  // => [
  //   'https://microlink.io/component---src-layouts-index-js-86b5f94dfa48cb04ae41.js',
  //   'https://microlink.io/component---src-pages-index-js-a302027ab59365471b7d.js',
  //   'https://microlink.io/path---index-709b6cf5b986a710cc3a.js',
  //   'https://microlink.io/app-8b4269e1fadd08e6ea1e.js',
  //   'https://microlink.io/commons-8b286eac293678e1c98c.js',
  //   'https://microlink.io',
  //   ...
  // ]
})()

It returns the following structure for every value detected in the HTML markup:

value

Type: <string>

The original value.

url

Type: <string|undefined>

The normalized URL, if the value can be considered a URL.

uri

Type: <string|undefined>

The normalized value as URI.
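
Since `url` may be `undefined`, consumers usually want to guard against it before using an entry. The sketch below shows one way to do that. The `sampleEntries` array is hand-written for illustration and is not actual output from the library; it only assumes the `value` / `url` / `uri` shape described above.

```javascript
// Hypothetical entries, written by hand to mirror the documented shape.
// They are NOT real output from html-urls.
const sampleEntries = [
  {
    value: '/about',
    url: 'https://example.com/about',
    uri: 'https://example.com/about'
  },
  {
    // Assumption: a value that is not a web URL leaves `url` undefined.
    value: 'mailto:hello@example.com',
    url: undefined,
    uri: 'mailto:hello@example.com'
  }
]

// Keep only the entries that carry a normalized URL.
const resolvedUrls = sampleEntries
  .filter(({ url }) => url !== undefined)
  .map(({ url }) => url)

console.log(resolvedUrls)
```

Filtering on `url !== undefined` keeps the check explicit, so a non-URL value is never passed on to an HTTP client by accident.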


See examples for more!

API

htmlUrls([options])

options

html

Type: string
Default: ''

The HTML markup.

url

Type: string
Default: ''

The URL associated with the HTML markup.

It is used to resolve relative links that may be present in the HTML markup.
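
To make "resolving relative links" concrete, here is a minimal sketch using the standard WHATWG `URL` constructor that ships with Node.js. It illustrates the general idea only; the helper name `resolveAgainst` is hypothetical and the library's own resolution logic may differ.

```javascript
// Hypothetical helper: resolve a link value against a base URL.
// Returns undefined when the pair cannot be parsed as a URL.
const resolveAgainst = (value, base) => {
  try {
    return new URL(value, base).href
  } catch (err) {
    return undefined
  }
}

const pageUrl = 'https://example.com/blog/post.html'

console.log(resolveAgainst('/about', pageUrl))
console.log(resolveAgainst('img/a.png', pageUrl))
console.log(resolveAgainst('//cdn.example.com/app.js', pageUrl))
console.log(resolveAgainst('https://other.org/x', pageUrl))

// With an empty base, a relative value cannot be resolved.
console.log(resolveAgainst('/about', ''))
```

Absolute values pass through unchanged, while path-relative, root-relative, and protocol-relative values all inherit the missing parts from the base.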

whitelist

Type: array
Default: []

A list of links to be excluded from the final output. It supports regex patterns.

See matcher to learn more.
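
The exclusion idea can be sketched with plain `RegExp` objects, as below. This is an illustration of pattern-based filtering only: the library delegates to matcher, whose pattern syntax may differ from the regular expressions used here, and the names `excludePatterns` and `candidateLinks` are hypothetical.

```javascript
// Hypothetical patterns describing links to drop from the output.
// No `g` flag, so RegExp#test stays stateless between calls.
const excludePatterns = [
  /^https:\/\/cdn\.example\.com\//,
  /\.js$/
]

// Hypothetical list of already-normalized links.
const candidateLinks = [
  'https://example.com/',
  'https://example.com/app.js',
  'https://cdn.example.com/logo.png',
  'https://example.com/about'
]

// Keep a link only if no pattern matches it.
const keptLinks = candidateLinks.filter(
  (link) => !excludePatterns.some((pattern) => pattern.test(link))
)

console.log(keptLinks)
```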

removeDuplicates

Type: boolean
Default: true

Remove duplicate links detected across all HTML tags.
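
As a rough sketch of what deduplication across tags can look like, the snippet below keeps the first entry seen for each normalized URL. It assumes duplicates are identified by the `url` field, which is a guess about the comparison key rather than something the README states, and the `detectedEntries` data is hand-written.

```javascript
// Hypothetical entries: the same page linked from two different tags.
const detectedEntries = [
  { value: '/about', url: 'https://example.com/about' },
  { value: 'https://example.com/about', url: 'https://example.com/about' },
  { value: '/contact', url: 'https://example.com/contact' }
]

// Track the URLs already emitted and keep only the first occurrence.
const seenUrls = new Set()
const uniqueEntries = detectedEntries.filter(({ url }) => {
  if (seenUrls.has(url)) return false
  seenUrls.add(url)
  return true
})

console.log(uniqueEntries.map(({ url }) => url))
```

A `Set` keeps the pass linear in the number of entries and preserves the order in which links first appear.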

Related

  • xml-urls – Get all URLs from Feed/Atom/RSS/Sitemap XML markup.
  • css-urls – Get all URLs referenced from stylesheet files.

License

html-urls © Kiko Beats, released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.

kikobeats.com · GitHub @Kiko Beats · Twitter @Kikobeats