Package Exports
- html-urls
- html-urls/src/index.js
This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to open an issue on the original package (html-urls) asking it to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
Readme
html-urls
Get all URLs from HTML markup. It is based on the W3C Link Checker.
Install
$ npm install html-urls --save
Usage
const got = require('got') // require() works with got v11 or earlier; v12+ is ESM-only
const htmlUrls = require('html-urls')

;(async () => {
  // The page to scan is passed as the first CLI argument
  const url = process.argv[2]
  if (!url) throw new TypeError('Need to provide a URL as first argument.')

  // Fetch the page and keep only the response body
  const { body: html } = await got(url)

  // Passing `url` lets html-urls resolve relative links found in the markup
  const links = htmlUrls({ html, url })

  // Print the normalized URL of every detected value
  console.log(links.map(link => link.url))
  // => [
  //   'https://microlink.io/component---src-layouts-index-js-86b5f94dfa48cb04ae41.js',
  //   'https://microlink.io/component---src-pages-index-js-a302027ab59365471b7d.js',
  //   'https://microlink.io/path---index-709b6cf5b986a710cc3a.js',
  //   'https://microlink.io/app-8b4269e1fadd08e6ea1e.js',
  //   'https://microlink.io/commons-8b286eac293678e1c98c.js',
  //   'https://microlink.io',
  //   ...
  // ]
})()
It returns an array with one object per value detected in the HTML markup, each with the following structure:
value
Type: <string>
The original value.
url
Type: <string|undefined>
The normalized URL, if the value can be considered a URL.
uri
Type: <string|undefined>
The normalized value as a URI.
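Since `url` and `uri` may be undefined, callers usually separate entries that resolved to a URL from those that did not. The sketch below does that on a hand-written array shaped like the structure described above; the sample values are invented for illustration and are not real html-urls output, and `partitionLinks` is a hypothetical helper, not part of the package.

```javascript
// Invented sample shaped like the documented return value
// (value: string, url/uri: string | undefined). Not real html-urls output.
const sampleLinks = [
  { value: '/app.js', url: 'https://example.com/app.js', uri: 'https://example.com/app.js' },
  { value: 'mailto:hi@example.com', url: undefined, uri: 'mailto:hi@example.com' },
  { value: 'javascript:void(0)', url: undefined, uri: undefined }
]

// Hypothetical helper: split entries by whether a normalized URL exists.
function partitionLinks (links) {
  const withUrl = []
  const withoutUrl = []
  for (const link of links) {
    if (link.url !== undefined) withUrl.push(link)
    else withoutUrl.push(link)
  }
  return { withUrl, withoutUrl }
}

const { withUrl, withoutUrl } = partitionLinks(sampleLinks)
console.log(withUrl.map(link => link.url)) // [ 'https://example.com/app.js' ]
console.log(withoutUrl.map(link => link.value))
```

Keeping the unresolved entries, rather than discarding them, preserves the original `value` for logging or reporting.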
See examples for more!
API
htmlUrls([options])
options
html
Type: string
Default: ''
The HTML markup.
url
Type: string
Default: ''
The URL associated with the HTML markup.
It is used to resolve relative links present in the HTML markup.
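To illustrate what resolving relative links against the `url` option means, the sketch below uses the WHATWG `URL` constructor built into Node.js. It only demonstrates the general concept; `resolveLink` is a hypothetical helper, and html-urls may normalize URLs differently internally.

```javascript
// Illustration only: resolve a raw link value against a base URL using the
// WHATWG URL API. Not html-urls' internal implementation.
function resolveLink (value, baseUrl) {
  try {
    return new URL(value, baseUrl).href
  } catch {
    // An invalid base (e.g. an empty string) makes resolution fail
    return undefined
  }
}

const base = 'https://microlink.io/blog/'
console.log(resolveLink('post.html', base))   // https://microlink.io/blog/post.html
console.log(resolveLink('/app.js', base))     // https://microlink.io/app.js
console.log(resolveLink('../logo.png', base)) // https://microlink.io/logo.png
console.log(resolveLink('post.html', ''))     // undefined
```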
whitelist
Type: array
Default: []
A list of links to be excluded from the final output. It supports regex patterns.
See matcher for more details on the pattern syntax.
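The idea of excluding links by pattern can be sketched as a filter over the result array. This is a hypothetical post-processing helper, `excludeLinks`, that takes plain `RegExp` objects; it is not how the `whitelist` option is implemented, and the README points to the `matcher` package for the syntax that option actually accepts.

```javascript
// Hypothetical helper, not part of html-urls: drop entries whose `url`
// matches any of the given regular expressions. Avoid the /g flag here,
// since RegExp.prototype.test is stateful with it.
function excludeLinks (links, patterns) {
  return links.filter(link =>
    // Entries without a normalized URL are kept untouched
    link.url === undefined || !patterns.some(pattern => pattern.test(link.url))
  )
}

const sample = [
  { value: '/app.js', url: 'https://example.com/app.js' },
  { value: '/about', url: 'https://example.com/about' },
  { value: 'https://cdn.example.com/x.css', url: 'https://cdn.example.com/x.css' }
]
const kept = excludeLinks(sample, [/\.js$/, /^https:\/\/cdn\./])
console.log(kept.map(link => link.url)) // [ 'https://example.com/about' ]
```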
removeDuplicates
Type: boolean
Default: true
Remove duplicate links detected across all HTML tags.
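One way to picture duplicate removal is to keep the first entry for each normalized URL, so that the same resource referenced from several tags appears once. The sketch below is a hypothetical `dedupeLinks` helper under that assumption (falling back to the raw `value` when `url` is undefined); it is not the package's actual code.

```javascript
// Hypothetical sketch, not html-urls' implementation: keep the first entry
// per normalized `url`, using the raw `value` as key when `url` is missing.
function dedupeLinks (links) {
  const seen = new Set()
  return links.filter(link => {
    const key = link.url ?? link.value
    if (seen.has(key)) return false
    seen.add(key)
    return true
  })
}

const sample = [
  { value: '/app.js', url: 'https://example.com/app.js' },                    // e.g. from <script src>
  { value: 'https://example.com/app.js', url: 'https://example.com/app.js' }, // e.g. from <link href>
  { value: '#top', url: undefined }
]
console.log(dedupeLinks(sample).length) // 2
```

Keying on the normalized URL rather than the raw value is what lets `/app.js` and its absolute form collapse into a single entry.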
Related
- xml-urls – Get all URLs from a Feed/Atom/RSS/Sitemap XML markup.
- css-urls – Get all URLs referenced from stylesheet files.
License
html-urls © Kiko Beats, released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.
kikobeats.com · GitHub @Kiko Beats · X @Kikobeats