JSPM

  • Downloads 17932
  • License MIT

Request an http(s) URL and scrape its metadata in Node.js or the browser.

Package Exports

  • url-metadata
  • url-metadata/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (url-metadata) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

url-metadata

Request a URL and extract metadata from its HTML. Under the hood, this package does some post-request processing on top of the JavaScript-native fetch API. Open Graph Protocol (og:) and Twitter Card meta tags are included, and JSON-LD is supported as well.
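To illustrate the kind of extraction described above, here is a minimal, dependency-free sketch that pulls meta tag name/property/content pairs and JSON-LD blocks out of an HTML string. The function name extractMetadataSketch is hypothetical, and the regex approach is a deliberate simplification: it is not how url-metadata itself parses HTML, and it assumes well-formed, quoted attributes with no ">" inside attribute values.

```javascript
// Hypothetical helper: a regex-based sketch, NOT url-metadata's implementation.
// Assumes well-formed <meta> tags with double- or single-quoted attribute values.
function extractMetadataSketch(html) {
  const meta = {};

  // Find every <meta ...> tag, then read its attributes one by one so that
  // attribute order (content before or after property/name) does not matter.
  const metaTagRe = /<meta\b[^>]*>/gi;
  const attrRe = /([a-zA-Z:-]+)\s*=\s*("([^"]*)"|'([^']*)')/g;

  for (const [tag] of html.matchAll(metaTagRe)) {
    const attrs = {};
    for (const m of tag.matchAll(attrRe)) {
      // Group 3 holds a double-quoted value, group 4 a single-quoted one.
      attrs[m[1].toLowerCase()] = m[3] !== undefined ? m[3] : m[4];
    }
    // Open Graph tags use `property`; Twitter Card and standard tags use `name`.
    const key = attrs.property || attrs.name;
    if (key && attrs.content !== undefined) {
      meta[key] = attrs.content;
    }
  }

  // Collect every JSON-LD script block, skipping any that is not valid JSON.
  const jsonld = [];
  const jsonLdRe =
    /<script\b[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  for (const m of html.matchAll(jsonLdRe)) {
    try {
      jsonld.push(JSON.parse(m[1]));
    } catch {
      // malformed JSON-LD is ignored in this sketch
    }
  }

  return { ...meta, jsonld };
}
```

A real HTML parser is the safer choice for production use; the sketch only shows where og:, twitter: and JSON-LD values live in a page.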

To report a bug or request a feature, please open an issue or pull request on GitHub.

Usage

Works with Node.js >= 18.0.0, or in the browser when bundled with a tool such as Webpack or Browserify.

If the JavaScript-native fetch API is not available in your target environment, use the previous version, 2.5.0, which relies on the (now-deprecated) request module instead.
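The two requirements above (Node.js >= 18.0.0, or any environment that provides a native fetch) can be checked at runtime before deciding which version to install. The helpers below are hypothetical and not part of url-metadata; they only encode the conditions stated in this README.

```javascript
// Hypothetical helper (not part of url-metadata): returns true when the
// environment meets this README's requirements for the fetch-based version.
function canUseNativeFetchVersion({
  nodeVersion = typeof process !== 'undefined' && process.versions
    ? process.versions.node
    : undefined,
  fetchImpl = globalThis.fetch,
} = {}) {
  // In Node.js, the README requires version >= 18.0.0.
  if (nodeVersion) {
    const major = Number.parseInt(String(nodeVersion).split('.')[0], 10);
    if (!(major >= 18)) return false;
  }
  // Node.js or bundled browser code: a native fetch function must exist.
  return typeof fetchImpl === 'function';
}

// Hypothetical helper: suggest which package version to install.
function suggestedVersion(env) {
  return canUseNativeFetchVersion(env) ? 'url-metadata' : 'url-metadata@2.5.0';
}
```

Passing nodeVersion: null skips the Node.js check, which models a browser bundle where only the presence of fetch matters.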

Install:

$ npm install url-metadata --save

In your project file:

const urlMetadata = require('url-metadata');

(async function () {
  try {
    // A relative URL such as './metadata.html' only resolves in a browser page;
    // in Node.js, pass an absolute http(s) URL instead.
    const metadata = await urlMetadata('./metadata.html', {
      mode: 'same-origin',
      includeResponseBody: true
    });
    console.log('fetched metadata:', metadata);
  } catch (err) {
    console.log('fetch error:', err);
  }
})();

Options & Defaults

The default options are the values below. To override them, pass an options object as the second argument.

const options = {
  // custom request headers
  requestHeaders: {
    'User-Agent': 'url-metadata/3.0 (npm module)',
    'From': 'example@example.com',
  },

  // `fetch` API cache setting for request
  cache: 'no-cache',

  // `fetch` API mode (ex: `cors`, `no-cors`, `same-origin`, etc)
  mode: 'cors',

  // timeout in milliseconds, default is 10 seconds
  timeout: 10000,

  // number of characters to truncate description to
  descriptionLength: 750,

  // force image urls in selected tags to use https,
  // valid for 'image', 'og:image', 'og:image:secure_url' tags & favicons with full paths
  ensureSecureImageRequest: true,

  // return raw response body as string
  includeResponseBody: false
};

const metadata = await urlMetadata('./metadata.html', options);
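The README says a second argument overrides the defaults, but it does not spell out how nested values such as requestHeaders are combined. The sketch below assumes a shallow merge of top-level options plus a merge of the header objects, so supplying one custom header keeps the default User-Agent. DEFAULT_OPTIONS and mergeOptions are hypothetical names, and the library may instead replace requestHeaders wholesale; check its source if that distinction matters.

```javascript
// Hypothetical defaults mirroring the values listed in the README.
const DEFAULT_OPTIONS = Object.freeze({
  requestHeaders: {
    'User-Agent': 'url-metadata/3.0 (npm module)',
    'From': 'example@example.com',
  },
  cache: 'no-cache',
  mode: 'cors',
  timeout: 10000,
  descriptionLength: 750,
  ensureSecureImageRequest: true,
  includeResponseBody: false,
});

// Hypothetical merge strategy (an assumption, not confirmed by the README):
// top-level keys from `overrides` win, and header objects are merged so that
// overriding one header does not drop the others. Note that plain object keys
// are case-sensitive, while HTTP header names are not.
function mergeOptions(overrides = {}) {
  return {
    ...DEFAULT_OPTIONS,
    ...overrides,
    requestHeaders: {
      ...DEFAULT_OPTIONS.requestHeaders,
      ...(overrides.requestHeaders || {}),
    },
  };
}
```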
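The timeout option has to be enforced on top of fetch, which has no timeout parameter of its own. A common pattern, shown here as an assumption about how such an option can work rather than a copy of the library's code, is to abort the request through an AbortController once the deadline passes. fetchWithTimeout is a hypothetical name, and fetchImpl is injectable so the sketch can be exercised without network access.

```javascript
// Hypothetical helper: enforce a timeout on a fetch-style call via AbortController.
// `fetchImpl` defaults to the global fetch (Node.js >= 18 or browsers).
async function fetchWithTimeout(url, { timeout = 10000, ...init } = {}, fetchImpl = globalThis.fetch) {
  const controller = new AbortController();
  // Abort the in-flight request once `timeout` milliseconds have elapsed.
  const timer = setTimeout(() => controller.abort(), timeout);
  try {
    return await fetchImpl(url, { ...init, signal: controller.signal });
  } finally {
    // Always clear the timer so it cannot fire late or keep the process alive.
    clearTimeout(timer);
  }
}
```

In recent Node.js releases and current browsers, AbortSignal.timeout(ms) is a shorter way to get a signal that aborts after a delay; the explicit controller above additionally clears its timer as soon as the response arrives.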
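The descriptionLength and ensureSecureImageRequest options describe post-processing of extracted values. The sketch below shows one plausible reading of them: cut the description to at most descriptionLength characters, and rewrite absolute http:// URLs to https:// for the image, og:image and og:image:secure_url keys. The helper postProcessSketch is hypothetical, favicons are left out because their returned shape is not described here, and the library's exact truncation rule (for example, whether it appends an ellipsis) is not documented in this README.

```javascript
// Hypothetical post-processing step; the rules below are assumptions.
const SECURE_IMAGE_KEYS = ['image', 'og:image', 'og:image:secure_url'];

function postProcessSketch(metadata, { descriptionLength = 750, ensureSecureImageRequest = true } = {}) {
  const result = { ...metadata };

  // Truncate the description to at most `descriptionLength` characters.
  if (typeof result.description === 'string' && result.description.length > descriptionLength) {
    result.description = result.description.slice(0, descriptionLength);
  }

  // Upgrade absolute http:// image URLs to https:// for the selected keys.
  // Relative paths have no scheme, so the regex leaves them untouched.
  if (ensureSecureImageRequest) {
    for (const key of SECURE_IMAGE_KEYS) {
      if (typeof result[key] === 'string') {
        result[key] = result[key].replace(/^http:\/\//i, 'https://');
      }
    }
  }
  return result;
}
```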

Returns

Returns a promise that resolves to an object of metadata if the response is successful. Note that the returned url field is the last hop in the request chain: if you pass in a URL generated by a URL shortener, you get back the final destination as the url.
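With fetch, redirects are followed by default (redirect: 'follow'), and the resulting Response exposes the final address as response.url, which is why a shortened link comes back as its destination. The sketch below models that last-hop resolution with an in-memory redirect table instead of real HTTP so it runs offline; resolveFinalUrl and the table are hypothetical, and the default limit of 20 redirects mirrors the limit in the WHATWG Fetch Standard.

```javascript
// Hypothetical model of redirect following: `redirects` maps a URL to the
// Location it redirects to. Real fetch performs these hops over HTTP (3xx).
function resolveFinalUrl(startUrl, redirects, maxRedirects = 20) {
  let current = startUrl;
  for (let hops = 0; redirects.has(current); hops++) {
    if (hops >= maxRedirects) {
      throw new Error(`Too many redirects starting from ${startUrl}`);
    }
    // Resolve relative Location values against the current URL.
    current = new URL(redirects.get(current), current).href;
  }
  return current; // the last hop, i.e. what the url field reports
}
```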