JSPM

browserless

4.1.2
  • Downloads 15469
  • License MIT

Chrome Headless API made easy

Package Exports

  • browserless

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (browserless) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

browserless


Features

  • High-level automation API on top of Headless Chrome.
  • Oriented for production & performance scenarios.
  • Aborting unnecessary requests based on MIME types.
  • Pooling support to keep multiple browsers ready.
  • Blocking ad trackers by default.

Install

$ npm install puppeteer browserless --save

Usage

browserless is a high-level API that simplifies common Headless Chrome actions.

For example, if you want to take a screenshot, just do:

const browserless = require('browserless')()

browserless
  .screenshot('http://example.com', { device: 'iPhone 6' })
  .then(tmpStream => {
    console.log(`your screenshot at ${tmpStream.path}`)
    tmpStream.cleanupSync()
  })

See more at examples.

Basic

All methods follow the same interface:

  • url: The target URL (required).
  • options: Specific settings for the method (optional).
  • callback: Node.js callback. If you don't provide one, the method returns a promise.

.constructor(options)

It creates the browser instance, using the puppeteer.launch method.

// Creating a simple instance
const browserless = require('browserless')()

or pass specific launch options:

// Creating an instance for running it at AWS Lambda
const browserless = require('browserless')({
  ignoreHTTPSErrors: true,
  args: [
    '--disable-gpu',
    '--single-process',
    '--no-zygote',
    '--no-sandbox',
    '--hide-scrollbars'
  ]
})

options

See puppeteer.launch#options.

By default, the library passes a well-known list of flags, so you probably don't need any additional setup.

.pool(options)

The main browserless constructor exposes a singleton browser. This is enough for most scenarios, but if you need more, you can initialize a pool of instances.

const createBrowserless = require('browserless')
const browserless = createBrowserless.pool()

options

See puppeteer.launch#options.

It follows the same API as the constructor, but it also accepts a parameter called poolOpts for setting pool-specific options.

poolOpts

See generic-pool#options.

.html(url, options)

It returns the full HTML content from the target url.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const html = await browserless.html(url)
  console.log(html)
})()

options

See page.goto.

Additionally, you can set:

waitFor

type: string|function|number
default: 0

Waits for a period of time, a selector, or a function, using page.waitFor.

waitUntil

type: array
default: ['networkidle2', 'load', 'domcontentloaded']

Specifies the list of events to wait for before navigation is considered successful, using page.waitForNavigation.

userAgent

Sets a custom user agent, using the page.setUserAgent method.

viewport

Sets a custom viewport, using the page.setViewport method.

abortTypes

type: array
default: ['image', 'media', 'stylesheet', 'font', 'xhr']

A list of resourceType requests that can be aborted in order to make the process faster.

abortTrackers

type: boolean
default: true

It aborts requests coming from tracking domains.
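The abortTypes and abortTrackers options can be pictured as a request filter. The sketch below assumes Puppeteer's request interception (`request.resourceType()`, `request.url()`, `request.abort()`, `request.continue()`); the `TRACKER_DOMAINS` list, `isTracker`, and `createRequestHandler` are hypothetical and only stand in for whatever the library does internally.

```javascript
// Hypothetical tracker list, for illustration only.
const TRACKER_DOMAINS = ['google-analytics.com', 'doubleclick.net']

// True when the URL's hostname is a listed domain or one of its subdomains.
const isTracker = url => {
  const { hostname } = new URL(url)
  return TRACKER_DOMAINS.some(
    domain => hostname === domain || hostname.endsWith(`.${domain}`)
  )
}

// Builds a handler suitable for Puppeteer's 'request' event.
const createRequestHandler = ({ abortTypes = [], abortTrackers = true } = {}) => req => {
  if (abortTypes.includes(req.resourceType())) return req.abort()
  if (abortTrackers && isTracker(req.url())) return req.abort()
  return req.continue()
}

// Wiring on a Puppeteer page (interception must be enabled first):
//   await page.setRequestInterception(true)
//   page.on('request', createRequestHandler({ abortTypes: ['image', 'font'] }))
```

Checking the resource type first avoids parsing the URL for requests that would be dropped anyway.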

.text(url, options)

It returns the full text content from the target url.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const text = await browserless.text(url)
  console.log(text)
})()

options

They are the same as for the .html method.

.pdf(url, options)

It generates a PDF version of the page at the target url.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const tmpStream = await browserless.pdf(url, {
    tmpOpts: {
      path: './',
      // `url` is a string, so parse it to read the hostname
      name: `${new URL(url).hostname}.${Date.now()}`
    }
  })

  console.log(`PDF generated at '${tmpStream.path}'`)
  tmpStream.cleanupSync() // It removes the file!
})()

It returns a tmpStream, with a path property pointing to the temporary file and cleanup/cleanupSync methods for removing it.

options

See page.pdf.

Additionally, you can set:

tmpOptions

See createTempFile#options.

media

Changes the CSS media type of the page using page.emulateMedia.

device

Generates the PDF using the settings of the named device descriptor, such as userAgent and viewport.

userAgent

Sets a custom user agent, using the page.setUserAgent method.

viewport

Sets a custom viewport, using the page.setViewport method.

.screenshot(url, options)

It takes a screenshot of the target url.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const tmpStream = await browserless.screenshot(url, {
    tmpOpts: {
      path: './',
      // `url` is a string, so parse it to read the hostname
      name: `${new URL(url).hostname}.${Date.now()}`
    }
  })

  console.log(`Screenshot taken at '${tmpStream.path}'`)
  tmpStream.cleanupSync() // It removes the file!
})()

It returns a tmpStream, with a path property pointing to the temporary file and cleanup/cleanupSync methods for removing it.

options

See page.screenshot.

Additionally, you can set:

tmpOptions

See createTempFile#options.

device

Takes the screenshot using the settings of the named device descriptor, such as userAgent and viewport.

userAgent

Sets a custom user agent, using the page.setUserAgent method.

viewport

Sets a custom viewport, using the page.setViewport method.

.devices

List of all available devices preconfigured with deviceName, viewport and userAgent settings.

These devices are used for emulation purposes.

.getDevice(deviceName)

Gets the settings of a specific device descriptor by name.

const browserless = require('browserless')

browserless.getDevice('Macbook Pro 15')

// {
//   name: 'Macbook Pro 15',
//   userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X …',
//   viewport: {
//     width: 1440,
//     height: 900,
//     deviceScaleFactor: 1,
//     isMobile: false,
//     hasTouch: false,
//     isLandscape: false
//   }
// }
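A descriptor of this shape maps directly onto two Puppeteer calls, `page.setUserAgent` and `page.setViewport`. The sketch below shows one way the device option could be applied; the `devices` array, the user agent string, `findDevice`, and `applyDevice` are hypothetical, and the lookup uses an exact name match as a simplifying assumption.

```javascript
// Hypothetical device list, in the same shape as the descriptor shown above.
const devices = [
  {
    name: 'Macbook Pro 15',
    userAgent: 'ExampleAgent/1.0', // placeholder, not a real user agent
    viewport: {
      width: 1440,
      height: 900,
      deviceScaleFactor: 1,
      isMobile: false,
      hasTouch: false,
      isLandscape: false
    }
  }
]

// Exact-match lookup by descriptor name; returns undefined when not found.
const findDevice = name => devices.find(device => device.name === name)

// Applies a device's user agent and viewport to a Puppeteer-like page.
const applyDevice = async (page, deviceName) => {
  const device = findDevice(deviceName)
  if (!device) throw new Error(`Unknown device '${deviceName}'`)
  await page.setUserAgent(device.userAgent)
  await page.setViewport(device.viewport)
  return device
}
```

Any object exposing `setUserAgent` and `setViewport` can be passed as `page`, which keeps the helper easy to test without launching a browser.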

Advanced

The following methods are exposed for scenarios where you need more granular control and less magic.

.browser

It returns the internal browser instance used as singleton.

const browserless = require('browserless')()

;(async () => {
  const browserInstance = await browserless.browser
})()

.evaluate(page, response)

It exposes an interface for creating your own evaluate function.

const browserless = require('browserless')()

const getUrlInfo = browserless.evaluate((page, response) => ({
  statusCode: response.status(),
  url: response.url(),
  redirectUrls: response.request().redirectChain()
}))

;(async () => {
  const url = 'https://example.com'
  const info = await getUrlInfo(url)

  console.log(info)
  // {
  //   "statusCode": 200,
  //   "url": "https://example.com/",
  //   "redirectUrls": []
  // }
})()

Internally, the method performs a .goto operation and passes the page and response to your function.

.goto(page, options)

It performs a smart page.goto, blocking ad tracker requests and other requests based on resourceType.

const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
  await browserless.goto(page, {
    url: 'http://savevideo.me',
    abortTypes: ['image', 'media', 'stylesheet', 'font']
  })
})()

options

url

type: string

The target URL

abortTypes

type: array
default: []

A list of req.resourceType() to be blocked.

abortTrackers

type: boolean
default: true

It aborts requests coming from tracking domains.

waitFor

type: string|function|number
default: 0

Waits for a period of time, a selector, or a function, using page.waitFor.

waitUntil

type: array
default: ['networkidle2', 'load', 'domcontentloaded']

Specifies the list of events to wait for before navigation is considered successful, using page.waitForNavigation.

userAgent

Sets a custom user agent, using the page.setUserAgent method.

viewport

Sets a custom viewport, using the page.setViewport method.

args

type: object

The settings to be passed to page.goto.

.page()

It returns a new standalone browser page.

const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
})()

Benchmark

We included a tiny benchmark utility to make it easier to test multiple configuration settings.

FAQ

Q: Why use browserless over Puppeteer?

browserless does not replace puppeteer; it complements it. It's just a syntactic sugar layer over the official Headless Chrome API.

Q: Why do you block ad scripts by default?

Headless navigation is expensive compared with simply fetching the content of a website.

To speed up the process, we block ad scripts by default because they add a lot of bloat.

Q: My output is different from the expected

Probably browserless was too smart and it blocked a request that you need.

You can activate debug mode with the DEBUG=browserless* environment variable to see what is happening behind the scenes:

DEBUG=browserless* node index.js

Consider opening an issue with the debug trace.

Q: Can I use browserless in an AWS Lambda-like project?

Yes, check aws-lambda-chrome to set up AWS Lambda with a compatible binary.

License

browserless © Kiko Beats, Released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.

logo designed by xinh studio.

kikobeats.com · GitHub Kiko Beats · Twitter @kikobeats