JSPM – browserless@4.0.0

Package Exports

browserless

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (browserless) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

Features

High-level automation API for working with Headless Chrome.
Blocks ad trackers by default.
Aborts unnecessary requests based on MIME types.

Install

$ npm install puppeteer browserless --save

Usage

browserless is a high-level API that simplifies common actions.

For example, if you want to take a screenshot, just do:

const browserless = require('browserless')()

browserless
  .screenshot('http://example.com', { device: 'iPhone 6' })
  .then(tmpStream => {
    console.log(`your screenshot at ${tmpStream.path}`)
    tmpStream.cleanupSync()
  })

See more at examples.

Basic

All methods follow the same interface:

url: The target URL (required).
options: Specific settings for the method (optional).
callback: Node.js callback. If you don't provide one, the method returns a promise.
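
As a rough illustration of that shared interface, the sketch below calls the same method in both styles. It assumes `browserless` is an instance created with require('browserless')() and that the callback is passed last, after an options object; the helper names htmlWithCallback and htmlWithPromise are hypothetical.

```javascript
// Hypothetical helpers; `browserless` is an instance created elsewhere
// with require('browserless')().

// Callback style: assumes the callback is the last argument and follows
// the Node.js (err, result) convention.
function htmlWithCallback (browserless, url, done) {
  browserless.html(url, {}, (err, html) => {
    if (err) return done(err)
    done(null, html.length)
  })
}

// Promise style: no callback is passed, so a promise comes back.
async function htmlWithPromise (browserless, url) {
  const html = await browserless.html(url)
  return html.length
}
```

Both helpers report the length of the returned HTML; which style to prefer mostly depends on the conventions of the surrounding code.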

.constructor(options)

It creates the browser instance, using the puppeteer.launch method.

// Creating a simple instance
const browserless = require('browserless')()

or pass specific launch options:

// Creating an instance for running it at AWS Lambda
const browserless = require('browserless')({
  ignoreHTTPSErrors: true,
  args: [
    '--disable-gpu',
    '--single-process',
    '--no-zygote',
    '--no-sandbox',
    '--hide-scrollbars'
  ]
})

By default the library passes a well-known list of flags, so you probably don't need any additional setup.
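
If the same code runs both locally and on AWS Lambda, the constructor options can be chosen at startup. The following is a minimal sketch, not part of the library: launchOptions is a hypothetical helper, and it relies on the AWS_LAMBDA_FUNCTION_NAME environment variable that the Lambda runtime sets.

```javascript
// Flags copied from the AWS Lambda example above.
const LAMBDA_ARGS = [
  '--disable-gpu',
  '--single-process',
  '--no-zygote',
  '--no-sandbox',
  '--hide-scrollbars'
]

// Hypothetical helper: returns constructor options for the current
// environment. AWS_LAMBDA_FUNCTION_NAME is set by the Lambda runtime.
function launchOptions (env = process.env) {
  if (!env.AWS_LAMBDA_FUNCTION_NAME) return {} // rely on the library defaults
  return { ignoreHTTPSErrors: true, args: LAMBDA_ARGS.slice() }
}

// Usage (requires browserless and puppeteer to be installed):
// const browserless = require('browserless')(launchOptions())
```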

.html(url, options)

It returns the full HTML content from the target url.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const html = await browserless.html(url)
  console.log(html)
})()

This method accepts options that will be passed to page.goto.

Additionally, you can set:

waitFor

type: string|function|number
default: 0

Waits for an amount of time, a selector, or a function, using page.waitFor.

waitUntil

type: array
default: ['networkidle2', 'load', 'domcontentloaded']

Specifies the list of events to wait for before considering the navigation successful, using page.waitForNavigation.

userAgent

Sets a custom user agent, using the page.setUserAgent method.

viewport

Sets a custom viewport, using the page.setViewport method.

abortTypes

type: array
default: ['image', 'media', 'stylesheet', 'font', 'xhr']

A list of resourceType requests that can be aborted in order to make the process faster.

abortTrackers

type: boolean
default: true

Aborts requests coming from tracking domains.
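
When a page only renders its content after an XHR call, the default abortTypes can be too aggressive. The sketch below keeps the defaults listed above and stops aborting selected resource types. HTML_DEFAULTS and allowTypes are hypothetical names used only for illustration; the default values are the ones documented in this section.

```javascript
// Defaults as documented in the options above.
const HTML_DEFAULTS = {
  waitFor: 0,
  waitUntil: ['networkidle2', 'load', 'domcontentloaded'],
  abortTypes: ['image', 'media', 'stylesheet', 'font', 'xhr'],
  abortTrackers: true
}

// Hypothetical helper: keeps the documented defaults but stops aborting
// the given resource types, e.g. when a page needs XHR to render content.
function allowTypes (types, overrides = {}) {
  const abortTypes = HTML_DEFAULTS.abortTypes.filter(t => !types.includes(t))
  return Object.assign({}, HTML_DEFAULTS, overrides, { abortTypes })
}

// Usage, inside an async function with a browserless instance:
// const html = await browserless.html(url, allowTypes(['xhr'], { waitFor: 1000 }))
```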

.text(url, options)

It returns the full text content from the target url.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const text = await browserless.text(url)
  console.log(text)
})()

It accepts the same options as the .html method.

.pdf(url, options)

It generates the PDF version of the website behind a URL.

const { URL } = require('url')
const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const tmpStream = await browserless.pdf(url, {
    tmpOpts: {
      path: './',
      // url is a string, so parse it to read the hostname
      name: `${new URL(url).hostname}.${Date.now()}`
    }
  })

  console.log(`PDF generated at '${tmpStream.path}'`)
  tmpStream.cleanupSync() // It removes the file!
})()

It returns a tmpStream, with a path property pointing to where the temporary file lives and cleanup/cleanupSync methods for removing it.

The options provided are passed to page.pdf.

If you want to customize the tmpStream settings, pass opts.tmpOpts, as in the example above.

Additionally, you can set:

media

Changes the CSS media type of the page using page.emulateMedia.

device

Generates the PDF using the settings of the named device descriptor, such as userAgent and viewport.

userAgent

Sets a custom user agent, using the page.setUserAgent method.

viewport

Sets a custom viewport, using the page.setViewport method.
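
The temporary file settings from the example above can be factored into a small helper. This is only a sketch: tmpOptsFor is a hypothetical name, and it assumes tmpOpts accepts path and name exactly as shown in that example.

```javascript
const { URL } = require('url')

// Hypothetical helper: builds the tmpOpts used in the example above,
// naming the file after the hostname and a timestamp.
function tmpOptsFor (url, now = Date.now()) {
  const { hostname } = new URL(url) // throws on an invalid URL
  return { path: './', name: `${hostname}.${now}` }
}

// Usage, inside an async function with a browserless instance:
// const tmpStream = await browserless.pdf(url, {
//   media: 'print',
//   tmpOpts: tmpOptsFor(url)
// })
```

Passing the timestamp as a parameter keeps the helper deterministic, which makes it easier to test.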

.screenshot(url, options)

It takes a screenshot of the target URL.

const { URL } = require('url')
const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const tmpStream = await browserless.screenshot(url, {
    tmpOpts: {
      path: './',
      // url is a string, so parse it to read the hostname
      name: `${new URL(url).hostname}.${Date.now()}`
    }
  })

  console.log(`Screenshot taken at '${tmpStream.path}'`)
  tmpStream.cleanupSync() // It removes the file!
})()

It returns a tmpStream, with a path property pointing to where the temporary file lives and cleanup/cleanupSync methods for removing it.

The options provided are passed to page.screenshot.

If you want to customize the tmpStream settings, pass opts.tmpOpts, as in the example above.

Additionally, you can set:

device

Takes the screenshot using the settings of the named device descriptor, such as userAgent and viewport.

userAgent

Sets a custom user agent, using the page.setUserAgent method.

viewport

Sets a custom viewport, using the page.setViewport method.
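
Because the returned tmpStream points at a real file on disk, it is easy to leak temporary files when later code throws. One possible pattern, sketched here with a hypothetical withScreenshot helper, is to wrap the work in try/finally so cleanupSync always runs. It assumes `browserless` is an instance created with require('browserless')().

```javascript
// Hypothetical helper: takes a screenshot, hands the temporary file path to
// `fn`, and removes the file afterwards even if `fn` throws.
async function withScreenshot (browserless, url, opts, fn) {
  const tmpStream = await browserless.screenshot(url, opts)
  try {
    return await fn(tmpStream.path)
  } finally {
    tmpStream.cleanupSync()
  }
}

// Usage, inside an async function:
// const size = await withScreenshot(browserless, url, { device: 'iPhone 6' },
//   path => require('fs').statSync(path).size)
```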

Advanced

The following methods are exposed for scenarios where you need more granular control and less magic.

.browser

It returns the internal browser instance, used as a singleton.

const browserless = require('browserless')()

;(async () => {
  const browserInstance = await browserless.browser
})()

.page()

It returns a new standalone browser page.

const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
})()

.goto(page, options)

It performs a smart page.goto, blocking ad tracker requests and other requests based on resourceType.

const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
  await browserless.goto(page, {
    url: 'http://savevideo.me',
    abortTypes: ['image', 'media', 'stylesheet', 'font']
  })
})()

options

url

type: string

The target URL

abortTypes

type: array
default: []

A list of req.resourceType() to be blocked.

abortTrackers

type: boolean
default: true

Aborts requests coming from tracking domains.

waitFor

type: string|function|number
default: 0

Waits for an amount of time, a selector, or a function, using page.waitFor.

waitUntil

type: array
default: ['networkidle2', 'load', 'domcontentloaded']

Specifies the list of events to wait for before considering the navigation successful, using page.waitForNavigation.

userAgent

Sets a custom user agent, using the page.setUserAgent method.

viewport

Sets a custom viewport, using the page.setViewport method.

args

type: object

The settings to be passed to page.goto.
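
Putting .page() and .goto() together, a typical low-level flow opens a page, navigates, reads something from it, and closes it. The sketch below is one way to do that under stated assumptions: getTitle is a hypothetical helper, `browserless` is an instance created with require('browserless')(), and page.title() and page.close() are the Puppeteer Page methods.

```javascript
// Hypothetical helper: opens a page, navigates with browserless.goto and
// returns the document title.
async function getTitle (browserless, url) {
  const page = await browserless.page()
  try {
    await browserless.goto(page, {
      url,
      abortTypes: ['image', 'media', 'stylesheet', 'font']
    })
    return await page.title()
  } finally {
    await page.close() // always release the page
  }
}

// Usage, inside an async function:
// console.log(await getTitle(browserless, 'https://example.com'))
```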

FAQ

Q: Why use browserless over Puppeteer?

browserless does not replace Puppeteer; it complements it. It's just a layer of syntactic sugar over the official Headless Chrome API.

Q: Why do you block ads scripts by default?

Headless navigation is expensive compared with just fetching the content of a website.

To speed up the process, we block ad scripts by default because they add a lot of bloat.

Q: My output is different from what I expected

Probably browserless was too smart and it blocked a request that you need.

You can activate debug mode with the DEBUG=browserless environment variable to see what is happening behind the scenes:

DEBUG=browserless node index.js

Consider opening an issue with the debug trace.

Q: Can I use browserless in an AWS Lambda-like project?

Yes, check aws-lambda-chrome to set up AWS Lambda with a compatible binary.

Related

aws-lambda-chrome – Chrome binary compatible with AWS Lambda.

License

browserless © Kiko Beats, Released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.

logo designed by xinh studio.

kikobeats.com · GitHub Kiko Beats · Twitter @kikobeats
