Features
- High level automation API for working with Headless Chrome.
- Blocks ad trackers by default.
- Aborts unnecessary requests based on MIME type.
Install
$ npm install puppeteer browserless --save
Usage
browserless is a high-level API that simplifies common actions on top of Puppeteer.
For example, if you want to take a screenshot, just do:
const browserless = require('browserless')()
browserless
  .screenshot('http://example.com', { device: 'iPhone 6' })
  .then(tmpStream => {
    console.log(`your screenshot at ${tmpStream.path}`)
    tmpStream.cleanupSync()
  })
See more at examples.
Basic
All methods follow the same interface:
- url: The target URL (required).
- options: Specific settings for the method (optional).
- callback: Node.js callback. If you don't provide one, the method returns a promise.
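The callback-or-promise convention described above can be sketched as follows. This is a minimal illustration, not the library's actual implementation: `withOptionalCallback` and `fakeHtml` are hypothetical names, and `fakeHtml` performs no navigation.

```javascript
// Hypothetical helper: wraps an async function so that it accepts an optional
// trailing Node.js-style callback, and returns a promise when none is given.
const withOptionalCallback = fn => (...args) => {
  const last = args[args.length - 1]
  const callback = typeof last === 'function' ? args.pop() : undefined
  const promise = Promise.resolve().then(() => fn(...args))
  if (!callback) return promise
  // Node.js convention: error first, result second.
  promise.then(result => callback(null, result), error => callback(error))
}

// Stand-in for a method such as .html(url, options); it only builds a string.
const fakeHtml = withOptionalCallback(
  async (url, options = {}) => `<html data-url="${url}"></html>`
)
```

Calling `fakeHtml(url)` gives back a promise, while `fakeHtml(url, options, callback)` returns nothing and reports the result through the callback.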
.constructor(options)
It creates the browser instance using the puppeteer.launch method.
// Creating a simple instance
const browserless = require('browserless')()
Or, passing specific launch options:
// Creating an instance for running on AWS Lambda
const browserless = require('browserless')({
  ignoreHTTPSErrors: true,
  args: [
    '--disable-gpu',
    '--single-process',
    '--no-zygote',
    '--no-sandbox',
    '--hide-scrollbars'
  ]
})
By default, the library passes a well-known list of flags, so you probably don't need any additional setup.
.html(url, options)
It returns the full HTML content from the target url.
const browserless = require('browserless')()
;(async () => {
  const url = 'https://example.com'
  const html = await browserless.html(url)
  console.log(html)
})()
This method accepts options that will be passed to page.goto.
Additionally, you can set up:
waitFor
type: string|function|number
default: 0
Waits for an amount of time, a selector, or a function, using page.waitFor.
waitUntil
type: array
default: ['networkidle2', 'load', 'domcontentloaded']
The list of events to wait for before navigation is considered successful, using page.waitForNavigation.
userAgent
Sets a custom user agent, using the page.setUserAgent method.
viewport
Sets a custom viewport, using the page.setViewport method.
abortTypes
type: array
default: ['image', 'media', 'stylesheet', 'font', 'xhr']
A list of request resource types that can be aborted to make the process faster.
abortTrackers
type: boolean
default: true
Aborts requests coming from tracking domains.
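To make the effect of abortTypes and abortTrackers concrete, here is a rough sketch of the kind of decision they imply for each request. The tracker list and the `isTracker`/`shouldAbort` helpers are assumptions made for illustration; the list the library actually uses is not shown in this README.

```javascript
// Hypothetical tracker list, for illustration only.
const TRACKER_DOMAINS = ['tracker.example', 'ads.example']

// True when the request host is a listed domain or one of its subdomains.
const isTracker = (requestUrl, trackerDomains = TRACKER_DOMAINS) => {
  const { hostname } = new URL(requestUrl)
  return trackerDomains.some(
    domain => hostname === domain || hostname.endsWith(`.${domain}`)
  )
}

// Decides whether a request should be aborted, using the defaults documented above.
const shouldAbort = (
  resourceType,
  requestUrl,
  { abortTypes = ['image', 'media', 'stylesheet', 'font', 'xhr'], abortTrackers = true } = {}
) => {
  if (abortTypes.includes(resourceType)) return true
  return abortTrackers && isTracker(requestUrl)
}
```

With these defaults, an image request is dropped regardless of its origin, while a script is dropped only when it comes from a listed tracking domain.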
.text(url, options)
It returns the full text content from the target url.
const browserless = require('browserless')()
;(async () => {
  const url = 'https://example.com'
  const text = await browserless.text(url)
  console.log(text)
})()
It accepts the same options as the .html method.
.pdf(url, options)
It generates a PDF version of the website at the given URL.
const browserless = require('browserless')()
;(async () => {
  const url = 'https://example.com'
  const tmpStream = await browserless.pdf(url, {
    tmpOpts: {
      path: './',
      // url is a string, so parse it to read the hostname
      name: `${new URL(url).hostname}.${Date.now()}`
    }
  })
  console.log(`PDF generated at '${tmpStream.path}'`)
  tmpStream.cleanupSync() // It removes the file!
})()
It returns a tmpStream, with a path property pointing to the temporary file and cleanup/cleanupSync methods for removing it.
The options provided are passed to page.pdf.
If you want to customize the tmpStream settings, pass opts.tmpOpts.
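The shape of the returned object (a path plus cleanup/cleanupSync) can be illustrated with a small stand-in built on Node.js core modules. `createTempFile` and its `dir`, `name` and `ext` parameters are hypothetical names chosen for this sketch; they are not the library's own helper or option names.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical helper: creates an empty temporary file and returns an object
// exposing its `path` plus `cleanup`/`cleanupSync` methods that delete it.
const createTempFile = ({
  dir = os.tmpdir(),
  name = `tmp-${process.pid}-${Date.now()}`,
  ext = ''
} = {}) => {
  const filePath = path.join(dir, ext ? `${name}.${ext}` : name)
  fs.writeFileSync(filePath, '')
  return {
    path: filePath,
    cleanupSync: () => fs.unlinkSync(filePath), // synchronous removal
    cleanup: () => fs.promises.unlink(filePath) // promise-based removal
  }
}
```

Whoever receives such an object is responsible for calling one of the cleanup methods once the file is no longer needed, as the examples in this section do.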
Additionally, you can set up:
media
Changes the CSS media type of the page using page.emulateMedia.
device
Generates the PDF using the settings of the named device descriptor, such as userAgent and viewport.
userAgent
Sets a custom user agent, using the page.setUserAgent method.
viewport
Sets a custom viewport, using the page.setViewport method.
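As a rough illustration of how device, userAgent and viewport could interact, the sketch below resolves the final emulation settings from a small descriptor table. The table entries are placeholders, `resolveEmulation` is a hypothetical helper, and the rule that explicit userAgent/viewport values win over the device descriptor is an assumption rather than documented behavior.

```javascript
// Hypothetical descriptor table with placeholder values; Puppeteer ships its
// own, much larger list of device descriptors.
const DEVICES = {
  'iPhone 6': {
    userAgent: 'ExampleUserAgent/1.0 (iPhone 6 placeholder)',
    viewport: { width: 375, height: 667, deviceScaleFactor: 2, isMobile: true }
  }
}

// Assumption: explicit userAgent/viewport options override the device values.
const resolveEmulation = ({ device, userAgent, viewport } = {}, devices = DEVICES) => {
  const descriptor = device ? devices[device] : undefined
  if (device && !descriptor) throw new TypeError(`Unknown device '${device}'`)
  return {
    userAgent: userAgent || (descriptor && descriptor.userAgent),
    viewport: viewport || (descriptor && descriptor.viewport)
  }
}
```

The resolved values would then be applied with page.setUserAgent and page.setViewport before navigating.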
.screenshot(url, options)
It takes a screenshot of the target URL.
const browserless = require('browserless')()
;(async () => {
  const url = 'https://example.com'
  const tmpStream = await browserless.screenshot(url, {
    tmpOpts: {
      path: './',
      // url is a string, so parse it to read the hostname
      name: `${new URL(url).hostname}.${Date.now()}`
    }
  })
  console.log(`Screenshot taken at '${tmpStream.path}'`)
  tmpStream.cleanupSync() // It removes the file!
})()
It returns a tmpStream, with a path property pointing to the temporary file and cleanup/cleanupSync methods for removing it.
The options provided are passed to page.screenshot.
If you want to customize the tmpStream settings, pass opts.tmpOpts.
Additionally, you can set up:
device
Takes the screenshot using the settings of the named device descriptor, such as userAgent and viewport.
userAgent
Sets a custom user agent, using the page.setUserAgent method.
viewport
Sets a custom viewport, using the page.setViewport method.
Advanced
The following methods are exposed for scenarios where you need more granular control and less magic.
.browser
It returns the internal browser instance used as singleton.
const browserless = require('browserless')()
;(async () => {
  const browserInstance = await browserless.browser
})()
.page()
It returns a new standalone browser page.
const browserless = require('browserless')()
;(async () => {
  const page = await browserless.page()
})()
.goto(page, options)
It performs a smart page.goto, blocking requests from ad trackers and other requests based on resourceType.
const browserless = require('browserless')()
;(async () => {
  const page = await browserless.page()
  await browserless.goto(page, {
    url: 'http://savevideo.me',
    abortTypes: ['image', 'media', 'stylesheet', 'font']
  })
})()
options
url
type: string
The target URL
abortTypes
type: array
default: []
A list of req.resourceType() to be blocked.
abortTrackers
type: boolean
default: true
Aborts requests coming from tracking domains.
waitFor
type: string|function|number
default: 0
Waits for an amount of time, a selector, or a function, using page.waitFor.
waitUntil
type: array
default: ['networkidle2', 'load', 'domcontentloaded']
The list of events to wait for before navigation is considered successful, using page.waitForNavigation.
userAgent
Sets a custom user agent, using the page.setUserAgent method.
viewport
Sets a custom viewport, using the page.setViewport method.
args
type: object
The settings to be passed to page.goto.
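Blocking by resourceType is typically built on Puppeteer's request interception: page.setRequestInterception, the page 'request' event, and the request's resourceType(), abort() and continue() methods. The sketch below shows that wiring under the assumption that .goto works along these lines; `installAbortHandler` and `createFakePage` are hypothetical names, and the fake page exists only so the sketch runs without launching a browser.

```javascript
// Installs a handler on a Puppeteer-like page that aborts the given resource types
// and lets every other request continue.
const installAbortHandler = async (page, { abortTypes = [] } = {}) => {
  await page.setRequestInterception(true)
  page.on('request', req => {
    if (abortTypes.includes(req.resourceType())) return req.abort()
    return req.continue()
  })
}

// Minimal fake page that records whether each emitted request was aborted or continued.
const createFakePage = () => {
  const handlers = []
  const page = {
    intercepting: false,
    setRequestInterception: async enabled => {
      page.intercepting = enabled
    },
    on: (event, handler) => {
      if (event === 'request') handlers.push(handler)
    },
    emitRequest: resourceType => {
      const req = {
        outcome: null,
        resourceType: () => resourceType,
        abort: () => { req.outcome = 'aborted' },
        continue: () => { req.outcome = 'continued' }
      }
      handlers.forEach(handler => handler(req))
      return req.outcome
    }
  }
  return page
}
```

With a real Puppeteer page, the same `installAbortHandler` call would run before page.goto so that the handler is in place when navigation starts.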
FAQ
Q: Why use browserless over Puppeteer?
browserless does not replace Puppeteer; it complements it. It's just a layer of syntactic sugar over the official Headless Chrome API.
Q: Why do you block ad scripts by default?
Headless navigation is expensive compared with just fetching the content of a website.
To speed up the process, we block ad scripts by default because they add a lot of bloat.
Q: My output is different from the expected
Probably browserless was too smart and it blocked a request that you need.
You can activate debug mode using the DEBUG=browserless environment variable to see what is happening behind the scenes:
DEBUG=browserless node index.js
Consider opening an issue with the debug trace.
Q: Can I use browserless with my AWS Lambda-like project?
Yes, check aws-lambda-chrome to set up AWS Lambda with a compatible binary.
Related
- aws-lambda-chrome – Chrome binary compatible with AWS Lambda.
License
browserless © Kiko Beats, Released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.
logo designed by xinh studio.
kikobeats.com · GitHub Kiko Beats · Twitter @kikobeats