Package Exports

browser-pilot
browser-pilot/actions
browser-pilot/browser
browser-pilot/cdp
browser-pilot/cli
browser-pilot/providers

Readme

browser-pilot

Lightweight CDP-based browser automation for AI agents. Zero dependencies, works in Node.js, Bun, and Cloudflare Workers.

import { connect } from 'browser-pilot';

const browser = await connect({ provider: 'browserbase', apiKey: process.env.BROWSERBASE_API_KEY });
const page = await browser.page();

await page.goto('https://example.com/login');
await page.fill(['#email', 'input[type=email]'], 'user@example.com');
await page.fill(['#password', 'input[type=password]'], 'secret');
await page.submit(['#login-btn', 'button[type=submit]']);

const snapshot = await page.snapshot();
console.log(snapshot.text); // Accessibility tree as text

await browser.close();

Why browser-pilot?

Problem with Playwright/Puppeteer	browser-pilot Solution
Won't run in Cloudflare Workers	Pure Web Standard APIs, zero Node.js dependencies
Bun CDP connection bugs	Custom CDP client that works everywhere
Single-selector API (fragile)	Multi-selector by default: `['#primary', '.fallback']`
No action batching (high latency)	Batch DSL: one call for entire sequences
No AI-optimized snapshots	Built-in accessibility tree extraction

Installation

bun add browser-pilot
# or
npm install browser-pilot

Providers

BrowserBase (Recommended for production)

const browser = await connect({
  provider: 'browserbase',
  apiKey: process.env.BROWSERBASE_API_KEY,
  projectId: process.env.BROWSERBASE_PROJECT_ID, // optional
});

Browserless

const browser = await connect({
  provider: 'browserless',
  apiKey: process.env.BROWSERLESS_API_KEY,
});

Generic (Local Chrome)

# Start Chrome with remote debugging
chrome --remote-debugging-port=9222

const browser = await connect({
  provider: 'generic',
  wsUrl: 'ws://localhost:9222/devtools/browser/...', // optional, auto-discovers
});

Core Concepts

Multi-Selector (Robust Automation)

Every action accepts string | string[]. When given an array, tries each selector in order until one works:

// Tries #submit first, falls back to alternatives
await page.click(['#submit', 'button[type=submit]', '.submit-btn']);

// Cookie consent - try multiple common patterns
await page.click([
  '#accept-cookies',
  '.cookie-accept',
  'button:has-text("Accept")',
  '[data-testid="cookie-accept"]'
], { optional: true, timeout: 3000 });

Built-in Waiting

Every action automatically waits for the element to be visible before interacting:

// No separate waitFor needed - this waits automatically
await page.click('.dynamic-button', { timeout: 5000 });

// Explicit waiting when needed
await page.waitFor('.loading', { state: 'hidden' });
await page.waitForNavigation();
await page.waitForNetworkIdle();

Batch Actions

Execute multiple actions in a single call with full result tracking:

const result = await page.batch([
  { action: 'goto', url: 'https://example.com/login' },
  { action: 'fill', selector: '#email', value: 'user@example.com' },
  { action: 'fill', selector: '#password', value: 'secret' },
  { action: 'submit', selector: '#login-btn' },
  { action: 'wait', waitFor: 'navigation' },
  { action: 'snapshot' },
]);

console.log(result.success); // true if all steps succeeded
console.log(result.totalDurationMs); // total execution time
console.log(result.steps[5].result); // snapshot from step 5

AI-Optimized Snapshots

Get the page state in a format perfect for LLMs:

const snapshot = await page.snapshot();

// Structured accessibility tree
console.log(snapshot.accessibilityTree);

// Interactive elements with refs
console.log(snapshot.interactiveElements);
// [{ ref: 'e1', role: 'button', name: 'Submit', selector: '...' }, ...]

// Text representation for LLMs
console.log(snapshot.text);
// - main [ref=e1]
//   - heading "Welcome" [ref=e2]
//   - button "Get Started" [ref=e3]
//   - textbox [ref=e4] placeholder="Email"

Ref-Based Selectors

After taking a snapshot, use element refs directly as selectors:

const snapshot = await page.snapshot();
// Output shows: button "Submit" [ref=e4]

// Click using the ref - no fragile CSS needed
await page.click('ref:e4');

// Fill input by ref
await page.fill('ref:e23', 'hello@example.com');

// Combine ref with CSS fallbacks
await page.click(['ref:e4', '#submit', 'button[type=submit]']);

Refs are stable until page navigation. Always take a fresh snapshot after navigating.

Page API

await page.goto(url, options?)
await page.reload(options?)
await page.goBack(options?)
await page.goForward(options?)

const url = await page.url()
const title = await page.title()

Actions

All actions accept string | string[] for selectors:

await page.click(selector, options?)
await page.fill(selector, value, options?)      // clears first by default
await page.type(selector, text, options?)       // types character by character
await page.select(selector, value, options?)    // native <select>
await page.select({ trigger, option, value, match }, options?)  // custom dropdown
await page.check(selector, options?)
await page.uncheck(selector, options?)
await page.submit(selector, options?)           // tries Enter, then click
await page.press(key)
await page.focus(selector, options?)
await page.hover(selector, options?)
await page.scroll(selector, options?)

Waiting

await page.waitFor(selector, { state: 'visible' | 'hidden' | 'attached' | 'detached' })
await page.waitForNavigation(options?)
await page.waitForNetworkIdle({ idleTime: 500 })

Content

const snapshot = await page.snapshot()
const text = await page.text(selector?)
const screenshot = await page.screenshot({ format: 'png', fullPage: true })
const result = await page.evaluate(() => document.title)

Files

await page.setInputFiles(selector, [{ name: 'file.pdf', mimeType: 'application/pdf', buffer: data }])
const download = await page.waitForDownload(() => page.click('#download-btn'))

Emulation

import { devices } from 'browser-pilot';

await page.emulate(devices['iPhone 14']);     // Full device emulation
await page.setViewport({ width: 1280, height: 720, deviceScaleFactor: 2 });
await page.setUserAgent('Custom UA');
await page.setGeolocation({ latitude: 37.7749, longitude: -122.4194 });
await page.setTimezone('America/New_York');
await page.setLocale('fr-FR');

Devices: iPhone 14, iPhone 14 Pro Max, Pixel 7, iPad Pro 11, Desktop Chrome, Desktop Firefox

Request Interception

// Block images and fonts
await page.blockResources(['Image', 'Font']);

// Mock API responses
await page.route('**/api/users', { status: 200, body: { users: [] } });

// Full control
await page.intercept('*api*', async (request, actions) => {
  if (request.url.includes('blocked')) await actions.fail();
  else await actions.continue({ headers: { ...request.headers, 'X-Custom': 'value' } });
});

Cookies & Storage

// Cookies
const cookies = await page.cookies();
await page.setCookie({ name: 'session', value: 'abc', domain: '.example.com' });
await page.clearCookies();

// localStorage / sessionStorage
await page.setLocalStorage('key', 'value');
const value = await page.getLocalStorage('key');
await page.clearLocalStorage();

Console & Dialogs

// Capture console messages
await page.onConsole((msg) => console.log(`[${msg.type}] ${msg.text}`));

// Handle dialogs (alert, confirm, prompt)
await page.onDialog(async (dialog) => {
  if (dialog.type === 'confirm') await dialog.accept();
  else await dialog.dismiss();
});

// Collect messages during an action
const { result, messages } = await page.collectConsole(async () => {
  return await page.click('#button');
});

Important: Native browser dialogs (alert(), confirm(), prompt()) block all CDP commands until handled. Always set up a dialog handler before triggering actions that may show dialogs.

Iframes

Switch context to interact with iframe content:

// Switch to iframe
await page.switchToFrame('iframe#payment');

// Now actions target the iframe
await page.fill('#card-number', '4242424242424242');
await page.fill('#expiry', '12/25');

// Switch back to main document
await page.switchToMain();
await page.click('#submit-order');

Note: Cross-origin iframes cannot be accessed due to browser security.

Options

interface ActionOptions {
  timeout?: number;   // default: 30000ms
  optional?: boolean; // return false instead of throwing on failure
}

CLI

The CLI provides session persistence for interactive workflows:

# Connect to a browser
bp connect --provider browserbase --name my-session
bp connect --provider generic  # auto-discovers local Chrome

# Execute actions
bp exec -s my-session '{"action":"goto","url":"https://example.com"}'
bp exec -s my-session '[
  {"action":"fill","selector":"#search","value":"browser automation"},
  {"action":"submit","selector":"#search-form"}
]'

# Get page state (note the refs in output)
bp snapshot -s my-session --format text
# Output: button "Submit" [ref=e4], textbox "Email" [ref=e5], ...

# Use refs from snapshot for reliable targeting
bp exec -s my-session '{"action":"click","selector":"ref:e4"}'
bp exec -s my-session '{"action":"fill","selector":"ref:e5","value":"test@example.com"}'

# Handle native dialogs (alert/confirm/prompt)
bp exec --dialog accept '{"action":"click","selector":"#delete-btn"}'

# Other commands
bp text -s my-session --selector ".main-content"
bp screenshot -s my-session --output page.png
bp list                    # list all sessions
bp close -s my-session     # close session
bp actions                 # show complete action reference

CLI for AI Agents

The CLI is designed for AI agent tool calls. The recommended workflow:

Take snapshot to see the page structure with refs
Use refs (ref:e4) for reliable element targeting
Batch actions to reduce round trips

# Step 1: Get page state with refs
bp snapshot --format text
# Output shows: button "Add to Cart" [ref=e12], textbox "Search" [ref=e5]

# Step 2: Use refs to interact (stable, no CSS guessing)
bp exec '[
  {"action":"fill","selector":"ref:e5","value":"laptop"},
  {"action":"click","selector":"ref:e12"},
  {"action":"snapshot"}
]' --output json

Multi-selector fallbacks for robustness:

bp exec '[
  {"action":"click","selector":["ref:e4","#submit","button[type=submit]"]}
]'

Output:

{
  "success": true,
  "steps": [
    {"action": "fill", "success": true, "durationMs": 30},
    {"action": "click", "success": true, "durationMs": 50, "selectorUsed": "ref:e12"},
    {"action": "snapshot", "success": true, "durationMs": 100, "result": "..."}
  ],
  "totalDurationMs": 180
}

Run bp actions for complete action reference.

Examples

const result = await page.batch([
  { action: 'goto', url: 'https://app.example.com/login' },
  { action: 'fill', selector: ['#email', 'input[name=email]'], value: email },
  { action: 'fill', selector: ['#password', 'input[name=password]'], value: password },
  { action: 'click', selector: '.remember-me', optional: true },
  { action: 'submit', selector: ['#login', 'button[type=submit]'] },
], { onFail: 'stop' });

if (!result.success) {
  console.error(`Failed at step ${result.stoppedAtIndex}: ${result.steps[result.stoppedAtIndex!].error}`);
}

// Using the custom select config
await page.select({
  trigger: '.country-dropdown',
  option: '.dropdown-option',
  value: 'United States',
  match: 'text',  // or 'contains' or 'value'
});

// Or compose from primitives
await page.click('.country-dropdown');
await page.fill('.dropdown-search', 'United');
await page.click('.dropdown-option:has-text("United States")');

Cloudflare Workers

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const browser = await connect({
      provider: 'browserbase',
      apiKey: env.BROWSERBASE_API_KEY,
    });

    const page = await browser.page();
    await page.goto('https://example.com');
    const snapshot = await page.snapshot();

    await browser.close();

    return Response.json({ title: snapshot.title, elements: snapshot.interactiveElements });
  },
};

AI Agent Tool Definition

const browserTool = {
  name: 'browser_action',
  description: 'Execute browser actions and get page state',
  parameters: {
    type: 'object',
    properties: {
      actions: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            action: { enum: ['goto', 'click', 'fill', 'submit', 'snapshot'] },
            selector: { type: ['string', 'array'] },
            value: { type: 'string' },
            url: { type: 'string' },
          },
        },
      },
    },
  },
  execute: async ({ actions }) => {
    const page = await getOrCreatePage();
    return page.batch(actions);
  },
};

Advanced

Direct CDP Access

const browser = await connect({ provider: 'generic' });
const cdp = browser.cdpClient;

// Send any CDP command
await cdp.send('Emulation.setDeviceMetricsOverride', {
  width: 375,
  height: 812,
  deviceScaleFactor: 3,
  mobile: true,
});

Tracing

import { enableTracing } from 'browser-pilot';

enableTracing({ output: 'console' });
// [info] goto https://example.com ✓ (1200ms)
// [info] click #submit ✓ (50ms)

AI Agent Integration

browser-pilot is designed for AI agents. Two resources for agent setup:

llms.txt - Abbreviated reference for LLM context windows
Claude Code Skill - Full skill for Claude Code agents

To use with Claude Code, copy docs/skill/ to your project or reference it in your agent's context.

Documentation

See the docs folder for detailed documentation:

License

MIT

browser-pilot

Package Exports

Readme

browser-pilot

Why browser-pilot?

Installation

Providers

BrowserBase (Recommended for production)

Browserless

Generic (Local Chrome)

Core Concepts

Multi-Selector (Robust Automation)

Built-in Waiting

Batch Actions

AI-Optimized Snapshots

Ref-Based Selectors

Page API

Navigation

Actions

Waiting

Content

Files

Emulation

Request Interception

Cookies & Storage

Console & Dialogs

Iframes

Options

CLI

CLI for AI Agents

Examples

Login Flow with Error Handling

Custom Dropdown

Cloudflare Workers

AI Agent Tool Definition

Advanced

Direct CDP Access

Tracing

AI Agent Integration

Documentation

License