@centralinc/browseragent
Browser automation agent using Computer Use with Playwright. This TypeScript SDK combines Anthropic's Computer Use capabilities with Playwright to provide a clean, type-safe interface for automating browser interactions using Claude's computer use abilities.
This fork is purpose-built for high-volume RPA scenarios, such as large insurance back-offices, government form-filling portals, and other data-heavy workflows.
It runs seamlessly inside Temporal workflows: the agent's native pause / resume / cancel signals can be surfaced as Temporal signals, letting your orchestration layer coordinate long-running jobs while operators jump in when needed (no tight coupling between the human and Temporal itself).
Our goal is to expose a highly configurable, fine-grained agent: dial it up for raw speed or dial it down for pixel-perfect, human-like precision.
Additional Features in This Fork
At-a-glance feature matrix
| Capability | What it does | Why it rocks |
|---|---|---|
| Tool Registry | Generic capability system for any tool | Extend agents with Slack, Discord, databases, etc. |
| Smart Scrolling | 90 % viewport scrolls + instant text navigation | Turbo page traversal and zero-waste dropdown control |
| Typing Modes | Fill, fast-character, human-character | Match CAPTCHA tolerances or burn through inputs |
| Signal Bus | Pause / Resume / Cancel at any step | Add human QA checkpoints in production |
| URL Extractor | Find links by visible text | Zero CSS selectors needed |
| Speed Tweaks | Screenshot + delay optimisations | Cut multi-step flows from minutes to seconds |
Below are the flagship improvements shipped in the fork:
URL Extraction Tool
Extract URLs from any visible element - no CSS selectors needed! This feature is unique to this fork.
How It Works
The agent automatically uses the URL extraction tool when you ask for URLs by visible text:
import { z } from "zod";
// Simple URL extraction - just ask naturally!
const url = await agent.execute('Extract the URL from the "Learn More" link');
// Extract from article titles
const articleUrl = await agent.execute(
'Get the URL from the article titled "Introduction to AI"',
);
// Extract multiple URLs with structured output
const urls = await agent.execute(
"Extract URLs from the top 3 navigation links",
z.array(
z.object({
linkText: z.string(),
url: z.string(),
}),
),
);
Advanced Capabilities
Smart Search Strategies (prioritized in order):
- Exact text matching - Finds elements containing the exact visible text
- Partial text matching - Matches text within larger content blocks
- Anchor tag detection - Locates <a> tags containing the text
- CSS selector fallback - Direct element selection if text is a valid selector
- Clickable element search - Finds interactive elements with the text
- URL pattern extraction - Detects URLs directly within text content
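The prioritized order above can be pictured as a chain of strategies tried in turn until one yields a URL. This is only a minimal sketch, not the fork's implementation: `LinkCandidate`, `Strategy`, and `findUrlByText` are hypothetical names, only three of the six strategies are modeled, and they run over a plain in-memory list rather than a live page.

```typescript
// Hypothetical in-memory stand-in for elements found on a page.
interface LinkCandidate {
  text: string; // visible text of the element
  href: string | null; // associated URL, if the element has one
}

// A strategy returns a URL, or null so the next strategy can try.
type Strategy = (query: string, candidates: LinkCandidate[]) => string | null;

const exactText: Strategy = (query, candidates) =>
  candidates.find((c) => c.href !== null && c.text.trim() === query)?.href ?? null;

const partialText: Strategy = (query, candidates) =>
  candidates.find((c) => c.href !== null && c.text.includes(query))?.href ?? null;

// Last resort: a URL written out inside the matching element's text.
const urlPattern: Strategy = (query, candidates) => {
  for (const c of candidates) {
    if (!c.text.includes(query)) continue;
    const match = c.text.match(/https?:\/\/[^\s"'<>]+/);
    if (match) return match[0];
  }
  return null;
};

// Listed in priority order; the first non-null result wins.
const strategies: Strategy[] = [exactText, partialText, urlPattern];

function findUrlByText(query: string, candidates: LinkCandidate[]): string | null {
  for (const strategy of strategies) {
    const url = strategy(query, candidates);
    if (url !== null) return url;
  }
  return null;
}

const links: LinkCandidate[] = [
  { text: "Learn More about pricing", href: "https://example.com/pricing" },
  { text: "Learn More", href: "https://example.com/learn" },
  { text: "Docs: https://example.org/x", href: null },
];

console.log(findUrlByText("Learn More", links)); // https://example.com/learn
console.log(findUrlByText("Docs", links)); // https://example.org/x
```

Keeping each strategy as a small function makes the priority order explicit and easy to reorder.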
Technical Features:
- Computer Use optimized - Works seamlessly with Claude's visual perception
- Multiple HTML structures - Handles complex nested elements and dynamic content
- Automatic URL normalization - Converts relative to absolute URLs
- Smart error handling - Provides helpful feedback when elements aren't found
- Logging and debugging - Built-in console logging for troubleshooting
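Relative-to-absolute normalization can be done with the standard WHATWG `URL` constructor. The helper below is a sketch of that idea; `normalizeUrl` is a hypothetical name and the fork may resolve URLs differently.

```typescript
// Resolve an extracted href against the URL of the page it came from.
// Returns null when the pair cannot be parsed as a URL.
function normalizeUrl(href: string, pageUrl: string): string | null {
  try {
    return new URL(href, pageUrl).href;
  } catch {
    return null;
  }
}

console.log(normalizeUrl("/pricing", "https://example.com/docs/intro"));
// https://example.com/pricing
console.log(normalizeUrl("../about", "https://example.com/docs/intro/"));
// https://example.com/docs/about
console.log(normalizeUrl("https://other.example/x", "https://example.com/"));
// https://other.example/x
```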
Best Practices:
- Use the exact visible text you can see on the page
- For buttons or links, use their label text (e.g., "Download", "Read More", "View Details")
- For articles or stories, use their title text
- The tool will automatically handle finding the associated URL
Tool Registry System
Extend your agents with any external tool using our flexible capability system - not just Playwright!
How It Works
The Tool Registry provides a simple, type-safe way to add capabilities to your agents:
import { z } from "zod";
import { registerPlaywrightCapability } from "@centralinc/browseragent";
// Add a custom Playwright capability
registerPlaywrightCapability({
  method: "check_all",
  displayName: "Check All Checkboxes",
  description: "Check all checkboxes matching a pattern",
  usage: "Check multiple checkboxes at once by pattern",
  schema: z.tuple([z.string()]),
  handler: async (page, args) => {
    const [pattern] = args;
    // locator.check() is strict and throws when several elements match,
    // so resolve every match and check each box in turn
    const boxes = await page.locator(`input[type="checkbox"]${pattern}`).all();
    for (const box of boxes) {
      await box.check();
    }
    return { output: `Checked ${boxes.length} checkboxes matching ${pattern}` };
  },
});
// Use it naturally in prompts
await agent.execute('Check all the "Accept Terms" checkboxes on this form');
Extend Beyond Playwright
The registry supports any tool type. Here's a Slack integration example:
// Create a Slack tool
class SlackTool implements ComputerUseTool {
name: "slack" = "slack";
// ... implementation
}
// Use it with the agent
const agent = new ComputerUseAgent({
apiKey: ANTHROPIC_API_KEY,
page,
additionalTools: [new SlackTool(SLACK_TOKEN)],
});
// Natural language Slack operations
await agent.execute(
"Send a message to #general saying the deployment is complete",
);
await agent.execute(
"Navigate to the metrics dashboard and share a screenshot in #analytics",
);
Supported Tool Types:
- Communication: Slack, Discord, Teams, Email
- Data: Databases, APIs, File systems
- Utilities: AWS, GitHub, Jira
- Custom: Any tool you can imagine!
Key Features:
- Type-safe with Zod schemas
- Auto-generated documentation
- Natural language prompts
- No complex inheritance needed
See the Tool Registry Design Doc for complete examples.
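To make the registry idea concrete, here is a dependency-free sketch of how a capability lookup might work. Everything in it (`Capability`, `CapabilityRegistry`, `call`, `describe`) is hypothetical and only illustrates the pattern; the real registry uses Zod schemas where this sketch uses a hand-written `validate` function.

```typescript
// Hypothetical shape of a registered capability.
interface Capability<Args> {
  tool: string;
  method: string;
  description: string;
  validate: (args: unknown) => Args; // throws on bad input (Zod's parse plays this role)
  handler: (args: Args) => Promise<{ output: string }>;
}

class CapabilityRegistry {
  private readonly capabilities = new Map<string, Capability<any>>();

  register<Args>(capability: Capability<Args>): void {
    this.capabilities.set(`${capability.tool}:${capability.method}`, capability);
  }

  // Validate the raw arguments, then dispatch to the handler.
  async call(tool: string, method: string, args: unknown): Promise<{ output: string }> {
    const capability = this.capabilities.get(`${tool}:${method}`);
    if (!capability) throw new Error(`Unknown capability ${tool}:${method}`);
    return capability.handler(capability.validate(args));
  }

  // One line per capability, usable as generated documentation.
  describe(): string[] {
    return Array.from(this.capabilities.values()).map(
      (c) => `${c.tool}.${c.method}: ${c.description}`,
    );
  }
}

const registry = new CapabilityRegistry();
registry.register<[string, string]>({
  tool: "slack",
  method: "send_message",
  description: "Send a message to a channel",
  validate: (args) => {
    if (Array.isArray(args) && args.length === 2 && args.every((a) => typeof a === "string")) {
      return args as [string, string];
    }
    throw new Error("send_message expects [channel, text]");
  },
  // A real handler would call the Slack API; this one only reports what it would do.
  handler: async ([channel, text]) => ({ output: `Would send "${text}" to ${channel}` }),
});

console.log(registry.describe());
```

Validating before dispatch keeps handlers free of argument checks, which is the main benefit of pairing each capability with a schema.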
Instant Text Navigation
Jump directly to any text in dropdowns, lists, or scrollable containers - no multiple scroll attempts needed!
How It Works
The agent can use the scroll_to_text playwright method to instantly navigate to specific text:
// The agent sees a state dropdown and needs Wyoming
await agent.execute(`
Use the playwright scroll_to_text method to find "Wyoming" in the state picker
`);
// Behind the scenes, the agent calls:
// {"name": "playwright", "input": {"method": "scroll_to_text", "args": ["Wyoming"]}}
Smart Features:
- Automatically detects scrollable containers in viewport
- Searches visible containers first, then whole page
- Case-insensitive fallback if exact match not found
- Graceful fallback to regular scrolling if text not found
- No CSS selectors needed - just the visible text!
When the agent uses this:
- Finding specific options in dropdowns (states, countries, etc.)
- Navigating to products in long lists
- Jumping to specific items in sidebars
- Any scenario where exact text is known
Example: Instead of 10+ small scrolls to find "Wyoming", it's now a single instant jump!
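The matching order described under Smart Features (exact match first, then a case-insensitive fallback) can be sketched over a plain list of option labels. `findOptionIndex` is a hypothetical helper, not the fork's code; a result of -1 stands for the "fall back to regular scrolling" case.

```typescript
// Returns the index of the option to jump to, or -1 when nothing matches.
function findOptionIndex(options: string[], target: string): number {
  // 1. Prefer an exact match on the visible text.
  const exact = options.findIndex((option) => option.trim() === target);
  if (exact !== -1) return exact;
  // 2. Otherwise retry, ignoring case.
  const wanted = target.toLowerCase();
  return options.findIndex((option) => option.trim().toLowerCase() === wanted);
}

const states = ["Wisconsin", "WYOMING", "Wyoming"];
console.log(findOptionIndex(states, "Wyoming")); // 2 (exact match beats the earlier "WYOMING")
console.log(findOptionIndex(states, "wisconsin")); // 0 (case-insensitive fallback)
console.log(findOptionIndex(states, "Texas")); // -1
```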
Smart Scrolling (90 % Viewport)
Speed through long pages while preserving precise control in small UI elements.
- Default behaviour - Scrolls ~90 % of the viewport with ~10 % overlap for maximum throughput.
- Fine control - A scroll_amount between 5 and 20 performs tiny scrolls, perfect for dropdowns, lists, and side-panels.
- Configurable - Accepts any scroll_amount from 1 to 100 and degrades gracefully.
Why it matters: Form-heavy portals (e.g. insurance claim systems) often require rapid page-level scrolling punctuated by pixel-perfect adjustments inside select widgets. This feature automatically handles both cases.
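As a rough illustration of the arithmetic, the sketch below converts a `scroll_amount` into a pixel distance. It assumes `scroll_amount` is a percentage of the viewport height and that out-of-range values are clamped to 1-100; both are assumptions, and `scrollDeltaPx` is a hypothetical helper.

```typescript
const DEFAULT_SCROLL_PERCENT = 90; // leaves ~10 % of the previous view visible

function scrollDeltaPx(viewportHeightPx: number, scrollAmount?: number): number {
  const percent =
    scrollAmount === undefined
      ? DEFAULT_SCROLL_PERCENT
      : Math.min(100, Math.max(1, scrollAmount)); // assumed clamping behaviour
  return Math.round((viewportHeightPx * percent) / 100);
}

console.log(scrollDeltaPx(800)); // 720: page-level scroll with 80 px of overlap
console.log(scrollDeltaPx(800, 10)); // 80: small step for a dropdown
console.log(scrollDeltaPx(800, 250)); // 800: clamped to one full viewport
```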
Speed Optimizations
Screenshots now capture ~5× faster and post-action waits are shortened:
| Action | Old Delay | New Delay |
|---|---|---|
| Screenshot wait | 2 s | 0.3 s |
| Post-typing wait | 0.5 s | 0.1 s |
| Post-scroll wait | 0.5 s | 0.1 s |
| Mouse move pause | 0.1 s | 0.02 s |
These cut 1-2 seconds from each multi-step interaction.
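The saving for a given step can be worked out directly from the table. The sketch below encodes the old and new delays in milliseconds; `savedMs` is a hypothetical helper and it counts only the fixed waits, not model or network time.

```typescript
type Action = "screenshot" | "typing" | "scroll" | "mouseMove";

// Values from the table above, converted to milliseconds.
const oldDelaysMs: Record<Action, number> = { screenshot: 2000, typing: 500, scroll: 500, mouseMove: 100 };
const newDelaysMs: Record<Action, number> = { screenshot: 300, typing: 100, scroll: 100, mouseMove: 20 };

// Total wait time removed from a step made up of the given actions.
function savedMs(actions: Action[]): number {
  return actions.reduce((total, action) => total + oldDelaysMs[action] - newDelaysMs[action], 0);
}

console.log(savedMs(["screenshot"])); // 1700
console.log(savedMs(["mouseMove", "screenshot"])); // 1780
console.log(savedMs(["typing", "screenshot"])); // 2100
```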
Heads-up: Some sites rely on human-like pacing for anti-bot checks. If you encounter CAPTCHAs or missing render states, increase the delays via the new constructor parameters:
const fastComputer = new ComputerTool(
  page,
  "20250124",
  /* screenshotDelay */ 0.5,
);
// or adjust post-action waits inside ComputerTool if needed
Agent Signals (Pause / Resume / Cancel)
Bring human-in-the-loop control to long-running automation workflows.
- Pause an active agent.execute() run to inspect or fix the page
- Resume from the exact step where you left off
- Cancel gracefully without killing the process
- Real-time events: onPause, onResume, onCancel, onError
const agent = new ComputerUseAgent({ apiKey, page });
// Subscribe to events
agent.controller.on("onPause", ({ step }) => console.log("Paused at", step));
Configurable Execution Behavior
This fork includes a powerful configuration system that allows you to customize how the agent executes browser automation tasks. You can control typing speed, screenshot timing, scrolling strategy, mouse behaviour, and other automation settings to optimise for raw speed or human-like interaction.
Available Configuration Options
import type { ExecutionConfig } from "@centralinc/browseragent";
const executionConfig: ExecutionConfig = {
  typing: {
    mode: "fill", // or "character-by-character"
    characterDelay: 12, // milliseconds between characters (character-by-character mode)
    completionDelay: 100, // milliseconds to wait after typing completes
  },
  screenshot: {
    delay: 0.3, // seconds to wait before taking screenshots
    quality: "medium", // one of "low" | "medium" | "high"
  },
  mouse: {
    moveSpeed: "fast", // one of "instant" | "fast" | "normal" | "slow"
    clickDelay: 50, // milliseconds to wait after clicks
  },
  scrolling: {
    /**
     * When no scroll_amount is provided the agent will use this mode
     * with ~90 % viewport coverage for page-level scrolling.
     */
    mode: "percentage", // (future-proofed for pixel or element-based modes)
    /** Default percentage of the viewport to scroll. */
    percentage: 90,
    /** Overlap percentage to keep for context during large scrolls. */
    overlap: 10,
  },
};
Typing Mode Configuration
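The most impactful configuration is the typing behavior. There are two modes, fill and character-by-character; the second can be tuned from human-like to very fast, which gives three practical presets:
Fill Mode (Fastest) - Directly fills input fields, bypassing keyboard events entirely: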
const fastAgent = new ComputerUseAgent({
apiKey: process.env.ANTHROPIC_API_KEY!,
page,
executionConfig: {
typing: { mode: "fill", completionDelay: 50 },
},
});
Character-by-Character Mode (Human-like) - Types text one character at a time with configurable delays:
const humanLikeAgent = new ComputerUseAgent({
apiKey: process.env.ANTHROPIC_API_KEY!,
page,
executionConfig: {
typing: {
mode: "character-by-character",
characterDelay: 100, // 100ms between each character
completionDelay: 200,
},
},
});
Fast Character Mode (Balanced) - Visible typing, but very fast:
const balancedAgent = new ComputerUseAgent({
apiKey: process.env.ANTHROPIC_API_KEY!,
page,
executionConfig: {
typing: {
mode: "character-by-character",
characterDelay: 5, // Very fast character typing
completionDelay: 75,
},
},
});
Performance Comparison:
| Mode | Speed | Visibility | Use Case |
|---|---|---|---|
| Fill | Fastest | Instant | Production, speed-critical tasks |
| Fast Character | Very fast | Visible | Development, debugging |
| Slow Character | Human-like | Very visible | Demos, human-like automation |
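For a feel of the numbers behind this comparison, the sketch below estimates how long typing a value takes under each preset. It assumes the time is dominated by the configured delays (browser overhead is ignored); `TypingSettings` and `estimateTypingMs` are hypothetical and only mirror the field names shown above.

```typescript
// Local stand-in for the typing section of the execution config.
interface TypingSettings {
  mode: "fill" | "character-by-character";
  characterDelay?: number; // ms between characters
  completionDelay?: number; // ms to wait after typing completes
}

function estimateTypingMs(text: string, settings: TypingSettings): number {
  const completion = settings.completionDelay ?? 0;
  if (settings.mode === "fill") return completion; // the value is set in one step
  return text.length * (settings.characterDelay ?? 0) + completion;
}

const sample = "x".repeat(40); // a 40-character field value

console.log(estimateTypingMs(sample, { mode: "fill", completionDelay: 50 })); // 50
console.log(estimateTypingMs(sample, { mode: "character-by-character", characterDelay: 5, completionDelay: 75 })); // 275
console.log(estimateTypingMs(sample, { mode: "character-by-character", characterDelay: 100, completionDelay: 200 })); // 4200
```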
Try the Example
Run the included example to see the performance differences:
# Run the typing configuration example (set ANTHROPIC_API_KEY first)
npx ts-node examples/example-typing-config.ts
The following snippet continues the Agent Signals (Pause / Resume / Cancel) example: it subscribes to the resume event, triggers a pause, and starts a task.
agent.controller.on("onResume", () => console.log("Resumed"));
// Trigger a pause after 5 s
setTimeout(() => agent.controller.signal("pause"), 5_000);
// Start a task (the controller is available immediately)
await agent.execute("Get the titles of the top 10 stories");
Great for debugging, watchdog timeouts, and manual overrides.
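One way to picture pause / resume / cancel is a small state machine that a step loop consults between steps. The sketch below is an illustration only: `SignalController`, `checkpoint`, and `runSteps` are hypothetical names and do not describe how the fork implements `agent.controller`.

```typescript
type Signal = "pause" | "resume" | "cancel";
type RunState = "running" | "paused" | "cancelled";
type AgentEvent = "onPause" | "onResume" | "onCancel";

class SignalController {
  state: RunState = "running";
  private listeners = new Map<AgentEvent, Array<() => void>>();
  private waiters: Array<() => void> = [];

  on(event: AgentEvent, listener: () => void): void {
    const list = this.listeners.get(event) ?? [];
    list.push(listener);
    this.listeners.set(event, list);
  }

  signal(kind: Signal): void {
    if (this.state === "cancelled") return; // cancel is final
    if (kind === "pause" && this.state === "running") this.transition("paused", "onPause");
    else if (kind === "resume" && this.state === "paused") this.transition("running", "onResume");
    else if (kind === "cancel") this.transition("cancelled", "onCancel");
  }

  private transition(next: RunState, event: AgentEvent): void {
    this.state = next;
    (this.listeners.get(event) ?? []).forEach((listener) => listener());
    // Wake any loop parked in checkpoint() once we are no longer paused.
    if (next !== "paused") this.waiters.splice(0).forEach((wake) => wake());
  }

  // Called between steps: waits while paused, resolves false after a cancel.
  async checkpoint(): Promise<boolean> {
    while (this.state === "paused") {
      await new Promise<void>((resolve) => this.waiters.push(() => resolve()));
    }
    return this.state !== "cancelled";
  }
}

async function runSteps(steps: Array<() => Promise<void>>, controller: SignalController): Promise<number> {
  let completed = 0;
  for (const step of steps) {
    if (!(await controller.checkpoint())) break;
    await step();
    completed++;
  }
  return completed;
}

const controller = new SignalController();
controller.on("onPause", () => console.log("paused"));
const steps = [1, 2, 3].map((n) => async () => console.log(`step ${n}`));
controller.signal("pause"); // the run below waits at its first checkpoint
setTimeout(() => controller.signal("resume"), 50);
runSteps(steps, controller).then((done) => console.log(`${done} steps completed`));
```

Checking the controller only between steps means a pause never interrupts an action halfway, which is what makes resuming from the same step straightforward.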
Features
- Simple API: Single ComputerUseAgent class for all computer use tasks
- Dual Response Types: Support for both text and structured (JSON) responses
- Type Safety: Full TypeScript support with Zod schema validation
- Optimized: Clean error handling and robust JSON parsing
- Focused: Clean API surface with sensible defaults
Installation
npm install @centralinc/browseragent playwright @playwright/test
# or
yarn add @centralinc/browseragent playwright @playwright/test
# or
pnpm add @centralinc/browseragent playwright @playwright/test
Quick Start
import { chromium } from "playwright";
import { ComputerUseAgent } from "@centralinc/browseragent";
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
// Navigate to Hacker News manually first
await page.goto("https://news.ycombinator.com/");
const agent = new ComputerUseAgent({
apiKey: process.env.ANTHROPIC_API_KEY!,
page,
});
// Simple text response
const answer = await agent.execute("Tell me the title of the top story");
console.log(answer);
await browser.close();
API Reference
ComputerUseAgent
The main class for computer use automation.
Constructor
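new ComputerUseAgent(options: {
  apiKey: string;
  page: Page;
  model?: string;
  executionConfig?: ExecutionConfig;
  retryConfig?: RetryConfig;
  additionalTools?: ComputerUseTool[];
})
Parameters:
- apiKey (string): Your Anthropic API key. Get one from Anthropic Console
- page (Page): Playwright page instance to control
- model (string, optional): Anthropic model to use. Defaults to 'claude-sonnet-4-20250514'
- executionConfig (ExecutionConfig, optional): Execution behavior settings, as used in Configurable Execution Behavior
- retryConfig (RetryConfig, optional): Retry settings, as used in Retry Configuration
- additionalTools (ComputerUseTool[], optional): Extra tools, as used in the Slack example under Tool Registry System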
Supported Models: See Anthropic's Computer Use documentation for the latest model compatibility.
execute() Method
async execute<T = string>(
query: string,
schema?: z.ZodSchema<T>,
options?: {
systemPromptSuffix?: string;
thinkingBudget?: number;
}
): Promise<T>
Parameters:
- query (string): The task description for Claude to execute
- schema (ZodSchema, optional): Zod schema for structured responses. When provided, the response will be validated against this schema
- options (object, optional):
  - systemPromptSuffix (string): Additional instructions appended to the system prompt
  - thinkingBudget (number): Token budget for Claude's internal reasoning process. Default: 1024. See Extended Thinking documentation for details
Returns:
- Promise<T>: When schema is provided, returns validated data of type T
- Promise<string>: When no schema is provided, returns the text response
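The two return shapes can be modeled with TypeScript overloads. The sketch below is not the SDK's code: `SchemaLike` is a hypothetical stand-in for a Zod schema (anything with a `parse` method that returns the value or throws), and the JSON extraction heuristic is an assumption.

```typescript
// Structural stand-in for a Zod schema.
interface SchemaLike<T> {
  parse(data: unknown): T;
}

function parseResponse(raw: string): string;
function parseResponse<T>(raw: string, schema: SchemaLike<T>): T;
function parseResponse<T>(raw: string, schema?: SchemaLike<T>): T | string {
  if (!schema) return raw.trim(); // text mode: return the answer as-is
  // Assumed heuristic: take the outermost JSON object or array embedded in the text.
  const start = raw.search(/[\[{]/);
  const end = Math.max(raw.lastIndexOf("}"), raw.lastIndexOf("]"));
  if (start === -1 || end < start) throw new Error("No JSON found in response");
  return schema.parse(JSON.parse(raw.slice(start, end + 1)));
}

// Hand-written validator playing the role of z.object({ title, points }).
const storySchema: SchemaLike<{ title: string; points: number }> = {
  parse(data) {
    const d = data as { title?: unknown; points?: unknown } | null;
    if (d && typeof d.title === "string" && typeof d.points === "number") {
      return { title: d.title, points: d.points };
    }
    throw new Error("Invalid story");
  },
};

const story = parseResponse('Here is the result: {"title": "Example Story", "points": 150}', storySchema);
console.log(story.title, story.points); // typed as { title: string; points: number }
console.log(parseResponse("  Title of the top story \n")); // typed as string
```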
Usage Examples
Text Response
import { ComputerUseAgent } from "@centralinc/browseragent";
// Navigate to the target page first
await page.goto("https://news.ycombinator.com/");
const agent = new ComputerUseAgent({
apiKey: process.env.ANTHROPIC_API_KEY!,
page,
});
const result = await agent.execute(
"Tell me the title of the top story on this page",
);
console.log(result); // "Title of the top story"
Structured Response with Zod
import { z } from "zod";
import { ComputerUseAgent } from "@centralinc/browseragent";
const agent = new ComputerUseAgent({
apiKey: process.env.ANTHROPIC_API_KEY!,
page,
});
const HackerNewsStory = z.object({
title: z.string(),
points: z.number(),
author: z.string(),
comments: z.number(),
url: z.string().optional(),
});
const stories = await agent.execute(
"Get the top 5 Hacker News stories with their details",
z.array(HackerNewsStory).max(5),
);
console.log(stories);
// [
// {
// title: "Example Story",
// points: 150,
// author: "user123",
// comments: 42,
// url: "https://example.com"
// },
// ...
// ]
Advanced Options
const result = await agent.execute(
"Complex task requiring more thinking",
undefined, // No schema for text response
{
systemPromptSuffix: "Be extra careful with form submissions.",
thinkingBudget: 4096, // More thinking tokens for complex tasks
},
);
Retry Configuration
The SDK includes built-in retry logic for handling connection errors and transient failures:
import { ComputerUseAgent, type RetryConfig } from "@centralinc/browseragent";
const retryConfig: RetryConfig = {
maxRetries: 5, // Maximum retry attempts (default: 3)
initialDelayMs: 2000, // Initial delay between retries (default: 1000ms)
maxDelayMs: 60000, // Maximum delay between retries (default: 30000ms)
backoffMultiplier: 2.5, // Exponential backoff multiplier (default: 2)
preferIPv4: true, // Prefer IPv4 DNS resolution (helpful with VPNs like Tailscale)
retryableErrors: [ // Errors that trigger retries
"Connection error",
"ECONNREFUSED",
"ETIMEDOUT",
"ECONNRESET",
"socket hang up",
],
};
const agent = new ComputerUseAgent({
apiKey: process.env.ANTHROPIC_API_KEY!,
page,
retryConfig, // Custom retry configuration
});
The retry mechanism uses exponential backoff with jitter to avoid thundering herd problems. Connection errors and network timeouts are automatically retried with increasing delays.
Note for VPN/Tailscale Users: If you're experiencing ENETUNREACH errors with IPv6 addresses, set preferIPv4: true in your retry configuration to resolve DNS to IPv4 addresses only.
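To show how the retry settings interact, here is a sketch of exponential backoff with jitter. The jitter formula ("full jitter", a uniform draw between 0 and the capped delay) is an assumption, since the exact scheme is not documented here; `BackoffSettings`, `backoffCeilingMs`, `jitteredDelayMs`, and `isRetryable` are hypothetical names.

```typescript
interface BackoffSettings {
  initialDelayMs: number;
  maxDelayMs: number;
  backoffMultiplier: number;
  retryableErrors: string[];
}

// Documented default delays, plus the example error list from above.
const defaults: BackoffSettings = {
  initialDelayMs: 1000,
  maxDelayMs: 30000,
  backoffMultiplier: 2,
  retryableErrors: ["Connection error", "ECONNREFUSED", "ETIMEDOUT", "ECONNRESET", "socket hang up"],
};

// Upper bound for the wait before retry `attempt` (0-based), capped at maxDelayMs.
function backoffCeilingMs(attempt: number, s: BackoffSettings): number {
  return Math.min(s.maxDelayMs, s.initialDelayMs * Math.pow(s.backoffMultiplier, attempt));
}

// Assumed "full jitter": pick a random delay below the ceiling.
function jitteredDelayMs(attempt: number, s: BackoffSettings, random: () => number = Math.random): number {
  return Math.floor(random() * backoffCeilingMs(attempt, s));
}

// An error is retried when its message contains one of the configured substrings.
function isRetryable(error: unknown, s: BackoffSettings): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return s.retryableErrors.some((needle) => message.includes(needle));
}

console.log([0, 1, 2, 3, 4, 5].map((n) => backoffCeilingMs(n, defaults)));
// [1000, 2000, 4000, 8000, 16000, 30000]
console.log(isRetryable(new Error("connect ECONNREFUSED 127.0.0.1:443"), defaults)); // true
console.log(isRetryable(new Error("Invalid API key"), defaults)); // false
```

Randomizing the delay spreads out clients that failed at the same moment, which is the point of adding jitter.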
Tool Registry API
The SDK exports functions for extending capabilities:
import { z } from "zod";
import {
registerPlaywrightCapability,
getToolRegistry,
defineCapability,
} from "@centralinc/browseragent";
// Register a new Playwright capability
registerPlaywrightCapability({
method: "custom_action",
displayName: "Custom Action",
description: "Performs a custom browser action",
usage: "Detailed usage instructions",
schema: z.object({ selector: z.string() }),
handler: async (page, args) => {
// Implementation
return { output: "Success" };
},
});
// Register capabilities for other tools
const registry = getToolRegistry();
registry.register(
defineCapability("slack", "send_message", {
displayName: "Send Message",
description: "Send a Slack message",
usage: "Send message to channel",
schema: z.tuple([z.string(), z.string()]),
}),
);
Environment Setup
Anthropic API Key: Set your API key as an environment variable:
export ANTHROPIC_API_KEY=your_api_key_here
Playwright: Install Playwright and browser dependencies:
npx playwright install
Computer Use Parameters
This SDK leverages Anthropic's Computer Use API with the following key parameters:
Model Selection
- Claude 3.5 Sonnet: Best balance of speed and capability for most tasks
- Claude 4 Models: Enhanced reasoning with extended thinking capabilities
- Claude 3.7 Sonnet: Advanced reasoning with thinking transparency
Thinking Budget
The thinkingBudget parameter controls Claude's internal reasoning process:
- 1024 tokens (default): Suitable for simple tasks
- 4096+ tokens: Better for complex reasoning tasks
- 16k+ tokens: Recommended for highly complex multi-step operations
See Anthropic's Extended Thinking guide for optimization tips.
Error Handling
The SDK includes built-in error handling:
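try {
  const result = await agent.execute("Your task here");
  console.log(result);
} catch (error) {
  // `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes("No response received")) {
    console.log("Agent did not receive a response from Claude");
  } else {
    console.log("Other error:", message);
  }
}
Best Practices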
- Use specific, clear instructions: "Click the red 'Submit' button" vs "click submit"
- For complex tasks, break them down: Use step-by-step instructions in your query
- Optimize thinking budget: Start with the default (1024) and increase for complex tasks
- Handle errors gracefully: Implement proper error handling for production use
- Use structured responses: When you need a specific data format, use Zod schemas
- Test with headless: false: During development, run with a visible browser to debug
Security Considerations
Important: Computer use can interact with any visible application. Always:
- Run in isolated environments (containers/VMs) for production
- Avoid providing access to sensitive accounts or data
- Review Claude's actions in logs before production deployment
- Use allowlisted domains when possible
See Anthropic's Computer Use Security Guide for detailed security recommendations.
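A domain allowlist can be enforced with a small hostname check before navigating. This is a sketch under the assumption that your own code guards navigation; `isAllowedUrl` is a hypothetical helper and is not part of the SDK.

```typescript
// True when the URL's hostname is an allowed domain or a subdomain of one.
function isAllowedUrl(url: string, allowedDomains: string[]): boolean {
  let hostname: string;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // unparseable URLs are rejected
  }
  return allowedDomains.some((domain) => {
    const d = domain.toLowerCase();
    // Require a dot boundary so "evil-example.com" does not match "example.com".
    return hostname === d || hostname.endsWith(`.${d}`);
  });
}

const allowed = ["ycombinator.com"];
console.log(isAllowedUrl("https://news.ycombinator.com/item?id=1", allowed)); // true
console.log(isAllowedUrl("https://evil-ycombinator.com/", allowed)); // false
console.log(isAllowedUrl("https://ycombinator.com.evil.example/", allowed)); // false
```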
Requirements
- Node.js 18+
- TypeScript 5+
- Playwright 1.52+
- Anthropic API key
Related Resources
- Anthropic Computer Use Documentation
- Extended Thinking Guide
- Playwright Documentation
- Zod Documentation
License
See License