Webpilot
A browser tool. Real Chromium, DOM-first, WebSocket protocol.
Webpilot launches a real Chromium-based browser with a local extension runtime, exposes a WebSocket protocol, and lets a user, script, or LLM drive that browser through the same command surface.
discover, html, and q give you real page structure, real selectors, and real handles. Screenshots exist as a fallback for when layout or visual rendering is the actual question. For everything else, read the DOM.
What Webpilot does
- Starts and controls a real browser; no CDP, no detectable debugging port
- Exposes the live DOM directly: navigation, element discovery, querying, interaction, cookies
- Provides configurable cursor, click, typing, and scroll behavior
- Works from the CLI, raw WebSocket, Node, or an MCP adapter
What Webpilot does not do
- Decide what to do next
- Ship a tuned human profile
- Ship site strategy, retries, or route doctrine
Install
npm install -g h17-webpilot
Quick Start
1. Start
webpilot start
If no config exists, the first run will detect installed browsers, ask you to choose one, and generate ~/h17-webpilot/config.js.
Use webpilot start -d for an append-only session log (~/h17-webpilot/webpilot.log by default).
2. Use the tool
webpilot -c 'go example.com'
webpilot -c 'discover'
webpilot -c 'click h1'
webpilot -c 'wait h1'
webpilot -c 'html'
webpilot -c 'cookies load ./cookies.json'
Use the same loop every time: inspect, act, verify.
CLI
webpilot                      # interactive REPL
webpilot -c 'go example.com'  # single command
webpilot start                # launch browser + WS server
webpilot start -d             # launch with session logging
webpilot stop                 # stop running server
Core Commands
| Command | Description |
|---|---|
| go <url> | Navigate to a URL |
| discover | List interactive elements with handles and CSS selectors |
| q <selector> / query <selector> | Query elements by CSS selector |
| wait <selector> | Wait for a selector to appear |
| click <selector\|handleId> | Safe click on an element |
| type [selector] <text> | Type with configured profile |
| clear <selector> | Clear an input field |
| key <name> / press <name> | Send a key press |
| sd [px] [selector] / su [px] [selector] | Scroll down / scroll up |
| html | Read page HTML |
| ss | Save a screenshot; use when layout or visual rendering is the question, not DOM structure |
| cookies / cookies load <file> | Dump or load cookies |
| frames | List frames |
Examples
# navigate
webpilot -c 'go https://example.com'
# discover interactive elements
webpilot -c 'discover'
# query by CSS selector
webpilot -c 'q "button.submit"'
# click an element
webpilot -c 'click h3'
# type into an input
webpilot -c 'type "input[name=q]" hello world'
# press a key
webpilot -c 'key Enter'
# scroll down 300px
webpilot -c 'sd 300'
# read page HTML
webpilot -c 'html'
# save a screenshot
webpilot -c 'ss'
# dump cookies
webpilot -c 'cookies'
Raw Mode
For direct access to capability groups, use the raw action syntax or pass raw JSON.
Action syntax
webpilot -c 'human.click {"selector": "button[type=submit]"}'
Raw JSON
webpilot -c '{"action": "dom.getHTML", "params": {}}'
WebSocket Protocol
Connect to the WebSocket server and send JSON messages to control the browser programmatically from any language.
// Connect to ws://localhost:7331
{
  "id": "1",
  "action": "tabs.navigate",
  "params": { "url": "https://example.com" }
}
Capability groups
| Group | Description |
|---|---|
| tabs | Tab navigation and management |
| dom | DOM querying, reading, and manipulation |
| human | Human-like click, type, scroll interactions |
| cookies | Cookie dump and load |
| events | Event listening and dispatch |
| framework | Runtime and debug controls |
Node API
The Node API is a wrapper over the same WebSocket protocol.
const { startWithPage } = require('h17-webpilot');

// Inside an async function (CommonJS has no top-level await):
const { page } = await startWithPage();
Available methods
| Method | Legacy alias | Description |
|---|---|---|
| navigate(url) | goto(url) | Navigate to a URL |
| query(selector) | $(selector) | Query a single element |
| queryAll(selector) | $$(selector) | Query all matching elements |
| waitFor(selector) | waitForSelector(selector) | Wait for a selector to appear |
| read() | content() | Read the page HTML |
| click(...) | humanClick(...) | Click an element |
| type(...) | humanType(...) | Type text into an element |
| scroll(...) | humanScroll(...) | Scroll the page |
| clearInput(...) | humanClearInput(...) | Clear an input field |
| pressKey(key) | | Send a key press |
| configure(config) | setConfig(config) | Update runtime configuration |
Full example
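const { startWithPage } = require('h17-webpilot');

// CommonJS has no top-level await, so run the steps inside an async function.
async function main() {
  const { page } = await startWithPage();
  await page.navigate('https://example.com'); // act: load the page
  await page.query('h1');                     // inspect: look up the heading
  await page.click('h1');                     // act: click it
  await page.waitFor('body');                 // verify: wait for the body
}

main().catch(console.error);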
Config File
Config is loaded from ~/h17-webpilot/config.js (or config.json). Override with --config <path>.
Public config is split into two sections:
| Section | Controls |
|---|---|
| framework | Runtime behavior, debug toggles, handle retention |
| human | Cursor, click, typing, scroll, and avoid rules |
The public package exposes a lot of knobs on purpose. The user decides how much to tune. The package does not ship a strong profile.
module.exports = {
  framework: {
    debug: {
      cursor: true,
      sessionLogPath: '~/h17-webpilot/webpilot.log',
    },
  },
  human: {
    calibrated: false,
    profileName: 'public-default',
    cursor: {
      spreadRatio: 0.16,
      jitterRatio: 0,
      stutterChance: 0,
      driftThresholdPx: 0,
      overshootRatio: 0,
    },
    click: {
      thinkDelayMin: 35,
      thinkDelayMax: 90,
      maxShiftPx: 50,
    },
    type: {
      baseDelayMin: 8,
      baseDelayMax: 20,
      variance: 4,
      pauseChance: 0,
      pauseMin: 0,
      pauseMax: 0,
    },
  },
};
Human Behavior
Tune the human section to match your use case.
The human config section controls how interactions behave:
- cursor: movement speed, overshoot, path jitter
- click: pre/post delays, shift tolerance
- typing: speed, variance, pause behavior
- scroll: speed, acceleration, drift
- avoid: element avoidance rules
These defaults do not represent a human profile: typing is very fast, overshoot is off, jitter is off, drift is off. They are there to show what is configurable. The package does not ship your final values.
Boot Config
Webpilot can load state on startup from config before the user or LLM sends any commands.
module.exports = {
  browser: "/Applications/Chromium.app/Contents/MacOS/Chromium",
  boot: {
    cookiesPath: './cookies.json',
    commands: [
      'go https://hugopalma.work',
      'cookies load ./cookies.json',
      { action: 'framework.getConfig', params: {} }
    ],
  },
};
Rules:
- boot.cookiesPath loads a cookie jar before commands run
- boot.commands accepts CLI-style strings
- boot.commands also accepts raw command objects: { action, params, tabId? }
- String commands support cookies load <file> in addition to normal shorthands
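The two accepted shapes of a boot.commands entry can be told apart mechanically. The sketch below is only an illustration of the rules above, not the package's loader; `normalize_boot_command` is a hypothetical helper name, and the `kind` field is invented here for the example.

```python
from typing import Any


def normalize_boot_command(entry: Any) -> dict:
    """Classify one boot.commands entry as a CLI string or a raw command object."""
    if isinstance(entry, str):
        # CLI-style string, e.g. 'go https://example.com' or 'cookies load ./cookies.json'
        return {"kind": "cli", "text": entry.strip()}
    if isinstance(entry, dict) and "action" in entry:
        # Raw command object: { action, params, tabId? }
        raw = {"kind": "raw", "action": entry["action"], "params": entry.get("params", {})}
        if "tabId" in entry:  # tabId is optional
            raw["tabId"] = entry["tabId"]
        return raw
    raise ValueError(f"unsupported boot command: {entry!r}")
```

Rejecting anything that is neither a string nor an object with an action keeps a malformed config from failing silently at startup.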
Tested Browsers
- Chromium
- Helium
- Google Chrome
Operating Loop
Every browser interaction follows the same three-phase cycle: Inspect, Act, Verify. Do not skip the inspect step unless you already have fresh page state from the immediately preceding command.
Quoting Rule
Always double-quote the entire -c argument when it contains spaces or special characters. Single-word commands can omit quotes. Never use single quotes. Never nest quotes inside the -c argument.
webpilot -c "type #input hello world"
webpilot -c "click #submit"
webpilot -c "go https://example.com"
# single-word commands can omit quotes
webpilot -c html
webpilot -c discover
webpilot -c ss
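When the CLI is driven from a script rather than typed into a shell, one option is to pass the command as a single argv element, so no shell quoting is involved. This is a minimal sketch, assuming `webpilot` is on PATH and that `-c` takes the whole command as one argument, as in the examples above; `cli_args` and `run_command` are hypothetical helper names.

```python
import subprocess
from typing import List


def cli_args(command: str) -> List[str]:
    """Build the argv list for a single `webpilot -c` invocation."""
    # The whole command travels as one argv element, so no shell ever
    # re-splits the spaces inside it.
    return ["webpilot", "-c", command]


def run_command(command: str) -> str:
    """Run one command and return whatever the CLI printed to stdout."""
    # Assumption: the CLI exits non-zero on failure, in which case
    # check=True raises subprocess.CalledProcessError.
    done = subprocess.run(cli_args(command), capture_output=True, text=True, check=True)
    return done.stdout
```

A script would then call, for example, `run_command("discover")` before acting and `run_command("html")` afterwards to verify.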
1. Inspect
Read page state before doing anything.
| Command | Purpose |
|---|---|
| html | Read the current page DOM, title, and URL |
| discover | List interactive elements, their handles, and CSS selectors |
| q <selector> | Query specific elements by CSS selector |
| wait <selector> | Wait for a known state change |
| ss | Last resort, only when layout or visual rendering is the actual question |
2. Act
Use the safest matching action.
| Command | Purpose |
|---|---|
| click <selector\|handleId> | Safe click through the human action pipeline |
| type <cssSelector> <text> | Type with configured profile |
| clear <selector> | Clear an input field |
| key <name> | Send a key press |
| sd [px] [selector] / su [px] [selector] | Scroll down / up |
| go <url> | Navigate to a URL |
| cookies load <file> | Restore an existing session |
type selector rule: type auto-detects selectors by their first character: #, ., or [. Handle IDs (el_*) are not recognized by type and will be typed as literal text. Always use the CSS selector from discover output (e.g. #APjFqb, .search-input, [name=q]), not the handle ID.
type requires a preceding click: type is supposed to chain a click internally, but this does not always work. Always click the target element first, then type into it.
webpilot -c "click #APjFqb"
webpilot -c "type #APjFqb hello world"
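The first-character rule for type can be mirrored in a few lines, which is handy for checking a command before sending it. This is a sketch of the rule as described above, not the CLI's actual parser; `split_type_args` is a hypothetical name, and treating the first whitespace-delimited token as the selector candidate is an assumption.

```python
from typing import Optional, Tuple


def split_type_args(args: str) -> Tuple[Optional[str], str]:
    """Split the argument string of `type` into (selector, text)."""
    first, _, rest = args.partition(" ")
    # Only a leading '#', '.', or '[' marks the first token as a selector.
    if first[:1] in ("#", ".", "[") and rest:
        return first, rest
    # Anything else, including a handle ID such as el_42, is literal text.
    return None, args
```

Under this reading, `el_42 hello` has no selector at all, which matches the warning that handle IDs end up typed as text.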
click accepts both handle IDs and CSS selectors. Always discover or q immediately before interacting so handles and selectors are fresh.
If the runtime refuses an action, respect the refusal and re-inspect the page.
3. Verify
After navigation or interaction, confirm the new state.
| Command | Purpose |
|---|---|
| wait <selector> | Wait for expected element |
| url | Check the current URL |
| title | Check the page title |
| html | Re-read the DOM |
| q <selector> | Query for expected elements |
Safe Usage
- Never guess selectors when html or discover can tell you the real ones.
- Never assume a click worked. Verify it.
- Never treat { "clicked": false } as something to brute-force through.
- Never confuse DOM reading with interaction. Read first, then act.
- Re-query stale handles instead of reusing them blindly.
- Do not use eval; it hits CSP on most sites.
Strategy Notes
- Prefer html, discover, and q over eval when DOM inspection is enough.
- Use wait after page-changing actions.
- Use handle IDs for click, CSS selectors for type.
- Use screenshots when layout or visibility is the uncertainty, not HTML structure.
- If the task needs a preloaded authenticated session, load cookies first or use config boot commands.
Raw Protocol from LLM Context
If you need a protocol action that the shorthand CLI does not expose directly, send it through raw mode:
webpilot -c "dom.queryAllInfo {\"selector\": \"a[href]\"}"
webpilot -c "human.scroll {\"selector\": \".feed\", \"direction\": \"down\"}"
webpilot -c "framework.getConfig {}"
You can also send a full JSON message:
webpilot -c "{\"action\": \"tabs.navigate\", \"params\": {\"url\": \"https://example.com\"}}"
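Hand-escaping the JSON in these commands is error-prone, so a script can build the command text with `json.dumps` instead. This is a minimal sketch; `raw_action` and `raw_message` are hypothetical helper names, and the resulting string is meant to be passed as the single `-c` argument.

```python
import json
from typing import Optional


def raw_action(action: str, params: Optional[dict] = None) -> str:
    """Build the raw action syntax: '<action> <json params>'."""
    return f"{action} {json.dumps(params or {})}"


def raw_message(action: str, params: Optional[dict] = None) -> str:
    """Build the full JSON message form accepted by raw mode."""
    return json.dumps({"action": action, "params": params or {}})
```

For example, `raw_action("human.scroll", {"selector": ".feed", "direction": "down"})` yields the same text as the second command above, without the backslashes.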
MCP Server
An MCP adapter is available for environments that support the Model Context Protocol (e.g. Claude Desktop, Cursor, Windsurf). The MCP server connects to the same WebSocket runtime at ws://localhost:7331.
Run webpilot start, then the MCP tools become available in the host application automatically.
Claude Desktop Config
Add the following to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"webpilot": {
"command": "npx",
"args": ["webpilot-mcp"]
}
}
}
Connection
Webpilot exposes browser control over WebSocket. The protocol tells the browser what to do. It does not decide the task; the user or LLM chooses the sequence of actions.
WebSocket: ws://localhost:7331
The local server listens on this port. The extension connects on launch. Your client connects to the same server.
Runtime Config
The extension runtime exposes configuration controls:
| Action | Params | Returns |
|---|---|---|
| framework.setConfig | { config: { handles?, debug? } } | { ok: true, framework } |
| framework.getConfig | {} | { framework, version } |
| framework.reload | {} | { reloading: true } |
Normal usage: Node loads ~/h17-webpilot/config.js (or path passed via --config), the server injects framework and human settings into commands, and clients can still call framework.setConfig and framework.getConfig directly.
The public package ships example defaults. They do not represent a human profile, and advanced cursor tuning or realistic timing still depends on user configuration.
Message Format
Request
{
"id": "unique-string",
"tabId": 123,
"action": "action.name",
"params": {}
}
- id: correlate responses to requests
- tabId: optional target tab
- action: protocol action name
- params: action-specific payload
Response
// success
{ "id": "same-id", "result": {} }

// error
{ "id": "same-id", "error": "error message" }
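The request and response shapes translate into two small helpers that need no network connection to try out. This is a sketch under the message format above; `make_request` and `unwrap` are hypothetical names, and using a UUID as the id is a choice made here, since any unique string satisfies the format.

```python
import json
import uuid
from typing import Any, Optional


def make_request(action: str, params: Optional[dict] = None, tab_id: Optional[int] = None) -> dict:
    """Build a request dict with a fresh id; tabId is added only when given."""
    msg = {"id": str(uuid.uuid4()), "action": action, "params": params or {}}
    if tab_id is not None:
        msg["tabId"] = tab_id
    return msg


def unwrap(request: dict, raw_response: str) -> Any:
    """Return the result of the response matching `request`, or raise on error."""
    resp = json.loads(raw_response)
    if resp.get("id") != request["id"]:
        raise LookupError("response id does not match request id")
    if "error" in resp:
        raise RuntimeError(resp["error"])
    return resp.get("result")
```

Checking the id before reading the result matters once several requests are in flight on one connection.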
Keepalive
The server sends { "type": "ping" } every 20 seconds. The extension responds with { "type": "pong" }.
Handles
Many DOM commands return handleId values like el_42.
- Created by dom.querySelector, dom.querySelectorAll, dom.waitForSelector
- Used by dom.click, dom.boundingBox, human.click, and related commands
- Stored with WeakRef and cleaned up after TTL or GC
- If both handleId and selector are provided, handleId wins
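The precedence rule in the last bullet can be written down directly, for instance to log which target a request will actually hit. This sketch only mirrors the documented rule on the client side; `resolve_target` is a hypothetical name and not part of the protocol.

```python
from typing import Tuple


def resolve_target(params: dict) -> Tuple[str, str]:
    """Return which field addresses the element, as (field name, value)."""
    # handleId wins whenever both fields are present.
    if params.get("handleId"):
        return "handleId", params["handleId"]
    if params.get("selector"):
        return "selector", params["selector"]
    raise ValueError("params need a handleId or a selector")
```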
Tabs Actions
| Action | Params | Returns |
|---|---|---|
| tabs.list | {} | [{ id, url, title, active, windowId, index }] |
| tabs.navigate | { url } | { success: true } |
| tabs.create | { url? } | { id, url, title } |
| tabs.close | {} | { success: true } |
| tabs.activate | {} | { success: true } |
| tabs.reload | {} | { success: true } |
| tabs.waitForNavigation | { timeout? } | { success: true } |
| tabs.setViewport | { width, height } | { success: true } |
| tabs.screenshot | { fullPage? } | { dataUrl } |
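tabs.screenshot returns a dataUrl rather than a file, so a client has to decode it before saving. The sketch below assumes the value is a base64 `data:` URL (for example `data:image/png;base64,...`), which the field name suggests but the table does not state; `decode_data_url` is a hypothetical helper.

```python
import base64
from typing import Tuple


def decode_data_url(data_url: str) -> Tuple[str, bytes]:
    """Split a base64 data URL into (mime type, decoded bytes)."""
    header, _, payload = data_url.partition(",")
    # Assumption: the header looks like 'data:<mime>;base64'.
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)
```

The returned bytes can be written to disk with `open(path, "wb")`.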
Frames Actions
| Action | Params | Returns |
|---|---|---|
| frames.list | {} | [{ frameId, parentFrameId, url }] |
Cookies Actions
| Action | Params | Returns |
|---|---|---|
| cookies.getAll | { url? } | [{ name, value, domain, ... }] |
| cookies.set | { cookie: { name, value, domain?, path?, secure?, httpOnly?, sameSite?, expires? } } | { success: true } |
DOM Actions
dom.click uses the same safe interaction pipeline as human.click. Other dom.* commands are direct DOM/runtime operations.
| Action | Params | Returns |
|---|---|---|
| dom.querySelector | { selector } | handleId or null |
| dom.querySelectorAll | { selector } | [handleId, ...] |
| dom.querySelectorWithin | { parentHandleId, selector } | handleId or null |
| dom.querySelectorAllWithin | { parentHandleId, selector } | [handleId, ...] |
| dom.waitForSelector | { selector, timeout? } | handleId or null |
| dom.boundingBox | { handleId \| selector } | { x, y, width, height } or null |
| dom.click | { handleId \| selector, clickCount?, avoid? } | { clicked: true } or { clicked: false, reason } |
| dom.mouseMoveTo | { handleId \| selector } | { x, y } |
| dom.focus | { handleId \| selector } | { focused: true } |
| dom.type | { text, handleId?, selector? } | { typed: true } |
| dom.keyPress | { key } | { pressed: true } |
| dom.keyDown | { key } | { down: true } |
| dom.keyUp | { key } | { up: true } |
| dom.scroll | { handleId? \| selector?, direction?, amount?, behavior? } | { scrolled: true, before, after, target } |
| dom.setValue | { handleId \| selector, value } | { set: true } |
| dom.getAttribute | { handleId \| selector, name } | string or null |
| dom.getProperty | { handleId \| selector, name } | any |
| dom.evaluate | { fn, args? } | any |
| dom.elementEvaluate | { handleId, fn, args? } | any |
| dom.evaluateHandle | { fn, args?, elementMarkers? } | { type, handleId?, value?, properties? } |
| dom.getHTML | {} | { html, title, url } |
| dom.elementHTML | { handleId, limit? } | { outer, inner, tag } |
| dom.queryAllInfo | { selector } | [{ handleId, tag, id, cls, text, label }] |
| dom.batchQuery | { selectors: [...] } | { [selector]: boolean } |
| dom.findScrollable | {} | [{ handleId, tag, id, cls, overflowY, overflow, scrollHeight, clientHeight, children, text }] |
| dom.discoverElements | {} | { elements, cursor, viewport, scrollY } |
| dom.setDebug | { enabled } | { debug: boolean } |
Keyboard Names
Valid key names for dom.keyPress, dom.keyDown, dom.keyUp:
Meta, Control, Shift, Alt, Enter, Tab, Escape, Backspace, Delete, Space, ArrowUp, ArrowDown, ArrowLeft, ArrowRight, Home, End, PageUp, PageDown, single characters, or code forms like KeyA and Digit5.
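A client can reject obviously wrong key names before sending them. The check below is a sketch built from the list above; restricting code forms to `Key` plus one uppercase letter and `Digit` plus one digit is an assumption, since the text only gives KeyA and Digit5 as examples. `is_valid_key` is a hypothetical name.

```python
import re

NAMED_KEYS = {
    "Meta", "Control", "Shift", "Alt", "Enter", "Tab", "Escape",
    "Backspace", "Delete", "Space", "ArrowUp", "ArrowDown",
    "ArrowLeft", "ArrowRight", "Home", "End", "PageUp", "PageDown",
}
# Assumed shape of the code forms: KeyA..KeyZ and Digit0..Digit9.
CODE_FORM = re.compile(r"Key[A-Z]|Digit[0-9]")


def is_valid_key(name: str) -> bool:
    """True for a named key, a single character, or an assumed code form."""
    return name in NAMED_KEYS or len(name) == 1 or CODE_FORM.fullmatch(name) is not None
```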
Human Actions
Human commands add safety checks and use injected human.* config. Public config sections: human.cursor, human.click, human.type, human.scroll, human.avoid.
human.click
{
"action": "human.click",
"params": {
"handleId": "el_42",
"avoid": {
"selectors": [".premium-upsell"],
"classes": ["sponsored"],
"ids": ["popup-cta"],
"attributes": { "data-ad": "*" }
}
}
}
Returns { "clicked": true } on success, or { "clicked": false, "reason": "...", "detail": "..." } when blocked.
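A blocked click is a signal to look at the page again, not to retry. The sketch below maps a human.click result onto the next step of the inspect, act, verify loop; `next_step` is a hypothetical helper and the returned labels are invented for this example.

```python
def next_step(result: dict) -> str:
    """Decide what to do after a human.click result."""
    if result.get("clicked") is True:
        # The click went through; confirm the new page state next.
        return "verify"
    # Blocked: keep the reason for logging and go back to inspecting.
    reason = result.get("reason", "unknown")
    return f"re-inspect ({reason})"
```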
Built-in Safety Checks
- Avoid rules
- aria-hidden
- Missing offsetParent
- Honeypot class patterns
- opacity: 0
- visibility: hidden
- Sub-pixel size
- Missing bounding box
- Scroll into view
- Optional drift-away behavior from human.cursor
- Bezier path using human.cursor
- Think delay from human.click
- Disappearance check
- Shift check
- mousedown → mouseup → click at actual cursor coordinates
Public cursor defaults are minimal: overshootRatio: 0, jitterRatio: 0, stutterChance: 0, driftThresholdPx: 0.
human.type
{
"action": "human.type",
"params": {
"text": "Hello world",
"selector": "#search-input"
}
}
Returns { "typed": true } or { "typed": false, "reason": "avoided" }. Typing cadence comes from human.type config. Public defaults are very fast: baseDelayMin: 8, baseDelayMax: 20, pauseChance: 0.
human.scroll
{
"action": "human.scroll",
"params": {
"handleId": "el_7",
"direction": "down"
}
}
Accepts handleId, selector, or neither. Returns { "scrolled": true, "amount": 487 }.
human.clearInput
{
"action": "human.clearInput",
"params": { "selector": "#email-input" }
}
Returns { "cleared": true } or a click failure response. Behavior: safe click to focus, triple-click to select, backspace/delete sequence.
Avoid Rules
All human.* commands accept an avoid object. Per-request avoid merges with global human.avoid config.
{
"avoid": {
"selectors": [".cookie-banner", "#popup button"],
"classes": ["sponsored", "ad-slot"],
"ids": ["newsletter-signup"],
"attributes": { "data-ad": "*", "data-tracking": "*" }
}
}
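The text says per-request rules merge with the global ones but does not spell out how. The sketch below assumes the simplest reading: list fields are unioned in order, and attribute maps are combined with the per-request value winning on conflict. `merge_avoid` is a hypothetical name and the runtime may merge differently.

```python
def merge_avoid(global_avoid: dict, request_avoid: dict) -> dict:
    """Combine global and per-request avoid rules (assumed union semantics)."""
    merged = {}
    for key in ("selectors", "classes", "ids"):
        combined = list(global_avoid.get(key, [])) + list(request_avoid.get(key, []))
        # dict.fromkeys drops duplicates while keeping first-seen order.
        merged[key] = list(dict.fromkeys(combined))
    # Assumption: a per-request attribute rule overrides the global one.
    merged["attributes"] = {
        **global_avoid.get("attributes", {}),
        **request_avoid.get("attributes", {}),
    }
    return merged
```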
Events
The server pushes events over the same WebSocket connection.
response
{
"type": "event",
"event": "response",
"data": {
"url": "https://...",
"status": 200,
"tabId": 123,
"method": "GET"
}
}
urlChanged
{
"type": "event",
"event": "urlChanged",
"data": {
"tabId": 123,
"url": "https://..."
}
}
cookiesChanged
{
"type": "event",
"event": "cookiesChanged",
"data": {
"cookies": [...],
"count": 42
}
}
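Because events arrive on the same connection as responses, a client needs to sort incoming messages before acting on them. The sketch below classifies one raw message by the fields shown above; `classify` is a hypothetical name, and whether clients ever see the ping message is an assumption, so that branch is only defensive.

```python
import json
from typing import Any, Optional, Tuple


def classify(raw: str) -> Tuple[str, Optional[str], Any]:
    """Return (kind, key, payload) for one incoming WebSocket message."""
    msg = json.loads(raw)
    if msg.get("type") == "event":
        # Pushed event: key is the event name, payload is its data.
        return "event", msg.get("event"), msg.get("data")
    if msg.get("type") == "ping":
        return "ping", None, None
    if "id" in msg:
        # Response to a request: key is the request id.
        return "response", msg["id"], msg
    return "unknown", None, msg
```

A receive loop can route `response` messages to the waiting caller by id and hand `event` messages to listeners.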
CSP
Two execution contexts exist: the ISOLATED world (DOM-safe, CSP-safe) and the MAIN world (page globals, may be blocked by CSP).
Commands that work under CSP
- dom.querySelector
- dom.querySelectorAll
- dom.getHTML
- human.click
- human.type
- human.scroll
- dom.click
dom.evaluate and dom.elementEvaluate try MAIN first and fall back when possible.
Python Client Example
import asyncio
import json
import uuid

import websockets  # third-party: pip install websockets


async def main():
    async with websockets.connect('ws://localhost:7331') as ws:

        async def send(action, params=None, tab_id=None):
            """Send one request and return the result of the matching response."""
            msg = {"id": str(uuid.uuid4()), "action": action, "params": params or {}}
            if tab_id is not None:
                msg["tabId"] = tab_id
            await ws.send(json.dumps(msg))
            while True:
                resp = json.loads(await ws.recv())
                # Pushed events share the socket; skip anything that is not our reply.
                if resp.get("id") != msg["id"]:
                    continue
                if "error" in resp:
                    raise RuntimeError(resp["error"])
                return resp["result"]

        tabs = await send("tabs.list")
        tab = tabs[0]["id"]
        await send("tabs.navigate", {"url": "https://example.com"}, tab)
        handle = await send("dom.querySelector", {"selector": "h1"}, tab)
        print(await send("human.click", {"handleId": handle}, tab))
        await send("human.type", {"text": "Hello", "selector": "#input"}, tab)


asyncio.run(main())
Limits
- Defaults are for demonstration and development, not for behavior parity
- The browser tool does not decide workflows
- The user or LLM still has to choose selectors, waits, retries, and verification steps
- dom.evaluate may hit CSP restrictions on some sites. DOM reading and interaction still work through the isolated content-script path.
License
Apache 2.0