Webpilot
v0.5.1

A browser tool. Real Chromium, DOM-first, WebSocket protocol.

Webpilot launches a real Chromium-based browser with a local extension runtime, exposes a WebSocket protocol, and lets a user, script, or LLM drive that browser through the same command surface.

The primary interface is the live DOM, not screenshots. discover, html, and q give you real page structure, real selectors, and real handles. Screenshots exist as a fallback for when layout or visual rendering is the actual question. For everything else, read the DOM.

What Webpilot does

What Webpilot does not do

Install

bash
npm install -g h17-webpilot

Quick Start

1. Start

bash
webpilot start

If no config exists, the first run will detect installed browsers, ask you to choose one, and generate ~/h17-webpilot/config.js.

Use webpilot start -d for an append-only session log (~/h17-webpilot/webpilot.log by default).

2. Use the tool

bash
webpilot -c 'go example.com'
webpilot -c 'discover'
webpilot -c 'click h1'
webpilot -c 'wait h1'
webpilot -c 'html'
webpilot -c 'cookies load ./cookies.json'

Use the same loop every time: inspect, act, verify.

CLI

bash
webpilot                        # interactive REPL
webpilot -c 'go example.com'   # single command
webpilot start                  # launch browser + WS server
webpilot start -d               # launch with session logging
webpilot stop                   # stop running server

Core Commands

Command                                    Description
go <url>                                   Navigate to a URL
discover                                   List interactive elements with handles and CSS selectors
q <selector> / query <selector>            Query elements by CSS selector
wait <selector>                            Wait for a selector to appear
click <selector|handleId>                  Safe click on an element
type [selector] <text>                     Type with configured profile
clear <selector>                           Clear an input field
key <name> / press <name>                  Send a key press
sd [px] [selector] / su [px] [selector]    Scroll down / scroll up
html                                       Read page HTML
ss                                         Save a screenshot; use when layout or visual rendering is the question, not DOM structure
cookies / cookies load <file>              Dump or load cookies
frames                                     List frames

Examples

bash
# navigate
webpilot -c 'go https://example.com'

# discover interactive elements
webpilot -c 'discover'

# query by CSS selector
webpilot -c 'q "button.submit"'

# click an element
webpilot -c 'click h3'

# type into an input
webpilot -c 'type "input[name=q]" hello world'

# press a key
webpilot -c 'key Enter'

# scroll down 300px
webpilot -c 'sd 300'

# read page HTML
webpilot -c 'html'

# save a screenshot
webpilot -c 'ss'

# dump cookies
webpilot -c 'cookies'

Raw Mode

For direct access to capability groups, use the raw action syntax or pass raw JSON.

Action syntax

bash
webpilot -c 'human.click {"selector": "button[type=submit]"}'

Raw JSON

bash
webpilot -c '{"action": "dom.getHTML", "params": {}}'

WebSocket Protocol

Connect to the WebSocket server and send JSON messages to control the browser programmatically from any language.

json
// Connect to ws://localhost:7331

{
  "id": "1",
  "action": "tabs.navigate",
  "params": {
    "url": "https://example.com"
  }
}

Capability groups

Group        Description
tabs         Tab navigation and management
dom          DOM querying, reading, and manipulation
human        Human-like click, type, scroll interactions
cookies      Cookie dump and load
events       Event listening and dispatch
framework    Runtime and debug controls

Node API

The Node API is a wrapper over the same WebSocket protocol.

javascript
const { startWithPage } = require('h17-webpilot');
// Run this inside an async function: CommonJS modules do not support top-level await.
const { page } = await startWithPage();

Available methods

Method                Legacy alias                 Description
navigate(url)         goto(url)                    Navigate to a URL
query(selector)       $(selector)                  Query a single element
queryAll(selector)    $$(selector)                 Query all matching elements
waitFor(selector)     waitForSelector(selector)    Wait for a selector to appear
read()                content()                    Read the page HTML
click(...)            humanClick(...)              Click an element
type(...)             humanType(...)               Type text into an element
scroll(...)           humanScroll(...)             Scroll the page
clearInput(...)       humanClearInput(...)         Clear an input field
pressKey(key)         -                            Send a key press
configure(config)     setConfig(config)            Update runtime configuration

Full example

javascript
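const { startWithPage } = require('h17-webpilot');

// CommonJS has no top-level await, so the calls run inside an async function.
async function main() {
  const { page } = await startWithPage();
  await page.navigate('https://example.com');
  await page.query('h1');      // inspect
  await page.click('h1');      // act
  await page.waitFor('body');  // verify
}

main().catch(console.error);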

Config File

Config is loaded from ~/h17-webpilot/config.js (or config.json). Override with --config <path>.

Public config is split into two sections:

Section      Controls
framework    Runtime behavior, debug toggles, handle retention
human        Cursor, click, typing, scroll, and avoid rules

The public package exposes a lot of knobs on purpose. The user decides how much to tune. The package does not ship a strong profile.

javascript
module.exports = {
  framework: {
    debug: {
      cursor: true,
      sessionLogPath: '~/h17-webpilot/webpilot.log',
    },
  },
  human: {
    calibrated: false,
    profileName: 'public-default',
    cursor: {
      spreadRatio: 0.16,
      jitterRatio: 0,
      stutterChance: 0,
      driftThresholdPx: 0,
      overshootRatio: 0,
    },
    click: {
      thinkDelayMin: 35,
      thinkDelayMax: 90,
      maxShiftPx: 50,
    },
    type: {
      baseDelayMin: 8,
      baseDelayMax: 20,
      variance: 4,
      pauseChance: 0,
      pauseMin: 0,
      pauseMax: 0,
    },
  },
};

Human Behavior

Defaults are intentionally non-human: typing is very fast, and overshoot, jitter, and drift are all off. They exist to show what is configurable, not to provide behavior parity, and the package does not ship your final values.

The human config section controls how interactions behave: cursor movement, clicks, typing, scrolling, and avoid rules. Configure it to match your use case.

Boot Config

Webpilot can load state on startup from config before the user or LLM sends any commands.

javascript
module.exports = {
  browser: "/Applications/Chromium.app/Contents/MacOS/Chromium",
  boot: {
    cookiesPath: './cookies.json',
    commands: [
      'go https://hugopalma.work',
      'cookies load ./cookies.json',
      { action: 'framework.getConfig', params: {} }
    ],
  },
};

Rules:

Tested Browsers

Operating Loop

Every browser interaction follows the same three-phase cycle: Inspect, Act, Verify. Do not skip the inspect step unless you already have fresh page state from the immediately preceding command.

Quoting Rule

Always double-quote the entire -c argument when it contains spaces or special characters. Single-word commands can omit quotes. Never use single quotes. Never nest unescaped quotes inside the -c argument; when raw mode needs JSON, escape the inner double quotes as \".

bash
webpilot -c "type #input hello world"
webpilot -c "click #submit"
webpilot -c "go https://example.com"

# single-word commands can omit quotes
webpilot -c html
webpilot -c discover
webpilot -c ss
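
When a script launches the CLI instead of an interactive shell, the quoting rule can be sidestepped by handing the whole command to -c as a single argv element. The following is a minimal sketch, assuming webpilot is on PATH; build_argv and run_command are hypothetical helper names, not part of the package.

```python
import subprocess


def build_argv(command: str) -> list:
    """Return an argv list that passes `command` to -c as one argument."""
    return ["webpilot", "-c", command]


def run_command(command: str) -> subprocess.CompletedProcess:
    """Run one CLI command without a shell, so no shell quoting is involved."""
    return subprocess.run(build_argv(command), capture_output=True, text=True, check=False)


# Only the argv is printed here; run_command needs a running Webpilot server.
print(build_argv("type #input hello world"))
```

Because no shell parses the list, spaces and # characters in the command need no extra quoting.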

1. Inspect

Read page state before doing anything.

Command            Purpose
html               Read the current page DOM, title, and URL
discover           List interactive elements, their handles, and CSS selectors
q <selector>       Query specific elements by CSS selector
wait <selector>    Wait for a known state change
ss                 Last resort, only when layout or visual rendering is the actual question

2. Act

Use the safest matching action.

Command                                    Purpose
click <selector|handleId>                  Safe click through the human action pipeline
type <cssSelector> <text>                  Type with configured profile
clear <selector>                           Clear an input field
key <name>                                 Send a key press
sd [px] [selector] / su [px] [selector]    Scroll down / up
go <url>                                   Navigate to a URL
cookies load <file>                        Restore an existing session

type selector rule: type auto-detects selectors by their first character: #, ., or [. Handle IDs (el_*) are not recognized by type and will be typed as literal text. Always use the CSS selector from discover output (e.g. #APjFqb, .search-input, [name=q]), not the handle ID.

type requires a preceding click: type is supposed to chain a click internally, but this does not always work. Always click the target element first, then type into it.

bash
webpilot -c "click #APjFqb"
webpilot -c "type #APjFqb hello world"
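
The first-character rule for type can be pictured with a few lines of Python. This is only an illustration of the rule as described above, not the CLI's actual parser; split_type_args is a hypothetical name, and the sketch assumes the selector contains no spaces.

```python
SELECTOR_PREFIXES = ("#", ".", "[")


def split_type_args(args: str):
    """Split the arguments of `type` into (selector, text) by the first-character rule."""
    first, _, rest = args.partition(" ")
    if first.startswith(SELECTOR_PREFIXES):
        return first, rest
    # No recognized prefix: everything is treated as literal text.
    return None, args


print(split_type_args("#APjFqb hello world"))  # ('#APjFqb', 'hello world')
print(split_type_args("el_42 hello"))          # (None, 'el_42 hello')
```

The second call shows why a handle ID ends up typed into the page: it does not start with #, ., or [.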

click accepts both handle IDs and CSS selectors. Always discover or q immediately before interacting so handles and selectors are fresh.

If the runtime refuses an action, respect the refusal and re-inspect the page.

3. Verify

After navigation or interaction, confirm the new state.

Command            Purpose
wait <selector>    Wait for expected element
url                Check the current URL
title              Check the page title
html               Re-read the DOM
q <selector>       Query for expected elements
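
The three phases can be strung together in one helper. This is a sketch under assumptions: send is a caller-supplied coroutine that performs one protocol round trip (for example over the WebSocket), click_and_verify and make_fake_send are hypothetical names, and the fake transport stands in for a live browser.

```python
import asyncio


async def click_and_verify(send, selector, expect_selector, tab_id=None):
    """Run one inspect-act-verify cycle through a caller-supplied `send`."""
    # Inspect: confirm the target exists right now.
    handle = await send("dom.querySelector", {"selector": selector}, tab_id)
    if handle is None:
        return {"ok": False, "stage": "inspect"}
    # Act: go through the safe click pipeline and respect a refusal.
    clicked = await send("human.click", {"handleId": handle}, tab_id)
    if not clicked.get("clicked"):
        return {"ok": False, "stage": "act", "reason": clicked.get("reason")}
    # Verify: wait for the element that proves the click had an effect.
    found = await send("dom.waitForSelector", {"selector": expect_selector}, tab_id)
    return {"ok": found is not None, "stage": "verify"}


def make_fake_send(click_result):
    """Build a stand-in transport with canned replies, for demonstration only."""
    async def fake_send(action, params, tab_id=None):
        replies = {
            "dom.querySelector": "el_1",
            "human.click": click_result,
            "dom.waitForSelector": "el_2",
        }
        return replies[action]
    return fake_send


print(asyncio.run(click_and_verify(make_fake_send({"clicked": True}), "#submit", ".done")))
```

When the click is refused, the helper stops at the act stage and returns the reason instead of retrying, which leaves the caller free to re-inspect the page.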

Safe Usage

These rules apply to all LLM-driven browser sessions. Violating them leads to incorrect actions, wasted retries, and broken flows.
  1. Never guess selectors when html or discover can tell you the real ones.
  2. Never assume a click worked. Verify it.
  3. Never treat { "clicked": false } as something to brute-force through.
  4. Never confuse DOM reading with interaction. Read first, then act.
  5. Re-query stale handles instead of reusing them blindly.
  6. Do not use eval; it hits CSP on most sites.

Strategy Notes

Raw Protocol from LLM Context

If you need a protocol action that the shorthand CLI does not expose directly, send it through raw mode:

bash
webpilot -c "dom.queryAllInfo {\"selector\": \"a[href]\"}"
webpilot -c "human.scroll {\"selector\": \".feed\", \"direction\": \"down\"}"
webpilot -c "framework.getConfig {}"

You can also send a full JSON message:

bash
webpilot -c "{\"action\": \"tabs.navigate\", \"params\": {\"url\": \"https://example.com\"}}"
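
If the raw command is assembled by a program, letting a JSON encoder produce the payload avoids hand-escaping quotes. A minimal sketch; raw_action and raw_message are hypothetical helper names, and the output is the string that would be passed as the -c argument.

```python
import json
from typing import Optional


def raw_action(action: str, params: Optional[dict] = None) -> str:
    """Build the `action {json}` form of a raw command."""
    return f"{action} {json.dumps(params or {})}"


def raw_message(action: str, params: Optional[dict] = None) -> str:
    """Build the full JSON message form of a raw command."""
    return json.dumps({"action": action, "params": params or {}})


print(raw_action("dom.queryAllInfo", {"selector": "a[href]"}))
print(raw_message("tabs.navigate", {"url": "https://example.com"}))
```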

MCP Server

An MCP adapter is available for environments that support the Model Context Protocol (e.g. Claude Desktop, Cursor, Windsurf). The MCP server connects to the same WebSocket runtime at ws://localhost:7331.

Start the runtime first with webpilot start. The MCP tools then become available in the host application automatically.

Claude Desktop Config

Add the following to ~/Library/Application Support/Claude/claude_desktop_config.json:

json
{
  "mcpServers": {
    "webpilot": {
      "command": "npx",
      "args": ["webpilot-mcp"]
    }
  }
}

Connection

Webpilot exposes browser control over WebSocket. The protocol tells the browser what to do. It does not decide the task; the user or LLM chooses the sequence of actions.

text
WebSocket: ws://localhost:7331

The local server listens on this port. The extension connects on launch. Your client connects to the same server.

Runtime Config

The extension runtime exposes configuration controls:

Action                 Params                              Returns
framework.setConfig    { config: { handles?, debug? } }    { ok: true, framework }
framework.getConfig    {}                                  { framework, version }
framework.reload       {}                                  { reloading: true }

Normal usage: Node loads ~/h17-webpilot/config.js (or path passed via --config), the server injects framework and human settings into commands, and clients can still call framework.setConfig and framework.getConfig directly.

The public package ships example defaults. They do not represent a human profile, and advanced cursor tuning or realistic timing still depends on user configuration.

Message Format

Request

json
{
  "id": "unique-string",
  "tabId": 123,
  "action": "action.name",
  "params": {}
}

Response

json
// success
{ "id": "same-id", "result": {} }

// error
{ "id": "same-id", "error": "error message" }
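
The request and response shapes above map onto two small helpers. This is a sketch, not part of the package: build_request and unwrap_response are hypothetical names, and the only behavior assumed is what the message format states (matching id, result on success, error on failure).

```python
import json
import uuid


def build_request(action, params=None, tab_id=None):
    """Build a request dict in the documented message format."""
    msg = {"id": str(uuid.uuid4()), "action": action, "params": params or {}}
    if tab_id is not None:
        msg["tabId"] = tab_id
    return msg


def unwrap_response(raw: str, expected_id: str):
    """Return the result of a response frame, raising if it carries an error."""
    resp = json.loads(raw)
    if resp.get("id") != expected_id:
        raise ValueError("response id does not match request id")
    if "error" in resp:
        raise RuntimeError(resp["error"])
    return resp.get("result")


request = build_request("tabs.navigate", {"url": "https://example.com"}, tab_id=123)
reply = json.dumps({"id": request["id"], "result": {"success": True}})
print(unwrap_response(reply, request["id"]))  # {'success': True}
```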

Keepalive

The server sends { "type": "ping" } every 20 seconds. The extension responds with { "type": "pong" }.

Handles

Many DOM commands return handleId values like el_42.

Tabs Actions

Action                    Params               Returns
tabs.list                 {}                   [{ id, url, title, active, windowId, index }]
tabs.navigate             { url }              { success: true }
tabs.create               { url? }             { id, url, title }
tabs.close                {}                   { success: true }
tabs.activate             {}                   { success: true }
tabs.reload               {}                   { success: true }
tabs.waitForNavigation    { timeout? }         { success: true }
tabs.setViewport          { width, height }    { success: true }
tabs.screenshot           { fullPage? }        { dataUrl }
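
tabs.screenshot returns a dataUrl rather than a file. The sketch below shows one way to turn it into bytes, assuming the value is a standard base64 data URL such as data:image/png;base64,...; that encoding is an assumption, and decode_data_url is a hypothetical helper.

```python
import base64


def decode_data_url(data_url: str):
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)


# Tiny stand-in payload; a real screenshot would carry image bytes.
mime, data = decode_data_url("data:image/png;base64,aGk=")
print(mime, data)  # image/png b'hi'
```

The returned bytes can be written to disk with open(path, "wb").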

Frames Actions

Action         Params    Returns
frames.list    {}        [{ frameId, parentFrameId, url }]

Cookies Actions

Action            Params                                                                                  Returns
cookies.getAll    { url? }                                                                                [{ name, value, domain, ... }]
cookies.set       { cookie: { name, value, domain?, path?, secure?, httpOnly?, sameSite?, expires? } }    { success: true }
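
A saved cookie dump can be replayed as a series of cookies.set requests. This sketch assumes the dump is a list of objects using the field names listed for cookies.set; the real cookies.json layout is not specified here, and cookie_set_messages is a hypothetical helper.

```python
ALLOWED_FIELDS = ("name", "value", "domain", "path", "secure", "httpOnly", "sameSite", "expires")


def cookie_set_messages(cookies):
    """Turn a list of cookie dicts into cookies.set request messages."""
    messages = []
    for index, cookie in enumerate(cookies):
        if "name" not in cookie or "value" not in cookie:
            continue  # name and value are the only non-optional fields
        params = {key: cookie[key] for key in ALLOWED_FIELDS if key in cookie}
        messages.append({"id": f"cookie-{index}", "action": "cookies.set", "params": {"cookie": params}})
    return messages


dump = [{"name": "sid", "value": "abc", "domain": "example.com", "hostOnly": True}]
print(cookie_set_messages(dump))
```

Fields outside the documented parameter list (hostOnly in the example) are dropped before sending.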

DOM Actions

dom.click uses the same safe interaction pipeline as human.click. Other dom.* commands are direct DOM/runtime operations.

Action                        Params                                                       Returns
dom.querySelector             { selector }                                                 handleId or null
dom.querySelectorAll          { selector }                                                 [handleId, ...]
dom.querySelectorWithin       { parentHandleId, selector }                                 handleId or null
dom.querySelectorAllWithin    { parentHandleId, selector }                                 [handleId, ...]
dom.waitForSelector           { selector, timeout? }                                       handleId or null
dom.boundingBox               { handleId | selector }                                      { x, y, width, height } or null
dom.click                     { handleId | selector, clickCount?, avoid? }                 { clicked: true } or { clicked: false, reason }
dom.mouseMoveTo               { handleId | selector }                                      { x, y }
dom.focus                     { handleId | selector }                                      { focused: true }
dom.type                      { text, handleId?, selector? }                               { typed: true }
dom.keyPress                  { key }                                                      { pressed: true }
dom.keyDown                   { key }                                                      { down: true }
dom.keyUp                     { key }                                                      { up: true }
dom.scroll                    { handleId? | selector?, direction?, amount?, behavior? }    { scrolled: true, before, after, target }
dom.setValue                  { handleId | selector, value }                               { set: true }
dom.getAttribute              { handleId | selector, name }                                string or null
dom.getProperty               { handleId | selector, name }                                any
dom.evaluate                  { fn, args? }                                                any
dom.elementEvaluate           { handleId, fn, args? }                                      any
dom.evaluateHandle            { fn, args?, elementMarkers? }                               { type, handleId?, value?, properties? }
dom.getHTML                   {}                                                           { html, title, url }
dom.elementHTML               { handleId, limit? }                                         { outer, inner, tag }
dom.queryAllInfo              { selector }                                                 [{ handleId, tag, id, cls, text, label }]
dom.batchQuery                { selectors: [...] }                                         { [selector]: boolean }
dom.findScrollable            {}                                                           [{ handleId, tag, id, cls, overflowY, overflow, scrollHeight, clientHeight, children, text }]
dom.discoverElements          {}                                                           { elements, cursor, viewport, scrollY }
dom.setDebug                  { enabled }                                                  { debug: boolean }

Keyboard Names

Valid key names for dom.keyPress, dom.keyDown, dom.keyUp:

Meta, Control, Shift, Alt, Enter, Tab, Escape, Backspace, Delete, Space, ArrowUp, ArrowDown, ArrowLeft, ArrowRight, Home, End, PageUp, PageDown, single characters, or code forms like KeyA and Digit5.
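
A client can check key names before sending them. The sketch below encodes the list above; it assumes the code forms are limited to KeyA through KeyZ and Digit0 through Digit9, which may be narrower than what the runtime accepts, and is_valid_key is a hypothetical helper.

```python
import re

NAMED_KEYS = {
    "Meta", "Control", "Shift", "Alt", "Enter", "Tab", "Escape", "Backspace",
    "Delete", "Space", "ArrowUp", "ArrowDown", "ArrowLeft", "ArrowRight",
    "Home", "End", "PageUp", "PageDown",
}
CODE_FORM = re.compile(r"^(Key[A-Z]|Digit[0-9])$")


def is_valid_key(name: str) -> bool:
    """Return True for a named key, a single character, or a KeyA/Digit5 style code."""
    return name in NAMED_KEYS or len(name) == 1 or bool(CODE_FORM.match(name))


print(is_valid_key("Enter"), is_valid_key("KeyA"), is_valid_key("Return"))  # True True False
```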

Human Actions

Human commands add safety checks and use injected human.* config. Public config sections: human.cursor, human.click, human.type, human.scroll, human.avoid.

human.click

json
{
  "action": "human.click",
  "params": {
    "handleId": "el_42",
    "avoid": {
      "selectors": [".premium-upsell"],
      "classes": ["sponsored"],
      "ids": ["popup-cta"],
      "attributes": { "data-ad": "*" }
    }
  }
}

Returns { "clicked": true } on success, or { "clicked": false, "reason": "...", "detail": "..." } when blocked.

Built-in Safety Checks

  1. Avoid rules
  2. aria-hidden
  3. Missing offsetParent
  4. Honeypot class patterns
  5. opacity: 0
  6. visibility: hidden
  7. Sub-pixel size
  8. Missing bounding box
  9. Scroll into view
  10. Optional drift-away behavior from human.cursor
  11. Bezier path using human.cursor
  12. Think delay from human.click
  13. Disappearance check
  14. Shift check
  15. mousedown → mouseup → click at actual cursor coordinates

Public defaults ship with advanced cursor tricks off or near zero: overshootRatio: 0, jitterRatio: 0, stutterChance: 0, driftThresholdPx: 0.

human.type

json
{
  "action": "human.type",
  "params": {
    "text": "Hello world",
    "selector": "#search-input"
  }
}

Returns { "typed": true } or { "typed": false, "reason": "avoided" }. Typing cadence comes from human.type config. Public defaults are very fast: baseDelayMin: 8, baseDelayMax: 20, pauseChance: 0.
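
To see what "very fast" means in practice, the bounds can be worked out by hand. The sketch assumes baseDelayMin and baseDelayMax are per-keystroke delays in milliseconds and ignores variance and pauses; the units are not stated above, so treat the numbers as illustrative. typing_time_bounds_ms is a hypothetical helper.

```python
def typing_time_bounds_ms(text: str, base_delay_min: int = 8, base_delay_max: int = 20):
    """Return (fastest, slowest) total typing time, assuming one delay per character."""
    count = len(text)
    return count * base_delay_min, count * base_delay_max


print(typing_time_bounds_ms("Hello world"))  # (88, 220)
```

Under these assumptions the 11-character example is typed in well under a quarter of a second.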

human.scroll

json
{
  "action": "human.scroll",
  "params": {
    "handleId": "el_7",
    "direction": "down"
  }
}

Accepts handleId, selector, or neither. Returns { "scrolled": true, "amount": 487 }.

human.clearInput

json
{
  "action": "human.clearInput",
  "params": { "selector": "#email-input" }
}

Returns { "cleared": true } or a click failure response. Behavior: safe click to focus, triple-click to select, backspace/delete sequence.

Avoid Rules

All human.* commands accept an avoid object. Per-request avoid merges with global human.avoid config.

json
{
  "avoid": {
    "selectors": [".cookie-banner", "#popup button"],
    "classes": ["sponsored", "ad-slot"],
    "ids": ["newsletter-signup"],
    "attributes": { "data-ad": "*", "data-tracking": "*" }
  }
}
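
How the merge works is not spelled out here. One plausible reading, shown purely as an assumption, is that list fields are concatenated without duplicates and attribute maps are combined with the per-request value winning. merge_avoid is a hypothetical helper, not the runtime's implementation.

```python
def merge_avoid(global_avoid: dict, request_avoid: dict) -> dict:
    """Merge two avoid objects: union the lists, overlay the attribute maps."""
    merged = {}
    for key in ("selectors", "classes", "ids"):
        combined = list(global_avoid.get(key, [])) + list(request_avoid.get(key, []))
        merged[key] = list(dict.fromkeys(combined))  # drop duplicates, keep order
    merged["attributes"] = {
        **global_avoid.get("attributes", {}),
        **request_avoid.get("attributes", {}),  # assumed: the request overrides the global value
    }
    return merged


global_rules = {"classes": ["sponsored"], "attributes": {"data-ad": "*"}}
request_rules = {"classes": ["sponsored", "ad-slot"], "ids": ["popup-cta"]}
print(merge_avoid(global_rules, request_rules))
```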

Events

The server pushes events over the same WebSocket connection.

response

json
{
  "type": "event",
  "event": "response",
  "data": {
    "url": "https://...",
    "status": 200,
    "tabId": 123,
    "method": "GET"
  }
}

urlChanged

json
{
  "type": "event",
  "event": "urlChanged",
  "data": {
    "tabId": 123,
    "url": "https://..."
  }
}

cookiesChanged

json
{
  "type": "event",
  "event": "cookiesChanged",
  "data": {
    "cookies": [...],
    "count": 42
  }
}
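
Because events and responses share one connection, a client has to tell them apart before matching replies to requests. A minimal routing sketch, assuming events always carry "type": "event" and responses always carry the request id, as in the examples above; route_message, pending, and handlers are hypothetical names.

```python
import json


def route_message(raw: str, pending: dict, handlers: dict) -> str:
    """Route one incoming frame to an event handler or a pending request slot."""
    msg = json.loads(raw)
    if msg.get("type") == "event":
        handler = handlers.get(msg.get("event"))
        if handler:
            handler(msg.get("data", {}))
        return "event"
    if msg.get("id") in pending:
        pending[msg["id"]] = msg  # store the reply for whoever awaits this id
        return "response"
    return "ignored"


seen_urls = []
pending = {"1": None}
handlers = {"urlChanged": lambda data: seen_urls.append(data["url"])}
event = '{"type": "event", "event": "urlChanged", "data": {"tabId": 123, "url": "https://example.com"}}'
print(route_message(event, pending, handlers))                                    # event
print(route_message('{"id": "1", "result": {"success": true}}', pending, handlers))  # response
```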

CSP

Two execution contexts exist: the ISOLATED world (DOM-safe, CSP-safe) and the MAIN world (page globals, may be blocked by CSP).

Commands that work under CSP

dom.evaluate and dom.elementEvaluate try MAIN first and fall back when possible.

Python Client Example

python
import asyncio
import json
import uuid

import websockets  # third-party: pip install websockets


async def main():
    async with websockets.connect('ws://localhost:7331') as ws:
        async def send(action, params=None, tab_id=None):
            # Build a request in the documented message format.
            msg = {"id": str(uuid.uuid4()), "action": action, "params": params or {}}
            if tab_id is not None:
                msg["tabId"] = tab_id
            await ws.send(json.dumps(msg))
            # Events arrive on the same socket, so skip frames that are not our reply.
            while True:
                resp = json.loads(await ws.recv())
                if resp.get("id") == msg["id"]:
                    break
            if "error" in resp:
                raise RuntimeError(resp["error"])
            return resp["result"]

        tabs = await send("tabs.list")
        tab = tabs[0]["id"]
        await send("tabs.navigate", {"url": "https://example.com"}, tab)
        handle = await send("dom.querySelector", {"selector": "h1"}, tab)
        print(await send("human.click", {"handleId": handle}, tab))
        await send("human.type", {"text": "Hello", "selector": "#input"}, tab)

asyncio.run(main())

Limits

License

Apache 2.0