Webpilot

CDP-free browser automation. Real Chrome, DOM-first, WebSocket protocol.

Webpilot is a browser automation tool that doesn't speak the Chrome DevTools Protocol (CDP). It launches a real Chromium-based browser with a local extension runtime and exposes a WebSocket protocol, so a user, script, or LLM can drive that browser through the same command surface. Because there's no CDP attachment, there's no debugger flag for bot detectors to catch, no synthetic profile, and no headless throwaway. It's just Chrome with an extension.

The primary interface is the live DOM, not screenshots. discover, html, and q give you real page structure, real selectors, and real handles. Screenshots exist as a fallback for when layout or visual rendering is the actual question. For everything else, read the DOM.

What Webpilot does

Starts and controls a real browser; no CDP, no detectable debugging port
Exposes the live DOM directly: navigation, element discovery, querying, interaction, cookies
Provides configurable cursor, click, typing, and scroll behavior
Works from the CLI, raw WebSocket, Node, or as a Gemini CLI extension

What Webpilot does not do

Decide what to do next
Ship a tuned human profile
Ship site strategy, retries, or route doctrine

Install

bash

npm install -g @h17/webpilot
# or, equivalently:
npm install -g h17-webpilot

Quick Start

1. Start

bash

webpilot start

If no config exists, the first run will detect installed browsers, ask you to choose one, and generate ~/h17-webpilot/config.js.

Use webpilot start -d for an append-only session log (~/h17-webpilot/webpilot.log by default).

2. Use the tool

bash

webpilot -c 'go example.com'
webpilot -c 'discover'
webpilot -c 'click h1'
webpilot -c 'wait h1'
webpilot -c 'html'
webpilot -c 'cookies load ./cookies.json'

Use the same loop every time: inspect, act, verify.

CLI

bash

webpilot                        # interactive REPL
webpilot -c 'go example.com'   # single command
webpilot start                  # launch browser + WS server
webpilot start -d               # launch with session logging
webpilot stop                   # stop running server

Core Commands

Command	Description
`go <url>`	Navigate to a URL
`discover`	List interactive elements with handles and CSS selectors
`q <selector>` / `query <selector>`	Query elements by CSS selector
`wait <selector>`	Wait for a selector to appear
`click <selector|handleId>`	Safe click on an element
`type [selector] <text>`	Type with configured profile
`clear <selector>`	Clear an input field
`key <name>` / `press <name>`	Send a key press
`sd [px] [selector]` / `su [px] [selector]`	Scroll down / scroll up
`html`	Read page HTML
`ss`	Save a screenshot; use when layout or visual rendering is the question, not DOM structure
`cookies` / `cookies load <file>`	Dump or load cookies
`frames`	List frames

Examples

bash

# navigate
webpilot -c 'go https://example.com'

# discover interactive elements
webpilot -c 'discover'

# query by CSS selector
webpilot -c 'q "button.submit"'

# click an element
webpilot -c 'click h3'

# type into an input
webpilot -c 'type [name=q] hello world'

# press a key
webpilot -c 'key Enter'

# scroll down 300px
webpilot -c 'sd 300'

# read page HTML
webpilot -c 'html'

# save a screenshot
webpilot -c 'ss'

# dump cookies
webpilot -c 'cookies'

Raw Mode

For direct access to capability groups, use the raw action syntax or pass raw JSON.

Action syntax

bash

webpilot -c 'human.click {"selector": "button[type=submit]"}'

Raw JSON

bash

webpilot -c '{"action": "dom.getHTML", "params": {}}'

WebSocket Protocol

Connect to the WebSocket server at ws://localhost:7331 and send JSON messages to control the browser programmatically from any language.

json

{
  "id": "1",
  "action": "tabs.navigate",
  "params": {
    "url": "https://example.com"
  }
}

Capability groups

Group	Description
`tabs`	Tab navigation and management
`dom`	DOM querying, reading, and manipulation
`human`	Human-like click, type, scroll interactions
`cookies`	Cookie dump and load
`events`	Event listening and dispatch
`framework`	Runtime and debug controls

Node API

The Node API is a wrapper over the same WebSocket protocol.

javascript

const { startWithPage } = require('h17-webpilot');
// Call this from inside an async function: CommonJS has no top-level await.
const { page } = await startWithPage();

Available methods

Method	Legacy alias	Description
`navigate(url)`	`goto(url)`	Navigate to a URL
`query(selector)`	`$(selector)`	Query a single element
`queryAll(selector)`	`$$(selector)`	Query all matching elements
`waitFor(selector)`	`waitForSelector(selector)`	Wait for a selector to appear
`read()`	`content()`	Read the page HTML
`click(...)`	`humanClick(...)`	Click an element
`type(...)`	`humanType(...)`	Type text into an element
`scroll(...)`	`humanScroll(...)`	Scroll the page
`clearInput(...)`	`humanClearInput(...)`	Clear an input field
`pressKey(key)`		Send a key press
`configure(config)`	`setConfig(config)`	Update runtime configuration

Full example

javascript

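const { startWithPage } = require('h17-webpilot');

// CommonJS has no top-level await, so run the steps inside an async function.
async function main() {
  const { page } = await startWithPage();
  await page.navigate('https://example.com'); // go to the page
  await page.query('h1');                     // inspect: confirm the target exists
  await page.click('h1');                     // act
  await page.waitFor('body');                 // verify
}

main().catch(console.error);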

Config File

Config is loaded from ~/h17-webpilot/config.js (or config.json). Override with --config <path>.

Public config is split into two sections:

Section	Controls
`framework`	Runtime behavior, debug toggles, handle retention
`human`	Cursor, click, typing, scroll, and avoid rules

The public package exposes a lot of knobs on purpose. The user decides how much to tune. The package does not ship a strong profile.

javascript

module.exports = {
  framework: {
    debug: {
      cursor: true,
      sessionLogPath: '~/h17-webpilot/webpilot.log',
    },
  },
  human: {
    calibrated: false,
    profileName: 'public-default',
    cursor: {
      spreadRatio: 0.16,
      jitterRatio: 0,
      stutterChance: 0,
      driftThresholdPx: 0,
      overshootRatio: 0,
    },
    click: {
      thinkDelayMin: 35,
      thinkDelayMax: 90,
      maxShiftPx: 50,
    },
    type: {
      baseDelayMin: 8,
      baseDelayMax: 20,
      variance: 4,
      pauseChance: 0,
      pauseMin: 0,
      pauseMax: 0,
    },
  },
};
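Since the config is mostly numeric ranges, a quick check before launch can catch inverted bounds. The sketch below is not part of Webpilot: `check_ranges` is a hypothetical helper, and it assumes every `...Min` key has a matching `...Max` sibling, as in the example values above.

```python
def check_ranges(section, path="human"):
    """Return the 'path.key' names whose Min value exceeds the matching Max value."""
    problems = []
    for key, value in section.items():
        if isinstance(value, dict):
            # Recurse into nested groups such as human.click or human.type.
            problems += check_ranges(value, f"{path}.{key}")
        elif key.endswith("Min"):
            partner = key[:-3] + "Max"
            if partner in section and value > section[partner]:
                problems.append(f"{path}.{key}")
    return problems


human = {
    "click": {"thinkDelayMin": 35, "thinkDelayMax": 90, "maxShiftPx": 50},
    "type": {"baseDelayMin": 8, "baseDelayMax": 20, "pauseMin": 0, "pauseMax": 0},
}
print(check_ranges(human))  # the example values contain no inverted ranges
```

An empty list means every Min/Max pair is ordered; any returned name points at the key to fix.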

Human Behavior

Defaults are intentionally non-human: fast typing, no overshoot, no jitter. These are for demonstration, not behavior parity. Configure the human section to match your use case.

The human config section controls how interactions behave:

cursor: movement speed, overshoot, path jitter
click: pre/post delays, shift tolerance
type: typing speed, variance, pause behavior
scroll: speed, acceleration, drift
avoid: element avoidance rules

The shipped values exist to show what is configurable (drift is off by default as well). The package does not ship your final values.

Boot Config

Webpilot can load state on startup from config before the user or LLM sends any commands.

javascript

module.exports = {
  browser: "/Applications/Chromium.app/Contents/MacOS/Chromium",
  boot: {
    cookiesPath: './cookies.json',
    commands: [
      'go https://hugopalma.work',
      'cookies load ./cookies.json',
      { action: 'framework.getConfig', params: {} }
    ],
  },
};

Rules:

boot.cookiesPath loads a cookie jar before commands run
boot.commands accepts CLI-style strings
boot.commands also accepts raw command objects: { action, params, tabId? }
String commands support cookies load <file> in addition to normal shorthands
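The two accepted shapes of a boot.commands entry can be told apart mechanically. This is a minimal sketch, not Webpilot's own loader: `classify_boot_command` is a hypothetical name, and defaulting a missing `params` to an empty object is an assumption.

```python
def classify_boot_command(cmd):
    """Return ('cli', text) for a CLI-style string or ('raw', obj) for a raw command object."""
    if isinstance(cmd, str):
        if not cmd.strip():
            raise ValueError("empty CLI command")
        return ("cli", cmd.strip())
    if isinstance(cmd, dict):
        if not isinstance(cmd.get("action"), str):
            raise ValueError("raw command needs a string 'action'")
        # Assumption: a missing params object is treated as empty.
        raw = {"action": cmd["action"], "params": cmd.get("params", {})}
        if "tabId" in cmd:  # tabId is optional
            raw["tabId"] = cmd["tabId"]
        return ("raw", raw)
    raise TypeError(f"unsupported boot command: {cmd!r}")


commands = [
    "go https://example.com",
    "cookies load ./cookies.json",
    {"action": "framework.getConfig", "params": {}},
]
for kind, value in map(classify_boot_command, commands):
    print(kind, value)
```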

Tested Browsers

Chromium
Helium
Google Chrome

Operating Loop

Every browser interaction follows the same three-phase cycle: Inspect, Act, Verify. Do not skip the inspect step unless you already have fresh page state from the immediately preceding command.

Quoting Rule

In LLM-driven sessions, always double-quote the entire -c argument when it contains spaces or special characters. Single-word commands can omit quotes. Do not wrap the argument in single quotes, and never nest unescaped quotes inside it; escaped quotes (e.g., {\"selector\": \"a[href]\"}) are permitted when required by the Raw Protocol.

bash

webpilot -c "type #input hello world"
webpilot -c "click #submit"
webpilot -c "go https://example.com"

# single-word commands can omit quotes
webpilot -c html
webpilot -c discover
webpilot -c ss
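When a script assembles these command lines, the quoting rule can be applied mechanically. The helper below is a sketch under the assumption of a bash-like shell; `quote_c_arg` is a hypothetical name and is not shipped with Webpilot.

```python
import re

# Characters that keep their special meaning inside bash double quotes.
_NEEDS_ESCAPE = re.compile(r'([\\"$`])')


def quote_c_arg(command):
    """Double-quote a -c argument, escaping characters bash would still interpret."""
    if re.fullmatch(r"[A-Za-z0-9_.-]+", command):
        return command  # single-word commands can omit quotes
    return '"' + _NEEDS_ESCAPE.sub(r"\\\1", command) + '"'


print("webpilot -c " + quote_c_arg("html"))
print("webpilot -c " + quote_c_arg("click #submit"))
print("webpilot -c " + quote_c_arg('dom.queryAllInfo {"selector": "a[href]"}'))
```

The last line prints the same escaped form used for raw protocol commands, with every inner double quote preceded by a backslash.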

1. Inspect

Read page state before doing anything.

Command	Purpose
`html`	Read the current page DOM, title, and URL
`discover`	List interactive elements, their handles, and CSS selectors
`q <selector>`	Query specific elements by CSS selector
`wait <selector>`	Wait for a known state change
`ss`	Last resort, only when layout or visual rendering is the actual question

2. Act

Use the safest matching action.

Command	Purpose
`click <selector|handleId>`	Safe click through the human action pipeline
`type <cssSelector> <text>`	Type with configured profile
`clear <selector>`	Clear an input field
`key <name>`	Send a key press
`sd [px] [selector]` / `su [px] [selector]`	Scroll down / up
`go <url>`	Navigate to a URL
`cookies load <file>`	Restore an existing session

type selector rule: type auto-detects selectors by their first character: #, ., or [. Handle IDs (el_*) are not recognized by type and will be typed as literal text. Always use the CSS selector from discover output (e.g. #APjFqb, .search-input, [name=q]), not the handle ID.
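The first-character rule can be sketched in a few lines. This illustrates the rule as described, not Webpilot's actual parser; `split_type_args` is a hypothetical name, and how the real CLI treats selectors containing spaces is not covered here.

```python
SELECTOR_PREFIXES = ("#", ".", "[")


def split_type_args(args):
    """Split the arguments of `type` into (selector, text) using the first-character rule."""
    first, _, rest = args.partition(" ")
    if first.startswith(SELECTOR_PREFIXES):
        return first, rest
    # No recognizable selector: everything, including el_* handle IDs, is literal text.
    return None, args


print(split_type_args("#APjFqb hello world"))
print(split_type_args("el_42 hello world"))
```

The second call shows why handle IDs fail with type: nothing marks `el_42` as a target, so the whole string is treated as text.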

type requires a preceding click: type is supposed to chain a click internally, but this does not always work. Always click the target element first, then type into it.

bash

webpilot -c "click #APjFqb"
webpilot -c "type #APjFqb hello world"

click accepts both handle IDs and CSS selectors. Always discover or q immediately before interacting so handles and selectors are fresh.

If the runtime refuses an action, respect the refusal and re-inspect the page.

3. Verify

After navigation or interaction, confirm the new state.

Command	Purpose
`wait <selector>`	Wait for expected element
`url`	Check the current URL
`title`	Check the page title
`html`	Re-read the DOM
`q <selector>`	Query for expected elements
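The three phases can be sketched as one function over the protocol actions. This is a minimal illustration, assuming a `send(action, params)` callable that returns the result payload; `click_and_verify` and `fake_send` are hypothetical names, and the fake transport only exists so the sketch runs without a browser.

```python
def click_and_verify(send, selector, expect_selector):
    """Run one inspect, act, verify cycle and report how it ended."""
    # Inspect: make sure the target exists right now.
    matches = send("dom.queryAllInfo", {"selector": selector})
    if not matches:
        return "target-not-found"
    # Act: click through the safe pipeline using the fresh handle.
    outcome = send("human.click", {"handleId": matches[0]["handleId"]})
    if not outcome.get("clicked"):
        # A refusal means re-inspect the page, not retry blindly.
        return "refused: " + str(outcome.get("reason"))
    # Verify: wait for the element that proves the page changed.
    handle = send("dom.waitForSelector", {"selector": expect_selector})
    return "ok" if handle else "not-verified"


def fake_send(action, params):
    """Stand-in transport with canned replies."""
    replies = {
        "dom.queryAllInfo": [{"handleId": "el_1", "tag": "button"}],
        "human.click": {"clicked": True},
        "dom.waitForSelector": "el_2",
    }
    return replies[action]


print(click_and_verify(fake_send, "#submit", ".result"))
```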

Safe Usage

These rules apply to all LLM-driven browser sessions. Violating them leads to incorrect actions, wasted retries, and broken flows.

Never guess selectors when html or discover can tell you the real ones.
Never assume a click worked. Verify it.
Never treat { "clicked": false } as something to brute-force through.
Never confuse DOM reading with interaction. Read first, then act.
Re-query stale handles instead of reusing them blindly.
Do not use eval; it hits CSP on most sites.

Strategy Notes

Use html, discover, and q for DOM inspection (avoid eval due to CSP).
Use wait after page-changing actions.
Use handle IDs for click, CSS selectors for type.
Use screenshots when layout or visibility is the uncertainty, not HTML structure.
If the task needs a preloaded authenticated session, load cookies first or use config boot commands.

Raw Protocol from LLM Context

If you need a protocol action that the shorthand CLI does not expose directly, send it through raw mode:

bash

webpilot -c "dom.queryAllInfo {\"selector\": \"a[href]\"}"
webpilot -c "human.scroll {\"selector\": \".feed\", \"direction\": \"down\"}"
webpilot -c "framework.getConfig {}"

You can also send a full JSON message:

bash

webpilot -c "{\"action\": \"tabs.navigate\", \"params\": {\"url\": \"https://example.com\"}}"
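From a script, the escaping can be avoided entirely by passing an argument list instead of a shell string. The sketch below only builds that list; `raw_command_argv` is a hypothetical helper, and actually running it assumes `webpilot` is on PATH with a server already started.

```python
import json


def raw_command_argv(action, params=None):
    """Build an argv list for `webpilot -c` using the raw action syntax."""
    payload = json.dumps(params or {})
    # A list passed to subprocess.run bypasses the shell, so no escaping is needed.
    return ["webpilot", "-c", f"{action} {payload}"]


argv = raw_command_argv("dom.queryAllInfo", {"selector": "a[href]"})
print(argv)
# To execute it:
#   import subprocess; subprocess.run(argv, check=True)
```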

Connection

Webpilot exposes browser control over WebSocket. The protocol tells the browser what to do. It does not decide the task; the user or LLM chooses the sequence of actions.

text

WebSocket: ws://localhost:7331

The local server listens on this port. The extension connects on launch. Your client connects to the same server.

Runtime Config

The extension runtime exposes configuration controls:

Action	Params	Returns
`framework.setConfig`	`{ config: { handles?, debug? } }`	`{ ok: true, framework }`
`framework.getConfig`	`{}`	`{ framework, version }`
`framework.reload`	`{}`	`{ reloading: true }`

Normal usage: Node loads ~/h17-webpilot/config.js (or path passed via --config), the server injects framework and human settings into commands, and clients can still call framework.setConfig and framework.getConfig directly.

The public package ships example defaults. They do not represent a human profile, and advanced cursor tuning or realistic timing still depends on user configuration.

Message Format

Request

json

{
  "id": "unique-string",
  "tabId": 123,
  "action": "action.name",
  "params": {}
}

id: correlate responses to requests
tabId: optional target tab
action: protocol action name
params: action-specific payload
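A request with these four fields can be assembled as follows. This is a sketch, with `build_request` as a hypothetical name; using a UUID for `id` is one reasonable choice, since the protocol only asks for a unique string.

```python
import json
import uuid


def build_request(action, params=None, tab_id=None):
    """Return a request dict with id, action, params, and an optional tabId."""
    message = {"id": str(uuid.uuid4()), "action": action, "params": params or {}}
    if tab_id is not None:
        message["tabId"] = tab_id  # omit the key entirely when no tab is targeted
    return message


request = build_request("tabs.navigate", {"url": "https://example.com"}, tab_id=123)
print(json.dumps(request, indent=2))
```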

Response

json

// success
{ "id": "same-id", "result": {} }

// error
{ "id": "same-id", "error": "error message" }
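On the client side, the two response shapes reduce to one check. The sketch below assumes each frame is a JSON string; `unwrap_response` and `WebpilotError` are hypothetical names, not part of the package.

```python
import json


class WebpilotError(Exception):
    """Raised when a response carries an "error" field."""


def unwrap_response(raw, expected_id):
    """Parse one response frame and return its result, or raise on error."""
    message = json.loads(raw)
    if message.get("id") != expected_id:
        raise ValueError(f"response id {message.get('id')!r} does not match {expected_id!r}")
    if "error" in message:
        raise WebpilotError(message["error"])
    return message["result"]


print(unwrap_response('{"id": "1", "result": {"success": true}}', "1"))
```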

Keepalive

The server sends { "type": "ping" } every 20 seconds. The extension responds with { "type": "pong" }.

Handles

Many DOM commands return handleId values like el_42.

Created by dom.querySelector, dom.querySelectorAll, dom.waitForSelector
Used by dom.click, dom.boundingBox, human.click, and related commands
Stored with WeakRef and cleaned up after TTL or GC
If both handleId and selector are provided, handleId wins
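The precedence rule can be mirrored in client code that validates params before sending. This is a sketch: `pick_target` is a hypothetical helper, and the `el_<number>` pattern is inferred from the el_42 example rather than from a documented format.

```python
import re

HANDLE_PATTERN = re.compile(r"el_\d+")  # assumed shape, based on the el_42 example


def pick_target(params):
    """Return ('handleId', value) or ('selector', value); handleId wins when both are set."""
    handle = params.get("handleId")
    if handle is not None:
        if not HANDLE_PATTERN.fullmatch(handle):
            raise ValueError(f"not a handle id: {handle!r}")
        return ("handleId", handle)
    if params.get("selector"):
        return ("selector", params["selector"])
    raise ValueError("need handleId or selector")


print(pick_target({"handleId": "el_42", "selector": "h1"}))
```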

Tabs Actions

Action	Params	Returns
`tabs.list`	`{}`	`[{ id, url, title, active, windowId, index }]`
`tabs.navigate`	`{ url }`	`{ success: true }`
`tabs.create`	`{ url? }`	`{ id, url, title }`
`tabs.close`	`{}`	`{ success: true }`
`tabs.activate`	`{}`	`{ success: true }`
`tabs.reload`	`{}`	`{ success: true }`
`tabs.waitForNavigation`	`{ timeout? }`	`{ success: true }`
`tabs.setViewport`	`{ width, height }`	`{ success: true }`
`tabs.screenshot`	`{ fullPage? }`	`{ dataUrl }`
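To save a tabs.screenshot result to disk, the dataUrl has to be decoded first. The sketch below assumes the value is a standard base64 data URL (for example `data:image/png;base64,...`), which the table does not state explicitly; `decode_data_url` is a hypothetical helper.

```python
import base64


def decode_data_url(data_url):
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Stand-in for the { dataUrl } result of tabs.screenshot.
sample = "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode("ascii")
mime, data = decode_data_url(sample)
print(mime, len(data))
```

Writing `data` to a file opened in binary mode then produces the image.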

Frames Actions

Action	Params	Returns
`frames.list`	`{}`	`[{ frameId, parentFrameId, url }]`

Cookies Actions

Action	Params	Returns
`cookies.getAll`	`{ url? }`	`[{ name, value, domain, ... }]`
`cookies.set`	`{ cookie: { name, value, domain?, path?, secure?, httpOnly?, sameSite?, expires? } }`	`{ success: true }`
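Cookie objects exported by other tools often carry extra keys. A small filter keeps the cookies.set payload to the fields listed above; this is a sketch, `cookie_set_params` is a hypothetical name, and treating name and value as required follows the `?` markers in the table.

```python
COOKIE_FIELDS = {"name", "value", "domain", "path", "secure", "httpOnly", "sameSite", "expires"}


def cookie_set_params(cookie):
    """Build cookies.set params, keeping only the documented cookie fields."""
    for required in ("name", "value"):
        if required not in cookie:
            raise ValueError(f"cookie is missing {required!r}")
    return {"cookie": {k: v for k, v in cookie.items() if k in COOKIE_FIELDS}}


exported = {"name": "sid", "value": "abc", "domain": "example.com", "hostOnly": True}
print(cookie_set_params(exported))
```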

DOM Actions

dom.click uses the same safe interaction pipeline as human.click. Other dom.* commands are direct DOM/runtime operations.

Action	Params	Returns
`dom.querySelector`	`{ selector }`	`handleId` or `null`
`dom.querySelectorAll`	`{ selector }`	`[handleId, ...]`
`dom.querySelectorWithin`	`{ parentHandleId, selector }`	`handleId` or `null`
`dom.querySelectorAllWithin`	`{ parentHandleId, selector }`	`[handleId, ...]`
`dom.waitForSelector`	`{ selector, timeout? }`	`handleId` or `null`
`dom.boundingBox`	`{ handleId | selector }`	`{ x, y, width, height }` or `null`
`dom.click`	`{ handleId | selector, clickCount?, avoid? }`	`{ clicked: true }` or `{ clicked: false, reason }`
`dom.mouseMoveTo`	`{ handleId | selector }`	`{ x, y }`
`dom.focus`	`{ handleId | selector }`	`{ focused: true }`
`dom.type`	`{ text, handleId?, selector? }`	`{ typed: true }`
`dom.keyPress`	`{ key }`	`{ pressed: true }`
`dom.keyDown`	`{ key }`	`{ down: true }`
`dom.keyUp`	`{ key }`	`{ up: true }`
`dom.scroll`	`{ handleId? | selector?, direction?, amount?, behavior? }`	`{ scrolled: true, before, after, target }`
`dom.setValue`	`{ handleId | selector, value }`	`{ set: true }`
`dom.getAttribute`	`{ handleId | selector, name }`	string or `null`
`dom.getProperty`	`{ handleId | selector, name }`	any
`dom.evaluate`	`{ fn, args? }`	any
`dom.elementEvaluate`	`{ handleId, fn, args? }`	any
`dom.evaluateHandle`	`{ fn, args?, elementMarkers? }`	`{ type, handleId?, value?, properties? }`
`dom.getHTML`	`{}`	`{ html, title, url }`
`dom.elementHTML`	`{ handleId, limit? }`	`{ outer, inner, tag }`
`dom.queryAllInfo`	`{ selector }`	`[{ handleId, tag, id, cls, text, label }]`
`dom.batchQuery`	`{ selectors: [...] }`	`{ [selector]: boolean }`
`dom.findScrollable`	`{}`	`[{ handleId, tag, id, cls, overflowY, overflow, scrollHeight, clientHeight, children, text }]`
`dom.discoverElements`	`{}`	`{ elements, cursor, viewport, scrollY }`
`dom.setDebug`	`{ enabled }`	`{ debug: boolean }`

Keyboard Names

Valid key names for dom.keyPress, dom.keyDown, dom.keyUp:

Meta, Control, Shift, Alt, Enter, Tab, Escape, Backspace, Delete, Space, ArrowUp, ArrowDown, ArrowLeft, ArrowRight, Home, End, PageUp, PageDown, single characters, or code forms like KeyA and Digit5.
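A client can check key names before sending them. The sketch below encodes the list above; `is_valid_key` is a hypothetical helper, and the pattern for code forms covers only the KeyA and Digit5 styles named here, so other codes the runtime may accept are not recognized.

```python
import re

NAMED_KEYS = {
    "Meta", "Control", "Shift", "Alt", "Enter", "Tab", "Escape", "Backspace",
    "Delete", "Space", "ArrowUp", "ArrowDown", "ArrowLeft", "ArrowRight",
    "Home", "End", "PageUp", "PageDown",
}
CODE_FORM = re.compile(r"Key[A-Z]|Digit[0-9]")


def is_valid_key(name):
    """True for a listed key name, a single character, or a KeyA/Digit5 style code."""
    return name in NAMED_KEYS or len(name) == 1 or bool(CODE_FORM.fullmatch(name))


rejected = [k for k in ["Enter", "a", "KeyA", "Digit5", "Return"] if not is_valid_key(k)]
print(rejected)
```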

Human Actions

Human commands add safety checks and use injected human.* config. Public config sections: human.cursor, human.click, human.type, human.scroll, human.avoid.

human.click

json

{
  "action": "human.click",
  "params": {
    "handleId": "el_42",
    "avoid": {
      "selectors": [".premium-upsell"],
      "classes": ["sponsored"],
      "ids": ["popup-cta"],
      "attributes": { "data-ad": "*" }
    }
  }
}

Returns { "clicked": true } on success, or { "clicked": false, "reason": "...", "detail": "..." } when blocked.

Built-in Safety Checks

Avoid rules
aria-hidden
Missing offsetParent
Honeypot class patterns
opacity: 0
visibility: hidden
Sub-pixel size
Missing bounding box
Scroll into view
Optional drift-away behavior from human.cursor
Bezier path using human.cursor
Think delay from human.click
Disappearance check
Shift check
mousedown → mouseup → click at actual cursor coordinates

Public defaults ship with advanced cursor tricks off or near zero: overshootRatio: 0, jitterRatio: 0, stutterChance: 0, driftThresholdPx: 0.

human.type

json

{
  "action": "human.type",
  "params": {
    "text": "Hello world",
    "selector": "#search-input"
  }
}

Returns { "typed": true } or { "typed": false, "reason": "avoided" }. Typing cadence comes from human.type config. Public defaults are very fast: baseDelayMin: 8, baseDelayMax: 20, pauseChance: 0.

human.scroll

json

{
  "action": "human.scroll",
  "params": {
    "handleId": "el_7",
    "direction": "down"
  }
}

Accepts handleId, selector, or neither. Returns { "scrolled": true, "amount": 487 }.

human.clearInput

json

{
  "action": "human.clearInput",
  "params": { "selector": "#email-input" }
}

Returns { "cleared": true } or a click failure response. Behavior: safe click to focus, triple-click to select, backspace/delete sequence.

Avoid Rules

All human.* commands accept an avoid object. Per-request avoid merges with global human.avoid config.

json

{
  "avoid": {
    "selectors": [".cookie-banner", "#popup button"],
    "classes": ["sponsored", "ad-slot"],
    "ids": ["newsletter-signup"],
    "attributes": { "data-ad": "*", "data-tracking": "*" }
  }
}
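The exact merge semantics are not spelled out here. One plausible reading, shown purely as an assumption, is that list fields are unioned and attribute maps are overlaid with the per-request values winning; `merge_avoid` is a hypothetical helper and not Webpilot's implementation.

```python
def merge_avoid(global_avoid, request_avoid):
    """Combine two avoid objects: lists are unioned in order, attributes are overlaid."""
    merged = {}
    for key in ("selectors", "classes", "ids"):
        combined = list(global_avoid.get(key, [])) + list(request_avoid.get(key, []))
        merged[key] = list(dict.fromkeys(combined))  # drop duplicates, keep order
    # Assumption: per-request attribute rules override global ones with the same name.
    merged["attributes"] = {**global_avoid.get("attributes", {}),
                            **request_avoid.get("attributes", {})}
    return merged


global_avoid = {"classes": ["sponsored"], "attributes": {"data-ad": "*"}}
request_avoid = {"classes": ["sponsored", "ad-slot"], "ids": ["newsletter-signup"]}
print(merge_avoid(global_avoid, request_avoid))
```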

Events

The server pushes events over the same WebSocket connection.

response

json

{
  "type": "event",
  "event": "response",
  "data": {
    "url": "https://...",
    "status": 200,
    "tabId": 123,
    "method": "GET"
  }
}

urlChanged

json

{
  "type": "event",
  "event": "urlChanged",
  "data": {
    "tabId": 123,
    "url": "https://..."
  }
}

cookiesChanged

json

{
  "type": "event",
  "event": "cookiesChanged",
  "data": {
    "cookies": [...],
    "count": 42
  }
}
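Because events arrive on the same socket as responses, a client needs to sort incoming frames before matching them to requests. The sketch below shows one way to do that; `route_message`, `pending`, and `handlers` are hypothetical names, and whether clients also receive the keepalive ping is an assumption handled defensively.

```python
import json


def route_message(raw, pending, handlers):
    """Classify one incoming frame as 'ping', 'event', 'response', or 'unknown'."""
    message = json.loads(raw)
    if message.get("type") == "ping":
        return "ping"
    if message.get("type") == "event":
        handler = handlers.get(message["event"])
        if handler:
            handler(message["data"])  # call the callback registered for this event
        return "event"
    if message.get("id") in pending:
        pending[message["id"]] = message  # store the reply for whoever awaits it
        return "response"
    return "unknown"


seen = []
handlers = {"urlChanged": lambda data: seen.append(data["url"])}
pending = {"1": None}
event = json.dumps({"type": "event", "event": "urlChanged",
                    "data": {"tabId": 123, "url": "https://example.com"}})
reply = json.dumps({"id": "1", "result": {"success": True}})
print(route_message(event, pending, handlers))
print(route_message(reply, pending, handlers))
print(seen)
```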

CSP

Two execution contexts exist: the ISOLATED world (DOM-safe, CSP-safe) and the MAIN world (page globals, may be blocked by CSP).

Commands that work under CSP

dom.querySelector
dom.querySelectorAll
dom.getHTML
human.click
human.type
human.scroll
dom.click

dom.evaluate and dom.elementEvaluate try the MAIN world first and fall back to the ISOLATED world when possible.

Python Client Example

python

import asyncio, json, uuid, websockets

async def main():
    async with websockets.connect('ws://localhost:7331') as ws:
        async def send(action, params=None, tab_id=None):
            msg_id = str(uuid.uuid4())
            msg = {"id": msg_id, "action": action, "params": params or {}}
            if tab_id is not None:
                msg["tabId"] = tab_id
            await ws.send(json.dumps(msg))
            # Events are pushed over this same socket, so skip any frame
            # that is not the reply to this request.
            while True:
                resp = json.loads(await ws.recv())
                if resp.get("id") == msg_id:
                    break
            if "error" in resp:
                raise RuntimeError(resp["error"])
            return resp["result"]

        tabs = await send("tabs.list")
        tab = tabs[0]["id"]
        await send("tabs.navigate", {"url": "https://example.com"}, tab)
        # dom.querySelector returns a handleId, or null if nothing matches
        handle = await send("dom.querySelector", {"selector": "h1"}, tab)
        print(await send("human.click", {"handleId": handle}, tab))
        await send("human.type", {"text": "Hello", "selector": "#input"}, tab)

asyncio.run(main())

Limits

Defaults are for demonstration and development, not for behavior parity
The browser tool does not decide workflows
The user or LLM still has to choose selectors, waits, retries, and verification steps
dom.evaluate may hit CSP restrictions on some sites. DOM reading and interaction still work through the isolated content-script path.

License

Apache 2.0