browser-use-rs is a behavioral Rust port of
browser-use/browser-use, pinned to upstream commit
157779338afdcc03023010ec3c24ad63d820453c.
The port is not a class-by-class translation. The public model stays close to
browser-use: an agent observes browser state, asks an LLM for structured
actions, executes those actions in a browser, records history, and stops when a
done action or a guard condition says it should stop. The internals are split
into typed Rust crates and modules so the compatibility surface is explicit.
- Public behavior is a conformance contract. Schema shape, action names, browser-state semantics, prompt sections, and lifecycle diagnostics should change only with tests and a documented compatibility reason.
- `browser-use-core` owns agent behavior, not browser transport details.
- `browser-use-cdp` owns Chrome DevTools Protocol behavior, not prompt or LLM policy.
- DOM and accessibility payloads are compact model inputs, not raw CDP dumps.
- Provider quirks belong in `browser-use-llm`; action semantics belong in `browser-use-tools` and `browser-use-core`.
- Background tasks must be bounded, abortable by ownership drop, and observable through state or lifecycle diagnostics.
- Public root re-exports preserve the crate API while internal modules keep the implementation navigable.
| Crate | Responsibility |
|---|---|
| `browser-use-core` | Agent loop, prompts, history, settings, managed files, action execution, replay, usage accounting, checkpoints, callbacks. |
| `browser-use-cdp` | Chrome launch/attach, Browser Use Cloud sessions, CDP transport, browser profile mapping, DOM capture, action dispatch, downloads, storage state, HAR/video/trace artifacts, lifecycle and security watchdogs. |
| `browser-use-dom` | Compact DOM state types, selector maps, action-history element identity, text rendering, DOM rematch levels. |
| `browser-use-tools` | Built-in browser action schemas and the action registry contract. |
| `browser-use-llm` | Provider-neutral chat trait plus OpenAI-compatible, Anthropic, Gemini, and Ollama adapters. |
| `browser-use-cli` | Human CLI, local session store, daemon entrypoint, JSON surfaces. |
| `browser-use-mcp` | MCP stdio bridge backed by the CLI/session/core contracts. |
| `browser-use-conformance` | Golden fixture helpers and parity utilities. |

At the crate boundary, the system is a typed pipeline: browser state becomes a prompt, model output becomes typed actions, actions mutate the browser, and the result is appended to durable history.

```mermaid
flowchart LR
    CDP["browser-use-cdp"] --> DOM["browser-use-dom"]
    DOM --> Core["browser-use-core prompt"]
    Core --> LLM["browser-use-llm"]
    LLM --> Output["AgentOutput"]
    Output --> Tools["browser-use-tools BrowserAction"]
    Tools --> Exec["browser-use-core executor"]
    Exec --> CDP
    Exec --> History["AgentHistory"]
    History --> Core
    MCP["browser-use-mcp"] --> Core
    CLI["browser-use-cli"] --> Core
```

The main agent path is:

```text
Agent::run
  -> execute configured initial actions once
  -> capture BrowserStateSummary from BrowserSession
  -> build a ChatRequest in browser-use-core::prompt
  -> invoke ChatModel, optionally switching to fallback LLM
  -> parse AgentOutput and restore shortened URLs
  -> execute BrowserAction values through BrowserActionExecutor
  -> update AgentHistory and usage summary
  -> maybe compact history, judge done result, save transcript, save GIF
```

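The agent path above can be sketched as a small loop over two trait boundaries. This is a minimal sketch, not the crate's code: `Session`, `Model`, `Action`, `StepRecord`, and `run` are hypothetical stand-ins for `BrowserSession`, `ChatModel`, `BrowserAction`, `AgentHistory`, and `Agent::run`, and the rule that a failed action ends the current step is an assumption made for brevity.

```rust
// Hypothetical, simplified stand-ins for the real crate types.
#[derive(Debug, Clone, PartialEq)]
enum Action {
    Navigate(String),
    Done { success: bool, text: String },
}

#[derive(Debug, Clone)]
struct StepRecord {
    state: String,
    actions: Vec<Action>,
}

/// Browser boundary: capture state, apply one action.
trait Session {
    fn state(&mut self) -> String;
    fn execute(&mut self, action: &Action) -> Result<(), String>;
}

/// Model boundary: turn a prompt into typed actions.
trait Model {
    fn next_actions(&mut self, prompt: &str) -> Vec<Action>;
}

/// Run until a done action appears or the step budget is spent.
fn run(session: &mut dyn Session, model: &mut dyn Model, max_steps: usize) -> Vec<StepRecord> {
    let mut history = Vec::new();
    for step in 0..max_steps {
        let state = session.state();
        let prompt = format!("step {step}: {state}");
        let actions = model.next_actions(&prompt);
        let mut finished = false;
        for action in &actions {
            if matches!(action, Action::Done { .. }) {
                finished = true;
                break;
            }
            if session.execute(action).is_err() {
                break; // assumption: a failed action ends this step
            }
        }
        history.push(StepRecord { state, actions });
        if finished {
            break;
        }
    }
    history
}

struct FakeSession {
    url: String,
}

impl Session for FakeSession {
    fn state(&mut self) -> String {
        format!("url={}", self.url)
    }
    fn execute(&mut self, action: &Action) -> Result<(), String> {
        if let Action::Navigate(url) = action {
            self.url = url.clone();
        }
        Ok(())
    }
}

/// Replays a fixed list of model outputs, one per step.
struct ScriptedModel {
    script: Vec<Vec<Action>>,
}

impl Model for ScriptedModel {
    fn next_actions(&mut self, _prompt: &str) -> Vec<Action> {
        if self.script.is_empty() {
            vec![Action::Done { success: false, text: "script exhausted".into() }]
        } else {
            self.script.remove(0)
        }
    }
}

fn demo() -> Vec<StepRecord> {
    let mut session = FakeSession { url: "about:blank".into() };
    let mut model = ScriptedModel {
        script: vec![
            vec![Action::Navigate("https://example.com".into())],
            vec![Action::Done { success: true, text: "ok".into() }],
        ],
    };
    run(&mut session, &mut model, 5)
}

fn main() {
    for record in demo() {
        println!("{:?}", record);
    }
}
```

Keeping the browser and the model behind traits is what lets a scripted model and a fake session drive the loop in tests without Chrome or a provider.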
The CDP browser-state path is:

```mermaid
flowchart TD
    State["CdpBrowserSession::state"] --> Policy["enforce URL policy"]
    Policy --> Settle["wait for page-load settle"]
    Settle --> Location["read URL/title/page metrics"]
    Location --> DOMIndex["run DOM indexing JS"]
    DOMIndex --> AX["join accessibility metadata"]
    AX --> Iframes["merge iframe target DOM states"]
    Iframes --> Cache["cache selector map for indexed actions"]
    Cache --> Summary["BrowserStateSummary"]
```

In text form:

```text
CdpBrowserSession::state
  -> enforce URL policy and wait for page-load settle
  -> read page location and page metrics
  -> run DOM indexing JavaScript
  -> join DOMSnapshot backend ids to Accessibility.getFullAXTree
  -> parse compact SerializedDomState
  -> merge same-origin and OOPIF iframe states
  -> cache target-aware elements for stable actions
  -> return BrowserStateSummary with lifecycle/security diagnostics
```

The CDP action path is:

```text
BrowserAction
  -> BrowserActionExecutor in core
  -> BrowserSession trait method
  -> CdpBrowserSession method
  -> cached-node callFunctionOn when possible
  -> index-based fallback when the cached node is stale or detached
  -> URL-policy and page-change guards
```

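The last steps of the action path, preferring a cached node and falling back to the index when that node is stale, can be illustrated with a toy model. `Page`, `ElementCache`, `ClickPath`, and `click` are hypothetical names; the real session issues CDP commands rather than consulting in-memory maps.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum ClickPath {
    CachedNode,
    IndexFallback,
}

#[derive(Debug, PartialEq)]
enum ClickError {
    UnknownIndex(u32),
}

/// Hypothetical page model: which node handles are still attached, and
/// which element indices the current DOM can still resolve.
struct Page {
    live_nodes: Vec<u64>,
    index_to_selector: HashMap<u32, String>,
}

/// Hypothetical cache from element index to the node handle captured
/// when browser state was last read.
struct ElementCache {
    handles: HashMap<u32, u64>,
}

fn click(page: &Page, cache: &ElementCache, index: u32) -> Result<ClickPath, ClickError> {
    // Preferred path: act on the cached node if it is still attached.
    if let Some(handle) = cache.handles.get(&index) {
        if page.live_nodes.contains(handle) {
            return Ok(ClickPath::CachedNode);
        }
    }
    // Fallback path: the cached node is stale or missing, so resolve by index.
    if page.index_to_selector.contains_key(&index) {
        return Ok(ClickPath::IndexFallback);
    }
    Err(ClickError::UnknownIndex(index))
}

fn demo_page() -> (Page, ElementCache) {
    let page = Page {
        live_nodes: vec![10],
        index_to_selector: HashMap::from([
            (1, "#submit".to_string()),
            (2, "#cancel".to_string()),
        ]),
    };
    // Index 2 points at handle 20, which is no longer attached.
    let cache = ElementCache { handles: HashMap::from([(1, 10), (2, 20)]) };
    (page, cache)
}

fn main() {
    let (page, cache) = demo_page();
    for index in [1, 2, 3] {
        println!("{index}: {:?}", click(&page, &cache, index));
    }
}
```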
browser-use-core is the agent contract crate. It re-exports the public types
that downstream callers use, while private modules hold the implementation.
| Module | Owns |
|---|---|
| `agent.rs` | Agent, AgentTask, checkpoints, run/step orchestration, pause/resume/stop, callbacks, fallback LLM switching, transcript and GIF output, final-response and judge handling. |
| `prompt.rs` | Step requests, final response requests, judge and compaction requests, action-output schemas, sensitive-data substitution, TOTP placeholders, previous-result rendering, prompt screenshot resizing, loop and budget warnings. |
| `history.rs` | AgentHistory, AgentHistoryItem, AgentOutput, action results, replay/rematch planning, compacted memory, usage summary shape, terminal-result helpers. |
| `settings.rs` | AgentSettings, vision modes, action and wait timeout coercion, message compaction settings, generated GIF settings, sensitive data values. |
| `file_system.rs` | Managed sandbox paths, file state serialization, text/PDF/DOCX/image file actions, result display helpers. |
| `executor.rs` | ActionExecutor, BrowserActionExecutor, browser action side effects, page extraction preparation, screenshot/PDF output helpers, replay execution helpers. |
| `urls.rs` | Task URL extraction, search URL building, prompt URL shortening, model-output URL restoration. |
| `usage.rs` | Token and cost aggregation from provider usage metadata. |
| `lib.rs` | Public re-exports and crate-level compatibility shims for tests and sibling modules. |
| `tests.rs` | Core compatibility tests for agent loop, prompt/schema shape, history/replay, executor behavior, managed files, settings, URLs, usage, and checkpoints. |

agent.rs is the only module that should decide agent control flow:
- when to call the model;
- when a step times out;
- when the fallback LLM is eligible;
- when a loop or max-failure guard stops the run;
- when callbacks run;
- when final-response-after-failure or final-step requests are used;
- when history compaction or judge validation runs.
Prompt wording and schemas stay in prompt.rs. Browser action side effects
stay in executor.rs through BrowserActionExecutor and the BrowserSession
trait.
tests.rs is a private child module of lib.rs so it can exercise
crate-private core seams without making those seams public. Keep cross-module
agent behavior and compatibility fixtures there; narrow module-local tests can
live beside the module they cover.
prompt.rs is the only core module that should know the model-facing prompt
layout. It deliberately owns both text rendering and JSON schema generation so
the prompt and parser stay compatible. If a new action needs an LLM schema
change, add the schema branch, prompt rendering, and exclusion behavior here
with tests.
The managed filesystem is part of agent state. It is serialized through
AgentCheckpoint, restored through Agent::from_checkpoint, and passed into
prompt construction so the model can see available files. Browser upload and
read/write actions should route through ManagedFileSystem instead of touching
arbitrary relative paths.
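As an illustration of why file actions should go through one managed entry point, the sketch below resolves a requested path against a sandbox root and rejects anything that would escape it. `resolve_in_sandbox` is a hypothetical helper, not the `ManagedFileSystem` API, and the check is purely lexical: it does not follow symlinks.

```rust
use std::path::{Component, Path, PathBuf};

/// Join `requested` onto `root`, refusing absolute paths and `..` traversal.
fn resolve_in_sandbox(root: &Path, requested: &str) -> Result<PathBuf, String> {
    let mut resolved = root.to_path_buf();
    for component in Path::new(requested).components() {
        match component {
            Component::Normal(part) => resolved.push(part),
            Component::CurDir => {}
            // Root, drive prefix, or parent-directory components would
            // leave the sandbox, so they are rejected outright.
            _ => return Err(format!("path escapes sandbox: {requested}")),
        }
    }
    if resolved.as_path() == root {
        return Err("empty path".to_string());
    }
    Ok(resolved)
}

fn main() {
    let root = Path::new("/srv/agent-files");
    for requested in ["notes/todo.md", "./report.pdf", "../secret", "/etc/passwd"] {
        println!("{requested}: {:?}", resolve_in_sandbox(root, requested));
    }
}
```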
browser-use-cdp is the largest crate because it owns both browser process
management and live CDP session behavior.
| Module | Owns |
|---|---|
| `lib.rs` | Public browser primitives, CDP session state, action methods, root re-exports, BrowserSession trait, and compatibility tests. |
| `types.rs` | Shared public DTOs and serde helpers: errors, screenshots, PDFs, found elements, viewport/proxy settings, cloud proxy country codes. |
| `cloud.rs` | Browser Use Cloud request/response/client types, API-key discovery, auth-config lookup, cloud HTTP error rendering. |
| `profile.rs` | Browser profile serde defaults and aliases, Chrome launch plans, executable discovery, local process launch, DevToolsActivePort parsing. |
| `policy.rs` | Browser profile URL-access policy, allow/prohibit pattern matching, IP-address blocking, and navigation block reasons. |
| `input.rs` | Keyboard alias normalization and CDP Input.dispatchKeyEvent parameter construction. |
| `runtime.rs` | Runtime.evaluate parameter construction, value extraction, and exception/result rendering. |
| `lifecycle.rs` | Lifecycle event DTOs, upstream adapter event mapping, lifecycle subscriptions, lag/closed stream errors. |
| `transport.rs` | Websocket connection, CDP command actor, response routing, event broadcast, reconnect attempts, stale session generation checks, websocket header validation. |
| `dom.rs` | Injected DOM/action JavaScript, element highlight scripts, DOMSnapshot and accessibility joins, iframe target merging, compact DOM parsing, pagination detection, cached-index target mapping. |
| `recording.rs` | HAR capture, trace artifacts, screencast video/GIF writing, artifact path generation, recorder diagnostics. |
| `storage.rs` | Cookie/origin storage save/load, frame-origin discovery, DOMStorage conversion, storage-state counts and file writes. |
| `target.rs` | Page target selection, target attach/create, viewport emulation commands, browser permission grants, download-event enablement, and tab-id resolution. |
| `watchdog.rs` | Lifecycle watchdog, security watchdog, URL-policy actions, bounded event buffers, websocket lifecycle event mapping, network timeouts, download event mapping, auto-PDF download handling. |
| `tests.rs` | CDP compatibility and live-browser conformance tests that need private session/module access. |

CdpBrowserSession holds:
- `Arc<CdpConnection>` for command/event transport;
- the current attached page target and session id;
- cached DOM state and target-aware cached elements;
- security and lifecycle event buffers;
- URL access policy and pending policy errors;
- profile-derived iframe, viewport, page-load, highlight, and download config;
- optional HAR, video, and trace recorders;
- owned temporary directories for user data and downloads.
The session implements the public BrowserSession trait used by core. That
trait is the boundary between agent semantics and browser mechanics.
transport.rs has no DOM, prompt, profile, or action policy. It only knows how
to:
- connect to the CDP websocket with optional profile headers;
- send commands with monotonically increasing ids;
- route command responses back to callers;
- broadcast CDP events;
- reconnect boundedly after unexpected websocket drops;
- mark old target sessions as stale after reconnect.
Any code that needs a browser operation should call CdpConnection::command
through a higher-level session method rather than adding protocol policy to the
transport actor.
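The id allocation and response routing listed above can be sketched without a websocket. `Router` is a hypothetical, single-threaded stand-in for the command actor: it uses `std::sync::mpsc` channels for waiters and collects events in a `Vec` where the real transport broadcasts them.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

struct Router {
    next_id: u64,
    pending: HashMap<u64, Sender<String>>,
    outbox: Vec<String>,
    events: Vec<String>,
}

impl Router {
    fn new() -> Self {
        Router { next_id: 1, pending: HashMap::new(), outbox: Vec::new(), events: Vec::new() }
    }

    /// Allocate the next id, queue the frame, and register a waiter for the reply.
    fn send_command(&mut self, method: &str) -> (u64, Receiver<String>) {
        let id = self.next_id;
        self.next_id += 1; // ids only ever increase
        let (tx, rx) = channel();
        self.pending.insert(id, tx);
        self.outbox.push(format!("{{\"id\":{id},\"method\":\"{method}\"}}"));
        (id, rx)
    }

    /// Route an incoming message: `Some(id)` is a command response,
    /// `None` is an unsolicited event.
    fn on_message(&mut self, id: Option<u64>, payload: &str) {
        match id {
            Some(id) => {
                // Responses for unknown ids are ignored in this sketch.
                if let Some(tx) = self.pending.remove(&id) {
                    let _ = tx.send(payload.to_string());
                }
            }
            None => self.events.push(payload.to_string()),
        }
    }
}

fn main() {
    let mut router = Router::new();
    let (first, first_rx) = router.send_command("Page.navigate");
    let (second, second_rx) = router.send_command("Runtime.evaluate");
    // Responses may arrive out of order; routing is by id, not arrival order.
    router.on_message(Some(second), "evaluate result");
    router.on_message(None, "Page.loadEventFired");
    router.on_message(Some(first), "navigate result");
    println!("{:?} {:?}", first_rx.try_recv(), second_rx.try_recv());
    println!("events: {:?}", router.events);
}
```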
dom.rs owns the model-visible browser-state contract. It indexes elements
with injected JavaScript, joins accessibility metadata, parses page stats and
bounds, merges iframe target states, and detects pagination affordances.
This module intentionally emits compact state:
- numbered selector maps;
- useful names, roles, text, values, attributes, and bounds;
- compact accessibility names/descriptions and state/value properties;
- target ids for iframe and stale-node fallback;
- eval tree data needed by evaluator-style prompts.
It should not expose raw DOMSnapshot or Accessibility trees to normal agent
prompts. Add raw payloads only behind an explicit diagnostic or conformance
surface.
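To make "compact state" concrete, here is a minimal sketch of turning raw nodes into a numbered selector map and a short text rendering. `RawNode`, `CompactElement`, the 1-based numbering, and the rendered line format are all assumptions for illustration; the real shapes live in `browser-use-dom`.

```rust
use std::collections::BTreeMap;

/// Hypothetical raw node as it might come out of DOM indexing.
#[derive(Debug, Clone)]
struct RawNode {
    tag: String,
    role: Option<String>,
    name: String,
    interactive: bool,
}

/// Hypothetical compact element: only what a prompt needs.
#[derive(Debug, Clone, PartialEq)]
struct CompactElement {
    tag: String,
    role: Option<String>,
    name: String,
}

/// Keep interactive nodes only and number them in document order.
fn build_selector_map(nodes: &[RawNode]) -> BTreeMap<u32, CompactElement> {
    let mut map = BTreeMap::new();
    let mut next_index = 1u32; // assumption: indices start at 1
    for node in nodes.iter().filter(|node| node.interactive) {
        map.insert(
            next_index,
            CompactElement { tag: node.tag.clone(), role: node.role.clone(), name: node.name.clone() },
        );
        next_index += 1;
    }
    map
}

/// Render one short line per indexed element.
fn render(map: &BTreeMap<u32, CompactElement>) -> String {
    map.iter()
        .map(|(index, element)| {
            let role = element.role.as_deref().unwrap_or("-");
            format!("[{}]<{} role={}>{}</{}>", index, element.tag, role, element.name, element.tag)
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn sample_nodes() -> Vec<RawNode> {
    vec![
        RawNode { tag: "div".into(), role: None, name: "layout".into(), interactive: false },
        RawNode { tag: "button".into(), role: Some("button".into()), name: "Submit".into(), interactive: true },
        RawNode { tag: "a".into(), role: None, name: "Home".into(), interactive: true },
    ]
}

fn main() {
    let map = build_selector_map(&sample_nodes());
    println!("{}", render(&map));
}
```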
watchdog.rs owns asynchronous browser safety and observability tasks:
- lifecycle event collection and bounded publication;
- websocket closed/reconnecting/reconnected/failure events;
- target crash and JavaScript dialog handling;
- network request timeout diagnostics;
- download start/progress/completion mapping;
- auto-PDF response-body capture;
- URL-policy reset/close actions for blocked current tabs and popups.
Watchdogs are owned by CdpBrowserSession; dropping the session aborts their
tasks. Watchdog diagnostics are available through lifecycle subscriptions and
selected state fields, but they are not added to normal agent answers.
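The two properties stated here, bounded buffers and abort on ownership drop, can be sketched with the standard library alone. `BoundedEvents` and `Watchdog` are hypothetical; a plain thread with a stop flag stands in for whatever task runtime the crate actually uses.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Ring-style buffer: once full, the oldest event is discarded and counted.
struct BoundedEvents {
    cap: usize,
    items: VecDeque<String>,
    dropped: u64,
}

impl BoundedEvents {
    fn new(cap: usize) -> Self {
        BoundedEvents { cap: cap.max(1), items: VecDeque::new(), dropped: 0 }
    }

    fn push(&mut self, event: String) {
        if self.items.len() == self.cap {
            self.items.pop_front();
            self.dropped += 1;
        }
        self.items.push_back(event);
    }
}

/// Owns a background worker; dropping the owner stops and joins it.
struct Watchdog {
    stop: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
    events: Arc<Mutex<BoundedEvents>>,
}

impl Watchdog {
    fn spawn(cap: usize) -> Self {
        let stop = Arc::new(AtomicBool::new(false));
        let events = Arc::new(Mutex::new(BoundedEvents::new(cap)));
        let (worker_stop, worker_events) = (stop.clone(), events.clone());
        let handle = thread::spawn(move || {
            let mut tick = 0u64;
            while !worker_stop.load(Ordering::Relaxed) {
                worker_events.lock().unwrap().push(format!("tick {tick}"));
                tick += 1;
                thread::sleep(Duration::from_millis(1));
            }
        });
        Watchdog { stop, handle: Some(handle), events }
    }
}

impl Drop for Watchdog {
    fn drop(&mut self) {
        // Signal the worker, then wait so no task outlives its owner.
        self.stop.store(true, Ordering::Relaxed);
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}

fn main() {
    let watchdog = Watchdog::spawn(3);
    let events = watchdog.events.clone();
    thread::sleep(Duration::from_millis(20));
    drop(watchdog); // stops and joins the worker
    let buffer = events.lock().unwrap();
    println!("kept {} events, discarded {}", buffer.items.len(), buffer.dropped);
}
```

Counting discarded events keeps the buffer bounded without hiding that something was lost, which fits the requirement that background work stay observable.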
policy.rs owns the pure URL decision logic used by both the session boundary
and the security watchdog. Session code may record and surface policy failures,
but allowlist/prohibit matching and IP canonicalization should stay in
policy.rs.
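Because the decision logic is pure, it can be expressed as a function from a host to a decision. The sketch below is an assumption-heavy illustration: `UrlPolicy`, `Decision`, the `*.` wildcard form, the rule that `*.example.com` also matches `example.com`, and the ordering (IP check, then prohibit list, then allowlist) are not taken from the crate or from upstream.

```rust
use std::net::IpAddr;

#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Block(&'static str),
}

struct UrlPolicy {
    allowed: Vec<String>,
    prohibited: Vec<String>,
    block_ip_addresses: bool,
}

/// Match an exact host, or a `*.suffix` pattern covering the suffix and its subdomains.
fn host_matches(pattern: &str, host: &str) -> bool {
    match pattern.strip_prefix("*.") {
        Some(suffix) => host == suffix || host.ends_with(&format!(".{suffix}")),
        None => host == pattern,
    }
}

impl UrlPolicy {
    /// Pure decision: no I/O, no session state.
    fn check_host(&self, host: &str) -> Decision {
        let host = host.to_ascii_lowercase();
        if self.block_ip_addresses && host.parse::<IpAddr>().is_ok() {
            return Decision::Block("ip address");
        }
        if self.prohibited.iter().any(|pattern| host_matches(pattern, &host)) {
            return Decision::Block("prohibited domain");
        }
        // An empty allowlist means "no allowlist restriction" in this sketch.
        if !self.allowed.is_empty() && !self.allowed.iter().any(|pattern| host_matches(pattern, &host)) {
            return Decision::Block("not in allowlist");
        }
        Decision::Allow
    }
}

fn sample_policy() -> UrlPolicy {
    UrlPolicy {
        allowed: vec!["*.example.com".to_string()],
        prohibited: vec!["ads.example.com".to_string()],
        block_ip_addresses: true,
    }
}

fn main() {
    let policy = sample_policy();
    for host in ["docs.example.com", "ads.example.com", "127.0.0.1", "evil.test"] {
        println!("{host}: {:?}", policy.check_host(host));
    }
}
```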
runtime.rs and input.rs own protocol value shaping for JavaScript
evaluation and keyboard events. Session methods choose when to evaluate or
dispatch; these modules decide how CDP payloads and responses are represented.
types.rs owns DTOs that are shared across CDP submodules or exported publicly.
Keep serde compatibility helpers next to the DTOs they shape, then re-export the
public API from lib.rs.
target.rs owns operations that address browser/page targets before a concrete
agent action runs: choosing a usable page, attaching target sessions, creating
new tabs, applying viewport emulation, granting browser permissions, enabling
download events, and resolving full/short tab ids. Session methods decide when
those operations are needed; target helpers decide the CDP payloads and target
selection rules.
tests.rs is a private child module of lib.rs, not an integration-test crate,
so it can exercise private CDP helpers while keeping the implementation root
readable. Keep browser-use compatibility fixtures and ignored live-Chrome
conformance tests there unless a test naturally belongs beside a smaller module.
profile.rs translates user-facing browser profile options into launch plans
and cloud/local endpoints. It should not know about live target state after a
session starts.
recording.rs observes CDP events and writes optional artifacts. Recording
failures become lifecycle diagnostics; they should not change browser action
semantics unless a required CDP command fails.
storage.rs owns the browser storage-state contract. Session methods may ask it
to read, write, or apply state, but cookie/origin normalization should stay in
that module.
browser-use-dom is intentionally data-focused. It defines:
- `SerializedDomState`;
- `DomElementRef`;
- `DomEvalNode`;
- `DomInteractedElement` and rematch metadata;
- `BrowserStateSummary`;
- page stats, page info, tabs, and pagination DTOs;
- compact text rendering for prompts and history.
Core and CDP both depend on these types. DOM state should stay serializable and small enough to be used in prompt fixtures.
browser-use-tools defines the browser action enum and action parameter DTOs:
navigation, search, click, input, scroll, keyboard, tabs, upload, screenshot,
PDF, extraction, file operations, dropdowns, JavaScript evaluation, and done.
Action names and schema shape are part of the upstream compatibility contract.
Runtime semantics are split: schema lives here, prompt/exclusion handling lives
in browser-use-core::prompt, and browser side effects live behind the
BrowserSession implementation.
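A small sketch of what "action names are part of the contract" means in practice: a typed enum with a stable name per variant, pinned by a test. The variants and the wire names below are illustrative only; they are not copied from the upstream schema or from `browser-use-tools`.

```rust
/// Hypothetical subset of a browser action enum.
#[derive(Debug, Clone, PartialEq)]
enum BrowserAction {
    Navigate { url: String },
    Click { index: u32 },
    Input { index: u32, text: String },
    Done { success: bool, text: String },
}

impl BrowserAction {
    /// Stable wire name. Renaming one of these is a contract change.
    fn name(&self) -> &'static str {
        match self {
            BrowserAction::Navigate { .. } => "navigate",
            BrowserAction::Click { .. } => "click",
            BrowserAction::Input { .. } => "input",
            BrowserAction::Done { .. } => "done",
        }
    }

    /// Index of the DOM element the action targets, if any.
    fn element_index(&self) -> Option<u32> {
        match self {
            BrowserAction::Click { index } | BrowserAction::Input { index, .. } => Some(*index),
            _ => None,
        }
    }
}

fn sample_actions() -> Vec<BrowserAction> {
    vec![
        BrowserAction::Navigate { url: "https://example.com".into() },
        BrowserAction::Click { index: 3 },
        BrowserAction::Input { index: 4, text: "hello".into() },
        BrowserAction::Done { success: true, text: "finished".into() },
    ]
}

fn main() {
    for action in sample_actions() {
        println!("{} -> {:?}", action.name(), action.element_index());
    }
}
```

An exhaustive `match` means adding a variant fails to compile until its name is chosen, so a new action cannot ship without a deliberate naming decision.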
browser-use-llm exposes ChatModel and provider-neutral message/request/completion
types. Provider modules translate that contract into concrete APIs.
Provider-specific structured-output quirks belong here. The agent should not know whether a model needs forced tool use, schema sanitization, prompt-only fallback, wrapped-JSON recovery, or a provider-specific endpoint.
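One of the quirks named above, wrapped-JSON recovery, is easy to picture: some models wrap the JSON object in a Markdown fence or surrounding prose. The helper below is a hypothetical, deliberately naive sketch. It slices from the first `{` to the last `}` and does not validate that the result is well-formed JSON.

```rust
/// Return the outermost `{ ... }` span of a model reply, if there is one.
fn recover_json_object(raw: &str) -> Option<&str> {
    let start = raw.find('{')?;
    let end = raw.rfind('}')?;
    if end < start {
        return None;
    }
    // Both delimiters are single-byte ASCII, so slicing at these offsets is safe.
    Some(&raw[start..=end])
}

fn main() {
    let fenced = "```json\n{\"action\": []}\n```";
    let chatty = "Here is the plan: {\"action\": [{\"done\": {}}]} Hope that helps.";
    println!("{:?}", recover_json_object(fenced));
    println!("{:?}", recover_json_object(chatty));
    println!("{:?}", recover_json_object("no json here"));
}
```

Keeping a helper like this inside the provider layer means the agent only ever sees either a parsed output or a provider error.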
browser-use-cli is both a human entrypoint and the owner of local persistent
session process state. browser-use-mcp exposes the same operations as MCP
tools. These front doors should avoid duplicating core agent or CDP session
logic; they adapt IO, persistence, and error presentation.
browser-use-cli/src/tests.rs is a private child module of the CLI entrypoint.
It keeps CLI parser, daemon, HTTP, session-store, and MCP bridge regressions out
of main.rs while still testing private adapter helpers directly.
browser-use-conformance and the tests embedded in each crate hold parity
fixtures. When in doubt, add a fixture or regression test before changing a
contract.
The root modules re-export public types to avoid breaking downstream users as
the implementation gets more idiomatic internally. New modules may expose
pub(crate) helpers for cross-module tests or crate-internal collaboration,
but new public exports should be deliberate.
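The re-export pattern described here looks roughly like the following single-file sketch. The module layout, `Agent::new`, and the `default_max_steps` helper with its value of 100 are hypothetical; only the `pub use` and `pub(crate)` mechanics are the point.

```rust
mod agent {
    /// Public type, reachable by downstream users through the root re-export.
    #[derive(Debug)]
    pub struct Agent {
        pub max_steps: usize,
    }

    /// Crate-internal seam: visible to sibling modules and tests, not to users.
    pub(crate) fn default_max_steps() -> usize {
        100 // hypothetical default, for illustration only
    }

    impl Agent {
        pub fn new() -> Self {
            Agent { max_steps: default_max_steps() }
        }
    }
}

// Root re-export: callers name `Agent`, not `agent::Agent`, so the
// internal module layout can change without breaking them.
pub use agent::Agent;

fn main() {
    let agent = Agent::new();
    println!("{:?}", agent);
}
```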
Compatibility-sensitive public exports include:
- core agent types and settings;
- action schemas;
- browser profile and lifecycle DTOs;
- `BrowserSession`;
- provider request/response DTOs;
- CLI and MCP JSON shapes.
- New prompt text or schema: `browser-use-core/src/prompt.rs`.
- New agent control-flow guard: `browser-use-core/src/agent.rs`.
- New browser action schema: `browser-use-tools/src/lib.rs`.
- New browser action execution: `browser-use-core/src/executor.rs` and the `BrowserSession` trait implementation in `browser-use-cdp/src/lib.rs`.
- New DOM state field: `browser-use-dom/src/lib.rs` plus parser/rendering in `browser-use-cdp/src/dom.rs` and prompt use in `browser-use-core/src/prompt.rs`.
- New CDP command transport behavior: `browser-use-cdp/src/transport.rs`.
- New browser safety/lifecycle behavior: `browser-use-cdp/src/watchdog.rs`.
- New URL policy behavior: `browser-use-cdp/src/policy.rs`.
- New Runtime.evaluate or keyboard payload behavior: `browser-use-cdp/src/runtime.rs` or `browser-use-cdp/src/input.rs`.
- New page target, viewport emulation, browser permission, or download-event setup behavior: `browser-use-cdp/src/target.rs`.
- New CDP compatibility/conformance regression that needs private helper access: `browser-use-cdp/src/tests.rs`.
- New profile/cloud launch behavior: `browser-use-cdp/src/profile.rs` or `browser-use-cdp/src/cloud.rs`.
- New artifact or storage behavior: `browser-use-cdp/src/recording.rs` or `browser-use-cdp/src/storage.rs`.
- New provider behavior: `browser-use-llm/src/lib.rs`.
- New CLI/MCP surface: `browser-use-cli/src/main.rs` and `browser-use-mcp/src/lib.rs`.
- New CDP public DTO or serde-helper behavior: `browser-use-cdp/src/types.rs`.

Use the narrowest useful command while developing, then run the full gate before shipping:

```bash
cargo fmt --all --check
cargo clippy --workspace --all-targets -- -D warnings
cargo test --workspace
cargo test -p browser-use-cdp -- --ignored
python3 scripts/release-version.py --check
python3 scripts/release-version.py --self-test
python3 scripts/upstream-drift.py --self-test
```

Ignored CDP tests require Chrome or Chromium. They are the live conformance check for DOM indexing, target fallback, URL policy watchdogs, and browser actions.

The port deliberately avoids:

- A literal Python class hierarchy.
- Raw CDP payloads in normal prompts or agent replies.
- Provider-specific shortcuts in core agent logic.
- Hidden unbounded watchers.
- Silent upstream drift.