This release targets:
`browser-use/browser-use@157779338afdcc03023010ec3c24ad63d820453c`
- Local Chrome/Chromium launch and CDP attach, including upstream-style
`--proxy-server` and `--proxy-bypass-list` launch flags from
`BrowserProfile.proxy`, `BrowserProfile.disable_security` insecure-content and
certificate flags, and `BrowserProfile.deterministic_rendering`
screenshot-stability flags, plus `BrowserProfile.user_agent` to emit typed
`--user-agent` launch flags and `BrowserProfile.profile_directory` to emit
persistent-profile `--profile-directory` flags alongside `--user-data-dir`.
`BrowserProfile.chromium_sandbox=false` emits upstream no-sandbox/container
Chrome flags for CI and Docker-style launches, and explicit
`BrowserProfile.window_size` values plus default or explicit `window_position`
values emit typed `--window-size` and `--window-position` launch geometry
flags. The default profile emits upstream's `--window-position=0,0` origin
position. `BrowserProfile.screen` can supply the launch window-size fallback,
while `BrowserProfile.viewport`, `no_viewport`, and `device_scale_factor`
control CDP `Emulation.setDeviceMetricsOverride` on initial attach, new-tab
creation, tab switch, stale-session reattach, and fallback attach after closing
the focused tab. `no_viewport=true` keeps launch window sizing but skips the
CDP device-metrics override, and launch planning rejects the upstream-invalid
`headless=true` plus `no_viewport=true` combination before spawning Chrome.
`BrowserProfile.highlight_elements` defaults to `true`; indexed click/input
actions and coordinate clicks inject non-fatal temporary browser-side
interaction highlights using upstream-default color and duration settings.
Setting `highlight_elements=false` disables those markers while preserving the
underlying action behavior. `BrowserProfile.dom_highlight_elements` defaults to
`false`; when enabled, state capture refreshes a non-fatal debug overlay for
the current selector map and honors `filter_highlight_ids`.
`BrowserProfile.keep_alive=true` detaches the locally launched child process
from Rust session ownership so dropping a one-shot `CdpBrowserSession` leaves
Chrome running for reuse, while explicit `close_browser()` still sends
`Browser.close` over CDP.
Omitted/null/false `keep_alive` preserves the previous owned-child drop
behavior. Direct CDP and cloud sessions do not own a local child process and
are unaffected. `BrowserProfile.minimum_wait_page_load_time` and
`BrowserProfile.wait_for_network_idle_page_load_time` default to upstream's
`0.25` and `0.5` second settle waits before browser-state capture and after
successful navigation. Setting either value to `0` opts out of that wait, and
negative or non-finite values are rejected during profile deserialization.
`BrowserProfile.devtools` emits `--auto-open-devtools-for-tabs` for headful
launches and rejects the upstream-invalid `headless=true` plus `devtools=true`
combination before spawning Chrome. `BrowserProfile.env` carries launch-process
environment overrides into the local Chrome command; serde accepts
upstream-style string, number, and boolean JSON env values and coerces them to
the process strings Chrome receives. Launch plans include the frozen upstream
`CHROME_DEFAULT_ARGS` baseline, `BrowserProfile.ignore_default_args` list/true
suppression, upstream-style merged `--disable-features` values, and last-wins
switch de-dupe so raw caller args can override typed generated switches.
`BrowserProfile.permissions` defaults to upstream's `clipboardReadWrite` and
`notifications` permissions, sends root CDP `Browser.grantPermissions` before
target attach/create for launched and directly connected sessions, skips empty
lists, and records non-fatal lifecycle diagnostics when Chrome rejects the
grant. `BrowserProfile.headers` round-trip through serde and are sent on CDP
websocket handshakes, including reconnect attempts, for remote browser/proxy
endpoints that require connection-level authentication.
`BrowserProfile.channel` round-trips upstream browser channel strings and
constrains local executable resolution to channel-specific candidates when no
explicit `executable_path` or `BROWSER_USE_CHROME` override is supplied.
Upstream `browser_binary_path` and `chrome_binary_path` aliases deserialize
into the same canonical `executable_path` field.
`BrowserProfile.record_har_content=embed`, `record_har_mode=full`, and unset
`record_har_path` defaults; `save_har_path` deserializes into canonical
`record_har_path`. Configured direct-CDP sessions record HTTPS
request/response/loading events into a HAR 1.2 file on best-effort
`close_browser()` flush, including `full` vs `minimal` filtering and
`omit`/`embed`/`attach` body representation.
- Video recording runtime support for
`record_video_dir` with upstream `save_recording_path` alias, optional
`record_video_size`, and `record_video_framerate=30`. Configured direct-CDP
sessions start PNG screencast capture, acknowledge frames, switch capture on
focused-target changes, and write MP4 by default through a runtime `ffmpeg`
encoder during best-effort `close_browser()` flush.
`record_video_format=webm|gif` selects WebM or the dependency-light GIF path,
and MP4/WebM encoder failures record a browser diagnostic before falling back
to GIF.
- Trace path configuration parity for
`traces_dir` with upstream `trace_path` alias. At the frozen upstream target,
that field is described as a Playwright trace zip directory but no browser
trace watchdog is wired, so the Rust port exposes an explicit direct-CDP JSON
boundary instead of emulating a Playwright `trace.zip`. Configured direct-CDP
sessions write a best-effort close-time JSON trace artifact with schema
`browser-use-rs.trace.v1`, kind `browser-use-rs.cdp_json_trace`,
`runtime="direct_cdp"`, and `playwright_trace_zip=false`. The artifact contains
lifecycle events, security diagnostics, current target ids, and the last cached
DOM state. Trace write failures record a browser diagnostic without failing
normal close, and trace artifact paths/kinds/metadata stay out of normal
browser state, action, and agent replies.
- Browser Use Cloud creation and stop request/response contracts, including
`BROWSER_USE_API_KEY`/explicit-key client support, `cloud_auth.json` API-token
fallback, 30-second request timeout, extra request headers merged after default
auth/content-type headers, current-session tracking after create, explicit or
current-session stop requests, auth errors, missing-session errors,
action-specific create/stop Cloud error context, current-session cleanup on
successful stop or 404, conversion of `cdpUrl` responses into CDP endpoints,
and upstream-compatible omitted/null/country proxy-country serialization.
- Browser profile URL access policies for explicit navigation, including
allowed/prohibited domain patterns, allowed-domain precedence, internal
browser URL allowances, data/blob URL allowances, authentication-bypass
resistance, and optional IP-address blocking that canonicalizes browser-resolvable
decimal, hex, octal, short-form, percent-encoded, Unicode-normalized, and
IDNA-dot IPv4 hosts before classification, plus post-navigation redirect
checks, blocked-navigation preflight diagnostics, navigation-capable
action-boundary checks, newly observed tab closure for disallowed URLs, and
event-driven target/frame navigation watchdog enforcement while a session is
active. CDP sessions expose bounded
`BrowserLifecycleEvent` diagnostics for browser connect/close, target
create/switch/close, navigation start/complete, navigation failure/timeout,
target crash, URL-policy block/reset/popup outcomes, reconnect, non-fatal
browser diagnostics such as permission-grant failures, JavaScript dialog,
sanitized download filenames, and storage-state event shapes.
`BrowserProfile.accept_downloads` defaults to `true`; accepted sessions use an
explicit `downloads_path` or a session-owned temporary directory to enable
browser download behavior and CDP download lifecycle events. Upstream
`downloads_dir` and `save_downloads_path` aliases deserialize into the same
canonical `downloads_path` field and serialize back as `downloads_path`.
Setting `accept_downloads=false` skips CDP download setup and PDF auto-download
writes even when `downloads_path` is configured. Page-controlled download
filenames from CDP events are reduced to safe basenames and containment helpers
reject paths outside the effective downloads directory model.
`BrowserProfile.auto_download_pdfs` defaults to `true`; when downloads are
accepted, direct PDF viewer URLs are downloaded once per session into the
effective downloads directory with safe filenames and `auto_download=true`
lifecycle metadata. The direct-CDP path uses `Network.responseReceived`
metadata and `Network.getResponseBody` bytes where Chrome exposes them,
including content-disposition filenames, before falling back to conservative
direct-URL downloads. Explicit `auto_download_pdfs=false` skips the PDF
auto-download path while preserving normal browser download events.
`storage_state_path` loads and saves browser cookie plus attached frame-tree
origin local/session storage state with storage lifecycle events. CDP websocket
closure records a browser-stopped lifecycle diagnostic, and unexpected
websocket drops trigger bounded actor-level reconnect attempts with
reconnecting/reconnected/failure lifecycle diagnostics.
Registered CDP target sessions are invalidated after reconnect so stale
session-scoped commands fail locally with a clear reattach error, and the
current target is reattached automatically on the next session access when
Chrome still exposes it. `BrowserProfile.navigation_timeout_ms` bounds direct
`Page.navigate` calls and records network-timeout lifecycle diagnostics on
timeout. `network_request_timeout_ms` records lifecycle diagnostics for HTTP(S)
requests that remain active beyond the watchdog budget. Page-load settle waits
share the same CDP network-event stream without adding those per-request
details to normal lifecycle output.
- Browser state with URL, title, tabs plus browser-use-style short tab ids,
screenshots, page metrics, compact DOM state, element bounds, open
shadow-root indexing, same-origin iframe tag and content indexing, scrollable
element metadata, Chrome OOPIF cross-origin iframe target content indexing
and cached-node actions.
`BrowserProfile.cross_origin_iframes`, `max_iframes`, and `max_iframe_depth`
use upstream-compatible defaults, preserve parent iframe elements when
cross-origin traversal is disabled, cap same-origin iframe-document traversal
inside the injected snapshot script, and cap direct Chrome OOPIF target fanout
before CDP attach. Nested OOPIF offset stitching remains bounded to attached
target pages Chrome exposes to this direct-CDP path; the port does not infer
deeper offsets from browser profile internals. Automation-friendly
data/ARIA/value attributes, native boolean/read-only state, validation
patterns, `data-state`, static history-matching attributes, accessibility-tree
role/name/description/state/value enrichment with compact
`ax_name`/`ax_description` metadata and backend/frontend node ids, AX
hidden/disabled suppression, hidden-element and `data-browser-use-exclude`
subtree filtering, upstream-default `BrowserProfile.paint_order_filtering=true`
topmost/occlusion filtering with an explicit false opt-out, hidden file-input
upload targets, plain scroll-container indexing, non-content tag pruning,
prompt-visible pages-above/below context for indexed scroll containers,
href-less anchor tags, accessible names from labels, ARIA references, image alt
text, selected dropdown values, compound control metadata, compact select
option summaries, common ARIA widget roles, search affordance signals, small
icon controls, tabindex-backed controls including `tabindex="-1"`, ARIA
required/autocomplete/keyshortcut interactivity signals with prompt-visible
`keyshortcuts`, quiet AX focusable/editable/settable metadata, AX-shaped
numeric value aliases, human-readable value text, contenteditable editor
variants, media control compounds, duplicate long-attribute pruning, input
mask/autocomplete/date-format datepicker hints, live-region, hierarchy, and
multiselect state aliases, JavaScript click/pointer listener-backed controls,
cursor-pointer controls, decorative SVG child pruning, static mouse/keyboard
handler
attributes, contained duplicate-descendant pruning for action containers,
pagination affordance detection, configurable prompt-visible attributes, the
upstream empty-DOM load hint, and a CDP-populated tree-shaped eval/judge DOM
representation with backend-node interactive markers, shadow-root markers,
iframe-content markers, compact key attributes, scroll context, and collapsed
SVG contents.
- Built-in actions for search, navigate, back navigation, 4-character tab-id
switch/close, click, coordinate click, input, page or indexed element scroll,
wait, text-target scroll, browser JavaScript evaluation, screenshot, native and
ARIA dropdown options/selection, keyboard text/special-key/shortcut events,
file upload with upstream-style agent availability checks and managed
`FileSystem` basename containment for traversal-like relative paths, local
text-file read/write/replace with upstream-style CSV row normalization and
relative filename sanitization, page-aware PDF read envelopes, PDF/DOCX
write/append artifacts with paginated PDF text layout, and
append-only-to-existing-file semantics, upstream-aligned binary/image extension
rejection, DOCX text extraction, PNG/JPEG image-file reads with one-shot image
prompt parts, PDF capture, LLM-backed agent extraction with raw direct-executor
extraction envelopes preserved, page search, element lookup across Chrome OOPIF
iframe targets, cached observed-node click/input/scroll/dropdown/upload
resolution, target-aware stale-node fallback for cached iframe actions, and
done. `screenshot` is model-facing only in upstream-style auto vision mode.
Default vision still includes screenshots in normal observations, disabled
vision never requests screenshots, and auto mode requests the next screenshot
after the model chooses `screenshot`. The action writes a local `.png` file
with an attachment path when `file_name` is supplied. `save_as_pdf` writes a
local PDF file, appends `.pdf` when missing, derives a safe page-title filename
when omitted, avoids overwriting existing files, and returns the saved file as
an attachment. `done.files_to_display` appends readable requested text files to
the final result and returns their attachment paths.
- Managed
`FileSystem` state with a `browseruse_agent_data` sandbox directory, default
`todo.md`, file listing/display, extract-content numbering,
serialization/restoration, nuke, and disk sync for text, CSV, PDF, and DOCX
artifacts. Executor-owned relative file actions and `done.files_to_display`
route through that sandbox while absolute external paths bypass it. Agent
prompts include upstream-style `<file_system>` and `<todo_contents>` context,
and large extract results can spill into managed `extracted_content_N.md`
files. Agent-level `extraction_schema` supplies the default structured schema
for LLM-backed extract actions that do not provide their own `output_schema`.
Restored agents can continue from serialized `FileSystemState` with
prompt-visible todo/file context, restored `read_file` behavior, and
extracted-content numbering that survives replay.
`AgentSettings.file_system_path` and the CLI `--file-system-path` flag place
the managed filesystem under a caller-selected base directory while preserving
the `browseruse_agent_data` subdirectory contract.
- The upstream MCP allowed-domains advisory introduced before this target is not
applicable to the current Rust MCP surface:
`browser-use-rs` does not expose the upstream profile-merge retry tool that
introduced that Python-side allowed-domain handling path.
- Browser-aware action sequencing that stops on errors, done, explicit
terminating actions, and URL changes after browser actions.
- Agent runs with schema-guided provider output, upstream-style initial actions,
`directly_open_url` task URL extraction and step-zero navigation, max actions
per step with upstream-style truncation, sync and async new-step/done
callbacks, callback-driven stop checks, explicit programmatic stop with
reasoned stop errors, max steps, max failures, step and LLM timeouts,
upstream-style wait-between-actions delays, upstream-style per-action
wall-clock timeout guard with
`BROWSER_USE_ACTION_TIMEOUT_S`/`action_timeout_seconds`, caller-supplied task
identity with checkpoint restore continuity and follow-up task reuse, validated
`llm_screenshot_size` prompt-only PNG resizing with coordinate-click scaling
back to the observed viewport, upstream-style long URL shortening for
user/assistant prompt text with recursive restoration before action
execution/history, upstream-style fallback LLM switching for retryable main
model-output provider/rate-limit failures, upstream-style final `done`
responses after repeated failures, upstream-style final-step done-only guard
when `max_steps` is reached, upstream-style 75% step-budget warning before
finalization, normalized repeated-action loop detection, previous result
context, vision-aware screenshot capture and image prompt parts, upstream-style
`sample_images` prompt parts before screenshots, screenshot action
next-observation image overrides, action-result image prompt parts,
upstream-style page-stat prompt context with loading/skeleton hints, one-time
extraction replay handling, invalid model-output recovery, loop-awareness
prompt nudges, upstream flattened planning fields, configurable planning prompt
nudges, per-step timing metadata, upstream-style excluded-action schema
controls and pre-execution enforcement, opt-in recent browser events,
upstream-compatible `true`/`false`/`auto` vision modes with auto-only
screenshot action gating, vision detail levels, upstream-style `done`
file-display controls, thinking/flash output-schema controls, upstream-style
flattened required output fields, upstream-style prompt-history inclusion and
limits, clickable-element text limits, upstream-style one-time read-state
prompt blocks, upstream-style tagged agent-history/agent-state/browser-state
prompt sections, upstream-style available-file-path and sensitive-data
placeholder context with `bu_2fa_code` TOTP generation, system-message
override/extension controls, upstream-style prompt context/error truncation,
typed upstream-style last-result completion helpers, upstream-compatible
action-result success validation, judgement results with dedicated judge LLM
routing, runtime `generate_gif` GIF artifact output from recorded screenshots,
provider token usage/cost summaries for `calculate_cost`, the upstream no-op
`include_tool_call_examples` setting, and step-error, model-output,
model-action, thought, duration, model-action and truncated action-history
interacted-element metadata for indexed actions, explicit replay action
rematching for historical indexed actions, rematched replay plan construction
from saved `AgentHistory`, replay-plan execution through generic and
browser-backed action executors with per-action, error, and page-change
diagnostics, current-state `AgentHistoryReplayRun` orchestration, serialized
replay-run and replay-recapture conformance coverage, and screenshot/URL
accessors. `AgentCheckpoint` export/resume preserves task identity, task
settings, history, initial-action execution state, pause/stop state, and
managed filesystem state across a new model/session. Agents expose pause/resume
control that returns an explicit paused error before model or browser work
until resumed, and `add_new_task` appends upstream-style follow-up user
requests while clearing pause/stop control state. Host runtimes can also
register external-status interruption callbacks that abort before model or
browser work without setting durable stopped state.
- Schema-guided agent extraction passes the requested schema to the extraction
LLM, returns upstream-style
`<structured_result>` content, and records structured metadata with schema,
partial status, and content statistics. Agents can route LLM-backed extract
action resolution through a dedicated page-extraction LLM while leaving normal
model-output calls on the main LLM.
- Scripted agent replay conformance fixtures for schema-guided model output,
previous-result prompt context, action execution,
`done`, serialized history, longer multi-step planning/recovery replay with
prompt-history limits and stagnant-page loop-awareness, managed
`FileSystemState` replay through restored prompts, `read_file`, todo context,
extracted-content numbering, and full `AgentCheckpoint` resume with prior
history and initial-action state, browser-backed replay recapture/rematch, plus
public browser lifecycle event and adapter JSON shapes, public
`AgentHistoryReplayRun` JSON Schema, and semantic checks for dynamic step
timing metadata. `browser-use-dom` exposes interacted-element rematching
diagnostics for exact hash, stable hash, XPath, AX-name, and unique-attribute
history replay foundations without changing live action execution.
- OpenAI-compatible Chat Completions plus DeepSeek, Groq, Cerebras, Mistral,
OpenRouter, and Vercel AI Gateway aliases, Anthropic Messages, Gemini
GenerateContent, and Ollama Chat providers with structured-output requests,
including Anthropic forced tool-use, Gemini native-schema and prompt fallback,
DeepSeek forced tool-call, Groq model-specific JSON-schema/tool-call routing,
Cerebras prompt-only, Mistral schema sanitization, Vercel model-specific prompt
fallback, OpenAI-wire output-mode override payload/parser modes, and OpenRouter
app attribution headers.
- CLI one-shot commands plus
`actions`, `replay`, and `agent` with typed settings flags including
conversation transcript saving, judge trace validation, available-file-path and
sensitive-data placeholder context, OpenAI-wire structured-output mode
overrides, system-message control, `mcp-tools`, `mcp-stdio`, and local
persistent `session` commands including `session replay`.
- MCP stdio tools for state, actions,
`AgentHistory` replay, and agent runs, including typed input/output schemas for
structured content, typed `AgentSettings`, OpenAI-wire structured-output mode
overrides, in-process session reuse by `session_id`, and reconnection to
persistent CLI session records, plus persistent record creation for new
`session_id` calls when a URL is supplied.
- MCP stdio persistent session lifecycle for start, stop, list, and cleanup,
with liveness status and conservative stale-record cleanup on session records.
- Local TCP newline-delimited JSON-RPC daemon and HTTP JSON-RPC daemon exposing
the MCP tool surface with shared in-process sessions across active
connections,
`GET /healthz`, and optional bearer/header token auth for `POST /rpc`, plus
graceful signal shutdown, supervisor pid/ready files, and packaged
systemd/launchd templates for long-lived local installs.
- Release tarballs include daemon supervision docs plus systemd and launchd
templates alongside the binary and license files. Tagged releases publish a
Linux x86_64 tarball, a macOS host-triple tarball, one
`SHA256SUMS` manifest covering all tarballs, and a generated Homebrew formula
artifact pinned to the Linux and macOS tarball checksums. When
`HOMEBREW_TAP_TOKEN` is configured, tagged releases also publish
`Formula/browser-use-rs.rb` to the EvalOps Homebrew tap. The `Release` workflow
can be manually dispatched to cut `patch`, `minor`, or `major` Cargo workspace
versions before publishing those tagged artifacts. Its push-driven `auto` mode
wakes for release candidate paths, skips release-bookkeeping and
automation-only churn, then infers `minor` only for substantial public
Rust/source-surface work while using `patch` for smaller releasable fixes,
docs, dependencies, and packaged install asset updates.
- Workspace CI for format, clippy, unit tests, schema fixtures, and conformance
fixtures.
- Upstream
`1577793` tightened Python's AF_UNIX skill daemon socket to owner-only mode.
The Rust CLI daemon currently exposes TCP/HTTP transports rather than a Unix
socket file, so that chmod fix is tracked as audited and not directly
applicable to the present Rust transport boundary.
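
The last-wins switch de-dupe mentioned in the launch-plan item above can be
illustrated with a minimal, std-only sketch. The helper names `switch_name` and
`dedupe_last_wins` are hypothetical and are not taken from the crate; the
sketch only shows the idea that a later raw caller switch replaces an earlier
typed one with the same name. It does not attempt the separate
`--disable-features` merge step.

```rust
use std::collections::HashSet;

/// Returns the switch name of a Chrome-style argument, for example
/// "--window-size" for "--window-size=1280,720".
/// Arguments that do not start with "--" are not switches and yield None.
fn switch_name(arg: &str) -> Option<&str> {
    if !arg.starts_with("--") {
        return None;
    }
    Some(arg.split_once('=').map_or(arg, |(name, _)| name))
}

/// Hypothetical helper: keeps only the last occurrence of each switch while
/// preserving the relative order of the arguments that survive.
fn dedupe_last_wins(args: &[String]) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut kept: Vec<&String> = Vec::new();
    // Walk backwards so the final occurrence of each switch is seen first.
    for arg in args.iter().rev() {
        match switch_name(arg) {
            Some(name) => {
                if seen.insert(name) {
                    kept.push(arg);
                }
            }
            // Positional arguments such as URLs are always kept.
            None => kept.push(arg),
        }
    }
    kept.reverse();
    kept.into_iter().cloned().collect()
}
```

In this sketch, generated switches would be placed first and raw caller args
appended after them, so a caller-supplied `--window-size=...` wins over the
typed one.
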
- Browser profile lifecycle support now exposes bounded public lifecycle
diagnostics for core browser/target/navigation/security transitions and
stable event shapes for reconnect, target-crash/network-timeout, JavaScript
dialog, download, and storage-state lifecycle diagnostics. Live CDP wiring now
records target crash, JavaScript dialog, navigation failure, configured
download events, cookie plus attached frame-tree origin storage-state
save/load events, explicit CDP websocket closure diagnostics, bounded
actor-level reconnect attempts, deliberate stale-session invalidation and
current-target reattach after reconnect, direct navigation timeouts, and
watchdog-style stuck HTTP(S) request timeouts.
`subscribe_lifecycle_events` exposes those diagnostics through
`BrowserLifecycleEventSubscription` with typed lag and closed-stream errors;
`BrowserLifecycleAdapterEventSubscription` maps the same stream into
upstream-style subscriber categories without adding it to normal agent replies.
Storage-state local/session storage origin discovery intentionally matches the
frozen upstream CDP boundary: `Page.getFrameTree` supplies current page plus
attached frame-tree origins, then `DOMStorage.getDOMStorageItems` reads those
origins. Unattached profile-wide local/session storage origins are not scraped
from browser profile internals.
- Raw full AX snapshots are intentionally not emitted into normal prompt or
state surfaces by default; the compact DOM carries the browser-use AX fields
needed for action selection, evaluator context, hidden/disabled suppression,
and conformance fixtures. The source-backed DOM/AX audit is recorded in
`docs/CONFORMANCE.md` with implemented action-relevant parity and explicit
raw-AX non-goals.
- Agent history now captures compact interacted-element metadata for indexed
actions and exposes current-page rematching plus action-level replay remapping
diagnostics, replay-plan construction, generic replay-plan execution, and
browser-backed replay-plan execution that honors the live URL-change guard.
Browser executors can capture the current DOM, recapture state between
non-terminating replay actions, rematch later indexed actions against the
latest DOM, and return a replay run with the captured state, plan, and guarded
execution result. The public replay-run JSON shape is pinned by replay-run and
replay-recapture conformance fixtures. Replay is exposed through the one-shot
CLI, persistent CLI sessions, and the MCP/daemon tool surface.
- CLI sessions are local registry records. Session
`status` reports registry liveness, and explicit cleanup removes stale records
while refusing to remove running sessions unless forced through normal stop
semantics; the daemon does not automatically restart stale browser processes.
- The packaged daemon service files are local user-service templates. Distro
packages, additional macOS architectures, and installer-managed secret stores
are outside the current release surface. Homebrew tap publication is wired
but requires the
`evalops/homebrew-tap` repository plus a `HOMEBREW_TAP_TOKEN` repository secret
before tagged releases publish there. Tagged releases now emit Linux and macOS
tarballs, cross-tarball checksums, and a generated Homebrew formula artifact
for the published triples.
- Provider-specific structured-output fallbacks are source-audited against the
frozen upstream target for exposed provider families: Anthropic forced
tool-use, Gemini prompt fallback, DeepSeek forced tool-call, Groq
model-specific tool-call routing, Cerebras prompt-only guidance, Mistral
schema sanitization, Vercel model-specific prompt fallback, Ollama
`format` schemas, and wrapped-JSON parsing are implemented and covered by unit
tests.
- Managed filesystem and agent checkpoint replay now cover serialized restore
into a new agent, restored prompt context, restored
`read_file`, todo context, extracted-content numbering, prior history, and
initial-action execution state.
- Package publishing is limited to GitHub release artifacts, the generated
Homebrew formula scaffold, and optional EvalOps tap publication when the tap
secret is configured.
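
The bounded lifecycle diagnostics with typed lag errors described above can be
pictured with a small std-only sketch. This is not the crate's
`BrowserLifecycleEventSubscription`; `BoundedEvents` and `RecvError` are
hypothetical names, and the sketch assumes a simple ring buffer in which a slow
reader learns how many events it missed. Closed-stream handling is left out.

```rust
use std::collections::VecDeque;

/// Hypothetical error type: either the reader fell behind and `skipped`
/// events were evicted, or there is nothing new to read yet.
#[derive(Debug, PartialEq)]
enum RecvError {
    Lagged { skipped: u64 },
    Empty,
}

/// Hypothetical bounded event log. Once `capacity` events are held, pushing
/// a new event evicts the oldest one.
struct BoundedEvents<T> {
    capacity: usize,
    events: VecDeque<T>,
    /// Sequence number of the oldest event still held in `events`.
    first_seq: u64,
}

impl<T: Clone> BoundedEvents<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, events: VecDeque::new(), first_seq: 0 }
    }

    fn push(&mut self, event: T) {
        if self.events.len() == self.capacity {
            self.events.pop_front();
            self.first_seq += 1;
        }
        self.events.push_back(event);
    }

    /// Reads the event at `*cursor` and advances the cursor. If the cursor
    /// points at an evicted event, it jumps to the oldest retained event and
    /// reports how many events were skipped.
    fn recv(&self, cursor: &mut u64) -> Result<T, RecvError> {
        if *cursor < self.first_seq {
            let skipped = self.first_seq - *cursor;
            *cursor = self.first_seq;
            return Err(RecvError::Lagged { skipped });
        }
        let index = (*cursor - self.first_seq) as usize;
        match self.events.get(index) {
            Some(event) => {
                *cursor += 1;
                Ok(event.clone())
            }
            None => Err(RecvError::Empty),
        }
    }
}
```

Reporting lag as a typed error, rather than silently dropping events, lets a
subscriber decide whether to resynchronize from current browser state.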