Release Support Matrix

This release targets:

browser-use/browser-use@157779338afdcc03023010ec3c24ad63d820453c

Supported

  • Local Chrome/Chromium launch and CDP attach, including upstream-style --proxy-server and --proxy-bypass-list launch flags from BrowserProfile.proxy, BrowserProfile.disable_security insecure-content and certificate flags, and BrowserProfile.deterministic_rendering screenshot-stability flags, plus BrowserProfile.user_agent to emit typed --user-agent launch flags and BrowserProfile.profile_directory to emit persistent-profile --profile-directory flags alongside --user-data-dir. BrowserProfile.chromium_sandbox=false emits upstream no-sandbox/container Chrome flags for CI and Docker-style launches, and explicit BrowserProfile.window_size values plus default or explicit window_position values emit typed --window-size and --window-position launch geometry flags. The default profile emits upstream's --window-position=0,0 origin position. BrowserProfile.screen can supply the launch window-size fallback, while BrowserProfile.viewport, no_viewport, and device_scale_factor control CDP Emulation.setDeviceMetricsOverride on initial attach, new-tab creation, tab switch, stale-session reattach, and fallback attach after closing the focused tab. no_viewport=true keeps launch window sizing but skips the CDP device-metrics override, and launch planning rejects the upstream-invalid headless=true plus no_viewport=true combination before spawning Chrome. BrowserProfile.highlight_elements defaults to true; indexed click/input actions and coordinate clicks inject non-fatal temporary browser-side interaction highlights using upstream-default color and duration settings. Setting highlight_elements=false disables those markers while preserving the underlying action behavior. BrowserProfile.dom_highlight_elements defaults to false; when enabled, state capture refreshes a non-fatal debug overlay for the current selector map and honors filter_highlight_ids. 
BrowserProfile.keep_alive=true detaches the locally launched child process from Rust session ownership so dropping a one-shot CdpBrowserSession leaves Chrome running for reuse, while explicit close_browser() still sends Browser.close over CDP. Omitted/null/false keep_alive preserves the previous owned-child drop behavior. Direct CDP and cloud sessions do not own a local child process and are unaffected. BrowserProfile.minimum_wait_page_load_time and BrowserProfile.wait_for_network_idle_page_load_time default to upstream's 0.25 and 0.5 second settle waits before browser-state capture and after successful navigation. Setting either value to 0 opts out of that wait, and negative or non-finite values are rejected during profile deserialization. BrowserProfile.devtools emits --auto-open-devtools-for-tabs for headful launches and rejects the upstream-invalid headless=true plus devtools=true combination before spawning Chrome. BrowserProfile.env carries launch-process environment overrides into the local Chrome command; serde accepts upstream-style string, number, and boolean JSON env values and coerces them to the process strings Chrome receives. Launch plans include the frozen upstream CHROME_DEFAULT_ARGS baseline, BrowserProfile.ignore_default_args list/true suppression, upstream-style merged --disable-features values, and last-wins switch de-dupe so raw caller args can override typed generated switches. BrowserProfile.permissions defaults to upstream's clipboardReadWrite and notifications permissions, sends root CDP Browser.grantPermissions before target attach/create for launched and directly connected sessions, skips empty lists, and records non-fatal lifecycle diagnostics when Chrome rejects the grant. BrowserProfile.headers round-trip through serde and are sent on CDP websocket handshakes, including reconnect attempts, for remote browser/proxy endpoints that require connection-level authentication. 
BrowserProfile.channel round-trips upstream browser channel strings and constrains local executable resolution to channel-specific candidates when no explicit executable_path or BROWSER_USE_CHROME override is supplied. Upstream browser_binary_path and chrome_binary_path aliases deserialize into the same canonical executable_path field.
  • HAR recording configuration, with defaults of BrowserProfile.record_har_content=embed and record_har_mode=full and with record_har_path unset by default; save_har_path deserializes into the canonical record_har_path field. Configured direct-CDP sessions record HTTPS request/response/loading events into a HAR 1.2 file on best-effort close_browser() flush, including full vs minimal filtering and omit/embed/attach body representation.
  • Video recording runtime support for record_video_dir with upstream save_recording_path alias, optional record_video_size, and record_video_framerate=30. Configured direct-CDP sessions start PNG screencast capture, acknowledge frames, switch capture on focused-target changes, and write MP4 by default through a runtime ffmpeg encoder during best-effort close_browser() flush. record_video_format=webm|gif selects WebM or the dependency-light GIF path, and MP4/WebM encoder failures record a browser diagnostic before falling back to GIF.
  • Trace path configuration parity for traces_dir with upstream trace_path alias. At the frozen upstream target, that field is described as a Playwright trace zip directory but no browser trace watchdog is wired, so the Rust port exposes an explicit direct-CDP JSON boundary instead of emulating a Playwright trace.zip. Configured direct-CDP sessions write a best-effort close-time JSON trace artifact with schema browser-use-rs.trace.v1, kind browser-use-rs.cdp_json_trace, runtime="direct_cdp", and playwright_trace_zip=false. The artifact contains lifecycle events, security diagnostics, current target ids, and the last cached DOM state. Trace write failures record a browser diagnostic without failing normal close, and trace artifact paths/kinds/metadata stay out of normal browser state, action, and agent replies.
  • Browser Use Cloud creation and stop request/response contracts, including BROWSER_USE_API_KEY/explicit-key client support, cloud_auth.json API-token fallback, 30-second request timeout, extra request headers merged after default auth/content-type headers, current-session tracking after create, explicit or current-session stop requests, auth errors, missing-session errors, action-specific create/stop Cloud error context, current-session cleanup on successful stop or 404, conversion of cdpUrl responses into CDP endpoints, and upstream-compatible omitted/null/country proxy-country serialization.
  • Browser profile URL access policies for explicit navigation, including allowed/prohibited domain patterns, allowed-domain precedence, internal browser URL allowances, data/blob URL allowances, authentication-bypass resistance, and optional IP-address blocking that canonicalizes browser-resolvable decimal, hex, octal, short-form, percent-encoded, Unicode-normalized, and IDNA-dot IPv4 hosts before classification, plus post-navigation redirect checks, blocked-navigation preflight diagnostics, navigation-capable action-boundary checks, newly observed tab closure for disallowed URLs, and event-driven target/frame navigation watchdog enforcement while a session is active. CDP sessions expose bounded BrowserLifecycleEvent diagnostics for browser connect/close, target create/switch/close, navigation start/complete, navigation failure/timeout, target crash, URL-policy block/reset/popup outcomes, reconnect, non-fatal browser diagnostics such as permission-grant failures, JavaScript dialog, sanitized download filenames, and storage-state event shapes. BrowserProfile.accept_downloads defaults to true; accepted sessions use an explicit downloads_path or a session-owned temporary directory to enable browser download behavior and CDP download lifecycle events. Upstream downloads_dir and save_downloads_path aliases deserialize into the same canonical downloads_path field and serialize back as downloads_path. Setting accept_downloads=false skips CDP download setup and PDF auto-download writes even when downloads_path is configured. Page-controlled download filenames from CDP events are reduced to safe basenames and containment helpers reject paths outside the effective downloads directory model. BrowserProfile.auto_download_pdfs defaults to true; when downloads are accepted, direct PDF viewer URLs are downloaded once per session into the effective downloads directory with safe filenames and auto_download=true lifecycle metadata. 
The direct-CDP path uses Network.responseReceived metadata and Network.getResponseBody bytes where Chrome exposes them, including content-disposition filenames, before falling back to conservative direct-URL downloads. Explicit auto_download_pdfs=false skips the PDF auto-download path while preserving normal browser download events. storage_state_path loads and saves browser cookie plus attached frame-tree origin local/session storage state with storage lifecycle events. CDP websocket closure records a browser-stopped lifecycle diagnostic, and unexpected websocket drops trigger bounded actor-level reconnect attempts with reconnecting/reconnected/failure lifecycle diagnostics. Registered CDP target sessions are invalidated after reconnect so stale session-scoped commands fail locally with a clear reattach error, and the current target is reattached automatically on the next session access when Chrome still exposes it. BrowserProfile.navigation_timeout_ms bounds direct Page.navigate calls and records network-timeout lifecycle diagnostics on timeout. network_request_timeout_ms records lifecycle diagnostics for HTTP(S) requests that remain active beyond the watchdog budget. Page-load settle waits share the same CDP network-event stream without adding those per-request details to normal lifecycle output.
  • Browser state with URL, title, tabs plus browser-use-style short tab ids, screenshots, page metrics, compact DOM state, element bounds, open shadow-root indexing, same-origin iframe tag and content indexing, scrollable element metadata, Chrome OOPIF cross-origin iframe target content indexing and cached-node actions. BrowserProfile.cross_origin_iframes, max_iframes, and max_iframe_depth use upstream-compatible defaults, preserve parent iframe elements when cross-origin traversal is disabled, cap same-origin iframe-document traversal inside the injected snapshot script, and cap direct Chrome OOPIF target fanout before CDP attach. Nested OOPIF offset stitching remains bounded to attached target pages Chrome exposes to this direct-CDP path; the port does not infer deeper offsets from browser profile internals. Automation-friendly data/ARIA/value attributes, native boolean/read-only state, validation patterns, data-state, static history-matching attributes, accessibility-tree role/name/description/state/value enrichment with compact ax_name/ax_description metadata and backend/frontend node ids, AX hidden/disabled suppression, hidden-element and data-browser-use-exclude subtree filtering, upstream-default BrowserProfile.paint_order_filtering=true topmost/occlusion filtering with an explicit false opt-out, hidden file-input upload targets, plain scroll-container indexing, non-content tag pruning, prompt-visible pages-above/below context for indexed scroll containers, href-less anchor tags, accessible names from labels, ARIA references, image alt text, selected dropdown values, compound control metadata, compact select option summaries, common ARIA widget roles, search affordance signals, small icon controls, tabindex-backed controls including tabindex="-1", ARIA required/autocomplete/keyshortcut interactivity signals with prompt-visible keyshortcuts, quiet AX focusable/editable/settable metadata, AX-shaped numeric value aliases, human-readable value text, 
contenteditable editor variants, media control compounds, duplicate long-attribute pruning, input mask/autocomplete/date-format datepicker hints, live-region, hierarchy, and multiselect state aliases, JavaScript click/pointer listener-backed controls, cursor-pointer controls, decorative SVG child pruning, static mouse/keyboard handler attributes, contained duplicate-descendant pruning for action containers, pagination affordance detection, configurable prompt-visible attributes, the upstream empty-DOM load hint, and a CDP-populated tree-shaped eval/judge DOM representation with backend-node interactive markers, shadow-root markers, iframe-content markers, compact key attributes, scroll context, and collapsed SVG contents.
  • Built-in actions for search, navigate, back navigation, 4-character tab-id switch/close, click, coordinate click, input, page or indexed element scroll, wait, text-target scroll, browser JavaScript evaluation, screenshot, native and ARIA dropdown options/selection, keyboard text/special-key/shortcut events, file upload with upstream-style agent availability checks and managed FileSystem basename containment for traversal-like relative paths, local text-file read/write/replace with upstream-style CSV row normalization and relative filename sanitization, page-aware PDF read envelopes, PDF/DOCX write/append artifacts with paginated PDF text layout and append-only-to-existing-file semantics, upstream-aligned binary/image extension rejection, DOCX text extraction, PNG/JPEG image-file reads with one-shot image prompt parts, PDF capture, LLM-backed agent extraction with raw direct-executor extraction envelopes preserved, page search, element lookup across Chrome OOPIF iframe targets, cached observed-node click/input/scroll/dropdown/upload resolution, target-aware stale-node fallback for cached iframe actions, and done.
  • screenshot is model-facing only in upstream-style auto vision mode. Default vision still includes screenshots in normal observations, disabled vision never requests screenshots, and auto mode requests the next screenshot after the model chooses screenshot. The action writes a local .png file with an attachment path when file_name is supplied.
  • save_as_pdf writes a local PDF file, appends .pdf when missing, derives a safe page-title filename when omitted, avoids overwriting existing files, and returns the saved file as an attachment.
  • done.files_to_display appends readable requested text files to the final result and returns their attachment paths.
  • Managed FileSystem state with a browseruse_agent_data sandbox directory, default todo.md, file listing/display, extract-content numbering, serialization/restoration, nuke, and disk sync for text, CSV, PDF, and DOCX artifacts. Executor-owned relative file actions and done.files_to_display route through that sandbox while absolute external paths bypass it. Agent prompts include upstream-style <file_system> and <todo_contents> context, and large extract results can spill into managed extracted_content_N.md files. Agent-level extraction_schema supplies the default structured schema for LLM-backed extract actions that do not provide their own output_schema. Restored agents can continue from serialized FileSystemState with prompt-visible todo/file context, restored read_file behavior, and extracted-content numbering that survives replay. AgentSettings.file_system_path and the CLI --file-system-path flag place the managed filesystem under a caller-selected base directory while preserving the browseruse_agent_data subdirectory contract.
  • The upstream MCP allowed-domains advisory introduced before this target is not applicable to the current Rust MCP surface: browser-use-rs does not expose the upstream profile-merge retry tool that introduced that Python-side allowed-domain handling path.
  • Browser-aware action sequencing that stops on errors, done, explicit terminating actions, and URL changes after browser actions.
  • Agent runs with schema-guided provider output, upstream-style initial actions, directly_open_url task URL extraction and step-zero navigation, max actions per step with upstream-style truncation, sync and async new-step/done callbacks, callback-driven stop checks, explicit programmatic stop with reasoned stop errors, max steps, max failures, step and LLM timeouts, upstream-style wait-between-actions delays, upstream-style per-action wall-clock timeout guard with BROWSER_USE_ACTION_TIMEOUT_S/action_timeout_seconds, caller-supplied task identity with checkpoint restore continuity and follow-up task reuse, validated llm_screenshot_size prompt-only PNG resizing with coordinate-click scaling back to the observed viewport, upstream-style long URL shortening for user/assistant prompt text with recursive restoration before action execution/history, upstream-style fallback LLM switching for retryable main model-output provider/rate-limit failures, upstream-style final done responses after repeated failures, upstream-style final-step done-only guard when max_steps is reached, upstream-style 75% step-budget warning before finalization, normalized repeated-action loop detection, previous result context, vision-aware screenshot capture and image prompt parts, upstream-style sample_images prompt parts before screenshots, screenshot action next-observation image overrides, action-result image prompt parts, upstream-style page-stat prompt context with loading/skeleton hints, one-time extraction replay handling, invalid model-output recovery, loop-awareness prompt nudges, upstream flattened planning fields, configurable planning prompt nudges, per-step timing metadata, upstream-style excluded-action schema controls and pre-execution enforcement, opt-in recent browser events, upstream-compatible true/false/auto vision modes with auto-only screenshot action gating, vision detail levels, upstream-style done file-display controls, thinking/flash output-schema 
controls, upstream-style flattened required output fields, upstream-style prompt-history inclusion and limits, clickable-element text limits, upstream-style one-time read-state prompt blocks, upstream-style tagged agent-history/agent-state/browser-state prompt sections, upstream-style available-file-path and sensitive-data placeholder context with bu_2fa_code TOTP generation, system-message override/extension controls, upstream-style prompt context/error truncation, typed upstream-style last-result completion helpers, upstream-compatible action-result success validation, judgement results with dedicated judge LLM routing, runtime generate_gif GIF artifact output from recorded screenshots, provider token usage/cost summaries for calculate_cost, the upstream no-op include_tool_call_examples setting, and step-error, model-output, model-action, thought, duration, model-action and truncated action-history interacted-element metadata for indexed actions, explicit replay action rematching for historical indexed actions, rematched replay plan construction from saved AgentHistory, replay-plan execution through generic and browser-backed action executors with per-action, error, and page-change diagnostics, current-state AgentHistoryReplayRun orchestration, serialized replay-run and replay-recapture conformance coverage, and screenshot/URL accessors. AgentCheckpoint export/resume preserves task identity, task settings, history, initial-action execution state, pause/stop state, and managed filesystem state across a new model/session. Agents expose pause/resume control that returns an explicit paused error before model or browser work until resumed, and add_new_task appends upstream-style follow-up user requests while clearing pause/stop control state. Host runtimes can also register external-status interruption callbacks that abort before model or browser work without setting durable stopped state.
  • Schema-guided agent extraction passes the requested schema to the extraction LLM, returns upstream-style <structured_result> content, and records structured metadata with schema, partial status, and content statistics. Agents can route LLM-backed extract action resolution through a dedicated page-extraction LLM while leaving normal model-output calls on the main LLM.
  • Scripted agent replay conformance fixtures for schema-guided model output, previous-result prompt context, action execution, done, serialized history, longer multi-step planning/recovery replay with prompt-history limits and stagnant-page loop-awareness, managed FileSystemState replay through restored prompts, read_file, todo context, extracted-content numbering, and full AgentCheckpoint resume with prior history and initial-action state, browser-backed replay recapture/rematch, plus public browser lifecycle event and adapter JSON shapes, public AgentHistoryReplayRun JSON Schema, and semantic checks for dynamic step timing metadata.
  • browser-use-dom exposes interacted-element rematching diagnostics for exact hash, stable hash, XPath, AX-name, and unique-attribute history replay foundations without changing live action execution.
  • OpenAI-compatible Chat Completions plus DeepSeek, Groq, Cerebras, Mistral, OpenRouter, and Vercel AI Gateway aliases, Anthropic Messages, Gemini GenerateContent, and Ollama Chat providers with structured-output requests, including Anthropic forced tool-use, Gemini native-schema and prompt fallback, DeepSeek forced tool-call, Groq model-specific JSON-schema/tool-call routing, Cerebras prompt-only, Mistral schema sanitization, Vercel model-specific prompt fallback, OpenAI-wire output-mode override payload/parser modes, and OpenRouter app attribution headers.
  • CLI one-shot commands plus actions, replay, and agent with typed settings flags including conversation transcript saving, judge trace validation, available-file-path and sensitive-data placeholder context, OpenAI-wire structured-output mode overrides, system-message control, mcp-tools, mcp-stdio, and local persistent session commands including session replay.
  • MCP stdio tools for state, actions, AgentHistory replay, and agent runs, including typed input/output schemas for structured content, typed AgentSettings, OpenAI-wire structured-output mode overrides, in-process session reuse by session_id, and reconnection to persistent CLI session records, plus persistent record creation for new session_id calls when a URL is supplied.
  • MCP stdio persistent session lifecycle for start, stop, list, and cleanup, with liveness status and conservative stale-record cleanup on session records.
  • Local TCP newline-delimited JSON-RPC daemon and HTTP JSON-RPC daemon exposing the MCP tool surface with shared in-process sessions across active connections, GET /healthz, and optional bearer/header token auth for POST /rpc, plus graceful signal shutdown, supervisor pid/ready files, and packaged systemd/launchd templates for long-lived local installs.
  • Release tarballs include daemon supervision docs plus systemd and launchd templates alongside the binary and license files. Tagged releases publish a Linux x86_64 tarball, a macOS host-triple tarball, one SHA256SUMS manifest covering all tarballs, and a generated Homebrew formula artifact pinned to the Linux and macOS tarball checksums. When HOMEBREW_TAP_TOKEN is configured, tagged releases also publish Formula/browser-use-rs.rb to the EvalOps Homebrew tap. The Release workflow can be manually dispatched to cut patch, minor, or major Cargo workspace versions before publishing those tagged artifacts. Its push-driven auto mode wakes for release candidate paths, skips release-bookkeeping and automation-only churn, then infers minor only for substantial public Rust/source-surface work while using patch for smaller releasable fixes, docs, dependencies, and packaged install asset updates.
  • Workspace CI for format, clippy, unit tests, schema fixtures, and conformance fixtures.
  • Upstream 1577793 tightened Python's AF_UNIX skill daemon socket to owner-only mode. The Rust CLI daemon currently exposes TCP/HTTP transports rather than a Unix socket file, so that chmod fix is tracked as audited and not directly applicable to the present Rust transport boundary.
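
The launch-plan item above mentions merged --disable-features values and last-wins switch de-dupe, so that raw caller args can override typed generated switches. The following is a minimal Rust sketch of that idea, not the port's actual implementation: the function name dedupe_switches, the choice to emit the merged --disable-features switch last, and the ordering of the surviving switches are all assumptions made for illustration.

```rust
/// Hypothetical helper (not the port's real API): collapse a list of Chrome
/// switches so that the last occurrence of each switch wins, while all
/// `--disable-features=...` values are merged into a single switch.
fn dedupe_switches(args: &[&str]) -> Vec<String> {
    // Feature names collected from every --disable-features occurrence,
    // in first-seen order and without duplicates.
    let mut features: Vec<String> = Vec::new();
    // (switch name, full argument) pairs for every other switch.
    let mut kept: Vec<(String, String)> = Vec::new();

    for arg in args {
        // The switch name is everything before the first '='.
        let name = arg.split('=').next().unwrap_or(arg).to_string();

        if name == "--disable-features" {
            if let Some((_, value)) = arg.split_once('=') {
                for feature in value.split(',').filter(|f| !f.is_empty()) {
                    if !features.iter().any(|known| known == feature) {
                        features.push(feature.to_string());
                    }
                }
            }
            continue;
        }

        // Last wins: drop any earlier occurrence, then append the new one.
        kept.retain(|(existing, _)| existing != &name);
        kept.push((name, arg.to_string()));
    }

    let mut result: Vec<String> = kept.into_iter().map(|(_, full)| full).collect();
    if !features.is_empty() {
        result.push(format!("--disable-features={}", features.join(",")));
    }
    result
}
```

In this sketch, a generated --window-size=800,600 followed later by a caller-supplied --window-size=1280,720 leaves only the caller's value in the final argument list.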

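The URL access policy item above describes canonicalizing decimal, hex, octal, and short-form IPv4 hosts before classification. A minimal Rust sketch of that numeric canonicalization follows. It is an illustration under stated assumptions: the names parse_part and canonical_ipv4 are hypothetical, and the percent-encoded, Unicode-normalized, and IDNA-dot forms mentioned in the list are deliberately not handled here.

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper: parse one dotted part as decimal, hex (0x prefix),
/// or octal (leading 0). Returns None for anything that is not a number.
fn parse_part(part: &str) -> Option<u32> {
    let (digits, radix) =
        if let Some(hex) = part.strip_prefix("0x").or_else(|| part.strip_prefix("0X")) {
            (hex, 16)
        } else if part.len() > 1 && part.starts_with('0') {
            (&part[1..], 8)
        } else {
            (part, 10)
        };
    // Reject empty parts, signs, and digits outside the radix.
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    u32::from_str_radix(digits, radix).ok()
}

/// Hypothetical helper: map a host such as "0x7f.1" or "2130706433" to the
/// IPv4 address it would resolve to, or None if it is not a numeric host.
fn canonical_ipv4(host: &str) -> Option<Ipv4Addr> {
    let parts: Vec<&str> = host.split('.').collect();
    if parts.len() > 4 {
        return None;
    }
    let nums: Vec<u32> = parts
        .iter()
        .map(|p| parse_part(p))
        .collect::<Option<Vec<u32>>>()?;
    let (last, leading) = nums.split_last()?;

    // Every part except the last is a single byte.
    if leading.iter().any(|&n| n > 255) {
        return None;
    }
    // The last part fills all remaining bytes (short-form addresses).
    let tail_bits = 8 * (4 - leading.len()) as u32;
    if tail_bits < 32 && *last >= (1u32 << tail_bits) {
        return None;
    }

    let mut value = *last;
    for (i, &n) in leading.iter().enumerate() {
        value |= n << (24 - 8 * i as u32);
    }
    Some(Ipv4Addr::from(value))
}
```

A policy check built on a helper like this would classify the canonical address rather than the raw host string, so that alternate spellings of the same address cannot slip past an IP-address block.
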
Compatibility Boundaries

  • Browser profile lifecycle support now exposes bounded public lifecycle diagnostics for core browser/target/navigation/security transitions and stable event shapes for reconnect, target-crash/network-timeout, JavaScript dialog, download, and storage-state lifecycle diagnostics. Live CDP wiring now records target crash, JavaScript dialog, navigation failure, configured download events, cookie plus attached frame-tree origin storage-state save/load events, explicit CDP websocket closure diagnostics, bounded actor-level reconnect attempts, deliberate stale-session invalidation and current-target reattach after reconnect, direct navigation timeouts, and watchdog-style stuck HTTP(S) request timeouts. subscribe_lifecycle_events exposes those diagnostics through BrowserLifecycleEventSubscription with typed lag and closed-stream errors; BrowserLifecycleAdapterEventSubscription maps the same stream into upstream-style subscriber categories without adding it to normal agent replies. Storage-state local/session storage origin discovery intentionally matches the frozen upstream CDP boundary: Page.getFrameTree supplies current page plus attached frame-tree origins, then DOMStorage.getDOMStorageItems reads those origins. Unattached profile-wide local/session storage origins are not scraped from browser profile internals.
  • Raw full AX snapshots are intentionally not emitted into normal prompt or state surfaces by default; the compact DOM carries the browser-use AX fields needed for action selection, evaluator context, hidden/disabled suppression, and conformance fixtures. The source-backed DOM/AX audit is recorded in docs/CONFORMANCE.md with implemented action-relevant parity and explicit raw-AX non-goals.
  • Agent history now captures compact interacted-element metadata for indexed actions and exposes current-page rematching plus action-level replay remapping diagnostics, replay-plan construction, generic replay-plan execution, and browser-backed replay-plan execution that honors the live URL-change guard. Browser executors can capture the current DOM, recapture state between non-terminating replay actions, rematch later indexed actions against the latest DOM, and return a replay run with the captured state, plan, and guarded execution result. The public replay-run JSON shape is pinned by replay-run and replay-recapture conformance fixtures. Replay is exposed through the one-shot CLI, persistent CLI sessions, and the MCP/daemon tool surface.
  • CLI sessions are local registry records. Session status reports registry liveness, and explicit cleanup removes stale records while refusing to remove running sessions unless forced through normal stop semantics; the daemon does not automatically restart stale browser processes.
  • The packaged daemon service files are local user-service templates. Distro packages, additional macOS architectures, and installer-managed secret stores are outside the current release surface. Homebrew tap publication is wired but requires the evalops/homebrew-tap repository plus a HOMEBREW_TAP_TOKEN repository secret before tagged releases publish there. Tagged releases now emit Linux and macOS tarballs, cross-tarball checksums, and a generated Homebrew formula artifact for the published triples.
  • Provider-specific structured-output fallbacks are source-audited against the frozen upstream target for exposed provider families. Anthropic forced tool-use, Gemini prompt fallback, DeepSeek forced tool-call, Groq model-specific tool-call routing, Cerebras prompt-only guidance, Mistral schema sanitization, Vercel model-specific prompt fallback, Ollama format schemas, and wrapped-JSON parsing are implemented and covered by unit tests.
  • Managed filesystem and agent checkpoint replay now cover serialized restore into a new agent, restored prompt context, restored read_file, todo context, extracted-content numbering, prior history, and initial-action execution state.
  • Package publishing is limited to GitHub release artifacts, the generated Homebrew formula scaffold, and optional EvalOps tap publication when the tap secret is configured.
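
The lifecycle item above describes bounded diagnostics exposed through a subscription with typed lag and closed-stream errors. The sketch below shows one way such semantics can work, using only the Rust standard library. All names here (BoundedLog, RecvError, recv) are hypothetical and are not the port's BrowserLifecycleEventSubscription API; the real implementation may differ.

```rust
use std::collections::VecDeque;

/// Hypothetical error type: the subscriber either missed events that were
/// evicted from the bounded buffer, or the stream has been closed.
#[derive(Debug, PartialEq)]
enum RecvError {
    Lagged(u64),
    Closed,
}

/// Hypothetical bounded event log. Old events are evicted once `capacity`
/// is reached, and each subscriber tracks its own sequence cursor.
struct BoundedLog {
    capacity: usize,
    events: VecDeque<String>,
    first_seq: u64, // sequence number of events[0]
    closed: bool,
}

impl BoundedLog {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        BoundedLog { capacity, events: VecDeque::new(), first_seq: 0, closed: false }
    }

    fn push(&mut self, event: &str) {
        if self.events.len() == self.capacity {
            // Evict the oldest event to stay within the bound.
            self.events.pop_front();
            self.first_seq += 1;
        }
        self.events.push_back(event.to_string());
    }

    fn close(&mut self) {
        self.closed = true;
    }

    /// Returns the next event for this cursor, Ok(None) if nothing new is
    /// available yet, Lagged(n) if n events were evicted before being read,
    /// or Closed once the log is closed and fully drained.
    fn recv(&self, cursor: &mut u64) -> Result<Option<String>, RecvError> {
        if *cursor < self.first_seq {
            let missed = self.first_seq - *cursor;
            *cursor = self.first_seq; // skip ahead to the oldest retained event
            return Err(RecvError::Lagged(missed));
        }
        let index = (*cursor - self.first_seq) as usize;
        match self.events.get(index) {
            Some(event) => {
                *cursor += 1;
                Ok(Some(event.clone()))
            }
            None if self.closed => Err(RecvError::Closed),
            None => Ok(None),
        }
    }
}
```

The design point this illustrates is that a slow subscriber is told how many diagnostics it missed instead of silently losing them or blocking the producer.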