## CLI

Everything above is available without writing code:

```bash
# Record while running
browser-agent-cli run --save-plan ./hn.plan.json \
  -c "Go to Hacker News show section, go to next page and find top 3 articles"

# Replay: deterministic navigation (no LLM), then one fresh AI pass on the
# result page to produce an up-to-date final response. The navigation part …
```
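The LLM provider is auto-detected from environment variables, checked in this order: `GOOGLE_API_KEY` / `GEMINI_API_KEY`, then `OPENAI_API_KEY`, then `ANTHROPIC_API_KEY`. Override the model with `--llm-model` or with `GEMINI_MODEL` / `OPENAI_MODEL` / `ANTHROPIC_MODEL`. `replay` needs an LLM for its default AI finish pass and for `--ai-fallback`. With `--no-ai-finish` and no `--ai-fallback`, it makes no LLM calls. In interactive runs, press `ctrl+p` to pause and `ctrl+r` to resume.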
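The detection order above amounts to a first-match scan over environment variables. The sketch below mirrors the documented order but is not the CLI's actual code. The following are assumptions made for illustration:

- the provider labels (`gemini`, `openai`, `anthropic`);
- the `detectLlm` name;
- the rule that a `--llm-model` value takes precedence over the per-provider model variable.

```typescript
// Providers in the order the CLI checks them, as documented above.
// The provider labels are illustrative names, not the CLI's internals.
type Provider = "gemini" | "openai" | "anthropic";

interface LlmChoice {
  provider: Provider;
  model: string | undefined;
}

const PROVIDER_ORDER: ReadonlyArray<{
  provider: Provider;
  keyVars: readonly string[];
  modelVar: string;
}> = [
  { provider: "gemini", keyVars: ["GOOGLE_API_KEY", "GEMINI_API_KEY"], modelVar: "GEMINI_MODEL" },
  { provider: "openai", keyVars: ["OPENAI_API_KEY"], modelVar: "OPENAI_MODEL" },
  { provider: "anthropic", keyVars: ["ANTHROPIC_API_KEY"], modelVar: "ANTHROPIC_MODEL" },
];

// Return the first provider with a non-empty API key, or undefined if none is set.
// `cliModel` stands in for --llm-model; letting it take precedence over the
// per-provider model variable is an assumption of this sketch.
function detectLlm(
  env: Record<string, string | undefined>,
  cliModel?: string,
): LlmChoice | undefined {
  for (const entry of PROVIDER_ORDER) {
    if (entry.keyVars.some((name) => Boolean(env[name]))) {
      return { provider: entry.provider, model: cliModel ?? env[entry.modelVar] };
    }
  }
  return undefined;
}
```

For example, with only `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` set, the scan stops at `openai`. The function takes a plain object rather than reading `process.env` directly, which makes it easy to test.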
```
</details>

## `page.ai` vs `agent.executeTask` vs `agent.executeTaskAsync`
All three drive the browser with AI, return the same `TaskOutput`, and can be recorded and replayed.

| API | Use it when |
|---|---|
| `page.ai(task)` | You already have a page and want to mix Playwright calls (`page.goto`, `page.clickElement`) with AI steps on the same tab. Resolves when done. |
| `agent.executeTask(task)` | "Here's a goal, figure it out." The agent owns the page; include URLs in the prompt and it navigates itself. Resolves when done. |
| `agent.executeTaskAsync(task)` | Same as `executeTask`, but immediately returns a `Task` control handle with `task.pause()`, `task.resume()`, `task.cancel()`, and per-step event callbacks. Use it for long-running flows, CLIs, or anything a user can interrupt. |
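Because `executeTaskAsync` returns control immediately, you can pause the `Task` handle around work the agent should not race, such as waiting for a user to confirm. The sketch below assumes only that the handle exposes the `pause()`, `resume()` and `cancel()` methods listed above. The `TaskControl` type and the `whilePaused` helper are hypothetical. The README does not say whether these methods return promises, so the helper simply awaits whatever they return.

```typescript
// Hypothetical slice of the `Task` handle returned by agent.executeTaskAsync().
// Only the method names come from the README; return types are left open.
interface TaskControl {
  pause(): unknown;
  resume(): unknown;
  cancel(): unknown;
}

// Pause the agent while `work` runs (for example, a confirmation prompt),
// then resume it, even if `work` throws.
async function whilePaused<T>(task: TaskControl, work: () => Promise<T>): Promise<T> {
  await task.pause();
  try {
    return await work();
  } finally {
    await task.resume();
  }
}
```

The `finally` block is the important part. If the confirmation step fails, the task is still resumed rather than left paused indefinitely.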
## Record & replay

- `agent.savePlan(task, result, path)` writes a JSON plan containing the action sequence plus a stable `xpath` and `cssPath` for each clicked or typed element.
- `agent.replay(path, { page })` re-runs those actions with no LLM calls, no screenshots, and no DOM map.
- `aiFallback: true` uses the LLM to re-plan **only** a step that has drifted; every other step still replays for free.
- `startingUrl` (an option, or `--url` on the CLI) retargets a plan at a different URL. This is useful for staging, preview deploys, or different queries.
- Plans are human-readable and hand-editable: tweak an `inputText` value, reorder steps, or delete them.
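A typical flow pays for one LLM-driven run, saves it, then replays it cheaply elsewhere. The sketch below assumes, as the list suggests, that `aiFallback` and `startingUrl` go in the same options object as `page`. `PlanAgent`, `ReplayOptions` and `recordThenReplay` are hypothetical names. The structural types cover only the three calls used here, so a real agent should fit if its signatures match.

```typescript
// Assumed options shape: `page` is named by the README for agent.replay();
// passing `aiFallback` and `startingUrl` alongside it is an assumption.
interface ReplayOptions<Page> {
  page: Page;
  aiFallback?: boolean;
  startingUrl?: string;
}

// Hypothetical structural type covering only the calls this sketch uses.
interface PlanAgent<Result, Page> {
  executeTask(task: string): Promise<Result>;
  savePlan(task: string, result: Result, path: string): unknown;
  replay(path: string, options: ReplayOptions<Page>): Promise<unknown>;
}

// Run the task once with the LLM, save it as a plan, then replay the plan
// against another URL (e.g. a preview deploy) with AI fallback switched on.
async function recordThenReplay<Result, Page>(
  agent: PlanAgent<Result, Page>,
  task: string,
  planPath: string,
  page: Page,
  startingUrl: string,
): Promise<unknown> {
  const result = await agent.executeTask(task);
  await agent.savePlan(task, result, planPath);
  return agent.replay(planPath, { page, aiFallback: true, startingUrl });
}
```

In practice you would record once and replay many times. The two halves are shown together here only to make the data flow from `executeTask` to `savePlan` to `replay` explicit.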
> The `output` string the model produced while recording is frozen in the plan, and the programmatic `agent.replay()` does **not** regenerate it. By default, the CLI's `replay` command runs one fresh AI pass (`page.ai(plan.task, { maxSteps: 3 })`) on the result page after navigation, so every CLI run ends with an up-to-date response. Pass `--no-ai-finish` to get a pure, token-free replay that falls back to the recorded output. When calling `agent.replay()` from code, run your own `.extract()` / `.ai()` on the page afterwards instead of relying on the recorded `output`.
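To follow that advice in code, treat replay purely as navigation and ask for the answer again afterwards. The sketch below assumes that `page.extract()` accepts a natural-language instruction and resolves to the extracted data; check the real signature. `Replayer`, `ExtractPage` and `replayThenExtract` are hypothetical names.

```typescript
// Hypothetical minimal types. The single-string `extract(instruction)`
// signature is an assumption, not a confirmed API.
interface ExtractPage {
  extract(instruction: string): Promise<unknown>;
}

interface Replayer<Page> {
  replay(path: string, options: { page: Page }): Promise<unknown>;
}

// Replay the recorded navigation (no LLM calls), then run one fresh
// extraction on the page where the replay ended, ignoring the `output`
// that was frozen into the plan at recording time.
async function replayThenExtract<Page extends ExtractPage>(
  agent: Replayer<Page>,
  page: Page,
  planPath: string,
  instruction: string,
): Promise<unknown> {
  await agent.replay(planPath, { page });
  return page.extract(instruction);
}
```

This keeps the cost split the README describes: navigation stays free, and the only LLM spend is the single extraction that produces current content.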
## License
MIT. Forked from [HyperAgent](https://github.com/hyperbrowserai/HyperAgent) (b49afe). Serverless browser support by [@sparticuz/chromium](https://github.com/Sparticuz/chromium).