Include finish task in CLI replay

azhong-git · azhong-git · commit 28baaba9a028 · 2026-04-24T01:51:04.000-07:00
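
The replay command in this change decides what to print after navigation from three inputs: the flags, the plan file, and whether an LLM could be created. The sketch below restates that decision as a pure function so the branches are easier to review. It is an illustration only: `ReplayFlags`, `PlanFile`, `FinishDecision`, and `resolveFinish` are hypothetical names that do not exist in `src/cli/index.ts`, and the case where the fresh pass throws and the CLI falls back to the recorded output is not modeled.

```typescript
// Hypothetical types and helper, written for illustration only.
// They mirror the branching in the replay action but are not part of the CLI.
interface ReplayFlags {
  aiFallback?: boolean;
  aiFinish?: boolean; // commander sets this to false when --no-ai-finish is passed
  finishTask?: string; // value of --finish-task, if given
}

interface PlanFile {
  task?: string;
  output?: string;
}

type FinishDecision =
  | { kind: "fresh-pass"; task: string }
  | { kind: "recorded"; output: string; noLlmNotice: boolean }
  | { kind: "none" };

const DEFAULT_FINISH_TASK =
  "Based on the current page, produce the final answer to the original task.";

function resolveFinish(
  flags: ReplayFlags,
  plan: PlanFile,
  hasLlm: boolean,
): FinishDecision {
  // The finishing pass is on by default; only an explicit false disables it.
  const aiFinishRequested = flags.aiFinish !== false;
  // An empty --finish-task value is treated as "not provided".
  const override = flags.finishTask || undefined;

  if (aiFinishRequested && hasLlm) {
    // Prompt precedence: --finish-task, then the plan's task, then a generic prompt.
    return {
      kind: "fresh-pass",
      task: override ?? plan.task ?? DEFAULT_FINISH_TASK,
    };
  }
  if (plan.output) {
    // The notice applies only when the pass was wanted but no LLM was available.
    return {
      kind: "recorded",
      output: plan.output,
      noLlmNotice: aiFinishRequested && !hasLlm,
    };
  }
  return { kind: "none" };
}
```

Under these assumptions, `resolveFinish({ finishTask: "List 3 titles" }, { task: "recorded task" }, true)` selects a fresh pass with the overriding prompt, while `resolveFinish({ aiFinish: false }, {}, false)` yields `{ kind: "none" }`, which corresponds to a replay that prints no response box at all.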
diff --git a/README.md b/README.md
@@ -52,7 +52,7 @@ const page = await agent.newPage();
 
 // 1. AI navigation — recordable, replayable.
 const nav = await page.ai(
-  "Go to news.ycombinator.com, open the Show section, then click 'More' to go to the next page"
+  "Go to Hacker News show section, go to next page"
 );
 await agent.savePlan("hn show page 2", nav, "./hn.plan.json");
 
@@ -81,43 +81,31 @@ await agent.replay("./hn.plan.json", { page });   // zero tokens
 const { articles } = await page.extract(/* ... */); // tokens only here
 ```
 
-## `page.ai` vs `agent.executeTask` vs `agent.executeTaskAsync`
-
-All three drive the browser with AI, return the same `TaskOutput`, and can be recorded + replayed.
-
-| API                            | Use when                                                                                                                                                       |
-| ------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `page.ai(task)`                | You already have a page and want to mix Playwright calls (`page.goto`, `page.clickElement`) with AI steps on the same tab. Resolves when done.                 |
-| `agent.executeTask(task)`      | "Here's a goal, figure it out." The agent owns the page; include URLs in the prompt and it navigates itself. Resolves when done.                               |
-| `agent.executeTaskAsync(task)` | Same as `executeTask` but returns a `Task` control handle immediately — `task.pause()`, `task.resume()`, `task.cancel()`, and per-step event callbacks. For long-running flows, CLIs, or anything a user can interrupt. |
-
-## Record & replay
-
-- `agent.savePlan(task, result, path)` writes a JSON plan with the action sequence and a stable `xpath` + `cssPath` for each clicked / typed element.
-- `agent.replay(path, { page })` re-runs those actions with no LLM calls, no screenshots, no DOM map.
-- `aiFallback: true` re-plans **only** a drifted step with the LLM; the rest stays free.
-- `startingUrl` (option, or `--url` on the CLI) retargets a plan at a different URL — useful for staging / preview deploys / different queries.
-- Plans are human-readable and hand-editable (tweak an `inputText` value, reorder or delete steps).
-
-> The `output` string the model produced while recording is frozen in the plan — replay does **not** regenerate it. If the value of the run is live content or fresh reasoning, keep that in a follow-up `.extract()` / `.ai()`, not inside the recorded plan.
-
 ## CLI
 
 Everything above is available without writing code:
 
 ```bash
 # Record while running
 browser-agent-cli run --save-plan ./hn.plan.json \
-  -c "Go to news.ycombinator.com, open the Show section, then click 'More' to go to the next page"
+  -c "Go to Hacker News show section, go to next page and find top 3 articles"
 
-# Replay (zero LLM calls)
+# Replay: deterministic navigation (no LLM), then one fresh AI pass on the
+# result page to produce an up-to-date final response. The navigation part
+# is free; only the final pass spends tokens.
 browser-agent-cli replay ./hn.plan.json
 
-# Self-heal drifted steps
-browser-agent-cli replay ./hn.plan.json --ai-fallback
+# Pure replay — skip the final AI pass and just get the browser onto the
+# result page (zero LLM calls end-to-end).
+browser-agent-cli replay ./hn.plan.json --no-ai-finish
+
+# Use a different finishing task (e.g. ask for a custom summary of the
+# current page instead of re-running the recorded task).
+browser-agent-cli replay ./hn.plan.json \
+  --finish-task "Return the titles of the first 3 posts as a bullet list"
 
-# Retarget at a different URL (e.g. start from the Ask section instead)
-browser-agent-cli replay ./hn.plan.json --url https://news.ycombinator.com/ask
+# Self-heal drifted steps during replay (independent of the finish pass).
+browser-agent-cli replay ./hn.plan.json --ai-fallback
 ```
 
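-LLM auto-detected from `GOOGLE_API_KEY` / `GEMINI_API_KEY` → `OPENAI_API_KEY` → `ANTHROPIC_API_KEY`. Override the model with `--llm-model` or `GEMINI_MODEL` / `OPENAI_MODEL` / `ANTHROPIC_MODEL`. `replay` only needs an LLM with `--ai-fallback`. Interactive: `ctrl+p` pause, `ctrl+r` resume.
+LLM auto-detected from `GOOGLE_API_KEY` / `GEMINI_API_KEY` → `OPENAI_API_KEY` → `ANTHROPIC_API_KEY`. Override the model with `--llm-model` or `GEMINI_MODEL` / `OPENAI_MODEL` / `ANTHROPIC_MODEL`. `replay` requires an LLM only with `--ai-fallback`; the default final AI pass uses one when configured and is skipped otherwise. Interactive: `ctrl+p` pause, `ctrl+r` resume.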
@@ -166,6 +154,26 @@ const agent = new BrowserAgent({
 ```
 </details>
 
+## `page.ai` vs `agent.executeTask` vs `agent.executeTaskAsync`
+
+All three drive the browser with AI, return the same `TaskOutput`, and can be recorded + replayed.
+
+| API                            | Use when                                                                                                                                                       |
+| ------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `page.ai(task)`                | You already have a page and want to mix Playwright calls (`page.goto`, `page.clickElement`) with AI steps on the same tab. Resolves when done.                 |
+| `agent.executeTask(task)`      | "Here's a goal, figure it out." The agent owns the page; include URLs in the prompt and it navigates itself. Resolves when done.                               |
+| `agent.executeTaskAsync(task)` | Same as `executeTask` but returns a `Task` control handle immediately — `task.pause()`, `task.resume()`, `task.cancel()`, and per-step event callbacks. For long-running flows, CLIs, or anything a user can interrupt. |
+
+## Record & replay
+
+- `agent.savePlan(task, result, path)` writes a JSON plan with the action sequence and a stable `xpath` + `cssPath` for each clicked / typed element.
+- `agent.replay(path, { page })` re-runs those actions with no LLM calls, no screenshots, no DOM map.
+- `aiFallback: true` re-plans **only** a drifted step with the LLM; the rest stays free.
+- `startingUrl` (option, or `--url` on the CLI) retargets a plan at a different URL — useful for staging / preview deploys / different queries.
+- Plans are human-readable and hand-editable (tweak an `inputText` value, reorder or delete steps).
+
+> The `output` string the model produced while recording is frozen in the plan — the programmatic `agent.replay()` does **not** regenerate it. The CLI's `replay` command, by default, runs one fresh AI pass (`page.ai(plan.task, { maxSteps: 3 })`) on the result page after navigation so every CLI run ends with an up-to-date response; pass `--no-ai-finish` to get pure token-free replay and fall back to the recorded output. If you're wiring this up programmatically, run your own `.extract()` / `.ai()` on the page after `agent.replay()` instead of relying on the recorded `output`.
+
 ## License
 
 MIT. Forked from [HyperAgent](https://github.com/hyperbrowserai/HyperAgent) (b49afe). Serverless browser support by [@sparticuz/chromium](https://github.com/Sparticuz/chromium).
diff --git a/src/cli/index.ts b/src/cli/index.ts
@@ -393,15 +393,25 @@ program
 
 program
   .command("replay")
-  .description("Replay a saved plan without calling the LLM")
+  .description(
+    "Replay a saved plan deterministically, then (by default) run a single AI pass on the result page to produce a fresh final response",
+  )
   .argument(
     "<file>",
     "Path to a plan JSON file previously saved with --save-plan",
   )
   .option("-d, --debug", "Enable debug mode")
   .option(
     "--ai-fallback",
-    "Fall back to .ai() for individual steps that fail (requires an LLM to be configured)",
+    "Fall back to .ai() for individual steps that fail during replay (requires an LLM to be configured)",
+  )
+  .option(
+    "--no-ai-finish",
+    "Skip the final AI pass after replay (by default, one fresh .ai() call runs on the result page to regenerate the final response)",
+  )
+  .option(
+    "--finish-task <task>",
+    "Override the task used for the final AI pass (defaults to the plan's recorded task)",
   )
   .option(
     "-u, --url <url>",
@@ -411,13 +421,21 @@ program
     const options = this.opts();
     const debug = (options.debug as boolean) || false;
     const aiFallback = (options.aiFallback as boolean) || false;
+    // commander's `--no-ai-finish` sets options.aiFinish === false; default is true.
+    const aiFinishRequested = options.aiFinish !== false;
+    const finishTaskOverride = (options.finishTask as string) || undefined;
     const startingUrl = (options.url as string) || undefined;
 
     console.log(chalk.blue("BrowserAgent Replay"));
     const spinner = ora();
 
     try {
-      const llm = aiFallback ? await createDefaultLlm() : undefined;
+      // An LLM is needed for either --ai-fallback (mid-replay re-planning)
+      // or the post-replay finishing pass. --ai-fallback is strict (errors
+      // if no LLM); the finishing pass is best-effort, so `replay` still works
+      // without env vars (the skip is announced only alongside recorded output).
+      const needsLlm = aiFallback || aiFinishRequested;
+      const llm = needsLlm ? await createDefaultLlm() : undefined;
       if (aiFallback && !llm) {
         console.error(
           chalk.red(
@@ -426,13 +444,22 @@ program
         );
         process.exit(1);
       }
+      const willRunAiFinish = aiFinishRequested && !!llm;
 
       const agent = new BrowserAgent({
         llm,
         debug,
         browserProvider: "Local",
       });
 
+      // Read the plan up-front so we can (a) use the recorded task as the
+      // default finishing prompt and (b) surface the recorded final
+      // response when no fresh pass is run. The recorded `output` is
+      // frozen at record time — it is NOT re-generated by plain replay.
+      const planJson = JSON.parse(
+        (await fs.promises.readFile(file)).toString(),
+      ) as { output?: string; task?: string };
+
       const page = await agent.newPage();
       spinner.start(`Replaying plan from ${file}`);
 
@@ -458,6 +485,78 @@ program
 
       spinner.succeed(chalk.green("Replay complete."));
 
+      // Post-replay: run a fresh AI pass so the CLI produces a real,
+      // up-to-date answer on every run. Bounded to `maxSteps: 3` so the
+      // model can look at the current page (and make a tiny correction if
+      // needed) but cannot re-do the whole navigation.
+      if (willRunAiFinish) {
+        const finishTask =
+          finishTaskOverride ??
+          planJson.task ??
+          "Based on the current page, produce the final answer to the original task.";
+        spinner.start(chalk.blue("Running final AI pass on result page..."));
+        try {
+          const result = await page.ai(finishTask, { maxSteps: 3 });
+          spinner.stop();
+          console.log(
+            boxen(result.output || "No Response", {
+              title: chalk.yellow("BrowserAgent Response"),
+              titleAlignment: "center",
+              float: "center",
+              padding: 1,
+              margin: { top: 2, left: 0, right: 0, bottom: 0 },
+            }),
+          );
+        } catch (err) {
+          spinner.fail(
+            chalk.red(
+              `Final AI pass failed: ${err instanceof Error ? err.message : String(err)}`,
+            ),
+          );
+          // Fall back to showing the recorded output so the user still
+          // sees something useful.
+          if (planJson.output) {
+            console.log(
+              boxen(planJson.output, {
+                title: chalk.dim(
+                  "Recorded Response (frozen at record time — fresh pass failed)",
+                ),
+                titleAlignment: "center",
+                float: "center",
+                padding: 1,
+                margin: { top: 1, left: 0, right: 0, bottom: 0 },
+                borderStyle: "single",
+                dimBorder: true,
+              }),
+            );
+          }
+        }
+      } else if (planJson.output) {
+        // No fresh pass will run (either --no-ai-finish or no LLM
+        // configured). Print the recorded output so the CLI run has *some*
+        // visible output, clearly labeled as archival.
+        if (aiFinishRequested && !llm) {
+          console.log(
+            chalk.dim(
+              "(No LLM configured — skipping final AI pass. Set an API key to get a fresh response on the current page.)",
+            ),
+          );
+        }
+        console.log(
+          boxen(planJson.output, {
+            title: chalk.dim(
+              "Recorded Response (frozen at record time — not re-generated)",
+            ),
+            titleAlignment: "center",
+            float: "center",
+            padding: 1,
+            margin: { top: 1, left: 0, right: 0, bottom: 0 },
+            borderStyle: "single",
+            dimBorder: true,
+          }),
+        );
+      }
+
       const shouldExit = await inquirer.confirm({
         message: "Close browser and exit?",
         default: true,