# why
After regex schema validation was added for element IDs in `act`, the
model had no way to say "no candidate found / no action to be taken",
since an empty string for the element ID fails validation.
This PR corrects the regressed behavior.
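The regression can be reproduced in isolation. The sketch below applies the same pattern the schema uses, `/^\d+-\d+$/`, to a few candidate IDs. `isValidElementId` is a hypothetical helper written for this illustration, not a function from the codebase.

```typescript
// Same pattern as the elementId regex in the act schema.
const ELEMENT_ID_PATTERN = /^\d+-\d+$/;

// Hypothetical helper for illustration; not part of the Stagehand codebase.
function isValidElementId(id: string): boolean {
  return ELEMENT_ID_PATTERN.test(id);
}

console.log(isValidElementId("0-76")); // true
console.log(isValidElementId("[0-76]")); // false: surrounding brackets are rejected
console.log(isValidElementId("")); // false: the old "no candidate" signal fails validation
```

Because the empty string can never match, the model needs a separate, schema-valid way to report that nothing was found.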
# what changed
`act()` no longer throws when the model can't find a target. Callers get
a structured `success: false` result, matching how `observe()` already
returns an empty array (`[]`) when nothing matches.
- `packages/core/lib/inference.ts`: wrap the act fields (`elementId`,
`description`, `method`, `arguments`, `twoStep`) under
`action: {...}.nullable()`. When the model returns `action: null`,
inference returns `element: undefined` and the existing handler branch returns
`{ success: false, actions: [], message: "Failed to perform act: No action found" }`.
- Lift `twoStep` out of the nullable `action` object and back to the top
level of the act schema: small models (gpt-4.1-nano) flatten booleans
out of `anyOf` variants during structured output, causing spurious
`AI_NoObjectGeneratedError` on otherwise-valid responses.
- `packages/core/lib/prompt.ts`: update `buildActSystemPrompt`,
`buildActPrompt`, and `buildStepTwoPrompt` to instruct the model to set
`action: null` when no element matches, with an explicit ban on empty
strings / placeholder values.
- `packages/core/lib/v3/handlers/actHandler.ts`: no logic change; the
existing no-action branch is now reachable.
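The flow in the list above can be sketched in plain TypeScript, without Zod. The type and function names (`ActResponse`, `toElement`, `handleAct`) and the success message are assumptions made for illustration and do not mirror the real signatures in `inference.ts` or `actHandler.ts`. Only the field names and the failure result come from the description.

```typescript
// Shape of the model response after this change: `action` may be null,
// and `twoStep` sits at the top level.
interface ActAction {
  elementId: string;
  description: string;
  method: string;
  arguments: string[];
}

interface ActResponse {
  action: ActAction | null;
  twoStep: boolean;
}

interface ActResult {
  success: boolean;
  actions: ActAction[];
  message: string;
}

// Hypothetical stand-in for the inference step: a null action becomes an undefined element.
function toElement(response: ActResponse): ActAction | undefined {
  return response.action ?? undefined;
}

// Hypothetical stand-in for the handler's no-action branch.
function handleAct(response: ActResponse): ActResult {
  const element = toElement(response);
  if (element === undefined) {
    return {
      success: false,
      actions: [],
      message: "Failed to perform act: No action found",
    };
  }
  // The real handler would perform the action here; this sketch only echoes it
  // with a placeholder message.
  return { success: true, actions: [element], message: "Action found" };
}

const noMatch = handleAct({ action: null, twoStep: false });
console.log(noMatch.success, noMatch.message);
// prints: false Failed to perform act: No action found
```

With this shape, a caller can branch on `success` rather than wrapping the call in `try`/`catch` to detect a missing element.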
`act()` returns the following in these cases:
<img width="586" height="343" alt="Screenshot 2026-04-22 at 1 19 33 PM"
src="https://github.com/user-attachments/assets/34efa7dd-7e80-4c6e-916f-de29d4e28083"
/>
# test plan
<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Stops `act` from throwing when no element matches by making the response
`action` nullable and updating prompts/parsing to treat “no action” as
valid. Fixes the regression where empty element IDs failed validation.
Aligns with Linear STG-1849.
- **Bug Fixes**
  - Changed `act` schema to
    `{ action: { elementId, description, method, arguments } | null, twoStep: boolean }`
    with the element ID regex; `twoStep` is top‑level and defaults to `false`.
  - Updated prompts to set `action: null` when no element matches and to
    not fabricate elements or use empty/placeholder values.
  - Adjusted parsing to handle `action === null` and return
    `element: undefined`, letting the existing no‑action path return `success: false`.
  - Updated integration tests and test utils to use the new `action` shape
    and top‑level `twoStep`.
<sup>Written for commit 1cb51fa.
Summary will update on new commits. <a
href="https://cubic.dev/pr/browserbase/stagehand/pull/2027">Review in
cubic</a></sup>
<!-- End of auto-generated description by cubic. -->
---------
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
`packages/core/lib/inference.ts`: 41 additions & 32 deletions
@@ -411,34 +411,41 @@ export async function act({
   const isGPT5 = llmClient.modelName.includes("gpt-5"); // TODO: remove this as we update support for gpt-5 configuration options
 
   const actSchema = z.object({
-    elementId: z
-      .string()
-      .regex(/^\d+-\d+$/)
-      .describe(
-        "the ID string associated with the element. Never include surrounding square brackets. This field must follow the format of 'number-number'. for example, '0-76' or '16-21'",
-      ),
-    description: z
-      .string()
-      .describe("a description of the accessible element and its purpose"),
-    method: z
-      .enum(
-        // Use Object.values() for Zod v3 compatibility - z.enum() in v3 doesn't accept TypeScript enums directly
[...]
+            "the ID string associated with the element. Never include surrounding square brackets. This field must follow the format of 'number-number'. for example, '0-76' or '16-21'",
+          ),
+        description: z
+          .string()
+          .describe("a description of the accessible element and its purpose"),
+        method: z
+          .enum(
+            // Use Object.values() for Zod v3 compatibility - z.enum() in v3 doesn't accept TypeScript enums directly
[...]
+            "the candidate method/action to interact with the element. Select one of the available Understudy interaction methods.",
+          ),
+        arguments: z.array(
+          z
+            .string()
+            .describe(
+              "the arguments to pass to the method. For example, for a click, the arguments are empty, but for a fill, the arguments are the value to fill in.",
+            ),
+        ),
+      })
+      .nullable()
       .describe(
-        "the candidate method/action to interact with the element. Select one of the available Understudy interaction methods.",
+        "The element to act on. Return null if no element on the page matches the instruction — do NOT fabricate or guess an element, and never emit empty strings or placeholder values.",
       ),
-    arguments: z.array(
-      z
-        .string()
-        .describe(
-          "the arguments to pass to the method. For example, for a click, the arguments are empty, but for a fill, the arguments are the value to fill in.",
[...]
`packages/core/lib/prompt.ts`: 5 additions & 5 deletions
@@ -169,7 +169,7 @@ You will be given:
 1. a user defined instruction about what action to take
 2. a hierarchical accessibility tree showing the semantic structure of the page. The tree is a hybrid of the DOM and the accessibility tree.
 
-Return the element that matches the instruction if it exists. Otherwise, return an empty object.`;
+Return the element that matches the instruction if it exists. If no element on the page matches the instruction, set \`action\` to null. Do not fabricate or guess an element — empty strings or placeholder values for elementId/description/method are not acceptable.`;
   const content = actSystemPrompt.replace(/\s+/g, " ");
 
   return {
@@ -206,8 +206,8 @@ export function buildActPrompt(
 General Instructions:
 Provide an action for this element such as ${supportedActions.join(", ")}. Remember that to users, buttons and links look the same in most cases.
 When choosing non-left click actions, provide right or middle as the argument
-If the action is completely unrelated to a potential action to be taken on the page, return an empty object.
-ONLY return one action. If multiple actions are relevant, return the most relevant one.
+If the action is completely unrelated to a potential action to be taken on the page, or no matching element exists, set \`action\` to null. Do not fabricate or guess an element.
+ONLY return one action. If multiple actions are relevant, return the most relevant one.
 If the user is asking to scroll to a position on the page, e.g., 'halfway' or 0.75, etc, you must return the argument formatted as the correct percentage, e.g., '50%' or '75%', etc.
 If the user is asking to scroll to the next chunk/previous chunk, choose the nextChunk/prevChunk method. No arguments are required here.
 If the action implies a key press, e.g., 'press enter', 'press a', 'press space', etc., always choose the press method with the appropriate key as argument — e.g. 'a', 'Enter', 'Space'. Do not choose a click action on an on-screen keyboard. Capitalize the first character like 'Enter', 'Tab', 'Escape' only for special keys.
@@ -246,8 +246,8 @@ export function buildStepTwoPrompt(
 
 General Instructions:
 Provide an action for this element such as ${supportedActions.join(", ")}. Remember that to users, buttons and links look the same in most cases.
-If the action is completely unrelated to a potential action to be taken on the page, return an empty object.
-ONLY return one action. If multiple actions are relevant, return the most relevant one.
+If the action is completely unrelated to a potential action to be taken on the page, or no matching element exists, set \`action\` to null. Do not fabricate or guess an element.
+ONLY return one action. If multiple actions are relevant, return the most relevant one.
 If the user is asking to scroll to a position on the page, e.g., 'halfway' or 0.75, etc, you must return the argument formatted as the correct percentage, e.g., '50%' or '75%', etc.
 If the user is asking to scroll to the next chunk/previous chunk, choose the nextChunk/prevChunk method. No arguments are required here.
 If the action implies a key press, e.g., 'press enter', 'press a', 'press space', etc., always choose the press method with the appropriate key as argument — e.g. 'a', 'Enter', 'Space'. Do not choose a click action on an on-screen keyboard. Capitalize the first character like 'Enter', 'Tab', 'Escape' only for special keys.
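Two details of the prompt change are easy to miss. The backticks around `action` are escaped because the prompt text lives inside a template literal, and the system prompt is passed through `replace(/\s+/g, " ")`, which collapses runs of whitespace into single spaces. A minimal sketch, using a shortened stand-in string rather than the real prompt text:

```typescript
// Shortened stand-in for the system prompt; the real text lives in prompt.ts.
const samplePrompt = `
  Return the element that matches the instruction if it exists.
  If no element matches, set \`action\` to null.
`;

// Same normalization as in the diff: every run of whitespace becomes one space.
const collapsedPrompt = samplePrompt.replace(/\s+/g, " ");

// The escaped backticks survive as literal backticks, and no newlines remain.
console.log(JSON.stringify(collapsedPrompt));
```

The leading and trailing newlines each collapse to a single space as well, so the result keeps one space at both ends unless it is trimmed separately.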