Commit 21ba2de

Remove error thrown on no action taken (#2027)
# why

After regex schema validation was added for element IDs in `act`, the model had no way to say "no candidate found / no action to be taken", because an empty string for the element ID fails validation. This PR corrects the regressed behavior.

# what changed

`act()` no longer throws when the model can't find a target. Callers get the structured `success: false` result, matching what `observe()` already does (it returns `[]`) when nothing matches.

- `packages/core/lib/inference.ts`: wrap the act fields (elementId, description, method, arguments, twoStep) under `action: {...}.nullable()`. When the model returns `action: null`, inference returns `element: undefined` and the existing handler branch returns `{ success: false, actions: [], message: "Failed to perform act: No action found" }`.
- Lift `twoStep` out of the nullable action object and back to the top level of the act schema: small models (gpt-4.1-nano) flatten booleans out of anyOf variants during structured output, causing spurious AI_NoObjectGeneratedError on otherwise-valid responses.
- `packages/core/lib/prompt.ts`: update buildActSystemPrompt, buildActPrompt, and buildStepTwoPrompt to instruct the model to set `action: null` when no element matches, with an explicit ban on empty strings / placeholder values.
- `packages/core/lib/v3/handlers/actHandler.ts`: no logic change; the existing no-action branch is now reachable.

Act will return the following in these cases:

![Screenshot 2026-04-22 at 1 19 33 PM](https://github.com/user-attachments/assets/34efa7dd-7e80-4c6e-916f-de29d4e28083)

# test plan

---

## Summary by cubic

Stops `act` from throwing when no element matches by making the response `action` nullable and updating prompts/parsing to treat "no action" as valid. Fixes the regression where empty element IDs failed validation. Aligns with Linear STG-1849.

- **Bug Fixes**
  - Changed `act` schema to `{ action: { elementId, description, method, arguments } | null, twoStep: boolean }` with the element ID regex; `twoStep` is top-level and defaults to `false`.
  - Updated prompts to set `action: null` when no element matches and to not fabricate elements or use empty/placeholder values.
  - Adjusted parsing to handle `action === null` and return `element: undefined`, letting the existing no-action path return `success: false`.
  - Updated integration tests and test utils to use the new `action` shape and top-level `twoStep`.

Written for commit 1cb51fa. [Review in cubic](https://cubic.dev/pr/browserbase/stagehand/pull/2027)

---------

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
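For callers, the practical effect is that the no-match case can be handled by branching on the result rather than catching an exception. The following is a minimal sketch of that pattern, not the library's API: `ActResult` and `fakeAct` are hypothetical stand-ins, and only the three result fields and the message string come from the commit message above.

```typescript
// Hypothetical result shape, limited to the fields named in the commit message.
interface ActResult {
  success: boolean;
  message: string;
  actions: unknown[];
}

// Hypothetical stand-in for act(); it always reports that nothing matched.
async function fakeAct(_instruction: string): Promise<ActResult> {
  return {
    success: false,
    actions: [],
    message: "Failed to perform act: No action found",
  };
}

// Branch on the structured result instead of using try/catch for the no-match case.
async function clickIfPresent(instruction: string): Promise<string> {
  const result = await fakeAct(instruction);
  if (!result.success) {
    return `skipped: ${result.message}`;
  }
  return `performed ${result.actions.length} action(s)`;
}
```

Other failures may still surface as exceptions, so a surrounding try/catch can remain useful for those.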
1 parent 732b384 commit 21ba2de

5 files changed

Lines changed: 65 additions & 50 deletions

packages/core/lib/inference.ts

Lines changed: 41 additions & 32 deletions
@@ -411,34 +411,41 @@ export async function act({
   const isGPT5 = llmClient.modelName.includes("gpt-5"); // TODO: remove this as we update support for gpt-5 configuration options
 
   const actSchema = z.object({
-    elementId: z
-      .string()
-      .regex(/^\d+-\d+$/)
-      .describe(
-        "the ID string associated with the element. Never include surrounding square brackets. This field must follow the format of 'number-number'. for example, '0-76' or '16-21'",
-      ),
-    description: z
-      .string()
-      .describe("a description of the accessible element and its purpose"),
-    method: z
-      .enum(
-        // Use Object.values() for Zod v3 compatibility - z.enum() in v3 doesn't accept TypeScript enums directly
-        Object.values(SupportedUnderstudyAction) as unknown as readonly [
-          string,
-          ...string[],
-        ],
-      )
+    action: z
+      .object({
+        elementId: z
+          .string()
+          .regex(/^\d+-\d+$/)
+          .describe(
+            "the ID string associated with the element. Never include surrounding square brackets. This field must follow the format of 'number-number'. for example, '0-76' or '16-21'",
+          ),
+        description: z
+          .string()
+          .describe("a description of the accessible element and its purpose"),
+        method: z
+          .enum(
+            // Use Object.values() for Zod v3 compatibility - z.enum() in v3 doesn't accept TypeScript enums directly
+            Object.values(SupportedUnderstudyAction) as unknown as readonly [
+              string,
+              ...string[],
+            ],
+          )
+          .describe(
+            "the candidate method/action to interact with the element. Select one of the available Understudy interaction methods.",
+          ),
+        arguments: z.array(
+          z
+            .string()
+            .describe(
+              "the arguments to pass to the method. For example, for a click, the arguments are empty, but for a fill, the arguments are the value to fill in.",
+            ),
+        ),
+      })
+      .nullable()
       .describe(
-        "the candidate method/action to interact with the element. Select one of the available Understudy interaction methods.",
+        "The element to act on. Return null if no element on the page matches the instruction — do NOT fabricate or guess an element, and never emit empty strings or placeholder values.",
       ),
-    arguments: z.array(
-      z
-        .string()
-        .describe(
-          "the arguments to pass to the method. For example, for a click, the arguments are empty, but for a fill, the arguments are the value to fill in.",
-        ),
-    ),
-    twoStep: z.boolean(),
+    twoStep: z.boolean().default(false),
   });
 
   type ActResponse = z.infer<typeof actSchema>;
@@ -512,12 +519,14 @@ export async function act({
     });
   }
 
-  const parsedElement = {
-    elementId: actData.elementId,
-    description: String(actData.description),
-    method: String(actData.method),
-    arguments: actData.arguments,
-  };
+  const parsedElement = actData.action
+    ? {
+        elementId: actData.action.elementId,
+        description: String(actData.action.description),
+        method: String(actData.action.method),
+        arguments: actData.action.arguments,
+      }
+    : undefined;
 
   return {
     element: parsedElement,
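
To make the effect of the two hunks in `inference.ts` concrete, here is a dependency-free sketch of the same rules. It is an illustration only: the real code validates with zod, whereas `parseActResponse` and `toParsedElement` are hypothetical hand-written helpers. The regex, the nullable `action`, the `false` default for `twoStep`, and the `undefined` mapping are taken from the diff.

```typescript
// Shapes mirror the new act schema; the validator is a hand-written stand-in for zod.
interface Action {
  elementId: string;
  description: string;
  method: string;
  arguments: string[];
}

interface ActResponse {
  action: Action | null;
  twoStep: boolean;
}

// Same pattern as the schema's .regex() check on elementId.
const ELEMENT_ID_PATTERN = /^\d+-\d+$/;

// Validates raw model output; throws on a malformed elementId, accepts a null action.
function parseActResponse(raw: {
  action: Action | null;
  twoStep?: boolean;
}): ActResponse {
  if (raw.action !== null && !ELEMENT_ID_PATTERN.test(raw.action.elementId)) {
    throw new Error(`invalid elementId: "${raw.action.elementId}"`);
  }
  // An omitted twoStep falls back to false, as .default(false) would.
  return { action: raw.action, twoStep: raw.twoStep ?? false };
}

// Same mapping as the new parsedElement expression: a null action becomes undefined.
function toParsedElement(data: ActResponse): Action | undefined {
  return data.action
    ? {
        elementId: data.action.elementId,
        description: String(data.action.description),
        method: String(data.action.method),
        arguments: data.action.arguments,
      }
    : undefined;
}
```

An empty `elementId` is still rejected, which is the behavior that caused the regression; the difference is that `action: null` now gives the model a valid way to report that nothing matched.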

packages/core/lib/prompt.ts

Lines changed: 5 additions & 5 deletions
@@ -169,7 +169,7 @@ You will be given:
 1. a user defined instruction about what action to take
 2. a hierarchical accessibility tree showing the semantic structure of the page. The tree is a hybrid of the DOM and the accessibility tree.
 
-Return the element that matches the instruction if it exists. Otherwise, return an empty object.`;
+Return the element that matches the instruction if it exists. If no element on the page matches the instruction, set \`action\` to null. Do not fabricate or guess an element — empty strings or placeholder values for elementId/description/method are not acceptable.`;
   const content = actSystemPrompt.replace(/\s+/g, " ");
 
   return {
@@ -206,8 +206,8 @@ export function buildActPrompt(
 General Instructions:
 Provide an action for this element such as ${supportedActions.join(", ")}. Remember that to users, buttons and links look the same in most cases.
 When choosing non-left click actions, provide right or middle as the argument
-If the action is completely unrelated to a potential action to be taken on the page, return an empty object.
-ONLY return one action. If multiple actions are relevant, return the most relevant one.
+If the action is completely unrelated to a potential action to be taken on the page, or no matching element exists, set \`action\` to null. Do not fabricate or guess an element.
+ONLY return one action. If multiple actions are relevant, return the most relevant one.
 If the user is asking to scroll to a position on the page, e.g., 'halfway' or 0.75, etc, you must return the argument formatted as the correct percentage, e.g., '50%' or '75%', etc.
 If the user is asking to scroll to the next chunk/previous chunk, choose the nextChunk/prevChunk method. No arguments are required here.
 If the action implies a key press, e.g., 'press enter', 'press a', 'press space', etc., always choose the press method with the appropriate key as argument — e.g. 'a', 'Enter', 'Space'. Do not choose a click action on an on-screen keyboard. Capitalize the first character like 'Enter', 'Tab', 'Escape' only for special keys.
@@ -246,8 +246,8 @@ export function buildStepTwoPrompt(
 
 General Instructions:
 Provide an action for this element such as ${supportedActions.join(", ")}. Remember that to users, buttons and links look the same in most cases.
-If the action is completely unrelated to a potential action to be taken on the page, return an empty object.
-ONLY return one action. If multiple actions are relevant, return the most relevant one.
+If the action is completely unrelated to a potential action to be taken on the page, or no matching element exists, set \`action\` to null. Do not fabricate or guess an element.
+ONLY return one action. If multiple actions are relevant, return the most relevant one.
 If the user is asking to scroll to a position on the page, e.g., 'halfway' or 0.75, etc, you must return the argument formatted as the correct percentage, e.g., '50%' or '75%', etc.
 If the user is asking to scroll to the next chunk/previous chunk, choose the nextChunk/prevChunk method. No arguments are required here.
 If the action implies a key press, e.g., 'press enter', 'press a', 'press space', etc., always choose the press method with the appropriate key as argument — e.g. 'a', 'Enter', 'Space'. Do not choose a click action on an on-screen keyboard. Capitalize the first character like 'Enter', 'Tab', 'Escape' only for special keys.
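
The prompt edits rely on two small mechanics visible in the diff: backticks around `action` are escaped because the prompt lives in a template literal, and the system prompt is collapsed to single spaces before it is sent. A minimal sketch of both follows; the prompt text is shortened for illustration and is not the full prompt.

```typescript
// Shortened, illustrative prompt; \` keeps a literal backtick inside the template literal.
const actSystemPrompt = `
Return the element that matches the instruction if it exists.
If no element on the page matches the instruction, set \`action\` to null.`;

// Same normalization as in the diff: every run of whitespace becomes one space.
const content = actSystemPrompt.replace(/\s+/g, " ");
```

After normalization the model sees one line of text in which `action` is still wrapped in literal backticks.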

packages/core/tests/integration/flowLogger.spec.ts

Lines changed: 12 additions & 8 deletions
@@ -213,10 +213,12 @@ test.describe("flow logger integration", () => {
     const llmClient = createScriptedAisdkTestLlmClient({
       jsonResponses: {
         act: (options) => ({
-          elementId: findLastEncodedId(options),
-          description: `click ${buttonText}`,
-          method: "click",
-          arguments: [],
+          action: {
+            elementId: findLastEncodedId(options),
+            description: `click ${buttonText}`,
+            method: "click",
+            arguments: [],
+          },
           twoStep: false,
         }),
       },
@@ -435,10 +437,12 @@ test.describe("flow logger integration", () => {
     const llmClient = createScriptedAisdkTestLlmClient({
       jsonResponses: {
         act: (options) => ({
-          elementId: findLastEncodedId(options),
-          description: `click ${buttonText}`,
-          method: "click",
-          arguments: [],
+          action: {
+            elementId: findLastEncodedId(options),
+            description: `click ${buttonText}`,
+            method: "click",
+            arguments: [],
+          },
           twoStep: false,
         }),
       },

packages/core/tests/integration/testUtils.ts

Lines changed: 1 addition & 1 deletion
@@ -133,7 +133,7 @@ function resolveJsonResponseKey(
   };
   const properties = schema?.properties ?? {};
 
-  if ("elementId" in properties && "twoStep" in properties) {
+  if ("action" in properties && "twoStep" in properties) {
     return "act";
   }
 
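
The test helper identifies an act request from the top-level property names of the requested JSON schema, so it has to follow the schema's new shape. The sketch below isolates that check under stated assumptions: `JsonSchemaLike` and `resolveKey` are hypothetical names, and the real `resolveJsonResponseKey` has further branches and a signature that the diff does not show.

```typescript
// Hypothetical minimal schema shape; only `properties` matters for this check.
interface JsonSchemaLike {
  properties?: Record<string, unknown>;
}

// Returns "act" when the schema has the new top-level keys, otherwise undefined.
function resolveKey(schema: JsonSchemaLike | undefined): string | undefined {
  const properties = schema?.properties ?? {};
  if ("action" in properties && "twoStep" in properties) {
    return "act";
  }
  return undefined;
}
```

A schema in the old flat shape, with `elementId` at the top level, no longer resolves to "act", which is why the helper changed in the same commit as the schema.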

packages/core/tests/integration/timeouts.spec.ts

Lines changed: 6 additions & 4 deletions
@@ -116,10 +116,12 @@ function createToolTimeoutTestLlmClient(
       if (responseModelName === "act") {
         return {
           data: {
-            elementId: "1-0",
-            description: "click body",
-            method: "click",
-            arguments: [],
+            action: {
+              elementId: "1-0",
+              description: "click body",
+              method: "click",
+              arguments: [],
+            },
             twoStep: false,
           },
           usage,
