Support gpt 5.4 cua upd #1892

Closed
alexcarv318 wants to merge 4 commits into browserbase:main from alexcarv318:support-gpt-5.4-cua-upd

alexcarv318 commented Mar 26, 2026

why

The original GPT-5.4 native CUA support work from #1792 could not be merged cleanly due to conflicts with current main.
This PR carries that work forward on top of the latest code so the feature can be reviewed and merged.

All implementation credit goes to @Kylejeong2. I only resolved merge conflicts and migrated the changes onto the current branch.

what changed

  • Cherry-picked the GPT-5.4 native CUA support commits from #1792 (feat: add support for gpt 5.4 native computer use) onto current main.
  • Resolved merge conflicts in OpenAICUAClient while preserving both:
    • existing current-main behavior, and
    • new GPT-5.4 batched computer_call.actions handling.
  • Kept model/provider mappings and public CUA model exports aligned with GPT-5.4 support.
  • Included the related example and test updates from the original PR.

test plan

  • corepack pnpm install
  • corepack pnpm run test:core -- packages/core/dist/esm/tests/unit/public-api/llm-and-agents.test.js
  • Manual smoke run of example command:
    • corepack pnpm --filter @browserbasehq/stagehand run example -- gpt54-cua-example
    • (execution reached model call; requires valid OPENAI_API_KEY for full runtime success)

Summary by cubic

Adds native Computer Use support for OpenAI gpt-5.4 via the new computer tool with batched actions. Keeps the legacy computer_use_preview flow for backward compatibility.

  • New Features
    • Map gpt-5.4 to the OpenAI provider and add to AVAILABLE_CUA_MODELS.
    • Use computer for gpt-5.x; support single action and batched actions, execute all actions per batch, and reply with a computer_screenshot (with detail).
    • Preserve preview flow: single action, input_image outputs, and current_url on outputs.
    • Add gpt5-4-cua-example.ts plus type and test updates.
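
The two call shapes and two output shapes listed above can be sketched as follows. This is a minimal illustration, not the actual `OpenAICUAClient` code: the type names, the helper names `actionsOf` and `buildOutput`, and any field not named in the summary (such as `call_id` and `image_url`) are assumptions.

```typescript
// Hypothetical shapes for illustration; only `action`, `actions`,
// `computer_screenshot`, `input_image`, `detail`, and `current_url`
// are taken from the summary above.
type BrowserAction = { type: string; [key: string]: unknown };

interface ComputerCall {
  call_id: string;
  action?: BrowserAction; // legacy preview flow: one action per call
  actions?: BrowserAction[]; // gpt-5.x flow: batched actions
}

interface ComputerScreenshotOutput {
  type: "computer_screenshot";
  image_url: string;
  detail: "original" | "high" | "low";
}

interface InputImageOutput {
  type: "input_image";
  image_url: string;
  current_url?: string;
}

type ScreenshotOutput = ComputerScreenshotOutput | InputImageOutput;

// Normalize either call shape into an ordered list of actions.
function actionsOf(call: ComputerCall): BrowserAction[] {
  if (call.actions && call.actions.length > 0) return call.actions;
  return call.action ? [call.action] : [];
}

// Build the screenshot payload that is sent back after the actions ran.
function buildOutput(
  usesComputerTool: boolean,
  imageUrl: string,
  currentUrl?: string,
): ScreenshotOutput {
  if (usesComputerTool) {
    return { type: "computer_screenshot", image_url: imageUrl, detail: "original" };
  }
  const output: InputImageOutput = { type: "input_image", image_url: imageUrl };
  if (currentUrl !== undefined) output.current_url = currentUrl;
  return output;
}
```

Normalizing to a list first means the execution code only needs one path: a legacy single action is simply a batch of length one.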

Written for commit 0c60317. Summary will update on new commits.

changeset-bot Bot commented Mar 26, 2026

⚠️ No Changeset found

Latest commit: 0c60317

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

github-actions Bot commented

This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
Approving the latest commit mirrors it into an internal PR owned by the approver.
If new commits are pushed later, the internal PR stays open but is marked stale until someone approves the latest external commit and refreshes it.

github-actions Bot added labels Mar 26, 2026:
  • external-contributor (Tracks PRs mirrored from external contributor forks.)
  • external-contributor:awaiting-approval (Waiting for a stagehand team member to approve the latest external commit.)

cubic-dev-ai Bot left a comment

3 issues found across 5 files

Confidence score: 2/5

  • High-risk due to a concrete sanitization violation: packages/core/examples/gpt54-cua-example.ts prints raw caught errors, which can leak sensitive upstream details.
  • Hardcoded model allowlist changes in packages/core/lib/v3/types/public/agent.ts and packages/core/lib/v3/agent/AgentProvider.ts contradict the requirement to accept any provider/model name and could block valid models.
  • Severity is medium-to-high (6–8/10) with clear policy impact, so merge confidence is low despite the limited surface area.
  • Pay close attention to packages/core/examples/gpt54-cua-example.ts, packages/core/lib/v3/types/public/agent.ts, packages/core/lib/v3/agent/AgentProvider.ts - error sanitization and hardcoded model allowlist changes.
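
For the allowlist concern above, one possible direction is to read the provider out of a `provider/model` string rather than consulting a list of known model names. The sketch below is illustrative only: `resolveProvider` and its `explicitProvider` parameter are hypothetical names, not functions that exist in `AgentProvider.ts`.

```typescript
interface ResolvedModel {
  provider: string;
  model: string;
}

// Derive the provider from "provider/model", or fall back to an explicitly
// supplied provider. No list of allowed model names is consulted.
function resolveProvider(modelName: string, explicitProvider?: string): ResolvedModel {
  const slash = modelName.indexOf("/");
  if (slash > 0 && slash < modelName.length - 1) {
    return {
      provider: modelName.slice(0, slash),
      model: modelName.slice(slash + 1),
    };
  }
  if (explicitProvider) {
    return { provider: explicitProvider, model: modelName };
  }
  throw new Error(
    `Cannot infer a provider for "${modelName}"; use "provider/model" or pass a provider explicitly.`,
  );
}
```

With this shape, a newly released model name would pass through without a code change, and an unsupported model would fail at the provider's API rather than at a local check.
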
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/lib/v3/types/public/agent.ts">

<violation number="1" location="packages/core/lib/v3/types/public/agent.ts:452">
P1: Custom agent: **Ensure we never check against hardcoded lists of allowed LLM model names**

Avoid expanding the hardcoded AVAILABLE_CUA_MODELS allowlist. The rule requires accepting any provider/model name instead of enumerating allowed models, otherwise CUA model support stays gated behind manual updates each time OpenAI adds a model.</violation>
</file>

<file name="packages/core/lib/v3/agent/AgentProvider.ts">

<violation number="1" location="packages/core/lib/v3/agent/AgentProvider.ts:17">
P2: Custom agent: **Ensure we never check against hardcoded lists of allowed LLM model names**

Adding a new hardcoded model name extends the allowlist, which violates the rule against hardcoded LLM model lists. The provider should be derived from the model name (e.g., provider/model format) or required via client options instead of enumerating models.</violation>
</file>

<file name="packages/core/examples/gpt54-cua-example.ts">

<violation number="1" location="packages/core/examples/gpt54-cua-example.ts:41">
P1: Custom agent: **Exception and error message sanitization**

Do not print raw caught errors to the console; this violates the exception/error sanitization requirement and can leak sensitive details from upstream failures.</violation>
</file>
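
A minimal sketch of the sanitization idea for the example script: reduce the caught value to a short, redacted message before printing it. The helper name `sanitizeError` and both redaction patterns are assumptions for illustration; what counts as sensitive is project-specific, and the repository's actual rule may require more.

```typescript
// Turn an unknown caught value into a short message that omits the stack
// trace and masks strings that look like credentials.
function sanitizeError(err: unknown): string {
  const name = err instanceof Error ? err.name : "UnknownError";
  const raw = err instanceof Error ? err.message : String(err);
  const redacted = raw
    // Assumed pattern: API keys that start with "sk-".
    .replace(/sk-[A-Za-z0-9_-]{8,}/g, "[REDACTED_KEY]")
    // Assumed pattern: bearer tokens echoed back in an upstream error.
    .replace(/Bearer\s+[A-Za-z0-9._-]+/gi, "Bearer [REDACTED]");
  // Keep only the first line, capped in length.
  const firstLine = redacted.split("\n")[0] ?? "";
  return `${name}: ${firstLine.slice(0, 200)}`;
}
```

In a `catch (error)` block, the example would then call `console.error(sanitizeError(error))` instead of `console.error(error)`.
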
Architecture diagram
sequenceDiagram
    participant User
    participant Agent as Stagehand Agent
    participant Client as OpenAICUAClient
    participant OpenAI as OpenAI API (gpt-5.4)
    participant Browser as Browser/Page

    User->>Agent: execute(instruction)
    Agent->>Client: Process instruction

    loop Step Loop (maxSteps)
        Client->>OpenAI: Request next action (with history + screenshot)
        OpenAI-->>Client: Return Tool Call (computer_call)

        alt NEW: Model is gpt-5.4
            Note over Client,Browser: Handle batched actions
            loop For each action in "actions" array
                Client->>Browser: NEW: Execute browser action (click, type, etc.)
            end
            Client->>Browser: CHANGED: Capture screenshot
            Browser-->>Client: Screenshot data
            Client->>Client: NEW: Wrap in "computer_screenshot" type
            Note right of Client: Includes "detail" field (original/high/low)
        else Legacy (computer-use-preview)
            Client->>Browser: Execute single "action"
            Client->>Browser: Capture screenshot
            Browser-->>Client: Screenshot data
            Client->>Client: Wrap in "input_image" type
        end

        Client->>OpenAI: Send Tool Output (actions results + screenshot)
        
        opt Final Response
            OpenAI-->>Client: Return final text response
            Client-->>Agent: Return execution result
            Agent-->>User: Return result.message
        end
    end
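
The step loop in the diagram can be sketched as below. This is a simplified illustration under stated assumptions, not the merged implementation: `LoopDeps`, `ModelTurn`, and `runSteps` are hypothetical names, and the model and browser are injected as plain functions so the control flow can be read (and tested) without a network call.

```typescript
type Action = { type: string };

type ModelTurn =
  | { kind: "computer_call"; callId: string; action?: Action; actions?: Action[] }
  | { kind: "message"; text: string };

// Hypothetical dependencies standing in for the OpenAI API and the browser page.
interface LoopDeps {
  requestNext(lastScreenshot: string | null): Promise<ModelTurn>;
  execute(action: Action): Promise<void>;
  screenshot(): Promise<string>;
}

async function runSteps(deps: LoopDeps, maxSteps: number): Promise<string | null> {
  let lastScreenshot: string | null = null;
  for (let step = 0; step < maxSteps; step++) {
    const turn = await deps.requestNext(lastScreenshot);
    // A final text response ends the loop.
    if (turn.kind === "message") return turn.text;
    // Batched `actions` if present, otherwise the single legacy `action`.
    const batch = turn.actions ?? (turn.action ? [turn.action] : []);
    for (const action of batch) {
      await deps.execute(action); // run in the order the model returned them
    }
    // One screenshot per batch, taken after every action has run.
    lastScreenshot = await deps.screenshot();
  }
  return null; // step budget exhausted without a final message
}
```

Taking the screenshot once after the whole batch, rather than after each action, matches the "Capture screenshot" step that follows the inner loop in the diagram.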

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Add one-off context when rerunning by tagging @cubic-dev-ai with guidance or docs links (including llms.txt)
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread packages/core/lib/v3/types/public/agent.ts
Comment thread packages/core/examples/gpt5-4-cua-example.ts
Comment thread packages/core/lib/v3/agent/AgentProvider.ts
miguelg719 force-pushed the support-gpt-5.4-cua-upd branch from 0829a8f to 0c60317 April 21, 2026 22:49
github-actions Bot added external-contributor:mirrored (An internal mirrored PR currently exists for this external contributor PR.) and removed external-contributor:awaiting-approval (Waiting for a stagehand team member to approve the latest external commit.) labels Apr 21, 2026

github-actions Bot commented

This PR was approved by @miguelg719 and mirrored to #2022. All further discussion should happen on that PR.

github-actions Bot closed this Apr 21, 2026
miguelg719 pushed a commit that referenced this pull request Apr 22, 2026
github-actions Bot added external-contributor:completed (The mirrored PR has been merged and the external contributor flow is complete.) and removed external-contributor:mirrored (An internal mirrored PR currently exists for this external contributor PR.) labels Apr 22, 2026

github-actions Bot commented

The mirrored PR #2022 has been merged into main. This original external contributor PR will stay closed as completed.

Labels

  • external-contributor:completed (The mirrored PR has been merged and the external contributor flow is complete.)
  • external-contributor (Tracks PRs mirrored from external contributor forks.)

3 participants