[Claimed #1912] Feat: Add Anthropic CUA adaptive thinking #1954
miguelg719 merged 8 commits into main
Add comprehensive tests for the new adaptive thinking API used by
Claude 4.6 models (claude-opus-4-6, claude-sonnet-4-6).
Tests verify:
- Adaptive thinking uses thinking.type: 'adaptive' (not 'enabled')
- Effort levels are passed via output_config.effort (not budget_tokens)
- All effort levels: low, medium, high, max
- Older models continue using deprecated budget_tokens API
- Model name detection handles provider-prefixed names
These tests define the expected API contract per:
https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
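The model-name check that these tests describe might look like the TypeScript sketch below. The function name `usesAdaptiveThinking`, the prefix list constant, and the `anthropic/` provider prefix used in the usage note are assumptions for illustration, not the client's actual implementation.

```typescript
// Hypothetical helper; the real detection logic in AnthropicCUAClient may differ.
const ADAPTIVE_MODEL_PREFIXES = ["claude-opus-4-6", "claude-sonnet-4-6"];

function usesAdaptiveThinking(modelName: string): boolean {
  // Drop an optional provider prefix (assumed form: "provider/model-name").
  const bareName = modelName.slice(modelName.lastIndexOf("/") + 1);
  // Prefix match so that suffixed variants of a 4.6 model are also detected.
  return ADAPTIVE_MODEL_PREFIXES.some((prefix) => bareName.startsWith(prefix));
}
```

Under these assumptions, `usesAdaptiveThinking("anthropic/claude-opus-4-6")` returns `true`, while `usesAdaptiveThinking("claude-opus-4-5-20251101")` returns `false`.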
Update AnthropicCUAClient to use the correct API contract for
Claude 4.6 models (claude-opus-4-6, claude-sonnet-4-6).
Claude 4.6 models use adaptive thinking:
- thinking: { type: "adaptive" }
- output_config: { effort: "low" | "medium" | "high" | "max" }
This replaces the deprecated API:
- thinking: { type: "enabled", budget_tokens: N }
Changes:
- Add ThinkingEffort type for effort levels
- Add thinkingEffort option to ClientOptions
- Detect 4.6 models and use adaptive thinking with output_config
- Keep backward compatibility with thinkingBudget for older models
- Add deprecation notice for thinkingBudget on 4.6 models
The implementation follows the API contract documented at:
https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
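As a rough illustration of the two request shapes described in this commit message, the sketch below builds either the adaptive or the legacy thinking parameters. The helper name `buildThinkingParams` and its signature are hypothetical; only the field names (`thinking`, `output_config.effort`, `budget_tokens`) come from the commit message.

```typescript
// Effort levels listed in the commit message.
type ThinkingEffort = "low" | "medium" | "high" | "max";

type ThinkingParams =
  | { thinking: { type: "adaptive" }; output_config: { effort: ThinkingEffort } }
  | { thinking: { type: "enabled"; budget_tokens: number } };

// Hypothetical helper, not the actual AnthropicCUAClient code.
function buildThinkingParams(
  isClaude46: boolean,
  effort: ThinkingEffort,
  thinkingBudget?: number,
): ThinkingParams | undefined {
  if (isClaude46) {
    // Claude 4.6: adaptive thinking, with effort carried in output_config.
    return { thinking: { type: "adaptive" }, output_config: { effort } };
  }
  if (thinkingBudget !== undefined) {
    // Older models: deprecated budget-based thinking.
    return { thinking: { type: "enabled", budget_tokens: thinkingBudget } };
  }
  // Older model without a budget: no thinking parameters are added.
  return undefined;
}
```

The returned object is meant to be spread into the request body, so a caller never sends `budget_tokens` and `output_config.effort` together.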
- Add test for default "medium" effort when thinkingEffort not set
- Add tests verifying temperature=1 is set for adaptive thinking
- Add test that older models don't force temperature=1
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Always set temperature=1 when adaptive thinking is enabled (required by API)
- Default to "medium" effort for Claude 4.6 models when thinkingEffort not set
- This ensures adaptive thinking works out of the box for 4.6 models
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
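The defaulting behaviour in this commit can be sketched as follows. The function name `resolveAdaptiveSettings` and its return shape are assumptions made for this example; the `"medium"` default and `temperature: 1` come from the commit message.

```typescript
type ThinkingEffort = "low" | "medium" | "high" | "max";

// Hypothetical helper, not the actual client code.
function resolveAdaptiveSettings(thinkingEffort?: ThinkingEffort): {
  effort: ThinkingEffort;
  temperature: number;
} {
  return {
    // Fall back to "medium" when the caller did not choose an effort level.
    effort: thinkingEffort ?? "medium",
    // Adaptive thinking always runs with temperature 1.
    temperature: 1,
  };
}
```

With this sketch, a 4.6 client constructed without any thinking options still produces a valid adaptive request.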
This mirrored PR has been merged into
🦋 Changeset detected
Latest commit: ebe95e7
The changes in this PR will be included in the next version bump. This PR includes changesets to release 4 packages
1 issue found across 3 files
Confidence score: 5/5
- This looks low risk to merge: the only reported issue is a documentation/behavior mismatch with low severity (3/10), not a functional break in core logic.
- In `packages/core/lib/v3/types/public/model.ts`, docs say `ThinkingEffort` defaults to `high`, while runtime behavior uses `medium` when `thinkingEffort` is unset, which could mislead integrators about default model behavior.
- Pay close attention to `packages/core/lib/v3/types/public/model.ts`: align the `ThinkingEffort` default in docs with the actual adaptive-thinking fallback (`medium`).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/lib/v3/types/public/model.ts">
<violation number="1" location="packages/core/lib/v3/types/public/model.ts:101">
P3: The new `ThinkingEffort` docs claim `high` is the default, but the implementation defaults adaptive thinking to `medium` when `thinkingEffort` is unset.</violation>
</file>
Architecture diagram
sequenceDiagram
participant App as Application Logic
participant Client as AnthropicCUAClient
participant SDK as Anthropic SDK (Beta)
participant API as Anthropic API
Note over App,API: Runtime Flow for Adaptive Thinking
App->>Client: getAction(messages)
Client->>Client: Detect model version (4.6 vs older)
alt NEW: Model is Claude 4.6 (Opus or Sonnet)
Client->>Client: NEW: Set thinking.type = "adaptive"
Client->>Client: NEW: Set output_config.effort = thinkingEffort (default: "medium")
Client->>Client: NEW: Force temperature = 1
Note right of Client: Required for adaptive thinking mode
else CHANGED: Older Claude models (e.g., 4.5)
opt thinkingBudget provided
Client->>Client: CHANGED: Set thinking.type = "enabled"
Client->>Client: CHANGED: Set budget_tokens = thinkingBudget
end
end
Client->>SDK: beta.messages.create({ model, messages, thinking, ... })
Note over SDK,API: Uses computer_20251124 header
SDK->>API: POST /v1/messages
API-->>SDK: Response (with thinking blocks)
SDK-->>Client: Message Object
Client-->>App: Action Result
}
// Track user-specified temperature so we can warn if adaptive thinking overrides it
this.userTemperature = clientOptions?.temperature;
will this not clash with Opus 4.7 not supporting temp?
pirate left a comment:
needs a test for claude-opus-4-7 to make sure temperature/adaptive thinking doesn't break
@filip-michalsky @pirate this PR doesn't add support for Opus 4.7 on CUA. Regardless, the default temperature of 1.0 is still supported but deprecated per the documentation. Any non-default value will throw a 400, but we should scope removing the temperature parameter as its own PR.
sounds good
requested changes are oos for this pr
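For readers following the thread, the temperature tracking under review could be sketched roughly as below. Only the `userTemperature` field and the `clientOptions?.temperature` read appear in the diff; the class name `TemperatureTracker`, the `resolve` method, and the warning text are assumptions for illustration. The sketch does not cover Opus 4.7, which the thread scopes out of this PR.

```typescript
// Hypothetical sketch; everything except userTemperature is an assumed name.
class TemperatureTracker {
  private readonly userTemperature: number | undefined;

  constructor(clientOptions?: { temperature?: number }) {
    // Remember what the user asked for so an override can be reported later.
    this.userTemperature = clientOptions?.temperature;
  }

  // Returns the temperature to send, plus a warning if a user value was overridden.
  resolve(adaptiveThinking: boolean): {
    temperature: number | undefined;
    warning: string | undefined;
  } {
    if (!adaptiveThinking) {
      // Older models: pass the user's value through unchanged.
      return { temperature: this.userTemperature, warning: undefined };
    }
    const overridden =
      this.userTemperature !== undefined && this.userTemperature !== 1;
    return {
      temperature: 1,
      warning: overridden
        ? `temperature ${this.userTemperature} overridden to 1 for adaptive thinking`
        : undefined,
    };
  }
}
```

Keeping the user's value separate from the value actually sent makes it straightforward to log the override once, at request time.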
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated.
# Releases
## @browserbasehq/stagehand@3.3.0
### Minor Changes
- [#1980](#1980) [`e471d2e`](e471d2e) Thanks [@shrey150](https://github.com/shrey150)! - Support Browserbase verified session settings and bump the Browserbase SDK.
### Patch Changes
- [#1954](#1954) [`732b384`](732b384) Thanks [@github-actions](https://github.com/apps/github-actions)! - Update Anthropic CUA to use adaptive thinking
- [#2001](#2001) [`20b601d`](20b601d) Thanks [@shrey150](https://github.com/shrey150)! - Include `agent.execute()` usage in `stagehand.metrics` for API-backed sessions.
- [#1983](#1983) [`8543c11`](8543c11) Thanks [@github-actions](https://github.com/apps/github-actions)! - Add variable substitution to the keys tool in both live execution and cache replay paths. When keys steps with `method="type"` contain `%variableName%` tokens, they are now resolved against the provided variables. This brings the keys tool to parity with the type tool's variable handling.
- [#1973](#1973) [`14b64ec`](14b64ec) Thanks [@monadoid](https://github.com/monadoid)! - Enable strict structured outputs for supported model paths.
- [#2028](#2028) [`a500de1`](a500de1) Thanks [@tkattkat](https://github.com/tkattkat)! - Remove deprecated provider option
- [#1975](#1975) [`8f7192c`](8f7192c) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - make file upload elements more explicit in page snapshot
## @browserbasehq/stagehand-docs@1.0.1
### Patch Changes
- [#2017](#2017) [`6b9b46d`](6b9b46d) Thanks [@monadoid](https://github.com/monadoid)! - Document the optional MCP `start` `sessionId` parameter for attaching to an existing Browserbase session.
## @browserbasehq/stagehand-evals@1.1.11
### Patch Changes
- Updated dependencies \[[`732b384`](732b384), [`20b601d`](20b601d), [`8543c11`](8543c11), [`14b64ec`](14b64ec), [`a500de1`](a500de1), [`e471d2e`](e471d2e), [`8f7192c`](8f7192c)]:
  - @browserbasehq/stagehand@3.3.0
## @browserbasehq/stagehand-server-v3@3.6.3
### Patch Changes
- Updated dependencies \[[`732b384`](732b384), [`20b601d`](20b601d), [`8543c11`](8543c11), [`14b64ec`](14b64ec), [`a500de1`](a500de1), [`e471d2e`](e471d2e), [`8f7192c`](8f7192c)]:
  - @browserbasehq/stagehand@3.3.0
## @browserbasehq/stagehand-server-v4@3.6.3
### Patch Changes
- Updated dependencies \[[`732b384`](732b384), [`20b601d`](20b601d), [`8543c11`](8543c11), [`14b64ec`](14b64ec), [`a500de1`](a500de1), [`e471d2e`](e471d2e), [`8f7192c`](8f7192c)]:
  - @browserbasehq/stagehand@3.3.0
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Mirrored from external contributor PR #1912 after approval by @miguelg719.
Original author: @chromiebot
Original PR: #1912
Approved source head SHA: 0ea0332c525017c727743d11a68b8eb74f76b646
@chromiebot, please continue any follow-up discussion on this mirrored PR. When the external PR gets new commits, this same internal PR will be marked stale until the latest external commit is approved and refreshed here.
Original description
why
what changed
test plan
Summary by cubic
Add adaptive thinking for Anthropic Claude 4.6 models in the `CUA` client with effort controls and automatic `temperature=1`. Keeps legacy `thinkingBudget` for older models and improves model detection and tool versioning.
New Features
- Detects Claude 4.6 models (`claude-opus-4-6*`, `claude-sonnet-4-6*`, incl. provider-prefixed) and sends `thinking: { type: "adaptive" }` with `output_config.effort` (defaults to `"medium"`).
- Adds `ThinkingEffort` and `thinkingEffort` in `ClientOptions` (`"none" | "low" | "medium" | "high" | "max"`); `"none"` disables adaptive thinking.
- Sets `temperature: 1` with adaptive thinking and logs when overriding a user value; logs when `thinkingBudget` is provided on 4.6 models.
- Uses `computer_20251124` for 4.6 and `claude-opus-4-5-20251101`; older models continue with `thinking: { type: "enabled", budget_tokens }`.
Migration
- On 4.6 models, use `thinkingEffort`; `thinkingBudget` is ignored.
- Adaptive thinking sets `temperature: 1`; set `thinkingEffort: "none"` to disable.
Written for commit ebe95e7. Summary will update on new commits.
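To make the migration notes concrete, here is a hedged before/after sketch of client options. The interface name `ThinkingOptions` is a stand-in for the relevant slice of `ClientOptions`, and the numeric values (`4096`, `0.7`) are purely illustrative.

```typescript
type ThinkingEffort = "none" | "low" | "medium" | "high" | "max";

// Stand-in for the thinking-related fields of ClientOptions (assumed shape).
interface ThinkingOptions {
  thinkingBudget?: number;
  thinkingEffort?: ThinkingEffort;
  temperature?: number;
}

// Before: budget-based thinking, as used with older models.
const legacyOptions: ThinkingOptions = { thinkingBudget: 4096, temperature: 0.7 };

// After: effort-based adaptive thinking on a 4.6 model.
// No thinkingBudget (it is ignored) and no temperature (it is set to 1).
const adaptiveOptions: ThinkingOptions = { thinkingEffort: "high" };

// Opting out of adaptive thinking on a 4.6 model.
const thinkingDisabled: ThinkingOptions = { thinkingEffort: "none" };
```

Leaving `thinkingEffort` unset on a 4.6 model is also valid and, per the summary, falls back to `"medium"`.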