Skip to content

Commit 7a2d18f

Browse files
authored
🤖 feat: add advisor max output tokens experiment setting (#3167)
## Summary Add an **Advisor Max Output Tokens** experiment setting that lets users cap the advisor tool's response length via `maxOutputTokens` in the `generateText()` call. Users can choose **Unlimited** (no cap, today's behavior) or **Limited** with a positive integer value. ## Background The advisor tool currently has no output length control — it generates responses of arbitrary length. For cost management and response-time tuning, users need a way to limit how many tokens the advisor model produces per invocation. ## Implementation Mirrors the existing `advisorMaxUsesPerTurn` pattern end-to-end: - **Config contract:** `advisorMaxOutputTokens` added to `ProjectsConfig`, on-disk schema, and oRPC API schemas with `number | null | undefined` semantics (null = explicit Unlimited, undefined = unset/legacy). - **Runtime plumbing:** `aiService.ts` reads the config, normalizes to `number | undefined`, and passes through `advisorRuntime.maxOutputTokens`. The advisor tool conditionally spreads `maxOutputTokens` into `generateText()`. - **Settings UI:** New "Max Output Tokens" row in `AdvisorToolExperimentConfig` with Unlimited/Limited select + numeric input, matching the Max Uses / Turn control. - **Router fix:** Two bugs caught during dogfooding — `saveConfig` handler wasn't persisting the field, and `getConfig` handler wasn't returning it. Both fixed by adding the field to the oRPC router. ## Validation - Typecheck, lint, and all 5 advisor tool tests pass (2 new: limited + unlimited propagation) - UI dogfooding in sandbox: Limited 1000 → reload → restored; Unlimited → reload → restored - Config file inspection confirmed `"advisorMaxOutputTokens": 1000` (limited) and `null` (unlimited) - Advisor tool successfully invoked with 128-token cap (65 tokens produced) ## Risks Low — the feature is additive and gated behind the existing advisor experiment toggle. Existing configs remain valid with no migration. `undefined`/`null` both map to "do not pass `maxOutputTokens`", preserving today's behavior. --- <details> <summary>📋 Implementation Plan</summary> # Advisor tool max output tokens — implementation plan ## Goal Add an **Advisor Max Output Tokens** experiment setting that lets users choose either: - **Unlimited** (preserve today’s behavior: advisor responses are not explicitly capped), or - **Limited** with a **free-text numeric value** such as `1000` or `2000` Then propagate that setting through config persistence and runtime plumbing so the advisor tool passes it to the `generateText(...)` call in `src/node/services/tools/advisor.ts`. ## Recommended approach ### Approach A — mirror the existing advisor “Max Uses / Turn” pattern end-to-end - **Recommendation:** Yes — this should be the default implementation path. - **Why:** The repo already has the same UX and persistence pattern for advisor max uses, plus an existing `generateText({ maxOutputTokens })` precedent elsewhere. - **Net product-code LoC estimate:** **~80–120 LoC** across UI, schema/runtime plumbing, and advisor tool wiring. - **Testing LoC (separate from estimate above):** **~25–45 LoC**. - **Estimated lift:** **small feature / low-risk**, usually **0.5–1 engineer day** including tests and dogfooding. ### Optional fallback slice — backend plumbing first, UI second - **Recommendation:** Only use this if scheduling forces a split rollout. - **What ships first:** persist + plumb `advisorMaxOutputTokens` into the advisor tool, but do not expose it in Settings yet. - **Net product-code LoC estimate:** **~20–35 LoC** for backend-only plumbing. - **Tradeoff:** lower immediate lift, but incomplete for the user story because the setting is not discoverable/editable in the app. ## Evidence already gathered Confirmed touchpoints from repo investigation: - `src/browser/features/Settings/Sections/AdvisorToolExperimentConfig.tsx` - already owns advisor model, reasoning level, and **Max Uses / Turn** UI/state/save flow. - contains the closest copyable pattern for **Unlimited / Limited + numeric input**. - `src/common/config/schemas/appConfigOnDisk.ts` - already persists `advisorMaxUsesPerTurn` as nullable optional config. - `src/common/orpc/schemas/api.ts` - already carries advisor config through config get/save schemas. - `src/common/types/project.ts` - already defines advisor-related config shape. - `src/node/config.ts` - already parses and serializes advisor config. - `src/node/services/aiService.ts` - already reads advisor config and assembles `advisorRuntime` for the tool layer. - `src/common/utils/tools/tools.ts` - defines the `advisorRuntime` contract used by the advisor tool. - `src/node/services/tools/advisor.ts` - contains the advisor tool’s `generateText(...)` call site. - `src/node/services/system1/system1AgentRunner.ts` - already demonstrates that `generateText(...)` accepts `maxOutputTokens` in this codebase. - Likely adjacent tests: - `src/browser/features/Settings/Sections/ExperimentsSection.advisor.test.tsx` - `src/node/services/tools/advisor.test.ts` ## Recommended product/technical semantics Use a new persisted field named: - `advisorMaxOutputTokens` Recommended meaning: - `undefined` = **unset / older config / default path** - `null` = **user explicitly selected Unlimited** - positive integer = **explicit max output token cap** ### Runtime interpretation To preserve today’s behavior, treat **both `undefined` and `null` as “do not pass `maxOutputTokens`”** when invoking the advisor model. That gives us: - no surprise regression for existing users/configs - a clean explicit “Unlimited” state in the UI/config - future flexibility if we ever decide to add a default cap later ### Validation recommendation - Accept **positive integers only**. - Do **not** add a hard-coded upper bound in v1 unless an existing shared constraint already exists. - If a provider rejects an excessively high value, let provider/runtime validation surface that naturally rather than guessing a provider-agnostic max in the UI. ## Suggested agent breakdown ### Lane 1 — Config contract + persistence agent **Owner:** backend/config-focused exec agent **Depends on:** none Deliverables: - Add `advisorMaxOutputTokens` to shared config types/schemas. - Preserve backward compatibility for existing configs. - Ensure config get/save round-trip works. Files to update: - `src/common/types/project.ts` - `src/common/config/schemas/appConfigOnDisk.ts` - `src/common/orpc/schemas/api.ts` - `src/node/config.ts` Implementation notes: - Mirror the existing nullable optional advisor setting conventions. - Reuse existing positive-integer parsing/normalization helpers where possible. - Apply defensive programming: assert parsed values are either `null`, `undefined`, or a positive integer before storing/emitting. Quality gate: - Run targeted config/schema tests if they already exist. - Confirm no type drift in the config API surface. ### Lane 2 — Advisor runtime plumbing agent **Owner:** backend/runtime-focused exec agent **Depends on:** Lane 1 field definition merged or rebased Deliverables: - Read the new config field from runtime config. - Extend the advisor runtime contract. - Pass `maxOutputTokens` into the advisor tool’s `generateText(...)` call only when limited. Files to update: - `src/node/services/aiService.ts` - `src/common/utils/tools/tools.ts` - `src/node/services/tools/advisor.ts` Implementation notes: - Prefer normalizing `null`/`undefined` before the tool call so `advisor.ts` only sees either: - `undefined` (unlimited / omit field), or - a validated positive integer. - Add lightweight assertions near the normalization boundary, e.g. impossible states should fail fast in development/tests. - Keep the change minimal: this should be a parameter pass-through, not a refactor. Quality gate: - Add/extend a unit test proving: - limited config passes `maxOutputTokens: <n>` into `generateText(...)` - unlimited/unset config omits that parameter ### Lane 3 — Experiments settings UI agent **Owner:** frontend/settings-focused exec agent **Depends on:** Lane 1 schema name locked; can work in parallel once naming/semantics are stable Deliverables: - Add the new setting row to the advisor experiment config UI. - Keep the interaction model aligned with **Max Uses / Turn**. - Persist `null` for Unlimited and a positive integer for Limited. Primary file: - `src/browser/features/Settings/Sections/AdvisorToolExperimentConfig.tsx` Likely test file: - `src/browser/features/Settings/Sections/ExperimentsSection.advisor.test.tsx` Implementation notes: - Follow the existing pattern already in the file: - mode state: `"unlimited" | "limited"` - draft string state for the input - last-known-valid integer - shared positive-integer parsing helper - save-on-change / debounced save flow already used for advisor settings - Recommended UI copy: - Label: `Max Output Tokens` - Helper text: `Per advisor response.` - On initial load: - if config is `null` or `undefined`, show **Unlimited** - if config is a number, show **Limited** and that value - On blur / invalid draft handling: - snap back to the last valid integer rather than persisting garbage - Avoid introducing a separate custom component unless the file becomes noticeably harder to read; minimal duplication is acceptable here because an existing pattern already exists. Quality gate: - Add/extend UI coverage for: - initial Unlimited render from unset/null config - switching to Limited and entering `1000` - switching back to Unlimited - invalid input recovery on blur - correct `saveConfig` payload shape ### Lane 4 — Integration + regression test agent **Owner:** test-focused exec agent **Depends on:** Lanes 1–3 available locally Deliverables: - Harden unit/integration coverage so the feature is safe to iterate on later. - Verify no regressions to the existing advisor settings save flow. Suggested test matrix: - **Node / advisor tool unit test** - `advisorRuntime.maxOutputTokens = 1000` -> `generateText` called with `maxOutputTokens: 1000` - `advisorRuntime.maxOutputTokens = undefined` -> `maxOutputTokens` omitted - **Config persistence / round-trip** - serialize + deserialize `null` - serialize + deserialize `2000` - old config with missing field remains valid - **Settings UI test** - render + persist Unlimited - render + persist Limited value - preserve other advisor settings when only this field changes Quality gate: - Run the narrowest targeted tests first, then one broader suite covering touched layers. ## Execution sequence ### Phase 0 — Lock semantics and file ownership 1. Confirm the feature name is `advisorMaxOutputTokens` everywhere. 2. Lock the runtime rule: **omit `maxOutputTokens` when Unlimited/unset**. 3. Decide that v1 uses **positive integer only** with **no hard-coded provider-agnostic max**. **Exit criteria:** all lanes use the same naming and runtime semantics. ### Phase 1 — Land shared config contract 1. Extend shared types and schemas. 2. Add node config parse/serialize support. 3. Ensure config APIs can carry the field. **Dogfood / quality gate after Phase 1:** - Run schema/config-focused tests. - Manually inspect a saved config payload in tests/logs to confirm `null` vs number semantics are preserved. ### Phase 2 — Land advisor runtime plumbing 1. Extend `advisorRuntime` shape. 2. Normalize the new field in `aiService`. 3. Pass it into `generateText(...)` only when limited. 4. Add a focused unit test around the advisor tool call. **Dogfood / quality gate after Phase 2:** - Capture a terminal screenshot of the focused advisor unit test passing. - If easy in a local sandbox, temporarily set a very small cap and verify the advisor response becomes visibly short; otherwise rely on the unit test here and perform full UI dogfood in Phase 4. ### Phase 3 — Land the experiments UI 1. Add the new row to `AdvisorToolExperimentConfig`. 2. Wire initial-load state from persisted config. 3. Wire save payload updates. 4. Add UI tests. **Dogfood / quality gate after Phase 3:** - Start the app in a sandbox and capture: - one annotated screenshot showing the new control on the Experiments page - one repro video for switching Unlimited -> Limited (`1000`) -> save/reload - one repro video for switching Limited -> Unlimited -> save/reload ### Phase 4 — Final regression + end-to-end verification 1. Re-run targeted tests for config, UI, and advisor tool behavior. 2. Run broader repo validation. 3. Perform one end-to-end advisor exercise with the setting enabled. 4. Package dogfood artifacts for reviewer handoff. **Exit criteria:** feature works in UI, persists across reload, and reaches the `generateText(...)` call. ## Concrete implementation checklist ### 1) Shared config and IPC - Add `advisorMaxOutputTokens?: number | null` to the shared project/config type. - Add nullable-positive-int schema support in on-disk config schema. - Add the field to config get/save API schemas using the repo’s `.nullish()` convention for tool/input-like API payloads where applicable. - Ensure `getConfig()` exposes the field and `saveConfig()` accepts it. ### 2) Node config handling - Parse the field from disk using the same family of helpers used for advisor numeric config today. - Serialize `null` and numbers faithfully. - Leave missing/legacy configs untouched. ### 3) Runtime plumbing - Read `cfg.advisorMaxOutputTokens` where advisor runtime values are assembled in `src/node/services/aiService.ts`. - Normalize to either `undefined` or a positive integer. - Extend `ToolConfiguration["advisorRuntime"]` so the tool layer can receive the value. ### 4) Advisor tool execution - Thread `maxOutputTokens` into `src/node/services/tools/advisor.ts`. - Keep the generate call shape minimal; do not add provider-specific branching. - Assert impossible values early in development/test builds. ### 5) Settings UI - Mirror the existing “Max Uses / Turn” control structure. - Use the same free-text numeric input pattern rather than a hard-coded preset list. - Ensure the setting does not clobber model/thinking/max-uses fields when saved. - Keep helper copy short and parallel with the existing advisor controls. ### 6) Tests - UI test for render/edit/save/reload semantics. - Backend unit test for `generateText(...)` arg propagation. - Config round-trip coverage for number/null/missing. ## Dogfooding plan This feature touches the settings UI and advisor execution path, so dogfooding must include both **UI evidence** and **one runtime behavior check**. ### Sandbox setup Use the repo’s sandboxed dev server flow so testing does not collide with other local mux instances. Recommended command: ```bash BACKEND_PORT=3001 VITE_PORT=5174 make dev-server-sandbox ``` Notes derived from the skill: - this gives the run its own temporary `MUX_ROOT` - default behavior copies provider/config state if present, which is helpful for advisor testing - pinning ports makes browser automation deterministic ### Browser automation setup Use `agent-browser` directly (not `npx`). Prefer named sessions and `batch` for simple multi-step sequences. Suggested session name: - `advisor-max-output-tokens` Suggested evidence directory: - `./dogfood-output/advisor-max-output-tokens/` ### UI dogfood flow (must capture screenshots + video) 1. Open the sandboxed app with `agent-browser`. 2. Navigate to **Settings -> Experiments -> Advisor Tool**. 3. Take an **annotated screenshot** of the page showing the new control. 4. Start a **repro video** before interacting. 5. Switch to **Limited**, enter `1000`, blur/save, and reload the settings view. 6. Take screenshots for each major step plus the final persisted state. 7. Stop the video. 8. Repeat with **Unlimited** and verify reload persistence. 9. Check browser console/errors after both flows. Reviewer artifacts required: - annotated screenshot of the control in context - video: Unlimited -> Limited (`1000`) -> persisted after reload - video: Limited -> Unlimited -> persisted after reload - step screenshots referenced by the videos ### Runtime dogfood flow (must capture screenshots + video) 1. Set `Max Output Tokens` to a visibly small value such as `64` or `128`. 2. Trigger the advisor tool with a prompt that would normally produce a long answer. 3. Record the full interaction from before submit through the advisor response. 4. Capture screenshots of: - the setting value in Experiments - the prompt before submission - the resulting short advisor answer 5. Repeat with **Unlimited** and verify the response is materially longer / no cap behavior resumes. If live-provider dogfooding is blocked by credentials or cost concerns: - still capture the UI dogfood artifacts above, and - capture a terminal screenshot of the focused backend test proving `generateText(...)` receives `maxOutputTokens` when limited and omits it when Unlimited. ### agent-browser usage notes to follow - Use `snapshot -i` to discover refs. - Re-snapshot after navigation or DOM changes. - Prefer `wait <selector>`, `wait --text`, or a short fixed wait over `wait --load networkidle`. - Use `record start/stop` for interactive repros. - For human-readable repro videos, pause briefly between actions and capture step screenshots. ## Validation plan ### Targeted validation during development - run focused advisor tool unit tests - run focused settings UI tests - run any existing config round-trip tests that cover advisor settings ### Final validation before handoff Run the repo-standard broader checks appropriate for touched code: ```bash make typecheck make test make lint ``` If time and environment allow, finish with: ```bash make static-check ``` ## Risks and mitigations ### Risk: silently changing existing behavior - **Mitigation:** treat missing/unset config as Unlimited so existing users keep current behavior. ### Risk: provider-specific max-token quirks - **Mitigation:** keep v1 validation to “positive integer only” and rely on provider/runtime errors rather than inventing an arbitrary shared max. ### Risk: UI state drift / saving invalid drafts - **Mitigation:** copy the existing advisor numeric-input pattern, including last-valid-value recovery on blur. ### Risk: accidentally clobbering other advisor settings on save - **Mitigation:** extend the existing advisor save payload rather than introducing a separate save path. ## Acceptance criteria ### Functional - Experiments settings show a new **Max Output Tokens** control for the advisor tool. - Users can choose **Unlimited** or **Limited**. - Limited mode accepts a positive integer typed as free text. - The selected value persists across reloads. - Unlimited mode persists explicitly and restores correctly. - The advisor tool passes `maxOutputTokens` into `generateText(...)` only when Limited. ### Non-functional - Existing configs remain valid with no migration required. - Existing advisor settings behavior is unchanged except for the new option. - No new console/runtime errors in dogfood flows. - Focused tests cover UI persistence and backend propagation. ## Handoff notes for the implementation team - Keep the implementation intentionally boring: this should look like a sibling of the existing advisor max-uses feature, not a new subsystem. - Favor small stacked commits/PR slices in the order above so regressions are easy to isolate. - Preserve defensive programming practices: validate/normalize once, assert impossible states, and avoid hiding bad assumptions. - When handing back for review, include the dogfood screenshot/video artifacts alongside the test results so reviewers can verify the flow without re-running it. </details> --- _Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking: `xhigh` • Cost: `$26.93`_ <!-- mux-attribution: model=anthropic:claude-opus-4-6 thinking=xhigh costs=26.93 -->
1 parent 5104d54 commit 7a2d18f

11 files changed

Lines changed: 259 additions & 15 deletions

File tree

src/browser/features/Settings/Sections/AdvisorToolExperimentConfig.tsx

Lines changed: 134 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -25,18 +25,24 @@ import { getErrorMessage } from "@/common/utils/errors";
2525
import { enforceThinkingPolicy, getThinkingPolicyForModel } from "@/common/utils/thinking/policy";
2626

2727
const DEFAULT_LIMITED_MAX_USES = ADVISOR_DEFAULT_MAX_USES_PER_TURN;
28+
const DEFAULT_LIMITED_MAX_OUTPUT_TOKENS = 16000;
2829

2930
assert(
3031
Number.isInteger(DEFAULT_LIMITED_MAX_USES) && DEFAULT_LIMITED_MAX_USES > 0,
3132
"Advisor limited mode must seed a positive default"
3233
);
34+
assert(
35+
Number.isInteger(DEFAULT_LIMITED_MAX_OUTPUT_TOKENS) && DEFAULT_LIMITED_MAX_OUTPUT_TOKENS > 0,
36+
"Advisor limited output tokens mode must seed a positive default"
37+
);
3338

3439
type AdvisorMode = "limited" | "unlimited";
3540

3641
interface AdvisorSettingsState {
3742
advisorModelString: string | null;
3843
advisorThinkingLevel: ThinkingLevel;
3944
advisorMaxUsesPerTurn: number | null;
45+
advisorMaxOutputTokens: number | null;
4046
}
4147

4248
interface AdvisorSavePayload extends AdvisorSettingsState {
@@ -60,6 +66,14 @@ function normalizeAdvisorMaxUsesPerTurn(value: number | null | undefined): numbe
6066
return value;
6167
}
6268

69+
function normalizeAdvisorMaxOutputTokens(value: number | null | undefined): number | null {
70+
if (!Number.isInteger(value) || value == null || value <= 0) {
71+
return null;
72+
}
73+
74+
return value;
75+
}
76+
6377
function parsePositiveInteger(value: string): number | null {
6478
const trimmedValue = value.trim();
6579
if (!/^\d+$/.test(trimmedValue)) {
@@ -80,39 +94,50 @@ function normalizeAdvisorDraft(params: {
8094
maxUsesMode: AdvisorMode;
8195
limitedDraft: string;
8296
lastValidLimitedValue: number;
97+
maxOutputTokensMode: AdvisorMode;
98+
outputTokensDraft: string;
99+
lastValidOutputTokensValue: number;
83100
}): AdvisorSettingsState {
84101
const normalizedModelString = normalizeAdvisorModelString(params.advisorModelString);
85102
const normalizedThinkingLevel = enforceThinkingPolicy(
86103
normalizedModelString ?? "",
87104
params.advisorThinkingLevel
88105
);
89106

90-
if (params.maxUsesMode === "unlimited") {
91-
return {
92-
advisorModelString: normalizedModelString,
93-
advisorThinkingLevel: normalizedThinkingLevel,
94-
advisorMaxUsesPerTurn: null,
95-
};
107+
let normalizedMaxUsesPerTurn: number | null = null;
108+
if (params.maxUsesMode === "limited") {
109+
assert(
110+
Number.isInteger(params.lastValidLimitedValue) && params.lastValidLimitedValue > 0,
111+
"Advisor limited mode must keep a positive saved value"
112+
);
113+
normalizedMaxUsesPerTurn =
114+
parsePositiveInteger(params.limitedDraft) ?? params.lastValidLimitedValue;
96115
}
97116

98-
assert(
99-
Number.isInteger(params.lastValidLimitedValue) && params.lastValidLimitedValue > 0,
100-
"Advisor limited mode must keep a positive saved value"
101-
);
117+
let normalizedMaxOutputTokens: number | null = null;
118+
if (params.maxOutputTokensMode === "limited") {
119+
assert(
120+
Number.isInteger(params.lastValidOutputTokensValue) && params.lastValidOutputTokensValue > 0,
121+
"Advisor limited max output tokens mode must keep a positive saved value"
122+
);
123+
normalizedMaxOutputTokens =
124+
parsePositiveInteger(params.outputTokensDraft) ?? params.lastValidOutputTokensValue;
125+
}
102126

103127
return {
104128
advisorModelString: normalizedModelString,
105129
advisorThinkingLevel: normalizedThinkingLevel,
106-
advisorMaxUsesPerTurn:
107-
parsePositiveInteger(params.limitedDraft) ?? params.lastValidLimitedValue,
130+
advisorMaxUsesPerTurn: normalizedMaxUsesPerTurn,
131+
advisorMaxOutputTokens: normalizedMaxOutputTokens,
108132
};
109133
}
110134

111135
function areAdvisorSettingsEqual(a: AdvisorSettingsState, b: AdvisorSettingsState): boolean {
112136
return (
113137
a.advisorModelString === b.advisorModelString &&
114138
a.advisorThinkingLevel === b.advisorThinkingLevel &&
115-
a.advisorMaxUsesPerTurn === b.advisorMaxUsesPerTurn
139+
a.advisorMaxUsesPerTurn === b.advisorMaxUsesPerTurn &&
140+
a.advisorMaxOutputTokens === b.advisorMaxOutputTokens
116141
);
117142
}
118143

@@ -126,6 +151,13 @@ export function AdvisorToolExperimentConfig() {
126151
const [maxUsesMode, setMaxUsesMode] = useState<AdvisorMode>("limited");
127152
const [limitedDraft, setLimitedDraft] = useState(String(DEFAULT_LIMITED_MAX_USES));
128153
const [lastValidLimitedValue, setLastValidLimitedValue] = useState(DEFAULT_LIMITED_MAX_USES);
154+
const [maxOutputTokensMode, setMaxOutputTokensMode] = useState<AdvisorMode>("unlimited");
155+
const [outputTokensDraft, setOutputTokensDraft] = useState(
156+
String(DEFAULT_LIMITED_MAX_OUTPUT_TOKENS)
157+
);
158+
const [lastValidOutputTokensValue, setLastValidOutputTokensValue] = useState(
159+
DEFAULT_LIMITED_MAX_OUTPUT_TOKENS
160+
);
129161

130162
const [loaded, setLoaded] = useState(false);
131163
const [loadFailed, setLoadFailed] = useState(false);
@@ -167,23 +199,36 @@ export function AdvisorToolExperimentConfig() {
167199
const normalizedThinkingLevel =
168200
coerceThinkingLevel(cfg.advisorThinkingLevel) ?? THINKING_LEVEL_OFF;
169201
const normalizedMaxUsesPerTurn = normalizeAdvisorMaxUsesPerTurn(cfg.advisorMaxUsesPerTurn);
202+
const normalizedMaxOutputTokens = normalizeAdvisorMaxOutputTokens(
203+
cfg.advisorMaxOutputTokens
204+
);
170205
// Match the backend starter cap when the setting is unset; only an explicit null is
171206
// the user's Unlimited opt-in.
172207
const nextMaxUsesMode: AdvisorMode =
173208
cfg.advisorMaxUsesPerTurn === null ? "unlimited" : "limited";
174209
const nextLimitedValue = normalizedMaxUsesPerTurn ?? DEFAULT_LIMITED_MAX_USES;
175210
const nextMaxUsesPerTurn = nextMaxUsesMode === "unlimited" ? null : nextLimitedValue;
211+
const nextMaxOutputTokensMode: AdvisorMode =
212+
cfg.advisorMaxOutputTokens == null ? "unlimited" : "limited";
213+
const nextOutputTokensValue =
214+
normalizedMaxOutputTokens ?? DEFAULT_LIMITED_MAX_OUTPUT_TOKENS;
215+
const nextMaxOutputTokens =
216+
nextMaxOutputTokensMode === "unlimited" ? null : nextOutputTokensValue;
176217

177218
taskSettingsRef.current = normalizedTaskSettings;
178219
setAdvisorModelString(normalizedModelString ?? "");
179220
setAdvisorThinkingLevel(normalizedThinkingLevel);
180221
setMaxUsesMode(nextMaxUsesMode);
181222
setLimitedDraft(String(nextLimitedValue));
182223
setLastValidLimitedValue(nextLimitedValue);
224+
setMaxOutputTokensMode(nextMaxOutputTokensMode);
225+
setOutputTokensDraft(String(nextOutputTokensValue));
226+
setLastValidOutputTokensValue(nextOutputTokensValue);
183227
lastSyncedRef.current = {
184228
advisorModelString: normalizedModelString,
185229
advisorThinkingLevel: normalizedThinkingLevel,
186230
advisorMaxUsesPerTurn: nextMaxUsesPerTurn,
231+
advisorMaxOutputTokens: nextMaxOutputTokens,
187232
};
188233
setLoadFailed(false);
189234
setLoaded(true);
@@ -226,6 +271,9 @@ export function AdvisorToolExperimentConfig() {
226271
maxUsesMode,
227272
limitedDraft,
228273
lastValidLimitedValue,
274+
maxOutputTokensMode,
275+
outputTokensDraft,
276+
lastValidOutputTokensValue,
229277
});
230278
const lastSynced = lastSyncedRef.current;
231279

@@ -264,12 +312,14 @@ export function AdvisorToolExperimentConfig() {
264312
advisorModelString: payload.advisorModelString,
265313
advisorThinkingLevel: payload.advisorThinkingLevel,
266314
advisorMaxUsesPerTurn: payload.advisorMaxUsesPerTurn,
315+
advisorMaxOutputTokens: payload.advisorMaxOutputTokens,
267316
})
268317
.then(() => {
269318
lastSyncedRef.current = {
270319
advisorModelString: payload.advisorModelString,
271320
advisorThinkingLevel: payload.advisorThinkingLevel,
272321
advisorMaxUsesPerTurn: payload.advisorMaxUsesPerTurn,
322+
advisorMaxOutputTokens: payload.advisorMaxOutputTokens,
273323
};
274324
if (isMountedRef.current) {
275325
setSaveError(null);
@@ -304,6 +354,9 @@ export function AdvisorToolExperimentConfig() {
304354
loaded,
305355
maxUsesMode,
306356
lastValidLimitedValue,
357+
maxOutputTokensMode,
358+
outputTokensDraft,
359+
lastValidOutputTokensValue,
307360
]);
308361

309362
useEffect(() => {
@@ -342,6 +395,7 @@ export function AdvisorToolExperimentConfig() {
342395
advisorModelString: payload.advisorModelString,
343396
advisorThinkingLevel: payload.advisorThinkingLevel,
344397
advisorMaxUsesPerTurn: payload.advisorMaxUsesPerTurn,
398+
advisorMaxOutputTokens: payload.advisorMaxOutputTokens,
345399
})
346400
.catch(() => undefined)
347401
.finally(() => {
@@ -388,6 +442,41 @@ export function AdvisorToolExperimentConfig() {
388442
setLastValidLimitedValue(normalizedValue);
389443
};
390444

445+
const setAdvisorMaxOutputTokensMode = (value: string) => {
446+
if (value !== "unlimited" && value !== "limited") {
447+
return;
448+
}
449+
if (value === "limited") {
450+
const nextValue = parsePositiveInteger(outputTokensDraft) ?? lastValidOutputTokensValue;
451+
assert(
452+
Number.isInteger(nextValue) && nextValue > 0,
453+
"Advisor limited max output tokens needs a positive restored value"
454+
);
455+
setLastValidOutputTokensValue(nextValue);
456+
setOutputTokensDraft(String(nextValue));
457+
}
458+
setMaxOutputTokensMode(value);
459+
};
460+
461+
const handleOutputTokensDraftChange = (rawValue: string) => {
462+
setOutputTokensDraft(rawValue);
463+
const parsedValue = parsePositiveInteger(rawValue);
464+
if (parsedValue == null) {
465+
return;
466+
}
467+
setLastValidOutputTokensValue(parsedValue);
468+
};
469+
470+
const handleOutputTokensDraftBlur = () => {
471+
const normalizedValue = parsePositiveInteger(outputTokensDraft) ?? lastValidOutputTokensValue;
472+
assert(
473+
Number.isInteger(normalizedValue) && normalizedValue > 0,
474+
"Advisor output tokens draft must resolve to a positive integer"
475+
);
476+
setOutputTokensDraft(String(normalizedValue));
477+
setLastValidOutputTokensValue(normalizedValue);
478+
};
479+
391480
const effectiveAdvisorModelStringForThinking =
392481
normalizeAdvisorModelString(advisorModelString) ?? "";
393482
const allowedThinkingLevels = getThinkingPolicyForModel(effectiveAdvisorModelStringForThinking);
@@ -507,6 +596,38 @@ export function AdvisorToolExperimentConfig() {
507596
</div>
508597
</div>
509598

599+
<div className="flex items-center justify-between gap-4">
600+
<div className="flex-1">
601+
<div className="text-foreground text-sm">Max Output Tokens</div>
602+
<div className="text-muted text-xs">Per advisor response.</div>
603+
</div>
604+
<div className="flex items-center justify-end gap-2">
605+
<Select value={maxOutputTokensMode} onValueChange={setAdvisorMaxOutputTokensMode}>
606+
<SelectTrigger className="border-border-medium bg-modal-bg h-9 w-32">
607+
<SelectValue />
608+
</SelectTrigger>
609+
<SelectContent>
610+
<SelectItem value="unlimited">Unlimited</SelectItem>
611+
<SelectItem value="limited">Limited</SelectItem>
612+
</SelectContent>
613+
</Select>
614+
{maxOutputTokensMode === "limited" ? (
615+
<Input
616+
aria-label="Advisor max output tokens"
617+
type="number"
618+
min={1}
619+
step={1}
620+
value={outputTokensDraft}
621+
onChange={(event: React.ChangeEvent<HTMLInputElement>) =>
622+
handleOutputTokensDraftChange(event.target.value)
623+
}
624+
onBlur={handleOutputTokensDraftBlur}
625+
className="border-border-medium bg-modal-bg h-9 w-28"
626+
/>
627+
) : null}
628+
</div>
629+
</div>
630+
510631
{saveError ? <div className="text-danger-light text-xs">{saveError}</div> : null}
511632
</div>
512633
);

src/browser/features/Settings/Sections/ExperimentsSection.advisor.test.tsx

Lines changed: 23 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
import React from "react";
2-
import { cleanup, fireEvent, render, waitFor } from "@testing-library/react";
2+
import { cleanup, fireEvent, render, waitFor, within } from "@testing-library/react";
33
import { afterEach, beforeEach, describe, expect, mock, test } from "bun:test";
44

55
import * as ActualSelectPrimitiveModule from "@/browser/components/SelectPrimitive/SelectPrimitive";
@@ -14,13 +14,15 @@ interface MockConfig {
1414
advisorModelString: string | null;
1515
advisorThinkingLevel: ThinkingLevel | null | undefined;
1616
advisorMaxUsesPerTurn: number | null | undefined;
17+
advisorMaxOutputTokens: number | null | undefined;
1718
}
1819

1920
interface SaveConfigInput {
2021
taskSettings: TaskSettings;
2122
advisorModelString?: string | null;
2223
advisorThinkingLevel?: ThinkingLevel | null;
2324
advisorMaxUsesPerTurn?: number | null;
25+
advisorMaxOutputTokens?: number | null;
2426
}
2527

2628
interface MockAPIClient {
@@ -204,6 +206,7 @@ function createMockAPI(configOverrides: Partial<MockConfig> = {}) {
204206
advisorModelString: null,
205207
advisorThinkingLevel: undefined,
206208
advisorMaxUsesPerTurn: undefined,
209+
advisorMaxOutputTokens: undefined,
207210
...configOverrides,
208211
};
209212

@@ -213,6 +216,7 @@ function createMockAPI(configOverrides: Partial<MockConfig> = {}) {
213216
advisorModelString: config.advisorModelString,
214217
advisorThinkingLevel: config.advisorThinkingLevel,
215218
advisorMaxUsesPerTurn: config.advisorMaxUsesPerTurn,
219+
advisorMaxOutputTokens: config.advisorMaxOutputTokens,
216220
})
217221
);
218222

@@ -223,6 +227,7 @@ function createMockAPI(configOverrides: Partial<MockConfig> = {}) {
223227
: null;
224228
config.advisorThinkingLevel = input.advisorThinkingLevel ?? null;
225229
config.advisorMaxUsesPerTurn = input.advisorMaxUsesPerTurn ?? null;
230+
config.advisorMaxOutputTokens = input.advisorMaxOutputTokens ?? null;
226231
return Promise.resolve();
227232
});
228233

@@ -297,8 +302,21 @@ describe("ExperimentsSection advisor config", () => {
297302

298303
function chooseSelectOption(view: ReturnType<typeof render>, label: string, optionText: string) {
299304
const trigger = getSelectTrigger(view, label);
305+
const row = trigger.parentElement?.parentElement;
306+
if (!(row instanceof window.HTMLElement)) {
307+
throw new Error(`Could not find row for ${label}`);
308+
}
309+
300310
fireEvent.pointerDown(trigger);
301-
fireEvent.click(view.getByText(optionText));
311+
312+
const option = within(row)
313+
.getAllByRole("button", { name: optionText })
314+
.find((element) => element !== trigger);
315+
if (!(option instanceof window.HTMLElement)) {
316+
throw new Error(`Could not find ${optionText} option for ${label}`);
317+
}
318+
319+
fireEvent.click(option);
302320
}
303321

304322
test("keeps the advisor row toggle-only when the experiment is disabled", async () => {
@@ -340,6 +358,7 @@ describe("ExperimentsSection advisor config", () => {
340358
advisorModelString: null,
341359
advisorThinkingLevel: THINKING_LEVEL_OFF,
342360
advisorMaxUsesPerTurn: null,
361+
advisorMaxOutputTokens: null,
343362
});
344363
});
345364

@@ -357,6 +376,7 @@ describe("ExperimentsSection advisor config", () => {
357376
advisorModelString: null,
358377
advisorThinkingLevel: THINKING_LEVEL_OFF,
359378
advisorMaxUsesPerTurn: 3,
379+
advisorMaxOutputTokens: null,
360380
});
361381
});
362382
});
@@ -380,6 +400,7 @@ describe("ExperimentsSection advisor config", () => {
380400
advisorModelString: null,
381401
advisorThinkingLevel: THINKING_LEVEL_OFF,
382402
advisorMaxUsesPerTurn: null,
403+
advisorMaxOutputTokens: null,
383404
});
384405
});
385406

src/common/config/schemas/appConfigOnDisk.ts

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -65,6 +65,7 @@ export const AppConfigOnDiskSchema = z
6565
advisorModelString: z.string().optional(),
6666
advisorThinkingLevel: ThinkingLevelSchema.optional(),
6767
advisorMaxUsesPerTurn: z.number().int().positive().nullable().optional(),
68+
advisorMaxOutputTokens: z.number().int().positive().nullable().optional(),
6869
hiddenModels: z.array(z.string()).optional(),
6970
preferredCompactionModel: z.string().optional(),
7071
agentAiDefaults: AgentAiDefaultsSchema.optional(),

0 commit comments

Comments
 (0)