feat: deployment-wide default model and default plan model#672
feat: deployment-wide default model and default plan model#672bleleve wants to merge 12 commits into
Conversation
…amble)
Adds the backend for plan-mode: a human-in-the-loop planning gate where
the agent persists a markdown plan that must be approved before any
code-changing turn runs. This PR is headless — no UI or bot triggers
yet (those follow as separate PRs in the stack).
Changes:
- New `plans` table in the SessionDO SQLite (monotonic versions per
session); `PlanService` + `plans.handler` covering save / get / list
/ approve / reject.
- New endpoints under `/sessions/:id/plan{,/approve,/reject}` and
`/sessions/:id/plans` (list).
- `SessionMessageQueue` reads `getCurrentPlan()` and attaches
`resumeContext.currentPlan` to dispatched `PromptCommand`s. The
planning-turn gate is terminal-status aware (approved/rejected exits
plan mode without flipping `plan_mode` so the history bubble stays
visible).
- Sandbox `bridge.py` builds a restate-and-confirm preamble from
`resumeContext.currentPlan` and prepends it to the user content
before forwarding to OpenCode.
- New `wrapUntrustedContent` helper in `@open-inspect/shared` wraps
plan + user content in XML tags from the sandbox runtime (security:
isolates plan-as-context from prompt-injection inside user messages).
- Shared `model-defaults.ts` ships the `fetchModelDefaults` helper —
used by bots in subsequent PRs to source the default model + default
plan model from a single place (the control plane). Falls back to
env vars + shared constants until the `/model-preferences` endpoint
is extended (next PR in the stack).
- `MODEL_ALIAS_MAP` and `DEFAULT_PLAN_MODEL` added to shared models.
Tests:
- Unit tests for `PlanService`, `plans.handler`, plan-mode behavior in
`MessageQueue`, plan persistence in the repository.
- Sandbox-runtime tests for `bridge.py` preamble + XML wrapping.
- All test fixtures updated to include the new required `plan_*`
fields on `SessionRow`.
Verification: `npm run typecheck` (workspace), `npm run lint`, `npm test`
(shared 183/183, control-plane 1148/1148), `pytest` (sandbox-runtime
plan-related tests 41/41) — all green on this branch based on upstream/main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds plan-first HITL: persisted versioned plans, approval/amend flows, model-preferences defaults with fallback, control-plane plan APIs and internal routes, message-queue dispatch gating with resumeContext, sandbox plan saving and prompt-safety wrappers, web UI defaults, DB migrations, and tests. ChangesPlan-Mode HITL Workflows
Sequence Diagram(s) sequenceDiagram
participant Client
participant ControlPlane
participant SessionDO
participant Repository
participant Sandbox
Client->>ControlPlane: POST /sessions/:id/plan (savePlan)
ControlPlane->>SessionDO: forward to /internal/plan (savePlan)
SessionDO->>Repository: savePlan -> insert plan row
SessionDO->>ControlPlane: broadcast plan_status (awaiting_approval)
Client->>ControlPlane: POST /plan/approve
ControlPlane->>SessionDO: forward to /internal/planApprove
SessionDO->>Repository: updatePlanApprovalStatus & snapshotPlanCost
SessionDO->>Sandbox: dispatch implementation prompt (flush)
Sandbox->>SessionDO: execution_complete (with plan artifact)
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/web/src/components/settings/models-settings.tsx (1)
61-62:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winOnly set
dirtywhen a toggle actually changes state.Line 61 and Line 88 mark the form dirty even when disabling is blocked (default model guard) or the action is a no-op. That enables “Save” despite unchanged preferences.
Proposed adjustment
const toggleModel = (modelId: string) => { setToggleError(null); setEnabledModels((prev) => { const next = new Set(prev); if (next.has(modelId)) { if (next.size <= 1) return prev; if (modelId === defaultModel || modelId === defaultPlanModel) { setToggleError( `"${formatModelNameLower(modelId)}" is the current default — pick a different default before disabling.` ); return prev; } next.delete(modelId); + setDirty(true); } else { next.add(modelId); + setDirty(true); } return next; }); - setDirty(true); }; ... const toggleCategory = (category: (typeof MODEL_OPTIONS)[number], enable: boolean) => { setToggleError(null); setEnabledModels((prev) => { const next = new Set(prev); let blockedDefault: string | null = null; + let changed = false; for (const model of category.models) { if (enable) { - next.add(model.id); + if (!next.has(model.id)) { + next.add(model.id); + changed = true; + } } else { if (model.id === defaultModel || model.id === defaultPlanModel) { blockedDefault = model.id; continue; } - next.delete(model.id); + if (next.delete(model.id)) changed = true; } } if (next.size === 0) return prev; if (blockedDefault) { setToggleError( `"${formatModelNameLower(blockedDefault)}" is the current default — pick a different default before disabling.` ); } + if (changed) setDirty(true); return next; }); - setDirty(true); };Also applies to: 88-89
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/components/settings/models-settings.tsx` around lines 61 - 62, The handler currently calls setDirty(true) even when a toggle is a no-op or disabled by the default-model guard; update the toggle logic (the function that updates models and calls setDirty, e.g., the model-toggle handler that calls setModels and setDirty) to first compare the intended new enabled state with the current state and return early if they are identical or if disabling is blocked, only proceeding to call setModels and then setDirty(true) when an actual change occurs; ensure the default-model guard prevents state changes and that setDirty is only invoked after a successful state mutation.
🧹 Nitpick comments (2)
packages/web/src/components/settings/models-settings.tsx (1)
36-41: ⚡ Quick winMove initial SWR-to-state sync out of render.
Line 36 performs render-phase state updates. In React, this is safer and more predictable in an effect.
Proposed refactor
-import { useMemo, useState } from "react"; +import { useEffect, useMemo, useState } from "react"; ... - // Sync SWR data into local state once on initial load - if (data && !initialized) { - setEnabledModels(new Set(data.enabledModels)); - if (data.defaultModel) setDefaultModel(data.defaultModel); - if (data.defaultPlanModel) setDefaultPlanModel(data.defaultPlanModel); - setInitialized(true); - } + useEffect(() => { + if (!data || initialized) return; + setEnabledModels(new Set(data.enabledModels)); + if (data.defaultModel) setDefaultModel(data.defaultModel); + if (data.defaultPlanModel) setDefaultPlanModel(data.defaultPlanModel); + setInitialized(true); + }, [data, initialized]);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/components/settings/models-settings.tsx` around lines 36 - 41, The render is performing state updates when SWR data arrives; move that sync into an effect: create a useEffect that depends on [data, initialized], and inside it, if (data && !initialized) call setEnabledModels(new Set(data.enabledModels)), setDefaultModel(data.defaultModel) (if present), setDefaultPlanModel(data.defaultPlanModel) (if present), and setInitialized(true); remove the inline state updates from the render path so only the useEffect containing setEnabledModels, setDefaultModel, setDefaultPlanModel, and setInitialized handles initializing state from data.packages/control-plane/src/session/durable-object.ts (1)
779-779: ⚡ Quick winExtract the default inactivity timeout into a named constant.
Line 779 restates the default timeout literal inline. Please define it once as a constant and reuse it here.
♻️ Proposed refactor
+const DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS = 900_000; + const config = { ...DEFAULT_LIFECYCLE_CONFIG, @@ inactivity: { ...DEFAULT_LIFECYCLE_CONFIG.inactivity, - timeoutMs: parseInt(this.env.SANDBOX_INACTIVITY_TIMEOUT_MS || "900000", 10), + timeoutMs: parseInt( + this.env.SANDBOX_INACTIVITY_TIMEOUT_MS || + String(DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS), + 10 + ), },As per coding guidelines, "Define each default timeout value exactly once as a named constant and import it everywhere rather than restating literal values".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/control-plane/src/session/durable-object.ts` at line 779, Define a single named constant for the default sandbox inactivity timeout (e.g., SANDBOX_INACTIVITY_TIMEOUT_MS_DEFAULT = 900000) and export it from a shared constants module or at top of this file, then replace the inline literal in the existing timeout parsing (the expression currently: parseInt(this.env.SANDBOX_INACTIVITY_TIMEOUT_MS || "900000", 10)) to use the constant (e.g., parseInt(this.env.SANDBOX_INACTIVITY_TIMEOUT_MS ?? String(SANDBOX_INACTIVITY_TIMEOUT_MS_DEFAULT), 10)), updating any other occurrences to import/use the same constant.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/PLAN_MODE.md`:
- Around line 133-150: The fenced ASCII diagram block starting with "`@mention` /
label / web prompt" and ending with "approve / reject / amend ──▶ next prompt
runs as build" is missing a language tag (MD040); update the opening fence to
include a language label such as text (e.g., change ``` to ```text) so
markdownlint is satisfied while leaving the diagram content unchanged.
In `@packages/control-plane/README.md`:
- Around line 75-77: Update the sentence starting "When a session opts into plan
mode the agent emits a markdown plan and the message queue is gated until the
user approves, rejects, or amends." to clarify that the queue is not completely
stalled: state that the agent emits a markdown plan and non-plan messages are
gated pending user approval, but amendments can still be dispatched as planning
turns while approval is pending (i.e., the system holds non-planning messages
rather than hard-blocking all message activity); keep the reference to
docs/PLAN_MODE.md intact.
In `@packages/control-plane/src/session/http/handlers/plans.handler.ts`:
- Around line 156-162: The parser in readApprovalBody can return non-object
values (e.g., JSON.parse("null") => null) which later causes approvePlan to
dereference properties and crash; inside readApprovalBody (and its try block)
validate that the parsed value is a non-null object (typeof parsed === "object"
&& parsed !== null) and only return it cast as ApprovalRequestBody when that
check passes, otherwise return {} so malformed-but-valid JSON does not lead to
runtime errors.
In `@packages/control-plane/src/types.ts`:
- Line 94: Update the comment for the SANDBOX_INACTIVITY_TIMEOUT_MS property to
reference the shared default constant instead of repeating the literal (900000);
change the comment from "Inactivity timeout in ms (default: 900000 = 15 min)" to
something like "Inactivity timeout in ms (Defaults to
DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS)" so it references the named constant used
elsewhere.
In `@packages/sandbox-runtime/src/sandbox_runtime/bridge.py`:
- Around line 744-747: The token event handler is appending cumulative token
text to text_buffer, causing duplicated/ corrupted plan_text when planMode=True;
to fix it, change the logic in the token-handling block (the code that uses
text_buffer when handling token events and later builds plan_text) to append
only the incremental delta instead of the full cumulative value—e.g., keep a
last_text (or reuse the last element of text_buffer) and compute new_chunk =
event_value[len(last_text):] and append new_chunk, or replace the last entry
with the latest full value rather than appending it; update the code that
constructs plan_text from text_buffer accordingly so it no longer double-counts
tokens (references: text_buffer, the token event handler, and the code that
produces plan_text when planMode is True).
- Around line 639-643: The returned XML-like wrappers inject raw plan text
(plan_content and previous plan content) which can include closing tags like
</saved_plan> or </previous_plan>; fix by escaping those sequences before
embedding: create a safe variable (e.g., safe_plan =
plan_content.strip().replace("</saved_plan>",
"</saved_plan>").replace("</previous_plan>", "</previous_plan>"))
and use safe_plan in the string construction; apply the same escaping logic to
the other embedded plan variable (the previous plan insertion) so no raw closing
tags are emitted into the wrapper.
- Line 844: The call using a hardcoded 30.0 timeout (resp = await
self.http_client.post(..., timeout=30.0)) should use a single named timeout
constant instead; replace the literal with a module- or package-level constant
(e.g., DEFAULT_HTTP_TIMEOUT or PLAN_SAVE_TIMEOUT) and import it where needed, or
if a suitable constant already exists elsewhere in the codebase, import that
constant into sandbox_runtime.bridge and use it in the self.http_client.post
call so the timeout value is defined exactly once.
In `@packages/shared/src/model-defaults.ts`:
- Around line 50-57: The current logic treats the control-plane response as
valid only if both data.defaultModel and data.defaultPlanModel are present,
causing both fallbacks to be discarded when one field is missing; change the
code to apply per-field fallbacks instead: when res.ok and data is parsed
(ModelPreferencesResponse), set defaultModel = data.defaultModel ??
existingFallbackDefaultModel and defaultPlanModel = data.defaultPlanModel ??
existingFallbackDefaultPlanModel (i.e., preserve whichever fallback remains when
the other CP value is absent), and apply the same per-field pattern for the
second occurrence around the 62-65 block so each field is independently chosen
from data or fallback.
In `@packages/shared/src/prompt-safety.ts`:
- Around line 55-60: The current escapedContent replacements only match exact
"<user_content" and "</user_content>" strings and miss variants with extra
whitespace, attributes, or casing; change the logic that builds escapedContent
to use case-insensitive regex replacements that match opening and closing
user_content tags with optional whitespace/attributes (e.g. use regex like
/<\s*\/\s*user_content\b[^>]*>/gi for closing and /<\s*user_content\b[^>]*>/gi
for opening) and replace them with their neutralized forms (preserving the
existing escaped-backslash/slash pattern used in the code) so all variants such
as "</user_content >", "<USER_CONTENT>", or "<user_content attr='x'>" are
correctly neutralized; update the variable escapedContent accordingly where it
is defined.
In `@terraform/environments/production/workers-linear.tf`:
- Line 49: The environment variable DEFAULT_PLAN_MODEL is using
"claude-opus-4-6" while other configs expect the vendor-prefixed format; update
the value of DEFAULT_PLAN_MODEL to "anthropic/claude-opus-4-6" to match
control-plane and GitHub-bot, and also verify/update DEFAULT_MODEL to the same
"anthropic/claude-opus-4-6" format so both variables use a consistent
vendor-prefixed model identifier.
In `@terraform/environments/production/workers-slack.tf`:
- Line 51: DEFAULT_PLAN_MODEL is using "claude-opus-4-6" without the required
provider prefix; update the value of DEFAULT_PLAN_MODEL to
"anthropic/claude-opus-4-6" to match other workers, and also check and align
DEFAULT_MODEL (the sibling variable) to use the same "anthropic/..." prefixed
model string so formats are consistent across Slack-bot, control-plane and
GitHub-bot.
---
Outside diff comments:
In `@packages/web/src/components/settings/models-settings.tsx`:
- Around line 61-62: The handler currently calls setDirty(true) even when a
toggle is a no-op or disabled by the default-model guard; update the toggle
logic (the function that updates models and calls setDirty, e.g., the
model-toggle handler that calls setModels and setDirty) to first compare the
intended new enabled state with the current state and return early if they are
identical or if disabling is blocked, only proceeding to call setModels and then
setDirty(true) when an actual change occurs; ensure the default-model guard
prevents state changes and that setDirty is only invoked after a successful
state mutation.
---
Nitpick comments:
In `@packages/control-plane/src/session/durable-object.ts`:
- Line 779: Define a single named constant for the default sandbox inactivity
timeout (e.g., SANDBOX_INACTIVITY_TIMEOUT_MS_DEFAULT = 900000) and export it
from a shared constants module or at top of this file, then replace the inline
literal in the existing timeout parsing (the expression currently:
parseInt(this.env.SANDBOX_INACTIVITY_TIMEOUT_MS || "900000", 10)) to use the
constant (e.g., parseInt(this.env.SANDBOX_INACTIVITY_TIMEOUT_MS ??
String(SANDBOX_INACTIVITY_TIMEOUT_MS_DEFAULT), 10)), updating any other
occurrences to import/use the same constant.
In `@packages/web/src/components/settings/models-settings.tsx`:
- Around line 36-41: The render is performing state updates when SWR data
arrives; move that sync into an effect: create a useEffect that depends on
[data, initialized], and inside it, if (data && !initialized) call
setEnabledModels(new Set(data.enabledModels)),
setDefaultModel(data.defaultModel) (if present),
setDefaultPlanModel(data.defaultPlanModel) (if present), and
setInitialized(true); remove the inline state updates from the render path so
only the useEffect containing setEnabledModels, setDefaultModel,
setDefaultPlanModel, and setInitialized handles initializing state from data.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1908a84e-fc9a-4b70-ba44-702777be5ccf
📒 Files selected for processing (52)
docs/GETTING_STARTED.mddocs/HOW_IT_WORKS.mddocs/PLAN_MODE.mdpackages/control-plane/README.mdpackages/control-plane/src/db/model-preferences.tspackages/control-plane/src/router.tspackages/control-plane/src/routes/model-preferences.test.tspackages/control-plane/src/routes/model-preferences.tspackages/control-plane/src/sandbox/lifecycle/manager.test.tspackages/control-plane/src/session/contracts.tspackages/control-plane/src/session/durable-object.tspackages/control-plane/src/session/http/handlers/child-sessions.handler.test.tspackages/control-plane/src/session/http/handlers/plans.handler.test.tspackages/control-plane/src/session/http/handlers/plans.handler.tspackages/control-plane/src/session/http/handlers/pull-request.handler.test.tspackages/control-plane/src/session/http/handlers/session-lifecycle.handler.test.tspackages/control-plane/src/session/http/handlers/session-lifecycle.handler.tspackages/control-plane/src/session/http/routes.test.tspackages/control-plane/src/session/http/routes.tspackages/control-plane/src/session/initialize.tspackages/control-plane/src/session/message-queue.test.tspackages/control-plane/src/session/message-queue.tspackages/control-plane/src/session/openai-token-refresh-service.test.tspackages/control-plane/src/session/pull-request-service.test.tspackages/control-plane/src/session/repository.test.tspackages/control-plane/src/session/repository.tspackages/control-plane/src/session/schema.tspackages/control-plane/src/session/services/plan.service.test.tspackages/control-plane/src/session/services/plan.service.tspackages/control-plane/src/session/types.tspackages/control-plane/src/types.tspackages/control-plane/src/utils/models.tspackages/control-plane/test/integration/cleanup.tspackages/control-plane/test/integration/model-preferences.test.tspackages/sandbox-runtime/src/sandbox_runtime/bridge.pypackages/sandbox-runtime/tests/test_bridge_message_tracking.pypackages/sandbox-runtime/tests/test_bridge_resume_context.pypackages/shared/src/index.tspackages/shared/src/model-defaults.test.tspackages/shared/src/model-defaults.tspackages/shared/src/models.tspackages/shared/src/prompt-safety.test.tspackages/shared/src/prompt-safety.tspackages/shared/src/types/index.tspackages/web/src/components/settings/models-settings.tsxpackages/web/src/hooks/use-enabled-models.tspackages/web/src/lib/session-list.test.tsterraform/d1/migrations/0021_add_default_models_to_model_preferences.sqlterraform/environments/production/workers-control-plane.tfterraform/environments/production/workers-github.tfterraform/environments/production/workers-linear.tfterraform/environments/production/workers-slack.tf
…eMurray#671 Must-fix: - bridge.py: escape XML special chars in plan content before interpolating into <saved_plan> and <previous_plan> wrappers (1.1) — prevents a hostile `</saved_plan>` in the plan body from breaking out of the wrapper and injecting instructions outside it. - bridge.py: replace the cumulative token buffer's append with overwrite semantics (1.2) — OpenCode token events carry the FULL accumulated text since the response start (not incremental deltas), so appending produced corrupted plan bodies with compounded prefixes at end-of-turn. - plans.handler.ts: readApprovalBody now throws InvalidApprovalBodyError on JSON parse failure instead of silently coercing the body to {} (1.3) — errorResponseForApproval maps it to HTTP 400 so malformed client requests surface instead of leaking through with partial parameters. Should-fix: - plan.service.ts: dedup guard now requires non-null messageId (1.4) — null is the "no message context" marker, so two identical-body events with null messageId are legitimately distinct saves and must each bump the version. - session/types.ts: import PlanSource from @open-inspect/shared (1.5) instead of redeclaring it locally — prevents contract drift between the control plane row type and the shared API surface. - durable-object.ts: extract DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS module constant (1.6); also cleans up a stray /** doc-comment opener left over from an earlier opencode-config strip. - bridge.py: extract PLAN_SAVE_TIMEOUT_SECONDS module constant (1.7). - prompt-safety.ts: <user_content> tag neutralization is now case- insensitive and whitespace-tolerant (1.8) — catches `<USER_CONTENT>`, `< user_content >`, `</ user_content >`, attribute-bearing tags, etc. Nitpicks: - session-lifecycle.handler.ts: remove redundant getValidModelOrDefault after validation; add log.warn when planMode=true but planModel is invalid and we fall back to DEFAULT_PLAN_MODEL (1.9). - router.ts: type forwardPlanApproval's internalPath parameter with SessionInternalPath instead of bypassing via `as never` (1.10). - control-plane/src/types.ts: reference DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS in the SANDBOX_INACTIVITY_TIMEOUT_MS env-var comment (1.11). - docs/PLAN_MODE.md: add `text` language tag to architecture diagram fence to satisfy MD040 (1.12). - packages/control-plane/README.md: clarify Plan Mode behavior — prompts continue to queue and dispatch as *planning turns* until approval/rejection/amendment (not "the message queue is gated") (1.13). Test coverage gaps: - repository.test.ts: new case for upsertSession({ planMode: true, planModel }) asserting plan_mode=1 and plan_model are persisted (1.14). - plan.service.test.ts: new cases for null-messageId dedup (skipped guard) and MAX_PLAN_CONTENT_BYTES boundary (accept-at-limit, throw- over-limit) (1.15). Targeted regression tests: - test_bridge_resume_context.py: assertions that resume + planning preambles escape XML specials in the plan body and that token-buffer overwrite semantics produce a single non-duplicated snapshot. - plans.handler.test.ts: assertions that malformed-JSON bodies on approve/reject return HTTP 400 with code "invalid_body". Verification: npm run build -w @open-inspect/shared && npm run lint && npm run typecheck && npm test (shared 16 files, control-plane 65 files all green) && pytest (41 + 4 new regression cases green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#671 Three items flagged on the first fix commit (a2cad10): 🟠 Major (plans.handler.ts) — Don't clear reasoning effort when no model override is provided. Previously the approve handler validated the user's implementationReasoningEffort against `""` when no model override was set; validateReasoningEffort returned null, which then propagated to approvePlanAndFlush. The service treats null as an explicit "clear", so an approval request that sent reasoning effort without a model silently wiped the session's persisted reasoning_effort. New semantics: - undefined (omitted) → no change - explicit null → forwarded as null (intentional clear) - string + model → validated against the model - string, no model → return 400 with code "invalid_reasoning_effort" Two new regression cases in plans.handler.test.ts. ⚡ Quick win (durable-object.ts) — NaN-guard the inactivity timeout parse. parseInt(env_value, 10) returns NaN when the env var is set to a non-numeric string (e.g. "abc" or "5 minutes"); the lifecycle manager would then schedule alarms with NaN ms, which is undefined behavior. New parseSandboxInactivityTimeoutMs() helper checks Number.isFinite and value > 0 before accepting the env value, falling back to DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS otherwise. ⚡ Style (durable-object.ts) — Drop the "// 15 minutes" trailing comment on DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS. Per repo guideline, don't restate literal timeout values in comments — the constant name is the source of truth. Verification: npm run typecheck, npm run lint, npm test -w @open-inspect/control-plane (65/65 files, all green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Must-fix: - model-defaults.ts: per-field fallback chain (2.1) — defaultModel and defaultPlanModel now resolve independently as `DB > env > shared constant`. The previous all-or-nothing gate required BOTH fields in the CP response; when only one was present, both fields fell back together to env/constants, discarding the partial DB value. Should-fix: - models-settings.tsx: setDirty only fires on an actual state change (2.2). Previously toggling a model off would set dirty even when the toggle was blocked by the default-model guard, enabling Save on a no-op. The fix tracks a `changed` flag inside the updater and only calls setDirty when the underlying Set actually mutated. - models-settings.tsx: SWR data sync moved from render phase into useEffect (2.3). The previous in-render setState would surface a React warning under StrictMode and risked an unnecessary re-render. - terraform/environments/production/workers-{linear,slack}.tf: align DEFAULT_PLAN_MODEL to "anthropic/claude-opus-4-6" (2.4) to match the format used by workers-control-plane.tf and workers-github.tf. The bare "claude-opus-4-6" string was not a fully-qualified model ID and would be rejected by isValidModel. Targeted regression test: - model-defaults.test.ts: two new cases verifying that when the CP response contains only defaultModel (resp. defaultPlanModel), the other field independently falls back to env without the present field being discarded. Verification: npm run build -w @open-inspect/shared && npm run lint && npm run typecheck && npm test (shared 16/16, control-plane 66/66, web 32/32 — all green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
26491b5 to
572e4bb
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/control-plane/src/routes/model-preferences.test.ts`:
- Line 145: The test currently asserts plan fallback against DEFAULT_MODEL;
change the assertion in model-preferences.test (the expect on
body.defaultPlanModel) to use the plan-specific constant DEFAULT_PLAN_MODEL
instead of DEFAULT_MODEL so the test verifies the dedicated plan fallback
contract (update the expect call to toBe(DEFAULT_PLAN_MODEL)).
In `@packages/control-plane/src/routes/model-preferences.ts`:
- Around line 35-47: resolveDefaults currently picks
defaultModel/defaultPlanModel from env/shared fallbacks without ensuring those
values are present in the enabledModels set, violating the same invariant
enforced by setPreferences; after computing defaultModel and defaultPlanModel in
resolveDefaults (and in the analogous fallback logic at lines 57-69), reconcile
them against the enabledModels you will return by either adding any missing
defaultModel/defaultPlanModel into enabledModels or choosing a fallback from
enabledModels (e.g., first enabled) and update the returned defaults accordingly
so the returned defaults are always members of the enabledModels set.
In `@packages/control-plane/src/session/http/handlers/plans.handler.ts`:
- Around line 115-131: The handler currently treats a null return from
deps.validateReasoningEffort as an explicit “clear” sentinel; change this so
that after calling validateReasoningEffort(implementationModel,
body.implementationReasoningEffort) you check whether the return is null but the
incoming body.implementationReasoningEffort was not null/undefined, and if so
return the 400 Response with code "invalid_reasoning_effort"; keep the existing
behavior that only an explicit body.implementationReasoningEffort === null
clears the stored value. Ensure you reference the implementationReasoningEffort
variable, validateReasoningEffort call, and implementationModel existence check
when making this change.
In
`@packages/control-plane/src/session/http/handlers/session-lifecycle.handler.ts`:
- Around line 115-129: When planMode is true, don't hardcode DEFAULT_PLAN_MODEL
here; instead resolve the deployment-wide effective default (the DB > env >
shared fallback) before initializing planModel and use that value when
body.planModel is omitted. Update the logic around planMode/planModel to call
the existing configuration lookup (e.g., a getEffectiveDefaultPlanModel() or
equivalent config/service method) to populate the fallback, keep the
isValidModel() check for an explicitly provided body.planModel and emit the
deps.getLog().warn only when an invalid plan was requested, and assign the
resolved effective default when no planModel is provided.
In `@packages/sandbox-runtime/src/sandbox_runtime/bridge.py`:
- Line 651: Replace the unsafe xml_escape usage for attribute values with a
proper attribute-quoting helper: use quoteattr() (e.g.,
xml.sax.saxutils.quoteattr) when inserting the version into the saved_plan and
previous_plan attributes so embedded double quotes won’t break the XML-like
attribute boundary; locate the f-string that builds '<saved_plan version="...">'
and the analogous previous_plan construction and wrap the version value with
quoteattr(str(version)) instead of xml_escape(str(version)).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b0301759-5532-4418-b3c4-01d0a0f87175
📒 Files selected for processing (30)
docs/GETTING_STARTED.mddocs/PLAN_MODE.mdpackages/control-plane/README.mdpackages/control-plane/src/db/model-preferences.tspackages/control-plane/src/router.tspackages/control-plane/src/routes/model-preferences.test.tspackages/control-plane/src/routes/model-preferences.tspackages/control-plane/src/session/durable-object.tspackages/control-plane/src/session/http/handlers/plans.handler.test.tspackages/control-plane/src/session/http/handlers/plans.handler.tspackages/control-plane/src/session/http/handlers/session-lifecycle.handler.tspackages/control-plane/src/session/repository.test.tspackages/control-plane/src/session/services/plan.service.test.tspackages/control-plane/src/session/services/plan.service.tspackages/control-plane/src/session/types.tspackages/control-plane/src/types.tspackages/control-plane/test/integration/cleanup.tspackages/control-plane/test/integration/model-preferences.test.tspackages/sandbox-runtime/src/sandbox_runtime/bridge.pypackages/sandbox-runtime/tests/test_bridge_resume_context.pypackages/shared/src/model-defaults.test.tspackages/shared/src/model-defaults.tspackages/shared/src/prompt-safety.tspackages/web/src/components/settings/models-settings.tsxpackages/web/src/hooks/use-enabled-models.tsterraform/d1/migrations/0021_add_default_models_to_model_preferences.sqlterraform/environments/production/workers-control-plane.tfterraform/environments/production/workers-github.tfterraform/environments/production/workers-linear.tfterraform/environments/production/workers-slack.tf
✅ Files skipped from review due to trivial changes (4)
- terraform/environments/production/workers-github.tf
- docs/PLAN_MODE.md
- packages/control-plane/README.md
- docs/GETTING_STARTED.md
| const planMode = body.planMode === true; | ||
| let planModel: string | null = null; | ||
| if (planMode) { | ||
| if (body.planModel && isValidModel(body.planModel)) { | ||
| planModel = body.planModel; | ||
| } else { | ||
| planModel = DEFAULT_PLAN_MODEL; | ||
| if (body.planModel) { | ||
| deps.getLog().warn("Invalid plan model, using default", { | ||
| requested_plan_model: body.planModel, | ||
| default_plan_model: planModel, | ||
| }); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Honor the deployment-wide plan default here.
When plan mode is enabled and the caller omits planModel, this always persists DEFAULT_PLAN_MODEL. That bypasses the new DB > env > shared fallback chain, so server-created sessions won't respect a configured defaultPlanModel unless every caller sends it explicitly. Please resolve the effective default before this init path, rather than hardcoding the shared fallback here.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@packages/control-plane/src/session/http/handlers/session-lifecycle.handler.ts`
around lines 115 - 129, When planMode is true, don't hardcode DEFAULT_PLAN_MODEL
here; instead resolve the deployment-wide effective default (the DB > env >
shared fallback) before initializing planModel and use that value when
body.planModel is omitted. Update the logic around planMode/planModel to call
the existing configuration lookup (e.g., a getEffectiveDefaultPlanModel() or
equivalent config/service method) to populate the fallback, keep the
isValidModel() check for an explicitly provided body.planModel and emit the
deps.getLog().warn only when an invalid plan was requested, and assign the
resolved effective default when no planModel is provided.
There was a problem hiding this comment.
Acknowledging this — confirmed valid concern, but deferring this one as a follow-up.
The DO-side lifecycle handler doesn't currently take ModelPreferencesStore (or any D1-accessing dep) — it works through a pinned set of repository methods. Wiring the deployment default through here means:
- adding a
resolveDefaultPlanModel: () => Promise<string | null>dep onSessionLifecycleHandlerDeps - constructing it in the DO's
durable-object.tswithnew ModelPreferencesStore(env.DB) - updating all unit-test fixtures that build the handler
In practice, every current caller (web home, GitHub bot, Linear bot, Slack bot) already resolves defaultPlanModel via fetchModelDefaults before calling POST /sessions and passes it explicitly as planModel. The DO falling back to the shared constant for callers that omit planModel is a defensive last-resort path, not the happy path.
Tracked as a separate follow-up so this PR doesn't grow into the DO-deps refactor surface. Happy to address in a stacked PR once #671 + #672 land. Marking this thread as something to revisit.
There was a problem hiding this comment.
@bleleve, that's a well-reasoned deferral — the "callers already resolve via fetchModelDefaults before the session POST" invariant is a solid reason to keep this PR scope-contained. The DEFAULT_PLAN_MODEL fallback remaining as a last-resort safety net is acceptable for now.
Would you like me to open a GitHub issue to track the follow-up work? It could capture the specific changes needed:
- Add
resolveDefaultPlanModel: () => Promise<string | null>toSessionLifecycleHandlerDeps - Wire
new ModelPreferencesStore(env.DB)indurable-object.ts - Update unit-test fixtures for the handler
Just say the word and I'll create it.
CodeRabbit follow-up review on ColeMurray#672 (the only thread that didn't auto- resolve after the previous fix pushes). The issue technically lives in ColeMurray#671's plans.handler.ts but CodeRabbit only flagged it now after seeing the post-fix state. `JSON.parse("null")` returns null, `JSON.parse("[1,2]")` returns an array, `JSON.parse("42")` returns a number. All three are syntactically valid JSON but none of them is the object payload approvePlan/rejectPlan expect. The previous code would parse successfully then crash later when dereferencing `body.implementationModel` on `null` (TypeError) or silently accept arrays and primitives as if they were valid bodies. Reject these early with HTTP 400 (`code: "invalid_body"`) via the existing InvalidApprovalBodyError path. Two new regression cases cover JSON-null and JSON-array bodies. Verification: npm run typecheck && npm test -w @open-inspect/control-plane (65/65 files, 2 new test cases green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Must-fix: - model-defaults.ts: per-field fallback chain (2.1) — defaultModel and defaultPlanModel now resolve independently as `DB > env > shared constant`. The previous all-or-nothing gate required BOTH fields in the CP response; when only one was present, both fields fell back together to env/constants, discarding the partial DB value. Should-fix: - models-settings.tsx: setDirty only fires on an actual state change (2.2). Previously toggling a model off would set dirty even when the toggle was blocked by the default-model guard, enabling Save on a no-op. The fix tracks a `changed` flag inside the updater and only calls setDirty when the underlying Set actually mutated. - models-settings.tsx: SWR data sync moved from render phase into useEffect (2.3). The previous in-render setState would surface a React warning under StrictMode and risked an unnecessary re-render. - terraform/environments/production/workers-{linear,slack}.tf: align DEFAULT_PLAN_MODEL to "anthropic/claude-opus-4-6" (2.4) to match the format used by workers-control-plane.tf and workers-github.tf. The bare "claude-opus-4-6" string was not a fully-qualified model ID and would be rejected by isValidModel. Targeted regression test: - model-defaults.test.ts: two new cases verifying that when the CP response contains only defaultModel (resp. defaultPlanModel), the other field independently falls back to env without the present field being discarded. Verification: npm run build -w @open-inspect/shared && npm run lint && npm run typecheck && npm test (shared 16/16, control-plane 66/66, web 32/32 — all green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
572e4bb to
d03dab5
Compare
…leMurray#671 Two new items flagged after the previous fix pushes: 🟠 plans.handler.ts — Reject invalid implementationReasoningEffort. validateReasoningEffort() returns null for unsupported values, but null is also our explicit "clear the persisted effort" sentinel. A request with a typo (e.g. "hgih" instead of "high") was previously accepted and silently cleared the session's reasoning_effort. Now we distinguish the two cases: explicit-null body stays null (intentional clear), but a string that fails validation returns 400 with code "invalid_reasoning_effort" instead of being coerced. 🟡 bridge.py — Use quoteattr for the version XML attribute value. xml_escape() does NOT escape `"`, so a version string containing a quote could break the attribute boundary of <saved_plan version=...> and <previous_plan version=...>. Switching to xml.sax.saxutils. quoteattr (which handles the surrounding quotes itself) closes that edge case. Body content keeps xml_escape since it's not in an attribute context. Verification: npm run typecheck && npm test -w @open-inspect/control-plane (66/66 files green) && pytest tests/test_bridge_resume_context.py (19 cases green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Must-fix: - model-defaults.ts: per-field fallback chain (2.1) — defaultModel and defaultPlanModel now resolve independently as `DB > env > shared constant`. The previous all-or-nothing gate required BOTH fields in the CP response; when only one was present, both fields fell back together to env/constants, discarding the partial DB value. Should-fix: - models-settings.tsx: setDirty only fires on an actual state change (2.2). Previously toggling a model off would set dirty even when the toggle was blocked by the default-model guard, enabling Save on a no-op. The fix tracks a `changed` flag inside the updater and only calls setDirty when the underlying Set actually mutated. - models-settings.tsx: SWR data sync moved from render phase into useEffect (2.3). The previous in-render setState would surface a React warning under StrictMode and risked an unnecessary re-render. - terraform/environments/production/workers-{linear,slack}.tf: align DEFAULT_PLAN_MODEL to "anthropic/claude-opus-4-6" (2.4) to match the format used by workers-control-plane.tf and workers-github.tf. The bare "claude-opus-4-6" string was not a fully-qualified model ID and would be rejected by isValidModel. Targeted regression test: - model-defaults.test.ts: two new cases verifying that when the CP response contains only defaultModel (resp. defaultPlanModel), the other field independently falls back to env without the present field being discarded. Verification: npm run build -w @open-inspect/shared && npm run lint && npm run typecheck && npm test (shared 16/16, control-plane 66/66, web 32/32 — all green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d03dab5 to
a659a13
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/shared/src/model-defaults.ts`:
- Around line 65-67: fetchModelDefaults currently returns env.DEFAULT_MODEL /
env.DEFAULT_PLAN_MODEL verbatim on the fallback path (used to set
defaultModel/defaultPlanModel), which can be unqualified; modify
fetchModelDefaults to validate and normalize any fallback env values into
canonical provider/model IDs (e.g., call a new or existing
normalizeModelId(model) helper) before assigning defaultModel and
defaultPlanModel, falling back to DEFAULT_MODEL / DEFAULT_PLAN_MODEL constants
only after normalization/validation fails, and update model-defaults tests to
assert canonical (fully-qualified) IDs are returned when env variables are used;
reference fetchModelDefaults, defaultModel, defaultPlanModel, env.DEFAULT_MODEL,
env.DEFAULT_PLAN_MODEL, DEFAULT_MODEL, and DEFAULT_PLAN_MODEL in your changes.
In `@packages/web/src/components/settings/models-settings.tsx`:
- Around line 45-100: The updater mutates an outer `changed` flag which is
unsafe; instead compute change purely inside the updater (or compute the next
set before calling setEnabledModels) and only call setDirty when the new Set
actually differs from the previous Set. Update toggleModel and toggleCategory:
stop writing to the outer `changed`, build a local next Set inside the updater
(or compute next outside), determine whether prev and next differ by comparing
sizes and membership, call setToggleError from inside the updater when blocking
a default (using defaultModel/defaultPlanModel and formatModelNameLower), return
prev when no change is needed, and call setDirty(true) only if the comparison
shows a real change. Ensure you reference setEnabledModels, setDirty,
setToggleError, toggleModel, toggleCategory, defaultModel, defaultPlanModel, and
formatModelNameLower when making these edits.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 63eb8d19-4faa-44e4-bcdb-26126bca2b8c
📒 Files selected for processing (17)
docs/GETTING_STARTED.mdpackages/control-plane/src/db/model-preferences.tspackages/control-plane/src/routes/model-preferences.test.tspackages/control-plane/src/routes/model-preferences.tspackages/control-plane/src/session/http/handlers/plans.handler.test.tspackages/control-plane/src/session/http/handlers/plans.handler.tspackages/control-plane/test/integration/cleanup.tspackages/control-plane/test/integration/model-preferences.test.tspackages/shared/src/model-defaults.test.tspackages/shared/src/model-defaults.tspackages/web/src/components/settings/models-settings.tsxpackages/web/src/hooks/use-enabled-models.tsterraform/d1/migrations/0021_add_default_models_to_model_preferences.sqlterraform/environments/production/workers-control-plane.tfterraform/environments/production/workers-github.tfterraform/environments/production/workers-linear.tfterraform/environments/production/workers-slack.tf
✅ Files skipped from review due to trivial changes (1)
- docs/GETTING_STARTED.md
… GET /model-preferences CodeRabbit follow-up on ColeMurray#672 (only remaining unresolved thread on the PR after the previous fixes auto-resolved): when no DB preferences exist, the fallback chain could return defaultModel / defaultPlanModel sourced from env vars or shared constants that weren't members of the returned enabledModels set. That broke the same invariant enforced by setPreferences() — a subsequent PUT with the unchanged returned tuple would fail validation, leaving the Settings page unable to re-save. New reconcileDefaultsWithEnabled() helper substitutes the first enabled model for any default that isn't in enabledModels (the same fallback the web Settings UI applies when a stored default becomes disabled). Applied at all three GET sites: no-DB-binding fallback, happy path, and store-throws fallback. New regression test asserts that env-var defaults outside the enabled set are reconciled. The existing test capturing the old (broken) behavior was rewritten to use enabledModels that contain the env-var values — the more realistic configuration where both are aligned. Verification: npm run typecheck && npm test -w @open-inspect/control-plane (66/66 files green, 1 new regression case). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit follow-up on ColeMurray#672: env vars like `DEFAULT_MODEL` / `DEFAULT_PLAN_MODEL` are sometimes configured with bare model names (e.g. `claude-sonnet-4-6`) rather than the fully-qualified form (`anthropic/claude-sonnet-4-6`). The CP `/model-preferences` endpoint already returns normalized IDs, but the env-var fallback path previously surfaced the raw env value verbatim — so during a CP outage callers could see a mix of qualified and bare IDs depending on how the operator configured the worker. New `normalizeFallback()` helper passes the candidate through `isValidModel` + `normalizeModelId`. Invalid candidates return undefined so the chain falls through to the next link (env → shared constant); valid bare or qualified IDs get normalized to the qualified form before being returned. Three existing tests rewritten to assert the normalized output (they previously captured the bug). Two new regression cases: - Bare env vars get normalized. - Invalid env vars are ignored and the chain falls through to the shared constant. Verification: npm run build -w @open-inspect/shared, npm test -w @open-inspect/shared (16/16 files green) && npm test -w @open-inspect/control-plane (66/66 files green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
packages/control-plane/src/routes/model-preferences.test.ts (1)
165-168:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse the plan fallback constant in this assertion.
Line 167 should assert the plan fallback contract (
DEFAULT_PLAN_MODEL), notDEFAULT_MODEL, otherwise this test can miss plan-specific fallback regressions.Suggested fix
- expect(body.defaultPlanModel).toBe(DEFAULT_MODEL); + expect(body.defaultPlanModel).toBe(DEFAULT_PLAN_MODEL);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/control-plane/src/routes/model-preferences.test.ts` around lines 165 - 168, The test currently asserts plan fallback using DEFAULT_MODEL; change the assertion to use the plan-specific constant DEFAULT_PLAN_MODEL so the getValidModelOrDefault / plan fallback contract is validated: update the expectation in the model-preferences.test.ts unit where it checks body.defaultPlanModel to compare against DEFAULT_PLAN_MODEL instead of DEFAULT_MODEL.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@packages/control-plane/src/routes/model-preferences.test.ts`:
- Around line 165-168: The test currently asserts plan fallback using
DEFAULT_MODEL; change the assertion to use the plan-specific constant
DEFAULT_PLAN_MODEL so the getValidModelOrDefault / plan fallback contract is
validated: update the expectation in the model-preferences.test.ts unit where it
checks body.defaultPlanModel to compare against DEFAULT_PLAN_MODEL instead of
DEFAULT_MODEL.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8cf7706d-27da-4e81-bdef-c95932072118
📒 Files selected for processing (17)
docs/GETTING_STARTED.mdpackages/control-plane/src/db/model-preferences.tspackages/control-plane/src/routes/model-preferences.test.tspackages/control-plane/src/routes/model-preferences.tspackages/control-plane/src/session/http/handlers/plans.handler.tspackages/control-plane/test/integration/cleanup.tspackages/control-plane/test/integration/model-preferences.test.tspackages/sandbox-runtime/src/sandbox_runtime/bridge.pypackages/shared/src/model-defaults.test.tspackages/shared/src/model-defaults.tspackages/web/src/components/settings/models-settings.tsxpackages/web/src/hooks/use-enabled-models.tsterraform/d1/migrations/0021_add_default_models_to_model_preferences.sqlterraform/environments/production/workers-control-plane.tfterraform/environments/production/workers-github.tfterraform/environments/production/workers-linear.tfterraform/environments/production/workers-slack.tf
✅ Files skipped from review due to trivial changes (3)
- terraform/environments/production/workers-github.tf
- terraform/environments/production/workers-linear.tf
- terraform/environments/production/workers-slack.tf
When a plan is approved, the control-plane now enqueues a synthetic "Implement the approved plan vN..." prompt server-side and flushes the queue, so every client (web + Slack/Linear/GitHub bots) starts the implementation turn through the same code path. Previously only the web client did this (via a follow-up WebSocket prompt after the approve call). Approvals coming from bots stalled at "Execution complete" right after the plan was generated — Build cost stayed at $0 — because nothing was kicking off the build turn. The synthetic prompt is authored by a stable per-session "system" participant (created lazily on first dispatch) and uses the new "system" MessageSource so the UI and event log can distinguish it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
enqueuePromptFromApi already flushes the message queue at the end of its own path, so the implementation-prompt enqueue done in onDispatchImplementationPrompt picks up both the synthetic prompt and any user messages that piled up during awaiting_approval. The separate onPlanApproved callback that fired a second processMessageQueue was a no-op (queue already busy) and the comment claiming it flushed the synthetic prompt was misleading. Remove the dep and the wiring. Addresses reef review on ColeMurray#65. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Must-fix: - model-defaults.ts: per-field fallback chain (2.1) — defaultModel and defaultPlanModel now resolve independently as `DB > env > shared constant`. The previous all-or-nothing gate required BOTH fields in the CP response; when only one was present, both fields fell back together to env/constants, discarding the partial DB value. Should-fix: - models-settings.tsx: setDirty only fires on an actual state change (2.2). Previously toggling a model off would set dirty even when the toggle was blocked by the default-model guard, enabling Save on a no-op. The fix tracks a `changed` flag inside the updater and only calls setDirty when the underlying Set actually mutated. - models-settings.tsx: SWR data sync moved from render phase into useEffect (2.3). The previous in-render setState would surface a React warning under StrictMode and risked an unnecessary re-render. - terraform/environments/production/workers-{linear,slack}.tf: align DEFAULT_PLAN_MODEL to "anthropic/claude-opus-4-6" (2.4) to match the format used by workers-control-plane.tf and workers-github.tf. The bare "claude-opus-4-6" string was not a fully-qualified model ID and would be rejected by isValidModel. Targeted regression test: - model-defaults.test.ts: two new cases verifying that when the CP response contains only defaultModel (resp. defaultPlanModel), the other field independently falls back to env without the present field being discarded. Verification: npm run build -w @open-inspect/shared && npm run lint && npm run typecheck && npm test (shared 16/16, control-plane 66/66, web 32/32 — all green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… GET /model-preferences CodeRabbit follow-up on ColeMurray#672 (only remaining unresolved thread on the PR after the previous fixes auto-resolved): when no DB preferences exist, the fallback chain could return defaultModel / defaultPlanModel sourced from env vars or shared constants that weren't members of the returned enabledModels set. That broke the same invariant enforced by setPreferences() — a subsequent PUT with the unchanged returned tuple would fail validation, leaving the Settings page unable to re-save. New reconcileDefaultsWithEnabled() helper substitutes the first enabled model for any default that isn't in enabledModels (the same fallback the web Settings UI applies when a stored default becomes disabled). Applied at all three GET sites: no-DB-binding fallback, happy path, and store-throws fallback. New regression test asserts that env-var defaults outside the enabled set are reconciled. The existing test capturing the old (broken) behavior was rewritten to use enabledModels that contain the env-var values — the more realistic configuration where both are aligned. Verification: npm run typecheck && npm test -w @open-inspect/control-plane (66/66 files green, 1 new regression case). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit follow-up on ColeMurray#672: env vars like `DEFAULT_MODEL` / `DEFAULT_PLAN_MODEL` are sometimes configured with bare model names (e.g. `claude-sonnet-4-6`) rather than the fully-qualified form (`anthropic/claude-sonnet-4-6`). The CP `/model-preferences` endpoint already returns normalized IDs, but the env-var fallback path previously surfaced the raw env value verbatim — so during a CP outage callers could see a mix of qualified and bare IDs depending on how the operator configured the worker. New `normalizeFallback()` helper passes the candidate through `isValidModel` + `normalizeModelId`. Invalid candidates return undefined so the chain falls through to the next link (env → shared constant); valid bare or qualified IDs get normalized to the qualified form before being returned. Three existing tests rewritten to assert the normalized output (they previously captured the bug). Two new regression cases: - Bare env vars get normalized. - Invalid env vars are ignored and the chain falls through to the shared constant. Verification: npm run build -w @open-inspect/shared, npm test -w @open-inspect/shared (16/16 files green) && npm test -w @open-inspect/control-plane (66/66 files green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
935b3af to
a72780b
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/control-plane/src/db/model-preferences.ts (1)
56-107:⚠️ Potential issue | 🟠 Major | ⚡ Quick winNormalize model IDs before deduping and persisting them.
isValidModel()is paired with normalization elsewhere because bare aliases are accepted inputs. Here Line 57 dedupes raw strings and writes them verbatim, so alias and canonical forms of the same model can survive as separateenabledModelsentries, anddefaultModel/defaultPlanModelcan be stored in non-canonical form. Normalize first, then dedupe and validate membership against the normalized set so this row stays canonical for every caller.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/control-plane/src/db/model-preferences.ts` around lines 56 - 107, In setPreferences, normalize model IDs before deduping, validating, and persisting: map prefs.enabledModels through the canonicalizer (e.g., normalizeModel(id) or the existing normalization helper used elsewhere), then build the unique list from those normalized IDs, validate via isValidModel against the normalized values, and create enabledSet from the normalized unique list; also normalize prefs.defaultModel and prefs.defaultPlanModel (and validate they exist in the normalized enabledSet) and persist the JSON of the normalized unique list and the normalized default values so stored rows always contain canonical model IDs.
♻️ Duplicate comments (1)
packages/web/src/components/settings/models-settings.tsx (1)
45-100:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep the
setEnabledModelsupdaters pure and derivedirtyfrom committed state.
changedandsetToggleError(...)are both mutated from inside the updater callback, thenchangedis read aftersetEnabledModels(...)returns. In React this callback can run more than once in development, so these handlers can mark the form dirty even when they returnprev(for example, blocking the current default or refusing to remove the last enabled model). Computenextfrom the current render state before callingsetEnabledModels, or compareprev/nextin a pure helper and only callsetDirty(true)when a real change is committed.In React 19, can a useState updater function be invoked multiple times in development/Strict Mode, and should it avoid side effects like mutating outer variables or calling other state setters?🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/components/settings/models-settings.tsx` around lines 45 - 100, The updater callbacks in toggleModel and toggleCategory are impure: they mutate outer var changed and call setToggleError inside the setEnabledModels updater; make them pure by computing the new Set (next) and any blockedDefault/changed result outside the state updater (or in a pure helper that takes prev and returns {next, blockedDefault, changed}), then call setEnabledModels with the resulting next only once and call setToggleError and setDirty after setEnabledModels based on the computed blockedDefault/changed; in short, stop mutating changed or invoking setToggleError inside the setEnabledModels updater in functions toggleModel and toggleCategory and instead derive and apply side effects from the committed next state.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/control-plane/src/session/services/plan.service.ts`:
- Around line 175-179: approvePlanAndFlush currently calls approvePlan() which
persists the "approved" state before calling the non-idempotent
this.deps.onDispatchImplementationPrompt(result.plan.version), so a thrown error
leaves the session committed but dispatch never queued; change the flow to
ensure the dispatch is durably retried or made idempotent: either perform the
dispatch enqueue inside the same DB transaction/update as approvePlan (e.g.,
extend approvePlan to set an extra "dispatch_pending" flag or write the dispatch
job record atomically), or have approvePlan return a transactional token and
only persist final "approved" after successful enqueue; update approvePlan,
approvePlanAndFlush and the onDispatchImplementationPrompt usage to use this
atomic enqueue/flag approach so retries won’t leave the session in an
invalid_status.
In `@packages/control-plane/test/integration/plan-approval.test.ts`:
- Around line 1-3: This test file is missing D1 cleanup; add a beforeEach that
calls cleanD1Tables() to ensure D1 state is reset between tests. Import or
reference the existing cleanD1Tables helper and add beforeEach(async () => {
await cleanD1Tables(); }) at the top-level of the suite (next to
initSession/queryDO usage) so tests using defineWorkersConfig with
isolatedStorage: false don't leak state; ensure the call runs before any
initSession or test logic.
---
Outside diff comments:
In `@packages/control-plane/src/db/model-preferences.ts`:
- Around line 56-107: In setPreferences, normalize model IDs before deduping,
validating, and persisting: map prefs.enabledModels through the canonicalizer
(e.g., normalizeModel(id) or the existing normalization helper used elsewhere),
then build the unique list from those normalized IDs, validate via isValidModel
against the normalized values, and create enabledSet from the normalized unique
list; also normalize prefs.defaultModel and prefs.defaultPlanModel (and validate
they exist in the normalized enabledSet) and persist the JSON of the normalized
unique list and the normalized default values so stored rows always contain
canonical model IDs.
---
Duplicate comments:
In `@packages/web/src/components/settings/models-settings.tsx`:
- Around line 45-100: The updater callbacks in toggleModel and toggleCategory
are impure: they mutate outer var changed and call setToggleError inside the
setEnabledModels updater; make them pure by computing the new Set (next) and any
blockedDefault/changed result outside the state updater (or in a pure helper
that takes prev and returns {next, blockedDefault, changed}), then call
setEnabledModels with the resulting next only once and call setToggleError and
setDirty after setEnabledModels based on the computed blockedDefault/changed; in
short, stop mutating changed or invoking setToggleError inside the
setEnabledModels updater in functions toggleModel and toggleCategory and instead
derive and apply side effects from the committed next state.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5bde1f79-5b3b-4b0f-9122-08e449f47b0f
📒 Files selected for processing (21)
docs/GETTING_STARTED.mdpackages/control-plane/src/db/model-preferences.tspackages/control-plane/src/routes/model-preferences.test.tspackages/control-plane/src/routes/model-preferences.tspackages/control-plane/src/session/durable-object.tspackages/control-plane/src/session/services/plan.service.test.tspackages/control-plane/src/session/services/plan.service.tspackages/control-plane/test/integration/cleanup.tspackages/control-plane/test/integration/helpers.tspackages/control-plane/test/integration/model-preferences.test.tspackages/control-plane/test/integration/plan-approval.test.tspackages/shared/src/model-defaults.test.tspackages/shared/src/model-defaults.tspackages/shared/src/types/index.tspackages/web/src/components/settings/models-settings.tsxpackages/web/src/hooks/use-enabled-models.tsterraform/d1/migrations/0021_add_default_models_to_model_preferences.sqlterraform/environments/production/workers-control-plane.tfterraform/environments/production/workers-github.tfterraform/environments/production/workers-linear.tfterraform/environments/production/workers-slack.tf
| async approvePlanAndFlush(request: ApprovalRequest = {}): Promise<ApprovalResult> { | ||
| const result = this.approvePlan(request); | ||
| if (this.deps.onDispatchImplementationPrompt) { | ||
| await this.deps.onDispatchImplementationPrompt(result.plan.version); | ||
| } |
There was a problem hiding this comment.
Avoid committing approval before a non-idempotent dispatch step.
approvePlan() persists "approved" before onDispatchImplementationPrompt(...) runs. If that callback throws, this method returns an error after the state change is already committed; a retry then fails with "invalid_status" and the implementation prompt may never be queued. This needs a durable retry/idempotency path so approved sessions cannot get stranded between the DB write and the follow-up dispatch.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/control-plane/src/session/services/plan.service.ts` around lines 175
- 179, approvePlanAndFlush currently calls approvePlan() which persists the
"approved" state before calling the non-idempotent
this.deps.onDispatchImplementationPrompt(result.plan.version), so a thrown error
leaves the session committed but dispatch never queued; change the flow to
ensure the dispatch is durably retried or made idempotent: either perform the
dispatch enqueue inside the same DB transaction/update as approvePlan (e.g.,
extend approvePlan to set an extra "dispatch_pending" flag or write the dispatch
job record atomically), or have approvePlan return a transactional token and
only persist final "approved" after successful enqueue; update approvePlan,
approvePlanAndFlush and the onDispatchImplementationPrompt usage to use this
atomic enqueue/flag approach so retries won’t leave the session in an
invalid_status.
There was a problem hiding this comment.
Considered but declining the structural change. Rationale:
onDispatchImplementationPromptresolves tomessageService.enqueuePrompt(...), which is a sequence of SQL writes on the DO's local SQLite. Under the Cloudflare DO model (single-threaded V8 isolate, transactional storage) the exception path is extremely narrow — a storage-layer failure here would also fail theapprovePlanwrite that preceded it.- The failure mode is recoverable without operator intervention: the gate is now
"approved"so any subsequent user prompt to the session (web typing, Slack follow-up, etc.) dispatches normally. The only loss is the auto-fired"Implement the plan vN"prompt — the user can type "go" and continue. - The proposed atomic alternatives (single-transaction approve+enqueue, or two-phase commit with a dispatch_pending flag) cross the plan-service ↔ message-service boundary and add real complexity. Not justified for a near-zero-probability, easily-recoverable failure.
If we ever start seeing dispatch failures in logs (none observed in onboardiq production since #65 merged), we'd revisit with an idempotent re-dispatch path keyed off plan_approval_status === "approved" + absence of source === "system" message for the current plan version — that's a smaller delta than a full transaction.
Note: this comment surfaced on #672 but the code lives in #671 (the foundation diff). All upstream PRs target base=main, so #672's cumulative diff includes foundation's changes — that's why CodeRabbit flagged it here when the foundation fix was force-pushed during the rebase.
There was a problem hiding this comment.
@bleleve, understood — the rationale holds up well. The shared SQLite failure domain under the Cloudflare DO model does meaningfully narrow the gap between the two writes, and the graceful degradation path (session left in "approved", next user prompt dispatches normally) is a reasonable trade-off against crossing the plan-service ↔ message-service boundary. The future recovery heuristic (plan_approval_status === "approved" + no source === "system" message for the current plan version) is also a clean signal if a retry path ever becomes necessary.
Thanks for the context on the #671/#672 diff layering — noted.
(´・ω・`)
✏️ Learnings added
Learnt from: bleleve
Repo: ColeMurray/background-agents PR: 672
File: packages/control-plane/src/session/services/plan.service.ts:175-179
Timestamp: 2026-05-22T23:47:38.393Z
Learning: In `packages/control-plane/src/session/services/plan.service.ts`, the `approvePlanAndFlush` method intentionally persists `"approved"` status before calling `onDispatchImplementationPrompt`. This is a deliberate design decision: under the Cloudflare Durable Object model (single-threaded V8 isolate, transactional local SQLite), a storage failure during `onDispatchImplementationPrompt` (which resolves to `messageService.enqueuePrompt` — also a sequence of SQLite writes) would also fail the preceding `approvePlan` write, making the exception path extremely narrow. The failure is considered recoverable without operator intervention: with status `"approved"`, any subsequent user message dispatches normally; only the auto-fired synthetic implementation prompt is lost. The proposed atomic alternatives (dispatch_pending flag, two-phase commit across plan-service and message-service) were explicitly rejected as adding unjustified complexity. A future idempotent re-dispatch path keyed off `plan_approval_status === "approved"` + absence of a `source === "system"` message for the current plan version was noted as the preferred remediation if dispatch failures are ever observed in production.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
…D1Tables Addresses CodeRabbit review on ColeMurray#672. Control-plane integration tests run under isolatedStorage=false (workers-sdk SQLite WAL cleanup bug), so D1 state can leak across files when each suite forgets to clean. Match the pattern used by other integration suites and reset D1 in beforeEach. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the existing `model_preferences` table with `default_model` and
`default_plan_model` columns. Surfaces them via a new section in
Settings → Models so the deployment-wide defaults are configurable from
the UI instead of being scattered across Terraform env vars on each
worker.
Why: with plan-mode (preceding PR in stack), every deployment now
needs two default models — one for build turns, one for plan turns.
The control-plane is the natural source of truth (already serves
`enabledModels` via `/model-preferences`). The bots and the web UI
read from this single place instead of each worker carrying its own
`DEFAULT_MODEL` env var that must be kept in sync.
Changes:
- D1 migration `0021_add_default_models_to_model_preferences.sql`
adds two nullable columns via `ALTER TABLE ADD COLUMN`. Existing
deployments keep working because the read path falls back to the
env var, then to the shared library constant.
- `db/model-preferences.ts` extended to read/write the new fields
atomically across the three-field tuple.
- `GET /model-preferences` now returns `{ enabledModels, defaultModel,
defaultPlanModel }`; `PUT` validates `defaultModel ∈ enabledModels`.
- Web Settings → Models gets a new "Default Models" section with two
combobox pickers. Disabling a model that's the current default is
blocked inline.
- Terraform: production workers gain `DEFAULT_MODEL` (control-plane,
which didn't have one) and `DEFAULT_PLAN_MODEL` (all four). These
remain fallbacks — the DB value wins when set.
- `docs/GETTING_STARTED.md` gets a "Configure Default Models" subsection.
Verification: `npm run typecheck`, `npm run lint`, `npm test`
(shared 183/183, control-plane 1162/1162, web 259/259) — all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Must-fix: - model-defaults.ts: per-field fallback chain (2.1) — defaultModel and defaultPlanModel now resolve independently as `DB > env > shared constant`. The previous all-or-nothing gate required BOTH fields in the CP response; when only one was present, both fields fell back together to env/constants, discarding the partial DB value. Should-fix: - models-settings.tsx: setDirty only fires on an actual state change (2.2). Previously toggling a model off would set dirty even when the toggle was blocked by the default-model guard, enabling Save on a no-op. The fix tracks a `changed` flag inside the updater and only calls setDirty when the underlying Set actually mutated. - models-settings.tsx: SWR data sync moved from render phase into useEffect (2.3). The previous in-render setState would surface a React warning under StrictMode and risked an unnecessary re-render. - terraform/environments/production/workers-{linear,slack}.tf: align DEFAULT_PLAN_MODEL to "anthropic/claude-opus-4-6" (2.4) to match the format used by workers-control-plane.tf and workers-github.tf. The bare "claude-opus-4-6" string was not a fully-qualified model ID and would be rejected by isValidModel. Targeted regression test: - model-defaults.test.ts: two new cases verifying that when the CP response contains only defaultModel (resp. defaultPlanModel), the other field independently falls back to env without the present field being discarded. Verification: npm run build -w @open-inspect/shared && npm run lint && npm run typecheck && npm test (shared 16/16, control-plane 66/66, web 32/32 — all green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… GET /model-preferences CodeRabbit follow-up on ColeMurray#672 (only remaining unresolved thread on the PR after the previous fixes auto-resolved): when no DB preferences exist, the fallback chain could return defaultModel / defaultPlanModel sourced from env vars or shared constants that weren't members of the returned enabledModels set. That broke the same invariant enforced by setPreferences() — a subsequent PUT with the unchanged returned tuple would fail validation, leaving the Settings page unable to re-save. New reconcileDefaultsWithEnabled() helper substitutes the first enabled model for any default that isn't in enabledModels (the same fallback the web Settings UI applies when a stored default becomes disabled). Applied at all three GET sites: no-DB-binding fallback, happy path, and store-throws fallback. New regression test asserts that env-var defaults outside the enabled set are reconciled. The existing test capturing the old (broken) behavior was rewritten to use enabledModels that contain the env-var values — the more realistic configuration where both are aligned. Verification: npm run typecheck && npm test -w @open-inspect/control-plane (66/66 files green, 1 new regression case). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit follow-up on ColeMurray#672: env vars like `DEFAULT_MODEL` / `DEFAULT_PLAN_MODEL` are sometimes configured with bare model names (e.g. `claude-sonnet-4-6`) rather than the fully-qualified form (`anthropic/claude-sonnet-4-6`). The CP `/model-preferences` endpoint already returns normalized IDs, but the env-var fallback path previously surfaced the raw env value verbatim — so during a CP outage callers could see a mix of qualified and bare IDs depending on how the operator configured the worker. New `normalizeFallback()` helper passes the candidate through `isValidModel` + `normalizeModelId`. Invalid candidates return undefined so the chain falls through to the next link (env → shared constant); valid bare or qualified IDs get normalized to the qualified form before being returned. Three existing tests rewritten to assert the normalized output (they previously captured the bug). Two new regression cases: - Bare env vars get normalized. - Invalid env vars are ignored and the chain falls through to the shared constant. Verification: npm run build -w @open-inspect/shared, npm test -w @open-inspect/shared (16/16 files green) && npm test -w @open-inspect/control-plane (66/66 files green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a72780b to
861a58b
Compare
Summary
Extends the existing
model_preferencestable withdefault_modelanddefault_plan_modelcolumns. Surfaces them via a new section in Settings → Models so the deployment-wide defaults are configurable from the UI instead of being scattered across Terraform env vars on each worker.Why: with plan-mode (PR #671), every deployment now needs two default models — one for build turns, one for plan turns. The control-plane is the natural source of truth (already serves
enabledModelsvia/model-preferences). The bots and the web UI read from this single place instead of each worker carrying its ownDEFAULT_MODELenv var that must be kept in sync.What's in this PR
0021_add_default_models_to_model_preferences.sqladds two nullable columns viaALTER TABLE ADD COLUMN. Existing deployments keep working because the read path falls back to the env var, then to the shared library constant.db/model-preferences.tsextended to read/write the new fields atomically across the three-field tuple.GET /model-preferencesnow returns{ enabledModels, defaultModel, defaultPlanModel };PUTvalidatesdefaultModel ∈ enabledModels.DEFAULT_MODEL(control-plane, which didn't have one) andDEFAULT_PLAN_MODEL(all four). These remain fallbacks — the DB value wins when set.docs/GETTING_STARTED.mdgets a "Configure Default Models" subsection.Stack
Test plan
npm run typecheck(workspace) — greennpm run lint— greennpm test -w @open-inspect/shared— 183/183npm test -w @open-inspect/control-plane— 1162/1162 (incl. 14 new model-preferences tests)npm test -w @open-inspect/web— 259/259Migration safety
The
ALTER TABLE ADD COLUMN ... DEFAULT NULLmigration is purely additive. Code readsdefaultModel/defaultPlanModelwith fallback chainDB > env var > shared constant, so un-migrated instances keep working transparently.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests