feat(linear-bot): plan-mode label triggers and approve/reject commands#674
Draft
bleleve wants to merge 13 commits into
Draft
feat(linear-bot): plan-mode label triggers and approve/reject commands#674bleleve wants to merge 13 commits into
bleleve wants to merge 13 commits into
Conversation
…amble)
Adds the backend for plan-mode: a human-in-the-loop planning gate where
the agent persists a markdown plan that must be approved before any
code-changing turn runs. This PR is headless — no UI or bot triggers
yet (those follow as separate PRs in the stack).
Changes:
- New `plans` table in the SessionDO SQLite (monotonic versions per
session); `PlanService` + `plans.handler` covering save / get / list
/ approve / reject.
- New endpoints under `/sessions/:id/plan{,/approve,/reject}` and
`/sessions/:id/plans` (list).
- `SessionMessageQueue` reads `getCurrentPlan()` and attaches
`resumeContext.currentPlan` to dispatched `PromptCommand`s. The
planning-turn gate is terminal-status aware (approved/rejected exits
plan mode without flipping `plan_mode` so the history bubble stays
visible).
- Sandbox `bridge.py` builds a restate-and-confirm preamble from
`resumeContext.currentPlan` and prepends it to the user content
before forwarding to OpenCode.
- New `wrapUntrustedContent` helper in `@open-inspect/shared` wraps
plan + user content in XML tags from the sandbox runtime (security:
isolates plan-as-context from prompt-injection inside user messages).
- Shared `model-defaults.ts` ships the `fetchModelDefaults` helper —
used by bots in subsequent PRs to source the default model + default
plan model from a single place (the control plane). Falls back to
env vars + shared constants until the `/model-preferences` endpoint
is extended (next PR in the stack).
- `MODEL_ALIAS_MAP` and `DEFAULT_PLAN_MODEL` added to shared models.
Tests:
- Unit tests for `PlanService`, `plans.handler`, plan-mode behavior in
`MessageQueue`, plan persistence in the repository.
- Sandbox-runtime tests for `bridge.py` preamble + XML wrapping.
- All test fixtures updated to include the new required `plan_*`
fields on `SessionRow`.
Verification: `npm run typecheck` (workspace), `npm run lint`, `npm test`
(shared 183/183, control-plane 1148/1148), `pytest` (sandbox-runtime
plan-related tests 41/41) — all green on this branch based on upstream/main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…eMurray#671 Must-fix: - bridge.py: escape XML special chars in plan content before interpolating into <saved_plan> and <previous_plan> wrappers (1.1) — prevents a hostile `</saved_plan>` in the plan body from breaking out of the wrapper and injecting instructions outside it. - bridge.py: replace the cumulative token buffer's append with overwrite semantics (1.2) — OpenCode token events carry the FULL accumulated text since the response start (not incremental deltas), so appending produced corrupted plan bodies with compounded prefixes at end-of-turn. - plans.handler.ts: readApprovalBody now throws InvalidApprovalBodyError on JSON parse failure instead of silently coercing the body to {} (1.3) — errorResponseForApproval maps it to HTTP 400 so malformed client requests surface instead of leaking through with partial parameters. Should-fix: - plan.service.ts: dedup guard now requires non-null messageId (1.4) — null is the "no message context" marker, so two identical-body events with null messageId are legitimately distinct saves and must each bump the version. - session/types.ts: import PlanSource from @open-inspect/shared (1.5) instead of redeclaring it locally — prevents contract drift between the control plane row type and the shared API surface. - durable-object.ts: extract DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS module constant (1.6); also cleans up a stray /** doc-comment opener left over from an earlier opencode-config strip. - bridge.py: extract PLAN_SAVE_TIMEOUT_SECONDS module constant (1.7). - prompt-safety.ts: <user_content> tag neutralization is now case- insensitive and whitespace-tolerant (1.8) — catches `<USER_CONTENT>`, `< user_content >`, `</ user_content >`, attribute-bearing tags, etc. Nitpicks: - session-lifecycle.handler.ts: remove redundant getValidModelOrDefault after validation; add log.warn when planMode=true but planModel is invalid and we fall back to DEFAULT_PLAN_MODEL (1.9). - router.ts: type forwardPlanApproval's internalPath parameter with SessionInternalPath instead of bypassing via `as never` (1.10). - control-plane/src/types.ts: reference DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS in the SANDBOX_INACTIVITY_TIMEOUT_MS env-var comment (1.11). - docs/PLAN_MODE.md: add `text` language tag to architecture diagram fence to satisfy MD040 (1.12). - packages/control-plane/README.md: clarify Plan Mode behavior — prompts continue to queue and dispatch as *planning turns* until approval/rejection/amendment (not "the message queue is gated") (1.13). Test coverage gaps: - repository.test.ts: new case for upsertSession({ planMode: true, planModel }) asserting plan_mode=1 and plan_model are persisted (1.14). - plan.service.test.ts: new cases for null-messageId dedup (skipped guard) and MAX_PLAN_CONTENT_BYTES boundary (accept-at-limit, throw- over-limit) (1.15). Targeted regression tests: - test_bridge_resume_context.py: assertions that resume + planning preambles escape XML specials in the plan body and that token-buffer overwrite semantics produce a single non-duplicated snapshot. - plans.handler.test.ts: assertions that malformed-JSON bodies on approve/reject return HTTP 400 with code "invalid_body". Verification: npm run build -w @open-inspect/shared && npm run lint && npm run typecheck && npm test (shared 16 files, control-plane 65 files all green) && pytest (41 + 4 new regression cases green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#671 Three items flagged on the first fix commit (a2cad10): 🟠 Major (plans.handler.ts) — Don't clear reasoning effort when no model override is provided. Previously the approve handler validated the user's implementationReasoningEffort against `""` when no model override was set; validateReasoningEffort returned null, which then propagated to approvePlanAndFlush. The service treats null as an explicit "clear", so an approval request that sent reasoning effort without a model silently wiped the session's persisted reasoning_effort. New semantics: - undefined (omitted) → no change - explicit null → forwarded as null (intentional clear) - string + model → validated against the model - string, no model → return 400 with code "invalid_reasoning_effort" Two new regression cases in plans.handler.test.ts. ⚡ Quick win (durable-object.ts) — NaN-guard the inactivity timeout parse. parseInt(env_value, 10) returns NaN when the env var is set to a non-numeric string (e.g. "abc" or "5 minutes"); the lifecycle manager would then schedule alarms with NaN ms, which is undefined behavior. New parseSandboxInactivityTimeoutMs() helper checks Number.isFinite and value > 0 before accepting the env value, falling back to DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS otherwise. ⚡ Style (durable-object.ts) — Drop the "// 15 minutes" trailing comment on DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS. Per repo guideline, don't restate literal timeout values in comments — the constant name is the source of truth. Verification: npm run typecheck, npm run lint, npm test -w @open-inspect/control-plane (65/65 files, all green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b936e55 to
d483095
Compare
CodeRabbit follow-up review on ColeMurray#672 (the only thread that didn't auto- resolve after the previous fix pushes). The issue technically lives in ColeMurray#671's plans.handler.ts but CodeRabbit only flagged it now after seeing the post-fix state. `JSON.parse("null")` returns null, `JSON.parse("[1,2]")` returns an array, `JSON.parse("42")` returns a number. All three are syntactically valid JSON but none of them is the object payload approvePlan/rejectPlan expect. The previous code would parse successfully then crash later when dereferencing `body.implementationModel` on `null` (TypeError) or silently accept arrays and primitives as if they were valid bodies. Reject these early with HTTP 400 (`code: "invalid_body"`) via the existing InvalidApprovalBodyError path. Two new regression cases cover JSON-null and JSON-array bodies. Verification: npm run typecheck && npm test -w @open-inspect/control-plane (65/65 files, 2 new test cases green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d483095 to
858a1ec
Compare
…leMurray#671 Two new items flagged after the previous fix pushes: 🟠 plans.handler.ts — Reject invalid implementationReasoningEffort. validateReasoningEffort() returns null for unsupported values, but null is also our explicit "clear the persisted effort" sentinel. A request with a typo (e.g. "hgih" instead of "high") was previously accepted and silently cleared the session's reasoning_effort. Now we distinguish the two cases: explicit-null body stays null (intentional clear), but a string that fails validation returns 400 with code "invalid_reasoning_effort" instead of being coerced. 🟡 bridge.py — Use quoteattr for the version XML attribute value. xml_escape() does NOT escape `"`, so a version string containing a quote could break the attribute boundary of <saved_plan version=...> and <previous_plan version=...>. Switching to xml.sax.saxutils. quoteattr (which handles the surrounding quotes itself) closes that edge case. Body content keeps xml_escape since it's not in an attribute context. Verification: npm run typecheck && npm test -w @open-inspect/control-plane (66/66 files green) && pytest tests/test_bridge_resume_context.py (19 cases green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3389a2e to
8606b2c
Compare
When a plan is approved, the control-plane now enqueues a synthetic "Implement the approved plan vN..." prompt server-side and flushes the queue, so every client (web + Slack/Linear/GitHub bots) starts the implementation turn through the same code path. Previously only the web client did this (via a follow-up WebSocket prompt after the approve call). Approvals coming from bots stalled at "Execution complete" right after the plan was generated — Build cost stayed at $0 — because nothing was kicking off the build turn. The synthetic prompt is authored by a stable per-session "system" participant (created lazily on first dispatch) and uses the new "system" MessageSource so the UI and event log can distinguish it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
enqueuePromptFromApi already flushes the message queue at the end of its own path, so the implementation-prompt enqueue done in onDispatchImplementationPrompt picks up both the synthetic prompt and any user messages that piled up during awaiting_approval. The separate onPlanApproved callback that fired a second processMessageQueue was a no-op (queue already busy) and the comment claiming it flushed the synthetic prompt was misleading. Remove the dep and the wiring. Addresses reef review on ColeMurray#65. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8606b2c to
04369f2
Compare
5 tasks
…D1Tables Addresses CodeRabbit review on ColeMurray#672. Control-plane integration tests run under isolatedStorage=false (workers-sdk SQLite WAL cleanup bug), so D1 state can leak across files when each suite forgets to clean. Match the pattern used by other integration suites and reset D1 in beforeEach. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the existing `model_preferences` table with `default_model` and
`default_plan_model` columns. Surfaces them via a new section in
Settings → Models so the deployment-wide defaults are configurable from
the UI instead of being scattered across Terraform env vars on each
worker.
Why: with plan-mode (preceding PR in stack), every deployment now
needs two default models — one for build turns, one for plan turns.
The control-plane is the natural source of truth (already serves
`enabledModels` via `/model-preferences`). The bots and the web UI
read from this single place instead of each worker carrying its own
`DEFAULT_MODEL` env var that must be kept in sync.
Changes:
- D1 migration `0021_add_default_models_to_model_preferences.sql`
adds two nullable columns via `ALTER TABLE ADD COLUMN`. Existing
deployments keep working because the read path falls back to the
env var, then to the shared library constant.
- `db/model-preferences.ts` extended to read/write the new fields
atomically across the three-field tuple.
- `GET /model-preferences` now returns `{ enabledModels, defaultModel,
defaultPlanModel }`; `PUT` validates `defaultModel ∈ enabledModels`.
- Web Settings → Models gets a new "Default Models" section with two
combobox pickers. Disabling a model that's the current default is
blocked inline.
- Terraform: production workers gain `DEFAULT_MODEL` (control-plane,
which didn't have one) and `DEFAULT_PLAN_MODEL` (all four). These
remain fallbacks — the DB value wins when set.
- `docs/GETTING_STARTED.md` gets a "Configure Default Models" subsection.
Verification: `npm run typecheck`, `npm run lint`, `npm test`
(shared 183/183, control-plane 1162/1162, web 259/259) — all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Must-fix: - model-defaults.ts: per-field fallback chain (2.1) — defaultModel and defaultPlanModel now resolve independently as `DB > env > shared constant`. The previous all-or-nothing gate required BOTH fields in the CP response; when only one was present, both fields fell back together to env/constants, discarding the partial DB value. Should-fix: - models-settings.tsx: setDirty only fires on an actual state change (2.2). Previously toggling a model off would set dirty even when the toggle was blocked by the default-model guard, enabling Save on a no-op. The fix tracks a `changed` flag inside the updater and only calls setDirty when the underlying Set actually mutated. - models-settings.tsx: SWR data sync moved from render phase into useEffect (2.3). The previous in-render setState would surface a React warning under StrictMode and risked an unnecessary re-render. - terraform/environments/production/workers-{linear,slack}.tf: align DEFAULT_PLAN_MODEL to "anthropic/claude-opus-4-6" (2.4) to match the format used by workers-control-plane.tf and workers-github.tf. The bare "claude-opus-4-6" string was not a fully-qualified model ID and would be rejected by isValidModel. Targeted regression test: - model-defaults.test.ts: two new cases verifying that when the CP response contains only defaultModel (resp. defaultPlanModel), the other field independently falls back to env without the present field being discarded. Verification: npm run build -w @open-inspect/shared && npm run lint && npm run typecheck && npm test (shared 16/16, control-plane 66/66, web 32/32 — all green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… GET /model-preferences CodeRabbit follow-up on ColeMurray#672 (only remaining unresolved thread on the PR after the previous fixes auto-resolved): when no DB preferences exist, the fallback chain could return defaultModel / defaultPlanModel sourced from env vars or shared constants that weren't members of the returned enabledModels set. That broke the same invariant enforced by setPreferences() — a subsequent PUT with the unchanged returned tuple would fail validation, leaving the Settings page unable to re-save. New reconcileDefaultsWithEnabled() helper substitutes the first enabled model for any default that isn't in enabledModels (the same fallback the web Settings UI applies when a stored default becomes disabled). Applied at all three GET sites: no-DB-binding fallback, happy path, and store-throws fallback. New regression test asserts that env-var defaults outside the enabled set are reconciled. The existing test capturing the old (broken) behavior was rewritten to use enabledModels that contain the env-var values — the more realistic configuration where both are aligned. Verification: npm run typecheck && npm test -w @open-inspect/control-plane (66/66 files green, 1 new regression case). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit follow-up on ColeMurray#672: env vars like `DEFAULT_MODEL` / `DEFAULT_PLAN_MODEL` are sometimes configured with bare model names (e.g. `claude-sonnet-4-6`) rather than the fully-qualified form (`anthropic/claude-sonnet-4-6`). The CP `/model-preferences` endpoint already returns normalized IDs, but the env-var fallback path previously surfaced the raw env value verbatim — so during a CP outage callers could see a mix of qualified and bare IDs depending on how the operator configured the worker. New `normalizeFallback()` helper passes the candidate through `isValidModel` + `normalizeModelId`. Invalid candidates return undefined so the chain falls through to the next link (env → shared constant); valid bare or qualified IDs get normalized to the qualified form before being returned. Three existing tests rewritten to assert the normalized output (they previously captured the bug). Two new regression cases: - Bare env vars get normalized. - Invalid env vars are ignored and the chain falls through to the shared constant. Verification: npm run build -w @open-inspect/shared, npm test -w @open-inspect/shared (16/16 files green) && npm test -w @open-inspect/control-plane (66/66 files green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Linear gains the plan-mode workflow. An issue labeled `plan` (or
`plan-<alias>` to pick a plan model) opens a planning session; the
bot pushes an `elicitation` activity carrying the plan markdown and
a recap of the supported replies. `approve` / `reject` comments on
the issue route to the control plane's `/plan/{approve,reject}`
endpoints. The native Linear plan widget transitions to
`plan_awaiting_approval` so the agent's state is visible from the
issue view.
Label conventions (all dash-separated — Linear actually rejects `:`
in label names, so the legacy `model:opus` format documented upstream
cannot be created in Linear's UI):
- `plan` — trigger plan mode
- `plan-<alias>` — trigger plan mode with a specific plan model
- `model-<alias>` — pick the build model (no plan mode)
- `build-<alias>` — pick the build model on plan approval
- `review-<alias>` — pick the model for review automations
The alias → canonical model map (`MODEL_ALIAS_MAP`) lives in
`@open-inspect/shared` so the Linear and GitHub parsers stay in sync.
> ⚠ Breaking change: this PR adopts the dash-separated label format.
> Existing deployments that got `model:<name>` labels in via API
> workarounds will need to relabel their issues.
Other changes:
- `model-resolution.ts` parses the label set into a resolved decision
(`{ mode: "plan" | "build", model?, planModel? }`).
- `webhook-handler.ts` integrates the plan resolution into session
creation; on terminal status (approved/rejected), the elicitation
is replaced with the verdict.
- `callbacks.ts` handles the `/internal/plan-approved` and
`/internal/plan-rejected` callbacks from the control plane.
- `linear-client.ts` gains `normalizeLinearCommentBody` to safely
decode webhook comment bodies (markdown vs structured blocks).
- `escapeHtml` import switched from `./webhook-handler` to
`@open-inspect/shared` (moved in PR A).
Stack: A → B → { C | **D** | E | F → G }
Depends on: A (plan backend, shared types, fetchModelDefaults),
B (defaultModel / defaultPlanModel).
Parallel to: C, E, F.
Verification: `npm run typecheck`, `npm run lint`, `npm test`
(linear-bot, all green).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
04369f2 to
8f81a6e
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Linear gains the plan-mode workflow. An issue labeled `plan` (or `plan-` to pick a plan model) opens a planning session; the bot pushes an `elicitation` activity carrying the plan markdown and a recap of the supported replies. `approve` / `reject` comments on the issue route to the control plane's `/plan/{approve,reject}` endpoints. The native Linear plan widget transitions to `plan_awaiting_approval` so the agent's state is visible from the issue view.
What's in this PR
Stack
Test plan
🤖 Generated with Claude Code