Skip to content

🤖 feat(goals): per-goal auto-compaction threshold override#3357

Open
ammar-agent wants to merge 6 commits into
mainfrom
goals-compact-vsn1
Open

🤖 feat(goals): per-goal auto-compaction threshold override#3357
ammar-agent wants to merge 6 commits into
mainfrom
goals-compact-vsn1

Conversation

@ammar-agent
Copy link
Copy Markdown
Collaborator

@ammar-agent ammar-agent commented May 21, 2026

Summary

Adds an optional per-goal auto-compaction threshold override. Each goal can pin its own auto-compaction behavior (compact aggressively for cost, or disable for fidelity) independently of the workspace's per-model slider. Tri-state matching budgetCents / turnCap: omitted = no override (the workspace setting applies), null = explicit clear, integer 0–100 = explicit percent (100 = compaction disabled for this goal). Surfaced as an embedded slider tile in the GoalTab — Default / Customize / drag-to-Off — and as /goal --compact <value> on the slash command.

Background

The right-sidebar auto-compact slider is keyed per-model, not per-goal — see getAutoCompactionThresholdKey(model) in src/common/constants/storage.ts. That works for the steady-state default, but goal-driven workflows often have different cost-vs-fidelity preferences than the workspace default:

  • A long research goal might want aggressive compaction (40–50%) for cost control while the workspace is otherwise fine at the 70% default.
  • A short refactor or migration goal might want no compaction so the agent keeps the full plan in view (drag the slider all the way right → "Off").
  • Or a single goal might want a different threshold than the rest of that workspace's flow.

Defaults to off (the field is absent on legacy/new goal records → no override → existing behavior preserved), so users have to actively opt in to change anything.

Implementation

Data model (src/common/orpc/schemas/goal.ts, src/common/types/goal.ts)

  • New autoCompactionThresholdPct: z.number().int().min(0).max(100).nullable().optional() on GoalRecordV1Schema, GoalSnapshotSchema, GoalSetInputSchema, GoalBoardAddUpcomingInputSchema, GoalBoardUpdateUpcomingInputSchema. Optional + nullable so legacy records load without migration.
  • toGoalSnapshot carries the field through to the renderer-visible snapshot using the same Object.hasOwn-style tri-state preservation pattern as the existing completionSummary handling.

Backend service (src/node/services/workspaceGoalService.ts)

  • SetGoalInput and PendingGoalMutation get the field; createGoal, applyMutableFields, setGoalInternal (mid-stream + projected record), setGoalImmediately (replace branch), addUpcomingGoal, updateUpcomingGoal each thread it through.
  • applyMutableFields uses Object.hasOwn so null (explicit clear) survives the patch while undefined (no opinion) preserves the existing value — same pattern as budgetCents / turnCap.
  • The same-objective shortcut in setGoalImmediately adds autoCompactionThresholdPct to its hasMutableChange predicate so a threshold-only edit triggers the apply branch instead of being treated as a no-op.

Compaction consumer (src/node/services/compactionMonitor.ts, src/node/services/agentSession.ts)

  • CompactionMonitor.checkBeforeSend / checkMidStream accept an optional thresholdOverride (decimal 0–1). When finite, it replaces this.threshold for the single call — the monitor's persistent state is untouched, so the override layer is per-call and never leaks across goals.
  • Override >= 1 clamps to compaction-disabled for that call (matches the slider's "off" sentinel).
  • Override 0 is honored as the aggressive extreme (Codex P2 fix — see the dedicated "override of 0% is honored" test). Only null / undefined / NaN / Infinity / negative values fall back to this.threshold.
  • AgentSession.resolveGoalAutoCompactionThresholdOverride() fetches the active goal via the existing injected workspaceGoalService and converts the persisted percent → decimal. Called from both compaction call sites (on-send and mid-stream usage-delta handler). Errors are swallowed silently because the resolver runs on the hot path of every send; a corrupt persisted goal record must never interrupt streaming.

Renderer banner (src/browser/components/ChatPane/ChatPane.tsx)

  • effectiveAutoCompactionThreshold consults workspaceState.goal?.autoCompactionThresholdPct first, falling back to the per-model slider value, so the warning banner stays in lockstep with what the backend will actually do.

UI — GoalTab slider tile (src/browser/features/RightSidebar/GoalTab.tsx)

Replaces the previous text-input editor with an embedded slider tile that has two visual modes:

  • Default mode (value == null) — workspace per-model slider governs. Tile renders "Default — workspace setting" plus a Customize button. Slider is hidden because we don't know the workspace per-model threshold at this layer; a ghost thumb would imply a value that isn't really in effect.
  • Override mode (value is a number) — native range slider with thumb at the override value. Drag commits on release (onMouseUp / onTouchEnd / onKeyUp), not on every drag step, so backend mutation rate stays bounded by user gestures. The right end is labeled "Off" mirroring the workspace per-model slider; dragging there commits 100 (per-goal disabled). A Use default button clears back to null.

Customize seeds the override at DEFAULT_AUTO_COMPACTION_THRESHOLD_PERCENT (70) so the slider has a concrete value to drag from. Cost of an accidental Customize is one click on Use default.

The slider uses onInput + onChange together: happy-dom (the tests/ui runtime) only synthesizes the native input event for range inputs, so onInput is required for deterministic tests, and onChange satisfies React's controlled-input contract. The two handlers route to the same idempotent draft updater.

UI — responsive grid layout (src/browser/features/RightSidebar/GoalTab.tsx)

The active-goal stat grid is now container-query-driven:

  • Narrow (default, grid-cols-2): Budget col-span-2 (full row); Turns + Elapsed half-half; Auto-compact col-span-2 (full row, slider has room to breathe).
  • Wider (@md:grid-cols-3, triggered when the dynamically-resizable right sidebar passes the @md container width): Budget + Turns + Auto-compact share row 1 (each col-span-1); Elapsed becomes a thin full-width row below. BudgetTile's internal copy stacks vertically at the wider breakpoint so its cost / remaining pair still fits in ~1/3 width.

Container queries ([container-type:inline-size] + @md:) are used instead of viewport breakpoints because the right sidebar is dynamically resizable per AGENTS.md, so sm: / md: wouldn't have responded to sidebar resizing.

UI — create form (src/browser/features/RightSidebar/GoalTab.tsx)

  • GoalCreateForm reuses the same slider tile with local state instead of the previous text input + parser. null (Default mode) means the create intent omits the field entirely; a number is sent verbatim.

UI — slash command + handler chain

  • /goal --compact <value> parses with the same vocabulary as the slider's Off/Default sentinels; composes with -b and --turns in the documented order; non-leading occurrences stay as goal text. The parser flows through resolveGoalSetIntent (verbatim pass-through, since there's no workspace-level default for this field yet) into the existing setGoal IPC.
  • RightSidebar.handleGoalUpdateAutoCompactionThresholdPct mirrors handleGoalUpdateBudget / handleGoalUpdateTurnCap, wiring through setGoalWithSingleConflictRetry so the optimistic-concurrency retry loop covers all three.
  • Tab registry context picks up onUpdateAutoCompactionThresholdPct so the prop is typed end-to-end.

Validation

  • make typecheck
  • make lint
  • make fmt-check
  • make storybook-build
  • bun test src/node/services/compactionMonitor.test.ts13 pass (5 override-lane tests + 1 dedicated Codex P2 test pinning 0% as honored)
  • bun test src/node/services/workspaceGoalService.test.ts129 pass (6 threshold-pinning tests including legacy-record back-compat)
  • bun test src/browser/utils/slashCommands/goal.test.ts22 pass (7 --compact parser tests)
  • bun test src/browser/features/RightSidebar/GoalTab.test.tsx32 pass (9 slider-tile tests: Default/Off/numeric display; Customize commits 70; drag commits snapped value on mouseUp; drag-to-100 commits 100; mouseUp without change does NOT commit; Use default clears to null; canEdit gating)
  • Broader bun test src/node/services/ src/browser/features/RightSidebar/ src/browser/utils/slashCommands/ src/browser/utils/goals/3554 pass / 0 fail

Risks

Low–medium, scoped to the auto-compaction subsystem.

  • Legacy goal records without the field continue to behave exactly as before — the field is .optional() + .nullable() and applyMutableFields only touches it via Object.hasOwn. A dedicated test (workspaceGoalService.test.ts) writes a pre-field record to disk and asserts getGoal returns it parsed with the field undefined.
  • Mid-stream + on-send compaction paths are on the hot path for every send + every usage delta. The override resolver swallows errors silently so a corrupt persisted goal record never interrupts streaming; the monitor's resolveEffectiveThreshold does the same for malformed values it receives.
  • Renderer / backend agreement on the effective threshold is pinned by both consulting the same goal snapshot field, so the warning banner and the actual compaction trigger stay in sync. The Codex P2 fix (honor 0% as the aggressive extreme rather than silently falling back) closed the last gap here.
  • Slider drag → backend is gated to the release event, not the per-pixel drag step, so the backend mutation rate is bounded by user gestures (mouseUp / touchEnd / keyUp) and the draft !== value guard inside commit() further suppresses no-op releases.

Follow-ups (deferred)

  • Workspace-level default for the per-goal override (would let users say "all my goals in this workspace start with aggressive compaction").
  • Upcoming-row inline editor for the threshold (currently editable via updateUpcomingGoal API but not surfaced in the UI's row editor; can be set via --compact on the slash command or after promotion).

Pains

  • happy-dom's range-input event synthesis differs from real browsers — it fires the native input event but does NOT fire the change event when fireEvent.input is dispatched, so a slider driven only by React's onChange is read-only in tests. Took a debug-test pass to isolate (initially looked like a stale-closure / React state-flush issue). Resolved by routing drag updates through both onInput and onChange; the duplicate handler is documented inline.
  • Two iterations on layout: the first pass kept Budget full-width at all widths, but per advisor guidance switched to a 3-col layout at the @md container width so Budget / Turns / Auto-compact line up on one row exactly as requested. BudgetTile's internal copy had to gain a vertical stacking variant for the narrower tile size.

Generated with mux • Model: anthropic:claude-opus-4-7 • Thinking: xhigh • Cost: $53.97

Adds an optional `autoCompactionThresholdPct` field to the goal record so
each goal can pin its own auto-compaction behavior independently of the
workspace's per-model slider. Lives as a tri-state mirroring `budgetCents`
/ `turnCap`: omitted = no override (model-level slider applies), `null`
= explicit clear, 0–100 integer = explicit percent (`100` = compaction
disabled for this goal).

Why: in a goal-driven flow the user may prioritize cost control
(aggressive compaction) over fidelity, or vice-versa, on a per-goal
basis without retuning the global model slider. Backward-compatible by
construction — legacy records and goals that never set the field
continue to behave exactly as before.

This commit lands the data model + behavior. Tests follow in a separate
commit to keep the diff reviewable.
Pins the data model + behavior added in the previous commit:

* CompactionMonitor: per-call `thresholdOverride` wins over internal
  threshold for both `checkBeforeSend` and `checkMidStream`; override of
  `1.0` disables for that call; null/NaN/non-finite/non-positive falls
  back to the monitor's own threshold (defensive against corrupt
  persisted goal records).

* WorkspaceGoalService: `setGoal` persists the field, `applyMutableFields`
  edit-in-place can clear it (null) without dropping unrelated state,
  100 round-trips distinct from null (per-goal disable vs. no override),
  `addUpcomingGoal` / `updateUpcomingGoal` honor the field, and legacy
  goal records without the field still load.

* /goal slash command: integer / off / disable / default / none / clear
  / 75%, composes with -b and --turns in documented order, rejects
  out-of-range values, treats non-leading occurrences as goal text.

* GoalTab UI: tile renders Default / Off / N% across the tri-state,
  inline editor submits the percent, "off" → 100, blank → null,
  out-of-range surfaces an error, Edit affordance is gated on the
  handler being supplied.
@ammar-agent
Copy link
Copy Markdown
Collaborator Author

@codex review

Per-goal auto-compaction threshold override. Tri-state mirroring budgetCents/turnCap: omitted = no override, null = clear, integer 0–100 = explicit percent (100 = disabled for that goal). Threads through schemas, WorkspaceGoalService mutators, CompactionMonitor (new per-call thresholdOverride lane), AgentSession on-send + mid-stream call sites, ChatPane warning banner, GoalTab UI tile + inline editor + GoalCreateForm, /goal --compact slash command, and the RightSidebar handler chain.

Particular attention welcome on:

  1. Override resolution in CompactionMonitor.resolveEffectiveThreshold — the defensive fallback for malformed values (null / NaN / non-positive / non-finite). Compaction is on the hot path; thrown errors there would interrupt streams.
  2. Persistence tri-state in applyMutableFields and the upcoming-goal mutators — Object.hasOwn so null survives a patch while omitted preserves the existing value. Same pattern as budgetCents/turnCap.
  3. Legacy record loading — schema is .optional().nullable(). There's a dedicated back-compat test that writes a pre-field goal.json and asserts it parses.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

await setGoalWithSingleConflictRetry({
objective: resolved.objective,
budgetCents: resolved.budgetCents,
...(resolved.turnCap != null ? { turnCap: resolved.turnCap } : {}),
});

P1 Badge Forward create-form compact override to setGoal

The new Auto-compact input in GoalCreateForm is parsed into resolved.autoCompactionThresholdPct, but this value is never included in the setGoalWithSingleConflictRetry payload during goal creation. As a result, creating a goal with off, 100, or any explicit percentage silently drops the override and persists the workspace default instead, so the newly added create-time control is effectively non-functional until the user edits the goal again.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/node/services/compactionMonitor.ts Outdated
`resolveEffectiveThreshold` was rejecting `override <= 0` and falling
back to the workspace threshold, but the schema admits `0` as a valid
per-goal value (the aggressive extreme of the slider). That created a
renderer/backend mismatch: the GoalTab tile shows "0%" and the GoalTab
editor accepts the value, but the backend silently behaved as if the
override was absent.

Narrow the corruption guard to negative values only and document why
`0` deserves an explicit pass. Add a dedicated test pinning the
contract for both `checkBeforeSend` and `checkMidStream`.

Picked up via PR #3357 Codex review.
@ammar-agent
Copy link
Copy Markdown
Collaborator Author

@codex review

Addressed your P2:

  • CompactionMonitor.resolveEffectiveThreshold now treats 0 as a valid aggressive-extreme override and only rejects truly corrupt values (null / NaN / ±Infinity / negative). Comment in the source spells out why renderer/backend agreement on the 0 extreme matters.
  • New dedicated test override of 0% is honored (Codex P2: renderer/backend must agree) in compactionMonitor.test.ts pins both checkBeforeSend and checkMidStream for that contract.
  • Updated the negative test list to remove 0 from the "should fall back" cases and document why.

All checks still green locally (make static-check, make storybook-build, full bun test suite — 3553 pass / 0 fail).

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Delightful!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Replace the text-input editor for the per-goal auto-compaction
override with an embedded slider tile and reshape the active-goal
grid to be container-query responsive.

UX:
- Default mode (no override): tile shows 'Default — workspace setting'
  with a 'Customize' button. Slider is hidden because we don't know
  the workspace per-model threshold at this layer and a ghost thumb
  would imply a value that isn't really in effect.
- Override mode (value is a number): native range slider with thumb
  at the override value. Drag commits on release (mouseUp/touchEnd/
  keyUp), not on every drag step, so backend mutation rate stays
  bounded by user gestures. 'Use default' clears back to null.
- The right end of the slider is labeled 'Off' to mirror the
  workspace per-model slider — dragging there commits 100 (per-goal
  disabled).

Layout:
- Active-goal grid now uses '[container-type:inline-size]' + '@md:'
  so the sidebar's dynamic width (per AGENTS.md) drives the
  breakpoint, not the viewport. Narrow stays 2-col (Budget full row,
  Turns | Elapsed, Auto-compact full row). Wider goes to 3-col with
  Budget | Turns | Auto-compact on one row and Elapsed as a thin
  full-width row below.
- BudgetTile internal copy stacks vertically at the wider breakpoint
  so its cost / remaining pair still fits in ~1/3 width.
- GoalCreateForm reuses the same slider, with local state instead of
  the previous text-input + parser path.

Drop:
- 'compactThreshold' EditingField variant + the inline text editor.
- parseGoalAutoCompactionThresholdInput (no longer used inside
  GoalTab; the slash-command parser is separate).

Tests:
- Replace text-input drive tests with slider-drive tests:
  Default mode / Off display / numeric override display; Customize
  commits 70; drag commits snapped value on mouseUp; drag-to-100
  commits 100 (Off); mouseUp without change does NOT commit
  (anti-spam guard); Use default clears to null; canEdit gating
  hides Customize and disables the slider.
- Use 'onInput' + 'onChange' both — happy-dom only synthesizes the
  native input event for range inputs, so onInput is required for
  tests, and onChange satisfies React's controlled-input contract.

Validation: typecheck \u2713 lint \u2713 fmt-check \u2713 storybook-build \u2713
bun test src/node/services/ src/browser/features/RightSidebar/
src/browser/utils/slashCommands/ src/browser/utils/goals/ — 3554 pass.

---

_Generated with `mux` \u2022 Model: `anthropic:claude-opus-4-7` \u2022 Thinking: `xhigh` \u2022 Cost: `$53.97`_

<!-- mux-attribution: model=anthropic:claude-opus-4-7 thinking=xhigh costs=53.97 -->
@ammar-agent
Copy link
Copy Markdown
Collaborator Author

@codex review

Slider-driven aesthetic refinement on the per-goal auto-compact tile. Worth a fresh pass — the previous review approved the text-input + inline editor design, but this commit replaces it with a slider tile (Default / Customize / drag / Use default) and reshapes the active-goal grid to be container-query responsive (Budget + Turns + Auto-compact share a row at the wider @md breakpoint; Elapsed becomes a thin full-width row below).

The schema, backend, monitor, and slash-command paths are unchanged from the previous round you approved — only GoalTab.tsx and its tests moved.

Specific things to look at:

  • GoalAutoCompactSlider / GoalAutoCompactSliderOverride — the two-mode split + the "prevValue" sync pattern instead of useEffect (per the react-effects skill).
  • The onInput + onChange duplicate handler for happy-dom compatibility — comments explain why.
  • The container-query layout ([container-type:inline-size] + @md: variants) on the grid + BudgetTile.
  • parseGoalAutoCompactionThresholdInput was dropped from GoalTab.tsx since the slider commits typed values directly; the slash-command parser is unaffected.

Validation: typecheck / lint / fmt-check / storybook-build all green; 3554 tests pass.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d3f7bcb102

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/browser/features/RightSidebar/GoalTab.tsx
Touch-capable browsers fire `touchend` and then synthesize `mouseup`
within ~50–100ms. The slider's `commit` was wired to both, so a
single finger release ran the backend mutation twice: the parent's
async `setGoal` hadn't resolved between the two events, so the
`draft !== value` guard couldn't suppress the second call. This
produced duplicate `setGoal` writes (and duplicate
`goal_replaced` history entries) on touch devices.

Introduce a `justCommittedRef` flag on `GoalAutoCompactSliderOverride`:
- `commit()` checks the flag first; if set, it bails before calling
  `onChange`.
- `commit()` sets the flag immediately before calling `onChange` —
  the synthetic `mouseup` that follows on the same tick is then
  short-circuited.
- `updateDraftFromEvent` (the onInput/onChange path) clears the
  flag whenever a fresh drag/keyboard gesture starts.

We use a ref instead of state because we don't want a re-render
between commit() and the synthetic mouseup that's about to fire on
the same tick. Per AGENTS.md's "avoid timing-based coordination"
rule, the flag lifetime is tied to user intent (the next `input`
event) rather than a setTimeout.

Tests: two new tests pin the contract — one asserts that touchEnd
+ mouseUp from the same gesture commit exactly once; one asserts a
follow-up gesture commits again (the flag resets correctly on next
drag). Without the fix the first test fails with two calls.

Validation: typecheck \u2713 lint \u2713 fmt-check \u2713
bun test src/browser/features/RightSidebar/GoalTab.test.tsx — 34 pass.

---

_Generated with `mux` \u2022 Model: `anthropic:claude-opus-4-7` \u2022 Thinking: `xhigh` \u2022 Cost: `$54.20`_

<!-- mux-attribution: model=anthropic:claude-opus-4-7 thinking=xhigh costs=54.20 -->
@ammar-agent
Copy link
Copy Markdown
Collaborator Author

@codex review

Addressed the P2 finding (PRRT_kwDOPxxmWM6D9ZRp):

The range input wires both onTouchEnd and onMouseUp to commit, but many touch browsers emit a synthetic mouse-up after touch-end; in that sequence commit() runs twice before value is refreshed from the async mutation, so the draft !== value guard passes both times and onChange(draft) is sent twice.

Fixed in 30774e5 with a justCommittedRef flag on GoalAutoCompactSliderOverride:

  • commit() bails if the flag is set, and sets the flag immediately before calling onChange so the synthetic mouseup that follows on the same tick is short-circuited.
  • updateDraftFromEvent (the onInput/onChange path) clears the flag whenever a fresh drag/keyboard gesture starts, so subsequent edits commit normally.
  • Used a ref (not state) because we explicitly don't want a re-render between commit() and the synthetic mouseup. The flag lifetime is tied to user intent (the next input event), not a setTimeout — per AGENTS.md's "avoid timing-based coordination" rule.

Two new tests pin the contract:

  • touch + synthesized mouseup from the same gesture commits only once (Codex P2) — touchEnd + mouseUp dispatched sequentially must result in exactly one setGoal call. Without the fix this fails with two calls.
  • a fresh drag after a release commits again (per-gesture flag resets) — confirms the flag resets correctly so the slider doesn't lock up after one edit.

Validation: typecheck / lint / fmt-check all green; GoalTab tests 34 pass.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant