Commit 3b49cc7

sohmn and claude committed
Merge origin/main (v1.52.0.0 plan-tune) into feat/fanout-skill
Second post-ship merge. PR garrytan#1741 (garrytan/enable-plan-tune) landed at v1.52.0.0 while our PR was waiting on review. Our v1.53.0.0 claim is still clean per gstack-next-version queue check (no new claims).

Conflicts resolved:
- VERSION: kept 1.53.0.0 (ahead of main's 1.52.0.0)
- package.json: synced to 1.53.0.0
- CHANGELOG.md: our 1.53.0.0 entry preserved above main's new 1.52.0.0 entry; existing 1.51.0.0 from previous merge unchanged

Regenerated:
- fanout/SKILL.md (gained 4 lines from main's preamble updates)
- gstack/llms.txt + scripts/proactive-suggestions.json

Tests: fanout 6/6 pass post-merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 parents 867e04e + ce5fbfa commit 3b49cc7

87 files changed

Lines changed: 6349 additions & 164 deletions

CHANGELOG.md

Lines changed: 91 additions & 0 deletions
@@ -33,6 +33,74 @@ If your team or your single instance of Claude Code is sitting on a finished des
 - Design doc at [`docs/designs/FANOUT.md`](docs/designs/FANOUT.md) documents the 4-layer slab detection heuristic, Slab 0 promotion logic, conflict resolution rules, and edge cases.
 - No new infrastructure: skill is auto-discovered by `setup` via the existing top-level-directory glob at [setup:620-633](setup).
 
+## [1.52.0.0] - 2026-05-27
+
+## **`/plan-tune` settings actually do something now. Hooks make capture deterministic, preferences binding, and free-text answers loop back as memory.**
+
+Before this release, plan-tune was a profile inspector with a hollow substrate. Every gstack skill told the agent "log this AskUserQuestion fire," and in weeks of dogfood, zero events ever landed. Preferences were agent-honored convention. Declared profile dimensions sat in a JSON file doing nothing. After this release: a PostToolUse hook captures every AUQ fire whether the agent remembers to log or not. A PreToolUse hook substitutes auto-decided answers when you've set `never-ask`. Free-text "Other" responses get dream-cycled through Claude into structured proposals you approve, then injected into future related questions as inline context. Codex sessions are backfilled by a structured-JSONL parser, not regex on transcript text.
+
+The cathedral lands behind one explicit consent prompt at `./setup` (with diff preview, backup, and one-command rollback) and stays on once installed.
+
+### The numbers that matter
+
+Measured against the existing v1.49 substrate. Reproduce with `bun test test/plan-tune-gates.test.ts test/question-log-hook.test.ts test/question-preference-hook.test.ts test/memory-cache-injection.test.ts test/distill-free-text.test.ts test/distill-apply.test.ts test/declared-annotation.test.ts test/gstack-codex-session-import.test.ts test/skill-e2e-plan-tune-cathedral.test.ts`.
+
+| Metric | Before (v1.49.0.0) | After (v1.52.0.0) | Δ |
+|---|---|---|---|
+| AUQ events captured per session | 0 (agent convention) | every fire (hook) | substrate works |
+| `never-ask` preferences enforced | 0% (agent convention) | 100% (hook + deny+reason) | actually binds |
+| Declared profile annotations | 0 / week | every signal_key match | profile renders |
+| Dream-cycle memory persistence | 0 (no mechanism) | per-project + gbrain mirror | cross-project recall |
+| Codex session backfill | none (regex idea) | structured JSONL parser | future-proof |
+| Per-PR test cost added | $0 | $0 (deterministic; no claude -p) | gate-tier safe |
+| Unit + E2E tests added | — | 96 tests / 8 new files | green |
+
+| Layer | What it does | Where it lives |
+|---|---|---|
+| 1 — Capture | PostToolUse hook → question-log.jsonl with dedup + async derive | hosts/claude/hooks/question-log-hook.ts |
+| 2 — Enforcement | PreToolUse hook → deny+reason with auto-decided option | hosts/claude/hooks/question-preference-hook.ts |
+| 3 — Annotation | declared profile → kebab signal_key → plain-English phrase | scripts/declared-annotation.ts |
+| 4 — Surfaces | host-aware Stats, Recent auto-decisions, Audit unmarked | plan-tune/SKILL.md.tmpl |
+| 5 — Discoverability | setup hook-install prompt + post-ship nudge | setup, ship/SKILL.md.tmpl |
+| 6 — Tests | 5 E2E scenarios, all gate tier, $0 cost | test/skill-e2e-plan-tune-cathedral.test.ts |
+| 7 — Installation | schema-aware bin: PreToolUse + PostToolUse, backup + rollback | bin/gstack-settings-hook |
+| 8 — Dream cycle | Anthropic SDK distill + gbrain put_page + memory injection | bin/gstack-distill-* + Layer 2 inject |
+
+Highest-impact number is the third row: declared profile annotations now render inline before every AUQ that matches a signal_key. Set `declared.scope_appetite = 0.85` once during /plan-tune setup, and every "should I bundle this fix?" question shows up with "(your profile leans complete-implementation)" on the recommended option. The same loop applies to verbose-vs-terse, consult-vs-delegate, and ship-now-vs-get-the-design-right.
+
+### What this means for solo builders
+
+The feature compounds now. Each AskUserQuestion you answer "Other" with free text gets captured by the hook, batched into proposals by `gstack-distill-free-text` (3/day cap, ~$0.01 per run), reviewed via `/plan-tune distill`, and applied as either a `never-ask` preference, a declared-profile nudge, or a reusable memory nugget that routes to your gbrain (when configured) and reappears as context the next time a related question fires. The dream cycle is the unlock — without it, every nuanced answer evaporated after one turn. Now they accumulate. Run `./setup` and accept the hook-install prompt to turn it on, then `/plan-tune` whenever you want to see what your profile knows about you.
+
+### Itemized changes
+
+**Added**
+- `hosts/claude/hooks/question-log-hook` — PostToolUse hook, matcher covers `AskUserQuestion` + `mcp__*__AskUserQuestion`. Captures every AUQ fire with marker-first question_id (D18), hash-fallback observed-only, source-tagged.
+- `hosts/claude/hooks/question-preference-hook` — PreToolUse hook with `(recommended)`-label parser, refuse-on-ambiguous (D2 safety), project-then-global preference precedence (D8), one-way safety override. Auto-decided events logged from the hook itself since deny prevents PostToolUse from firing.
+- `scripts/declared-annotation.ts` — `getDeclaredAnnotation(signal_key)` with kebab→underscore namespace mapping. Returns null in the middle band, plain-English phrase in strong bands (>= 0.7 or <= 0.3).
+- `bin/gstack-codex-session-import` — structured JSONL parser for `~/.codex/sessions/`. Marker-first recovery with pattern fallback, source-tagged `codex-import-marker` / `codex-import-pattern`.
+- `bin/gstack-distill-free-text` — Layer 8 dream cycle distiller. Anthropic SDK direct call (Haiku 4.5), 3/day rate cap per slug (D7), cumulative cost log, sync-or-background execution context (D14).
+- `bin/gstack-distill-apply` — applies one approved proposal to its surface (preference / declared-nudge / memory-nugget), with optional `--gbrain-published true` flag.
+- `setup` — interactive consent prompt for hook installation with diff preview, backup, one-command rollback. Marker-gated so users are asked at most once.
+- `ship/SKILL.md.tmpl` Step 21 — post-success plan-tune nudge, marker-gated for at-most-once.
+- `docs/spikes/claude-code-hook-mutation.md` + `docs/spikes/codex-session-format.md` — Phase 1 spike outputs that pinned protocol contracts before implementation.
+- 96 new tests across 8 files: STATE_ROOT honoring, v1.49 gates, settings-hook schema-aware ops, both hooks, declared-annotation, codex import, distill bin, distill apply, memory injection, 5 cathedral E2E scenarios.
+
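Editor's sketch (not part of the commit): the `scripts/declared-annotation.ts` entry above describes a kebab-to-underscore key mapping and a banded lookup. A minimal illustration of that behavior might look like the following. The second `declared` parameter, the `PHRASES` table, the `scope-appetite` key spelling, and the low-band wording are all assumptions made for the example; only the 0.7 / 0.3 thresholds and the "leans complete-implementation" phrase come from the changelog.

```typescript
// Sketch only: the real scripts/declared-annotation.ts may be structured differently.
type DeclaredProfile = Record<string, number>;

// Hypothetical phrase table. Only the scope_appetite "high" phrase is quoted
// in the changelog; the "low" phrase is invented for illustration.
const PHRASES: Record<string, { high: string; low: string }> = {
  scope_appetite: {
    high: "your profile leans complete-implementation",
    low: "your profile leans minimal-change",
  },
};

// signal_key values are kebab-case; declared dimensions use underscores.
function kebabToUnderscore(signalKey: string): string {
  return signalKey.replace(/-/g, "_");
}

// Returns a phrase only in the strong bands (>= 0.7 or <= 0.3), else null.
function getDeclaredAnnotation(
  signalKey: string,
  declared: DeclaredProfile, // assumption: the real function takes only signal_key
): string | null {
  const dimension = kebabToUnderscore(signalKey);
  const value = declared[dimension];
  const phrases = PHRASES[dimension];
  if (value === undefined || phrases === undefined) return null;
  if (value >= 0.7) return phrases.high;
  if (value <= 0.3) return phrases.low;
  return null; // middle band: no annotation
}

console.log(getDeclaredAnnotation("scope-appetite", { scope_appetite: 0.85 }));
// prints "your profile leans complete-implementation"
```

Returning `null` in the middle band keeps weakly held preferences from showing up as annotations on every question.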
+**Changed**
+- `bin/gstack-settings-hook` schema-aware rewrite: PreToolUse + PostToolUse registration with `_gstack_source` tag for dedup, `add-event` / `remove-source` / `diff-event` / `rollback` / `list-sources` subcommands. Legacy `add`/`remove` SessionStart shape preserved verbatim.
+- `bin/gstack-question-log` — accepts source, tool_use_id, free_text; composite dedup on (source, tool_use_id) across last 100 lines (D3); async-fires `gstack-developer-profile --derive` after every successful write (D17 — without this, sample_size stayed 0).
+- Three bins (`gstack-question-log`, `gstack-question-preference`, `gstack-developer-profile`) + `gstack-config` now honor `GSTACK_STATE_ROOT` env var as highest-priority override (D16 Codex correction — without this, isolation tests silently wrote to real ~/.gstack).
+- `scripts/resolvers/question-tuning.ts` preamble — added marker-embedding convention (`<gstack-qid:{id}>`) and `(recommended)` label convention. Hook enforcement gates on marker presence.
+- `scripts/question-registry.ts` — added `signal_key: 'decision-autonomy'` to `land-and-deploy-merge-confirm` and `land-and-deploy-rollback` so the autonomy dimension has a real signal source.
+- `scripts/psychographic-signals.ts` — added `decision-autonomy` signal map.
+- `plan-tune/SKILL.md.tmpl` — new sections (Recent auto-decisions, Audit unmarked, Dream cycle review, Dream cycle distill); host-aware Stats with source breakdown + MARKED %; Step 0 routing extended with dream-cycle gate.
+- `bin/gstack-uninstall` — also cleans up `plan-tune-cathedral`-tagged hooks during uninstall.
+
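Editor's sketch (not part of the commit): the `bin/gstack-question-log` entry above mentions composite dedup on (source, tool_use_id) across the last 100 lines. One way that check could work over in-memory JSONL lines is shown below. The entry shape is reduced to three fields, and the `hook` source value and the helper names are hypothetical.

```typescript
// Sketch only: field names follow the changelog; everything else is assumed.
interface QuestionLogEntry {
  source: string;
  tool_use_id?: string;
  question_id: string;
}

const DEDUP_WINDOW = 100; // "across last 100 lines"

// True when a recent line carries the same (source, tool_use_id) pair.
function isDuplicate(existingLines: string[], candidate: QuestionLogEntry): boolean {
  if (!candidate.tool_use_id) return false; // nothing to key on
  for (const line of existingLines.slice(-DEDUP_WINDOW)) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip malformed lines rather than failing the write
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const entry = parsed as Partial<QuestionLogEntry>;
    if (entry.source === candidate.source && entry.tool_use_id === candidate.tool_use_id) {
      return true;
    }
  }
  return false;
}

// Appends the candidate as one JSONL line unless it is a recent duplicate.
function appendIfNew(existingLines: string[], candidate: QuestionLogEntry): string[] {
  return isDuplicate(existingLines, candidate)
    ? existingLines
    : [...existingLines, JSON.stringify(candidate)];
}

let demoLog: string[] = [];
const fire: QuestionLogEntry = { source: "hook", tool_use_id: "toolu_1", question_id: "ship-bundle-fix" };
demoLog = appendIfNew(demoLog, fire);
demoLog = appendIfNew(demoLog, fire); // second write is dropped
console.log(demoLog.length); // prints 1
```

Bounding the scan to a fixed window keeps each write cheap on a long log. The trade-off is that a repeat arriving more than 100 lines later is not detected.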
+**For contributors**
+- 4 cross-model tension resolutions during eng review locked in: project preferences win over global (D8), hash IDs are observed-only never preference keys (D18), AUQ matcher covers MCP variants (Codex correction), enforcement uses `permissionDecision: "deny"` + reason instead of `"allow"` + `updatedInput` until the AUQ input shape is verified against real Claude Code (T6 conservative path).
+- Plan-review preamble byte budget ratcheted 39000 → 40000 in `test/gen-skill-docs.test.ts` (~700 bytes added by the marker convention).
+- 9 Codex outside-voice findings folded directly without re-prompting (matcher correction, derive wiring, settings.json consent, signal_key namespace, etc.).
+
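Editor's sketch (not part of the commit): the first contributor note above fixes two resolution rules, project preferences win over global (D8) and hash IDs never act as preference keys (D18). A rough model of a preference check that honors both might look like this. The `always-ask` value, the `CheckInput` shape, and the reading of "one-way safety override" as "one-way doors are always asked" are assumptions. `never-ask`, `AUTO_DECIDE`, and `ASK_NORMALLY` appear in the diff.

```typescript
// Sketch only: not the actual gstack-question-preference implementation.
type Preference = "never-ask" | "always-ask"; // "always-ask" is hypothetical
type Decision = "AUTO_DECIDE" | "ASK_NORMALLY";

interface CheckInput {
  // null models an AUQ that was only observed via a hash ID (no marker).
  questionId: string | null;
  doorType: "one-way" | "two-way";
  project: Record<string, Preference>;
  global: Record<string, Preference>;
}

function checkPreference(input: CheckInput): Decision {
  // D18: observed-only questions can never be auto-decided.
  if (input.questionId === null) return "ASK_NORMALLY";
  // Assumed reading of the one-way safety override: irreversible choices are always asked.
  if (input.doorType === "one-way") return "ASK_NORMALLY";
  // D8: a project-level preference shadows the global one.
  const pref = input.project[input.questionId] ?? input.global[input.questionId];
  return pref === "never-ask" ? "AUTO_DECIDE" : "ASK_NORMALLY";
}

console.log(
  checkPreference({
    questionId: "ship-bundle-fix",
    doorType: "two-way",
    project: { "ship-bundle-fix": "always-ask" },
    global: { "ship-bundle-fix": "never-ask" },
  }),
); // prints "ASK_NORMALLY" because the project entry wins
```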
 ## [1.51.0.0] - 2026-05-27
 
 ## **Long-running browser sessions hold flat RSS on the Bun side. `$B memory` gives every future OOM receipts instead of a screenshot.** Four CDP-resource leak classes closed and pinned with tripwires; a structured diagnostic surfaces Bun heap + per-tab JS heap + Chromium process tree + bounded buffer sizes in real time.
@@ -86,6 +154,29 @@ The next time you leave a gbrowser session running for days, the Bun side holds
 - Coverage audit: 44% pre-diagnostic-tests → ~62% after adding the formatter coverage. Strong paths (CDP session lifecycle, body materialization, history cap, tab guardrail, SSE cleanup) all at 100% with invariant tests. Extension UI tests deferred (no extension test harness in this repo today).
 - The CDP-session cleanup tripwire is the most reusable artifact here — any future addition of CDP work should route through the two helpers. Trying to call `newCDPSession` outside `cdp-bridge.ts` fails CI immediately with a pointer to the right helper.
 
+## [1.49.0.0] - 2026-05-26
+
+## **`/plan-tune` learns to ask for consent before logging, and runs the 5-question setup automatically when your profile is empty.**
+
+Run `/plan-tune` the first time and you get an opt-in prompt. Accept and the 5-question wizard fills in your declared profile in about two minutes. Decline and `/plan-tune` never asks again. Contributors see a slightly different prompt explaining that local question-log data helps gstack calibrate, but the default is the same: off until you say yes.
+
+If you already opted in via `gstack-config set question_tuning true` and skipped the wizard, the next `/plan-tune` runs just the 5-question setup so your profile actually has values.
+
+Both flows write marker files in `~/.gstack/` so you're asked at most once per choice.
+
+### Itemized changes
+
+**Added**
+- `/plan-tune` consent prompt with contributor-specific copy. Honored by `~/.gstack/.question-tuning-prompted` marker.
+- `/plan-tune` setup gate. Catches `question_tuning: true` with empty `declared`. Honored by `~/.gstack/.declared-setup-prompted` marker.
+
+**Changed**
+- `TODOS.md` E1 dependency line aligned with the canonical 90-day gate in `docs/designs/PLAN_TUNING_V0.md`. The 7-day diversity gate is for displaying inferred values in `/plan-tune` output; the 90-day gate is for shipping behavior adaptation. Both gates documented inline in `plan-tune/SKILL.md.tmpl`.
+- `TODOS.md` E1 substrate constraint: E1 adaptations land as advisory annotations on AskUserQuestion recommendations, not as runtime AUTO_DECIDE on inferred profile alone.
+
+**For contributors**
+- `plan-tune/SKILL.md` size budget override (50,123 → 52,963 bytes, ×1.06 vs v1.44.1 baseline). Reason logged to audit trail.
+
 ## [1.48.0.0] - 2026-05-26
 
 ## **Agents stop dropping AskUserQuestion options when there are 5+.** A new canonical preamble rule + runtime gate makes Conductor's 4-option cap a split-or-batch decision, not a silent trim.

TODOS.md

Lines changed: 18 additions & 1 deletion
@@ -717,7 +717,24 @@ reads it yet.
 
 **Effort:** L (human: ~1 week / CC: ~4h)
 **Priority:** P0
-**Depends on:** 2+ weeks of v1 dogfood, profile diversity check passing.
+**Depends on:** **90+ days of v1 dogfood stable across 3+ skills** (per
+`docs/designs/PLAN_TUNING_V0.md` §"Deferred to v2" E1 acceptance criteria).
+Distinct from the lighter-weight diversity-display gate
+(`sample_size >= 20 AND skills_covered >= 3 AND question_ids_covered >= 8
+AND days_span >= 7`) used in /plan-tune to render the inferred column —
+display is a UI affordance, promotion to E1 needs a much higher bar
+because behavioral adaptation is consequential and hard to revert. Prior
+versions of this card cited "2+ weeks" which conflicted with V0 — V0 wins.
+
+**Substrate risk (Codex outside-voice, Phase A review 2026-05-26):** Generated
+skill prose is agent-compliance-based. Tests can verify templates contain the
+right reads of `~/.gstack/developer-profile.json` and the right decision
+points, but tests cannot prove agents obey them at runtime. E1 ships
+adaptations as **advisory annotations on AskUserQuestion recommendations**
+("Recommended via your profile: <choice>") until there's a hard runtime
+execution path. Do NOT gate any AUTO_DECIDE on inferred profile alone in v1
+of E1; explicit per-question preferences remain the only AUTO_DECIDE
+source.
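Editor's sketch (not part of the commit): the added TODOS.md text contrasts two gates. The display gate is fully specified by the quoted condition, so it can be written directly as a predicate. The E1 floor below is only a necessary condition: the card's "stable" criterion is not defined in this diff and is deliberately not modeled. The `ProfileStats` interface and both function names are invented for the example.

```typescript
// Sketch only: field names mirror the condition quoted in TODOS.md.
interface ProfileStats {
  sample_size: number;
  skills_covered: number;
  question_ids_covered: number;
  days_span: number;
}

// Diversity-display gate: decides whether /plan-tune renders the inferred column.
function passesDiversityDisplayGate(s: ProfileStats): boolean {
  return (
    s.sample_size >= 20 &&
    s.skills_covered >= 3 &&
    s.question_ids_covered >= 8 &&
    s.days_span >= 7
  );
}

// Necessary (not sufficient) floor for E1 promotion: 90+ days across 3+ skills.
// Assumption: "stable" would need extra checks that are not shown here.
function meetsE1DurationFloor(s: ProfileStats): boolean {
  return s.days_span >= 90 && s.skills_covered >= 3;
}

const twoWeeksIn: ProfileStats = {
  sample_size: 42,
  skills_covered: 4,
  question_ids_covered: 11,
  days_span: 14,
};
console.log(passesDiversityDisplayGate(twoWeeksIn)); // prints true
console.log(meetsE1DurationFloor(twoWeeksIn)); // prints false
```

The example profile shows the gap the card describes: two weeks of varied data is enough to display inferred values but well short of the bar for behavioral adaptation.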
 
 ### E3 — `/plan-tune narrative` + `/plan-tune vibe`

autoplan/SKILL.md

Lines changed: 5 additions & 1 deletion
@@ -654,7 +654,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
 
 Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary][option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
 
-After answer, log best-effort:
+**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
+
+**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
+
+After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
 ```bash
 ~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"autoplan","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
 ```
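Editor's sketch (not part of the commit): the two conventions added in this hunk, the `<gstack-qid:{question_id}>` marker and the `(recommended)` label, imply a small parser on the hook side. The following shows how such parsing could be approached. The function names, the regexes, and the exact prose-fallback matching rule are assumptions. The parse order and the refuse-on-ambiguous behavior follow the added text.

```typescript
// Sketch only: not the actual question-preference-hook implementation.
const QID_MARKER = /<gstack-qid:([^>\s]+)>/;

// Appends the marker on a trailing line, one of the placements the text allows.
function embedQuestionId(question: string, questionId: string): string {
  return `${question}\n<gstack-qid:${questionId}>`;
}

// Pulls the marker out and returns the question text with the marker stripped.
function extractQuestionId(question: string): { questionId: string | null; text: string } {
  const match = QID_MARKER.exec(question);
  if (!match) return { questionId: null, text: question }; // observed-only
  return { questionId: match[1], text: question.replace(QID_MARKER, "").trim() };
}

// Returns the single recommended option label, or null to refuse auto-deciding.
function findRecommended(optionLabels: string[], questionText = ""): string | null {
  const labeled = optionLabels.filter((label) => /\(recommended\)/i.test(label));
  if (labeled.length === 1) return labeled[0];
  if (labeled.length > 1) return null; // two labels = ambiguous = refuse
  // Fallback: "Recommendation: X" prose, matched against option labels.
  const prose = /Recommendation:\s*(.+)/i.exec(questionText);
  if (!prose) return null;
  const wanted = prose[1].trim().toLowerCase();
  const matches = optionLabels.filter((label) => label.trim().toLowerCase() === wanted);
  return matches.length === 1 ? matches[0] : null;
}

const rendered = embedQuestionId("Should I bundle this fix?", "autoplan-bundle-fix");
console.log(extractQuestionId(rendered).questionId); // prints "autoplan-bundle-fix"
console.log(findRecommended(["Bundle (recommended)", "Split"])); // prints "Bundle (recommended)"
console.log(findRecommended(["A (recommended)", "B (recommended)"])); // prints null
```

Refusing on any ambiguity means a malformed question falls back to asking the user, which is the safe direction for an enforcement hook.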
