Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
9cc211f
feat(plan-tune): explicit-consent surface + setup gate for question_t…
garrytan May 27, 2026
1f18c5a
chore: bump version and changelog (v1.49.0.0)
garrytan May 27, 2026
3d5ae48
Merge branch 'main' into garrytan/enable-plan-tune
garrytan May 27, 2026
b5aad4a
Merge branch 'main' into garrytan/enable-plan-tune (catch up to v1.48…
garrytan May 27, 2026
065a8b1
feat(bins): honor GSTACK_STATE_ROOT override for test isolation
garrytan May 27, 2026
9e0e185
test(plan-tune): regression coverage for v1.49 consent + setup gates
garrytan May 27, 2026
6dc1138
docs(spikes): Claude hook mutation + Codex session format
garrytan May 27, 2026
2147532
feat(settings-hook): schema-aware PreToolUse/PostToolUse registration
garrytan May 27, 2026
a8a0447
feat(hooks): PostToolUse capture hook for AskUserQuestion
garrytan May 27, 2026
873fccc
feat(hooks): PreToolUse enforcement hook for AskUserQuestion preferences
garrytan May 27, 2026
fa590c4
feat(scripts): declared-annotation helper + autonomy signal_key wiring
garrytan May 27, 2026
dbe4844
feat(setup): install plan-tune cathedral hooks with explicit consent UX
garrytan May 27, 2026
990c9d5
feat(bin): gstack-codex-session-import — structured Codex transcript …
garrytan May 27, 2026
e02b1b1
feat(bin): gstack-distill-free-text — Layer 8 dream cycle distiller
garrytan May 27, 2026
241be5c
feat(bin): gstack-distill-apply — apply distillation proposals with g…
garrytan May 27, 2026
d3aceb6
feat(hooks): Layer 8 memory injection via per-session cache
garrytan May 27, 2026
66f27cf
feat(plan-tune): SKILL.md surfaces for cathedral T13
garrytan May 27, 2026
e82efc0
feat(preamble): inject <gstack-qid:...> marker convention into questi…
garrytan May 27, 2026
3eab893
feat(ship): plan-tune discoverability nudge after first successful ship
garrytan May 27, 2026
d50d474
test(plan-tune): 5 cathedral E2E scenarios + touchfile registration
garrytan May 27, 2026
ccd9507
chore: bump VERSION to 1.50.0.0 + plan-tune cathedral CHANGELOG
garrytan May 27, 2026
8ebf24b
chore: bump VERSION 1.50.0.0 → 1.52.0.0 (queue collision with #1742)
garrytan May 27, 2026
401a325
Merge remote-tracking branch 'origin/main' into garrytan/enable-plan-…
garrytan May 27, 2026
e9ad752
chore: cap audit — remove distill rate cap, loosen size/budget gates
garrytan May 29, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
91 changes: 91 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,73 @@
# Changelog

## [1.52.0.0] - 2026-05-27

## **`/plan-tune` settings actually do something now. Hooks make capture deterministic, preferences binding, and free-text answers loop back as memory.**

Before this release, plan-tune was a profile inspector with a hollow substrate. Every gstack skill told the agent "log this AskUserQuestion fire," and in weeks of dogfood, zero events ever landed. Preferences were agent-honored convention. Declared profile dimensions sat in a JSON file doing nothing. After this release: a PostToolUse hook captures every AUQ fire whether the agent remembers to log or not. A PreToolUse hook substitutes auto-decided answers when you've set `never-ask`. Free-text "Other" responses get dream-cycled through Claude into structured proposals you approve, then injected into future related questions as inline context. Codex sessions are backfilled by a structured-JSONL parser, not regex on transcript text.

The cathedral lands behind one explicit consent prompt at `./setup` (with diff preview, backup, and one-command rollback) and stays on once installed.

### The numbers that matter

Measured against the existing v1.49 substrate. Reproduce with `bun test test/plan-tune-gates.test.ts test/question-log-hook.test.ts test/question-preference-hook.test.ts test/memory-cache-injection.test.ts test/distill-free-text.test.ts test/distill-apply.test.ts test/declared-annotation.test.ts test/gstack-codex-session-import.test.ts test/skill-e2e-plan-tune-cathedral.test.ts`.

| Metric | Before (v1.49.0.0) | After (v1.52.0.0) | Δ |
|---|---|---|---|
| AUQ events captured per session | 0 (agent convention) | every fire (hook) | substrate works |
| `never-ask` preferences enforced | 0% (agent convention) | 100% (hook + deny+reason) | actually binds |
| Declared profile annotations | 0 / week | every signal_key match | profile renders |
| Dream-cycle memory persistence | 0 (no mechanism) | per-project + gbrain mirror | cross-project recall |
| Codex session backfill | none (regex idea) | structured JSONL parser | future-proof |
| Per-PR test cost added | $0 | $0 (deterministic; no claude -p) | gate-tier safe |
| Unit + E2E tests added | — | 96 tests / 8 new files | green |

| Layer | What it does | Where it lives |
|---|---|---|
| 1 — Capture | PostToolUse hook → question-log.jsonl with dedup + async derive | hosts/claude/hooks/question-log-hook.ts |
| 2 — Enforcement | PreToolUse hook → deny+reason with auto-decided option | hosts/claude/hooks/question-preference-hook.ts |
| 3 — Annotation | declared profile → kebab signal_key → plain-English phrase | scripts/declared-annotation.ts |
| 4 — Surfaces | host-aware Stats, Recent auto-decisions, Audit unmarked | plan-tune/SKILL.md.tmpl |
| 5 — Discoverability | setup hook-install prompt + post-ship nudge | setup, ship/SKILL.md.tmpl |
| 6 — Tests | 5 E2E scenarios, all gate tier, $0 cost | test/skill-e2e-plan-tune-cathedral.test.ts |
| 7 — Installation | schema-aware bin: PreToolUse + PostToolUse, backup + rollback | bin/gstack-settings-hook |
| 8 — Dream cycle | Anthropic SDK distill + gbrain put_page + memory injection | bin/gstack-distill-* + Layer 2 inject |

Highest-impact number is the third row: declared profile annotations now render inline before every AUQ that matches a signal_key. Set `declared.scope_appetite = 0.85` once during /plan-tune setup, and every "should I bundle this fix?" question shows up with "(your profile leans complete-implementation)" on the recommended option. The same loop applies to verbose-vs-terse, consult-vs-delegate, and ship-now-vs-get-the-design-right.

### What this means for solo builders

The feature compounds now. Each AskUserQuestion you answer "Other" with free text gets captured by the hook, batched into proposals by `gstack-distill-free-text` (3/day cap, ~$0.01 per run), reviewed via `/plan-tune distill`, and applied as either a `never-ask` preference, a declared-profile nudge, or a reusable memory nugget that routes to your gbrain (when configured) and reappears as context the next time a related question fires. The dream cycle is the unlock — without it, every nuanced answer evaporated after one turn. Now they accumulate. Run `./setup` and accept the hook-install prompt to turn it on, then `/plan-tune` whenever you want to see what your profile knows about you.

### Itemized changes

**Added**
- `hosts/claude/hooks/question-log-hook` — PostToolUse hook, matcher covers `AskUserQuestion` + `mcp__*__AskUserQuestion`. Captures every AUQ fire with marker-first question_id (D18), hash-fallback observed-only, source-tagged.
- `hosts/claude/hooks/question-preference-hook` — PreToolUse hook with `(recommended)`-label parser, refuse-on-ambiguous (D2 safety), project-then-global preference precedence (D8), one-way safety override. Auto-decided events logged from the hook itself since deny prevents PostToolUse from firing.
- `scripts/declared-annotation.ts` — `getDeclaredAnnotation(signal_key)` with kebab→underscore namespace mapping. Returns null in the middle band, plain-English phrase in strong bands (>= 0.7 or <= 0.3).
- `bin/gstack-codex-session-import` — structured JSONL parser for `~/.codex/sessions/`. Marker-first recovery with pattern fallback, source-tagged `codex-import-marker` / `codex-import-pattern`.
- `bin/gstack-distill-free-text` — Layer 8 dream cycle distiller. Anthropic SDK direct call (Haiku 4.5), 3/day rate cap per slug (D7), cumulative cost log, sync-or-background execution context (D14).
- `bin/gstack-distill-apply` — applies one approved proposal to its surface (preference / declared-nudge / memory-nugget), with optional `--gbrain-published true` flag.
- `setup` — interactive consent prompt for hook installation with diff preview, backup, one-command rollback. Marker-gated so users are asked at most once.
- `ship/SKILL.md.tmpl` Step 21 — post-success plan-tune nudge, marker-gated for at-most-once.
- `docs/spikes/claude-code-hook-mutation.md` + `docs/spikes/codex-session-format.md` — Phase 1 spike outputs that pinned protocol contracts before implementation.
- 96 new tests across 8 files: STATE_ROOT honoring, v1.49 gates, settings-hook schema-aware ops, both hooks, declared-annotation, codex import, distill bin, distill apply, memory injection, 5 cathedral E2E scenarios.

**Changed**
- `bin/gstack-settings-hook` schema-aware rewrite: PreToolUse + PostToolUse registration with `_gstack_source` tag for dedup, `add-event` / `remove-source` / `diff-event` / `rollback` / `list-sources` subcommands. Legacy `add`/`remove` SessionStart shape preserved verbatim.
- `bin/gstack-question-log` — accepts source, tool_use_id, free_text; composite dedup on (source, tool_use_id) across last 100 lines (D3); async-fires `gstack-developer-profile --derive` after every successful write (D17 — without this, sample_size stayed 0).
- Three bins (`gstack-question-log`, `gstack-question-preference`, `gstack-developer-profile`) + `gstack-config` now honor `GSTACK_STATE_ROOT` env var as highest-priority override (D16 Codex correction — without this, isolation tests silently wrote to real ~/.gstack).
- `scripts/resolvers/question-tuning.ts` preamble — added marker-embedding convention (`<gstack-qid:{id}>`) and `(recommended)` label convention. Hook enforcement gates on marker presence.
- `scripts/question-registry.ts` — added `signal_key: 'decision-autonomy'` to `land-and-deploy-merge-confirm` and `land-and-deploy-rollback` so the autonomy dimension has a real signal source.
- `scripts/psychographic-signals.ts` — added `decision-autonomy` signal map.
- `plan-tune/SKILL.md.tmpl` — new sections (Recent auto-decisions, Audit unmarked, Dream cycle review, Dream cycle distill); host-aware Stats with source breakdown + MARKED %; Step 0 routing extended with dream-cycle gate.
- `bin/gstack-uninstall` — also cleans up `plan-tune-cathedral`-tagged hooks during uninstall.

**For contributors**
- 4 cross-model tension resolutions during eng review locked in: project preferences win over global (D8), hash IDs are observed-only never preference keys (D18), AUQ matcher covers MCP variants (Codex correction), enforcement uses `permissionDecision: "deny"` + reason instead of `"allow"` + `updatedInput` until the AUQ input shape is verified against real Claude Code (T6 conservative path).
- Plan-review preamble byte budget ratcheted 39000 → 40000 in `test/gen-skill-docs.test.ts` (~700 bytes added by the marker convention).
- 9 Codex outside-voice findings folded directly without re-prompting (matcher correction, derive wiring, settings.json consent, signal_key namespace, etc.).

## [1.51.0.0] - 2026-05-27

## **Long-running browser sessions hold flat RSS on the Bun side. `$B memory` gives every future OOM receipts instead of a screenshot.** Four CDP-resource leak classes closed and pinned with tripwires; a structured diagnostic surfaces Bun heap + per-tab JS heap + Chromium process tree + bounded buffer sizes in real time.
Expand Down Expand Up @@ -53,6 +121,29 @@ The next time you leave a gbrowser session running for days, the Bun side holds
- Coverage audit: 44% pre-diagnostic-tests → ~62% after adding the formatter coverage. Strong paths (CDP session lifecycle, body materialization, history cap, tab guardrail, SSE cleanup) all at 100% with invariant tests. Extension UI tests deferred (no extension test harness in this repo today).
- The CDP-session cleanup tripwire is the most reusable artifact here — any future addition of CDP work should route through the two helpers. Trying to call `newCDPSession` outside `cdp-bridge.ts` fails CI immediately with a pointer to the right helper.

## [1.49.0.0] - 2026-05-26

## **`/plan-tune` learns to ask for consent before logging, and runs the 5-question setup automatically when your profile is empty.**

Run `/plan-tune` the first time and you get an opt-in prompt. Accept and the 5-question wizard fills in your declared profile in about two minutes. Decline and `/plan-tune` never asks again. Contributors see a slightly different prompt explaining that local question-log data helps gstack calibrate, but the default is the same: off until you say yes.

If you already opted in via `gstack-config set question_tuning true` and skipped the wizard, the next `/plan-tune` runs just the 5-question setup so your profile actually has values.

Both flows write marker files in `~/.gstack/` so you're asked at most once per choice.

### Itemized changes

**Added**
- `/plan-tune` consent prompt with contributor-specific copy. Honored by `~/.gstack/.question-tuning-prompted` marker.
- `/plan-tune` setup gate. Catches `question_tuning: true` with empty `declared`. Honored by `~/.gstack/.declared-setup-prompted` marker.

**Changed**
- `TODOS.md` E1 dependency line aligned with the canonical 90-day gate in `docs/designs/PLAN_TUNING_V0.md`. The 7-day diversity gate is for displaying inferred values in `/plan-tune` output; the 90-day gate is for shipping behavior adaptation. Both gates documented inline in `plan-tune/SKILL.md.tmpl`.
- `TODOS.md` E1 substrate constraint: E1 adaptations land as advisory annotations on AskUserQuestion recommendations, not as runtime AUTO_DECIDE on inferred profile alone.

**For contributors**
- `plan-tune/SKILL.md` size budget override (50,123 → 52,963 bytes, ×1.06 vs v1.44.1 baseline). Reason logged to audit trail.

## [1.48.0.0] - 2026-05-26

## **Agents stop dropping AskUserQuestion options when there are 5+.** A new canonical preamble rule + runtime gate makes Conductor's 4-option cap a split-or-batch decision, not a silent trim.
Expand Down
19 changes: 18 additions & 1 deletion TODOS.md
Original file line number Diff line number Diff line change
Expand Up @@ -717,7 +717,24 @@ reads it yet.

**Effort:** L (human: ~1 week / CC: ~4h)
**Priority:** P0
**Depends on:** 2+ weeks of v1 dogfood, profile diversity check passing.
**Depends on:** **90+ days of v1 dogfood stable across 3+ skills** (per
`docs/designs/PLAN_TUNING_V0.md` §"Deferred to v2" E1 acceptance criteria).
Distinct from the lighter-weight diversity-display gate
(`sample_size >= 20 AND skills_covered >= 3 AND question_ids_covered >= 8
AND days_span >= 7`) used in /plan-tune to render the inferred column —
display is a UI affordance, promotion to E1 needs a much higher bar
because behavioral adaptation is consequential and hard to revert. Prior
versions of this card cited "2+ weeks" which conflicted with V0 — V0 wins.

**Substrate risk (Codex outside-voice, Phase A review 2026-05-26):** Generated
skill prose is agent-compliance-based. Tests can verify templates contain the
right reads of `~/.gstack/developer-profile.json` and the right decision
points, but tests cannot prove agents obey them at runtime. E1 ships
adaptations as **advisory annotations on AskUserQuestion recommendations**
("Recommended via your profile: <choice>") until there's a hard runtime
execution path. Do NOT gate any AUTO_DECIDE on inferred profile alone in v1
of E1; explicit per-question preferences remain the only AUTO_DECIDE
source.

### E3 — `/plan-tune narrative` + `/plan-tune vibe`

Expand Down
2 changes: 1 addition & 1 deletion VERSION
Original file line number Diff line number Diff line change
@@ -1 +1 @@
1.51.0.0
1.52.0.0
6 changes: 5 additions & 1 deletion autoplan/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -654,7 +654,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST

Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.

After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.

**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.

After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"autoplan","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
Expand Down
Loading
Loading