Commit ccd9507

garrytan and claude committed
chore: bump VERSION to 1.50.0.0 + plan-tune cathedral CHANGELOG
Plan-tune cathedral T17. Bumps VERSION 1.49.0.0 → 1.50.0.0 (MINOR per CLAUDE.md scale-aware rule: this is substantial new capability, with 8 layers, ~3000 LOC, 96 new tests, deterministic substrate + dream-cycle distillation).

CHANGELOG entry follows the release-summary format from CLAUDE.md:
- Two-line bold headline naming what changed for users (deterministic capture, binding preferences, free-text memory loop)
- Lead paragraph: before/after framed concretely (zero events captured → every fire, agent-honored → hook-enforced, declared profile → injected context, regex backfill → structured JSONL parser)
- Two tables: metric deltas + layer/where-it-lives. Real numbers (96 tests, ~$0.01 per distill, 3/day cap), no AI vocabulary, no em dashes.
- "What this means for solo builders" close: ties dream cycle to the compounding loop and points to ./setup as the on-ramp.
- Itemized Added/Changed/For contributors sections list every layer's surfaces with file paths.

Also:
- Refreshed test/fixtures/golden/{claude,codex,factory}-ship-SKILL.md to match the regenerated ship templates (Step 21 nudge added).
- Rebased plan-tune entry in parity-baseline-v1.47.0.0.json from 51717 → 64017 bytes with a baseline_note explaining the cathedral T13 expansion. Documents that the new Dream cycle, Recent auto-decisions, Audit unmarked, Dream cycle review/distill sections are load-bearing, not bloat. Without the rebase, the size-budget gate fails, and the cathedral's whole point is making /plan-tune do more, not less.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent d50d474 commit ccd9507

7 files changed

Lines changed: 226 additions & 12 deletions

CHANGELOG.md

Lines changed: 68 additions & 0 deletions
@@ -1,5 +1,73 @@
 # Changelog
 
+## [1.50.0.0] - 2026-05-27
+
+## **`/plan-tune` settings actually do something now. Hooks make capture deterministic, preferences binding, and free-text answers loop back as memory.**
+
+Before this release, plan-tune was a profile inspector with a hollow substrate. Every gstack skill told the agent "log this AskUserQuestion fire," and in weeks of dogfood, zero events ever landed. Preferences were agent-honored convention. Declared profile dimensions sat in a JSON file doing nothing. After this release: a PostToolUse hook captures every AUQ fire whether the agent remembers to log or not. A PreToolUse hook substitutes auto-decided answers when you've set `never-ask`. Free-text "Other" responses get dream-cycled through Claude into structured proposals you approve, then injected into future related questions as inline context. Codex sessions are backfilled by a structured-JSONL parser, not regex on transcript text.
+
+The cathedral lands behind one explicit consent prompt at `./setup` (with diff preview, backup, and one-command rollback) and stays on once installed.
+
+### The numbers that matter
+
+Measured against the existing v1.49 substrate. Reproduce with `bun test test/plan-tune-gates.test.ts test/question-log-hook.test.ts test/question-preference-hook.test.ts test/memory-cache-injection.test.ts test/distill-free-text.test.ts test/distill-apply.test.ts test/declared-annotation.test.ts test/gstack-codex-session-import.test.ts test/skill-e2e-plan-tune-cathedral.test.ts`.
+
+| Metric | Before (v1.49.0.0) | After (v1.50.0.0) | Δ |
+|---|---|---|---|
+| AUQ events captured per session | 0 (agent convention) | every fire (hook) | substrate works |
+| `never-ask` preferences enforced | 0% (agent convention) | 100% (hook + deny+reason) | actually binds |
+| Declared profile annotations | 0 / week | every signal_key match | profile renders |
+| Dream-cycle memory persistence | 0 (no mechanism) | per-project + gbrain mirror | cross-project recall |
+| Codex session backfill | none (regex idea) | structured JSONL parser | future-proof |
+| Per-PR test cost added | $0 | $0 (deterministic; no claude -p) | gate-tier safe |
+| Unit + E2E tests added | — | 96 tests / 8 new files | green |
+
+| Layer | What it does | Where it lives |
+|---|---|---|
+| 1 — Capture | PostToolUse hook → question-log.jsonl with dedup + async derive | hosts/claude/hooks/question-log-hook.ts |
+| 2 — Enforcement | PreToolUse hook → deny+reason with auto-decided option | hosts/claude/hooks/question-preference-hook.ts |
+| 3 — Annotation | declared profile → kebab signal_key → plain-English phrase | scripts/declared-annotation.ts |
+| 4 — Surfaces | host-aware Stats, Recent auto-decisions, Audit unmarked | plan-tune/SKILL.md.tmpl |
+| 5 — Discoverability | setup hook-install prompt + post-ship nudge | setup, ship/SKILL.md.tmpl |
+| 6 — Tests | 5 E2E scenarios, all gate tier, $0 cost | test/skill-e2e-plan-tune-cathedral.test.ts |
+| 7 — Installation | schema-aware bin: PreToolUse + PostToolUse, backup + rollback | bin/gstack-settings-hook |
+| 8 — Dream cycle | Anthropic SDK distill + gbrain put_page + memory injection | bin/gstack-distill-* + Layer 2 inject |
+
+Highest-impact number is the third row: declared profile annotations now render inline before every AUQ that matches a signal_key. Set `declared.scope_appetite = 0.85` once during /plan-tune setup, and every "should I bundle this fix?" question shows up with "(your profile leans complete-implementation)" on the recommended option. The same loop applies to verbose-vs-terse, consult-vs-delegate, and ship-now-vs-get-the-design-right.
+
+### What this means for solo builders
+
+The feature compounds now. Each AskUserQuestion you answer "Other" with free text gets captured by the hook, batched into proposals by `gstack-distill-free-text` (3/day cap, ~$0.01 per run), reviewed via `/plan-tune distill`, and applied as either a `never-ask` preference, a declared-profile nudge, or a reusable memory nugget that routes to your gbrain (when configured) and reappears as context the next time a related question fires. The dream cycle is the unlock — without it, every nuanced answer evaporated after one turn. Now they accumulate. Run `./setup` and accept the hook-install prompt to turn it on, then `/plan-tune` whenever you want to see what your profile knows about you.
+
+### Itemized changes
+
+**Added**
+- `hosts/claude/hooks/question-log-hook` — PostToolUse hook, matcher covers `AskUserQuestion` + `mcp__*__AskUserQuestion`. Captures every AUQ fire with marker-first question_id (D18), hash-fallback observed-only, source-tagged.
+- `hosts/claude/hooks/question-preference-hook` — PreToolUse hook with `(recommended)`-label parser, refuse-on-ambiguous (D2 safety), project-then-global preference precedence (D8), one-way safety override. Auto-decided events logged from the hook itself since deny prevents PostToolUse from firing.
+- `scripts/declared-annotation.ts` — `getDeclaredAnnotation(signal_key)` with kebab→underscore namespace mapping. Returns null in the middle band, plain-English phrase in strong bands (>= 0.7 or <= 0.3).
+- `bin/gstack-codex-session-import` — structured JSONL parser for `~/.codex/sessions/`. Marker-first recovery with pattern fallback, source-tagged `codex-import-marker` / `codex-import-pattern`.
+- `bin/gstack-distill-free-text` — Layer 8 dream cycle distiller. Anthropic SDK direct call (Haiku 4.5), 3/day rate cap per slug (D7), cumulative cost log, sync-or-background execution context (D14).
+- `bin/gstack-distill-apply` — applies one approved proposal to its surface (preference / declared-nudge / memory-nugget), with optional `--gbrain-published true` flag.
+- `setup` — interactive consent prompt for hook installation with diff preview, backup, one-command rollback. Marker-gated so users are asked at most once.
+- `ship/SKILL.md.tmpl` Step 21 — post-success plan-tune nudge, marker-gated for at-most-once.
+- `docs/spikes/claude-code-hook-mutation.md` + `docs/spikes/codex-session-format.md` — Phase 1 spike outputs that pinned protocol contracts before implementation.
+- 96 new tests across 8 files: STATE_ROOT honoring, v1.49 gates, settings-hook schema-aware ops, both hooks, declared-annotation, codex import, distill bin, distill apply, memory injection, 5 cathedral E2E scenarios.
+
+**Changed**
+- `bin/gstack-settings-hook` schema-aware rewrite: PreToolUse + PostToolUse registration with `_gstack_source` tag for dedup, `add-event` / `remove-source` / `diff-event` / `rollback` / `list-sources` subcommands. Legacy `add`/`remove` SessionStart shape preserved verbatim.
+- `bin/gstack-question-log` — accepts source, tool_use_id, free_text; composite dedup on (source, tool_use_id) across last 100 lines (D3); async-fires `gstack-developer-profile --derive` after every successful write (D17 — without this, sample_size stayed 0).
+- Three bins (`gstack-question-log`, `gstack-question-preference`, `gstack-developer-profile`) + `gstack-config` now honor `GSTACK_STATE_ROOT` env var as highest-priority override (D16 Codex correction — without this, isolation tests silently wrote to real ~/.gstack).
+- `scripts/resolvers/question-tuning.ts` preamble — added marker-embedding convention (`<gstack-qid:{id}>`) and `(recommended)` label convention. Hook enforcement gates on marker presence.
+- `scripts/question-registry.ts` — added `signal_key: 'decision-autonomy'` to `land-and-deploy-merge-confirm` and `land-and-deploy-rollback` so the autonomy dimension has a real signal source.
+- `scripts/psychographic-signals.ts` — added `decision-autonomy` signal map.
+- `plan-tune/SKILL.md.tmpl` — new sections (Recent auto-decisions, Audit unmarked, Dream cycle review, Dream cycle distill); host-aware Stats with source breakdown + MARKED %; Step 0 routing extended with dream-cycle gate.
+- `bin/gstack-uninstall` — also cleans up `plan-tune-cathedral`-tagged hooks during uninstall.
+
+**For contributors**
+- 4 cross-model tension resolutions during eng review locked in: project preferences win over global (D8), hash IDs are observed-only never preference keys (D18), AUQ matcher covers MCP variants (Codex correction), enforcement uses `permissionDecision: "deny"` + reason instead of `"allow"` + `updatedInput` until the AUQ input shape is verified against real Claude Code (T6 conservative path).
+- Plan-review preamble byte budget ratcheted 39000 → 40000 in `test/gen-skill-docs.test.ts` (~700 bytes added by the marker convention).
+- 9 Codex outside-voice findings folded directly without re-prompting (matcher correction, derive wiring, settings.json consent, signal_key namespace, etc.).
+
 ## [1.49.0.0] - 2026-05-26
 
 ## **`/plan-tune` learns to ask for consent before logging, and runs the 5-question setup automatically when your profile is empty.**
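
The changelog describes `getDeclaredAnnotation(signal_key)` as a kebab-to-underscore lookup that returns a phrase only in the strong bands (>= 0.7 or <= 0.3) and null in between. A minimal sketch of that band logic follows, assuming a flat profile object. The function name `annotationFor`, the `PHRASES` table, the kebab key `scope-appetite`, and the low-band phrase are all hypothetical; only the thresholds and the high-band phrase for `scope_appetite` come from the changelog text.

```typescript
// Hypothetical sketch of the band logic; not the actual scripts/declared-annotation.ts.
type DeclaredProfile = Record<string, number>;

// Illustrative phrase table. Only the "high" phrase appears in the changelog;
// the "low" phrase is an invented placeholder.
const PHRASES: Record<string, { high: string; low: string }> = {
  scope_appetite: {
    high: "your profile leans complete-implementation",
    low: "your profile leans minimal-scope",
  },
};

function annotationFor(signalKey: string, declared: DeclaredProfile): string | null {
  // Map the kebab-case signal key onto the underscore profile namespace.
  const dimension = signalKey.replace(/-/g, "_");
  const value = declared[dimension];
  const phrases = PHRASES[dimension];
  if (value === undefined || phrases === undefined) return null;
  if (value >= 0.7) return phrases.high; // strong high band
  if (value <= 0.3) return phrases.low; // strong low band
  return null; // middle band: no annotation
}
```

With `declared.scope_appetite = 0.85`, as in the changelog example, this sketch returns the complete-implementation phrase; at 0.5 it returns null.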

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-1.49.0.0
+1.50.0.0

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "gstack",
-  "version": "1.49.0.0",
+  "version": "1.50.0.0",
   "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
   "license": "MIT",
   "type": "module",

test/fixtures/golden/claude-ship-SKILL.md

Lines changed: 28 additions & 1 deletion
@@ -650,7 +650,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
 
 Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary][option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
 
-After answer, log best-effort:
+**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
+
+**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
+
+After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
 ```bash
 ~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"ship","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
 ```
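
The hunk above relies on dedup over `(source, tool_use_id)` to absorb double-writes, and the changelog says the check looks at the last 100 lines of the log. The following is a minimal sketch of such a composite-key check over JSONL lines, assuming the log has already been read into memory. The names `LoggedEvent` and `isDuplicate` are hypothetical and this is not the actual `bin/gstack-question-log` code.

```typescript
// Hypothetical sketch of composite dedup over a JSONL tail.
interface LoggedEvent {
  source: string;
  tool_use_id: string;
}

function isDuplicate(
  existingLines: string[],
  candidate: LoggedEvent,
  window = 100, // the changelog mentions "last 100 lines"
): boolean {
  // Only the most recent `window` lines are considered.
  const recent = existingLines.slice(-window);
  return recent.some((line) => {
    try {
      const event = JSON.parse(line) as Partial<LoggedEvent>;
      // Both parts of the composite key must match.
      return event.source === candidate.source && event.tool_use_id === candidate.tool_use_id;
    } catch {
      return false; // skip malformed lines rather than failing the write
    }
  });
}
```

A bounded window keeps the check cheap on long logs, at the cost of missing a duplicate that arrives more than 100 events after the original.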
@@ -3082,6 +3086,29 @@ This step is automatic — never skip it, never ask for confirmation.
 
 ---
 
+## Step 21: Plan-tune discoverability nudge (first-successful-ship only)
+
+Plan-tune cathedral T15. After a successful ship, surface /plan-tune once
+per machine. Single line, non-blocking, marker-gated so it never re-fires.
+
+```bash
+_NUDGE_MARKER="$HOME/.gstack/.plan-tune-nudge-shown"
+_QT=$(~/.claude/skills/gstack/bin/gstack-config get question_tuning 2>/dev/null || echo "false")
+if [ ! -f "$_NUDGE_MARKER" ] && [ "$_QT" = "false" ]; then
+  echo ""
+  echo "gstack can learn from your AskUserQuestion answers. Run /plan-tune to opt in"
+  echo "— it captures which prompts you find valuable vs noisy and (with hooks installed)"
+  echo "auto-decides your never-ask preferences."
+  touch "$_NUDGE_MARKER"
+fi
+```
+
+If the marker exists, OR question_tuning is already on, the nudge is a
+no-op. The marker guarantees at-most-once per machine. To re-enable:
+`rm ~/.gstack/.plan-tune-nudge-shown` before next ship.
+
+---
+
 ## Important Rules
 
 - **Never skip tests.** If tests fail, stop.
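
The fixture text in this file introduces two conventions the PreToolUse hook depends on: a `<gstack-qid:{question_id}>` marker in the question text, and a `(recommended)` suffix on exactly one option label, with a refusal when more than one label carries it. A rough sketch of how a parser for both could look is below. The function names and regular expressions are assumptions, and the "Recommendation: X" prose fallback mentioned in the fixture is deliberately left out.

```typescript
// Hypothetical sketch; not the actual question-preference-hook implementation.
const QID_MARKER = /<gstack-qid:([^>]+)>/;
const RECOMMENDED_SUFFIX = /\(recommended\)\s*$/;

// Returns the embedded question_id, or null when no marker is present
// (the fixture says such questions are treated as observed-only).
function extractQuestionId(questionText: string): string | null {
  const match = QID_MARKER.exec(questionText);
  return match?.[1] ?? null;
}

// Returns the index of the single recommended option, or null when there
// are zero or several, which mirrors the refuse-on-ambiguous rule.
function pickRecommended(labels: string[]): number | null {
  let found: number | null = null;
  for (let i = 0; i < labels.length; i++) {
    if (RECOMMENDED_SUFFIX.test(labels[i] ?? "")) {
      if (found !== null) return null; // second hit: ambiguous, refuse
      found = i;
    }
  }
  return found;
}
```

Returning null for both "no marker" and "ambiguous recommendation" lets a caller fall through to asking the user normally, which matches the conservative stance the fixture describes.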

test/fixtures/golden/codex-ship-SKILL.md

Lines changed: 61 additions & 2 deletions
@@ -324,7 +324,36 @@ Effort both-scales: when an option involves effort, label both human-team and CC
 
 Net line closes the tradeoff. Per-skill instructions may add stricter rules.
 
-12. **Non-ASCII characters — write directly, never \u-escape.** When any
+### Handling 5+ options — split, never drop
+
+AskUserQuestion caps every call at **4 options**. With 5+ real options, NEVER
+drop, merge, or silently defer one to fit. Pick a compliant shape:
+
+- **Batch into ≤4-groups** — for coherent alternatives (e.g. version bumps,
+layout variants). One call, 5th surfaced only if first 4 don't fit.
+- **Split per-option** — for independent scope items (e.g. "ship E1..E6?").
+Fire N sequential calls, one per option. Default to this when unsure.
+
+Per-option call shape: `D<N>.k` header (e.g. D3.1..D3.5), ELI10 per option,
+Recommendation, kind-note (no completeness score — Include/Defer/Cut/Hold are
+decision actions), and 4 buckets:
+**A) Include**, **B) Defer**, **C) Cut**, **D) Hold** (stop chain, discuss).
+
+After the chain, fire `D<N>.final` to validate the assembled set (reprompt
+dependency conflicts) and confirm shipping it. Use `D<N>.revise-<k>` to
+revise one option without re-running the chain.
+
+For N>6, fire a `D<N>.0` meta-AskUserQuestion first (proceed / narrow / batch).
+
+question_ids for split chains: `<skill>-split-<option-slug>` (kebab-case ASCII,
+≤64 chars, `-2`/`-3` suffix on collision). The runtime checker
+(`bin/gstack-question-preference`) refuses `never-ask` on any `*-split-*` id,
+so split chains are never AUTO_DECIDE-eligible — the user's option set is sacred.
+
+**Full rule + worked examples + Hold/dependency semantics:** see
+`docs/askuserquestion-split.md` in the gstack repo. Read on demand when N>4.
+
+**Non-ASCII characters — write directly, never \u-escape.** When any
 string field (question, option label, option description) contains
 Chinese (繁體/簡體), Japanese, Korean, or other non-ASCII text, emit
 the literal UTF-8 characters in the JSON string. **Never escape them
@@ -357,6 +386,9 @@ Before calling AskUserQuestion, verify:
 - [ ] Net line closes the decision
 - [ ] You are calling the tool, not writing prose
 - [ ] Non-ASCII characters (CJK / accents) written directly, NOT \u-escaped
+- [ ] If you had 5+ options, you split (or batched into ≤4-groups) — did NOT drop any
+- [ ] If you split, you checked dependencies between options before firing the chain
+- [ ] If a per-option Hold fires, you stopped the chain immediately (didn't queue)
 
 
 ## Artifacts Sync (skill start)
@@ -604,7 +636,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
 
 Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `$GSTACK_BIN/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary][option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
 
-After answer, log best-effort:
+**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
+
+**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
+
+After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
 ```bash
 $GSTACK_BIN/gstack-question-log '{"skill":"ship","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
 ```
@@ -2660,6 +2696,29 @@ This step is automatic — never skip it, never ask for confirmation.
 
 ---
 
+## Step 21: Plan-tune discoverability nudge (first-successful-ship only)
+
+Plan-tune cathedral T15. After a successful ship, surface /plan-tune once
+per machine. Single line, non-blocking, marker-gated so it never re-fires.
+
+```bash
+_NUDGE_MARKER="$HOME/.gstack/.plan-tune-nudge-shown"
+_QT=$($GSTACK_ROOT/bin/gstack-config get question_tuning 2>/dev/null || echo "false")
+if [ ! -f "$_NUDGE_MARKER" ] && [ "$_QT" = "false" ]; then
+  echo ""
+  echo "gstack can learn from your AskUserQuestion answers. Run /plan-tune to opt in"
+  echo "— it captures which prompts you find valuable vs noisy and (with hooks installed)"
+  echo "auto-decides your never-ask preferences."
+  touch "$_NUDGE_MARKER"
+fi
+```
+
+If the marker exists, OR question_tuning is already on, the nudge is a
+no-op. The marker guarantees at-most-once per machine. To re-enable:
+`rm ~/.gstack/.plan-tune-nudge-shown` before next ship.
+
+---
+
 ## Important Rules
 
 - **Never skip tests.** If tests fail, stop.
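
The split-chain rule in this file specifies question ids of the form `<skill>-split-<option-slug>`: kebab-case ASCII, at most 64 characters, with a `-2`/`-3` suffix on collision. One way to generate such ids is sketched below. The function name `splitQuestionId`, the slugging regex, and the choice to truncate before appending the suffix are all assumptions, since the fixture does not say how over-long slugs are shortened.

```typescript
// Hypothetical sketch of split-chain id generation.
const MAX_ID_LENGTH = 64;

function splitQuestionId(skill: string, optionLabel: string, taken: Set<string>): string {
  // Kebab-case ASCII slug: lowercase, collapse non-alphanumerics to "-", trim dashes.
  const slug = optionLabel
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  const base = `${skill}-split-${slug}`.slice(0, MAX_ID_LENGTH).replace(/-+$/, "");

  let id = base;
  // On collision, append -2, -3, ... while staying within the length cap.
  for (let n = 2; taken.has(id); n++) {
    const suffix = `-${n}`;
    id = base.slice(0, MAX_ID_LENGTH - suffix.length).replace(/-+$/, "") + suffix;
  }
  taken.add(id);
  return id;
}
```

For example, calling this twice with skill `ship` and label `Add E2E tests!` against the same set yields `ship-split-add-e2e-tests` and then `ship-split-add-e2e-tests-2`.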
