Merged
29 commits
fc29358
feat(brain): brain-cache-spec.ts — single source of truth for cache l…
garrytan May 27, 2026
446a4dc
feat(brain): gstack-core@1.0.0 schema pack (T1 / Phase 0)
garrytan May 27, 2026
237f4d8
feat(brain): gstack-brain-cache CLI (T2a) — core subcommands
garrytan May 27, 2026
9607a37
feat(brain): concurrent-refresh lockfile dedup (T15 / D3)
garrytan May 27, 2026
40e5dcf
feat(brain): salience privacy allowlist gate (T17 / D9)
garrytan May 27, 2026
7c9f1c2
feat(brain): bootstrap + list + purge subcommands (T2b / T18)
garrytan May 27, 2026
8f65b86
feat(brain): brain-aware planning resolvers + 3 new placeholders (T4)
garrytan May 27, 2026
5373bc3
feat(brain): gstack-config brain integration helpers (T5+T10+T16)
garrytan May 27, 2026
71ee115
feat(brain): wire 5 planning skill templates with BRAIN_* placeholder…
garrytan May 27, 2026
0c63591
feat(brain): setup-gbrain trust-policy step + sync-gbrain flags (T5b …
garrytan May 27, 2026
e884617
test(brain): schema migration + fence-block fallback + preflight budg…
garrytan May 27, 2026
3730235
docs(todos): brain-aware planning follow-ups (T11)
garrytan May 27, 2026
2d5d076
test(brain): bump schema-migration test timeout to 60s
garrytan May 27, 2026
065f290
Merge remote-tracking branch 'origin/main' into garrytan/upgrade-gsta…
garrytan May 27, 2026
c2c21d1
chore: bump version and changelog (v1.50.0.0)
garrytan May 27, 2026
9299ad1
chore: merge origin/main (v1.48.0.0 AskUserQuestion split)
garrytan May 27, 2026
4ab0f13
fix(test): tighten put_page regression pin to CLI subcommand
garrytan May 27, 2026
7939269
feat(brain): gstack-config gbrain-refresh subcommand
garrytan May 27, 2026
748047f
feat(brain): gen-skill-docs respects gbrain-detection override
garrytan May 27, 2026
17293cf
feat(brain): setup runs gbrain detection + conditional SKILL.md regen
garrytan May 27, 2026
b6b3fbc
refactor(brain): compress GBRAIN_* resolvers, move template prose to …
garrytan May 27, 2026
78313b4
feat(brain): wire SAVE_RESULTS for plan-design-review + plan-devex-re…
garrytan May 27, 2026
7c82ec0
test(brain): resolver compression + detection-override regression pins
garrytan May 27, 2026
33b0167
test(brain): fake-CLI agent-obedience E2E for /office-hours writeback
garrytan May 27, 2026
d9c5b15
test(brain): real PGLite round-trip E2E (matched-pair persistence)
garrytan May 27, 2026
181e457
chore(brain): touchfiles + TODOS + CHANGELOG for v1.50.0.0
garrytan May 27, 2026
49cae73
test(brain): tighten prompt + relax slug assertion in writeback E2E
garrytan May 27, 2026
10d5abb
chore: bump version to 1.51.1.0
garrytan May 29, 2026
5cbe109
Merge origin/main into garrytan/upgrade-gstack-gbrain-v1
garrytan May 29, 2026

CHANGELOG.md: 78 additions, 0 deletions
@@ -1,5 +1,83 @@
# Changelog

## [1.52.1.0] - 2026-05-27

## **Brain-aware planning lands. Five planning skills read structured context from any personal gbrain before asking — same questions, smarter answers, no token tax.**

`/office-hours`, `/plan-ceo-review`, `/plan-eng-review`, `/plan-design-review`, and `/plan-devex-review` now preflight a typed entity model from your gbrain (Wintermute, local PGLite, or any thin-client MCP) before their first AskUserQuestion. Reviews stop asking "what's the product?" / "who's the target user?" / "what was your prior scope call?" — that context loads from cached digests of typed `gstack/product`, `gstack/goal`, `gstack/developer-persona`, `gstack/brand`, `gstack/competitive-intel`, `gstack/skill-run`, `gstack/user-profile`, and `gstack/take` pages. The brain becomes a structured model of your product and your judgment patterns, not just a search index.

The unlock: every planning skill filters its recommendations through "what does the user actually want right now, what is this product, what have we decided before." That's the qualitative shift the Codex outside-voice review argued for: the brain telling reviews "this contradicts your January CEO plan" or "your developer persona digest says first-time CLI users; this plan adds 3 setup commands."

### The numbers that matter

Source: `bun test test/brain-cache-spec.test.ts test/skill-preflight-budget.test.ts` (verifies budgets statically) and a `bin/gstack-brain-cache get product` smoke test (verifies warm-hit latency).

| Surface | Before | After | Δ |
|---|---|---|---|
| Planning-skill cold-start tokens (preflight context) | 0 (asked everything) | 500–1500 tokens (warm hit) / 5–15 KB once-per-day (cold miss) | brain-as-model, not just search |
| MCP calls per skill invocation (warm hit) | n/a (no integration) | 0 (single disk read) | 95% path |
| MCP calls per skill invocation (cold miss) | n/a | 4–8 parallel calls, ~1–2s once | bounded |
| Autoplan (4 sequential skills) preflight cost | n/a | 1 cold-miss + 3 warm-hits via lockfile dedup | cold-miss refreshes cut from 4 to 1 |
| New typed brain page kinds | 0 | 8 (`gstack-core@1.0.0` schema pack) | first-class entity model |
| Per-endpoint trust policies | 0 (sync mode global only) | 1 per `sha8(MCP URL)` namespace, hash collision → sha16 | shared-brain safe |
| New gate-tier tests | 0 | 10 files / 111 assertions | every correctness path covered |
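The lockfile dedup in the Autoplan row can be illustrated with exclusive file creation. This is a minimal sketch, not the `bin/gstack-brain-cache` implementation: the helper names `tryAcquireLock` and `refreshOnce`, the 60s stale-lock timeout, and returning `"deduped"` immediately are all assumptions. A real refresher would likely have the losing caller wait for the winner or fall back to the stale-but-usable cache.

```typescript
import { openSync, closeSync, writeSync, unlinkSync, statSync } from "node:fs";

// Hypothetical sketch of a project-scoped refresh lock; not gstack's actual code.
// Returns a release function if this process now owns the lock, or null if
// another process is already refreshing.
export function tryAcquireLock(lockPath: string, staleMs = 60_000): (() => void) | null {
  try {
    // "wx" creates the file exclusively and throws EEXIST if it already exists.
    const fd = openSync(lockPath, "wx");
    writeSync(fd, String(process.pid));
    closeSync(fd);
    return () => {
      try { unlinkSync(lockPath); } catch { /* already removed */ }
    };
  } catch (err) {
    if ((err as { code?: string }).code !== "EEXIST") throw err;
    // A crashed refresher can leave the lock behind; reclaim it once it is stale.
    try {
      const ageMs = Date.now() - statSync(lockPath).mtimeMs;
      if (ageMs > staleMs) {
        unlinkSync(lockPath);
        return tryAcquireLock(lockPath, staleMs);
      }
    } catch { /* lock vanished or was reclaimed by someone else; treat as held */ }
    return null;
  }
}

// Run `refresh` only if no other dispatch holds the lock.
export function refreshOnce(lockPath: string, refresh: () => void): "refreshed" | "deduped" {
  const release = tryAcquireLock(lockPath);
  if (!release) return "deduped";
  try {
    refresh();
    return "refreshed";
  } finally {
    release();
  }
}
```

With four concurrent dispatches, only the first caller to create the lock file pays the cold-miss refresh; the other three see the lock and skip it.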

The cache layer keeps the brain integration honest: 95% of invocations are a single disk read at ~10–30ms; cold-miss pays a one-time ~1–2s tax that's deduplicated across concurrent autoplan dispatches via a project-scoped lockfile. Salience is filtered by an allowlist (`projects/`, `concepts/`, `gstack/`) before write so personal pages — family, therapy, reflection — never leak into work-flow planning prompts. The trust-policy primitive makes personal-brain auto-push safe and shared-brain reads conservative by default.
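A minimal sketch of the allowlist gate, assuming salience results arrive as objects with a `slug` field (a hypothetical shape). The prefixes mirror the defaults named above. The real constant is `SALIENCE_DEFAULT_ALLOWLIST` in `scripts/brain-cache-spec.ts`, and its exact matching rules may differ from this sketch.

```typescript
// Hypothetical local copy of the default allowlist prefixes.
const ALLOWED_PREFIXES: readonly string[] = ["projects/", "concepts/", "gstack/"];

// Hypothetical shape of one salience hit.
export interface SaliencePage {
  slug: string;
  score: number;
}

// Keep only pages whose slug starts with an allowlisted prefix.
export function filterSalience(
  pages: SaliencePage[],
  allowlist: readonly string[] = ALLOWED_PREFIXES,
): SaliencePage[] {
  return pages.filter((page) => {
    // Reject path traversal such as "projects/../personal/therapy".
    if (page.slug.split("/").includes("..")) return false;
    return allowlist.some((prefix) => page.slug.startsWith(prefix));
  });
}
```

Rejecting `..` segments is a defensive choice in this sketch. A plain `startsWith` check on its own would let `projects/../personal/...` through.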

### What this means for you

If you use planning skills today: every invocation gets sharper without you doing anything different. The skills ask fewer redundant questions and surface "this contradicts your Jan plan" / "your Feb TTHW benchmark was 2:15 vs the 5:30 baseline" / "tendency to under-expand on infra plans" — the brain doing the bookkeeping that your memory shouldn't have to.

If you use a remote MCP brain (Wintermute or your own): `/setup-gbrain` Step 9.5 asks the trust-policy question once per endpoint. Personal endpoint → `~/.gstack/` artifacts auto-push and calibration takes write back to your brain. Shared/team endpoint → reads only, prompts before writes, user-namespaced via federation sources or `users/<slug>/gstack/` prefix.
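The per-endpoint trust policy can be sketched as a config key derived from a short hash of the MCP URL, escalating to a longer hash on collision. Several details are assumptions: `sha8`/`sha16` are taken here as hex prefixes of SHA-256 (the changelog does not name the hash function), and `endpointHash`, `writeDecision`, and the `knownEndpoints` map are hypothetical names, not the `bin/gstack-config endpoint-hash` implementation.

```typescript
import { createHash } from "node:crypto";

export type TrustPolicy = "personal" | "shared";

// Assumption: sha8/sha16 = first 8/16 hex chars of SHA-256.
function shaPrefix(input: string, hexChars: number): string {
  return createHash("sha256").update(input).digest("hex").slice(0, hexChars);
}

// knownEndpoints maps an already-issued hash key to the URL it belongs to.
export function endpointHash(url: string, knownEndpoints: Map<string, string>): string {
  const short = shaPrefix(url, 8);
  const owner = knownEndpoints.get(short);
  // Free slot or same URL: keep the short key. Different URL: escalate to 16.
  return owner === undefined || owner === url ? short : shaPrefix(url, 16);
}

export function trustConfigKey(hash: string): string {
  return `brain_trust_policy@${hash}`;
}

// Hypothetical write decision following the prose above: personal endpoints
// write directly; shared endpoints prompt first and use a per-user prefix.
export function writeDecision(policy: TrustPolicy, userSlug: string, pageName: string) {
  return policy === "personal"
    ? { autoWrite: true, target: `gstack/${pageName}` }
    : { autoWrite: false, target: `users/${userSlug}/gstack/${pageName}` };
}
```

Escalating only the colliding endpoint keeps existing 8-character keys stable, so policies already stored for other endpoints are not invalidated.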

If you use local PGLite: it is auto-detected as personal, so no question fires. The cache lives under both `~/.gstack/brain-cache/` and `~/.gstack/projects/<slug>/brain-cache/`, with per-entity TTLs.
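One way to model the two cache locations and the freshness rules from the itemized list (TTL staleness, stale-but-usable fallback, full rebuild on schema-version mismatch). All names here are hypothetical. The changelog does not say which entities live in the user-level directory and which in the per-project one, so the sketch leaves that choice to the caller.

```typescript
import { join } from "node:path";

// Hypothetical helper: user-level cache when no project slug is given,
// per-project cache otherwise (the `{,projects/<slug>/}` brace form).
export function brainCacheDir(home: string, projectSlug?: string): string {
  return projectSlug
    ? join(home, ".gstack", "projects", projectSlug, "brain-cache")
    : join(home, ".gstack", "brain-cache");
}

export type Freshness = "fresh" | "stale-usable" | "rebuild";

export interface CacheMeta {
  writtenAtMs: number;   // when the digest was last written
  schemaVersion: string; // schema pack version the digest was built against
}

export function classifyEntry(
  meta: CacheMeta | undefined,
  ttlMs: number,
  currentSchema: string,
  nowMs: number = Date.now(),
): Freshness {
  // Missing digest or schema mismatch: rebuild from scratch.
  if (meta === undefined || meta.schemaVersion !== currentSchema) return "rebuild";
  // Within TTL: serve as-is. Past TTL: still usable, but due for a refresh.
  return nowMs - meta.writtenAtMs <= ttlMs ? "fresh" : "stale-usable";
}
```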

If you're a contributor: the new resolver pattern (`{{BRAIN_PREFLIGHT}}` / `{{BRAIN_CACHE_REFRESH}}` / `{{BRAIN_WRITE_BACK}}`) is the template seam for the brain integration. The resolvers render an empty string for any skill not in `SKILL_DIGEST_SUBSETS`, so the placeholders can go into any template at zero cost.
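A sketch of the empty-string resolver seam, assuming a simple `{{NAME}}` substitution pass. `DIGEST_SUBSETS` is a hypothetical stand-in for `SKILL_DIGEST_SUBSETS`, the entity lists are illustrative, and the rendered text is invented for the example. The real `generateBrainPreflight` emits different prose.

```typescript
// Hypothetical stand-in for SKILL_DIGEST_SUBSETS; entity lists are illustrative.
const DIGEST_SUBSETS = new Map<string, readonly string[]>([
  ["office-hours", ["product", "goal", "user-profile"]],
  ["plan-eng-review", ["product", "skill-run"]],
]);

// Skills outside the subset table render nothing, so the placeholder is free.
export function brainPreflight(skill: string): string {
  const entities = DIGEST_SUBSETS.get(skill);
  if (!entities) return "";
  const lines = entities.map((entity) => `bin/gstack-brain-cache get ${entity}`);
  return ["## Brain context", ...lines].join("\n");
}

const RESOLVERS = new Map<string, (skill: string) => string>([
  ["BRAIN_PREFLIGHT", brainPreflight],
]);

export function renderTemplate(template: string, skill: string): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (match: string, name: string) => {
    const resolve = RESOLVERS.get(name);
    return resolve ? resolve(skill) : match; // unknown placeholders are left untouched
  });
}
```

Leaving unknown placeholders untouched is a choice made for this sketch. A real generator may prefer to fail loudly on them.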

Phase 2 calibration write-back is gated behind the `BRAIN_CALIBRATION_WRITEBACK` feature flag (default off) until upstream gbrain ships `takes_add` / `takes_resolve` MCP ops (filed in TODOS.md as P2). When the flag flips, the existing skill templates pick up the write-back behavior with no template changes.

### Itemized changes

**Added**
- `scripts/brain-cache-spec.ts` — single source of truth for `BRAIN_CACHE_ENTITIES` (8 entities × TTL + budget + invalidation rules), `SKILL_DIGEST_SUBSETS` (per-skill which files to load), `SALIENCE_DEFAULT_ALLOWLIST`, `SKILL_CALIBRATION_WEIGHTS`, trust-policy + schema-pack constants.
- `scripts/gstack-schema-pack.ts` — `gstack-core@1.0.0` schema pack with 8 typed page kinds: `user-profile`, `product`, `goal`, `developer-persona`, `brand`, `competitive-intel`, `skill-run`, `take`. Frontmatter shapes, retention policies, link verbs for `mcp__gbrain__schema_graph`.
- `bin/gstack-brain-cache` — three-tier cache CLI: `get` / `refresh` / `invalidate` / `digest` / `meta` / `bootstrap` / `list` / `purge` subcommands. Atomic writes, TTL staleness, schema-version full-rebuild on mismatch, stale-but-usable fallback, concurrent-refresh lockfile dedup.
- `scripts/resolvers/gbrain.ts` — three new resolver functions: `generateBrainPreflight`, `generateBrainCacheRefresh`, `generateBrainWriteBack`. Empty-string for non-preflight skills (defensive).
- `bin/gstack-config` — `brain_trust_policy@<endpoint-hash>` namespace, `endpoint-hash` subcommand (sha8 with collision → sha16 escalation), `resolve-user-slug` subcommand (D4 A3 identity resolution chain: `whoami` → `$USER` → `sha8(git email)` → `anonymous-<sha8(hostname)>`).
- `setup-gbrain` Step 9.5: asks the brain trust-policy question once per endpoint. Local endpoints are auto-set to personal, ambiguous remote endpoints prompt the user, and choosing personal flips `artifacts_sync_mode=full`.
- `sync-gbrain` — `--refresh-cache` flag (replaces planned `/brain-refresh-context` skill per D1 fold), `--audit` flag (gstack-owned page summary + salience leak check), Step 1 trust-policy gate.
- 10 new gate-tier test files (111 assertions): `brain-cache-spec`, `gstack-schema-pack`, `brain-cache-roundtrip`, `cache-concurrent-refresh`, `salience-allowlist`, `brain-preflight`, `user-slug-fallback`, `schema-version-migration`, `takes-fence-fallback`, `skill-preflight-budget`.
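The `resolve-user-slug` fallback chain (`whoami` → `$USER` → `sha8(git email)` → `anonymous-<sha8(hostname)>`) can be expressed as a pure function over inputs that have already been collected. The sketch makes three assumptions: `sha8` is the first 8 hex characters of SHA-256, inputs are only trimmed with no further normalization, and `resolveUserSlug` and `IdentitySources` are hypothetical names rather than `bin/gstack-config` internals.

```typescript
import { createHash } from "node:crypto";

// Assumption: sha8 = first 8 hex chars of SHA-256.
const sha8 = (s: string): string => createHash("sha256").update(s).digest("hex").slice(0, 8);

// Hypothetical bag of identity inputs gathered by the caller.
export interface IdentitySources {
  whoami?: string;
  user?: string;
  gitEmail?: string;
  hostname: string;
}

export function resolveUserSlug(src: IdentitySources): string {
  const nonEmpty = (v?: string) => (v && v.trim() ? v.trim() : undefined);
  const whoami = nonEmpty(src.whoami);
  if (whoami) return whoami;
  const user = nonEmpty(src.user);
  if (user) return user;
  // Hash the email so the raw address never becomes a page prefix.
  const email = nonEmpty(src.gitEmail);
  if (email) return sha8(email);
  return `anonymous-${sha8(src.hostname)}`;
}
```

In a CLI, the inputs would come from the `whoami` step, `process.env.USER`, `git config user.email`, and `os.hostname()`.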

**Changed**
- 5 planning SKILL.md.tmpl files wired with `{{BRAIN_PREFLIGHT}}` (top of skill body) and `{{BRAIN_CACHE_REFRESH}}` / `{{BRAIN_WRITE_BACK}}` (end of skill) placeholders.
- `scripts/resolvers/index.ts` registers `BRAIN_PREFLIGHT`, `BRAIN_CACHE_REFRESH`, `BRAIN_WRITE_BACK`.

**For contributors**
- Three follow-ups deferred to `TODOS.md` (P2 / P3): `/gstack-reflect` nightly synthesis, cross-machine brain-cache sync, dedicated `/gstack-onboarding` skill.
- Upstream gbrain dependency for Phase 2: `takes_add` + `takes_resolve` MCP ops in `~/git/gbrain/` (filed as P2 in TODOS.md). Phase 2 wiring already exists behind `BRAIN_CALIBRATION_WRITEBACK` flag; flag flips when upstream lands.
- Plan / CEO + eng review record: `~/.claude/plans/hm-interesting-well-why-dapper-eagle.md` (Approach B + 5 cherry-picks + 11 D-decisions from full eng review + codex outside-voice synthesis).

### Save-results path: works under any CLI when gbrain is on PATH

Brain-aware planning saves the actual review document to gbrain, not just preflight digests and calibration takes. Setup detects gbrain at install time and, if present, the planning skills emit compressed `gbrain put "<prefix>/<feature-slug>"` instructions for `office-hours/`, `ceo-plans/`, `eng-reviews/`, `design-reviews/`, and `devex-reviews/` slug spaces. If gbrain is not detected, the save-results block is suppressed entirely. Zero token overhead for users without gbrain. If you install gbrain after running `./setup`, run `gstack-config gbrain-refresh` to pick up the change.
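A sketch of how a skill's save target could be derived. The skill-to-prefix pairing assumes that the skill list and the slug-space list in the paragraph above line up in order (`office-hours` → `office-hours/`, `plan-ceo-review` → `ceo-plans/`, and so on). The slug rules in `featureSlug` are hypothetical. The command is built as an argv array rather than a shell string, with the page body assumed to go to stdin, consistent with the heredoc-based save template.

```typescript
// Assumed pairing of planning skill -> gbrain slug space.
const SAVE_PREFIXES = new Map<string, string>([
  ["office-hours", "office-hours"],
  ["plan-ceo-review", "ceo-plans"],
  ["plan-eng-review", "eng-reviews"],
  ["plan-design-review", "design-reviews"],
  ["plan-devex-review", "devex-reviews"],
]);

// Hypothetical slug rule: lowercase, collapse non-alphanumerics to "-", trim dashes.
export function featureSlug(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Returns argv for `gbrain put <prefix>/<slug>`, or null when nothing should be saved.
export function saveCommand(skill: string, featureTitle: string, gbrainDetected: boolean): string[] | null {
  const prefix = SAVE_PREFIXES.get(skill);
  if (!gbrainDetected || !prefix) return null; // suppressed, matching the zero-overhead path
  const slug = featureSlug(featureTitle);
  if (!slug) return null;
  return ["gbrain", "put", `${prefix}/${slug}`];
}
```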

Token cost stays tight: the inline save-results block is ~150 tokens per planning skill (versus the ~1000 tokens a naive un-suppression would have added). The full save template (heredoc body, entity-stub instructions, throttle handling, backlinks) lives in `docs/gbrain-write-surfaces.md` §Save Template, and the agent reads it on demand only when it actually saves. The brain-context-load block follows the same compression discipline: ~115 tokens, with a skip header pointing to §Context Load.

| Detection state | Per-planning-skill token overhead | What the agent does on save |
|---|---|---|
| gbrain on PATH + `gstack-config gbrain-refresh` says `local_status: "ok"` | ~250 tokens (CONTEXT_LOAD + SAVE_RESULTS, compressed) | reads `docs/gbrain-write-surfaces.md` on demand, calls `gbrain put <prefix>/<slug>` |
| gbrain not on PATH | 0 tokens | block suppressed at gen-time, nothing rendered |
| GBrain or Hermes host adapter | full inline render (unchanged) | calls `gbrain put` always |

Wired for all five planning skills uniformly: `office-hours`, `plan-ceo-review`, `plan-eng-review`, `plan-design-review`, `plan-devex-review`. The last two gained the `{{GBRAIN_SAVE_RESULTS}}` placeholder in their templates (previously only the first three had it, so design-review and devex-review produced no retrievable page even under GBrain CLI).

Coverage:
- A free resolver-level unit test pins per-skill slug + tag metadata and the compressed token budget (`test/resolvers-gbrain-save-results.test.ts`, 10 tests / 53 assertions).
- A free override-mechanism test asserts that the detection file gates resolver rendering correctly across the `detected: true`, `detected: false`, and `no file` states (`test/gbrain-detection-override.test.ts`, 4 tests).
- A periodic-tier fake-CLI E2E drives `/office-hours` against a stub `gbrain` on PATH and asserts that the agent actually calls `gbrain put office-hours/<slug>` with valid YAML frontmatter (`test/skill-e2e-office-hours-brain-writeback.test.ts`, ~$0.50-1/run).
- A periodic-tier real-CLI round-trip drives `gbrain init --pglite` + `gbrain put` + `gbrain get` against an isolated temp HOME and asserts that the body survives (`test/skill-e2e-gbrain-roundtrip-local.test.ts`, ~$0.001/run; skipped if `VOYAGE_API_KEY` is unset).

Together, these show that the agent obeys the resolver instruction, the resolver emits a valid CLI shape, and the CLI persists the page on the local engine. Remote/Supabase routing is gbrain's contract to honor: the same CLI shape covers all engines, so gstack stops at local round-trip coverage.

**For contributors (save-results layer):**
- `bin/gstack-config gbrain-refresh` re-runs `bin/gstack-gbrain-detect` and writes `~/.gstack/gbrain-detection.json`. `./setup` runs this at the end of install and conditionally regenerates Claude-host SKILL.md with `bun run gen:skill-docs:user` (added package.json script) so detected installs get the brain blocks immediately.
- The default `bun run gen:skill-docs` (CI canonical) ignores the detection file. Committed SKILL.md stays reproducible regardless of any developer's local gbrain state. Use `bun run gen:skill-docs:user` for user-local installs.
- Two follow-ups deferred to `TODOS.md` (P2): re-verify calibration takes when gbrain v0.42+ ships `takes_add` (the `BRAIN_CALIBRATION_WRITEBACK` flag flips); extend the brain-writeback E2E to the other 4 planning skills.
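A sketch of the three-state detection gate described above (`detected: true`, `detected: false`, no file), plus the CI path that ignores the file. The changelog does not say what renders when the file is absent or ignored, so this sketch makes that default an explicit `fallback` parameter. `readDetection` and `shouldRenderSaveResults` are hypothetical names.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Hypothetical minimal shape of ~/.gstack/gbrain-detection.json.
export interface GbrainDetection {
  detected: boolean;
}

export function readDetection(path: string): GbrainDetection | undefined {
  if (!existsSync(path)) return undefined;
  try {
    const parsed = JSON.parse(readFileSync(path, "utf8"));
    return typeof parsed?.detected === "boolean" ? parsed : undefined;
  } catch {
    return undefined; // unreadable or malformed file behaves like "no file"
  }
}

export function shouldRenderSaveResults(
  mode: "ci" | "user",
  detection: GbrainDetection | undefined,
  fallback: boolean, // assumption: what to render when no detection result applies
): boolean {
  // CI canonical generation ignores local state so committed SKILL.md stays reproducible.
  if (mode === "ci" || detection === undefined) return fallback;
  return detection.detected;
}
```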

## [1.52.0.0] - 2026-05-27

## **`/plan-tune` settings actually do something now. Hooks make capture deterministic, preferences binding, and free-text answers loop back as memory.**

TODOS.md: 162 additions, 0 deletions
@@ -2070,3 +2070,165 @@ Shipped in v0.6.5. TemplateContext in gen-skill-docs.ts bakes skill name into pr
### Auto-upgrade mode + smart update check
- Config CLI (`bin/gstack-config`), auto-upgrade via `~/.gstack/config.yaml`, 12h cache TTL, exponential snooze backoff (24h→48h→1wk), "never ask again" option, vendored copy sync on upgrade
**Completed:** v0.3.8

---

## Brain-aware planning follow-ups (filed v1.48.0.0 via /plan-ceo-review + /plan-eng-review)

These are the deferred cherry-picks (E2/E3/E4) from the v1.48 brain-aware
planning plan at `~/.claude/plans/hm-interesting-well-why-dapper-eagle.md`.
The foundation (Phase 0 entity model + Phase 0.5 cache + Phase 1 preflight
+ Phase 1.5 trust policy + Phase 2 write-back scaffolding) ships in
v1.48.0.0. These follow-ups extend it.

### P2: /gstack-reflect nightly synthesis skill (E2)

**What:** Scheduled skill that reads weekly `gstack/skill-run` + takes +
`get_recent_salience` and synthesizes a `gstack/insight` page surfaced at
next skill preflight.

**Why:** Cross-time pattern detection is the compounding move. "You ran 4
plan-ceo on infra this week, 0 on product — is product work getting
starved?" surfaces patterns the user wouldn't notice.

**Pros:** Brain compounds across TIME, not just across skills. Patterns
become actionable.

**Cons:** "You're starving product work" is high-judgment territory; needs
opt-out per project, careful insight templates.

**Context:** Deferred from v1.48.0.0 cherry-pick (D4) — wait 4-6 weeks for
real `gstack/skill-run` data to accumulate before designing the reflection
layer against real patterns instead of imagined ones.

**Effort:** L (human ~1-2 days, CC ~4-6h)

**Depends on:** Phase 0 (gstack/skill-run page type from v1.48.0.0) +
~6 weeks of accumulated data

### P3: Cross-machine brain-cache sync (E3)

**What:** Push compressed digests through the gstack-brain-sync git pipeline
so the brain-cache survives moving between Macs / Conductor workspaces.

**Why:** Eliminates the cold-miss tax on every new machine (~1-2s once per
machine per day).

**Pros:** Instant warm cache on new machines.

**Cons:** Cache poisoning risk if not designed carefully (hash invariants,
endpoint-binding, conflict resolution).

**Context:** Deferred from v1.48.0.0 cherry-pick (D5) — single-machine
cache is fine for V1; correctness risk needs its own design pass.

**Effort:** M (human ~4h, CC ~30min)

**Depends on:** Brain-cache layer from v1.48.0.0

### P3: /gstack-onboarding dedicated skill (E4)

**What:** Guided 5-minute setup skill for new gstack installs: walks the
user through reading CLAUDE.md + README + recent commits to build
`gstack/product` and active goals with explicit AskUserQuestion prompts
(AUQs).

**Why:** Better UX than the inline bootstrap (which only fires when a
planning skill is invoked).

**Pros:** Cleaner cold-start, explicit ceremony.

**Cons:** Inline bootstrap (in scope for v1.48) already covers the
cold-start path adequately.

**Context:** Deferred from v1.48.0.0 cherry-pick (D6) — observe inline
bootstrap performance first; add dedicated skill if friction is real.

**Effort:** S (human ~2h, CC ~15min)

**Depends on:** Inline bootstrap subcommand from v1.48.0.0

### P2: Upstream gbrain takes_add + takes_resolve MCP ops

**What:** Add `mcp__gbrain__takes_add` and `mcp__gbrain__takes_resolve`
ops in `~/git/gbrain/src/core/operations.ts`. Extract the markdown-fence
mirror logic from `commands/takes.ts:570` into a reusable
`engine.resolveTake()` helper.

**Why:** Unlocks Phase 2 calibration write-back without the fence-block
fallback. ~150 LOC. Already on gbrain's v0.31.x roadmap.

**Pros:** Clean Phase 2 path, removes the "fall back to put_page" smell.

**Cons:** Lives in the upstream gbrain repo, not helsinki, so it needs a separate PR.

**Context:** Phase 2 write-back is already wired in v1.48.0.0 behind the
BRAIN_CALIBRATION_WRITEBACK feature flag (default off). Flag flips to
true once upstream gbrain ships these ops. ~50 LOC follow-up in
helsinki to swap the fallback for the preferred op.

**Effort:** S (human ~1d, CC ~1h) in gbrain repo; trivial wire-up in
helsinki.

**Depends on:** None (parallel-track from v1.48.0.0)

### P3: Background-refresh hook supervision

**What:** The Codex outside-voice review flagged "background refresh at
skill END" as hand-wavy. Add proper process supervision: PID file,
timeout, failure log, cross-platform spawn.

**Why:** The current implementation backgrounds the refresh with `&`,
which works but leaves no observability when a refresh fails.

**Context:** Deferred from v1.48.0.0 codex tension T3. Stays low priority
until users report stale digests where a background refresh silently
failed.

**Effort:** S (human ~2h, CC ~20min)

### P2: Re-verify calibration takes when gbrain v0.42+ lands

**What:** When upstream gbrain ships `takes_add` MCP op and we flip
`BRAIN_CALIBRATION_WRITEBACK` from FALSE to TRUE, re-run the manual
probe in `docs/gbrain-write-surfaces.md` against `/office-hours` and
confirm `gbrain takes_list` surfaces a `kind=bet` entry with the
expected weight (0.9 for office-hours, per
`scripts/brain-cache-spec.ts:151-157`).

**Why:** Today the calibration take path falls back to writing inside a
`gbrain put` fence block because `takes_add` isn't available yet. Once
v0.42+ ships, the agent will call `takes_add` directly — we should
confirm the new path actually persists a queryable take.

**Context:** v1.50.0.0 plan §"NOT in scope". The fence-block fallback
test (`test/takes-fence-fallback.test.ts`) covers wiring for both paths;
this TODO is about live verification of the preferred path when it
becomes available.

**Effort:** XS (human ~15min, CC ~5min)

**Depends on:** Upstream gbrain v0.42+ release shipping `takes_add` MCP
op (separate TODO above).

### P2: Extend brain-writeback E2E to the other 4 planning skills

**What:** `test/skill-e2e-office-hours-brain-writeback.test.ts` covers
the brain-writeback path for `/office-hours` only. Adding parallel
tests for `/plan-ceo-review`, `/plan-eng-review`, `/plan-design-review`,
and `/plan-devex-review` would bring per-skill agent-obedience coverage
to parity with the resolver unit test
(`test/resolvers-gbrain-save-results.test.ts`, which covers wiring for
all 5).

**Why:** The resolver test proves the right instructions get emitted;
the E2E proves the agent actually obeys. Today we only have that
end-to-end signal for one of five planning skills.

**Context:** v1.50.0.0 plan §"NOT in scope". Extract `makeFakeGbrain`
into `test/helpers/fake-gbrain.ts` when the second consumer arrives
(YAGNI for one consumer today).

**Effort:** S (human ~1d, CC ~1h). Periodic-tier (~$2-4 total for 4
runs).

**Depends on:** None.

VERSION: 1 addition, 1 deletion
@@ -1 +1 @@
-1.52.0.0
+1.52.1.0