
Commit d21ba06

garrytan and claude authored
v1.33.0.0 feat: /sync-gbrain memory-stage batch-import refactor (D1-D8) + F6/F9 + signal cleanup (#1432)
* refactor: batch-import architecture (D1-D8) + F6 atomic state + F9 full-file hash

  bin/gstack-memory-ingest.ts: rewrite memory ingest around `gbrain import <dir>` batch path. Replaces the per-file gbrainPutPage loop (~470s of subprocess startup per cold run) with prepare-then-batch:

  walkAllSources
  -> preparePages: mtime-skip + optional gitleaks (--scan-secrets) + parse
  -> writeStaged: mkdir -p per slug segment, hierarchical (D1)
  -> snapshot ~/.gbrain/sync-failures.jsonl byte offset
  -> runGbrainImport (async spawn)
  -> parseImportJson
  -> readNewFailures: read appended bytes, map back to source paths (D7)
  -> state.sessions[path] = {...} for files NOT in the failed set
  -> saveStateAtomic (F6) + cleanupStagingDir

  Architecture decisions:
  - D1 hierarchical staging dir
  - D2 cut over, deleted gbrainPutPage entirely
  - D3 source-file gitleaks made opt-in via --scan-secrets (gstack-brain-sync owns the cross-machine boundary; the per-file scan was a redundant ~470s tax)
  - D4 OK/ERR verdict (no DEGRADED tri-state)
  - D5 unified state schema (no separate skip-list)
  - D6 trust gbrain content_hash idempotency (no skip_reason bookkeeping)
  - D7 byte-offset snapshot of sync-failures.jsonl + per-source mapping
  - F6 saveState uses tmp+rename atomic write
  - F9 fileSha256 removes the 1MB cap; full-file hash (no more silent tail-edit misses on long partial transcripts)

  Signal handling: installSignalForwarder propagates SIGTERM/SIGINT to the gbrain child process AND synchronously cleans the staging dir before process.exit. Pre-fix, orchestrator timeouts left gbrain processes orphaned holding the PGLite write lock (observed: a 15-hour-CPU-time orphan still alive a day later).

  parseImportJson returns null on unparseable output (treated as ERR by the caller) instead of silently zeroing through. gbrainAvailable() probes for the `import` subcommand instead of `put`.

  Plan + review chain at /Users/garrytan/.claude/plans/purrfect-tumbling-quiche.md.
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: orchestrator OK/ERR verdict parser for batch memory ingest

  gstack-gbrain-sync.ts: the memory-stage parser now picks [memory-ingest] ERR lines preferentially over the latest [memory-ingest] line, strips the prefix and any leading 'ERR: ' for cleaner summary output, and surfaces '(killed by signal / timeout)' when the child exits with status=null. Matches D6's OK/ERR contract: per-file failures (FILE_TOO_LARGE etc.) show in the summary count, but only system-level failures (gbrain crash, process kill, missing CLI) mark the stage ERR.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: batch-ingest writer regressions + refresh golden ship fixtures

  test/gstack-memory-ingest.test.ts: 5 new tests for the batch-import architecture:
  1. D1 hierarchical staging slug round-trip — asserts the staged file lives in transcripts/claude-code/<dir>/*.md, not flat at the staging root
  2. Frontmatter injection — asserts title/type/tags are written into the staged page's YAML block
  3. D7 sync-failures.jsonl exclusion — files listed as failed by gbrain do NOT get state-recorded; one of two test sessions lands, the other stays un-ingested for retry next run
  4. Missing-`import`-subcommand error path — when gbrain only advertises the legacy `put`, memory-ingest exits 1 with [memory-ingest] ERR
  5. --scan-secrets opt-in path — verifies a dirty-source file is skipped via the secret-scan match when the flag is on, while a clean session in the same run still gets staged

  Replaces the prior put-per-file shim with an import-batch shim. The shim fails loudly (exit 99) if the new code ever regresses to per-file `gbrain put` calls.

  test/fixtures/golden/{claude,codex,factory}-ship-SKILL.md: refresh golden baselines to match the current generated SKILL.md content after the v1.31.0.0 AskUserQuestion fallback-clause deletion. Goldens were stale from that release; the test was failing on origin/main before this PR.
  Caught by the /ship test pass.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v1.33.0.0 docs: design doc, P2 perf TODOs, gbrain guidance block, changelog

  docs/designs/SYNC_GBRAIN_BATCH_INGEST.md: full design doc with the 8 decisions (D1-D8), source-verified gbrain behaviors (content_hash idempotency, frontmatter parity, path-authoritative slug, per-file failure surface), measured performance vs the plan target, the F9 hash-migration one-time cliff note, and follow-up TODOs.

  CLAUDE.md: append a `## GBrain Search Guidance` block from /sync-gbrain indicating this worktree's pin and how the agent should prefer gbrain search over Grep for semantic queries.

  TODOS.md: P2 `gbrain import` perf-on-large-staging-dirs investigation (5,131 files take >10 min in gbrain when 501 take 10s — likely N+1 SQL or auto-link reconciliation). P3 cache-no-changes-since-last-import at the prepare-batch level for true no-op fast paths.

  VERSION + package.json: bump to 1.33.0.0 (queue-aware via bin/gstack-next-version — skipped v1.32.0.0, which is claimed by sibling worktree garrytan/wellington / PR #1431).

  CHANGELOG.md: v1.33.0.0 entry per the release-summary format.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: setup-gbrain/memory.md reflects opt-in per-file gitleaks

  Per-file gitleaks scanning during memory ingest is now opt-in via --scan-secrets (or GSTACK_MEMORY_INGEST_SCAN_SECRETS=1). Update the user-facing reference doc so it stops claiming "every page passes through gitleaks." Also corrects the /gbrain-sync → /sync-gbrain command typo and the post-incident recovery section.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 7489506 commit d21ba06

12 files changed

Lines changed: 1523 additions & 223 deletions

CHANGELOG.md

Lines changed: 71 additions & 0 deletions
@@ -1,5 +1,76 @@
# Changelog

## [1.33.0.0] - 2026-05-11

## **`/sync-gbrain` memory stage no longer infinite-loops or silently throws away progress.**
## **Per-file gitleaks scanning is opt-in, signal handling actually kills the gbrain child, and state writes are atomic.**

`/sync-gbrain` memory ingest used to spawn `gitleaks detect` plus `gbrain put` once per file across 1,841+ transcripts and artifacts, then the orchestrator SIGTERM'd the whole pipeline at 35 minutes with no state flush. Every cold run started from zero and burned 35 minutes for nothing. v1.33 rewrites the memory stage around `gbrain import <dir>` (batch path that's been in gbrain since v0.20). The prepare phase walks sources, parses transcripts and artifacts, writes prepared markdown into a hierarchical staging directory mirroring slug structure, then invokes `gbrain import` once. Per-file failures get read back from `~/.gbrain/sync-failures.jsonl` via a byte-offset snapshot so the state file only records files that actually landed in PGLite. `--scan-secrets` is now an opt-in flag because `gstack-brain-sync` already runs a regex-based secret scanner at the actual cross-machine boundary (git push), making per-file ingest scans redundant defense-in-depth that cost ~470 seconds on every cold run.

The signal handler now propagates `SIGTERM` and `SIGINT` to the gbrain child and synchronously cleans up the staging directory before `process.exit`, fixing the orphan-process bug that left gbrain holding the PGLite write lock and burning CPU for hours after the orchestrator gave up. State file writes use `tmp+rename` for atomicity so a crash mid-write can't truncate the ingest state. The full-file `sha256` change detection (was capped at 1MB) catches tail edits to long partial transcripts that the old algorithm silently missed.
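The `tmp+rename` pattern is small enough to show in full. A minimal sketch, using illustrative names rather than the actual gstack-memory-ingest internals:

```typescript
import { writeFileSync, renameSync, readFileSync, mkdtempSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// F6 pattern: write the full payload to a sibling tmp file, then rename()
// it over the real path. rename() on the same filesystem is atomic, so a
// reader sees either the old state or the new state, never a half-written
// file after a crash mid-write.
function saveStateAtomic(statePath: string, state: object): void {
  const tmpPath = `${statePath}.tmp`;
  writeFileSync(tmpPath, JSON.stringify(state, null, 2));
  renameSync(tmpPath, statePath); // atomic replace on POSIX filesystems
}

// Demo against a throwaway directory.
const dir = mkdtempSync(join(tmpdir(), "gstack-state-"));
const statePath = join(dir, "state.json");
saveStateAtomic(statePath, { sessions: { "a.md": { sha256: "abc" } } });
const roundTrip = JSON.parse(readFileSync(statePath, "utf8"));
```

Because `rename(2)` replaces the target atomically on the same filesystem, a crash between the two calls leaves at most a stray `.tmp` file, never a truncated state file.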
### The numbers that matter

Source: live run on `~/.gstack/projects/` corpus (5,135 transcripts + artifacts), `bin/gstack-memory-ingest.ts --bulk` on a fresh PGLite at gbrain v0.31.2.

| Metric | Before (v1.31.x) | After (v1.33) | Δ |
|---|---|---|---|
| Cold run completes | no, 35-min loop + null exit | yes | works |
| Prepare phase time (5,135 files) | ~10-12 min | <10 sec | ~60x |
| Per-file gitleaks scans | 1,841 mandatory | 0 by default, opt-in via `--scan-secrets` | gated |
| State file flushed on SIGTERM | no, loss-on-kill | yes, sync cleanup before exit | fixed |
| Orphan gbrain child after timeout | yes, observed 15hr CPU drain | no, signal forwarded | fixed |
| FILE_TOO_LARGE blocks all advancement | yes | no, failed paths excluded via D7 | fixed |
| Tests in `test/gstack-memory-ingest.test.ts` | 17 | 21 | +4 |
| Decision | What landed |
|---|---|
| D1 hierarchical staging | `writeStaged` does `mkdir -p` per slug segment |
| D2 cut over | `gbrainPutPage` deleted, no `--legacy-ingest` flag |
| D3 source-first secret scan | Scan opt-in via `--scan-secrets`, default off |
| D4 OK/ERR verdict | Per-file failures show in summary but only system errors mark ERR |
| D5 unified state schema | No separate skip-list file |
| D6 trust idempotency | gbrain's content_hash dedup makes reruns cheap |
| D7 sync-failures byte-offset | `readNewFailures` reads only appended bytes since pre-import snapshot |
| F6 atomic state writes | `tmp+rename` instead of direct overwrite |
| F9 full-file sha256 | Removes 1MB cap that silently swallowed tail edits |

Prepare phase dropped from ~10 minutes to <10 seconds because the dominant cost was `gitleaks detect` cold start (~256ms per file, 5,135 files = 22 minutes of subprocess startup). The cross-machine secret boundary is `git push`, and `gstack-brain-sync` already runs its own regex scanner there. Local PGLite ingest of files that already live on disk in plaintext doesn't change exposure. The opt-in flag survives for users who want per-file ingest scanning, but it's no longer the default tax on every cold run.
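The F9 failure mode is easy to reproduce. A sketch of the capped-vs-full contrast (function names here are illustrative, not the shipped code):

```typescript
import { createHash } from "node:crypto";
import { readFileSync, writeFileSync, appendFileSync, mkdtempSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

const CAP = 1024 * 1024; // the pre-F9 hash stopped at 1MB

// Old behavior: hash only the first 1MB, so any edit past the cap is invisible.
function cappedSha256(path: string): string {
  const buf = readFileSync(path).subarray(0, CAP);
  return createHash("sha256").update(buf).digest("hex");
}

// F9 behavior: hash the whole file.
function fullSha256(path: string): string {
  return createHash("sha256").update(readFileSync(path)).digest("hex");
}

// Demo: a transcript already past the cap gets a tail edit.
const dir = mkdtempSync(join(tmpdir(), "f9-"));
const file = join(dir, "long-transcript.md");
writeFileSync(file, "x".repeat(CAP + 10)); // already larger than 1MB
const cappedBefore = cappedSha256(file);
const fullBefore = fullSha256(file);
appendFileSync(file, "\ntail edit"); // the kind of change the old hash missed
const cappedAfter = cappedSha256(file);
const fullAfter = fullSha256(file);
// cappedBefore equals cappedAfter (edit invisible); fullBefore differs from fullAfter
```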
### What this means for builders

If you've been hitting the 35-minute hang on `/sync-gbrain`, it's gone; run it again after upgrading. The architecture is correct on this side now. A separate `gbrain import` performance issue surfaced during testing: the gbrain CLI itself takes >10 minutes on 5,131-file staging dirs (10 seconds on 501 files), filed as a P2 TODO for gbrain proper. That's the next bottleneck to chase, but it lives in gbrain's import path, not in the gstack orchestrator.
### Itemized changes

#### Added
- `bin/gstack-memory-ingest.ts:1093` — `preparePages` pure function: walk sources, mtime-skip via state, optional gitleaks scan (`--scan-secrets`), parse transcripts and artifacts, render frontmatter with `title`/`type`/`tags` injected.
- `bin/gstack-memory-ingest.ts:920` — `writeStaged` writes prepared markdown into a hierarchical staging directory mirroring slug structure. `mkdir -p` per slug segment. Slugs containing `/` (like `transcripts/claude-code/foo`) get the matching subdirectory tree so gbrain's path-authoritative `slugifyPath` round-trips exactly.
- `bin/gstack-memory-ingest.ts:961` — `parseImportJson` reads gbrain's `--json` last-line payload. Returns `null` (treated as `system_error` by the caller) instead of silently zero-padding when the line doesn't parse.
- `bin/gstack-memory-ingest.ts:993` — `readNewFailures` snapshots `~/.gbrain/sync-failures.jsonl` byte offset before import, reads only appended bytes after, maps gbrain's staging-relative paths back to source paths via the `stagedPathToSource` map.
- `bin/gstack-memory-ingest.ts:1009` — `runGbrainImport` async wrapper around `child_process.spawn` so the signal forwarder has a child reference to kill on parent `SIGTERM`/`SIGINT`. Pre-2026-05-11 `spawnSync` made signal forwarding impossible and gbrain orphaned every time the orchestrator timed out.
- `bin/gstack-memory-ingest.ts:1218` — `installSignalForwarder` registers `SIGTERM`/`SIGINT` handlers that forward to the live child, synchronously clean up the active staging directory, then exit. Async `finally` blocks don't run after `process.exit` from inside a signal handler, so cleanup has to happen in the handler itself.
- `bin/gstack-memory-ingest.ts:194` — `--scan-secrets` CLI flag and `GSTACK_MEMORY_INGEST_SCAN_SECRETS=1` env var to opt back into per-file gitleaks scanning during the prepare phase. Off by default.
- `test/gstack-memory-ingest.test.ts:457` — 5 new tests covering hierarchical staging slug round-trip, frontmatter injection, D7 sync-failures exclusion, missing-`import`-subcommand error path, and `--scan-secrets` dirty-source skipping with a fake gitleaks shim.
- `docs/designs/SYNC_GBRAIN_BATCH_INGEST.md` — full design doc with D1-D8 decisions, source-verified gbrain behaviors, performance measurements, F9 hash migration notes.
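The D7 byte-offset trick can be sketched like this (function names and the JSONL record shape are assumptions for illustration):

```typescript
import {
  statSync, openSync, readSync, closeSync,
  writeFileSync, appendFileSync, mkdtempSync,
} from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Record the failure log's byte length before the import runs.
function snapshotOffset(logPath: string): number {
  try { return statSync(logPath).size; } catch { return 0; } // log may not exist yet
}

// After the import, read ONLY the bytes appended past the snapshot, so
// stale failures from earlier runs are never attributed to this batch.
function readNewFailures(logPath: string, offset: number): string[] {
  const size = statSync(logPath).size;
  if (size <= offset) return [];
  const buf = Buffer.alloc(size - offset);
  const fd = openSync(logPath, "r");
  readSync(fd, buf, 0, buf.length, offset); // read just the appended region
  closeSync(fd);
  return buf.toString("utf8").split("\n").filter(Boolean)
    .map((line) => JSON.parse(line).path as string);
}

// Demo: one stale failure from a previous run, one appended by "this" run.
const dir = mkdtempSync(join(tmpdir(), "d7-"));
const log = join(dir, "sync-failures.jsonl");
writeFileSync(log, JSON.stringify({ path: "old/failure.md" }) + "\n");
const offset = snapshotOffset(log);
appendFileSync(log, JSON.stringify({ path: "staged/new-failure.md" }) + "\n");
const failures = readNewFailures(log, offset);
// failures contains only the newly appended path
```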
#### Changed
- `bin/gstack-memory-ingest.ts:288` — `saveState` now uses `tmp+rename` for atomicity (F6) so a crash mid-write can't truncate the state file. Matches the orchestrator's existing pattern at `gstack-gbrain-sync.ts:508`.
- `bin/gstack-memory-ingest.ts:307` — `fileSha256` hashes the full file (F9). Pre-2026-05-11 it stopped at 1MB, so tail edits to long partial transcripts looked unchanged and never re-imported. One-time cliff on upgrade: files whose mtime hasn't moved keep their old 1MB-capped hash, files whose mtime moves get recomputed correctly. No data loss.
- `bin/gstack-memory-ingest.ts:798` — `gbrainAvailable` probes for the `import` subcommand in `--help` output (was: `put` subcommand). Without `import`, the memory stage exits non-zero with a `system_error` instead of silently degrading.
- `bin/gstack-gbrain-sync.ts:442` — memory-stage parser preferentially picks `[memory-ingest] ERR` lines over the latest `[memory-ingest]` line for the summary, strips the prefix, and surfaces `(killed by signal / timeout)` when the child exits with `status=null`.
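The parse-or-null contract might look like this sketch (the payload field names are an assumption; only the null-on-unparseable behavior is the documented contract):

```typescript
type ImportResult = { imported: number; errors: number };

// Take the last non-empty output line, attempt JSON.parse, and return null
// rather than a fabricated zero count when it doesn't parse. The caller
// surfaces null as a system_error / ERR verdict instead of a misleading OK/0/0.
function parseImportJson(stdout: string): ImportResult | null {
  const last = stdout.split("\n").filter((l) => l.trim()).pop();
  if (!last) return null;
  try {
    const parsed = JSON.parse(last);
    if (typeof parsed.imported !== "number" || typeof parsed.errors !== "number") {
      return null; // parsed, but not the shape we expect
    }
    return { imported: parsed.imported, errors: parsed.errors };
  } catch {
    return null; // unparseable output is an error, not a zero
  }
}
```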
#### Fixed
- Per-file gitleaks scan was running on every transcript and artifact during memory ingest as redundant defense-in-depth. The cross-machine secret boundary is `gstack-brain-sync` (git push), which already runs a Python regex scanner. Local PGLite ingest doesn't change exposure surface for content that already lives on disk in plaintext.
- Signal handlers now kill the gbrain child and clean up the staging directory before exit. Pre-fix, every orchestrator timeout left a gbrain process holding the PGLite write lock and burning CPU until the user noticed and `kill -9`'d it manually (observed: a 15-hour-CPU-time orphan from yesterday's run was still alive today).
- `parseImportJson` no longer silently returns `{imported: 0, errors: 0}` when gbrain's `--json` output doesn't parse. Returns `null`, caller surfaces as `system_error` so the orchestrator's verdict block shows ERR instead of misleading OK/0/0.
- `bin/gstack-memory-ingest.ts` `require("fs")` calls replaced with top-level ESM `import`s for runtime portability.
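A simplified sketch of that handler shape, with illustrative names and the exit made injectable so the effects are observable. The two constraints it encodes: cleanup must be synchronous (async `finally` blocks never run once `process.exit` is called from a signal handler), and the child must be killed explicitly or it outlives the parent:

```typescript
import { spawn, type ChildProcess } from "node:child_process";
import { rmSync, mkdtempSync, existsSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Build a handler that forwards the signal, cleans up synchronously, then exits.
function makeSignalHandler(
  child: ChildProcess,
  stagingDir: string,
  exit: (code: number) => void,
) {
  return (signal: NodeJS.Signals) => {
    child.kill(signal);                                   // forward to the gbrain child
    rmSync(stagingDir, { recursive: true, force: true }); // synchronous cleanup
    exit(1);
  };
}

// Demo: a long-lived child plus a fake exit so the handler's effects are visible.
const stagingDir = mkdtempSync(join(tmpdir(), "staging-"));
const child = spawn(process.execPath, ["-e", "setTimeout(() => {}, 60000)"]);
let exitedWith = -1;
const handler = makeSignalHandler(child, stagingDir, (code) => { exitedWith = code; });
// In real code this would be: process.on("SIGTERM", handler); process.on("SIGINT", handler);
handler("SIGTERM");
// stagingDir is gone, the child has been signaled, and exit ran with code 1
```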
#### For contributors
- Plan file at `/Users/garrytan/.claude/plans/purrfect-tumbling-quiche.md` captures the full review chain: `/investigate` → `/plan-eng-review` (5 architecture decisions D1-D5) → `/codex review` outside-voice plan challenge (9 findings, 3 reshaped the architecture into D6-D8). Plan also records the post-Codex user perf review that flipped D3 to opt-in.
- `TODOS.md` filed P2: investigate `gbrain import` perf on large staging dirs (5,131 files takes >10 minutes when 501 takes 10 seconds — gbrain-side N+1 SQL or auto-link reconciliation suspected). P3: cache "no changes since last import" at the prepare-batch level for true no-op fast paths.
- Plan completion audit ran via subagent on this branch: 17/21 DONE, 1 CHANGED (D3 made opt-in), 2 deferred (F8 benchmark harness as separate work, 24-path unit coverage went integration-only).
## [1.32.0.0] - 2026-05-10

## **Seven contributor PRs land. Three are security or hardening.**

CLAUDE.md

Lines changed: 37 additions & 0 deletions
@@ -778,3 +778,40 @@ Key routing rules:
- Ship/deploy/PR → invoke /ship or /land-and-deploy
- Save progress → invoke /context-save
- Resume context → invoke /context-restore

## GBrain Search Guidance (configured by /sync-gbrain)
<!-- gstack-gbrain-search-guidance:start -->

GBrain is set up and synced on this machine. The agent should prefer gbrain
over Grep when the question is semantic or when you don't know the exact
identifier yet.

**This worktree is pinned to a worktree-scoped code source** via the
`.gbrain-source` file in the repo root (kubectl-style context). Any
`gbrain code-def`, `code-refs`, `code-callers`, `code-callees`, or `query`
call from anywhere under this worktree routes to that source by default —
no `--source` flag needed. Conductor sibling worktrees of the same repo
each have their own pin and their own indexed pages, so semantic results
match the actual code on disk in this worktree.

Two indexed corpora available via the `gbrain` CLI:
- This worktree's code (auto-pinned via `.gbrain-source`).
- `~/.gstack/` curated memory (registered as `gstack-brain-<user>` source via
  the existing federation pipeline).

Prefer gbrain when:
- "Where is X handled?" / semantic intent, no exact string yet:
  `gbrain search "<terms>"` or `gbrain query "<question>"`
- "Where is symbol Y defined?" / symbol-based code questions:
  `gbrain code-def <symbol>` or `gbrain code-refs <symbol>`
- "What calls Y?" / "What does Y depend on?":
  `gbrain code-callers <symbol>` / `gbrain code-callees <symbol>`
- "What did we decide last time?" / past plans, retros, learnings:
  `gbrain search "<terms>" --source gstack-brain-<user>`

Grep is still right for known exact strings, regex, multiline patterns, and
file globs. Run `/sync-gbrain` after meaningful code changes; for ongoing
auto-sync across all worktrees, run `gbrain autopilot --install` once per
machine — gbrain's daemon handles incremental refresh on a schedule.

<!-- gstack-gbrain-search-guidance:end -->

TODOS.md

Lines changed: 61 additions & 0 deletions
@@ -1,5 +1,66 @@
# TODOS

## /sync-gbrain memory stage perf follow-up

### P2: Investigate `gbrain import` perf on large staging dirs

**What:** Cold-run time on a 5,131-file staging dir is >10 min in `gbrain import`
alone (after gstack's prepare phase, which is now <10s after dropping per-file
gitleaks). On 501 files it took 10s. The scaling is worse than linear and the
bottleneck is inside gbrain, not the gstack orchestrator.

**Why:** With memory-ingest's prepare phase now fast, the remaining cold-run cost
is entirely on the gbrain side. Users with large corpora (5K+ files) currently pay
~15-30 min on first ingest. Likely culprits in `~/git/gbrain/src/core/import-file.ts`:

- N+1 SQL queries: `engine.getPage(slug)` for each file's content_hash check
  (line 242 + 478) — should be batched into a single query
- Per-page auto-link reconciliation that fires even for unchanged content
- FTS / vector index updates without batching transactions

**Pros:** Lives in gbrain (cleaner separation). A fix in gbrain benefits other
gbrain callers too (`gbrain sync`, MCP `put_page` workflows). Likely 10-50x
speedup from batched queries alone.

**Cons:** Cross-repo change, requires gbrain test coverage for the new batched
path. Not on the gstack critical path; gstack's architecture is already correct.

**Context:** Verified on a real corpus 2026-05-10. gstack-side prepare with
`--scan-secrets` off runs in <10s. The full gbrain import on the same staged
dir consumes 100% CPU for >10 min. Both observations from
`bin/gstack-memory-ingest.ts:ingestPass` reaching the `runGbrainImport` call
quickly, then the child process taking the bulk of the wall time.

**Depends on:** None — gstack's batch-ingest architecture (D1-D8 in
`docs/designs/SYNC_GBRAIN_BATCH_INGEST.md`) is already shipped and correct.

---
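The suspected N+1 shape and the batched alternative can be sketched against a hypothetical engine API (`getPage` and `getPagesBySlugs` are illustrative names; gbrain's real interface may differ):

```typescript
type Page = { slug: string; contentHash: string };

interface Engine {
  getPage(slug: string): Page | undefined;             // suspected N+1 path
  getPagesBySlugs(slugs: string[]): Map<string, Page>; // proposed batched path
}

// N+1: one lookup per staged file, so 5,131 round-trips on a large corpus.
function changedSlugsN1(engine: Engine, staged: Page[]): string[] {
  return staged
    .filter((p) => engine.getPage(p.slug)?.contentHash !== p.contentHash)
    .map((p) => p.slug);
}

// Batched: one lookup for the whole batch, then an in-memory diff.
function changedSlugsBatched(engine: Engine, staged: Page[]): string[] {
  const existing = engine.getPagesBySlugs(staged.map((p) => p.slug));
  return staged
    .filter((p) => existing.get(p.slug)?.contentHash !== p.contentHash)
    .map((p) => p.slug);
}
```

Both return the same set of changed slugs; the batched version just collapses N queries into one.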
### P3: Cache "no changes since last import" at the prepare-batch level

**What:** Even with the prepare phase fast (<10s for 5,135 files), walking and
mtime-stat'ing every file on a true no-op run adds a few seconds and creates
spurious staging dirs. Cache the most-recent-source-mtime per source in the
state file; if no source dir has a newer mtime, skip the walk + stage + import
entirely.

**Why:** Most `/sync-gbrain` invocations have nothing new to ingest. The
fastest path is "do nothing, fast." `gbrain doctor` should still report state,
but the actual ingest pipeline can short-circuit when last_full_walk is recent
and no source-tree mtime has moved.

**Pros:** Trivial implementation (~20 lines in `ingestPass`). Makes the
incremental fast path actually live up to the "<30s" in the original plan.

**Cons:** Adds a cache-invalidation surface. If a user edits a file but its
parent dir's mtime doesn't update (rare on macOS APFS), changes get missed.
Mitigation: only short-circuit when last_full_walk is recent (e.g. <1 min ago).

**Context:** Filed during 2026-05-10 perf testing after `--scan-secrets` was
made opt-in. Lower priority than the gbrain-side perf issue above.

---
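A sketch of the proposed short-circuit, with illustrative names; the mitigation from the Cons section appears as the `maxTrustAgeMs` guard:

```typescript
import { statSync, readdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Newest mtime anywhere under a source dir (recursive walk).
function newestMtimeMs(dir: string): number {
  let newest = statSync(dir).mtimeMs;
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const p = join(dir, entry.name);
    newest = Math.max(newest, entry.isDirectory() ? newestMtimeMs(p) : statSync(p).mtimeMs);
  }
  return newest;
}

// Skip the walk + stage + import only when nothing is newer than the cached
// mtime AND the last full walk is recent enough to trust (the APFS caveat:
// a parent dir's mtime doesn't always move on file edits).
function canSkipIngest(
  sourceDirs: string[],
  cachedNewestMs: number,
  lastFullWalkMs: number,
  nowMs: number,
  maxTrustAgeMs = 60_000,
): boolean {
  if (nowMs - lastFullWalkMs > maxTrustAgeMs) return false;
  return sourceDirs.every((d) => newestMtimeMs(d) <= cachedNewestMs);
}

// Demo on a throwaway source dir.
const src = mkdtempSync(join(tmpdir(), "p3-src-"));
writeFileSync(join(src, "session.md"), "hello");
const cached = newestMtimeMs(src);
const now = Date.now();
const skipFresh = canSkipIngest([src], cached, now, now);           // nothing new, walk is fresh
const skipStale = canSkipIngest([src], cached, now - 120_000, now); // walk too old to trust
```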
## Browser-skills follow-on (Phases 2-4)
### P1: Browser-skills Phase 2 — `/scrape` and `/skillify` skill templates

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-1.32.0.0
+1.33.0.0

bin/gstack-gbrain-sync.ts

Lines changed: 20 additions & 4 deletions
```diff
@@ -442,14 +442,30 @@ function runMemoryIngest(args: CliArgs): StageResult {
     timeout: 35 * 60 * 1000,
   });

-  const summary = (result.stderr || "").split("\n").filter((l) => l.includes("[memory-ingest]")).slice(-1)[0] || "ingest pass complete";
-
+  // D6: parse [memory-ingest] lines from the child's stderr. ERR-prefixed
+  // lines indicate a system-level failure (gbrain crashed or CLI missing)
+  // and the child exits non-zero. Per-file failures are summarized in the
+  // last non-ERR [memory-ingest] line but do NOT make the verdict ERR.
+  const stderrLines = (result.stderr || "").split("\n");
+  const memLines = stderrLines.filter((l) => l.includes("[memory-ingest]"));
+  const errLine = memLines.find((l) => l.includes("[memory-ingest] ERR"));
+  const lastMemLine = memLines.slice(-1)[0];
+  const rawSummary = errLine || lastMemLine || "ingest pass complete";
+  // Strip the "[memory-ingest] " prefix and any leading "ERR: " for cleaner
+  // verdict output. The orchestrator's own formatStage will prefix with OK/ERR.
+  const summary = rawSummary
+    .replace(/^.*\[memory-ingest\]\s*/, "")
+    .replace(/^ERR:\s*/, "");
+
+  const ok = result.status === 0;
   return {
     name: "memory",
     ran: true,
-    ok: result.status === 0,
+    ok,
     duration_ms: Date.now() - t0,
-    summary: result.status === 0 ? summary : `memory ingest exited ${result.status}`,
+    summary: ok
+      ? summary
+      : `${summary}${result.status === null ? " (killed by signal / timeout)" : ` (exit ${result.status})`}`,
   };
 }
```
