Commit 66f3a18

Authored by garrytan, claude, jbetala7, andrey-esipov, and davidfoy
v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642)
* fix(gbrain-sync): --full produces an empty code index on first run of a new repo

`gbrain reindex-code` only RE-EMBEDS pages that already exist; it never walks the filesystem. On a freshly-registered source (0 pages), a --full run that called reindex-code alone found nothing ("No code pages to reindex"), finished in ~1s, and left the code index permanently empty while still reporting OK.

Fix: --full now runs `sync --strategy code` FIRST to create pages via the file walk, then runs `reindex-code` to honor the documented "full walk + reindex" contract for both fresh and populated sources.

Contributed by @jetsetterfl via #1584.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gbrain-local-status): classifier falsely reports broken-db inside repos with their own DATABASE_URL

The freshClassify probe ran `gbrain sources list --json` with the inherited process env. When the probe ran from inside a repo with its own .env (an app DATABASE_URL on a different port), Bun autoloaded the project's .env, gbrain connected to the wrong database, and the classifier reported broken-db on otherwise-healthy brains.

Fix: route the probe env through `buildGbrainEnv` from lib/gbrain-exec, the same helper the sync orchestrator uses. DATABASE_URL is seeded from ~/.gbrain/config.json so the result is cwd-independent. The 60s cache can no longer propagate a poisoned negative to clean directories.

Contributed by @jetsetterfl via #1583.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(retro): stale-base + bad-today-anchor pre-flight guard (#1624)

/retro silently produced confidently-wrong output when "today" drifted (model session-context error) or when origin/<default> was materially behind the actual remote — git log --since returned zero or near-zero commits and the narrative was fabricated from nothing.

Adds Step 0.5 with four ordered pre-check branches before any window analysis:

A. No 'origin' remote → skip with "base freshness not verified" note
B. Detached HEAD → skip with "base freshness not verified" note
C. `git fetch origin <default>` fails (offline) → warn, proceed against last-known origin/<default>
D. Fetch succeeded → compare today vs latest origin/<default> commit; if gap > window-days, BLOCK with explicit citation of latest-commit date.

Skip paths still proceed to Step 1, but the disclosure is carried into the retro narrative ("offline run, window not freshness-verified") so the output is never silently confidently-wrong.

Atomic .tmpl + gen:skill-docs regen commit (T-Codex-3 pattern).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(retro): regression for #1624 stale-base pre-flight guard

13 static-invariant tests pinning the four ordered pre-check branches in retro/SKILL.md.tmpl:Step 0.5:

A. no-remote skip — must check origin presence + set verdict
B. detached-HEAD skip — must gate behind prior verdict (ordering)
C. fetch-fail warn — must match `if !` or `||` shape, gate by verdict
D. stale-base BLOCK — must read latest-commit ISO date, cite remediation

Plus a disclosure-survives-to-narrative invariant: skip-path verdicts must be named in prose so the retro output carries the cited reason rather than silently misreporting.

Failing build if Step 0.5 is removed, branches re-ordered (no-remote no longer wins), or the BLOCK message stops citing today/latest-commit/remediation path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gbrain-sync): configurable timeouts + resume from gbrain checkpoint (#1611)

The memory and code stages hardcoded a 35-min spawn timeout. On brains with ~2000+ staged files, /sync-gbrain --full reliably SIGTERM'd the child at exactly 35 minutes with exit 143. gbrain left ~/.gbrain/import-checkpoint.json pointing at the staging dir, but gstack-memory-ingest's SIGTERM handler unconditionally cleaned the dir up — so the next run found a checkpoint pointing at nothing and restaged from scratch, repeating the SIGTERM forever.

Three changes:

1. Configurable timeouts via env (bounds 60_000ms - 86_400_000ms, default 2_100_000ms = 35min unchanged):
     GSTACK_SYNC_MEMORY_TIMEOUT_MS
     GSTACK_SYNC_CODE_TIMEOUT_MS
   Out-of-range or non-numeric values warn and fall back to the default.

2. SIGTERM in gstack-memory-ingest no longer always cleans up the staging dir. If gbrain has written ~/.gbrain/import-checkpoint.json pointing at the active staging dir, the dir is PRESERVED for next-run resume. Otherwise (no checkpoint pointing here, crash before gbrain ever touched it) it's cleaned up as before.

3. Next /sync-gbrain run detects gbrain's checkpoint via decideResume() in gstack-gbrain-sync.ts:
   - no checkpoint → fresh ingest pass
   - checkpoint + staging ok → set GSTACK_INGEST_RESUME_DIR; child reuses staging dir and skips writeStaged; gbrain import resumes from processedIndex+1
   - checkpoint + staging gone → warn "previous checkpoint stale (staging dir gone), restaging from scratch" and proceed

Reuses gbrain's own checkpoint as the source of truth (D1 — no double-store state). Detect-then-fallback semantics per C1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain-sync): regression for #1611 timeouts + resume

19 tests across three surfaces:

- resolveStageTimeoutMs (10 tests): undefined/empty → default; non-numeric, zero, negative, below-floor, above-ceiling → warn + default; at-floor, at-ceiling, valid mid-range → accepted as-is.
- decideResume (6 tests): no checkpoint, corrupt JSON, checkpoint + staging ok, checkpoint + staging missing, checkpoint with no dir, checkpoint with empty dir.
- SIGTERM staging preservation (3 static invariants): memory-ingest signal handler must check stagingDirIsCheckpointed BEFORE cleanup; preserve branch must come before cleanup branch (ordering); orchestrator must pass GSTACK_INGEST_RESUME_DIR to the grandchild on resume.

Also threads process.env.HOME through readGbrainCheckpoint and stagingDirIsCheckpointed so tests can redirect home. os.homedir() caches at process start and ignores later mutation, so the env override is the only reliable test injection point.

Failing build if the timeout bounds are removed, the resume detection short-circuits incorrectly, or the SIGTERM handler regresses to unconditional cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): pre-emit verification gate kills Django-shape FP class (#1539)

External user filed 4/8 false positives on a /review run against a Django + DRF + PostgreSQL repo (Sprint 2.5). Every FP class was the same shape: "resolvable in <5 minutes by viewing the actual code or running a simple grep" — fields that don't exist on the model, dict.get()-might-be-None on a form that returns {}-initialized cleaned_data, standard ORM save behavior called out as data loss.

Extends the Confidence Calibration resolver (consumed by review, cso, plan-eng-review, ship) with a Pre-emit verification gate:

Every finding MUST quote the specific code line that motivates it (file:line + verbatim text). If the reviewer cannot produce the quote, the finding is unverified — its confidence is forced to 4-5 so the existing "Suppress from main report" rule fires automatically. The finding still goes to the appendix for calibration audit, but the user does not see it in the critical-pass output.

Reuses the existing suppression mechanism — no new code path. The FP classes the gate kills are enumerated in the resolver text so reviewers see the named patterns.

Framework-meta nudge included for Django Meta, Rails associations, SQLAlchemy relationships, TypeORM decorators, Sequelize init, Prisma generated client — the reviewer must quote the meta-construct that generates the symbol, not just grep for the literal name.

Deeper framework-aware ORM verification (model introspection, migration-history-aware checks) is deliberately deferred to a future wave per T-Codex-2.

Atomic .tmpl-equivalent (resolver) edit + gen:skill-docs regen commit per T-Codex-3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(review): regression for #1539 pre-emit verification gate

12 tests pinning the gate behavior:

- Resolver emits the gate header + #1539 reference
- Gate requires quoting file:line + verbatim text
- Unverified findings forced to confidence 4-5 (auto-suppress via existing <7-rule, no new mechanism)
- Framework-meta nudge names Django, Rails, SQLAlchemy, TypeORM, Sequelize, Prisma
- Deferred design doc reference present (1539-framework-aware-review.md)
- Four named FP classes from #1539 enumerated:
    * field doesn't exist on model
    * dict.get() might be None
    * save() might lose fields
    * update_fields might miss X
- All four downstream SKILL.md consumers (review, cso, plan-eng-review, ship) carry the gate text after gen:skill-docs
- Existing confidence 9-10 'Show normally' + 3-4 'Suppress' rows unchanged (regression on existing behavior)

Failing build if the gate is removed, the suppression mechanism is re-invented separately, the framework-meta nudge drops a framework, or gen:skill-docs stops propagating the gate to consumers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(config): expose explain_level default

* fix(benchmark): parse positional prompt after flags

* fix(artifacts): reject malformed remote paths

* fix(learnings): preserve current entries in cross-project search

* fix(setup): register root gstack slash alias

* fix(memory): probe gitleaks without shell builtin

* fix(gbrain-lib): pin LC_ALL=C in varname validator (macOS locale guard)

In many macOS shells the default locale (e.g. en_US.UTF-8) makes bash glob brackets like `[A-Z]` match lowercase letters too, so the existing `case "$name" in [A-Z_][A-Z0-9_]*)` branch lets names like `lower-case` through validation. The function then trips `printf -v "$varname"` and `export "$varname"` with `not a valid identifier` errors that surface mid-prompt, which is exactly what the validator was supposed to prevent.

Pinning `LC_ALL=C` inside the function gives ASCII-only bracket semantics on both macOS and Linux, matching the documented `[A-Z_][A-Z0-9_]*` contract. Declared `local` so it doesn't leak to the calling shell — `gstack-gbrain-lib.sh` is documented as a sourced helper, so a bare assignment would mutate the caller's locale for the rest of the process (silently affecting downstream `sort`, `tr`, locale-aware globs in the same shell, etc.).

The existing regression test `test/gbrain-lib-verify.test.ts:'rejects invalid var names'` already covers the macOS repro shape (passes `lower-case` and expects the validator to reject + emit `invalid var name`). On Linux CI the test silently passed because `LC_ALL=C` is the typical default; on macOS dev boxes it fails.

Verified:
- `bun test test/gbrain-lib-verify.test.ts`: 22 pass, 0 fail (on macOS).
- `_gstack_gbrain_validate_varname lower-case; echo $?` → 2.
- `_gstack_gbrain_validate_varname FOO_BAR; echo $?` → 0.
- Caller's LC_ALL preserved across calls (confirmed via sourced bash).

* fix(land-and-deploy): detect merged PR after gh failure

After `gh pr merge` exits non-zero, the PR may already be MERGED server-side (concurrent merge landed, or local cleanup phase failed AFTER the merge succeeded). Calling `gh pr merge` a second time then errors with a confusing "already merged" — and worse, the deploy workflow never runs because we stopped on the first failure.

Adds a Post-failure PR-state check (§4a-postfail) that runs after ANY non-zero exit from `gh pr merge`:

- state == MERGED → record MERGE_PATH=direct, OFFER (don't force) stale-worktree cleanup on the base branch with uncommitted-work guard, proceed to §4a CI watch
- state == OPEN → check autoMergeRequest; if non-null treat as merge-queue wait; if null surface both errors and STOP
- state == CLOSED → STOP

Hard invariant: never retry `gh pr merge` after a non-zero exit. Server state is authoritative.

Re-authored from PR #1620 into land-and-deploy/SKILL.md.tmpl (the source of truth) instead of the generated SKILL.md, so the next gen:skill-docs run preserves the change. Original diff by @davidfoy via #1620. Related: cli/cli#3442, cli/cli#13380.

Contributed by @davidfoy via #1620.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: detect PgBouncer transaction-mode pooler and set GBRAIN_PREPARE=true (#1435)

When gbrain connects through a PgBouncer transaction-mode pooler (port 6543), it auto-disables prepared statements. This breaks `gbrain search` silently — the /sync-gbrain capability check fails and the GBrain Search Guidance block never gets written to CLAUDE.md.

Three-layer fix:

1. **lib/gbrain-exec.ts** — `buildGbrainEnv()` now detects port 6543 in the effective DATABASE_URL and sets `GBRAIN_PREPARE=true` in the env passed to every gbrain spawn. This is the single chokepoint — all gstack gbrain invocations inherit the fix. Caller can opt out with `GBRAIN_PREPARE=false`.

2. **sync-gbrain/SKILL.md{,.tmpl}** — capability check now exports `GBRAIN_PREPARE=true` explicitly and retries search up to 3x with 1s delay for async index propagation under connection pooling.

3. **bin/gstack-gbrain-detect** — surfaces `gbrain_pooler_mode` field ("transaction" | "session" | null) in the preamble probe JSON so /setup-gbrain and /sync-gbrain can advise users about pooler state.

Closes #1435

Built with [ClosedLoop.AI](https://closedloop.ai) | [GitHub](https://github.com/closedloop-ai/claude-plugins)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(supabase-provision): rewrite transaction/6543 -> session/5432 for new projects

- Single-object pooler API responses default to transaction-mode at 6543, but the shared pooler tenant on new projects only listens on session/5432
- Add a `pool_mode == transaction && db_port == 6543` rewrite + stderr note
- Escape hatch via `GSTACK_SUPABASE_TRUST_API_PORT=1` for forward-compat
- 5 new tests covering rewrite, no-op shapes, env opt-out, array path

Fixes #1301.

* fix(browse): GSTACK_CHROMIUM_NO_SANDBOX opt-out for Ubuntu/AppArmor (#1562)

Ubuntu/AppArmor configurations often block unprivileged Chromium sandboxing for headless agent sessions even for normal users — /qa hangs without --no-sandbox. The kernel policy denies the unprivileged user namespaces Chromium needs. Adds GSTACK_CHROMIUM_NO_SANDBOX=1 as an explicit user override that forces the sandbox off without changing the default for everyone else.

Re-authored from PR #1562 onto v1.42.2.0's shouldEnableChromiumSandbox() helper — purely additive, preserves the headed-launch sandbox-on-by-default behavior that v1.42.2.0 shipped to kill the --no-sandbox yellow infobar.

Three new regression tests cover:
- linux + override=1 → false (the named use case)
- darwin + override=1 → false (env wins on any platform)
- override=0 → does NOT trigger (must be exactly "1")

Original diff by @techcenter68 via #1562.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): mirror isCustomChromium() guard in headless launch()

When BROWSE_EXTENSIONS_DIR is set alongside GSTACK_CHROMIUM_PATH pointing at a baked-extension build (GBrowser / GStack Browser), the headless launch() path was unconditionally adding --disable-extensions-except / --load-extension. This causes the same ServiceWorkerState::SetWorkerId DCHECK crash that launchHeaded() already guards against via isCustomChromium().

Mirror the existing guard: skip --load-extension flags when isCustomChromium() returns true; always push the off-screen window geometry args.

* fix(browse): daemonize macOS/Linux server via setsid()

`Bun.spawn().unref()` only releases the child from Bun's event loop — it does NOT call setsid(). The spawned bun server inherits the spawning shell's process session. When the CLI runs inside a session-managed shell that exits shortly after the CLI returns (Claude Code's per-command Bash sandbox, Conductor, OpenClaw, CI step runners), the session leader's exit sends SIGHUP to every PID in the session — killing the bun server and its Chromium grandchildren within seconds of a successful `connect`.

Setting `BROWSE_PARENT_PID=0` (already done by the `connect` command and pair-agent) disables the parent-process watchdog but does NOT save the server here: SIGHUP from session teardown still reaps it.

Replace the macOS/Linux `Bun.spawn().unref()` with Node's `child_process.spawn({ detached: true })`, which calls setsid() and gives the server its own session leader role (PPID=1, STAT=Ss). This mirrors the Windows path's rationale (PR #191 by @fqueiro) — same root cause, different OS surface.

Verified on macOS in Conductor: pre-fix the server dies ~10–15s after connect across separate Bash invocations; post-fix the same PID stays alive (PPID=1, SESS=0, STAT=Ss) and responds to `status`/`goto`/`snapshot` across many separate shell calls.

The `proc?.stderr` startup-error branch is removed since both platforms now spawn with `stdio: 'ignore'`; both fall through to the on-disk `browse-startup-error.log` written by `server.ts`'s start().catch.

* fix(design): bump image-gen timeout to 240s + pin gpt-image-2

The design binary calls /v1/responses (gpt-4o + image_generation tool, quality:high, 1536x1024) but aborted the request after a hardcoded 120s. That class of request consistently takes ~140-160s end-to-end, so every generate/variants/evolve/iterate call aborted before the image returned.

In /design-shotgun this cascades: Step 3c launches N parallel agents, each calling `$D generate`, each aborts at 120s and retries, all fail, the comparison board never opens — the skill appears to hang indefinitely.

Reproduced the exact API call with a longer budget: HTTP 200, valid image, 143.5s. A real /design-shotgun run after the patch generated 3 variants in parallel at 150.0s / 161.0s / 152.1s, all exit 0 — note the 161s case, which a naive 150s bump would still have failed.

- Bump AbortController timeout 120_000 -> 240_000 in generate.ts, variants.ts, evolve.ts, iterate.ts (both call sites)
- Pin the image_generation tool to model "gpt-image-2"

design/test/variants-retry-after.test.ts: 5 pass, 0 fail. The feedback-roundtrip.test.ts failures are a pre-existing browse-module breakage (session.clearLoadedHtml undefined), unrelated to this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: fill coverage gaps for PRs #1606, #1612, #1620

Three cherry-picked PRs in this wave landed without unit-test coverage for the specific invariant they protect:

#1606 (@andrey-esipov) — LC_ALL=C pin in _gstack_gbrain_validate_varname
8 tests by sourcing bin/gstack-gbrain-lib.sh and calling the validator directly. Asserts uppercase/digit/underscore accepted, lowercase REJECTED (the macOS-locale regression case), mixed-case rejected, LC_ALL=C scoping is local (doesn't leak to caller).

#1612 (@bharat2913) — setsid daemonize via Node child_process.spawn
4 static-invariant tests on browse/src/cli.ts. The actual setsid syscall is hard to assert without a real spawn, so we pin the source shape: nodeSpawn imported from child_process; non-Windows branch uses nodeSpawn(...) with detached:true and .unref(); comment documents setsid/SIGHUP root cause; Bun.spawn() is NOT used on macOS/Linux.

#1620 (@davidfoy, re-authored into .tmpl per A3) — §4a-postfail
12 static invariants on land-and-deploy/SKILL.md.tmpl + generated SKILL.md. Pins all three state branches (MERGED/OPEN/CLOSED), the authoritative state query, the merge-SHA capture, non-destructive worktree cleanup with uncommitted-work guard, autoMergeRequest probe on OPEN, hard "never retry gh pr merge" rule, and atomic regen propagation.

Failing build if any of the three invariants regresses.

Note: gbrain-lib-validate-varname.test.ts also surfaces a pre-existing glob-pattern overpermissiveness (hyphens + dots accepted) — not in #1606's scope; documented inline as a separate cleanup target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(learnings): align injection-prevention tests with PR #1619 tagged-line shape

PR #1619 (preserve current entries in cross-project search) refactored gstack-learnings-search to tag rows inline (`current\t<json>` vs `cross\t<json>`) instead of filtering inside the bun block via process.env.GSTACK_SEARCH_SLUG. The bun block no longer reads SLUG or CROSS env vars — it parses the per-line tag and sets a per-entry _crossProject flag.

The pre-existing test/learnings-injection.test.ts still asserted on the old SLUG + CROSS env var shape.

Updates:
- Remove the SLUG env var assertion (no longer set on bash command line)
- Remove the bun-block CROSS env var assertion (block reads the tag now, not the env)
- Add a new positive assertion that the bun block parses the tag (sourceTag | tabIndex | crossProject)
- Keep the shell-interpolation safety assertion unchanged — that's independent of the SLUG refactor

The CROSS env var is still SET on the bash command line (it controls whether the cross-project find runs at all), but the bun child no longer reads it. The existing "env vars set on bash command line" test continues to pin that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(fixtures): regenerate ship-SKILL.md golden baselines

ship/SKILL.md consumes the Confidence Calibration resolver via the preamble pipeline. This wave's #1539 pre-emit verification gate extends the resolver text, which propagated to ship/SKILL.md via gen:skill-docs. The golden fixtures in test/fixtures/golden/ matched the pre-#1539 shape and failed the host-config regression check.

Refreshes claude-ship-SKILL.md, codex-ship-SKILL.md, and factory-ship-SKILL.md to match the current generated output. Matches the Daegu wave's bisect commit 23 ("test(fixtures): regenerate ship-SKILL.md golden baselines").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain-detect): include gbrain_pooler_mode in schema regression (PR #1591)

PR #1591 (PgBouncer transaction-mode detection, @mikeangstadt) added gbrain_pooler_mode to the gstack-gbrain-detect JSON output but did not update the schema regression check in test/gstack-gbrain-detect-mcp-mode.test.ts.

Adding the key in alphabetical order matching the rest of the schema array. Downstream sync-gbrain ignores unknown keys, so this is forward-compat.

Without this, the test fails with a diff:
  + "gbrain_pooler_mode"
because keys is the actual set returned and the expected array was pre-#1591.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v1.43.0.0 — post-Daegu paper-cut wave

Bumps VERSION 1.42.2.0 → 1.43.0.0 (MINOR per scale-aware bump rules: new env-var surface GSTACK_SYNC_*_TIMEOUT_MS + GSTACK_CHROMIUM_NO_SANDBOX, behavior expansion in browse/src/browser-manager.ts headless launch, three skill-template prompt changes affecting /retro, /review, /sync-gbrain).

CHANGELOG entry leads with what stopped happening: /retro stops fabricating retros against stale bases, /sync-gbrain stops SIGTERM-looping 35-min restarts on big brains, /review stops shipping framework FPs the reviewer never grep'd.

18 fixes total — 15 community PRs + 3 self-filed silent-failure issues (#1624, #1611, #1539) — in one bundled PR with 26 bisect commits and 7 new regression test files. Every wave-touched test file passes in isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): bump v1.43.0.0 → v1.43.2.0 for queue collision

CI check-version-stale flagged v1.43.0.0 already claimed by PR #1574 (garrytan/colombo-v3). PR #1639 (garrytan/muscat-v3) claims v1.43.1.0. Next available MINOR slot is v1.43.2.0.

Bump VERSION + package.json + CHANGELOG entry header. No behavior changes — purely re-versioning to clear the queue collision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com>
Co-authored-by: Andrey Esipov <andrey.esipov@outlook.com>
Co-authored-by: David Foy <davidfoy@users.noreply.github.com>
Co-authored-by: mikeangstadt <mike.angstadt@closedloop.ai>
Co-authored-by: 0xDevNinja <manmit0x@gmail.com>
Co-authored-by: techcenter68 <techcenter68@users.noreply.github.com>
Co-authored-by: shohu <shohu33@gmail.com>
Co-authored-by: Bharat <bharat@theysaid.io>
Co-authored-by: Matteo Hertel <info@matteohertel.com>
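
The timeout and resume rules described for #1611 can be sketched as two pure functions. This is a minimal TypeScript sketch written from the commit message alone, not the code in bin/gstack-gbrain-sync.ts. The function names follow the ones named in the regression tests, but the signatures, the `dir` field of the checkpoint, the `ResumeDecision` type, and the injected `warn` and `dirExists` callbacks are assumptions made for illustration. Treating a checkpoint with a missing or empty `dir` as a fresh run is also an assumption.

```typescript
// Sketch only: shapes and signatures are assumptions based on the commit
// message, not the real bin/gstack-gbrain-sync.ts.
const DEFAULT_TIMEOUT_MS = 2_100_000; // 35 minutes
const MIN_TIMEOUT_MS = 60_000; // floor from the commit message
const MAX_TIMEOUT_MS = 86_400_000; // ceiling from the commit message

function resolveStageTimeoutMs(
  raw: string | undefined,
  warn: (msg: string) => void = () => {},
): number {
  // Unset or empty means "use the default" and produces no warning.
  if (raw === undefined || raw.trim() === "") return DEFAULT_TIMEOUT_MS;
  const n = Number(raw);
  // Non-numeric, zero, negative, and out-of-range values warn and fall back.
  if (!Number.isFinite(n) || n < MIN_TIMEOUT_MS || n > MAX_TIMEOUT_MS) {
    warn(`invalid timeout "${raw}", falling back to ${DEFAULT_TIMEOUT_MS}ms`);
    return DEFAULT_TIMEOUT_MS;
  }
  return n;
}

type ResumeDecision =
  | { kind: "fresh" }
  | { kind: "resume"; stagingDir: string }
  | { kind: "stale"; stagingDir: string };

function decideResume(
  checkpointJson: string | null,
  dirExists: (path: string) => boolean,
): ResumeDecision {
  if (checkpointJson === null) return { kind: "fresh" };
  let dir: unknown;
  try {
    dir = (JSON.parse(checkpointJson) as { dir?: unknown } | null)?.dir;
  } catch {
    return { kind: "fresh" }; // corrupt checkpoint is ignored
  }
  // Assumption: a checkpoint without a usable dir behaves like no checkpoint.
  if (typeof dir !== "string" || dir === "") return { kind: "fresh" };
  return dirExists(dir)
    ? { kind: "resume", stagingDir: dir }
    : { kind: "stale", stagingDir: dir };
}
```

In this sketch the orchestrator would set GSTACK_INGEST_RESUME_DIR only for the `resume` case and print the "previous checkpoint stale" warning for the `stale` case before restaging.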
1 parent 65972f6 commit 66f3a18

55 files changed

Lines changed: 2316 additions & 131 deletions

CHANGELOG.md

Lines changed: 93 additions & 0 deletions
Large diffs are not rendered by default.

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-1.43.1.0
+1.43.2.0

bin/gstack-artifacts-url

Lines changed: 14 additions & 1 deletion
@@ -49,6 +49,19 @@ strip_git() {
   echo "${1%.git}"
 }

+valid_owner_repo() {
+  local owner_repo="$1"
+  case "$owner_repo" in
+    ""|/*|*/|*//*)
+      return 1
+      ;;
+  esac
+  case "$owner_repo" in
+    */*) return 0 ;;
+    *) return 1 ;;
+  esac
+}
+
 # Parse to (host, owner_repo) regardless of input shape.
 parse_url() {
   local u="$1"
@@ -82,7 +95,7 @@ parse_url() {
       exit 3
       ;;
   esac
-  if [ -z "$host" ] || [ -z "$owner_repo" ] || [ "$owner_repo" = "$u" ]; then
+  if [ -z "$host" ] || ! valid_owner_repo "$owner_repo"; then
     echo "gstack-artifacts-url: failed to parse host/owner from: $u" >&2
     exit 3
   fi
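
The shape check that `valid_owner_repo` adds can be restated outside the shell. The following is an illustrative TypeScript port of the two `case` statements above, not code from this commit, and the name `isValidOwnerRepo` is hypothetical. It rejects an empty string, a leading or trailing slash, and a doubled slash, then requires at least one slash.

```typescript
// Illustrative port of the shell function valid_owner_repo (hypothetical name).
function isValidOwnerRepo(ownerRepo: string): boolean {
  // First case statement: empty, leading "/", trailing "/", or "//" anywhere.
  if (
    ownerRepo === "" ||
    ownerRepo.startsWith("/") ||
    ownerRepo.endsWith("/") ||
    ownerRepo.includes("//")
  ) {
    return false;
  }
  // Second case statement: must contain an owner/repo separator.
  return ownerRepo.includes("/");
}
```

Like the shell version, this accepts deeper paths such as `group/subgroup/repo`, since the only requirement is one or more well-formed separators.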

bin/gstack-config

Lines changed: 5 additions & 4 deletions
@@ -100,6 +100,7 @@ lookup_default() {
     skill_prefix) echo "false" ;;
     checkpoint_mode) echo "explicit" ;;
     checkpoint_push) echo "false" ;;
+    explain_level) echo "default" ;;
     codex_reviews) echo "enabled" ;;
     gstack_contributor) echo "false" ;;
     skip_eng_review) echo "false" ;;
@@ -169,8 +170,8 @@ case "${1:-}" in
     echo ""
     echo "# ─── Active values (including defaults for unset keys) ───"
     for KEY in proactive routing_declined telemetry auto_upgrade update_check \
-               skill_prefix checkpoint_mode checkpoint_push codex_reviews \
-               gstack_contributor skip_eng_review workspace_root \
+               skill_prefix checkpoint_mode checkpoint_push explain_level \
+               codex_reviews gstack_contributor skip_eng_review workspace_root \
                artifacts_sync_mode artifacts_sync_mode_prompted; do
       VALUE=$(grep -E "^${KEY}:" "$CONFIG_FILE" 2>/dev/null | tail -1 | awk '{print $2}' | tr -d '[:space:]' || true)
       SOURCE="default"
@@ -185,8 +186,8 @@ case "${1:-}" in
   defaults)
     echo "# gstack-config defaults"
     for KEY in proactive routing_declined telemetry auto_upgrade update_check \
-               skill_prefix checkpoint_mode checkpoint_push codex_reviews \
-               gstack_contributor skip_eng_review workspace_root \
+               skill_prefix checkpoint_mode checkpoint_push explain_level \
+               codex_reviews gstack_contributor skip_eng_review workspace_root \
                artifacts_sync_mode artifacts_sync_mode_prompted; do
       printf '  %-24s %s\n' "$KEY:" "$(lookup_default "$KEY")"
     done

bin/gstack-gbrain-detect

Lines changed: 15 additions & 1 deletion
@@ -18,7 +18,8 @@
  *   "gstack_brain_sync_mode": "off"|"artifacts-only"|"full",
  *   "gstack_brain_git": true|false,
  *   "gstack_artifacts_remote": "https://..." | "",
- *   "gbrain_local_status": "ok"|"no-cli"|"missing-config"|"broken-config"|"broken-db"
+ *   "gbrain_local_status": "ok"|"no-cli"|"missing-config"|"broken-config"|"broken-db",
+ *   "gbrain_pooler_mode": "transaction"|"session"|null
  * }
  *
  * Backward compatibility (per plan codex #5): the 9 pre-existing fields stay
@@ -42,6 +43,7 @@ import {
   resolveGbrainBin,
   readGbrainVersion,
 } from "../lib/gbrain-local-status";
+import { isTransactionModePooler } from "../lib/gbrain-exec";

 const STATE_DIR = process.env.GSTACK_HOME || join(userHome(), ".gstack");
 const SCRIPT_DIR = __dirname;
@@ -98,6 +100,17 @@ function detectConfig(): { exists: boolean; engine: "pglite" | "postgres" | null
   return { exists: true, engine: null };
 }

+// --- pooler mode detection (#1435) ---
+//
+// Reads DATABASE_URL from ~/.gbrain/config.json and checks whether it targets
+// a PgBouncer transaction-mode pooler (port 6543). Surfaced so /sync-gbrain
+// and /setup-gbrain can advise users when search may require GBRAIN_PREPARE.
+function detectPoolerMode(): "transaction" | "session" | "unknown" | null {
+  const parsed = tryReadJSON(GBRAIN_CONFIG) as { database_url?: string } | null;
+  if (!parsed?.database_url) return null;
+  return isTransactionModePooler(parsed.database_url) ? "transaction" : "session";
+}
+
 // --- gbrain doctor health (any nonzero exit or non-"ok"/"warnings" status → false) ---
 //
 // Uses --fast to avoid hanging on a dead DB. Per the local-status classifier
@@ -215,6 +228,7 @@ function main(): void {
     gstack_brain_git: detectBrainGit(),
     gstack_artifacts_remote: detectArtifactsRemote(),
     gbrain_local_status: localEngineStatus({ noCache }),
+    gbrain_pooler_mode: detectPoolerMode(),
   };

   process.stdout.write(JSON.stringify(out, null, 2) + "\n");
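
The body of `isTransactionModePooler` lives in lib/gbrain-exec and is not part of the diff shown here. Going only by the commit message ("detects port 6543 in the effective DATABASE_URL"), one plausible shape is sketched below. The function names carry a `Sketch` suffix or are otherwise hypothetical, the URL-parsing approach is an assumption, and treating any caller-supplied `GBRAIN_PREPARE` value as an explicit choice that wins is one reading of "caller can opt out", not confirmed behavior.

```typescript
// Hypothetical sketch; the real helper in lib/gbrain-exec may differ.
function isTransactionModePoolerSketch(databaseUrl: string): boolean {
  try {
    // The WHATWG URL parser exposes the port of postgresql:// URLs as a string.
    return new URL(databaseUrl).port === "6543";
  } catch {
    return false; // unparseable URL: treat as not a transaction pooler
  }
}

type Env = Record<string, string | undefined>;

// Assumed behavior of the env chokepoint: add GBRAIN_PREPARE=true for
// transaction-mode poolers unless the caller already set the variable.
function withPrepareFlag(env: Env, databaseUrl: string | undefined): Env {
  if (!databaseUrl || !isTransactionModePoolerSketch(databaseUrl)) return env;
  if (env.GBRAIN_PREPARE !== undefined) return env; // caller's choice wins
  return { ...env, GBRAIN_PREPARE: "true" };
}
```

Keying on the port alone is a heuristic: a session-mode pooler on a non-default port, or a transaction-mode pooler on a port other than 6543, would be misclassified by a check of this shape.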

bin/gstack-gbrain-lib.sh

Lines changed: 14 additions & 0 deletions
@@ -27,8 +27,22 @@
 # restore), D16 (pooler URL paste hygiene with redacted preview).

 # _gstack_gbrain_validate_varname <name> — returns 0 if usable, 2 otherwise.
+# `local LC_ALL=C` is load-bearing twice over:
+#   1. In many macOS shells the default locale (e.g. en_US.UTF-8) makes `case`
+#      glob brackets like `[A-Z]` match lowercase letters too. Without the
+#      LC_ALL=C pin, names like `lower-case` pass validation and then trip
+#      `printf -v "$varname"` and `export "$varname"` with "not a valid
+#      identifier" errors the caller can't easily distinguish from other
+#      failures.
+#   2. `local` is required because this file is documented as a sourced helper
+#      (see header), so a bare `LC_ALL=C` would mutate the caller's locale for
+#      the rest of the process — silently affecting downstream `sort`, `tr`,
+#      and any locale-aware glob in the same shell.
+# Together they give ASCII-only bracket semantics on both macOS and Linux
+# (matching the documented `[A-Z_][A-Z0-9_]*` contract) without leaking.
 _gstack_gbrain_validate_varname() {
   local name="$1"
+  local LC_ALL=C
   case "$name" in
     [A-Z_][A-Z0-9_]*) return 0 ;;
     *) return 2 ;;
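
The pattern in the `case` branch is a shell glob, not a regular expression: two bracket expressions followed by `*`, which matches any remaining characters. That is why the commit message separately notes that hyphens and dots are still accepted after the first two characters. The TypeScript sketch below contrasts the glob's behavior under `LC_ALL=C` with an anchored identifier regex. Both constants and both function names are illustrative and not part of this commit; the stricter check is shown only to make the difference concrete.

```typescript
// Regex equivalent of the glob `[A-Z_][A-Z0-9_]*` with ASCII brackets:
// one char from [A-Z_], one from [A-Z0-9_], then anything at all.
const GLOB_EQUIVALENT = /^[A-Z_][A-Z0-9_][\s\S]*$/;

// Hypothetical stricter check: every character must be in the identifier set.
const STRICT_IDENTIFIER = /^[A-Z_][A-Z0-9_]*$/;

function globAccepts(name: string): boolean {
  return GLOB_EQUIVALENT.test(name);
}

function strictAccepts(name: string): boolean {
  return STRICT_IDENTIFIER.test(name);
}
```

JavaScript character ranges compare code points, which matches the ASCII-only bracket behavior that pinning `LC_ALL=C` restores in bash.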

bin/gstack-gbrain-supabase-provision

Lines changed: 17 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -339,7 +339,7 @@ cmd_pooler_url() {
339339
# Prefer the singular Session Pooler config when Supabase returns an
340340
# array (response shape can vary by project state). Fall back to the
341341
# first PRIMARY entry if no "session" pool_mode is present.
342-
local db_user db_host db_port db_name
342+
local db_user db_host db_port db_name pool_mode
343343
local first_or_session
344344
if printf '%s' "$resp" | jq -e 'type == "array"' >/dev/null 2>&1; then
345345
first_or_session=$(printf '%s' "$resp" | jq '[.[] | select(.pool_mode == "session")][0] // .[0]')
@@ -351,11 +351,27 @@ cmd_pooler_url() {
   db_host=$(printf '%s' "$first_or_session" | jq -r '.db_host // empty')
   db_port=$(printf '%s' "$first_or_session" | jq -r '.db_port // empty')
   db_name=$(printf '%s' "$first_or_session" | jq -r '.db_name // empty')
+  pool_mode=$(printf '%s' "$first_or_session" | jq -r '.pool_mode // empty')

   if [ -z "$db_user" ] || [ -z "$db_host" ] || [ -z "$db_port" ] || [ -z "$db_name" ]; then
     die "pooler-url: missing pooler config fields (db_user/db_host/db_port/db_name); re-poll or check project state"
   fi

+  # Issue #1301: New Supabase projects' Management API returns a single
+  # transaction-mode pooler at port 6543, but the shared pooler tenant
+  # for fresh projects only listens on the session port 5432. Trusting
+  # db_port verbatim makes `gbrain init` hang to TCP timeout (transaction
+  # port unreachable) before falling into "tenant not found"-style errors
+  # that look like auth bugs. Rewrite transaction/6543 -> session/5432.
+  # Override with GSTACK_SUPABASE_TRUST_API_PORT=1 if a future API version
+  # starts returning a working transaction port and this rewrite is wrong.
+  if [ "${GSTACK_SUPABASE_TRUST_API_PORT:-0}" != "1" ] \
+    && [ "$pool_mode" = "transaction" ] && [ "$db_port" = "6543" ]; then
+    echo "pooler-url: API returned transaction pooler (port 6543); shared pooler for new projects listens on session port 5432 — rewriting (set GSTACK_SUPABASE_TRUST_API_PORT=1 to disable)" >&2
+    db_port=5432
+    pool_mode="session"
+  fi
+
   local url="postgresql://${db_user}:${DB_PASS}@${db_host}:${db_port}/${db_name}"

   if $json_mode; then
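
The rewrite rule in this hunk is narrow: only the exact pair transaction/6543 is changed, and only when the override variable is not set to `1`. A minimal TypeScript mirror of that decision, written as a pure function so the rule is easy to test, might look like the following. The interface and function names are hypothetical; the real logic lives in the shell script above.

```typescript
// Hypothetical TypeScript mirror of the shell rewrite; names are illustrative only.
interface PoolerConfig {
  db_host: string;
  db_port: string;
  pool_mode: string;
}

// Mirrors the `${GSTACK_SUPABASE_TRUST_API_PORT:-0}` check: only the literal "1" opts out.
function shouldTrustApiPort(env: Record<string, string | undefined>): boolean {
  return env.GSTACK_SUPABASE_TRUST_API_PORT === "1";
}

function normalizePooler(cfg: PoolerConfig, trustApiPort: boolean): PoolerConfig {
  // Only the exact transaction/6543 pair is rewritten; everything else passes through.
  if (!trustApiPort && cfg.pool_mode === "transaction" && cfg.db_port === "6543") {
    return { ...cfg, db_port: "5432", pool_mode: "session" };
  }
  return cfg;
}
```

Keeping the condition this specific means a session-mode entry, or a transaction entry on any other port, is passed through unchanged.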

bin/gstack-gbrain-sync.ts

Lines changed: 172 additions & 11 deletions
@@ -80,6 +80,115 @@ const STATE_PATH = join(GSTACK_HOME, ".gbrain-sync-state.json");
 const LOCK_PATH = join(GSTACK_HOME, ".sync-gbrain.lock");
 const STALE_LOCK_MS = 5 * 60 * 1000;

+// Default 35-minute timeout for code-walk + memory-ingest stages. Override via
+// GSTACK_SYNC_CODE_TIMEOUT_MS / GSTACK_SYNC_MEMORY_TIMEOUT_MS. Bounds-checked
+// in resolveStageTimeoutMs below so wildly-low values don't make resume
+// useless and wildly-high values don't mask config typos. See #1611.
+const DEFAULT_STAGE_TIMEOUT_MS = 35 * 60 * 1000; // 2_100_000ms = 35min
+const MIN_STAGE_TIMEOUT_MS = 60_000; // 1 minute floor
+const MAX_STAGE_TIMEOUT_MS = 86_400_000; // 24 hour ceiling
+
+/**
+ * Parse a stage-timeout env value with bounds validation. Returns the bounded
+ * value or the default with a stderr warning if the env was malformed or
+ * out-of-range. Exported for the regression test.
+ */
+export function resolveStageTimeoutMs(
+  envValue: string | undefined,
+  envName: string,
+): number {
+  if (envValue === undefined || envValue === "") return DEFAULT_STAGE_TIMEOUT_MS;
+  const n = Number.parseInt(envValue, 10);
+  if (!Number.isFinite(n) || Number.isNaN(n) || n <= 0) {
+    console.warn(
+      `[sync] ${envName}="${envValue}" is not a positive integer; falling back to ${DEFAULT_STAGE_TIMEOUT_MS}ms`,
+    );
+    return DEFAULT_STAGE_TIMEOUT_MS;
+  }
+  if (n < MIN_STAGE_TIMEOUT_MS) {
+    console.warn(
+      `[sync] ${envName}=${n} is below the ${MIN_STAGE_TIMEOUT_MS}ms (1min) floor; falling back to ${DEFAULT_STAGE_TIMEOUT_MS}ms`,
+    );
+    return DEFAULT_STAGE_TIMEOUT_MS;
+  }
+  if (n > MAX_STAGE_TIMEOUT_MS) {
+    console.warn(
+      `[sync] ${envName}=${n} is above the ${MAX_STAGE_TIMEOUT_MS}ms (24h) ceiling; falling back to ${DEFAULT_STAGE_TIMEOUT_MS}ms`,
+    );
+    return DEFAULT_STAGE_TIMEOUT_MS;
+  }
+  return n;
+}
+
+/**
+ * gbrain writes ~/.gbrain/import-checkpoint.json on every import run. If a
+ * previous /sync-gbrain hit the timeout (SIGTERM = exit 143), the checkpoint
+ * + its staging dir survive on disk. Detect both and let gbrain resume from
+ * processedIndex+1 on the next run. If the staging dir is missing/empty/
+ * unreadable, fall through to a fresh restage with a one-line warning so the
+ * user sees we noticed. See #1611 + plan D1/C1.
+ */
+interface GbrainCheckpoint {
+  dir?: string;
+  totalFiles?: number;
+  processedIndex?: number;
+  completedFiles?: number;
+  timestamp?: string;
+}
+
+export function readGbrainCheckpoint(): GbrainCheckpoint | null {
+  // Read HOME from env so tests can redirect via process.env.HOME = ...
+  // (Node/Bun's os.homedir() caches at process start and ignores later
+  // mutations.)
+  const home = process.env.HOME || homedir();
+  const cpPath = join(home, ".gbrain", "import-checkpoint.json");
+  if (!existsSync(cpPath)) return null;
+  try {
+    const raw = readFileSync(cpPath, "utf-8");
+    const parsed = JSON.parse(raw);
+    if (!parsed || typeof parsed !== "object") return null;
+    return parsed as GbrainCheckpoint;
+  } catch {
+    // Corrupt JSON — treat as no checkpoint and fall through to fresh restage.
+    return null;
+  }
+}
+
+export type ResumeVerdict =
+  | { kind: "no-checkpoint" }
+  | { kind: "resume"; stagingDir: string; processedIndex: number; totalFiles: number }
+  | { kind: "stale-staging-missing"; stagingDir: string };
+
+/**
+ * Decide whether the next memory-ingest run should resume from gbrain's
+ * checkpoint or restage from scratch.
+ * - no checkpoint → run a fresh ingest pass
+ * - checkpoint + staging ok → resume (gbrain picks up at processedIndex+1)
+ * - checkpoint + staging gone → warn, fall through to fresh restage
+ */
+export function decideResume(): ResumeVerdict {
+  const cp = readGbrainCheckpoint();
+  if (!cp || !cp.dir) return { kind: "no-checkpoint" };
+  const stagingDir = cp.dir;
+  if (!existsSync(stagingDir)) {
+    return { kind: "stale-staging-missing", stagingDir };
+  }
+  // Treat "non-empty" as the safe-to-resume signal. statSync on a missing
+  // file throws; we already handled missing above so this is dir-level shape.
+  try {
+    const st = statSync(stagingDir);
+    if (!st.isDirectory()) return { kind: "stale-staging-missing", stagingDir };
+  } catch {
+    return { kind: "stale-staging-missing", stagingDir };
+  }
+  return {
+    kind: "resume",
+    stagingDir,
+    processedIndex: cp.processedIndex ?? 0,
+    totalFiles: cp.totalFiles ?? 0,
+  };
+}
+
 // ── CLI ────────────────────────────────────────────────────────────────────

 function printUsage(): void {
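
The doc comment in this hunk lists a missing, empty, or unreadable staging dir as reasons to restage, while the visible body of `decideResume()` checks only that the path exists and is a directory. The sketch below shows one way the empty-directory case could be folded into the same verdict type. It is a hypothetical variant, not the code from this commit: `decideResumeStrict`, `DirState`, and the injected `probe` callback are all assumptions made for illustration.

```typescript
// Hypothetical variant of decideResume(); not the code from this commit.
type DirState = "missing" | "not-a-directory" | "empty" | "non-empty";

interface Checkpoint {
  dir?: string;
  totalFiles?: number;
  processedIndex?: number;
}

type Verdict =
  | { kind: "no-checkpoint" }
  | { kind: "resume"; stagingDir: string; processedIndex: number; totalFiles: number }
  | { kind: "stale-staging-missing"; stagingDir: string };

// `probe` is injected so the decision logic can be exercised without touching disk.
// A real probe could be built from existsSync, statSync and readdirSync in node:fs.
function decideResumeStrict(
  cp: Checkpoint | null,
  probe: (dir: string) => DirState,
): Verdict {
  if (!cp || !cp.dir) return { kind: "no-checkpoint" };
  const stagingDir = cp.dir;
  // Anything other than a non-empty directory is treated as stale.
  if (probe(stagingDir) !== "non-empty") {
    return { kind: "stale-staging-missing", stagingDir };
  }
  return {
    kind: "resume",
    stagingDir,
    processedIndex: cp.processedIndex ?? 0,
    totalFiles: cp.totalFiles ?? 0,
  };
}
```

Injecting the probe keeps the three-way verdict testable with plain values, which is the same reason the commit exports `decideResume` and reads `HOME` from the environment.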
@@ -596,28 +705,57 @@ async function runCodeImport(args: CliArgs): Promise<StageResult> {
     };
   }

-  // Step 2: Run sync or reindex.
-  const syncArgs = args.mode === "full"
-    ? ["reindex-code", "--source", sourceId, "--yes"]
-    : ["sync", "--strategy", "code", "--source", sourceId];
-
-  const syncResult = spawnGbrain(syncArgs, {
+  // Step 2: Always run the page-creating file walk first, then (for --full)
+  // a full re-embed.
+  //
+  // `gbrain reindex-code` only RE-EMBEDS pages that already exist; it never
+  // walks the filesystem. On a freshly-registered source (0 pages) a --full
+  // run that called reindex-code alone found nothing ("No code pages to
+  // reindex"), finished in ~1s, and left the code index permanently empty
+  // while still reporting OK. The page-creating walk is `sync --strategy
+  // code`, so --full must run it FIRST, then reindex-code, to honor the
+  // documented "full walk + reindex" contract for both fresh and populated
+  // sources.
+  const codeTimeoutMs = resolveStageTimeoutMs(
+    process.env.GSTACK_SYNC_CODE_TIMEOUT_MS,
+    "GSTACK_SYNC_CODE_TIMEOUT_MS",
+  );
+  const walkResult = spawnGbrain(["sync", "--strategy", "code", "--source", sourceId], {
     stdio: args.quiet ? ["ignore", "ignore", "ignore"] : ["ignore", "inherit", "inherit"],
-    timeout: 35 * 60 * 1000,
+    timeout: codeTimeoutMs,
     baseEnv: gbrainEnv,
   });

-  if (syncResult.status !== 0) {
+  if (walkResult.status !== 0) {
     return {
       name: "code",
       ran: true,
       ok: false,
       duration_ms: Date.now() - t0,
-      summary: `gbrain ${syncArgs.join(" ")} exited ${syncResult.status}`,
+      summary: `gbrain sync --strategy code --source ${sourceId} exited ${walkResult.status}`,
       detail: { source_id: sourceId, source_path: root, status: "failed" },
     };
   }

+  if (args.mode === "full") {
+    const reindexResult = spawnGbrain(["reindex-code", "--source", sourceId, "--yes"], {
+      stdio: args.quiet ? ["ignore", "ignore", "ignore"] : ["ignore", "inherit", "inherit"],
+      timeout: codeTimeoutMs,
+      baseEnv: gbrainEnv,
+    });
+
+    if (reindexResult.status !== 0) {
+      return {
+        name: "code",
+        ran: true,
+        ok: false,
+        duration_ms: Date.now() - t0,
+        summary: `gbrain reindex-code --source ${sourceId} exited ${reindexResult.status}`,
+        detail: { source_id: sourceId, source_path: root, status: "failed" },
+      };
+    }
+  }
+
   // Step 3: Pin this worktree's CWD to the source via .gbrain-source. Subsequent
   // gbrain code-def / code-refs / code-callers calls from anywhere under <root>
   // route to this source by default — no --source flag needed.
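
The ordering introduced in this hunk (always walk, then re-embed only for `--full`, stopping at the first non-zero exit) can be expressed as a small step list. The sketch below is a simplified illustration under stated assumptions: `run` is a hypothetical stand-in for `spawnGbrain` that returns an exit status, and the `"incremental"` mode name is assumed, since the diff only shows the `"full"` value.

```typescript
// Hypothetical sketch; `run` stands in for spawnGbrain and returns an exit status.
type Runner = (args: string[]) => number | null;

interface CodeStageOutcome {
  ok: boolean;
  summary: string;
}

function runCodeStage(
  mode: "incremental" | "full", // "incremental" is an assumed name for the non-full mode
  sourceId: string,
  run: Runner,
): CodeStageOutcome {
  // The page-creating walk always runs first so a fresh source gets pages.
  const steps: string[][] = [["sync", "--strategy", "code", "--source", sourceId]];
  // The re-embed only makes sense once pages exist, so it is appended for --full.
  if (mode === "full") steps.push(["reindex-code", "--source", sourceId, "--yes"]);

  for (const args of steps) {
    const status = run(args);
    if (status !== 0) {
      return { ok: false, summary: `gbrain ${args.join(" ")} exited ${status}` };
    }
  }
  return { ok: true, summary: `ran ${steps.length} step(s)` };
}
```

With a runner that records its arguments, a `"full"` call should show the `sync` step before the `reindex-code` step, and a failing walk should prevent the reindex from running at all.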
@@ -745,6 +883,25 @@ function runMemoryIngest(args: CliArgs): StageResult {
     return skipStageForLocalStatus("memory", localStatus, t0);
   }

+  // Resume detection (#1611 / plan D1 + C1). If a previous run hit the
+  // timeout and gbrain left ~/.gbrain/import-checkpoint.json plus its staging
+  // dir on disk, signal the grandchild via env so it skips the prepare phase
+  // and lets `gbrain import` resume from processedIndex+1 against the same
+  // staging dir. If the staging dir is gone (disk pressure cleanup, OS
+  // reboot, user manual cleanup), warn and fall through to a fresh restage.
+  const resume = decideResume();
+  const childEnv = buildGbrainEnv({ announce: false });
+  if (resume.kind === "resume") {
+    console.error(
+      `[sync:memory] resuming from gbrain checkpoint (${resume.processedIndex}/${resume.totalFiles} files staged at ${resume.stagingDir})`,
+    );
+    childEnv.GSTACK_INGEST_RESUME_DIR = resume.stagingDir;
+  } else if (resume.kind === "stale-staging-missing") {
+    console.error(
+      `[sync:memory] previous checkpoint stale (staging dir ${resume.stagingDir} gone), restaging from scratch`,
+    );
+  }
+
   const ingestPath = join(import.meta.dir, "gstack-memory-ingest.ts");
   const ingestArgs = ["run", ingestPath];
   if (args.mode === "full") ingestArgs.push("--bulk");
@@ -755,10 +912,14 @@ function runMemoryIngest(args: CliArgs): StageResult {
   // .env.local footgun affects gstack-memory-ingest.ts too, not just the
   // direct gbrain spawns in this file). The grandchild calls gbrain import
   // internally and must see the DATABASE_URL from gbrain's own config.
+  const memoryTimeoutMs = resolveStageTimeoutMs(
+    process.env.GSTACK_SYNC_MEMORY_TIMEOUT_MS,
+    "GSTACK_SYNC_MEMORY_TIMEOUT_MS",
+  );
   const result = spawnSync("bun", ingestArgs, {
     encoding: "utf-8",
-    timeout: 35 * 60 * 1000,
-    env: buildGbrainEnv({ announce: false }),
+    timeout: memoryTimeoutMs,
+    env: childEnv,
   });

   // D6: parse [memory-ingest] lines from the child's stderr. ERR-prefixed
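
Both stage timeouts now flow through `resolveStageTimeoutMs`, which parses the env value with `Number.parseInt`. Because `parseInt` accepts a numeric prefix, a value such as `90000ms` would be read as `90000` and pass the bounds check. Whether that leniency is intended is not stated in the commit. The sketch below is a hypothetical stricter variant that requires the whole string to be decimal digits; the function name is an assumption, the constants are copied from the diff, and the warnings are omitted for brevity.

```typescript
// Hypothetical stricter variant of resolveStageTimeoutMs(); not part of this commit.
// Constants mirror the values shown in the diff.
const DEFAULT_MS = 35 * 60 * 1000; // 35 minutes
const MIN_MS = 60_000; // 1 minute floor
const MAX_MS = 86_400_000; // 24 hour ceiling

function parseStageTimeoutStrict(envValue: string | undefined): number {
  if (envValue === undefined || envValue === "") return DEFAULT_MS;
  // Require the whole string to be decimal digits, so a value like "90000ms"
  // is rejected instead of being read as its numeric prefix.
  if (!/^\d+$/.test(envValue)) return DEFAULT_MS;
  const n = Number(envValue);
  // Out-of-range values fall back to the default, matching the diff's behavior.
  if (n < MIN_MS || n > MAX_MS) return DEFAULT_MS;
  return n;
}
```

For example, `GSTACK_SYNC_CODE_TIMEOUT_MS=90000` would yield a 90 second timeout under either version, while `90000ms` would fall back to the 35 minute default only under this stricter one.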
