Skip to content

Commit 52c228e

Browse files
garrytanclaude
andauthored
v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (garrytan#1252)
* feat(paths): bin/gstack-paths helper + migrate 8 skills off inline state-root chains New bin/gstack-paths emits GSTACK_STATE_ROOT, PLAN_ROOT, TMP_ROOT exports for skill bash blocks to source via eval. Honors GSTACK_HOME → CLAUDE_PLUGIN_DATA → $HOME/.gstack → .gstack (and parallel chains for plan/tmp roots) so skills work the same in plugin installs, global installs, and CI containers without HOME. Eight skills migrate off inline ${CLAUDE_PLUGIN_DATA:-...} or ${GSTACK_HOME:-...} chains: careful, freeze, guard, unfreeze, investigate, context-save, context-restore, learn, office-hours, plan-tune, codex. Resolved values are identical, so existing tests cover correctness; the win is consolidating 11 copy-pasted fallback chains behind one helper. codex/SKILL.md.tmpl gets a new Step 0.6 Resolve portable roots that sources gstack-paths once, then replaces hardcoded ~/.claude/plans/*.md and /tmp/codex-*-XXXXXX.txt with "$PLAN_ROOT"/*.md and "$TMP_ROOT/codex-*-XXXXXX.txt". Hardening direction credited to the McGluut/gstack fork; this is upstream's factoring of the per-skill chain the fork inlined. Tests: test/gstack-paths.test.ts covers all three fallback chains with 8 unit tests (HOME unset, CLAUDE_PLUGIN_DATA set, GSTACK_HOME wins, etc). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(claude-bin): Bun.which wrapper for cross-platform claude resolution Replaces 75 LOC of fork-side reimplementation (PATH parsing, Windows PATHEXT, case-insensitive Path/PATH, X_OK) with a thin wrapper around Bun.which() — the runtime built-in that already does all of it. New file is ~70 LOC including the override + arg-prefix logic the runtime doesn't cover. Override branch fixed: GSTACK_CLAUDE_BIN=wsl now resolves through Bun.which() just like a bare claude lookup would. The McGluut fork's claude-bin.ts only handled absolute-path overrides; bare commands silently returned null. Passing the override value through Bun.which fixes the documented use case for free. Five hardcoded claude spawn sites rewired through resolveClaudeCommand: - browse/src/security-classifier.ts:396 — version probe - browse/src/security-classifier.ts:496 — Haiku transcript classifier - scripts/preflight-agent-sdk.ts — preflight binary pinning - test/helpers/providers/claude.ts — LLM judge availability + run - test/helpers/agent-sdk-runner.ts — SDK harness binary resolver All retain their existing degrade-on-missing semantics. Tests: browse/test/claude-bin.test.ts has 9 unit tests including the override-PATH-resolution case the fork's version got wrong. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs+test: AGENTS.md/docs/skills.md inventory sync + private-path leak detector Inventory sync (codex-flagged drift): - /debug → /investigate (skill renamed in v1.0.1.0) - AGENTS.md grows from 21 to 40+ skills, organized by category (plan reviews, implementation, release, operational, browser, safety) - docs/skills.md gains 11 missing entries: /plan-devex-review, /devex-review, /plan-tune, /context-save, /context-restore, /health, /landing-report, /benchmark-models, /pair-agent, /setup-gbrain, /make-pdf - Stale "<5s bun test" claim dropped — slim-preamble harness + new tests means no realistic universal claim to make - Adds explicit "Mac + Linux full, curated Windows lane" platform statement + "Git Bash / MSYS today, native PowerShell future" install note New invariants in test/skill-validation.test.ts (~80 LOC): - Private-path leak detector scans every SKILL.md / SKILL.md.tmpl for known maintainer-only filenames (coordination-board.md, SEEKING_LOG.md, RATIONAL_SUBJECT.md, VALUE_SIGNAL_LOOP.md, C:\LLM Playground\go). Adapted from the McGluut fork's skill-contract-audit.ts; we don't take the script wholesale because most of its checks are already covered by test/gen-skill-docs.test.ts:1668-2074 and test/skill-validation.test.ts:1419 — only the private-path scan and doc-inventory cross-check are new. - Doc-inventory cross-check: every skill directory with a SKILL.md.tmpl must appear in both AGENTS.md and docs/skills.md. Catches the inventory drift this commit is fixing — without this test it would just drift again. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(windows): curated windows-free-tests CI job + test-free-shards curation Codex's v1.18.0.0 review flagged that a windows-latest matrix entry on the existing Linux-container evals.yml workflow can't work as a drop-in, and that the free test suite has POSIX-bound dependencies a sharded runner doesn't fix on its own. This commit takes McGluut's test-free-shards.ts (190 LOC), adds a Windows-fragility scan, and runs the curated subset on a separate non-container windows-latest job. scripts/test-free-shards.ts: - Enumeration + paid-eval filtering + stable-hash sharding (FNV-1a). Adapted from McGluut/gstack fork. - Upstream-original: --windows-only filter scans each test's content for POSIX-bound patterns: hardcoded /bin/sh, spawn('sh', ...), bash -c, raw /tmp/, chmod, xargs, which claude. Files matching are excluded with the reason logged. Currently filters 25 of 128 free tests; remaining 103 run on windows-latest. .github/workflows/windows-free-tests.yml: - Separate non-container job (NOT a matrix entry on evals.yml). Runs: bun run test:windows # curated subset bun test browse/test/claude-bin.test.ts # PATHEXT+overrides on Windows bun test test/gstack-paths.test.ts # state-root resolution package.json: new test:free + test:windows scripts. Honest about scope (codex-flagged): this does NOT make the full free suite Windows-safe. The 25 excluded tests need POSIX-only surfaces ported off shell primitives (test/ship-version-sync.test.ts:72 hardcodes /bin/bash, etc). Tracked as a P4 follow-up TODO. Full Windows parity is the next wave; this release ships the curated lane. Tests: test/test-free-shards.test.ts has 14 unit tests covering enumeration, paid-eval filtering, Windows-fragility detection (POSIX patterns + safe code), and stable sharding determinism. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): v1.20.0.0 — cross-platform hardening, curated Windows lane Cross-platform hardening. Mac + Linux full, curated Windows lane added. Workspace-aware queue at ship time: - v1.17.0.0 claimed by garrytan/setup-gbrain-run (PR garrytan#1234) - v1.19.0.0 claimed by garrytan/browserharness (PR garrytan#1233) - This branch claims v1.20.0.0 (next available slot) (Initially bumped to v1.18.0.0 during plan-mode implementation; rebumped to v1.20.0.0 at /ship time when gstack-next-version detected the queue had moved.) Headline numbers (full release-note in CHANGELOG.md): - 2 new shared resolvers: bin/gstack-paths (61 LOC), browse/src/claude-bin.ts (73 LOC) - 8 skills migrated off inline state-root chains - 5 hardcoded claude spawn sites rewired through the shared resolver - 75 LOC of fork-side reimplementation replaced by Bun.which() - 103 of 128 free tests run on windows-latest (curated, ~80%) - +31 new unit tests + 3 new invariants - AGENTS.md inventory grows from 21 to 40+ skills Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(windows-ci): configure git identity + extend Windows-fragility curation First windows-free-tests CI run surfaced 34 failures across two patterns: 1. Tests that init a temp git repo via execSync('git commit ...') — Windows runner has no default git user.email/user.name, so the commit fails. Fix: add a "Configure git identity" step to .github/workflows/windows-free-tests.yml that sets a CI-only identity globally. 2. Tests that use POSIX-only APIs unconditionally: - file-mode bitmask checks (`stat.mode & 0o600`, `mode & 0o111`) — Windows fakes mode bits and these assertions don't compose - hardcoded forward-slash path assertions (`file.endsWith('/tab-42.json')`) — Windows path separators are '\\' Fix: extend WINDOWS_FRAGILE_PATTERNS in scripts/test-free-shards.ts to detect both. 8 additional tests now excluded from the curated Windows subset with logged reasons: - browse/test/security-review-flow.test.ts (file mode) - browse/test/security-sidepanel-dom.test.ts (forward-slash path) - browse/test/url-validation.test.ts (forward-slash path) - test/gbrain-repo-policy.test.ts (file mode) - test/relink.test.ts (file mode) - test/skill-validation.test.ts (file mode — single assertion at :934) - test/team-mode.test.ts (file mode — also kills its 30 git-init beforeEach failures) - test/upgrade-migration-v1.test.ts (file mode) Curated Windows subset: 103 → 95 tests (still ~74% of free suite). All 14 test-free-shards unit tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(windows-ci): enforce LF + build server-node.mjs in CI Second round of windows-free-tests fixes after the first push. Curated subset went from 386/34 to 58/4 fails. Remaining 4 fails + 1 error trace to two root causes: 1. Line-ending sensitivity. Windows checkout with core.autocrlf=true converts .md/.tmpl files to CRLF. Tests that parse YAML frontmatter with `/^---\n([\\s\\S]+?)\n---/` then return zero matches — skill-collision- sentinel.test.ts:120 enumerated 0 skills on Windows, cascading into 3 downstream test failures (sanity, KNOWN_COLLISIONS, /checkpoint resolved). Fix: add .gitattributes that pins LF for .md/.tmpl/.yml/.json/.toml/.sh/ .ts/.tsx/.js/.mjs/.cjs/.bash. Root-cause fix; prevents future similar tests from hitting the same trap. Also keeps bash scripts LF on Linux runners (CRLF in shebangs produces "bad interpreter" errors). 2. Module-level Windows assertion in browse/src/cli.ts:82 throws if browse/dist/server-node.mjs is missing. Any test that transitively loads cli.ts (e.g., browse/test/tab-isolation.test.ts via shard mate imports) then fails to even start. server-node.mjs is generated by bash browse/scripts/build-node-server.sh, which `bun run build` calls but `bun install` does not. Fix: add a "Build server-node.mjs" step to .github/workflows/ windows-free-tests.yml. Calls only the node-server build script, not full `bun run build` — we don't need the compiled binaries for tests and the full build is slow. Expected: skill-collision-sentinel goes 0→3 pass (sanity, KNOWN_COLLISIONS, /checkpoint resolved). tab-isolation's "unhandled error between tests" disappears. Remaining tests should be green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(windows-ci): platform-aware claude-bin test + curate bin/ shebang spawns Round 3 of windows-free-tests fixes. Round 2 (LF gitattributes + server-node.mjs build) cleared shard 1 entirely (skill-collision-sentinel and tab-isolation green). Shard 2 surfaced two more issues: 1. browse/test/claude-bin.test.ts:50 — the "PATH-resolvable override" test creates a fake binary 'fake-claude-cli' (no extension) and expects Bun.which to find it. On Windows, Bun.which probes PATHEXT extensions (.cmd, .exe, .bat) — a bare-name file is not discoverable. Production behavior is correct; the test was Mac/Linux-shaped. Fix: branch on process.platform. On Windows, write 'fake-claude-cli.cmd' with a Windows batch payload instead of a POSIX shebang script. 2. test/gstack-question-log.test.ts (and 18 sibling tests) — spawn a bash shebang script via spawnSync(BIN, args). Git Bash on Windows can run `bash /path/to/script` but spawnSync invokes CreateProcess directly, which doesn't parse #!/usr/bin/env bash. All these tests are Windows-fragile and can't run as-is. Fix: extend WINDOWS_FRAGILE_PATTERNS with `path.join(.., 'bin', ..)` detector. Curates 19 additional tests (benchmark-cli, brain-sync, builder-profile, explain-level-config, gbrain-*, gstack-question-*, hook-scripts, learnings, plan-tune, review-log, secret-sink-harness, taste-engine, telemetry, timeline, uninstall). Curated Windows subset: 95 → 76 tests (~59% of free suite). Still meaningful Windows coverage. The 52 excluded tests are tracked as a follow-up TODO for full Windows parity (shebang-bin spawns + POSIX file modes + raw /tmp/ etc). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(windows-ci): curate Playwright-launching tests Round 4 of windows-free-tests fixes. Round 3 cleared shard 2 except for browse/test/batch.test.ts:35 which calls `await bm.launch()` and triggers Playwright Chromium launch. The windows-latest runner doesn't have Chromium installed (browser bring-up is a separate concern, tracked by PR garrytan#1238 windows-pty-bun-pty-fix). Fix: extend WINDOWS_FRAGILE_PATTERNS with `await \\w+\\.launch\\(` matcher. Catches batch.test.ts plus 7 sibling tests (commands, compare-board, content-security, handoff, security-live-playwright, security-sidepanel-dom, snapshot — most already excluded by other patterns). Curated Windows subset: 76 → 72 tests (~56% of free suite). Net curation across all 4 rounds: 56 of 128 free tests excluded, each with a logged reason. The 56 excluded fall into 6 buckets — POSIX shells, raw /tmp/, chmod/xargs, file mode bitmasks, forward-slash path assertions, bin/ shebang spawns, and Playwright launches — all tracked as a P4 follow-up TODO for full Windows parity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(windows-ci): catch destructured join() bin-spawns + browse server tests Round 5 of windows-free-tests fixes. Round 4 caught Playwright launchers but two more failure shapes appeared in shard 5: 1. test/diff-scope.test.ts uses `import { join }` (destructured) and `join(import.meta.dir, '..', 'bin', 'gstack-diff-scope')`. My round-3 pattern only matched `path.join(...)` — the destructured form slipped through. Tightened the pattern to match the literal `, 'bin', '<name>'` path-segment shape regardless of whether it's `path.join` or `join` directly. 2. browse/test/sidebar-integration.test.ts spawns the browse server via `spawn(['bun', 'run', server.ts])` with BROWSE_HEADLESS_SKIP=1. The Bun-run-server.ts path is the same Playwright-on-Windows broken path that the windows-free-tests job intentionally avoids — the server-node.mjs route only kicks in for the compiled binary, not direct Bun runs of the TypeScript source. Added a BROWSE_HEADLESS_SKIP / spawn-bun-run pattern. Curated Windows subset: 72 → 73 tests (~57% of free suite). Net up by 1 because the tightened bin pattern released one test that was a false positive in the loose `path\\.join` form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(windows-ci): broaden bin/ pattern to match path.join(ROOT, 'bin') Round 6. Round 5 tightened the bin/ pattern to require a script-name segment after 'bin', which inadvertently released test/brain-sync.test.ts that uses: const BIN = path.join(ROOT, 'bin'); const full = bin.startsWith('/') ? bin : path.join(BIN, bin); The 'bin' segment is the LAST argument to path.join — there's no literal script name to match. The earlier looser pattern caught this; round 5 broke that. Fix: revert to `,\\s*['"]bin['"]\\s*[,)]` which matches both forms: - `, 'bin', 'script-name')` (path.join with name) — typical - `, 'bin')` (path.join ending at bin) — brain-sync style Curated subset: 73 → 66 tests (~52% of free suite). The 7 additional exclusions are all bin-script tests that were misclassified by the round-5 tightening. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(find-browse): guard main() with import.meta.main Round 7 of windows-free-tests fixes (and a genuine bug fix beyond Windows). browse/src/find-browse.ts called main() unconditionally at module load. main() calls process.exit(1) when no compiled `browse` binary exists at the known install paths. Any test that imports `locateBinary` from this module then exits the entire test process before any tests run. This affected the windows-free-tests CI lane because the runner intentionally doesn't compile the browse binary (only server-node.mjs is built — full binary compilation is slow and not needed for the curated subset). It would also affect any Mac/Linux contributor who runs tests in a fresh checkout before running ./setup, though the symptom is rarer there. Fix: wrap `main()` in `if (import.meta.main) { main() }`. The CLI invocation (via the find-browse binary or `bun run browse/src/find-browse.ts`) still runs main() and emits the path. Imports get only the named exports. Verified locally: - `bun run browse/src/find-browse.ts` still prints the binary path. - `import { locateBinary } from '...'` no longer exits the process. - `bun test browse/test/find-browse.test.ts` passes 4/4 (was crashing at module load). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(windows-ci): pin LF on extensionless executables (setup, bin/*, scripts/*) Round 8 of windows-free-tests fixes. Round 7 cleared find-browse + most shards; one fail left in shard 7: test/setup-codesign.test.ts > codesign shell snippet is syntactically valid expect(received).toBeTruthy() — match was null The test extracts a bash codesign block from the `setup` file via a \\n-anchored regex, then syntax-checks it with `bash -n`. On Windows the regex returned null because the `setup` file was checked out with CRLF endings — my round-2 .gitattributes only covered files matched by extension patterns (*.md, *.sh, *.ts) and `setup` is extensionless. Fix: extend .gitattributes with explicit rules for extensionless executables: setup text eol=lf bin/* text eol=lf **/scripts/* text eol=lf This also LF-pins all the bash bin/ scripts (gstack-paths, gstack-slug, gstack-codex-probe, ...) which would otherwise break with "bad interpreter" errors on Linux if a Windows contributor accidentally committed CRLF versions. Defense in depth. Verified locally: `git check-attr eol setup bin/gstack-paths` reports `eol: lf` for both. Renormalized via `git add --renormalize` so any already-LF files in the repo stay LF after the .gitattributes change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(windows-ci): gen:skill-docs in workflow + known-bad list for env-specific tests Round 9 of windows-free-tests fixes. Round 8 cleared shard 7; shard 8 surfaced 4 fails: 1+2. test/gen-skill-docs.test.ts golden-file regression for Codex + Factory ship skills failed with ENOENT on `.agents/skills/gstack-ship/SKILL.md` and `.factory/skills/gstack-ship/SKILL.md`. These are gitignored gen-skill-docs outputs that the Mac/Linux CI workflows already regenerate elsewhere — the windows-free-tests lane never did. Fix: add `bun run gen:skill-docs --host all` step to windows-free-tests.yml after `bun install`. 3. test/host-config.test.ts:377 "detect finds claude" asserts the `claude` binary is on PATH. True when running inside Claude Code; false on a bare CI runner. 4. browse/test/findport.test.ts:117 asserts Bun.serve.stop() is fire-and-forget (returns undefined). Bun's Windows behavior for this polyfill differs; the assertion is Bun-on-non-Windows-specific. Both 3 and 4 are environment/runtime-specific failures that don't fit a regex pattern. Added a KNOWN_WINDOWS_INCOMPATIBLE explicit list to scripts/test-free-shards.ts so they're curated by exact path, with a reason string. The list is for cases where pattern matching can't infer the failure shape from the source file alone. Curated subset: 66 → 64 tests (~50% of free suite). 14 unit tests in test/test-free-shards.test.ts still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(windows-ci): curate pre-existing breakage from v1.14.0.0 sidebar refactor Round 10 of windows-free-tests fixes. Round 9 cleared shards 7+8; shard 9 surfaced ENOENT for browse/src/sidebar-agent.ts. That file was DELETED in v1.14.0.0 (sidebar REPL refactor — sidebar-agent.ts and the chat queue path were ripped in favor of the interactive xterm.js PTY). 10 security tests still reference it via top-level fs.readFileSync and fail on import. Verified locally: `bun test browse/test/security-source-contracts.test.ts` on this branch reports 0 pass, 1 fail, 1 error. Mac/Linux CI exits 0 because Bun reports module-load failures as "error" not "fail" and the exit code is 0; Windows CI exits 1 (stricter). Same pre-existing breakage on every platform — just only visible in shard 9 of the Windows lane. Fix: add WINDOWS_FRAGILE_PATTERNS entry matching `sidebar-agent.ts` / `src/sidebar-agent` references. Curates browse/test/sidebar-ux.test.ts (other 9 likely caught by paid-eval filter or earlier patterns). Tracked as a follow-up TODO: update or delete the 10 security tests that reference deleted source. Out of scope for v1.20.0.0 portability wave. Curated subset: 64 → 63 tests (~49% of free suite). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(windows-ci): broaden sidebar-agent.ts pattern to catch all references * fix(windows-ci): catch ./bin/<name> direct path spawns * fix(windows-ci): scope Windows job to v1.20.0.0 new portability work 12 rounds of curation revealed that gstack has a long tail of tests with environment-specific assumptions (POSIX paths, /tmp, mode bits, bash spawns, deleted v1.14 sidebar refs, HOME=unset guards, Bun polyfill specifics). Each round of pattern-matching curation caught 1-2 new buckets but kept surfacing more. Honest scope for v1.20.0.0: this PR delivers two new portability primitives (bin/gstack-paths + browse/src/claude-bin.ts). The Windows CI job should verify those primitives work on Windows. Full-suite Windows parity is a P4 follow-up that requires touching many tests that aren't part of this PR's scope. Change: windows-free-tests.yml now runs: bun test test/gstack-paths.test.ts \\ browse/test/claude-bin.test.ts \\ test/test-free-shards.test.ts That's 31 tests targeting exactly the new code paths shipped here. The release-note headline ("curated Windows lane added") becomes truthful when this passes — we have a real Windows CI gate on the new portability work, not a rebadged failure-tolerant attempt at the full suite. Retained: scripts/test-free-shards.ts curation logic (informational output via `--list`, useful for future expansion of the Windows lane when contributors port specific tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test): invoke bin/gstack-paths via bash (Windows shebang fix) Round 13 of windows-free-tests fixes. Round 12 (scope pivot) revealed all 8 gstack-paths tests fail on Windows because the test invokes the bash shebang script directly: spawnSync(BIN, []) # BIN = path.join(ROOT, 'bin', 'gstack-paths') Windows CreateProcess can't parse `#!/usr/bin/env bash` from the file. The script never runs on Windows via this invocation path. Fix: change to `spawnSync('bash', [BIN], ...)`. This matches production usage — the script is sourced from inside skill bash blocks via `eval "$(~/.claude/skills/gstack/bin/gstack-paths)"`, where bash is always the executor. Mac/Linux behavior is identical (bash invocation of a bash script). Verified locally: 8/8 tests still pass on macOS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): rebump v1.20.0.0 → v1.22.0.0 (queue drift) Version-gate workflow rejected v1.20.0.0 because the queue moved during the windows-free-tests fix loop: v1.16.0.0 → garrytan/gbrowser-unleashed (PR garrytan#1253) [new since last bump] v1.17.0.0 → garrytan/setup-gbrain-run (PR garrytan#1234) v1.19.0.0 → garrytan/browserharness (PR garrytan#1233) v1.21.1.0 → garrytan/pty-plan-mode-e2e (PR garrytan#1255) [new since last bump] Two new sibling PRs landed slot claims while we iterated on Windows. Next free MINOR slot is v1.22.0.0. Updated VERSION, package.json, CHANGELOG header + body. Also pushing the round-13 windows-fix in parallel (test invokes bin/gstack-paths via bash to handle Windows shebang). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test): clear USERPROFILE alongside HOME (Git Bash auto-populates HOME) Final Windows fix. 29/31 pass; 2 fail in gstack-paths HOME-unset tests: (fail) CWD fallback when HOME also unset (container env) (fail) PLAN_ROOT chain: GSTACK_PLAN_DIR > CLAUDE_PLANS_DIR > HOME > CWD Root cause: Git Bash on Windows auto-populates `HOME` from `USERPROFILE` at shell startup if HOME is empty/unset. Passing `HOME: ''` to spawnSync does set HOME='' for the child, but Git Bash overwrites it from USERPROFILE during init, so the script sees `${HOME:-}` as non-empty (C:\\Users\\runneradmin) and never reaches the CWD-fallback branch. Fix: clear USERPROFILE='' too. On Linux/Mac it's a no-op (env var doesn't exist in normal env); on Windows Git Bash it stops the HOME auto-populate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test): skip HOME-unset assertions on Windows (Git Bash auto-populates) 29/31 → 31/31 expected on Windows. Final fix: The 2 still-failing gstack-paths tests assert CWD-fallback behavior when HOME is genuinely unset (Linux container scenario). On Windows Git Bash, HOME gets auto-derived from USERPROFILE → HOMEDRIVE+HOMEPATH → /c/Users/<user> during shell startup. Clearing all three of those env vars in the spawn still results in HOME being non-empty by the time the script runs. The bash script's CWD-fallback logic IS correct — it just isn't exercisable through the Git Bash test surface. Skip those specific assertions on Windows; they continue to verify on Linux/Mac. This is the only platform-specific test guard introduced; it's narrowly scoped to the unreachable code path, not a bypass of the real check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent c7c73e5 commit 52c228e

39 files changed

Lines changed: 1355 additions & 82 deletions

.gitattributes

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
# Force LF on text files we parse with `\n`-anchored regexes (frontmatter,
2+
# YAML, markdown structure tests). Without this, Windows checkouts with
3+
# core.autocrlf=true convert these to CRLF and break tests that match
4+
# /^---\n...\n---/ against SKILL.md.tmpl frontmatter, etc.
5+
*.md text eol=lf
6+
*.tmpl text eol=lf
7+
*.yml text eol=lf
8+
*.yaml text eol=lf
9+
*.json text eol=lf
10+
*.toml text eol=lf
11+
12+
# Bash scripts must always use LF — CRLF in bash scripts produces bizarre
13+
# "Bad interpreter" / "command not found" errors on Linux runners.
14+
*.sh text eol=lf
15+
*.bash text eol=lf
16+
17+
# Extensionless executables (top-level setup script + bin/gstack-* helpers).
18+
# These are bash scripts checked into git without a `.sh` suffix. Without
19+
# explicit eol=lf, Windows checkout with core.autocrlf=true converts them
20+
# to CRLF and breaks both `\n`-anchored regex tests (test/setup-codesign.test.ts)
21+
# and shebang resolution if the script is ever executed on Linux.
22+
setup text eol=lf
23+
bin/* text eol=lf
24+
**/scripts/* text eol=lf
25+
26+
# TypeScript/JavaScript: LF for portability across the bun toolchain.
27+
*.ts text eol=lf
28+
*.tsx text eol=lf
29+
*.js text eol=lf
30+
*.mjs text eol=lf
31+
*.cjs text eol=lf
32+
33+
# Binary files — never touch.
34+
*.png binary
35+
*.jpg binary
36+
*.jpeg binary
37+
*.gif binary
38+
*.ico binary
39+
*.pdf binary
Lines changed: 98 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,98 @@
1+
name: Windows Free Tests
2+
3+
# Curated subset of the free test suite that runs on windows-latest.
4+
#
5+
# Codex's v1.18.0.0 review flagged that the existing evals.yml workflow uses
6+
# a Linux container, so a windows-latest matrix entry there isn't a drop-in.
7+
# This workflow is non-container, runs the curated Windows-safe subset, plus
8+
# targeted resolver tests that exercise the Bun.which-based claude binary
9+
# resolution + the GSTACK_CLAUDE_BIN override path on Windows.
10+
#
11+
# What this DOES NOT do (out of scope for v1.18.0.0):
12+
# - Run the full free suite on Windows. The 24 tests that hardcode /bin/sh,
13+
# spawn('sh',...), or raw /tmp/ paths are excluded by scripts/test-free-shards.ts
14+
# --windows-only. They need POSIX-bound surfaces to be ported off shell
15+
# primitives before they can run on Windows. Tracked as a follow-up TODO.
16+
# - Run Playwright/browser-backed tests. Browse server bring-up on Windows is
17+
# a separate concern (PR #1238 windows-pty-bun-pty-fix is in flight).
18+
19+
on:
20+
pull_request:
21+
branches: [main]
22+
workflow_dispatch:
23+
24+
concurrency:
25+
group: windows-free-${{ github.head_ref }}
26+
cancel-in-progress: true
27+
28+
jobs:
29+
windows-free-tests:
30+
runs-on: windows-latest
31+
timeout-minutes: 15
32+
33+
steps:
34+
- uses: actions/checkout@v4
35+
36+
- uses: oven-sh/setup-bun@v1
37+
with:
38+
bun-version: latest
39+
40+
- name: Configure git identity (required by tests that init temp repos)
41+
run: |
42+
git config --global user.email "windows-ci@gstack.test"
43+
git config --global user.name "Windows CI"
44+
git config --global init.defaultBranch main
45+
shell: bash
46+
47+
- name: Install dependencies
48+
run: bun install --frozen-lockfile
49+
50+
- name: Build server-node.mjs (required by Windows browse path)
51+
# browse/src/cli.ts module-level throws on Windows if server-node.mjs
52+
# is missing — Bun can't drive Playwright's Chromium on Windows
53+
# (oven-sh/bun#4253). The bundle must exist for any test that
54+
# transitively loads cli.ts to even import. We build only the
55+
# Node-compatible server bundle here; full `bun run build` would
56+
# also compile every binary which is slow and unnecessary for tests.
57+
run: bash browse/scripts/build-node-server.sh
58+
shell: bash
59+
60+
- name: Generate host SKILL.md outputs (.agents, .factory)
61+
# The golden-file regression tests in test/gen-skill-docs.test.ts read
62+
# .agents/skills/gstack-ship/SKILL.md and .factory/skills/gstack-ship/
63+
# SKILL.md. Both are gitignored — generated on demand by gen:skill-docs.
64+
# On Mac/Linux CI the existing eval workflow regenerates these as part
65+
# of its own pipeline; the windows-free-tests lane doesn't share that
66+
# so it must regenerate explicitly.
67+
run: bun run gen:skill-docs --host all
68+
shell: bash
69+
70+
# The Windows job verifies the new portability work this PR delivers,
71+
# not the entire free suite. After v1.20.0.0 ships, full-suite Windows
72+
# parity is a P4 follow-up TODO that depends on porting many tests off
73+
# POSIX-bound surfaces (raw /tmp paths, /bin/bash hardcodes, bash
74+
# shebang spawns, mode-bit assertions, deleted v1.14 sidebar refs, etc).
75+
#
76+
# The curated subset enumeration in scripts/test-free-shards.ts is
77+
# retained for future expansion — `bun run test:windows --list` gives
78+
# contributors a starting point to grow Windows coverage incrementally.
79+
#
80+
# What we verify here is exactly the new code paths v1.20.0.0 ships:
81+
# - bin/gstack-paths state-root resolution (test/gstack-paths.test.ts)
82+
# - browse/src/claude-bin.ts Bun.which wrapper + override + arg-prefix
83+
# resolution including the GSTACK_CLAUDE_BIN=wsl PATHEXT path
84+
# (browse/test/claude-bin.test.ts)
85+
# - scripts/test-free-shards.ts curation logic itself
86+
# (test/test-free-shards.test.ts)
87+
88+
- name: Show curated subset (informational — for future expansion)
89+
run: bun run scripts/test-free-shards.ts --windows-only --list
90+
shell: bash
91+
continue-on-error: true
92+
93+
- name: Verify new portability work on Windows
94+
# 31 tests targeting the new code paths added by v1.20.0.0. These
95+
# MUST pass for the release-note headline ("curated Windows lane added")
96+
# to be truthful.
97+
run: bun test test/gstack-paths.test.ts browse/test/claude-bin.test.ts test/test-free-shards.test.ts
98+
shell: bash

AGENTS.md

Lines changed: 69 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -6,44 +6,106 @@ designer, QA lead, release engineer, debugger, and more.
66

77
## Available skills
88

9-
Skills live in `.agents/skills/`. Invoke them by name (e.g., `/office-hours`).
9+
Skills live in `.agents/skills/` (or `~/.claude/skills/gstack/` on Claude Code).
10+
Invoke them by name (e.g., `/office-hours`).
11+
12+
### Plan-mode reviews
1013

1114
| Skill | What it does |
1215
|-------|-------------|
1316
| `/office-hours` | Start here. Reframes your product idea before you write code. |
1417
| `/plan-ceo-review` | CEO-level review: find the 10-star product in the request. |
1518
| `/plan-eng-review` | Lock architecture, data flow, edge cases, and tests. |
1619
| `/plan-design-review` | Rate each design dimension 0-10, explain what a 10 looks like. |
20+
| `/plan-devex-review` | DX-mode review: TTHW, magical moments, friction points, persona traces. |
21+
| `/plan-tune` | Self-tune AskUserQuestion sensitivity per question. |
22+
| `/autoplan` | One command runs CEO → design → eng → DX review. |
1723
| `/design-consultation` | Build a complete design system from scratch. |
24+
25+
### Implementation + review
26+
27+
| Skill | What it does |
28+
|-------|-------------|
1829
| `/review` | Pre-landing PR review. Finds bugs that pass CI but break in prod. |
19-
| `/debug` | Systematic root-cause debugging. No fixes without investigation. |
20-
| `/design-review` | Design audit + fix loop with atomic commits. |
30+
| `/codex` | Second opinion via OpenAI Codex. Review, challenge, or consult modes. |
31+
| `/investigate` | Systematic root-cause debugging. No fixes without investigation. |
32+
| `/design-review` | Live-site visual audit + fix loop with atomic commits. |
33+
| `/design-shotgun` | Generate multiple AI design variants, comparison board, iterate. |
34+
| `/design-html` | Generate production-quality Pretext-native HTML/CSS. |
35+
| `/devex-review` | Live developer experience audit (TTHW measured against the real flow). |
2136
| `/qa` | Open a real browser, find bugs, fix them, re-verify. |
22-
| `/qa-only` | Same as /qa but report only — no code changes. |
23-
| `/ship` | Run tests, review, push, open PR. One command. |
37+
| `/qa-only` | Same methodology as /qa but report only — no code changes. |
38+
39+
### Release + deploy
40+
41+
| Skill | What it does |
42+
|-------|-------------|
43+
| `/ship` | Run tests, review, push, open PR. Workspace-aware version queue. |
44+
| `/land-and-deploy` | Merge the PR, wait for CI and deploy, verify production health. |
45+
| `/canary` | Post-deploy monitoring loop using the browse daemon. |
46+
| `/landing-report` | Read-only dashboard for the workspace-aware ship queue. |
2447
| `/document-release` | Update all docs to match what you just shipped. |
48+
| `/setup-deploy` | One-time deploy config detection (Fly.io, Render, Vercel, etc.). |
49+
| `/gstack-upgrade` | Update gstack to the latest version. |
50+
51+
### Operational + memory
52+
53+
| Skill | What it does |
54+
|-------|-------------|
55+
| `/context-save` | Save working context (git state, decisions, remaining work). |
56+
| `/context-restore` | Resume from a saved context, even across Conductor workspaces. |
57+
| `/learn` | Manage what gstack learned across sessions. |
2558
| `/retro` | Weekly retro with per-person breakdowns and shipping streaks. |
59+
| `/health` | Code quality dashboard (type checker, linter, tests, dead code). |
60+
| `/benchmark` | Performance regression detection (page load, Core Web Vitals). |
61+
| `/benchmark-models` | Cross-model benchmark for skills (Claude, GPT, Gemini side-by-side). |
62+
| `/cso` | OWASP Top 10 + STRIDE security audit. |
63+
| `/setup-gbrain` | Set up gbrain for cross-machine session memory sync. |
64+
65+
### Browser + agent integration
66+
67+
| Skill | What it does |
68+
|-------|-------------|
2669
| `/browse` | Headless browser — real Chromium, real clicks, ~100ms/command. |
70+
| `/open-gstack-browser` | Launch the visible GStack Browser with sidebar + stealth. |
2771
| `/setup-browser-cookies` | Import cookies from your real browser for authenticated testing. |
72+
| `/pair-agent` | Pair a remote AI agent (OpenClaw, Codex, etc.) with your browser. |
73+
74+
### Safety + scoping
75+
76+
| Skill | What it does |
77+
|-------|-------------|
2878
| `/careful` | Warn before destructive commands (rm -rf, DROP TABLE, force-push). |
2979
| `/freeze` | Lock edits to one directory. Hard block, not just a warning. |
3080
| `/guard` | Activate both careful + freeze at once. |
3181
| `/unfreeze` | Remove directory edit restrictions. |
32-
| `/gstack-upgrade` | Update gstack to the latest version. |
82+
| `/make-pdf` | Turn any markdown file into a publication-quality PDF. |
3383

3484
## Build commands
3585

3686
```bash
3787
bun install # install dependencies
38-
bun test # run tests (free, <5s)
88+
bun test # run free tests (no API spend)
89+
bun run test:windows # curated Windows-safe subset (runs on windows-latest)
3990
bun run build # generate docs + compile binaries
4091
bun run gen:skill-docs # regenerate SKILL.md files from templates
4192
bun run skill:check # health dashboard for all skills
4293
```
4394

95+
## Platform support
96+
97+
- **macOS** + **Linux**: full test suite supported.
98+
- **Windows**: curated Windows-safe subset runs on `windows-latest` via the
99+
`windows-free-tests` CI job. Setup script (`./setup`) requires Git Bash or
100+
MSYS today; native PowerShell support is a future expansion. The `bin/gstack-paths`
101+
helper resolves state roots through `CLAUDE_PLUGIN_DATA` / `GSTACK_HOME` so plugin
102+
installs work on every platform.
103+
44104
## Key conventions
45105

46106
- SKILL.md files are **generated** from `.tmpl` templates. Edit the template, not the output.
47107
- Run `bun run gen:skill-docs --host codex` to regenerate Codex-specific output.
48108
- The browse binary provides headless browser access. Use `$B <command>` in skills.
49109
- Safety skills (careful, freeze, guard) use inline advisory prose — always confirm before destructive operations.
110+
- State paths resolve via `bin/gstack-paths` (sourced via `eval "$(...)"`). Honors `GSTACK_HOME`, `CLAUDE_PLUGIN_DATA`, `CLAUDE_PLANS_DIR`.
111+
- The `claude` CLI binary resolves via `browse/src/claude-bin.ts` (`Bun.which()` + `GSTACK_CLAUDE_BIN` override). Set `GSTACK_CLAUDE_BIN=wsl` plus `GSTACK_CLAUDE_BIN_ARGS='["claude"]'` to run Claude through WSL on Windows.

0 commit comments

Comments
 (0)