
Boiling the Windows CI Lake #1122

Draft
scarson wants to merge 6 commits into garrytan:main from scarson:docs/windows-ci-rfc

scarson commented Apr 21, 2026

Boiling the Windows CI Lake

Status: Draft for discussion. Not a merge request yet; I'd like to confirm the direction first.

📖 Read the RFC rendered, not as a diff: docs/designs/WINDOWS_CI.md on scarson:docs/windows-ci-rfc

What's in this PR

Three files:

  • docs/designs/WINDOWS_CI.md: the RFC itself. Full problem statement, phased rollout, goals/non-goals, receipts. (rendered view)
  • .github/workflows/windows-smoke.yml: Phase 1 of the RFC. ~95 lines. Builds on windows-latest (free for public repos under GitHub's Jan 2026 pricing), asserts .exe layout, runs 5 Windows-sensitive unit-test files + a make-pdf render smoke.
  • browse/test/home-dir-resolution.test.ts: a regression test for the home-directory-fallback class of bug fixed in #1120 ("fix(browse,design): resolve ~/.gstack via os.homedir() on Windows"), included because without it the Phase 1 coverage table has a ❌ row.
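
For readers who have not hit the home-directory bug, here is a minimal sketch of the property such a regression test might pin down. The helper names `resolveGstackDirFromEnv` and `resolveGstackDir` are hypothetical and are not taken from browse/ or from the test file in this PR. The only thing assumed from #1120 is what its title says: `~/.gstack` should be resolved via `os.homedir()`.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical pre-fix pattern: trusts $HOME, which native Windows shells
// often leave unset, so the result degrades to the relative path ".gstack".
function resolveGstackDirFromEnv(env: Record<string, string | undefined>): string {
  return path.join(env.HOME ?? "", ".gstack");
}

// Hypothetical post-fix pattern: os.homedir() asks the platform for the home
// directory (on Windows it consults USERPROFILE), so HOME need not be set.
// The homedir argument is injectable only to keep the sketch easy to test.
function resolveGstackDir(homedir: () => string = os.homedir): string {
  return path.join(homedir(), ".gstack");
}

// With HOME missing, the env-based variant is relative to the working directory.
console.log(resolveGstackDirFromEnv({})); // ".gstack"

// The os.homedir()-based variant stays anchored at the real home directory.
console.log(resolveGstackDir() === path.join(os.homedir(), ".gstack")); // true
```

A check like this only has teeth on a runner where `HOME` is genuinely unset, which is why it belongs in a Windows job rather than the existing Linux/macOS ones.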

TL;DR

gstack ships Windows bugs that no CI job ever sees, because no workflow runs on Windows. Five instances of this pattern have been filed in the last month (one merged in #1024, four open from me in the last 24 hours: #1118, #1119, #1120, #1121). This RFC proposes a phased rollout, starting with a cheap smoke CI that would have surfaced all five at PR time.

It builds on the root-cause fix in #1024 (v0.18.0.1) — that PR made CI failures loud; this RFC makes the right code paths actually run.

Receipts

Phase 1 is not a proposal — it's already running on scarson/gstack. Every row of the RFC's coverage table has a live CI run link.

| What | PR | Status | Run | Wall clock |
|---|---|---|---|---|
| Green baseline (all four fix PRs applied) | scarson#4 | ✅ success | run 24713325443 | 59s |
| Catches #1024 build-fail | scarson#2 | ❌ fail (expected) | run 24713460340 | ~1m |
| Catches #1118 binary-resolution | scarson#3 | ❌ fail (expected) | run 24713462662 | ~1m |
| Catches #1119 shebang-spawn | scarson#5 | ❌ fail (expected) | run 24713463096 | ~1m |
| Catches #1120 home-dir (via new regression test) | scarson#7 | ❌ fail (expected) | run 24713464002 | ~1m |
| Catches #1121 ACL / chmod no-op | scarson#6 | ❌ fail (expected) | run 24713465618 | ~1m |

Under 1:15 wall clock on GitHub-hosted windows-latest with a cold runner — well under the 3-5 min estimate in the original RFC draft. Cost: $0 (public repo).

Why the title

From gstack's "Boil the Lake" ethos: always do the complete thing when AI makes the marginal cost near-zero. The complete thing here isn't "propose a workflow" — it's "ship a workflow that demonstrably catches every recent Windows regression, against live CI receipts." That's what this PR is.

The one open question

Gate or continue-on-error for the first 2 weeks? My default: continue-on-error: true, report failures in the PR Checks tab, flip to gating after a clean-signal review. This is the one judgment call I don't want to make unilaterally — you have better context on gstack's flake-tolerance norms. See the RFC §"Open question for @garrytan" for reasoning.

Everything else (runner sourcing, scope of the YAML, Phase 2 skip-list ownership, tier-vocabulary placement) is proposed with a defensible default in the RFC body. Push back inline on any of them if they land wrong.

Sequencing

This PR assumes #1118 / #1119 / #1121 merge first — those add the test files this workflow invokes. If you'd prefer to sequence differently, say so; the workflow's test list is straightforward to narrow.

What this PR does NOT do

  • Phase 2 (broader unit test subset + skip-list triage) — separate PR, only after Phase 1 lands.
  • Phase 3 (E2E + make-pdf-gate on Windows) — gated on widening pdftotext.normalize() to handle Xpdf/Poppler-Windows output. Not volunteered work on my end.
  • Solve the pdftotext tolerance question from make-pdf-gate.yml:26. That's its own work.
  • Enable full Windows matrix expansion across every existing workflow. Phase 1 is a new standalone workflow, not a matrix addition to existing ones.

🤖 Generated with Claude Code

scarson and others added 6 commits April 21, 2026 03:38
RFC proposing a cheap Phase-1 smoke CI to catch the recurring
"works on Linux/macOS, breaks on Windows" bug pattern before merge.

Builds on the root-cause fix in garrytan#1024 (v0.18.0.1) — that PR made CI
failures loud; this RFC makes the right code paths actually run.

Phase 1 (this deliverable): windows-smoke.yml that builds binaries,
asserts .exe layout, smoke-tests each compiled binary, runs four
Windows-specific unit tests plus a make-pdf render smoke.

Phase 2 (follow-on): broader unit test subset, gated on triaging
7 pre-existing Windows-flaky tests.

Phase 3 (unowned, flagged only): E2E + make-pdf-gate on Windows,
gated on widening pdftotext.normalize() to handle Xpdf / Poppler-
Windows output divergences.

Draft for discussion. One open question: gate or continue-on-error
for the first 2 weeks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Implements the Phase 1 workflow proposed in docs/designs/WINDOWS_CI.md.

Triggers:
  - pull_request against main (paths: browse, make-pdf, design, scripts,
    bin, package.json, bun.lockb, self)
  - push to main (same paths)
  - workflow_dispatch

Steps:
  1. bun install --frozen-lockfile
  2. bun run build
  3. Assert .exe layout (browse, find-browse, pdf, design) + server-node.mjs
  4. Smoke-test binaries via --version
  5. Windows-specific unit tests (4 files)
  6. make-pdf render smoke

Includes a concurrency group so rapid repush on the same PR doesn't stack
runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
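
As an illustration of step 3 above ("Assert .exe layout"), the check can be sketched as a small function that lists which expected build artifacts are missing. This is a sketch under assumptions, not the script the workflow actually runs: the function name `missingArtifacts`, the `dist` output directory, and the assumption that `server-node.mjs` sits next to the binaries are all hypothetical. Only the artifact names (browse, find-browse, pdf, design, server-node.mjs) come from the commit message.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Return the expected artifacts that are absent from distDir.
// On win32 each compiled binary is expected to carry an ".exe" suffix;
// extras (such as server-node.mjs) are checked under their literal names.
// platform and exists are injectable so the logic can be exercised anywhere.
function missingArtifacts(
  distDir: string,
  binaries: string[],
  extras: string[],
  platform: string = process.platform,
  exists: (p: string) => boolean = fs.existsSync,
): string[] {
  const suffix = platform === "win32" ? ".exe" : "";
  const expected = [...binaries.map((name) => name + suffix), ...extras];
  return expected.filter((name) => !exists(path.join(distDir, name)));
}

// Hypothetical usage: fail the step if anything is missing.
// "dist" is an assumed output directory, not a path confirmed by this PR.
const missing = missingArtifacts(
  "dist",
  ["browse", "find-browse", "pdf", "design"],
  ["server-node.mjs"],
);
if (missing.length > 0) {
  console.error(`Missing build artifacts: ${missing.join(", ")}`);
  process.exitCode = 1;
}
```

Reporting the full list of missing files, rather than stopping at the first one, keeps a failed run informative when several binaries break at once.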
@tashonna-labs

The focused smoke-test philosophy is pragmatic — it avoids the trap of trying to make every test pass on Windows before shipping anything. The path filter coverage looks solid for the primary Windows-compatibility culprits. One thing I noticed: bun.lockb is in the trigger paths, which means any dependency bump will run this CI even if no Windows-relevant code changed. Might be worth evaluating after the first few weeks of run history whether the lockfile trigger is catching real issues or just adding noise — you can always narrow the paths filter once you have data.
