Fix reproducibility CI: regenerate stale BCF references and fail on drift #414

Merged: andrewherren merged 1 commit into main from fix-bcf-reproducibility-refs on Jun 27, 2026

@andrewherren (Collaborator)

Problem

The cross-platform reproducibility jobs were passing despite mismatches. Two issues combined:

  1. The check scripts only printed differences and never exited non-zero, so the job stayed green even when results didn't match the references.
  2. Under that blind spot, the four BCF reference CSVs had gone stale: every BCF value (200 y_hat + 100 sigma2) differed from the stored reference in both R and Python. (BART was fine.)

The current BCF output is in fact cross-platform reproducible — it agrees to ~1e-9 across ubuntu/windows/macos with 0 mismatches at the 1e-6 tolerance. The references were simply out of date, which is why every platform reported the same differences.
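
To make the two numbers above concrete, here is a minimal sketch of how pairwise cross-platform agreement could be summarized: the largest absolute difference between two platforms' outputs, and the count of values that differ by more than the tolerance. The function names and the dict-of-lists input are assumptions for illustration, not the repository's actual scripts.

```python
from itertools import combinations


def max_abs_diff(a, b):
    """Largest absolute elementwise difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))


def count_mismatches(a, b, tol=1e-6):
    """Number of positions where the two lists differ by more than tol."""
    return sum(1 for x, y in zip(a, b) if abs(x - y) > tol)


def cross_platform_report(outputs, tol=1e-6):
    """Compare every pair of platforms.

    outputs: dict mapping a platform name to its list of floats
             (hypothetical layout; all lists must have the same length).
    Returns a dict keyed by (platform_a, platform_b) with
    (max_abs_diff, mismatch_count) as the value.
    """
    report = {}
    for (p, a), (q, b) in combinations(sorted(outputs.items()), 2):
        report[(p, q)] = (max_abs_diff(a, b), count_mismatches(a, b, tol))
    return report
```

Under this sketch, "reproducible" means every pair reports a mismatch count of 0, even when the maximum difference is small but non-zero.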

Changes

  • Regenerated the 4 stale BCF reference CSVs from the v0.4.5 release build (origin/main == v0.4.5), built and run in both R and Python, in the original CSV format (headers, precision, line counts unchanged).
  • Made the checks fail loudly: simple-bart/simple-bcf (R and Python) now exit non-zero on any mismatch, so future drift turns the CI job red instead of passing silently.
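
The "fail loudly" change can be sketched as follows: the check still prints each difference, but it also returns a status code that the script passes to the process exit. This is an illustrative sketch only; `load_column`, `check`, and the one-column CSV layout are assumptions, and the real simple-bart/simple-bcf scripts may be structured differently.

```python
import csv


def load_column(lines):
    """Parse a one-column CSV: a header row, then one float per row.

    lines: any iterable of text lines, such as an open file object.
    (The single-column layout is an assumption for this sketch.)
    """
    rows = csv.reader(lines)
    next(rows)  # skip the header row
    return [float(row[0]) for row in rows if row]


def check(current, reference, tol=1e-6):
    """Print every mismatch and return an exit code: 0 if clean, 1 otherwise."""
    if len(current) != len(reference):
        print(f"length mismatch: {len(current)} vs {len(reference)}")
        return 1
    bad = [
        (i, c, r)
        for i, (c, r) in enumerate(zip(current, reference))
        if abs(c - r) > tol
    ]
    for i, c, r in bad:
        print(f"row {i}: got {c!r}, reference {r!r}")
    return 1 if bad else 0


# In a script, the return value would become the process exit status, e.g.:
#   import sys
#   with open("reference.csv", newline="") as f:   # hypothetical path
#       sys.exit(check(new_values, load_column(f)))
```

Returning the code from `check` rather than calling `sys.exit` inside it keeps the comparison testable while still letting the entry point turn the CI job red.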

Verification

  • All 4 scripts pass with exit 0 against the regenerated references.
  • Confirmed the new fail path returns exit 1 when a reference is perturbed (R and Python).
  • BART references untouched (already correct).
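
A perturbation check like the one described above could look roughly like this: run a checker as a separate process, once against unmodified values and once against a copy with one value nudged past the tolerance, and compare the exit codes. The inline `CHECKER` script and the helper names are stand-ins invented for this sketch, not the project's scripts.

```python
import subprocess
import sys

# Stand-in checker (hypothetical): exits 1 if any value differs by more
# than 1e-6, else 0. Values arrive as two comma-separated arguments.
CHECKER = """
import sys
cur = [float(x) for x in sys.argv[1].split(",")]
ref = [float(x) for x in sys.argv[2].split(",")]
sys.exit(1 if any(abs(c - r) > 1e-6 for c, r in zip(cur, ref)) else 0)
"""


def run_checker(current, reference):
    """Run the stand-in checker in a child process and return its exit code."""
    args = [",".join(repr(v) for v in vals) for vals in (current, reference)]
    result = subprocess.run([sys.executable, "-c", CHECKER, *args])
    return result.returncode


def perturb(values, index=0, delta=1e-3):
    """Return a copy of values with one entry shifted by delta."""
    out = list(values)
    out[index] += delta
    return out
```

With this setup, `run_checker(values, values)` should return 0 and `run_checker(values, perturb(values))` should return 1, mirroring the pass path and the fail path.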

🤖 Generated with Claude Code

…rift

The cross-platform reproducibility jobs were passing despite mismatches
because the check scripts only printed differences and never exited
non-zero. Under that blind spot, the four BCF reference CSVs had gone
stale: every BCF value (200 y_hat + 100 sigma2) differed from the stored
reference in both R and Python, while BART was fine.

Current BCF output is in fact cross-platform reproducible (agrees to
~1e-9 across ubuntu/windows/macos, 0 mismatches at the 1e-6 tolerance);
the references were simply out of date. Regenerated them from the v0.4.5
release build (origin/main == v0.4.5) in the original CSV format.

Also make simple-bart and simple-bcf (R and Python) exit non-zero on any
mismatch so future drift turns the CI job red instead of passing silently.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@andrewherren merged commit 2fc312a into main on Jun 27, 2026
16 checks passed