test: stabilize flaky Poisson(10) niterations convergence test #98

Merged: Nimrais merged 1 commit into main from fix/poisson-flaky-niterations-test on Jul 3, 2026

@bvdmitri

Member

Problem

The Poisson projection convergence test fails intermittently in the ExponentialFamily.jl downstream CI job (e.g. run 28360311006). The failure is in the Poisson(10) case of test/projection/projected_to_poisson_tests.jl.

Root cause

test_projection_convergence runs a niterations convergence sub-test that sweeps niterations = 100:50:1000 with a fixed niterations_nsamples = 700 Monte-Carlo samples per run, then test_convergence_to_stable_point requires the tail's rolling std to fall below stdthreshold = 5e-2.
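To make the stable-point criterion concrete, here is a minimal Python sketch of a rolling-std check. It is not the package's `test_convergence_to_stable_point`; the function names, the choice of looking only at the last window, and the window size of 5 (suggested by the "w5 / w10" columns in the verification table) are all assumptions, and the two traces are made up for illustration.

```python
from statistics import stdev

def rolling_std(values, window):
    """Sample standard deviation of each consecutive `window`-sized slice."""
    return [stdev(values[i:i + window]) for i in range(len(values) - window + 1)]

def is_stable(values, window=5, stdthreshold=5e-2):
    """Hypothetical stable-point check: the last rolling std must be below the threshold."""
    return rolling_std(values, window)[-1] < stdthreshold

# The Julia range 100:50:1000 has 19 sweep points.
niterations = list(range(100, 1001, 50))

# Made-up divergence traces, one value per sweep point (not real test data).
quiet = [0.20 + 0.01 * (-1) ** i for i in range(len(niterations))]
noisy = [0.20 + 0.10 * (-1) ** i for i in range(len(niterations))]

print(is_stable(quiet))
print(is_stable(noisy))
```

The quiet trace passes and the noisy one does not, even though both hover around the same value: the check constrains the spread of the tail, not its level.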

For λ = 10 the Poisson variance is large, so 700 samples leave a KL noise floor right at the 0.05 threshold. The test is deterministic (everything is StableRNG(42)-seeded) so it does not randomly flip — but it sits on the pass/fail boundary, and small numerical drift from a different ExponentialFamily.jl / Julia / BLAS version in the downstream job tips it over. That is what makes it look flaky.

The companion nsamples sweep already runs up to 4000 samples and passes, confirming more samples is the correct lever.
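The effect of the sample count on Monte-Carlo noise can be illustrated with a small stand-in experiment. The sketch below does not compute the KL divergence of a projection; it estimates the mean of Poisson(10) from 700 and from 4000 samples and compares the run-to-run spread of those estimates. The sampler (Knuth's multiplication method), the helper names, the repeat count, and the seed are assumptions made for this illustration only.

```python
import math
import random
from statistics import stdev

def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def estimate_spread(n_samples, lam=10.0, repeats=50, seed=42):
    """Std of the Monte-Carlo mean estimate across `repeats` independent runs."""
    rng = random.Random(seed)
    estimates = [
        sum(sample_poisson(rng, lam) for _ in range(n_samples)) / n_samples
        for _ in range(repeats)
    ]
    return stdev(estimates)

spread_700 = estimate_spread(700)
spread_4000 = estimate_spread(4000)

# Theory for a sample mean: std = sqrt(lam / n), so the ratio should be near sqrt(4000 / 700).
print(f"spread at 700 samples:  {spread_700:.4f}")
print(f"spread at 4000 samples: {spread_4000:.4f}")
print(f"expected ratio ~ {math.sqrt(4000 / 700):.2f}")
```

For a plain sample mean the noise shrinks as 1/sqrt(n). The KL statistic in the real test need not follow that scaling exactly, so this only shows the direction of the effect.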

Fix

Raise niterations_nsamples from 700 to 4000 for the Poisson(10) testset only (matching the nsamples_range ceiling that already passes). No shared helpers, other distributions, or convergence criteria are touched.

Verification

Reproduced the exact CI failure locally (same max div = 0.278) and swept sample counts:

| niterations_nsamples | result | tail rolling-std (w5 / w10) |
|---|---|---|
| 700 (old) | ❌ fail | 0.073 / 0.091 |
| 2000 | | 0.034 / 0.028 |
| 4000 (new) | | 0.015 / 0.014 |

All three Poisson test items pass locally (8/8). The Poisson(10) item is slower (~12s vs ~4s), an acceptable cost for the ~3×-below-threshold margin.
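The margin figure can be checked directly from the numbers in the table. The short sketch below divides the threshold by the worse of the two reported rolling-std values for each sample count; taking the worse of the w5 / w10 pair is an assumption about how the margin is meant to be read.

```python
stdthreshold = 5e-2

# (w5, w10) tail rolling-std values copied from the verification table.
measured = {700: (0.073, 0.091), 2000: (0.034, 0.028), 4000: (0.015, 0.014)}

# Margin = threshold / worst observed rolling std; values below 1 fail the check.
margins = {n: stdthreshold / max(pair) for n, pair in measured.items()}

for n, margin in margins.items():
    print(f"{n}: {margin:.2f}x")
```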

🤖 Generated with Claude Code

The `Poisson(10)` niterations convergence sub-test used a fixed 700
Monte-Carlo samples per run. With a large `λ` the Poisson variance is
high, so the KL noise floor sat right at the stable-point `stdthreshold`
(5e-2). The test is deterministic (StableRNG(42)-seeded) but sits on the
pass/fail boundary, so small numerical perturbations from a different
ExponentialFamily.jl / Julia / BLAS version in downstream CI flip it.

Raise `niterations_nsamples` from 700 to 4000 (matching the
`nsamples_range` ceiling that already passes), lowering the tail
rolling-std from ~0.09 to ~0.015 — comfortably below the threshold.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@bvdmitri requested a review from Nimrais on June 29, 2026 11:12
@codecov

codecov Bot commented Jun 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.42%. Comparing base (3ed0092) to head (6ab2fd4).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #98   +/-   ##
=======================================
  Coverage   99.42%   99.42%           
=======================================
  Files          14       14           
  Lines         520      520           
=======================================
  Hits          517      517           
  Misses          3        3           


@Nimrais merged commit 74566b4 into main on Jul 3, 2026
6 of 8 checks passed
@Nimrais deleted the fix/poisson-flaky-niterations-test branch on July 3, 2026 09:55