test: stabilize flaky Poisson(10) niterations convergence test by bvdmitri · Pull Request #98 · ReactiveBayes/ExponentialFamilyProjection.jl

bvdmitri · 2026-06-29T11:11:59Z

Problem

The Poisson projection convergence test fails intermittently in the ExponentialFamily.jl downstream CI job (e.g. run 28360311006). The failure is in the Poisson(10) case of test/projection/projected_to_poisson_tests.jl.

Root cause

`test_projection_convergence` runs a niterations convergence sub-test that sweeps `niterations = 100:50:1000` with a fixed `niterations_nsamples = 700` Monte-Carlo samples per run. `test_convergence_to_stable_point` then requires the tail's rolling std to fall below `stdthreshold = 5e-2`.
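A stable-point check of this kind can be sketched as follows. This is an illustrative Python sketch, not the package's Julia helper: the names `rolling_std` and `converged_to_stable_point`, the window size, and the tail fraction are assumptions. Only the 5e-2 threshold and the 19-point sweep length come from the description above.

```python
import statistics

def rolling_std(values, window):
    """Sample standard deviation over each consecutive window of `values`."""
    return [
        statistics.stdev(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]

def converged_to_stable_point(divergences, window=5, tail_fraction=0.5, stdthreshold=5e-2):
    """Return True if every rolling std in the tail of the sweep is below the threshold.

    Hypothetical criterion: the real helper may define the tail differently.
    """
    tail_start = int(len(divergences) * (1 - tail_fraction))
    tail = divergences[tail_start:]
    if len(tail) < window:
        raise ValueError("tail is shorter than the rolling window")
    return max(rolling_std(tail, window)) < stdthreshold

# Synthetic divergences for a sweep with 19 points (as in 100:50:1000).
quiet = [0.30 + 0.01 * (-1) ** i for i in range(19)]  # small oscillation
noisy = [0.30 + 0.10 * (-1) ** i for i in range(19)]  # large oscillation

print(converged_to_stable_point(quiet))  # small oscillation stays under the threshold
print(converged_to_stable_point(noisy))  # large oscillation exceeds the threshold
```

A check like this depends only on the spread of the tail, not on its level, so a run whose divergence settles at a non-zero value can still pass as long as it stops moving.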

For λ = 10 the Poisson variance is large (it equals λ), so 700 samples leave a KL noise floor right at the 0.05 threshold. The test is deterministic (everything is `StableRNG(42)`-seeded), so it does not flip randomly. It does sit on the pass/fail boundary, however, and small numerical drift from a different ExponentialFamily.jl / Julia / BLAS version in the downstream job tips it over. That is what makes it look flaky.
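The effect of sample count on Monte-Carlo noise can be illustrated with the simplest estimator, the sample mean of Poisson(λ) draws, whose standard error is sqrt(λ/N) because the Poisson variance equals λ. This Python sketch is only an analogy: the test's KL-based divergence is a different statistic, so the absolute values are not expected to match the measurements in this PR. `mean_standard_error` is a hypothetical helper name.

```python
import math

def mean_standard_error(lam, nsamples):
    """Standard error of the sample mean of `nsamples` i.i.d. Poisson(lam) draws."""
    return math.sqrt(lam / nsamples)

lam = 10.0
for nsamples in (700, 2000, 4000):
    print(nsamples, round(mean_standard_error(lam, nsamples), 4))

# Noise shrinks as 1/sqrt(N): going from 700 to 4000 samples
# reduces the standard error by a factor of sqrt(4000 / 700).
reduction = mean_standard_error(lam, 700) / mean_standard_error(lam, 4000)
print(round(reduction, 2))
```

The 1/sqrt(N) scaling is why a modest increase in samples would leave the test near the boundary, while a several-fold increase moves it clearly away.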

The companion nsamples sweep already runs up to 4000 samples and passes, confirming more samples is the correct lever.

Fix

Raise `niterations_nsamples` from 700 → 4000 for the Poisson(10) testset only (matching the `nsamples_range` ceiling that already passes). No shared helpers, other distributions, or convergence criteria are touched.

Verification

Reproduced the exact CI failure locally (same max div = 0.278) and swept sample counts:

| `niterations_nsamples` | result | tail rolling-std (w5 / w10) |
| --- | --- | --- |
| 700 (old) | ❌ fail | 0.073 / 0.091 |
| 2000 | ✅ | 0.034 / 0.028 |
| 4000 (new) | ✅ | 0.015 / 0.014 |

All three Poisson test items pass locally (8/8). The Poisson(10) item is slower (~12s vs ~4s), an acceptable cost for the ~3×-below-threshold margin.
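The margin quoted above can be checked with a few lines of arithmetic. The sketch below (Python, for illustration only) copies the rolling-std values from the verification table and assumes that "w5 / w10" denotes rolling windows of 5 and 10 points; `margin` is a hypothetical helper, not part of the test suite.

```python
STD_THRESHOLD = 5e-2  # stdthreshold from the PR description

# Tail rolling-std per sample count as (w5, w10), copied from the table.
measured = {
    700: (0.073, 0.091),
    2000: (0.034, 0.028),
    4000: (0.015, 0.014),
}

def margin(nsamples):
    """Ratio of the threshold to the worst (largest) measured rolling std."""
    return STD_THRESHOLD / max(measured[nsamples])

for nsamples in measured:
    status = "pass" if margin(nsamples) > 1 else "fail"
    print(f"{nsamples}: margin {margin(nsamples):.2f}x ({status})")
```

A margin above 1 means the worst rolling std is below the threshold. Taking the larger of the two windows is a conservative choice made for this sketch.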

🤖 Generated with Claude Code

The `Poisson(10)` niterations convergence sub-test used a fixed 700 Monte-Carlo samples per run. With a large `λ` the Poisson variance is high, so the KL noise floor sat right at the stable-point `stdthreshold` (5e-2). The test is deterministic (StableRNG(42)-seeded) but sits on the pass/fail boundary, so small numerical perturbations from a different ExponentialFamily.jl / Julia / BLAS version in downstream CI flip it.

Raise `niterations_nsamples` from 700 to 4000 (matching the `nsamples_range` ceiling that already passes), lowering the tail rolling-std from ~0.09 to ~0.015, comfortably below the threshold.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codecov · 2026-06-29T11:29:36Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.42%. Comparing base (3ed0092) to head (6ab2fd4).

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #98   +/-   ##
=======================================
  Coverage   99.42%   99.42%           
=======================================
  Files          14       14           
  Lines         520      520           
=======================================
  Hits          517      517           
  Misses          3        3


bvdmitri requested a review from Nimrais June 29, 2026 11:12

Nimrais approved these changes Jul 3, 2026

Nimrais merged commit 74566b4 into main Jul 3, 2026
6 of 8 checks passed

Nimrais deleted the fix/poisson-flaky-niterations-test branch July 3, 2026 09:55
