Problem

`PlaceboInTime` has three independent stochastic stages, and the seed handling for them is fragmented and incomplete. As a result, even when a user passes every seed argument the API currently exposes, the printed verdict (`P(actual outside null)`) and the rendered HTML report can drift between runs.
Current state
Looking at `causalpy/checks/placebo_in_time.py`:

| Stage | Where in code | Current seeding |
|---|---|---|
| Per-fold experiment fits | `_get_factory` → `clone_model(kw["model"])` | Inherits the headline model's `sample_kwargs["random_seed"]`. Reproducible, but every fold gets the same seed. |
| Hierarchical status-quo `pm.sample` | line 389: `pm.sample(**self.sample_kwargs)` | Only if user passes `PlaceboInTime(sample_kwargs={"random_seed": ...})`. Undocumented as the recommended way to seed this stage. |
| Hierarchical `pm.sample_posterior_predictive` for `theta_new` | line 399 | Not seeded: no parameter exposed. This is the real gap. |
| Assurance simulation `np.random.default_rng` | line 490 | Uses the constructor's `random_seed=` argument. |
The constructor docstring says `random_seed` is "RNG seed for the assurance simulation", which is technically accurate but misleading: a user who reads it as "the seed for this check" will still get non-reproducible `P(actual outside null)` because the posterior predictive call is unseeded.
Repro

Even with everything possible seeded today, two consecutive runs of the same code produce, for example:
Run A: `P(actual outside null) = 0.915`
Run B: `P(actual outside null) = 0.923`
Both runs share the same per-fold means/sds and the same hierarchical `mu` / `tau`. The drift is entirely from the unseeded `pm.sample_posterior_predictive` on `theta_new`. This was hit in PR #871 while writing the SC sensitivity walkthrough docs.
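The mechanism can be illustrated without PyMC. The sketch below is a toy stand-in, not CausalPy's actual computation: `toy_p_outside_null`, the interval rule, and all numbers are assumptions made for illustration. It holds the upstream inputs fixed (as the seeded stages do) and shows that a single unseeded predictive draw is enough to make the final probability vary between runs.

```python
import numpy as np


def toy_p_outside_null(actual_samples, mu, tau, predictive_seed=None, draws=2000):
    """Toy verdict: share of actual-effect samples outside a 95% null interval."""
    # Stand-in for the predictive draw of theta_new.
    # predictive_seed=None mimics the unseeded call: fresh OS entropy each run.
    rng = np.random.default_rng(predictive_seed)
    null_samples = rng.normal(mu, tau, size=draws)
    lo, hi = np.quantile(null_samples, [0.025, 0.975])
    return float(np.mean((actual_samples < lo) | (actual_samples > hi)))


# Upstream stages are seeded, so these inputs are identical across runs.
actual_samples = np.random.default_rng(0).normal(2.0, 0.5, size=4000)
mu, tau = 0.0, 1.0

unseeded = [toy_p_outside_null(actual_samples, mu, tau) for _ in range(2)]
seeded = [
    toy_p_outside_null(actual_samples, mu, tau, predictive_seed=42)
    for _ in range(2)
]
print("unseeded:", unseeded)  # typically two slightly different values
print("seeded:  ", seeded)    # two identical values
```

The seeded pair is identical by construction, while the unseeded pair will usually differ in the second or third decimal, which is the same order of drift as in the runs reported above.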
Proposed fix
Unify the seed surface so a single argument makes the entire check deterministic.
1. Pass an explicit seed to `pm.sample_posterior_predictive` through its `random_seed` argument.
2. Reuse the existing constructor `random_seed` argument as the master seed for all stochastic stages of the check (hierarchical `pm.sample`, `pm.sample_posterior_predictive`, assurance simulation, and ideally also fanned out across fold experiments so each fold gets a distinct but reproducible seed).
3. Update the docstring so `random_seed` is described as the master seed for the whole check, not just for the assurance simulation. Document that values inside `sample_kwargs["random_seed"]` (if explicitly provided) take precedence for the hierarchical fit.
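One possible way to implement the master-seed idea is NumPy's `SeedSequence`, which can spawn independent child seeds from a single integer. The helper below is a hypothetical sketch: `derive_seeds` and the keys of the dictionary it returns are illustrative names, not part of CausalPy. It gives each stage and each fold its own reproducible seed, and lets an explicit `sample_kwargs["random_seed"]` win for the hierarchical fit.

```python
import numpy as np


def derive_seeds(random_seed, n_folds, sample_kwargs=None):
    """Hypothetical helper: split one master seed into per-stage and per-fold seeds."""
    sample_kwargs = dict(sample_kwargs or {})  # copy so the caller's dict is untouched
    root = np.random.SeedSequence(random_seed)
    fit_ss, ppc_ss, assurance_ss, folds_ss = root.spawn(4)

    def as_int(seed_seq):
        # Collapse a SeedSequence into a plain integer seed.
        return int(seed_seq.generate_state(1)[0])

    # An explicit sample_kwargs["random_seed"] takes precedence for the hierarchical fit.
    sample_kwargs.setdefault("random_seed", as_int(fit_ss))
    return {
        "sample_kwargs": sample_kwargs,
        "posterior_predictive": as_int(ppc_ss),
        "assurance": as_int(assurance_ss),
        "folds": [as_int(child) for child in folds_ss.spawn(n_folds)],
    }


seeds = derive_seeds(random_seed=123, n_folds=3)
print(seeds)
```

Under this sketch, the `posterior_predictive` entry would be passed as `random_seed=` to `pm.sample_posterior_predictive`, and each entry of `folds` would seed one cloned fold model. One trade-off to weigh: deriving the assurance seed this way changes the numbers the assurance simulation produces today for a given `random_seed`; passing `random_seed` straight through to `np.random.default_rng` for that stage would keep existing outputs unchanged.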
Acceptance criteria

- Two back-to-back runs of `PlaceboInTime` with the same `random_seed` produce identical `CheckResult.text`, `metadata["p_effect_outside_null"]`, and `metadata["null_samples"]` arrays.
- Setting `random_seed` alone (without also threading a seed through `sample_kwargs`) is sufficient for full reproducibility.
- Existing API surface remains backward compatible (the assurance simulation still respects `random_seed`).
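The first criterion could be checked with a small regression test along these lines. This is a sketch under stated assumptions: `ToyResult` and `run_toy_check` are hypothetical stand-ins that only mirror the field names mentioned above, and a real test would call `PlaceboInTime` on a fitted experiment instead.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ToyResult:
    # Hypothetical stand-in mirroring the fields named in the criteria.
    text: str
    metadata: dict = field(default_factory=dict)


def run_toy_check(random_seed):
    """Stand-in for one run of the check, fully driven by a single seed."""
    rng = np.random.default_rng(random_seed)
    null_samples = rng.normal(0.0, 1.0, size=1000)
    # Toy statistic: share of null draws smaller in magnitude than an effect of 2.5.
    p = float(np.mean(np.abs(null_samples) < 2.5))
    return ToyResult(
        text=f"P(actual outside null) = {p:.3f}",
        metadata={"p_effect_outside_null": p, "null_samples": null_samples},
    )


def assert_reproducible(run_check, random_seed):
    """Run the check twice with the same seed and compare every reported output."""
    first, second = run_check(random_seed), run_check(random_seed)
    assert first.text == second.text
    assert (
        first.metadata["p_effect_outside_null"]
        == second.metadata["p_effect_outside_null"]
    )
    assert np.array_equal(
        first.metadata["null_samples"], second.metadata["null_samples"]
    )


assert_reproducible(run_toy_check, random_seed=42)
```

Comparing with exact equality rather than a tolerance is deliberate here: the criterion asks for identical outputs, so any remaining unseeded stage should make the test fail.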
Related

- #875 (PlaceboInTime: guard against placebo folds with insufficient pre-period): seeding folds independently rather than identically (e.g. `seed + fold_idx`) matters more once this lands and folds are more numerous.