PlaceboInTime: end-to-end reproducibility — seed posterior predictive and unify seed surface #876

@drbenvincent

Problem

PlaceboInTime has three independent stochastic stages (the per-fold experiment fits, the hierarchical status-quo model, and the assurance simulation), and the seed handling for them is fragmented and incomplete. As a result, even when a user passes every seed argument the API currently exposes, the printed verdict (P(actual outside null)) and the rendered HTML report can drift between runs.

Current state

Looking at causalpy/checks/placebo_in_time.py:

| Stage | Where in code | Current seeding |
| --- | --- | --- |
| Per-fold experiment fits | `_get_factory` / `clone_model(kw["model"])` | Inherits the headline model's `sample_kwargs["random_seed"]`. Reproducible, but every fold gets the same seed. |
| Hierarchical status-quo `pm.sample` | line 389: `pm.sample(**self.sample_kwargs)` | Seeded only if the user passes `PlaceboInTime(sample_kwargs={"random_seed": ...})`. Undocumented as the recommended way to seed this stage. |
| Hierarchical `pm.sample_posterior_predictive` for `theta_new` | line 399 | Not seeded; no parameter exposed. This is the real gap. |
| Assurance simulation `np.random.default_rng` | line 490 | Uses the constructor's `random_seed=` argument. |

The constructor docstring says `random_seed` is "RNG seed for the assurance simulation", which is technically accurate but misleading: a user who reads it as "the seed for this check" will still get non-reproducible `P(actual outside null)` because the posterior predictive call is unseeded.

Repro

Even with everything possible seeded today:

import causalpy as cp

seed = 42
df = cp.load_data("sc")

result = cp.Pipeline(
    data=df,
    steps=[
        cp.EstimateEffect(
            method=cp.SyntheticControl,
            treatment_time=70,
            control_units=["a", "b", "c", "d", "e", "f", "g"],
            treated_units=["actual"],
            model=cp.pymc_models.WeightedSumFitter(
                sample_kwargs={"random_seed": seed},
            ),
        ),
        cp.SensitivityAnalysis(
            checks=[
                cp.checks.PlaceboInTime(
                    n_folds=2,
                    sample_kwargs={"random_seed": seed},
                    random_seed=seed,
                ),
            ],
        ),
    ],
).run()

for cr in result.sensitivity_results:
    print(cr.text)

Two consecutive runs of the same code produce, for example:

  • Run A: `P(actual outside null) = 0.915`
  • Run B: `P(actual outside null) = 0.923`

Both runs share the same per-fold means/sds and the same hierarchical `mu` / `tau`. The drift is entirely from the unseeded `pm.sample_posterior_predictive` on `theta_new`. This was hit in PR #871 while writing the SC sensitivity walkthrough docs.
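The mechanism can be reproduced without PyMC. The following is a toy sketch, not CausalPy code: `toy_check`, its parameters, and the way the final probability is computed are all hypothetical stand-ins. It only shows that when an upstream stage is seeded and a downstream draw is not, the upstream summaries match across runs while the final number may drift.

```python
import numpy as np


def toy_check(fit_seed, pp_seed=None, n_draws=4000, observed=2.0):
    """Toy two-stage check (hypothetical, not the CausalPy implementation).

    Stage 1 stands in for the seeded hierarchical fit (draws of mu and tau).
    Stage 2 stands in for the posterior predictive draw of theta_new, which
    is seeded only when pp_seed is given.
    """
    # Stage 1: seeded, so mu and tau are identical on every call.
    fit_rng = np.random.default_rng(fit_seed)
    mu = fit_rng.normal(0.0, 0.3, size=n_draws)
    tau = np.abs(fit_rng.normal(1.0, 0.2, size=n_draws))

    # Stage 2: default_rng(None) pulls fresh OS entropy on every call,
    # which mimics an unseeded posterior predictive.
    pp_rng = np.random.default_rng(pp_seed)
    theta_new = pp_rng.normal(mu, tau)

    # Stand-in verdict: share of null draws smaller in magnitude than the
    # observed effect. The real check may define this differently.
    p_outside = float(np.mean(np.abs(theta_new) < abs(observed)))
    return float(mu.mean()), float(tau.mean()), p_outside


if __name__ == "__main__":
    print("unseeded stage 2:", toy_check(42), toy_check(42))
    print("seeded stage 2:  ", toy_check(42, pp_seed=7), toy_check(42, pp_seed=7))
```

In the unseeded case the first two numbers of each tuple agree and only the third can move, which matches the pattern reported above (same `mu` / `tau`, different verdict).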

Proposed fix

Unify the seed surface so a single argument makes the entire check deterministic.

  1. Pass an explicit seed to pm.sample_posterior_predictive, e.g.:
     pp = pm.sample_posterior_predictive(
         idata, var_names=["theta_new"], random_seed=self.random_seed,
     )
  2. Reuse the existing constructor random_seed argument as the master seed for all stochastic stages of the check (hierarchical pm.sample, pm.sample_posterior_predictive, the assurance simulation, and ideally the fold experiments as well, fanned out so each fold gets a distinct but reproducible seed).
  3. Update the docstring so random_seed is described as the master seed for the whole check, not just for the assurance simulation. Document that a value in sample_kwargs["random_seed"] (if explicitly provided) takes precedence for the hierarchical fit.
  4. Optional: derive per-fold seeds deterministically (e.g. seed + fold_idx) so folds are seeded independently rather than identically. This matters once "PlaceboInTime: guard against placebo folds with insufficient pre-period" (#875) lands and folds are more numerous.
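One possible way to fan a master seed out to every stage is sketched below. It uses NumPy's `SeedSequence.spawn` rather than `seed + fold_idx`, because spawned children are designed to give statistically independent streams. The helper name `derive_seeds` and the keys of the returned dict are hypothetical and do not exist in CausalPy. The precedence rule for `sample_kwargs["random_seed"]` follows the proposal above.

```python
import numpy as np


def derive_seeds(master_seed, n_folds, sample_kwargs=None):
    """Hypothetical helper: derive one integer seed per stochastic stage.

    The returned integers can be passed anywhere a `random_seed` is accepted.
    With master_seed=None the result is not reproducible, which matches the
    usual meaning of "unseeded".
    """
    sample_kwargs = dict(sample_kwargs or {})
    root = np.random.SeedSequence(master_seed)

    # Three check-level stages, then one child per placebo fold.
    children = root.spawn(3 + n_folds)

    def to_int(seed_seq):
        # Collapse a SeedSequence into a single 32-bit integer seed.
        return int(seed_seq.generate_state(1)[0])

    return {
        # An explicit user-supplied seed wins for the hierarchical fit.
        "hierarchical_fit": sample_kwargs.get("random_seed", to_int(children[0])),
        "posterior_predictive": to_int(children[1]),
        "assurance": to_int(children[2]),
        "folds": [to_int(child) for child in children[3:]],
    }


if __name__ == "__main__":
    print(derive_seeds(42, n_folds=2))
    print(derive_seeds(42, n_folds=2, sample_kwargs={"random_seed": 123}))
```

One design caveat: routing the assurance simulation through a derived seed would change its numerical output relative to today's behaviour for the same `random_seed`. If bit-for-bit continuity matters there, the master seed could keep going to the assurance stage directly and only the other stages would use derived seeds.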

Acceptance criteria

  • Two back-to-back runs of PlaceboInTime with the same random_seed produce identical CheckResult.text, metadata["p_effect_outside_null"], and metadata["null_samples"] arrays.
  • Setting random_seed alone (without also threading a seed through sample_kwargs) is sufficient for full reproducibility.
  • Existing API surface remains backward compatible (the assurance simulation still respects random_seed).
  • Docstring clearly describes the seeding contract.
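The first criterion could be turned into a regression test along these lines. This is a sketch under assumptions: `assert_reproducible` and `fake_check` are hypothetical names, and `fake_check` is a NumPy-only stand-in that mimics the `text` / `metadata` shape named above so the example runs without fitting a model. In the real test suite, `run_check` would build and run the pipeline with the given seed.

```python
from types import SimpleNamespace

import numpy as np


def assert_reproducible(run_check, seed):
    """Run the check twice with the same seed and compare all outputs."""
    first, second = run_check(seed), run_check(seed)
    assert first.text == second.text
    assert (
        first.metadata["p_effect_outside_null"]
        == second.metadata["p_effect_outside_null"]
    )
    np.testing.assert_array_equal(
        first.metadata["null_samples"], second.metadata["null_samples"]
    )


def fake_check(seed):
    """Stand-in for a check result; not the CausalPy implementation."""
    rng = np.random.default_rng(seed)
    null_samples = rng.normal(size=500)
    # Placeholder statistic so the result has the expected fields.
    p = float(np.mean(np.abs(null_samples) < 1.0))
    return SimpleNamespace(
        text=f"P(actual outside null) = {p:.3f}",
        metadata={"p_effect_outside_null": p, "null_samples": null_samples},
    )


if __name__ == "__main__":
    assert_reproducible(fake_check, seed=42)
    print("seeded stand-in is reproducible")
```

Comparing the `null_samples` arrays as well as the rounded text matters, because two runs can print the same three-decimal probability by chance while still drawing different samples.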
