Add outcome falsification, random placebo folds, and model clone support #826
These directories contain exploratory/demo scripts that are not part of the main package and should not count toward docstring coverage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…TimeSeries Sensitivity checks like PlaceboInTime and OutcomeFalsification need to create fresh, unfitted copies of models. These _clone methods preserve all configuration (components, sample_kwargs, mode) while resetting fitted state. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Time Adds a "random" selection_method that randomly samples eligible placebo windows from the pre-intervention period, with constraints on minimum training fraction, minimum gap between folds, and optional period exclusion. Also fixes the assurance simulation to correctly model the alternative hypothesis as null baseline noise + expected treatment effect (theta_new + expected_effect), matching the paper's formulation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces outcome falsification, which re-fits the experiment with alternative outcome formulas and reports their estimated effect sizes with HDI intervals. This is an informational check (no pass/fail) that lets researchers assess whether the pattern of effects across outcomes is consistent with their causal story. Inspired by the "causal detective" approach in Gallea (2026). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a developer notebook documenting the placebo-in-time and outcome falsification sensitivity check methodology with worked examples. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #826 +/- ##
==========================================
+ Coverage 95.03% 95.12% +0.08%
==========================================
Files 90 92 +2
Lines 14117 14803 +686
Branches 851 890 +39
==========================================
+ Hits 13416 14081 +665
- Misses 490 505 +15
- Partials 211 217 +6
Bit of a mega payload here. There are a bunch of definitely different themes going on. In an ideal world, future PRs would be broken up into slightly more discrete PRs, partly because it would be easier to revert any changes and partly because smaller PRs are easier to review. But if there are genuine interactions/dependencies then cool. Review incoming...
drbenvincent
left a comment
Thanks for the PR! The sensitivity-checking work here is solid — the OutcomeFalsification check and the random fold selection for PlaceboInTime are both well-motivated features, and the article notebook is pedagogically excellent.
I have comments grouped by severity below.
Must-fix
1. _clone() methods should forward user priors (pymc_models.py)
Both new _clone() methods (BayesianBasisExpansionTimeSeries and StateSpaceTimeSeries) omit priors=self._user_priors, unlike the base PyMCModel._clone() which passes it through. If a user provided custom priors to the original model, the clone silently drops them, meaning the cloned model may behave differently from the original. Both methods should forward self._user_priors for consistency with the base class pattern.
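To make the requested behaviour concrete, here is a minimal sketch of a `_clone()` that forwards user priors while resetting fitted state. `SketchTimeSeriesModel`, its constructor arguments, and the `idata` attribute are hypothetical stand-ins, not the actual CausalPy classes; only the `priors=self._user_priors` pattern comes from the review comment.

```python
from __future__ import annotations

from typing import Any


class SketchTimeSeriesModel:
    """Hypothetical stand-in for a model that accepts user-supplied priors."""

    def __init__(
        self,
        sample_kwargs: dict[str, Any] | None = None,
        priors: dict[str, Any] | None = None,
    ) -> None:
        self.sample_kwargs = dict(sample_kwargs or {})
        # Keep the user's priors so that a clone can forward them later.
        self._user_priors = dict(priors) if priors is not None else None
        self.idata = None  # None means "unfitted" in this sketch

    def fit(self) -> None:
        # Placeholder for sampling; it only marks the model as fitted.
        self.idata = {"posterior": "placeholder"}

    def _clone(self) -> "SketchTimeSeriesModel":
        # Forward configuration AND user priors; never copy fitted state.
        return type(self)(
            sample_kwargs=self.sample_kwargs,
            priors=self._user_priors,
        )


original = SketchTimeSeriesModel(
    sample_kwargs={"draws": 500},
    priors={"beta": {"dist": "Normal", "sigma": 2.0}},
)
original.fit()
clone = original._clone()
```

A regression test for this pattern only needs to check that the clone is unfitted and that its priors equal the original's.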
2. selection_method should use Literal (placebo_in_time.py)
The project's AGENTS.md type-checking rules specify: "use Literal for constrained string parameters." Literal is already used across 15+ source files in the codebase. selection_method: str should be selection_method: Literal["sequential", "random"].
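As an illustration of the requested annotation, the sketch below pairs a `Literal` alias with a runtime check derived from the same alias. The names `SelectionMethod` and `validate_selection_method` are assumptions made for this example and are not taken from the PR.

```python
from typing import Literal, get_args

# Hypothetical alias; the two allowed values come from the review comment.
SelectionMethod = Literal["sequential", "random"]


def validate_selection_method(value: str) -> SelectionMethod:
    """Return the value unchanged if it is an allowed selection method."""
    allowed = get_args(SelectionMethod)  # ("sequential", "random")
    if value not in allowed:
        raise ValueError(
            f"selection_method must be one of {allowed}, got {value!r}"
        )
    return value  # type: ignore[return-value]
```

Deriving the runtime check from `get_args` keeps the type hint and the validation message in sync if another method is added later.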
Should-fix
3. min_gap semantics are confusing (placebo_in_time.py)
The docstring says min_gap is the "minimum index distance between any two selected candidate positions in the sorted candidate list." But test_random_fold_respects_min_gap asserts that the actual treatment times differ by >= min_gap, not the candidate-list indices. These are different quantities — a gap of 5 in candidate-list positions doesn't necessarily mean 5 units of actual time between folds (candidates may not be contiguous).
Either:
- Clarify the docstring to match the actual intent, or
- Change the constraint to operate on actual time distance between selected folds, which is what users likely care about.
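The difference between the two readings of `min_gap` can be shown with a small, self-contained sketch. The candidate times below are invented for illustration, and both helper functions are hypothetical; neither is the PR's implementation.

```python
from itertools import combinations

# Hypothetical sorted candidate treatment times. They are not contiguous:
# note the jumps from 4 to 20 and from 21 to 40.
candidate_times = [2, 3, 4, 20, 21, 40]


def respects_index_gap(indices, min_gap):
    """Constraint measured in positions within the candidate list."""
    return all(abs(i - j) >= min_gap for i, j in combinations(indices, 2))


def respects_time_gap(indices, min_gap):
    """Constraint measured in actual time between the selected candidates."""
    times = [candidate_times[i] for i in indices]
    return all(abs(a - b) >= min_gap for a, b in combinations(times, 2))


# Candidates at list positions 2 and 3 are adjacent in the list
# but 16 time units apart (times 4 and 20).
pair = (2, 3)
index_ok = respects_index_gap(pair, min_gap=5)  # False: index distance is 1
time_ok = respects_time_gap(pair, min_gap=5)    # True: time distance is 16
```

With `min_gap=5`, the index-based rule rejects a pair that is well separated in time, which is the mismatch the docstring and the test disagree on.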
4. Assurance formula fix needs a targeted test (placebo_in_time.py)
The change to the assurance simulation (from true_effect = expected_effect to true_effect = theta_new + expected_effect) is a behavioral change to existing functionality. The commit message explains the rationale well, but there's no test that specifically validates the new formula is correct (or demonstrates the old one was wrong). A test that checks assurance power changes in the expected direction under a known scenario would give confidence in this fix.
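A toy version of such a directional test is sketched below using only the standard library. The Gaussian null, the one-sided detection rule, and every parameter name here are assumptions made for illustration; the only element taken from the PR is the formula `true_effect = theta_new + expected_effect`.

```python
import random


def simulate_assurance(
    expected_effect_mean,
    *,
    n_sims=4000,
    null_sd=1.0,
    effect_sd=0.5,
    include_null_noise=True,
    seed=0,
):
    """Fraction of simulated effects that exceed an upper null threshold."""
    rng = random.Random(seed)
    # Toy null distribution of placebo effects; sets the detection threshold.
    null_samples = sorted(rng.gauss(0.0, null_sd) for _ in range(n_sims))
    threshold = null_samples[int(0.975 * n_sims)]
    detected = 0
    for _ in range(n_sims):
        expected_effect = rng.gauss(expected_effect_mean, effect_sd)
        # Corrected formula: null baseline noise plus expected effect.
        theta_new = rng.gauss(0.0, null_sd) if include_null_noise else 0.0
        true_effect = theta_new + expected_effect
        if true_effect > threshold:
            detected += 1
    return detected / n_sims


small = simulate_assurance(1.0)
large = simulate_assurance(4.0)
without_noise = simulate_assurance(3.0, include_null_noise=False)
with_noise = simulate_assurance(3.0)
```

In this toy setup a larger expected effect should yield higher assurance, and dropping the null noise term should inflate assurance when the expected effect sits above the threshold. A test in the real suite would assert the same directions against the actual simulation.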
5. Bare except Exception is too broad (outcome_falsification.py)
In the run() loop, except Exception swallows everything. This is convenient for robustness, but it can hide real bugs during development (e.g., an AttributeError from a code change in the experiment class). Consider catching a more targeted set of exceptions (e.g., (ValueError, PatsyError, RuntimeError)) and letting unexpected errors propagate. The exc_info=True logging is good, but users who don't check logs will never know something unexpected happened.
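One way to express this suggestion is sketched below: catch a short tuple of expected failures, emit a warning for each skipped formula, and let anything else propagate. The function names, the `fit_one` callback, and the exact exception tuple are assumptions for illustration; the real check would include library-specific errors such as `PatsyError`.

```python
import warnings

# Hypothetical set of "expected" failures for a bad formula.
EXPECTED_ERRORS = (ValueError, KeyError, RuntimeError)


def run_formulas(formulas, fit_one):
    """Fit each formula, skipping expected failures with a visible warning."""
    results = {}
    for formula in formulas:
        try:
            results[formula] = fit_one(formula)
        except EXPECTED_ERRORS as exc:
            # Surface the skip to users who do not read the logs.
            warnings.warn(
                f"Skipping formula {formula!r}: {type(exc).__name__}: {exc}",
                UserWarning,
                stacklevel=2,
            )
    return results


def demo_fit(formula):
    """Toy fitter: one expected failure and one genuine bug."""
    if "missing" in formula:
        raise KeyError("column 'missing' not found")
    if formula == "bug":
        raise AttributeError("typo in experiment class")
    return 1.0
```

With this shape, a missing column produces a warning and a skipped entry, while the `AttributeError` from a genuine bug still reaches the caller.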
6. FalsificationResult stores the full fitted experiment (outcome_falsification.py)
Each FalsificationResult holds a reference to the complete fitted BaseExperiment (including its InferenceData). For a check with many falsification formulas, this could consume significant memory. Consider whether storing only the summary statistics (already captured in effect_mean, hdi_lower, hdi_upper) would suffice for most use cases, and document that the full experiments are retained for users who want to inspect posteriors.
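A possible shape for the opt-out is sketched here. The field names `effect_mean`, `hdi_lower`, and `hdi_upper` come from the review comment; the class name, the `build_result` helper, and the `store_experiments` keyword are hypothetical for this example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchFalsificationResult:
    """Summary of one alternative-outcome fit (hypothetical container)."""

    formula: str
    effect_mean: float
    hdi_lower: float
    hdi_upper: float
    # Full fitted experiment; None when the caller opted out of storing it.
    experiment: Optional[Any] = None


def build_result(
    formula,
    fitted_experiment,
    effect_mean,
    hdi_lower,
    hdi_upper,
    *,
    store_experiments=True,
):
    """Keep summary statistics always; keep the heavy object only on request."""
    return SketchFalsificationResult(
        formula=formula,
        effect_mean=effect_mean,
        hdi_lower=hdi_lower,
        hdi_upper=hdi_upper,
        experiment=fitted_experiment if store_experiments else None,
    )


heavy = {"idata": list(range(1000))}  # stand-in for a fitted experiment
kept = build_result("y2 ~ 1 + t", heavy, 0.1, -0.4, 0.6)
dropped = build_result("y2 ~ 1 + t", heavy, 0.1, -0.4, 0.6, store_experiments=False)
```

Dropping the reference lets the fitted object be garbage-collected once the loop moves on, while the summary fields remain available for reporting.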
Suggestions / nits
7. External data dependency in notebook
The notebook fetches data at runtime from https://nyc3.digitaloceanspaces.com/owid-public/data/.... If these URLs change or go down, the notebook breaks. Consider bundling the dataset as a CausalPy example dataset, or at minimum noting the dependency in a comment.
8. "Co2" should be "CO₂" in prose
Throughout the notebook markdown, "Co2" is used where "CO₂" (or at least "CO2") would be the standard scientific notation. Variable/column names like coal_co2 are fine, but the prose should use the conventional form.
9. Test setup duplication
Several integration tests in test_outcome_falsification.py repeat the exact same setup block (create data, create experiment, create context, populate experiment_config). A shared pytest fixture would reduce ~60 lines of duplication and make the tests easier to maintain.
10. Greedy fold selection can fail unnecessarily (placebo_in_time.py)
The random fold selection algorithm picks one fold at a time without backtracking. This means a bad early pick can make it impossible to select remaining folds even when valid selections exist. For small candidate pools with large min_gap, this could lead to unnecessary ValueErrors. A note in the docstring would help, or the algorithm could retry with shuffled order.
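The failure mode is easy to reproduce in isolation. The sketch below is a hypothetical greedy selector, not the PR's code, and the candidate order is passed in explicitly so the outcome is deterministic.

```python
def greedy_select(ordered_candidates, n_folds, min_gap):
    """Accept candidates in the given order, with no backtracking."""
    selected = []
    for pos in ordered_candidates:
        if all(abs(pos - s) >= min_gap for s in selected):
            selected.append(pos)
        if len(selected) == n_folds:
            return sorted(selected)
    raise ValueError(
        f"Could not select {n_folds} folds with min_gap={min_gap}."
    )


# The only valid pair of positions with min_gap=6 is {0, 6}.
good_order = greedy_select([0, 3, 6], n_folds=2, min_gap=6)

# Picking 3 first blocks both 0 and 6, even though a valid selection exists.
try:
    greedy_select([3, 0, 6], n_folds=2, min_gap=6)
    bad_order_failed = False
except ValueError:
    bad_order_failed = True
```

Whether the random shuffle lands on the good or the bad order is a matter of chance, which is why a docstring note or a reshuffle-and-retry step helps.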
Things I liked
- The "causal detective" pedagogical framing in the notebook is genuinely excellent.
- The informational-only design (passed=None) for OutcomeFalsification is the right call: this check should inform, not gate.
- The test coverage is thorough, with both unit tests (no sampling) and integration tests for both new features.
- The interrogate config fix is a nice drive-by cleanup.
Must-fix:
- pymc_models: forward user priors through BayesianBasisExpansionTimeSeries and StateSpaceTimeSeries _clone() (add priors kwarg to both __init__s)
- placebo_in_time: tighten selection_method to Literal["sequential", "random"]
Should-fix:
- placebo_in_time: min_gap now measures observation-count distance between selected folds (tracked via each candidate's position in the sorted pre-period index), matching the test's intent
- test_placebo_in_time: add targeted unit tests pinning the corrected assurance formula (true_effect = theta_new + expected_effect)
- outcome_falsification: narrow the broad except to (PatsyError, FormulaException, DataException, ValueError, KeyError, RuntimeError, LinAlgError) and warnings.warn on skip so silent failures surface
- outcome_falsification: add store_experiments flag so callers can drop fitted experiments and keep only the summary stats
Nits:
- notebook: add comment flagging the OWID URL runtime dependency
- notebook: Co2 -> CO₂ (with subscript) in markdown prose
- test_outcome_falsification: extract shared its_context fixture
- placebo_in_time: document greedy-selection failure mode in docstring and error message
Tests:
- add regression tests that _clone() preserves user_priors on both time-series models
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drbenvincent
left a comment
Review by claude-opus-4-7-xhigh. Code-only review — Ben will manually review the new its_place_in_time_analysis.ipynb docs page.
Thanks for the thorough turnaround on the prior round — the two new assurance-formula tests in particular are a nice way to pin the corrected math, and the _clone() priors regression tests directly address what was flagged. CI is green, patch coverage is 96.65%, and every item from the previous review is accounted for in 3ab9fd6.
A few remaining concerns and nits below, none of which I think are blockers.
Should-fix
1. min_gap=1 is silently a no-op (placebo_in_time.py)
With sampling without replacement, abs(pos_i - pos_j) >= 1 is trivially true for any two distinct candidates (positions are unique integers). So the documented default "minimum 1 observation gap" doesn't actually constrain anything beyond "don't pick the same candidate twice". Either:
- document this honestly (e.g., "default 1 imposes no spacing constraint"), or
- pick a more useful default such as intervention_length, which would nudge users toward non-overlapping placebo windows by default.
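The no-op claim can be checked directly: for distinct integer positions, every pair already differs by at least 1. The positions and the helper below are invented for illustration.

```python
from itertools import combinations


def count_valid_pairs(positions, min_gap):
    """Count pairs of positions that satisfy abs(a - b) >= min_gap."""
    return sum(
        1 for a, b in combinations(positions, 2) if abs(a - b) >= min_gap
    )


positions = [3, 7, 8, 15]  # hypothetical distinct candidate positions
total_pairs = len(list(combinations(positions, 2)))

pairs_at_gap_1 = count_valid_pairs(positions, min_gap=1)  # every pair passes
pairs_at_gap_5 = count_valid_pairs(positions, min_gap=5)  # some pairs rejected
```

With `min_gap=1` nothing is filtered, whereas a larger value such as 5 starts rejecting close pairs like (7, 8) and (3, 7).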
2. Random placebo folds can overlap each other (placebo_in_time.py)
The candidate filter pseudo_end > treatment_time prevents a fold from overlapping the real intervention, but nothing prevents two random folds whose pseudo-windows overlap each other (e.g., positions 40 and 45 with intervention_length=10). Overlapping folds share observations, which violates the exchangeability assumption implicit in the hierarchical status-quo model — each fold_mean is treated as an independent draw from a common mu_status_quo. A min_gap = intervention_length default in random mode would solve both this and (1); at minimum, a docstring warning would help.
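The overlap in the example above can be verified with a two-line check. This sketch assumes integer positions and half-open windows `[start, start + length)`; the function names are hypothetical and do not mirror any helper in the PR.

```python
def windows_overlap(start_a, start_b, length):
    """True if two equal-length half-open windows share any observation."""
    return abs(start_a - start_b) < length


def shared_observations(start_a, start_b, length):
    """Number of integer positions that fall inside both windows."""
    return max(0, length - abs(start_a - start_b))


# The example from the review: positions 40 and 45, intervention_length=10.
overlap_40_45 = windows_overlap(40, 45, length=10)
shared_40_45 = shared_observations(40, 45, length=10)

# Setting the gap equal to the window length removes the overlap.
overlap_40_50 = windows_overlap(40, 50, length=10)
```

Under these assumptions, requiring `min_gap >= intervention_length` is equivalent to requiring that no two windows overlap.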
3. store_experiments=True is still the default (outcome_falsification.py)
Adding the flag addresses the memory concern as an opt-out, but for the common case (a handful of falsification formulas, each retaining a full BaseExperiment with InferenceData) the footprint can run into hundreds of MB on PiecewiseITS or large samples. The summary stats are what most users actually need; consider flipping the default to False and documenting True as the inspect-the-posteriors opt-in.
4. _draw_expected_effect_samples ignores n for numpy arrays (placebo_in_time.py)
def _draw_expected_effect_samples(self, n: int) -> np.ndarray:
    """Draw samples from the expected-effect prior."""
    prior = self.expected_effect_prior
    if prior is None:
        raise ValueError("expected_effect_prior is not set.")
    if isinstance(prior, np.ndarray):
        return prior
When the user passes a numpy array, it is returned verbatim regardless of the requested n. _compute_assurance then wraps with i % len(expected_samples), so it doesn't crash, but a 10-element user-supplied array will cycle 10-to-1 against a 4000-sample theta_new without any signal to the user. Either tile/subsample to n, or document the cycling behaviour in the docstring (and ideally warn when len(prior) < n).
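A sketch of the "tile to n and warn" option follows. To stay dependency-free it uses a plain list in place of the numpy array, and `resize_prior_samples` is a hypothetical name. Truncating to the first `n` values when the input is longer is an assumption of this sketch; a random subsample would be an equally valid choice.

```python
import warnings
from itertools import cycle, islice


def resize_prior_samples(prior, n):
    """Return exactly n samples, cycling through the input if it is short."""
    if len(prior) == 0:
        raise ValueError("prior must contain at least one sample.")
    if len(prior) < n:
        warnings.warn(
            f"expected-effect prior has {len(prior)} samples but {n} were "
            "requested; values will be cycled.",
            UserWarning,
            stacklevel=2,
        )
    # cycle + islice repeats the sequence until n values have been taken;
    # when len(prior) >= n this simply keeps the first n values.
    return list(islice(cycle(prior), n))
```

For numpy input, `np.resize(prior, n)` fills with repeated copies in the same way, so the warning is the only part that needs custom code.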
5. Greedy failure path is stochastic and unfriendly to reproducibility
The "retry with a different random_seed" suggestion in the error message defeats the purpose of passing a seed. A bounded internal retry loop (e.g., up to N reshuffles using deterministic sub-seeds derived from random_seed) would eliminate the failure for almost all realistic configurations while preserving reproducibility. Not a must-fix, but cheap and user-friendly.
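A bounded, reproducible retry loop along these lines could look like the sketch below. It is a standalone illustration using the standard library; the function name, the string-based sub-seed scheme, and the cap of 16 attempts are illustrative assumptions rather than the PR's implementation.

```python
import random


def select_folds(candidates, n_folds, min_gap, random_seed, max_retries=16):
    """Greedy selection with deterministic reshuffles derived from one seed."""
    for attempt in range(max_retries):
        # A string seed combining the user seed and the attempt number gives a
        # deterministic, distinct generator for every retry.
        rng = random.Random(f"{random_seed}:{attempt}")
        order = list(candidates)
        rng.shuffle(order)
        selected = []
        for pos in order:
            if all(abs(pos - s) >= min_gap for s in selected):
                selected.append(pos)
            if len(selected) == n_folds:
                return sorted(selected)
    raise ValueError(
        f"No valid selection of {n_folds} folds found after {max_retries} "
        f"attempts; consider lowering min_gap or n_folds."
    )


first = select_folds([0, 3, 6], n_folds=2, min_gap=6, random_seed=42)
second = select_folds([0, 3, 6], n_folds=2, min_gap=6, random_seed=42)
```

Because each attempt's generator depends only on `random_seed` and the attempt index, two calls with the same arguments walk through the same sequence of shuffles and return the same folds.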
Nits
6. OutcomeFalsification.__repr__ always shows store_experiments
Even at default. PlaceboInTime.__repr__ uses the nicer "only show non-default" pattern — worth mirroring here.
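The "only show non-default" pattern can be written generically with `dataclasses.fields`. The class below is a hypothetical stand-in: its field names echo the review, but the default values (for example `alpha=0.05`) are assumptions made for this example.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass(repr=False)
class SketchCheck:
    """Hypothetical check whose repr hides arguments left at their defaults."""

    formulas: tuple
    alpha: float = 0.05
    store_experiments: bool = True

    def __repr__(self) -> str:
        parts = []
        for f in fields(self):
            value = getattr(self, f.name)
            # Always show required fields; show optional ones only if changed.
            if f.default is MISSING or value != f.default:
                parts.append(f"{f.name}={value!r}")
        return f"{type(self).__name__}({', '.join(parts)})"


default_repr = repr(SketchCheck(("y ~ x",)))
custom_repr = repr(SketchCheck(("y ~ x",), store_experiments=False))
```

This keeps the default repr short and makes any deviation from the defaults stand out at a glance.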
7. np.linalg.LinAlgError in the caught-exceptions tuple is dead defense
It's not actually raised by the PyMC sampling path for these models. Harmless, just noting.
8. Random-mode pre-period is effectively halved when treatment_end_time is unset
_compute_intervention_length falls back to data.index.max() - treatment_time, so the candidate filter pseudo_end > treatment_time requires pos_val + post_length < treatment_time. For a typical 75/25 split that leaves ~50% of the pre-period as eligible. Worth a one-line note in the random-mode docstring so users aren't surprised.
9. test_run_handles_failed_formula uses a syntactically invalid formula
The comment correctly attributes this to a Python 3.13 / patsy traceback interaction, but the more common real-world failure mode is a missing-column formula. A short link to the upstream bug (or a TODO to swap back when fixed) would be useful long term.
Process note (for future PRs, not this one)
Echoing the earlier "mega payload" point: five logically independent themes (interrogate config, time-series _clone(), random placebo folds, new OutcomeFalsification check, PanelRegression plot bug) in one PR makes review slower and revert harder. Worth keeping in mind for the next round rather than reshuffling this one.
drbenvincent
left a comment
Follow-up by claude-opus-4-7-xhigh.
Must-fix: notebook is not wired into the docs
docs/source/notebooks/its_place_in_time_analysis.ipynb exists in the repo but is not referenced from any toctree. Specifically:
- It is not in docs/source/notebooks/index.md (no toctree entry under any section).
- It is not linked from docs/source/notebooks/sensitivity_checks.md, even though that page maintains a "Where examples already exist" list and a dedicated PlaceboInTime description that would be the natural pointers.
- No :glob: directive picks it up automatically.
The Read the Docs preview happens to render the page because Sphinx builds every .ipynb it finds, but the page will be an orphan: not reachable from the sidebar or any index, and Sphinx will emit a "document isn't included in any toctree" warning at build time.
Question on placement
Where do you want this notebook to live in the navigation? My hunch is ITS-specific — the content is built around an ITS analysis of the UK 2008 climate intervention and PlaceboInTime is one of the headline diagnostics for ITS. In that case, it belongs in the existing Interrupted Time Series toctree in docs/source/notebooks/index.md:
:::{toctree}
:caption: Interrupted Time Series
:maxdepth: 1
its_skl.ipynb
its_pymc.ipynb
its_post_intervention_analysis.ipynb
its_covid.ipynb
its_lift_test.ipynb
its_place_in_time_analysis.ipynb
:::
Alternatively, since PlaceboInTime is a sensitivity check and the prose page sensitivity_checks.md already has a section for it, it could go in the Sensitivity Checks toctree:
:::{toctree}
:caption: Sensitivity Checks
:maxdepth: 1
sensitivity_checks.md
its_place_in_time_analysis.ipynb
:::
Could you confirm which section you want it under? Either way, please also add a discoverability link from sensitivity_checks.md, at minimum extending the existing line:
- `PlaceboInTime`: {doc}`pipeline_workflow`, {doc}`report_demo`, {doc}`its_place_in_time_analysis`
so users land on the worked example from the sensitivity-checks prose entry point.
|
Ben thinks ITS is the most appropriate section |
drbenvincent
left a comment
Comments on the docs
For code cells focussing on plotting, add the hide-input cell tag. That will collapse the code so that it's hidden, but viewable if people want.
See what you can do about the sampler warnings
For cells with lots of output, add the hide-output tag. That should also be collapsible. It's fine to show the output if it is the important point, but where you get lots of lines of sampler output and it's not a core part of the story, then just hide/collapse that stuff
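Tagging can be scripted rather than done by hand. The sketch below works on the notebook's JSON structure as a plain dictionary, where tags live under each cell's `metadata["tags"]`. The helper names and the heuristic for spotting plotting cells are assumptions for illustration, and the heuristic will need adjusting for a real notebook.

```python
def tag_cells(notebook, should_tag, tag):
    """Add a tag to every code cell matching the predicate; return the count."""
    tagged = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if should_tag(cell):
            tags = cell.setdefault("metadata", {}).setdefault("tags", [])
            if tag not in tags:
                tags.append(tag)
                tagged += 1
    return tagged


def is_plot_cell(cell):
    """Crude heuristic: the cell source calls matplotlib or a .plot method."""
    # Notebook sources may be a string or a list of strings; join handles both.
    source = "".join(cell.get("source", []))
    return "plt." in source or ".plot(" in source


# Tiny in-memory notebook standing in for the real .ipynb file.
demo_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"], "metadata": {}},
        {"cell_type": "code", "source": ["fig, ax = plt.subplots()"], "metadata": {}},
        {"cell_type": "code", "source": ["x = 1"], "metadata": {}},
    ]
}
n_tagged = tag_cells(demo_notebook, is_plot_cell, "hide-input")
```

For a file on disk, the same function can be applied after `json.load` and the result written back with `json.dump`.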
More from me on the notebook soon - trying to fit this review in in the evening, but sleepiness is winning.
| "Chain 1 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.\n", | ||
| "Chain 2 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.\n", | ||
| "Chain 3 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.\n", | ||
| "Sampling: [beta, delta, fourier_beta |
Suggestion: promote plot_placebo_calibration into the codebase
claude-opus-4-7-xhigh here. Ben will manually review the docs page itself; this is a code-shape suggestion based on what the notebook defines.
The notebook defines a ~130-line helper plot_placebo_calibration(pit_check, original_result, title=...) and then calls it three times on different datasets. That repeated reuse is a strong signal it's general, not example-specific.
Why it belongs in the library
Looking at what the function actually consumes, it's entirely keyed on the PlaceboInTime CheckResult contract:
- pit_check.metadata["fold_results"] (list of PlaceboFoldResult)
- pit_check.metadata["null_samples"]
- pit_check.metadata["actual_cumulative_mean"]
- pit_check.metadata["p_effect_outside_null"]
- original_result.post_impact, present on every applicable_methods member of PlaceboInTime (ITS and SyntheticControl)
It already handles both datetime and integer pseudo_treatment_time formatting, so it's general for every experiment type PlaceboInTime supports, not ITS-specific.
There is also already a documented home for this in CheckResult:
# causalpy/checks/base.py
figures : list
    Optional matplotlib figures produced by the check.
...
figures: list[Any] = field(default_factory=list)
…but no check currently populates it, so GenerateReport never has plots to surface. Promoting this helper would be the first concrete user of that field.
Suggested shape
Two clean options, not mutually exclusive:
1. Method on the check class: PlaceboInTime.plot_calibration(check_result, experiment) -> Figure. Mirrors how BaseExperiment exposes its plotting methods. Lives next to the metadata schema it depends on, in causalpy/checks/placebo_in_time.py.
2. Auto-populate CheckResult.figures: call the plotter inside PlaceboInTime.run() (gated behind something like make_figures: bool = True) and append the Figure to the result. This makes GenerateReport strictly more useful for free.
I'd lean toward (1) as the building block and (2) as the default behaviour.
Cleanups before promoting
- The original_result arg is only used to extract post_impact. Since run() already has experiment in scope, you could compute actual_samples once during run() and stash it in metadata (e.g. metadata["actual_cumulative_samples"]), then the plotter takes just the CheckResult: no second positional arg, no easy way to mismatch them.
- Hard-coded FOLD_COLORS (5 colors) silently wraps past 5 folds; promote to a module-level _DEFAULT_FOLD_COLORS or use matplotlib's color cycle.
- The print(...) fallback for "not enough folds completed" should be warnings.warn (or return an empty figure with an annotation); print is noisy inside library code.
- Drop the trailing plt.show(); return the Figure and let the caller / notebook display it.
- Add figsize and axes kwargs so it's composable and unit-testable.
Scope
Happy for this to be a follow-up PR rather than expanding this one further — the notebook can keep its inline helper for now, and a follow-up can do the move + cleanups + a test that just asserts run(..., make_figures=True) produces one Figure of the expected shape. Flag if you'd rather roll it in here.
Ben's review from 2026-04-23:
PlaceboInTime (checks/placebo_in_time.py):
- Add allow_overlap parameter (default False) that enforces non-overlap of pseudo-intervention windows, so random folds no longer violate the hierarchical model's exchangeability assumption by default.
- Replace the "retry with a different random_seed" error path with a bounded greedy-selection retry loop (MAX_RANDOM_SELECTION_RETRIES=16) using deterministic sub-seeds derived from random_seed; failure message now names the knobs to relax (allow_overlap, min_gap, n_folds).
- Warn when a pre-drawn numpy expected_effect_prior is shorter than the number of replications requested by the assurance simulation, documenting the cycling behaviour in _draw_expected_effect_samples.
- Surface allow_overlap=True in __repr__ following the same "non-default only" pattern as selection_method.
- Document how intervention_length falls back to data.index.max() - treatment_time when treatment_end_time is unset, shrinking the random-mode eligible window.
OutcomeFalsification (checks/outcome_falsification.py):
- Warn at run() when storing >= 3 fitted experiments, explaining that store_experiments=False keeps only summary statistics.
- Rewrite __repr__ to hide default alpha and store_experiments flags.
- Drop dead np.linalg.LinAlgError from the caught-exception tuple and the now-unused numpy import.
Docs:
- Wire its_place_in_time_analysis.ipynb into the ITS toctree so it stops being an orphan page.
- Cross-link the notebook from sensitivity_checks.md under the "Where examples already exist" list.
- Tag plot-only cells with hide-input and sampler-heavy cells with hide-output so the rendered page collapses non-essential chunks.
Tests:
- Pin allow_overlap default, the non-overlap invariant, the _windows_overlap helper for numeric and datetime indices, the allow_overlap opt-out, the bounded-retry reproducibility and exhaustion paths, and the expected-effect-prior cycling warning.
- Pin the new OutcomeFalsification __repr__ pattern, the store_experiments memory warning at run() time, and its opt-out and below-threshold paths.
- Expand the upstream-bug TODO in test_run_handles_failed_formula.
Made-with: Cursor
|
Hi @cetagostini — heads up: this branch contains two commits that touch
The merged solution in #853 takes a slightly different (and reviewed/CI-validated) approach for both:
Suggested next step:
Happy to help walk through the rebase if useful. |
Force-pushed from 8342c7d to 76b477c
|
@drbenvincent all good, I did already! The feedback was implemented already as well. |
|
@drbenvincent quick ping here. All should be good now! |
|
Maybe @juanitorduz can take a look as well; this should solve #914 and similar issues.
|
NOTE TO SELF: I think this just needs a quick read over the rendered docs from me and @juanitorduz maybe |
|
@drbenvincent @juanitorduz small nudge again! jiji |
juanitorduz
left a comment
small initial comments
| # ------------------------------------------------------------------ | ||
| # Validation | ||
| # ------------------------------------------------------------------ |
remove these comments :)
| # ------------------------------------------------------------------ | ||
| # Factory helper | ||
| # ------------------------------------------------------------------ |
| # ------------------------------------------------------------------ | ||
| # Effect extraction | ||
| # ------------------------------------------------------------------ |
same here (and all the following ones)
Detailed Review: PR #826
Reviewer: @juanitorduz (via Daimon)
This PR adds outcome falsification, random placebo fold selection, model
1. Correctness
Assurance formula fix (placebo_in_time.py): good catch, correctly fixed. The old code set
Minor:
2. Code Modularity & Design
Suggestion: consider extracting
3. Maintainability
Exception handling in
Minor concern:
4. Type Hints
Good overall. Key functions have return type annotations and parameter types. Specific notes:
5. Docstrings
Module-level docstrings are excellent.
6. Tests
Coverage is thorough and well-organized. The test suite covers:
Specific strengths:
One gap: no test for
Another gap: no negative test for
7. Notebook
I didn't do a line-by-line review of the notebook, but the commit history shows it's been through extensive iteration with Ben (heading structure, plot sizing, retina rendering, prose clarity, codespell/ruff fixes). The methodology documentation (fossil-fuel substitution case study, calibration plot interpretation, four-question structure) looks well-scaffolded.
Summary
This is a solid PR that adds meaningful sensitivity-analysis capabilities to CausalPy. The assurance formula bug fix is a real correctness improvement. The code is well-structured, the exception handling is thoughtful, and the test coverage is comprehensive. The main suggestions are:
None of these are blockers. The PR is in good shape.
|
@cetagostini I think these are valid comments :) I suggest addressing them |
Yes, I can do it in a few hours. The 5 points by Daimon?
|
yes, if possible :) |
Re-review: PR #826 (round 2)
Reviewer: @juanitorduz (via Daimon)
Carlos addressed all five suggestions from the first review. Here's a point-by-point check plus notes on the additional changes.
Addressed items
Additional changes (not from my suggestions)
Removed section-separator comments (
Enriched
This is a good addition: downstream consumers (report generation, plotting) can now access the design configuration without needing to inspect the check object. ✅
Verdict
All review feedback is addressed. The new tests are thorough and well-motivated. No new issues found. This PR is ready to merge from a code-quality perspective.
|
Everything addressed, @juanitorduz
Rename its_place_in_time_analysis.ipynb to its_placebo_in_time_analysis.ipynb and update notebook index and sensitivity_checks cross-links. Merge latest main to stay current with upstream. Co-authored-by: Cursor <cursoragent@cursor.com>
|
@cetagostini sorry about the duration on this one. I had a few minor requests, but I just implemented them so I'm not standing in your way any more. Will merge once remote CI is green |
Summary
- Exclude .marimo/ and .scratch/ directories from docstring coverage checks (pre-existing failure at 82.8%)
- Add _clone methods to BayesianBasisExpansionTimeSeries and StateSpaceTimeSeries so sensitivity checks can create fresh, unfitted copies
- Add random fold selection to PlaceboInTime with min_training_pct, min_gap, and exclude_periods constraints; fix assurance simulation to correctly model the alternative hypothesis as theta_new + expected_effect
- Add OutcomeFalsification sensitivity check that re-fits experiments with alternative outcome formulas and reports effect sizes with HDI intervals (informational, no pass/fail)
- Add developer notebook (its_placebo_in_time_analysis.ipynb) documenting placebo-in-time and outcome-falsification methodology with worked examples
Contributes towards #914 (Bayesian assurance / operating-characteristics thinking for design assessment) but does not close it; #914 tracks a separate methodology review of iid Gaussian noise injection in SyntheticControl.power_analysis().
Test plan
- OutcomeFalsification construction, validation, and repr
- OutcomeFalsification.run() with single/multiple formulas and failed formulas
- PlaceboInTime random selection construction, validation, and geometry
- PlaceboInTime with random selection mode
- interrogate passes on CI (was failing pre-existing at 82.8%)
🤖 Generated with Claude Code