adding hierarchical its by NathanielF · Pull Request #833 · pymc-labs/CausalPy

NathanielF · 2026-04-09T08:35:44Z

Working on this ticket: #830

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

review-notebook-app · 2026-04-09T08:35:52Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

codecov · 2026-04-09T08:42:11Z

Codecov Report

❌ Patch coverage is 96.08434% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.55%. Comparing base (71be9df) to head (db21623).

Files with missing lines	Patch %	Lines
causalpy/pymc_models.py	79.19%	18 Missing and 18 partials ⚠️
...xperiments/hierarchical_interrupted_time_series.py	99.33%	1 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #833      +/-   ##
==========================================
+ Coverage   95.51%   95.55%   +0.03%     
==========================================
  Files          98      100       +2     
  Lines       15870    16870    +1000     
  Branches      931     1037     +106     
==========================================
+ Hits        15159    16120     +961     
- Misses        504      523      +19     
- Partials      207      227      +20


read-the-docs-community · 2026-04-09T08:43:50Z

Documentation build overview

📚 causalpy | 🛠️ Build #33583764 | 📁 Comparing db21623 against latest (deb8774)

🔍 Preview build

594 files changed · + 189 added · ± 404 modified · - 1 deleted

- Deleted: api/generated/causalpy.experiments.base.BaseExperiment.plot.html


NathanielF · 2026-04-10T17:14:23Z

I think this is good for a review.

NathanielF · 2026-04-10T17:24:47Z

Some things to consider when reviewing.

Seasonality is shared, not hierarchical. This is deliberate to control parameter count, but debatable; AR residuals compensate for unit-level deviations.
AR requires balanced panels. I could probably do something with masking to handle imbalanced panels, but AR is opt-in via ar_residuals=True.
mu includes AR contribution — impact calculation is valid because AR cancels in the observed−counterfactual difference.
sigma_ar / sigma can trade off — recommend checking pair plots for identification.
Trend and AR can compete on slow drift. Hierarchical shrinkage and the deterministic-vs-stochastic distinction should provide soft identification.
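The cancellation claim for the AR term can be checked numerically. This is a minimal sketch, assuming the observed and counterfactual means share the same AR draws and differ only in the effect term; all variable names are illustrative and not taken from the PR.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, n_obs = 200, 50

# Illustrative posterior components (hypothetical names, not the PR's variables).
baseline = rng.normal(10.0, 1.0, size=(n_draws, n_obs))
ar_term = rng.normal(0.0, 0.5, size=(n_draws, n_obs))  # AR contribution shared by both worlds
effect = np.full((n_draws, n_obs), 3.0)                # post-launch lift

mu_observed = baseline + ar_term + effect
mu_counterfactual = baseline + ar_term  # same AR draws, effect switched off

# The AR term appears in both means, so it drops out of the difference.
impact = mu_observed - mu_counterfactual
```

If the counterfactual were computed with fresh AR draws instead, the difference would pick up extra noise, so the argument depends on the draws being shared.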

drbenvincent · 2026-04-24T21:35:58Z

Code review (claude-opus-4-4-xhigh)

Reviewing the code in this PR only — I have not reviewed docs/source/notebooks/hierarchical_its_launch.ipynb.

A genuinely useful addition — staggered-launch panels with hierarchical pooling and event-study/placebo parameterizations are a real gap in CausalPy, and the PyMC model is well-structured (non-centered everywhere, sensible data-adaptive priors, optional Fourier/AR machinery). The 32 tests pass locally. Below is a critical-but-constructive pass focused on correctness, repo consistency, and test coverage.

Critical issues (correctness)

1. Silent data-corruption bug in AR(1) `within_unit_tidx`

causalpy/experiments/hierarchical_interrupted_time_series.py lines 246–249:

self._n_time_steps = int(counts[0])
# Compute within-unit sequential index (assumes data sorted by unit)
self._within_unit_tidx = np.concatenate(
    [np.arange(self._n_time_steps) for _ in range(n_units)]
)

The comment ("assumes data sorted by unit") admits the assumption, but the code never validates it and never sorts. If a user passes a panel that isn't sorted unit-by-unit (very common — pandas operations often shuffle), within_unit_tidx becomes meaningless and the AR(1) innovations get associated with the wrong (unit, time) cells in ar_resid_matrix[within_unit_tidx_, unit_idx_] (pymc_models.py:2352).

Reproducer on a shuffled panel (no model fit needed):

unit_idx (shuffled order):     [2 0 1 0 2 0 1 1 2]
within_unit_tidx (assigned):   [0 1 2 0 1 2 0 1 2]
within_unit_tidx (correct):    [0 0 0 1 1 2 1 2 2]   # df.groupby(unit).cumcount()

Worse, this even applies to time_col ordering: _n_time_steps only checks counts.min() == counts.max(); a unit with the same number of rows but different timestamps still passes silently.

Fix: sort df by (unit_col, time_col) early in _prepare_data, recompute unit_idx, and derive within_unit_tidx via df.groupby(unit_col).cumcount(). A regression test on a df.sample(frac=1)-shuffled panel would have caught this.
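The suggested fix can be sketched as follows. This is an illustration under assumptions, not the PR's code: `sort_and_index` and the column names `unit` and `week` are hypothetical, and only standard pandas calls (`sort_values`, `factorize`, `groupby(...).cumcount()`) are used.

```python
import numpy as np
import pandas as pd


def sort_and_index(df, unit_col, time_col):
    """Sort a panel by (unit, time) and derive unit and within-unit indices."""
    out = df.sort_values([unit_col, time_col]).reset_index(drop=True)
    # Integer code per unit, computed on the sorted frame.
    unit_idx, units = pd.factorize(out[unit_col], sort=True)
    # Sequential position of each row within its unit (0, 1, 2, ...).
    within_unit_tidx = out.groupby(unit_col).cumcount().to_numpy()
    return out, unit_idx, within_unit_tidx, units


# A balanced 3-unit, 3-week panel, shuffled to mimic unsorted user input.
panel = pd.DataFrame(
    {"unit": np.repeat(["a", "b", "c"], 3), "week": np.tile([0, 1, 2], 3)}
)
shuffled = panel.sample(frac=1, random_state=0).reset_index(drop=True)
sorted_df, unit_idx, tidx, units = sort_and_index(shuffled, "unit", "week")
```

Because the indices are derived after sorting, they stay aligned with the rows regardless of the order in which the user supplied the panel.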

2. `_aux` zeroing also wipes the pre-launch leads in `placebo`

if self.effect_type == "instant":
    post = self._post
    if not effect_on and post is not None:
        post = np.zeros_like(post)
    aux["post"] = post
else:
    D = self._D
    if not effect_on and D is not None:
        D = np.zeros_like(D)
    aux["D"] = D

For effect_type="placebo", zeroing all of D to construct the counterfactual zeroes both pre-launch leads and post-launch effects. The pre-launch bins are placebos — they belong in the "no-intervention" world. Zeroing them inflates the counterfactual deviation in the pre-period and contaminates the impact trace and any cumulative summaries.

Fix: for placebo the counterfactual mask should keep the pre-launch columns "on" and zero only the post columns (D[:, K_pre:] = 0). A test should assert that impact[pre_period].mean() is small relative to impact[post_period].mean() for a synthetic dataset with no anticipation.
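A minimal sketch of that masking, assuming the design matrix stores the pre-launch (placebo) columns first and the post-launch columns after them. The helper name `counterfactual_design` and the toy matrix are illustrative only.

```python
import numpy as np


def counterfactual_design(D, n_pre_bins):
    """Return a copy of D with only the post-launch columns zeroed."""
    D_cf = np.array(D, dtype=float, copy=True)  # copy so the fitted design is untouched
    D_cf[:, n_pre_bins:] = 0.0                  # pre-launch placebo columns stay "on"
    return D_cf


# Two placebo (pre-launch) bins followed by two post-launch bins.
D = np.array(
    [
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
    ]
)
D_cf = counterfactual_design(D, n_pre_bins=2)
```

Working on a copy matters here: zeroing `D` in place would also change the design used for the observed prediction.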

3. `treatment_time_col` is never validated for constancy within a unit

A user could legitimately pass a column with different launch_week values for different rows of the same unit (e.g. accidental join). The model uses the per-row value to compute tau, so inconsistent rows silently produce a meaningless event-time mapping. A simple df.groupby(unit_col)[treatment_time_col].nunique().max() == 1 check belongs in _validate_inputs.
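One way to express that check, as a sketch: the function name and error message are hypothetical, while the `groupby(...).nunique()` pattern is the one named above. Note that `nunique()` ignores missing values by default, so NaN handling would need a separate check.

```python
import pandas as pd


def check_constant_treatment_time(df, unit_col, treatment_time_col):
    """Raise ValueError if any unit has more than one treatment time."""
    n_unique = df.groupby(unit_col)[treatment_time_col].nunique()
    bad_units = n_unique[n_unique > 1].index.tolist()
    if bad_units:
        raise ValueError(
            f"{treatment_time_col!r} must be constant within each unit; "
            f"found multiple values for units: {bad_units}"
        )


consistent = pd.DataFrame({"unit": ["a", "a", "b"], "launch_week": [5, 5, 7]})
check_constant_treatment_time(consistent, "unit", "launch_week")  # passes silently
```

Listing the offending units in the message makes an accidental join much easier to track down than a bare assertion.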

4. `print_coefficients()` is broken on this experiment

BaseExperiment.print_coefficients() calls self.model.print_coefficients(self.labels, …) (base.py:88), and PyMCModel.print_coefficients does az.extract(self.idata.posterior, var_names="beta").sel(treated_units=unit) (pymc_models.py:453). The hierarchical model's beta has dims ["unit", "coeffs"], not ["treated_units", "coeffs"], and the sigma is sigma_beta rather than y_hat_sigma. Calling result.print_coefficients() will raise.

Fix: either override print_coefficients in the experiment to print mu_beta/sigma_beta/per-unit alpha, or document that it's unsupported and raise a clearer error. Add a test either way.

High-value issues (style, API, consistency)

5. Naming inconsistent with every other experiment

Every other experiment uses the public method name algorithm() (called from __init__); this PR uses _algorithm() and _prepare_data():

causalpy/experiments/regression_kink.py:90:        self.algorithm()
causalpy/experiments/staggered_did.py:191:        self.algorithm()
…
causalpy/experiments/hierarchical_interrupted_time_series.py:160:        self._algorithm()

This is the lone holdout among 10 experiments. Rename for consistency.

6. `effect_summary` accepts 9 parameters and ignores 7 of them

def effect_summary(
    self,
    *,
    window: ... = "post",
    direction: ... = "increase",
    alpha: float = 0.05,
    cumulative: bool = True,
    relative: bool = True,
    min_effect: float | None = None,
    treated_unit: str | None = None,
    period: ... = None,
    prefix: str = "Post-period",
    **kwargs: Any,
) -> EffectSummary:

Only alpha and prefix are read. The base class requires accepting these args, but the implementation should at minimum:

use direction to compute the appropriate tail probability (right now prob_positive is hardcoded samples > 0 regardless of direction="decrease" — users asking about a decrease get the wrong column),
use min_effect for ROPE,
emit warnings.warn (or raise NotImplementedError) for the truly unsupported ones (window, cumulative, relative, period) so users don't think they did something.

See causalpy/experiments/interrupted_time_series.py for how the existing ITS handles direction/min_effect.
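A direction-aware tail probability could look roughly like the sketch below. The function name and the returned keys are assumptions for illustration; this is not the existing ITS implementation, and the exact two-sided convention is left open by returning both tails.

```python
import numpy as np


def tail_probabilities(samples, direction="increase"):
    """Posterior tail probabilities matching the requested direction."""
    samples = np.asarray(samples, dtype=float)
    prob_positive = float(np.mean(samples > 0))
    prob_negative = float(np.mean(samples < 0))
    if direction == "increase":
        return {"prob_positive": prob_positive}
    if direction == "decrease":
        return {"prob_negative": prob_negative}
    if direction == "two-sided":
        # Report both tails; how to combine them is a separate design decision.
        return {"prob_positive": prob_positive, "prob_negative": prob_negative}
    raise ValueError(f"Unknown direction: {direction!r}")


draws = np.array([-1.0, -2.0, 0.5, -0.1])
decrease = tail_probabilities(draws, direction="decrease")
```

With these four draws, a user asking about a decrease gets the share of negative draws rather than the hardcoded share of positive ones.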

7. Triple-call posterior-predictive sampling

def _algorithm(self) -> None:
    model: HierarchicalLaunchITS = self.model  # type: ignore[assignment]
    model.fit(X=self.X, y=self.y, coords=self._coords, aux=self._aux(effect_on=True))
    self.score = model.score(X=self.X, y=self.y)
    self.observed_pred = model.predict(X=self.X, aux=self._aux(effect_on=True))
    self.counterfactual_pred = model.predict(X=self.X, aux=self._aux(effect_on=False))

model.score(X, y) internally calls model.predict(X) (which runs pm.sample_posterior_predictive), then observed_pred = model.predict(...) runs it again with the same aux. For real (non-mocked) sampling this can dominate fit time. Consider computing score from the cached observed_pred.

A subtler related issue: score calls model.predict(X=self.X) without aux. This works because _pred_aux falls back to self._aux set during fit, but it's implicit and fragile.
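To illustrate scoring from cached predictions instead of re-sampling, here is a sketch that computes a classical per-draw R² from an array of already-sampled means. The helper name `r2_from_prediction` and the array layout (draws by observations) are assumptions; CausalPy's own scoring may use a different R² definition.

```python
import numpy as np


def r2_from_prediction(y_true, mu_draws):
    """Mean per-draw R^2 computed from cached predictions (draws x observations)."""
    y_true = np.asarray(y_true, dtype=float)
    mu_draws = np.asarray(mu_draws, dtype=float)
    ss_res = ((y_true - mu_draws) ** 2).sum(axis=1)  # residual sum of squares per draw
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return float(np.mean(1.0 - ss_res / ss_tot))


y = np.array([1.0, 2.0, 3.0, 4.0])
cached_mu = np.array([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
score = r2_from_prediction(y, cached_mu)
```

The point is only that the score is a pure function of `y` and the stored draws, so no additional posterior-predictive pass is needed.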

8. Side-effecting `__init__` swallows unknown kwargs

def __init__(self, …, **kwargs: Any) (line 138) collects **kwargs and silently drops them. A typo like seasonalty=… will be silently swallowed. Either drop **kwargs or pass to super().

9. Pytensor `scan` deprecation warning

DeprecationWarning: Scan return signature will change. Updates dict will not be returned…
Pass `return_updates=False` to conform to the new API and avoid this warning

Triggered by pytensor.scan(...) around pymc_models.py:2344. Add return_updates=False so the new code lands without immediately accumulating warnings on AR runs.

10. Standardization of covariates is irreversible and undocumented

if X_values.shape[1] > 0:
    self._x_mean = X_values.mean(axis=0)
    self._x_std = X_values.std(axis=0)
    self._x_std[self._x_std == 0] = 1.0
    X_values = (X_values - self._x_mean) / self._x_std

Posterior mu_beta/sigma_beta are now on the standardized scale, which makes downstream interpretation tricky (an ETable showing beta in standardized units is misleading). Either document this in summary()/effect_summary() or expose unstandardized coefficients (e.g. as Deterministics scaled back).
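Mapping standardized coefficients back to the original covariate scale is a short calculation. The sketch below assumes a linear term `X_std @ beta_std` with `X_std = (X - x_mean) / x_std`; the function name and the toy data are illustrative.

```python
import numpy as np


def unstandardize(beta_std, x_mean, x_std):
    """Convert coefficients on standardized covariates to the original scale."""
    beta_orig = np.asarray(beta_std, dtype=float) / x_std
    # Centering moves a constant into the intercept.
    intercept_shift = -(beta_orig * x_mean).sum(axis=-1)
    return beta_orig, intercept_shift


rng = np.random.default_rng(1)
X = rng.normal(5.0, 2.0, size=(100, 2))
x_mean, x_std = X.mean(axis=0), X.std(axis=0)
X_standardized = (X - x_mean) / x_std

beta_std = np.array([1.5, -0.5])
beta_orig, shift = unstandardize(beta_std, x_mean, x_std)

pred_standardized = X_standardized @ beta_std
pred_original = X @ beta_orig + shift  # same linear predictor, original units
```

Because the transformation is linear, it can be applied draw by draw to the posterior, which is what exposing rescaled Deterministics would amount to.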

11. Fourier `_fourier_terms` uses raw `time_col` units

def _fourier_terms(t: np.ndarray, period: float, K: int) -> np.ndarray:
    …
    cols.append(np.sin(2 * np.pi * k * t / period))

period is in the same units as time_col. With unhelpful time_col values (offset integer counters, datetimes coerced via .view('i8'), etc.) this is a footgun. Add a docstring example, and have test_with_seasonality actually assert columns are non-degenerate (currently only checks the variable exists).
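A sketch of a Fourier design with a non-degeneracy check is shown below. Only the sine line appears in the snippet above; the cosine column, the column ordering, the labels, and the tolerance are assumptions made for illustration.

```python
import numpy as np


def fourier_terms(t, period, K):
    """Build sin/cos seasonal columns; t and period must share the same units."""
    if K < 1:
        raise ValueError("K must be at least 1")
    t = np.asarray(t, dtype=float)
    cols, labels = [], []
    for k in range(1, K + 1):
        angle = 2 * np.pi * k * t / period
        cols.append(np.sin(angle))
        labels.append(f"f_sin_{k}")
        cols.append(np.cos(angle))
        labels.append(f"f_cos_{k}")
    return np.column_stack(cols), labels


# Weekly index with a yearly period of 52 weeks.
weeks = np.arange(104)
X_season, season_labels = fourier_terms(weeks, period=52, K=2)

# A column with (near-)zero spread carries no seasonal signal.
non_degenerate = X_season.std(axis=0) > 1e-8
```

A mismatch between the units of `t` and `period` would typically show up as near-constant columns, which is the kind of assertion the test could make.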

Test coverage gaps

The test suite is thoughtful — happy paths plus a TestValidation block covering 7 negative scenarios — but several important behaviours are untested.

Missing tests (high priority)

AR with shuffled rows (point 1 above): test_ar_residuals_unsorted_panel — pass panel.sample(frac=1, random_state=0).reset_index(drop=True) and assert that the model either raises or fits with the correct within_unit_tidx.
Inconsistent treatment_time_col per unit (point 3 above): assert ValueError when one unit has two different launch_week values.
Counterfactual correctness for placebo (point 2 above): synthesize a panel with no true effect, fit placebo, and assert that pre-period impact.mean(("chain","draw")) is near zero.
print_coefficients() doesn't crash (point 4 above): currently has zero coverage.
generate_report() end-to-end — none of the experiment-level reporting machinery is exercised. Other experiments cover this.
get_plot_data_bayesian() / get_plot_data() — not implemented and not tested. The base class default raises NotImplementedError. Either implement it (returning data with predicted/counterfactual/impact columns appended) or test that it raises a clean error.
Maketables integration — result.__maketables_coef_table__ likely fails for the same reason as print_coefficients (it expects treated_units dim). Add a regression test in test_maketables_plugin.py style.
set_maketables_options(hdi_prob=…) — never tested for this experiment.
predictive_for_new_unit shape & finiteness for event_study with seasonality + AR — combined-feature happy paths are completely missing (every test enables one feature in isolation).
model=None path — does the default HierarchicalLaunchITS() get instantiated with sensible defaults? _default_model_class wiring isn't covered.

Missing validation tests

Empty data / single row — pd.DataFrame() and a one-row dataframe.
Single unit — does the hierarchical model degenerate gracefully when n_units == 1?
All units have the same launch_week (i.e. simultaneous treatment).
time_col with NaN / negative values / floats.
Bin edges that don't cover any observations — e.g. bin_edges=[100, 200, 300] for tau in [-10, 30]. _assign_bins returns -1 for everything and D is all zeros. Should this raise?
bin_edges of length 1 (zero bins): K_bins = 0, D becomes shape (n, 0), mu_delta has event_bin size 0. Likely crashes deep in PyMC; needs an explicit check.
placebo_edges overlapping with bin_edges (e.g. placebo_edges=[-4, 2], bin_edges=[0, 4]). Currently silently wrong.
Seasonality with K=0 — produces empty design.
Missing keys in seasonality dict ({"period": 7} without "K") — currently raises KeyError, should raise ValueError with a clearer message.
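For the seasonality cases in this list, explicit validation might be sketched as follows. `SeasonalitySpec` and `validate_seasonality` are hypothetical names, and the error messages are illustrative.

```python
from typing import TypedDict


class SeasonalitySpec(TypedDict):
    period: float
    K: int


def validate_seasonality(spec):
    """Validate a seasonality mapping and return a normalized copy."""
    missing = {"period", "K"} - set(spec)
    if missing:
        raise ValueError(f"seasonality is missing keys: {sorted(missing)}")
    if spec["period"] <= 0:
        raise ValueError("seasonality['period'] must be positive")
    if int(spec["K"]) != spec["K"] or spec["K"] < 1:
        raise ValueError("seasonality['K'] must be an integer >= 1")
    return SeasonalitySpec(period=float(spec["period"]), K=int(spec["K"]))


weekly = validate_seasonality({"period": 7, "K": 2})
```

Raising `ValueError` up front gives the user a message about the argument they passed, rather than a `KeyError` from deep inside data preparation.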

Test-quality / robustness

The panel fixture is constructed unit-by-unit, so it's already sorted — that masks the AR sorting bug (point 1) entirely. Consider shuffling in the fixture as a default, or adding a parametrized [sorted, shuffled] axis.
_make_panel(true_lift=8.0) is set up but the tests never assert recovered mu_lift is near 8.0 (only existence in the InferenceData). With mock_pymc_sample recovery isn't possible, but a single non-mocked test using small tune=20, draws=20, chains=2 gated behind @pytest.mark.slow would catch a regression where the model accidentally fits the wrong sign or magnitude. See test_pymc_models.py:25 for the established pattern.
test_predictive_unfitted_model uses cls.__new__(cls) to skip __init__. That's brittle — if anyone reorders attributes the test still passes for the wrong reason. Better: call predictive_for_new_unit on a freshly-constructed (unfitted) HierarchicalLaunchITS directly.
No @pytest.mark.integration marker used (the project defines it in pyproject.toml:121). Marking the heavier "fit + plot + summary + report" flow as integration would help CI partitioning.
Codecov reports 35 missing lines on the patch — most are likely the AR branch and the placebo branches in pymc_models.py plus the rarely-exercised plotting branches.

Minor / nits

seasonality: dict | None = None — the inner shape is undocumented ({"period": float, "K": int}). Use a TypedDict for clarity.
_placebo_check_text hardcodes HDI_PROB (94%) for pass/fail, while effect_summary accepts alpha. Share a single threshold.
Docstring at module top says "event-study-style" but the class supports three explicit types — update the wording.
Line 248: np.concatenate([np.arange(n) for _ in range(n_units)]) is np.tile(np.arange(n), n_units).
_fourier_terms returns shape (n, 2K); the label generator emits f0..f(2K-1), which doesn't distinguish sin/cos. Consider f_sin_1, f_cos_1, f_sin_2, … for posterior interpretability.
self.score = model.score(...) overwrites whatever score attribute might be inherited; not a bug, just surprising.

Suggested ordering for the author

Block on: fix points 1–4 above (AR sorting, placebo counterfactual, treatment-time validation, print_coefficients), with a regression test for each.
Should fix before merge: rename _algorithm→algorithm, audit effect_summary ignored args, kill the pytensor scan deprecation warning.
Defer to a follow-up issue: standardization surfacing in summaries, get_plot_data_bayesian/report integration, combined-feature tests, recovery test gated behind @pytest.mark.slow.

The bones of this PR are solid and the modeling is well-thought-out — the AR indexing bug and the placebo counterfactual issue are the two I would not let through without fixes plus tests.


drbenvincent

Thanks — this is a solid start on #830 and the hierarchical ITS / event-study direction looks right.

Before merge, two blockers on current main:

Please rebase onto main — the branch is well behind and merge status is blocked.
Add an explicit keyword-only plot() that forwards to _render_plot(). Tests call result.plot(), and generate_report() calls experiment.plot(); current main no longer inherits plot() from BaseExperiment (#886), so this raises AttributeError after rebase.

Non-blocking for this round (follow-ups or a quick fix if easy):

Document placebo-mode counterfactual semantics: when effect_on=False, pre-launch bins stay active and only post-launch bins are zeroed, which affects impact and plot_unit(). Worth a short docstring/note so users interpret those plots correctly.
Consider follow-up issues for get_plot_data_bayesian(), wiring covariate pooling priors (mu_beta / sigma_beta) through the priors API, and an optional flag to disable the hierarchical time trend.

I have not done a thorough manual pass on the new docs notebook yet — I will try to review the RTD page in the next few days, so there may be some presentational feedback to follow.

drbenvincent · 2026-07-08T12:02:18Z

@NathanielF just a ping on this - whenever you get time, there are some change requests

NathanielF · 2026-07-08T12:07:21Z

Oh sorry. Completely forgot about this.

Will look at the weekend


daimon-pymclabs · 2026-07-10T22:36:19Z

Review of the latest changes (branch @ `eb32475`)

Checked out the branch, diffed against the April review state, ran the suite.

Short version: the changes land essentially all of the requested work, tests pass (61/61), and a new saturation parameterization was added on top.

(a) Against the requested changes

The April code review listed 4 correctness blockers, 7 should-fix items, and a test-gap list. The June review added 2 hard blockers. Status:

#	Requested change	Status
1	AR `within_unit_tidx` corrupts on unsorted panels	Fixed. `_prepare_data` sorts by `(unit_col, time_col)` and derives the index via `groupby().cumcount()`. Regression test `test_ar_residuals_unsorted_panel`.
2	Placebo counterfactual zeroed pre-launch leads	Fixed. `_aux` copies `D` and zeroes only `D[:, n_pre_bins:]` for placebo. Test `test_placebo_counterfactual_keeps_pre_bins`.
3	`treatment_time_col` not validated for constancy	Fixed. groupby-nunique check. Test `test_inconsistent_treatment_time_per_unit`.
4	`print_coefficients()` crashed (wrong dims)	Fixed. Overridden in the experiment; prints `mu_beta`/`sigma_beta` plus per-effect-type params. Tests per effect type.
5	`_algorithm` naming holdout	Renamed to `algorithm()`, matching the other 9 experiments.
6	`effect_summary` ignored 7 of 9 args	Handled. `direction` drives `prob_positive`/`prob_negative`/two-sided; genuinely-unsupported args (`window`, `cumulative`, `min_effect`, `period`) emit `warnings.warn`.
7	Triple posterior-predictive sampling	Fixed via new `score_from_prediction()`, reusing `observed_pred`.
8	`__init__` swallowed unknown kwargs	`**kwargs` removed.
9	pytensor `scan` deprecation	`return_updates=False` at `pymc_models.py:3030`.
10	Standardization not surfaced	`print_coefficients` prints an "original scale" block dividing back through `_x_std`, and it's documented.
11	Fourier labels + seasonality footguns	Labels now `f_sin_k`/`f_cos_k`; added validation for missing keys, non-positive period, `K<1`.
June-1	Rebase onto `main`	Done (merge `d17b76f`).
June-2	Explicit keyword-only `plot()` forwarding to `_render_plot()`	Added, with `ci_prob` validation and a full docstring.

Extra validation beyond the asks: bin_edges/placebo_edges length checks, placebo/bin overlap detection, and an empty-bin guard (D.sum() == 0 raises). Test count went from 32 to 61.

Two items explicitly deferred to follow-ups remain open (fine per the original note): get_plot_data_bayesian() is not implemented (falls through to the base-class NotImplementedError, not tested for a clean raise), and there's no flag to disable the hierarchical time trend. Suggest a one-line test asserting get_plot_data() raises cleanly, or a tracking issue.

(b) Module functionality

Scope is now four effect parameterizations. The new saturation type models the post-launch effect as a per-unit Hill curve (ceiling L, half-saturation time k, exponent s, all hierarchical on the log scale) — a real addition over instant/event-study/placebo, not just a review fix. It's wired through consistently: _aux feeds tau_since = clip(tau, 0, None), and summary/print_coefficients/predictive_for_new_unit/plot all branch on it. test_recovers_saturation_parameters fits (non-mocked) and checks recovery, matching the recovery-test pattern the review asked for.
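For readers unfamiliar with the shape, a Hill curve in its standard form is sketched below. Whether the PR uses exactly this parameterization is an assumption; the function name and the numbers are illustrative, with only `clip(tau, 0, None)` and the roles of L, k and s taken from the description above.

```python
import numpy as np


def hill_effect(tau, L, k, s):
    """Standard Hill curve: ceiling L, half-saturation time k, exponent s."""
    # Time since launch; pre-launch rows are clipped to zero and get no effect.
    tau_since = np.clip(np.asarray(tau, dtype=float), 0, None)
    return L * tau_since**s / (k**s + tau_since**s)


event_time = np.array([-3, 0, 4, 400])
lift = hill_effect(event_time, L=8.0, k=4.0, s=2.0)
```

In this form the effect is zero before launch, reaches half of the ceiling at `tau = k`, and approaches `L` as time since launch grows.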

The AR-cancels-in-the-difference argument holds up given the counterfactual now shares the same AR draws; the placebo fix is what makes the impact trace trustworthy in the pre-period. Standardization is internal and now disclosed at print time, so downstream beta interpretation is no longer silently misleading.

Verification

pytest causalpy/tests/test_hierarchical_its.py: 61 passed in ~3.5 min (the coverage-gate "fail" is just the project-wide 80% threshold tripping when running a single file). Codecov reports 96% patch coverage.

Read: the blockers are resolved with a regression test behind each, and functionality expanded cleanly. Ready for a re-review. Only nudge before merge is a clean-raise test (or issue) for get_plot_data_bayesian, the one base-class method left implicit.

🤖 Posted on behalf of @NathanielF via Claude Code.

drbenvincent · 2026-07-14T10:04:12Z

Automated triage

Recommendation: review:high — no decision gate identified.

Why:

Major new feature: HierarchicalInterruptedTimeSeries experiment class with 1022-line new module, 487 lines added to pymc_models.py.
New public API surface, new notebook/gallery entry, new test file (756 lines).
All CI checks pass. PR is conflicted (DIRTY). Review already has requested changes.

Review focus:

Verify the hierarchical model specification is correct (pooling structure, priors, random effects).
Check that the new maketables_adapters.py changes integrate properly.
Confirm the gallery entry and notebook follow the project documentation conventions.

Confidence: high


NathanielF · 2026-07-14T14:37:15Z

This is no longer DIRTY @daimon-pymclabs, @drbenvincent and should be good

adding hierarchical its

fe16774

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

NathanielF added 6 commits April 9, 2026 09:48

making more robust tests

3115d32

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

improve test

b9bad58

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

adding time trend

ae039b5

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

Adding hierarchical AR process

e3619a6

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

improve testing and add counterfactual plot

832194b

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

update full notebook run.

e5e7392

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

NathanielF marked this pull request as ready for review April 10, 2026 17:10

NathanielF requested review from drbenvincent and ricardoV94 April 10, 2026 17:14

drbenvincent added the enhancement and major labels Apr 13, 2026

NathanielF added 2 commits April 26, 2026 09:28

update with suggested changes

6a45359

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

adding tests

657f54d

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

drbenvincent mentioned this pull request Apr 27, 2026

Add HierarchicalLinearRegression for Hierarchical DiD #860

Open

drbenvincent requested changes Jun 8, 2026


NathanielF added 6 commits July 10, 2026 20:35

update with feedback and saturation option

45ed299

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

Merge branch 'main' into hierarchical_its_event_study

d17b76f

fix precommit checks

e4d44aa

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

tidying

c64f3f1

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

further tidying

912a7b8

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

tidying further

eb32475

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

NathanielF requested a review from drbenvincent July 10, 2026 22:37

drbenvincent added the review:high label Jul 14, 2026

NathanielF added 2 commits July 14, 2026 09:58

Merge branch 'main' into hierarchical_its_event_study

4c0a87b

Update to fix pre-commit

db21623

Signed-off-by: Nathaniel <NathanielF@users.noreply.github.com>

NathanielF wants to merge 17 commits into pymc-labs:main from NathanielF:hierarchical_its_event_study
