pymc-labs · drbenvincent · Apr 24, 2026 · Apr 24, 2026
diff --git a/.github/skills/pr-review/SKILL.md b/.github/skills/pr-review/SKILL.md
@@ -0,0 +1,75 @@
+---
+name: pr-review
+description: Review CausalPy pull requests — fetch context, run a structured pass over code, tests, docs, and process, then produce maintainer-quality comments for human review before posting. Use when the user asks to review a PR, check a PR, evaluate review feedback, or audit PR comments. Complements pr-to-green (which fixes PRs) and pr-workflows (which creates them).
+---
+
+# PR Review
+
+This skill is for **reviewing** a CausalPy pull request — diagnosing what's there, judging it against project conventions, and producing actionable feedback. It is the read/diagnose/write-comment counterpart to `pr-to-green` (fix in-flight PR) and `pr-workflows` (create PR from issue).
+
+## When to use
+
+- The user asks to review, check, or evaluate a PR (e.g. "review #826", "look at PR 826", "what's outstanding on PR 826").
+- The user asks whether a PR is ready to approve, merge, or land.
+- The user asks to evaluate a contributor's response to prior review feedback ("did they address my comments?").
+- The user asks to audit existing comments on a PR for relevance, accuracy, or remaining gaps.
+
+Do **not** use this skill for:
+
+- Fixing a failing PR — use [`pr-to-green`](../pr-to-green/SKILL.md).
+- Turning an issue into a PR — use [`pr-workflows`](../pr-workflows/SKILL.md).
+- Creating new issues that come out of a review — delegate to [`github-issues`](../github-issues/SKILL.md).
+
+## Division of labour with automation
+
+A pre-commit hook never misses; an agent review is judgement-heavy. Don't duplicate the hook's job. Mechanical checks already enforced by `prek` / CI (lint, format, mypy, schema validation, etc.) are not part of this skill — assume they pass and focus the human-equivalent attention on what they can't catch.
+
+If during review you notice a *class* of issue that automation could catch but doesn't, file it as a follow-up issue via [`github-issues`](../github-issues/SKILL.md) rather than only flagging the one instance.
+
+## Workflow
+
+The full workflow is in [reference/workflow.md](reference/workflow.md). At a glance:
+
+1. **Establish context** — fetch PR metadata, all commits, all reviews, all comments (review + issue threads), file changes.
+2. **Read the relevant subset** — files changed, plus surrounding context for each change (base class for new subclass, sibling tests for new tests, etc.).
+3. **Pass over the diff with the patterns checklist** — see [reference/what-to-look-for.md](reference/what-to-look-for.md). Group findings by severity.
+4. **Verify claims** — if the contributor says "addressed all feedback", confirm against the actual code; if they say a rebase is clean, confirm against `git log`.
+5. **Draft comments for human review** — never post directly. See [reference/posting-comments.md](reference/posting-comments.md).
+6. **Post on approval** and report URLs back.
+
+For batched PR work (multi-file diffs, parallel exploration), follow `pr-to-green`'s subagent delegation patterns — same `Task` triggers (`generalPurpose` for exploration, `ci-failure-investigator` for noisy CI logs) apply.
+
+## What to look for
+
+Severity-sorted patterns (must-fix → should-fix → nits) live in [reference/what-to-look-for.md](reference/what-to-look-for.md). The two domain-specific deep-dives are:
+
+- [reference/code-patterns.md](reference/code-patterns.md) — `BaseExperiment` contract, `PyMCModel` vs. `RegressorMixin` dispatch, `_clone()` patterns, `Literal` for constrained strings, custom exceptions, `__repr__` style, memory-heavy retainers.
+- [reference/docs-patterns.md](reference/docs-patterns.md) — notebook narrative quality, glossary linking, `:::{note}` admonitions, hide-input/hide-output cell tags, citations, sampler-output noise, helper-promotion judgement.
+
+## Drafting comments
+
+The full template is in [reference/posting-comments.md](reference/posting-comments.md). Two non-negotiable rules:
+
+1. **Always draft for human review before posting.** Read the draft back and offer 2–3 framing variants (stricter / softer / approval-pending).
+2. **Cite specific code locations.** Use line:line:filepath references when calling out code, not vague paraphrases.
+
+## Mandatory pre-flight reads
+
+Before reviewing any PR, the reviewing agent should have read or be able to load:
+
+- `AGENTS.md` (root) — code structure, style, type hints, testing preferences.
+- `CONTRIBUTING.md` — process expectations.
+- `docs/source/notebooks/index.md` — toctree layout, where new notebooks should go.
+- The PR's own description and any prior review iterations.
+
+## How to extend this skill
+
+When a review surfaces a new recurring pattern, add it. The process is in [reference/how-to-extend.md](reference/how-to-extend.md). Skill drift is real — every six months or so, prune patterns that have been formalised into hooks.
+
+## Important behaviour
+
+- Never post review comments without explicit human approval.
+- Never approve, request changes, or merge a PR through `gh pr review --approve` unless explicitly instructed.
+- When evaluating a contributor's response to prior feedback, verify against the code, not their summary. A "fixed all feedback" claim is a hypothesis to test, not a fact to accept.
+- If a reviewer-bot identity has been used in the past on the PR (e.g. `claude-opus-4-7-xhigh`), preserve that attribution in posted comments so the human reviewer's voice and the agent's voice stay distinguishable.
+- Do not duplicate work already covered by hooks/CI. If something is mechanically enforceable and not enforced, file an issue rather than re-flagging the instance.
diff --git a/.github/skills/pr-review/reference/code-patterns.md b/.github/skills/pr-review/reference/code-patterns.md
@@ -0,0 +1,108 @@
+# CausalPy Code Patterns
+
+Repo-specific conventions and contracts to check against when reviewing source-code diffs. These are the patterns most likely to be inadvertently broken by a contributor (especially a new one) because they're enforced by convention rather than by the type system.
+
+## `BaseExperiment` contract
+
+All experiment classes in `causalpy/experiments/` inherit from `BaseExperiment`. Required:
+
+- Class attributes `supports_ols: bool` and `supports_bayes: bool` declared.
+- If `supports_bayes`: implement `_bayesian_plot()` and `get_plot_data_bayesian()`.
+- If `supports_ols`: implement `_ols_plot()` and `get_plot_data_ols()`.
+- Model-agnostic dispatch via `isinstance(self.model, PyMCModel)` vs. `isinstance(self.model, RegressorMixin)`.
+- Data index named `"obs_ind"`.
+- Formula parsing via patsy `dmatrices()`.
+- Custom exceptions from `causalpy.custom_exceptions`: `FormulaException`, `DataException`, `BadIndexException`.
+
+Review prompts:
+
+- New `BaseExperiment` subclass? Check both class attributes are declared.
+- Implements only `_bayesian_plot` but `supports_ols=True`? That's a bug.
+- Uses `assert` for input validation? Should be a custom exception.
+
+## `PyMCModel` contract
+
+PyMC models inherit from `PyMCModel` (extends `pm.Model`). Common interface: `fit()`, `predict()`, `score()`, `calculate_impact()`, `print_coefficients()`.
+
+- Data: `xarray.DataArray` with coords (`coeffs`, `obs_ind`, `treated_units`).
+- Stores user-supplied priors on `self._user_priors` for round-trip via `_clone()`.
+
+## `_clone()` pattern
+
+The base `PyMCModel._clone()` forwards `priors=self._user_priors`. Every override **must** do the same, or user customisations are silently dropped on clone.
+
+Spot-check template:
+
+```python
+# In the new subclass:
+def _clone(self):
+    return type(self)(
+        sample_kwargs=self.sample_kwargs,
+        priors=self._user_priors,   # <-- must be present
+        # ... any other config kwargs the subclass adds ...
+    )
+```
+
+If `priors=` is absent, that's a must-fix (MF-1 in [what-to-look-for.md](what-to-look-for.md)).
+
+## `RegressorMixin` (sklearn) compatibility
+
+Scikit-learn models use `RegressorMixin` and are made causalpy-compatible via `create_causalpy_compatible_class()`. Same external interface as `PyMCModel`. Data is numpy arrays, not xarray.
+
+Review prompt: experiment classes should branch with `isinstance(self.model, PyMCModel)` vs. `isinstance(self.model, RegressorMixin)` rather than feature-detect on the model.
+
+## `CheckResult` contract
+
+Lives in `causalpy/checks/base.py`. Has a documented `figures: list[Any]` field — but as of writing, no check populates it. If a new check defines a non-trivial plotting helper in a notebook (see [what-to-look-for.md § N-7](what-to-look-for.md#n-7-helper-used-n-times-in-a-notebook-is-library-material)), promoting it to populate `CheckResult.figures` is the natural integration point.
+
+`passed=None` is the right design for informational-only checks (e.g. `OutcomeFalsification`) — they should inform, not gate.
+
+## Type hints
+
+`AGENTS.md` is explicit. Required style (Python 3.10+):
+
+- `X | None`, not `Optional[X]`.
+- Lowercase `dict`, `list`, `tuple`, not `Dict`, `List`, `Tuple` from `typing`.
+- `Literal` for constrained string parameters (see [what-to-look-for.md § MF-2](what-to-look-for.md#mf-2-constrained-string-parameter-not-typed-as-literal)).
+
+## Custom exceptions
+
+Use, in order of preference:
+
+- `FormulaException` — patsy/formula problems.
+- `DataException` — data-shape, missing-column, dtype problems.
+- `BadIndexException` — index issues.
+
+Plain `ValueError` is acceptable when nothing more specific fits; `assert` is not.
+
+## `__repr__` style
+
+Convention: show non-default values only.
+
+```python
+def __repr__(self) -> str:
+    parts = [f"alpha={self.alpha}"] if self.alpha != 0.05 else []
+    if not self.store_experiments:
+        parts.append("store_experiments=False")
+    return f"OutcomeFalsification({', '.join(parts)})"
+```
+
+A new class with a `__repr__` that shows defaults is inconsistent with sibling classes — flag as N-1.
+
+## Helper promotion
+
+When a docs notebook defines a non-trivial helper (≥50 lines) called ≥3 times, suggest promoting to the library. Standard cleanup checklist before promotion:
+
+1. Drop UI side-effects: no `print()` in library code (use `warnings.warn`); no `plt.show()` (return the `Figure`).
+2. Drop hard-coded constants that limit reuse: e.g. `FOLD_COLORS = [...5 colors...]` should become a module-level constant or use matplotlib's color cycle.
+3. Add `figsize` and `axes` kwargs so the function is composable in user code and unit-testable.
+4. Reduce arg surface: if two args are always paired, fold one into the other (or store the dependency on the producing class so the consumer takes one arg, not two).
+5. Add a tiny test that asserts shape: `assert isinstance(out, Figure)`, `assert len(out.axes) == expected`.
+
+## Memory-heavy retainers
+
+`InferenceData` is large. Any class that retains it by default imposes a hidden cost. Default to summary-only with an opt-in (`store_X=True`) for users who want to inspect.
+
+## Backwards compatibility
+
+`AGENTS.md`: maintain compatibility for previously-released APIs only; APIs introduced in the same PR can be reshaped freely. So new public APIs (added in the PR being reviewed) are legitimate review targets for "I'd rather it look like X" feedback; APIs that already shipped are not.
diff --git a/.github/skills/pr-review/reference/docs-patterns.md b/.github/skills/pr-review/reference/docs-patterns.md
@@ -0,0 +1,115 @@
+# CausalPy Docs Patterns
+
+Repo-specific conventions for notebooks (`docs/source/notebooks/`) and knowledgebase pages (`docs/source/knowledgebase/`). Mechanical rules are enforced by `prek` / CI hooks; this file is for the judgement-heavy patterns a hook can't catch.
+
+## Notebook structure
+
+### `toctree` placement
+
+A new notebook in `docs/source/notebooks/` must be referenced from somewhere — typically `docs/source/notebooks/index.md`. Cross-link from related prose pages (e.g. `sensitivity_checks.md`) where applicable.
+
+Section assignment is a curation question:
+
+- ITS examples → `Interrupted Time Series` toctree.
+- DiD examples → `Difference in Differences`.
+- New method introduction → its own toctree section.
+- Workflow / cross-cutting → `Workflow` section.
+
+If the placement isn't obvious, ask the contributor to confirm rather than guessing.
+
+## Notebook content
+
+### Pedagogical structure
+
+Good notebooks have a recognisable arc: problem → method → data → analysis → interpretation → caveats. Reviews should call out when the arc is missing or scrambled.
+
+When evaluating pedagogy, useful prompts:
+
+- Can a reader who knows causal inference but not CausalPy follow it?
+- Is the *causal question* stated before the method is introduced?
+- Are caveats and assumptions called out, not buried?
+- Does the conclusion answer the question posed at the top?
+
+### Cell tags
+
+Two tags matter for readability:
+
+- **`hide-input`** on plotting cells — collapses the matplotlib boilerplate so the rendered docs show the plot output prominently.
+- **`hide-output`** on cells with verbose output (sampler progress, large tables, debug prints) — collapses the noise while keeping the cell runnable.
+
+Review prompt: any cell that produces ≥10 lines of sampler/progress output without `hide-output` is a candidate for the tag.
+
+### Sampler warnings
+
+Visible sampler warnings (divergences, low ESS, R̂ > 1.01) in a docs notebook are confusing for readers. Two fixes:
+
+1. Tune the sampler config (more chains, more tuning steps) to make the warnings go away genuinely.
+2. Hide them under `hide-output` if the warnings aren't a core part of the narrative.
+
+(1) is preferred when feasible. (2) is acceptable when the analysis is fundamentally about a borderline case.
+
+### External data
+
+Notebooks should prefer bundled CausalPy example datasets (loaded via `cp.load_data(...)`) over runtime fetches from external URLs. If an external fetch is unavoidable, document it with a comment so readers know what they're depending on.
+
+## MyST and cross-references
+
+### Glossary linking
+
+`AGENTS.md`: link to glossary terms (defined in `glossary.rst`) on first mention in a file.
+
+- Markdown / `.ipynb`: `` {term}`glossary term` ``
+- RST: `` :term:`glossary term` ``
+
+If a notebook introduces concepts like "treatment effect", "potential outcomes", "propensity score", etc., the first mention should be a glossary link. Subsequent mentions don't need to link.
+
+### Cross-references
+
+In Markdown, use MyST role syntax:
+
+- `` {doc}`path/to/doc` `` — link to another docs page.
+- `` {ref}`label-name` `` — link to a labelled section.
+
+Don't use raw markdown links (`[text](path/to/doc.md)`) for cross-doc references — MyST roles get full Sphinx resolution and break-link detection.
+
+### Admonitions
+
+Use MyST `:::{note}`, `:::{warning}`, `:::{important}` for callouts. Plain bold/italic emphasis is not the same — admonitions render as styled boxes.
+
+### Citations
+
+Citations live in `docs/source/references.bib`. Cite sources in example notebooks where possible. Include a reference section at the bottom of notebooks using:
+
+```markdown
+:::{bibliography}
+:filter: docname in docnames
+:::
+```
+
+If a new notebook has prose claims like "as shown in Smith et al. (2020)" without a corresponding `references.bib` entry and `:::{bibliography}` block, flag it.
+
+## Naming and organisation
+
+### Notebook filenames
+
+Pattern: `{method}_{model}.ipynb`.
+
+- `did_pymc.ipynb`, `rd_skl.ipynb`, `iv_pymc.ipynb`, `sc_pymc_brexit.ipynb`.
+- Composite topics: keep the prefix scheme so `:glob:` patterns would still work even though no section uses one today.
+
+### File placement
+
+- How-to examples → `docs/source/notebooks/`.
+- Educational / conceptual content → `docs/source/knowledgebase/`.
+- Scratch / temporary → `.scratch/` (untracked).
+
+A notebook that's mostly explaining a concept (rather than demonstrating CausalPy usage) probably belongs in `knowledgebase/`, not `notebooks/`.
+
+## API docs
+
+Auto-generated from docstrings via Sphinx autodoc. No manual API docs to write — but the docstrings themselves matter:
+
+- NumPy-style sections (`Parameters`, `Returns`, `Examples`).
+- `Examples` blocks should be runnable doctests where feasible (`make doctest` in CI).
+
+If a new public function has no `Examples` block, that's a should-fix, not a nit, because the API docs page will be terse.