Skip to content

Commit 390129f

Browse files
Add maintainer PR review skill (#920)
* Add maintainer PR review skill Co-authored-by: Cursor <cursoragent@cursor.com> * Fold draft PR review guidance into skill Co-authored-by: Cursor <cursoragent@cursor.com> * Clarify PR-type review outputs Co-authored-by: Cursor <cursoragent@cursor.com> * Address maintainer review skill feedback Co-authored-by: Cursor <cursoragent@cursor.com> * Simplify review skill monitoring guidance Co-authored-by: Cursor <cursoragent@cursor.com> * Rename PR review skill Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent cafd26f commit 390129f

13 files changed

Lines changed: 859 additions & 0 deletions

.github/skills/review-pr/SKILL.md

Lines changed: 99 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,99 @@
1+
---
2+
name: maintainer-pr-review
3+
description: Review CausalPy pull requests end-to-end by classifying PR type, checking branch freshness, mergeability, remote CI, correctness, security, tests, docs, and maintainer concerns. Use when asked to review a PR, assess a branch before merge, summarize PR risks, or request changes.
4+
---
5+
6+
# Maintainer PR Review
7+
8+
Use this skill to evaluate whether a PR is correct, safe, understandable, and merge-ready. This is a review workflow, not primarily a fix workflow.
9+
10+
## Boundary
11+
12+
- If the user asks to review, assess, summarize risks, or decide whether a PR is ready to merge, use this skill.
13+
- If the user asks to make the PR green by fixing CI, conflicts, or review comments, use `pr-to-green` for CausalPy-specific greening work.
14+
- For continuous merge-readiness monitoring, repeat this skill's intake, CI, and comment checks on a cadence. If the repository later adds a dedicated monitoring skill, prefer that.
15+
- If the review uncovers clear, small fixes and the user asked you to fix them, keep changes scoped to the PR's intent and follow the repo's commit and `prek` rules.
16+
- Never post review comments, approve, request changes, or merge through GitHub without explicit human approval.
17+
- Do not duplicate mechanical checks already covered by hooks and CI. If a recurring issue is mechanically enforceable but not enforced, recommend a follow-up issue instead of treating each instance as bespoke review work.
18+
19+
## Intake
20+
21+
Follow `resources/workflow.md` for the full workflow. At a glance:
22+
23+
1. Identify the PR number or current branch, base branch, head branch, and whether the branch tracks a remote.
24+
2. Inspect the local working tree before any git operation and preserve unrelated local changes.
25+
3. Fetch PR metadata, commits, reviews, issue comments, changed files, mergeability, and check summary.
26+
4. Check whether the branch is behind its base and whether GitHub reports conflicts. Do not resolve conflicts as part of review unless explicitly asked; report conflict risk and recommend `pr-to-green` when needed.
27+
5. Check remote CI with `gh pr checks` or the equivalent GitHub command. Distinguish failed, pending, skipped, and missing required checks.
28+
6. Inspect the full PR diff against the base branch, not only the latest commit.
29+
7. Verify contributor claims against code, tests, and branch history before accepting them.
30+
31+
## Classify the PR
32+
33+
Classify the PR by its dominant risk profile, then read the matching resource file. For mixed PRs, read every relevant resource before reviewing.
34+
35+
- Feature implementation: read `resources/pr-type-features.md`.
36+
- Bug fix: read `resources/pr-type-bug-fixes.md`.
37+
- Refactor: read `resources/pr-type-refactors.md`.
38+
- Docs or notebooks: read `resources/pr-type-docs-notebooks.md`.
39+
- Data or dataset changes: read `resources/pr-type-data-datasets.md`.
40+
- Tests, CI, packaging, or infrastructure: read `resources/pr-type-tests-ci-infra.md`.
41+
42+
When classification is unclear, state the likely categories and review against the stricter applicable checklist.
43+
44+
## Deep Dives
45+
46+
Read these when the PR touches the relevant surface:
47+
48+
- CausalPy source-code conventions: `resources/code-patterns.md`.
49+
- Documentation and notebook conventions: `resources/docs-patterns.md`.
50+
- Severity-sorted recurring review patterns: `resources/review-patterns.md`.
51+
- Drafting or posting review comments: `resources/review-comments.md`.
52+
- Updating this skill with recurring patterns: `resources/maintenance.md`.
53+
54+
## Universal Checks
55+
56+
- Correctness: the implementation matches the PR's stated intent, handles important edge cases, and does not introduce silent behavior changes.
57+
- Security and privacy: no secrets, credentials, tokens, private data, unsafe deserialization, command injection, path traversal, or unnecessary network access.
58+
- Causal/statistical accuracy: causal claims, model assumptions, estimands, priors, simulations, and examples are technically accurate and not overstated.
59+
- Public API: released APIs remain compatible unless the PR intentionally changes them and documents the change; new public APIs have explicit signatures and documentation.
60+
- Tests: behavior changes have meaningful tests in `causalpy/tests/`; PyMC-heavy tests use runtime-controlled `sample_kwargs`; no throwaway verification scripts are added.
61+
- Docs: user-facing behavior changes have docs or examples where appropriate; docs follow CausalPy notebook, MyST, glossary, and citation conventions.
62+
- Dependencies and packaging: new dependencies are justified, declared in the right place, and do not duplicate existing tools.
63+
- Performance and runtime: expensive sampling, notebook execution, data loading, and CI changes are bounded and justified.
64+
- Maintainability: the change follows local patterns, avoids broad unrelated refactors, and keeps ownership boundaries clear.
65+
66+
## CausalPy Review Norms
67+
68+
- Before reviewing code, read `AGENTS.md` and relevant local context. For docs-heavy PRs, also inspect `docs/source/notebooks/index.md`; for process-sensitive PRs, inspect `CONTRIBUTING.md` when present.
69+
- Use `$CONDA_EXE run -n CausalPy <command>` for commands that import project code, run tests, build docs, or invoke repo tooling. `AGENTS.md` defines how to detect or set `CONDA_EXE`; if it is unset, inspect that environment guidance before running project commands.
70+
- Use full permissions for commands that import PyMC, PyTensor, or matplotlib to avoid false sandbox failures.
71+
- During review, prefer targeted local checks that match the changed surface. If you edit code or prepare a commit, run `prek run` during iteration and `prek run --all-files` before handoff unless the user explicitly says not to.
72+
- For markdown-only skill or docs changes, a structural read-back may be enough; report when full checks were not run and why.
73+
74+
## Review Output
75+
76+
Lead with findings, ordered by severity. If there are no findings, say so clearly and mention residual risk or checks not run.
77+
78+
Use this structure:
79+
80+
```markdown
81+
## Findings
82+
- [severity] `path`: issue, why it matters, and what should change.
83+
84+
## Merge Readiness
85+
Verdict: approve / request changes / blocked / needs maintainer decision.
86+
Branch status: up to date or behind base; conflicts if any.
87+
CI status: green, failing, pending, skipped, or unavailable.
88+
89+
## PR Summary
90+
One short paragraph describing what the PR changes and why.
91+
92+
## Test Evidence
93+
List local and remote checks observed. Include commands only when they were actually run.
94+
95+
## Open Questions
96+
Only include questions that affect merge readiness or review confidence.
97+
```
98+
99+
When drafting comments for posting, show the draft to the user first and wait for approval. Preserve the distinction between the human maintainer's voice and any agent-authored review text.
Lines changed: 99 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,99 @@
1+
# CausalPy Code Patterns
2+
3+
Use this resource when reviewing source-code diffs. It collects CausalPy contracts that are important but not always mechanically enforced.
4+
5+
## `BaseExperiment` Contract
6+
7+
All experiment classes in `causalpy/experiments/` inherit from `BaseExperiment`.
8+
9+
- Declare `supports_ols: bool` and `supports_bayes: bool`.
10+
- If Bayesian is supported, implement `_bayesian_plot()` and `get_plot_data_bayesian()`.
11+
- If OLS is supported, implement `_ols_plot()` and `get_plot_data_ols()`.
12+
- Dispatch by model type with `isinstance(self.model, PyMCModel)` and `isinstance(self.model, RegressorMixin)`.
13+
- Keep the data index named `"obs_ind"`.
14+
- Parse formulas through patsy `dmatrices()`.
15+
- Use project exceptions from `causalpy.custom_exceptions` for formula, data, and index errors.
16+
17+
Review prompts:
18+
19+
- Does a new subclass declare the support flags?
20+
- Does support for OLS/Bayesian match the methods actually implemented?
21+
- Does validation use custom exceptions rather than `assert`?
22+
- Does the new class preserve model-agnostic behavior or clearly reject unsupported models?
23+
24+
## `PyMCModel` Contract
25+
26+
PyMC models inherit from `PyMCModel` and share the public interface `fit()`, `predict()`, `score()`, `calculate_impact()`, and `print_coefficients()`.
27+
28+
- Use `xarray.DataArray` with coordinates such as `coeffs`, `obs_ind`, and `treated_units`.
29+
- Store user-supplied priors on `self._user_priors` so they round-trip through `_clone()`.
30+
- Keep sampling configuration explicit and runtime-controlled for tests and examples.
31+
32+
## `_clone()` Pattern
33+
34+
The base `PyMCModel._clone()` forwards `priors=self._user_priors`. Every override must preserve that behavior or user priors are silently dropped.
35+
36+
```python
37+
def _clone(self):
38+
return type(self)(
39+
sample_kwargs=self.sample_kwargs,
40+
priors=self._user_priors,
41+
# Include any subclass-specific configuration here.
42+
)
43+
```
44+
45+
If `priors=` is absent from a new override, treat it as a must-fix unless there is a strong reason the model cannot accept custom priors.
46+
47+
## Scikit-Learn Compatibility
48+
49+
Scikit-learn models use `RegressorMixin` and are made CausalPy-compatible through `create_causalpy_compatible_class()`. Experiments should not feature-detect arbitrary methods when the local pattern is type-based dispatch.
50+
51+
Review prompts:
52+
53+
- Does an experiment that claims model-agnostic support branch correctly for PyMC and scikit-learn?
54+
- Does the scikit-learn path use numpy arrays and the PyMC path use xarray-shaped outputs?
55+
- Are unsupported combinations rejected clearly?
56+
57+
## Type Hints
58+
59+
Use Python 3.10+ style:
60+
61+
- `X | None`, not `Optional[X]`.
62+
- Lowercase `dict`, `list`, and `tuple`, not `Dict`, `List`, and `Tuple`.
63+
- `Literal` for constrained string parameters.
64+
65+
Any new parameter typed as `str` but documented or implemented as a closed set of values should usually be a `Literal`.
66+
67+
## Custom Exceptions
68+
69+
Prefer:
70+
71+
- `FormulaException` for patsy/formula problems.
72+
- `DataException` for data shape, missing column, or dtype problems.
73+
- `BadIndexException` for index issues.
74+
75+
Plain `ValueError` is acceptable when no project-specific exception fits. `assert` is not input validation.
76+
77+
## `__repr__` Style
78+
79+
Prefer concise representations that show non-default values only, matching sibling classes.
80+
81+
```python
82+
def __repr__(self) -> str:
83+
parts = [f"alpha={self.alpha}"] if self.alpha != 0.05 else []
84+
if not self.store_experiments:
85+
parts.append("store_experiments=False")
86+
return f"OutcomeFalsification({', '.join(parts)})"
87+
```
88+
89+
Flag a new `__repr__` that dumps defaults when nearby classes only show meaningful deviations.
90+
91+
## Memory-Heavy Retainers
92+
93+
`InferenceData`, fitted experiments, posterior arrays, and generated figures can be large. New result/check classes should not retain heavy objects by default unless the common user path needs them.
94+
95+
Review prompt: if a result object stores a fitted model, experiment, or posterior by default, ask whether summary-only should be the default with an opt-in such as `store_experiments=True`.
96+
97+
## Backwards Compatibility
98+
99+
Preserve compatibility for previously released APIs. APIs introduced within the same PR can be reshaped freely before merge. Do not add compatibility shims for unshipped API shapes on the current branch.
Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,80 @@
1+
# CausalPy Docs Patterns
2+
3+
Use this resource for documentation, notebook, knowledgebase, citation, and MyST review.
4+
5+
## Notebook Structure
6+
7+
New notebooks in `docs/source/notebooks/` should be reachable from the docs tree, usually through `docs/source/notebooks/index.md`.
8+
9+
- ITS examples belong under Interrupted Time Series.
10+
- DiD examples belong under Difference in Differences.
11+
- New method introductions may need a new section.
12+
- Workflow or cross-cutting notebooks may belong under Workflow.
13+
14+
If placement is ambiguous, ask the contributor to confirm rather than guessing.
15+
16+
## Notebook Narrative
17+
18+
Good example notebooks have a clear arc: problem, method, data, analysis, interpretation, caveats.
19+
20+
Review prompts:
21+
22+
- Is the causal question stated before the method is introduced?
23+
- Can a reader who knows causal inference but not CausalPy follow the analysis?
24+
- Are identifying assumptions and caveats visible rather than buried?
25+
- Does the conclusion answer the question posed at the top?
26+
- Are code outputs and prose consistent?
27+
28+
## Cell Tags and Output Noise
29+
30+
Use notebook tags to keep rendered docs readable.
31+
32+
- `hide-input` is useful for plotting cells where matplotlib boilerplate distracts from the figure.
33+
- `hide-output` is useful for sampler progress, long tables, debug prints, or warnings that are not part of the teaching goal.
34+
35+
Visible sampler warnings, divergences, low ESS, or high R-hat values in an educational notebook need attention. Prefer genuine sampler fixes when feasible; hide output only when the warning is acceptable and not central to the narrative.
36+
37+
## External Data
38+
39+
Prefer bundled CausalPy example datasets loaded through `cp.load_data(...)` over runtime fetches from external URLs. If external data is unavoidable, document the dependency and cite the source.
40+
41+
## MyST and Cross-References
42+
43+
Link glossary terms on first mention.
44+
45+
- Markdown and notebooks: `` {term}`glossary term` ``
46+
- RST: `` :term:`glossary term` ``
47+
48+
Use MyST roles for docs cross-references:
49+
50+
- `` {doc}`path/to/doc` ``
51+
- `` {ref}`label-name` ``
52+
53+
Prefer MyST admonitions such as `:::{note}`, `:::{warning}`, and `:::{important}` for callouts instead of plain bold text.
54+
55+
## Citations
56+
57+
Citations live in `docs/source/references.bib`. Example notebooks should include a bibliography block when they cite papers or datasets:
58+
59+
```markdown
60+
:::{bibliography}
61+
:filter: docname in docnames
62+
:::
63+
```
64+
65+
Flag claims or external datasets that need citations but lack bibliography entries.
66+
67+
## Naming and Placement
68+
69+
- Notebook naming pattern: `{method}_{model}.ipynb`, such as `did_pymc.ipynb` or `rd_skl.ipynb`.
70+
- How-to examples belong in `docs/source/notebooks/`.
71+
- Educational and conceptual content belongs in `docs/source/knowledgebase/`.
72+
- Scratch or generated notes belong in `.scratch/`, not tracked docs.
73+
74+
## API Docs
75+
76+
API docs are generated from docstrings via Sphinx autodoc, so public docstrings matter.
77+
78+
- Use NumPy-style sections such as `Parameters`, `Returns`, and `Examples`.
79+
- Examples should be runnable doctests where feasible.
80+
- New public functions with terse or missing examples are usually should-fix, not nits, because the generated API docs inherit that gap.
Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,81 @@
1+
# Skill Maintenance
2+
3+
Use this resource when updating `maintainer-pr-review` with patterns learned from future reviews.
4+
5+
## When to Add a Pattern
6+
7+
Add a pattern when either condition is true:
8+
9+
- The same issue shape has appeared in two or more PRs.
10+
- The pattern is CausalPy-specific enough that a generic reviewer is unlikely to catch it.
11+
12+
Do not add a pattern just because it appeared once. Skill bloat makes agents skim and weakens review quality.
13+
14+
## Where to Add It
15+
16+
Choose the smallest useful scope:
17+
18+
| Pattern type | File |
19+
|---|---|
20+
| PR-type checklist | The matching PR-type resource file |
21+
| Severity-sorted recurring pattern | `review-patterns.md` |
22+
| CausalPy source-code convention | `code-patterns.md` |
23+
| Docs or notebook convention | `docs-patterns.md` |
24+
| Review workflow or claim verification | `workflow.md` |
25+
| Comment template or posting rule | `review-comments.md` |
26+
27+
If a pattern does not fit any file, consider whether it is truly recurring before adding a new resource.
28+
29+
## Pattern Template
30+
31+
Each recurring pattern should include:
32+
33+
1. A short ID if it belongs in `review-patterns.md`, such as `MF-1`, `SF-3`, or `N-7`.
34+
2. A one-line title.
35+
3. A concrete example or review context, preferably with a PR reference.
36+
4. A "how to spot" prompt that tells the agent exactly what to inspect.
37+
38+
```markdown
39+
### <ID>: <One-line title>
40+
41+
<Two to four sentences explaining the pattern and why it matters.>
42+
43+
- Canonical example: <PR reference and concrete instance>.
44+
- How to spot: <specific diagnostic prompt>.
45+
```
46+
47+
## When to Prune
48+
49+
Prune a pattern when:
50+
51+
- A hook or CI check now enforces it mechanically.
52+
- The pattern has not recurred in a long time and the originating context no longer matters.
53+
- A newer pattern subsumes it.
54+
55+
When a pattern becomes a hook, remove it from the skill at the same time. Keeping both creates drift.
56+
57+
## Owner and Trigger
58+
59+
The owner is the maintainer or agent using this skill when they notice drift, repeated review misses, or a pattern that has become mechanical enough for automation. When that happens, create a GitHub issue against the project with the observed pattern, example PRs, and the proposed resource file to update.
60+
61+
Use these triggers:
62+
63+
- A pattern recurs in two or more PRs.
64+
- A review finds that the skill sent an agent to a nonexistent or local-only dependency.
65+
- A hook or CI check now covers a pattern documented here.
66+
- Roughly ten PR reviews have used this skill since the last maintenance pass.
67+
- Six months have passed since the last maintenance pass.
68+
69+
## Validation for Skill Changes
70+
71+
For meaningful changes to this skill, run formatting checks and do at least one real-world smoke review when feasible. A smoke review can be a focused review of an open PR, an existing review comment thread, or a recently merged PR; record what was reviewed and what the skill helped catch in the PR description.
72+
73+
## Review Cadence
74+
75+
Every six months or after roughly ten PR reviews, check:
76+
77+
1. Are any patterns stale?
78+
2. Are any patterns now hook-enforced?
79+
3. Have new patterns recurred enough to add?
80+
4. Is `SKILL.md` still short enough to act as a router?
81+
5. Are resource links still one level deep and easy to discover?
Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
# Bug Fix PR Review
2+
3+
Use this resource when a PR claims to correct broken behavior, regression, numerical error, documentation bug, CI failure, or user-reported issue.
4+
5+
## Review Focus
6+
7+
- Identify the bug being fixed and the evidence that it existed.
8+
- Check that the root cause is addressed directly rather than masked by a broad fallback, silent exception handling, or special-case patch.
9+
- Confirm the fix is as narrow as possible while preserving intended behavior for adjacent cases.
10+
- Look for compatibility risk: released behavior should remain stable unless the previous behavior was clearly wrong and the change is documented.
11+
- Verify error messages and project-specific exceptions remain helpful and consistent.
12+
- For statistical or causal bugs, check the corrected estimand, contrast, uncertainty interval, indexing, or posterior calculation against the intended definition.
13+
14+
## Required Evidence
15+
16+
- There should be a regression test that fails on the old behavior and passes with the fix whenever practical.
17+
- If the fix is for docs, notebooks, packaging, or CI, the relevant build/check should be run or the reason for not running it should be stated.
18+
- Tests should cover the reported failure mode, not only a broader happy path.
19+
20+
## Review Output Emphasis
21+
22+
When writing the final review, foreground the bug, affected users or workflows, root-cause evidence, whether the patch is too broad or too narrow, whether adjacent behavior changed, and whether a regression test would fail before the fix.
23+
24+
## Request Changes When
25+
26+
- No test or reproducible evidence demonstrates the fixed bug.
27+
- The patch hides the failure without addressing the underlying cause.
28+
- The fix changes unrelated behavior or introduces a new public contract accidentally.
29+
- The PR description, tests, and code disagree about what was broken.
30+
- The fix relies on brittle assumptions about array shape, index order, sampling output, or file paths.

0 commit comments

Comments
 (0)