| 1 | +--- |
| 2 | +name: mathlib-contribution |
| 3 | +description: > |
| 4 | + Guide for writing Mathlib4 (Lean 4) contributions that pass upstream review on the |
| 5 | + first pass. Use this whenever the user is adding or editing Lean code in a Mathlib |
| 6 | + repository — new theorems, lemmas, definitions, instances, structures, or files — or |
| 7 | + preparing/structuring a Mathlib pull request, naming a lemma, writing docstrings, |
| 8 | + choosing attributes (@[simp], @[ext], @[to_additive], @[simps]), generalizing |
| 9 | + hypotheses, or asking how to make Lean code conform to Mathlib conventions and avoid |
| 10 | + reviewer nitpicks. Trigger it even when the user does not say "Mathlib" explicitly but |
| 11 | + is clearly working in a Lean mathematics library (files under `Mathlib/`, a mathlib4 |
| 12 | + clone) or mentions Lean theorem/proof style, naming conventions, golfing, or getting a |
| 13 | + Lean PR review-ready. Prefer this over generic Lean help for anything destined for |
| 14 | + Mathlib. |
| 15 | +--- |
| 16 | + |
| 17 | +# Writing Mathlib contributions that pass review |
| 18 | + |
| 19 | +Help the user produce Lean 4 code that a Mathlib reviewer can approve on the first pass. |
| 20 | + |
| 21 | +Mathlib has unusually high and unusually *specific* standards: generality, naming, formatting, |
| 22 | +documentation, and API design are all scrutinized. New contributors lose most of their time to |
| 23 | +**avoidable** reviewer nitpicks and to re-proving things that already exist. The point of this |
| 24 | +skill is to bake those standards in *while writing*, so the diff the reviewer sees is already |
| 25 | +idiomatic and the round-trips are short. |
| 26 | + |
| 27 | +This skill carries two things: the distilled rules (here and in `references/`), and a catalog of |
| 28 | +real *wrong → right* corrections mined from recent merged PRs (`references/review-catalog.md`). |
| 29 | +Apply the rules as you write; consult the catalog and references for depth and justification. |
| 30 | + |
| 31 | +> Apply this skill's conventions to the Lean code you write, but do **not** silently restructure a |
| 32 | +> user's unrelated existing code. Suggest improvements; let the user decide. |
| 33 | + |
| 34 | +## Workflow |
| 35 | + |
| 36 | +Move through four phases. They are not rigid gates, but **Phase 1 is the single highest-leverage |
| 37 | +step** — skipping it is the most common cause of wasted work and closed PRs. |
| 38 | + |
| 39 | +### Phase 1 — Before writing any code |
| 40 | + |
| 41 | +The most common reviewer response to a newcomer is *"this already exists as `X`"* or *"this should |
| 42 | +be more general."* Pre-empt both: |
| 43 | + |
| 44 | +- **Search for prior art first.** Put the target statement in a scratch file with `import Mathlib` |
| 45 | + and try `exact?` / `apply?` / `rw?`; search [loogle](https://loogle.lean-lang.org) and the docs. |
| 46 | + Mathlib aspires to generality and avoids duplication, so the thing you want often already exists |
| 47 | + under a non-obvious name (a vector space is `Module`, a group hom is `MonoidHom`). |
| 48 | +- **Aim for the weakest hypotheses** that still make the statement true (most general typeclasses, |
| 49 | + fewest assumptions). This is the number-one design request in review. |
| 50 | +- **Find the right home.** Use `#find_home` to locate the correct file; put declarations where |
| 51 | + they belong, not where it is convenient. Watch for import creep (don't pull `Analysis` into an |
| 52 | + `Algebra` file). |
| 53 | +- **Prove the right primitive.** Prefer the general equality/characterization over a derived |
| 54 | + special-case inequality — the weaker results then follow trivially. |
| 55 | +- **Confirm fit and disclose AI.** Mathlib is *not* "all of mathematics"; if scope is unclear, ask |
| 56 | + on Zulip first. If AI was used, the PR description must say which tool and how, and add the |
| 57 | + `LLM-generated` label for substantial AI code. See `references/pr-process.md` — this is enforced. |
| 58 | +- **Keep it small.** Many small, self-contained PRs beat one large PR. |
| 59 | + |
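As a minimal sketch of the search-first and weakest-hypotheses steps above, a scratch file might look like the following. The statements are illustrative choices, not taken from this skill, and what `exact?` reports depends on the Mathlib version, so the comment names only one lemma it could plausibly find.

```lean
import Mathlib

-- Scratch file: state the target, then let the search tactic look for prior art.
-- `exact?` reports a closing term if it finds one (for instance `Nat.le_of_dvd hb h`).
example (a b : ℕ) (h : a ∣ b) (hb : 0 < b) : a ≤ b := by
  exact?

-- Over-specialized: stated for `ℝ`, although no property of the reals is used.
example (a b c : ℝ) : a * b * c = a * c * b :=
  mul_right_comm a b c

-- Weakest hypotheses: the same statement holds in any commutative semigroup.
example {M : Type*} [CommSemigroup M] (a b c : M) : a * b * c = a * c * b :=
  mul_right_comm a b c
```

If the general version already closes with an existing lemma, as it does here, that is usually a sign the result does not need to be added at all.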
| 60 | +### Phase 2 — Write it idiomatically |
| 61 | + |
| 62 | +Apply the rules below inline as you write. The full rules and rationale live in `references/`. |
| 63 | +The condensed, highest-frequency rules are in [The rules that matter most](#the-rules-that-matter-most). |
| 64 | + |
| 65 | +### Phase 3 — Self-review against the catalog |
| 66 | + |
| 67 | +Before calling it done, read `references/review-catalog.md` and check your diff against it — these |
| 68 | +are the exact things reviewers ask people to change. Grep your diff for the cheap mechanical |
| 69 | +violations: `erw`, `λ`, `$`, `Type _`, empty lines inside proofs, `:= by` not ending the statement |
| 70 | +line, theorem names in camelCase, trailing periods in the PR subject. |
| 71 | + |
| 72 | +### Phase 4 — Package the pull request |
| 73 | + |
| 74 | +- Title: `type(scope): subject` (imperative, lowercase start, no trailing period; scope omits the |
| 75 | + `Mathlib/` prefix). Description gives motivation; questions/notes go below a `---` line. |
| 76 | +- Ensure CI is green (build + linters). Run `lake exe mk_all` if you added files. |
| 77 | +- Run `!bench` (comment on the PR) if you added/changed `simp` lemmas, instances, imports, or defs. |
| 78 | +- Add `@[deprecated (since := "YYYY-MM-DD")] alias` for any renamed/removed *existing* public name. |
| 79 | + |
| 80 | +Details: `references/pr-process.md`. |
| 81 | + |
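The deprecation bullet above can be sketched as follows. The lemma, both of its names, and the date are hypothetical placeholders chosen for illustration; in a real PR the date would be the day the rename is made.

```lean
import Mathlib

namespace Demo

/-- Hypothetical lemma under its new, convention-conforming name. -/
theorem two_mul_le_two_mul {a b : ℕ} (h : a ≤ b) : 2 * a ≤ 2 * b := by
  omega

-- The old public name stays usable but emits a dated deprecation warning.
@[deprecated (since := "2025-01-01")]
alias twoMulLeTwoMul := two_mul_le_two_mul

end Demo
```

Downstream code that still uses `Demo.twoMulLeTwoMul` keeps compiling and is pointed at the new name by the warning.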
| 82 | +## The rules that matter most |
| 83 | + |
| 84 | +These are the conventions most frequently flagged in real reviews. Each is one line with the |
| 85 | +*why*, because understanding the reason lets you apply it correctly in new situations. |
| 86 | + |
| 87 | +**Naming** (full rules + symbol dictionary in `references/naming.md`) |
| 88 | +- Theorem/proof names are `snake_case`; types/structures/classes are `UpperCamelCase`; other terms |
| 89 | + are `lowerCamelCase`. A function is named like its return value. (Wrong: `hasFiniteProductsOfX`.) |
| 90 | +- Translate symbols with the standard dictionary (`+`→`add`, `*`→`mul`, `⁻¹`→`inv`, `∣`→`dvd`, |
| 91 | + `∘`→`comp`, `≤`→`le`, …); use `one`/`mul` not `1`/`times`. Hypotheses after `of`, in order. |
| 92 | +- **The name must describe the actual statement.** Mismatches (`pullback` vs `pushout`, `hom` vs |
| 93 | + `inv`, `mono` vs `epi`) and typos get flagged every time. |
| 94 | +- Injectivity is `f_injective` (word at the end) plus an iff-form `f_inj`; extensionality is `.ext` |
| 95 | + with `@[ext]`. Don't put `nonempty` in a name when `[Nonempty _]` is already a typeclass argument. |
| 96 | +- Names use American spelling (`factorization`, not `factorisation`). |
| 97 | +- Coercion lemmas (for `⇑foo = …`) are named `coe_foo`; a property-of-an-object lemma reads |
| 98 | + `property_object` (`isInvertedBy_isomorphisms`); surface a disambiguating hypothesis with `_of_…`. |
| 99 | +- A predicate is a **suffix** (`principal_surjective`); place a lemma in its **subject's namespace** |
| 100 | + (`_root_.Ns.foo` if defined elsewhere); mark short/common names `protected`. |
| 101 | + |
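A small sketch of how several of these naming rules combine. The function `double` and every lemma name here are hypothetical, written only to illustrate the capitalization, symbol-dictionary, and `_injective`/`_inj` conventions.

```lean
import Mathlib

namespace Demo

/-- Hypothetical function. It is a term, so its name is `lowerCamelCase`. -/
def double (n : ℕ) : ℕ := 2 * n

/-- Theorem names are `snake_case`, with `+` translated to `add`. -/
theorem double_eq_add_self (n : ℕ) : double n = n + n := by
  unfold double
  omega

/-- Injectivity: the word `injective` goes at the end of the name. -/
theorem double_injective : Function.Injective double := by
  intro a b h
  unfold double at h
  omega

/-- The iff form of injectivity is named `_inj` and takes implicit variables. -/
theorem double_inj {a b : ℕ} : double a = double b ↔ a = b :=
  double_injective.eq_iff

end Demo
```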
| 102 | +**Statements & API** (`references/api-design.md`) |
| 103 | +- Give an **explicit type for every argument and the return type**, even when Lean could infer it — |
| 104 | + it makes the statement readable on GitHub/docs. |
| 105 | +- Put hypotheses **left of the colon** when the proof starts by introducing them. Make the bound |
| 106 | + variables of an **`iff` lemma implicit** so `.mp`/`.mpr` work directly. |
| 107 | +- Hoist assumptions shared by several declarations into a `variable` block; delete typeclass |
| 108 | + assumptions and variables that aren't actually used. |
| 109 | +- Reuse existing API instead of rebuilding it; add `@[simp]`/rewrite lemmas so callers never need |
| 110 | + to unfold your definition; use `@[simps]`/`@[simps!]` instead of hand-writing projection lemmas. |
| 111 | +- Add attributes where they belong (`@[simp]`, `@[ext]`, `@[gcongr]`, `@[fun_prop]`, `@[to_additive]`, |
| 112 | + `@[to_dual]`) and **not** speculatively ("first, do no harm"). Use `@[to_additive]`/`@[to_dual]` |
| 113 | + to generate the additive/dual statement rather than writing it twice. |
| 114 | +- Keep definitions `semireducible` (the `def` default); seal an API with a `structure` wrapper, not |
| 115 | + `irreducible`. Give `abbrev`s an explicit type. |
| 116 | +- Put canonically-inferred **constraint** hypotheses in instance-implicit `[ ]` (`[Nonempty n]`, not |
| 117 | + `(_ : Nonempty n)`); reserve `( )`/`{ }` for data. Use `def`, not `abbrev`, for API maps. |
| 118 | +- Add `@[simp]` to `*_iff` characterizations and basic `apply`/coercion lemmas; **don't** add a lemma |
| 119 | + that `simp` or `Iff.rfl` already proves. |
| 120 | +- Generalize the **definition**, not just the statement: most general type, weakest structure, most |
| 121 | + general file (e.g. define on `PiLp` over `EuclideanSpace`; prove `IsEmbedding` first, then subtype). |
| 122 | +- Generalize concrete homs to `FunLike`/`*HomClass`; use junk values to drop side conditions; don't |
| 123 | + bundle results with `∧`. Use `fast_instance% FunLike.…` for derived algebraic instances; make a |
| 124 | + canonical object a `def` (not an existence theorem); use `class` only for typeclasses, else `structure`. |
| 125 | + |
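The following sketch illustrates three of the statement rules above: a shared `variable` block, `@[to_additive]` in place of a hand-written additive copy, and implicit variables on an iff lemma. All declaration names are hypothetical, and the additive name in the docstring is the one `@[to_additive]` is assumed to generate from the multiplicative name.

```lean
import Mathlib

namespace Demo

-- Assumptions shared by several declarations, hoisted into a `variable` block.
variable {G : Type*} [Group G]

/-- Hypothetical lemma. `@[to_additive]` is assumed to generate the additive
version as `Demo.add_add_neg_eq_self`, so it is not written a second time. -/
@[to_additive]
theorem mul_mul_inv_eq_self (a b : G) : a * b * b⁻¹ = a :=
  mul_inv_cancel_right a b

/-- Hypothetical iff lemma with implicit bound variables. -/
theorem two_mul_le_two_mul_iff {a b : ℕ} : 2 * a ≤ 2 * b ↔ a ≤ b := by
  constructor <;> intro h <;> omega

-- Because `a` and `b` are implicit, `.mpr` applies directly to the hypothesis.
example {a b : ℕ} (h : a ≤ b) : 2 * a ≤ 2 * b :=
  two_mul_le_two_mul_iff.mpr h

end Demo
```

Had `a` and `b` been explicit, the last example would need `(two_mul_le_two_mul_iff a b).mpr h`, which is the friction the rule is meant to avoid.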
| 126 | +**Proof & formatting** (`references/style.md`) |
| 127 | +- Lines ≤ 100 chars. Declarations are flush-left; `namespace`/`section` do not indent contents. |
| 128 | +- `by` ends the preceding line (`:= by`), never sits on its own line. Proof body indents 2 spaces; |
| 129 | + a multi-line *statement* indents continuation lines 4 spaces. |
| 130 | +- **No empty lines inside a declaration** (linter-enforced) — add a one-line comment instead. |
| 131 | +- Use `fun x ↦ …` (not `λ`); use `<|`/`|>` (not `$`). No space after unary minus (`-a`). |
| 132 | +- **No `erw`, no stray `rfl` after `simp`/`rw`** — that signals missing API; add the lemma instead. |
| 133 | +- Don't squeeze a *terminal* `simp` (don't replace it with `simp?` output) — it buries the key |
| 134 | + lemmas and breaks on renames. |
| 135 | +- Prefer the tight idiom: `rwa` over `rw …; exact`; `haveI` for instances; `rfl`/`inferInstance`/ |
| 136 | + `Iso.refl _` for trivial goals; `_` for unused pattern variables; `ext` (not `ext : 1`). |
| 137 | +- A **non-terminal `simp`** (followed by `exact`/`infer_instance`/`rw`) trips the `flexible` linter — |
| 138 | + use `simpa using …` or an explicit `rw`. Use `exact` (not `refine`) when there are no `?_`; `Iff.rfl` |
| 139 | + for a definitional iff; `simp_rw` to rewrite under binders. |
| 140 | +- **Reach for automation** (`grind`, `simp`, `gcongr`, `positivity`, `fun_prop`, `omega`) with explicit |
| 141 | + lemma lists over manual ladders; `by classical` instead of a `[DecidableEq]` argument; `by_contra!` |
| 142 | + over `by_contra; push_neg`. Don't reformat code you aren't changing. |
| 143 | + |
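A few of the proof idioms above, sketched on toy goals. The statements are illustrative only; each example is meant to show the preferred form, with the discouraged form noted in the comment.

```lean
import Mathlib

-- `:= by` ends the statement line, the body indents 2 spaces, and `fun x ↦ …` replaces `λ`.
example (f : ℕ → ℕ) (hf : ∀ n, f n = n + 1) : (fun x ↦ f x) = fun x ↦ x + 1 := by
  funext x
  exact hf x

-- `rwa` instead of `rw [h]; exact hb`.
example (a b c : ℕ) (h : a = b) (hb : b ≤ c) : a ≤ c := by
  rwa [h]

-- `simpa using …` instead of a non-terminal `simp at h` followed by `exact h`.
example (a b : ℕ) (h : a + 0 ≤ b) : a ≤ b := by
  simpa using h

-- `by_contra!` instead of `by_contra` followed by `push_neg`.
example (a b : ℕ) (h : ¬ b < a) : a ≤ b := by
  by_contra! hlt
  exact h hlt
```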
| 144 | +**Documentation** (`references/documentation.md`) |
| 145 | +- Current file header is the module form: copyright (current year), then `module`, then grouped |
| 146 | + `public import`s (alphabetical), then plain `import`s, then a `/-! … -/` module docstring. |
| 147 | +- Every `def` and major theorem needs a docstring, and **the docstring must match the statement** |
| 148 | + (variable names, which hypothesis is on which object). Update docs/comments when you generalize. |
| 149 | +- Use precise terminology and correct grammar/Unicode (`étale`, `an` before a vowel, right plural); |
| 150 | + cross-reference related declarations; cite the literature in `docs/references.bib`. |
| 151 | +- Docstring continuation lines are **not** indented. When *moving* code, keep the original copyright |
| 152 | + year and authors (don't claim sole authorship of relocated code). |
| 153 | +- Docstrings describe the mathematical **purpose**, not the implementation; a module docstring needs a |
| 154 | + summary, not just a title. Minimize **public** imports (`#min_imports`); keep general lemmas in general files. |
| 155 | + |
| 156 | +**Performance** |
| 157 | +- Use `Type*`, never `Type _` (the latter creates extra unification work). Avoid import creep. |
| 158 | + |
| 159 | +## Pre-submission checklist |
| 160 | + |
| 161 | +- [ ] Searched for existing/more-general results; statement uses the weakest hypotheses. |
| 162 | +- [ ] In the right file (`#find_home`); no surprising new imports; PR is small and self-contained. |
| 163 | +- [ ] Names follow the conventions and describe the statement; American spelling. |
| 164 | +- [ ] Every argument and the return type has an explicit type; iff-lemma vars implicit. |
| 165 | +- [ ] Attributes added where appropriate, not speculatively; `@[to_additive]`/`@[to_dual]` used. |
| 166 | +- [ ] Every def/major theorem has an accurate docstring; module docstring + header present. |
| 167 | +- [ ] Style: ≤100 cols, `:= by` placement, no `erw`, no `λ`/`$`/`Type _`, no empty proof lines. |
| 168 | +- [ ] Renamed/removed public names carry dated `@[deprecated]` aliases. |
| 169 | +- [ ] `lake exe mk_all` run if files added; CI green; `!bench` run if simp/instances/imports/defs. |
| 170 | +- [ ] Constraint hypotheses in `[ ]`; `@[simp]` on iff/apply lemmas; nothing `simp`/`Iff.rfl` already proves. |
| 171 | +- [ ] No non-terminal `simp` (use `simpa`/explicit); docstring continuation lines unindented. |
| 172 | +- [ ] Reached for automation (`grind`/`simp`/`fun_prop`/`positivity`) over manual proofs where possible. |
| 173 | +- [ ] PR title is `type(scope): subject`; description has motivation; AI use disclosed. |
| 174 | + |
| 175 | +## References |
| 176 | + |
| 177 | +Read the relevant file when you need depth or the user pushes back on a convention: |
| 178 | + |
| 179 | +- `references/naming.md` — capitalization rules, the symbol dictionary, structural-lemma naming |
| 180 | + (`.ext`, `_injective`/`_inj`, `ge`/`gt`), coercions, with real PR examples. |
| 181 | +- `references/style.md` — layout, `calc`, focusing dots, the full *tactic idiom* substitution table |
| 182 | + (e.g. `rw …; exact` → `rwa`), `erw`/transparency, simp-squeezing. |
| 183 | +- `references/api-design.md` — explicit types, generality, `variable` blocks, instances, attributes, |
| 184 | + `@[simps]`, transparency, and the deprecation recipe. |
| 185 | +- `references/documentation.md` — file header / module system, module docstrings, doc requirements, |
| 186 | + file location & splitting, citations. |
| 187 | +- `references/pr-process.md` — scope, the AI-disclosure policy, search-first tooling, commit/PR |
| 188 | + title & description conventions, labels, and the Bors merge lifecycle. |
| 189 | +- `references/review-catalog.md` — the consolidated *wrong → right* catalog, each correction cited |
| 190 | + to its real PR (#39365–#41076) plus the canonical examples from Mathlib's own |
| 191 | + PR-review guide. Read this during Phase 3 self-review, or whenever you're unsure why a pattern is |
| 192 | + discouraged. |
| 193 | + |
| 194 | +When in doubt, **match the surrounding code** and ask on the [`#mathlib4` Zulip](https://leanprover.zulipchat.com/#narrow/channel/287929-mathlib4/). |