Skip to content

Commit 8c1ac87

Browse files
committed
Implement mathlib-contribution skill
1 parent 571b8a8 commit 8c1ac87

7 files changed

Lines changed: 670 additions & 0 deletions

File tree

Lines changed: 194 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,194 @@
1+
---
2+
name: mathlib-contribution
3+
description: >
4+
Guide for writing Mathlib4 (Lean 4) contributions that pass upstream review on the
5+
first pass. Use this whenever the user is adding or editing Lean code in a Mathlib
6+
repository — new theorems, lemmas, definitions, instances, structures, or files — or
7+
preparing/structuring a Mathlib pull request, naming a lemma, writing docstrings,
8+
choosing attributes (@[simp], @[ext], @[to_additive], @[simps]), generalizing
9+
hypotheses, or asking how to make Lean code conform to Mathlib conventions and avoid
10+
reviewer nitpicks. Trigger it even when the user does not say "Mathlib" explicitly but
11+
is clearly working in a Lean mathematics library (files under `Mathlib/`, a mathlib4
12+
clone) or mentions Lean theorem/proof style, naming conventions, golfing, or getting a
13+
Lean PR review-ready. Prefer this over generic Lean help for anything destined for
14+
Mathlib.
15+
---
16+
17+
# Writing Mathlib contributions that pass review
18+
19+
Help the user produce Lean 4 code that a Mathlib reviewer can approve on the first pass.
20+
21+
Mathlib has unusually high and unusually *specific* standards: generality, naming, formatting,
22+
documentation, and API design are all scrutinized. New contributors lose most of their time to
23+
**avoidable** reviewer nitpicks and to re-proving things that already exist. The point of this
24+
skill is to bake those standards in *while writing*, so the diff the reviewer sees is already
25+
idiomatic and the round-trips are short.
26+
27+
This skill carries two things: the distilled rules (here and in `references/`), and a catalog of
28+
real *wrong → right* corrections mined from recent merged PRs (`references/review-catalog.md`).
29+
Apply the rules as you write; consult the catalog and references for depth and justification.
30+
31+
> Apply this skill's conventions to the Lean code you write, but do **not** silently restructure a
32+
> user's unrelated existing code. Suggest improvements; let the user decide.
33+
34+
## Workflow
35+
36+
Move through four phases. They are not rigid gates, but **Phase 1 is the single highest-leverage
37+
step** — skipping it is the most common cause of wasted work and closed PRs.
38+
39+
### Phase 1 — Before writing any code
40+
41+
The most common reviewer response to a newcomer is *"this already exists as `X`"* or *"this should
42+
be more general."* Pre-empt both:
43+
44+
- **Search for prior art first.** Put the target statement in a scratch file with `import Mathlib`
45+
and try `exact?` / `apply?` / `rw?`; search [loogle](https://loogle.lean-lang.org) and the docs.
46+
Mathlib aspires to generality and avoids duplication, so the thing you want often already exists
47+
under a non-obvious name (a vector space is `Module`, a group hom is `MonoidHom`).
48+
- **Aim for the weakest hypotheses** that still make the statement true (most general typeclasses,
49+
fewest assumptions). This is the number-one design request in review.
50+
- **Find the right home.** Use `#find_home` to locate the correct file; put declarations where
51+
they belong, not where it is convenient. Watch for import creep (don't pull `Analysis` into an
52+
`Algebra` file).
53+
- **Prove the right primitive.** Prefer the general equality/characterization over a derived
54+
special-case inequality — the weaker results then follow trivially.
55+
- **Confirm fit and disclose AI.** Mathlib is *not* "all of mathematics"; if scope is unclear, ask
56+
on Zulip first. If AI was used, the PR description must say which tool and how, and add the
57+
`LLM-generated` label for substantial AI code. See `references/pr-process.md` — this is enforced.
58+
- **Keep it small.** Many small, self-contained PRs beat one large PR.
59+
60+
### Phase 2 — Write it idiomatically
61+
62+
Apply the rules below inline as you write. The full rules and rationale live in `references/`.
63+
The condensed, highest-frequency rules are in [The rules that matter most](#the-rules-that-matter-most).
64+
65+
### Phase 3 — Self-review against the catalog
66+
67+
Before calling it done, read `references/review-catalog.md` and check your diff against it — these
68+
are the exact things reviewers ask people to change. Grep your diff for the cheap mechanical
69+
violations: `erw`, `λ`, `$`, `Type _`, empty lines inside proofs, `:= by` not ending the statement
70+
line, theorem names in camelCase, trailing periods in the PR subject.
71+
72+
### Phase 4 — Package the pull request
73+
74+
- Title: `type(scope): subject` (imperative, lowercase start, no trailing period; scope omits the
75+
`Mathlib/` prefix). Description gives motivation; questions/notes go below a `---` line.
76+
- Ensure CI is green (build + linters). Run `lake exe mk_all` if you added files.
77+
- Run `!bench` (comment on the PR) if you added/changed `simp` lemmas, instances, imports, or defs.
78+
- Add `@[deprecated (since := "YYYY-MM-DD")] alias` for any renamed/removed *existing* public name.
79+
80+
Details: `references/pr-process.md`.
81+
82+
## The rules that matter most
83+
84+
These are the conventions most frequently flagged in real reviews. Each is one line with the
85+
*why*, because understanding the reason lets you apply it correctly in new situations.
86+
87+
**Naming** (full rules + symbol dictionary in `references/naming.md`)
88+
- Theorem/proof names are `snake_case`; types/structures/classes are `UpperCamelCase`; other terms
89+
are `lowerCamelCase`. A function is named like its return value. (Wrong: `hasFiniteProductsOfX`.)
90+
- Translate symbols with the standard dictionary (`+``add`, `*``mul`, `⁻¹``inv`, ```dvd`,
91+
```comp`, ```le`, …); use `one`/`mul` not `1`/`times`. Hypotheses after `of`, in order.
92+
- **The name must describe the actual statement.** Mismatches (`pullback` vs `pushout`, `hom` vs
93+
`inv`, `mono` vs `epi`) and typos get flagged every time.
94+
- Injectivity is `f_injective` (word at the end) plus an iff-form `f_inj`; extensionality is `.ext`
95+
with `@[ext]`. Don't put `nonempty` in a name when `[Nonempty _]` is already a typeclass argument.
96+
- Names use American spelling (`factorization`, not `factorisation`).
97+
- Coercion lemmas (for `⇑foo = …`) are named `coe_foo`; a property-of-an-object lemma reads
98+
`property_object` (`isInvertedBy_isomorphisms`); surface a disambiguating hypothesis with `_of_…`.
99+
- A predicate is a **suffix** (`principal_surjective`); place a lemma in its **subject's namespace**
100+
(`_root_.Ns.foo` if defined elsewhere); `protected` short/common names.
101+
102+
**Statements & API** (`references/api-design.md`)
103+
- Give an **explicit type for every argument and the return type**, even when Lean could infer it —
104+
it makes the statement readable on GitHub/docs.
105+
- Put hypotheses **left of the colon** when the proof starts by introducing them. Make the bound
106+
variables of an **`iff` lemma implicit** so `.mp`/`.mpr` work directly.
107+
- Hoist assumptions shared by several declarations into a `variable` block; delete typeclass
108+
assumptions and variables that aren't actually used.
109+
- Reuse existing API instead of rebuilding it; add `@[simp]`/rewrite lemmas so callers never need
110+
to unfold your definition; use `@[simps]`/`@[simps!]` instead of hand-writing projection lemmas.
111+
- Add attributes where they belong (`@[simp]`, `@[ext]`, `@[gcongr]`, `@[fun_prop]`, `@[to_additive]`,
112+
`@[to_dual]`) and **not** speculatively ("first, do no harm"). Use `@[to_additive]`/`@[to_dual]`
113+
to generate the additive/dual statement rather than writing it twice.
114+
- Keep definitions `semireducible` (the `def` default); seal an API with a `structure` wrapper, not
115+
`irreducible`. Give `abbrev`s an explicit type.
116+
- Put canonically-inferred **constraint** hypotheses in instance-implicit `[ ]` (`[Nonempty n]`, not
117+
`(_ : Nonempty n)`); reserve `( )`/`{ }` for data. Use `def`, not `abbrev`, for API maps.
118+
- Add `@[simp]` to `*_iff` characterizations and basic `apply`/coercion lemmas; **don't** add a lemma
119+
that `simp` or `Iff.rfl` already proves.
120+
- Generalize the **definition**, not just the statement: most general type, weakest structure, most
121+
general file (e.g. define on `PiLp` over `EuclideanSpace`; prove `IsEmbedding` first, then subtype).
122+
- Generalize concrete homs to `FunLike`/`*HomClass`; use junk values to drop side conditions; don't
123+
bundle results with ``. `fast_instance% FunLike.…` for derived algebraic instances; a canonical
124+
object is a `def` (not an existence theorem); `class` only for typeclasses, else `structure`.
125+
126+
**Proof & formatting** (`references/style.md`)
127+
- Lines ≤ 100 chars. Declarations are flush-left; `namespace`/`section` do not indent contents.
128+
- `by` ends the preceding line (`:= by`), never sits on its own line. Proof body indents 2 spaces;
129+
a multi-line *statement* indents continuation lines 4 spaces.
130+
- **No empty lines inside a declaration** (linter-enforced) — add a one-line comment instead.
131+
- Use `fun x ↦ …` (not `λ`); use `<|`/`|>` (not `$`). No space after unary minus (`-a`).
132+
- **No `erw`, no stray `rfl` after `simp`/`rw`** — that signals missing API; add the lemma instead.
133+
- Don't squeeze a *terminal* `simp` (don't replace it with `simp?` output) — it buries the key
134+
lemmas and breaks on renames.
135+
- Prefer the tight idiom: `rwa` over `rw …; exact`; `haveI` for instances; `rfl`/`inferInstance`/
136+
`Iso.refl _` for trivial goals; `_` for unused pattern variables; `ext` (not `ext : 1`).
137+
- A **non-terminal `simp`** (followed by `exact`/`infer_instance`/`rw`) trips the `flexible` linter —
138+
use `simpa using …` or an explicit `rw`. Use `exact` (not `refine`) when there are no `?_`; `Iff.rfl`
139+
for a definitional iff; `simp_rw` to rewrite under binders.
140+
- **Reach for automation** (`grind`, `simp`, `gcongr`, `positivity`, `fun_prop`, `omega`) with explicit
141+
lemma lists over manual ladders; `by classical` instead of a `[DecidableEq]` argument; `by_contra!`
142+
over `by_contra; push_neg`. Don't reformat code you aren't changing.
143+
144+
**Documentation** (`references/documentation.md`)
145+
- Current file header is the module form: copyright (current year), then `module`, then grouped
146+
`public import`s (alphabetical), then plain `import`s, then a `/-! … -/` module docstring.
147+
- Every `def` and major theorem needs a docstring, and **the docstring must match the statement**
148+
(variable names, which hypothesis is on which object). Update docs/comments when you generalize.
149+
- Use precise terminology and correct grammar/Unicode (`étale`, `an` before a vowel, right plural);
150+
cross-reference related declarations; cite the literature in `docs/references.bib`.
151+
- Docstring continuation lines are **not** indented. When *moving* code, keep the original copyright
152+
year and authors (don't claim sole authorship of relocated code).
153+
- Docstrings describe the mathematical **purpose**, not the implementation; a module docstring needs a
154+
summary, not just a title. Minimize **public** imports (`#min_imports`); keep general lemmas in general files.
155+
156+
**Performance**
157+
- Use `Type*`, never `Type _` (the latter creates extra unification work). Avoid import creep.
158+
159+
## Pre-submission checklist
160+
161+
- [ ] Searched for existing/more-general results; statement uses the weakest hypotheses.
162+
- [ ] In the right file (`#find_home`); no surprising new imports; PR is small and self-contained.
163+
- [ ] Names follow the conventions and describe the statement; American spelling.
164+
- [ ] Every argument and the return type has an explicit type; iff-lemma vars implicit.
165+
- [ ] Attributes added where appropriate, not speculatively; `@[to_additive]`/`@[to_dual]` used.
166+
- [ ] Every def/major theorem has an accurate docstring; module docstring + header present.
167+
- [ ] Style: ≤100 cols, `:= by` placement, no `erw`, no `λ`/`$`/`Type _`, no empty proof lines.
168+
- [ ] Renamed/removed public names carry dated `@[deprecated]` aliases.
169+
- [ ] `lake exe mk_all` run if files added; CI green; `!bench` run if simp/instances/imports/defs.
170+
- [ ] Constraint hypotheses in `[ ]`; `@[simp]` on iff/apply lemmas; nothing `simp`/`Iff.rfl` already proves.
171+
- [ ] No non-terminal `simp` (use `simpa`/explicit); docstring continuation lines unindented.
172+
- [ ] Reached for automation (`grind`/`simp`/`fun_prop`/`positivity`) over manual proofs where possible.
173+
- [ ] PR title is `type(scope): subject`; description has motivation; AI use disclosed.
174+
175+
## References
176+
177+
Read the relevant file when you need depth or the user pushes back on a convention:
178+
179+
- `references/naming.md` — capitalization rules, the symbol dictionary, structural-lemma naming
180+
(`.ext`, `_injective`/`_inj`, `ge`/`gt`), coercions, with real PR examples.
181+
- `references/style.md` — layout, `calc`, focusing dots, the full *tactic idiom* substitution table
182+
(e.g. `rw …; exact``rwa`), `erw`/transparency, simp-squeezing.
183+
- `references/api-design.md` — explicit types, generality, `variable` blocks, instances, attributes,
184+
`@[simps]`, transparency, and the deprecation recipe.
185+
- `references/documentation.md` — file header / module system, module docstrings, doc requirements,
186+
file location & splitting, citations.
187+
- `references/pr-process.md` — scope, the AI-disclosure policy, search-first tooling, commit/PR
188+
title & description conventions, labels, and the Bors merge lifecycle.
189+
- `references/review-catalog.md` — the consolidated *wrong → right* catalog, each correction cited
190+
to its real PR (#39365#41076) plus the canonical examples from Mathlib's own
191+
PR-review guide. Read this during Phase 3 self-review, or whenever you're unsure why a pattern is
192+
discouraged.
193+
194+
When in doubt, **match the surrounding code** and ask on the [`#mathlib4` Zulip](https://leanprover.zulipchat.com/#narrow/channel/287929-mathlib4/).
Lines changed: 94 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,94 @@
1+
# Statements & API design
2+
3+
The hardest part of review to satisfy is design: generality, the right interface, and how a
4+
declaration interacts with automation. Aim to make the *statement* obviously useful and the API
5+
complete.
6+
7+
## Explicit types
8+
9+
Give the type of **every** argument and the **return type**, even when Lean could infer them — the
10+
statement has to be readable on GitHub and in the docs.
11+
```lean
12+
def GoodStatement (n : ℕ) : Prop := ∃ k : ℕ, n + k = 3 -- not: def Bad (n) := ∃ k, n + k = 3
13+
```
14+
15+
## Argument placement
16+
17+
- If the proof starts by introducing variables, put them **left of the colon** rather than behind
18+
``/``. (Pattern-matching definitions keep the binder on the right.)
19+
- For an **`iff` lemma**, make the quantified variables **implicit** so `.mp`/`.mpr` are usable. *(#41076)*
20+
- Put canonically-inferred **constraint** hypotheses in instance-implicit `[ ]` (`[Nonempty n]`,
21+
`[Fintype n]`), not as explicit/anonymous `(_ : Nonempty n)`; reserve `( )`/`{ }` for data. *(#40699)*
22+
- Make an argument explicit when it appears in the conclusion and can't be inferred; make it implicit
23+
when another argument determines it. Order hypotheses structural-first, "automatic" (`g _ 0 = 1`)
24+
last; for a dot-notation lemma, the subject + its property come first. *(#39623, #39422, #40219)*
25+
26+
## Generality
27+
28+
- Use the **weakest hypotheses / most general typeclasses** that make the statement true (`Mul` over
29+
`Monoid`, `Ring` over `Ring`+`Algebra ℤ`, `IsReduced` over `NoZeroDivisors`, partial measurability
30+
over full). Generalize on request, and proactively. *(#40248, #39873, #40836, #39812)*
31+
- **Generalize concrete morphisms to `FunLike`/`*HomClass`**: take `[FunLike F R S] [RingHomClass F R S]
32+
(f : F)` rather than `(f : R →+* S)`. *(#40449)*
33+
- **Generalize the definition, not only the statement:** define on the general type, with the weakest
34+
structure, in the most general file — e.g. on `PiLp` rather than `EuclideanSpace`; prove the
35+
`IsEmbedding` version first and derive the subtype case. *(#40634, #40799)*
36+
- **Use junk values to drop side conditions** where the conclusion still holds (`∀ x` rather than
37+
`∀ x ≥ 1`). *(#40569)*
38+
- Hoist shared assumptions into a `variable` block; **delete unused** assumptions and variables.
39+
*(#40976, #40944, #40928)*
40+
- Use the most specific base needed in an instance head, and prove the form that keeps instance search
41+
cheap (`T3Space` directly). Don't require an instance you can derive locally with a `have`. *(#40931, #40637)*
42+
43+
## A complete, idiomatic API
44+
45+
- Supply the full set of standard operations and their companion/dual lemmas: add `Sub` alongside
46+
`Add`/`Neg`; the `LE` form (and `LT → LE`) for an `LT` result; `comp_left` with `comp_right`. Don't
47+
bundle several results with `` — state them as **separate theorems**. *(#40890, #40802, #40569)*
48+
- **`@[simps]` / `@[simps!]`** auto-generate projection lemmas — use them over hand-written ones (and
49+
on structure operations: `@[simps, refl]`, `@[simps, symm]`). Build on an existing definition with
50+
inheritance (`def foo where __ := bar`) so `@[simps!]` reuses it. *(#40890, #39757, #40455)*
51+
- **`fast_instance% FunLike.…`** derives an algebraic instance (`Semiring`, `AddCommMonoid`, …) for a
52+
bundled-morphism type — use it instead of proving each field by hand. *(#40515, #39637)*
53+
- Add `@[simp]`/rewrite lemmas so downstream proofs never unfold your definition. *(#40991, #41063)*
54+
- **Reuse existing API** instead of reconstructing it (`rTensor`, `IsLocalHomeomorphOn`,
55+
`Algebra.adjoin_le`, `rangeRestrict`, `toZMod`, an `_iff` via `rw`); check for a defeq type that
56+
already has the instance (`⨁ i, M i` is `Π₀ i, M i`). *(#41034, #41033, #40844, #39764)*
57+
- **Don't add a lemma `simp`/`Iff.rfl`/`grind` already proves**, a definitional-equation lemma, or a
58+
duplicate of existing API; don't delete a *useful* characterization lemma. Inline a one-line proof
59+
rather than naming it. *(#40889, #40647, #40456, #39589, #41067)*
60+
- Make a canonical object a **`def`** (with explicit parameters) rather than an existence theorem —
61+
`Classical.choice`-free constructions deserve real API. Use `class` only for typeclasses; otherwise
62+
`structure`. *(#40567, #39810)*
63+
64+
## Attributes
65+
66+
Tag where appropriate — and only where appropriate ("first, do no harm"):
67+
`@[simp]`, `@[ext]` (numeric priority `@[ext 1100]`, not `@[ext high]`), `@[gcongr]`, `@[fun_prop]`,
68+
`@[to_additive]`, `@[to_dual]`, `@[simps]`, and literature tags `@[stacks …]`/`@[kerodon …]`. In
69+
particular add `@[simp]` to `*_iff` characterizations and basic `apply`/coercion lemmas (reviewers ask
70+
across *all* analogous lemmas at once), and `@[fun_prop]` to continuity/differentiability/measurability
71+
lemmas so `fun_prop` can use them. Don't add `@[simp]`/`@[simps]` speculatively, and don't let a
72+
generated `simp` lemma replace a better hand-written one. *(#40715, #40739, #40451, #40080, #40931)*
73+
Use `@[to_additive]`/`@[to_dual]` to generate the additive/dual statement rather than writing it twice.
74+
75+
## Transparency / definitional design
76+
77+
- `def` is `semireducible` by default; `abbrev` is `reducible`. Use `def` (not `abbrev`) for API maps
78+
that should not auto-unfold; give an `abbrev` an explicit type. Seal an API with a `structure`
79+
wrapper, not `irreducible`. *(#40666, #40923)*
80+
- Don't add an instance/field whose body is just `inferInstance`. For a type with several natural
81+
instances (e.g. `Matrix` with different norms), **scope** the instance rather than making it global,
82+
and document any diamond in detail. *(#41000, #40272, #39531)*
83+
- Avoid dependent types where a non-dependent encoding works; bundle new morphisms with `FunLike` and
84+
new subobjects with `SetLike`.
85+
86+
## Deprecation (renaming/removing public declarations)
87+
88+
Keep the old name as a deprecated `alias` (or deprecate with a message), **with the merge date**:
89+
```lean
90+
@[deprecated (since := "2026-06-25")] alias old_name := new_name
91+
```
92+
For `@[to_additive]` pairs deprecate both names. Deprecations may be deleted after 6 months;
93+
brand-new declarations need none. Renaming a *type* uses `@[deprecated New (since := "…")] abbrev Old := New`.
94+
*(#41033, #39707)*

0 commit comments

Comments
 (0)