Skip to content

Commit 9894555

Browse files
docs: seed initial review rules for automated code review
1 parent 7ba58da commit 9894555

2 files changed

Lines changed: 274 additions & 0 deletions

File tree

Lines changed: 190 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,190 @@
1+
---
2+
name: review-autolearn
3+
description: |
4+
Mine human code-review comments from this repo, distill recurring catches into REVIEW.md
5+
rules so reviewers (and automated review) flag them before a human has to. Uses a
6+
multi-agent workflow (parallel clusterers + adversarial verify). Opens a PR with the
7+
proposed additions to the Recurring Catches section.
8+
argument-hint: "[dry] [<days>]"
9+
---
10+
11+
# /review-autolearn [dry] [<days>]
12+
13+
Humans keep catching the same things in review. Read what they caught recently, turn the
14+
recurring ones into `REVIEW.md` guidance, PR it.
15+
16+
## Arguments
17+
18+
| Arg | Description |
19+
|-----|-------------|
20+
| (none) | 14-day window, open PR |
21+
| `dry` | Print proposed diff, no PR |
22+
| `<N>` | N-day lookback window |
23+
24+
## Resolved variables
25+
26+
```
27+
REPO = $(gh repo view --json nameWithOwner --jq .nameWithOwner)
28+
TARGET = REVIEW.md
29+
```
30+
31+
---
32+
33+
## 1. Dedup — skip if already in flight
34+
35+
```bash
36+
gh pr list --repo "$REPO" --state open --label review-autolearn --json number,url
37+
```
38+
39+
Open PR exists → report its link and stop (unless `dry`).
40+
41+
## 2. Harvest
42+
43+
Three GitHub comment surfaces feed the corpus. Fetch each, then derive four files.
44+
45+
```bash
46+
since=$(date -u -d "${DAYS:-14} days ago" +%F 2>/dev/null || date -u -v-${DAYS:-14}d +%F)
47+
RAW=/tmp/review-autolearn-raw.jsonl
48+
HUMAN=/tmp/review-autolearn-human.jsonl
49+
BOTAGREE=/tmp/review-autolearn-botagree.jsonl
50+
BOTPUSHBACK=/tmp/review-autolearn-botpushback.jsonl
51+
52+
# (a) Inline review comments — pinned to diff lines
53+
gh api "repos/$REPO/pulls/comments?since=${since}T00:00:00Z&sort=created&direction=desc" \
54+
--paginate --jq '.[] | {kind:"inline", id, in_reply_to_id,
55+
pr:(.pull_request_url|split("/")|last),
56+
author:.user.login, path:.path, body:.body, diff:.diff_hunk}' \
57+
> $RAW
58+
59+
# (b) Review submission bodies — the text on APPROVE/REQUEST_CHANGES/COMMENT
60+
gh api "repos/$REPO/pulls?state=all&sort=updated&direction=desc&per_page=100" --paginate \
61+
--jq ".[] | select(.updated_at >= \"${since}\") | .number" \
62+
| while read pr; do
63+
gh api "repos/$REPO/pulls/$pr/reviews" \
64+
--jq ".[] | select(.body != \"\") | select(.submitted_at >= \"${since}\")
65+
| {kind:\"review\", id, in_reply_to_id:null, pr:\"$pr\",
66+
author:.user.login, path:null, body:.body, diff:null}"
67+
done >> $RAW
68+
69+
# (c) PR conversation comments — filtered to drop noise
70+
gh api "repos/$REPO/issues/comments?since=${since}T00:00:00Z&sort=created&direction=desc" \
71+
--paginate --jq '.[] | select(.html_url | contains("/pull/"))
72+
| {kind:"convo", id, in_reply_to_id:null,
73+
pr:(.html_url|capture("/pull/(?<n>[0-9]+)").n),
74+
author:.user.login, path:null, body:.body, diff:null}' \
75+
| jq -c 'select(.body | length > 40)
76+
| select(.body | test("^@claude\\b"; "i") | not)
77+
| select(.body | test("^(addressed|rebased|thanks|done|merged|fixed in|updated)\\b"; "i") | not)' \
78+
>> $RAW
79+
80+
# --- derive corpora ---
81+
82+
# Human-authored, top-level (inline) + all review bodies + filtered convo
83+
jq -c 'select(.author // "" | endswith("[bot]") | not)
84+
| select(.kind != "inline" or .in_reply_to_id == null)' $RAW > $HUMAN
85+
86+
# Human replies to bot inline comments, split by sentiment.
87+
# Agreements = bot catch a human validated → learn TO flag it.
88+
# Pushback = bot false positive → learn NOT to flag it.
89+
jq -s '(map(select(.kind=="inline")) | map({(.id|tostring): {author,body}}) | add) as $p
90+
| .[]
91+
| select(.kind=="inline" and .in_reply_to_id != null)
92+
| select(.author // "" | endswith("[bot]") | not)
93+
| ($p[.in_reply_to_id|tostring]) as $parent
94+
| select($parent.author // "" | endswith("[bot]"))
95+
| . + {parent_author:$parent.author, parent_body:$parent.body}' -c $RAW \
96+
> /tmp/review-autolearn-botreplies.jsonl
97+
98+
jq -c 'select(.body | test("^(fixed|amended|done|addressed|thanks|good catch|legitimate|updated|resolved|makes sense|agreed|fair)\\b"; "i"))' \
99+
/tmp/review-autolearn-botreplies.jsonl > $BOTAGREE
100+
jq -c 'select(.body | test("^(fixed|amended|done|addressed|thanks|good catch|legitimate|updated|resolved|makes sense|agreed|fair)\\b"; "i") | not)' \
101+
/tmp/review-autolearn-botreplies.jsonl > $BOTPUSHBACK
102+
103+
wc -l $HUMAN $BOTAGREE $BOTPUSHBACK
104+
```
105+
106+
Zero in all → report "no review activity in window" and stop.
107+
108+
## 3. Cluster + verify (Workflow)
109+
110+
Launch the Workflow tool with this script. Pass `args = { humanFile, botAgreeFile, botPushbackFile, targetFile, repo }`.
111+
112+
```js
113+
export const meta = {
114+
name: 'review-autolearn',
115+
description: 'Distill recurring review catches into REVIEW.md rules',
116+
phases: [
117+
{ title: 'Cluster', detail: '6 angles over the comment corpus' },
118+
{ title: 'Verify', detail: '3-vote adversarial check per candidate rule' },
119+
{ title: 'Synthesize' },
120+
],
121+
}
122+
123+
const { humanFile, botAgreeFile, botPushbackFile, targetFile, repo } = args
124+
const ANGLES = [
125+
{ src: humanFile, focus: 'spec/protocol compliance (schema.ts mismatches, HTTP status codes, transport assumptions)' },
126+
{ src: humanFile, focus: 'API surface creep (unnecessary exports, speculative abstractions, parallel APIs)' },
127+
{ src: humanFile, focus: 'cross-SDK consistency (naming, patterns diverging from the other SDK)' },
128+
{ src: humanFile, focus: 'async/lifecycle correctness (race conditions, cleanup, cancellation, task lifecycle)' },
129+
{ src: botAgreeFile, focus: 'human-validated bot catches - each entry is a bot review comment a human accepted (parent_body is the bot finding, body is the human acknowledgment). Cluster by what the bot caught; produce rules so reviewers flag the same pattern' },
130+
{ src: botPushbackFile, focus: 'reviewer false positives - humans disagreeing with bot review comments (parent_body is the bot claim, body is the human rebuttal). Produce rules of the form "do NOT flag X; it is intentional/correct because Y"' },
131+
]
132+
const RULE = { type:'object', required:['rule','section','prs'],
133+
properties:{ rule:{type:'string'}, section:{type:'string'},
134+
prs:{type:'array',items:{type:'string'}} } }
135+
const RULES = { type:'object', required:['rules'], properties:{ rules:{type:'array',items:RULE} } }
136+
const VERDICT = { type:'object', required:['keep','reason'],
137+
properties:{ keep:{type:'boolean'}, reason:{type:'string'} } }
138+
139+
const verified = await pipeline(
140+
ANGLES,
141+
a => agent(
142+
`Read ${a.src} (one JSON/line: fields include pr,author,path,body,diff,kind and for bot-reply files also parent_author,parent_body). `+
143+
`Focusing ONLY on ${a.focus}: drop praise/LGTM/nit/question-only; group remaining comments by underlying pattern. `+
144+
`A cluster qualifies iff >=2 distinct PRs AND mechanically detectable from a diff AND not already covered in ${targetFile} (grep it). `+
145+
`Return rules[] - each <=4 lines, imperative, why-in-one-clause, cite PR#s, name the REVIEW.md section to append under. Zero is valid.`,
146+
{ phase:'Cluster', label:`cluster:${a.focus.split(' ')[0]}`, schema: RULES }).then(out => ({...out, src: a.src})),
147+
out => parallel((out?.rules ?? []).map(r => () =>
148+
parallel(Array.from({length:3}, () => () => agent(
149+
`Adversarially verify this proposed REVIEW.md rule for ${repo}:\n${JSON.stringify(r)}\n`+
150+
`REFUTE (keep=false) if: <2 distinct PRs in ${out.src} actually back it, OR ${targetFile} already covers it, `+
151+
`OR it isn't mechanically detectable from a diff, OR it's too vague to act on. Default to refute when unsure.`,
152+
{ phase:'Verify', label:`verify:${r.section}`, schema: VERDICT })))
153+
.then(votes => ({ ...r, keep: votes.filter(v => v?.keep).length >= 2 }))
154+
)),
155+
)
156+
const kept = verified.flat().filter(r => r?.keep)
157+
if (!kept.length) return { diff: null, summary: 'nothing recurring' }
158+
159+
phase('Synthesize')
160+
const diff = await agent(
161+
`Produce a minimal patch for ${targetFile} appending these verified rules under the "## Recurring Catches" section. `+
162+
`Slot each rule under its "### ${'{'}section${'}'}" subsection within Recurring Catches (create the subsection if absent). `+
163+
`Do not modify Guiding Principles, Review Ordering, or Checklist sections. Dedupe overlaps with existing Recurring Catches entries. `+
164+
`Return a unified diff only.\nRules:\n${JSON.stringify(kept,null,2)}`,
165+
{ label:'synthesize' })
166+
return { diff, rules: kept }
167+
```
168+
169+
## 4. Ship
170+
171+
`dry` → print `result.diff` + backing PR#s per rule, stop.
172+
173+
Otherwise:
174+
175+
```
176+
branch: review-autolearn/<date>
177+
- gh label create review-autolearn --repo "$REPO" -c FBCA04 -d "Auto-mined review conventions" 2>/dev/null || true
178+
- git checkout -b <branch>
179+
- Apply result.diff to REVIEW.md
180+
- Commit, push, open PR:
181+
title: "docs(REVIEW.md): encode review catches from last <N>d"
182+
label: review-autolearn
183+
body: one bullet per rule → linking the backing PR comments
184+
```
185+
186+
Do NOT auto-merge — a human signs off on guidance changes.
187+
188+
## 5. Report
189+
190+
`<N rules | nothing recurring> · <PR link | dry>`

REVIEW.md

Lines changed: 84 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,84 @@
1+
# typescript-sdk Review Conventions
2+
3+
Guidance for reviewing pull requests on this repository. The first three sections are
4+
stable principles; the **Recurring Catches** section is auto-maintained from past human
5+
review rounds and grows over time.
6+
7+
## Guiding Principles
8+
9+
1. **Minimalism** — The SDK should do less, not more. Protocol correctness, transport
10+
lifecycle, types, and clean handler context belong in the SDK. Middleware engines,
11+
registry managers, builder patterns, and content helpers belong in userland.
12+
2. **Burden of proof is on addition** — The default answer to "should we add this?" is
13+
no. Removing something from the public API is far harder than not adding it.
14+
3. **Justify with concrete evidence** — Every new abstraction needs a concrete consumer
15+
today. Ask for real issues, benchmarks, real-world examples; apply the same standard
16+
to your own review (link spec sections, link code, show the simpler alternative).
17+
4. **Spec is the anchor** — The SDK implements the protocol spec. The further a feature
18+
drifts from the spec, the stronger the justification needs to be.
19+
5. **Kill at the highest level** — If the design is wrong, don't review the
20+
implementation. Lead with the highest-level concern; specific bugs are supporting
21+
detail.
22+
6. **Decompose by default** — A PR doing multiple things should be multiple PRs unless
23+
there's a strong reason to bundle.
24+
25+
## Review Ordering
26+
27+
1. **Design justification** — Is the overall approach sound? Is the complexity warranted?
28+
2. **Structural concerns** — Is the architecture right? Are abstractions justified?
29+
3. **Correctness** — Bugs, regressions, missing functionality.
30+
4. **Style and naming** — Nits, conventions, documentation.
31+
32+
## Checklist
33+
34+
**Protocol & spec**
35+
- Types match [`schema.ts`](https://github.com/modelcontextprotocol/modelcontextprotocol/blob/main/schema/draft/schema.ts) exactly (optional vs required fields)
36+
- Correct `McpError` codes; HTTP status codes match spec (e.g., 404 vs 410)
37+
- Works for both stdio and Streamable HTTP transports — no transport-specific assumptions
38+
- Cross-SDK consistency: check what `python-sdk` does for the same feature
39+
40+
**API surface**
41+
- Every new export is intentional (see CLAUDE.md § Public API Exports); helpers users can write themselves belong in a cookbook, not the SDK
42+
- New abstractions have at least one concrete callsite in the PR
43+
- One way to do things — improving an existing API beats adding a parallel one
44+
45+
**Correctness**
46+
- Async: race conditions, cleanup on cancellation, unhandled rejections, missing `await`
47+
- Error propagation: caught/rethrown properly, resources cleaned up on error paths
48+
- Type safety: no unjustified `any`, no unsafe `as` assertions
49+
- Backwards compat: public-interface changes, default changes, removed exports — flagged and justified
50+
51+
**Tests & docs**
52+
- New behavior has vitest coverage including error paths
53+
- Breaking changes documented in `docs/migration.md` and `docs/migration-SKILL.md`
54+
55+
## Recurring Catches
56+
57+
### HTTP Transport
58+
59+
- When validating `Mcp-Session-Id`, return **400** for a missing header and **404** for an unknown/expired session — never conflate `!sessionId || !transports[sessionId]` into one status, because the client needs to distinguish "fix your request" from "start a new session". Flag any diff that branches on session-id presence/lookup with a single 4xx. (#1707, #1770)
60+
61+
### Error Handling
62+
63+
- Broad `catch` blocks must not emit client-fault JSON-RPC codes (`-32700` ParseError, `-32602` InvalidParams) for server-internal failures like stream setup, task-store misses, or polling errors — map those to `-32603` InternalError so clients don't retry/reformat pointlessly. Flag any catch-all that hard-codes ParseError/InvalidParams without discriminating the thrown cause. (#1752, #1769)
64+
65+
### Schema Compliance
66+
67+
- When editing Zod protocol schemas in `schemas.ts`, verify unknown-key handling matches the spec `schema.ts`: if the spec type has no `additionalProperties: false`, the SDK schema must use `.passthrough()` / `.catchall(z.unknown())` rather than implicit strict — over-strict Zod (incl. `z.literal('object')` on `type`) rejects spec-valid payloads from other SDKs. Also confirm `spec.types.test.ts` still passes bidirectionally. (#1768, #1849, #1169)
68+
69+
### Async / Lifecycle
70+
71+
- In `close()` / shutdown paths, wrap user-supplied or chained callbacks (`onclose?.()`, cancel fns) in `try/finally` so a throw can't skip the remaining teardown (`abort()`, `_onclose()`, map clears) — otherwise the transport is left half-open. (#1735, #1763)
72+
- Deferred callbacks (`setTimeout`, `.finally()`, reconnect closures) must check closed/aborted state before mutating `this._*` or starting I/O — a callback scheduled pre-close can fire after close/reconnect and corrupt the new connection's state (e.g., delete the new request's `AbortController`). (#1735, #1763)
73+
74+
### Completeness
75+
76+
- When a PR replaces a pattern (error class, auth-flow step, catch shape), grep the package for surviving instances of the old form — partial migrations leave sibling code paths with the very bug the PR claims to fix. Flag every leftover site. (#1657, #1761, #1595)
77+
78+
### Documentation & Changesets
79+
80+
- Read added `.changeset/*.md` text and new inline comments against the implementation in the same diff — prose that promises behavior the code no longer ships misleads consumers and contradicts stated intent. Flag any claim the diff doesn't back. (#1718, #1838)
81+
82+
### CI & GitHub Actions
83+
84+
- Do **not** assert that a third-party GitHub Action or publish toolchain will fail or needs extra permissions/tokens without verifying its docs or source — `pnpm publish` delegates to the system npm CLI (so npm OIDC works), and `changesets/action` in publish mode has no PR-comment step requiring `pull-requests: write`. For diffs under `.github/workflows/`, confirm claimed behavior in the action's README/source before flagging. (#1838, #1836)

0 commit comments

Comments
 (0)