Skip to content

Commit b6cad63

Browse files
committed
feat[ci](pr-review): severity-based merge gate; exclude rules/filters/definitions from AI review
1 parent 6468c5f commit b6cad63

5 files changed

Lines changed: 176 additions & 87 deletions

File tree

.github/ai-prompts/README.md

Lines changed: 29 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@ this exact shape (no markdown, no code fences, no extra text):
2727
"summary": "<one line, max 200 chars>",
2828
"findings": [
2929
{
30-
"severity": "high" | "medium" | "low",
30+
"severity": "critical" | "high" | "medium" | "low",
3131
"file": "<path>",
3232
"line": <int>,
3333
"message": "<description and mitigation>"
@@ -36,22 +36,33 @@ this exact shape (no markdown, no code fences, no extra text):
3636
}
3737
```
3838

39+
### Severity drives the merge gate
40+
41+
The approver blocks the merge based on **severity**, not on how many findings
42+
there are. Pick the lowest severity that honestly fits — don't inflate a nit.
43+
44+
- **`critical` / `high` → BLOCKING.** Something that can break: crashes, nil
45+
dereferences, data loss/corruption, races/deadlocks, broken or unsafe DB
46+
migrations, security holes, breaking API/proto/contract changes. These stop
47+
auto-merge.
48+
- **`medium` / `low` → non-blocking WARNING.** Real but contained: missing
49+
user feedback, inconsistent patterns, naming, typos in docs/strings, style.
50+
Reported as warnings; the PR can still merge.
51+
3952
### Tier semantics
4053

41-
- **Tier 1 — Approve.** The change is simple, doesn't touch critical logic,
42-
no issues detected. The approver aggregates all tiers and, if every
43-
prompt returns Tier 1, approves the PR.
44-
- **Tier 2 — Changes requested.** Minor issues the author must fix before
45-
merging: typos, small bugs, out-of-context code, noticeable style
46-
problems, incomplete mocks or tests.
47-
- **Tier 3 — Engineer review required.** The diff touches critical paths
48-
(crypto, auth, DB migrations, installer, gRPC contracts, CI/CD, secret
49-
handling) or introduces changes the model can't judge with sufficient
50-
confidence. The approver blocks the merge and @mentions the senior
51-
engineering team.
52-
53-
The approver takes the **maximum tier** across all prompts: if security
54-
returns Tier 1 but architecture returns Tier 3, the final verdict is Tier 3.
54+
`tier` is a coarse signal. The gate uses severity for blocking, **plus** Tier 3:
55+
56+
- **Tier 1** — fine to merge; no high/critical issues (minor warnings allowed).
57+
- **Tier 2** — at least one high-severity bug that should be fixed.
58+
- **Tier 3** — engineer review required / could break. Critical paths (crypto,
59+
auth, DB migrations, installer, gRPC contracts, CI/CD, secret handling) or
60+
changes the model can't judge confidently. Always blocks and @mentions the
61+
team.
62+
63+
**The merge is blocked if** any finding is `high`/`critical`, **or** any prompt
64+
returns Tier 3, **or** no review ran. Otherwise the approver approves the PR
65+
(any medium/low findings ride along as warnings).
5566

5667
### When there's nothing to report
5768

@@ -60,10 +71,9 @@ Tier 1, a brief `summary` ("No security concerns detected.") and
6071

6172
### Unparseable responses
6273

63-
If the model returns something that isn't valid JSON matching the schema,
64-
the approver treats it as **Tier 2** with a generic finding asking for
65-
manual review. Fail-safe behaviour — we'd rather block and ask for human
66-
review than let something pass without understanding it.
74+
If the model returns something that isn't valid JSON matching the schema, the
75+
approver treats it as a blocking `high` finding. Fail-safe behaviour — we'd
76+
rather hold for a human than let something pass without understanding it.
6777

6878
## Picking a model
6979

.github/ai-prompts/bugs.md

Lines changed: 22 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -44,25 +44,31 @@ You are a senior code reviewer. Review the Pull Request diff looking for
4444
rest of the diff. Even in a 100-file PR dominated by backend changes, a
4545
single misspelling in a guide or a personal name in a customer-facing
4646
doc still warrants a finding — do not skip it because "the real work is
47-
elsewhere". When you find any of these, set tier to AT LEAST 2.
47+
elsewhere". Report these as `low`/`medium` (they're warnings, not blockers).
4848

4949
**Ignore** preexisting issues on lines not touched by the diff.
5050

51-
## How to assign tier
51+
## Severity (this is what blocks the merge)
5252

53-
- **Tier 1** — No concrete bugs detected AND no user-facing string
54-
anomalies (typos, internal references, contact info leaks). The change
55-
looks correct.
56-
- **Tier 2** — Concrete but contained bugs the author must fix before
57-
merging (off-by-one, error swallowing, unclosed resources,
58-
out-of-context code). **Always Tier 2 minimum** if you find any
59-
user-facing string anomaly: typos in docs/guides/messages, personal
60-
names or internal handles in customer-facing content, internal URLs
61-
or ticket IDs leaking into public docs.
62-
- **Tier 3** — A bug that may cause data corruption, deadlock, large-scale
63-
leaks, or any issue whose impact the author shouldn't fix without a
64-
second opinion. Also applies if the diff touches DB migrations, error
65-
handling on transactional paths, or complex concurrency.
53+
Pick the lowest severity that honestly fits; don't inflate a nit.
54+
55+
- **`critical` / `high` — blocking.** A bug that will actually break behavior:
56+
nil/null deref, out-of-bounds, race/deadlock, goroutine/resource leak,
57+
unhandled error on an important path, inverted logic, malformed query, a
58+
migration that breaks existing data, out-of-context code that changes
59+
behavior. Use `critical` for data corruption, deadlock, or large-scale leaks.
60+
- **`medium` / `low` — non-blocking warning.** Real but contained: missing
61+
user feedback, inconsistent error-handling style, naming, typos in
62+
docs/guides/messages, personal names or internal handles/URLs/ticket IDs in
63+
customer-facing content.
64+
65+
## Tier
66+
67+
- **Tier 1** — no high/critical bugs (minor warnings are fine).
68+
- **Tier 2** — at least one high-severity bug to fix before merging.
69+
- **Tier 3** — could cause data corruption, deadlock, or large-scale leaks, or
70+
the diff touches DB migrations, transactional error handling, or complex
71+
concurrency and needs a second opinion.
6672

6773
## Output
6874

@@ -73,7 +79,7 @@ Respond with valid JSON ONLY (no markdown, no backticks, no extra text):
7379
"tier": 1 | 2 | 3,
7480
"summary": "<one line, max 200 chars>",
7581
"findings": [
76-
{"severity": "high"|"medium"|"low", "file": "<path>", "line": <n>, "message": "<description and how to reproduce>"}
82+
{"severity": "critical"|"high"|"medium"|"low", "file": "<path>", "line": <n>, "message": "<description and how to reproduce>"}
7783
]
7884
}
7985
```

.github/scripts/ai-review.sh

Lines changed: 15 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -47,10 +47,10 @@ write_fallback() {
4747
tier: 2,
4848
summary: "AI review could not parse model response — manual review recommended.",
4949
findings: [{
50-
severity: "medium",
50+
severity: "high",
5151
file: "(n/a)",
5252
line: 0,
53-
message: $reason
53+
message: ($reason + " (fail-safe: a review that cannot run is treated as blocking).")
5454
}]
5555
}' > "$OUTPUT_FILE"
5656
echo "::warning::Wrote fallback result: $reason"
@@ -83,6 +83,19 @@ MODEL="${prompt_model:-$DEFAULT_MODEL}"
8383

8484
echo "::group::AI review — prompt: $prompt_name (model: $MODEL)"
8585

86+
# --- Nothing to review -------------------------------------------------------
87+
# The diff can be empty after upstream filtering (e.g. a PR that only touches
88+
# excluded rules/filters/definitions paths). Pass as Tier 1 instead of calling
89+
# the model with an empty diff.
90+
if [[ ! -s "$DIFF_FILE" ]] || ! grep -q '[^[:space:]]' "$DIFF_FILE"; then
91+
jq -n --arg prompt "$prompt_name" --arg model "$MODEL" \
92+
'{prompt: $prompt, model: $model, tier: 1, summary: "No reviewable changes in this diff (excluded paths only).", findings: []}' \
93+
> "$OUTPUT_FILE"
94+
echo "Empty diff — wrote Tier 1 pass."
95+
echo "::endgroup::"
96+
exit 0
97+
fi
98+
8699
# --- Build request body ------------------------------------------------------
87100

88101
prompt_body=$(tail -n "+${body_start}" "$PROMPT_FILE")

.github/scripts/approver.sh

Lines changed: 75 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -143,48 +143,73 @@ fi
143143

144144
# =============================================================================
145145
# 2. AI verdict — read every ai-review-* artifact
146+
#
147+
# Gate policy (severity-based): only HIGH/CRITICAL findings — or an explicit
148+
# Tier 3 (critical path / needs a human), or a review that couldn't run —
149+
# block the merge. MEDIUM/LOW findings are surfaced as non-blocking warnings
150+
# and do NOT stop auto-merge.
146151
# =============================================================================
147152

148153
declare -a ai_results=()
149154
declare -i max_tier=1
155+
has_block_sev=false # any high/critical finding
156+
has_any_findings=false # any finding at all (for warning vs clean wording)
150157
ai_findings_md=""
151158

152159
shopt -s nullglob
153160
for d in "$ARTIFACTS_DIR"/ai-review-*/; do
154161
f="${d}result.json"
155162
[[ -f "$f" ]] || continue
156163
ai_results+=("$f")
164+
157165
tier=$(jq -r '.tier // 2' "$f")
158-
if (( tier > max_tier )); then
159-
max_tier=$tier
166+
(( tier > max_tier )) && max_tier=$tier
167+
168+
if jq -e '[(.findings // [])[].severity // "" | ascii_downcase] | any(. == "high" or . == "critical")' "$f" >/dev/null 2>&1; then
169+
has_block_sev=true
170+
fi
171+
if jq -e '((.findings // []) | length) > 0' "$f" >/dev/null 2>&1; then
172+
has_any_findings=true
160173
fi
161174
done
162175
shopt -u nullglob
163176

177+
no_ai=false
164178
if [[ ${#ai_results[@]} -eq 0 ]]; then
165-
echo "::warning::No AI review artifacts — treating as tier 2 fail-safe"
166-
max_tier=2
179+
echo "::warning::No AI review artifacts — fail-safe block"
180+
no_ai=true
167181
fi
168182

169-
# Build a markdown section per AI prompt result.
183+
# Final AI gate: block on a high/critical finding, an explicit Tier 3, or a
184+
# review that did not run. Tier 2 on its own (only medium/low) does NOT block.
185+
ai_blocked=false
186+
if $no_ai || $has_block_sev || (( max_tier >= 3 )); then
187+
ai_blocked=true
188+
fi
189+
190+
# Build a markdown section per AI prompt result, labelled by what it found
191+
# (blocking high/critical vs non-blocking warnings vs clean).
170192
for f in "${ai_results[@]}"; do
171193
prompt=$(jq -r '.prompt // "unknown"' "$f")
172194
model=$(jq -r '.model // "?"' "$f")
173-
tier=$(jq -r '.tier // 2' "$f")
174195
summary=$(jq -r '.summary // "(no summary)"' "$f")
196+
p_block=$(jq -r '[(.findings // [])[].severity // "" | ascii_downcase] | any(. == "high" or . == "critical")' "$f" 2>/dev/null || echo false)
197+
p_count=$(jq -r '(.findings // []) | length' "$f" 2>/dev/null || echo 0)
198+
p_tier=$(jq -r '.tier // 2' "$f")
175199
findings=$(jq -r '
176-
.findings // [] |
200+
(.findings // []) |
177201
if length == 0 then " _No findings._"
178202
else
179203
map(" - **\(.severity // "?")** `\(.file // "?"):\(.line // "?")` — \(.message // "")") | join("\n")
180204
end
181205
' "$f")
182-
case "$tier" in
183-
1) icon="" label="Tier 1 — looks clean" ;;
184-
2) icon="⚠️" label="Tier 2 — changes requested" ;;
185-
3) icon="🛑" label="Tier 3 — engineer review required" ;;
186-
*) icon="" label="Tier ?" ;;
187-
esac
206+
if [[ "$p_block" == "true" || "$p_tier" == "3" ]]; then
207+
icon="🛑" label="blocking — must fix before merge"
208+
elif (( p_count > 0 )); then
209+
icon="⚠️" label="non-blocking warnings"
210+
else
211+
icon="" label="clean"
212+
fi
188213
ai_findings_md+=$'\n'"#### $icon \`$prompt\` (\`$model\`) — $label"$'\n\n'
189214
ai_findings_md+="**Summary:** $summary"$'\n\n'
190215
ai_findings_md+="$findings"$'\n'
@@ -219,32 +244,30 @@ else
219244
fi
220245

221246
# --- AI verdict comment (always) ---
222-
case "$max_tier" in
223-
1)
224-
ai_header="### ✅ AI review — Approved"
225-
ai_intro="All prompts returned Tier 1. No blocking issues detected in this diff."
226-
;;
227-
2)
228-
ai_header="### ⚠️ AI review — Changes requested"
229-
ai_intro="One or more prompts found issues the author should fix before merging. Details below."
230-
;;
231-
3)
232-
mention=""
233-
if [[ -n "$TIER3_REVIEWERS" ]]; then
234-
IFS=',' read -ra handles <<< "$TIER3_REVIEWERS"
235-
for h in "${handles[@]}"; do
236-
h="$(echo "$h" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's/^@//')"
237-
[[ -n "$h" ]] && mention+="@$h "
238-
done
239-
fi
240-
ai_header="### 🛑 AI review — Engineer review required"
241-
ai_intro="This PR touches critical paths or introduces changes the model cannot judge with sufficient confidence. ${mention}please review."
242-
;;
243-
*)
244-
ai_header="### ❓ AI review — Unknown verdict"
245-
ai_intro="The approver could not determine an overall tier."
246-
;;
247-
esac
247+
if $no_ai; then
248+
ai_header="### ❓ AI review — could not run"
249+
ai_intro="No AI results were produced, so the merge is held for a human. Check the workflow logs."
250+
elif (( max_tier >= 3 )); then
251+
mention=""
252+
if [[ -n "$TIER3_REVIEWERS" ]]; then
253+
IFS=',' read -ra handles <<< "$TIER3_REVIEWERS"
254+
for h in "${handles[@]}"; do
255+
h="$(echo "$h" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's/^@//')"
256+
[[ -n "$h" ]] && mention+="@$h "
257+
done
258+
fi
259+
ai_header="### 🛑 AI review — Engineer review required"
260+
ai_intro="This PR touches critical paths or introduces changes the model cannot judge with sufficient confidence. ${mention}please review."
261+
elif $has_block_sev; then
262+
ai_header="### 🛑 AI review — Blocking issues"
263+
ai_intro="One or more high/critical issues can break things and must be fixed before merging. Details below."
264+
elif $has_any_findings; then
265+
ai_header="### ✅ AI review — Approved with warnings"
266+
ai_intro="Only minor (medium/low) issues were found. They won't block the merge, but consider addressing them."
267+
else
268+
ai_header="### ✅ AI review — Approved"
269+
ai_intro="No issues detected in this diff."
270+
fi
248271

249272
ai_body=$(cat <<EOF
250273
$ai_header
@@ -317,18 +340,18 @@ fi
317340
# =============================================================================
318341

319342
if [[ -n "$APPROVER_TOKEN" || "$DRY_RUN" == "1" ]]; then
320-
if ! $deps_failed && [[ "$max_tier" == "1" ]] && $authorized; then
343+
if ! $deps_failed && ! $ai_blocked && $authorized; then
321344
review_event="APPROVE"
322-
review_body="Approved by AI review (Tier 1, deps OK, authorized author)."
345+
review_body="Approved — no blocking issues, deps OK, authorized author. Any non-blocking warnings are listed above."
323346
elif ! $authorized; then
324347
review_event="REQUEST_CHANGES"
325348
review_body="Author is not in @${ORG}/${ADMIN_TEAM} nor @${ORG}/${CORE_TEAM} — admin review required."
326-
elif [[ "$max_tier" == "3" ]] || $deps_failed; then
349+
elif $deps_failed; then
327350
review_event="REQUEST_CHANGES"
328-
review_body="Changes requested — see approver comments above."
351+
review_body="Changes requested — Go dependencies check failed (see above)."
329352
else
330353
review_event="REQUEST_CHANGES"
331-
review_body="Changes requested by AI review (Tier 2)."
354+
review_body="Changes requested AI review found blocking issues (high/critical, or engineer review required). See above."
332355
fi
333356

334357
if [[ "$DRY_RUN" == "1" ]]; then
@@ -359,7 +382,7 @@ fi
359382
# stays pending.
360383
# =============================================================================
361384

362-
if ! $deps_failed && [[ "$max_tier" == "1" ]] && $authorized; then
385+
if ! $deps_failed && ! $ai_blocked && $authorized; then
363386
if [[ "$BASE_REF" == release/* ]]; then
364387
if [[ "$DRY_RUN" == "1" ]]; then
365388
echo "[DRY_RUN] Would enable auto-merge: gh pr merge $PR_NUMBER --auto --$MERGE_METHOD (base: $BASE_REF)"
@@ -383,10 +406,12 @@ fi
383406

384407
echo ""
385408
echo "Summary:"
386-
echo " deps_failed: $deps_failed"
387-
echo " max_tier: $max_tier"
388-
echo " authorized: $authorized"
389-
echo " base_ref: $BASE_REF"
409+
echo " deps_failed: $deps_failed"
410+
echo " max_tier: $max_tier"
411+
echo " has_block_sev: $has_block_sev"
412+
echo " ai_blocked: $ai_blocked"
413+
echo " authorized: $authorized"
414+
echo " base_ref: $BASE_REF"
390415

391416
if $deps_failed; then
392417
exit 1
@@ -396,7 +421,7 @@ if ! $authorized; then
396421
exit 1
397422
fi
398423

399-
if [[ "$max_tier" -ge 2 ]]; then
424+
if $ai_blocked; then
400425
exit 1
401426
fi
402427

0 commit comments

Comments
 (0)