feat[ci](pr-review): severity-based merge gate; exclude rules/filters/definitions from AI review

Kbayero · Kbayero · commit b6cad63100ce · 2026-06-15T16:26:22.000-04:00
diff --git a/.github/ai-prompts/README.md b/.github/ai-prompts/README.md
@@ -27,7 +27,7 @@ this exact shape (no markdown, no code fences, no extra text):
   "summary": "<one line, max 200 chars>",
   "findings": [
     {
-      "severity": "high" | "medium" | "low",
+      "severity": "critical" | "high" | "medium" | "low",
       "file": "<path>",
       "line": <int>,
       "message": "<description and mitigation>"
@@ -36,22 +36,33 @@ this exact shape (no markdown, no code fences, no extra text):
 }
 ```
 
+### Severity drives the merge gate
+
+The approver blocks the merge based on **severity**, not on how many findings
+there are. Pick the lowest severity that honestly fits — don't inflate a nit.
+
+- **`critical` / `high` → BLOCKING.** Something that can break: crashes, nil
+  dereferences, data loss/corruption, races/deadlocks, broken or unsafe DB
+  migrations, security holes, breaking API/proto/contract changes. These stop
+  auto-merge.
+- **`medium` / `low` → non-blocking WARNING.** Real but contained: missing
+  user feedback, inconsistent patterns, naming, typos in docs/strings, style.
+  Reported as warnings; the PR can still merge.
+
 ### Tier semantics
 
-- **Tier 1 — Approve.** The change is simple, doesn't touch critical logic,
-  no issues detected. The approver aggregates all tiers and, if every
-  prompt returns Tier 1, approves the PR.
-- **Tier 2 — Changes requested.** Minor issues the author must fix before
-  merging: typos, small bugs, out-of-context code, noticeable style
-  problems, incomplete mocks or tests.
-- **Tier 3 — Engineer review required.** The diff touches critical paths
-  (crypto, auth, DB migrations, installer, gRPC contracts, CI/CD, secret
-  handling) or introduces changes the model can't judge with sufficient
-  confidence. The approver blocks the merge and @mentions the senior
-  engineering team.
-
-The approver takes the **maximum tier** across all prompts: if security
-returns Tier 1 but architecture returns Tier 3, the final verdict is Tier 3.
+`tier` is a coarse signal. The gate uses severity for blocking, **plus** Tier 3:
+
+- **Tier 1** — fine to merge; no high/critical issues (minor warnings allowed).
+- **Tier 2** — at least one high-severity bug that should be fixed.
+- **Tier 3** — engineer review required / could break. Critical paths (crypto,
+  auth, DB migrations, installer, gRPC contracts, CI/CD, secret handling) or
+  changes the model can't judge confidently. Always blocks and @mentions the
+  team.
+
+**The merge is blocked if** any finding is `high`/`critical`, **or** any prompt
+returns Tier 3, **or** no review ran. Otherwise the approver approves the PR
+(any medium/low findings ride along as warnings).
 
 ### When there's nothing to report
 
@@ -60,10 +71,9 @@ Tier 1, a brief `summary` ("No security concerns detected.") and
 
 ### Unparseable responses
 
-If the model returns something that isn't valid JSON matching the schema,
-the approver treats it as **Tier 2** with a generic finding asking for
-manual review. Fail-safe behaviour — we'd rather block and ask for human
-review than let something pass without understanding it.
+If the model returns something that isn't valid JSON matching the schema, the
+approver treats it as a blocking `high` finding. Fail-safe behaviour — we'd
+rather hold for a human than let something pass without understanding it.
 
 ## Picking a model
 
diff --git a/.github/ai-prompts/bugs.md b/.github/ai-prompts/bugs.md
@@ -44,25 +44,31 @@ You are a senior code reviewer. Review the Pull Request diff looking for
 rest of the diff. Even in a 100-file PR dominated by backend changes, a
 single misspelling in a guide or a personal name in a customer-facing
 doc still warrants a finding — do not skip it because "the real work is
-elsewhere". When you find any of these, set tier to AT LEAST 2.
+elsewhere". Report these as `low`/`medium` (they're warnings, not blockers).
 
 **Ignore** preexisting issues on lines not touched by the diff.
 
-## How to assign tier
+## Severity (this is what blocks the merge)
 
-- **Tier 1** — No concrete bugs detected AND no user-facing string
-  anomalies (typos, internal references, contact info leaks). The change
-  looks correct.
-- **Tier 2** — Concrete but contained bugs the author must fix before
-  merging (off-by-one, error swallowing, unclosed resources,
-  out-of-context code). **Always Tier 2 minimum** if you find any
-  user-facing string anomaly: typos in docs/guides/messages, personal
-  names or internal handles in customer-facing content, internal URLs
-  or ticket IDs leaking into public docs.
-- **Tier 3** — A bug that may cause data corruption, deadlock, large-scale
-  leaks, or any issue whose impact the author shouldn't fix without a
-  second opinion. Also applies if the diff touches DB migrations, error
-  handling on transactional paths, or complex concurrency.
+Pick the lowest severity that honestly fits; don't inflate a nit.
+
+- **`critical` / `high` — blocking.** A bug that will actually break behavior:
+  nil/null deref, out-of-bounds, race/deadlock, goroutine/resource leak,
+  unhandled error on an important path, inverted logic, malformed query, a
+  migration that breaks existing data, out-of-context code that changes
+  behavior. Use `critical` for data corruption, deadlock, or large-scale leaks.
+- **`medium` / `low` — non-blocking warning.** Real but contained: missing
+  user feedback, inconsistent error-handling style, naming, typos in
+  docs/guides/messages, personal names or internal handles/URLs/ticket IDs in
+  customer-facing content.
+
+## Tier
+
+- **Tier 1** — no high/critical bugs (minor warnings are fine).
+- **Tier 2** — at least one high-severity bug to fix before merging.
+- **Tier 3** — could cause data corruption, deadlock, or large-scale leaks, or
+  the diff touches DB migrations, transactional error handling, or complex
+  concurrency and needs a second opinion.
 
 ## Output
 
@@ -73,7 +79,7 @@ Respond with valid JSON ONLY (no markdown, no backticks, no extra text):
   "tier": 1 | 2 | 3,
   "summary": "<one line, max 200 chars>",
   "findings": [
-    {"severity": "high"|"medium"|"low", "file": "<path>", "line": <n>, "message": "<description and how to reproduce>"}
+    {"severity": "critical"|"high"|"medium"|"low", "file": "<path>", "line": <n>, "message": "<description and how to reproduce>"}
   ]
 }
 ```
diff --git a/.github/scripts/ai-review.sh b/.github/scripts/ai-review.sh
@@ -47,10 +47,10 @@ write_fallback() {
             tier: 2,
             summary: "AI review could not parse model response — manual review recommended.",
             findings: [{
-                severity: "medium",
+                severity: "high",
                 file: "(n/a)",
                 line: 0,
-                message: $reason
+                message: ($reason + " (fail-safe: a review that cannot run is treated as blocking).")
             }]
         }' > "$OUTPUT_FILE"
     echo "::warning::Wrote fallback result: $reason"
@@ -83,6 +83,19 @@ MODEL="${prompt_model:-$DEFAULT_MODEL}"
 
 echo "::group::AI review — prompt: $prompt_name (model: $MODEL)"
 
+# --- Nothing to review -------------------------------------------------------
+# The diff can be empty after upstream filtering (e.g. a PR that only touches
+# excluded rules/filters/definitions paths). Pass as Tier 1 instead of calling
+# the model with an empty diff.
+if [[ ! -s "$DIFF_FILE" ]] || ! grep -q '[^[:space:]]' "$DIFF_FILE"; then
+    jq -n --arg prompt "$prompt_name" --arg model "$MODEL" \
+        '{prompt: $prompt, model: $model, tier: 1, summary: "No reviewable changes in this diff (excluded paths only).", findings: []}' \
+        > "$OUTPUT_FILE"
+    echo "Empty diff — wrote Tier 1 pass."
+    echo "::endgroup::"
+    exit 0
+fi
+
 # --- Build request body ------------------------------------------------------
 
 prompt_body=$(tail -n "+${body_start}" "$PROMPT_FILE")
diff --git a/.github/scripts/approver.sh b/.github/scripts/approver.sh
@@ -143,48 +143,73 @@ fi
 
 # =============================================================================
 # 2. AI verdict — read every ai-review-* artifact
+#
+# Gate policy (severity-based): only HIGH/CRITICAL findings — or an explicit
+# Tier 3 (critical path / needs a human), or a review that couldn't run —
+# block the merge. MEDIUM/LOW findings are surfaced as non-blocking warnings
+# and do NOT stop auto-merge.
 # =============================================================================
 
 declare -a ai_results=()
 declare -i max_tier=1
+has_block_sev=false      # any high/critical finding
+has_any_findings=false   # any finding at all (for warning vs clean wording)
 ai_findings_md=""
 
 shopt -s nullglob
 for d in "$ARTIFACTS_DIR"/ai-review-*/; do
     f="${d}result.json"
     [[ -f "$f" ]] || continue
     ai_results+=("$f")
+
     tier=$(jq -r '.tier // 2' "$f")
-    if (( tier > max_tier )); then
-        max_tier=$tier
+    (( tier > max_tier )) && max_tier=$tier
+
+    if jq -e '[(.findings // [])[].severity // "" | ascii_downcase] | any(. == "high" or . == "critical")' "$f" >/dev/null 2>&1; then
+        has_block_sev=true
+    fi
+    if jq -e '((.findings // []) | length) > 0' "$f" >/dev/null 2>&1; then
+        has_any_findings=true
     fi
 done
 shopt -u nullglob
 
+no_ai=false
 if [[ ${#ai_results[@]} -eq 0 ]]; then
-    echo "::warning::No AI review artifacts — treating as tier 2 fail-safe"
-    max_tier=2
+    echo "::warning::No AI review artifacts — fail-safe block"
+    no_ai=true
 fi
 
-# Build a markdown section per AI prompt result.
+# Final AI gate: block on a high/critical finding, an explicit Tier 3, or a
+# review that did not run. Tier 2 on its own (only medium/low) does NOT block.
+ai_blocked=false
+if $no_ai || $has_block_sev || (( max_tier >= 3 )); then
+    ai_blocked=true
+fi
+
+# Build a markdown section per AI prompt result, labelled by what it found
+# (blocking high/critical vs non-blocking warnings vs clean).
 for f in "${ai_results[@]}"; do
     prompt=$(jq -r '.prompt // "unknown"' "$f")
     model=$(jq -r '.model // "?"' "$f")
-    tier=$(jq -r '.tier // 2' "$f")
     summary=$(jq -r '.summary // "(no summary)"' "$f")
+    p_block=$(jq -r '[(.findings // [])[].severity // "" | ascii_downcase] | any(. == "high" or . == "critical")' "$f" 2>/dev/null || echo false)
+    p_count=$(jq -r '(.findings // []) | length' "$f" 2>/dev/null || echo 0)
+    p_tier=$(jq -r '.tier // 2' "$f")
     findings=$(jq -r '
-        .findings // [] |
+        (.findings // []) |
         if length == 0 then "  _No findings._"
         else
             map("  - **\(.severity // "?")** `\(.file // "?"):\(.line // "?")` — \(.message // "")") | join("\n")
         end
     ' "$f")
-    case "$tier" in
-        1) icon="✅" label="Tier 1 — looks clean" ;;
-        2) icon="⚠️" label="Tier 2 — changes requested" ;;
-        3) icon="🛑" label="Tier 3 — engineer review required" ;;
-        *) icon="❓" label="Tier ?" ;;
-    esac
+    if [[ "$p_block" == "true" || "$p_tier" == "3" ]]; then
+        icon="🛑" label="blocking — must fix before merge"
+    elif (( p_count > 0 )); then
+        icon="⚠️" label="non-blocking warnings"
+    else
+        icon="✅" label="clean"
+    fi
     ai_findings_md+=$'\n'"#### $icon \`$prompt\` (\`$model\`) — $label"$'\n\n'
     ai_findings_md+="**Summary:** $summary"$'\n\n'
     ai_findings_md+="$findings"$'\n'
@@ -219,32 +244,30 @@ else
 fi
 
 # --- AI verdict comment (always) ---
-case "$max_tier" in
-    1)
-        ai_header="### ✅ AI review — Approved"
-        ai_intro="All prompts returned Tier 1. No blocking issues detected in this diff."
-        ;;
-    2)
-        ai_header="### ⚠️ AI review — Changes requested"
-        ai_intro="One or more prompts found issues the author should fix before merging. Details below."
-        ;;
-    3)
-        mention=""
-        if [[ -n "$TIER3_REVIEWERS" ]]; then
-            IFS=',' read -ra handles <<< "$TIER3_REVIEWERS"
-            for h in "${handles[@]}"; do
-                h="$(echo "$h" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's/^@//')"
-                [[ -n "$h" ]] && mention+="@$h "
-            done
-        fi
-        ai_header="### 🛑 AI review — Engineer review required"
-        ai_intro="This PR touches critical paths or introduces changes the model cannot judge with sufficient confidence. ${mention}please review."
-        ;;
-    *)
-        ai_header="### ❓ AI review — Unknown verdict"
-        ai_intro="The approver could not determine an overall tier."
-        ;;
-esac
+if $no_ai; then
+    ai_header="### ❓ AI review — could not run"
+    ai_intro="No AI results were produced, so the merge is held for a human. Check the workflow logs."
+elif (( max_tier >= 3 )); then
+    mention=""
+    if [[ -n "$TIER3_REVIEWERS" ]]; then
+        IFS=',' read -ra handles <<< "$TIER3_REVIEWERS"
+        for h in "${handles[@]}"; do
+            h="$(echo "$h" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's/^@//')"
+            [[ -n "$h" ]] && mention+="@$h "
+        done
+    fi
+    ai_header="### 🛑 AI review — Engineer review required"
+    ai_intro="This PR touches critical paths or introduces changes the model cannot judge with sufficient confidence. ${mention}please review."
+elif $has_block_sev; then
+    ai_header="### 🛑 AI review — Blocking issues"
+    ai_intro="One or more high/critical issues can break things and must be fixed before merging. Details below."
+elif $has_any_findings; then
+    ai_header="### ✅ AI review — Approved with warnings"
+    ai_intro="Only minor (medium/low) issues were found. They won't block the merge, but consider addressing them."
+else
+    ai_header="### ✅ AI review — Approved"
+    ai_intro="No issues detected in this diff."
+fi
 
 ai_body=$(cat <<EOF
 $ai_header
@@ -317,18 +340,18 @@ fi
 # =============================================================================
 
 if [[ -n "$APPROVER_TOKEN" || "$DRY_RUN" == "1" ]]; then
-    if ! $deps_failed && [[ "$max_tier" == "1" ]] && $authorized; then
+    if ! $deps_failed && ! $ai_blocked && $authorized; then
         review_event="APPROVE"
-        review_body="Approved by AI review (Tier 1, deps OK, authorized author)."
+        review_body="Approved — no blocking issues, deps OK, authorized author. Any non-blocking warnings are listed above."
     elif ! $authorized; then
         review_event="REQUEST_CHANGES"
         review_body="Author is not in @${ORG}/${ADMIN_TEAM} nor @${ORG}/${CORE_TEAM} — admin review required."
-    elif [[ "$max_tier" == "3" ]] || $deps_failed; then
+    elif $deps_failed; then
         review_event="REQUEST_CHANGES"
-        review_body="Changes requested — see approver comments above."
+        review_body="Changes requested — Go dependencies check failed (see above)."
     else
         review_event="REQUEST_CHANGES"
-        review_body="Changes requested by AI review (Tier 2)."
+        review_body="Changes requested — AI review found blocking issues (high/critical, or engineer review required). See above."
     fi
 
     if [[ "$DRY_RUN" == "1" ]]; then
@@ -359,7 +382,7 @@ fi
 # stays pending.
 # =============================================================================
 
-if ! $deps_failed && [[ "$max_tier" == "1" ]] && $authorized; then
+if ! $deps_failed && ! $ai_blocked && $authorized; then
     if [[ "$BASE_REF" == release/* ]]; then
         if [[ "$DRY_RUN" == "1" ]]; then
             echo "[DRY_RUN] Would enable auto-merge: gh pr merge $PR_NUMBER --auto --$MERGE_METHOD (base: $BASE_REF)"
@@ -383,10 +406,12 @@ fi
 
 echo ""
 echo "Summary:"
-echo "  deps_failed: $deps_failed"
-echo "  max_tier:    $max_tier"
-echo "  authorized:  $authorized"
-echo "  base_ref:    $BASE_REF"
+echo "  deps_failed:    $deps_failed"
+echo "  max_tier:       $max_tier"
+echo "  has_block_sev:  $has_block_sev"
+echo "  ai_blocked:     $ai_blocked"
+echo "  authorized:     $authorized"
+echo "  base_ref:       $BASE_REF"
 
 if $deps_failed; then
     exit 1
@@ -396,7 +421,7 @@ if ! $authorized; then
     exit 1
 fi
 
-if [[ "$max_tier" -ge 2 ]]; then
+if $ai_blocked; then
     exit 1
 fi
 
diff --git a/.github/workflows/_pr-reusable-ai-review.yml b/.github/workflows/_pr-reusable-ai-review.yml