Commit 4cd57f7

ci: triage_ci_failure pulls failure context from public GitHub API (#14210)
The sequencer repo is public, so most of the GitHub Actions REST API is reachable unauthenticated. Rework the triage_ci_failure skill to fetch failure context directly (run/job/check-run metadata, annotations, re-run history) instead of telling the user it can't see CI and asking them to paste logs.

Key behaviors, validated live against real failures and via simulated claude.ai-web (no-auth) runs:
- Check a pasted run isn't stale (Graphite force-pushes) before triaging it.
- Detect "already green on re-run" and report flaky-but-not-blocking.
- Use filter=all so a failing earlier attempt isn't hidden by the default.
- Treat generic "exit code 1" annotations as no-signal; scope a hypothesis from the PR diff rather than going silent.
- Only ask the user when the cause is genuinely behind auth-gated logs.

Splits the endpoint catalog / pagination / rate-limit / tool-tier notes into references/github_api.md (progressive disclosure); SKILL.md is now the 79-line decision flow.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent a6c8403 commit 4cd57f7

2 files changed

Lines changed: 135 additions & 55 deletions

Lines changed: 45 additions & 55 deletions
@@ -1,89 +1,79 @@
11
---
22
name: triage_ci_failure
3-
description: Triage CI failures, flaky tests, and broken builds in the sequencer mono-repo. Auto-invoke when a user mentions a failing CI job, flaky test, red check, or pastes a GitHub Actions URL — context (PR link, CI job link, base branch) must be gathered BEFORE any code investigation begins.
3+
description: Triage CI failures, flaky tests, and broken builds in the sequencer mono-repo. Use when a user mentions a failing CI job, flaky test, red check, or shares a GitHub Actions / PR URL — the skill pulls failure context directly from the public GitHub REST API so you can usually diagnose and report a verdict without asking the user any follow-up questions.
44
---
55

66
# Triage CI Failure
77

8-
When invoked (typically because someone tagged Claude in the mono-repo Slack channel about a CI failure or flaky test), follow this workflow to gather context before investigating.
8+
Usually invoked when someone tags Claude in the mono-repo Slack channel about a CI failure. The repo (`starkware-libs/sequencer`) is **public**, so most of the GitHub REST API is reachable without auth. Your goal: diagnose and report a verdict **without asking follow-up questions**. Pull what failed, on which step, with which annotation, on which attempts — then report. Only fall back to "please paste the logs" when the public API genuinely can't get you there; never ask to confirm context you can already fetch.
99

10-
## Step 1: Gather Required Context
10+
**Endpoint catalog, pagination, rate limits, and which tools to use in each environment live in `references/github_api.md`.** Read it when you need an exact URL; this file is the decision flow. Substitute `O=starkware-libs`, `R=sequencer` throughout.
1111

12-
Before starting any investigation, you MUST have the following information. Check if any of these are missing from the message or thread:
12+
## Step 1: Resolve the input, then check it isn't stale
1313

14-
### Required Information
14+
From the message, extract a PR URL, run URL (`/actions/runs/{run_id}`), job URL (`.../job/{job_id}`), check-run id, commit SHA, or branch name. Key fact: **`job.id == check_run.id`** — one numeric id bridges "I have a job link" and "I want its annotations." With only a branch name, list recent failed runs on it before asking the user anything.
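A minimal sketch of the extraction step, assuming the URL shapes named above. The helper name `extract_ids` and the returned key names are illustrative, not part of the skill.

```python
import re

# Hypothetical helper: one regex per URL shape mentioned in Step 1.
_PATTERNS = {
    "pr": re.compile(r"/pull/(\d+)"),
    "run_id": re.compile(r"/actions/runs/(\d+)"),
    "job_id": re.compile(r"/job/(\d+)"),
}


def extract_ids(message: str) -> dict:
    """Return whichever of pr / run_id / job_id appear in the message."""
    found = {}
    for key, pattern in _PATTERNS.items():
        match = pattern.search(message)
        if match:
            found[key] = int(match.group(1))
    return found
```

Because `job.id == check_run.id`, a `job_id` found this way can be passed straight to the annotations endpoint.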
1515

16-
| Item | Why Needed | Example |
17-
|------|------------|---------|
18-
| **PR link** or **branch name** | To understand what code is being tested | `https://github.com/starkware-libs/sequencer/pull/12345` or `feature/my-branch` |
19-
| **Failed CI job link** | To get a `details_url` you can open and ask the user to paste relevant log lines from | `https://github.com/starkware-libs/sequencer/actions/runs/123456/job/789` |
20-
| **Base branch** | The branch this PR targets — check `scripts/parent_branch.txt` for the default, don't assume `main` | `main`, `release/v1.2`, `feature/epic-branch` |
21-
| **Is this a new failure or flaky?** | Determines investigation approach | "Started failing today" vs "Fails ~10% of runs" |
16+
**Triage the pasted link — it's usually accurate for the failure they want explained.** Diagnose that run/job even if the user re-ran it afterward (a common flow: paste a link, then re-run assuming it's flaky). As a *complementary* check, note whether the run is stale: this repo uses Graphite stacks, so a run's `head_sha` can lag the PR's current `head.sha` (`GET /pulls/{pr}`). If they differ, also report the current head's status — so a "merge-gatekeeper noise, harmless" verdict on an old SHA doesn't mask a genuinely red live head. Add that as context; don't discard the pasted run in favor of the current head.
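The staleness check can be sketched as a comparison of two already-fetched JSON objects. This assumes `run` is the body of `GET /actions/runs/{run_id}` and `pull` is the body of `GET /pulls/{pr}`; the function name and message wording are made up for illustration.

```python
from typing import Optional


def staleness_note(run: dict, pull: dict) -> Optional[str]:
    """Return a context note when the pasted run lags the PR head, else None."""
    run_sha = run["head_sha"]
    pr_sha = pull["head"]["sha"]
    if run_sha == pr_sha:
        return None
    # Stale run: keep triaging it, but add the current head as extra context.
    return (
        f"Pasted run is on {run_sha[:7]}; PR head is now {pr_sha[:7]}. "
        "Also report the current head's status."
    )
```

The note is additive: the pasted run is still the one being diagnosed.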
2217

23-
### Nice to Have
18+
## Step 2: Did it already go green on a re-run?
2419

25-
- Error message snippet (the available GitHub MCP tools only expose check-run metadata, not raw Actions log output, so a pasted snippet often unblocks the fastest investigation)
26-
- Whether this was working before a recent rebase
27-
- Related PRs or recent merges that might have caused regression
20+
`GET /actions/runs/{run_id}` → check `run_attempt`, `previous_attempt_url`, `conclusion`. If it was re-run (`run_attempt > 1` or `previous_attempt_url` set) **and** the latest `conclusion` is `success`, the workflow already passed on retry — **lead with "the PR isn't blocked."**
2821

29-
---
30-
31-
## Step 2: If Missing Information, Ask First
22+
But don't wave it away: a fail-then-pass with no code change is a **flaky test**, a real signal worth understanding. So still:
23+
1. Find the flaky job/step — `GET /actions/runs/{run_id}/jobs?filter=all` (the `filter=all` matters; the default hides the failed earlier attempt). Pull that job's annotations.
24+
2. Judge whether it's a known flake (see Step 4's flakiness note).
25+
3. Report a recommendation, e.g. *"Passed on re-run (attempt 2), PR not blocked. Attempt-1 failure was `run-integration-tests` — flaky; worth a tracking issue rather than per-PR re-runs."*
3226

33-
If ANY required information is missing, reply in the thread (Slack or PR comment, wherever you were invoked) asking for it. Do NOT start investigating with incomplete context.
27+
Corollary trap: a pasted *job* link can point at a failed earlier attempt while the run is now green. Reconcile the job's `run_attempt` against the run's current one (via `filter=all`) before calling anything broken.
3428

35-
**Template response:**
29+
**Diagnose the sporadic failure either way.** A green-on-latest run doesn't end the triage — if a failure happened (even once, even already re-run away), root-cause it via Steps 3–4. Continue below regardless; the only thing the green status changes is the "is the PR blocked?" answer.
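A small sketch of the Step 2 decision, assuming `run` is the parsed body of `GET /actions/runs/{run_id}`. The three return labels are hypothetical names chosen for this example.

```python
def rerun_verdict(run: dict) -> str:
    """Classify a run as passed-on-rerun, green, or not-green."""
    was_rerun = (run.get("run_attempt") or 1) > 1 or bool(run.get("previous_attempt_url"))
    if run.get("conclusion") == "success":
        # Green after a retry is still a flaky signal worth diagnosing.
        return "passed-on-rerun" if was_rerun else "green"
    # Covers failure, timed_out, cancelled, and runs still in progress.
    return "not-green"
```

Only the "is the PR blocked?" answer depends on this label; a `passed-on-rerun` result still continues to Steps 3 and 4.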
3630

37-
> To investigate this properly, I need a bit more context:
38-
>
39-
> - [ ] **PR/Branch**: Which PR or branch is failing? (link preferred)
40-
> - [ ] **CI Job**: Link to the failed job and, if convenient, paste the relevant error lines
41-
> - [ ] **Base branch**: What branch is this targeting? (don't assume main)
42-
> - [ ] **Failure pattern**: Is this a new failure or has it been flaky?
43-
>
44-
> Once I have these, I'll dig in!
31+
## Step 3: Fast path — PR to root cause in a few calls
4532

46-
Adapt this based on what's already provided — only ask for what's missing.
33+
1. `GET /pulls/{pr}` → `head.sha`, `base.ref`
34+
2. `GET /commits/{head.sha}/check-runs?filter=all&per_page=100` → every check-run at that SHA
35+
3. Keep `conclusion in ('failure','timed_out')`; **skip `cancelled`/`skipped`/`neutral`** — a `cancelled` job usually means a sibling failed first, so the cause is elsewhere
36+
4. For each failing check, `GET /check-runs/{check_id}/annotations` → the inline error (file + line + text) is usually all you need
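The filtering in item 3 can be sketched as below, assuming `payload` is the parsed body of the check-runs list call (an object with a `check_runs` array). The function name is illustrative.

```python
from __future__ import annotations

# Conclusions worth root-causing; cancelled/skipped/neutral are deliberately excluded.
FAILING = {"failure", "timed_out"}


def failing_check_runs(payload: dict) -> list[tuple[int, str]]:
    """Return (check_id, name) for each check-run that genuinely failed."""
    return [
        (check["id"], check["name"])
        for check in payload.get("check_runs", [])
        if check.get("conclusion") in FAILING
    ]
```

Each returned id can then be used for the annotations call in item 4.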
4737

48-
---
38+
**merge-gatekeeper / merge-gatekeeper-new** failing alone is a downstream alarm — something else failed first. Look at sibling check-runs at the same SHA or the previous attempt. Second mode: gatekeeper also fails by **timing out** waiting on a required check that never reached `success` (e.g. a `cancelled` sibling) — then there's *no* failed sibling at this SHA; the real red is usually on a newer SHA, i.e. the pasted run is stale (Step 1).
4939

50-
## Step 3: Verify the Context
40+
## Step 4: When annotations aren't enough
5141

52-
Once you have the required information:
42+
Annotations are the primary signal, but for test/build jobs they're often just `"Process completed with exit code 1"`. **Treat a generic exit-code annotation (or an empty one, or null `output.*`) as no signal**: the real assertion/panic is only in the raw step log, which needs auth (`/logs` is 403 unauthenticated; reach it via `gh run view --job {id} --log-failed` when authed; see `references/github_api.md`).
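One way to encode the "no signal" rule, assuming `annotations` is the bare JSON array from the annotations endpoint and each item carries a `message` field. The regex is an assumption about what counts as generic; widen it if other boilerplate messages turn up.

```python
from __future__ import annotations

import re

# Assumed shape of the generic runner message; any exit code is treated the same.
_GENERIC = re.compile(r"^Process completed with exit code \d+\.?$")


def has_signal(annotations: list[dict]) -> bool:
    """True if at least one annotation says more than 'exit code N'."""
    for annotation in annotations:
        message = (annotation.get("message") or "").strip()
        if message and not _GENERIC.match(message):
            return True
    return False
```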
5343

54-
1. **Open the PR** — use `mcp__github__pull_request_read` with `method=get` to confirm the base branch, changed files, and any existing review comments
55-
2. **Inspect the failed check** — use `method=get_check_runs` for status/conclusion and the `details_url`; for raw Actions logs you'll need the user to paste them (no MCP tool returns them directly)
56-
3. **Check if known flaky** — search CLAUDE.md "Common Gotchas" and recent Slack history for known flaky tests
57-
4. **Determine scope** — is this related to the PR's changes, or a pre-existing/infrastructure issue?
44+
When logs are unreachable, **don't go silent — narrow it from the diff.** Pull `pulls/{pr}/files`; if the failing job is `run-tests` and the PR edits `crates/foo/.../bar_test.rs` or a fixture, report a *scoped hypothesis* ("likely a `foo` test or stale fixture from this rename") plus the one confirming command. That beats both a bare "can't see logs" and a fabricated test name.
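A rough sketch of narrowing from the diff, assuming `files` is the array returned by `pulls/{pr}/files` (each item has a `filename`) and that crates live under a top-level `crates/<name>/` directory. The helper name is hypothetical.

```python
from __future__ import annotations


def touched_crates(files: list[dict]) -> list[str]:
    """Return the sorted crate names whose files the PR edits."""
    crates = set()
    for changed in files:
        parts = changed.get("filename", "").split("/")
        # Expect at least crates/<name>/<something>.
        if len(parts) > 2 and parts[0] == "crates":
            crates.add(parts[1])
    return sorted(crates)
```

If the failing job is a test job and this list is short, the listed crates are the natural scope for the hypothesis; an empty list suggests looking at infrastructure or the base branch instead.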
5845

59-
---
46+
**Flakiness check:** to tell flaky from newly-broken, see whether the same job fails in unrelated runs. Note many jobs here (`run-integration-tests`, `run-tests`) run only on `pull_request`, never `push` to `main` — so `branch=main&status=failure` won't show them and you'd wrongly conclude "not a known flake." For those, judge by (a) whether this run went green on re-run (Step 2, strongest signal) and (b) scanning recent failed runs of the same workflow across other PRs. Say which signal you used.
6047

61-
## Step 4: Investigate and Report
48+
## Step 5: Report — ask only when genuinely blocked
6249

63-
Only after completing steps 1-3, begin your investigation:
50+
You usually have enough to classify the failure yourself. Report directly; don't tack on reflexive questions — every needless "is this flaky for you?" trains the user to expect noise. Answer these yourself rather than asking:
51+
- **New or flaky?** → flakiness check above.
52+
- **Caused by this PR?** → diff `pulls/{pr}/files` against the failing crate/test path.
53+
- **Known pattern?** → see Common patterns below.
6454

65-
1. **If it's a code issue in the PR**: identify the root cause, propose a fix
66-
2. **If it's a known flaky test**: link to prior discussions, explain the flakiness pattern
67-
3. **If it's infrastructure/transient**: suggest a re-run and explain why
68-
4. **If unclear**: share what you found and what you'd need to dig deeper
55+
Ask the user *only* when a tool genuinely can't close the gap:
56+
- annotation empty/generic AND `output.text` empty AND no `gh`/MCP raw-log access → ask for a paste;
57+
- the cause hinges on something only they know (e.g. "did your last rebase pick up commit X?") → ask that.
6958

70-
Always report back in the thread with:
71-
- What you found
72-
- Whether action is needed
73-
- Proposed next steps (if any)
59+
Otherwise don't ask — report and move on.
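The first "ask" condition is a three-way conjunction, sketched below. The function and parameter names are illustrative; `output_text` stands for the check-run's `output.text`, which may be `None`.

```python
from typing import Optional


def should_ask_for_paste(
    annotation_has_signal: bool,
    output_text: Optional[str],
    has_raw_log_access: bool,
) -> bool:
    """Ask for a log paste only when every other source is exhausted."""
    output_is_empty = not (output_text or "").strip()
    return (not annotation_has_signal) and output_is_empty and (not has_raw_log_access)
```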
7460

75-
---
61+
## Step 6: Classify and report
7662

77-
## Step 5: Commit and Push
63+
1. **Code issue in the PR** — name the file/line, propose a fix
64+
2. **Known flaky test** — link prior discussion, suggest re-run
65+
3. **Infrastructure / transient** (network, action-download, GCloud) — suggest re-run, explain why
66+
4. **Pre-existing on the base branch** — call it out; the PR didn't cause it
7867

79-
When fixing the issue, create one commit per PR.
68+
State what you found, whether action is needed, and the next step.
8069

81-
---
70+
## Step 7: Fix only if asked
8271

83-
## Common Patterns in This Repo
72+
Apply a fix and commit **only** if the user explicitly asks. A triage request isn't an implicit "go patch it." Commit convention: `scope: subject` (no `feat:`/`fix:` prefix), one commit per PR.
8473

85-
From CLAUDE.md — these failures are often NOT code bugs:
74+
## Common patterns in this repo
8675

8776
- `blockifier_reexecution` — transient GCloud network issues; suggest re-run
88-
- `merge-gatekeeper` / `merge-gatekeeper-new` — downstream failures (other checks failed first)
89-
- Formatting failures — run `scripts/rust_fmt.sh` (uses pinned nightly toolchain), NOT `cargo fmt` directly
77+
- `merge-gatekeeper` / `merge-gatekeeper-new` — downstream/timeout failure; find the upstream cause (Step 3)
78+
- Formatting failures — run `scripts/rust_fmt.sh` (pinned nightly), NOT `cargo fmt` directly
79+
- Action-download failures from `codeload.github.com` (404/503) — GitHub-side flake; re-run
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
1+
# GitHub REST API reference for CI triage
2+
3+
All endpoints below are reachable **without authentication** on `starkware-libs/sequencer` (public repo) unless noted. Substitute `O=starkware-libs`, `R=sequencer`. Use `WebFetch` (claude.ai web), `curl`, or `mcp__github__*` tools.
4+
5+
## Contents
6+
- [Around the PR](#around-the-pr)
7+
- [Around the commit](#around-the-commit)
8+
- [Around the workflow run](#around-the-workflow-run)
9+
- [Around the check-run](#around-the-check-run)
10+
- [Pagination](#pagination)
11+
- [filter=latest vs filter=all](#filterlatest-vs-filterall)
12+
- [Rate limit](#rate-limit)
13+
- [Where raw logs live](#where-raw-logs-live)
14+
- [Tool-tier cheat sheet](#tool-tier-cheat-sheet)
15+
- [Patterns to recognize](#patterns-to-recognize)
16+
17+
## Around the PR
18+
19+
| Endpoint | What you learn |
20+
|---|---|
21+
| `GET /repos/{O}/{R}/pulls/{pr}` | `head.sha`, `head.ref`, `base.ref`, mergeable/draft state |
22+
| `GET /repos/{O}/{R}/pulls/{pr}/files?per_page=100` | All changed files — judge whether the failing test/crate is plausibly affected by this PR |
23+
| `GET /repos/{O}/{R}/issues/{pr}/comments?per_page=100` | Earlier triage discussion, prior re-run requests |
24+
| `GET /repos/{O}/{R}/issues/{pr}/events?per_page=100` | Re-runs, force-pushes, base-branch changes |
25+
26+
Also check `scripts/parent_branch.txt` in the checked-out repo: the base branch isn't always `main` (stacked PRs target feature branches), and assuming `main` misleads your "is this on the base branch too?" check.
27+
28+
## Around the commit
29+
30+
| Endpoint | What you learn |
31+
|---|---|
32+
| `GET /repos/{O}/{R}/commits/{sha}` | Author, message, files touched |
33+
| `GET /repos/{O}/{R}/commits/{sha}/check-runs?filter=all&per_page=100` | Every check-run at this SHA, including re-runs |
34+
| `GET /repos/{O}/{R}/commits/{sha}/status` | Legacy combined-status checks (some third-party integrations live here, not under `/check-runs`) |
35+
| `GET /repos/{O}/{R}/actions/runs?head_sha={sha}&per_page=100` | All workflow runs triggered by this commit |
36+
37+
## Around the workflow run
38+
39+
| Endpoint | What you learn |
40+
|---|---|
41+
| `GET /repos/{O}/{R}/actions/runs/{run_id}` | `name`, `conclusion`, `head_sha`, `head_branch`, `run_attempt`, `previous_attempt_url` |
42+
| `GET /repos/{O}/{R}/actions/runs/{run_id}/jobs?filter=all&per_page=100` | Every job + each step's name and conclusion → which *step* failed without log access. **Always pass `filter=all`**: the default returns only the latest attempt, so on a re-run the original failing job is hidden and you'll see all-green. |
43+
| `GET /repos/{O}/{R}/actions/runs/{run_id}/attempts/{n}/jobs?per_page=100` | Jobs from a specific prior attempt — compare across re-runs |
44+
| `GET /repos/{O}/{R}/actions/runs/{run_id}/artifacts?per_page=100` | Artifact names + URLs (artifact *download* needs auth) |
45+
| `GET /repos/{O}/{R}/actions/runs/{run_id}/timing` | Billable time per job — for "CI got slower" triage |
46+
| `GET /repos/{O}/{R}/actions/jobs/{job_id}` | Single job's status + `check_run_url` (when you have a job id but not a check id) |
47+
48+
## Around the check-run
49+
50+
`job.id == check_run.id` in GitHub Actions — the same numeric id works in both the Actions and Checks APIs.
51+
52+
| Endpoint | Notes |
53+
|---|---|
54+
| `GET /repos/{O}/{R}/check-runs/{check_id}/annotations?per_page=100` | **Primary failure signal.** Returns a bare JSON array (no `total_count`); paginate via `?page=N` + the `Link` header. |
55+
| `GET /repos/{O}/{R}/check-runs/{check_id}` | `output.title`/`summary`/`text` — useful when set, but `null` on most Actions check-runs. Treat as fallback. |
56+
57+
## Pagination
58+
59+
Default page size is 30; pass `per_page=100`. The Actions and check-run list endpoints return `{"total_count": N, "<items>": [...]}`; paginate until you've covered `total_count`. The annotations endpoint is an exception: it returns a bare array (as do the PR files, comments, and events lists), so use `?page=N` + the `Link` header.
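For bare-array endpoints, the next page comes from the `Link` response header. A minimal sketch of parsing it, assuming the standard `<url>; rel="next"` form; the helper name is made up.

```python
import re
from typing import Optional

# Matches the URL of the entry tagged rel="next" in a Link header.
_NEXT_LINK = re.compile(r'<([^>]+)>;\s*rel="next"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL, or None when this is the last page."""
    if not link_header:
        return None
    match = _NEXT_LINK.search(link_header)
    return match.group(1) if match else None
```

Keep fetching while this returns a URL; stop when it returns `None`.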
60+
61+
## filter=latest vs filter=all
62+
63+
On `/commits/{sha}/check-runs` and `/actions/runs/{id}/jobs`, `filter=latest` (the default) collapses re-runs to the latest attempt. Use `filter=all` to see re-run history and to catch a failing earlier attempt that the default hides — essential for flakiness diagnosis and for job links that point at an old attempt.
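With `filter=all`, the same check name can appear once per attempt. A sketch of using that to spot fail-then-pass checks, assuming `check_runs` is the `check_runs` array from the commit's check-run list; the function name is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def flaky_candidates(check_runs: list[dict]) -> list[str]:
    """Names that both failed and succeeded at the same SHA."""
    conclusions = defaultdict(set)
    for check in check_runs:
        name = check.get("name")
        if name is not None:
            conclusions[name].add(check.get("conclusion"))
    return sorted(
        name
        for name, seen in conclusions.items()
        if "success" in seen and seen & {"failure", "timed_out"}
    )
```

With the default `filter=latest`, this would always return an empty list, because only one attempt per name is visible.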
64+
65+
## Rate limit
66+
67+
Unauthenticated requests share a **60-per-hour-per-IP** quota. A thorough triage (PR + files + comments + check-runs + jobs + annotations + history) can run into it. If authenticated (`gh` locally or a GitHub MCP token), you get 5000/hr; prefer that. If unauthenticated, prioritize the few high-signal calls (re-run check + fast path) and only pull wider context when needed.
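GitHub reports the remaining quota in the `x-ratelimit-remaining` response header, so a budget check can be sketched as below. It assumes `headers` is a plain dict of response headers; the helper names and the `planned_calls` parameter are illustrative.

```python
from typing import Optional


def remaining_budget(headers: dict) -> Optional[int]:
    """Read x-ratelimit-remaining, case-insensitively; None if absent."""
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get("x-ratelimit-remaining")
    return int(value) if value is not None else None


def can_afford(headers: dict, planned_calls: int) -> bool:
    """True if the planned calls fit in the remaining quota (or it is unknown)."""
    remaining = remaining_budget(headers)
    return remaining is None or remaining >= planned_calls
```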
68+
69+
## Where raw logs live
70+
71+
`GET /actions/jobs/{job_id}/logs` and `GET /actions/runs/{run_id}/logs` return **403 without auth**. Raw log text is only reachable via:
72+
73+
1. **`gh` CLI, authed**: `gh run view --repo {O}/{R} --job {job_id} --log-failed` (add `--attempt {N}` for a specific attempt). Works on this public repo when `gh auth status` reports that you are logged in.
74+
2. **`mcp__github__*`** — exposes check-run metadata, not raw Actions logs (verify in-session; the surface evolves).
75+
3. **Ask the user to paste** — last resort.
76+
77+
## Tool-tier cheat sheet
78+
79+
| Environment | Primary fetch path | Raw logs? |
80+
|---|---|---|
81+
| Claude.ai web (no `gh`, no GitHub MCP) | `WebFetch` against the unauth endpoints above | No — ask user to paste if annotations didn't cover it |
82+
| Claude Code locally with `gh` authed | `gh` CLI for run/job/logs; `WebFetch`/`mcp__github__*` for the rest | Yes — `gh run view --log-failed` |
83+
| Claude Code, GitHub MCP only (no `gh`) | `mcp__github__*` for PR/check metadata; `WebFetch` for annotations + run/job lists | Verify if your MCP exposes Actions logs; if not, ask |
84+
85+
The metadata endpoints work in all three — they're the portable baseline.
86+
87+
## Patterns to recognize
88+
89+
- `head_branch` like `gh-readonly-queue/main/pr-NNNN-...` → a merge-queue run, not a regular PR run. The PR number is in the branch name.
90+
- `run_attempt > 1` or a non-null `previous_attempt_url` → someone re-ran it; comparing attempts is a quick flakiness check.
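The merge-queue pattern can be recognized with a single regex. This sketch assumes the branch shape shown above (`gh-readonly-queue/<base>/pr-<number>-<suffix>`); the function name is made up.

```python
import re
from typing import Optional, Tuple

# <base> may itself contain slashes, so match greedily up to the final /pr-NNNN-.
_MERGE_QUEUE = re.compile(r"^gh-readonly-queue/(.+)/pr-(\d+)-")


def merge_queue_pr(head_branch: str) -> Optional[Tuple[str, int]]:
    """Return (base_branch, pr_number) for a merge-queue branch, else None."""
    match = _MERGE_QUEUE.match(head_branch)
    if not match:
        return None
    return match.group(1), int(match.group(2))
```

A `None` result means the run came from a regular branch, so the PR has to be found some other way, for example from the run's `head_sha`.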
