
Commit 1d3c2d5

feat(batch-bug-shepherd): recommendation-fold loop, Copilot+CI gates, mergeability table
Refactor the batch-bug-shepherd skill into a single shepherd-driver convergence loop that closes four production gaps surfaced by the in-flight bug-queue sweep:

1. Recommendation-fold loop. Every panel CEO follow-up and Copilot inline review item is run through assets/fold-vs-defer-rubric.md and folded unless it crosses the PR's stated scope. Fold is the default; defer is the scope-creep exception and carries a one-line scope_boundary_crossed note.
2. Copilot PR review address loop. Phase X.0 fetches the copilot-pull-request-reviewer[bot] review per assets/copilot-classification-prompt.md, classifies each item as LEGIT or NOT-LEGIT, and folds the LEGIT items into the same iteration. Copilot fetches are capped at 2 rounds.
3. Post-push CI verification loop. gh pr checks --watch runs after every push, and assets/ci-recovery-checklist.md buckets failures (lint / test / infra / unknown) under a 3-iteration cap.
4. Orchestrator ownership signal. On pickup, the orchestrator assigns the shepherd actor and applies the status/shepherding label; the label is cleared when the PR reaches a terminal state.

New asset assets/shepherd-driver-prompt.md replaces the old shepherd-prompt / completion-prompt split. New supporting assets: fold-vs-defer-rubric.md, copilot-classification-prompt.md, ci-recovery-checklist.md, strategic-alignment-prompt.md, conflict-resolution-prompt.md, progress-diagram.md. A new references/ directory holds mergeability-gate.md and strategic-alignment-gate.md. The genesis design record lives in design.md.

Mergeability status table (new in this commit). Shepherd-driver step X.8 captures a per-PR mergeability snapshot via gh pr view <n> --json mergeable,mergeStateStatus,statusCheckRollup immediately after the last push. The snapshot lands as a one-row table in the PR advisory comment (final-report-template.md PR ADVISORY COMMENT block) and is aggregated by the orchestrator at saga-end into a Mergeability status table in the FINAL REPORT block (PR, head SHA, CEO stance, outer iterations, folds, deferrals, Copilot rounds, CI status, mergeable, mergeStateStatus, notes). verdict-schema.json gains four optional completion-return fields: head_sha, mergeable, merge_state_status, ci_status.

Validated on the wave-2 shepherd run that drove PRs #1472, #1512, #1513, #1514, #1515, and #1516 to advisory-terminal. PR #1514 hit 4 outer iterations with 11 folds + 1 deferral, exercising the fold-by-default discipline at the cap.

CHANGELOG entry under [Unreleased] / Added.

Lint notes: this commit touches NO Python (the .agents/ skill files are markdown + JSON, plus CHANGELOG markdown). The only applicable lint gates are the ASCII guard and bash scripts/lint-auth-signals.sh, both silent. ruff, pylint, and ruff format are skipped because .apm/instructions/linting.instructions.md scopes them to src/ and tests/ only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 47291a8 commit 1d3c2d5

18 files changed

Lines changed: 2452 additions & 357 deletions

.agents/skills/batch-bug-shepherd/SKILL.md

Lines changed: 310 additions & 206 deletions
Large diffs are not rendered by default.
Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
# CI recovery checklist

Consumed by: `assets/shepherd-driver-prompt.md` (after every push),
`assets/fix-prompt.md` (when a greenfield-fix PR turns red on its
first CI run).

Every push from a shepherd-driver or fix subagent MUST be followed
by CI observation. A push that is not observed green is not a
landing candidate.

ASCII only.

## Watch contract

```
gh pr checks <PR> --repo microsoft/apm --watch
```

`--watch` blocks until the check set is conclusive. If `--watch` is
unavailable in the runtime's `gh` version, fall back to polling:

```
while true; do
  out=$(gh pr checks <PR> --repo microsoft/apm)
  echo "$out"
  # Empty stdout means gh reported no checks yet (common right after a
  # push) or the call failed; keep waiting rather than settling early.
  if [ -z "$out" ]; then sleep 30; continue; fi
  # Settled once no row reports a non-terminal state.
  echo "$out" | grep -qE '(pending|queued|in_progress|running)' || break
  sleep 30
done
```

Settle on one of: ALL GREEN, ANY FAIL, ANY CANCELLED.

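The settle step can be made mechanical. Below is a minimal sketch, not part of the shepherd tooling: `classify_checks` is a hypothetical helper, and it assumes that when `gh pr checks` output is piped (non-TTY), `gh` prints one tab-separated row per check with the check's bucket (`pass`, `fail`, `pending`, `skipping`, `cancel`) in the second column. Check that layout against the `gh` version in use. Counting `skipping` as green is also an assumption.

```sh
# Hypothetical helper. ASSUMES non-TTY `gh pr checks` rows are
# tab-separated with the bucket (pass/fail/pending/skipping/cancel)
# in column 2. Reads rows on stdin, prints one settle state.
classify_checks() {
  awk '
    BEGIN { FS = "\t" }
    NF > 0 { rows++ }
    $2 == "pending" { pending = 1 }
    $2 == "fail"    { fail = 1 }
    $2 == "cancel"  { cancel = 1 }
    END {
      if (!rows)        print "NO CHECKS"      # nothing reported yet
      else if (pending) print "PENDING"        # keep watching
      else if (fail)    print "ANY FAIL"
      else if (cancel)  print "ANY CANCELLED"
      else              print "ALL GREEN"      # pass + skipping only
    }'
}
```

Usage: `gh pr checks <PR> --repo microsoft/apm | classify_checks`. `PENDING` and `NO CHECKS` mean the set is not yet conclusive, so keep watching.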
## On ALL GREEN

Proceed to the next step in the shepherd-driver loop (Copilot
re-fetch + panel re-run, or the final advisory if convergence is
reached). Record the green check summary in `ci_evidence`.

## On ANY FAIL or CANCELLED

For each failing check:

```
gh run view <run-id> --repo microsoft/apm --log-failed
```

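`gh run view` takes a run id, while each check reported by `gh pr checks` carries a link. A minimal sketch for recovering the run id, assuming the check is a GitHub Actions job whose link has the shape `https://github.com/<owner>/<repo>/actions/runs/<run-id>/job/<job-id>` (`run_id_from_link` is a hypothetical helper; checks from external CI print nothing and must be triaged by hand):

```sh
# Hypothetical helper: print the Actions run id embedded in a check link.
# ASSUMES the link shape .../actions/runs/<run-id>/job/<job-id>; prints
# nothing when the link does not match (e.g. a non-Actions check).
run_id_from_link() {
  printf '%s\n' "$1" | sed -n 's|.*/actions/runs/\([0-9][0-9]*\).*|\1|p'
}
```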
Classify the failure into one of four buckets:

### Bucket 1 -- lint failure

Symptom: `ruff check` or `ruff format --check` reports findings; pylint
R0801 fires on a duplication threshold; one of the repo's grep
guards (YAML I/O, file length, `relative_to`, auth-signals) fires.

Recovery:
1. Re-run the CI-mirror chain LOCALLY per
   `.apm/instructions/linting.instructions.md`.
2. Auto-fix: `uv run --extra dev ruff check src/ tests/ --fix` and
   `uv run --extra dev ruff format src/ tests/`.
3. Re-run the full chain until it is silent.
4. Commit (one commit per logical fix; ASCII commit message; include
   `Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>`).
5. Push. Re-enter watch.

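Step 4's commit-message rules can be checked before committing. A minimal sketch, assuming the message is drafted in a file first; `check_commit_msg` is hypothetical and is not the repo's own ASCII guard:

```sh
# Hypothetical pre-commit check for step 4: the message must be printable
# ASCII (tabs allowed) and carry the Copilot co-author trailer.
TRAILER='Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>'

check_commit_msg() {
  msg_file=$1
  # In the C locale, [^<TAB> -~] matches any byte outside tab + 0x20-0x7E.
  if LC_ALL=C grep -q "$(printf '[^\t -~]')" "$msg_file"; then
    echo "non-ASCII or control byte in commit message" >&2
    return 1
  fi
  # -x: whole-line match; -F: fixed string, so '+' and '.' are literal.
  if ! grep -qxF "$TRAILER" "$msg_file"; then
    echo "missing Co-authored-by trailer" >&2
    return 1
  fi
}
```

Usage: `check_commit_msg msg.txt && git commit -F msg.txt`.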
### Bucket 2 -- test failure

Symptom: pytest red in the failing job log.

Recovery:
1. Reproduce the failing test locally: `uv run --extra dev pytest -xvs <node-id>`.
2. Read the trace and identify the root cause. If the test asserts on
   new behavior this PR introduces, fix the production code; if the
   test was already flaky before this PR, fix only the test and add a
   clear comment explaining the fix.
3. Re-run the test until green.
4. Re-run the broader suite for touched modules.
5. Confirm the lint chain is silent.
6. Commit + push + re-enter watch.

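Step 2 depends on telling a pre-existing flake from a real regression, and the checklist does not prescribe how. One hedged heuristic is to repeat the test several times, ideally on the parent commit, and count passes. `repeat_run` is a hypothetical helper:

```sh
# Hypothetical helper: run a command N times and print "<passes>/<N>".
# Output of the command itself is discarded; only its exit status counts.
repeat_run() {
  n=$1; shift
  passes=0
  i=0
  while [ "$i" -lt "$n" ]; do
    if "$@" >/dev/null 2>&1; then passes=$((passes + 1)); fi
    i=$((i + 1))
  done
  echo "$passes/$n"
}
```

For example, `repeat_run 5 uv run --extra dev pytest -x <node-id>`. A mixed result such as `3/5` on an unchanged test points at flakiness; `0/5` points at a real failure.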
### Bucket 3 -- CI infra hiccup (transient)

Symptoms: network timeout fetching dependencies, runner pre-empted,
GitHub Actions service disruption, dependency mirror 5xx, action
checkout failure unrelated to the diff. Typically the same job passed
minutes earlier on the parent commit.

Recovery:
1. `gh run rerun <run-id> --failed --repo microsoft/apm`.
2. Watch again.
3. Each run-id gets at most ONE re-run. If the same job fails again
   after that re-run, it is no longer treated as transient --
   escalate to Bucket 4.

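The one-re-run-per-run-id rule needs memory across watch cycles. A minimal sketch, assuming the shepherd-driver keeps a scratch state file listing the run ids it has already re-run (`infra_next_step` and the state file are hypothetical):

```sh
# Hypothetical helper: decide between the single allowed re-run and
# escalation. The state file holds one already-re-run run id per line.
infra_next_step() {
  run_id=$1
  state_file=$2
  if [ -f "$state_file" ] && grep -qxF "$run_id" "$state_file"; then
    echo "escalate"   # this run id already had its re-run: Bucket 4
  else
    printf '%s\n' "$run_id" >> "$state_file"
    echo "rerun"      # spend the one re-run on this run id
  fi
}
```

`rerun` maps to `gh run rerun <run-id> --failed --repo microsoft/apm`; `escalate` means continue at Bucket 4.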
### Bucket 4 -- persistent unknown failure

Symptom: the failure does not match buckets 1-3; the same job fails
twice; the diff does not obviously explain the failure.

Recovery:
1. Record the failing job name, the run-id URL, and a 30-line
   excerpt of the failing log in the shepherd-driver scratch
   context.
2. If the PR's CI recovery iteration counter is below 3, try
   ONE more fix attempt (e.g. revert the most recent suspect
   commit; re-run). If it succeeds, record both the symptom and the
   fix.
3. If the CI recovery iteration counter hits 3, STOP. Return
   `status: blocked` with the failing job + log excerpt in the
   `blocker` field. Remove the `status/shepherding` label. The advisory
   comment names the failing job and points the maintainer at the
   run URL.

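Step 1's record can be assembled straight from the failed-step log. A minimal sketch; `blocker_note` is hypothetical, and keeping the last 30 lines (where job failures usually surface) is an assumption about which excerpt to take:

```sh
# Hypothetical helper: build the Bucket 4 record from a failing log on
# stdin. ASSUMES the useful 30-line excerpt is the tail of the log.
blocker_note() {
  job=$1
  run_url=$2
  printf 'job: %s\nrun: %s\nlog excerpt:\n' "$job" "$run_url"
  tail -n 30
}
```

For example, `gh run view <run-id> --repo microsoft/apm --log-failed | blocker_note "<job name>" "<run URL>"`. The label removal in step 3 can be done with `gh pr edit <PR> --repo microsoft/apm --remove-label status/shepherding`.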
## Iteration cap

**Hard cap: 3 CI fix iterations per shepherd-driver run.** Beyond
that, the loop terminates with `status: blocked`. The cap covers all
buckets combined (a sequence of lint-then-test-then-infra counts as
three).

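The cap is a single counter shared by all buckets. A minimal sketch (the names `ci_fix_attempt`, `CI_ITERATIONS`, and `CI_MAX_ITERATIONS` are hypothetical) that refuses a fourth fix attempt of any kind:

```sh
# Hypothetical shared counter for the hard cap. Call ci_fix_attempt in the
# current shell, not inside $(...), or the increment is lost in a subshell.
CI_ITERATIONS=0
CI_MAX_ITERATIONS=3

ci_fix_attempt() {
  bucket=$1
  if [ "$CI_ITERATIONS" -ge "$CI_MAX_ITERATIONS" ]; then
    echo "cap reached before $bucket fix: return status: blocked" >&2
    return 1
  fi
  CI_ITERATIONS=$((CI_ITERATIONS + 1))
  echo "CI fix iteration $CI_ITERATIONS/$CI_MAX_ITERATIONS ($bucket)"
}
```

Call `ci_fix_attempt <bucket>` before each lint fix, test fix, or infra re-run; a non-zero return means the run ends with `status: blocked`, and the final `CI_ITERATIONS` is the value to report as `ci_iterations`.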
## What flows back

The shepherd-driver records in its return (`ci_iterations` is an
integer from 0 to 3):

```json
{
  "ci_iterations": 0,
  "ci_evidence": "<URL of the final green run, or a summary of the failing job for blocked status>"
}
```

.agents/skills/batch-bug-shepherd/assets/completion-prompt.md

Lines changed: 0 additions & 91 deletions
This file was deleted.
