
Commit 1d203b1

feat(agentic-ci): decision-ready triage and daily PR fixes (#600)
* feat(agentic-ci): decision-ready triage and daily PR fixes

  Reorganize the weekly issue-triage report around recommended actions (close as resolved, close as duplicate, needs maintainer decision, ready for assignment, stuck PR, duplicate PRs, stale) so each flagged item carries action + evidence + rationale and can be resolved without opening it. Multi-comment split with i/N markers and orphan reconciliation when the report grows or shrinks.

  Flip the four daily audit suites with mechanical fix categories from read-only reports to opening one PR per run:

  - docs-and-references: broken-link, docstring-drift, arch-ref-rename
  - structure: missing-future, lazy-import
  - dependencies: transitive-gap, unused
  - code-quality: bare-except (draft until landing rate proven)

  test-health stays report-only (all candidates require inferring intent).

  The shared procedure - fix_backlog selection, finding-hash spec for stable cross-run identification, attempted_fixes lifecycle with two-strike escalation, allowlists, ranking, branch/PR conventions - lives in .agents/recipes/_fix-policy.md. Each suite recipe declares only its eligible categories, branch types, and test requirements.

  Workflow runs claude twice per suite (audit, then conditionally fix), each capped at the existing --max-turns 50. Fix call is gated on non-empty fix_backlog and skipped entirely for test-health.

* fix(agentic-ci): address review findings before merge

  - Map per-package test targets explicitly in _fix-policy.md (Makefile exposes test-config/test-engine/test-interface, not test-<package>).
  - Use github-actions[bot] noreply identity for commits the recipes produce.
  - Refresh fix_backlog.data when an id already exists so the fix phase cannot drive a PR from stale data after the underlying file changed.
  - Stop time-pruning closed/abandoned attempted_fixes entries — pruning before the two-strike threshold erases the history needed to escalate. Single-strike entries now age out only via the 200-entry cap.
  - Disambiguate bare-except findings within the same function by including a try-body hash in the finding id.
  - Audit grep for code-quality now matches both `except:` and `except BaseException:`, in parity with the fix eligibility.
  - Restrict transitive-gap fix eligibility to cases where a sibling package already declares the dep (avoids inventing version specifiers from scratch).
  - Issue-triage workflow handles multi-part reports in both the fallback post step and the job summary; recipe always writes numbered parts.

* fix(agentic-ci): close residuals from review pass 2

  - Replace remaining `make test-<package>` references with pointers to the mapping table; only the table itself uses that placeholder now.
  - Fix `gh api --paginate | jq | length` returning per-page counts: slurp with `jq -s 'add // 0'` to get a single total.
  - Compare posted-comment count to expected part count so a partial post (agent posted part 1 but not 2/3) triggers the fallback instead of being silently treated as success.
  - Add `shell: bash` to triage steps using `shopt`/`mapfile` so they're not at the mercy of the runner's default shell.
  - Disambiguate bare-except findings whose try-body hashes collide by adding a per-function ordinal to the canonical_key.
  - Tie the 200-entry attempted_fixes cap eviction to `attempts[0].at` (the schema has no `first_seen` field).

* fix(agentic-ci): identity-based partial-post detection in triage fallback

  Replace the count-only POSTED_COUNT >= EXPECTED_PARTS check with an identity-based check that extracts every i/N marker seen in today-dated bot comments and verifies each expected i is present. A duplicate post of one part can no longer mask a missing other.

* fix(agentic-ci): close remaining bot-review findings

  - Exempt two-strike attempted_fixes entries from the 200-entry cap eviction. Cap now evicts non-two-strike oldest-first by attempts[0].at; two-strike entries are silently-forgotten only in the pathological all-200-are-two-strike case (itself a signal).
  - Specify the attempted_fixes PR-marker reconciliation algorithm: scan open PR bodies for the `<!-- agentic-ci finding=<id> -->` marker and back-fill missing entries.
  - Tighten the daily workflow conditionals to gate on explicit step outcomes (steps.audit.outcome == 'success' rather than success()) so a future pre-audit gate cannot accidentally trip the fix step.

* fix(agentic-ci): close Greptile pass-2 findings (timeout, re-verify wording)

  - Bump daily-suite job timeout from 20 to 40 minutes. The split into two sequential `claude --max-turns 50` invocations can saturate a 20-minute budget; a mid-fix SIGTERM would leave an orphaned branch and inconsistent runner-state.
  - Disambiguate the `_phase-fix.md` "do NOT re-scan" rule. It forbids rebuilding fix_backlog from scratch but does NOT override the per-candidate re-verification step required by _fix-policy.md step 4.1 (re-grep / re-read the specific file the candidate points at). Single-candidate re-verification is required; whole-codebase re-scanning is forbidden.

* fix(agentic-ci): close Greptile pass-3 P1s in triage fallback

  - Guard `jq capture()` with a `test()` select. `capture()` errors on non-match instead of returning empty, which would truncate SEEN_PARTS if any unrelated today-dated bot comment lacks the triage marker (e.g. from a sibling workflow). Adding the test() guard ensures capture() only runs on bodies that already match.
  - Iterate the MISSING[] array when posting fallback parts, not the full PARTS[] array. Posting all parts when only some were missing was creating duplicate comments for the parts the agent already successfully posted.

* fix(agentic-ci): close johnnygreco review-pass warnings

  Address the five Warnings from the 2026-05-07 review focused on the trust boundary for autonomous PR generation. Five workflow/policy adjustments shrink the surface where agent compliance is load-bearing:

  - Workflow-level scope gate. After the fix step, re-derive the diff against `origin/main` and validate against the per-suite path allowlist (regex mirrored from `_fix-policy.md`), the 50-LOC cap, and the 3-file cap. On violation, close the PR with `--delete-branch` and flip the `attempted_fixes` entry from `open` to `abandoned` so two-strike logic still sees the failure. The recipe alone could not bind the agent's path choices; the workflow now does.
  - Dependencies install-dev verification. For the dependencies suite only, re-run `make install-dev` after the scope gate so the agent's pyproject edit is exercised against the lockfile resolver. Closes the PR if `install-dev` fails — catches the failure mode where the per-package test target passed against the old cached lockfile.
  - Flip matrix-job `cancel-in-progress` from true to false. A cancellation between the agent's git push and `gh pr create` would leave an orphaned branch with no `attempted_fixes` record; reconciliation only covers PRs that were opened. Queueing a duplicate run is the lesser evil. `_fix-policy.md` Atomicity section now documents the trade-off.
  - Allow `/tmp/audit-{{suite}}.md` in `_phase-audit.md`'s "do not modify outside `{{memory_path}}/`" directive. A literal-minded agent could refuse to write the report file, which would break the job summary, artifact upload, and the fix phase's audit context.
  - Always upload the agent log artifact (was `if: failure()` only) and include `runner-state.json`. For autonomous mode, the most interesting failure is "the workflow succeeded but the PR was wrong"; the stream-json log is the only way to look back days later.

  Also takes johnnygreco's Suggestion 2: spell out in the policy doc that the `draft_until_proven` flip is the sole human-gated promotion step in the fix policy and must not be automated.

  Greptile and the github-actions auto-reviewer's findings were already closed in the prior pass-2/pass-3 commits; no action needed on those.

* fix(agentic-ci): close Codex review-pass-2 findings on workflow gates

  Codex flagged five issues in the prior commit's scope/lockfile gates. This commit closes all five:

  - HIGH: Wrong-PR targeting. Both gates selected the last globally-open attempted_fixes entry, which could match a stale orphan from a prior crashed run rather than the PR opened by *this* run. Adds a pre-fix snapshot step that captures `(id, attempts-length)` pairs before the fix runs, and changes the post-fix selectors to require that the entry's attempts count grew during this run.
  - HIGH: Docstring-only enforcement gap on the docs-and-references suite. The .py path allowlist was at workflow level but the docstring-only caveat was still policy-only. Adds an AST-based check: for each .py file changed, parse the post-change tree, collect docstring line ranges (module/class/function), then verify every added line in the diff is either inside a docstring, a comment, or whitespace. Verified locally with both pass and fail fixtures.
  - MEDIUM: Diff-ref mismatch. Gates diffed `origin/main...HEAD` rather than `origin/main...origin/$BRANCH`, so a misbehaving agent that left HEAD pointing elsewhere would have validated the wrong tree. Now fetches `origin/$BRANCH` first and prefers that ref. Falls back to HEAD only if fetch fails (with a warning).
  - MEDIUM: FILE_COUNT bug. `grep -c '.' || echo 0` produced "0\n0" on empty diff, breaking the downstream integer comparison. Replaces with `mapfile -t FILE_ARR` + `${#FILE_ARR[@]}`, which is correct for any input including empty.
  - LOW: Non-atomic JSON writes. The runner-state mutations could leave the file half-written if the workflow was cancelled mid-write. Switches both gates to the temp-file + os.replace pattern.

  Also: dependencies-lockfile gate now does an explicit `git checkout --detach origin/$BRANCH` before re-running install-dev, so verification runs against what was actually pushed rather than relying on local working-tree state.

* fix(agentic-ci): gate fix + scope_gate steps on snapshot.outcome

  Greptile review on 872d561 flagged that the fix step's custom `if:` expression bypasses GitHub Actions' implicit success() check. Without explicitly referencing steps.snapshot.outcome, a snapshot failure (corrupt runner-state, disk error) would let the fix step run anyway. The scope gate's `jq --slurpfile prior /tmp/prior-attempted-fixes.json` would then exit non-zero on the missing file, leave OPEN empty, and hit the "nothing to validate" early-exit — silently approving whatever the agent pushed.

  Adds steps.snapshot.outcome == 'success' to both the fix step's condition (the actual fix) and the scope_gate step's condition (belt-and-suspenders against future refactors).

* fix(agentic-ci): harden daily fix gates

  Signed-off-by: Andre Manoel <amanoel@nvidia.com>

* fix(agentic-ci): validate all grown fix attempts

* fix(agentic-ci): harden post-fix gates

---------

Signed-off-by: Andre Manoel <amanoel@nvidia.com>
1 parent 46dc8b2 commit 1d203b1

12 files changed

Lines changed: 1187 additions & 153 deletions


.agents/recipes/_fix-policy.md

Lines changed: 260 additions & 0 deletions
@@ -0,0 +1,260 @@
# Agentic CI Fix Policy

Prepended to every daily-suite recipe alongside `_runner.md`. Defines what
"open a PR" means for these recipes and the rules that apply across all of
them. Each suite recipe declares only its eligible finding categories, its
branch types, and any risk-specific notes — everything else is here.

When in doubt, fall back to report-only.

## Localized fix bar

A finding may be converted to a fix only if all hold:

- **Bounded scope**: ≤3 files, ≤50 LOC net.
- **Reversible**: no public API changes, no `__all__` deletions, no version
  bumps (Dependabot owns those), no schema changes, no migrations.
- **Self-evident**: the audit established both the problem *and* the unique
  correct fix. Mechanical, not interpretive.
- **Test-safe**: when the recipe declares `test_required`, run the
  per-package test target for the affected package and abort on failure.
  Mapping (the Makefile does not expose `test-<package>` directly):

  | Package directory | Test target |
  |-------------------|-------------|
  | `packages/data-designer-config` | `make test-config` |
  | `packages/data-designer-engine` | `make test-engine` |
  | `packages/data-designer` | `make test-interface` |
- **Single concern**: one finding per PR.
- **Allowlisted paths**: matches the suite's path allowlist.

If the top-ranked candidate fails the bar, try the next. If none of the top
5 qualify, skip the fix step and emit report-only.

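The package-to-target mapping in the table above can be sketched as a small lookup. This is a minimal illustration, not the recipe's actual code; the helper name `make_target_for` is hypothetical, and only the three directory/target pairs come from the table.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Pairs copied from the mapping table; everything else here is illustrative.
TEST_TARGETS = {
    "packages/data-designer-config": "test-config",
    "packages/data-designer-engine": "test-engine",
    "packages/data-designer": "test-interface",
}


def make_target_for(changed_path: str) -> str | None:
    """Return the make target for the package owning changed_path, or None."""
    parts = PurePosixPath(changed_path).parts
    if len(parts) < 2:
        return None
    # Compare whole path components, so "packages/data-designer-config/..."
    # is never mistaken for the shorter "packages/data-designer" prefix.
    return TEST_TARGETS.get("/".join(parts[:2]))
```

Matching on the first two path components rather than on a string prefix matters here because one package directory name is a prefix of the other two.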
## Allowlists

### Per-suite path allowlist

| Suite | Paths the recipe MAY modify |
|-------|-----------------------------|
| docs-and-references | `architecture/**`, `docs/**`, `README.md`, `CONTRIBUTING.md`, `DEVELOPMENT.md`, `STYLEGUIDE.md`, `packages/*/src/**/*.py` (docstring-only edits) |
| dependencies | `packages/*/pyproject.toml` |
| structure | `packages/*/src/**/*.py` |
| code-quality | `packages/*/src/**/*.py` |
| test-health | (no fix phase) |

### Shared forbidden paths (all suites)

- `.github/workflows/**`, `.agents/**`, repo-root `pyproject.toml`,
  `.git/**`, anything in `.gitignore`.

### Shared forbidden commands

- `git push --force` (any variant), `git rebase`, `git reset --hard`,
  `git branch -D`/`-d`/`--delete`.
- `gh pr merge`, `gh pr close`, `gh pr review`.
- `pip install`, `uv pip install` (use `make install-dev` only).

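A path check against the per-suite allowlist and the shared forbidden paths might look like the sketch below. The regexes are a hand translation of the globs in the tables and are an assumption; the workflow's real patterns may differ. The sketch does not cover the "anything in `.gitignore`" rule or the docstring-only caveat for `.py` files in docs-and-references.

```python
import re

# Hypothetical regex translation of the per-suite glob table.
ALLOWED = {
    "docs-and-references": (
        r"^(architecture/.+|docs/.+|README\.md|CONTRIBUTING\.md|"
        r"DEVELOPMENT\.md|STYLEGUIDE\.md|packages/[^/]+/src/.+\.py)$"
    ),
    "dependencies": r"^packages/[^/]+/pyproject\.toml$",
    "structure": r"^packages/[^/]+/src/.+\.py$",
    "code-quality": r"^packages/[^/]+/src/.+\.py$",
    # test-health has no fix phase, so it has no entry.
}

# Shared forbidden paths; the bare "pyproject.toml" is the repo-root file.
FORBIDDEN = r"^(\.github/workflows/.+|\.agents/.+|pyproject\.toml|\.git/.+)$"


def path_allowed(suite: str, path: str) -> bool:
    """True if a repo-relative path may be modified by the given suite."""
    pattern = ALLOWED.get(suite)
    if pattern is None or re.match(FORBIDDEN, path):
        return False
    return re.match(pattern, path) is not None
```

Checking the forbidden list first means a forbidden path stays rejected even if a suite pattern is later widened by mistake.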
## Runner-state schema

Each daily recipe maintains two arrays in
`{{memory_path}}/runner-state.json` beyond the existing `known_issues` /
`baselines`:

```json
{
  "fix_backlog": [
    { "id": "<hash>", "category": "...", "first_seen": "YYYY-MM-DD",
      "last_seen": "YYYY-MM-DD", "data": { /* category fields */ } }
  ],
  "attempted_fixes": [
    { "id": "<hash>", "attempts": [
      { "pr_number": 612, "outcome": "merged", "at": "YYYY-MM-DD",
        "branch": "agentic-ci/..." }
    ] }
  ]
}
```

Also: `draft_until_proven` (boolean, per-suite, default `true` for
code-quality and unset elsewhere) controls draft-PR mode.

### `fix_backlog` rules (audit phase populates this)

- Append every detected finding in an eligible category. If `id` is already
  present, **refresh both `last_seen` and `data`** with the current scan's
  values. The `data` field is used by the fix phase to apply the change
  without re-scanning, so stale `data` would let an old plan drive a new
  PR after the underlying file moved or changed.
- Drop entries with `last_seen` older than 30 days.
- Cap at 200 entries (drop oldest by `first_seen`).
- Populated **before** the `known_issues` filter so fixable findings persist
  even when their report row is suppressed for being unchanged.

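The append-or-refresh, age-out, and cap rules above could be sketched as follows. The function names are hypothetical, and the recipe is executed by an agent rather than by this code; the field names follow the schema.

```python
from datetime import date, timedelta


def upsert_finding(backlog, finding_id, category, data, today):
    """Append a finding, or refresh last_seen and data if the id exists."""
    iso = today.isoformat()
    for entry in backlog:
        if entry["id"] == finding_id:
            entry["last_seen"] = iso
            entry["data"] = data  # never let a stale plan drive a new PR
            return
    backlog.append({"id": finding_id, "category": category,
                    "first_seen": iso, "last_seen": iso, "data": data})


def prune_backlog(backlog, today, max_age_days=30, cap=200):
    """Drop entries not seen within max_age_days, then keep the newest cap."""
    cutoff = (today - timedelta(days=max_age_days)).isoformat()
    # ISO dates compare correctly as strings.
    kept = [e for e in backlog if e["last_seen"] >= cutoff]
    kept.sort(key=lambda e: e["first_seen"])
    return kept[-cap:]  # drops the oldest by first_seen when over the cap
```

Note that `first_seen` is left untouched on refresh, so the cap keeps evicting by original discovery date.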
### `attempted_fixes` rules

`outcome` ∈ `{open, merged, closed, abandoned}`.

- `abandoned` means the recipe could not produce a PR (tests failed,
  conflict, lint failed, allowlist rejected, etc.).
- Reconcile at the start of each fix run. First refresh existing latest
  `open` attempts that have a `pr_number`: query the PR and flip the
  attempt to `merged` or `closed` if it is no longer open. Then recover
  from crashes that left state un-updated: list open PRs (`gh pr list`)
  whose bodies contain the
  `<!-- agentic-ci finding=<id> suite=<suite> -->` marker, parse out
  each `<id>`, and back-fill any missing `attempted_fixes` entries with
  `outcome: "open"` and the parsed `pr_number` and `branch`.
- Prune: drop `merged` entries older than 90 days. Do **not** prune
  `closed` or `abandoned` entries by age — pruning a single-strike entry
  would erase the history needed to ever reach the two-strike threshold.
- The 200-entry cap handles long-tail cleanup. Eviction order:
  non-two-strike entries first, oldest-first by `attempts[0].at`.
  Two-strike entries (≥2 `closed`/`abandoned`) are exempt from cap
  eviction unless every other entry has already been evicted — they
  represent maintainer-action signals and must not be silently
  forgotten. If two-strike entries alone exceed 200, that's itself a
  signal worth surfacing; in that pathological case, evict oldest-first
  by `attempts[0].at`.
- Two-strike entries surface in the report under
  `Repeatedly-failed fix attempts` and are filtered from selection
  permanently.

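The two-strike test and the cap eviction order described above can be sketched as a sort key. This assumes every entry has at least one attempt; the function names are hypothetical.

```python
def is_two_strike(entry):
    """True when at least two attempts ended closed or abandoned."""
    failed = [a for a in entry["attempts"]
              if a["outcome"] in ("closed", "abandoned")]
    return len(failed) >= 2


def evict_to_cap(entries, cap=200):
    """Return entries trimmed to cap, evicting non-two-strike entries first."""
    excess = len(entries) - cap
    if excess <= 0:
        return list(entries)
    # False sorts before True, so non-two-strike entries come first;
    # within each group the oldest attempts[0].at comes first.
    order = sorted(entries,
                   key=lambda e: (is_two_strike(e), e["attempts"][0]["at"]))
    evicted = {id(e) for e in order[:excess]}
    return [e for e in entries if id(e) not in evicted]
```

Two-strike entries are only reached by the slice once every non-two-strike entry has been evicted, which matches the pathological case the policy calls out.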
## Finding hash

`finding_id = sha1(suite + ":" + canonical_key)[:12]`, where
`canonical_key` uses durable identifiers only — never line numbers or free
text:

| Suite (category) | canonical_key |
|------------------|---------------|
| docs (broken-link) | `<source-file>:<target>` |
| docs (docstring-drift) | `<source-file>:<symbol>:<param-or-empty>:<drift-type>` |
| docs (arch-ref-rename) | `<doc-file>:<old-symbol>` |
| dependencies (transitive-gap) | `<package>:<dep>:transitive` |
| dependencies (unused) | `<package>:<dep>:unused` |
| structure (missing-future) | `<source-file>:missing-future` |
| structure (lazy-import) | `<source-file>:lazy-import:<imported-module>` |
| code-quality (bare-except) | `<source-file>:<enclosing-symbol>:<try-body-hash>:<ordinal>:bare-except` |

Symbols use fully-qualified Python names.
`try-body-hash` is `sha1(<try-block body, leading/trailing whitespace
stripped, internal lines preserved>)[:8]`.
`ordinal` is the 1-based position of this bare-except among bare-excepts
in the same enclosing symbol, in source order. Both are needed: the body
hash distinguishes most cases, and the ordinal disambiguates the rare
case of two bare-except blocks with byte-identical try bodies.

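A minimal sketch of the hash spec above, using Python's `hashlib`. UTF-8 encoding and hex digests are assumptions (the spec does not state them), and the example file and symbol names are made up.

```python
import hashlib


def finding_id(suite: str, canonical_key: str) -> str:
    """sha1(suite + ':' + canonical_key), first 12 hex characters."""
    digest = hashlib.sha1(f"{suite}:{canonical_key}".encode("utf-8"))
    return digest.hexdigest()[:12]


def try_body_hash(try_body: str) -> str:
    """sha1 of the try body with outer whitespace stripped, first 8 hex chars."""
    return hashlib.sha1(try_body.strip().encode("utf-8")).hexdigest()[:8]


def bare_except_key(source_file, symbol, try_body, ordinal):
    """Build the code-quality (bare-except) canonical_key from the table."""
    return f"{source_file}:{symbol}:{try_body_hash(try_body)}:{ordinal}:bare-except"


# Hypothetical example: two identical try bodies in one function still get
# distinct ids because the ordinal differs.
first = finding_id("code-quality", bare_except_key("pkg/loader.py", "pkg.loader.load", "run()", 1))
second = finding_id("code-quality", bare_except_key("pkg/loader.py", "pkg.loader.load", "run()", 2))
```

Because the key contains no line numbers, moving the function within its file leaves the id unchanged.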
## Ranking

Earlier criteria override later ones:

1. **Fix confidence** (per-category):

   | Category | Confidence |
   |----------|-----------|
   | structure / missing-future | 1.0 |
   | structure / lazy-import | 0.9 |
   | docs / broken-link | 0.9 |
   | dependencies / transitive-gap | 0.85 |
   | docs / arch-ref-rename | 0.8 |
   | dependencies / unused | 0.75 |
   | docs / docstring-drift | 0.75 |
   | code-quality / bare-except | 0.6 |

2. **Defect severity**:

   | Severity | Examples |
   |----------|----------|
   | high | missing transitive dep, heavy import bypassing lazy system |
   | medium | broken doc link visible on docs site, bare-except hiding errors, docstring drift on public API |
   | low | broken link in dev-notes, missing `from __future__ import annotations`, unused dep |

3. **User-facing impact** — visible to docs-site readers or plugin
   consumers vs internal-only.

4. **Recency** — newer findings rank above long-standing ones.

Record the chosen finding's id, scores, and rationale at the top of
`/tmp/audit-{{suite}}.md`.

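Since earlier criteria override later ones, the ranking amounts to a lexicographic sort, sketched below. The confidence values come from the table; the `severity` and `user_facing` fields are hypothetical, because the runner-state schema does not say where those scores are stored.

```python
from datetime import date

# Confidence per category, copied from the table above.
CONFIDENCE = {
    "missing-future": 1.0, "lazy-import": 0.9, "broken-link": 0.9,
    "transitive-gap": 0.85, "arch-ref-rename": 0.8, "unused": 0.75,
    "docstring-drift": 0.75, "bare-except": 0.6,
}
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def rank_key(finding):
    """Sort key: smaller tuples rank higher; earlier fields dominate."""
    return (
        -CONFIDENCE[finding["category"]],            # 1. higher confidence first
        SEVERITY_ORDER[finding["severity"]],         # 2. higher severity first
        not finding["user_facing"],                  # 3. user-facing first
        -date.fromisoformat(finding["first_seen"]).toordinal(),  # 4. newer first
    )


candidates = [
    {"id": "a", "category": "bare-except", "severity": "medium",
     "user_facing": False, "first_seen": "2026-05-01"},
    {"id": "b", "category": "broken-link", "severity": "low",
     "user_facing": True, "first_seen": "2026-04-01"},
    {"id": "c", "category": "lazy-import", "severity": "high",
     "user_facing": False, "first_seen": "2026-03-01"},
]
ranked = sorted(candidates, key=rank_key)
```

In this example `c` and `b` tie on confidence (0.9), so severity decides between them, and `a` ranks last despite being the newest finding.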
## Standard fix procedure

The fix phase of every eligible recipe follows these steps. Suite recipes
declare only the parts that vary (eligible categories, branch type,
`test_required`, suite-specific quirks).

1. Reconcile `attempted_fixes`: refresh recorded open PRs to
   `merged`/`closed` when appropriate, then scan open PRs (`gh pr list`)
   to recover any state lost to a prior crash.
2. Filter `fix_backlog`: drop entries whose latest attempt is `open` or
   `merged`; surface two-strike entries in the report's
   `Repeatedly-failed fix attempts` section and drop them from selection.
3. Rank the remainder per the Ranking section.
4. For each candidate, top 5 max:
   1. Re-verify the finding still applies (re-grep / re-read). If not,
      remove from `fix_backlog` and continue.
   2. Apply the fix. If the diff exceeds the localized-fix bar or touches
      a non-allowlisted path, abandon and continue.
   3. If the category sets `test_required: true`, run the per-package
      test target (see the mapping table in "Localized fix bar" above)
      for the package containing the change. On failure: abandon and
      continue.
   4. Branch: `agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>`. Commit:
      `<type>(agentic-ci): <one-line>`. Push.
   5. Write the PR body to `/tmp/pr-body-{{suite}}.md`, including the
      hidden metadata block:
      `<!-- agentic-ci finding=<id> suite=<suite> -->`
   6. `gh pr create --body-file /tmp/pr-body-{{suite}}.md` with `--draft`
      iff `draft_until_proven` is true for the suite.
   7. `gh pr edit <num> --add-label agentic-ci --add-label agentic-ci/<suite>`.
   8. Record `attempted_fixes` entry with `outcome: "open"` and exit.
5. If all 5 candidates were abandoned, append a one-line note to the
   report and exit cleanly. The state already reflects the abandonments.

On any failure mid-flow: record `outcome: "abandoned"` for the chosen
finding (with `pr_number: null`), leave any pushed branch in place
(`pr-stale.yml` will reap it; branch deletion is forbidden), and continue
to the next candidate.

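The branch-name convention and the hidden metadata marker from steps 4.4 and 4.5 can be sketched as string helpers, together with the parsing that reconciliation would need. The helper names are hypothetical, and the regex assumes ids and suite names contain no whitespace.

```python
import re
from datetime import date

# Matches the hidden marker written into PR bodies in step 4.5.
MARKER_RE = re.compile(
    r"<!-- agentic-ci finding=(?P<id>\S+) suite=(?P<suite>\S+) -->"
)


def branch_name(change_type: str, suite: str, day: date, slug: str) -> str:
    """agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>"""
    return f"agentic-ci/{change_type}/{suite}-{day:%Y%m%d}-{slug}"


def pr_marker(finding_id: str, suite: str) -> str:
    """Render the hidden metadata block for a PR body."""
    return f"<!-- agentic-ci finding={finding_id} suite={suite} -->"


def parse_marker(pr_body: str):
    """Return (finding_id, suite) if the body carries a marker, else None."""
    match = MARKER_RE.search(pr_body)
    return (match["id"], match["suite"]) if match else None
```

Keeping the writer and the parser next to each other makes it harder for the marker format and the reconciliation scan to drift apart.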
## PR conventions

- **Use `gh pr create --body-file`**, not `/create-pr`. The skill is
  interactive-only and shells the body inline; CI needs determinism.
- **Title**: conventional, `<type>(agentic-ci): <one-line>`.
- **Labels**: `agentic-ci`, `agentic-ci/<suite>`.
- **Draft PRs**: `code-quality` opens draft until a maintainer flips
  `draft_until_proven` to `false` in runner-state, after at least two
  non-draft PRs from that suite have landed clean. This flip is
  intentionally manual — it is the sole human-gated promotion step in
  the fix policy and must not be automated.

## Atomicity

Each fix-phase invocation produces exactly one of:

- **Report-only** — runner-state updated; no branch, commit, or PR.
- **Report + PR** — same, plus a pushed branch, a commit, and a PR. The
  `attempted_fixes` entry is recorded *before* the recipe exits.

No half-states. The runner state is the source of truth for what the
recipe has tried; never silently drop a failed attempt.

The matrix-level concurrency for the daily workflow uses
`cancel-in-progress: false` so a fix in flight cannot be cancelled
between push and PR open. The trade-off is a queued duplicate run if a
manual dispatch arrives while cron is still going; that's preferable to
orphaned branches with no `attempted_fixes` record.

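Because runner-state is the source of truth, a half-written file would itself be a half-state. The commit message says the workflow gates write it with a temp-file plus `os.replace` pattern; the sketch below shows one way that pattern can look. The function name is hypothetical and the workflow's actual code is not shown in this diff.

```python
import json
import os
import tempfile


def write_state_atomically(path, state):
    """Write JSON to a temp file in the same directory, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory keeps the final rename on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, path)
    finally:
        # Only true if something failed before the replace.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
```

A cancellation that lands mid-write leaves the previous `runner-state.json` intact rather than a truncated one.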
## Workflow-level scope gate

The agent's compliance with the path allowlists and the localized-fix
bar is load-bearing for autonomous PR generation, but the recipe alone
cannot enforce them. The daily workflow runs a post-fix scope gate that
re-derives the per-suite allowlist (mirrored from the table above) and
the diff stats from the pushed branch, then closes the PR and deletes
the remote branch on violation. The gate also flips the
`attempted_fixes` entry from `open` to `abandoned` so two-strike logic
sees the failure. Keep the workflow's allowlist regexes in sync with the
table above; the workflow is the enforcement, the table is the
specification.
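
The diff-stat half of such a gate could be sketched by parsing `git diff --numstat` output, which lists added lines, deleted lines, and path separated by tabs. This is an illustration only: the workflow itself is shell-based, and reading "≤50 LOC net" as added minus deleted lines is an assumption.

```python
def parse_numstat(text):
    """Parse `git diff --numstat` output into (paths, added, deleted)."""
    paths, added, deleted = [], 0, 0
    for line in text.splitlines():
        if not line.strip():
            continue
        add_count, del_count, path = line.split("\t", 2)
        paths.append(path)
        if add_count != "-":  # binary files report "-" for both counts
            added += int(add_count)
            deleted += int(del_count)
    return paths, added, deleted


def within_caps(numstat_text, max_files=3, max_net_loc=50):
    """Check the file-count and net-LOC caps from the localized fix bar."""
    paths, added, deleted = parse_numstat(numstat_text)
    # Assumption: "net" means added minus deleted lines.
    return len(paths) <= max_files and (added - deleted) <= max_net_loc
```

Counting paths from a parsed list gives a correct zero on an empty diff, the case the commit message reports as having broken an earlier `grep -c` based count.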

.agents/recipes/_phase-audit.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
## Phase directive

This invocation runs the **AUDIT** phase only.

- Execute the audit steps from the recipe and write the report to
  `/tmp/audit-{{suite}}.md`.
- Update `{{memory_path}}/runner-state.json` with detected findings,
  including `fix_backlog` entries per `_fix-policy.md` (populated BEFORE
  applying the `known_issues` filter to the report, so fixable findings
  persist across runs even when their report row is suppressed).
- Do NOT attempt any fix. Do NOT create any branches, commits, or PRs.
- Do NOT modify any files outside `{{memory_path}}/` and the report file
  `/tmp/audit-{{suite}}.md` itself.
- A separate invocation will run the FIX phase if `fix_backlog` has
  eligible candidates and the suite has a fix phase.
- Read the recipe in full for context; the "Fix phase" section informs
  which finding categories should populate `fix_backlog`, but you must
  not act on them in this invocation.

.agents/recipes/_phase-fix.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
## Phase directive

This invocation runs the **FIX** phase only.

- The audit phase has already completed in a previous invocation. Its
  report is at `/tmp/audit-{{suite}}.md` and
  `{{memory_path}}/runner-state.json` has the populated `fix_backlog`.
- Execute only the recipe's "Fix phase" section per `_fix-policy.md`.
  Do NOT redo audit work — that is, do NOT re-scan whole packages or
  rebuild `fix_backlog` from scratch. The "no re-scan" rule does NOT
  override the per-candidate re-verification step required by
  `_fix-policy.md` §"Standard fix procedure" step 4.1: when you pick a
  candidate, you MUST re-grep / re-read the specific file or symbol it
  points at to confirm the finding still applies before editing.
  Re-verification of a single candidate is required; re-scanning the
  codebase to discover new findings is forbidden.
- Pick the highest-ranked eligible candidate from `fix_backlog`, apply
  the fix, run the package's tests if applicable, commit, push, and open
  the PR using `gh pr create --body-file`.
- Record the attempt in `attempted_fixes` (whether successful, abandoned,
  or failed through the top-5 fallback) before exiting.
- If no candidate qualifies after trying up to 5 of them, exit cleanly,
  append a short note to `/tmp/audit-{{suite}}.md` describing what was
  tried, and update `attempted_fixes` accordingly. Do NOT open a PR.
- Do NOT delete branches, even on failure (per `_runner.md` and
  `_fix-policy.md`). Leave them for the existing `pr-stale.yml` workflow
  to reap over time.
- Read the recipe in full for context, but treat the audit phase as
  already done.

.agents/recipes/_runner.md

Lines changed: 11 additions & 3 deletions
@@ -76,6 +76,14 @@ Write all output to a temp file (e.g., `/tmp/recipe-output.md`). The workflow
 will handle posting it. Do not post directly to GitHub - the workflow controls
 output routing.
 
-If your recipe produces code changes, commit them on a new branch and use
-`/create-pr` to open a pull request. The branch name should follow the
-pattern `agentic-ci/chore/{suite}-YYYYMMDD`.
+If your recipe produces code changes, commit them on a new branch following
+the pattern `agentic-ci/{type}/{suite}-YYYYMMDD-{short-slug}` where `{type}`
+matches the change kind (`chore`/`docs`/`fix`/`refactor`).
+
+For PR creation in CI, use `gh pr create --body-file /tmp/pr-body-<suite>.md`
+directly rather than the `/create-pr` skill. The skill assumes an interactive
+session (it can prompt about uncommitted changes, base branch, etc.) and
+shells the body inline, which breaks on backticks and special characters.
+Daily-suite recipes that open PRs are governed by `_fix-policy.md` — read it
+for the full PR contract (allowlists, draft mode, hidden metadata, branch
+naming, atomicity).
