Feature Description
Problem Statement
The evolution pipeline can open pull requests but cannot continue working on them once CI reports a failure. Agent-generated PRs sit open when a single CI shard fails — there is no automated step that reads the failing job log, extracts the concrete error (pytest traceback, lint violation, type error), and turns it into an actionable fix or child issue.
Proposed Solution
Add a CI failure auto-diagnosis stage that:
- Polls the fork's open PRs via the GitHub Actions API (
gh run list, gh run view --log-failed).
- Fetches the failing job log and extracts the concrete error class and message.
- For small safe fixes (e.g., a lint violation, a missing import), pushes a follow-up commit to the PR branch.
- For larger failures, creates a focused child issue linking the PR and the extracted error, so the next implementation cycle can pick it up.
This differs from the broader evaluation harness (#491) because it targets real CI outcomes for real code changes, not synthetic benchmarks. It directly unblocks the merge stage by converting stuck PR states into actionable next steps.
Value Proposition
- Impact: 0.9
- Effort: 0.5
- Priority Score: 1.44
Research Evidence
- OpenHands / SWE-agent "verify-before-submit" workflows (shift verification left)
- GitHub Actions API log-parsing practices (
gh run view --log-failed)
Implementation Plan
- New skill
evolution-ci-diagnosis + cron job entry.
- Step 1: list open PRs and their latest CI run status.
- Step 2: fetch failing log shard, extract error via a bounded regex/parser.
- Step 3: classify failure as "trivial-fixable" vs "needs-child-issue".
- Step 4: push fix commit OR create child issue with extracted context.
- Reuse existing
terminal/web toolsets — no new core model tools.
Success Criteria
Feature Description
Problem Statement
The evolution pipeline can open pull requests but cannot continue working on them once CI reports a failure. Agent-generated PRs sit open when a single CI shard fails — there is no automated step that reads the failing job log, extracts the concrete error (pytest traceback, lint violation, type error), and turns it into an actionable fix or child issue.
Proposed Solution
Add a CI failure auto-diagnosis stage that:
gh run list,gh run view --log-failed).This differs from the broader evaluation harness (#491) because it targets real CI outcomes for real code changes, not synthetic benchmarks. It directly unblocks the merge stage by converting stuck PR states into actionable next steps.
Value Proposition
Research Evidence
gh run view --log-failed)Implementation Plan
evolution-ci-diagnosis+ cron job entry.terminal/webtoolsets — no new core model tools.Success Criteria