[FEATURE] CI failure auto-diagnosis for evolution PRs #577

Description

@acipat

Feature Description

Problem Statement

The evolution pipeline can open pull requests but cannot continue working on them once CI reports a failure. Agent-generated PRs sit open when even a single CI shard fails, because no automated step reads the failing job log, extracts the concrete error (pytest traceback, lint violation, type error), and turns it into an actionable fix or child issue.

Proposed Solution

Add a CI failure auto-diagnosis stage that:

  1. Polls the fork's open PRs and their latest workflow runs via the GitHub CLI, which wraps the GitHub Actions API (gh pr list, gh run list, gh run view --log-failed).
  2. Fetches the failing job log and extracts the concrete error class and message.
  3. For small safe fixes (e.g., a lint violation, a missing import), pushes a follow-up commit to the PR branch.
  4. For larger failures, creates a focused child issue linking the PR and the extracted error, so the next implementation cycle can pick it up.
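
The polling part of the steps above could be sketched roughly as follows. This is only an illustration, not the skill's actual code: the function names (`run_gh`, `failed_runs_for_open_prs`, `fetch_failed_log`) and the injectable `run` parameter are hypothetical, and taking only the single most recent run per branch is a simplifying assumption that would miss failures when several workflows run on the same branch.

```python
import json
import subprocess
from typing import Callable, Dict, List


def run_gh(args: List[str]) -> str:
    """Hypothetical helper: run the GitHub CLI and return its stdout."""
    result = subprocess.run(
        ["gh", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def failed_runs_for_open_prs(
    run: Callable[[List[str]], str] = run_gh,
) -> List[Dict[str, int]]:
    """Return {"pr": ..., "run_id": ...} for open PRs whose latest run failed."""
    prs = json.loads(
        run(["pr", "list", "--state", "open", "--json", "number,headRefName"])
    )
    failed = []
    for pr in prs:
        # Assumption: the most recent run on the PR branch is the one to triage.
        runs = json.loads(
            run([
                "run", "list",
                "--branch", pr["headRefName"],
                "--limit", "1",
                "--json", "databaseId,status,conclusion",
            ])
        )
        if not runs:
            continue
        latest = runs[0]
        if latest["status"] == "completed" and latest["conclusion"] == "failure":
            failed.append({"pr": pr["number"], "run_id": latest["databaseId"]})
    return failed


def fetch_failed_log(run_id: int, run: Callable[[List[str]], str] = run_gh) -> str:
    """Fetch only the log output of failed steps for one workflow run."""
    return run(["run", "view", str(run_id), "--log-failed"])
```

Passing the command runner in as a parameter keeps the polling logic testable without network access or a checked-out fork; in the cron job the default `run_gh` would be used.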

This differs from the broader evaluation harness (#491) because it targets real CI outcomes for real code changes, not synthetic benchmarks. It directly unblocks the merge stage by converting stuck PR states into actionable next steps.

Value Proposition

  • Impact: 0.9
  • Effort: 0.5
  • Priority Score: 1.44

Research Evidence

  • OpenHands / SWE-agent "verify-before-submit" workflows (shift verification left)
  • GitHub CLI retrieval of failed GitHub Actions job logs for parsing (gh run view --log-failed)

Implementation Plan

  1. Add a new skill, evolution-ci-diagnosis, plus a cron job entry that runs it.
  2. Step 1: list open PRs and their latest CI run status.
  3. Step 2: fetch the failing log shard and extract the error with a bounded regex/parser.
  4. Step 3: classify the failure as "trivial-fixable" or "needs-child-issue".
  5. Step 4: push a fix commit or create a child issue with the extracted context.
  6. Reuse the existing terminal/web toolsets; add no new core model tools.
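
Steps 2 and 3 of the plan might look something like the sketch below. Everything here is an assumption made for illustration: the pattern set, the `MAX_LINES` bound, the `ExtractedError` type, and the choice to treat unrecognized logs as "needs-child-issue". The patterns target common flake8/ruff, mypy, and Python traceback line shapes, and they search within each line rather than anchoring at its start, because `gh run view --log-failed` prefixes each line with job, step, and timestamp fields.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed bound: only the tail of the log is scanned, so parsing cost stays fixed.
MAX_LINES = 2000


@dataclass(frozen=True)
class ExtractedError:
    kind: str      # "lint", "type", "import", or "test"
    message: str   # the matched error text, without the log prefix


# Illustrative patterns only; a real deployment would tune these to its CI tools.
# Order matters: the import pattern must come before the generic exception one.
PATTERNS = [
    ("lint", re.compile(r"\S+\.py:\d+:\d+: [A-Z]+\d+ .+")),    # path.py:3:1: F401 ...
    ("type", re.compile(r"\S+\.py:\d+: error: .+")),            # path.py:3: error: ...
    ("import", re.compile(r"(?:ModuleNotFoundError|ImportError): .+")),
    ("test", re.compile(r"\w+(?:Error|Exception): .+")),        # e.g. AssertionError: ...
]

# Failure kinds assumed safe enough for an automatic follow-up commit.
TRIVIAL_KINDS = {"lint", "import"}


def extract_error(log: str) -> Optional[ExtractedError]:
    """Return the first recognizable error in the last MAX_LINES lines, if any."""
    for line in log.splitlines()[-MAX_LINES:]:
        for kind, pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                return ExtractedError(kind, match.group(0).strip())
    return None


def classify(error: Optional[ExtractedError]) -> str:
    """Map an extracted error to one of the two routes named in the plan."""
    if error is not None and error.kind in TRIVIAL_KINDS:
        return "trivial-fixable"
    # Unknown or unparsed failures are routed to a child issue, not auto-fixed.
    return "needs-child-issue"
```

Defaulting to "needs-child-issue" when nothing matches is the conservative choice: the stage never pushes a commit for a failure it could not identify.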

Success Criteria

  • Open PRs with failed CI are automatically triaged within one cron cycle.
  • Trivial CI failures (lint, import) receive an automatic fix commit on the PR branch.
  • Complex failures produce a child issue with the extracted error context.
  • No increase in manual triage burden for maintainers.
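
For the child-issue criterion above, one possible shape for the issue content is sketched here. The title and body layout, the `MAX_MESSAGE_CHARS` cap, and the `child_issue_args` name are all hypothetical; the function only builds the argument list for `gh issue create` (using its `--title` and `--body` flags) and leaves execution to the caller.

```python
import textwrap
from typing import List

# Assumed cap so a huge traceback cannot bloat the issue body.
MAX_MESSAGE_CHARS = 1000


def child_issue_args(pr_number: int, kind: str, message: str) -> List[str]:
    """Build `gh issue create` arguments for a failure that needs a child issue."""
    excerpt = message[:MAX_MESSAGE_CHARS]
    title = f"CI failure on PR #{pr_number}: {kind} error"
    body = (
        f"Parent PR: #{pr_number}\n\n"
        f"Failure class: {kind}\n\n"
        "Extracted error:\n\n"
        # Indent by four spaces so the excerpt renders as a code block.
        f"{textwrap.indent(excerpt, '    ')}\n"
    )
    return ["issue", "create", "--title", title, "--body", body]
```

Referencing the PR number in the body lets GitHub cross-link the child issue from the PR timeline, which is one way to satisfy the "linking the PR and the extracted error" requirement.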

Metadata

Assignees

No one assigned

    Labels

    accepted: Accepted by evolution; sent to a PR / implemented
