Skip to content

[github] auto-mode review-remediation: surface the failing-test output to the fixing agent + revisit the 2-cy#4130

Open
mabry1985 wants to merge 1 commit into
mainfrom
feature/github-auto-mode-review-remediation-surface-the-6n8z81m
Open

[github] auto-mode review-remediation: surface the failing-test output to the fixing agent + revisit the 2-cy#4130
mabry1985 wants to merge 1 commit into
mainfrom
feature/github-auto-mode-review-remediation-surface-the-6n8z81m

Conversation

@mabry1985

@mabry1985 mabry1985 commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Summary

SPARC PRD: Auto-Mode Review-Remediation — Failing-Test Evidence + Cycle Budget Revisit

Situation

Auto-mode's PR review-remediation loop operates as follows:

  1. A PR is opened by a fixing agent, triggering CI checks and review
  2. ReviewProcessor (lead-engineer-review-merge-processors.ts) polls the PR via gh pr view for check results and review feedback
  3. On CI failure, the processor collects only the check names (e.g., CI / test, CI / lint) and sets `ctx.reviewFeedback = 'CI ...

Closes #4124


Created automatically by Automaker

Summary by CodeRabbit

  • New Features

    • Added structured CI failure evidence collection capturing test details, assertions, and file locations from failed checks
    • Implemented progress-aware remediation budget extension that extends when measurable improvement is detected
  • Improvements

    • Enhanced CI failure context integration in execution and review workflows

@protoquinn

protoquinn Bot commented Jun 7, 2026

Copy link
Copy Markdown

👀 Quinn is reviewing — verdict (PASS / WARN / FAIL) + findings to follow.

@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR replaces auto-mode's transient CI failure message queue with structured evidence extraction from GitHub checks and logs. It surfaces concrete test assertions, expected/received values, and file locations to the fixing agent. It also makes the remediation budget dynamic by detecting measurable progress (fewer failures per cycle) and extending rather than escalating when progress is observed.

Changes

Structured CI Failure Evidence and Progress-Aware Remediation

Layer / File(s) Summary
Type definitions and interface updates
libs/types/src/ci-reaction.ts, libs/types/src/feature.ts, libs/types/src/index.ts, apps/server/src/services/lead-engineer-types.ts
CIFailureEvidence and RemediationCycleSnapshot types define structured failure metadata and per-cycle history. Feature gains optional _remediationHistory tracking. StateContext extends with ciFailureEvidence field.
CI failure evidence collection service
apps/server/src/services/ci-failure-evidence-collector.ts
New collector queries GitHub API for failed check runs and fetches their annotations and logs. Parses test file, name, assertion, expected/received values, and file:line location using regex patterns. Returns structured evidence array and exports markdown formatter.
Progress-aware budget enforcement
apps/server/src/services/remediation-budget-enforcer.ts
RemediationBudgetEnforcer adds checkProgressAndExtend() to compare current failure count against prior cycle history. Supports REVIEW_REMEDIATION_MAX_CYCLES environment override and recalculates total cap. Returns structured extension decision with reason.
Remove old CI injection mechanism
apps/server/src/services/auto-mode-service.ts
Deletes per-feature pendingCIInjections queue, associated cleanup in stopAutoLoopForProject, stopFeature, and shutdown, and public methods sendCIFailureToAgent() and consumePendingCIInjection(). Replaces with resumeCheckedProjects set.
Review-remediation cycle integration
apps/server/src/services/lead-engineer-review-merge-processors.ts
ReviewProcessor collects structured evidence during ci_failed state, persists _remediationHistory with timestamp and failure count, then calls checkProgressAndExtend(). If progress detected, transitions back to EXECUTE with remediation context; otherwise escalates when budget exhausted.
Evidence injection into execution context
apps/server/src/services/lead-engineer-execute-processor.ts
ExecuteProcessor formats evidence and injects it into recovery context prompt, prioritized before fallback review feedback and project knowledge.

Sequence Diagram(s)

sequenceDiagram
  participant ReviewProcessor
  participant Collector as collectCIFailureEvidence
  participant GitHub as GitHub API
  participant Parser as parseTestFailureDetails
  participant Enforcer as checkProgressAndExtend
  participant Feature as feature._remediationHistory
  ReviewProcessor->>Collector: collect evidence for failed checks
  Collector->>GitHub: list failed check runs & fetch annotations
  GitHub-->>Collector: check metadata + annotations/logs
  Collector->>Parser: extract from annotation or log
  Parser-->>Collector: test file, assertion, values, location
  Collector-->>ReviewProcessor: CIFailureEvidence[]
  ReviewProcessor->>Feature: persist remediation history
  ReviewProcessor->>Enforcer: check progress vs prior cycle
  Enforcer->>Feature: read prior failure count
  Feature-->>Enforcer: previous ciFailureCount
  Enforcer-->>ReviewProcessor: {extended: boolean, reason: string}
  alt Progress detected (fewer failures)
    ReviewProcessor->>ReviewProcessor: continue EXECUTE with evidence
  else Budget exhausted, no progress
    ReviewProcessor->>ReviewProcessor: escalate to operator
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A bunny hops through failing tests with glee,
Now sees the why—the assertion, clear and free!
No more blind guesses; progress marks the way,
With evidence in hand, the agent saves the day. 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title is partially related to the changeset but incomplete and truncated. It mentions surfacing failing-test output and revisiting the 2-cycle cap (core objectives), but is cut off mid-word ('2-cy' instead of '2-cycle'), making it unclear and misleading. Complete and clarify the title to fully convey the PR's scope, e.g., 'Surface failing-test evidence to agent and make review-remediation budget adaptive based on progress'.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed The PR successfully implements both objectives from #4124: (1) surfacing failing-test output via new CI failure evidence collection and formatting, and (2) revisiting the 2-cycle cap through progress-aware budget extension in RemediationBudgetEnforcer.
Out of Scope Changes check ✅ Passed All changes are scoped to the stated objectives: removing obsolete pendingCI injection mechanisms, adding CI failure evidence collection/formatting, integrating evidence into remediation feedback, and implementing progress-aware budget extension. No unrelated changes detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/github-auto-mode-review-remediation-surface-the-6n8z81m

Comment @coderabbitai help to get the list of available commands and usage tips.

@protoquinn protoquinn Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA Audit — PR #4130 | [github] auto-mode review-remediation: surface the failing-test output to the fixing agent + revisit the 2-cy

VERDICT: WARN


CI Status

  • harness-eval: queued
  • checks: queued
  • build: queued
  • test: queued
  • Close external code contributions: skipped

⏳ CI non-terminal — 4 checks still running. This is a non-blocking comment; formal verdict will land on next pass once CI is terminal.


Diff Review

  • Removes pendingCIInjections map + sendCIFailureToAgent/consumePendingCIInjection from auto-mode-service.ts — old in-process CI injection mechanism is fully replaced
  • Adds ci-failure-evidence-collector.ts (391 lines) — fetches structured GitHub check-run annotations via gh api, falls back to 500-line log excerpt extraction with failure pattern matching

Observations

  • MEDIUM/Gap: The diff does not show how CIFailureEvidenceCollector is wired into lead-engineer-review-merge-processors.ts. The PR description states ReviewProcessor will use it, but the integration call-site is absent from the diff. Without that wiring, the new collector is dead code. Verify the processor invokes the collector in a follow-up or confirm it lands in the same PR.
  • LOW: execAsync with gh CLI uses projectPath as cwd — acceptable since projectPath is internal to the protoMaker session context, but worth a security eye on path isolation if untrusted repos can be loaded as features.
  • LOW: MAX_LOG_LINES = 500 cap is reasonable; no risk of context blowout.
  • INFO: Graceful degradation (warn + empty array on failure) is correct for a background collector.

— Quinn, QA Engineer

@protoquinn

protoquinn Bot commented Jun 7, 2026

Copy link
Copy Markdown

Submitted COMMENT review on #4130.

@protoquinn protoquinn Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA Audit — PR #4130 | [github] auto-mode review-remediation: surface the failing-test output to the fixing agent + revisit the 2-cy

VERDICT: WARN


CI Status

  • ci-complete: failure
  • harness-eval: success
  • checks: failure
  • build: failure
  • test: failure
  • Close external code contributions: skipped

Diff Review
This PR replaces a simple in-memory CI injection pipeline (pendingCIInjections Map in auto-mode-service.ts) with a structured ci-failure-evidence-collector.ts that fetches failure annotations from GitHub check runs and parses structured evidence from raw logs. The removals in auto-mode-service.ts are clean (no other callers of sendCIFailureToAgent visible in diff). The new collector is well-scoped: all execAsync calls are constrained to projectPath with timeouts (10s/15s), errors are handled gracefully with logger.warn, and log excerpts are hard-capped at 500 lines.

Observations

  • GAP: build: failure may stem from unverified type imports — CIFailureEvidence from @protolabsai/types and StateContext from ./lead-engineer-types.js are not visible in the diff and cannot be confirmed as existing in this tree. If these types are missing or mismatched, tsc would fail at build. Author should confirm or add the types.
  • GAP: The diff is truncated at 200 lines (897 total); the remaining ~700 lines of ci-failure-evidence-collector.ts were not reviewed. Structural review (clawpatch) also skipped — checkout cache could not resolve main or HEAD~1 as git refs in this environment.
  • CONTEXT: checks and test failures are expected — they are the CI failures this feature is designed to surface to the fixing agent (the PR's stated purpose). They do not indicate a bug introduced by this PR.
  • POSITIVE: No unresolved CodeRabbit threads. No security-sensitive paths touched. Old injection code removed cleanly with all call sites removed in the same diff.

— Quinn, QA Engineer

@protoquinn

protoquinn Bot commented Jun 7, 2026

Copy link
Copy Markdown

Submitted COMMENT review on protoLabsAI/protoMaker#4130.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/server/src/services/lead-engineer-review-merge-processors.ts (2)

423-425: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear ctx.ciFailureEvidence when CI evidence is unavailable or not relevant.

ctx.ciFailureEvidence is only overwritten on successful collection. If CI evidence was set in a prior cycle, it can leak into later non-CI remediation (or CI cycles with collection failure), and ExecuteProcessor will prioritize stale evidence in recovery context.

Suggested fix
 if (reviewState === 'changes_requested') {
+  ctx.ciFailureEvidence = undefined;
   ...
 }

 if (reviewState === 'ci_failed') {
+  ctx.ciFailureEvidence = undefined; // reset before recollecting
   ...
   if (evidences.length > 0) {
     ctx.ciFailureEvidence = evidences;
     ...
   } else if (ciFailureNames.length > 0) {
+    ctx.ciFailureEvidence = undefined;
     ctx.reviewFeedback = `CI checks failed: ${ciFailureNames.join(', ')}`;
   }
 }

Also applies to: 468-486

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/server/src/services/lead-engineer-review-merge-processors.ts` around
lines 423 - 425, The code currently only sets ctx.ciFailureEvidence when CI
evidence is successfully collected, allowing stale evidence to persist across
cycles; update the branches in lead-engineer-review-merge-processors.ts where
feedbackParts is handled and in the other CI-related block (the region around
where ctx.reviewFeedback is set and the block covering lines 468-486) to
explicitly clear ctx.ciFailureEvidence (set it to undefined/null) whenever CI
evidence is unavailable, irrelevant, or collection fails so ExecuteProcessor
will not see stale evidence; ensure every path that represents "no CI evidence"
(including collection failure and non-CI remediation paths) assigns
ctx.ciFailureEvidence = undefined and keep these changes adjacent to the
existing ctx.reviewFeedback/feedbackParts logic and the CI collection failure
handling.

488-523: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Progress extension is computed against the wrong snapshot and inconsistent counters.

The code appends the current _remediationHistory entry before calling checkProgressAndExtend(), then compares with evidences.length while persisted history uses ciFailureNames.length. This can suppress valid extensions or trigger inconsistent extension behavior.

Suggested fix direction
- // Record remediation history for progress-aware budget extension
- history.push({ timestamp: ..., ciFailureCount: ciFailureNames.length, ... });
- await featureLoader.update(..._remediationHistory: history);
-
  const ciBudgetResult = ciEnforcer.checkAndIncrement(...);

  if (!ciBudgetResult.allowed) {
+   // Compare against previous persisted snapshot (before writing current cycle)
    const progressCheck = ciEnforcer.checkProgressAndExtend(ctx.feature, evidences);
    ...
  }

+ // Persist current snapshot after evaluation, using the same count basis used for progress checks.
+ const effectiveFailureCount = evidences.length > 0 ? evidences.length : ciFailureNames.length;
+ history.push({ timestamp: ..., ciFailureCount: effectiveFailureCount, failingCheckNames: ciFailureNames });
+ await featureLoader.update(..._remediationHistory: history);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/server/src/services/lead-engineer-review-merge-processors.ts` around
lines 488 - 523, The progress-extension logic is using an updated remediation
snapshot and inconsistent counters which can hide or mis-trigger extensions;
before mutating ctx.feature._remediationHistory, capture the pre-update snapshot
and the corresponding failure count (use ciFailureNames.length as the canonical
CI-failure count), pass that snapshot and count into
RemediationBudgetEnforcer.checkProgressAndExtend(ctx.feature, evidences) (or a
new overload) so the extension comparison is performed against the same,
pre-persisted state, then only append the new history entry and persist after
the progress check; ensure checkProgressAndExtend and any comparisons use
ciFailureNames.length (not evidences.length) for consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/server/src/services/ci-failure-evidence-collector.ts`:
- Around line 34-35: The two regexes meant to match diff-added/removed lines
currently use /\b-d\b/ and /\b\+d\b/ (which match a literal "d"); replace them
with regexes that look for leading "+" or "-" characters such as /^-/ and /^\+/
(or /^-\s/ and /^\+\s/ if you want to allow a single space after the sign) so
the diff context lines are correctly detected by the code that processes diffs
(replace the existing /\b-d\b/ and /\b\+d\b/ occurrences with these corrected
patterns).
- Around line 254-319: The loop over failedChecks is re-calling
getFailedStepLog() for each check which causes unnecessary GH API calls and
latency; instead, call getFailedStepLog() at most once and reuse its result.
Refactor the code around the failedChecks loop (the block that calls
getCheckRunOutput, parses annotations via parseTestFailureDetails, and currently
calls getFailedStepLog) to: 1) determine if any check might need the workflow
log (i.e., no annotations and no summary) and only then call getFailedStepLog
once into a local variable (e.g., failedStepLog); 2) inside the loop use that
cached failedStepLog to populate evidence.logExcerpt and to call
parseTestFailureDetails instead of invoking getFailedStepLog per-check; and 3)
keep existing behavior for using annotations, output.summary,
parseTestFailureDetails, evidence.testName, evidence.location, etc., but replace
the per-check getFailedStepLog call with the single cached value.

---

Outside diff comments:
In `@apps/server/src/services/lead-engineer-review-merge-processors.ts`:
- Around line 423-425: The code currently only sets ctx.ciFailureEvidence when
CI evidence is successfully collected, allowing stale evidence to persist across
cycles; update the branches in lead-engineer-review-merge-processors.ts where
feedbackParts is handled and in the other CI-related block (the region around
where ctx.reviewFeedback is set and the block covering lines 468-486) to
explicitly clear ctx.ciFailureEvidence (set it to undefined/null) whenever CI
evidence is unavailable, irrelevant, or collection fails so ExecuteProcessor
will not see stale evidence; ensure every path that represents "no CI evidence"
(including collection failure and non-CI remediation paths) assigns
ctx.ciFailureEvidence = undefined and keep these changes adjacent to the
existing ctx.reviewFeedback/feedbackParts logic and the CI collection failure
handling.
- Around line 488-523: The progress-extension logic is using an updated
remediation snapshot and inconsistent counters which can hide or mis-trigger
extensions; before mutating ctx.feature._remediationHistory, capture the
pre-update snapshot and the corresponding failure count (use
ciFailureNames.length as the canonical CI-failure count), pass that snapshot and
count into RemediationBudgetEnforcer.checkProgressAndExtend(ctx.feature,
evidences) (or a new overload) so the extension comparison is performed against
the same, pre-persisted state, then only append the new history entry and
persist after the progress check; ensure checkProgressAndExtend and any
comparisons use ciFailureNames.length (not evidences.length) for consistency.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 06365562-c4bf-442b-beac-3e21f9fcded0

📥 Commits

Reviewing files that changed from the base of the PR and between 9612732 and 1829d28.

📒 Files selected for processing (9)
  • apps/server/src/services/auto-mode-service.ts
  • apps/server/src/services/ci-failure-evidence-collector.ts
  • apps/server/src/services/lead-engineer-execute-processor.ts
  • apps/server/src/services/lead-engineer-review-merge-processors.ts
  • apps/server/src/services/lead-engineer-types.ts
  • apps/server/src/services/remediation-budget-enforcer.ts
  • libs/types/src/ci-reaction.ts
  • libs/types/src/feature.ts
  • libs/types/src/index.ts
💤 Files with no reviewable changes (1)
  • apps/server/src/services/auto-mode-service.ts

Comment on lines +34 to +35
/\b-d\b/, // diff removed lines
/\b\+d\b/, // diff added lines

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix diff-line regexes; current patterns don't match real +/- diff lines.

The patterns at Line 34 and Line 35 match literal d tokens, so diff context lines are effectively never captured.

Suggested fix
-  /\b-d\b/, // diff removed lines
-  /\b\+d\b/, // diff added lines
+  /^\s*-\s/, // diff removed lines
+  /^\s*\+\s/, // diff added lines
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/\b-d\b/, // diff removed lines
/\b\+d\b/, // diff added lines
/^\s*-\s/, // diff removed lines
/^\s*\+\s/, // diff added lines
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/server/src/services/ci-failure-evidence-collector.ts` around lines 34 -
35, The two regexes meant to match diff-added/removed lines currently use
/\b-d\b/ and /\b\+d\b/ (which match a literal "d"); replace them with regexes
that look for leading "+" or "-" characters such as /^-/ and /^\+/ (or /^-\s/
and /^\+\s/ if you want to allow a single space after the sign) so the diff
context lines are correctly detected by the code that processes diffs (replace
the existing /\b-d\b/ and /\b\+d\b/ occurrences with these corrected patterns).

Comment on lines +254 to +319
for (const check of failedChecks) {
const evidence: CIFailureEvidence = {
checkName: check.name,
checkUrl: check.html_url,
};

// Fetch check run output
const output = await getCheckRunOutput(slug, check.id, projectPath);

if (output) {
// Prefer annotations
if (output.annotations && output.annotations.length > 0) {
evidence.annotations = output.annotations;

// Extract details from annotations
const failureAnnotations = output.annotations.filter(
(a) => a.annotation_level === 'failure'
);

if (failureAnnotations.length > 0) {
const first = failureAnnotations[0];
if (first.path) {
evidence.testFile = first.path;
}
if (first.start_line) {
evidence.location = `${first.path || 'unknown'}:${first.start_line}`;
}
if (first.message) {
const parsed = parseTestFailureDetails(first.message);
Object.assign(evidence, parsed);
if (!evidence.testName && first.title) {
evidence.testName = first.title;
}
}
if (first.raw_details) {
const parsed = parseTestFailureDetails(first.raw_details);
Object.assign(evidence, parsed);
}
}

// If annotations exist but no failure-level ones, use summary
if (!evidence.testName && output.summary) {
const parsed = parseTestFailureDetails(output.summary);
Object.assign(evidence, parsed);
}
} else if (output.summary && output.summary.trim()) {
// No annotations, parse summary
const parsed = parseTestFailureDetails(output.summary);
Object.assign(evidence, parsed);
// Use summary as log excerpt if it's substantial
if (output.summary.trim().length > 50) {
evidence.logExcerpt = output.summary.trim().slice(0, 3000);
}
}
}

// If we still have no log excerpt, try fetching failed step logs
if (!evidence.logExcerpt && !evidence.annotations?.length) {
const log = await getFailedStepLog(projectPath, prNumber);
if (log && log.trim().length > 10) {
evidence.logExcerpt = log.trim().slice(0, 5000);
// Try to parse details from the log
const parsed = parseTestFailureDetails(log);
Object.assign(evidence, parsed);
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid re-fetching failed workflow logs inside the per-check loop.

When annotations are missing, the code calls getFailedStepLog() per failed check. That multiplies gh calls and can add large latency in multi-failure PRs, increasing timeout/escalation risk.

Suggested refactor
 export async function collectCIFailureEvidence(ctx: StateContext): Promise<CIFailureEvidence[]> {
@@
-  for (const check of failedChecks) {
+  let sharedFailedLog: string | null = null;
+  let sharedFailedLogLoaded = false;
+
+  for (const check of failedChecks) {
@@
-    if (!evidence.logExcerpt && !evidence.annotations?.length) {
-      const log = await getFailedStepLog(projectPath, prNumber);
+    if (!evidence.logExcerpt && !evidence.annotations?.length) {
+      if (!sharedFailedLogLoaded) {
+        sharedFailedLog = await getFailedStepLog(projectPath, prNumber);
+        sharedFailedLogLoaded = true;
+      }
+      const log = sharedFailedLog;
       if (log && log.trim().length > 10) {
         evidence.logExcerpt = log.trim().slice(0, 5000);
         const parsed = parseTestFailureDetails(log);
         Object.assign(evidence, parsed);
       }
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/server/src/services/ci-failure-evidence-collector.ts` around lines 254 -
319, The loop over failedChecks is re-calling getFailedStepLog() for each check
which causes unnecessary GH API calls and latency; instead, call
getFailedStepLog() at most once and reuse its result. Refactor the code around
the failedChecks loop (the block that calls getCheckRunOutput, parses
annotations via parseTestFailureDetails, and currently calls getFailedStepLog)
to: 1) determine if any check might need the workflow log (i.e., no annotations
and no summary) and only then call getFailedStepLog once into a local variable
(e.g., failedStepLog); 2) inside the loop use that cached failedStepLog to
populate evidence.logExcerpt and to call parseTestFailureDetails instead of
invoking getFailedStepLog per-check; and 3) keep existing behavior for using
annotations, output.summary, parseTestFailureDetails, evidence.testName,
evidence.location, etc., but replace the per-check getFailedStepLog call with
the single cached value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

auto-mode review-remediation: surface the failing-test output to the fixing agent + revisit the 2-cycle cap

1 participant