feat(autocurrency): add agent-driven currency fix loop #6054

Open
Eren-Jeager123 wants to merge 8 commits into main from feat/currency-fix-agent

Eren-Jeager123 commented May 7, 2026

Summary

Add an automated agent that diagnoses and fixes CI failures on auto-update PRs (vLLM/SGLang currency bumps).

What This PR Adds

| File | Purpose |
| --- | --- |
| .github/workflows/agent-currency-fix.yml | Workflow triggered by workflow_run when PR CI fails on auto-update/* branches |
| scripts/autocurrency/agent-fix.py | Python script that reads CI logs, calls Bedrock Claude Opus 4.6, and applies file edits |

How It Works

PR CI fails → workflow_run fires → validate branch name (regex)
→ checkout main (trusted) → copy agent-fix.py to /tmp
→ git switch to PR branch → count [agent-fix] commits
→ if < 3: call Bedrock → parse search/replace blocks → apply with fuzzy match
  → if apply fails: retry LLM with error context (up to 3 retries)
  → if apply succeeds: commit → push → PR CI re-runs
→ if >= 3: comment on PR, escalate to human

Edit Format: Search/Replace Blocks (Industry Standard)

Uses the same format as Aider, Cline, and Claude's native editing:

docker/vllm/Dockerfile
<<<<<<< SEARCH
  "pillow>=11.0.0" \
  "xgrammar>=0.1.30" \
=======
  "pillow>=12.1.1" \
  "xgrammar>=0.1.32" \
>>>>>>> REPLACE

Why this format: Claude is trained on it, it avoids line numbers (which LLMs track unreliably), its delimiters are unambiguous, and other tools have converged on it.
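To make the format concrete, here is a minimal sketch of how a response containing such blocks could be parsed. It is not taken from agent-fix.py: the names `Edit`, `BLOCK_RE`, and `parse_blocks` are hypothetical, and it assumes the file path sits on the line directly above the `<<<<<<< SEARCH` marker, as in the example above.

```python
import re
from typing import NamedTuple


class Edit(NamedTuple):
    """One parsed search/replace block (hypothetical container)."""
    path: str
    search: str
    replace: str


# Assumed layout: a path line, then SEARCH / ======= / REPLACE markers,
# each marker alone at the start of its own line.
BLOCK_RE = re.compile(
    r"^(?P<path>[^\n]+)\n"
    r"<<<<<<< SEARCH\n"
    r"(?P<search>.*?)"
    r"^=======\n"
    r"(?P<replace>.*?)"
    r"^>>>>>>> REPLACE[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def parse_blocks(response: str) -> list[Edit]:
    """Extract every well-formed block; text around the blocks is ignored."""
    return [
        Edit(m["path"].strip(), m["search"], m["replace"])
        for m in BLOCK_RE.finditer(response)
    ]
```

An empty result can be treated as an "unparseable response" and fed back to the model, which is one of the retry conditions the description mentions.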

Matching (3 layers): exact → whitespace-normalized → fuzzy (difflib >= 0.8)
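The three layers could be sketched roughly as follows. This is an illustration under assumptions, not the script's code: `locate` and `_norm` are hypothetical names, and comparing whole line windows with `difflib.SequenceMatcher` is only one plausible way to apply the 0.8 threshold.

```python
import difflib


def _norm(line: str) -> str:
    """Collapse runs of whitespace and strip both ends."""
    return " ".join(line.split())


def locate(file_lines, search_lines, threshold=0.8):
    """Return (layer, start_index) for the first layer that matches, else None."""
    n = len(search_lines)
    if n == 0 or n > len(file_lines):
        return None
    # Every contiguous run of n lines in the file is a candidate.
    windows = [file_lines[i:i + n] for i in range(len(file_lines) - n + 1)]

    # Layer 1: exact match.
    for i, window in enumerate(windows):
        if window == search_lines:
            return ("exact", i)

    # Layer 2: match after whitespace normalization.
    target = [_norm(line) for line in search_lines]
    for i, window in enumerate(windows):
        if [_norm(line) for line in window] == target:
            return ("whitespace", i)

    # Layer 3: best fuzzy match, accepted only at or above the threshold.
    wanted = "\n".join(search_lines)
    best_ratio, best_index = 0.0, -1
    for i, window in enumerate(windows):
        ratio = difflib.SequenceMatcher(None, "\n".join(window), wanted).ratio()
        if ratio > best_ratio:
            best_ratio, best_index = ratio, i
    if best_ratio >= threshold:
        return ("fuzzy", best_index)
    return None
```

Returning the layer name alongside the index makes it easy to log when a fix only applied through the fuzzy fallback, which is worth a closer look in review.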

Retry loop: If edits fail to apply or response is unparseable, retries up to 3 times with error feedback showing what the file actually contains (same approach as Aider).
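A minimal sketch of that inner loop, with the model call and the edit application passed in as callables so it can run without Bedrock. The function and parameter names are hypothetical, and it assumes the limit means three model calls in total, as stated under Loop Prevention.

```python
from typing import Callable, Optional

MAX_LLM_ATTEMPTS = 3  # assumed to mean three model calls in total


def fix_with_retries(
    ask_model: Callable[[str], str],
    try_apply: Callable[[str], Optional[str]],
    prompt: str,
    max_attempts: int = MAX_LLM_ATTEMPTS,
) -> bool:
    """Call the model until its edits apply or the attempts run out.

    try_apply returns None on success, or an error message describing
    what was searched for and what the file actually contains.
    """
    feedback = ""
    for _ in range(max_attempts):
        response = ask_model(prompt + feedback)
        error = try_apply(response)
        if error is None:
            return True
        # Show the model its mistake on the next attempt.
        feedback = (
            "\n\nYour previous edit failed to apply:\n"
            f"{error}\n"
            "Please resend corrected SEARCH/REPLACE blocks."
        )
    return False
```

Only the most recent error is fed back here; accumulating all earlier errors is an equally reasonable choice and would cost more prompt tokens.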

Security Architecture

CodeQL compliant:

  • actions/checkout only checks out main (trusted, default branch)
  • PR branch switched via git fetch + git checkout (not actions/checkout)
  • agent-fix.py always executed from main's copy (/tmp/agent-fix.py)
  • Branch name validated against strict regex ^auto-update/[a-z]+-[0-9]+\.[0-9]+\.[0-9]+$
  • All event data stored in env vars (no direct ${{ }} interpolation in run: blocks)
  • No actions/cache usage (avoids cache poisoning)
  • Push explicitly targets HEAD:$HEAD_BRANCH

Loop Prevention

Two levels:

  1. Outer loop (across CI cycles): Git commit counting — max 3 [agent-fix] commits per PR branch
  2. Inner loop (within one workflow run): LLM retry — max 3 Bedrock calls if response is unparseable or edits fail to apply
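The outer limit could be implemented along these lines. This is a sketch under assumptions: the helper names are hypothetical, the `[agent-fix]` tag is assumed to appear in the commit subject, and the commits are assumed to be counted on the range between `origin/main` and the PR branch head.

```python
import subprocess

MAX_AGENT_COMMITS = 3        # limit from the PR description
AGENT_TAG = "[agent-fix]"    # assumed to appear in the commit subject


def branch_subjects(base: str = "origin/main", head: str = "HEAD") -> list[str]:
    """Return subjects of commits on head that are not on base."""
    result = subprocess.run(
        ["git", "log", "--format=%s", f"{base}..{head}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def count_agent_commits(subjects: list[str], tag: str = AGENT_TAG) -> int:
    """Count commits whose subject carries the agent tag."""
    return sum(1 for subject in subjects if tag in subject)


def should_attempt_fix(subjects: list[str]) -> bool:
    """True while the branch has fewer than the maximum agent commits."""
    return count_agent_commits(subjects) < MAX_AGENT_COMMITS
```

Keeping the counter in git history means it survives workflow re-runs and needs no external state, which fits the decision to drop the cache-based circuit breaker.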

Key Design Decisions

  1. Trigger: workflow_run + startsWith(branch, 'auto-update/') — branch name over PR label (payload doesn't include labels)
  2. Edit format: Search/replace blocks — industry standard, highest LLM accuracy, fuzzy matching fallback
  3. Scope: Ubuntu variants only (PR - vLLM EC2/SageMaker, PR - SGLang EC2/SageMaker). Expand after testing.
  4. Concurrency: One agent per branch (cancel-in-progress: true). Multiple failing workflows → last one wins.
  5. Safety: Independent workflow, not a required check, cannot merge, human approval required.

Dependencies

  • IAM change: Adds bedrock:InvokeModel permission to CodeBuild runner role (separate infra PR)

Testing

  • workflow_run can only trigger from the default branch — cannot test from feature branch
  • Risk to merges is low: the workflow is independent and not a required check, so it cannot affect PR merge status
  • Will validate on the next real auto-update PR failure

Add a workflow and script that automatically diagnoses and fixes CI
failures on auto-update PRs for vLLM and SGLang.

Architecture:
- workflow_run triggers on PR CI failure for auto-update/* branches
- Circuit breaker via GitHub Actions cache prevents infinite loops
- Git commit counting limits to 3 fix attempts before escalating
- Calls Bedrock Claude Opus 4.6 to reason about failures
- Applies returned file edits, commits, and pushes to PR branch
- PR CI re-runs naturally on new commit

Trigger decision: uses startsWith(branch, 'auto-update/') filter
rather than PR label check because workflow_run payload doesn't
include labels and the branch naming is deterministic.

Scope: Ubuntu variants only (PR - vLLM EC2, PR - vLLM SageMaker,
PR - SGLang EC2, PR - SGLang SageMaker). Will expand to amzn2023
variants after testing.

Eren-Jeager123 force-pushed the feat/currency-fix-agent branch from 3b02df0 to 1f1ec80 on May 7, 2026 00:57

- Validate branch name against strict regex before use
- Move event data to env vars to prevent shell injection via
  crafted branch names (CodeQL workflow_run injection rule)
- Add security comment explaining privileged checkout is safe
  because branch name is validated to bot-created pattern only
- Remove concurrency group using interpolated branch name

Address remaining CodeQL findings:
- Checkout main (trusted) separately for agent-fix.py execution
- Checkout PR branch (workdir) separately for data modification
- Use GITHUB_ENV for framework name instead of step output interpolation
- Scripts always execute from main's copy, never from PR branch

This prevents cache poisoning and untrusted code execution because
the Python script is always sourced from the default branch.

CodeQL flags privileged checkout of untrusted code. Fix by using
the default GITHUB_TOKEN (read-only) for the workdir checkout, then
configuring the push remote URL with the app token separately.
This way the checkout itself is unprivileged.

Use actions/checkout only for main (trusted). Switch to PR branch
via git fetch + git checkout after copying agent-fix.py to /tmp.
This avoids a second actions/checkout with a non-default ref, which
CodeQL flags as untrusted code execution in workflow_run context.

Push explicitly targets HEAD:$HEAD_BRANCH to be clear about destination.

CodeQL flags actions/cache/save after switching to untrusted branch
as potential cache poisoning. Remove cache entirely — rely solely on
git commit counting for loop prevention. The tradeoff is ~30s of
runner time on post-exhaustion triggers (acceptable for our volume).

- Add concurrency group per PR branch so only one agent runs at a time
  when multiple PR workflows (ec2 + sagemaker) fail simultaneously.
  cancel-in-progress: true means the later trigger wins.
- Add warning log when string replace matches multiple occurrences
  (still replaces all, but logs for visibility).

Major improvements aligned with industry best practices (Aider, Cline,
Codex CLI):

1. Search/Replace block format — same format used by Claude, Aider, and
   Cline. LLM returns filepath + SEARCH/REPLACE delimited blocks instead
   of JSON. More natural for the model, better accuracy.

2. Fuzzy matching — three-layer strategy:
   - Exact string match
   - Whitespace-normalized match
   - Fuzzy line-by-line match (difflib, >= 0.8 threshold)

3. Retry loop with error feedback — if edits fail to apply or response
   is unparseable, retries up to 3 times with detailed error context
   (what was searched, what the file actually contains). This mirrors
   Aider's approach of showing the LLM its mistake.

4. System/user prompt separation — system prompt defines format rules
   and constraints, user prompt provides the specific context. Cleaner
   for the model.

5. Partial success handling — if some edits apply and others fail,
   keeps the successful ones and logs warnings.
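
The partial-success behavior in item 5 might look roughly like this. It is a sketch, not the script's code: `apply_all` and the logger name are hypothetical, and `apply_one` stands in for whatever function applies a single edit and reports whether it matched.

```python
import logging
from typing import Callable, Sequence, TypeVar

E = TypeVar("E")
log = logging.getLogger("agent-fix")  # hypothetical logger name


def apply_all(
    edits: Sequence[E],
    apply_one: Callable[[E], bool],
) -> tuple[list[E], list[E]]:
    """Apply each edit independently and return (applied, failed)."""
    applied: list[E] = []
    failed: list[E] = []
    for edit in edits:
        if apply_one(edit):
            applied.append(edit)
        else:
            # Keep going: one bad block should not discard the good ones.
            log.warning("edit did not apply: %r", edit)
            failed.append(edit)
    return applied, failed
```

A caller could commit when `applied` is non-empty and pass `failed` back to the model as error context on the next attempt.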