feat(autocurrency): add agent-driven currency fix loop by Eren-Jeager123 · Pull Request #6054 · aws/deep-learning-containers

Eren-Jeager123 · 2026-05-07T00:56:07Z

Summary

Add an automated agent that diagnoses and fixes CI failures on auto-update PRs (vLLM/SGLang currency bumps).

What This PR Adds

| File | Purpose |
| --- | --- |
| `.github/workflows/agent-currency-fix.yml` | Workflow triggered by `workflow_run` when PR CI fails on `auto-update/*` branches |
| `scripts/autocurrency/agent-fix.py` | Python script that reads CI logs, calls Bedrock Claude Opus 4.6, and applies file edits |

How It Works

PR CI fails → workflow_run fires → validate branch name (regex)
→ checkout main (trusted) → copy agent-fix.py to /tmp
→ git switch to PR branch → count [agent-fix] commits
→ if < 3: call Bedrock → parse search/replace blocks → apply with fuzzy match
  → if apply fails: retry LLM with error context (up to 3 retries)
  → if apply succeeds: commit → push → PR CI re-runs
→ if >= 3: comment on PR, escalate to human

Edit Format: Search/Replace Blocks (Industry Standard)

Uses the same format as Aider, Cline, and Claude's native editing:

docker/vllm/Dockerfile
<<<<<<< SEARCH
  "pillow>=11.0.0" \
  "xgrammar>=0.1.30" \
=======
  "pillow>=12.1.1" \
  "xgrammar>=0.1.32" \
>>>>>>> REPLACE

Why this format: Claude is trained on it, it avoids line numbers (LLMs count lines unreliably), its delimiters are unambiguous, and other tools have converged on it.
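
To make the block format concrete, here is a minimal parsing sketch. It is not the code in `agent-fix.py`; the names `EditBlock`, `BLOCK_RE`, and `parse_blocks` are hypothetical, and it assumes each block is preceded by a single line holding the file path, as in the example above.

```python
import re
from typing import NamedTuple


class EditBlock(NamedTuple):
    """One parsed edit (hypothetical structure, for illustration only)."""
    path: str
    search: str
    replace: str


# Assumed layout: a path line, then the SEARCH / ======= / REPLACE markers.
BLOCK_RE = re.compile(
    r"^(?P<path>[^\n]+)\n"
    r"<<<<<<< SEARCH\n"
    r"(?P<search>.*?)"
    r"^=======\n"
    r"(?P<replace>.*?)"
    r"^>>>>>>> REPLACE[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def parse_blocks(response: str) -> list[EditBlock]:
    """Extract every search/replace block found in a model response."""
    return [
        EditBlock(m["path"].strip(), m["search"], m["replace"])
        for m in BLOCK_RE.finditer(response)
    ]
```

An empty result from `parse_blocks` is the natural signal for the "response is unparseable" case described in the retry loop.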

Matching (3 layers): exact → whitespace-normalized → fuzzy (difflib >= 0.8)
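
The three layers can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `find_span` and `apply_edit` are hypothetical names, the search is done over line windows of the same length as the SEARCH text, and the fuzzy layer scores each window as a whole with `difflib.SequenceMatcher` rather than strictly line by line.

```python
import difflib


def _normalize(line: str) -> str:
    """Collapse runs of whitespace and strip both ends."""
    return " ".join(line.split())


def find_span(file_lines, search_lines, threshold=0.8):
    """Return (start, end, layer) for the best matching window, or None."""
    n = len(search_lines)
    if n == 0 or n > len(file_lines):
        return None
    windows = [(i, file_lines[i:i + n]) for i in range(len(file_lines) - n + 1)]

    # Layer 1: exact line-for-line match.
    for i, window in windows:
        if window == search_lines:
            return i, i + n, "exact"

    # Layer 2: match after normalizing whitespace on every line.
    wanted = [_normalize(line) for line in search_lines]
    for i, window in windows:
        if [_normalize(line) for line in window] == wanted:
            return i, i + n, "whitespace"

    # Layer 3: fuzzy match; keep the highest-scoring window above the threshold.
    target = "\n".join(search_lines)
    best_start, best_ratio = None, 0.0
    for i, window in windows:
        ratio = difflib.SequenceMatcher(None, "\n".join(window), target).ratio()
        if ratio > best_ratio:
            best_start, best_ratio = i, ratio
    if best_start is not None and best_ratio >= threshold:
        return best_start, best_start + n, "fuzzy"
    return None


def apply_edit(text: str, search: str, replace: str):
    """Return the edited text, or None if no layer finds the SEARCH text."""
    file_lines = text.splitlines()
    span = find_span(file_lines, search.splitlines())
    if span is None:
        return None
    start, end, _layer = span
    new_lines = file_lines[:start] + replace.splitlines() + file_lines[end:]
    return "\n".join(new_lines) + ("\n" if text.endswith("\n") else "")
```

Trying the layers in order means a loose match is only accepted when no stricter one exists, which limits the chance of a fuzzy hit landing on the wrong lines.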

Retry loop: If edits fail to apply or response is unparseable, retries up to 3 times with error feedback showing what the file actually contains (same approach as Aider).
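
A rough sketch of the retry loop with error feedback, assuming a cap of three model calls per run. `call_model` stands in for the Bedrock call and `try_apply` for the parse-and-apply step; both are hypothetical callables, as is the wording of the feedback message.

```python
from typing import Callable, Optional


def fix_with_retries(
    call_model: Callable[[str], str],
    try_apply: Callable[[str], Optional[str]],
    prompt: str,
    max_attempts: int = 3,
) -> bool:
    """Call the model until its edits apply or the attempt budget runs out.

    try_apply returns None on success, or an error string describing what
    went wrong (for example, the SEARCH text and the actual file content).
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        response = call_model(prompt + feedback)
        error = try_apply(response)
        if error is None:
            return True
        # Show the model its mistake on the next attempt.
        feedback = (
            f"\n\nAttempt {attempt} failed: {error}\n"
            "Return corrected SEARCH/REPLACE blocks."
        )
    return False
```

Passing the two steps in as callables keeps the loop testable without network access.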

Security Architecture

CodeQL compliant:

actions/checkout only checks out main (trusted, default branch)
PR branch switched via git fetch + git checkout (not actions/checkout)
agent-fix.py always executed from main's copy (/tmp/agent-fix.py)
Branch name validated against strict regex ^auto-update/[a-z]+-[0-9]+\.[0-9]+\.[0-9]+$
All event data stored in env vars (no direct ${{ }} interpolation in run: blocks)
No actions/cache usage (avoids cache poisoning)
Push explicitly targets HEAD:$HEAD_BRANCH
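
For illustration, the branch-name check can be expressed in Python as below. The regex is the one quoted in the list above; the function name `is_trusted_branch` and the example version numbers are hypothetical, and the workflow itself may perform this check in shell rather than Python.

```python
import re

# Pattern taken from the PR description.
BRANCH_RE = re.compile(r"^auto-update/[a-z]+-[0-9]+\.[0-9]+\.[0-9]+$")


def is_trusted_branch(name: str) -> bool:
    """True only for bot-style branch names such as auto-update/vllm-1.2.3."""
    # fullmatch rejects a trailing newline, which a bare "$" would tolerate.
    return BRANCH_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("auto-update/vllm-0.9.1", "auto-update/x-1.2.3; rm -rf /"):
        print(candidate, is_trusted_branch(candidate))
```

Because the pattern allows only lowercase letters, digits, dots, one slash, and one hyphen, a name that passes cannot carry shell metacharacters.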

Loop Prevention

Two levels:

Outer loop (across CI cycles): Git commit counting — max 3 [agent-fix] commits per PR branch
Inner loop (within one workflow run): LLM retry — max 3 Bedrock calls if response is unparseable or edits fail to apply
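
The outer limit amounts to counting tagged commits on the PR branch. A minimal sketch, assuming the tag appears in the commit subject and that the branch is compared against `origin/main`; the function names and the base ref are assumptions, not taken from the script.

```python
import subprocess

MAX_FIX_COMMITS = 3  # limit stated in the PR description


def branch_subjects(base: str = "origin/main") -> list[str]:
    """Commit subjects on the current branch that are not on the base ref."""
    result = subprocess.run(
        ["git", "log", "--format=%s", f"{base}..HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def count_agent_commits(subjects: list[str], tag: str = "[agent-fix]") -> int:
    """Number of commit subjects carrying the agent tag."""
    return sum(1 for subject in subjects if tag in subject)


def should_escalate(subjects: list[str]) -> bool:
    """True once the agent has used up its fix attempts on this branch."""
    return count_agent_commits(subjects) >= MAX_FIX_COMMITS
```

Keeping the counter in git history means the limit survives across workflow runs without any external state.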

Key Design Decisions

Trigger: workflow_run + startsWith(branch, 'auto-update/') — branch name over PR label (payload doesn't include labels)
Edit format: Search/replace blocks — industry standard, highest LLM accuracy, fuzzy matching fallback
Scope: Ubuntu variants only (PR - vLLM EC2/SageMaker, PR - SGLang EC2/SageMaker). Expand after testing.
Concurrency: One agent per branch (cancel-in-progress: true). Multiple failing workflows → last one wins.
Safety: Independent workflow, not a required check, cannot merge, human approval required.

Dependencies

IAM change: Adds bedrock:InvokeModel permission to CodeBuild runner role (separate infra PR)

Testing

workflow_run can only trigger from the default branch — cannot test from feature branch
Risk to existing CI is minimal: the workflow is independent and cannot affect PR merge status
Will validate on the next real auto-update PR failure

Add a workflow and script that automatically diagnoses and fixes CI failures on auto-update PRs for vLLM and SGLang.

Architecture:
- workflow_run triggers on PR CI failure for auto-update/* branches
- Circuit breaker via GitHub Actions cache prevents infinite loops
- Git commit counting limits to 3 fix attempts before escalating
- Calls Bedrock Claude Opus 4.6 to reason about failures
- Applies returned file edits, commits, and pushes to PR branch
- PR CI re-runs naturally on new commit

Trigger decision: uses startsWith(branch, 'auto-update/') filter rather than PR label check because workflow_run payload doesn't include labels and the branch naming is deterministic.

Scope: Ubuntu variants only (PR - vLLM EC2, PR - vLLM SageMaker, PR - SGLang EC2, PR - SGLang SageMaker). Will expand to amzn2023 variants after testing.

- Validate branch name against strict regex before use
- Move event data to env vars to prevent shell injection via crafted branch names (CodeQL workflow_run injection rule)
- Add security comment explaining privileged checkout is safe because branch name is validated to bot-created pattern only
- Remove concurrency group using interpolated branch name

Address remaining CodeQL findings:
- Checkout main (trusted) separately for agent-fix.py execution
- Checkout PR branch (workdir) separately for data modification
- Use GITHUB_ENV for framework name instead of step output interpolation
- Scripts always execute from main's copy, never from PR branch

This prevents cache poisoning and untrusted code execution because the Python script is always sourced from the default branch.

CodeQL flags privileged checkout of untrusted code. Fix by using the default GITHUB_TOKEN (read-only) for the workdir checkout, then configuring the push remote URL with the app token separately. This way the checkout itself is unprivileged.

Use actions/checkout only for main (trusted). Switch to PR branch via git fetch + git checkout after copying agent-fix.py to /tmp. This avoids a second actions/checkout with a non-default ref, which CodeQL flags as untrusted code execution in workflow_run context. Push explicitly targets HEAD:$HEAD_BRANCH to be clear about destination.

CodeQL flags actions/cache/save after switching to untrusted branch as potential cache poisoning. Remove cache entirely — rely solely on git commit counting for loop prevention. The tradeoff is ~30s of runner time on post-exhaustion triggers (acceptable for our volume).

- Add concurrency group per PR branch so only one agent runs at a time when multiple PR workflows (ec2 + sagemaker) fail simultaneously. cancel-in-progress: true means the later trigger wins.
- Add warning log when string replace matches multiple occurrences (still replaces all, but logs for visibility).
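
The multiple-occurrence warning could look roughly like this. It is a sketch only: the function name, logger name, and message text are hypothetical, and raising `ValueError` on a missing match is an assumption about how the failure is reported.

```python
import logging

logger = logging.getLogger("agent-fix")  # hypothetical logger name


def replace_all_with_warning(text: str, search: str, replace: str, path: str = "<unknown>"):
    """Replace every occurrence of search; warn when there is more than one.

    Returns (new_text, occurrence_count).
    """
    if not search:
        raise ValueError("search text must not be empty")
    count = text.count(search)
    if count == 0:
        raise ValueError(f"search text not found in {path}")
    if count > 1:
        # Still replaces all occurrences; the log is only for visibility.
        logger.warning("%s: search text matched %d times; replacing all", path, count)
    return text.replace(search, replace), count
```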

Major improvements aligned with industry best practices (Aider, Cline, Codex CLI):

1. Search/Replace block format: same format used by Claude, Aider, and Cline. LLM returns filepath + SEARCH/REPLACE delimited blocks instead of JSON. More natural for the model, better accuracy.
2. Fuzzy matching: three-layer strategy:
   - Exact string match
   - Whitespace-normalized match
   - Fuzzy line-by-line match (difflib, >= 0.8 threshold)
3. Retry loop with error feedback: if edits fail to apply or response is unparseable, retries up to 3 times with detailed error context (what was searched, what the file actually contains). This mirrors Aider's approach of showing the LLM its mistake.
4. System/user prompt separation: system prompt defines format rules and constraints, user prompt provides the specific context. Cleaner for the model.
5. Partial success handling: if some edits apply and others fail, keeps the successful ones and logs warnings.

aws-deep-learning-containers-ci Bot added the authorized label May 7, 2026

github-advanced-security AI found potential problems May 7, 2026

Eren-Jeager123 force-pushed the feat/currency-fix-agent branch from 3b02df0 to 1f1ec80 Compare May 7, 2026 00:57

github-advanced-security AI found potential problems May 7, 2026

Comment thread .github/workflows/agent-currency-fix.yml Fixed

github-advanced-security AI found potential problems May 7, 2026

Comment thread .github/workflows/agent-currency-fix.yml Fixed

Eren-Jeager123 added 5 commits May 7, 2026 01:08

Eren-Jeager123 wants to merge 8 commits into main from feat/currency-fix-agent

2 participants
