feat(autocurrency): add agent-driven currency fix loop#6054
Open
Eren-Jeager123 wants to merge 8 commits intomainfrom
Open
feat(autocurrency): add agent-driven currency fix loop#6054Eren-Jeager123 wants to merge 8 commits intomainfrom
Eren-Jeager123 wants to merge 8 commits intomainfrom
Conversation
Add a workflow and script that automatically diagnoses and fixes CI failures on auto-update PRs for vLLM and SGLang. Architecture: - workflow_run triggers on PR CI failure for auto-update/* branches - Circuit breaker via GitHub Actions cache prevents infinite loops - Git commit counting limits to 3 fix attempts before escalating - Calls Bedrock Claude Opus 4.6 to reason about failures - Applies returned file edits, commits, and pushes to PR branch - PR CI re-runs naturally on new commit Trigger decision: uses startsWith(branch, 'auto-update/') filter rather than PR label check because workflow_run payload doesn't include labels and the branch naming is deterministic. Scope: Ubuntu variants only (PR - vLLM EC2, PR - vLLM SageMaker, PR - SGLang EC2, PR - SGLang SageMaker). Will expand to amzn2023 variants after testing.
3b02df0 to
1f1ec80
Compare
- Validate branch name against strict regex before use - Move event data to env vars to prevent shell injection via crafted branch names (CodeQL workflow_run injection rule) - Add security comment explaining privileged checkout is safe because branch name is validated to bot-created pattern only - Remove concurrency group using interpolated branch name
Address remaining CodeQL findings: - Checkout main (trusted) separately for agent-fix.py execution - Checkout PR branch (workdir) separately for data modification - Use GITHUB_ENV for framework name instead of step output interpolation - Scripts always execute from main's copy, never from PR branch This prevents cache poisoning and untrusted code execution because the Python script is always sourced from the default branch.
CodeQL flags privileged checkout of untrusted code. Fix by using the default GITHUB_TOKEN (read-only) for the workdir checkout, then configuring the push remote URL with the app token separately. This way the checkout itself is unprivileged.
Use actions/checkout only for main (trusted). Switch to PR branch via git fetch + git checkout after copying agent-fix.py to /tmp. This avoids a second actions/checkout with a non-default ref, which CodeQL flags as untrusted code execution in workflow_run context. Push explicitly targets HEAD:$HEAD_BRANCH to be clear about destination.
CodeQL flags actions/cache/save after switching to untrusted branch as potential cache poisoning. Remove cache entirely — rely solely on git commit counting for loop prevention. The tradeoff is ~30s of runner time on post-exhaustion triggers (acceptable for our volume).
- Add concurrency group per PR branch so only one agent runs at a time when multiple PR workflows (ec2 + sagemaker) fail simultaneously. cancel-in-progress: true means the later trigger wins. - Add warning log when string replace matches multiple occurrences (still replaces all, but logs for visibility).
Major improvements aligned with industry best practices (Aider, Cline, Codex CLI): 1. Search/Replace block format — same format used by Claude, Aider, and Cline. LLM returns filepath + SEARCH/REPLACE delimited blocks instead of JSON. More natural for the model, better accuracy. 2. Fuzzy matching — three-layer strategy: - Exact string match - Whitespace-normalized match - Fuzzy line-by-line match (difflib, >= 0.8 threshold) 3. Retry loop with error feedback — if edits fail to apply or response is unparseable, retries up to 3 times with detailed error context (what was searched, what the file actually contains). This mirrors Aider's approach of showing the LLM its mistake. 4. System/user prompt separation — system prompt defines format rules and constraints, user prompt provides the specific context. Cleaner for the model. 5. Partial success handling — if some edits apply and others fail, keeps the successful ones and logs warnings.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add an automated agent that diagnoses and fixes CI failures on auto-update PRs (vLLM/SGLang currency bumps).
What This PR Adds
.github/workflows/agent-currency-fix.ymlworkflow_runwhen PR CI fails onauto-update/*branchesscripts/autocurrency/agent-fix.pyHow It Works
Edit Format: Search/Replace Blocks (Industry Standard)
Uses the same format as Aider, Cline, and Claude's native editing:
Why this format: Claude is trained on it, avoids line numbers (LLMs can't count), clear delimiters, industry convergence.
Matching (3 layers): exact → whitespace-normalized → fuzzy (difflib >= 0.8)
Retry loop: If edits fail to apply or response is unparseable, retries up to 3 times with error feedback showing what the file actually contains (same approach as Aider).
Security Architecture
CodeQL compliant:
actions/checkoutonly checks outmain(trusted, default branch)git fetch+git checkout(notactions/checkout)agent-fix.pyalways executed from main's copy (/tmp/agent-fix.py)^auto-update/[a-z]+-[0-9]+\.[0-9]+\.[0-9]+$${{ }}interpolation inrun:blocks)actions/cacheusage (avoids cache poisoning)HEAD:$HEAD_BRANCHLoop Prevention
Two levels:
[agent-fix]commits per PR branchKey Design Decisions
workflow_run+startsWith(branch, 'auto-update/')— branch name over PR label (payload doesn't include labels)cancel-in-progress: true). Multiple failing workflows → last one wins.Dependencies
bedrock:InvokeModelpermission to CodeBuild runner role (separate infra PR)Testing
workflow_runcan only trigger from the default branch — cannot test from feature branch