[UPDATE PRIMITIVE] Fix transient HTTP 503 failures in install-packs.sh via exponential backoff retry #121
Merged
data-douser merged 3 commits into dd/no-grep-or-bust from Mar 10, 2026
Conversation
The GitHub Actions integration test was failing on windows-latest with HTTP 503 "Egress is over the account limit" when downloading CodeQL packs from GHCR.io. Add a run_with_retry() helper function that retries a command up to 3 times with exponential backoff (10s, 20s, 40s). Both codeql pack install calls in install_packs() now use run_with_retry to handle transient network errors gracefully. Co-authored-by: data-douser <70299490+data-douser@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix failing GitHub Actions workflow for integration tests" to "[UPDATE PRIMITIVE] Fix transient HTTP 503 failures in install-packs.sh via exponential backoff retry" on Mar 10, 2026
data-douser added a commit that referenced this pull request on Mar 11, 2026
… to avoid LLM use of `grep` (#119)

* Resolve database lock contention w/ vscode-codeql

  Resolves #117

  Fixes a known compatibility issue for databases added, and therefore locked, via the GitHub.vscode-codeql extension. The vscode-codeql query server creates .lock files in the cache directory of every registered CodeQL database, preventing the ql-mcp server from running CLI commands (codeql_query_run, codeql_database_analyze) against those same databases.

  Add a DatabaseCopier that syncs databases from vscode-codeql storage into a managed directory under the `vscode-codeql-development-mcp-server` extension's globalStorage, stripping .lock files from the copy. The EnvironmentBuilder now sets CODEQL_DATABASES_BASE_DIRS to this managed directory by default (configurable via codeql-mcp.copyDatabases).

  - New DatabaseCopier class with incremental sync (skips unchanged databases)
  - StoragePaths.getManagedDatabaseStoragePath() for the managed databases/ dir
  - EnvironmentBuilder accepts injectable DatabaseCopierFactory for testability
  - codeql-mcp.copyDatabases setting (default: true)
  - 11 unit tests for DatabaseCopier (real filesystem operations)
  - 15 unit tests for EnvironmentBuilder (updated for copy mode + fallback)
  - 3 bridge integration tests (managed dir structure, no .lock files)
  - 4 E2E integration tests: inject .lock → copy → codeql_query_run + codeql_database_analyze succeed against the lock-free copy

* Address PR review comments

* Address more PR review comments

* Add search_ql_code and codeql_resolve_files tools

  Add search_ql_code and codeql_resolve_files tools in order to eliminate grep/CLI dependencies.

  - New tools: search_ql_code (QL text/regex search) and codeql_resolve_files (file discovery by extension/glob) so LLMs never need shell access
  - Rewrite profile_codeql_query_from_logs with two-tier design: compact inline JSON + line-indexed detail file for targeted read_file access; parser now captures RA operations and pipeline-stage tuple progressions
  - Fix codeql_resolve_database to probe child directories for databases
  - Remove all grep/CLI references from prompts and resources
  - Cross-platform: normalize \r\n line endings in parser and search tool

* Add "after" files for query evaluation integration tests

* address Code Scanning TOCTOU race and PR review feedback

  - Eliminate filesystem race condition in search-ql-code.ts (read-then-check instead of stat-then-read)
  - Add symlink cycle detection using lstatSync and visited-path tracking
  - Fix tool description field names in profile-codeql-query-from-logs.ts ({startLine,endLine} → detailLines: {start,end})
  - Fix monitoring-state.json fixtures to use standard sessions format
  - Rename find_qll_files → find_ql_files to match actual .ql extension

* Stream large files instead of loading into memory

  - addresses latest review feedback for PR #119
  - search-ql-code: check file size via lstatSync before reading; stream large files (>5 MB) line-by-line instead of skipping them
  - evaluator-log-parser: replace readFileSync with streaming async generator (createReadStream + readline) for brace-depth JSON parsing; parseEvaluatorLog now reads the file once instead of twice
  - profile-codeql-query: convert local parser to streaming with Map-based lookups instead of O(n) events.find()
  - database-copier: use lstat in removeLockFiles to skip symlinks; throw on fatal mkdir failures for proper fallback in EnvironmentBuilder
  - Validate contextLines/maxResults with schema bounds and clamping
  - Add environment-builder test for syncAll-throws fallback

* Fix tool issues found during explain-codeql-query workflow testing

  - search_ql_code: add missing await in tool handler; skip .codeql, node_modules, and .git directories to avoid duplicate results from compiled pack caches
  - cli-tool-registry: extract resolveDatabasePath helper for multi-language DB root auto-resolution; apply to codeql_query_run, codeql_database_analyze, and codeql_resolve_database
  - environment-builder: route CODEQL_MCP_TMP_DIR to workspace-local .codeql/ql-mcp scratch directory (configurable via scratchDir setting); add CODEQL_MCP_WORKSPACE_FOLDERS env var
  - query-file-finder: add contextual hints array for missing tests, documentation, and expected results

* [UPDATE PRIMITIVE] Fix transient HTTP 503 failures in install-packs.sh via exponential backoff retry (#121)

  * Initial plan

  * fix: add retry logic with exponential backoff to install-packs.sh

    The GitHub Actions integration test was failing on windows-latest with HTTP 503 "Egress is over the account limit" when downloading CodeQL packs from GHCR.io.

    Add a run_with_retry() helper function that retries a command up to 3 times with exponential backoff (10s, 20s, 40s). Both codeql pack install calls in install_packs() now use run_with_retry to handle transient network errors gracefully.

    Co-authored-by: data-douser <70299490+data-douser@users.noreply.github.com>

  ---------

  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: data-douser <70299490+data-douser@users.noreply.github.com>

* deterministic profiler output and search efficiency

  - addresses latest feedback for PR #119
  - profile-codeql-query-from-logs: remove non-deterministic `Generated:` timestamp from detail file header to ensure reproducible output for integration test fixtures
  - search-ql-code: early-exit file processing once maxResults matches are collected; subsequent files are scanned cheaply for totalMatches count only, avoiding large array allocations and context extraction

* Fix TOCTOU bug for search_ql_code tool

* Stream-count large files & detect ambiguous DB paths

  - search-ql-code: use streaming (readline) for totalMatches counting on large files in the early-exit path; eliminates TOCTOU race from prior lstatSync check
  - cli-tool-registry: resolveDatabasePath now collects all candidate children and throws on ambiguity instead of silently picking the first
  - Add tests for cross-file totalMatches accuracy under truncation, single-child DB auto-resolve, and multi-child DB ambiguity error

* Address latest PR review comments

* Use fstatSync(fd) to avoid OOM w/ searchFile

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
The `windows-latest` integration test job was failing non-deterministically when `codeql pack install` hit HTTP 503 "Egress is over the account limit" from GHCR.io, a transient rate-limit error with no recovery path.
📝 Update Information
Primitive Details
`server/scripts/install-packs.sh`
✅ ALLOWED FILES:
`server/scripts/install-packs.sh`: retry logic added to pack installation
🚫 FORBIDDEN FILES: None included.
🛑 MANDATORY PR VALIDATION CHECKLIST
Update Metadata
🎯 Changes Description
Current Behavior
`codeql pack install` is called directly. Any non-zero exit, including transient GHCR.io 503s, causes the step to fail immediately with no retry.
Updated Behavior
Both `codeql pack install` calls are wrapped in `run_with_retry 3 10`, which retries up to 3 times with exponential backoff (10s → 20s → 40s), logging a warning on each failure and a hard error only after all attempts are exhausted.
Motivation
GHCR.io returns HTTP 503 "Egress is over the account limit" under transient load. This is a recoverable error; retrying with backoff is sufficient to resolve it without any code changes.
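The delay sequence quoted under Updated Behavior follows from doubling the initial delay after each failed attempt. A minimal sketch of that arithmetic, using only shell arithmetic expansion; the function name `backoff_schedule` and its parameters are illustrative and are not taken from install-packs.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of install-packs.sh): print the first
# <count> delays of a doubling backoff sequence starting at
# <initial_delay> seconds.
backoff_schedule() {
  local count="$1"
  local delay="$2"
  local i=1
  local out=""
  while [ "$i" -le "$count" ]; do
    out="${out}${delay}s "
    delay=$((delay * 2))   # 2x backoff
    i=$((i + 1))
  done
  # Drop the trailing space before printing.
  printf '%s\n' "${out% }"
}

backoff_schedule 3 10
```

Here `backoff_schedule 3 10` prints `10s 20s 40s`. Whether the third delay is ever slept depends on whether the helper sleeps after its final failed attempt, which the description does not settle.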
🔄 Before vs. After Comparison
Functionality Changes
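The change can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in the PR: the argument order (max attempts, then initial delay) is inferred from `run_with_retry 3 10`, while the message wording, the `${parent_dir}/src` and `${parent_dir}/test` layout, and the `CODEQL_BIN` override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the real helper in install-packs.sh may differ in detail.

# run_with_retry <max_attempts> <initial_delay_seconds> <command> [args...]
# Runs the command until it succeeds or max_attempts is reached, doubling
# the delay after every failed attempt.
run_with_retry() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  local status=0

  while :; do
    if "$@"; then
      return 0
    else
      status=$?
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "ERROR: command failed after ${max_attempts} attempts: $*" >&2
      return "$status"
    fi
    echo "WARNING: attempt ${attempt}/${max_attempts} failed (exit ${status}); retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Before (per the description): each install ran once, so a single 503
# failed the step.
#   codeql pack install "${parent_dir}/src"
#   codeql pack install "${parent_dir}/test"

# After: both installs go through the retry helper.
# CODEQL_BIN is a hypothetical override so the sketch can run against a stub.
install_packs() {
  local parent_dir="$1"
  run_with_retry 3 10 "${CODEQL_BIN:-codeql}" pack install "${parent_dir}/src" || return $?
  run_with_retry 3 10 "${CODEQL_BIN:-codeql}" pack install "${parent_dir}/test"
}
```

In this sketch the helper returns the exit status of the last failed attempt, so callers still see a non-zero result once retries are exhausted. It sleeps only between attempts, so three attempts starting at 10s wait 10s and then 20s.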
API Changes
No API changes — script interface is identical.
Output Format Changes
No output format changes. Additional `WARNING:` lines are emitted on retried attempts; `ERROR:` is emitted only on total failure.
🧪 Testing & Validation
Test Coverage Updates
Validation Scenarios
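One way to exercise the retry path without live GHCR.io access is to put a stub `codeql` on `PATH` that fails a fixed number of times before succeeding. A minimal sketch; `make_flaky_codeql`, the call-counter file, and the exit code are hypothetical and not part of the PR, and the stub ignores its arguments:

```shell
#!/usr/bin/env bash
# Hypothetical test aid: create <stub_dir>/codeql, a fake CLI that fails its
# first <fail_times> invocations with a 503-style message, then succeeds.
make_flaky_codeql() {
  local stub_dir="$1"
  local fail_times="$2"
  mkdir -p "$stub_dir"
  echo 0 > "$stub_dir/calls"
  # Unquoted heredoc: $stub_dir and $fail_times are expanded now, while
  # the escaped \$ variables are evaluated each time the stub runs.
  cat > "$stub_dir/codeql" <<EOF
#!/bin/sh
count=\$(cat "$stub_dir/calls")
count=\$((count + 1))
echo "\$count" > "$stub_dir/calls"
if [ "\$count" -le "$fail_times" ]; then
  echo "HTTP/1.1 503 Egress is over the account limit" >&2
  exit 1
fi
exit 0
EOF
  chmod +x "$stub_dir/codeql"
}

# Example (commented out so the snippet has no side effects when sourced):
#   stub_dir="$(mktemp -d)"
#   make_flaky_codeql "$stub_dir" 2
#   PATH="$stub_dir:$PATH" server/scripts/install-packs.sh
```

With a failure count below the attempt limit, the script should log `WARNING:` lines and then succeed; with a failure count at or above the limit, it should end with `ERROR:`. This assumes the script resolves `codeql` through `PATH` and makes no other `codeql` calls that the stub would distort. The real backoff delays still apply during such a run.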
Test Results
`codeql_pack_install/install_pack` requires live GHCR.io (blocked in sandbox, pre-existing)
📋 Implementation Details
Files Modified
`server/scripts/install-packs.sh`
Code Changes Summary
`run_with_retry` helper with exponential backoff
Dependencies
`sleep`, arithmetic expansion)
🔍 Quality Improvements
Bug Fixes
`install-packs.sh` fails non-deterministically on GHCR.io 503 egress limit errors
`codeql pack install` exits non-zero on HTTP 503; script had no recovery mechanism
`run_with_retry` (3 attempts, 10s initial delay, 2× backoff)
Performance Improvements
Code Quality Enhancements
`run_with_retry` is a named, documented helper, so its intent is clear
`run_with_retry` can wrap any future CLI call in the script
🔗 References
Related Issues/PRs
66475926194: Integration Tests (windows-latest, http), run 22909057614
External References
Validation Materials
`HTTP/1.1 503 Egress is over the account limit` on `codeql/ssa` blob fetch
🚀 Compatibility & Migration
Backward Compatibility
API Evolution
`install-packs.sh` CLI interface unchanged
👥 Review Guidelines
For Reviewers
Please verify:
`install-packs.sh` modified
`src` and `test` pack installs
`WARNING` + `ERROR` messaging on failure paths
Testing Instructions
Validation Checklist
`ERROR:` after 3 exhausted attempts
📊 Impact Assessment
Performance Impact
Server Impact
`sleep` during backoff)
AI Assistant Impact
🔄 Deployment Strategy
Rollout Considerations
`run_with_retry` helper and restore direct `codeql pack install` calls
Post-Deployment Validation
`windows-latest` integration test jobs for sustained 503 failures (would indicate quota issue needing a different solution)
Update Methodology: This update follows best practices: