fix(containers): apt install fallback to archive.ubuntu.com #5266

Merged: lpcox merged 8 commits into main from fix/apt-mirror-install-fallback on Jun 19, 2026

@lpcox (Collaborator) commented Jun 18, 2026

Problem

The agent and squid Dockerfiles rewrite the apt mirror to azure.archive.ubuntu.com (normally much faster on Azure-hosted GitHub runners). They already retry on transient failures, but the fallback to archive.ubuntu.com only triggered when apt-get update reported Failed to fetch — i.e. metadata failures.

When the Azure mirror's metadata was reachable but the package downloads timed out during apt-get install:

E: Failed to fetch http://azure.archive.ubuntu.com/.../curl_..._amd64.deb  Connection timed out

the retry path re-ran apt_update_retry (which succeeded, leaving sources still pointed at azure) and retried the install against the same failing mirror → slow backoff retries and ultimately a hard build failure.

This is what intermittently killed the smoke-claude agent job (the build runs lazily inside the Execute … CLI step under --build-local), and it is also what caused the slow azure pulls observed in CI.

Fix

  • Extract the mirror rewrite into a shared force_archive_mirror helper.
  • Have apt_update_retry reuse it.
  • Add apt_install_retry / apt_upgrade_retry that force the archive.ubuntu.com mirror before retrying, covering the install and upgrade phases — not just update.

A flaky Azure mirror now transparently falls back to archive.ubuntu.com (already in the firewall allowlist) instead of failing the build. Applied consistently across all apt blocks in both Dockerfiles (the iptables-init container is built from the agent image, so it's covered; api-proxy is alpine/apk and unaffected).
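The PR page does not show the final helper bodies, so the following is only a minimal sketch of how force_archive_mirror and apt_install_retry could fit together, assuming a POSIX /bin/sh and GNU sed. The security.ubuntu.com rewrite in the deb822 branch reflects the later review fix. APT_ROOT, APT_GET and RETRY_DELAY are hypothetical overrides added so the sketch can run outside a container; the attempt count and delays are assumptions too.

```shell
#!/bin/sh
# Sketch only, not the PR's exact code. APT_ROOT, APT_GET and RETRY_DELAY are
# hypothetical overrides; the Dockerfiles use /etc/apt and apt-get directly.
APT_ROOT="${APT_ROOT:-/etc/apt}"
APT_GET="${APT_GET:-apt-get}"
RETRY_DELAY="${RETRY_DELAY:-10}"

# Point every apt source (Azure mirror and security host) at archive.ubuntu.com.
force_archive_mirror() {
  # Legacy one-line format.
  if [ -f "$APT_ROOT/sources.list" ]; then
    sed -i -e 's|http://azure.archive.ubuntu.com|http://archive.ubuntu.com|g' \
           -e 's|http://security.ubuntu.com|http://archive.ubuntu.com|g' \
           "$APT_ROOT/sources.list"
  fi
  # deb822 format (*.sources files) used by newer Ubuntu base images.
  if [ -d "$APT_ROOT/sources.list.d" ]; then
    find "$APT_ROOT/sources.list.d" -name '*.sources' -exec \
      sed -i -e 's|http://azure.archive.ubuntu.com|http://archive.ubuntu.com|g' \
             -e 's|http://security.ubuntu.com|http://archive.ubuntu.com|g' {} + \
      2>/dev/null || true
  fi
}

# Install packages; after a failed attempt, switch mirrors and refresh the
# package lists so the next attempt does not hit the same failing mirror.
apt_install_retry() {
  local i
  for i in 1 2 3; do
    if "$APT_GET" install -y "$@"; then
      return 0
    fi
    echo "apt-get install attempt $i/3 failed, switching to archive.ubuntu.com..." >&2
    force_archive_mirror
    "$APT_GET" update || true   # package lists are per mirror URI, so refresh them
    sleep $((i * RETRY_DELAY))
  done
  echo "apt-get install failed after 3 attempts" >&2
  return 1
}
```

Because every Dockerfile RUN instruction starts a fresh shell, functions defined in one RUN block are not visible in the next, which fits the helpers being repeated in each apt block. Inside a RUN block the call might look like `apt_install_retry --no-install-recommends curl ca-certificates`.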

Validation

  • All RUN blocks pass bash -n syntax checks (after mimicking Docker's comment stripping).
  • Functional simulation confirms the flow: install fails on azure → force_archive_mirror rewrites sources → install retried against archive → succeeds (exit 0).
  • src/services/{agent,squid}-service.test.ts still pass (40 tests); lint + build pass via pre-commit hook.
  • Docker daemon wasn't available locally to run a full image build; validation was via syntax + logic simulation + existing unit tests.
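The first Validation bullet mentions running bash -n on each RUN block after mimicking Docker's comment stripping. That script is not part of the PR page, so this is only a sketch of one way to do it. The function name check_run_blocks is hypothetical, and it assumes shell-form RUN instructions written in uppercase, the default backslash escape character, no heredocs, and no trailing whitespace after a continuation backslash.

```shell
#!/bin/sh
# Sketch: syntax-check every shell-form RUN block of a Dockerfile with 'bash -n'.
# check_run_blocks is a hypothetical helper name.
check_run_blocks() {
  local dir f status
  dir=$(mktemp -d)
  # Join continuation lines the way Docker does: drop comment lines inside a
  # continued instruction, strip the trailing '\', and concatenate directly.
  awk -v dir="$dir" '
    in_run && /^[ \t]*#/ { next }
    !in_run && /^RUN[ \t]/ { in_run = 1; n++; buf = ""; sub(/^RUN[ \t]+/, "") }
    in_run {
      line = $0
      if (sub(/\\$/, "", line)) { buf = buf line; next }
      out = dir "/run-" n ".sh"
      print buf line > out
      close(out)
      in_run = 0
    }
  ' "$1"
  status=0
  for f in "$dir"/run-*.sh; do
    [ -e "$f" ] || continue   # no RUN blocks found
    bash -n "$f" || { echo "syntax error in RUN block: $f" >&2; status=1; }
  done
  rm -rf "$dir"
  return "$status"
}
```

For example, a RUN block with a `# ...` line between two continued commands passes only because that line is dropped first; without the dropping step the joined command would end at the comment with a dangling `&&` and fail the check.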

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The agent and squid Dockerfiles rewrite the apt mirror to
azure.archive.ubuntu.com (faster on Azure-hosted GitHub runners). The
existing fallback to archive.ubuntu.com only triggered when 'apt-get
update' reported 'Failed to fetch' (metadata failures). When the Azure
mirror's metadata was reachable but the package (.deb) downloads timed
out during 'apt-get install', the retry path re-ran apt_update_retry
(which succeeded, leaving sources pointed at azure) and retried the
install against the same failing mirror, causing slow retries and
ultimately a hard build failure.

Extract the mirror rewrite into a shared force_archive_mirror helper,
have apt_update_retry reuse it, and add apt_install_retry /
apt_upgrade_retry that force the archive.ubuntu.com mirror before
retrying. This covers the install and upgrade phases, not just update,
so a flaky Azure mirror no longer fails the build (archive.ubuntu.com is
already in the firewall allowlist).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 18, 2026 22:37
github-actions bot commented Jun 18, 2026

✅ Coverage Check Passed

Overall Coverage

| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 97.54% | 97.58% | 📈 +0.04% |
| Statements | 97.47% | 97.50% | 📈 +0.03% |
| Functions | 98.85% | 98.85% | ➡️ +0.00% |
| Branches | 92.87% | 92.91% | 📈 +0.04% |

📁 Per-file Coverage Changes (1 file)

| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
| src/workdir-setup.ts | 92.7% → 94.5% (+1.82%) | 92.7% → 94.5% (+1.82%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts

Copilot AI left a comment

Pull request overview

This PR improves build reliability for the agent and squid container images by adding retry helpers that fall back from the Azure Ubuntu apt mirror to archive.ubuntu.com not only during apt-get update, but also when apt-get install/upgrade fails mid-download.

Changes:

  • Extracted apt mirror fallback logic into force_archive_mirror() and reused it from update retries.
  • Added apt_install_retry() (and apt_upgrade_retry() in the agent image) to force the archive mirror before retrying installs/upgrades.
  • Updated Dockerfile RUN blocks to use the new retry helpers consistently.
| File | Description |
|---|---|
| containers/squid/Dockerfile | Adds force_archive_mirror + apt_install_retry and wires them into the Squid image's apt flow. |
| containers/agent/Dockerfile | Adds force_archive_mirror + apt_install_retry + apt_upgrade_retry across multiple apt blocks to reduce flaky CI builds. |

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 9

Comment on lines 31 to 34
if [ -d /etc/apt/sources.list.d ]; then \
find /etc/apt/sources.list.d -name '*.sources' -exec \
sed -i 's|http://azure.archive.ubuntu.com|http://archive.ubuntu.com|g' {} + 2>/dev/null || true; \
fi; \
Comment on lines +38 to +45
local i; for i in 1 2 3; do \
rm -rf /var/lib/apt/lists/* && apt-get update 2>&1 | tee /tmp/apt-update.log && \
if ! grep -q "Failed to fetch" /tmp/apt-update.log; then return 0; fi; \
echo "apt-get update attempt $i/3 had fetch failures, retrying in $((i*10))s..." >&2; sleep $((i*10)); \
done; \
echo "All apt-get update retries failed, falling back to archive.ubuntu.com..." >&2; \
force_archive_mirror; \
}; \
Comment on lines 41 to 44
if [ -d /etc/apt/sources.list.d ]; then \
find /etc/apt/sources.list.d -name '*.sources' -exec \
sed -i 's|http://azure.archive.ubuntu.com|http://archive.ubuntu.com|g' {} + 2>/dev/null || true; \
fi; \
Comment on lines +47 to +55
apt_update_retry() { \
local i; for i in 1 2 3; do \
rm -rf /var/lib/apt/lists/* && apt-get update 2>&1 | tee /tmp/apt-update.log && \
if ! grep -q "Failed to fetch" /tmp/apt-update.log; then return 0; fi; \
echo "apt-get update attempt $i/3 had fetch failures, retrying in $((i*10))s..." >&2; sleep $((i*10)); \
done; \
echo "All apt-get update retries failed, falling back to archive.ubuntu.com..." >&2; \
force_archive_mirror; \
}; \
Comment on lines 99 to 102
if [ -d /etc/apt/sources.list.d ]; then \
find /etc/apt/sources.list.d -name '*.sources' -exec \
sed -i 's|http://azure.archive.ubuntu.com|http://archive.ubuntu.com|g' {} + 2>/dev/null || true; \
fi; \
Comment on lines +105 to +113
apt_update_retry() { \
local i; for i in 1 2 3; do \
rm -rf /var/lib/apt/lists/* && apt-get update 2>&1 | tee /tmp/apt-update.log && \
if ! grep -q "Failed to fetch" /tmp/apt-update.log; then return 0; fi; \
echo "apt-get update attempt $i/3 had fetch failures, retrying in $((i*10))s..." >&2; sleep $((i*10)); \
done; \
echo "All apt-get update retries failed, falling back to archive.ubuntu.com..." >&2; \
force_archive_mirror; \
}; \
Comment on lines 134 to 137
if [ -d /etc/apt/sources.list.d ]; then \
find /etc/apt/sources.list.d -name '*.sources' -exec \
sed -i 's|http://azure.archive.ubuntu.com|http://archive.ubuntu.com|g' {} + 2>/dev/null || true; \
fi; \
Comment on lines +140 to +148
apt_update_retry() { \
local i; for i in 1 2 3; do \
rm -rf /var/lib/apt/lists/* && apt-get update 2>&1 | tee /tmp/apt-update.log && \
if ! grep -q "Failed to fetch" /tmp/apt-update.log; then return 0; fi; \
echo "apt-get update attempt $i/3 had fetch failures, retrying in $((i*10))s..." >&2; sleep $((i*10)); \
done; \
echo "All apt-get update retries failed, falling back to archive.ubuntu.com..." >&2; \
force_archive_mirror; \
}; \
Comment on lines +203 to +207
if [ -d /etc/apt/sources.list.d ]; then \
find /etc/apt/sources.list.d -name '*.sources' -exec \
sed -i 's|http://azure.archive.ubuntu.com|http://archive.ubuntu.com|g' {} + 2>/dev/null || true; \
fi; \
rm -rf /var/lib/apt/lists/* && apt-get update; \
github-actions bot commented Jun 18, 2026

✅ Copilot review passed with no inline comments.

@lpcox Add the ready-for-aw label to this PR to trigger agentic CI smoke tests.

Address review feedback on the apt mirror fallback helpers:

- force_archive_mirror: also rewrite security.ubuntu.com (not just
  azure.archive.ubuntu.com) inside deb822 /etc/apt/sources.list.d/*.sources
  entries, matching the /etc/apt/sources.list branch. Otherwise, if the
  initial Azure-mirror rewrite never ran (e.g. DNS failure) and the base
  image uses .sources files, the fallback would leave security.ubuntu.com
  in place and apt-get update could keep failing.

- apt_update_retry: stop masking apt-get update's exit code. The previous
  'apt-get update 2>&1 | tee' pipeline returned tee's status (0) under
  /bin/sh (no pipefail), so a non-"Failed to fetch" failure (e.g. a dpkg
  lock error) was treated as success and never retried. Redirect to a log
  file, capture the real exit code, and retry/fall back when either the
  command fails or "Failed to fetch" appears.
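A sketch of the redirect-and-check pattern described above, simplified and not the PR's verbatim helper. APT_GET, APT_LOG and RETRY_DELAY are hypothetical overrides for local testing, force_archive_mirror is a stand-in stub for the PR's helper, and the real helper also clears /var/lib/apt/lists before each attempt (omitted here so the sketch is safe to run on a host).

```shell
#!/bin/sh
# Sketch only. APT_GET, APT_LOG and RETRY_DELAY are hypothetical overrides.
APT_GET="${APT_GET:-apt-get}"
APT_LOG="${APT_LOG:-/tmp/apt-update.log}"
RETRY_DELAY="${RETRY_DELAY:-10}"

# Stand-in for the PR's force_archive_mirror helper, which rewrites apt
# sources to archive.ubuntu.com.
force_archive_mirror() {
  echo "rewriting apt sources to archive.ubuntu.com" >&2
}

apt_update_retry() {
  local i rc
  for i in 1 2 3; do
    # Redirect to a file instead of piping through tee, so $? is apt-get's status.
    "$APT_GET" update >"$APT_LOG" 2>&1
    rc=$?
    cat "$APT_LOG"   # keep the output visible in the build log
    # Success needs a zero exit code AND no "Failed to fetch" warnings.
    if [ "$rc" -eq 0 ] && ! grep -q "Failed to fetch" "$APT_LOG"; then
      return 0
    fi
    echo "apt-get update attempt $i/3 failed (exit $rc), retrying in $((i * RETRY_DELAY))s..." >&2
    sleep $((i * RETRY_DELAY))
  done
  echo "All apt-get update retries failed, falling back to archive.ubuntu.com..." >&2
  force_archive_mirror
  "$APT_GET" update
}
```

With this shape, a lock error that prints no "Failed to fetch" line still counts as a failure, so it is retried three times and then falls back to the archive mirror.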

Applied consistently across all apt blocks in both Dockerfiles.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@lpcox (Collaborator, Author) commented Jun 18, 2026

Addressed the review feedback in 4608238:

1. security.ubuntu.com not rewritten in deb822 .sources files (5 occurrences)
force_archive_mirror now rewrites both azure.archive.ubuntu.com and security.ubuntu.com to archive.ubuntu.com inside /etc/apt/sources.list.d/*.sources, matching the /etc/apt/sources.list branch. So even if the initial Azure rewrite never ran (DNS failure) on a deb822 base image, the fallback no longer leaves security.ubuntu.com pointing at a failing host.

2. tee masking apt-get update exit code (4 occurrences)
Replaced apt-get update 2>&1 | tee … with a redirect to a log file plus an explicit exit-code check. Under /bin/sh (no pipefail) the old pipeline returned tee's status (0), so a failure that did not print "Failed to fetch" (e.g. a dpkg lock error) was treated as success and never retried. Now the helper retries and falls back when the command exits non-zero or when "Failed to fetch" appears.

Applied consistently across all apt blocks in both Dockerfiles.
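The masking described in point 2 is easy to reproduce. This short demonstration uses false as a stand-in for a failing apt-get update and assumes pipefail is not enabled, which is the default in both dash and bash.

```shell
#!/bin/sh
# Without pipefail, a pipeline's exit status is that of its LAST command.
log=$(mktemp)

# Old pattern: the failure ('false') is masked by tee's success.
false 2>&1 | tee "$log" >/dev/null
pipe_rc=$?

# New pattern: redirect to a file, so $? is the command's own status.
false >"$log" 2>&1
redir_rc=$?

echo "pipeline status: $pipe_rc, redirect status: $redir_rc"
# prints: pipeline status: 0, redirect status: 1
rm -f "$log"
```

Another common option is to switch the Dockerfile's shell with `SHELL ["/bin/bash", "-o", "pipefail", "-c"]` so pipelines fail when any stage fails; the redirect approach keeps the RUN blocks on the default /bin/sh.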

Validation: all 12 RUN blocks pass bash -n; a behavioral simulation confirms the three cases — clean success returns immediately (no retry/fallback), a non-fetch hard failure now retries 3× and falls back (previously masked as success), and a fetch-warning failure retries and falls back. Docker wasn't available locally for a full image build.

The runGhCommand tests spawn the real `gh` binary, so they are subject
to runner contention. Jest's 5s default could fire before the helper's
own 30s COMMAND_TIMEOUT_MS, producing a spurious "Exceeded timeout of
5000 ms" failure on a slow runner (observed on Node 22). Give these
tests a 30s timeout to match the internal command timeout.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
github-actions bot commented Jun 18, 2026

Smoke Claude failed

github-actions bot commented Jun 18, 2026

Contribution Check completed successfully!

github-actions bot commented Jun 18, 2026

🔑 Smoke Copilot PAT PAT auth validated. All systems operational. ✅

github-actions bot commented Jun 18, 2026

🔌 Smoke Services — All services reachable! ✅

github-actions bot commented Jun 18, 2026

Smoke Copilot BYOK AOAI (api-key) completed. Copilot AOAI BYOK (api-key) mode operational. 🔓

github-actions bot commented Jun 18, 2026

Smoke Gemini completed. All facets verified. 💎

Smoke test completed with FAIL status. Comment added to PR #5266.

github-actions bot commented Jun 18, 2026

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

github-actions bot commented Jun 18, 2026

Smoke Copilot BYOK AOAI (Entra) completed. Copilot AOAI BYOK (Entra) mode operational. 🔓

github-actions bot commented Jun 18, 2026

Chroot tests passed! Smoke Chroot - All security and functionality tests succeeded.

github-actions bot commented Jun 18, 2026

Build Test Suite completed successfully!

github-actions bot commented Jun 18, 2026

📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅

github-actions bot commented Jun 18, 2026

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

github-actions bot commented Jun 18, 2026

Smoke Copilot BYOK completed. Copilot BYOK mode operational. 🔓

github-actions bot commented Jun 19, 2026

📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅

github-actions bot commented Jun 19, 2026

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

github-actions bot commented Jun 19, 2026

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

github-actions bot commented Jun 19, 2026

Build Test Suite completed successfully!

github-actions bot commented Jun 19, 2026

Smoke Claude failed

github-actions bot commented Jun 19, 2026

🔌 Smoke Services — All services reachable! ✅

github-actions bot commented Jun 19, 2026

🔑 Smoke Copilot PAT PAT auth validated. All systems operational. ✅

github-actions bot commented Jun 19, 2026

Smoke Copilot BYOK completed. Copilot BYOK mode operational. 🔓

github-actions bot commented Jun 19, 2026

Smoke Gemini completed. All facets verified. 💎

github-actions bot commented

🤖 Smoke Test Results — Copilot Engine Validation

PR: fix(containers): apt install fallback to archive.ubuntu.com
Author: @lpcox

| Test | Result |
|---|---|
| GitHub MCP connectivity | ✅ |
| GitHub.com HTTP connectivity | ❌ pre-step data not resolved |
| File write/read | ❌ pre-step data not resolved |

Overall: FAIL — Pre-step data collection did not resolve template variables (${{ steps.smoke-data.outputs.* }}). MCP connectivity confirmed working; HTTP and file tests could not be verified.

📰 BREAKING: Report filed by Smoke Copilot

github-actions bot commented

🔍 Smoke Test: API Proxy OpenTelemetry Tracing

| Scenario | Status | Details |
|---|---|---|
| 1. Module Loading | ✅ | otel.js loads successfully; exports: startRequestSpan, setTokenAttributes, setBudgetAttributes, endSpan, endSpanError, shutdown, isEnabled + internal helpers |
| 2. Test Suite | ✅ | 59/59 tests pass across otel.test.js + otel-fanout.test.js (2 suites, 0 failures) |
| 3. Env Var Forwarding | ✅ | api-proxy-service-config.ts forwards GH_AW_OTLP_ENDPOINTS, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, GITHUB_AW_OTEL_TRACE_ID, GITHUB_AW_OTEL_PARENT_SPAN_ID, OTEL_SERVICE_NAME |
| 4. Token Tracker Integration | ✅ | onUsage callback present in token-tracker-http.js (line 283/324) as OTEL hook point |
| 5. OTEL Diagnostics | ℹ️ | No spans exported: no live api-proxy container in this smoke run (expected) |

All scenarios pass. OTEL integration is functional.

📡 OTel tracing validated by Smoke OTel Tracing

github-actions bot commented

Merged PRs:
✅ ci(smoke): add token-usage sanity checks to smoke workflows
✅ fix(api-proxy): map OpenAI Responses API cached tokens to cache_read
Queried PRs:
✅ fix(containers): apt install fallback to archive.ubuntu.com
✅ docs: sync schemas and specs with source changes
Overall: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

github-actions bot commented

🔬 Chroot Version Comparison Results

| Runtime | Host Version | Chroot Version | Match? |
|---|---|---|---|
| Python | Python 3.12.13 | Python 3.12.3 | ❌ NO |
| Node.js | v24.16.0 | v22.22.3 | ❌ NO |
| Go | go1.22.12 | go1.22.12 | ✅ YES |

Overall: ❌ Not all tests passed — Python and Node.js versions differ between host and chroot environments.

Tested by Smoke Chroot

github-actions bot commented

@lpcox

  • GitHub MCP connectivity: ✅
  • GitHub.com connectivity: ✅
  • File write/read test: ✅
  • Direct BYOK inference path: ✅

Running in direct BYOK mode (AWF_AUTH_TYPE=github-oidc + AWF_AUTH_AZURE_* + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw) authenticated via Microsoft Entra

Overall: PASS

🪪 BYOK (AOAI Entra) report filed by Smoke Copilot BYOK AOAI (Entra)

github-actions bot commented

🔐 Smoke Test: Copilot PAT Auth — PASS

| Test | Result |
|---|---|
| GitHub MCP connectivity | ✅ |
| GitHub.com HTTP (200) | ✅ |
| File write/read | ✅ |

Overall: PASS · Auth mode: PAT (COPILOT_GITHUB_TOKEN)

cc @lpcox

🔑 PAT report filed by Smoke Copilot PAT

github-actions bot commented

Smoke Test: Copilot BYOK (Direct) Mode ✅ PASS

  • ✅ MCP connectivity (github-list_pull_requests)
  • ✅ GitHub.com connectivity (HTTP 200)
  • ✅ File write/read capability (/etc/hosts)
  • ✅ BYOK inference path (agent → api-proxy → api.githubcopilot.com)

Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY via api-proxy sidecar).

cc @lpcox

🔑 BYOK report filed by Smoke Copilot BYOK

github-actions bot commented

@lpcox - PRs: fix(containers): apt install fallback to archive.ubuntu.com; docs: sync schemas and specs with source changes
✅ GitHub MCP tool | ✅ HTTP connectivity | ✅ File I/O | ✅ BYOK inference
Running direct BYOK mode (COPILOT_PROVIDER_API_KEY + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw)
Overall: PASS

🔑 BYOK (AOAI api-key) report filed by Smoke Copilot BYOK AOAI (api-key)

github-actions bot commented

🏗️ Build Test Suite Results

| Ecosystem | Project | Build/Install | Tests | Status |
|---|---|---|---|---|
| Bun | elysia | | 1/1 passed | ✅ PASS |
| Bun | hono | | 1/1 passed | ✅ PASS |
| C++ | fmt | | N/A | ✅ PASS |
| C++ | json | | N/A | ✅ PASS |
| Deno | oak | N/A | 1/1 passed | ✅ PASS |
| Deno | std | N/A | 1/1 passed | ✅ PASS |
| .NET | hello-world | | N/A | ✅ PASS |
| .NET | json-parse | | N/A | ✅ PASS |
| Go | color | | passed | ✅ PASS |
| Go | env | | passed | ✅ PASS |
| Go | uuid | | passed | ✅ PASS |
| Java | gson | | 1/1 passed | ✅ PASS |
| Java | caffeine | | 1/1 passed | ✅ PASS |
| Node.js | clsx | | All passed | ✅ PASS |
| Node.js | execa | | All passed | ✅ PASS |
| Node.js | p-limit | | All passed | ✅ PASS |
| Rust | fd | | 1/1 passed | ✅ PASS |
| Rust | zoxide | | 1/1 passed | ✅ PASS |

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #5266 · 49 AIC · ⊞ 7.5K ·

github-actions bot commented

Smoke Test Results

Overall status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

github-actions bot commented

Smoke Test Results — Services Connectivity

| Check | Result |
|---|---|
| Redis PING | ❌ no response |
| PostgreSQL pg_isready | ❌ no response |
| PostgreSQL SELECT 1 | ❌ no response |

host.docker.internal resolves to 172.17.0.1 but ports 6379 and 5432 are not reachable.

Overall: FAIL

🔌 Service connectivity validated by Smoke Services

lpcox merged commit 1dc8335 into main on Jun 19, 2026
85 of 89 checks passed
lpcox deleted the fix/apt-mirror-install-fallback branch on June 19, 2026 01:49

3 participants