feat(orchestrator): introduce apt cache proxy for sandbox provisioning #2623

Open
arkamar wants to merge 7 commits into main from feat/apt-cache

Contributor

@arkamar commented May 11, 2026

Introduces an apt cache proxy for sandbox provisioning, controlled by the apt-cache-enabled LaunchDarkly feature flag and APT_PROXY_URL env var. When enabled, apt-get operations during template builds go through the configured proxy, speeding up repeated builds and insulating provisioning from upstream apt repository outages or rate limits. The proxy config is cleaned up in the finalize phase so it doesn't leak into end-user sandboxes.
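The gating described above could look roughly like the following Go sketch. It is not the PR's code: the function name `effectiveAptProxyURL`, its parameters, and the example URL are hypothetical, with the boolean standing in for the apt-cache-enabled flag value and the string for the APT_PROXY_URL value.

```go
package main

import (
	"fmt"
	"strings"
)

// effectiveAptProxyURL returns the proxy URL a build should use, or "" when
// no proxy should be configured. Both inputs are hypothetical stand-ins:
// flagEnabled for the apt-cache-enabled flag, envURL for APT_PROXY_URL.
func effectiveAptProxyURL(flagEnabled bool, envURL string) string {
	url := strings.TrimSpace(envURL)
	if !flagEnabled || url == "" {
		// Either the flag is off or no proxy is configured: build without a proxy.
		return ""
	}
	return url
}

func main() {
	// The host name and port below are illustrative only.
	fmt.Printf("%q\n", effectiveAptProxyURL(true, "http://apt-cache:3142"))
	fmt.Printf("%q\n", effectiveAptProxyURL(false, "http://apt-cache:3142"))
	fmt.Printf("%q\n", effectiveAptProxyURL(true, ""))
}
```

Returning an empty string for both "flag off" and "URL unset" keeps a single value that later steps (script rendering, cache keying) can branch on.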

arkamar added 5 commits May 7, 2026 09:38
Introduce an apt caching layer to speed up apt-get operations during
template builds and sandbox runtime. An apt-cacher-ng container is added
to the local-dev Docker Compose stack, and the orchestrator conditionally
injects an apt proxy config into the sandbox rootfs.

Controlled by the 'apt-cache-enabled' LaunchDarkly feature flag and the
APT_PROXY_URL environment variable on the orchestrator.

Writing the apt proxy config in provision.sh instead of injecting it as
an OCI layer makes it easier to add conditional logic for other package
managers (dnf, apk) if non-Debian distros are supported in the future.
@claude claude Bot left a comment

⚠️ Code review skipped — your organization has reached its monthly code review spending cap.

An organization admin can view or raise the cap at claude.ai/admin-settings/claude-code. The cap resets at the start of the next billing period.

Once the cap resets or is raised, reopen this pull request to trigger a review.

cursor Bot commented May 11, 2026

PR Summary

Medium Risk
Touches template provisioning scripts and build cache keying, so misconfiguration could cause failed builds or unexpected cache reuse. Changes are gated by a feature flag but affect a core build path when enabled.

Overview
Adds an apt-cache-enabled feature flag and APT_PROXY_URL config to optionally inject an APT proxy into the template provisioning script, then removes the proxy config during finalize. It also introduces a local-dev apt-cacher-ng service and includes the effective proxy URL in base layer cache hashing and telemetry. However, if the flag is enabled with an empty APT_PROXY_URL, the build silently proceeds without a proxy and cache invalidation does not reflect the toggle.

Reviewed by Cursor Bugbot for commit 6402343.

codecov Bot commented May 11, 2026

❌ 8 Tests Failed:

Tests completed   Failed   Passed   Skipped
2618              8        2610     7
View the full list of 11 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/metrics::TestTeamMetrics

Flake rate in main: 70.29% (Passed 153 times, Failed 362 times)

Stack Traces | 2.06s run time
=== RUN   TestTeamMetrics
=== PAUSE TestTeamMetrics
=== CONT  TestTeamMetrics
    team_metrics_test.go:61: 
        	Error Trace:	.../api/metrics/team_metrics_test.go:61
        	Error:      	Should be true
        	Test:       	TestTeamMetrics
        	Messages:   	MaxConcurrentSandboxes should be >= 0
--- FAIL: TestTeamMetrics (2.06s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig

Flake rate in main: 76.46% (Passed 161 times, Failed 523 times)

Stack Traces | 43.8s run time
=== RUN   TestUpdateNetworkConfig
=== PAUSE TestUpdateNetworkConfig
=== CONT  TestUpdateNetworkConfig
--- FAIL: TestUpdateNetworkConfig (43.76s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false

Flake rate in main: 76.93% (Passed 155 times, Failed 517 times)

Stack Traces | 2.3s run time
=== RUN   TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
Executing command curl in sandbox io0jv035fczu16bdifaev
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1360}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35  exited:true  status:"exit status 35"  error:"exit status 35"}}
Executing command curl in sandbox io0jv035fczu16bdifaev
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1361}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35  exited:true  status:"exit status 35"  error:"exit status 35"}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{start:{pid:1362}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{data:{stdout:"HTTP/2 302 \r\nx-content-type-options: nosniff\r\nlocation: https://dns.google/\r\ndate: Wed, 13 May 2026 16:42:40 GMT\r\ncontent-type: text/html; charset=UTF-8\r\nserver: HTTP server (unknown)\r\ncontent-length: 216\r\nx-xss-protection: 0\r\nx-frame-options: SAMEORIGIN\r\nalt-svc: h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000\r\n\r\n"}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_network_update_test.go:391: Command [curl] completed successfully in sandbox io0jv035fczu16bdifaev
    sandbox_network_update_test.go:391: 
        	Error Trace:	.../api/sandboxes/sandbox_network_out_test.go:74
        	            				.../api/sandboxes/sandbox_network_update_test.go:60
        	            				.../api/sandboxes/sandbox_network_update_test.go:391
        	Error:      	An error is expected but got nil.
        	Test:       	TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
        	Messages:   	https://8.8.8.8 should be blocked
--- FAIL: TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false (2.30s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost

Flake rate in main: 56.94% (Passed 267 times, Failed 353 times)

Stack Traces | 0s run time
=== RUN   TestBindLocalhost
=== PAUSE TestBindLocalhost
=== CONT  TestBindLocalhost
--- FAIL: TestBindLocalhost (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_0_0_0_0

Flake rate in main: 62.81% (Passed 151 times, Failed 255 times)

Stack Traces | 6.81s run time
=== RUN   TestBindLocalhost/bind_0_0_0_0
=== PAUSE TestBindLocalhost/bind_0_0_0_0
=== CONT  TestBindLocalhost/bind_0_0_0_0
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1265}}
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_0_0_0_0
        	Messages:   	Unexpected status code 502 for bind address 0.0.0.0
--- FAIL: TestBindLocalhost/bind_0_0_0_0 (6.81s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_127_0_0_1

Flake rate in main: 57.97% (Passed 153 times, Failed 211 times)

Stack Traces | 7.82s run time
=== RUN   TestBindLocalhost/bind_127_0_0_1
=== PAUSE TestBindLocalhost/bind_127_0_0_1
=== CONT  TestBindLocalhost/bind_127_0_0_1
Executing command python in sandbox i2z07h6a8cbftaylej06h
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1260}}
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_127_0_0_1
        	Messages:   	Unexpected status code 502 for bind address 127.0.0.1
--- FAIL: TestBindLocalhost/bind_127_0_0_1 (7.82s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_::

Flake rate in main: 56.03% (Passed 153 times, Failed 195 times)

Stack Traces | 7.68s run time
=== RUN   TestBindLocalhost/bind_::
=== PAUSE TestBindLocalhost/bind_::
=== CONT  TestBindLocalhost/bind_::
Executing command python in sandbox iw453zxlhtz0k84ecaviy
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1260}}
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_::
        	Messages:   	Unexpected status code 502 for bind address ::
--- FAIL: TestBindLocalhost/bind_:: (7.68s)
Executing command python in sandbox isk21p39v9bjsnzyxoska
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_::1

Flake rate in main: 64.55% (Passed 151 times, Failed 275 times)

Stack Traces | 8.87s run time
=== RUN   TestBindLocalhost/bind_::1
=== PAUSE TestBindLocalhost/bind_::1
=== CONT  TestBindLocalhost/bind_::1
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1260}}
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_::1
        	Messages:   	Unexpected status code 502 for bind address ::1
--- FAIL: TestBindLocalhost/bind_::1 (8.87s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_localhost

Flake rate in main: 64.39% (Passed 151 times, Failed 273 times)

Stack Traces | 7.69s run time
=== RUN   TestBindLocalhost/bind_localhost
=== PAUSE TestBindLocalhost/bind_localhost
=== CONT  TestBindLocalhost/bind_localhost
Executing command python in sandbox iviehn9tf6bl7kyyudk2f
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1265}}
Executing command python in sandbox is5t4ggq2d33q49dqn8o2
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_localhost
        	Messages:   	Unexpected status code 502 for bind address localhost
--- FAIL: TestBindLocalhost/bind_localhost (7.69s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 66.18% (Passed 161 times, Failed 315 times)

Stack Traces | 77.1s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:26: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (77.12s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 67.17% (Passed 151 times, Failed 309 times)

Stack Traces | 48.4s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1264}}
Executing command bash in sandbox ist0dy782socwga4ombw3 (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory before tmpfs mount: 188 MB\nFree memory before tmpfs mount: 797 MB\nMemory to use in integrity test (80% of free, min 64MB): 637 MB\n"}}
Executing command bash in sandbox ist0dy782socwga4ombw3 (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"637+0 records in\n637+0 records out\n667942912 bytes (668 MB, 637 MiB) copied, 9.89743 s, 67.5 MB/s\n\tCommand being timed: \"dd if=/dev/urandom of=/mnt/testfile bs=1M count=637\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 9.65\n\tPercent of CPU this job got: 97%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:09.94\n\tAverage shared text size ("}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2628\n\tAverage resident set size (kbytes): 0\n\tMajor (requiring I/O) page faults: 2\n\tMinor (reclaiming a frame) page faults: 344\n\tVoluntary context switches: 3\n\tInvoluntary context switches: 40\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 833 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox ij2ztkr101c4p50kux69x
Executing command bash in sandbox ij2ztkr101c4p50kux69x (user: root)
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{start:{pid:1280}}
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{data:{stdout:"3d3478fc29884cd20d5a12a4cd0a7fa4ff7ef459d6cba1bc50cf16b7b2228c7f\n"}}
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:74: Command [bash] completed successfully in sandbox ij2ztkr101c4p50kux69x
Executing command bash in sandbox ij2ztkr101c4p50kux69x (user: root)
    sandbox_memory_integrity_test.go:99: Command [bash] output: event:{start:{pid:1283}}
    sandbox_memory_integrity_test.go:99: Command [bash] output: event:{data:{stdout:"3d3478fc29884cd20d5a12a4cd0a7fa4ff7ef459d6cba1bc50cf16b7b2228c7f\n"}}
    sandbox_memory_integrity_test.go:99: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:99: Command [bash] completed successfully in sandbox ij2ztkr101c4p50kux69x
Executing command bash in sandbox ij2ztkr101c4p50kux69x (user: root)
    sandbox_memory_integrity_test.go:99: Command [bash] output: event:{start:{pid:1286}}
    sandbox_memory_integrity_test.go:100: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:100
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox ij2ztkr101c4p50kux69x: invalid_argument: protocol error: incomplete envelope: unexpected EOF
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (48.44s)

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

The provisioning script uses double quotes for the APT_PROXY_URL assignment, which allows for potential command injection. The APT configuration also lacks an HTTPS proxy definition, causing requests to HTTPS repositories to bypass the cache.
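One way to address both points is sketched below in Go, under stated assumptions. The helper names (`validateProxyURL`, `shellSingleQuote`, `aptProxyConf`) and the rejected character set are hypothetical and not taken from the PR. `Acquire::http::Proxy` and `Acquire::https::Proxy` are standard apt.conf options. Note that pointing HTTPS at the proxy only routes the traffic there; whether it is actually cached depends on the proxy's own configuration.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// validateProxyURL rejects values that could break out of a quoted shell or
// apt.conf string. The rejected character set is an illustrative assumption;
// credentials containing such characters would need percent-encoding.
func validateProxyURL(raw string) error {
	if strings.ContainsAny(raw, "\"'`$\\; \t\r\n") {
		return errors.New("proxy URL contains a disallowed character")
	}
	u, err := url.Parse(raw)
	if err != nil {
		// The parse error is not returned because it may echo the URL.
		return errors.New("proxy URL does not parse")
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("proxy URL must use http or https")
	}
	if u.Host == "" {
		return errors.New("proxy URL has no host")
	}
	return nil
}

// shellSingleQuote wraps s in single quotes for a POSIX shell, so the shell
// performs no expansion or command substitution on the value.
func shellSingleQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// aptProxyConf renders apt.conf lines that send both HTTP and HTTPS
// repository requests to the same proxy. Call it only with a validated URL.
func aptProxyConf(proxyURL string) string {
	return "Acquire::http::Proxy \"" + proxyURL + "\";\n" +
		"Acquire::https::Proxy \"" + proxyURL + "\";\n"
}

func main() {
	proxy := "http://apt-cache:3142" // illustrative value
	if err := validateProxyURL(proxy); err != nil {
		fmt.Println("refusing to configure proxy:", err)
		return
	}
	fmt.Println("APT_PROXY_URL=" + shellSingleQuote(proxy))
	fmt.Print(aptProxyConf(proxy))
}
```

Validating before rendering and single-quoting in the script are complementary: the first narrows what can reach the template, the second keeps the shell from interpreting whatever does.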

Comment thread packages/orchestrator/pkg/template/build/phases/base/provision.sh
Comment thread packages/orchestrator/pkg/template/build/phases/base/provision.sh
@cursor cursor Bot left a comment

Stale comment

Comment thread packages/orchestrator/pkg/template/build/phases/base/provision.sh Outdated
Proxy URLs may contain credentials. Since provisioning logs are visible
to template builders, omit the URL from the log message entirely.
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e1b72b2728

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

BusyBox: rootfs.SandboxBusyBoxPath,
ResultPath: provisionScriptResultPath,
Provider: buildContext.BuilderConfig.Provider,
AptProxyURL: aptProxyURL,
P2: Include apt proxy setting in the base layer hash

When apt-cache-enabled or APT_PROXY_URL changes after a base layer has already been cached, this rendered AptProxyURL does not affect the base hash: Hash still keys on the unrendered provisionScriptFile (or the manual provision version), so the phase cache can skip Build and reuse a rootfs with a stale or missing /etc/apt/apt.conf.d/00-e2b-build-proxy. In that scenario downstream build steps run with the previous proxy state until someone forces a rebuild or bumps the provision version, defeating flag rollouts and potentially leaving builds pointed at an old proxy URL.

…changes

The base layer hash did not account for the apt proxy configuration.
When the apt-cache feature flag was toggled or the proxy URL changed,
builds would reuse a stale cached rootfs with the old proxy state. The
proxy URL is now included as a hash key only when non-empty, so existing
caches are not invalidated unnecessarily.
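A minimal Go sketch of that keying rule follows. It is an illustration, not the repository's hashing code: `hashKeys`, `baseLayerHash`, the SHA-256 choice, and the example key values are all assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashKeys hashes an ordered list of keys. Each key is length-prefixed so
// that ("ab", "c") and ("a", "bc") produce different digests.
func hashKeys(keys []string) string {
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%d:%s;", len(k), k)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// baseLayerHash appends the apt proxy URL to the key list only when it is
// non-empty, so builds without a proxy keep the digest they had before.
func baseLayerHash(keys []string, aptProxyURL string) string {
	all := append([]string{}, keys...) // copy to avoid mutating the caller's slice
	if aptProxyURL != "" {
		all = append(all, "apt_proxy_url", aptProxyURL)
	}
	return hashKeys(all)
}

func main() {
	keys := []string{"provision-v1", "base-image"} // hypothetical existing keys
	fmt.Println(baseLayerHash(keys, "") == hashKeys(keys))
	fmt.Println(baseLayerHash(keys, "http://apt-cache:3142") != hashKeys(keys))
}
```

Because the extra key is derived from the effective URL, a flag that is enabled while the URL is empty produces the same digest as a disabled flag, which matches the behavior noted in the Bugbot summary.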
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 640234302e

attribute.String("provision_version", provisionVersion),
attribute.String("base_source", baseSource),
attribute.Int64("disk_size_mb", bb.Config.DiskSizeMB),
attribute.String("apt_proxy_url", aptProxyURL),
P2: Avoid exporting proxy credentials in telemetry

If APT_PROXY_URL is configured with embedded credentials (for example http://user:pass@proxy:3142, which apt proxy URLs allow), this attribute sends the full secret-bearing URL to the tracing backend for every base hash calculation. The value only needs to affect cache invalidation, so telemetry should record a redacted URL, host, or enabled flag instead of the raw config value.
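One possible shape for such a redacted value is sketched here in Go. The helper name `telemetryProxyValue` and the "invalid" placeholder are hypothetical. The sketch records only the host and port, which the standard `net/url` parser exposes separately from any user info.

```go
package main

import (
	"fmt"
	"net/url"
)

// telemetryProxyValue returns a value that is safe to attach to a trace:
// the proxy host (and port) with any user:password part dropped.
// It returns "" when no proxy is configured.
func telemetryProxyValue(raw string) string {
	if raw == "" {
		return ""
	}
	u, err := url.Parse(raw)
	if err != nil || u.Host == "" {
		// Do not record the parse error: its message may contain the raw URL.
		return "invalid"
	}
	return u.Host
}

func main() {
	// Illustrative URL with embedded credentials.
	fmt.Println(telemetryProxyValue("http://user:pass@proxy:3142"))
}
```

The raw URL can still feed the cache hash; only the value exported to the tracing backend needs to go through a helper like this.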

@ValentaTomas
Member

@tvi Is this hitting the same problems we discussed for the proxy before?
