(fix)compression: increase storage gRPC connections to 8 #2644

Merged
levb merged 1 commit into main from lev-fix-8-grpc-connections on May 13, 2026
Contributor

@levb levb commented May 13, 2026

When fetching in compressed mode, we make more GCS requests (2MbU at a time, not 4). Increasing the number of gRPC connections to 8 appears to have improved performance: better fetch P9x and fewer health check failures.
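
The request-count effect described above can be sketched with simple arithmetic. This is a minimal illustration, not code from this PR: `requestsFor`, the 64 MiB total, and the reading of the chunk sizes as 2 MiB versus 4 MiB are assumptions made for the example.

```go
package main

import "fmt"

// requestsFor returns how many ranged reads are needed to fetch totalBytes
// when each request covers at most chunkBytes (ceiling division).
// Hypothetical helper, not taken from the repository.
func requestsFor(totalBytes, chunkBytes int64) int64 {
	if totalBytes <= 0 || chunkBytes <= 0 {
		return 0
	}
	return (totalBytes + chunkBytes - 1) / chunkBytes
}

func main() {
	const mib = int64(1) << 20
	total := 64 * mib // assumed amount of data to fetch

	fmt.Println(requestsFor(total, 4*mib)) // 16
	fmt.Println(requestsFor(total, 2*mib)) // 32
}
```

Halving the chunk size doubles the number of requests for the same amount of data, which is the extra load the larger connection count is meant to absorb.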

@levb levb requested a review from djeebus May 13, 2026 18:01
@cla-bot cla-bot Bot added the cla-signed label May 13, 2026

@claude claude Bot left a comment

⚠️ Code review skipped — your organization has reached its monthly code review spending cap.

An organization admin can view or raise the cap at claude.ai/admin-settings/claude-code. The cap resets at the start of the next billing period.

Once the cap resets or is raised, reopen this pull request to trigger a review.

codecov Bot commented May 13, 2026

❌ 11 Tests Failed:

Tests completed   Failed   Passed   Skipped
2618              11       2607     7
View the top 2 failed test(s) by shortest run time
github.com/e2b-dev/infra/packages/orchestrator/pkg/sandbox/uffd/userfaultfd::TestRemoveThenWriteGated
Stack Traces | 0s run time
=== RUN   TestRemoveThenWriteGated
=== PAUSE TestRemoveThenWriteGated
=== CONT  TestRemoveThenWriteGated
--- FAIL: TestRemoveThenWriteGated (0.00s)
github.com/e2b-dev/infra/packages/orchestrator/pkg/sandbox/uffd/userfaultfd::TestRemoveThenWriteGated/4k_gated_remove_with_concurrent_write
Stack Traces | 0.31s run time
=== RUN   TestRemoveThenWriteGated/4k_gated_remove_with_concurrent_write
=== PAUSE TestRemoveThenWriteGated/4k_gated_remove_with_concurrent_write
=== CONT  TestRemoveThenWriteGated/4k_gated_remove_with_concurrent_write
    remove_test.go:289: 
        	Error Trace:	.../uffd/userfaultfd/remove_test.go:289
        	Error:      	elements differ
        	            	
        	            	extra elements in list A:
        	            	([]interface {}) (len=1) {
        	            	 (uint) 0
        	            	}
        	            	
        	            	
        	            	listA:
        	            	([]uint) (len=1) {
        	            	 (uint) 0
        	            	}
        	            	
        	            	
        	            	listB:
        	            	([]uint) <nil>
        	Test:       	TestRemoveThenWriteGated/4k_gated_remove_with_concurrent_write
    remove_test.go:290: 
        	Error Trace:	.../uffd/userfaultfd/remove_test.go:290
        	Error:      	Should be empty, but was [0]
        	Test:       	TestRemoveThenWriteGated/4k_gated_remove_with_concurrent_write
--- FAIL: TestRemoveThenWriteGated/4k_gated_remove_with_concurrent_write (0.31s)
View the full list of 10 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/metrics::TestTeamMetrics

Flake rate in main: 70.29% (Passed 153 times, Failed 362 times)

Stack Traces | 1.33s run time
=== RUN   TestTeamMetrics
=== PAUSE TestTeamMetrics
=== CONT  TestTeamMetrics
    team_metrics_test.go:61: 
        	Error Trace:	.../api/metrics/team_metrics_test.go:61
        	Error:      	Should be true
        	Test:       	TestTeamMetrics
        	Messages:   	MaxConcurrentSandboxes should be >= 0
--- FAIL: TestTeamMetrics (1.33s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig

Flake rate in main: 76.46% (Passed 161 times, Failed 523 times)

Stack Traces | 224s run time
=== RUN   TestUpdateNetworkConfig
=== PAUSE TestUpdateNetworkConfig
=== CONT  TestUpdateNetworkConfig
--- FAIL: TestUpdateNetworkConfig (224.34s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false

Flake rate in main: 76.93% (Passed 155 times, Failed 517 times)

Stack Traces | 5.59s run time
=== RUN   TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
Executing command curl in sandbox ihvdd0272awzih90hep5h
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1365}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35  exited:true  status:"exit status 35"  error:"exit status 35"}}
Executing command curl in sandbox ihvdd0272awzih90hep5h
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1366}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35  exited:true  status:"exit status 35"  error:"exit status 35"}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{start:{pid:1367}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{data:{stdout:"HTTP/2 302 \r\nx-content-type-options: nosniff\r\nlocation: https://dns.google/\r\ndate: Wed, 13 May 2026 18:11:38 GMT\r\ncontent-type: text/html; charset=UTF-8\r\nserver: HTTP server (unknown)\r\ncontent-length: 216\r\nx-xss-protection: 0\r\nx-frame-options: SAMEORIGIN\r\nalt-svc: h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000\r\n\r\n"}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_network_update_test.go:391: Command [curl] completed successfully in sandbox ihvdd0272awzih90hep5h
    sandbox_network_update_test.go:391: 
        	Error Trace:	.../api/sandboxes/sandbox_network_out_test.go:74
        	            				.../api/sandboxes/sandbox_network_update_test.go:60
        	            				.../api/sandboxes/sandbox_network_update_test.go:391
        	Error:      	An error is expected but got nil.
        	Test:       	TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
        	Messages:   	https://8.8.8.8 should be blocked
--- FAIL: TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false (5.59s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost

Flake rate in main: 56.94% (Passed 267 times, Failed 353 times)

Stack Traces | 0s run time
=== RUN   TestBindLocalhost
=== PAUSE TestBindLocalhost
=== CONT  TestBindLocalhost
--- FAIL: TestBindLocalhost (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_0_0_0_0

Flake rate in main: 62.81% (Passed 151 times, Failed 255 times)

Stack Traces | 11.6s run time
=== RUN   TestBindLocalhost/bind_0_0_0_0
=== PAUSE TestBindLocalhost/bind_0_0_0_0
=== CONT  TestBindLocalhost/bind_0_0_0_0
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1260}}
Executing command python in sandbox ipojsw89g6et0vvqidt84
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_0_0_0_0
        	Messages:   	Unexpected status code 502 for bind address 0.0.0.0
--- FAIL: TestBindLocalhost/bind_0_0_0_0 (11.60s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_127_0_0_1

Flake rate in main: 57.97% (Passed 153 times, Failed 211 times)

Stack Traces | 7.78s run time
=== RUN   TestBindLocalhost/bind_127_0_0_1
=== PAUSE TestBindLocalhost/bind_127_0_0_1
=== CONT  TestBindLocalhost/bind_127_0_0_1
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1250}}
Executing command python in sandbox ibm6xbb0606xdt92m3s08
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_127_0_0_1
        	Messages:   	Unexpected status code 502 for bind address 127.0.0.1
--- FAIL: TestBindLocalhost/bind_127_0_0_1 (7.78s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_::1

Flake rate in main: 64.55% (Passed 151 times, Failed 275 times)

Stack Traces | 10.5s run time
=== RUN   TestBindLocalhost/bind_::1
=== PAUSE TestBindLocalhost/bind_::1
=== CONT  TestBindLocalhost/bind_::1
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1260}}
Executing command python in sandbox iv73nqd36pgprlj95mfop
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_::1
        	Messages:   	Unexpected status code 502 for bind address ::1
--- FAIL: TestBindLocalhost/bind_::1 (10.52s)
Executing command python in sandbox ir799t84p4dorlp4zm371
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_localhost

Flake rate in main: 64.39% (Passed 151 times, Failed 273 times)

Stack Traces | 9.07s run time
=== RUN   TestBindLocalhost/bind_localhost
=== PAUSE TestBindLocalhost/bind_localhost
=== CONT  TestBindLocalhost/bind_localhost
Executing command python in sandbox icaumylthveg63mxs84h9
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1260}}
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_localhost
        	Messages:   	Unexpected status code 502 for bind address localhost
--- FAIL: TestBindLocalhost/bind_localhost (9.07s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 66.18% (Passed 161 times, Failed 315 times)

Stack Traces | 79.5s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:26: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (79.46s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 67.17% (Passed 151 times, Failed 309 times)

Stack Traces | 27.4s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1262}}
Executing command bash in sandbox i7yesstm2acku2xbh88yl (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\nUsed memory before tmpfs mount: 185 MB\nFree memory before tmpfs mount: 799 MB\nMemory to use in integrity test (80% of free, min 64MB): 639 MB\n"}}
Executing command bash in sandbox i7yesstm2acku2xbh88yl (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"639+0 records in\n639+0 records out\n670040064 bytes (670 MB, 639 MiB) copied, 3.73192 s, 180 MB/s\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tCommand being timed: \"dd if=/dev/urandom of=/mnt/testfile bs=1M count=639\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 3.71\n\tPercent of CPU this job got: 99%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:03.73\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2716\n\tAverage resident set size (kbyte"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"s): 0\n\tMajor (requiring I/O) page fault"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"s: 3\n\tMinor ("}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"reclaimin"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"g a frame"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:") page faults: 344\n\tVolunt"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"ary contex"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"t switche"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"s: "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"4\n\tInvoluntary context switches: 18\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 830 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox iqsampem9y01w0x8vdbnp
Executing command bash in sandbox iqsampem9y01w0x8vdbnp (user: root)
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{start:{pid:1279}}
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{data:{stdout:"a42d7e9263cefc836df23d0295d911f94817a4d98edf5181a4f0580c6f3b0461\n"}}
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:74: Command [bash] completed successfully in sandbox iqsampem9y01w0x8vdbnp
Executing command bash in sandbox iqsampem9y01w0x8vdbnp (user: root)
    sandbox_memory_integrity_test.go:99: Command [bash] output: event:{start:{pid:1282}}
    sandbox_memory_integrity_test.go:100: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:100
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox iqsampem9y01w0x8vdbnp: invalid_argument: protocol error: incomplete envelope: unexpected EOF
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (27.41s)


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

The pull request increases the default gRPC connection pool size for Google Cloud Storage from 4 to 8. I have no feedback to provide.
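
To illustrate why a larger pool can help, here is a minimal sketch that assumes requests are spread round-robin across pooled connections. The `pool` type, its methods, and the figure of 32 in-flight requests are hypothetical; the diff itself is not shown in this conversation.

```go
package main

import "fmt"

// pool is a hypothetical stand-in for a client-side connection pool.
// It only counts how many requests each connection has been handed.
type pool struct {
	next  int
	loads []int
}

// newPool creates a pool with at least one connection slot.
func newPool(size int) *pool {
	if size < 1 {
		size = 1
	}
	return &pool{loads: make([]int, size)}
}

// pick assigns the next request to a connection in round-robin order
// and returns that connection's index.
func (p *pool) pick() int {
	i := p.next
	p.next = (p.next + 1) % len(p.loads)
	p.loads[i]++
	return i
}

// maxLoad reports the request count of the busiest connection.
func (p *pool) maxLoad() int {
	m := 0
	for _, n := range p.loads {
		if n > m {
			m = n
		}
	}
	return m
}

func main() {
	for _, size := range []int{4, 8} {
		p := newPool(size)
		for r := 0; r < 32; r++ { // assumed number of in-flight requests
			p.pick()
		}
		fmt.Printf("pool=%d max requests per connection=%d\n", size, p.maxLoad())
	}
}
```

In this sketch, doubling the pool from 4 to 8 halves the busiest connection's share (8 requests down to 4). Real throughput also depends on factors the sketch ignores, such as per-connection stream limits and server behaviour.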

@levb levb enabled auto-merge (squash) May 13, 2026 18:14
@levb levb merged commit 7de6855 into main May 13, 2026
52 of 53 checks passed
@levb levb deleted the lev-fix-8-grpc-connections branch May 13, 2026 18:21
ValentaTomas pushed a commit that referenced this pull request May 13, 2026
When fetching in compressed mode, we make more GCS requests (2MbU at a time,
not 4). Increasing the number of gRPC connections to 8 appears to have
improved performance: better fetch P9x and fewer health check
failures.
2 participants