
perf(compression) - hardcode Zstd decode concurrency to 1 #2648

Open
levb wants to merge 2 commits into main from lev-zstd-decoder-concurrency-1

Conversation

@levb
Contributor

@levb levb commented May 13, 2026

With sufficiently many concurrent callers, the decoder's built-in concurrency is detrimental to throughput on our 2 MiB frames. Setting the decoder concurrency to 1 also saves allocations.

Findings:

  • Sequential: default beats concurrency=1 by ~25% (1535 vs 1235 MB/s)
  • Parallel (16): concurrency=1 beats default by ~30% (10328 vs 7920 MB/s aggregate)
  • Allocations: concurrency=1 has 1 alloc/op vs 18 (-94%)

Compares zstd.NewReader(r) vs zstd.NewReader(r, WithDecoderConcurrency(1))
under both sequential and parallel patterns. Decoders are pulled from a
sync.Pool with Reset reuse, mirroring production's getZstdDecoder. Source
data is real binaries (3.5× compression ratio), 40 chunks of 2 MiB each
to match DefaultCompressFrameSize. Skips on systems without the candidate
data files.

Benchmark output captured on AMD Ryzen 7 8845HS (16 cores, GOMAXPROCS=16),
80 MiB raw → 22.78 MiB compressed (avg frame 583 KiB):

=== source data ===
  source: /home/lev/dev/infra/packages/orchestrator/orchestrator
  chunks: 40 (raw=80 MiB, comp=22.78 MiB)
  ratio:  3.511x (raw/comp)
  comp size: min=139715 B (136 KiB), max=1497120 B (1462 KiB), avg=597239 B (583 KiB)

goos: linux
goarch: amd64
pkg: zstdbench
cpu: AMD Ryzen 7 8845HS w/ Radeon 780M Graphics
BenchmarkDecodeDefault-16                  2562    1367280 ns/op  1533.81 MB/s   9887 B/op   18 allocs/op
BenchmarkDecodeDefault-16                  2553    1364731 ns/op  1536.68 MB/s   9900 B/op   18 allocs/op
BenchmarkDecodeDefault-16                  2497    1365620 ns/op  1535.68 MB/s  10091 B/op   18 allocs/op
BenchmarkDecodeConcurrency1-16             2079    1703633 ns/op  1230.99 MB/s   5172 B/op    1 allocs/op
BenchmarkDecodeConcurrency1-16             2070    1696848 ns/op  1235.91 MB/s   5185 B/op    1 allocs/op
BenchmarkDecodeConcurrency1-16             2020    1697646 ns/op  1235.33 MB/s   5322 B/op    1 allocs/op
BenchmarkDecodeDefault_Parallel-16        13562     264797 ns/op  7919.85 MB/s  27322 B/op   18 allocs/op
BenchmarkDecodeDefault_Parallel-16        13591     264851 ns/op  7918.24 MB/s  27964 B/op   18 allocs/op
BenchmarkDecodeDefault_Parallel-16        13576     264678 ns/op  7923.40 MB/s  28114 B/op   18 allocs/op
BenchmarkDecodeConcurrency1_Parallel-16   17623     203340 ns/op 10313.51 MB/s   9827 B/op    1 allocs/op
BenchmarkDecodeConcurrency1_Parallel-16   17707     203043 ns/op 10328.59 MB/s   9795 B/op    1 allocs/op
BenchmarkDecodeConcurrency1_Parallel-16   17697     202858 ns/op 10338.02 MB/s   9816 B/op    1 allocs/op
PASS
ok      zstdbench       57.959s

levb added 2 commits May 13, 2026 12:17
The comment at compress_decode.go:30-31 said

    // zstd concurrency is hardcoded to 1: benchmarks show higher values hurt
    // throughput for single 2MiB frame decodes.

but the code called zstd.NewReader(r) with no options, defaulting to
GOMAXPROCS internal worker goroutines per decoder. The previous commit's
benchmark confirms the comment's intent: under the production workload
(many concurrent decoders sharing a sync.Pool), concurrency=1 wins
~30% on aggregate throughput and reduces allocations 18→1 per decode.

@claude claude Bot left a comment


⚠️ Code review skipped — your organization has reached its monthly code review spending cap.

An organization admin can view or raise the cap at claude.ai/admin-settings/claude-code. The cap resets at the start of the next billing period.

Once the cap resets or is raised, reopen this pull request to trigger a review.

@codecov

codecov Bot commented May 13, 2026

❌ 6 Tests Failed:

Tests completed  Failed  Passed  Skipped
2618             6       2612    7
View the top 1 failed test(s) by shortest run time
github.com/e2b-dev/infra/packages/shared/pkg/storage::TestRetryableClient_ActualRetryBehavior
Stack Traces | 0.36s run time
=== RUN   TestRetryableClient_ActualRetryBehavior
=== PAUSE TestRetryableClient_ActualRetryBehavior
=== CONT  TestRetryableClient_ActualRetryBehavior
    gcp_multipart_test.go:1014: 
        	Error Trace:	.../pkg/storage/gcp_multipart_test.go:1014
        	Error:      	"232.173874ms" is not less than "200ms"
        	Test:       	TestRetryableClient_ActualRetryBehavior
--- FAIL: TestRetryableClient_ActualRetryBehavior (0.36s)
View the full list of 7 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig

Flake rate in main: 76.44% (Passed 164 times, Failed 532 times)

Stack Traces | 183s run time
=== RUN   TestUpdateNetworkConfig
=== PAUSE TestUpdateNetworkConfig
=== CONT  TestUpdateNetworkConfig
--- FAIL: TestUpdateNetworkConfig (183.40s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false

Flake rate in main: 76.90% (Passed 158 times, Failed 526 times)

Stack Traces | 5.54s run time
=== RUN   TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
Executing command curl in sandbox im1ajj9kekb5uxn9gkqzg
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1364}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35 exited:true status:"exit status 35" error:"exit status 35"}}
Executing command curl in sandbox im1ajj9kekb5uxn9gkqzg
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1365}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35 exited:true status:"exit status 35" error:"exit status 35"}}
Executing command curl in sandbox im1ajj9kekb5uxn9gkqzg
    sandbox_network_update_test.go:391: Command [curl] output: event:{start:{pid:1366}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{data:{stdout:"HTTP/2 302 \r\nx-content-type-options: nosniff\r\nlocation: https://dns.google/\r\ndate: Wed, 13 May 2026 20:11:56 GMT\r\ncontent-type: text/html; charset=UTF-8\r\nserver: HTTP server (unknown)\r\ncontent-length: 216\r\nx-xss-protection: 0\r\nx-frame-options: SAMEORIGIN\r\nalt-svc: h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000\r\n\r\n"}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_network_update_test.go:391: Command [curl] completed successfully in sandbox im1ajj9kekb5uxn9gkqzg
    sandbox_network_update_test.go:391: 
        	Error Trace:	.../api/sandboxes/sandbox_network_out_test.go:74
        	            				.../api/sandboxes/sandbox_network_update_test.go:60
        	            				.../api/sandboxes/sandbox_network_update_test.go:391
        	Error:      	An error is expected but got nil.
        	Test:       	TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
        	Messages:   	https://8.8.8.8 should be blocked
--- FAIL: TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false (5.54s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost

Flake rate in main: 56.80% (Passed 273 times, Failed 359 times)

Stack Traces | 0s run time
=== RUN   TestBindLocalhost
=== PAUSE TestBindLocalhost
=== CONT  TestBindLocalhost
--- FAIL: TestBindLocalhost (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestBindLocalhost/bind_::1

Flake rate in main: 64.43% (Passed 154 times, Failed 279 times)

Stack Traces | 6.93s run time
=== RUN   TestBindLocalhost/bind_::1
=== PAUSE TestBindLocalhost/bind_::1
=== CONT  TestBindLocalhost/bind_::1
Executing command python in sandbox i4xhkn9195hyud0jpv9ce
    localhost_bind_test.go:69: Command [python] output: event:{start:{pid:1254}}
    localhost_bind_test.go:90: 
        	Error Trace:	.../tests/envd/localhost_bind_test.go:90
        	Error:      	Not equal: 
        	            	expected: 200
        	            	actual  : 502
        	Test:       	TestBindLocalhost/bind_::1
        	Messages:   	Unexpected status code 502 for bind address ::1
--- FAIL: TestBindLocalhost/bind_::1 (6.93s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 66.05% (Passed 164 times, Failed 319 times)

Stack Traces | 82.4s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:26: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (82.43s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 67.02% (Passed 154 times, Failed 313 times)

Stack Traces | 26.9s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1259}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\nUsed memory before tmpfs mount: 184 MB\nFree memory before tmpfs mount: 800 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Memory to use in integrity test (80% of free, min 64MB): 640 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"640+0 records in\n640+0 records out\n671088640 bytes (671 MB, 640 MiB) copied, 3.32744 s, 202 MB/s\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tCommand being timed: \"dd if=/dev/urandom of=/mnt/testfile bs=1M count=640\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 3.30\n\tPercent of CPU this job got: 99%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:03.33\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2652\n\tAverage resident set size (kbytes): 0\n\tMajor (requiring I/O) page faults: 3\n\tMinor (reclaiming a frame) page faults: 340\n\tVoluntary context switches: 4\n\tInvoluntary context switches: 72\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 833 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox in8xrn57nijjpqgpwl82e
Executing command bash in sandbox in8xrn57nijjpqgpwl82e (user: root)
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{start:{pid:1275}}
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{data:{stdout:"29cff6122989036db2ed3260e252d10f3e4b24b020a87047627617ae3757436c\n"}}
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:74: Command [bash] completed successfully in sandbox in8xrn57nijjpqgpwl82e
Executing command bash in sandbox in8xrn57nijjpqgpwl82e (user: root)
    sandbox_memory_integrity_test.go:99: Command [bash] output: event:{start:{pid:1278}}
    sandbox_memory_integrity_test.go:100: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:100
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox in8xrn57nijjpqgpwl82e: invalid_argument: protocol error: incomplete envelope: unexpected EOF
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (26.90s)
github.com/e2b-dev/infra/tests/integration/internal/tests/proxies::TestSandboxWithTrafficAccessTokenAutoResumeViaProxy

Flake rate in main: 54.73% (Passed 158 times, Failed 191 times)

Stack Traces | 21.5s run time
=== RUN   TestSandboxWithTrafficAccessTokenAutoResumeViaProxy
=== PAUSE TestSandboxWithTrafficAccessTokenAutoResumeViaProxy
=== CONT  TestSandboxWithTrafficAccessTokenAutoResumeViaProxy
Executing command apt-get in sandbox iou2pojwog47c41xpenck (user: root)
    traffic_access_token_test.go:263: [Status code: 502] Response body: {"sandboxId":"i8bin7h3h3uluikez5tof","message":"The sandbox is running but port is not open","port":8080,"code":502}
    traffic_access_token_test.go:263: [Status code: 502] Response body: {"sandboxId":"i8bin7h3h3uluikez5tof","message":"The sandbox is running but port is not open","port":8080,"code":502}
    traffic_access_token_test.go:263: [Status code: 502] Response body: {"sandboxId":"i8bin7h3h3uluikez5tof","message":"The sandbox is running but port is not open","port":8080,"code":502}
Executing command ls in sandbox ivh9cxua4lsx3usrnsqi8
    traffic_access_token_test.go:292: 
        	Error Trace:	.../tests/proxies/traffic_access_token_test.go:292
        	Error:      	Received unexpected error:
        	            	Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        	Test:       	TestSandboxWithTrafficAccessTokenAutoResumeViaProxy
--- FAIL: TestSandboxWithTrafficAccessTokenAutoResumeViaProxy (21.50s)


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

I have no feedback to provide.

