Commit fb759bc
perf(compression) - hardcode Zstd decode concurrency to 1 (#2648)
With enough concurrent callers, the decoder's built-in concurrency hurt
throughput on our 2 MiB frames. Setting the decoder concurrency to 1 also
saves allocations.
Findings:
- Sequential: default beats concurrency=1 by ~25% (1535 vs 1235 MB/s)
- Parallel (GOMAXPROCS=16): concurrency=1 beats default by ~30% (10328 vs 7920 MB/s aggregate)
- Allocations: concurrency=1 has 1 alloc/op vs 18 (-94%)
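The percentages follow directly from the quoted figures. Sequentially, the default decoder is about 24.3% faster. In parallel, concurrency=1 is about 30.4% faster, and allocations per op drop by about 94.4%. The helpers speedup and reduction are illustrative names and are not part of the benchmark.

```go
package main

import "fmt"

// speedup reports how much higher the fast throughput is than the slow one, in percent.
func speedup(fast, slow float64) float64 {
	return (fast/slow - 1) * 100
}

// reduction reports the percentage drop from before to after.
func reduction(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// MB/s and allocs/op figures quoted in the findings.
	fmt.Printf("sequential: default ahead by %.1f%%\n", speedup(1535, 1235))
	fmt.Printf("parallel:   concurrency=1 ahead by %.1f%%\n", speedup(10328, 7920))
	fmt.Printf("allocs/op:  down %.1f%%\n", reduction(18, 1))
}
```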
The benchmark compares zstd.NewReader(r) with zstd.NewReader(r,
WithDecoderConcurrency(1)) under both sequential and parallel access
patterns. Decoders are pulled from a sync.Pool and reused via Reset,
mirroring production's getZstdDecoder. The source data is real binaries
(3.5× compression ratio), split into 40 chunks of 2 MiB each to match
DefaultCompressFrameSize. The benchmark skips on systems without the
candidate data files.
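The data-set sizes and the MB/s column in the benchmark output can be cross-checked with a little arithmetic. The MB/s values are consistent with each op being credited the 2 MiB raw (decompressed) frame size: 2,097,152 B / 1,367,280 ns ≈ 1533.8 MB/s. That is an inference from the numbers, not something the message states. The names frameBytes, avgCompSize, mbPerSec and ratio are hypothetical.

```go
package main

import "fmt"

const (
	frameBytes  = 2 << 20 // 2 MiB raw frame (DefaultCompressFrameSize per the message)
	chunks      = 40      // number of chunks in the data set
	avgCompSize = 597239  // average compressed frame size from the output, in bytes
)

// mbPerSec converts ns/op to MB/s (10^6 B/s), assuming each op counts frameBytes.
func mbPerSec(nsPerOp float64) float64 {
	return frameBytes / nsPerOp * 1e3
}

// ratio returns raw/compressed size for the whole data set.
func ratio() float64 {
	return float64(chunks*frameBytes) / float64(chunks*avgCompSize)
}

func main() {
	fmt.Printf("raw=%d MiB comp=%.2f MiB ratio=%.3fx\n",
		(chunks*frameBytes)>>20, float64(chunks*avgCompSize)/(1<<20), ratio())
	// First ns/op value of each benchmark variant.
	for _, ns := range []float64{1367280, 1703633, 264797, 203340} {
		fmt.Printf("%8.0f ns/op -> %8.2f MB/s\n", ns, mbPerSec(ns))
	}
}
```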
Benchmark output captured on an AMD Ryzen 7 8845HS (8 cores / 16 threads,
GOMAXPROCS=16), 80 MiB raw → 22.78 MiB compressed (avg frame 583 KiB):
=== source data ===
source: /home/lev/dev/infra/packages/orchestrator/orchestrator
chunks: 40 (raw=80 MiB, comp=22.78 MiB)
ratio: 3.511x (raw/comp)
comp size: min=139715 B (136 KiB), max=1497120 B (1462 KiB), avg=597239 B (583 KiB)
goos: linux
goarch: amd64
pkg: zstdbench
cpu: AMD Ryzen 7 8845HS w/ Radeon 780M Graphics
BenchmarkDecodeDefault-16                 2562  1367280 ns/op   1533.81 MB/s   9887 B/op  18 allocs/op
BenchmarkDecodeDefault-16                 2553  1364731 ns/op   1536.68 MB/s   9900 B/op  18 allocs/op
BenchmarkDecodeDefault-16                 2497  1365620 ns/op   1535.68 MB/s  10091 B/op  18 allocs/op
BenchmarkDecodeConcurrency1-16            2079  1703633 ns/op   1230.99 MB/s   5172 B/op   1 allocs/op
BenchmarkDecodeConcurrency1-16            2070  1696848 ns/op   1235.91 MB/s   5185 B/op   1 allocs/op
BenchmarkDecodeConcurrency1-16            2020  1697646 ns/op   1235.33 MB/s   5322 B/op   1 allocs/op
BenchmarkDecodeDefault_Parallel-16       13562   264797 ns/op   7919.85 MB/s  27322 B/op  18 allocs/op
BenchmarkDecodeDefault_Parallel-16       13591   264851 ns/op   7918.24 MB/s  27964 B/op  18 allocs/op
BenchmarkDecodeDefault_Parallel-16       13576   264678 ns/op   7923.40 MB/s  28114 B/op  18 allocs/op
BenchmarkDecodeConcurrency1_Parallel-16  17623   203340 ns/op  10313.51 MB/s   9827 B/op   1 allocs/op
BenchmarkDecodeConcurrency1_Parallel-16  17707   203043 ns/op  10328.59 MB/s   9795 B/op   1 allocs/op
BenchmarkDecodeConcurrency1_Parallel-16  17697   202858 ns/op  10338.02 MB/s   9816 B/op   1 allocs/op
PASS
ok   zstdbench 57.959s

Parent: 8cf1795
1 file changed, 3 additions, 1 deletion