
Commit fb759bc

perf(compression) - hardcode Zstd decode concurrency to 1 (#2648)
With enough concurrent callers, the decoder's built-in concurrency hurt throughput on our 2 MiB frames. Setting the decoder concurrency to 1 also reduces allocations.

Findings:
- Sequential: default beats concurrency=1 by ~25% (1535 vs 1235 MB/s)
- Parallel (16): concurrency=1 beats default by ~30% (10328 vs 7920 MB/s aggregate)
- Allocations: concurrency=1 has 1 alloc/op vs 18 (-94%)

The benchmark compares zstd.NewReader(r) with zstd.NewReader(r, WithDecoderConcurrency(1)) under both sequential and parallel patterns. Decoders are pulled from a sync.Pool and reused via Reset, mirroring production's getZstdDecoder. Source data is real binaries (3.5× compression ratio), split into 40 chunks of 2 MiB each to match DefaultCompressFrameSize. The benchmark is skipped on systems without the candidate data files.

Benchmark output captured on AMD Ryzen 7 8845HS (8 cores / 16 threads, GOMAXPROCS=16), 80 MiB raw → 22.78 MiB compressed (avg frame 583 KiB):

=== source data ===
source: /home/lev/dev/infra/packages/orchestrator/orchestrator
chunks: 40 (raw=80 MiB, comp=22.78 MiB)
ratio: 3.511x (raw/comp)
comp size: min=139715 B (136 KiB), max=1497120 B (1462 KiB), avg=597239 B (583 KiB)
goos: linux
goarch: amd64
pkg: zstdbench
cpu: AMD Ryzen 7 8845HS w/ Radeon 780M Graphics
BenchmarkDecodeDefault-16                  2562   1367280 ns/op    1533.81 MB/s    9887 B/op   18 allocs/op
BenchmarkDecodeDefault-16                  2553   1364731 ns/op    1536.68 MB/s    9900 B/op   18 allocs/op
BenchmarkDecodeDefault-16                  2497   1365620 ns/op    1535.68 MB/s   10091 B/op   18 allocs/op
BenchmarkDecodeConcurrency1-16             2079   1703633 ns/op    1230.99 MB/s    5172 B/op    1 allocs/op
BenchmarkDecodeConcurrency1-16             2070   1696848 ns/op    1235.91 MB/s    5185 B/op    1 allocs/op
BenchmarkDecodeConcurrency1-16             2020   1697646 ns/op    1235.33 MB/s    5322 B/op    1 allocs/op
BenchmarkDecodeDefault_Parallel-16        13562    264797 ns/op    7919.85 MB/s   27322 B/op   18 allocs/op
BenchmarkDecodeDefault_Parallel-16        13591    264851 ns/op    7918.24 MB/s   27964 B/op   18 allocs/op
BenchmarkDecodeDefault_Parallel-16        13576    264678 ns/op    7923.40 MB/s   28114 B/op   18 allocs/op
BenchmarkDecodeConcurrency1_Parallel-16   17623    203340 ns/op   10313.51 MB/s    9827 B/op    1 allocs/op
BenchmarkDecodeConcurrency1_Parallel-16   17707    203043 ns/op   10328.59 MB/s    9795 B/op    1 allocs/op
BenchmarkDecodeConcurrency1_Parallel-16   17697    202858 ns/op   10338.02 MB/s    9816 B/op    1 allocs/op
PASS
ok      zstdbench       57.959s
1 parent 8cf1795 commit fb759bc

1 file changed

Lines changed: 3 additions & 1 deletion


packages/shared/pkg/storage/compress_decode.go

@@ -29,6 +29,8 @@ func putLZ4Decoder(dec *lz4.Reader) {
 
 // zstd concurrency is hardcoded to 1: benchmarks show higher values hurt
 // throughput for single 2MiB frame decodes.
+const zstdDecoderConcurrency = 1
+
 var zstdDecoderPool sync.Pool
 
 func getZstdDecoder(r io.Reader) (*zstd.Decoder, error) {
@@ -43,7 +45,7 @@ func getZstdDecoder(r io.Reader) (*zstd.Decoder, error) {
 		return dec, nil
 	}
 
-	return zstd.NewReader(r)
+	return zstd.NewReader(r, zstd.WithDecoderConcurrency(zstdDecoderConcurrency))
 }
 
 func putZstdDecoder(dec *zstd.Decoder) {
