Commit f9d81a4
Update CompressionBenchmark page sizes to realistic values (64K-1MB)
Use page sizes that reflect actual Parquet page sizes observed in practice:
64KB, 128KB, 256KB, and 1MB (the default). The 20K row-count limit
(PARQUET-1414) means most numeric columns produce pages of 78-234KB,
making the previous 8KB test point unrealistic.
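The 78-234KB range is consistent with a simple calculation: 20,000 values per page multiplied by a fixed value width. The sketch below assumes widths of 4 bytes (e.g. INT32/FLOAT) and 12 bytes (INT96) as the two endpoints and ignores encoding, repetition/definition levels, and page headers. Those widths and the `PageSizeEstimate` class are illustrative assumptions, not taken from the commit or from parquet-benchmarks.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of parquet-benchmarks.
final class PageSizeEstimate {
    // Per-page row-count limit cited in the commit message (PARQUET-1414).
    static final int ROW_LIMIT = 20_000;

    // Uncompressed bytes for a page holding `rows` fixed-width values.
    static long pageBytes(int rows, int bytesPerValue) {
        return (long) rows * bytesPerValue;
    }

    // Convert a byte count to KiB (1 KiB = 1024 bytes).
    static double toKiB(long bytes) {
        return bytes / 1024.0;
    }

    public static void main(String[] args) {
        // Assumed value widths: 4-byte, 8-byte, and 12-byte physical types.
        int[] widths = {4, 8, 12};
        for (int w : widths) {
            System.out.printf(Locale.ROOT, "%2d-byte values: %.1f KiB%n",
                    w, toKiB(pageBytes(ROW_LIMIT, w)));
        }
    }
}
```

Under these assumptions the 4-byte and 12-byte cases land at roughly 78 and 234 KiB, which matches the range quoted above and is well above the old 8KB test point.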
Also fix JMH annotation processor path for Java 17+ compatibility
and reduce warmup/measurement iterations for faster iteration.
Performance results (master vs perf-compression-bypass branch):
Compression (ops/s, higher is better):
Codec | Page | Master | Branch | Speedup
SNAPPY | 64KB | 53,979 | 60,799 | +12.6%
SNAPPY | 128KB | 27,764 | 30,524 | +9.9%
SNAPPY | 256KB | 13,549 | 14,648 | +8.1%
SNAPPY | 1MB | 2,445 | 2,675 | +9.4%
ZSTD | 64KB | 8,813 | 8,719 | -1.1%
ZSTD | 128KB | 4,361 | 4,501 | +3.2%
ZSTD | 256KB | 2,112 | 2,008 | -4.9%
ZSTD | 1MB | 423 | 422 | -0.3%
LZ4_RAW | 64KB | 37,777 | 36,107 | -4.4%
LZ4_RAW | 128KB | 16,777 | 16,330 | -2.7%
LZ4_RAW | 256KB | 9,060 | 8,956 | -1.1%
LZ4_RAW | 1MB | 1,961 | 2,191 | +11.7%
GZIP | 64KB | 1,422 | 1,423 | +0.1%
GZIP | 128KB | 641 | 646 | +0.8%
GZIP | 256KB | 315 | 317 | +0.7%
GZIP | 1MB | 75 | 77 | +2.3%
Decompression (ops/s, higher is better):
Codec | Page | Master | Branch | Speedup
SNAPPY | 64KB | 60,928 | 67,224 | +10.3%
SNAPPY | 128KB | 29,919 | 33,457 | +11.8%
SNAPPY | 256KB | 14,431 | 15,912 | +10.3%
SNAPPY | 1MB | 3,140 | 3,540 | +12.7%
ZSTD | 64KB | 32,042 | 35,750 | +11.6%
ZSTD | 128KB | 19,447 | 21,800 | +12.1%
ZSTD | 256KB | 9,495 | 10,759 | +13.3%
ZSTD | 1MB | 2,155 | 2,409 | +11.8%
LZ4_RAW | 64KB | 80,415 | 118,358 | +47.2%
LZ4_RAW | 128KB | 40,615 | 59,620 | +46.8%
LZ4_RAW | 256KB | 19,888 | 29,914 | +50.4%
LZ4_RAW | 1MB | 4,628 | 7,517 | +62.4%
GZIP | 64KB | 9,393 | 9,608 | +2.3%
GZIP | 128KB | 4,101 | 4,536 | +10.6%
GZIP | 256KB | 1,736 | 1,891 | +8.9%
GZIP | 1MB | 406 | 442 | +9.1%
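The Speedup column in both tables appears to be the relative throughput change, (Branch / Master - 1) x 100. A minimal sketch of that calculation follows, using two rows from the tables as inputs. The `SpeedupColumn` class is a hypothetical name for illustration, and values recomputed from the rounded ops/s figures may differ from the printed column by a few tenths of a percent.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the commit.
final class SpeedupColumn {
    // Relative throughput change of the branch over master, in percent.
    static double speedupPercent(double masterOps, double branchOps) {
        return (branchOps / masterOps - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // SNAPPY compression, 64KB page (ops/s from the table above).
        System.out.printf(Locale.ROOT, "SNAPPY 64KB compress:    %+.1f%%%n",
                speedupPercent(53_979, 60_799));
        // LZ4_RAW decompression, 1MB page (ops/s from the table above).
        System.out.printf(Locale.ROOT, "LZ4_RAW 1MB decompress:  %+.1f%%%n",
                speedupPercent(4_628, 7_517));
    }
}
```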
Key findings:
- SNAPPY: consistent 8-13% improvement across all page sizes
- LZ4_RAW decompression: strongest gain at 47-62% (eliminates 2x heap<->direct copies)
- ZSTD decompression: 11-13% from NoFinalizer + config caching
- GZIP decompression: 9-11% faster at 128KB+ page sizes
- ZSTD/GZIP compression: within noise (CPU-bound in native codec)
- LZ4_RAW compression: within noise at small pages, +12% at 1MB
1 parent 23881dd commit f9d81a4
2 files changed
Lines changed: 12 additions & 3 deletions
File tree
- parquet-benchmarks
- src/main/java/org/apache/parquet/benchmarks
Hunk @@ -89,6 +89,15 @@: 9 lines added (new lines 92-100)
Lines changed: 3 additions & 3 deletions
Hunk @@ -54,15 +54,15 @@: lines 57-58 and line 65 replaced