Commit ab767a4
Update CompressionBenchmark page sizes to realistic values (64K-1MB)
Use page sizes that reflect actual Parquet page sizes observed in practice:
64KB, 128KB, 256KB, and 1MB (the default). The 20K row-count limit
(PARQUET-1414) means most numeric columns produce pages of 78-234KB,
making the previous 8KB test point unrealistic.
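The 78-234KB range can be reproduced with simple arithmetic. The sketch below is illustrative only: it assumes plain-encoded fixed-width values of 4, 8, and 12 bytes (the widths are an assumption, not stated in the commit), and it ignores page headers, repetition/definition levels, and any encoding or compression. The class and method names are hypothetical.

```java
public class PageSizeEstimate {

    /** Uncompressed payload bytes for a page of fixed-width values (headers and levels ignored). */
    static long pageBytes(int rowsPerPage, int bytesPerValue) {
        return (long) rowsPerPage * bytesPerValue;
    }

    /** Converts a byte count to whole KiB, rounding down. */
    static long toKiB(long bytes) {
        return bytes / 1024;
    }

    public static void main(String[] args) {
        int rowLimit = 20_000;      // row-count limit cited in the commit message
        int[] widths = {4, 8, 12};  // assumed value widths in bytes
        for (int width : widths) {
            System.out.printf("%2d-byte values: %d KiB%n", width, toKiB(pageBytes(rowLimit, width)));
        }
    }
}
```

Under these assumptions, 4-byte values give about 78 KiB per page and 12-byte values about 234 KiB, which matches the range quoted above and shows why an 8KB test point sits well below typical page sizes.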
Also fix the JMH annotation processor path for Java 17+ compatibility
and reduce warmup/measurement iterations to shorten benchmark runs.
Performance results (master vs perf-compression-bypass branch):
Compression (ops/s, higher is better):
Codec | Page | Master | Branch | Speedup
SNAPPY | 64KB | 55,208 | 62,776 | +13.7%
SNAPPY | 128KB | 28,188 | 31,692 | +12.4%
SNAPPY | 256KB | 13,787 | 15,392 | +11.6%
SNAPPY | 1MB | 2,493 | 2,775 | +11.3%
ZSTD | 64KB | 9,127 | 9,364 | +2.6%
ZSTD | 128KB | 4,517 | 4,567 | +1.1%
ZSTD | 256KB | 2,072 | 2,161 | +4.3%
ZSTD | 1MB | 446 | 439 | -1.6%
LZ4_RAW | 64KB | 38,055 | 37,088 | -2.5%
LZ4_RAW | 128KB | 17,488 | 17,192 | -1.7%
LZ4_RAW | 256KB | 9,307 | 9,229 | -0.8%
LZ4_RAW | 1MB | 2,060 | 2,266 | +10.0%
GZIP | 64KB | 1,406 | 1,457 | +3.6%
GZIP | 128KB | 643 | 652 | +1.4%
GZIP | 256KB | 316 | 321 | +1.6%
GZIP | 1MB | 77 | 78 | +1.3%
Decompression (ops/s, higher is better):
Codec | Page | Master | Branch | Speedup
SNAPPY | 64KB | 62,553 | 68,684 | +9.8%
SNAPPY | 128KB | 31,207 | 34,199 | +9.6%
SNAPPY | 256KB | 14,737 | 16,157 | +9.6%
SNAPPY | 1MB | 3,219 | 3,581 | +11.2%
ZSTD | 64KB | 35,480 | 36,241 | +2.1%
ZSTD | 128KB | 22,068 | 22,328 | +1.2%
ZSTD | 256KB | 10,910 | 10,895 | -0.1%
ZSTD | 1MB | 2,433 | 2,482 | +2.0%
LZ4_RAW | 64KB | 105,142 | 120,335 | +14.4%
LZ4_RAW | 128KB | 52,938 | 60,533 | +14.3%
LZ4_RAW | 256KB | 26,360 | 30,257 | +14.8%
LZ4_RAW | 1MB | 6,155 | 7,556 | +22.8%
GZIP | 64KB | 9,429 | 9,772 | +3.6%
GZIP | 128KB | 4,167 | 4,541 | +9.0%
GZIP | 256KB | 1,733 | 1,915 | +10.5%
GZIP | 1MB | 405 | 452 | +11.6%
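The Speedup column in both tables appears to be the relative change in throughput, (Branch / Master - 1) x 100. The sketch below recomputes a few rows under that assumption; the class and method names are hypothetical and not part of the benchmark code.

```java
import java.util.Locale;

public class Speedup {

    /** Relative throughput change of the branch over master, in percent. */
    static double speedupPercent(double masterOps, double branchOps) {
        return (branchOps / masterOps - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Rows copied from the tables above: {label, master ops/s, branch ops/s}
        Object[][] rows = {
            {"SNAPPY 64KB compression", 55_208.0, 62_776.0},
            {"ZSTD 1MB compression", 446.0, 439.0},
            {"LZ4_RAW 1MB decompression", 6_155.0, 7_556.0},
        };
        for (Object[] row : rows) {
            double pct = speedupPercent((Double) row[1], (Double) row[2]);
            System.out.printf(Locale.ROOT, "%-26s %+.1f%%%n", row[0], pct);
        }
    }
}
```

Rounded to one decimal place, these come out to +13.7%, -1.6%, and +22.8%, in line with the corresponding table rows.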
Key findings:
- SNAPPY: consistent 10-14% improvement across all page sizes
- LZ4_RAW decompression: strongest gain at 14-23% faster; compression is within 2.5% of master at 64KB-256KB and +10.0% at 1MB
- GZIP decompression: 9-12% faster at 128KB+ page sizes
- ZSTD: roughly flat, from -1.6% to +4.3% (JNI library already efficient)
- Gains consistent across realistic page sizes (64K-1MB)
1 parent c79ed9c commit ab767a4
2 files changed
Lines changed: 12 additions & 3 deletions
File tree
- parquet-benchmarks
- src/main/java/org/apache/parquet/benchmarks
Lines changed: 3 additions & 3 deletions