Commit ab767a4

Update CompressionBenchmark page sizes to realistic values (64K-1MB)
Use page sizes that reflect actual Parquet page sizes observed in practice: 64KB, 128KB, 256KB, and 1MB (the default). The 20K row-count limit (PARQUET-1414) means most numeric columns produce pages of 78-234KB, making the previous 8KB test point unrealistic.

Also fix the JMH annotation processor path for Java 17+ compatibility, and reduce warmup/measurement iterations for faster iteration.

Performance results (master vs perf-compression-bypass branch):

Compression (ops/s, higher is better):

Codec   | Page  | Master  | Branch  | Speedup
SNAPPY  | 64KB  |  55,208 |  62,776 | +13.7%
SNAPPY  | 128KB |  28,188 |  31,692 | +12.4%
SNAPPY  | 256KB |  13,787 |  15,392 | +11.6%
SNAPPY  | 1MB   |   2,493 |   2,775 | +11.3%
ZSTD    | 64KB  |   9,127 |   9,364 |  +2.6%
ZSTD    | 128KB |   4,517 |   4,567 |  +1.1%
ZSTD    | 256KB |   2,072 |   2,161 |  +4.3%
ZSTD    | 1MB   |     446 |     439 |  -1.6%
LZ4_RAW | 64KB  |  38,055 |  37,088 |  -2.5%
LZ4_RAW | 128KB |  17,488 |  17,192 |  -1.7%
LZ4_RAW | 256KB |   9,307 |   9,229 |  -0.8%
LZ4_RAW | 1MB   |   2,060 |   2,266 | +10.0%
GZIP    | 64KB  |   1,406 |   1,457 |  +3.6%
GZIP    | 128KB |     643 |     652 |  +1.4%
GZIP    | 256KB |     316 |     321 |  +1.6%
GZIP    | 1MB   |      77 |      78 |  +1.3%

Decompression (ops/s, higher is better):

Codec   | Page  | Master  | Branch  | Speedup
SNAPPY  | 64KB  |  62,553 |  68,684 |  +9.8%
SNAPPY  | 128KB |  31,207 |  34,199 |  +9.6%
SNAPPY  | 256KB |  14,737 |  16,157 |  +9.6%
SNAPPY  | 1MB   |   3,219 |   3,581 | +11.2%
ZSTD    | 64KB  |  35,480 |  36,241 |  +2.1%
ZSTD    | 128KB |  22,068 |  22,328 |  +1.2%
ZSTD    | 256KB |  10,910 |  10,895 |  -0.1%
ZSTD    | 1MB   |   2,433 |   2,482 |  +2.0%
LZ4_RAW | 64KB  | 105,142 | 120,335 | +14.4%
LZ4_RAW | 128KB |  52,938 |  60,533 | +14.3%
LZ4_RAW | 256KB |  26,360 |  30,257 | +14.8%
LZ4_RAW | 1MB   |   6,155 |   7,556 | +22.8%
GZIP    | 64KB  |   9,429 |   9,772 |  +3.6%
GZIP    | 128KB |   4,167 |   4,541 |  +9.0%
GZIP    | 256KB |   1,733 |   1,915 | +10.5%
GZIP    | 1MB   |     405 |     452 | +11.6%

Key findings:
- SNAPPY: consistent 10-14% improvement across all page sizes
- LZ4_RAW decompression: strongest gain at 14-23% faster
- GZIP decompression: 9-12% faster at 128KB+ page sizes
- ZSTD: small changes, roughly -2% to +4% (JNI library already efficient)
- Gains consistent across realistic page sizes (64K-1MB)
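
The arithmetic behind two kinds of figures in the message above can be sketched as follows. This is an illustration only, not code from the commit: the class and method names are hypothetical, and it assumes the 78-234KB range corresponds to 20,000 PLAIN-encoded values of 4 to 12 bytes each, with page headers and repetition/definition levels ignored.

```java
// Hypothetical helper, not part of the commit: reproduces figures quoted in the message.
public class CommitMessageArithmetic {

  // Uncompressed page size in whole KiB for `rows` fixed-width values.
  // Assumption: PLAIN encoding, no page header, no repetition/definition levels.
  static long pageKiB(long rows, int bytesPerValue) {
    return rows * bytesPerValue / 1024;
  }

  // Relative throughput change in percent, rounded to one decimal place.
  static double speedupPercent(double masterOps, double branchOps) {
    return Math.round((branchOps / masterOps - 1.0) * 1000.0) / 10.0;
  }

  public static void main(String[] args) {
    long rowLimit = 20_000; // the row-count limit cited as PARQUET-1414

    // Assumed value widths: 4 bytes (INT32/FLOAT), 8 (INT64/DOUBLE), 12 (INT96).
    for (int width : new int[] {4, 8, 12}) {
      System.out.println(width + "-byte values: " + pageKiB(rowLimit, width) + " KiB");
    }

    // One row from the compression table: SNAPPY at 64KB.
    System.out.println("SNAPPY 64KB compression: " + speedupPercent(55_208, 62_776) + "%");
  }
}
```

Under these assumptions the three widths give 78, 156, and 234 KiB, which matches the endpoints quoted in the message, and the SNAPPY row comes out at 13.7%.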
1 parent c79ed9c commit ab767a4

2 files changed

Lines changed: 12 additions & 3 deletions

parquet-benchmarks/pom.xml

Lines changed: 9 additions & 0 deletions
@@ -89,6 +89,15 @@
       <plugin>
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-compiler-plugin</artifactId>
+        <configuration>
+          <annotationProcessorPaths>
+            <path>
+              <groupId>org.openjdk.jmh</groupId>
+              <artifactId>jmh-generator-annprocess</artifactId>
+              <version>${jmh.version}</version>
+            </path>
+          </annotationProcessorPaths>
+        </configuration>
       </plugin>
       <plugin>
         <groupId>org.apache.maven.plugins</groupId>

parquet-benchmarks/src/main/java/org/apache/parquet/benchmarks/CompressionBenchmark.java

Lines changed: 3 additions & 3 deletions
@@ -54,15 +54,15 @@
 @BenchmarkMode(Mode.Throughput)
 @OutputTimeUnit(TimeUnit.SECONDS)
 @Fork(1)
-@Warmup(iterations = 3, time = 2)
-@Measurement(iterations = 5, time = 3)
+@Warmup(iterations = 2, time = 1)
+@Measurement(iterations = 3, time = 2)
 @State(Scope.Thread)
 public class CompressionBenchmark {
 
   @Param({"SNAPPY", "ZSTD", "LZ4_RAW", "GZIP"})
   public String codec;
 
-  @Param({"8192", "65536", "262144"})
+  @Param({"65536", "131072", "262144", "1048576"})
   public int pageSize;
 
   private byte[] uncompressedData;
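
JMH runs each benchmark method once for every combination of `@Param` values, so the change above affects both the number of combinations and the time spent on each. A rough estimate of the effect on run time, as a sketch: the class below is hypothetical, uses no JMH APIs, and counts only the configured warmup and measurement seconds for a single fork. JVM startup and `@Setup` work are ignored, and the number of benchmark methods in `CompressionBenchmark` is not visible in this diff, so the totals are per method.

```java
// Hypothetical estimate, not part of the commit: wall-clock seconds implied by the annotations.
public class BenchmarkMatrixEstimate {

  // Seconds one fork spends on a single parameter combination:
  // warmup iterations plus measurement iterations, each with a fixed duration in seconds.
  static int secondsPerCombination(int warmupIters, int warmupSecs,
                                   int measureIters, int measureSecs) {
    return warmupIters * warmupSecs + measureIters * measureSecs;
  }

  // Total seconds for one benchmark method over the full codec x pageSize matrix.
  static int totalSeconds(int codecs, int pageSizes, int perCombination) {
    return codecs * pageSizes * perCombination;
  }

  public static void main(String[] args) {
    // Before: @Warmup(3 x 2s), @Measurement(5 x 3s), 4 codecs x 3 page sizes.
    int before = totalSeconds(4, 3, secondsPerCombination(3, 2, 5, 3));
    // After: @Warmup(2 x 1s), @Measurement(3 x 2s), 4 codecs x 4 page sizes.
    int after = totalSeconds(4, 4, secondsPerCombination(2, 1, 3, 2));

    System.out.println("before: " + before + " s per benchmark method");
    System.out.println("after:  " + after + " s per benchmark method");
  }
}
```

By this estimate the configured time per benchmark method drops from 252 s to 128 s even though the matrix grows from 12 to 16 combinations. The cost is fewer measurement iterations per data point, so small differences in the results are more likely to fall within run-to-run noise.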
