Commit f9d81a4

Update CompressionBenchmark page sizes to realistic values (64K-1MB)
Use page sizes that reflect actual Parquet page sizes observed in practice: 64KB, 128KB, 256KB, and 1MB (the default). The 20K row-count limit (PARQUET-1414) means most numeric columns produce pages of 78-234KB, making the previous 8KB test point unrealistic.

Also fix JMH annotation processor path for Java 17+ compatibility and reduce warmup/measurement iterations for faster iteration.

Performance results (master vs perf-compression-bypass branch):

Compression (ops/s, higher is better):

Codec   | Page  | Master | Branch | Speedup
SNAPPY  | 64KB  | 53,979 | 60,799 | +12.6%
SNAPPY  | 128KB | 27,764 | 30,524 | +9.9%
SNAPPY  | 256KB | 13,549 | 14,648 | +8.1%
SNAPPY  | 1MB   |  2,445 |  2,675 | +9.4%
ZSTD    | 64KB  |  8,813 |  8,719 | -1.1%
ZSTD    | 128KB |  4,361 |  4,501 | +3.2%
ZSTD    | 256KB |  2,112 |  2,008 | -4.9%
ZSTD    | 1MB   |    423 |    422 | -0.3%
LZ4_RAW | 64KB  | 37,777 | 36,107 | -4.4%
LZ4_RAW | 128KB | 16,777 | 16,330 | -2.7%
LZ4_RAW | 256KB |  9,060 |  8,956 | -1.1%
LZ4_RAW | 1MB   |  1,961 |  2,191 | +11.7%
GZIP    | 64KB  |  1,422 |  1,423 | +0.1%
GZIP    | 128KB |    641 |    646 | +0.8%
GZIP    | 256KB |    315 |    317 | +0.7%
GZIP    | 1MB   |     75 |     77 | +2.3%

Decompression (ops/s, higher is better):

Codec   | Page  | Master | Branch  | Speedup
SNAPPY  | 64KB  | 60,928 |  67,224 | +10.3%
SNAPPY  | 128KB | 29,919 |  33,457 | +11.8%
SNAPPY  | 256KB | 14,431 |  15,912 | +10.3%
SNAPPY  | 1MB   |  3,140 |   3,540 | +12.7%
ZSTD    | 64KB  | 32,042 |  35,750 | +11.6%
ZSTD    | 128KB | 19,447 |  21,800 | +12.1%
ZSTD    | 256KB |  9,495 |  10,759 | +13.3%
ZSTD    | 1MB   |  2,155 |   2,409 | +11.8%
LZ4_RAW | 64KB  | 80,415 | 118,358 | +47.2%
LZ4_RAW | 128KB | 40,615 |  59,620 | +46.8%
LZ4_RAW | 256KB | 19,888 |  29,914 | +50.4%
LZ4_RAW | 1MB   |  4,628 |   7,517 | +62.4%
GZIP    | 64KB  |  9,393 |   9,608 | +2.3%
GZIP    | 128KB |  4,101 |   4,536 | +10.6%
GZIP    | 256KB |  1,736 |   1,891 | +8.9%
GZIP    | 1MB   |    406 |     442 | +9.1%

Key findings:
- SNAPPY: consistent 8-13% improvement across all page sizes
- LZ4_RAW decompression: strongest gain at 47-62% (eliminates 2x heap<->direct copies)
- ZSTD decompression: 11-13% from NoFinalizer + config caching
- GZIP decompression: 9-11% faster at 128KB+ page sizes
- ZSTD/GZIP compression: within noise (CPU-bound in native codec)
- LZ4_RAW compression: within noise at small pages, +12% at 1MB
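
The page-size range and the Speedup column quoted above come down to simple arithmetic, which can be sketched as follows. This is only a back-of-envelope illustration, not code from the commit: the class and method names are hypothetical, and it assumes fixed-width, plain-encoded values with no repetition or definition levels, and that the 78KB and 234KB endpoints correspond to 4-byte and 12-byte values at the 20,000-row limit.

```java
/** Back-of-envelope arithmetic for the figures in the commit message (hypothetical helper). */
final class PageSizeEstimate {
    // Row-count limit per page cited in the commit message (PARQUET-1414).
    static final int ROW_LIMIT = 20_000;

    /** Raw bytes of a page holding `rows` fixed-width values (assumes plain encoding, no levels). */
    static long pageBytes(int rows, int bytesPerValue) {
        return (long) rows * bytesPerValue;
    }

    /** Converts bytes to whole KiB, rounding down. */
    static long toKiB(long bytes) {
        return bytes / 1024;
    }

    /** Relative change of branch throughput over master throughput, in percent. */
    static double speedupPercent(double masterOps, double branchOps) {
        return (branchOps / masterOps - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Assumed value widths: 4 bytes (int/float), 8 bytes (long/double), 12 bytes.
        for (int width : new int[] {4, 8, 12}) {
            System.out.println(width + "-byte values: "
                    + toKiB(pageBytes(ROW_LIMIT, width)) + " KiB per page");
        }
        // SNAPPY compression at 64KB, master vs branch, taken from the table.
        System.out.println("SNAPPY 64KB compression speedup (%): "
                + speedupPercent(53_979, 60_799));
    }
}
```

Under these assumptions the 4-byte and 12-byte cases land on 78 KiB and 234 KiB, matching the range in the message, which is why an 8KB test point falls well below any page such a column would produce.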
1 parent 23881dd commit f9d81a4

2 files changed

Lines changed: 12 additions & 3 deletions

parquet-benchmarks/pom.xml

Lines changed: 9 additions & 0 deletions
@@ -89,6 +89,15 @@
       <plugin>
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-compiler-plugin</artifactId>
+        <configuration>
+          <annotationProcessorPaths>
+            <path>
+              <groupId>org.openjdk.jmh</groupId>
+              <artifactId>jmh-generator-annprocess</artifactId>
+              <version>${jmh.version}</version>
+            </path>
+          </annotationProcessorPaths>
+        </configuration>
       </plugin>
       <plugin>
         <groupId>org.apache.maven.plugins</groupId>

parquet-benchmarks/src/main/java/org/apache/parquet/benchmarks/CompressionBenchmark.java

Lines changed: 3 additions & 3 deletions
@@ -54,15 +54,15 @@
 @BenchmarkMode(Mode.Throughput)
 @OutputTimeUnit(TimeUnit.SECONDS)
 @Fork(1)
-@Warmup(iterations = 3, time = 2)
-@Measurement(iterations = 5, time = 3)
+@Warmup(iterations = 2, time = 1)
+@Measurement(iterations = 3, time = 2)
 @State(Scope.Thread)
 public class CompressionBenchmark {
 
   @Param({"SNAPPY", "ZSTD", "LZ4_RAW", "GZIP"})
   public String codec;
 
-  @Param({"8192", "65536", "262144"})
+  @Param({"65536", "131072", "262144", "1048576"})
   public int pageSize;
 
   private byte[] uncompressedData;
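
To make concrete what one benchmark operation at a given `pageSize` amounts to, here is a minimal round-trip sketch over the four new page sizes. It is not the benchmark's actual code: it swaps in the JDK's `java.util.zip.Deflater`/`Inflater` as a stand-in for the Parquet codecs, and the class name, the data generator, and its seed are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Round-trips one page per size through DEFLATE (stand-in codec, not a Parquet codec). */
final class PageRoundTrip {
    // Page sizes from the updated @Param list: 64KB, 128KB, 256KB, 1MB.
    static final int[] PAGE_SIZES = {65_536, 131_072, 262_144, 1_048_576};

    /** Builds a page of moderately compressible bytes (hypothetical data shape). */
    static byte[] makePage(int size, long seed) {
        Random random = new Random(seed);
        byte[] page = new byte[size];
        for (int i = 0; i < size; i++) {
            page[i] = (byte) random.nextInt(16); // small alphabet so the page compresses
        }
        return page;
    }

    /** Compresses a whole page in one operation. */
    static byte[] compress(byte[] page) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(page);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(page.length / 2);
            byte[] chunk = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib state
        }
    }

    /** Decompresses into a buffer of the known original page size. */
    static byte[] decompress(byte[] compressed, int originalSize) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] page = new byte[originalSize];
            int offset = 0;
            while (offset < originalSize) {
                int n = inflater.inflate(page, offset, originalSize - offset);
                if (n == 0) {
                    break; // no progress: input exhausted or stream ended early
                }
                offset += n;
            }
            if (offset != originalSize) {
                throw new IllegalStateException("short page: " + offset + " of " + originalSize);
            }
            return page;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed page", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        for (int size : PAGE_SIZES) {
            byte[] page = makePage(size, 42L);
            byte[] packed = compress(page);
            boolean ok = Arrays.equals(page, decompress(packed, size));
            System.out.println(size + " bytes -> " + packed.length + " bytes, round trip ok: " + ok);
        }
    }
}
```

A loop like this is only useful for checking correctness at each size. For throughput numbers comparable to the tables in the commit message, the measurement still needs a harness such as JMH, which handles warmup and forking.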
