Commit ba5d027
GH-3522: Reuse intermediate buffers in RunLengthBitPackingHybridDecoder PACKED path
Allocate the int[] values buffer and byte[] read-staging buffer once per
decoder and grow them lazily, instead of allocating fresh arrays on every
PACKED run. Resolves the existing "TODO: reuse a buffer" comment.
A new currentBufferLength field tracks the logical length of the active
region in packedValuesBuffer (which may now exceed the current run's
size after a prior larger run grew it).
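
The reuse pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual RunLengthBitPackingHybridDecoder code: the class `ReusableBufferSketch`, the method `loadRun`, and the `allocations` counter are hypothetical names introduced here, and the fill loop is a stand-in for real bit unpacking. Only `packedValuesBuffer` and `currentBufferLength` are names taken from the commit message.

```java
/** Hypothetical sketch of a lazily grown buffer that is reused across runs. */
public class ReusableBufferSketch {
    // Allocated once per decoder and reused; replaced only when a run needs more room.
    private int[] packedValuesBuffer = new int[0];
    // Logical length of the active region; can be smaller than packedValuesBuffer.length
    // after an earlier, larger run grew the array.
    private int currentBufferLength = 0;
    // Illustration only: counts allocations so the reuse is observable.
    private int allocations = 0;

    /** Prepares the buffer for a run of runSize values and fills it with stand-in data. */
    public void loadRun(int runSize) {
        if (packedValuesBuffer.length < runSize) {
            packedValuesBuffer = new int[runSize]; // grow lazily, never shrink
            allocations++;
        }
        currentBufferLength = runSize;
        for (int i = 0; i < runSize; i++) {
            packedValuesBuffer[i] = i; // stand-in for unpacked values
        }
    }

    public int activeLength() { return currentBufferLength; }
    public int capacity() { return packedValuesBuffer.length; }
    public int allocations() { return allocations; }

    public static void main(String[] args) {
        ReusableBufferSketch decoder = new ReusableBufferSketch();
        decoder.loadRun(64); // first run allocates
        decoder.loadRun(16); // smaller run reuses the same array
        decoder.loadRun(32); // still fits, no new allocation
        // prints "1 64 32"
        System.out.println(decoder.allocations() + " " + decoder.capacity() + " " + decoder.activeLength());
    }
}
```

Because the array can outlive the run that sized it, readers of the buffer must bound their iteration by the logical length rather than by the array length, which is the role the commit assigns to currentBufferLength.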
Benchmark (RleDictionaryIndexDecodingBenchmark, 100k INT32, BIT_WIDTH=10,
JMH -wi 5 -i 10 -f 2):
| Pattern | master ops/s | optimized ops/s | Improvement |
|---|---|---|---|
| SEQUENTIAL | 93,061,521 | 113,856,860 | +22.3% |
| RANDOM | 92,929,824 | 114,238,638 | +22.9% |
| LOW_CARDINALITY | 92,813,229 | 115,271,347 | +24.2% |
End-to-end FileReadBenchmark sees ~2% improvement (RLE decoding is a
small fraction of full file reads).
Validation: 573 parquet-column tests pass. Built with
-Dspotless.check.skip=true -Drat.skip=true -Djapicmp.skip=true.
1 parent d96c669 commit ba5d027
1 file changed
Lines changed: 17 additions & 5 deletions
File tree
- parquet-column/src/main/java/org/apache/parquet/column/values/rle
Diff (line contents not preserved; hunk ranges recovered from the line-number columns):
@@ -48,6 +48,11 @@ 5 additions
@@ -69,7 +74,7 @@ 1 addition, 1 deletion
@@ -90,17 +95,24 @@ 11 additions, 4 deletions