Commit af02aef

GH-3522: Use unpack32Values fast path in RLE hybrid decoder PACKED branch
Symmetric to the encoder's pack32Values fast path, the decoder's PACKED branch now batches 4 groups (32 values) into a single unpack32Values call instead of looping unpack8Values four times. Falls back to unpack8Values for residual <4-group tails.

This benefits long PACKED runs (>=32 values) by reducing loop overhead and enabling the packer's optimized 32-value code path.

Combined benchmark for the full par9 branch (all 4 commits: buffer reuse + ByteBuffer conversion + pack32Values encoder + unpack32Values decoder):

RleDictionaryIndexDecodingBenchmark (100k dictionary IDs, JMH -wi 3 -i 5 -f 1):

    Pattern           Before (ops/s)   After (ops/s)   Improvement
    SEQUENTIAL        603,445,362      698,066,810     +16% (1.16x)
    RANDOM            613,691,096      681,685,407     +11% (1.11x)
    LOW_CARDINALITY   611,963,736      686,200,341     +12% (1.12x)

IntEncodingBenchmark.decodeDictionary (100k INT32 values, full dictionary decode path including RLE index decode):

    Pattern            Before (ops/s)   After (ops/s)   Improvement
    SEQUENTIAL         418,357,276      539,458,940     +29% (1.29x)
    RANDOM             417,041,197      527,231,831     +26% (1.26x)
    LOW_CARDINALITY    605,354,083      628,283,691     +4%
    HIGH_CARDINALITY   416,731,808      535,763,242     +29% (1.29x)

All 573 parquet-column tests pass.
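The split described above can be sketched in isolation. This is only an illustration of the index arithmetic, not code from the commit: the class `PackedRunPlan` and its `plan` method are hypothetical names, and each planned call is recorded as `{values, byteIndex, valueIndex}` rather than actually invoking a packer. It relies on the fact that 8 values of `bitWidth` bits occupy exactly `bitWidth` bytes, so 32 values occupy `4 * bitWidth` bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: it records which unpack calls
// a PACKED run would be split into, without touching any real packer.
final class PackedRunPlan {

    /**
     * Returns one int[]{values, byteIndex, valueIndex} per planned call:
     * 32-value calls while at least 4 groups remain, then 8-value calls
     * for the residual tail of fewer than 4 groups.
     */
    static List<int[]> plan(int numGroups, int bitWidth) {
        List<int[]> calls = new ArrayList<>();
        int groupIdx = 0;
        int byteIndex = 0;
        final int step32 = bitWidth * 4; // 32 values * bitWidth bits / 8 bits per byte
        while (groupIdx + 4 <= numGroups) {
            calls.add(new int[] {32, byteIndex, groupIdx * 8});
            groupIdx += 4;
            byteIndex += step32;
        }
        for (; groupIdx < numGroups; groupIdx++, byteIndex += bitWidth) {
            calls.add(new int[] {8, byteIndex, groupIdx * 8});
        }
        return calls;
    }

    public static void main(String[] args) {
        // 11 groups (88 values) at bitWidth 3: two 32-value calls plus three 8-value calls.
        for (int[] c : plan(11, 3)) {
            System.out.println(c[0] + " values, byteIndex=" + c[1] + ", valueIndex=" + c[2]);
        }
    }
}
```

For 11 groups this plans 5 calls where the previous loop made 11; runs shorter than 4 groups take only the 8-value tail loop.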
1 parent dc23dbc commit af02aef

1 file changed

Lines changed: 12 additions & 4 deletions

parquet-column/src/main/java/org/apache/parquet/column/values/rle/RunLengthBitPackingHybridDecoder.java

@@ -114,10 +114,18 @@ private void readNext() {
         int bytesToRead = (int) Math.ceil(currentCount * bitWidth / 8.0);
         bytesToRead = Math.min(bytesToRead, buffer.remaining());
         buffer.get(packedBytesBuffer, 0, bytesToRead);
-        for (int valueIndex = 0, byteIndex = 0;
-            valueIndex < currentCount;
-            valueIndex += 8, byteIndex += bitWidth) {
-          packer.unpack8Values(packedBytesBuffer, byteIndex, currentBuffer, valueIndex);
+        // Unpack 32 values (4 groups) at a time when possible — symmetric to the encoder's
+        // pack32Values fast path. Falls back to unpack8Values for any residual groups.
+        int groupIdx = 0;
+        int byteIndex = 0;
+        final int step32 = bitWidth * 4;
+        while (groupIdx + 4 <= numGroups) {
+          packer.unpack32Values(packedBytesBuffer, byteIndex, currentBuffer, groupIdx * 8);
+          groupIdx += 4;
+          byteIndex += step32;
+        }
+        for (; groupIdx < numGroups; groupIdx++, byteIndex += bitWidth) {
+          packer.unpack8Values(packedBytesBuffer, byteIndex, currentBuffer, groupIdx * 8);
         }
         break;
       default:
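
As a sanity check on the change, the following self-contained sketch compares the old per-group loop with the batched loop and confirms they decode identical values. It does not use the real packer: `BatchedUnpackDemo`, `pack`, `unpack`, `decodeByGroups`, and `decodeBatched` are hypothetical names, and the toy bit layout (least significant bit first) is an assumption made for the demo. The equivalence itself does not depend on bit order, only on every 8-value group starting on a byte boundary.

```java
import java.util.Arrays;

// Toy model, for illustration only; not the packer used by the decoder.
final class BatchedUnpackDemo {

    /** Packs values (length a multiple of 8) at bitWidth bits each, LSB first (assumed layout). */
    static byte[] pack(int[] values, int bitWidth) {
        byte[] out = new byte[values.length * bitWidth / 8];
        long bitPos = 0;
        for (int v : values) {
            for (int b = 0; b < bitWidth; b++, bitPos++) {
                if (((v >>> b) & 1) != 0) {
                    out[(int) (bitPos >>> 3)] |= (byte) (1 << (int) (bitPos & 7));
                }
            }
        }
        return out;
    }

    /** Stand-in for unpack8Values / unpack32Values: reads count values starting at byte inPos. */
    static void unpack(byte[] in, int inPos, int[] out, int outPos, int count, int bitWidth) {
        long bitPos = (long) inPos * 8;
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int b = 0; b < bitWidth; b++, bitPos++) {
                int bit = (in[(int) (bitPos >>> 3)] >>> (int) (bitPos & 7)) & 1;
                v |= bit << b;
            }
            out[outPos + i] = v;
        }
    }

    /** Shape of the loop before the change: one 8-value call per group. */
    static int[] decodeByGroups(byte[] packed, int numGroups, int bitWidth) {
        int[] out = new int[numGroups * 8];
        for (int valueIndex = 0, byteIndex = 0; valueIndex < out.length;
                valueIndex += 8, byteIndex += bitWidth) {
            unpack(packed, byteIndex, out, valueIndex, 8, bitWidth);
        }
        return out;
    }

    /** Shape of the loop after the change: 32-value calls, then an 8-value tail. */
    static int[] decodeBatched(byte[] packed, int numGroups, int bitWidth) {
        int[] out = new int[numGroups * 8];
        int groupIdx = 0;
        int byteIndex = 0;
        final int step32 = bitWidth * 4;
        while (groupIdx + 4 <= numGroups) {
            unpack(packed, byteIndex, out, groupIdx * 8, 32, bitWidth);
            groupIdx += 4;
            byteIndex += step32;
        }
        for (; groupIdx < numGroups; groupIdx++, byteIndex += bitWidth) {
            unpack(packed, byteIndex, out, groupIdx * 8, 8, bitWidth);
        }
        return out;
    }

    public static void main(String[] args) {
        int numGroups = 11, bitWidth = 5;
        int[] values = new int[numGroups * 8];
        for (int i = 0; i < values.length; i++) {
            values[i] = (i * 7) % 32; // every value fits in 5 bits
        }
        byte[] packed = pack(values, bitWidth);
        int[] a = decodeByGroups(packed, numGroups, bitWidth);
        int[] b = decodeBatched(packed, numGroups, bitWidth);
        System.out.println(Arrays.equals(a, b) && Arrays.equals(values, b));
    }
}
```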