Commit 88a3b0e
GH-3505: Optimize ByteStreamSplitValuesReader page transposition
ByteStreamSplitValuesReader.decodeData eagerly transposes an entire page
from stream-split layout (elementSizeInBytes streams of valuesCount bytes
each) back to interleaved layout (valuesCount elements of elementSizeInBytes
bytes each). The current loop performs one ByteBuffer.get(int) per byte,
which incurs per-call bounds checks and virtual dispatch through
HeapByteBuffer/DirectByteBuffer for every single byte of the page. For a
100k-value FLOAT page that is 400k get(int) calls; for DOUBLE/LONG it is
800k.
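The layout change is easier to see in code. Below is a minimal sketch of a per-byte transposition of the kind described above. It is not the original decodeData. The class and method names (NaiveByteStreamSplit, decodeNaive, encode) are hypothetical, and the sketch assumes the encoded page starts at the buffer's position() and holds exactly elementSize * valuesCount bytes. Byte k of value i sits at offset k * valuesCount + i in the split layout and at i * elementSize + k in the interleaved layout, so a page costs valuesCount * elementSize get(int) calls.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a per-byte BYTE_STREAM_SPLIT transposition; not the parquet-column code.
final class NaiveByteStreamSplit {

  /**
   * Transposes stream-split bytes back to interleaved order with one ByteBuffer.get(int)
   * per byte. Assumes the page starts at encoded.position() and holds
   * elementSize * valuesCount bytes.
   */
  static byte[] decodeNaive(ByteBuffer encoded, int elementSize, int valuesCount) {
    byte[] decoded = new byte[elementSize * valuesCount];
    int base = encoded.position();
    for (int stream = 0; stream < elementSize; ++stream) {
      for (int i = 0; i < valuesCount; ++i) {
        // One bounds-checked get(int) call for every byte of the page.
        decoded[i * elementSize + stream] = encoded.get(base + stream * valuesCount + i);
      }
    }
    return decoded;
  }

  /** Inverse transform, used only to build input: byte k of value i goes to stream k. */
  static byte[] encode(byte[] interleaved, int elementSize) {
    int valuesCount = interleaved.length / elementSize;
    byte[] split = new byte[interleaved.length];
    for (int i = 0; i < valuesCount; ++i) {
      for (int k = 0; k < elementSize; ++k) {
        split[k * valuesCount + i] = interleaved[i * elementSize + k];
      }
    }
    return split;
  }
}
```

For example, the two 4-byte values {1,2,3,4} and {5,6,7,8} encode to {1,5,2,6,3,7,4,8}, and decodeNaive(ByteBuffer.wrap(split), 4, 2) restores the original order.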
This change rewrites decodeData in three parts:
1. Drop down to a byte[] view of the encoded buffer. When encoded.hasArray()
is true (the typical case) use the backing array directly with the
correct base offset; otherwise copy once with a single get(byte[]) call.
This eliminates the per-byte ByteBuffer.get(int) bounds check and
virtual dispatch.
2. Specialize loops for the common element sizes (4 and 8). Hoist all
stream * valuesCount offsets out of the inner loop into local ints
(s0..s3 for floats/ints, s0..s7 for doubles/longs), and write each
output slot exactly once in a single sequential pass. Reads come from
elementSizeInBytes concurrent sequential streams, which modern hardware
prefetchers handle well.
3. Generic fallback for arbitrary element sizes (FIXED_LEN_BYTE_ARRAY of
any width) keeps the existing behaviour.
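A minimal sketch of the three parts, under stated assumptions. The class and method names (FastByteStreamSplit, decode) are hypothetical, and this is not the committed code. The base offset arrayOffset() + position() is an assumption about where the page starts. This sketch also runs the generic fallback over the same byte[] view, although the message above says only that the fallback keeps the existing behaviour. Note that hasArray() is false for both direct and read-only buffers, so both take the single bulk-copy path.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of an array-based BYTE_STREAM_SPLIT decode; not the committed code.
final class FastByteStreamSplit {

  /**
   * Transposes elementSize streams of valuesCount bytes into valuesCount interleaved
   * elements. Assumes the page starts at encoded.position(); the caller's position is untouched.
   */
  static byte[] decode(ByteBuffer encoded, int elementSize, int valuesCount) {
    int total = elementSize * valuesCount;

    // Part 1: obtain a byte[] view so the loops index an array instead of calling get(int).
    byte[] src;
    int base;
    if (encoded.hasArray()) {
      // Accessible heap array: use it in place (base offset is an assumption of this sketch).
      src = encoded.array();
      base = encoded.arrayOffset() + encoded.position();
    } else {
      // Direct or read-only buffer: one bulk get(byte[]) on a duplicate.
      src = new byte[total];
      encoded.duplicate().get(src);
      base = 0;
    }

    byte[] out = new byte[total];
    if (elementSize == 4) {
      // Part 2 (FLOAT/INT32): stream offsets hoisted into locals; one sequential output pass.
      int s0 = base, s1 = s0 + valuesCount, s2 = s1 + valuesCount, s3 = s2 + valuesCount;
      for (int i = 0, o = 0; i < valuesCount; ++i, o += 4) {
        out[o] = src[s0 + i];
        out[o + 1] = src[s1 + i];
        out[o + 2] = src[s2 + i];
        out[o + 3] = src[s3 + i];
      }
    } else if (elementSize == 8) {
      // Part 2 (DOUBLE/INT64): the same idea with eight streams.
      int s0 = base, s1 = s0 + valuesCount, s2 = s1 + valuesCount, s3 = s2 + valuesCount;
      int s4 = s3 + valuesCount, s5 = s4 + valuesCount, s6 = s5 + valuesCount, s7 = s6 + valuesCount;
      for (int i = 0, o = 0; i < valuesCount; ++i, o += 8) {
        out[o] = src[s0 + i];
        out[o + 1] = src[s1 + i];
        out[o + 2] = src[s2 + i];
        out[o + 3] = src[s3 + i];
        out[o + 4] = src[s4 + i];
        out[o + 5] = src[s5 + i];
        out[o + 6] = src[s6 + i];
        out[o + 7] = src[s7 + i];
      }
    } else {
      // Part 3: generic fallback for any width, e.g. FIXED_LEN_BYTE_ARRAY.
      for (int stream = 0; stream < elementSize; ++stream) {
        int s = base + stream * valuesCount;
        for (int i = 0; i < valuesCount; ++i) {
          out[i * elementSize + stream] = src[s + i];
        }
      }
    }
    return out;
  }
}
```

In the specialized branches each output byte is written exactly once, in order. The reads advance through four or eight independent sequential streams, which is the access pattern the message credits to hardware prefetchers.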
Benchmark (new ByteStreamSplitDecodingBenchmark, 100k values per
invocation, JDK 18, JMH -wi 5 -i 10 -f 3, 30 samples per row):
| Type | Before (ops/s) | After (ops/s) | Improvement |
|---|---:|---:|---|
| Float | 47,798,981 | 162,294,904 | +240% (3.40x) |
| Double | 26,320,043 | 66,002,524 | +151% (2.51x) |
| Int | 47,072,832 | 162,177,747 | +245% (3.45x) |
| Long | 26,795,544 | 65,999,343 | +146% (2.46x) |
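The Improvement column follows from the raw throughputs: speedup = after / before, and improvement = (speedup - 1) * 100%. A small sketch that recomputes it from the numbers in the table (the class and method names are hypothetical):

```java
// Hypothetical helper that recomputes the speedup and improvement columns.
final class SpeedupCheck {

  /** Ratio of new to old throughput, e.g. 3.40 for "3.40x". */
  static double speedup(double before, double after) {
    return after / before;
  }

  /** Percentage gain over the old throughput, rounded to a whole percent. */
  static long improvementPercent(double before, double after) {
    return Math.round((after / before - 1.0) * 100.0);
  }

  /** Formats one row as "Type  N.NNx  +P%". */
  static String row(String type, double before, double after) {
    return String.format("%-6s %.2fx  +%d%%", type, speedup(before, after), improvementPercent(before, after));
  }
}
```

For instance, row("Float", 47_798_981, 162_294_904) reproduces the 3.40x and +240% figures in the Float row.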
Decoded output is byte-identical to before; per-op heap allocation is
unchanged (the only allocation is the per-page decode buffer plus the
boxing of returned primitives by the benchmark).
All 573 parquet-column tests pass; 51 BSS-specific tests pass.

1 parent d96c669, commit 88a3b0e
1 file changed
Lines changed: 78 additions & 8 deletions
- parquet-column/src/main/java/org/apache/parquet/column/values/bytestreamsplit
Diff @@ -49,17 +49,87 @@: original lines 52, 54-59 and 62 removed; new lines 52-67 and 69-130 added (line contents not captured).