Commit 7049836
GH-3516: Optimize DeltaByteArrayWriter / DeltaLengthByteArrayValuesWriter

Two related changes in the DELTA_BYTE_ARRAY write path:

1. DeltaLengthByteArrayValuesWriter: drop the unused LittleEndianDataOutputStream
   wrapper. Binary.writeTo(arrayOut) works directly with the underlying
   CapacityByteArrayOutputStream; the LE wrapper added an extra layer of
   dispatch on every value but never used any LE functionality
   (writeInt/writeLong/etc.). Add a new writeBytes(byte[], int, int) overload
   so callers that already have the raw bytes can avoid allocating a Binary
   wrapper.

2. DeltaByteArrayWriter: tighten suffixWriter field type to
   DeltaLengthByteArrayValuesWriter (it's always constructed as one) so the
   new writeBytes(byte[], int, int) overload is callable. Replace the suffix
   call with the raw-bytes overload, eliminating the per-value Binary.slice()
   allocation.
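
The two changes above can be illustrated with a minimal, self-contained sketch. It is not the Parquet code: SuffixWriter and PrefixSuffixWriter are hypothetical stand-ins for DeltaLengthByteArrayValuesWriter and DeltaByteArrayWriter, a plain ByteArrayOutputStream stands in for CapacityByteArrayOutputStream, and lengths are kept in simple lists rather than being delta-encoded. It only shows the shape of the idea: the outer writer holds the concrete suffix-writer type and hands it an (array, offset, length) range instead of allocating a sliced wrapper per value.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DeltaByteArraySketch {

    /** Hypothetical stand-in for DeltaLengthByteArrayValuesWriter. */
    static final class SuffixWriter {
        private final List<Integer> lengths = new ArrayList<>();
        // Written to directly; no intermediate stream wrapper in between.
        private final ByteArrayOutputStream arrayOut = new ByteArrayOutputStream();

        /** Raw-bytes overload: record the length, copy the range, allocate no wrapper. */
        void writeBytes(byte[] bytes, int offset, int length) {
            lengths.add(length);
            arrayOut.write(bytes, offset, length);
        }
    }

    /** Hypothetical stand-in for DeltaByteArrayWriter. */
    static final class PrefixSuffixWriter {
        private final List<Integer> prefixLengths = new ArrayList<>();
        // Declared as the concrete type so the raw-bytes overload is callable.
        private final SuffixWriter suffixWriter = new SuffixWriter();
        private byte[] previous = new byte[0];

        void writeBytes(byte[] value) {
            // Length of the prefix shared with the previously written value.
            int max = Math.min(previous.length, value.length);
            int i = 0;
            while (i < max && previous[i] == value[i]) {
                i++;
            }
            prefixLengths.add(i);
            // Pass the suffix as a range of the existing array instead of a slice object.
            suffixWriter.writeBytes(value, i, value.length - i);
            // The sketch keeps a reference; callers must not mutate the array afterwards.
            previous = value;
        }

        List<Integer> prefixLengths() {
            return prefixLengths;
        }

        List<Integer> suffixLengths() {
            return suffixWriter.lengths;
        }

        byte[] suffixData() {
            return suffixWriter.arrayOut.toByteArray();
        }
    }

    public static void main(String[] args) {
        PrefixSuffixWriter writer = new PrefixSuffixWriter();
        for (String s : new String[] {"apple", "apply", "applying"}) {
            writer.writeBytes(s.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("prefix lengths: " + writer.prefixLengths());
        System.out.println("suffix lengths: " + writer.suffixLengths());
        System.out.println("suffix bytes:   "
                + new String(writer.suffixData(), StandardCharsets.UTF_8));
    }
}
```

In this sketch the only per-value work in the suffix path is one length append and one array-range copy, which is the kind of saving the commit describes for short values.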

Benchmark results (BinaryEncodingBenchmark.encodeDeltaByteArray and
encodeDeltaLengthByteArray, added in #3512):
- encodeDeltaByteArray (LOW cardinality, len=10): +33% to +55%
- encodeDeltaLengthByteArray (LOW cardinality, len=10): +18% to +21%
- long-string cases: flat (per-value allocation amortized away)
No public API change. No file format change.
Validation: parquet-column 573 tests pass. Built with
-Dspotless.check.skip=true -Drat.skip=true -Djapicmp.skip=true.

1 parent 53d7842 commit 7049836
2 files changed
Lines changed: 15 additions & 11 deletions
File tree
- parquet-column/src/main/java/org/apache/parquet/column/values
- deltalengthbytearray
- deltastrings
Lines changed: 11 additions & 9 deletions
[Diff content not captured. Hunks: original line 25 removed; original lines 49 and 53 removed; original line 66 replaced (new line 63), new lines 69-78 added, original lines 79-83 removed.]
Lines changed: 4 additions & 2 deletions
[Diff content not captured. Hunks: original line 40 replaced (new line 40); original line 98 replaced by new lines 98-100.]