Commit 9c411b0
apacheGH-3516: Optimize DeltaByteArrayWriter / DeltaLengthByteArrayValuesWriter
Two related changes in the DELTA_BYTE_ARRAY write path:
1. DeltaLengthByteArrayValuesWriter: drop the unused LittleEndianDataOutputStream
   wrapper. Binary.writeTo(arrayOut) works directly with the underlying
   CapacityByteArrayOutputStream; the LE wrapper added an extra layer of
   dispatch on every value but never used any LE functionality
   (writeInt/writeLong/etc.). Add a new writeBytes(byte[], int, int) overload
   so callers that already have the raw bytes can avoid allocating a Binary
   wrapper.
2. DeltaByteArrayWriter: tighten suffixWriter field type to
   DeltaLengthByteArrayValuesWriter (it's always constructed as one) so the
   new writeBytes(byte[], int, int) overload is callable. Replace the suffix
   call with the raw-bytes overload, eliminating the per-value Binary.slice()
   allocation.
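As a rough illustration of the two changes above, the following self-contained sketch mimics the shape of the write path without depending on Parquet. Every name in it (LengthAndBytesSketch, PrefixSuffixSketch, and their methods) is hypothetical, and java.io.ByteArrayOutputStream stands in for CapacityByteArrayOutputStream; it is not the code from this commit.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a delta-length values writer (not the Parquet class). */
final class LengthAndBytesSketch {
    private final List<Integer> lengths = new ArrayList<>();
    // Assumption: a plain ByteArrayOutputStream plays the role of the underlying
    // byte sink. Bytes are written to it directly, with no wrapper stream in between.
    private final ByteArrayOutputStream arrayOut = new ByteArrayOutputStream();

    /** Raw-bytes overload: records the length and copies the range as-is. */
    void writeBytes(byte[] data, int offset, int length) {
        lengths.add(length);
        arrayOut.write(data, offset, length);
    }

    List<Integer> lengths() { return lengths; }
    byte[] bytes() { return arrayOut.toByteArray(); }
}

/** Hypothetical stand-in for the prefix-plus-suffix (delta byte array) writer. */
final class PrefixSuffixSketch {
    private final List<Integer> prefixLengths = new ArrayList<>();
    // The field is typed as the concrete writer so the raw-bytes overload is callable.
    private final LengthAndBytesSketch suffixWriter = new LengthAndBytesSketch();
    // Reference to the previous value; this sketch assumes callers do not mutate it.
    private byte[] previous = new byte[0];

    void write(byte[] value) {
        int max = Math.min(previous.length, value.length);
        int prefix = 0;
        while (prefix < max && previous[prefix] == value[prefix]) {
            prefix++;
        }
        prefixLengths.add(prefix);
        // Pass the suffix as (array, offset, length) instead of allocating a slice object.
        suffixWriter.writeBytes(value, prefix, value.length - prefix);
        previous = value;
    }

    List<Integer> prefixLengths() { return prefixLengths; }
    List<Integer> suffixLengths() { return suffixWriter.lengths(); }
    byte[] suffixBytes() { return suffixWriter.bytes(); }

    public static void main(String[] args) {
        PrefixSuffixSketch w = new PrefixSuffixSketch();
        for (String s : new String[] {"apple", "applet", "apply", "banana"}) {
            w.write(s.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(w.prefixLengths()); // [0, 5, 4, 0]
        System.out.println(w.suffixLengths()); // [5, 1, 1, 6]
        System.out.println(new String(w.suffixBytes(), StandardCharsets.UTF_8)); // appletybanana
    }
}
```

The point of the sketch is only the call shape: the suffix is handed over as an offset and length into the existing array, so no per-value wrapper object is needed on that path.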
Benchmark (BinaryEncodingBenchmark, 100k BINARY values per invocation,
JMH -wi 3 -i 5 -f 1):
Benchmark                   Param     Before (ops/s)  After (ops/s)  Improvement
encodeDeltaByteArray        LOW/10    61,475,818      81,416,754     +32% (1.32x)
encodeDeltaByteArray        LOW/100   34,759,755      45,186,617     +30% (1.30x)
encodeDeltaByteArray        LOW/1000  5,386,922       6,532,850      +21% (1.21x)
encodeDeltaByteArray        HIGH/10   56,799,595      78,966,929     +39% (1.39x)
encodeDeltaLengthByteArray  LOW/10    129,447,876     136,657,079    +6%
encodeDeltaLengthByteArray  HIGH/10   123,673,058     116,778,775    flat (noise)
Negative controls (encodePlain, encodeDictionary): unchanged within noise.
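The Improvement column can be rechecked from the raw throughput figures. The sketch below does that for two rows of the table; the class and method names are made up for illustration. For the encodeDeltaLengthByteArray HIGH/10 row it gives roughly -5.6%, which the commit message treats as noise.

```java
import java.util.Locale;

/** Hypothetical helper for rechecking the Improvement column. */
final class ImprovementSketch {
    /** Percentage change in throughput from before to after. */
    static double percentChange(long before, long after) {
        return (after - before) * 100.0 / before;
    }

    public static void main(String[] args) {
        long[][] rows = {
            {61_475_818L, 81_416_754L},   // encodeDeltaByteArray LOW/10
            {123_673_058L, 116_778_775L}, // encodeDeltaLengthByteArray HIGH/10
        };
        for (long[] r : rows) {
            System.out.printf(Locale.ROOT, "%+.1f%% (%.2fx)%n",
                    percentChange(r[0], r[1]), (double) r[1] / r[0]);
        }
        // prints:
        // +32.4% (1.32x)
        // -5.6% (0.94x)
    }
}
```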
The DeltaByteArray path benefits most because it eliminates both the
Binary.slice() allocation per suffix and the OutputStream dispatch layer.
DeltaLengthByteArray gains are smaller since only the OutputStream wrapper
removal applies there.
No public API change. No file format change.
All 573 parquet-column tests pass.
1 parent 492b686 commit 9c411b0
2 files changed
Lines changed: 15 additions & 11 deletions
File tree
- parquet-column/src/main/java/org/apache/parquet/column/values
- deltalengthbytearray
- deltastrings
Lines changed: 11 additions & 9 deletions
(Diff body not captured. The changed hunks cover original lines 22-28, 46-56, and 63-86.)
Lines changed: 4 additions & 2 deletions
(Diff body not captured. The changed hunks cover original lines 38-44 and 98-104.)