Commit bd9a009
perf(read): read compact Arrow decimals directly (#573)
## Summary
Decimal reads currently fall through Spark's Arrow object path, which
materializes a `BigDecimal` before building the Catalyst `Decimal`. That
is unnecessary for compact Spark decimals: when `precision <= 18`, the
unscaled value is guaranteed to fit in a `long`.
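The `precision <= 18` bound comes from simple arithmetic. At precision p, the largest unscaled magnitude is 10^p - 1, and that value must not exceed `Long.MAX_VALUE` (2^63 - 1, about 9.22 × 10^18). The standalone program below checks this. The class and method names are illustrative and are not part of Lance or Spark. It confirms that 18 digits fit and 19 do not, which is the same cutoff as Spark's `Decimal.MAX_LONG_DIGITS`.

```java
import java.math.BigInteger;

// Illustrative check (not Lance/Spark code): which precisions always fit in a long?
public class CompactDecimalBound {
    // Largest unscaled magnitude a decimal of the given precision can hold: 10^p - 1.
    public static BigInteger maxUnscaled(int precision) {
        return BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
    }

    // The unscaled range is symmetric, +/-(10^p - 1). Long.MIN_VALUE has one more unit
    // of magnitude than Long.MAX_VALUE, so checking the positive bound is sufficient.
    public static boolean fitsInLong(int precision) {
        return maxUnscaled(precision).compareTo(BigInteger.valueOf(Long.MAX_VALUE)) <= 0;
    }

    public static void main(String[] args) {
        System.out.println("Long.MAX_VALUE = " + Long.MAX_VALUE);
        for (int p = 17; p <= 19; p++) {
            System.out.println("p=" + p + " max=" + maxUnscaled(p) + " fits=" + fitsInLong(p));
        }
    }
}
```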
This adds a `DecimalVector` accessor in `LanceArrowColumnVector` that
keeps Spark's existing wide-decimal behavior, but reads compact decimals
directly from the Arrow data buffer.
## How it works
Arrow decimal128 stores each value as a 16-byte two's-complement
unscaled integer. For `precision <= 18`, the high eight bytes are only
sign extension, so the low eight bytes contain the complete Spark
compact-decimal value.
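The sketch below illustrates the sign-extension argument without Arrow on the classpath. It writes unscaled values into 16-byte slots using `java.nio.ByteBuffer`, then compares a read of only the low 64 bits against a full 128-bit decode through `BigInteger`. It assumes little-endian storage, which is Arrow's usual byte order. `Decimal128Slot` and its methods are hypothetical helpers, not Arrow's `DecimalVector` API.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helpers modeling a little-endian decimal128 buffer (not Arrow's API).
public class Decimal128Slot {
    public static final int TYPE_WIDTH = 16; // bytes per decimal128 value

    // Write a long-sized unscaled value as a 16-byte two's-complement integer.
    public static void write(ByteBuffer buf, int index, long unscaled) {
        int offset = index * TYPE_WIDTH;
        buf.putLong(offset, unscaled);                    // low 64 bits
        buf.putLong(offset + 8, unscaled < 0 ? -1L : 0L); // high 64 bits: sign extension only
    }

    // Compact read: the low 8 bytes are the whole value when precision <= 18.
    public static long readLowLong(ByteBuffer buf, int index) {
        return buf.getLong(index * TYPE_WIDTH);
    }

    // Reference read: decode all 16 bytes as a signed 128-bit integer.
    public static BigInteger readFull(ByteBuffer buf, int index) {
        byte[] bigEndian = new byte[TYPE_WIDTH];
        int offset = index * TYPE_WIDTH;
        for (int i = 0; i < TYPE_WIDTH; i++) {
            // Reverse little-endian storage, since BigInteger expects big-endian bytes.
            bigEndian[TYPE_WIDTH - 1 - i] = buf.get(offset + i);
        }
        return new BigInteger(bigEndian);
    }

    public static void main(String[] args) {
        long[] values = {0L, 1L, -1L, 999_999_999_999_999_999L, -999_999_999_999_999_999L};
        ByteBuffer buf = ByteBuffer.allocate(values.length * TYPE_WIDTH)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < values.length; i++) {
            write(buf, i, values[i]);
        }
        for (int i = 0; i < values.length; i++) {
            System.out.println(readLowLong(buf, i) + " == " + readFull(buf, i));
        }
    }
}
```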
The new accessor:
- routes `DecimalVector` through Lance's own accessor instead of the
generic Spark `ArrowColumnVector` wrapper;
- returns null before touching the value buffer;
- reads the low 64 bits directly for compact decimals and constructs
`Decimal` with `Decimal.createUnsafe`;
- keeps the existing `Decimal.apply(vector.getObject(...))` path for
`precision > 18`;
- routes `hasNull`, `numNulls`, `isNullAt`, and `close` through the same
decimal accessor, so an owned decimal vector is released on close
exactly like every sibling accessor.
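The accessor's control flow can be modeled in plain Java. This is a hypothetical sketch, not Lance's implementation, and it uses three stand-ins:

- `BigDecimal` takes the place of Spark's `Decimal`.
- A `boolean[]` takes the place of Arrow's validity bitmap.
- The wide branch decodes all 16 bytes, where the real code calls `Decimal.apply(vector.getObject(...))`.

It shows the two ordering rules listed above: the null check runs before any value-buffer read, and only `precision <= 18` takes the 64-bit shortcut.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical model of the decimal accessor; names and types are stand-ins.
public class DecimalAccessorSketch {
    private static final int TYPE_WIDTH = 16;      // bytes per decimal128 slot
    private static final int MAX_LONG_DIGITS = 18; // compact-decimal cutoff

    private final ByteBuffer data; // stand-in for the Arrow data buffer
    private final boolean[] valid; // stand-in for the Arrow validity bitmap
    private final int precision;
    private final int scale;

    public DecimalAccessorSketch(ByteBuffer data, boolean[] valid, int precision, int scale) {
        // duplicate() leaves the caller's buffer untouched; assume little-endian storage.
        this.data = data.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        this.valid = valid;
        this.precision = precision;
        this.scale = scale;
    }

    public boolean isNullAt(int rowId) {
        return !valid[rowId];
    }

    public int numNulls() {
        int n = 0;
        for (boolean v : valid) {
            if (!v) n++;
        }
        return n;
    }

    public BigDecimal getDecimal(int rowId) {
        if (isNullAt(rowId)) {
            return null; // decided before the value buffer is read
        }
        if (precision <= MAX_LONG_DIGITS) {
            // Compact path: the low 64 bits hold the entire unscaled value.
            long unscaled = data.getLong(rowId * TYPE_WIDTH);
            return BigDecimal.valueOf(unscaled, scale);
        }
        return readWide(rowId);
    }

    // Wide path: decode the full 128-bit value (stand-in for the Arrow object path).
    private BigDecimal readWide(int rowId) {
        byte[] bigEndian = new byte[TYPE_WIDTH];
        int offset = rowId * TYPE_WIDTH;
        for (int i = 0; i < TYPE_WIDTH; i++) {
            bigEndian[TYPE_WIDTH - 1 - i] = data.get(offset + i);
        }
        return new BigDecimal(new BigInteger(bigEndian), scale);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(2 * TYPE_WIDTH).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(0, -12345L); // row 0, low 64 bits
        buf.putLong(8, -1L);     // row 0, high 64 bits: sign extension
        boolean[] valid = {true, false}; // row 1 is null
        DecimalAccessorSketch compact = new DecimalAccessorSketch(buf, valid, 18, 2);
        DecimalAccessorSketch wide = new DecimalAccessorSketch(buf, valid, 38, 2);
        System.out.println(compact.getDecimal(0) + " " + wide.getDecimal(0));
        System.out.println(compact.getDecimal(1) + " nulls=" + compact.numNulls());
    }
}
```

Both branches agree for values that fit in 64 bits. The compact branch simply skips the 128-bit decode and the intermediate `BigDecimal` construction that the object path performs.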
This follows Spark's compact-decimal representation used in
`WritableColumnVector` / `UnsafeRow`, while preserving Spark
`ArrowColumnVector`'s object-path semantics for wider decimals.
## Tests
Added direct `LanceArrowColumnVectorTest` coverage for:
- `precision == 18` fast path — positive values, negative values, nulls,
and null counts;
- `precision == 1` boundary, positive and negative;
- `precision > 18` fallback through the Arrow object path;
- owning-close lifecycle — `closeVectorOnClose=true`, then assert the
allocator is fully drained after `close()` (guards against the decimal
vector leaking).
## Test plan
- [x] CI passes across supported Spark / Scala modules
- [x] Unit coverage added for compact, boundary, wide, and owning-close
paths
- [x] `make lint` clean
- [x] Decimal suite compiles and passes against Spark 4.1
---------
Co-authored-by: Daniel Rammer <hamersaw@protonmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

1 parent c9df56e
2 files changed
Lines changed: 161 additions & 0 deletions
Changed paths (under `lance-spark-base_2.12/src`):
- `main/java/org/lance/spark/vectorized`
- `test/java/org/lance/spark/vectorized`
Lines changed: 63 additions & 0 deletions
Lines changed: 98 additions & 0 deletions