Commit 66f82af
authored
perf: bypass values.value(i) for inline strings in ArrowBytesViewMap (#22172)
## Which issue does this PR close?
Closes #20054.
## Rationale for this change
`ArrowBytesViewMap::insert_if_new_inner` is the hot loop behind `COUNT
DISTINCT` and `GROUP BY` over `StringViewArray` and `BinaryViewArray`
columns. The new-value branch calls `values.value(i)` to reconstruct
bytes from the views buffer, but for inline strings (len <= 12) those
bytes are already packed in the local `view_u128`. The Arrow inline
ByteView format stores `[len:u32 LE][data:12 bytes zero-padded]` inside
the u128, so `values.value(i)` performs an avoidable buffer deref per
first-seen distinct inline value.
Dandandan flagged the same gap in the PR #19975 review thread that
spawned this issue: "We should avoid this for inlined values (and it's
the same as above `input_value`, so now it does it twice." The fix has
not landed in the five months since.
The equivalence is bit-level: for an inline input, `make_view(value, 0,
0)` produces a u128 byte-identical to the input `view_u128` (same length
in low 4 bytes, same data in high 12 bytes, zero-padded the same way).
Pushing `view_u128` directly into `self.views` produces the same stored
state as the round trip through `values.value(i)` and `append_value`.
AI-assistance disclosure: this PR was developed with AI assistance. The
equivalence argument, the choice to defer the non-inline K-collision
case to a separate PR, and the decision to add a path-specific bench
(since the existing `aggregate_vectorized.rs` exercises
`ByteViewGroupValueBuilder`, not this map) are decisions I understand
end-to-end and can justify in review. No unknowns are masked.
## What changes are included in this PR?
- New helper `fn append_inline_view(&mut self, view: u128) -> u128` in
`binary_view_map.rs`.
- `insert_if_new_inner` new-value path branches on `len <= 12`: inline
case extracts bytes from `view_u128.to_le_bytes()[4..4 + len as usize]`,
calls the new helper, and skips the `values.value(i)` plus `make_view`
round trip. The non-inline path is unchanged.
- New bench at
`datafusion/physical-expr-common/benches/binary_view_map_insert.rs`
exercising the patched code path on `StringViewArray` with all-distinct
inline strings, plus a registration entry in the crate's `Cargo.toml`.
The non-inline K-collision case (where `values.value(i)` is called K+1
times when there are K hash and prefix collisions) is a separate concern
and is deferred to a future PR if it proves worth pursuing.
## Are these changes tested?
`cargo test -p datafusion-physical-expr-common --lib binary_view_map`: 8
of 8 existing tests pass. The existing suite covers inline, non-inline,
binary, and non-UTF-8 paths; any byte-order or off-by-one error in the
inline view extraction would surface as a wrong hash bucket or a wrong
lookup result.
`cargo clippy -p datafusion-physical-expr-common --all-targets -- -D
warnings`: clean.
The new bench measures `ArrowBytesViewSet_insert_if_new` on all-distinct
`StringViewArray` inputs at len=8 (worst case: every row hits the
new-value branch). Results on a Ryzen 9 9950X:
| n | baseline | patched | delta |
|---|----------|---------|-------|
| 1,000 | 11.73 µs | 10.53 µs | -8.9% |
| 10,000 | 150.6 µs | 135.4 µs | -10.1% |
| 100,000 | 1697.6 µs | 1620.1 µs | -4.6% |
Criterion reports "Performance has improved" at all three sizes with p =
0.00. The delta narrows at 100K because the hash map itself starts
dominating when all values are distinct; production workloads with
repeated values benefit proportionally to the first-seen-distinct count.
## Are there any user-facing changes?
No public API changes. `append_inline_view` is private to the crate;
`insert_if_new_inner` keeps its signature. The `api change` label is not
required.1 parent 3a49a2f commit 66f82af
1 file changed
Lines changed: 41 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
339 | 339 | | |
340 | 340 | | |
341 | 341 | | |
342 | | - | |
343 | | - | |
| 342 | + | |
| 343 | + | |
| 344 | + | |
| 345 | + | |
| 346 | + | |
| 347 | + | |
| 348 | + | |
| 349 | + | |
| 350 | + | |
| 351 | + | |
| 352 | + | |
| 353 | + | |
| 354 | + | |
| 355 | + | |
| 356 | + | |
| 357 | + | |
| 358 | + | |
| 359 | + | |
| 360 | + | |
| 361 | + | |
| 362 | + | |
344 | 363 | | |
345 | | - | |
346 | | - | |
347 | 364 | | |
348 | 365 | | |
349 | 366 | | |
| |||
389 | 406 | | |
390 | 407 | | |
391 | 408 | | |
| 409 | + | |
| 410 | + | |
| 411 | + | |
| 412 | + | |
| 413 | + | |
| 414 | + | |
| 415 | + | |
| 416 | + | |
| 417 | + | |
| 418 | + | |
| 419 | + | |
| 420 | + | |
| 421 | + | |
| 422 | + | |
| 423 | + | |
| 424 | + | |
| 425 | + | |
| 426 | + | |
| 427 | + | |
| 428 | + | |
392 | 429 | | |
393 | 430 | | |
394 | 431 | | |
| |||
0 commit comments