Commit 66f82af

perf: bypass values.value(i) for inline strings in ArrowBytesViewMap (#22172)
## Which issue does this PR close?

Closes #20054.

## Rationale for this change

`ArrowBytesViewMap::insert_if_new_inner` is the hot loop behind `COUNT DISTINCT` and `GROUP BY` over `StringViewArray` and `BinaryViewArray` columns. The new-value branch calls `values.value(i)` to reconstruct bytes from the views buffer, but for inline strings (len <= 12) those bytes are already packed in the local `view_u128`. The Arrow inline ByteView format stores `[len:u32 LE][data:12 bytes zero-padded]` inside the u128, so `values.value(i)` performs an avoidable buffer deref per first-seen distinct inline value.

Dandandan flagged the same gap in the PR #19975 review thread that spawned this issue: "We should avoid this for inlined values (and it's the same as above `input_value`, so now it does it twice." The fix has not landed in the five months since.

The equivalence is bit-level: for an inline input, `make_view(value, 0, 0)` produces a u128 byte-identical to the input `view_u128` (same length in the low 4 bytes, same data in the high 12 bytes, zero-padded the same way). Pushing `view_u128` directly into `self.views` produces the same stored state as the round trip through `values.value(i)` and `append_value`.

AI-assistance disclosure: this PR was developed with AI assistance. The equivalence argument, the choice to defer the non-inline K-collision case to a separate PR, and the decision to add a path-specific bench (since the existing `aggregate_vectorized.rs` exercises `ByteViewGroupValueBuilder`, not this map) are decisions I understand end-to-end and can justify in review. No unknowns are masked.

## What changes are included in this PR?

- New helper `fn append_inline_view(&mut self, view: u128) -> u128` in `binary_view_map.rs`.
- The `insert_if_new_inner` new-value path branches on `len <= 12`: the inline case extracts bytes from `view_u128.to_le_bytes()[4..4 + len as usize]`, calls the new helper, and skips the `values.value(i)` plus `make_view` round trip. The non-inline path is unchanged.
- New bench at `datafusion/physical-expr-common/benches/binary_view_map_insert.rs` exercising the patched code path on `StringViewArray` with all-distinct inline strings, plus a registration entry in the crate's `Cargo.toml`.

The non-inline K-collision case (where `values.value(i)` is called K+1 times when there are K hash and prefix collisions) is a separate concern and is deferred to a future PR if it proves worth pursuing.

## Are these changes tested?

`cargo test -p datafusion-physical-expr-common --lib binary_view_map`: 8 of 8 existing tests pass. The existing suite covers inline, non-inline, binary, and non-UTF-8 paths; any byte-order or off-by-one error in the inline view extraction would surface as a wrong hash bucket or a wrong lookup result.

`cargo clippy -p datafusion-physical-expr-common --all-targets -- -D warnings`: clean.

The new bench measures `ArrowBytesViewSet_insert_if_new` on all-distinct `StringViewArray` inputs at len=8 (worst case: every row hits the new-value branch). Results on a Ryzen 9 9950X:

| n | baseline | patched | delta |
|---|----------|---------|-------|
| 1,000 | 11.73 µs | 10.53 µs | -8.9% |
| 10,000 | 150.6 µs | 135.4 µs | -10.1% |
| 100,000 | 1697.6 µs | 1620.1 µs | -4.6% |

Criterion reports "Performance has improved" at all three sizes with p = 0.00. The delta narrows at 100K because the hash map itself starts to dominate when all values are distinct; production workloads with repeated values benefit in proportion to the first-seen-distinct count.

## Are there any user-facing changes?

No public API changes. `append_inline_view` is private to the crate; `insert_if_new_inner` keeps its signature. The `api change` label is not required.
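The inline ByteView layout the rationale relies on can be sketched as a standalone Rust snippet. This is illustrative only: `make_inline_view` and `inline_view_bytes` are hypothetical helpers written for this sketch, not arrow-rs or DataFusion APIs, but they follow the `[len:u32 LE][data:12 bytes zero-padded]` layout described above.

```rust
/// Pack a short value (len <= 12) into the inline ByteView layout:
/// [len: u32 LE][data: 12 bytes, zero-padded] inside a little-endian u128.
fn make_inline_view(value: &[u8]) -> u128 {
    assert!(value.len() <= 12, "inline views hold at most 12 bytes");
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    bytes[4..4 + value.len()].copy_from_slice(value);
    u128::from_le_bytes(bytes)
}

/// Recover an inline view's bytes without touching any buffer,
/// mirroring the patched fast path's slice of `to_le_bytes()`.
fn inline_view_bytes(view: u128) -> Vec<u8> {
    let bytes = view.to_le_bytes();
    let len = u32::from_le_bytes(bytes[0..4].try_into().unwrap()) as usize;
    bytes[4..4 + len].to_vec()
}

fn main() {
    let view = make_inline_view(b"datafusion"); // 10 bytes: inline
    assert_eq!(inline_view_bytes(view), b"datafusion".to_vec());
    println!("inline round trip ok");
}
```

The `bytes[4..4 + len]` slice is the same expression the patch uses on `view_u128.to_le_bytes()`, which is why no views-buffer dereference is needed for inline values.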
1 parent 3a49a2f commit 66f82af

1 file changed

Lines changed: 41 additions & 4 deletions

File tree

datafusion/physical-expr-common/src/binary_view_map.rs

```diff
@@ -339,11 +339,28 @@ where
                 payload
             } else {
                 // no existing value, make a new one
-                let value: &[u8] = values.value(i).as_ref();
-                let payload = make_payload_fn(Some(value));
+                let (new_view, payload) = if len <= 12 {
+                    // Inline path: bytes are already packed in view_u128.
+                    // The inline ByteView format is [len:u32 LE][data:12 bytes zero-padded],
+                    // so extracting bytes from the u128 avoids a round-trip through
+                    // values.value(i) (which reads the views buffer and returns the same slice).
+                    let view_bytes = view_u128.to_le_bytes();
+                    let value = &view_bytes[4..4 + len as usize];
+                    let payload = make_payload_fn(Some(value));
+                    // For inline strings, the stored view is identical to the input view:
+                    // make_view(value, 0, 0) produces the same u128 as view_u128.
+                    //
+                    // SAFETY: view_u128 was a valid view, and the enclosing `len <= 12`
+                    // ensures it is inline
+                    let new_view = unsafe { self.append_inline_view(view_u128) };
+                    (new_view, payload)
+                } else {
+                    let value: &[u8] = values.value(i).as_ref();
+                    let payload = make_payload_fn(Some(value));
+                    let new_view = self.append_value(value);
+                    (new_view, payload)
+                };
 
-                // Create view pointing to our buffers
-                let new_view = self.append_value(value);
                 let new_header = Entry {
                     view: new_view,
                     hash,
@@ -389,6 +406,26 @@ where
         }
     }
 
+    /// Append an already-computed inline view (len <= 12) directly, bypassing
+    /// buffer allocation.
+    ///
+    /// Returns the view that was stored (identical to the argument).
+    ///
+    /// # Safety
+    ///
+    /// `view` must be a valid inline `ByteView`: the length field in the low
+    /// 32 bits must be <= 12, and the remaining 12 bytes must hold the
+    /// value's bytes (zero-padded if shorter). Calling with a non-inline view
+    /// would store a value that downstream `views` consumers interpret as
+    /// `[buffer_index, offset]` into the `completed`/`in_progress` buffers,
+    /// which is unsound for any view that didn't originate from a real
+    /// allocation in those buffers.
+    unsafe fn append_inline_view(&mut self, view: u128) -> u128 {
+        self.views.push(view);
+        self.nulls.append_non_null();
+        view
+    }
+
     /// Append a value to our buffers and return the view pointing to it
     fn append_value(&mut self, value: &[u8]) -> u128 {
         let len = value.len();
```
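The bit-level equivalence claim behind this change can be checked mechanically: rebuilding a view from the extracted inline bytes (what the old `values.value(i)` plus `make_view` round trip effectively did) yields a u128 identical to the input, so pushing the input view directly stores the same state. A minimal sketch, with the hypothetical `make_inline_view` standing in for `make_view(value, 0, 0)` on the inline path:

```rust
/// Stand-in for `make_view(value, 0, 0)` on the inline (len <= 12) path:
/// [len: u32 LE][data: 12 bytes, zero-padded] in a little-endian u128.
fn make_inline_view(value: &[u8]) -> u128 {
    assert!(value.len() <= 12);
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    bytes[4..4 + value.len()].copy_from_slice(value);
    u128::from_le_bytes(bytes)
}

fn main() {
    // Spans the inline range, including the zero-padding boundary at 12 bytes.
    for value in [&b""[..], &b"a"[..], &b"hello"[..], &b"exactly12byt"[..]] {
        let input_view = make_inline_view(value);
        // Old path: extract the bytes from the view, then rebuild the view.
        let bytes = input_view.to_le_bytes();
        let len = u32::from_le_bytes(bytes[0..4].try_into().unwrap()) as usize;
        let rebuilt = make_inline_view(&bytes[4..4 + len]);
        // New path simply reuses input_view; both store the same u128.
        assert_eq!(rebuilt, input_view);
    }
    println!("round trip is the identity for all inline views tested");
}
```

Because the round trip is the identity on inline views, the `unsafe fn append_inline_view` above can push `view_u128` verbatim without changing any observable map state.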
