perf: Optimize lower, upper for ASCII inputs#21980
Conversation
comphead
left a comment
There was a problem hiding this comment.
Thanks @neilconway
I believe the optimization tightly connected to german strings specifics and it would be nice to comment the byte level work
|
Thanks for the review, @comphead ! Please let me know if you have more feedback. FYI I think there's an opportunity to refactor this into an extension to the |
Do you mean for special case ASCII strings? |
alamb
left a comment
There was a problem hiding this comment.
The code makes sense to me -- thank you @neilconway
I agree with @comphead that making some better API / helpers to encapsulate this type of StringBuilder
| } | ||
| let mut bytes = view.to_le_bytes(); | ||
| if len <= 12 { | ||
| // Inline: value is in bytes[4..4+len], no buffer reference. Convert |
There was a problem hiding this comment.
This patterns would be really nice to abstract -- namely a method that makes a new StringArray where it is guaranteed that each output value will be exactly as long ad the input value.
Upper/lower are good examples. Maybe also translate with single cahracnters
| let mut completed: Vec<Buffer> = Vec::new(); | ||
| let mut block_size: u32 = STRING_VIEW_INIT_BLOCK_SIZE; | ||
|
|
||
| for i in 0..item_len { |
There was a problem hiding this comment.
I think you could avoid a lot of this copy / paste by using a StringViewBuilder with the append_block and append_view_unchecked methods
https://docs.rs/arrow/latest/arrow/array/type.StringViewBuilder.html#method.append_block
That being said, I do think it would be slightly slower than this implementation because it would have to re-check the length
It almost seems like what we want is some sort of API on StringViewArray itself, similar to https://docs.rs/arrow/latest/arrow/array/struct.PrimitiveArray.html#method.unary
So this code could be written something like
let new_array = orig_array.map_values(convert)That would also let us do potentially crazy things like reuse the buffer allocations and modify the values in place if they weren't shared 🤔
If that makes sense to you I can file a ticket in arrow-rs perhaps.
|
run benchmarks upper, lower |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing neilc/perf-case-conv (4ec33f6) to ba038e9 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing neilc/perf-case-conv (4ec33f6) to ba038e9 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
I meant something similar to what you suggested in the PR -- for example, a function like: impl StringViewArrayBuilder {
/// Reserve `len` bytes of output storage and let the caller fill them.
pub fn append_with<F: FnOnce(&mut [u8])>(&mut self, len: usize, fill: F);
}We could use that in |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
Thanks @neilconway and @Omega359 |
Which issue does this PR close?
lower,uppercould be further optimized for ASCII-only inputs #21813.Rationale for this change
This PR implements two optimizations for
lowerandupperon ASCII strings:Utf8/LargeUtf8code path, we previously did the case conversion viastr::to_uppercaseorstr::to_lowercase. For ASCII inputs, it is faster to usemap(u8::to_ascii_lowercase).collect()over the bytes of the string directly: although the stdlib functions are well-optimized, they need to check again on every string to see if it is ASCII. Since we know the input is all-ASCII, we can avoid that check.Utf8Viewcode path previously wasn't optimized for ASCII strings; add a new code path that is. As with theUtf8code path, we can do case-conversion on bytes directly, which vectorizes well and avoids repeated ASCII checks. In addition, we can build the outputStringViewArraydirectly, which avoids the intermediate strings and unnecessary allocations used in the previous approach.Benchmarks (ARM64):
upper
lower — all-ASCII (the optimized paths)
lower — some non-ASCII string_views (mostly noise)
lower — first/middle non-ASCII (flat)
What changes are included in this PR?
Are these changes tested?
Covered by existing tests.
Are there any user-facing changes?
No.