Commit 9f893a4

perf: Optimize split_part, support Utf8View (#21119)
## Which issue does this PR close?

- Closes #21117.
- Closes #21118.

## Rationale for this change

`split_part` currently accepts `Utf8View` but always returns `Utf8`. When given `Utf8View` input, it should instead return `Utf8View` output.

While we're at it, optimize `split_part` for single-character delimiters (the common case): `str::split(&str)` is significantly slower than `str::split(char)` for single-character ASCII delimiters, because the former uses a general string matching algorithm but the latter uses `memchr::memchr`.

Benchmark results (M4 Max):

- `utf8_single_char/pos_first`: 142 µs → 104 µs (-26%)
- `utf8_single_char/pos_middle`: 389 µs → 365 µs (-6%)
- `utf8_single_char/pos_negative`: 154 µs → 109 µs (-29%)
- `utf8_multi_char/pos_middle`: 356 µs → 361 µs (~0%, noise)
- `utf8view_single_char/pos_first`: 143 µs → 111 µs (-22%)
- `utf8_long_strings/pos_middle`: 2568 µs → 1984 µs (-23%)
- `utf8view_long_parts/pos_middle`: 998 µs → 470 µs (-53%)

## What changes are included in this PR?

* Revise `split_part` benchmarks to reduce redundancy and improve `Utf8View` coverage
* Support `Utf8View` -> `Utf8View` in `split_part`
* Refactor `split_part` to clean up some redundant code
* Optimize `split_part` for single-character delimiters
* Add SLT test coverage for `split_part` with `Utf8View` input

## Are these changes tested?

Yes. New tests and benchmarks added.

## Are there any user-facing changes?

No.
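
For readers who want a concrete picture of the single-character fast path described in the rationale, here is a minimal sketch. It is not the code from this commit: `split_part_sketch` is a hypothetical helper, it works on plain `&str` rather than Arrow arrays, and its handling of `n == 0` (returns `None`) and of an empty delimiter (whole string is one field) are assumptions made for illustration. It dispatches on any single `char`, whereas the PR text speaks specifically of single-character ASCII delimiters.

```rust
/// Hypothetical helper for illustration only; not DataFusion's implementation.
/// Returns the `n`-th field of `s` split on `delim` (1-based; a negative `n`
/// counts from the end). Returns `None` when `n == 0` and `Some("")` when
/// `n` is out of range.
fn split_part_sketch<'a>(s: &'a str, delim: &str, n: i64) -> Option<&'a str> {
    if n == 0 {
        return None;
    }
    // Assumption: an empty delimiter treats the whole string as one field.
    if delim.is_empty() {
        return Some(if n == 1 || n == -1 { s } else { "" });
    }
    // Zero-based index of the requested field, counted from either end.
    let idx = (n.unsigned_abs() - 1) as usize;
    let mut chars = delim.chars();
    let part = match (chars.next(), chars.next()) {
        // Exactly one character: use the `char` pattern.
        (Some(c), None) => {
            if n > 0 {
                s.split(c).nth(idx)
            } else {
                s.rsplit(c).nth(idx)
            }
        }
        // Two or more characters: fall back to the `&str` pattern.
        _ => {
            if n > 0 {
                s.split(delim).nth(idx)
            } else {
                s.rsplit(delim).nth(idx)
            }
        }
    };
    Some(part.unwrap_or(""))
}
```

Under these assumptions, `split_part_sketch("a,b,c", ",", 2)` takes the `char` branch and `split_part_sketch("a::b::c", "::", 3)` takes the `&str` branch. Both branches return slices borrowed from the input, so no field is copied.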
1 parent e913557 commit 9f893a4

3 files changed: +297 -384 lines changed
