perf: Optimize `lower`, `upper` for ASCII inputs by neilconway · Pull Request #21980 · apache/datafusion

neilconway · 2026-05-01T22:27:02Z

Which issue does this PR close?

Closes #21813 (lower, upper could be further optimized for ASCII-only inputs).

Rationale for this change

This PR implements two optimizations for lower and upper on ASCII strings:

For the Utf8/LargeUtf8 code path, we previously did the case conversion via str::to_uppercase or str::to_lowercase. For ASCII inputs, it is faster to use map(u8::to_ascii_lowercase).collect() (or the to_ascii_uppercase equivalent) over the bytes of the string directly: although the stdlib functions are well-optimized, they have to re-check every string to see whether it is ASCII. Since we already know the input is all-ASCII, we can skip that check.
The Utf8View code path previously wasn't optimized for ASCII strings; this PR adds a new code path that is. As with the Utf8 code path, we can do the case conversion on bytes directly, which vectorizes well and avoids repeated ASCII checks. In addition, we can build the output StringViewArray directly, which avoids the intermediate strings and unnecessary allocations of the previous approach.
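
As a rough illustration of the first optimization (not the code in this PR), the sketch below checks a whole batch for ASCII once and then lowercases byte by byte. The function name `lower_batch` and the `&[&str]` input are assumptions made for the example; the real code operates on Arrow arrays.

```rust
/// Lowercase every string in a batch.
/// Hypothetical helper for illustration; not part of DataFusion.
fn lower_batch(values: &[&str]) -> Vec<String> {
    // One ASCII check for the whole batch, instead of one per string.
    let all_ascii = values.iter().all(|s| s.is_ascii());
    values
        .iter()
        .map(|s| {
            if all_ascii {
                // ASCII case conversion maps one byte to one byte, so the
                // output has the same length as the input.
                let bytes: Vec<u8> =
                    s.as_bytes().iter().map(u8::to_ascii_lowercase).collect();
                // The sketch stays in safe Rust, so UTF-8 is re-validated here.
                String::from_utf8(bytes).expect("ASCII bytes are valid UTF-8")
            } else {
                // Fall back to the Unicode-aware stdlib conversion.
                s.to_lowercase()
            }
        })
        .collect()
}
```

Calling `lower_batch(&["HeLLo", "WORLD 42"])` takes the byte-wise branch, while a batch containing any non-ASCII value falls back to `str::to_lowercase` for every element.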

Benchmarks (ARM64):

upper

upper_all_values_are_ascii: 5.4 → 4.1 µs (−24.1%)

lower — all-ASCII (the optimized paths)

lower_all_values_are_ascii: 1024: 5.4 → 4.0 µs (−25.9%)
lower_all_values_are_ascii: 4096: 22.6 → 15.6 µs (−31.0%)
lower_all_values_are_ascii: 8192: 41.9 → 30.8 µs (−26.5%)
string_views size:4096 str_len:10 null:0 mixed:false: 151.0 → 75.3 µs (−50.1%)
string_views size:4096 str_len:10 null:0 mixed:true: 175.9 → 134.6 µs (−23.5%)
string_views size:4096 str_len:10 null:0.1 mixed:false: 143.5 → 76.8 µs (−46.5%)
string_views size:4096 str_len:10 null:0.1 mixed:true: 166.6 → 125.0 µs (−25.0%)
string_views size:4096 str_len:64 null:0 mixed:false: 150.1 → 92.7 µs (−38.2%)
string_views size:4096 str_len:64 null:0 mixed:true: 185.2 → 140.1 µs (−24.4%)
string_views size:4096 str_len:64 null:0.1 mixed:false: 136.7 → 97.0 µs (−29.0%)
string_views size:4096 str_len:64 null:0.1 mixed:true: 173.7 → 131.2 µs (−24.5%)
string_views size:4096 str_len:128 null:0 mixed:false: 190.3 → 141.7 µs (−25.5%)
string_views size:4096 str_len:128 null:0 mixed:true: 197.0 → 153.7 µs (−22.0%)
string_views size:4096 str_len:128 null:0.1 mixed:false: 173.3 → 141.7 µs (−18.2%)
string_views size:4096 str_len:128 null:0.1 mixed:true: 184.0 → 142.8 µs (−22.4%)
string_views size:8192 str_len:10 null:0 mixed:false: 302.9 → 150.2 µs (−50.4%)
string_views size:8192 str_len:10 null:0 mixed:true: 352.9 → 279.0 µs (−20.9%)
string_views size:8192 str_len:10 null:0.1 mixed:false: 285.0 → 154.3 µs (−45.9%)
string_views size:8192 str_len:10 null:0.1 mixed:true: 334.2 → 266.4 µs (−20.3%)
string_views size:8192 str_len:64 null:0 mixed:false: 295.6 → 184.4 µs (−37.6%)
string_views size:8192 str_len:64 null:0 mixed:true: 371.4 → 290.7 µs (−21.7%)
string_views size:8192 str_len:64 null:0.1 mixed:false: 273.7 → 195.1 µs (−28.7%)
string_views size:8192 str_len:64 null:0.1 mixed:true: 347.0 → 279.6 µs (−19.4%)
string_views size:8192 str_len:128 null:0 mixed:false: 379.6 → 285.6 µs (−24.8%)
string_views size:8192 str_len:128 null:0 mixed:true: 397.1 → 317.4 µs (−20.1%)
string_views size:8192 str_len:128 null:0.1 mixed:false: 364.1 → 285.1 µs (−21.7%)
string_views size:8192 str_len:128 null:0.1 mixed:true: 379.3 → 302.3 µs (−20.3%)
lower_sliced_ascii parent=65536 slice=128 str_len=32: 980.2 → 797.9 ns (−18.6%)

lower — some non-ASCII string_views (mostly noise)

size:4096 str_len:10 null:0 mixed:false: 374.5 → 362.2 µs (−3.3%)
size:4096 str_len:10 null:0 mixed:true: 374.6 → 380.5 µs (+1.6%)
size:4096 str_len:10 null:0.1 mixed:false: 340.8 → 356.5 µs (+4.6%)
size:4096 str_len:10 null:0.1 mixed:true: 344.0 → 352.5 µs (+2.5%)
size:4096 str_len:64 null:0 mixed:false: 377.5 → 373.5 µs (−1.1%)
size:4096 str_len:64 null:0 mixed:true: 380.6 → 375.0 µs (−1.5%)
size:4096 str_len:64 null:0.1 mixed:false: 330.7 → 341.8 µs (+3.4%)
size:4096 str_len:64 null:0.1 mixed:true: 341.8 → 354.2 µs (+3.6%)
size:4096 str_len:128 null:0 mixed:false: 371.8 → 356.2 µs (−4.2%)
size:4096 str_len:128 null:0 mixed:true: 378.9 → 386.0 µs (+1.9%)
size:4096 str_len:128 null:0.1 mixed:false: 350.5 → 350.3 µs (−0.1%)
size:4096 str_len:128 null:0.1 mixed:true: 351.0 → 337.9 µs (−3.7%)
size:8192 str_len:10 null:0 mixed:false: 740.0 → 757.2 µs (+2.3%)
size:8192 str_len:10 null:0 mixed:true: 781.3 → 750.2 µs (−4.0%)
size:8192 str_len:10 null:0.1 mixed:false: 693.7 → 693.7 µs (0.0%)
size:8192 str_len:10 null:0.1 mixed:true: 681.5 → 705.2 µs (+3.5%)
size:8192 str_len:64 null:0 mixed:false: 755.5 → 768.6 µs (+1.7%)
size:8192 str_len:64 null:0 mixed:true: 759.6 → 754.3 µs (−0.7%)
size:8192 str_len:64 null:0.1 mixed:false: 711.5 → 667.8 µs (−6.1%)
size:8192 str_len:64 null:0.1 mixed:true: 682.1 → 688.2 µs (+0.9%)
size:8192 str_len:128 null:0 mixed:false: 771.5 → 765.9 µs (−0.7%)
size:8192 str_len:128 null:0 mixed:true: 747.7 → 792.6 µs (+6.0%)
size:8192 str_len:128 null:0.1 mixed:false: 687.1 → 701.3 µs (+2.1%)
size:8192 str_len:128 null:0.1 mixed:true: 679.2 → 696.8 µs (+2.6%)

lower — first/middle non-ASCII (flat)

lower_the_first_value_is_nonascii: 1024: 42.1 → 42.4 µs (+0.7%)
lower_the_first_value_is_nonascii: 4096: 173.9 → 173.3 µs (−0.3%)
lower_the_first_value_is_nonascii: 8192: 350.8 → 349.3 µs (−0.4%)
lower_the_middle_value_is_nonascii: 1024: 42.9 → 42.8 µs (−0.2%)
lower_the_middle_value_is_nonascii: 4096: 175.1 → 176.3 µs (+0.7%)
lower_the_middle_value_is_nonascii: 8192: 353.6 → 354.6 µs (+0.3%)
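
For readers checking the numbers, the percentages above appear to be the relative change from the "before" time to the "after" time. A minimal sketch, assuming that definition (the helper name `pct_change` is made up for the example):

```rust
/// Relative change from `before` to `after`, in percent.
/// Negative values mean the new code is faster.
fn pct_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

For example, `pct_change(5.4, 4.1)` is roughly -24.1, matching the upper_all_values_are_ascii line.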

What changes are included in this PR?

Implement optimizations
Share StringViewArray buffer size constants with the bulk-NULL builders

Are these changes tested?

Covered by existing tests.

Are there any user-facing changes?

No.

comphead

Thanks @neilconway
I believe the optimization is tightly connected to German-strings specifics, and it would be nice to comment the byte-level work

neilconway · 2026-05-02T14:02:22Z

Thanks for the review, @comphead ! Please let me know if you have more feedback.

FYI I think there's an opportunity to refactor this into an extension to the StringViewArrayBuilder API we just added (and perhaps use it in some other string UDFs), but I'd like to land this change first, so we can be sure the refactoring doesn't regress codegen.

alamb · 2026-05-03T12:04:55Z

> FYI I think there's an opportunity to refactor this into an extension to the StringViewArrayBuilder API we just added (and perhaps use it in some other string UDFs), but I'd like to land this change first, so we can be sure the refactoring doesn't regress codegen.

Do you mean for special case ASCII strings?

alamb

The code makes sense to me -- thank you @neilconway

I agree with @comphead that making some better API / helpers to encapsulate this type of StringBuilder would be nice

alamb · 2026-05-03T12:07:18Z

+        }
+        let mut bytes = view.to_le_bytes();
+        if len <= 12 {
+            // Inline: value is in bytes[4..4+len], no buffer reference. Convert


This pattern would be really nice to abstract -- namely a method that makes a new StringArray where it is guaranteed that each output value will be exactly as long as the input value.

Upper/lower are good examples. Maybe also translate with single characters

Maybe also initcap
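
The inline case in the quoted hunk can be sketched in isolation. This is an illustration under stated assumptions, not the PR's code: it assumes the Arrow string-view layout (a 16-byte view whose first 4 bytes are the little-endian length, with the value stored in the remaining 12 bytes when `len <= 12`) and it assumes the value is ASCII. Both function names are hypothetical.

```rust
/// Lowercase an inline (len <= 12) ASCII string view.
/// Returns `None` for views that point into a separate data buffer.
fn lower_inline_view(view: u128) -> Option<u128> {
    let mut bytes = view.to_le_bytes();
    // Bytes 0..4 hold the value length as a little-endian u32.
    let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    if len > 12 {
        return None;
    }
    // Inline: the value lives in bytes[4..4 + len]; convert it in place.
    // The length field and the zero padding are left untouched.
    bytes[4..4 + len].make_ascii_lowercase();
    Some(u128::from_le_bytes(bytes))
}

/// Hypothetical helper: pack a short string into an inline view.
fn make_inline_view(s: &str) -> u128 {
    assert!(s.len() <= 12, "only short strings are stored inline");
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(s.len() as u32).to_le_bytes());
    bytes[4..4 + s.len()].copy_from_slice(s.as_bytes());
    u128::from_le_bytes(bytes)
}
```

Because the output is exactly as long as the input, the view can be rewritten without touching any data buffer, which is the property the comment above suggests abstracting.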

alamb · 2026-05-03T12:13:22Z

+    let mut completed: Vec<Buffer> = Vec::new();
+    let mut block_size: u32 = STRING_VIEW_INIT_BLOCK_SIZE;
+
+    for i in 0..item_len {


I think you could avoid a lot of this copy / paste by using a StringViewBuilder with the append_block and append_view_unchecked methods

https://docs.rs/arrow/latest/arrow/array/type.StringViewBuilder.html#method.append_block

That being said, I do think it would be slightly slower than this implementation because it would have to re-check the length

It almost seems like what we want is some sort of API on StringViewArray itself, similar to https://docs.rs/arrow/latest/arrow/array/struct.PrimitiveArray.html#method.unary

So this code could be written something like

let new_array = orig_array.map_values(convert)

That would also let us do potentially crazy things like reuse the buffer allocations and modify the values in place if they weren't shared 🤔

If that makes sense to you I can file a ticket in arrow-rs perhaps.
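
To make the `map_values` idea concrete, here is a toy sketch. Everything in it is hypothetical: `ToyStringArray` is a stand-in with one data buffer plus offsets, not an arrow-rs type, and `map_values` is not an existing arrow-rs method. It only shows why a length-preserving conversion lets the offsets be reused as-is.

```rust
/// Toy stand-in for a string array: one contiguous buffer plus offsets.
#[derive(Debug, Clone, PartialEq)]
struct ToyStringArray {
    data: Vec<u8>,
    /// Value `i` occupies `data[offsets[i]..offsets[i + 1]]`.
    offsets: Vec<usize>,
}

impl ToyStringArray {
    fn from_strs(values: &[&str]) -> Self {
        let mut data = Vec::new();
        let mut offsets = vec![0];
        for v in values {
            data.extend_from_slice(v.as_bytes());
            offsets.push(data.len());
        }
        Self { data, offsets }
    }

    /// Hypothetical `map_values`: `convert` may rewrite the bytes of each
    /// value but cannot change its length, so the offsets are copied as-is.
    /// The caller is responsible for keeping each value valid UTF-8.
    fn map_values<F: Fn(&mut [u8])>(&self, convert: F) -> Self {
        let mut data = self.data.clone();
        for w in self.offsets.windows(2) {
            convert(&mut data[w[0]..w[1]]);
        }
        Self { data, offsets: self.offsets.clone() }
    }

    fn value(&self, i: usize) -> &str {
        std::str::from_utf8(&self.data[self.offsets[i]..self.offsets[i + 1]])
            .expect("values must remain valid UTF-8")
    }
}
```

With this shape, an ASCII `upper` would be `array.map_values(|v| v.make_ascii_uppercase())`. An in-place variant that takes `self` by value could skip the buffer clone when the buffer is not shared.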

alamb · 2026-05-03T12:19:56Z

run benchmarks upper, lower

adriangbot · 2026-05-03T12:23:28Z

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4366153592-1993-6mld9 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing neilc/perf-case-conv (4ec33f6) to ba038e9 (merge-base) diff
BENCH_NAME=upper
BENCH_COMMAND=cargo bench --features=parquet --bench upper
BENCH_FILTER=
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-05-03T12:23:42Z

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4366153592-1994-4wvqg 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing neilc/perf-case-conv (4ec33f6) to ba038e9 (merge-base) diff
BENCH_NAME=lower
BENCH_COMMAND=cargo bench --features=parquet --bench lower
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-05-03T12:27:34Z

🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                         main                                   neilc_perf-case-conv
-----                         ----                                   --------------------
upper_all_values_are_ascii    1.87     27.4±2.63µs        ? ?/sec    1.00     14.7±0.04µs        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	40.0s
Peak memory	4.2 GiB
Avg memory	4.2 GiB
CPU user	43.2s
CPU sys	0.7s
Peak spill	0 B

branch

Metric	Value
Wall time	30.0s
Peak memory	4.2 GiB
Avg memory	4.2 GiB
CPU user	35.9s
CPU sys	0.2s
Peak spill	0 B

neilconway · 2026-05-03T12:31:04Z

> > FYI I think there's an opportunity to refactor this into an extension to the StringViewArrayBuilder API we just added (and perhaps use it in some other string UDFs), but I'd like to land this change first, so we can be sure the refactoring doesn't regress codegen.

> Do you mean for special case ASCII strings?

I meant something similar to what you suggested in the PR -- for example, a function like:

  impl StringViewArrayBuilder {
      /// Reserve `len` bytes of output storage and let the caller fill them.
      pub fn append_with<F: FnOnce(&mut [u8])>(&mut self, len: usize, fill: F);
  }

We could use that in reverse, translate, initcap, and lower/upper. I want to check whether it regresses performance against the hand-optimized version though.
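
A minimal sketch of how such an `append_with` could behave, using a toy builder over a plain `Vec<u8>`. `ToyBuilder`, `finish`, and `reverse_ascii` are hypothetical names for this example; this is not the DataFusion `StringViewArrayBuilder`, and it says nothing about how the real one would perform.

```rust
/// Toy builder that stores all values in one buffer plus offsets.
struct ToyBuilder {
    data: Vec<u8>,
    offsets: Vec<usize>,
}

impl ToyBuilder {
    fn new() -> Self {
        Self { data: Vec::new(), offsets: vec![0] }
    }

    /// Reserve `len` bytes of output storage and let the caller fill them.
    fn append_with<F: FnOnce(&mut [u8])>(&mut self, len: usize, fill: F) {
        let start = self.data.len();
        self.data.resize(start + len, 0);
        // The caller sees exactly `len` zeroed bytes to write into.
        fill(&mut self.data[start..]);
        self.offsets.push(self.data.len());
    }

    /// Materialize the values; panics if a value is not valid UTF-8.
    fn finish(self) -> Vec<String> {
        self.offsets
            .windows(2)
            .map(|w| {
                String::from_utf8(self.data[w[0]..w[1]].to_vec())
                    .expect("callers must write valid UTF-8")
            })
            .collect()
    }
}

/// Example caller: reverse each value. Assumes ASCII-only input, since
/// reversing the bytes of multi-byte characters would break UTF-8.
fn reverse_ascii(values: &[&str]) -> Vec<String> {
    let mut builder = ToyBuilder::new();
    for v in values {
        builder.append_with(v.len(), |out| {
            out.copy_from_slice(v.as_bytes());
            out.reverse();
        });
    }
    builder.finish()
}
```

The closure writes straight into the builder's storage, so no intermediate `String` is allocated per value; whether that matches the hand-optimized version is exactly the open question raised above.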

alamb · 2026-05-03T12:38:06Z

Filed

Introduce StringViewArrayBuilder::map to avoid duplication #21997

adriangbot · 2026-05-03T12:53:19Z

🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                                                                                             main                                   neilc_perf-case-conv
-----                                                                                                                             ----                                   --------------------
lower_all_values_are_ascii: 1024                                                                                                  1.36      2.7±0.00µs        ? ?/sec    1.00      2.0±0.00µs        ? ?/sec
lower_all_values_are_ascii: 4096                                                                                                  1.79     13.4±1.97µs        ? ?/sec    1.00      7.5±0.00µs        ? ?/sec
lower_all_values_are_ascii: 8192                                                                                                  1.00     21.1±1.98µs        ? ?/sec    1.03     21.8±1.07µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 10, null_density: 0, mixed: false                                   1.61     91.5±0.62µs        ? ?/sec    1.00     56.7±0.01µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 10, null_density: 0, mixed: true                                    1.49    107.3±1.14µs        ? ?/sec    1.00     72.2±0.09µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 10, null_density: 0.1, mixed: false                                 1.53     85.3±0.41µs        ? ?/sec    1.00     55.7±0.05µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 10, null_density: 0.1, mixed: true                                  1.44     96.5±0.53µs        ? ?/sec    1.00     67.1±0.77µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 128, null_density: 0, mixed: false                                  1.56    100.3±0.16µs        ? ?/sec    1.00     64.2±0.15µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 128, null_density: 0, mixed: true                                   1.48    117.0±0.30µs        ? ?/sec    1.00     78.8±0.24µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 128, null_density: 0.1, mixed: false                                1.44     93.2±0.15µs        ? ?/sec    1.00     64.8±0.09µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 128, null_density: 0.1, mixed: true                                 1.46    105.0±0.36µs        ? ?/sec    1.00     71.7±0.19µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 64, null_density: 0, mixed: false                                   1.95     84.2±0.45µs        ? ?/sec    1.00     43.2±0.08µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 64, null_density: 0, mixed: true                                    1.53    111.4±0.37µs        ? ?/sec    1.00     73.0±0.18µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 64, null_density: 0.1, mixed: false                                 1.78     79.4±0.43µs        ? ?/sec    1.00     44.7±0.06µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 64, null_density: 0.1, mixed: true                                  1.51     99.3±0.39µs        ? ?/sec    1.00     65.9±0.17µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 10, null_density: 0, mixed: false                                   1.55    183.4±1.50µs        ? ?/sec    1.00    118.2±1.08µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 10, null_density: 0, mixed: true                                    1.31    222.8±0.51µs        ? ?/sec    1.00    169.8±0.40µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 10, null_density: 0.1, mixed: false                                 1.53    171.3±0.91µs        ? ?/sec    1.00    111.6±0.05µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 10, null_density: 0.1, mixed: true                                  1.27    201.4±0.41µs        ? ?/sec    1.00    158.0±0.51µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 128, null_density: 0, mixed: false                                  1.50    199.8±0.48µs        ? ?/sec    1.00    133.5±0.23µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 128, null_density: 0, mixed: true                                   1.23    246.4±0.72µs        ? ?/sec    1.00    199.9±0.52µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 128, null_density: 0.1, mixed: false                                1.42    186.9±0.34µs        ? ?/sec    1.00    132.0±0.33µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 128, null_density: 0.1, mixed: true                                 1.19    222.1±0.61µs        ? ?/sec    1.00    185.9±1.11µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 64, null_density: 0, mixed: false                                   1.93    168.4±1.08µs        ? ?/sec    1.00     87.4±0.14µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 64, null_density: 0, mixed: true                                    1.26    233.0±0.56µs        ? ?/sec    1.00    185.1±0.48µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 64, null_density: 0.1, mixed: false                                 1.73    158.2±0.81µs        ? ?/sec    1.00     91.5±0.12µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 64, null_density: 0.1, mixed: true                                  1.24    213.2±0.92µs        ? ?/sec    1.00    171.9±0.45µs        ? ?/sec
lower_sliced_ascii: parent=65536, slice=128, str_len=32                                                                           1.17    601.0±2.15ns        ? ?/sec    1.00    512.7±0.82ns        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 10, non_ascii_density: 0.1, null_density: 0, mixed: false       1.04    226.6±0.69µs        ? ?/sec    1.00    218.6±0.76µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 10, non_ascii_density: 0.1, null_density: 0, mixed: true        1.00    217.1±0.57µs        ? ?/sec    1.02    221.4±0.46µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 10, non_ascii_density: 0.1, null_density: 0.1, mixed: false     1.00    198.2±0.61µs        ? ?/sec    1.05    208.2±0.53µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 10, non_ascii_density: 0.1, null_density: 0.1, mixed: true      1.00    201.5±0.52µs        ? ?/sec    1.05    212.4±0.49µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 128, non_ascii_density: 0.1, null_density: 0, mixed: false      1.01    223.3±0.76µs        ? ?/sec    1.00    221.1±0.53µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 128, non_ascii_density: 0.1, null_density: 0, mixed: true       1.08    236.7±0.91µs        ? ?/sec    1.00    219.3±0.53µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 128, non_ascii_density: 0.1, null_density: 0.1, mixed: false    1.00    193.6±0.36µs        ? ?/sec    1.09    210.2±0.57µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 128, non_ascii_density: 0.1, null_density: 0.1, mixed: true     1.00    197.8±0.38µs        ? ?/sec    1.02    202.3±0.52µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 64, non_ascii_density: 0.1, null_density: 0, mixed: false       1.00    221.0±0.54µs        ? ?/sec    1.02    225.5±0.73µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 64, non_ascii_density: 0.1, null_density: 0, mixed: true        1.00    221.5±0.85µs        ? ?/sec    1.00    221.3±0.48µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 64, non_ascii_density: 0.1, null_density: 0.1, mixed: false     1.00    203.6±0.42µs        ? ?/sec    1.01    206.4±0.78µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 64, non_ascii_density: 0.1, null_density: 0.1, mixed: true      1.03    192.8±0.47µs        ? ?/sec    1.00    187.6±0.56µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 10, non_ascii_density: 0.1, null_density: 0, mixed: false       1.00    448.2±1.20µs        ? ?/sec    1.00    446.1±1.03µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 10, non_ascii_density: 0.1, null_density: 0, mixed: true        1.01    445.1±1.47µs        ? ?/sec    1.00    440.1±1.22µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 10, non_ascii_density: 0.1, null_density: 0.1, mixed: false     1.00    395.7±0.88µs        ? ?/sec    1.04    411.3±1.26µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 10, non_ascii_density: 0.1, null_density: 0.1, mixed: true      1.00    387.9±1.18µs        ? ?/sec    1.06    410.7±0.99µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 128, non_ascii_density: 0.1, null_density: 0, mixed: false      1.02    458.1±1.72µs        ? ?/sec    1.00    450.6±1.26µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 128, non_ascii_density: 0.1, null_density: 0, mixed: true       1.00    448.6±1.40µs        ? ?/sec    1.01    451.2±1.55µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 128, non_ascii_density: 0.1, null_density: 0.1, mixed: false    1.00    408.4±1.22µs        ? ?/sec    1.01    412.2±1.06µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 128, non_ascii_density: 0.1, null_density: 0.1, mixed: true     1.00    399.3±1.53µs        ? ?/sec    1.02    406.7±1.05µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 64, non_ascii_density: 0.1, null_density: 0, mixed: false       1.03    456.0±1.88µs        ? ?/sec    1.00    442.1±0.89µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 64, non_ascii_density: 0.1, null_density: 0, mixed: true        1.00    445.6±1.37µs        ? ?/sec    1.02    453.2±1.35µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 64, non_ascii_density: 0.1, null_density: 0.1, mixed: false     1.00    404.2±1.02µs        ? ?/sec    1.02    413.6±1.32µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 64, non_ascii_density: 0.1, null_density: 0.1, mixed: true      1.00    409.2±1.03µs        ? ?/sec    1.01    412.9±1.00µs        ? ?/sec
lower_the_first_value_is_nonascii: 1024                                                                                           1.00     26.4±0.06µs        ? ?/sec    1.00     26.4±0.12µs        ? ?/sec
lower_the_first_value_is_nonascii: 4096                                                                                           1.00    107.4±0.24µs        ? ?/sec    1.00    107.6±0.40µs        ? ?/sec
lower_the_first_value_is_nonascii: 8192                                                                                           1.00    215.2±0.54µs        ? ?/sec    1.00    215.8±0.85µs        ? ?/sec
lower_the_middle_value_is_nonascii: 1024                                                                                          1.00     26.9±0.04µs        ? ?/sec    1.00     26.8±0.10µs        ? ?/sec
lower_the_middle_value_is_nonascii: 4096                                                                                          1.00    108.6±0.22µs        ? ?/sec    1.00    109.0±0.37µs        ? ?/sec
lower_the_middle_value_is_nonascii: 8192                                                                                          1.00    218.1±0.43µs        ? ?/sec    1.00    217.5±0.85µs        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	795.2s
Peak memory	4.2 GiB
Avg memory	4.2 GiB
CPU user	1006.4s
CPU sys	1.6s
Peak spill	0 B

branch

Metric	Value
Wall time	805.2s
Peak memory	4.2 GiB
Avg memory	4.2 GiB
CPU user	1016.9s
CPU sys	1.1s
Peak spill	0 B

alamb · 2026-05-04T20:28:28Z

Thanks @neilconway and @Omega359

e6c07fc

github-actions Bot added the functions Changes to functions implementation label May 1, 2026

comphead reviewed May 2, 2026

Comment thread datafusion/functions/src/string/common.rs

comphead reviewed May 2, 2026

Add comments, per review

4ec33f6

alamb added the performance Make DataFusion faster label May 3, 2026

alamb approved these changes May 3, 2026

alamb mentioned this pull request May 3, 2026

Introduce StringViewArrayBuilder::map to avoid duplication #21997

Open

Omega359 reviewed May 3, 2026

Comment thread datafusion/functions/src/string/lower.rs

Add tests for upper, per review

e88e443

alamb added this pull request to the merge queue May 4, 2026

Merged via the queue into apache:main with commit ae796ab May 4, 2026
35 checks passed
