perf: Optimize lower, upper for ASCII inputs #21980

Merged
alamb merged 3 commits into apache:main from neilconway:neilc/perf-case-conv
May 4, 2026

Conversation

@neilconway
Contributor

@neilconway neilconway commented May 1, 2026

Which issue does this PR close?

Rationale for this change

This PR implements two optimizations for lower and upper on ASCII strings:

  1. For the Utf8/LargeUtf8 code path, we previously did the case conversion via str::to_uppercase or str::to_lowercase. For ASCII inputs, it is faster to use map(u8::to_ascii_lowercase).collect() over the bytes of the string directly: although the stdlib functions are well-optimized, they need to check again on every string to see if it is ASCII. Since we know the input is all-ASCII, we can avoid that check.
  2. The Utf8View code path previously wasn't optimized for ASCII strings; add a new code path that is. As with the Utf8 code path, we can do case-conversion on bytes directly, which vectorizes well and avoids repeated ASCII checks. In addition, we can build the output StringViewArray directly, which avoids the intermediate strings and unnecessary allocations used in the previous approach.
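The first optimization can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `lower_batch` is a hypothetical helper, and a slice of `&str` stands in for an Arrow string array. It assumes the ASCII decision is made once per batch, after which each value is converted byte by byte.

```rust
/// Lowercase a batch of strings, with a byte-wise fast path for all-ASCII input.
/// `lower_batch` is a hypothetical helper for illustration, not the PR's code.
fn lower_batch(values: &[&str]) -> Vec<String> {
    // Decide once for the whole batch. In Arrow the values sit in one
    // contiguous buffer, so the real check can be a single pass over it.
    let all_ascii = values.iter().all(|s| s.is_ascii());

    values
        .iter()
        .map(|s| {
            if all_ascii {
                // Byte-wise conversion: no per-string ASCII re-check.
                let bytes: Vec<u8> =
                    s.as_bytes().iter().map(u8::to_ascii_lowercase).collect();
                // SAFETY: ASCII case conversion maps ASCII bytes to ASCII
                // bytes, so `bytes` is still valid UTF-8.
                unsafe { String::from_utf8_unchecked(bytes) }
            } else {
                // General path: full Unicode case mapping.
                s.to_lowercase()
            }
        })
        .collect()
}
```

A batch containing any non-ASCII value falls back to `str::to_lowercase` for every value, which keeps the sketch simple at the cost of the fast path for that batch.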

Benchmarks (ARM64):

upper

  • upper_all_values_are_ascii: 5.4 → 4.1 µs (−24.1%)

lower — all-ASCII (the optimized paths)

  • lower_all_values_are_ascii: 1024: 5.4 → 4.0 µs (−25.9%)
  • lower_all_values_are_ascii: 4096: 22.6 → 15.6 µs (−31.0%)
  • lower_all_values_are_ascii: 8192: 41.9 → 30.8 µs (−26.5%)
  • string_views size:4096 str_len:10 null:0 mixed:false: 151.0 → 75.3 µs (−50.1%)
  • string_views size:4096 str_len:10 null:0 mixed:true: 175.9 → 134.6 µs (−23.5%)
  • string_views size:4096 str_len:10 null:0.1 mixed:false: 143.5 → 76.8 µs (−46.5%)
  • string_views size:4096 str_len:10 null:0.1 mixed:true: 166.6 → 125.0 µs (−25.0%)
  • string_views size:4096 str_len:64 null:0 mixed:false: 150.1 → 92.7 µs (−38.2%)
  • string_views size:4096 str_len:64 null:0 mixed:true: 185.2 → 140.1 µs (−24.4%)
  • string_views size:4096 str_len:64 null:0.1 mixed:false: 136.7 → 97.0 µs (−29.0%)
  • string_views size:4096 str_len:64 null:0.1 mixed:true: 173.7 → 131.2 µs (−24.5%)
  • string_views size:4096 str_len:128 null:0 mixed:false: 190.3 → 141.7 µs (−25.5%)
  • string_views size:4096 str_len:128 null:0 mixed:true: 197.0 → 153.7 µs (−22.0%)
  • string_views size:4096 str_len:128 null:0.1 mixed:false: 173.3 → 141.7 µs (−18.2%)
  • string_views size:4096 str_len:128 null:0.1 mixed:true: 184.0 → 142.8 µs (−22.4%)
  • string_views size:8192 str_len:10 null:0 mixed:false: 302.9 → 150.2 µs (−50.4%)
  • string_views size:8192 str_len:10 null:0 mixed:true: 352.9 → 279.0 µs (−20.9%)
  • string_views size:8192 str_len:10 null:0.1 mixed:false: 285.0 → 154.3 µs (−45.9%)
  • string_views size:8192 str_len:10 null:0.1 mixed:true: 334.2 → 266.4 µs (−20.3%)
  • string_views size:8192 str_len:64 null:0 mixed:false: 295.6 → 184.4 µs (−37.6%)
  • string_views size:8192 str_len:64 null:0 mixed:true: 371.4 → 290.7 µs (−21.7%)
  • string_views size:8192 str_len:64 null:0.1 mixed:false: 273.7 → 195.1 µs (−28.7%)
  • string_views size:8192 str_len:64 null:0.1 mixed:true: 347.0 → 279.6 µs (−19.4%)
  • string_views size:8192 str_len:128 null:0 mixed:false: 379.6 → 285.6 µs (−24.8%)
  • string_views size:8192 str_len:128 null:0 mixed:true: 397.1 → 317.4 µs (−20.1%)
  • string_views size:8192 str_len:128 null:0.1 mixed:false: 364.1 → 285.1 µs (−21.7%)
  • string_views size:8192 str_len:128 null:0.1 mixed:true: 379.3 → 302.3 µs (−20.3%)
  • lower_sliced_ascii parent=65536 slice=128 str_len=32: 980.2 → 797.9 ns (−18.6%)

lower — some non-ASCII string_views (mostly noise)

  • size:4096 str_len:10 null:0 mixed:false: 374.5 → 362.2 µs (−3.3%)
  • size:4096 str_len:10 null:0 mixed:true: 374.6 → 380.5 µs (+1.6%)
  • size:4096 str_len:10 null:0.1 mixed:false: 340.8 → 356.5 µs (+4.6%)
  • size:4096 str_len:10 null:0.1 mixed:true: 344.0 → 352.5 µs (+2.5%)
  • size:4096 str_len:64 null:0 mixed:false: 377.5 → 373.5 µs (−1.1%)
  • size:4096 str_len:64 null:0 mixed:true: 380.6 → 375.0 µs (−1.5%)
  • size:4096 str_len:64 null:0.1 mixed:false: 330.7 → 341.8 µs (+3.4%)
  • size:4096 str_len:64 null:0.1 mixed:true: 341.8 → 354.2 µs (+3.6%)
  • size:4096 str_len:128 null:0 mixed:false: 371.8 → 356.2 µs (−4.2%)
  • size:4096 str_len:128 null:0 mixed:true: 378.9 → 386.0 µs (+1.9%)
  • size:4096 str_len:128 null:0.1 mixed:false: 350.5 → 350.3 µs (−0.1%)
  • size:4096 str_len:128 null:0.1 mixed:true: 351.0 → 337.9 µs (−3.7%)
  • size:8192 str_len:10 null:0 mixed:false: 740.0 → 757.2 µs (+2.3%)
  • size:8192 str_len:10 null:0 mixed:true: 781.3 → 750.2 µs (−4.0%)
  • size:8192 str_len:10 null:0.1 mixed:false: 693.7 → 693.7 µs (0.0%)
  • size:8192 str_len:10 null:0.1 mixed:true: 681.5 → 705.2 µs (+3.5%)
  • size:8192 str_len:64 null:0 mixed:false: 755.5 → 768.6 µs (+1.7%)
  • size:8192 str_len:64 null:0 mixed:true: 759.6 → 754.3 µs (−0.7%)
  • size:8192 str_len:64 null:0.1 mixed:false: 711.5 → 667.8 µs (−6.1%)
  • size:8192 str_len:64 null:0.1 mixed:true: 682.1 → 688.2 µs (+0.9%)
  • size:8192 str_len:128 null:0 mixed:false: 771.5 → 765.9 µs (−0.7%)
  • size:8192 str_len:128 null:0 mixed:true: 747.7 → 792.6 µs (+6.0%)
  • size:8192 str_len:128 null:0.1 mixed:false: 687.1 → 701.3 µs (+2.1%)
  • size:8192 str_len:128 null:0.1 mixed:true: 679.2 → 696.8 µs (+2.6%)

lower — first/middle non-ASCII (flat)

  • lower_the_first_value_is_nonascii: 1024: 42.1 → 42.4 µs (+0.7%)
  • lower_the_first_value_is_nonascii: 4096: 173.9 → 173.3 µs (−0.3%)
  • lower_the_first_value_is_nonascii: 8192: 350.8 → 349.3 µs (−0.4%)
  • lower_the_middle_value_is_nonascii: 1024: 42.9 → 42.8 µs (−0.2%)
  • lower_the_middle_value_is_nonascii: 4096: 175.1 → 176.3 µs (+0.7%)
  • lower_the_middle_value_is_nonascii: 8192: 353.6 → 354.6 µs (+0.3%)
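The percentages in these lists are relative changes in time, so a negative value means the branch is faster. A small sketch of that arithmetic follows; the function names are hypothetical and the one-decimal rounding is an assumption that matches the figures shown.

```rust
/// Relative change from `before` to `after`, in percent.
/// Negative values mean the new measurement is faster.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// Round to one decimal place, as the lists above appear to do.
fn round1(x: f64) -> f64 {
    (x * 10.0).round() / 10.0
}
```

For example, `round1(percent_change(5.4, 4.1))` reproduces the figure reported for `upper_all_values_are_ascii`.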

What changes are included in this PR?

  • Implement optimizations
  • Share StringViewArray buffer size constants with the bulk-NULL builders

Are these changes tested?

Covered by existing tests.

Are there any user-facing changes?

No.

@github-actions github-actions Bot added the functions Changes to functions implementation label May 1, 2026
Comment thread datafusion/functions/src/string/common.rs
Contributor

@comphead comphead left a comment

Thanks @neilconway
I believe the optimization is tightly connected to the specifics of German strings, and it would be nice to comment the byte-level work.

@neilconway
Contributor Author

Thanks for the review, @comphead! Please let me know if you have more feedback.

FYI I think there's an opportunity to refactor this into an extension to the StringViewArrayBuilder API we just added (and perhaps use it in some other string UDFs), but I'd like to land this change first, so we can be sure the refactoring doesn't regress codegen.

@alamb
Contributor

alamb commented May 3, 2026

> FYI I think there's an opportunity to refactor this into an extension to the StringViewArrayBuilder API we just added (and perhaps use it in some other string UDFs), but I'd like to land this change first, so we can be sure the refactoring doesn't regress codegen.

Do you mean for special case ASCII strings?

@alamb alamb added the performance Make DataFusion faster label May 3, 2026
Contributor

@alamb alamb left a comment

The code makes sense to me -- thank you @neilconway

I agree with @comphead that making some better API / helpers to encapsulate this type of StringBuilder would be worthwhile.

}
let mut bytes = view.to_le_bytes();
if len <= 12 {
// Inline: value is in bytes[4..4+len], no buffer reference. Convert
Contributor

This pattern would be really nice to abstract -- namely a method that makes a new StringArray where it is guaranteed that each output value will be exactly as long as the input value.

Upper/lower are good examples. Maybe also translate with single characters

Contributor

Maybe also initcap
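The inline case in the snippet under review (`len <= 12`, value in `bytes[4..4+len]`) can be illustrated on a bare `u128`. This is a sketch under assumptions: it follows the Arrow view layout as described in that code comment (length in the low 4 bytes, little-endian, then up to 12 inline data bytes), and both function names are hypothetical rather than taken from the PR.

```rust
/// Build an inline view for a short value (at most 12 bytes).
/// Layout assumed: bytes[0..4] = length (little-endian u32),
/// bytes[4..4+len] = the value, remaining bytes zero.
fn make_inline_view(value: &[u8]) -> u128 {
    assert!(value.len() <= 12, "only short values are stored inline");
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    bytes[4..4 + value.len()].copy_from_slice(value);
    u128::from_le_bytes(bytes)
}

/// Uppercase an inline, all-ASCII view in place. The length field is
/// untouched because ASCII case conversion preserves byte length.
fn upper_inline_view(view: u128) -> u128 {
    // The low 32 bits of the view hold the length.
    let len = (view as u32) as usize;
    assert!(len <= 12, "this sketch handles inline views only");
    let mut bytes = view.to_le_bytes();
    bytes[4..4 + len].make_ascii_uppercase();
    u128::from_le_bytes(bytes)
}
```

Because the output has the same length as the input, the converted view is a complete result on its own and needs no data buffer.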

let mut completed: Vec<Buffer> = Vec::new();
let mut block_size: u32 = STRING_VIEW_INIT_BLOCK_SIZE;

for i in 0..item_len {
Contributor

I think you could avoid a lot of this copy / paste by using a StringViewBuilder with the append_block and append_view_unchecked methods

https://docs.rs/arrow/latest/arrow/array/type.StringViewBuilder.html#method.append_block

That being said, I do think it would be slightly slower than this implementation because it would have to re-check the length

It almost seems like what we want is some sort of API on StringViewArray itself, similar to https://docs.rs/arrow/latest/arrow/array/struct.PrimitiveArray.html#method.unary

So this code could be written something like

let new_array = orig_array.map_values(convert)

That would also let us do potentially crazy things like reuse the buffer allocations and modify the values in place if they weren't shared 🤔

If that makes sense to you I can file a ticket in arrow-rs perhaps.
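To make the `map_values` idea concrete, here is a toy model using only the standard library. Everything in it is an assumption for illustration: `ToyStringArray` is not an Arrow type, it uses a simple offsets layout rather than views, and `map_values` is a hypothetical method that does not exist in arrow-rs. The point it shows is that a length-preserving conversion can reuse the layout metadata unchanged.

```rust
/// Toy stand-in for a string array: values stored back to back,
/// with value `i` occupying `data[offsets[i]..offsets[i + 1]]`.
struct ToyStringArray {
    data: Vec<u8>,
    offsets: Vec<usize>,
}

impl ToyStringArray {
    fn from_strs(values: &[&str]) -> Self {
        let mut data = Vec::new();
        let mut offsets = vec![0];
        for v in values {
            data.extend_from_slice(v.as_bytes());
            offsets.push(data.len());
        }
        ToyStringArray { data, offsets }
    }

    fn value(&self, i: usize) -> &str {
        let bytes = &self.data[self.offsets[i]..self.offsets[i + 1]];
        std::str::from_utf8(bytes).expect("values must stay valid UTF-8")
    }

    /// Hypothetical `map_values`: apply a byte-to-byte conversion.
    /// Every output value has its input's length, so the offsets are
    /// copied as-is. The caller must pass a conversion that keeps the
    /// data valid UTF-8 (for example ASCII case mapping on ASCII input).
    fn map_values(&self, convert: impl Fn(u8) -> u8) -> Self {
        ToyStringArray {
            data: self.data.iter().map(|&b| convert(b)).collect(),
            offsets: self.offsets.clone(),
        }
    }
}
```

With this shape, `array.map_values(|b| b.to_ascii_uppercase())` is the whole of an ASCII `upper`; an in-place variant would mutate `data` directly when the buffer is not shared.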

Comment thread datafusion/functions/src/strings.rs
@alamb
Contributor

alamb commented May 3, 2026

run benchmarks upper, lower

@adriangbot

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4366153592-1993-6mld9 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing neilc/perf-case-conv (4ec33f6) to ba038e9 (merge-base) diff
BENCH_NAME=upper
BENCH_COMMAND=cargo bench --features=parquet --bench upper
BENCH_FILTER=
Results will be posted here when complete


@adriangbot

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4366153592-1994-4wvqg 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing neilc/perf-case-conv (4ec33f6) to ba038e9 (merge-base) diff
BENCH_NAME=lower
BENCH_COMMAND=cargo bench --features=parquet --bench lower
BENCH_FILTER=
Results will be posted here when complete


@adriangbot

🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                         main                                   neilc_perf-case-conv
-----                         ----                                   --------------------
upper_all_values_are_ascii    1.87     27.4±2.63µs        ? ?/sec    1.00     14.7±0.04µs        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 40.0s
Peak memory 4.2 GiB
Avg memory 4.2 GiB
CPU user 43.2s
CPU sys 0.7s
Peak spill 0 B

branch

Metric Value
Wall time 30.0s
Peak memory 4.2 GiB
Avg memory 4.2 GiB
CPU user 35.9s
CPU sys 0.2s
Peak spill 0 B

@neilconway
Contributor Author

> FYI I think there's an opportunity to refactor this into an extension to the StringViewArrayBuilder API we just added (and perhaps use it in some other string UDFs), but I'd like to land this change first, so we can be sure the refactoring doesn't regress codegen.

> Do you mean for special case ASCII strings?

I meant something similar to what you suggested in the PR -- for example, a function like:

  impl StringViewArrayBuilder {
      /// Reserve `len` bytes of output storage and let the caller fill them.
      pub fn append_with<F: FnOnce(&mut [u8])>(&mut self, len: usize, fill: F);
  }

We could use that in reverse, translate, initcap, and lower/upper. I want to check whether it regresses performance against the hand-optimized version though.
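A toy version of the proposed `append_with` shows how a caller would use it. This is only a sketch: `ToyViewBuilder` is a hypothetical stand-in for `StringViewArrayBuilder`, it stores `(offset, len)` pairs instead of real views, and `append_upper` is an illustrative caller assuming ASCII input.

```rust
/// Toy stand-in for a string-view builder: one growing data buffer
/// plus an (offset, len) record per appended value.
struct ToyViewBuilder {
    data: Vec<u8>,
    ranges: Vec<(usize, usize)>,
}

impl ToyViewBuilder {
    fn new() -> Self {
        ToyViewBuilder { data: Vec::new(), ranges: Vec::new() }
    }

    /// Reserve `len` bytes of output storage and let the caller fill them.
    fn append_with<F: FnOnce(&mut [u8])>(&mut self, len: usize, fill: F) {
        let start = self.data.len();
        // Zero-initialise the reserved region, then hand it to the caller.
        self.data.resize(start + len, 0);
        fill(&mut self.data[start..start + len]);
        self.ranges.push((start, len));
    }

    fn value(&self, i: usize) -> &[u8] {
        let (start, len) = self.ranges[i];
        &self.data[start..start + len]
    }
}

/// Example caller: ASCII `upper`, writing straight into builder storage
/// with no intermediate String.
fn append_upper(builder: &mut ToyViewBuilder, s: &str) {
    builder.append_with(s.len(), |out| {
        out.copy_from_slice(s.as_bytes());
        out.make_ascii_uppercase();
    });
}
```

The same shape would cover `reverse` or `translate` on ASCII input by changing only the closure body; whether it matches the hand-written loop in generated code is the open question raised above.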

@alamb
Contributor

alamb commented May 3, 2026

@adriangbot

🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                                                                                             main                                   neilc_perf-case-conv
-----                                                                                                                             ----                                   --------------------
lower_all_values_are_ascii: 1024                                                                                                  1.36      2.7±0.00µs        ? ?/sec    1.00      2.0±0.00µs        ? ?/sec
lower_all_values_are_ascii: 4096                                                                                                  1.79     13.4±1.97µs        ? ?/sec    1.00      7.5±0.00µs        ? ?/sec
lower_all_values_are_ascii: 8192                                                                                                  1.00     21.1±1.98µs        ? ?/sec    1.03     21.8±1.07µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 10, null_density: 0, mixed: false                                   1.61     91.5±0.62µs        ? ?/sec    1.00     56.7±0.01µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 10, null_density: 0, mixed: true                                    1.49    107.3±1.14µs        ? ?/sec    1.00     72.2±0.09µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 10, null_density: 0.1, mixed: false                                 1.53     85.3±0.41µs        ? ?/sec    1.00     55.7±0.05µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 10, null_density: 0.1, mixed: true                                  1.44     96.5±0.53µs        ? ?/sec    1.00     67.1±0.77µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 128, null_density: 0, mixed: false                                  1.56    100.3±0.16µs        ? ?/sec    1.00     64.2±0.15µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 128, null_density: 0, mixed: true                                   1.48    117.0±0.30µs        ? ?/sec    1.00     78.8±0.24µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 128, null_density: 0.1, mixed: false                                1.44     93.2±0.15µs        ? ?/sec    1.00     64.8±0.09µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 128, null_density: 0.1, mixed: true                                 1.46    105.0±0.36µs        ? ?/sec    1.00     71.7±0.19µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 64, null_density: 0, mixed: false                                   1.95     84.2±0.45µs        ? ?/sec    1.00     43.2±0.08µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 64, null_density: 0, mixed: true                                    1.53    111.4±0.37µs        ? ?/sec    1.00     73.0±0.18µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 64, null_density: 0.1, mixed: false                                 1.78     79.4±0.43µs        ? ?/sec    1.00     44.7±0.06µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 4096, str_len: 64, null_density: 0.1, mixed: true                                  1.51     99.3±0.39µs        ? ?/sec    1.00     65.9±0.17µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 10, null_density: 0, mixed: false                                   1.55    183.4±1.50µs        ? ?/sec    1.00    118.2±1.08µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 10, null_density: 0, mixed: true                                    1.31    222.8±0.51µs        ? ?/sec    1.00    169.8±0.40µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 10, null_density: 0.1, mixed: false                                 1.53    171.3±0.91µs        ? ?/sec    1.00    111.6±0.05µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 10, null_density: 0.1, mixed: true                                  1.27    201.4±0.41µs        ? ?/sec    1.00    158.0±0.51µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 128, null_density: 0, mixed: false                                  1.50    199.8±0.48µs        ? ?/sec    1.00    133.5±0.23µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 128, null_density: 0, mixed: true                                   1.23    246.4±0.72µs        ? ?/sec    1.00    199.9±0.52µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 128, null_density: 0.1, mixed: false                                1.42    186.9±0.34µs        ? ?/sec    1.00    132.0±0.33µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 128, null_density: 0.1, mixed: true                                 1.19    222.1±0.61µs        ? ?/sec    1.00    185.9±1.11µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 64, null_density: 0, mixed: false                                   1.93    168.4±1.08µs        ? ?/sec    1.00     87.4±0.14µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 64, null_density: 0, mixed: true                                    1.26    233.0±0.56µs        ? ?/sec    1.00    185.1±0.48µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 64, null_density: 0.1, mixed: false                                 1.73    158.2±0.81µs        ? ?/sec    1.00     91.5±0.12µs        ? ?/sec
lower_all_values_are_ascii_string_views: size: 8192, str_len: 64, null_density: 0.1, mixed: true                                  1.24    213.2±0.92µs        ? ?/sec    1.00    171.9±0.45µs        ? ?/sec
lower_sliced_ascii: parent=65536, slice=128, str_len=32                                                                           1.17    601.0±2.15ns        ? ?/sec    1.00    512.7±0.82ns        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 10, non_ascii_density: 0.1, null_density: 0, mixed: false       1.04    226.6±0.69µs        ? ?/sec    1.00    218.6±0.76µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 10, non_ascii_density: 0.1, null_density: 0, mixed: true        1.00    217.1±0.57µs        ? ?/sec    1.02    221.4±0.46µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 10, non_ascii_density: 0.1, null_density: 0.1, mixed: false     1.00    198.2±0.61µs        ? ?/sec    1.05    208.2±0.53µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 10, non_ascii_density: 0.1, null_density: 0.1, mixed: true      1.00    201.5±0.52µs        ? ?/sec    1.05    212.4±0.49µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 128, non_ascii_density: 0.1, null_density: 0, mixed: false      1.01    223.3±0.76µs        ? ?/sec    1.00    221.1±0.53µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 128, non_ascii_density: 0.1, null_density: 0, mixed: true       1.08    236.7±0.91µs        ? ?/sec    1.00    219.3±0.53µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 128, non_ascii_density: 0.1, null_density: 0.1, mixed: false    1.00    193.6±0.36µs        ? ?/sec    1.09    210.2±0.57µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 128, non_ascii_density: 0.1, null_density: 0.1, mixed: true     1.00    197.8±0.38µs        ? ?/sec    1.02    202.3±0.52µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 64, non_ascii_density: 0.1, null_density: 0, mixed: false       1.00    221.0±0.54µs        ? ?/sec    1.02    225.5±0.73µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 64, non_ascii_density: 0.1, null_density: 0, mixed: true        1.00    221.5±0.85µs        ? ?/sec    1.00    221.3±0.48µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 64, non_ascii_density: 0.1, null_density: 0.1, mixed: false     1.00    203.6±0.42µs        ? ?/sec    1.01    206.4±0.78µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 4096, str_len: 64, non_ascii_density: 0.1, null_density: 0.1, mixed: true      1.03    192.8±0.47µs        ? ?/sec    1.00    187.6±0.56µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 10, non_ascii_density: 0.1, null_density: 0, mixed: false       1.00    448.2±1.20µs        ? ?/sec    1.00    446.1±1.03µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 10, non_ascii_density: 0.1, null_density: 0, mixed: true        1.01    445.1±1.47µs        ? ?/sec    1.00    440.1±1.22µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 10, non_ascii_density: 0.1, null_density: 0.1, mixed: false     1.00    395.7±0.88µs        ? ?/sec    1.04    411.3±1.26µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 10, non_ascii_density: 0.1, null_density: 0.1, mixed: true      1.00    387.9±1.18µs        ? ?/sec    1.06    410.7±0.99µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 128, non_ascii_density: 0.1, null_density: 0, mixed: false      1.02    458.1±1.72µs        ? ?/sec    1.00    450.6±1.26µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 128, non_ascii_density: 0.1, null_density: 0, mixed: true       1.00    448.6±1.40µs        ? ?/sec    1.01    451.2±1.55µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 128, non_ascii_density: 0.1, null_density: 0.1, mixed: false    1.00    408.4±1.22µs        ? ?/sec    1.01    412.2±1.06µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 128, non_ascii_density: 0.1, null_density: 0.1, mixed: true     1.00    399.3±1.53µs        ? ?/sec    1.02    406.7±1.05µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 64, non_ascii_density: 0.1, null_density: 0, mixed: false       1.03    456.0±1.88µs        ? ?/sec    1.00    442.1±0.89µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 64, non_ascii_density: 0.1, null_density: 0, mixed: true        1.00    445.6±1.37µs        ? ?/sec    1.02    453.2±1.35µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 64, non_ascii_density: 0.1, null_density: 0.1, mixed: false     1.00    404.2±1.02µs        ? ?/sec    1.02    413.6±1.32µs        ? ?/sec
lower_some_values_are_nonascii_string_views: size: 8192, str_len: 64, non_ascii_density: 0.1, null_density: 0.1, mixed: true      1.00    409.2±1.03µs        ? ?/sec    1.01    412.9±1.00µs        ? ?/sec
lower_the_first_value_is_nonascii: 1024                                                                                           1.00     26.4±0.06µs        ? ?/sec    1.00     26.4±0.12µs        ? ?/sec
lower_the_first_value_is_nonascii: 4096                                                                                           1.00    107.4±0.24µs        ? ?/sec    1.00    107.6±0.40µs        ? ?/sec
lower_the_first_value_is_nonascii: 8192                                                                                           1.00    215.2±0.54µs        ? ?/sec    1.00    215.8±0.85µs        ? ?/sec
lower_the_middle_value_is_nonascii: 1024                                                                                          1.00     26.9±0.04µs        ? ?/sec    1.00     26.8±0.10µs        ? ?/sec
lower_the_middle_value_is_nonascii: 4096                                                                                          1.00    108.6±0.22µs        ? ?/sec    1.00    109.0±0.37µs        ? ?/sec
lower_the_middle_value_is_nonascii: 8192                                                                                          1.00    218.1±0.43µs        ? ?/sec    1.00    217.5±0.85µs        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 795.2s
Peak memory 4.2 GiB
Avg memory 4.2 GiB
CPU user 1006.4s
CPU sys 1.6s
Peak spill 0 B

branch

Metric Value
Wall time 805.2s
Peak memory 4.2 GiB
Avg memory 4.2 GiB
CPU user 1016.9s
CPU sys 1.1s
Peak spill 0 B

Comment thread datafusion/functions/src/string/lower.rs
@alamb
Contributor

alamb commented May 4, 2026

Thanks @neilconway and @Omega359

@alamb alamb added this pull request to the merge queue May 4, 2026
Merged via the queue into apache:main with commit ae796ab May 4, 2026
35 checks passed
Labels

functions Changes to functions implementation performance Make DataFusion faster

Development

Successfully merging this pull request may close these issues.

lower, upper could be further optimized for ASCII-only inputs

5 participants