Optimize `regexp_replace` by stripping trailing .* from anchored patterns. 2.4x improvement (ClickBench Q28) by Dandandan · Pull Request #21379 · apache/datafusion

Dandandan · 2026-04-05T06:14:31Z

Which issue does this PR close?

Closes: Optimize regexp_replace by stripping trailing .* from anchored patterns #21382

Rationale for this change

For anchored patterns such as ^https?://(?:www\.)?([^/]+)/.*$, regexp_replace spends time scanning the trailing .*$ and goes through captures() + expand(), which allocates a String for every row.

A query of the form SELECT regexp_replace(url, '^https?://(?:www\.)?([^/]+)/.*$', '\1'), as in ClickBench Q28, happens to hit this case and runs about 2.4x faster with the optimization.

What changes are included in this PR?

Strip trailing .*$ from the pattern string for anchored patterns where the replacement is \1
Use captures_read with pre-allocated CaptureLocations for direct byte-slice extraction
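
A minimal sketch of the two changes above, written in Python with the standard `re` module as a stand-in for the Rust regex crate. The function names are hypothetical and this is not the PR's code. It strips a trailing `.*$` from an anchored pattern and slices capture group 1 out of the input by its span, falling back to the generic replacement whenever the shortcut is not obviously safe. It makes no attempt to handle top-level alternation or inline flags.

```python
import re
from typing import Optional


def strip_trailing_dot_star(pattern: str) -> Optional[str]:
    """Return `pattern` without its trailing `.*$`, or None if the shape does not fit."""
    if not (pattern.startswith("^") and pattern.endswith(".*$")):
        return None
    prefix = pattern[: -len(".*$")]
    # A backslash right before the dot means an escaped literal dot, not a wildcard.
    if prefix.endswith("\\"):
        return None
    return prefix


def replace_with_group1(pattern: str, text: str) -> str:
    """Rough equivalent of regexp_replace(text, pattern, group 1) for URL-style patterns."""
    prefix = strip_trailing_dot_star(pattern)
    # Without a dot-matches-newline flag, `.*$` cannot cross a newline, so inputs
    # that contain one take the generic path.
    if prefix is None or "\n" in text:
        return re.sub(pattern, r"\1", text, count=1)
    m = re.match(prefix, text)
    if m is None:
        return text  # no match: the input is returned unchanged
    start, end = m.span(1)
    # span(1) is (-1, -1) when group 1 did not participate; substitute "" then.
    return text[start:end] if start >= 0 else ""
```

With the Q28 pattern, `replace_with_group1(r"^https?://(?:www\.)?([^/]+)/.*$", "https://www.example.com/a/b")` returns `example.com`, and an input that does not match is returned unchanged.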

Are these changes tested?

Yes, covered by existing regexp_replace unit tests, ClickBench sqllogictests, and the new URL domain extraction sqllogictest.

Are there any user-facing changes?

No.

alamb · 2026-04-06T16:01:37Z

│ QQuery 28 │ 7790.05 / 7848.59 ±33.05 / 7892.04 ms │ 3206.58 / 3225.44 ±15.28 / 3245.49 ms │ +2.43x faster │

datafusion/benchmarks/queries/clickbench/queries/q28.sql, line 4 in c17c87c:

SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM hits WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;

Seems to make a non-trivial difference.
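
The reported factor appears to be the ratio of the two middle values in the row, which look like means given the ± that follows them. This is an inference from the table layout, not something the benchmark output states. A small Python check, using a hypothetical `mean_ms` parser:

```python
def mean_ms(cell: str) -> float:
    """Extract the middle value from a cell shaped like 'a / b ±c / d ms'."""
    middle = cell.split("/")[1]         # ' 7848.59 ±33.05 '
    return float(middle.split("±")[0])  # float() tolerates the surrounding spaces


main_cell = "7790.05 / 7848.59 ±33.05 / 7892.04 ms"
branch_cell = "3206.58 / 3225.44 ±15.28 / 3245.49 ms"

speedup = mean_ms(main_cell) / mean_ms(branch_cell)
print(f"{speedup:.2f}x")  # 2.43x
```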

alamb · 2026-04-06T16:16:26Z

I don't think this rewrite is correct in all cases.

Here are some .slt tests that pass on main but fail on this PR:

# If the overall pattern matches but capture group 1 does not participate,
# regexp_replace(..., '\1') should substitute the empty string, not keep
# the original input.
query B
SELECT regexp_replace('bzzz', '^(a)?b.*$', '\1') = '';
----
true

# Stripping trailing .*$ must not change match semantics for inputs with
# newlines when the original pattern does not use the 's' flag.
query B
SELECT regexp_replace(concat('http://x/', chr(10), 'rest'), '^https?://([^/]+)/.*$', '\1')
       = concat('http://x/', chr(10), 'rest');
----
true

(they return false on this branch)
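
The two failures can be reproduced outside DataFusion. The sketch below uses Python's `re` module as a stand-in for the Rust regex engine, together with a hypothetical `naive_shortcut` helper that models the behaviour the test comments describe. It is not the PR's code: it simply strips `.*$`, matches the prefix, and keeps the input when group 1 is absent.

```python
import re


def reference(pattern: str, text: str) -> str:
    """Generic path: replace the first match of the full pattern with group 1."""
    return re.sub(pattern, r"\1", text, count=1)


def naive_shortcut(pattern: str, text: str) -> str:
    """Hypothetical, deliberately unsafe shortcut: drop `.*$` and return group 1."""
    assert pattern.endswith(".*$")
    m = re.match(pattern[:-3], text)
    if m is None or m.group(1) is None:
        return text  # wrong when the full pattern matched but group 1 did not
    return m.group(1)  # wrong when `.*$` would have failed at a newline


optional_group = r"^(a)?b.*$"
url = r"^https?://([^/]+)/.*$"
with_newline = "http://x/\nrest"

print(repr(reference(optional_group, "bzzz")))       # ''
print(repr(naive_shortcut(optional_group, "bzzz")))  # 'bzzz'
print(repr(reference(url, with_newline)))            # 'http://x/\nrest'
print(repr(naive_shortcut(url, with_newline)))       # 'x'
```

In both cases the shortcut disagrees with the generic path: once because an unmatched group should expand to the empty string, and once because `.` does not match a newline, so the full pattern never matches.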

adriangb · 2026-04-06T16:34:31Z

SELECT regexp_replace(concat('http://x/', chr(10), 'rest'), '^https?://([^/]+)/.*$', '\1')
       = concat('http://x/', chr(10), 'rest');

FWIW I checked and DuckDB also returns true. Maybe we should commit these tests? Or have some sort of regex fuzz tests?
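
One way to approximate the fuzz-test idea is a differential check that runs a reference implementation and a candidate over many small inputs and reports every disagreement. The sketch below is in Python, enumerates inputs exhaustively over a tiny alphabet rather than sampling at random, and every name in it is hypothetical.

```python
import itertools
import re
from typing import Callable, List


def find_mismatches(
    reference: Callable[[str], str],
    candidate: Callable[[str], str],
    alphabet: str,
    max_len: int,
) -> List[str]:
    """Return every input of up to `max_len` characters on which the two disagree."""
    mismatches = []
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            text = "".join(chars)
            if reference(text) != candidate(text):
                mismatches.append(text)
    return mismatches


PATTERN = r"^(a)?b.*$"


def generic_replace(text: str) -> str:
    # Reference behaviour: replace the first full-pattern match with group 1.
    return re.sub(PATTERN, r"\1", text, count=1)


def shortcut_replace(text: str) -> str:
    # Hypothetical shortcut under test: match only the prefix and keep the
    # input when group 1 is absent.
    m = re.match(r"^(a)?b", text)
    return text if m is None or m.group(1) is None else m.group(1)


bad_inputs = find_mismatches(generic_replace, shortcut_replace, alphabet="ab\n", max_len=3)
```

With this pattern, `bad_inputs` includes `"b"`, the shortest input on which the pattern matches without group 1 participating. Including the newline in the alphabet also exercises the `.` versus newline case.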

Dandandan · 2026-04-06T17:09:44Z

Hmm, I need to spend some more time to make these cases work (but I think the optimization is still correct for this case; we mostly need to exclude the other cases).

alamb · 2026-04-08T20:13:05Z

(I haven't added coverage for the Utf8View branch, as that will be a larger change. However, I think we can do it as a follow-on PR. I'll make a proposal.)

adriangbot · 2026-04-08T20:55:41Z

🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Details

group                           main                                   optimize-regexp-replace-v2
-----                           ----                                   --------------------------
regexp_count_1000 string        1.00      2.7±0.04ms        ? ?/sec    1.00      2.7±0.01ms        ? ?/sec
regexp_count_1000 utf8view      1.00      2.8±0.01ms        ? ?/sec    1.01      2.8±0.01ms        ? ?/sec
regexp_instr_1000 string        1.00      3.3±0.01ms        ? ?/sec    1.01      3.3±0.01ms        ? ?/sec
regexp_instr_1000 utf8view      1.00      3.3±0.01ms        ? ?/sec    1.00      3.3±0.01ms        ? ?/sec
regexp_like scalar utf8         1.01     15.5±0.04µs        ? ?/sec    1.00     15.4±0.01µs        ? ?/sec
regexp_like_1000                1.00      2.8±0.01ms        ? ?/sec    1.00      2.8±0.01ms        ? ?/sec
regexp_like_1000 utf8view       1.00      2.8±0.01ms        ? ?/sec    1.00      2.8±0.01ms        ? ?/sec
regexp_match_1000               1.00      3.4±0.01ms        ? ?/sec    1.00      3.4±0.01ms        ? ?/sec
regexp_match_1000 utf8view      1.00      3.4±0.02ms        ? ?/sec    1.00      3.4±0.01ms        ? ?/sec
regexp_replace_1000             1.00      2.6±0.01ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
regexp_replace_1000 utf8view    1.00      2.6±0.01ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	112.0s
Peak memory	3.5 GiB
Avg memory	3.5 GiB
CPU user	135.5s
CPU sys	0.9s
Peak spill	0 B

branch

Metric	Value
Wall time	110.8s
Peak memory	3.5 GiB
Avg memory	3.5 GiB
CPU user	135.3s
CPU sys	0.2s
Peak spill	0 B

File an issue against this benchmark runner

adriangbot · 2026-04-08T20:56:04Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4209590481-972-5z5zs 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing optimize-regexp-replace-v2 (114eec6) to 603bfb4 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

alamb · 2026-04-08T21:04:18Z

Added test coverage in

Add more regexp_replace test coverage #21485

adriangbot · 2026-04-08T21:07:39Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and optimize-regexp-replace-v2
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃            optimize-regexp-replace-v2 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.20 / 4.38 ±6.27 / 16.91 ms │          1.20 / 4.38 ±6.29 / 16.96 ms │     no change │
│ QQuery 1  │        14.30 / 14.51 ±0.16 / 14.75 ms │        14.17 / 14.59 ±0.22 / 14.79 ms │     no change │
│ QQuery 2  │        43.27 / 43.62 ±0.30 / 44.12 ms │        43.71 / 43.99 ±0.25 / 44.39 ms │     no change │
│ QQuery 3  │        40.94 / 44.31 ±2.51 / 47.32 ms │        43.12 / 43.94 ±0.90 / 45.35 ms │     no change │
│ QQuery 4  │     271.70 / 282.50 ±5.48 / 286.38 ms │     283.20 / 292.29 ±7.09 / 304.73 ms │     no change │
│ QQuery 5  │     335.72 / 340.03 ±4.12 / 347.73 ms │     341.81 / 342.57 ±0.69 / 343.54 ms │     no change │
│ QQuery 6  │           5.30 / 6.55 ±0.89 / 7.94 ms │           6.02 / 7.01 ±0.92 / 8.31 ms │  1.07x slower │
│ QQuery 7  │        16.41 / 17.03 ±1.12 / 19.26 ms │        16.51 / 17.21 ±0.84 / 18.63 ms │     no change │
│ QQuery 8  │     402.01 / 411.16 ±5.28 / 417.77 ms │     405.69 / 413.66 ±6.14 / 423.67 ms │     no change │
│ QQuery 9  │     630.78 / 636.43 ±5.52 / 645.07 ms │     619.01 / 629.59 ±5.60 / 634.80 ms │     no change │
│ QQuery 10 │        89.16 / 91.74 ±2.05 / 94.80 ms │        89.34 / 92.21 ±2.80 / 97.29 ms │     no change │
│ QQuery 11 │     102.80 / 103.36 ±0.58 / 104.43 ms │     101.95 / 104.43 ±1.43 / 105.87 ms │     no change │
│ QQuery 12 │     327.96 / 333.37 ±5.50 / 343.97 ms │    335.73 / 343.62 ±10.53 / 364.04 ms │     no change │
│ QQuery 13 │     452.15 / 457.11 ±3.28 / 461.93 ms │    438.48 / 473.84 ±20.28 / 492.81 ms │     no change │
│ QQuery 14 │     338.28 / 346.14 ±6.51 / 356.60 ms │     337.45 / 341.81 ±3.33 / 346.91 ms │     no change │
│ QQuery 15 │    343.26 / 366.04 ±18.72 / 389.48 ms │    345.56 / 358.72 ±12.00 / 380.38 ms │     no change │
│ QQuery 16 │     713.36 / 724.01 ±7.79 / 736.65 ms │    704.38 / 758.37 ±40.57 / 829.75 ms │     no change │
│ QQuery 17 │     694.46 / 698.81 ±4.00 / 706.03 ms │    703.65 / 717.02 ±14.63 / 743.46 ms │     no change │
│ QQuery 18 │ 1373.14 / 1434.86 ±43.84 / 1507.14 ms │ 1373.61 / 1442.32 ±45.52 / 1494.11 ms │     no change │
│ QQuery 19 │        35.39 / 36.55 ±1.15 / 38.75 ms │       35.10 / 46.02 ±20.11 / 86.21 ms │  1.26x slower │
│ QQuery 20 │    717.18 / 741.48 ±26.08 / 786.77 ms │    715.57 / 726.72 ±16.58 / 759.07 ms │     no change │
│ QQuery 21 │     758.99 / 766.60 ±7.79 / 781.09 ms │     761.98 / 765.58 ±3.57 / 772.22 ms │     no change │
│ QQuery 22 │  1123.97 / 1128.41 ±4.64 / 1137.36 ms │  1121.14 / 1124.65 ±3.66 / 1131.57 ms │     no change │
│ QQuery 23 │ 3034.05 / 3053.51 ±19.65 / 3088.19 ms │  3038.58 / 3049.51 ±6.15 / 3056.19 ms │     no change │
│ QQuery 24 │      97.77 / 102.07 ±2.17 / 103.58 ms │     100.09 / 102.44 ±1.89 / 105.60 ms │     no change │
│ QQuery 25 │     137.16 / 139.00 ±2.32 / 143.48 ms │     136.49 / 139.85 ±2.63 / 142.66 ms │     no change │
│ QQuery 26 │      96.41 / 101.40 ±2.66 / 103.74 ms │      99.03 / 101.26 ±1.93 / 104.27 ms │     no change │
│ QQuery 27 │     847.75 / 852.17 ±5.33 / 862.63 ms │     843.38 / 851.47 ±6.80 / 863.62 ms │     no change │
│ QQuery 28 │ 7696.18 / 7723.18 ±18.37 / 7742.53 ms │ 3220.02 / 3250.82 ±17.33 / 3270.96 ms │ +2.38x faster │
│ QQuery 29 │        48.96 / 54.24 ±6.15 / 64.34 ms │      50.13 / 72.07 ±34.06 / 139.71 ms │  1.33x slower │
│ QQuery 30 │     357.93 / 363.05 ±6.06 / 370.51 ms │     356.22 / 361.23 ±6.42 / 373.18 ms │     no change │
│ QQuery 31 │    352.75 / 361.42 ±12.63 / 386.31 ms │     367.62 / 381.91 ±9.42 / 394.24 ms │  1.06x slower │
│ QQuery 32 │ 1246.81 / 1274.63 ±21.22 / 1308.20 ms │ 1009.74 / 1051.97 ±39.57 / 1108.01 ms │ +1.21x faster │
│ QQuery 33 │ 1476.77 / 1524.69 ±35.22 / 1581.57 ms │ 1421.25 / 1434.90 ±13.54 / 1460.60 ms │ +1.06x faster │
│ QQuery 34 │ 1474.63 / 1494.76 ±20.71 / 1529.34 ms │  1435.03 / 1445.43 ±8.12 / 1454.97 ms │     no change │
│ QQuery 35 │     376.49 / 382.33 ±3.32 / 386.21 ms │     374.13 / 380.78 ±6.44 / 391.72 ms │     no change │
│ QQuery 36 │     111.06 / 121.36 ±5.44 / 126.99 ms │     117.47 / 121.07 ±2.15 / 123.03 ms │     no change │
│ QQuery 37 │        46.92 / 49.98 ±2.64 / 53.22 ms │        46.28 / 48.75 ±2.06 / 51.91 ms │     no change │
│ QQuery 38 │        74.30 / 76.93 ±1.76 / 79.45 ms │        74.92 / 76.81 ±1.38 / 79.17 ms │     no change │
│ QQuery 39 │     201.11 / 214.09 ±7.09 / 221.56 ms │     203.10 / 211.92 ±5.23 / 219.11 ms │     no change │
│ QQuery 40 │        23.48 / 24.84 ±0.81 / 25.68 ms │        22.90 / 24.76 ±2.01 / 28.53 ms │     no change │
│ QQuery 41 │        19.17 / 20.93 ±1.07 / 22.21 ms │        19.48 / 20.77 ±1.00 / 22.33 ms │     no change │
│ QQuery 42 │        18.88 / 19.87 ±0.96 / 21.65 ms │        18.36 / 19.41 ±0.54 / 19.82 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 26983.47ms │
│ Total Time (optimize-regexp-replace-v2)   │ 22251.66ms │
│ Average Time (HEAD)                       │   627.52ms │
│ Average Time (optimize-regexp-replace-v2) │   517.48ms │
│ Queries Faster                            │          3 │
│ Queries Slower                            │          4 │
│ Queries with No Change                    │         36 │
│ Queries with Failure                      │          0 │
└───────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	136.0s
Peak memory	38.3 GiB
Avg memory	29.0 GiB
CPU user	1277.9s
CPU sys	99.2s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	112.3s
Peak memory	41.2 GiB
Avg memory	29.6 GiB
CPU user	1057.6s
CPU sys	86.3s
Peak spill	0 B

adriangbot · 2026-04-08T21:10:25Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and optimize-regexp-replace-v2
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                   HEAD ┃            optimize-regexp-replace-v2 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │           1.25 / 4.59 ±6.49 / 17.57 ms │          1.22 / 4.55 ±6.50 / 17.54 ms │     no change │
│ QQuery 1  │         14.22 / 14.88 ±0.36 / 15.19 ms │        14.12 / 14.72 ±0.33 / 15.14 ms │     no change │
│ QQuery 2  │         44.14 / 44.37 ±0.16 / 44.59 ms │        44.46 / 44.62 ±0.15 / 44.90 ms │     no change │
│ QQuery 3  │         40.98 / 43.63 ±2.36 / 46.75 ms │        41.02 / 44.60 ±2.63 / 48.12 ms │     no change │
│ QQuery 4  │      303.61 / 309.74 ±4.26 / 315.67 ms │     289.52 / 305.59 ±8.95 / 313.21 ms │     no change │
│ QQuery 5  │      362.47 / 365.42 ±1.63 / 367.27 ms │     347.91 / 355.54 ±6.23 / 364.10 ms │     no change │
│ QQuery 6  │            5.69 / 6.24 ±0.87 / 7.98 ms │           5.68 / 6.35 ±0.90 / 8.14 ms │     no change │
│ QQuery 7  │         17.09 / 17.39 ±0.30 / 17.78 ms │        17.00 / 17.33 ±0.28 / 17.77 ms │     no change │
│ QQuery 8  │      435.96 / 445.88 ±9.22 / 457.28 ms │     434.88 / 449.13 ±8.91 / 458.12 ms │     no change │
│ QQuery 9  │      692.16 / 704.50 ±7.92 / 713.31 ms │    684.43 / 706.23 ±18.50 / 730.68 ms │     no change │
│ QQuery 10 │        93.78 / 96.04 ±2.95 / 101.79 ms │        92.20 / 93.80 ±1.96 / 97.29 ms │     no change │
│ QQuery 11 │      106.37 / 107.68 ±1.20 / 109.91 ms │     107.69 / 108.96 ±1.61 / 112.11 ms │     no change │
│ QQuery 12 │      355.89 / 360.36 ±2.59 / 363.51 ms │     357.08 / 364.46 ±5.43 / 370.91 ms │     no change │
│ QQuery 13 │      464.51 / 472.85 ±8.54 / 486.39 ms │     484.33 / 492.99 ±7.51 / 504.49 ms │     no change │
│ QQuery 14 │      353.52 / 364.18 ±6.16 / 371.55 ms │     356.11 / 363.35 ±3.93 / 367.55 ms │     no change │
│ QQuery 15 │      371.62 / 380.46 ±9.88 / 395.77 ms │    388.83 / 415.85 ±31.47 / 471.54 ms │  1.09x slower │
│ QQuery 16 │     742.63 / 758.94 ±15.61 / 786.37 ms │    772.87 / 793.30 ±24.08 / 839.04 ms │     no change │
│ QQuery 17 │      738.12 / 744.34 ±5.13 / 753.73 ms │     745.38 / 757.25 ±6.81 / 766.50 ms │     no change │
│ QQuery 18 │  1519.49 / 1532.32 ±11.49 / 1551.36 ms │ 1517.92 / 1545.60 ±36.38 / 1616.48 ms │     no change │
│ QQuery 19 │        36.02 / 49.35 ±20.17 / 89.11 ms │        37.43 / 39.57 ±4.06 / 47.69 ms │ +1.25x faster │
│ QQuery 20 │     729.35 / 747.53 ±14.54 / 763.83 ms │    727.30 / 746.59 ±20.13 / 781.61 ms │     no change │
│ QQuery 21 │      777.81 / 786.78 ±6.52 / 796.69 ms │    776.09 / 799.71 ±22.07 / 826.51 ms │     no change │
│ QQuery 22 │  1151.29 / 1160.48 ±10.73 / 1177.40 ms │  1155.49 / 1160.70 ±4.36 / 1166.98 ms │     no change │
│ QQuery 23 │  3132.20 / 3166.55 ±36.63 / 3234.54 ms │ 3146.81 / 3176.09 ±33.61 / 3235.89 ms │     no change │
│ QQuery 24 │      100.68 / 103.34 ±1.56 / 105.42 ms │     100.21 / 104.88 ±2.41 / 107.01 ms │     no change │
│ QQuery 25 │      141.18 / 142.27 ±0.77 / 143.19 ms │     144.44 / 146.26 ±2.05 / 149.84 ms │     no change │
│ QQuery 26 │      102.51 / 104.36 ±1.56 / 106.64 ms │     102.48 / 104.21 ±1.61 / 107.27 ms │     no change │
│ QQuery 27 │      862.57 / 866.46 ±5.23 / 876.74 ms │     861.05 / 868.22 ±7.86 / 883.52 ms │     no change │
│ QQuery 28 │  7765.88 / 7826.81 ±37.25 / 7871.01 ms │ 3297.37 / 3339.85 ±29.83 / 3384.18 ms │ +2.34x faster │
│ QQuery 29 │         51.07 / 56.64 ±5.89 / 66.60 ms │        51.30 / 57.41 ±5.13 / 65.21 ms │     no change │
│ QQuery 30 │      365.03 / 372.49 ±5.56 / 382.03 ms │     375.89 / 382.89 ±4.82 / 388.68 ms │     no change │
│ QQuery 31 │     371.24 / 389.80 ±10.79 / 402.44 ms │     384.77 / 395.39 ±8.52 / 407.71 ms │     no change │
│ QQuery 32 │ 1069.69 / 1191.81 ±112.37 / 1361.94 ms │ 1061.33 / 1092.44 ±33.19 / 1155.04 ms │ +1.09x faster │
│ QQuery 33 │  1501.34 / 1523.66 ±22.56 / 1564.26 ms │  1524.33 / 1533.43 ±9.79 / 1550.37 ms │     no change │
│ QQuery 34 │   1541.59 / 1550.87 ±7.48 / 1562.45 ms │ 1519.21 / 1546.38 ±21.85 / 1573.53 ms │     no change │
│ QQuery 35 │      402.30 / 417.08 ±9.68 / 429.31 ms │     414.44 / 423.15 ±8.07 / 437.87 ms │     no change │
│ QQuery 36 │      120.84 / 124.93 ±2.20 / 127.50 ms │     124.52 / 127.81 ±3.23 / 132.58 ms │     no change │
│ QQuery 37 │         49.39 / 50.26 ±0.58 / 50.96 ms │        51.46 / 52.95 ±1.45 / 55.11 ms │  1.05x slower │
│ QQuery 38 │         75.36 / 77.35 ±1.62 / 80.07 ms │        75.28 / 77.21 ±1.52 / 79.58 ms │     no change │
│ QQuery 39 │      210.35 / 223.47 ±9.54 / 233.54 ms │    217.54 / 233.19 ±10.62 / 244.68 ms │     no change │
│ QQuery 40 │         25.18 / 26.85 ±1.61 / 29.88 ms │        23.46 / 27.48 ±2.40 / 30.77 ms │     no change │
│ QQuery 41 │         21.63 / 23.51 ±1.58 / 25.84 ms │        20.90 / 22.02 ±0.88 / 23.20 ms │ +1.07x faster │
│ QQuery 42 │         19.65 / 20.49 ±0.50 / 21.11 ms │        20.76 / 21.12 ±0.23 / 21.38 ms │     no change │
└───────────┴────────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 27776.95ms │
│ Total Time (optimize-regexp-replace-v2)   │ 23363.14ms │
│ Average Time (HEAD)                       │   645.98ms │
│ Average Time (optimize-regexp-replace-v2) │   543.33ms │
│ Queries Faster                            │          4 │
│ Queries Slower                            │          2 │
│ Queries with No Change                    │         37 │
│ Queries with Failure                      │          0 │
└───────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	140.0s
Peak memory	37.5 GiB
Avg memory	27.8 GiB
CPU user	1322.8s
CPU sys	94.5s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	117.9s
Peak memory	38.0 GiB
Avg memory	27.3 GiB
CPU user	1104.9s
CPU sys	95.1s
Peak spill	0 B

alamb · 2026-04-08T21:11:13Z

I also made a PR to avoid the duplication of loops:

Consolidate special case regexp_match logic #21486

alamb · 2026-04-08T21:13:13Z

run benchmark clickbench_1

adriangbot · 2026-04-08T21:15:56Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4209692663-975-fzxzn 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing optimize-regexp-replace-v2 (114eec6) to 603bfb4 (merge-base) diff using: clickbench_1
Results will be posted here when complete

adriangbot · 2026-04-08T21:34:16Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and optimize-regexp-replace-v2
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃            optimize-regexp-replace-v2 ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │           0.74 / 1.08 ±0.62 / 2.32 ms │           0.75 / 1.10 ±0.65 / 2.40 ms │     no change │
│ QQuery 1  │        13.67 / 14.14 ±0.44 / 14.89 ms │        13.57 / 13.89 ±0.31 / 14.42 ms │     no change │
│ QQuery 2  │        40.94 / 41.15 ±0.25 / 41.61 ms │        41.10 / 41.32 ±0.23 / 41.66 ms │     no change │
│ QQuery 3  │        38.24 / 38.78 ±0.58 / 39.86 ms │        38.28 / 38.61 ±0.39 / 39.37 ms │     no change │
│ QQuery 4  │     266.99 / 267.81 ±0.68 / 268.67 ms │     265.18 / 274.41 ±9.65 / 289.09 ms │     no change │
│ QQuery 5  │     439.77 / 447.71 ±6.73 / 457.30 ms │     446.42 / 450.16 ±3.00 / 453.69 ms │     no change │
│ QQuery 6  │           6.38 / 6.75 ±0.22 / 7.04 ms │           6.26 / 6.70 ±0.39 / 7.20 ms │     no change │
│ QQuery 7  │        15.82 / 16.18 ±0.42 / 16.99 ms │        15.88 / 16.46 ±0.70 / 17.63 ms │     no change │
│ QQuery 8  │     379.23 / 382.71 ±2.27 / 385.59 ms │     392.54 / 397.03 ±4.31 / 404.09 ms │     no change │
│ QQuery 9  │     610.15 / 616.60 ±5.58 / 626.78 ms │     618.54 / 627.81 ±7.31 / 637.98 ms │     no change │
│ QQuery 10 │        90.03 / 91.89 ±1.57 / 94.02 ms │        90.23 / 93.02 ±2.70 / 97.93 ms │     no change │
│ QQuery 11 │     103.12 / 105.13 ±1.50 / 107.01 ms │     103.07 / 104.18 ±0.96 / 105.61 ms │     no change │
│ QQuery 12 │     443.79 / 451.94 ±5.91 / 460.74 ms │     447.84 / 456.48 ±6.98 / 465.38 ms │     no change │
│ QQuery 13 │    512.30 / 531.34 ±15.42 / 550.41 ms │     516.84 / 523.74 ±9.02 / 541.31 ms │     no change │
│ QQuery 14 │     455.85 / 463.63 ±6.32 / 474.36 ms │     458.59 / 463.61 ±6.28 / 475.67 ms │     no change │
│ QQuery 15 │    344.07 / 364.27 ±18.93 / 397.56 ms │     332.26 / 346.32 ±9.66 / 357.77 ms │     no change │
│ QQuery 16 │    733.90 / 757.07 ±18.32 / 783.81 ms │     694.80 / 702.62 ±6.81 / 715.32 ms │ +1.08x faster │
│ QQuery 17 │    687.07 / 725.80 ±23.08 / 754.73 ms │     689.44 / 693.20 ±2.79 / 696.36 ms │     no change │
│ QQuery 18 │  1336.36 / 1347.82 ±8.59 / 1358.07 ms │ 1365.88 / 1429.05 ±36.52 / 1464.82 ms │  1.06x slower │
│ QQuery 19 │        35.48 / 36.81 ±1.10 / 38.74 ms │        37.44 / 39.19 ±1.92 / 42.34 ms │  1.06x slower │
│ QQuery 20 │    618.17 / 628.39 ±13.37 / 654.37 ms │    621.80 / 638.64 ±15.82 / 665.88 ms │     no change │
│ QQuery 21 │     704.98 / 710.23 ±3.52 / 715.33 ms │     707.52 / 711.89 ±7.23 / 726.31 ms │     no change │
│ QQuery 22 │  1354.22 / 1360.16 ±6.15 / 1370.21 ms │  1347.42 / 1351.45 ±3.26 / 1355.96 ms │     no change │
│ QQuery 23 │ 3702.32 / 3738.46 ±23.19 / 3768.29 ms │ 3688.90 / 3716.39 ±20.55 / 3746.43 ms │     no change │
│ QQuery 24 │     212.13 / 218.98 ±4.32 / 225.21 ms │     211.64 / 216.08 ±2.74 / 219.84 ms │     no change │
│ QQuery 25 │     183.21 / 184.80 ±1.33 / 186.62 ms │     183.29 / 184.43 ±1.20 / 186.04 ms │     no change │
│ QQuery 26 │     212.50 / 216.44 ±2.31 / 219.75 ms │     213.68 / 215.95 ±2.26 / 219.70 ms │     no change │
│ QQuery 27 │     745.59 / 749.93 ±3.16 / 755.00 ms │     744.80 / 749.43 ±4.13 / 756.66 ms │     no change │
│ QQuery 28 │ 8434.90 / 8470.17 ±49.07 / 8567.48 ms │ 3408.27 / 3427.43 ±13.05 / 3446.00 ms │ +2.47x faster │
│ QQuery 29 │        49.26 / 52.66 ±3.09 / 56.54 ms │        46.53 / 50.63 ±3.43 / 56.60 ms │     no change │
│ QQuery 30 │    429.16 / 443.55 ±11.03 / 462.60 ms │     434.80 / 441.85 ±7.29 / 455.52 ms │     no change │
│ QQuery 31 │     418.43 / 426.56 ±8.06 / 440.37 ms │     417.19 / 429.06 ±7.28 / 438.37 ms │     no change │
│ QQuery 32 │ 1026.79 / 1038.74 ±16.94 / 1072.11 ms │ 1185.13 / 1265.02 ±54.15 / 1333.93 ms │  1.22x slower │
│ QQuery 33 │  1474.82 / 1482.10 ±4.55 / 1488.54 ms │ 1503.27 / 1543.66 ±28.26 / 1585.97 ms │     no change │
│ QQuery 34 │ 1500.53 / 1524.54 ±13.73 / 1542.51 ms │ 1587.63 / 1655.90 ±61.99 / 1735.02 ms │  1.09x slower │
│ QQuery 35 │     362.21 / 369.55 ±6.42 / 377.71 ms │    370.38 / 410.53 ±26.90 / 446.02 ms │  1.11x slower │
│ QQuery 36 │     118.34 / 123.67 ±5.98 / 135.27 ms │     130.65 / 134.33 ±3.23 / 139.10 ms │  1.09x slower │
│ QQuery 37 │        54.18 / 58.00 ±3.07 / 62.47 ms │        56.02 / 57.89 ±2.41 / 62.62 ms │     no change │
│ QQuery 38 │        83.85 / 87.49 ±3.11 / 93.09 ms │        89.62 / 92.30 ±1.40 / 93.50 ms │  1.05x slower │
│ QQuery 39 │     218.27 / 227.09 ±6.06 / 236.18 ms │     224.58 / 238.14 ±7.79 / 248.78 ms │     no change │
│ QQuery 40 │        22.70 / 25.00 ±1.99 / 28.62 ms │        23.57 / 25.19 ±1.21 / 26.47 ms │     no change │
│ QQuery 41 │        20.52 / 21.00 ±0.39 / 21.59 ms │        20.42 / 21.80 ±1.14 / 23.58 ms │     no change │
│ QQuery 42 │        20.15 / 20.27 ±0.14 / 20.44 ms │        20.20 / 20.38 ±0.15 / 20.60 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 28886.41ms │
│ Total Time (optimize-regexp-replace-v2)   │ 24317.24ms │
│ Average Time (HEAD)                       │   671.78ms │
│ Average Time (optimize-regexp-replace-v2) │   565.52ms │
│ Queries Faster                            │          2 │
│ Queries Slower                            │          7 │
│ Queries with No Change                    │         34 │
│ Queries with Failure                      │          0 │
└───────────────────────────────────────────┴────────────┘
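
The Change column appears to be derived from the two mean times in each row. The following is a minimal sketch of that arithmetic, not the benchmark runner's actual code: the function name `describe_change` and the 5% noise threshold are assumptions inferred from the rows in the table.

```rust
/// Assumed noise threshold on relative change; inferred from the table,
/// not read from the benchmark runner.
const NOISE_THRESHOLD: f64 = 0.05;

/// Classify one benchmark row from its two mean times (in milliseconds).
/// `describe_change` is a hypothetical helper written for illustration.
fn describe_change(base_mean_ms: f64, branch_mean_ms: f64) -> String {
    // Relative change of the branch with respect to the baseline.
    let relative = (branch_mean_ms - base_mean_ms) / base_mean_ms;
    if relative.abs() < NOISE_THRESHOLD {
        "no change".to_string()
    } else if relative < 0.0 {
        // Branch is faster: report baseline / branch.
        format!("+{:.2}x faster", base_mean_ms / branch_mean_ms)
    } else {
        // Branch is slower: report branch / baseline.
        format!("{:.2}x slower", branch_mean_ms / base_mean_ms)
    }
}
```

For example, `describe_change(8470.17, 3427.43)` yields `+2.47x faster`, matching the QQuery 28 row, while `describe_change(364.27, 346.32)` yields `no change` because the relative change (about 4.9%) is under the assumed threshold.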

Resource Usage

clickbench_1 — base (merge-base)

Metric	Value
Wall time	145.4s
Peak memory	37.5 GiB
Avg memory	31.7 GiB
CPU user	1348.6s
CPU sys	85.4s
Peak spill	0 B

clickbench_1 — branch

Metric	Value
Wall time	122.0s
Peak memory	34.5 GiB
Avg memory	26.6 GiB
CPU user	1126.3s
CPU sys	102.1s
Peak spill	0 B

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Dandandan · 2026-04-09T08:26:54Z

│ QQuery 28 │ 8434.90 / 8470.17 ±49.07 / 8567.48 ms │ 3408.27 / 3427.43 ±13.05 / 3446.00 ms │ +2.47x faster │

woohoo 🚀

Which issue does this PR close?

Related to #21379

Rationale for this change

While reviewing #21379 I noticed there was minimal Utf8View coverage of the related code.

What changes are included in this PR?

Update the regexp_replace tests to cover utf8, largeutf8, utf8view and dictionary

Are these changes tested?

Yes, only tests. I verified these tests also pass when run on #21379.

Are there any user-facing changes?

No

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels Apr 5, 2026

Dandandan marked this pull request as ready for review April 5, 2026 09:07

Dandandan added the performance (Make DataFusion faster) label Apr 5, 2026

Dandandan changed the title ~~Optimize regexp_replace by stripping trailing .* from anchored patterns~~ Optimize regexp_replace by stripping trailing .* from anchored patterns. 2.4x improvement Apr 5, 2026

Dandandan mentioned this pull request Apr 6, 2026

[EPIC] Make DataFusion the top of the ClickBench Parquet leaderboard #18489

Open

15 tasks

Dandandan closed this Apr 6, 2026

Dandandan force-pushed the optimize-regexp-replace-v2 branch from 349496e to 5a86142 Compare April 6, 2026 17:17

alamb mentioned this pull request Apr 8, 2026

Add more regexp_replace test coverage #21485

Merged

alamb mentioned this pull request Apr 8, 2026

Consolidate special case regexp_match logic #21486

Open

Dandandan and others added 2 commits April 9, 2026 09:32

Update datafusion/functions/src/regex/regexpreplace.rs

abd37d1

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Merge branch 'main' into optimize-regexp-replace-v2

2b0cedd

Dandandan added this pull request to the merge queue Apr 9, 2026

Merged via the queue into apache:main with commit cc9f869 Apr 9, 2026
31 checks passed

Dandandan deleted the optimize-regexp-replace-v2 branch April 9, 2026 09:08
