Combine overlapping runs in REE (take kernel) by Rich-T-kid · Pull Request #9865 · apache/arrow-rs

Rich-T-kid · 2026-04-30T18:48:05Z

take and interleave on RunEndEncoded arrays now merge adjacent runs with equal values

Which issue does this PR close?

Rationale for this change

Consider an REE array with the logical representation:

let logical_repr = [1, 1, 0, 0, 1, 1];
let ree_take_results = take(logical_repr, [0, 1, 4, 5]);

This currently produces
run_ends = [2, 4]
values = [1, 1]

when the result should be
run_ends = [4]
values = [1]

Both results are logically correct, but REE arrays should be as compact as possible.
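To make "both results are logically correct" concrete, here is a minimal std-only sketch (not arrow-rs code) that models an REE array as plain run_ends/values slices and expands it to its logical values. decode_ree is a hypothetical helper, and run ends are assumed to be usize and strictly increasing:

```rust
/// Expand a simplified run-end encoded pair into its logical values.
/// run_ends[i] is the exclusive logical end of run i, so run i covers
/// positions run_ends[i - 1]..run_ends[i] (with an implicit start of 0).
fn decode_ree(run_ends: &[usize], values: &[i32]) -> Vec<i32> {
    assert_eq!(run_ends.len(), values.len(), "one value per run");
    let mut out = Vec::new();
    let mut start = 0;
    for (&end, &v) in run_ends.iter().zip(values) {
        // Repeat the run's value once for every logical slot it covers.
        out.extend(std::iter::repeat(v).take(end - start));
        start = end;
    }
    out
}
```

Both decode_ree(&[2, 4], &[1, 1]) and decode_ree(&[4], &[1]) expand to [1, 1, 1, 1]; the second just stores one run instead of two.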

What changes are included in this PR?

I updated the loop in take_run() to check prev_value before creating a new run; if the values are the same, it continues building the current run.
Added a compaction function in interleave.rs to join adjacent runs that share the same value.
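The take_run() change can be pictured with a std-only model. This is a minimal sketch, not the arrow-rs implementation: it assumes i32 values, usize run ends, no nulls, and in-bounds indices, and take_ree_merged is a hypothetical name. Each taken logical index is mapped to its physical run with a binary search, and the last output run is extended whenever the value matches the previous one:

```rust
/// Sketch of a run-merging `take` over a simplified REE model.
/// Returns (run_ends, values) for the taken logical positions.
fn take_ree_merged(
    run_ends: &[usize],
    values: &[i32],
    indices: &[usize],
) -> (Vec<usize>, Vec<i32>) {
    let mut out_ends: Vec<usize> = Vec::new();
    let mut out_values: Vec<i32> = Vec::new();
    for (out_pos, &logical) in indices.iter().enumerate() {
        // Physical index = first run whose exclusive end is past `logical`.
        let physical = run_ends.partition_point(|&end| end <= logical);
        let value = values[physical]; // panics if `logical` is out of bounds
        match out_values.last().copied() {
            // Same value as the previous output run: extend that run
            // instead of starting a new one.
            Some(prev) if prev == value => {
                *out_ends.last_mut().unwrap() = out_pos + 1;
            }
            // First element, or a different value: open a new run.
            _ => {
                out_ends.push(out_pos + 1);
                out_values.push(value);
            }
        }
    }
    (out_ends, out_values)
}
```

For the example above, take_ree_merged(&[2, 4, 6], &[1, 0, 1], &[0, 1, 4, 5]) returns (vec![4], vec![1]) rather than two runs.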

Are these changes tested?

Yes. I've updated existing tests to align with this behavior and added three new tests.

Are there any user-facing changes?

no


Rich-T-kid · 2026-05-05T00:34:41Z

@alamb could you take a look at this, thx

Jefffrey · 2026-05-05T04:09:55Z

run benchmark take_kernels

adriangbot · 2026-05-05T04:13:29Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4376471020-2022-cdbfs 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing rich-T-kid/optimize-take-REE (780bbe0) to c194e54 (merge-base) diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench take_kernels
BENCH_FILTER=
Results will be posted here when complete


Jefffrey

The take changes make sense to me. For interleave, we have a PR that aims to add a specialized path for run arrays, which I hope to review soon:

#9856

So it may turn out that this compact_runs fix isn't needed if we want a specialized path 🤔

adriangbot · 2026-05-05T04:23:54Z

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                                                                     main                                   rich-T-kid_optimize-take-REE
-----                                                                     ----                                   ----------------------------
take bool 1024                                                            1.01   1037.3±1.96ns        ? ?/sec    1.00   1031.7±0.66ns        ? ?/sec
take bool 512                                                             1.00    573.5±0.40ns        ? ?/sec    1.00    573.5±0.70ns        ? ?/sec
take bool null indices 1024                                               1.07   903.2±41.36ns        ? ?/sec    1.00    847.0±6.30ns        ? ?/sec
take bool null values 1024                                                1.00      2.0±0.00µs        ? ?/sec    1.00      2.0±0.00µs        ? ?/sec
take bool null values null indices 1024                                   1.00  1616.0±15.56ns        ? ?/sec    1.00  1614.2±13.28ns        ? ?/sec
take check bounds i32 1024                                                1.01    666.0±0.99ns        ? ?/sec    1.00    657.8±0.92ns        ? ?/sec
take check bounds i32 512                                                 1.00    388.9±0.54ns        ? ?/sec    1.17    455.3±1.18ns        ? ?/sec
take fsb value len: 12, indices: 1024                                     1.01      2.6±0.14µs        ? ?/sec    1.00      2.6±0.14µs        ? ?/sec
take fsb value len: 12, null values, indices: 1024                        1.00      3.7±0.14µs        ? ?/sec    1.00      3.7±0.14µs        ? ?/sec
take fsb value optimized len: 16, indices: 1024                           1.06    727.4±2.62ns        ? ?/sec    1.00    688.8±3.09ns        ? ?/sec
take fsb value optimized len: 16, null values, indices: 1024              1.02   1784.6±1.23ns        ? ?/sec    1.00   1748.4±1.82ns        ? ?/sec
take i32 1024                                                             1.03    525.6±0.90ns        ? ?/sec    1.00    511.6±0.71ns        ? ?/sec
take i32 512                                                              1.00    351.6±1.14ns        ? ?/sec    1.00    351.5±0.99ns        ? ?/sec
take i32 null indices 1024                                                1.01    861.4±1.41ns        ? ?/sec    1.00    854.7±1.47ns        ? ?/sec
take i32 null values 1024                                                 1.01   1545.6±2.22ns        ? ?/sec    1.00   1532.1±1.55ns        ? ?/sec
take i32 null values null indices 1024                                    1.01   1728.7±1.29ns        ? ?/sec    1.00   1708.2±1.64ns        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.00     17.0±0.08µs        ? ?/sec    6.29    107.0±1.19µs        ? ?/sec
take str 1024                                                             1.00      8.6±0.03µs        ? ?/sec    1.01      8.7±0.03µs        ? ?/sec
take str 512                                                              1.00      4.0±0.02µs        ? ?/sec    1.03      4.1±0.02µs        ? ?/sec
take str null indices 1024                                                1.00      6.0±0.07µs        ? ?/sec    1.00      6.0±0.01µs        ? ?/sec
take str null indices 512                                                 1.00      2.8±0.01µs        ? ?/sec    1.00      2.8±0.01µs        ? ?/sec
take str null values 1024                                                 1.00      6.5±0.02µs        ? ?/sec    1.01      6.5±0.01µs        ? ?/sec
take str null values null indices 1024                                    1.00      5.3±0.01µs        ? ?/sec    1.01      5.3±0.01µs        ? ?/sec
take stringview 1024                                                      1.09    803.0±1.07ns        ? ?/sec    1.00    736.7±5.52ns        ? ?/sec
take stringview 512                                                       1.16    541.7±0.71ns        ? ?/sec    1.00    467.8±1.05ns        ? ?/sec
take stringview null indices 1024                                         1.00    912.1±0.86ns        ? ?/sec    1.09    994.2±1.30ns        ? ?/sec
take stringview null indices 512                                          1.01    563.3±1.19ns        ? ?/sec    1.00    560.0±0.96ns        ? ?/sec
take stringview null values 1024                                          1.00   1720.4±1.39ns        ? ?/sec    1.00   1721.3±2.16ns        ? ?/sec
take stringview null values null indices 1024                             1.00   1731.5±4.14ns        ? ?/sec    1.02   1760.3±2.28ns        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	275.1s
Peak memory	3.0 GiB
Avg memory	3.0 GiB
CPU user	273.7s
CPU sys	0.7s
Peak spill	0 B

branch

Metric	Value
Wall time	275.1s
Peak memory	3.0 GiB
Avg memory	3.0 GiB
CPU user	274.9s
CPU sys	0.1s
Peak spill	0 B


Rich-T-kid · 2026-05-05T14:09:08Z

@Jefffrey Should I split this PR into two? I can push up just the take optimization. I'll try to take a look at #9856 as well.

Rich-T-kid · 2026-05-05T17:07:29Z

I removed the interleave portion of the PR, should be good now 👍🏿 thx

Jefffrey · 2026-05-06T02:11:47Z

run benchmark take_kernels

Jefffrey · 2026-05-06T02:12:38Z

I removed the interleave portion of the PR, should be good now 👍🏿 thx

Thanks for this.

My only concern now is that it seems to have a noticeable impact on the microbenchmark we have 🤔

Not sure if there's a way around the slicing approach used.

adriangbot · 2026-05-06T02:15:54Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4384592991-2035-txmqt 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing rich-T-kid/optimize-take-REE (58a62c0) to c194e54 (merge-base) diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench take_kernels
BENCH_FILTER=
Results will be posted here when complete


adriangbot · 2026-05-06T02:26:11Z

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                                                                     main                                   rich-T-kid_optimize-take-REE
-----                                                                     ----                                   ----------------------------
take bool 1024                                                            1.00   1037.5±1.37ns        ? ?/sec    1.00   1037.1±1.61ns        ? ?/sec
take bool 512                                                             1.01    579.1±0.49ns        ? ?/sec    1.00    572.6±2.00ns        ? ?/sec
take bool null indices 1024                                               1.00   854.0±35.25ns        ? ?/sec    1.01   866.3±14.54ns        ? ?/sec
take bool null values 1024                                                1.00      2.0±0.00µs        ? ?/sec    1.00      2.0±0.00µs        ? ?/sec
take bool null values null indices 1024                                   1.02   1637.2±0.99ns        ? ?/sec    1.00  1607.0±16.50ns        ? ?/sec
take check bounds i32 1024                                                1.02    666.2±1.43ns        ? ?/sec    1.00    654.6±1.60ns        ? ?/sec
take check bounds i32 512                                                 1.00    389.4±0.65ns        ? ?/sec    1.17    455.5±1.19ns        ? ?/sec
take fsb value len: 12, indices: 1024                                     1.00      2.7±0.16µs        ? ?/sec    1.00      2.7±0.16µs        ? ?/sec
take fsb value len: 12, null values, indices: 1024                        1.00      3.7±0.16µs        ? ?/sec    1.00      3.7±0.16µs        ? ?/sec
take fsb value optimized len: 16, indices: 1024                           1.03    728.6±1.50ns        ? ?/sec    1.00   704.7±12.51ns        ? ?/sec
take fsb value optimized len: 16, null values, indices: 1024              1.02   1785.0±2.13ns        ? ?/sec    1.00   1745.0±9.50ns        ? ?/sec
take i32 1024                                                             1.02    525.7±1.14ns        ? ?/sec    1.00    514.4±1.40ns        ? ?/sec
take i32 512                                                              1.00    353.0±1.93ns        ? ?/sec    1.00    352.0±0.91ns        ? ?/sec
take i32 null indices 1024                                                1.00    855.9±0.93ns        ? ?/sec    1.01    866.3±5.11ns        ? ?/sec
take i32 null values 1024                                                 1.01   1547.4±3.23ns        ? ?/sec    1.00   1535.4±3.55ns        ? ?/sec
take i32 null values null indices 1024                                    1.02   1745.7±2.56ns        ? ?/sec    1.00   1704.4±2.30ns        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.00     17.0±0.08µs        ? ?/sec    6.31    107.6±1.05µs        ? ?/sec
take str 1024                                                             1.00      8.6±0.04µs        ? ?/sec    1.00      8.6±0.04µs        ? ?/sec
take str 512                                                              1.01      4.1±0.02µs        ? ?/sec    1.00      4.0±0.02µs        ? ?/sec
take str null indices 1024                                                1.00      6.0±0.02µs        ? ?/sec    1.00      6.0±0.02µs        ? ?/sec
take str null indices 512                                                 1.00      2.8±0.01µs        ? ?/sec    1.06      3.0±0.01µs        ? ?/sec
take str null values 1024                                                 1.00      6.5±0.01µs        ? ?/sec    1.00      6.5±0.02µs        ? ?/sec
take str null values null indices 1024                                    1.00      5.3±0.01µs        ? ?/sec    1.01      5.3±0.01µs        ? ?/sec
take stringview 1024                                                      1.01    742.9±1.05ns        ? ?/sec    1.00    735.2±0.91ns        ? ?/sec
take stringview 512                                                       1.14    538.9±0.86ns        ? ?/sec    1.00    471.8±1.27ns        ? ?/sec
take stringview null indices 1024                                         1.07    980.9±1.86ns        ? ?/sec    1.00    914.4±0.91ns        ? ?/sec
take stringview null indices 512                                          1.00    564.1±1.11ns        ? ?/sec    1.00    564.3±1.67ns        ? ?/sec
take stringview null values 1024                                          1.00   1723.1±1.52ns        ? ?/sec    1.04   1793.1±2.11ns        ? ?/sec
take stringview null values null indices 1024                             1.01   1763.1±2.39ns        ? ?/sec    1.00   1749.4±3.82ns        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	275.1s
Peak memory	3.0 GiB
Avg memory	3.0 GiB
CPU user	273.9s
CPU sys	0.7s
Peak spill	0 B

branch

Metric	Value
Wall time	275.1s
Peak memory	3.0 GiB
Avg memory	3.0 GiB
CPU user	272.3s
CPU sys	0.1s
Peak spill	0 B


Rich-T-kid · 2026-05-06T13:53:23Z

@Jefffrey I think since we are adding an extra comparison in a hot loop, it makes sense that performance would go down. One idea I have is to precompute these booleans all at once and then refer to them in the hot loop instead of doing a comparison each time. Is there any other way to compare dyn Arrays?
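A rough sketch of the precompute idea, assuming plain slices instead of dyn Array (which sidesteps the type-erasure problem the thread is really about). equal_to_prev and runs_from_flags are hypothetical names: the first builds the boolean array in one tight loop, the second builds runs from those flags without comparing values again:

```rust
/// Precompute, in one tight loop, whether each gathered physical value
/// equals the value gathered just before it.
fn equal_to_prev<T: PartialEq>(values: &[T], physical: &[usize]) -> Vec<bool> {
    let mut flags = Vec::with_capacity(physical.len());
    for (i, &p) in physical.iter().enumerate() {
        // The first element never continues a previous run.
        flags.push(i > 0 && values[physical[i - 1]] == values[p]);
    }
    flags
}

/// Build output run ends (exclusive) plus the physical index that
/// supplies each run's value, using only the precomputed flags.
fn runs_from_flags(physical: &[usize], flags: &[bool]) -> (Vec<usize>, Vec<usize>) {
    let mut run_ends = Vec::new();
    let mut run_physical = Vec::new();
    for (i, (&p, &same)) in physical.iter().zip(flags).enumerate() {
        if same {
            *run_ends.last_mut().expect("flags[0] is always false") = i + 1;
        } else {
            run_ends.push(i + 1);
            run_physical.push(p);
        }
    }
    (run_ends, run_physical)
}
```

With typed slices this split is cheap; the thread's difficulty is producing the flags for a type-erased array without per-slot slicing.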

Rich-T-kid · 2026-05-06T15:52:08Z

I tried moving the comparisons of physical_indices into their own tight loop, but .slice(_, 1) is just too slow to use in take_run(). I think this requires writing a function that handles the different data types.

Rich-T-kid · 2026-05-06T16:55:43Z

This would really just be replicating behavior already provided by make_comparator(). I went over a couple of other approaches, but I think the best direction is to factor out make_comparator().

Rich-T-kid · 2026-05-06T17:02:56Z

Original - this causes the issues that #7710 aims to resolve.

First approach - using values.to_data().slice(i, 1) in the hot loop. Super slow.

Second approach - move the .slice(i, 1) calls into a separate function that computes the comparisons ahead of time into a boolean array. This was slightly better than the first approach but still much slower than the original.

Final approach - moved some code around in ord.rs so that arrow-select and arrow-ord don't form a dependency chain. This uses make_comparator(values, values, default) to quickly compare positions in the array.

This approach is still slower than the original, but that is expected: comparing values takes time.
cc @Jefffrey
cc @asubiotto I think you also referred to a similar approach for interleave. Assuming this PR gets merged, it would make make_comparator visible to interleave as well, making the dedupe in interleave a lot simpler.
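The comparator approach beats per-slot slicing because the type dispatch happens once, up front, and the hot loop only calls a closure. The sketch below imitates that shape with a hypothetical Column enum standing in for dyn Array; only the DynComparator alias (a boxed Fn(usize, usize) -> Ordering) is modeled on arrow-ord, and comparator_for and merged_run_count are illustrative names:

```rust
use std::cmp::Ordering;

/// Same shape as arrow-ord's DynComparator: compares slot i to slot j.
type DynComparator = Box<dyn Fn(usize, usize) -> Ordering + Send + Sync>;

/// Hypothetical stand-in for a type-erased column (arrow uses dyn Array).
enum Column {
    Int32(Vec<i32>),
    Utf8(Vec<String>),
}

/// Resolve the concrete type once and return a closure that compares
/// slots without re-dispatching on the type for every call.
fn comparator_for(col: &Column) -> DynComparator {
    match col {
        Column::Int32(v) => {
            // Clone so the boxed closure can own its data ('static).
            let v = v.clone();
            Box::new(move |i, j| v[i].cmp(&v[j]))
        }
        Column::Utf8(v) => {
            let v = v.clone();
            Box::new(move |i, j| v[i].cmp(&v[j]))
        }
    }
}

/// Count the output runs a take over `physical` indices would produce
/// when adjacent slots that compare Equal are merged.
fn merged_run_count(cmp: &DynComparator, physical: &[usize]) -> usize {
    let breaks = physical
        .windows(2)
        .filter(|w| cmp(w[0], w[1]) != Ordering::Equal)
        .count();
    // A non-empty selection always has at least one run.
    breaks + usize::from(!physical.is_empty())
}
```

Comparing a column against itself mirrors the make_comparator(values, values, default) call described above.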


Rich-T-kid · 2026-05-08T03:21:03Z

@Jefffrey could you take another look? thank you!

asubiotto

Nice, yes this make_comparator approach is what I had in mind. I think this regression is reasonable

asubiotto · 2026-05-08T15:15:23Z

+
+        assert_eq!(result.len(), 11);
+        assert_eq!(result.run_ends().values(), &[5, 8, 11]);
+        assert_eq!(result.values().as_string::<i32>().value(0), "bob");


I think you might be able to keep the same coverage but have fewer assertions if you just compare the values slice by content. On a related note, could you use rstest to write these three new tests as three test cases of the same test? Not sure if arrow uses that crate

Yeah, that makes sense. I'll update the PR.
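If rstest isn't available in arrow-rs (the thread leaves that open), a plain table of cases gives a similar effect, and comparing whole slices keeps it to one assertion per output buffer. A sketch using a hypothetical compact_runs helper (modeled loosely on the interleave compaction from the first revision, not the PR's actual code):

```rust
/// Hypothetical helper: merge adjacent runs whose values are equal.
/// run_ends are exclusive logical ends, one per value.
fn compact_runs<'a>(run_ends: &[usize], values: &[&'a str]) -> (Vec<usize>, Vec<&'a str>) {
    let mut ends: Vec<usize> = Vec::new();
    let mut vals: Vec<&'a str> = Vec::new();
    for (&end, &v) in run_ends.iter().zip(values) {
        if vals.last() == Some(&v) {
            // Same value as the previous run: just move its end forward.
            *ends.last_mut().unwrap() = end;
        } else {
            ends.push(end);
            vals.push(v);
        }
    }
    (ends, vals)
}

/// Table-driven checks: (name, run_ends, values, expected_ends, expected_values).
fn run_cases() {
    let cases: &[(&str, &[usize], &[&str], &[usize], &[&str])] = &[
        ("no merge", &[1, 2], &["a", "b"], &[1, 2], &["a", "b"]),
        ("merge all", &[2, 5], &["bob", "bob"], &[5], &["bob"]),
        ("merge middle", &[5, 6, 8, 11], &["bob", "x", "x", "y"], &[5, 8, 11], &["bob", "x", "y"]),
        ("empty", &[], &[], &[], &[]),
    ];
    for &(name, ends, vals, want_ends, want_vals) in cases {
        let (got_ends, got_vals) = compact_runs(ends, vals);
        // Compare whole slices so each output buffer needs one assertion.
        assert_eq!(got_ends, want_ends, "run_ends mismatch in case {name}");
        assert_eq!(got_vals, want_vals, "values mismatch in case {name}");
    }
}
```

With rstest, each tuple would instead become a #[case(...)] attribute on a single test function.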

Jefffrey · 2026-05-11T06:22:45Z

I'll try to take a look at this soon.

Jefffrey · 2026-05-14T12:49:49Z

While the benchmark results look good (compared to the ArrayData slicing approach), I'm not certain about moving the comparator code into arrow-array 🤔

I wonder if we have more justification for this, for example whether it might unblock other issues we face with how the comparison logic lives in arrow-ord, rather than just for REE here. @alamb thoughts on expanding arrow-array like this?

alamb · 2026-05-14T18:52:51Z

Yeah, in general I don't think we should be moving large amounts of code (and the comparator code gets used a lot) into the arrow-array crate, as that will increase binary sizes and compile times for downstream users who are not compiling the entire arrow crate and not using REE.

How about this:

Pull out the kernels needed for REE into their own crates (arrow-ord-basic or arrow-ord-ree)
Add an optional dependency on those crates to anything that only needs them for REE (behind an REE feature flag, perhaps)?

Rich-T-kid · 2026-05-14T20:14:10Z

I think that's reasonable.


Rich-T-kid · 2026-05-15T04:56:08Z

I'm honestly not 100% sure this was done correctly, though it seems to be. Could you give it another look, @Jefffrey?

asubiotto · 2026-05-15T06:11:54Z

As an extra data point, I've had a need for comparator code to live somewhere else so that it can be used e.g. for dictionary interning. This is related to what I posted in the discord last week: https://discord.com/channels/885562378132000778/1314936346653102080/1501935260987031572

Rich-T-kid · 2026-05-18T02:03:00Z

I'll wait to resolve the git conflicts until we agree that this is a good approach to take.

Rich-T-kid · 2026-05-29T14:10:25Z

@Jefffrey just wanted to bump this PR

Jefffrey · 2026-06-14T00:13:08Z

(I do have this PR on my todolist to review 😅)

Jefffrey

I think moving the cmp code to a separate crate like this is a good idea, especially as @asubiotto highlighted they have a use case for it, and I think it could even help out with

#8716
cc @Weijun-H

If this approach looks good to @alamb as well, we'll need to update the release instructions (this can be a follow-up, so long as it's before the next release). I don't think we need to wait for a major release, but I could be wrong 🤔

Jefffrey · 2026-07-01T07:18:59Z

+//!
+//! The only public surface is [`make_comparator`] (with [`DynComparator`] as the
+//! returned function type). `arrow-ord` re-exports both from here, so its
+//! public API is unchanged.


Suggested change


Probably can omit this part

Jefffrey · 2026-07-01T07:20:56Z

+#[cfg(test)]
+use arrow_schema::{ArrowError, SortOptions};
+#[cfg(test)]
+use std::cmp::Ordering;


We should probably move all the test code as well, leaving this file to just re-export make_comparator to maintain the API.
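A sketch of the suggested layout, with modules standing in for crates: the comparator and its tests live in the new crate, and the old module shrinks to a pub use so the existing path keeps resolving. The simplified make_comparator signature here is hypothetical; the real one takes two &dyn Array plus SortOptions and returns a Result:

```rust
/// Stand-in for the new comparator crate (arrow-ord-basic, or arrow-cmp).
mod cmp_crate {
    use std::cmp::Ordering;

    pub type DynComparator = Box<dyn Fn(usize, usize) -> Ordering + Send + Sync>;

    /// Hypothetical, simplified signature: compares left[i] with right[j].
    pub fn make_comparator(left: Vec<i32>, right: Vec<i32>) -> DynComparator {
        Box::new(move |i, j| left[i].cmp(&right[j]))
    }

    // The tests move together with the implementation.
    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn compares_slots() {
            let cmp = make_comparator(vec![1, 2], vec![2, 1]);
            assert_eq!(cmp(0, 1), Ordering::Equal);
        }
    }
}

/// Stand-in for arrow-ord's ord.rs after the move: it only re-exports,
/// so callers of the old path keep compiling unchanged.
mod ord {
    pub use super::cmp_crate::{make_comparator, DynComparator};
}
```

Keeping the re-export rather than a wrapper function means downstream code sees the exact same items under the old path.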

Jefffrey · 2026-07-01T07:21:22Z

 arrow-array = { workspace = true }
 arrow-buffer = { workspace = true }
 arrow-data = { workspace = true }
+arrow-ord-basic = { workspace = true }


I wonder if we should name the new crate something like arrow-cmp?

Jefffrey · 2026-07-01T07:22:28Z

+# Enables the run-end-encoded `take` fast path that merges adjacent
+# physical indices whose underlying values compare equal. Pulls in
+# `arrow-ord-basic` for the slot-wise comparator.
+run_end_encoded = ["dep:arrow-ord-basic"]


I know @alamb suggested this feature, but I wonder if it's entirely necessary? The new arrow-ord-basic crate is minimal enough that we could probably get away with always including it.

As long as it isn't a lot of code, I think including it always is fine

In this case I think it will be unlikely that anyone will use these features / disable them, so just keeping the code simpler sounds like a good idea

reproduced take issue & updated take_run() to provide expected behavior of merging runs

662d0ca

github-actions bot added the arrow (Changes to the arrow crate) label Apr 30, 2026

compact REE interleave calls

780bbe0

Rich-T-kid marked this pull request as ready for review May 1, 2026 18:06

Rich-T-kid changed the title ~~reproduced take issue & updated take_run()~~ Combine overlapping runs in REE May 1, 2026

Jefffrey reviewed May 5, 2026


Jefffrey mentioned this pull request May 5, 2026

perf[arrow-select]: add specialized REE interleave #9856

Merged

Rich-T-kid mentioned this pull request May 5, 2026

Revised pr 9856 #9919

Closed

removed interleave optimization from PR since apache#9919 resolves it

58a62c0

Jefffrey changed the title ~~Combine overlapping runs in REE~~ Combine overlapping runs in REE (take kernel) May 6, 2026

refactored code base a bit and updated approach to comparisons in take_run()

6729927

Rich-T-kid requested a review from Jefffrey May 7, 2026 03:44

asubiotto reviewed May 8, 2026


simplify test

c8b1027

Rich-T-kid mentioned this pull request May 9, 2026

New crates for take and partition kernels (for use by REE) #9737

Open

alamb mentioned this pull request May 14, 2026

Potential Optimization for interleave/take on RunEndEncoded arrays #7710

Open

move REE take comparator out of arrow-array into new arrow-ord-basic crate gated by run_end_encoded feature

c99e333

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Jefffrey mentioned this pull request Jun 14, 2026

refactor: remove arrow-ord dependency in arrow-cast #8716

Open

Jefffrey reviewed Jul 1, 2026
