Commit b6b542e
perf: Optimize `array_positions()` for scalar needle (#20770)
## Which issue does this PR close?
- Closes #20769.
## Rationale for this change
`array_positions` previously compared the needle against each row's
sub-array individually. When the needle is a scalar (the common case),
we can do a single bulk `arrow_ord::cmp::not_distinct` comparison
against the entire flat values buffer and then walk the result bitmap,
which is significantly faster: the speedup on the `array_positions()`
microbenchmarks ranges from 5x to 40x, depending on the size of the
array.
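Below is a minimal Rust sketch of the scalar-needle fast path. It uses plain `Vec`s instead of Arrow arrays. `ListColumn` and `array_positions_bulk` are hypothetical names, not DataFusion's code. `Option` equality (where `None == None`) stands in for `arrow_ord::cmp::not_distinct`. The needle is compared against the flat values buffer once, and each row then reads its own range of the resulting mask through the list offsets.

```rust
/// Hypothetical, simplified model of an Arrow `ListArray` of nullable i64:
/// row `i` spans `values[offsets[i]..offsets[i + 1]]`, and
/// `row_valid[i] == false` marks a null row.
struct ListColumn {
    values: Vec<Option<i64>>,
    offsets: Vec<usize>,
    row_valid: Vec<bool>,
}

/// Sketch of the scalar-needle fast path: compare the needle against the
/// flat values buffer once, then walk the resulting mask row by row.
/// Returns 1-based positions per row, or `None` for a null row
/// (the null-row behavior is an assumption of this sketch).
fn array_positions_bulk(haystack: &ListColumn, needle: Option<i64>) -> Vec<Option<Vec<usize>>> {
    // One bulk comparison. `Option` equality treats `None == None` as true,
    // which mimics "not distinct" semantics.
    let mask: Vec<bool> = haystack.values.iter().map(|v| *v == needle).collect();

    haystack
        .offsets
        .windows(2)
        .zip(&haystack.row_valid)
        .map(|(bounds, &valid)| {
            if !valid {
                return None;
            }
            let (start, end) = (bounds[0], bounds[1]);
            // Offsets index the whole values buffer, so they also work for
            // sliced columns whose first offset is non-zero.
            Some(
                (start..end)
                    .filter(|&k| mask[k])
                    .map(|k| k - start + 1) // SQL positions are 1-based
                    .collect::<Vec<usize>>(),
            )
        })
        .collect()
}

fn main() {
    // Rows: [1, 2, 1], NULL, [], [3, 1]
    let haystack = ListColumn {
        values: vec![Some(1), Some(2), Some(1), Some(3), Some(1)],
        offsets: vec![0, 3, 3, 3, 5],
        row_valid: vec![true, false, true, true],
    };
    println!("{:?}", array_positions_bulk(&haystack, Some(1)));
}
```

Running it prints `[Some([1, 3]), None, Some([]), Some([2])]`. The per-row work is reduced to scanning precomputed booleans, which is where a vectorized comparison kernel can pay off.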
The same pattern has already been applied to `array_position` (#20532),
and previously to other array UDFs.
## What changes are included in this PR?
- Add benchmarks for `array_positions`.
- Implement the bulk-comparison optimization for scalar needles.
- Refactor `array_position`'s existing fast path slightly for
consistency.
- Clean up the code to use "haystack" and "needle" consistently instead
of vague terms like "list_array" and "element".
- Add unit tests for `array_positions` with sliced ListArrays, for peace
of mind.
- Add unit tests for sliced lists and sliced lists with nulls for the
new `array_positions` fast path.
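The sliced-list tests guard a subtle point: assuming slicing works as in arrow-rs, a sliced `ListArray` keeps its parent's values buffer and takes a window of `len + 1` offsets (and `len` validity bits), so its first offset is usually non-zero. The Rust sketch below models this with hypothetical helpers (`slice_list`, `positions_via_mask`) over plain slices. It checks that positions computed for a slice, using a mask built once over the shared values buffer, match those of the corresponding parent rows, including a null row.

```rust
/// Hypothetical model of slicing a list column: keep the shared values
/// buffer, take a window of `len + 1` offsets and `len` validity bits.
fn slice_list<'a>(
    offsets: &'a [usize],
    row_valid: &'a [bool],
    start: usize,
    len: usize,
) -> (&'a [usize], &'a [bool]) {
    (&offsets[start..start + len + 1], &row_valid[start..start + len])
}

/// 1-based positions per row, read from a mask computed once over the
/// whole shared values buffer; `None` marks a null row.
fn positions_via_mask(mask: &[bool], offsets: &[usize], row_valid: &[bool]) -> Vec<Option<Vec<usize>>> {
    offsets
        .windows(2)
        .zip(row_valid)
        .map(|(w, &valid)| {
            valid.then(|| {
                (w[0]..w[1])
                    .filter(|&k| mask[k])
                    .map(|k| k - w[0] + 1)
                    .collect::<Vec<usize>>()
            })
        })
        .collect()
}

fn main() {
    // Parent rows: [7, 1], [1, 2, 1], NULL, [1]
    let values = [7, 1, 1, 2, 1, 1];
    let offsets: [usize; 5] = [0, 2, 5, 5, 6];
    let row_valid = [true, true, false, true];

    // Bulk comparison against the needle `1` over the entire values buffer.
    let mask: Vec<bool> = values.iter().map(|&v| v == 1).collect();
    let parent = positions_via_mask(&mask, &offsets, &row_valid);

    // Slice rows 1..4: [1, 2, 1], NULL, [1]
    let (s_offsets, s_valid) = slice_list(&offsets, &row_valid, 1, 3);
    assert_eq!(s_offsets[0], 2); // the slice's offsets do not start at 0

    let sliced = positions_via_mask(&mask, s_offsets, s_valid);
    assert_eq!(sliced, parent[1..4].to_vec());
    println!("{:?}", sliced);
}
```

A fast path that instead assumed the first offset is 0, or that compared only the first `offsets.len() - 1` values, would pass on unsliced input and fail on this one, which is why sliced inputs are worth testing explicitly.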
## Are these changes tested?
Yes.
## Are there any user-facing changes?
No.
## AI usage
Multiple AI tools were used to iterate on this PR. I have reviewed and
understand the resulting code.
---------
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
File tree (4 files changed, +460 -130 lines):
- datafusion
  - functions-nested
    - benches
    - src
  - sqllogictest/test_files
- docs/source/user-guide/sql