Commit fc514c2
perf: Optimize set operations to avoid RowConverter deserialization overhead (#20623)
## Which issue does this PR close?
- Closes #20622.
## Rationale for this change
Several array set operations (e.g., `array_distinct`, `array_union`,
`array_intersect`, `array_except`) share a similar structure:
* Convert the input(s) using `RowConverter`, ideally in bulk
* Apply the set operation as appropriate, which involves adding or
removing elements from the candidate set of result `Rows`
* Convert the final set of `Rows` back into `ArrayRef`
We can do better for the final step: instead of converting from `Rows`
back into `ArrayRef`, we can just track which indices in the input(s)
correspond to the values we want to return. We can then grab those
values with a single `take`, which avoids the `Row` -> `ArrayRef`
deserialization overhead. This is a 5-20% performance win, depending on
the set operation and the characteristics of the input.
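The index-tracking idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code from this PR: plain slices stand in for Arrow arrays, the values themselves are hashed instead of `RowConverter`-encoded `Rows`, and `take`, `distinct_indices`, and `array_distinct_sketch` are hypothetical helper names.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical stand-in for a `take` kernel: gathers `values[i]` for each index.
fn take<T: Clone>(values: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| values[i].clone()).collect()
}

/// Returns the index of the first occurrence of each distinct value,
/// in input order. Only indices are recorded; no values are copied here.
fn distinct_indices<T: Hash + Eq>(values: &[T]) -> Vec<usize> {
    let mut seen = HashSet::new();
    let mut keep = Vec::new();
    for (i, v) in values.iter().enumerate() {
        // `insert` returns true only the first time a value is seen.
        if seen.insert(v) {
            keep.push(i);
        }
    }
    keep
}

/// Distinct values, materialized with a single gather at the end.
fn array_distinct_sketch<T: Clone + Hash + Eq>(values: &[T]) -> Vec<T> {
    take(values, &distinct_indices(values))
}
```

In this sketch the set only decides which positions survive; the output is built once from the original input, which mirrors the point above about skipping the `Row` -> `ArrayRef` conversion.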
The only wrinkle is that for `intersect` and `union`, because there are
multiple inputs, we need to concatenate the inputs together so that we
have a single index space. It turns out that this optimization is still
a win, even after paying the `concat` overhead.
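A minimal sketch of the single index space for a union, again using only the standard library. The assumptions are the same as before: slices stand in for Arrow arrays, the values are hashed directly instead of encoded `Rows`, and `take` and `array_union_sketch` are hypothetical names, not functions from this PR.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical stand-in for a `take` kernel: gathers `values[i]` for each index.
fn take<T: Clone>(values: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| values[i].clone()).collect()
}

/// Union of two lists, keeping the first occurrence of each value.
fn array_union_sketch<T: Clone + Hash + Eq>(left: &[T], right: &[T]) -> Vec<T> {
    // Concatenate so both inputs share one index space:
    // `left` occupies 0..left.len(), and `right` follows it.
    let combined: Vec<T> = left.iter().chain(right.iter()).cloned().collect();

    let mut seen = HashSet::new();
    let mut keep = Vec::new();
    for (i, v) in combined.iter().enumerate() {
        // Record the position of each value the first time it appears.
        if seen.insert(v) {
            keep.push(i);
        }
    }

    // One gather over the concatenated input produces the result.
    take(&combined, &keep)
}
```

The concatenation is the extra cost mentioned above: it copies both inputs once so that a single list of indices can refer to values from either side.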
## What changes are included in this PR?
* Add a benchmark for `array_except`
* Implement this optimization for `array_distinct`, `array_union`,
`array_intersect`, `array_except`
## Are these changes tested?
Yes, and benchmarked.
## Are there any user-facing changes?
No.
1 parent daa8f52 commit fc514c2
3 files changed
Lines changed: 106 additions & 37 deletions