Commit 961c5fc
perf: Optimize NULL handling in StringViewArrayBuilder (#21538)
## Which issue does this PR close?
- Closes #21537.
## Rationale for this change
`StringViewArrayBuilder` is implemented on top of Arrow's
`StringViewBuilder`; the latter tracks NULLs incrementally. However, the
`StringViewArrayBuilder` requires callers to pass a NULL buffer to
`finish()` anyway, so the NULL bitmap that has been computed by
`StringViewBuilder` is discarded. It would be more efficient to stop
using `StringViewBuilder` so that we don't do this redundant work; in
theory there might be room for inconsistency between the two NULL
bitmaps as well.
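To make the redundancy concrete, here is a minimal, std-only sketch of a builder that does no NULL tracking of its own and takes validity from the caller at `finish()`. The names `CallerNullsBuilder`, `FinishedArray`, and `append_placeholder` are hypothetical stand-ins, not the actual DataFusion or Arrow types, and a `Vec<bool>` stands in for a real NULL buffer.

```rust
/// Hypothetical result type: values plus caller-supplied validity.
struct FinishedArray {
    values: Vec<String>,
    /// `None` means every row is valid.
    validity: Option<Vec<bool>>,
}

/// Hypothetical builder that leaves NULL tracking entirely to the caller.
struct CallerNullsBuilder {
    values: Vec<String>,
}

impl CallerNullsBuilder {
    fn new() -> Self {
        Self { values: Vec::new() }
    }

    /// Append one row; no validity bitmap is updated here.
    fn append(&mut self, s: &str) {
        self.values.push(s.to_string());
    }

    /// Append an empty value for a row the caller will mark as NULL.
    fn append_placeholder(&mut self) {
        self.values.push(String::new());
    }

    /// The caller's validity is the only source of truth, so there is no
    /// second bitmap that could disagree with it.
    fn finish(self, validity: Option<Vec<bool>>) -> Result<FinishedArray, String> {
        if let Some(v) = &validity {
            if v.len() != self.values.len() {
                return Err(format!(
                    "validity has {} entries but {} rows were appended",
                    v.len(),
                    self.values.len()
                ));
            }
        }
        Ok(FinishedArray { values: self.values, validity })
    }
}
```

With this shape, the per-row cost is a single push, and the only validity information that exists is the one the caller passes in.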
Right now, `StringViewArrayBuilder` is only used by the `concat` and
`concat_ws` UDFs, but I'd like to generalize the API and use it more
broadly in place of `StringViewBuilder` (#21539). For the time being,
here are the results of this PR on the `concat` benchmarks (Arm64):
```
- 1024 rows: 29.6 µs → 28.0 µs, -5.3%
- 4096 rows: 134.3 µs → 125.6 µs, -6.5%
- 8192 rows: 289.7 µs → 273.5 µs, -5.6%
```
## What changes are included in this PR?
* Stop using `StringViewBuilder` and build the views ourselves
* Improve some comments
* Return an error instead of panicking on large input strings
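As a rough illustration of what "building the views ourselves" involves, the sketch below encodes a 16-byte view following the Arrow string-view layout: a 4-byte length, then either up to 12 inline bytes or a 4-byte prefix, buffer index, and offset. The function names `checked_len` and `make_view` and the `String` error type are assumptions for this example, not the code in this PR, and the exact length limit the PR enforces may differ from the `i32::MAX` assumed here.

```rust
/// Convert a byte length to the 32-bit length stored in a view, returning an
/// error (rather than panicking) when the input is too large.
/// Assumption: the limit is i32::MAX, matching a signed 32-bit length.
fn checked_len(len: usize) -> Result<u32, String> {
    i32::try_from(len)
        .map(|n| n as u32)
        .map_err(|_| format!("string of {len} bytes is too long for a string view"))
}

/// Encode one view as a little-endian u128.
/// `buffer_index` and `offset` are only used for strings longer than 12 bytes.
fn make_view(data: &[u8], buffer_index: u32, offset: u32) -> Result<u128, String> {
    let len = checked_len(data.len())?;
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&len.to_le_bytes());
    if data.len() <= 12 {
        // Short string: stored inline, zero-padded to 12 bytes.
        bytes[4..4 + data.len()].copy_from_slice(data);
    } else {
        // Long string: 4-byte prefix plus a reference into a data buffer.
        bytes[4..8].copy_from_slice(&data[..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    Ok(u128::from_le_bytes(bytes))
}
```

A caller would append long strings to its own data buffer, record the offset, and push the returned view; NULL rows can simply push a zeroed view.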
## Are these changes tested?
Yes.
## Are there any user-facing changes?
No.
---------
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>

1 parent 776b723 commit 961c5fc
3 files changed: 72 additions & 36 deletions