Commit cf57f3e
## Which issue does this PR close?
- Closes #21997 (potentially).
## Rationale for this change
This PR adds two new APIs to `GenericStringArrayBuilder` and
`StringViewArrayBuilder`:
1. `append_with` appends a row whose bytes are produced by invoking a
closure that is passed a `StringWriter`.
2. `append_byte_map` appends a row whose bytes are produced by mapping
each byte of the input through a byte-to-byte closure.
For `StringViewArrayBuilder`, `StringWriter` is an append-only string
writer that automatically switches between writing to a new inline view
(for short strings) and writing to the in-progress data block. For
`GenericStringArrayBuilder`, `StringWriter` just appends to the value
buffer directly.
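To make the inline-versus-block switch concrete, here is a minimal sketch of such a writer. It is not the code in this PR: `ToyViewWriter`, `ToyView`, and `MAX_INLINE` are hypothetical names, and the only fact borrowed from Arrow is that a view can hold up to 12 bytes inline.

```rust
/// Arrow's view layout stores strings of up to 12 bytes inline in the view.
const MAX_INLINE: usize = 12;

/// Where a finished row ended up (a simplified stand-in for a 16-byte view).
#[derive(Debug, PartialEq)]
pub enum ToyView {
    Inline { bytes: [u8; MAX_INLINE], len: usize },
    InBlock { offset: usize, len: usize },
}

/// Hypothetical append-only writer for a single row.
pub struct ToyViewWriter<'a> {
    block: &'a mut Vec<u8>, // in-progress data block shared by all rows
    start: usize,           // block length when this row began
    inline: [u8; MAX_INLINE],
    inline_len: usize,
    spilled: bool,
}

impl<'a> ToyViewWriter<'a> {
    pub fn new(block: &'a mut Vec<u8>) -> Self {
        let start = block.len();
        Self { block, start, inline: [0; MAX_INLINE], inline_len: 0, spilled: false }
    }

    pub fn push_str(&mut self, s: &str) {
        let bytes = s.as_bytes();
        if !self.spilled && self.inline_len + bytes.len() <= MAX_INLINE {
            // Still short enough: keep accumulating in the inline buffer.
            let end = self.inline_len + bytes.len();
            self.inline[self.inline_len..end].copy_from_slice(bytes);
            self.inline_len = end;
            return;
        }
        if !self.spilled {
            // First overflow: move the inline prefix into the data block.
            self.block.extend_from_slice(&self.inline[..self.inline_len]);
            self.spilled = true;
        }
        self.block.extend_from_slice(bytes);
    }

    /// Consumes the writer and reports where the row's bytes live.
    pub fn finish(self) -> ToyView {
        if self.spilled {
            ToyView::InBlock { offset: self.start, len: self.block.len() - self.start }
        } else {
            ToyView::Inline { bytes: self.inline, len: self.inline_len }
        }
    }
}
```

The caller only ever calls `push_str`; whether the row ends up inline or in the block is decided by the total number of bytes written, which is the property the description above relies on.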
(We need two new APIs because `append_byte_map` vectorizes a lot better
than `append_with`, so callers that fit the byte-to-byte map pattern
should prefer it.)
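As a rough illustration of the byte-to-byte pattern, the sketch below uses a toy offset-based builder. `ToyByteMapBuilder` and `replace_ascii_byte` are hypothetical names, and the method signature is an assumption rather than the one added by this PR.

```rust
/// Toy offset-based builder (hypothetical; stands in for a string array builder).
pub struct ToyByteMapBuilder {
    values: Vec<u8>,
    offsets: Vec<usize>,
}

impl ToyByteMapBuilder {
    pub fn new() -> Self {
        Self { values: Vec::new(), offsets: vec![0] }
    }

    /// Appends one row by mapping every input byte through `map`.
    /// The caller must ensure the mapped bytes are still valid UTF-8,
    /// for example by only mapping ASCII bytes to ASCII bytes.
    pub fn append_byte_map<F: Fn(u8) -> u8>(&mut self, input: &[u8], map: F) {
        // The output length equals the input length and each byte is
        // independent of its neighbours, a loop shape that compilers can
        // usually auto-vectorize.
        self.values.extend(input.iter().map(|&b| map(b)));
        self.offsets.push(self.values.len());
    }

    /// Returns the bytes of the given row.
    pub fn value(&self, row: usize) -> &[u8] {
        &self.values[self.offsets[row]..self.offsets[row + 1]]
    }
}

/// Replaces every occurrence of one ASCII byte with another ASCII byte.
pub fn replace_ascii_byte(b: &mut ToyByteMapBuilder, input: &str, from: u8, to: u8) {
    debug_assert!(from.is_ascii() && to.is_ascii());
    b.append_byte_map(input.as_bytes(), |x| if x == from { to } else { x });
}
```

A closure-driven writer cannot promise a fixed output length or independent bytes, which is one plausible reason the two entry points are kept separate.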
Both of these new APIs allow string UDFs to avoid creating an
intermediate data copy in many cases. To illustrate this, this PR adopts
the new APIs in `replace`.
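The following sketch shows how a `replace`-style function could write its output straight into the builder's value buffer instead of first building a temporary `String`. All names (`ToyOffsetBuilder`, `ToyWriter`, `replace_row`) are hypothetical, and treating an empty `from` as "copy the input unchanged" is an assumption made for the example, not a statement about the real function.

```rust
/// Toy offset-based builder (hypothetical; not the DataFusion type).
pub struct ToyOffsetBuilder {
    values: Vec<u8>,
    offsets: Vec<usize>,
}

/// Append-only writer handed to the closure; it borrows the value buffer.
pub struct ToyWriter<'a> {
    buf: &'a mut Vec<u8>,
}

impl ToyWriter<'_> {
    pub fn push_str(&mut self, s: &str) {
        self.buf.extend_from_slice(s.as_bytes());
    }
}

impl ToyOffsetBuilder {
    pub fn new() -> Self {
        Self { values: Vec::new(), offsets: vec![0] }
    }

    /// Appends one row whose bytes are whatever the closure writes.
    pub fn append_with<F: FnOnce(&mut ToyWriter<'_>)>(&mut self, f: F) {
        let mut w = ToyWriter { buf: &mut self.values };
        f(&mut w);
        self.offsets.push(self.values.len());
    }

    /// Returns the given row as a string slice.
    pub fn value(&self, row: usize) -> &str {
        let bytes = &self.values[self.offsets[row]..self.offsets[row + 1]];
        std::str::from_utf8(bytes).expect("writer only accepts &str input")
    }
}

/// Replaces every occurrence of `from` with `to`, writing directly into the builder.
pub fn replace_row(b: &mut ToyOffsetBuilder, input: &str, from: &str, to: &str) {
    b.append_with(|w| {
        if from.is_empty() {
            // Assumption for this sketch: an empty pattern leaves the row unchanged.
            w.push_str(input);
            return;
        }
        let mut last = 0;
        for (idx, m) in input.match_indices(from) {
            w.push_str(&input[last..idx]); // text before the match
            w.push_str(to);                // the replacement
            last = idx + m.len();
        }
        w.push_str(&input[last..]); // trailing text after the final match
    });
}
```

Each piece of the output is copied once, from the input or the replacement directly into the array's buffer, which is the intermediate copy the description says the new APIs avoid.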
Benchmarks (Arm64):
Group 1: ASCII single-byte fast path (StringArray)
- size=1024 str_len=32 nulls=0.0 : 16.27 µs -> 12.83 µs (−21.1%)
- size=1024 str_len=32 nulls=0.2 : 14.23 µs -> 12.10 µs (−15.0%)
- size=1024 str_len=128 nulls=0.0 : 11.28 µs -> 8.21 µs (−27.3%)
- size=1024 str_len=128 nulls=0.2 : 10.37 µs -> 7.79 µs (−24.9%)
- size=4096 str_len=32 nulls=0.0 : 62.48 µs -> 49.50 µs (−20.8%)
- size=4096 str_len=32 nulls=0.2 : 55.74 µs -> 46.66 µs (−16.3%)
- size=4096 str_len=128 nulls=0.0 : 42.26 µs -> 29.06 µs (−31.2%)
- size=4096 str_len=128 nulls=0.2 : 39.17 µs -> 28.52 µs (−27.2%)
Group 2: Multi-byte StringArray — general writer path
- size=1024 str_len=32 nulls=0.0 : 23.58 µs -> 21.75 µs (−7.8%)
- size=1024 str_len=32 nulls=0.2 : 18.92 µs -> 17.41 µs (−8.0%)
- size=1024 str_len=128 nulls=0.0 : 37.56 µs -> 35.33 µs (−5.9%)
- size=1024 str_len=128 nulls=0.2 : 29.62 µs -> 28.71 µs (−3.1%)
- size=4096 str_len=32 nulls=0.0 : 97.15 µs -> 88.92 µs (−8.5%)
- size=4096 str_len=32 nulls=0.2 : 77.03 µs -> 71.43 µs (−7.3%)
- size=4096 str_len=128 nulls=0.0 : 173.66 µs -> 163.68 µs (−5.7%)
- size=4096 str_len=128 nulls=0.2 : 134.98 µs -> 128.56 µs (−4.8%)
Group 3: Multi-byte StringViewArray — general writer path
- size=1024 str_len=32 nulls=0.0 : 24.46 µs -> 22.18 µs (−9.3%)
- size=1024 str_len=32 nulls=0.2 : 20.04 µs -> 17.71 µs (−11.7%)
- size=1024 str_len=128 nulls=0.0 : 36.43 µs -> 35.79 µs (−1.8%)
- size=1024 str_len=128 nulls=0.2 : 29.73 µs -> 28.70 µs (−3.5%)
- size=4096 str_len=32 nulls=0.0 : 99.07 µs -> 89.68 µs (−9.5%)
- size=4096 str_len=32 nulls=0.2 : 84.38 µs -> 72.46 µs (−14.1%)
- size=4096 str_len=128 nulls=0.0 : 169.27 µs -> 164.80 µs (−2.6%, n.s.)
- size=4096 str_len=128 nulls=0.2 : 133.79 µs -> 130.20 µs (−2.7%, n.s.)
Group 4: Empty-from StringArray
- size=1024 str_len=32 : 87.75 µs -> 50.64 µs (−42.3%)
- size=1024 str_len=128 : 313.00 µs -> 187.77 µs (−40.0%)
Group 5: Empty-from StringViewArray
- size=1024 str_len=32 : 87.01 µs -> 50.10 µs (−42.4%)
- size=1024 str_len=128 : 313.99 µs -> 190.17 µs (−39.4%)
## What changes are included in this PR?
* Add `append_byte_map` and `append_with` to both of the bulk-NULL
string builders
* Add unit tests
* Adopt the new APIs in `replace`
## Are these changes tested?
Yes; new tests added.
## Are there any user-facing changes?
No.
1 parent 1af9bd7 commit cf57f3e
3 files changed
Lines changed: 791 additions & 242 deletions