
Commit b357381

chore: Replace stray old-style string builder in substr (#22183)
## Which issue does this PR close?

- N/A

## Rationale for this change

In #21519, we optimized `substr` to use the new bulk-NULL string builders. That PR neglected to convert one corner-case code path (for handling `Utf8` / `LargeUtf8` values larger than 2GB). For consistency (and a small perf win), this PR converts that case as well.

## What changes are included in this PR?

* Swap Arrow's `StringViewBuilder` for the new `StringViewArrayBuilder` in `generic_string_substr_copy`

## Are these changes tested?

Yes, covered by existing tests.

## Are there any user-facing changes?

No.
1 parent 7738f74 commit b357381

1 file changed

Lines changed: 7 additions & 6 deletions


datafusion/functions/src/unicode/substr.rs

@@ -17,11 +17,11 @@
 
 use std::sync::Arc;
 
-use crate::strings::append_view;
+use crate::strings::{StringViewArrayBuilder, append_view};
 use crate::utils::make_scalar_function;
 use arrow::array::{
     Array, ArrayRef, AsArray, GenericStringArray, Int64Array, OffsetSizeTrait,
-    StringArrayType, StringViewArray, StringViewBuilder, make_view,
+    StringArrayType, StringViewArray, make_view,
 };
 use arrow::buffer::{NullBuffer, ScalarBuffer};
 use arrow::datatypes::DataType;
@@ -426,11 +426,12 @@ fn generic_string_substr_copy<T: OffsetSizeTrait>(
         count_array_opt.and_then(|a| a.nulls()),
     ]);
 
-    let mut result_builder = StringViewBuilder::new();
+    let len = string_array.len();
+    let mut result_builder = StringViewArrayBuilder::with_capacity(len);
 
-    for i in 0..string_array.len() {
+    for i in 0..len {
         if nulls.as_ref().is_some_and(|n| n.is_null(i)) {
-            result_builder.append_null();
+            result_builder.append_placeholder();
             continue;
         }
 
@@ -442,7 +443,7 @@ fn generic_string_substr_copy<T: OffsetSizeTrait>(
         result_builder.append_value(&string[byte_start..byte_end]);
     }
 
-    Ok(Arc::new(result_builder.finish()) as ArrayRef)
+    Ok(Arc::new(result_builder.finish(nulls)?) as ArrayRef)
 }
 
 #[cfg(test)]
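
The change above follows a "bulk-NULL" pattern. NULL rows get a placeholder slot while the loop runs, and the precomputed null mask is attached once at `finish`. The following is a minimal, standard-library-only sketch of that pattern. `BulkNullStringBuilder` and `prefix_with_nulls` are hypothetical names used for illustration; they are not DataFusion's `StringViewArrayBuilder` or its internals. Only the call shape (`with_capacity`, `append_placeholder`, `append_value`, a fallible `finish` that takes the nulls) is taken from the diff.

```rust
/// Hypothetical stand-in for a bulk-NULL string builder (an assumption for
/// illustration, not DataFusion's real `StringViewArrayBuilder`).
struct BulkNullStringBuilder {
    values: Vec<String>,
}

impl BulkNullStringBuilder {
    fn with_capacity(capacity: usize) -> Self {
        Self { values: Vec::with_capacity(capacity) }
    }

    /// Append a real value for a non-NULL row.
    fn append_value(&mut self, value: &str) {
        self.values.push(value.to_string());
    }

    /// Reserve a slot for a NULL row. No validity information is tracked here;
    /// the mask passed to `finish` decides which rows are NULL.
    fn append_placeholder(&mut self) {
        self.values.push(String::new());
    }

    /// Attach the precomputed validity mask in one step. Fails if the mask
    /// length does not match the number of appended rows.
    fn finish(self, validity: Option<Vec<bool>>) -> Result<Vec<Option<String>>, String> {
        match validity {
            None => Ok(self.values.into_iter().map(Some).collect()),
            Some(mask) => {
                if mask.len() != self.values.len() {
                    return Err(format!(
                        "validity mask has {} entries but builder has {} rows",
                        mask.len(),
                        self.values.len()
                    ));
                }
                Ok(self
                    .values
                    .into_iter()
                    .zip(mask)
                    .map(|(value, valid)| valid.then_some(value))
                    .collect())
            }
        }
    }
}

/// Hypothetical substr-like kernel: keeps the first `count` characters of each
/// non-NULL input and propagates NULLs through the precomputed mask.
fn prefix_with_nulls(
    inputs: &[Option<&str>],
    count: usize,
) -> Result<Vec<Option<String>>, String> {
    // Compute the validity mask up front, before touching any values.
    let validity: Vec<bool> = inputs.iter().map(|v| v.is_some()).collect();
    let mut builder = BulkNullStringBuilder::with_capacity(inputs.len());
    for input in inputs {
        match input {
            None => builder.append_placeholder(),
            Some(s) => {
                let prefix: String = s.chars().take(count).collect();
                builder.append_value(&prefix);
            }
        }
    }
    builder.finish(Some(validity))
}
```

In this sketch the per-row loop never writes validity information, which is the property the commit message credits for the small performance win. Whether the real builder behaves the same way internally is not shown in the diff.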
