feat: add VariantArrayBuilder::with_shredding#9963
Conversation
afac072 to
f52d0c9
Compare
|
Hi @Shekharrajak, is this purely an ergonomic change? Would be cool to actually build a shredded array in one pass without building it first. |
scovich
left a comment
There was a problem hiding this comment.
One thought: Instead of adding a new with_shredding method, what if we added a fallible shred (or build_shredded) method alongside the current infallible build, that takes the shredding schema as an arg and returns a VariantArray?
| VariantArray::try_new(&inner).expect("valid VariantArray by construction") | ||
| let unshredded = VariantArray::try_new(&inner).expect("valid VariantArray by construction"); | ||
| match shredding_schema { | ||
| Some(as_type) => shred_variant(&unshredded, &as_type).expect("shred_variant failed"), |
There was a problem hiding this comment.
Please no panics in prod code like this. Do we know what could go wrong? Does build need to return Result so we can surface the error properly?
(I expect that the schema could be invalid, for example, which is technically a user error but an easy one to make if you don't fully understand the intricacies of variant shredding)
Interesting! In theory it should be possible with the shredded variant builders that |
1079ca6 to
e571952
Compare
8377e40 to
52655f3
Compare
|
Updated as per review comments. |
scovich
left a comment
There was a problem hiding this comment.
The updated code looks nice and clean. My only question is: If we want to eventually directly build shredded variant, this API won't work (because the builder doesn't know the shredding schema until the build_shredded call, after all the values were already appended). The original approach was more forward looking in that way, but hid a panic behind an infallible signature.
Is it better for build to become fallible so we can do with_shredding_schema and have the option to someday shred directly as we append values? Or better to leave as-is, so build can remain infallible? @alamb @sdf-jkl @klion26 any thoughts?
Ref #8480
Rationale for this change
Adds a builder method that configures VariantArrayBuilder to produce a shredded VariantArray at build() time.
What changes are included in this PR?
Adds build_shredded(self, as_type: &DataType) -> Result<VariantArray, ArrowError> as a fallible alternative to build() that shreds the built array in
Are these changes tested?
unit tests
Are there any user-facing changes?
Yes, usage: