Skip to content

feat: add VariantArrayBuilder::with_shredding#9963

Open
Shekharrajak wants to merge 1 commit into
apache:mainfrom
Shekharrajak:feat/variant-array-builder-shredding
Open

feat: add VariantArrayBuilder::with_shredding#9963
Shekharrajak wants to merge 1 commit into
apache:mainfrom
Shekharrajak:feat/variant-array-builder-shredding

Conversation

@Shekharrajak
Copy link
Copy Markdown

@Shekharrajak Shekharrajak commented May 12, 2026

Ref #8480

Rationale for this change

Adds a builder method that configures VariantArrayBuilder to produce a shredded VariantArray at build() time.

What changes are included in this PR?

Adds build_shredded(self, as_type: &DataType) -> Result<VariantArray, ArrowError> as a fallible alternative to build() that shreds the built array in

Are these changes tested?

unit tests

Are there any user-facing changes?

Yes, usage:

let arr = VariantArrayBuilder::new(3)
    .with_shredding(DataType::Int64)
    .tap(|b| { b.append_variant(Variant::Int64(42)); })
    .build(); // returns shredded VariantArray

@github-actions github-actions Bot added the parquet-variant parquet-variant* crates label May 12, 2026
@Shekharrajak Shekharrajak force-pushed the feat/variant-array-builder-shredding branch from afac072 to f52d0c9 Compare May 12, 2026 05:01
@Shekharrajak
Copy link
Copy Markdown
Author

@alamb @scovich , please have a look.

@sdf-jkl
Copy link
Copy Markdown
Contributor

sdf-jkl commented May 14, 2026

Hi @Shekharrajak, is this purely an ergonomic change? Would be cool to actually build a shredded array in one pass without building it first.

Copy link
Copy Markdown
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thought: Instead of adding a new with_shredding method, what if we added a fallible shred (or build_shredded) method alongside the current infallible build, that takes the shredding schema as an arg and returns a VariantArray?

VariantArray::try_new(&inner).expect("valid VariantArray by construction")
let unshredded = VariantArray::try_new(&inner).expect("valid VariantArray by construction");
match shredding_schema {
Some(as_type) => shred_variant(&unshredded, &as_type).expect("shred_variant failed"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please no panics in prod code like this. Do we know what could go wrong? Does build need to return Result so we can surface the error properly?

(I expect that the schema could be invalid, for example, which is technically a user error but an easy one to make if you don't fully understand the intricacies of variant shredding)

@scovich
Copy link
Copy Markdown
Contributor

scovich commented May 15, 2026

Would be cool to actually build a shredded array in one pass without building it first

Interesting! In theory it should be possible with the shredded variant builders that shred_variant uses, but it would certainly take more than the current 10 LoC.

@Shekharrajak Shekharrajak force-pushed the feat/variant-array-builder-shredding branch 3 times, most recently from 1079ca6 to e571952 Compare May 15, 2026 17:05
@Shekharrajak Shekharrajak force-pushed the feat/variant-array-builder-shredding branch from 8377e40 to 52655f3 Compare May 15, 2026 17:18
@Shekharrajak
Copy link
Copy Markdown
Author

Updated as per review comments.

Copy link
Copy Markdown
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated code looks nice and clean. My only question is: If we want to eventually directly build shredded variant, this API won't work (because the builder doesn't know the shredding schema until the build_shredded call, after all the values were already appended). The original approach was more forward looking in that way, but hid a panic behind an infallible signature.

Is it better for build to become fallible so we can do with_shredding_schema and have the option to someday shred directly as we append values? Or better to leave as-is, so build can remain infallible? @alamb @sdf-jkl @klion26 any thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants