
test: [DO NOT MERGE] test upstream iceberg-rust fix for #3860 #3872

Closed
mbutrovich wants to merge 3 commits into apache:main from mbutrovich:field_id_test

Conversation

@mbutrovich
Contributor

Which issue does this PR close?

Closes #3860.

Rationale for this change

Test apache/iceberg-rust#2307 with the repro test in #3860.

What changes are included in this PR?

How are these changes tested?

  • New test.

@mbutrovich mbutrovich closed this Apr 3, 2026
blackmwk added a commit to apache/iceberg-rust that referenced this pull request Apr 16, 2026
…r schemas with nested types (#2307)

## Which issue does this PR close?

- Closes #2306.
- Downstream issue:
apache/datafusion-comet#3860

## What changes are included in this PR?

`build_fallback_field_id_map` iterated over Parquet leaf columns instead
of top-level fields when building the field ID to column index mapping
for migrated files (no embedded field IDs). When nested types (struct,
list, map) precede a primitive column, they expand into multiple leaves,
causing the mapping to diverge from
`add_fallback_field_ids_to_arrow_schema` which correctly assigns ordinal
IDs to top-level Arrow fields. This made predicates on columns after
nested types resolve to a leaf inside the group, crashing with "Leaf
column `id` in predicates isn't a root column in Parquet schema".

The fix iterates `root_schema().get_fields()` directly, assigning
ordinal IDs only to top-level fields. For non-primitive fields
(struct/list/map), it uses `get_column_root_idx` to advance past their
leaf columns. This mirrors iceberg-java's
`ParquetSchemaUtil.addFallbackIds()`, which iterates
`fileSchema.getFields()` assigning ordinal IDs to top-level fields.
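
The difference between the two mappings can be illustrated with a small, self-contained sketch. This is not the iceberg-rust implementation: the `Field` enum, `leaf_based_map`, `top_level_map`, and `example_schema` are hypothetical stand-ins for the Parquet schema types, and the sketch assumes fallback IDs start at 1 and that only primitive top-level fields receive a column-index entry.

```rust
use std::collections::HashMap;

/// Toy stand-in for a Parquet schema node (hypothetical, not the `parquet` crate's type).
#[allow(dead_code)]
#[derive(Debug)]
enum Field {
    Primitive(&'static str),
    Group(&'static str, Vec<Field>),
}

impl Field {
    /// Number of leaf columns this field expands into.
    fn leaf_count(&self) -> usize {
        match self {
            Field::Primitive(_) => 1,
            Field::Group(_, children) => children.iter().map(Field::leaf_count).sum(),
        }
    }
}

/// Leaf-based mapping (the behaviour described as the bug):
/// every leaf column gets its own ordinal ID.
fn leaf_based_map(root: &[Field]) -> HashMap<i32, usize> {
    let total: usize = root.iter().map(Field::leaf_count).sum();
    (0..total).map(|leaf_idx| (leaf_idx as i32 + 1, leaf_idx)).collect()
}

/// Top-level mapping (the behaviour described as the fix):
/// ordinal IDs follow top-level fields, and groups only advance the leaf index.
fn top_level_map(root: &[Field]) -> HashMap<i32, usize> {
    let mut map = HashMap::new();
    let mut leaf_idx = 0;
    for (ordinal, field) in root.iter().enumerate() {
        let field_id = ordinal as i32 + 1; // assumption: IDs are 1-based
        if let Field::Primitive(_) = field {
            map.insert(field_id, leaf_idx);
        }
        // Skip past every leaf that belongs to this top-level field.
        leaf_idx += field.leaf_count();
    }
    map
}

/// Struct, list, and map columns followed by a primitive `id` column.
fn example_schema() -> Vec<Field> {
    vec![
        Field::Group("person", vec![Field::Primitive("name"), Field::Primitive("age")]),
        Field::Group("tags", vec![Field::Primitive("element")]),
        Field::Group("props", vec![Field::Primitive("key"), Field::Primitive("value")]),
        Field::Primitive("id"),
    ]
}
```

In this toy schema `id` is the fourth top-level field and the sixth leaf (index 5). `top_level_map` resolves field ID 4 to leaf index 5, whereas `leaf_based_map` resolves field ID 4 to leaf index 3, which is `props.key`, a leaf inside a group.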

Also renames "Leave column" to "Leaf column" in error messages.

## Are these changes tested?

- An integration test
(`test_predicate_on_migrated_file_with_nested_types`) writes a Parquet
file without field IDs containing struct, list, and map columns before
an `id` column, then reads with a predicate on `id`. Without the fix, this
reproduces the exact crash. Test data is constructed with `serde_arrow`
for readability.

- [Apache DataFusion Comet](https://github.com/apache/datafusion-comet)
used the repro test in apache/datafusion-comet#3860, and it passes with
this change: apache/datafusion-comet#3872

---------

Co-authored-by: blackmwk <liurenjie1024@outlook.com>

Development

Successfully merging this pull request may close these issues.

bug: native Iceberg reader errors on residual filter on column after nested type for migrated Parquet files