Commit 345cbb4
fix: build_fallback_field_id_map produces incorrect column indices for schemas with nested types (#2307)
## Which issue does this PR close?
- Closes #2306.
- Downstream issue: apache/datafusion-comet#3860
## What changes are included in this PR?
`build_fallback_field_id_map` iterated over Parquet leaf columns instead
of top-level fields when building the field ID to column index mapping
for migrated files (no embedded field IDs). When nested types (struct,
list, map) precede a primitive column, they expand into multiple leaves,
causing the mapping to diverge from
`add_fallback_field_ids_to_arrow_schema`, which correctly assigns ordinal
IDs to top-level Arrow fields. As a result, a predicate on a column that
follows nested types resolved to a leaf inside one of the groups, and the
read failed with "Leaf column `id` in predicates isn't a root column in
Parquet schema".
The fix iterates `root_schema().get_fields()` directly, assigning
ordinal IDs only to top-level fields. For non-primitive fields
(struct/list/map), it uses `get_column_root_idx` to advance past their
leaf columns. This mirrors iceberg-java's
`ParquetSchemaUtil.addFallbackIds()`, which iterates
`fileSchema.getFields()` assigning ordinal IDs to top-level fields.
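The difference between the two iteration strategies can be sketched without the Parquet crate. The snippet below is only an illustration under stated assumptions: `TopLevelField`, `leaf_based_map`, and `top_level_map` are hypothetical names, not the actual iceberg-rust types or functions; fallback IDs are assumed to be 1-based ordinals; and only primitive top-level fields are assumed to receive a column index.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a top-level Parquet field (not the real parquet type).
enum TopLevelField {
    /// A primitive column that occupies exactly one leaf.
    Primitive,
    /// A struct/list/map group that expands into `leaves` leaf columns.
    Group { leaves: usize },
}

/// Pre-fix behaviour (assumed): one ordinal ID per *leaf* column.
fn leaf_based_map(fields: &[TopLevelField]) -> HashMap<i32, usize> {
    let total_leaves: usize = fields
        .iter()
        .map(|f| match f {
            TopLevelField::Primitive => 1,
            TopLevelField::Group { leaves } => *leaves,
        })
        .sum();
    // Field ID n is mapped to leaf n - 1, regardless of nesting.
    (0..total_leaves).map(|leaf| (leaf as i32 + 1, leaf)).collect()
}

/// Post-fix behaviour (sketch): one ordinal ID per *top-level* field.
/// Groups consume an ID and advance the leaf cursor but get no entry.
fn top_level_map(fields: &[TopLevelField]) -> HashMap<i32, usize> {
    let mut map = HashMap::new();
    let mut leaf_idx = 0usize;
    for (ordinal, field) in fields.iter().enumerate() {
        let field_id = ordinal as i32 + 1; // assumed 1-based fallback IDs
        match field {
            TopLevelField::Primitive => {
                map.insert(field_id, leaf_idx);
                leaf_idx += 1;
            }
            // Skip past every leaf that belongs to this nested field.
            TopLevelField::Group { leaves } => leaf_idx += *leaves,
        }
    }
    map
}
```

For a schema laid out as a struct with two leaves, a list with one leaf, a map with two leaves, and then `id`, the top-level ordinal of `id` is 4. The leaf-based mapping sends field ID 4 to leaf 3, which sits inside the map, while the top-level mapping sends it to leaf 5, the actual `id` column.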
Also renames "Leave column" to "Leaf column" in error messages.
## Are these changes tested?
- An integration test
(`test_predicate_on_migrated_file_with_nested_types`) writes a Parquet
file without field IDs containing struct, list, and map columns before
an `id` column, then reads with a predicate on `id`. This reproduces the
exact crash before the fix. Test data is constructed with `serde_arrow`
for readability.
- [Apache DataFusion Comet](https://github.com/apache/datafusion-comet)
used the repro test in apache/datafusion-comet#3860, and it passes with
this change: apache/datafusion-comet#3872
---------
Co-authored-by: blackmwk <liurenjie1024@outlook.com>
(cherry picked from commit 5ea6f4c)
1 parent 97db3b4 commit 345cbb4
3 files changed
Lines changed: 277 additions & 8 deletions