Background
Part of the Hudi 1.x upgrade (#762).
In Hudi 1.2.0 the metadata-table (MDT) partition-stats index is coupled to the column-stats index. For partitioned tables, the partition-stats generation path rebuilds a file-system view over the committed external parquet files and groups them by fileId. XTable registers externally-written files whose names are not Hudi-native; once the _hudiext marker is stripped, the remaining file name cannot be parsed into a valid fileId, which causes column/partition-stats generation to fail.
Current workaround
In HudiConversionTarget#getWriteConfig, column stats are enabled only for un-partitioned tables:
.withMetadataIndexColumnStats(!metaClient.getTableConfig().isTablePartitioned())
So partitioned tables synced through XTable currently get no MDT column-stats index.
Ask
Enable the MDT column-stats (and partition-stats) index for partitioned tables, for example by making the partition-stats file grouping tolerate externally registered (non-Hudi) file names, so that column stats can be turned on regardless of partitioning.
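To illustrate what a tolerant grouping could look like, here is a minimal, self-contained sketch. It does not use any Hudi API: the class `TolerantFileGrouping`, its methods, the sample file names, and the assumed native layout `<fileId>_<writeToken>_<instantTime>.<ext>` are hypothetical and only illustrate the idea of falling back to the stripped file name as the grouping key when a file carries the `_hudiext` marker.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hudi or XTable.
class TolerantFileGrouping {
    // Marker named in this issue; its exact position in the file name is not assumed here.
    static final String EXTERNAL_MARKER = "_hudiext";

    // Returns the key used to group a file: a parsed fileId for native names,
    // or the marker-stripped name itself for externally registered files.
    static String groupingKey(String fileName) {
        if (fileName.contains(EXTERNAL_MARKER)) {
            // External file: do not attempt to parse a fileId out of the name.
            return fileName.replace(EXTERNAL_MARKER, "");
        }
        // Assumed native layout: <fileId>_<writeToken>_<instantTime>.<ext>
        int firstUnderscore = fileName.indexOf('_');
        return firstUnderscore > 0 ? fileName.substring(0, firstUnderscore) : fileName;
    }

    // Groups file names by their grouping key, preserving encounter order.
    static Map<String, List<String>> groupByFileId(List<String> fileNames) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String name : fileNames) {
            groups.computeIfAbsent(groupingKey(name), k -> new ArrayList<>()).add(name);
        }
        return groups;
    }
}
```

With this fallback, each externally registered file forms its own single-file group instead of failing the parse, while native files continue to group by their parsed fileId. Whether a one-file-per-group fallback is acceptable for partition-stats aggregation would need to be confirmed against the actual Hudi code path.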