feat: Add iceberg.schema to footer for compatibility #2249
vovacf201 wants to merge 4 commits into apache:main
* mock change
* remove fmt
* add unit tests
* fix tests
* format
* commit
/// schema ID) into the Parquet file footer.
fn add_iceberg_schema_metadata(props: WriterProperties, schema: &Schema) -> WriterProperties {
    let schema_json = serde_json::to_string(schema)
        .expect("Iceberg schema serialization to JSON should not fail");
Can we propagate this error up rather than having it panic?
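One way to read this suggestion is sketched below, as an illustration only and not the PR's actual code. `FooterError` and `schema_footer_entry` are hypothetical names, and the serializer is passed in as a closure standing in for the `serde_json::to_string(schema)` call so that the example depends only on the standard library. The point is that the function returns a `Result` and uses `?` where the snippet above uses `.expect(...)`.

```rust
use std::fmt;

/// Metadata key named in the PR description.
pub const ICEBERG_SCHEMA_KEY: &str = "iceberg.schema";

/// Hypothetical error type standing in for the crate's real error type.
#[derive(Debug, PartialEq)]
pub enum FooterError {
    SchemaSerialization(String),
}

impl fmt::Display for FooterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FooterError::SchemaSerialization(msg) => {
                write!(f, "failed to serialize Iceberg schema to JSON: {}", msg)
            }
        }
    }
}

impl std::error::Error for FooterError {}

/// Builds the (key, value) pair for the footer. `serialize` is a stand-in
/// for the real serializer; any failure is returned to the caller
/// instead of panicking.
pub fn schema_footer_entry<F, E>(serialize: F) -> Result<(String, String), FooterError>
where
    F: FnOnce() -> Result<String, E>,
    E: fmt::Display,
{
    let schema_json =
        serialize().map_err(|e| FooterError::SchemaSerialization(e.to_string()))?;
    Ok((ICEBERG_SCHEMA_KEY.to_string(), schema_json))
}
```

With this shape, the caller decides whether a serialization failure aborts the write, at the cost of changing the function's return type to a `Result`.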
// Preserve any existing key-value metadata from the caller
let mut kv_metadata = props.key_value_metadata().cloned().unwrap_or_default();
kv_metadata.push(iceberg_kv);
If a caller has already set ICEBERG_SCHEMA_KEY then we'll add it twice here. Java uses a map so I think it's not an issue there but I feel like we're better off reporting an error in this case?
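The duplicate-key concern can be illustrated with a small sketch. It is an assumption about one possible fix, not what the PR implements. The `KeyValue` struct here is a minimal stand-in for the Parquet metadata entry, and `push_iceberg_schema` is a hypothetical helper; both exist only so the example compiles without external crates.

```rust
/// Metadata key named in the PR description.
pub const ICEBERG_SCHEMA_KEY: &str = "iceberg.schema";

/// Minimal stand-in for a Parquet key-value metadata entry (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct KeyValue {
    pub key: String,
    pub value: Option<String>,
}

/// Appends the schema entry to the caller's metadata, but reports an error
/// if the caller already supplied an entry under the same key.
pub fn push_iceberg_schema(
    mut kv_metadata: Vec<KeyValue>,
    schema_json: String,
) -> Result<Vec<KeyValue>, String> {
    if kv_metadata.iter().any(|kv| kv.key == ICEBERG_SCHEMA_KEY) {
        return Err(format!(
            "key-value metadata already contains `{}`",
            ICEBERG_SCHEMA_KEY
        ));
    }
    kv_metadata.push(KeyValue {
        key: ICEBERG_SCHEMA_KEY.to_string(),
        value: Some(schema_json),
    });
    Ok(kv_metadata)
}
```

An alternative would be to overwrite the existing entry, which is closer to the map semantics the reviewer attributes to Java; returning an error makes the conflict visible to the caller instead.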
let iceberg_kv = KeyValue::new(ICEBERG_SCHEMA_KEY.to_string(), schema_json);

// Preserve any existing key-value metadata from the caller
let mut kv_metadata = props.key_value_metadata().cloned().unwrap_or_default();
Would we be better off propagating this error upstream as well, rather than silently ignoring it?
Thanks for this, looks like a good change to me. Left a few comments.
Also confirmed this matches Java here https://github.com/apache/iceberg/blob/main/parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java#L363
I realized we don't write metadata for delete files either, so I put up #2316.
This PR adds the iceberg.schema key-value metadata to the Parquet file footer. It embeds the full Iceberg schema JSON (including field IDs, types, and schema ID) into the Parquet file footer metadata, which is required for compatibility with downstream readers.

Changes:
* ICEBERG_SCHEMA_KEY constant for the iceberg.schema metadata key
* add_iceberg_schema_metadata method to ParquetWriterBuilder that injects the schema JSON into writer properties

Cherry-picked from risingwavelabs/iceberg-rust commit 18a0e83