GH-563: Make ColumnMetaData.path_in_schema optional #564

Open
etseidl wants to merge 3 commits into apache:master from etseidl:deprecate_path_in_schema

Conversation

Contributor

@etseidl etseidl commented Apr 8, 2026

Rationale for this change

What changes are included in this PR?

Change path_in_schema to optional.

Do these changes have PoC implementations?

Yes.

Closes #563

Contributor Author

etseidl commented Apr 8, 2026

I hope to have a Java PoC available soon.

Contributor Author

etseidl commented Apr 8, 2026

Java PoC: apache/parquet-java#3470

I've so far confirmed that parquet-cli cat from the Java PoC can read a file lacking path_in_schema generated by arrow-rs.
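
For readers following along, here is a minimal sketch of how a reader could recover a column's path when path_in_schema is absent. It is not taken from the Java, Rust, or C++ PoCs. The `SchemaElement` class, `leaf_paths`, and `resolve_path` are hypothetical names used for illustration, and the sketch assumes the column chunks of a row group are listed in the same order as the leaf columns of the flattened, depth-first schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SchemaElement:
    """Simplified stand-in for a Parquet SchemaElement (hypothetical, not a real binding)."""
    name: str
    num_children: int = 0  # 0 means a leaf (primitive) column


def leaf_paths(schema: List[SchemaElement]) -> List[List[str]]:
    """Return the path of every leaf column in depth-first schema order.

    schema[0] is the root element; its name is not part of any path.
    """
    paths: List[List[str]] = []
    pos = 1  # skip the root element

    def walk(prefix: List[str], count: int) -> None:
        nonlocal pos
        for _ in range(count):
            elem = schema[pos]
            pos += 1
            path = prefix + [elem.name]
            if elem.num_children == 0:
                paths.append(path)  # leaf: record its full path
            else:
                walk(path, elem.num_children)  # group: descend into children

    walk([], schema[0].num_children)
    return paths


def resolve_path(stored: Optional[List[str]], column_index: int,
                 schema: List[SchemaElement]) -> List[str]:
    """Use the stored path if the writer provided one, else derive it from the schema."""
    if stored is not None:
        return stored
    return leaf_paths(schema)[column_index]


# Example schema: root { id, address { city, zip } }
schema = [
    SchemaElement("schema", 2),
    SchemaElement("id"),
    SchemaElement("address", 2),
    SchemaElement("city"),
    SchemaElement("zip"),
]
```

With this schema, `resolve_path(None, 2, schema)` falls back to the schema and yields the path for `address.zip`, while a stored path is returned unchanged.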

@etseidl etseidl marked this pull request as ready for review April 9, 2026 17:53
Contributor

alamb commented Apr 10, 2026

I think it is a great idea -- though before merging this I think we should do a formal approval on the mailing list

Contributor

@alamb alamb left a comment

Thank you for driving this along @etseidl

Comment thread src/main/thrift/parquet.thrift Outdated
Contributor Author

etseidl commented Apr 10, 2026

> I think it is a great idea -- though before merging this I think we should do a formal approval on the mailing list

For sure! 👍 I just wanted to put up a concrete proposal to drive the discussion.

Also, FWIW, I've started on an arrow-cpp PoC. We'll see how far I get 😅

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Contributor Author

etseidl commented Apr 10, 2026

C++ PoC: apache/arrow#49707

Comment thread src/main/thrift/parquet.thrift Outdated
* the schema, and redundantly storing it here can lead to unnecessary
* bloat in the footer. Writers are encouraged to make the writing of
* this field optional, but for maximal compatibility should default to
* writing the field until at least Month 202X.
Contributor

Based on "Forward incompatible features/changes should not be turned on by default until 2 years after the parquet-java implementation containing the feature is released." Let's maybe fill in the date as September 2028, assuming we get things merged by a September Java release?

Contributor Author

I've gone ahead and put Sept 2028 in the text for now. We can update as needed later.
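
To illustrate the writer-side guidance in the comment text under review (make writing the field optional, but default to writing it), here is a rough sketch. The `WriterOptions` class, the `write_path_in_schema` option, and `build_column_metadata` are hypothetical names, not the API of any Parquet implementation, and a plain dict stands in for the Thrift ColumnMetaData struct.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class WriterOptions:
    # Hypothetical option; defaults to True so files stay readable by older readers.
    write_path_in_schema: bool = True


def build_column_metadata(path: List[str], codec: str,
                          options: WriterOptions) -> Dict[str, Any]:
    """Build a dict standing in for ColumnMetaData, omitting the path when disabled."""
    meta: Dict[str, Any] = {"codec": codec}
    if options.write_path_in_schema:
        meta["path_in_schema"] = list(path)
    return meta
```

A writer that opts out would pass `WriterOptions(write_path_in_schema=False)`, and the resulting metadata would simply lack the `path_in_schema` key.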

Development

Successfully merging this pull request may close these issues.

Make path_in_schema optional

3 participants