Fix: order parquet schema + create metadata collector improving greatly speed of dataset query #297

Open
lbesnard wants to merge 5 commits into main from ParquetBreakingChange


@lbesnard lbesnard commented Jun 4, 2026

Collaborator

No description provided.

@lbesnard lbesnard requested a review from thommodin June 4, 2026 10:11

lbesnard commented Jun 4, 2026

Collaborator Author

@thommodin would love your opinion on this. Basically, there was no _metadata file. Two years ago, the online docs were pretty shit regarding metadata_collector: its purpose, and how to use it when writing in parallel with many workers.
I ended up avoiding it.

But I'm pretty sure this will make queries way faster. I ran some tests on samples, and will try on Argo tomorrow.

Basically though, it would mean reprocessing all existing parquet datasets, because their schema would not always be ordered the same way across all "chunks".

