Revert "Implement write.parquet.row-group-size-bytes in the pyarrow w… by stephrb · Pull Request #13 · imc-trading/iceberg-python

stephrb · 2026-06-03T17:21:39Z

Intended to merge into develop instead of main.

The pyiceberg writer has historically ignored write.parquet.row-group-size-bytes (logging 'not implemented') and used only write.parquet.row-group-limit (a row count). For wide tables, that means a single row group can reach gigabytes (e.g. 337 cols × 1,048,576 default rows ≈ 1.7 GiB uncompressed per row group), which drives the polars / pyarrow reader's decode peak into the tens of GiB on production reads. Now write_file resolves row_group_size as min(row_group_limit, row_group_size_bytes / bytes_per_row), where bytes_per_row is approximated from the in-memory arrow_table's nbytes. This matches the Spark / parquet-mr 'whichever limit fires first' semantics and lets the existing PARQUET_ROW_GROUP_SIZE_BYTES_DEFAULT (128 MiB) actually take effect.
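As a rough illustration of the min(row_group_limit, row_group_size_bytes / bytes_per_row) rule described above, here is a minimal sketch. It is not the code from the reverted commit: the function name `resolve_row_group_size`, its parameters, and the zero-size guard are hypothetical, and it takes the table's byte size and row count as plain integers rather than reading them from an Arrow table. The defaults mirror the 1,048,576-row and 128 MiB figures quoted in the description.

```python
def resolve_row_group_size(
    table_nbytes: int,
    num_rows: int,
    row_group_limit: int = 1_048_576,               # row cap quoted in the description
    row_group_size_bytes: int = 128 * 1024 * 1024,  # 128 MiB byte target
) -> int:
    """Hypothetical helper: pick a row-group row count, whichever limit fires first."""
    # Assumption: with nothing to measure, fall back to the row limit alone.
    if num_rows <= 0 or table_nbytes <= 0:
        return row_group_limit
    # rows_by_bytes = row_group_size_bytes / bytes_per_row, where
    # bytes_per_row = table_nbytes / num_rows. Integer arithmetic avoids float rounding.
    rows_by_bytes = max(1, row_group_size_bytes * num_rows // table_nbytes)
    return min(row_group_limit, rows_by_bytes)


# Wide table: 337 columns at an assumed 8 bytes each, sampled over 1,000 rows.
wide = resolve_row_group_size(table_nbytes=337 * 8 * 1000, num_rows=1000)
# Narrow table: a single 8-byte column, so the row limit fires first.
narrow = resolve_row_group_size(table_nbytes=8 * 1000, num_rows=1000)
print(wide, narrow)  # 49784 1048576
```

Under these assumptions the wide table is capped by the byte target at roughly 50 thousand rows per group, while the narrow table still hits the row limit. Because the in-memory size is only an approximation of the encoded size, the resulting row groups will not land exactly on the byte target.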

Stephen Buck and others added 3 commits June 3, 2026 10:00

Merge pull request #12 from imc-trading/sbuck/implement-row-group-siz…

852f22b

…e-bytes Implement write.parquet.row-group-size-bytes in the pyarrow writer

Revert "Implement write.parquet.row-group-size-bytes in the pyarrow w…

3a1e3d4

…riter"

stephrb force-pushed the main branch from 852f22b to c84017d Compare June 3, 2026 17:37

stephrb closed this Jun 3, 2026

stephrb wants to merge 3 commits into main from revert-12-sbuck/implement-row-group-size-bytes
