mkdocs/docs/api.md
```python
for buf in tbl.scan().to_arrow_batch_reader():
    print(f"Buffer contains {len(buf)} rows")
```

### Streaming writes from a `RecordBatchReader`

`tbl.append()` and `tbl.overwrite()` also accept a `pyarrow.RecordBatchReader` directly, which lets you write datasets that don't fit in memory without materialising them as a `pa.Table` first. PyIceberg consumes the reader once and microbatches it into Parquet files of approximately `write.target-file-size-bytes` (default 512 MiB), keeping memory usage bounded by the target size. All files are committed in a single snapshot.

Streaming writes are currently supported only on **unpartitioned** tables. For a partitioned table, materialise the reader as a `pa.Table` first, or follow [#2152](https://github.com/apache/iceberg-python/issues/2152), which tracks partitioned support as a follow-up.

To avoid any type inconsistencies during writing, you can convert the Iceberg table schema to Arrow:

```python
# Convert the Iceberg table schema into a pyarrow.Schema
arrow_schema = tbl.schema().as_arrow()
```

You can overwrite the record of `Paris` with a record of `New York`:

```python
import pyarrow as pa
from pyiceberg.expressions import EqualTo

df = pa.Table.from_pylist(
    [
        {"city": "New York", "lat": 40.7128, "long": -74.0060},
    ]
)
# Delete the rows matching the filter (city == "Paris"), then append the rows in `df`
tbl.overwrite(df, overwrite_filter=EqualTo("city", "Paris"))
```

If the PyIceberg table is partitioned, you can use `tbl.dynamic_partition_overwrite(df)` to replace the existing partitions with the new ones provided in the dataframe. The partitions to be replaced are detected automatically from the provided Arrow table.