Commit 09aad7a

sumedhsakdeo and claude committed
docs: add configuration guidance table to streaming API docs
Add a "which config should I use?" tip box with recommended starting points for common use cases, and clarify that batch_size is an advanced tuning knob.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 2efdcba commit 09aad7a

File tree

1 file changed (+11, -0 lines)


mkdocs/docs/api.md

Lines changed: 11 additions & 0 deletions
````diff
@@ -389,6 +389,17 @@ for buf in tbl.scan().to_arrow_batch_reader(order=ScanOrder.ARRIVAL, concurrent_
 
 Within each file, batch ordering always follows row order. The `limit` parameter is enforced correctly regardless of configuration.
 
+!!! tip "Which configuration should I use?"
+
+    | Use case | Recommended config |
+    |---|---|
+    | Small tables, simple queries | Default — no extra args needed |
+    | Large tables, memory-constrained | `streaming=True` — one file at a time, minimal memory |
+    | Maximum throughput with bounded memory | `streaming=True, concurrent_files=N` — tune N to balance throughput vs memory |
+    | Fine-grained batch control | Add `batch_size=N` to any of the above |
+
+    **Note:** `streaming=True` yields batches in arrival order (interleaved across files when `concurrent_files > 1`). For deterministic file ordering, use the default non-streaming mode. `batch_size` is usually an advanced tuning knob — the PyArrow default of 131,072 rows works well for most workloads.
+
 To avoid any type inconsistencies during writing, you can convert the Iceberg table schema to Arrow:
 
 ```python
````
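The arrival-order behavior described in the added note can be illustrated with a toy model. This is a hedged sketch in pure Python, not the library's implementation: the function names (`file_order`, `arrival_order`, `file_batches`) are hypothetical, and real arrival order depends on I/O timing rather than the round-robin used here. It shows why batches interleave across files when more than one file is in flight, while the default mode preserves file order.

```python
from itertools import chain

def file_batches(file_id, n_batches):
    # Each "file" yields its batches in row order.
    return [f"file{file_id}-batch{b}" for b in range(n_batches)]

def file_order(files):
    # Toy model of the default (non-streaming) mode: drain each file
    # completely before starting the next, so file order is deterministic.
    return list(chain.from_iterable(files))

def arrival_order(files, concurrent_files):
    # Toy model of streaming mode: up to `concurrent_files` files are in
    # flight at once, and their batches interleave as they "arrive"
    # (round-robin here; real arrival order depends on I/O timing).
    out = []
    in_flight = [iter(f) for f in files[:concurrent_files]]
    pending = list(files[concurrent_files:])
    while in_flight:
        for it in list(in_flight):
            try:
                out.append(next(it))
            except StopIteration:
                # A file finished; start the next pending one, if any.
                in_flight.remove(it)
                if pending:
                    in_flight.append(iter(pending.pop(0)))
    return out

files = [file_batches(i, 2) for i in range(3)]
# Deterministic file order: file0's batches, then file1's, then file2's.
print(file_order(files))
# With concurrent_files=2, batches from file0 and file1 interleave.
print(arrival_order(files, concurrent_files=2))
```

Note that within each simulated file, batch order still follows row order, matching the guarantee stated in the docs; only the ordering *across* files changes between the two modes.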

0 commit comments
