Commit 039e91b
sumedhsakdeo and claude committed
fix: validate ArrivalOrder params and clarify ordering docs
- Add __post_init__ to ArrivalOrder raising ValueError if concurrent_streams < 1 or max_buffered_batches < 1. Previously max_buffered_batches=0 would silently create an unbounded queue.
- Split the ArrivalOrder row in the ordering semantics table to clarify that interleaving only occurs with concurrent_streams > 1; concurrent_streams=1 reads files sequentially.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
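The unbounded-queue hazard described in the message matches how Python's standard `queue.Queue` treats a non-positive `maxsize`. The following is a minimal sketch, assuming the batch buffer behaves like `queue.Queue`; the commit does not show the buffer implementation, so that is an assumption.

```python
import queue

# Assumption: the batch buffer behaves like the standard library's queue.Queue.
# With maxsize <= 0, queue.Queue places no upper bound on the number of items.
unbounded = queue.Queue(maxsize=0)
for i in range(1000):
    unbounded.put_nowait(i)  # never raises queue.Full

# A positive maxsize caps the buffer and applies backpressure to producers.
bounded = queue.Queue(maxsize=2)
bounded.put_nowait("batch-0")
bounded.put_nowait("batch-1")
try:
    bounded.put_nowait("batch-2")
    overflowed = False
except queue.Full:
    overflowed = True
```

Rejecting `max_buffered_batches=0` up front turns this silent behavior into an explicit error at construction time.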
1 parent a882dd2 commit 039e91b

2 files changed, +9 -1 lines changed

mkdocs/docs/api.md

Lines changed: 2 additions & 1 deletion
@@ -378,7 +378,8 @@ for buf in tbl.scan().to_arrow_batch_reader(order=ArrivalOrder(concurrent_stream
 | Configuration | File ordering | Within-file ordering |
 |---|---|---|
 | `TaskOrder()` (default) | Batches grouped by file, in task submission order | Row order |
-| `ArrivalOrder()` | Interleaved across files (no grouping guarantee) | Row order within each file |
+| `ArrivalOrder(concurrent_streams=1)` | Sequential, one file at a time | Row order |
+| `ArrivalOrder(concurrent_streams>1)` | Interleaved across files (no grouping guarantee) | Row order within each file |

 The `limit` parameter is enforced correctly regardless of configuration.
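
The ordering semantics in the table can be illustrated with a small simulation. This is a sketch only: `read_in_arrival_order` and `_DONE` are hypothetical names, the thread pool and bounded queue are assumptions about the general approach, and none of it is pyiceberg's actual implementation.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of one file's batches


def read_in_arrival_order(files, concurrent_streams=1, max_buffered_batches=16):
    """Yield (file_name, batch) pairs in the order producers deliver them.

    `files` maps a file name to its list of batches. Hypothetical helper;
    the generator is meant to be consumed to completion.
    """
    buffer = queue.Queue(maxsize=max_buffered_batches)

    def produce(name, batches):
        for batch in batches:
            buffer.put((name, batch))  # blocks while the buffer is full
        buffer.put(_DONE)

    with ThreadPoolExecutor(max_workers=concurrent_streams) as pool:
        for name, batches in files.items():
            pool.submit(produce, name, batches)
        remaining = len(files)
        while remaining:
            item = buffer.get()
            if item is _DONE:
                remaining -= 1
            else:
                yield item


files = {"a": [0, 1, 2], "b": [0, 1, 2], "c": [0, 1, 2]}
sequential = list(read_in_arrival_order(files, concurrent_streams=1))
interleaved = list(read_in_arrival_order(files, concurrent_streams=3, max_buffered_batches=2))
```

With a single stream the one worker drains files in submission order, so batches stay grouped by file. With several streams the cross-file order depends on thread scheduling, but each file's batches still arrive in row order.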

pyiceberg/table/__init__.py

Lines changed: 7 additions & 0 deletions
@@ -193,6 +193,13 @@ class ArrivalOrder(ScanOrder):
     batch_size: int | None = None
     max_buffered_batches: int = 16
 
+    def __post_init__(self) -> None:
+        """Validate ArrivalOrder parameters."""
+        if self.concurrent_streams < 1:
+            raise ValueError(f"concurrent_streams must be >= 1, got {self.concurrent_streams}")
+        if self.max_buffered_batches < 1:
+            raise ValueError(f"max_buffered_batches must be >= 1, got {self.max_buffered_batches}")
+
 
 @dataclass()
 class UpsertResult:
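
The effect of the new `__post_init__` check can be tried without pyiceberg using a stand-in dataclass. `ArrivalOrderSketch` and `try_build` are hypothetical names, and the default for `concurrent_streams` is an assumption because the diff does not show it; only the two validation checks mirror the change above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ArrivalOrderSketch:
    """Hypothetical stand-in for ArrivalOrder; not the pyiceberg class."""

    concurrent_streams: int = 1  # assumed default, not shown in the diff
    batch_size: int | None = None
    max_buffered_batches: int = 16

    def __post_init__(self) -> None:
        # Runs right after the generated __init__, so invalid values
        # are rejected before the object is ever used.
        if self.concurrent_streams < 1:
            raise ValueError(f"concurrent_streams must be >= 1, got {self.concurrent_streams}")
        if self.max_buffered_batches < 1:
            raise ValueError(f"max_buffered_batches must be >= 1, got {self.max_buffered_batches}")


def try_build(**kwargs) -> str:
    """Return 'ok' or the validation message, for demonstration only."""
    try:
        ArrivalOrderSketch(**kwargs)
        return "ok"
    except ValueError as exc:
        return str(exc)
```

Calling `try_build(max_buffered_batches=0)` returns the error message instead of constructing an object, which is the case that previously went unnoticed.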