Remove @pytest.mark.benchmark so the read throughput tests are included
in the default `make test` filter as parametrize-marked tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
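
For context on the commit above: `@pytest.mark.benchmark` is the marker convention used by pytest-benchmark, and default test targets commonly deselect it with a `-m` expression. A minimal sketch of such a setup — the marker registration and the `make test` invocation shown here are illustrative assumptions, not this repository's actual files:

```ini
# pytest.ini — register the marker so runs with --strict-markers stay clean
[pytest]
markers =
    benchmark: read-throughput benchmark tests, excluded from the default filter
```

A `make test` target that excludes benchmarks would then invoke something like `pytest -m "not benchmark"`; dropping the `@pytest.mark.benchmark` decorator from a parametrized test makes it match the default filter again, which is what this commit does.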
`mkdocs/docs/api.md` (8 additions, 8 deletions)
```diff
@@ -389,16 +389,16 @@ for buf in tbl.scan().to_arrow_batch_reader(order=ScanOrder.ARRIVAL, concurrent_
 
 Within each file, batch ordering always follows row order. The `limit` parameter is enforced correctly regardless of configuration.
 
-!!! tip "Which configuration should I use?"
+**Which configuration should I use?**
 
-    | Use case | Recommended config |
-    |---|---|
-    | Small tables, simple queries | Default — no extra args needed |
-    | Large tables, memory-constrained | `streaming=True` — one file at a time, minimal memory |
-    | Maximum throughput with bounded memory | `streaming=True, concurrent_files=N` — tune N to balance throughput vs memory |
-    | Fine-grained batch control | Add `batch_size=N` to any of the above |
+| Use case | Recommended config |
+|---|---|
+| Small tables, simple queries | Default — no extra args needed |
+| Large tables, memory-constrained | `streaming=True` — one file at a time, minimal memory |
+| Maximum throughput with bounded memory | `streaming=True, concurrent_files=N` — tune N to balance throughput vs memory |
+| Fine-grained batch control | Add `batch_size=N` to any of the above |
 
-    **Note:** `streaming=True` yields batches in arrival order (interleaved across files when `concurrent_files > 1`). For deterministic file ordering, use the default non-streaming mode. `batch_size` is usually an advanced tuning knob — the PyArrow default of 131,072 rows works well for most workloads.
+**Note:** `streaming=True` yields batches in arrival order (interleaved across files when `concurrent_files > 1`). For deterministic file ordering, use the default non-streaming mode. `batch_size` is usually an advanced tuning knob — the PyArrow default of 131,072 rows works well for most workloads.
 
 To avoid any type inconsistencies during writing, you can convert the Iceberg table schema to Arrow:
```
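
The arrival-order behaviour described in the note above can be sketched with a small, library-free simulation. No PyIceberg is required; the file/batch layout and the round-robin "arrival" model below are illustrative assumptions (real arrival order depends on I/O timing):

```python
from itertools import chain


def nonstreaming_order(files):
    """Default mode sketch: files are drained one after another, so
    batch order is deterministic (all of file 0, then file 1, ...)."""
    return list(chain.from_iterable(files))


def streaming_order(files, concurrent_files=1):
    """streaming=True sketch: up to `concurrent_files` files are open
    at once and batches are yielded as they 'arrive'.  Arrival is
    modelled here as a round-robin over the open files, which is one
    possible interleaving, not the only one."""
    out = []
    window = [iter(f) for f in files[:concurrent_files]]
    pending = list(files[concurrent_files:])
    while window:
        for it in list(window):
            batch = next(it, None)
            if batch is None:
                window.remove(it)
                if pending:  # a reader slot freed up: open the next file
                    window.append(iter(pending.pop(0)))
            else:
                out.append(batch)
    return out


files = [["a1", "a2"], ["b1", "b2"], ["c1"]]
print(nonstreaming_order(files))    # ['a1', 'a2', 'b1', 'b2', 'c1']
print(streaming_order(files, 1))    # ['a1', 'a2', 'b1', 'b2', 'c1']
print(streaming_order(files, 2))    # ['a1', 'b1', 'a2', 'b2', 'c1']
```

With `concurrent_files=1` the streaming order coincides with the deterministic file order; with `concurrent_files=2` batches from the first two files interleave, which is why the note recommends the non-streaming default when file ordering must be reproducible.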