Commit 40d09bf
feat: add batch_size_bytes to encoding decode stream (#6388)
## Summary
This adds some initial framework for #6387. It adds the
`batch_size_bytes` option, which will be wired up in a future PR, and
provides a basic implementation that uses a guess-and-check strategy to
choose how many rows to put in each batch. This is inexact, and some of
the batches it emits will exceed the target size. Future PRs will add
accurate sizing from the decode layers, which should remove the need for
guessing entirely. Still, this is enough to get the option in place so
it can be improved later.
- Add `batch_size_bytes: Option<u64>` to `SchedulerDecoderConfig` and
thread it through the structural v2.1 decode path
- When set, compute rows-per-batch from byte estimates instead of the
fixed `batch_size` row count
- After each batch decodes, measure actual bytes-per-row and feed it
back so subsequent batches converge toward the target byte size
- When the actual size is smaller than the estimate, the estimate moves
down gradually (to the midpoint of the two); when it is larger, the
estimate adopts the actual size immediately to avoid OOM
- Only the v2.1+ `StructuralBatchDecodeStream` is modified; v2.0
`BatchDecodeStream` is unchanged (logs a warning if the option is set)
- Wiring this through to the file reader and Python/Java bindings will
be done in a follow-up PR
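
The feedback rule described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `BytesPerRowEstimate`, `rows_for`, and `observe` are hypothetical names, and the 5000-byte target in the usage note is an assumed value chosen only for the arithmetic.

```rust
/// Hypothetical tracker for the bytes-per-row estimate.
/// Not the actual lance-encoding type; names are illustrative only.
#[derive(Debug)]
pub struct BytesPerRowEstimate {
    pub bytes_per_row: u64,
}

impl BytesPerRowEstimate {
    /// Start from an initial (for example schema-based) estimate.
    pub fn new(initial: u64) -> Self {
        // Clamp to 1 so the division in `rows_for` can never be by zero.
        Self { bytes_per_row: initial.max(1) }
    }

    /// Number of rows to request so a batch lands near `batch_size_bytes`.
    /// Always returns at least one row so decoding makes progress.
    pub fn rows_for(&self, batch_size_bytes: u64) -> u64 {
        (batch_size_bytes / self.bytes_per_row).max(1)
    }

    /// Feed back the measured size of a decoded batch.
    pub fn observe(&mut self, batch_bytes: u64, batch_rows: u64) {
        if batch_rows == 0 {
            return; // nothing to learn from an empty batch
        }
        let actual = (batch_bytes / batch_rows).max(1);
        if actual >= self.bytes_per_row {
            // Rows are larger than expected: adopt immediately so the
            // next batch does not overshoot the byte target again.
            self.bytes_per_row = actual;
        } else {
            // Rows are smaller than expected: move down gradually,
            // to the midpoint of the old estimate and the measurement.
            self.bytes_per_row = (self.bytes_per_row + actual) / 2;
        }
    }
}
```

With an initial estimate of 64 bytes per row and an assumed target of 5000 bytes, `rows_for` asks for 78 rows. If those rows actually decode to 100 bytes each, `observe(7800, 78)` raises the estimate to 100 and the next request drops to 50 rows. A later batch measuring 40 bytes per row only lowers the estimate to 70, the midpoint.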
## Test plan
- [x] `test_estimate_bytes_per_row` — unit test for the schema-based
byte estimator
- [x] `test_byte_sized_batches_fixed_width` — 1000 rows × 4 Int32
columns, `batch_size_bytes=1600` → 10 batches of exactly 100 rows,
roundtrip verified
- [x] `test_byte_sized_batches_none_unchanged` — `batch_size_bytes=None`
still uses `rows_per_batch` (no behavioral change)
- [x] `test_byte_sized_batches_feedback_convergence` — 100-byte strings
with 64-byte schema estimate; verifies second/third batches converge to
~50 rows after feedback
- [x] `cargo clippy -p lance-encoding --tests -p lance-file -- -D
warnings` clean
- [x] `cargo fmt --all -- --check` clean
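
The arithmetic behind the fixed-width test can be sketched as below. This is an illustration under stated assumptions, not the estimator added in this commit: `ColumnKind`, `estimate_bytes_per_row`, and `batch_row_counts` are hypothetical, and the 64-byte default for variable-width columns is an assumption suggested by the convergence test, not a confirmed constant.

```rust
/// Hypothetical column description; the real estimator presumably
/// inspects the Arrow schema instead.
pub enum ColumnKind {
    FixedWidth { bytes: u64 },
    VariableWidth,
}

/// Assumed per-value guess for variable-width columns (illustrative only).
pub const ASSUMED_VARIABLE_WIDTH_BYTES: u64 = 64;

/// Sum the per-column guesses to get an estimated row size in bytes.
pub fn estimate_bytes_per_row(columns: &[ColumnKind]) -> u64 {
    columns
        .iter()
        .map(|c| match c {
            ColumnKind::FixedWidth { bytes } => *bytes,
            ColumnKind::VariableWidth => ASSUMED_VARIABLE_WIDTH_BYTES,
        })
        .sum::<u64>()
        .max(1) // never report zero, so callers can divide by it
}

/// Split `total_rows` into batches sized for `batch_size_bytes`,
/// without any feedback (the estimate is taken as exact).
pub fn batch_row_counts(total_rows: u64, bytes_per_row: u64, batch_size_bytes: u64) -> Vec<u64> {
    let rows_per_batch = (batch_size_bytes / bytes_per_row.max(1)).max(1);
    let mut counts = Vec::new();
    let mut remaining = total_rows;
    while remaining > 0 {
        let n = remaining.min(rows_per_batch);
        counts.push(n);
        remaining -= n;
    }
    counts
}
```

Four Int32 columns give 4 × 4 = 16 bytes per row, so a 1600-byte target yields 100 rows per batch and 1000 rows split into 10 batches, matching the expectation stated for `test_byte_sized_batches_fixed_width`.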
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 5310f36 commit 40d09bf
4 files changed
Lines changed: 339 additions & 2 deletions