feat: add planned blob reads with source-level coalescing #6352
Conversation
westonpace
left a comment
I think some parts of this PR are reinventing capabilities that are already in the file / scan scheduler. The scheduler is already capable of both merging nearby reads into larger reads and splitting extremely large reads into multiple concurrent reads.
You also have a max concurrency control. However, this is also handled already. If you are worried about using too much RAM, you can use the scan scheduler's I/O buffer size parameter, which controls how much RAM we allow in I/O buffers before we pause the scan. If you are worried about rate limits or overloading the object store, the object stores themselves already have controls for this (e.g. AIMD throttling).
I think all we need to do is wire up the scan scheduler's config to your user-facing config variables (removing max_concurrency in favor of I/O buffer size, which is easier to configure) and make one call to submit_request per file (which will allow the scheduler to do the coalescing and splitting of reads).
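To make the reviewer's point concrete, here is a small, self-contained sketch (not Lance's actual scheduler code; the threshold values and function name are made up for illustration) of the two behaviors the scan scheduler already provides: merging nearby byte-range reads into one request, and splitting oversized reads into multiple concurrent chunks.

```python
# Illustrative sketch (not Lance code): how a scheduler can coalesce nearby
# byte-range reads and split oversized ones before issuing I/O.
# The gap/size thresholds below are hypothetical example values.

def plan_reads(ranges, max_gap=4096, max_read=8 * 1024 * 1024):
    """ranges: list of (start, end) byte ranges, possibly unsorted."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous read: coalesce into one request.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Split any coalesced read that grew too large into concurrent chunks.
    planned = []
    for start, end in merged:
        while end - start > max_read:
            planned.append((start, start + max_read))
            start += max_read
        planned.append((start, end))
    return planned

# Two reads 50 bytes apart become one request; a 20 MiB read becomes three.
print(plan_reads([(0, 100), (150, 300)]))   # → [(0, 300)]
```

Because the scheduler already does this planning per file, a caller only needs to hand over the raw ranges and let it decide the physical request shape.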
westonpace
left a comment
Approving on behalf of Weston (via Claude Code).
The refactor in 153ef4b40 is a big improvement — the planner now correctly delegates merging, splitting, and backpressure to the existing FileScheduler instead of reimplementing it. The BlobSource sharing and per-source grouping are clean, and the selection_index approach for order preservation is simpler than the previous buffered/buffer_unordered split.
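The selection_index approach mentioned above can be sketched in a few lines: each request is tagged with its position in the caller's order, reads are awaited in whatever order they complete, and the tag restores the original ordering. This is an illustrative Python sketch, not the PR's Rust implementation; all names here are hypothetical.

```python
# Illustrative sketch (not Lance code): preserving caller order when reads
# complete out of order, by tagging each request with its selection index.

import asyncio
import random

async def read_one(index, request):
    # Simulate an I/O that completes after a random delay.
    await asyncio.sleep(random.random() * 0.01)
    return index, f"payload-for-{request}"

async def read_all_ordered(requests):
    tasks = [read_one(i, r) for i, r in enumerate(requests)]
    results = [None] * len(requests)
    # as_completed yields in completion order; the index restores caller order.
    for done in asyncio.as_completed(tasks):
        index, payload = await done
        results[index] = payload
    return results

# asyncio.run(read_all_ordered(["a", "b", "c"]))
# → ["payload-for-a", "payload-for-b", "payload-for-c"]
```

Compared with buffered/buffer_unordered, this lets every read proceed at full concurrency while still returning results in the order the caller asked for.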
Two items worth addressing (non-blocking):
1. V1 blob collection still uses .unwrap() on fragment lookup
rust/lance/src/dataset/blob.rs — collect_blob_entries_v1:
```rust
let frag = dataset.get_fragment(frag_id as usize).unwrap();
let data_file = frag.data_file_for_field(blob_field_id).unwrap();
```
The V2 path handles errors properly. These should use ok_or_else(|| Error::...) to avoid panicking on invalid row addresses.
2. drain_pending_reads leader can get permanently stuck
If fulfill_pending_blob_reads panics, is_draining stays true forever and no new leader will spawn, so all subsequent reads on that BlobSource will silently fail. A drop guard that resets is_draining = false would make this robust.
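The guard pattern being suggested can be sketched as follows. This is an illustrative Python analog of a Rust drop guard (try/finally playing the role of Drop); the class and method names are hypothetical, not the PR's actual code.

```python
# Illustrative sketch (not Lance code): reset the draining flag even if the
# drain loop raises, so a new leader can always be elected afterwards.

import threading

class BlobSourceSketch:
    def __init__(self):
        self.lock = threading.Lock()
        self.is_draining = False
        self.pending = []

    def drain_pending_reads(self, fulfill):
        with self.lock:
            if self.is_draining:
                return  # another leader is already draining
            self.is_draining = True
        try:
            while True:
                with self.lock:
                    if not self.pending:
                        return
                    batch, self.pending = self.pending, []
                fulfill(batch)
        finally:
            # The guard: without this, an exception in fulfill() would leave
            # is_draining stuck at True and no new leader would ever spawn.
            with self.lock:
                self.is_draining = False
```

In Rust the same effect is typically achieved with a small struct whose Drop impl clears the flag, so the reset happens even on panic unwinding.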
This PR improves blob I/O in two complementary ways: BlobFile instances that resolve to the same physical object now share a lazy BlobSource and can opportunistically coalesce concurrent reads before handing them to Lance's existing scheduler, and datasets now expose a planned read_blobs API for materializing blob payloads directly. It also adds explicit cursor-preserving range reads for BlobFile across Rust, Python, and Java, with end-to-end Python coverage for the new API and the edge cases it uncovered.
This keeps the optimization aligned with Lance's existing scheduler model while giving callers a higher-level path for sequential and batched blob access.
Python example
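The original example under this heading was not preserved in this extract. As a stand-in, here is a self-contained sketch (not the Lance API; every name here is hypothetical) of the call pattern a planned blob-read API implies: gather the requested row ids, issue one planned batch fetch, and return payloads in caller order.

```python
# Illustrative, self-contained sketch (NOT the Lance API): a planned
# "read_blobs" flow over an in-memory store. All names are hypothetical.

class InMemoryBlobStore:
    def __init__(self, payloads):
        # payloads: dict mapping a row id to its blob bytes
        self.payloads = payloads
        self.fetches = 0  # counts physical reads, to show the effect of planning

    def fetch_many(self, row_ids):
        # One planned request materializes a whole batch of blobs at once.
        self.fetches += 1
        return {rid: self.payloads[rid] for rid in row_ids}

def read_blobs(store, row_ids):
    """Plan all reads up front, fetch in one batch, return in caller order."""
    fetched = store.fetch_many(sorted(set(row_ids)))
    return [fetched[rid] for rid in row_ids]

store = InMemoryBlobStore({1: b"alpha", 2: b"beta", 3: b"gamma"})
blobs = read_blobs(store, [3, 1, 3])
# blobs == [b"gamma", b"alpha", b"gamma"], using a single physical fetch
```

The point of the planned variant is visible in the fetch counter: three logical reads collapse into one physical request, which is where the source-level coalescing described above pays off.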