
Commit 9df651b

perf: submit I/O requests eagerly in FullZipScheduler (#6513)
## Summary

Refactor `FullZipScheduler::create_page_load_task` to accept a pre-submitted I/O future instead of deferring I/O submission until the async task executes. This allows the I/O requests to be submitted immediately during scheduling, enabling the object store layer to batch and parallelize them.

Closes #6504.

## I/O Model Change

### Before: Lazy I/O submission (serialized)

Previously, `create_page_load_task` received a `FullZipReadSource::Remote(io)` along with byte ranges and a priority. The actual `io.submit_request()` call happened **inside** the async block, meaning the I/O request was not submitted until the future was first polled. When decoding multiple pages (e.g. across many fragments), this created a sequential I/O pattern:

```
Page 1: [schedule] -> [poll] -> [submit I/O] -> [wait response] -> [decode]
Page 2: [schedule] -> [poll] -> [submit I/O] -> [wait response] -> [decode]
Page 3: [schedule] -> [poll] -> ...
```

Each page's I/O request could only be submitted after the previous task started executing. The I/O scheduler had no visibility into upcoming requests, preventing it from batching or parallelizing them effectively.

### After: Eager I/O submission (pipelined)

Now, `io.submit_request()` is called **before** constructing the `PageLoadTask`, and the resulting future is passed into `create_page_load_task`. All I/O requests for all pages are submitted upfront during the scheduling phase:

```
[schedule all pages] --> submit I/O page 1 -+
                     --> submit I/O page 2 -+
                     --> submit I/O page 3 -+  (all in-flight concurrently)
                     --> submit I/O page N -+
                                            |
[poll] -> [await page 1 response] -> [decode]
[poll] -> [await page 2 response] -> [decode]
[poll] -> [await page 3 response] -> [decode]
```

The object store layer can now see all pending requests at once and optimize I/O through batching, connection multiplexing, and parallel fetches. The async tasks only await the already-in-flight I/O futures.
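The scheduling difference can be illustrated with a minimal std-only sketch, using OS threads as a stand-in for in-flight async I/O futures. `submit_request` here is a hypothetical simulation of a slow fetch, not the Lance API:

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical stand-in for an I/O request: each "fetch" takes ~50 ms and
// runs on its own thread, playing the role of an already-in-flight future.
fn submit_request(page: u8) -> thread::JoinHandle<Vec<u8>> {
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        vec![page; 4] // fake page bytes
    })
}

fn main() {
    let n = 4u8;

    // Lazy pattern (before): submit each request only when its task runs,
    // so the requests are serialized end to end.
    let t0 = Instant::now();
    for page in 0..n {
        let _data = submit_request(page).join().unwrap();
    }
    let lazy = t0.elapsed();

    // Eager pattern (after): submit every request upfront during scheduling,
    // then await them; all requests are in flight concurrently.
    let t0 = Instant::now();
    let in_flight: Vec<_> = (0..n).map(submit_request).collect();
    for handle in in_flight {
        let _data = handle.join().unwrap();
    }
    let eager = t0.elapsed();

    println!("lazy: {:?}, eager: {:?}", lazy, eager);
    assert!(eager < lazy); // roughly 50 ms vs. 200 ms on an idle machine
}
```

With four pages, the lazy loop pays the full latency four times while the eager version pays it roughly once, which mirrors the 6.6x speedup reported below for a 40-fragment scan.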
## Changes

- `rust/lance-encoding/src/encodings/logical/primitive.rs`:
  - Changed the `create_page_load_task` signature to accept a `BoxFuture<'static, Result<Vec<Bytes>>>` instead of `FullZipReadSource` + byte ranges + priority
  - Moved the `io.submit_request()` calls to happen eagerly at both call sites (`schedule_ranges_with_rep_index` and the non-rep-index path), before the page load task is constructed

## Performance

Tested with a multi-fragment dataset containing fixed-width columns (768-dim float32 vectors, 40 fragments, 50 rows/fragment):

| Benchmark | Before (p50) | After (p50) | Speedup |
|---|---|---|---|
| Fixed-width column scan | 3453 ms | 523 ms | **6.6x** |

The improvement comes entirely from I/O pipelining; the decoding logic itself is unchanged. The effect is most pronounced with many fragments or pages, where the serialized I/O submission was the dominant bottleneck.
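The signature change above can be mirrored in a small synchronous sketch, with a `JoinHandle` standing in for `BoxFuture`; the names `submit_request` and `create_page_load_task` are borrowed from the patch, but the bodies are invented for illustration:

```rust
use std::ops::Range;
use std::thread::{self, JoinHandle};

// Hypothetical submit: spawns the "fetch" immediately, so the request is
// in flight before any task is constructed (the eager pattern).
fn submit_request(byte_ranges: Vec<Range<u64>>, _priority: u64) -> JoinHandle<Vec<Vec<u8>>> {
    thread::spawn(move || {
        byte_ranges
            .into_iter()
            .map(|r| vec![0u8; (r.end - r.start) as usize]) // one fake buffer per range
            .collect()
    })
}

// After the refactor: the task receives a pre-submitted request and only
// awaits it, instead of owning the source, ranges, and priority itself.
fn create_page_load_task(
    io_handle: JoinHandle<Vec<Vec<u8>>>,
    num_rows: u64,
) -> impl FnOnce() -> (u64, usize) {
    move || {
        let buffers = io_handle.join().unwrap();
        (num_rows, buffers.len()) // stand-in for decoder construction
    }
}

fn main() {
    // Submission happens eagerly, during "scheduling":
    let io = submit_request(vec![0..16, 16..64], 0);
    let task = create_page_load_task(io, 100);
    let (rows, nbufs) = task();
    assert_eq!((rows, nbufs), (100, 2));
    println!("decoded {} rows from {} buffers", rows, nbufs);
}
```

The key design point is that the task's inputs shrink from "everything needed to submit a request" to just the pending result, so the scheduler decides when submission happens.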
1 parent 97a89a6 commit 9df651b

1 file changed

Lines changed: 11 additions & 15 deletions


rust/lance-encoding/src/encodings/logical/primitive.rs

```diff
@@ -2165,15 +2165,17 @@ impl FullZipScheduler {
     }

     fn create_page_load_task(
-        read_source: FullZipReadSource,
-        byte_ranges: Vec<Range<u64>>,
-        priority: u64,
+        io_future: BoxFuture<'static, Result<Vec<Bytes>>>,
         num_rows: u64,
         details: Arc<FullZipDecodeDetails>,
         bits_per_offset: u8,
     ) -> PageLoadTask {
         let load_task = async move {
-            let data = read_source.fetch(&byte_ranges, priority).await?;
+            let buffers = io_future.await?;
+            let data = buffers
+                .into_iter()
+                .map(|bytes| LanceBuffer::from_bytes(bytes, 1))
+                .collect::<VecDeque<_>>();
             Self::create_decoder(details, data, num_rows, bits_per_offset)
         }
         .boxed();
@@ -2333,14 +2335,9 @@ impl FullZipScheduler {
                 rep_index.bytes_per_value,
                 data_buf_position,
             );
-            let page_load_task = Self::create_page_load_task(
-                FullZipReadSource::Remote(io.clone()),
-                byte_ranges,
-                priority,
-                num_rows,
-                details,
-                bits_per_offset,
-            );
+            let io_future = io.submit_request(byte_ranges, priority);
+            let page_load_task =
+                Self::create_page_load_task(io_future, num_rows, details, bits_per_offset);
             return Ok(vec![page_load_task]);
         }

@@ -2403,10 +2400,9 @@ impl FullZipScheduler {
             })
             .collect::<Vec<_>>();

+        let io_future = io.submit_request(byte_ranges, self.priority);
         let page_load_task = Self::create_page_load_task(
-            FullZipReadSource::Remote(io.clone()),
-            byte_ranges,
-            self.priority,
+            io_future,
             num_rows,
             self.details.clone(),
             self.bits_per_offset,
```