Commit 9df651b
perf: submit I/O requests eagerly in FullZipScheduler (#6513)
## Summary
Refactor `FullZipScheduler::create_page_load_task` to accept a
pre-submitted I/O future instead of deferring I/O submission until the
async task executes. This allows the I/O requests to be submitted
immediately during scheduling, enabling the object store layer to batch
and parallelize them. Closes #6504.
## I/O Model Change
### Before: Lazy I/O submission (serialized)
Previously, `create_page_load_task` received a
`FullZipReadSource::Remote(io)` along with byte ranges and priority. The
actual `io.submit_request()` call happened **inside** the async block,
meaning the I/O request was not submitted until the future was first
polled.
When decoding multiple pages (e.g. across many fragments), this created
a sequential I/O pattern:
```
Page 1: [schedule] -> [poll] -> [submit I/O] -> [wait response] -> [decode]
Page 2: [schedule] -> [poll] -> [submit I/O] -> [wait response] -> [decode]
Page 3: [schedule] -> [poll] -> ...
```
Each page's I/O request could only be submitted after the previous task
started executing. The I/O scheduler had no visibility into upcoming
requests, preventing it from batching or parallelizing them effectively.
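Rust futures do no work until first polled, which is the root of the serialization above. A minimal sketch (using a hypothetical `submit_request` stand-in for `io.submit_request()`, not lance's actual API) shows that merely building the async block submits nothing:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

static SUBMITTED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for io.submit_request(); counts submissions.
async fn submit_request() -> Vec<u8> {
    SUBMITTED.fetch_add(1, Ordering::SeqCst);
    vec![0u8; 4]
}

fn main() {
    // Creating the async block does NOT run its body: Rust futures are lazy.
    let _page_task = async {
        let _bytes = submit_request().await; // submission happens only on first poll
    };
    // Nothing has been submitted yet, so the I/O layer sees no requests.
    assert_eq!(SUBMITTED.load(Ordering::SeqCst), 0);
    println!("no I/O submitted until the task is polled");
}
```

Because each page's submission was buried inside such a block, the scheduler could not see request N+1 until task N's future was polled.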
### After: Eager I/O submission (pipelined)
Now, `io.submit_request()` is called **before** constructing the
`PageLoadTask`, and the resulting future is passed into
`create_page_load_task`. All I/O requests for all pages are submitted
upfront during the scheduling phase:
```
[schedule all pages] --> submit I/O page 1 -+
                     --> submit I/O page 2 -+
                     --> submit I/O page 3 -+ (all in-flight concurrently)
                     --> submit I/O page N -+
                                            |
[poll] -> [await page 1 response] -> [decode]
[poll] -> [await page 2 response] -> [decode]
[poll] -> [await page 3 response] -> [decode]
```
The object store layer can now see all pending requests at once and
optimize I/O through batching, connection multiplexing, and parallel
fetches. The async tasks only await the already-in-flight I/O futures.
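A minimal sketch of the eager pattern, again with a hypothetical `submit_request` stand-in: submission is a side effect of the call itself, so every request is in flight before any decode task is polled.

```rust
use std::future::Future;
use std::sync::atomic::{AtomicUsize, Ordering};

static SUBMITTED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in: the request is "submitted" synchronously here,
// and the returned future only carries the already-in-flight response.
fn submit_request(page: usize) -> impl Future<Output = usize> {
    SUBMITTED.fetch_add(1, Ordering::SeqCst);
    async move { page }
}

fn main() {
    // Scheduling phase: submit I/O for every page up front.
    let inflight: Vec<_> = (0..4).map(submit_request).collect();
    // All four requests are in flight before any decode task exists.
    assert_eq!(SUBMITTED.load(Ordering::SeqCst), 4);

    // The decode tasks merely await the already-submitted futures.
    let _tasks: Vec<_> = inflight
        .into_iter()
        .map(|fut| async move { fut.await })
        .collect();
    assert_eq!(SUBMITTED.load(Ordering::SeqCst), 4); // no new submissions
    println!("all I/O in flight before any task ran");
}
```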
## Changes
- `rust/lance-encoding/src/encodings/logical/primitive.rs`:
- Changed `create_page_load_task` signature to accept
`BoxFuture<'static, Result<Vec<Bytes>>>` instead of `FullZipReadSource`
+ byte ranges + priority
- Moved `io.submit_request()` calls to happen eagerly at both call sites
(`schedule_ranges_with_rep_index` and the non-rep-index path), before
constructing the page load task
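The new shape of the constructor can be sketched as below. The type names (`PageLoadTask`, `BoxFuture`, `Bytes`) are simplified stand-ins, not lance's actual definitions, and the tiny no-op waker exists only to drive the stored future in a self-contained example:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Simplified stand-ins for the real types (assumptions, not lance's code).
type Bytes = Vec<u8>;
type BoxFuture<T> = Pin<Box<dyn Future<Output = T> + Send + 'static>>;

struct PageLoadTask {
    data: BoxFuture<Result<Vec<Bytes>, String>>,
}

// After the change: the caller has already called submit_request() and
// hands the in-flight response future to the task constructor.
fn create_page_load_task(data: BoxFuture<Result<Vec<Bytes>, String>>) -> PageLoadTask {
    PageLoadTask { data }
}

// Minimal no-op waker so we can poll without an async runtime.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    // An already-resolved "I/O response" standing in for a submitted request.
    let inflight: BoxFuture<Result<Vec<Bytes>, String>> =
        Box::pin(async { Ok(vec![vec![1u8, 2, 3]]) });
    let mut task = create_page_load_task(inflight);

    // Drive the stored future; it is already ready in this sketch.
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    match task.data.as_mut().poll(&mut cx) {
        Poll::Ready(Ok(pages)) => assert_eq!(pages, vec![vec![1u8, 2, 3]]),
        _ => panic!("expected ready"),
    }
    println!("ok");
}
```

The design point is that the constructor no longer knows how to perform I/O at all; it only stores a future that the caller has already set in motion.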
## Performance
Tested with a multi-fragment dataset containing fixed-width columns
(768-dim float32 vectors, 40 fragments, 50 rows/fragment):
| Benchmark | Before (p50) | After (p50) | Speedup |
|---|---|---|---|
| Fixed-width column scan | 3453 ms | 523 ms | **6.6x** |
The improvement comes entirely from I/O pipelining — the decoding logic
itself is unchanged. The effect is most pronounced with many fragments
or pages, where the serialized I/O submission was the dominant
bottleneck.
1 file changed: 11 additions & 15 deletions.