Big improvements:
1. Subsegment reads: on feature vectors, for example, we currently read 10x more data. Sub-segment reads p2: #7368
2. io_uring: for small reads, the cost of scheduling a single task currently exceeds the cost of the read itself.
3. Same goal as (2), approached from a different angle: vectored reads. If we can batch small read_at() calls on local disks, we save on tokio task scheduling.
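The vectored-read idea above can be illustrated as range coalescing: merge small byte ranges that sit close together and issue one read per merged range, so many tiny reads turn into a few larger ones (and, in a real async reader, fewer scheduled tasks). This is a minimal std-only sketch, not the Vortex reader: `coalesce`, `read_batched` and the `max_gap` threshold are hypothetical names, and an in-memory byte slice stands in for a local file and its read_at() calls.

```rust
use std::ops::Range;

/// Hypothetical helper: merge byte ranges whose gap is at most `max_gap` bytes,
/// so that each merged range can be fetched with a single positional read.
fn coalesce(ranges: &[Range<u64>], max_gap: u64) -> Vec<Range<u64>> {
    let mut sorted = ranges.to_vec();
    sorted.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<u64>> = Vec::new();
    for r in sorted {
        if let Some(last) = merged.last_mut() {
            // Close enough to the previous merged range: extend it instead of
            // starting a new read.
            if r.start <= last.end.saturating_add(max_gap) {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}

/// Serve every requested range from `source` (a stand-in for a local file),
/// doing one physical "read" per merged range. Returns the bytes for each
/// request, in request order, plus the number of physical reads performed.
fn read_batched(source: &[u8], requests: &[Range<u64>], max_gap: u64) -> (Vec<Vec<u8>>, usize) {
    let merged = coalesce(requests, max_gap);
    // One slice per merged range models one read_at() call per merged range.
    let buffers: Vec<&[u8]> = merged
        .iter()
        .map(|m| &source[m.start as usize..m.end as usize])
        .collect();
    let results: Vec<Vec<u8>> = requests
        .iter()
        .map(|req| {
            // Every request lies entirely inside exactly one merged range.
            let idx = merged
                .iter()
                .position(|m| m.start <= req.start && req.end <= m.end)
                .expect("each request is covered by a merged range");
            let base = merged[idx].start;
            buffers[idx][(req.start - base) as usize..(req.end - base) as usize].to_vec()
        })
        .collect();
    (results, merged.len())
}

fn main() {
    let file: Vec<u8> = (0..=255u8).collect();
    let requests: Vec<Range<u64>> = vec![0..4, 6..10, 100..104, 8..12];
    let (data, reads) = read_batched(&file, &requests, 8);
    // prints "4 requests served by 2 reads"
    println!("{} requests served by {} reads", requests.len(), reads);
    assert_eq!(data[1], vec![6, 7, 8, 9]);
}
```

A larger `max_gap` trades extra bytes read for fewer reads; which threshold suits local disks is left open here.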
Moving the needle (won't do):
Applying the natural split heuristic to the range count rather than to a constant range distance: 33% improvement on feature-vectors, but a 10% regression on nested lists, since this favours flat data.
For small split read tasks, the read time is marginal compared to tokio task scheduling and LazyScanStream initialization. A heuristic could address this, but performance improves only marginally, and only for the main thread.
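To make the two split criteria concrete, here is a hypothetical sketch (not the actual natural split heuristic) that groups sorted, non-overlapping byte ranges into read tasks either by a constant distance threshold or by a maximum range count. With many small, widely spaced ranges, the distance rule produces one task per range, while the count rule bounds the number of tasks. The function names and parameters are illustrative assumptions.

```rust
use std::ops::Range;

/// Distance-based split (assumes sorted, non-overlapping ranges): start a new
/// task whenever the gap to the previous range exceeds `max_distance` bytes.
fn split_by_distance(ranges: &[Range<u64>], max_distance: u64) -> Vec<Vec<Range<u64>>> {
    let mut tasks: Vec<Vec<Range<u64>>> = Vec::new();
    let mut prev_end: Option<u64> = None;
    for r in ranges {
        let new_task = match prev_end {
            Some(end) => r.start.saturating_sub(end) > max_distance,
            None => true,
        };
        if new_task {
            tasks.push(Vec::new());
        }
        // Safe to unwrap: the first iteration always pushes a task.
        tasks.last_mut().unwrap().push(r.clone());
        prev_end = Some(r.end);
    }
    tasks
}

/// Count-based split: each task gets at most `max_ranges` ranges, regardless
/// of how far apart they are. `max(1)` avoids the panic of `chunks(0)`.
fn split_by_count(ranges: &[Range<u64>], max_ranges: usize) -> Vec<Vec<Range<u64>>> {
    ranges.chunks(max_ranges.max(1)).map(|c| c.to_vec()).collect()
}

fn main() {
    // Twelve 16-byte ranges spaced 1 KiB apart.
    let ranges: Vec<Range<u64>> = (0..12u64).map(|i| i * 1024..i * 1024 + 16).collect();
    let by_distance = split_by_distance(&ranges, 512);
    let by_count = split_by_count(&ranges, 4);
    // prints "distance split: 12 tasks, count split: 3 tasks"
    println!(
        "distance split: {} tasks, count split: {} tasks",
        by_distance.len(),
        by_count.len()
    );
}
```

The count rule ignores locality, so it can group ranges that lie far apart; this sketch does not model why the real heuristic regresses on nested lists.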
TODOs:
Why is feature-vectors/correlated faster when the footer is reopened?
Additional context:
We were measuring Vortex performance incorrectly, likely influenced by Lance #8470.
Lance outperforms Vortex on feature vectors by around 3-15x.