Commit 1e862f9
perf(read): consolidate driver-side Dataset opens and pin version for snapshot isolation (#567)
## Summary
Consolidate the two driver-side `Dataset.open()` calls during scan
planning into one, and pin the resolved table version onto the read
options shipped to executors. This reduces manifest IO and closes a
cross-task snapshot-isolation gap on large tables.
## Motivation
For each Spark scan, `LanceScanBuilder.build()` opens a dataset to read
the manifest summary, schema, sharding spec, and zonemap stats. Then
`LanceScan.planInputPartitions()` opens the dataset *again* via
`LanceSplit.planScan(readOptions)` to enumerate fragments and
per-fragment row counts.
Two issues:
1. **Performance** — Driver pays the manifest IO / `Dataset.open()` cost
twice per query. On very large tables this measurably increases planning
latency.
2. **Snapshot isolation bug** — When the user does not specify a
version, both opens resolve "latest" independently. If a concurrent
writer commits a new version between the two opens (or between the
driver-side open and an executor-side open), tasks can observe a newer
version than the one used for fragment pruning / statistics. The query
then sees an inconsistent view.
## Changes
### `LanceSplit`
- New overload `planScan(Dataset)` that accepts an already-opened
dataset and does not close it.
- Existing `planScan(LanceSparkReadOptions)` becomes a thin wrapper,
kept for tests and external callers.
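
The ownership split between the two overloads can be sketched as follows. This is a minimal illustration, not the lance-spark code: `SketchDataset` and `SketchSplitPlanner` are hypothetical stand-ins, and the fragment list is hard-coded. The point is that the overload taking a handle borrows it and leaves it open, while the wrapper closes only the handle it opened itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an opened dataset handle; not the real Lance Dataset API.
final class SketchDataset implements AutoCloseable {
    static int openCount = 0; // counts simulated opens (one manifest read each)
    private final List<Integer> fragmentIds;
    private boolean closed = false;

    private SketchDataset(List<Integer> fragmentIds) {
        this.fragmentIds = fragmentIds;
    }

    static SketchDataset open(String uri) {
        openCount++;
        return new SketchDataset(List.of(0, 1, 2)); // made-up fragment ids
    }

    List<Integer> fragmentIds() {
        if (closed) {
            throw new IllegalStateException("dataset is closed");
        }
        return fragmentIds;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

final class SketchSplitPlanner {
    // Core overload: borrows the caller's handle and leaves it open.
    static List<String> planScan(SketchDataset dataset) {
        List<String> splits = new ArrayList<>();
        for (int id : dataset.fragmentIds()) {
            splits.add("fragment-" + id);
        }
        return splits;
    }

    // Thin wrapper: opens its own handle, delegates, and closes what it opened.
    static List<String> planScan(String uri) {
        try (SketchDataset dataset = SketchDataset.open(uri)) {
            return planScan(dataset);
        }
    }
}
```

A caller that already holds a handle pays for no extra open; a caller that only has a URI still works through the wrapper at the cost of one open per call.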
### `LanceScanBuilder.build()`
- Calls `LanceSplit.planScan(dataset)` against the same handle used for
manifest / zonemap loading, before `closeLazyDataset()`.
- Calls `readOptions.withVersion(resolvedVersion)` to produce a pinned,
immutable copy of the read options.
- Passes the pre-computed splits, per-fragment row counts, and pinned
options to `LanceScan`.
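
As a rough sketch of the flow described above, the following uses hypothetical stand-in classes (`SketchOptions`, `SketchTable`, `SketchPlannedScan`) rather than the real lance-spark types. A single simulated open resolves the version, the splits are planned from that same open, and the resolved version is pinned onto a copy of the options.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Every name here is a hypothetical stand-in, not the lance-spark API.
final class SketchOptions {
    final String uri;
    final Long version; // null means "resolve latest when the dataset is opened"

    SketchOptions(String uri, Long version) {
        this.uri = uri;
        this.version = version;
    }

    // Returns a pinned copy; the receiver is left untouched.
    SketchOptions withVersion(long pinned) {
        return new SketchOptions(uri, pinned);
    }
}

final class SketchTable {
    static final AtomicLong latest = new AtomicLong(7); // bumped by simulated writers
    static int opens = 0;

    // Simulated open: an explicit version wins, otherwise "latest" is resolved now.
    static long openAndResolveVersion(SketchOptions options) {
        opens++;
        return options.version != null ? options.version : latest.get();
    }
}

final class SketchPlannedScan {
    final SketchOptions pinnedOptions;
    final List<String> splits;

    SketchPlannedScan(SketchOptions pinnedOptions, List<String> splits) {
        this.pinnedOptions = pinnedOptions;
        this.splits = splits;
    }

    // Driver-side planning: one open feeds both the splits and the version pin.
    static SketchPlannedScan build(SketchOptions userOptions) {
        long resolved = SketchTable.openAndResolveVersion(userOptions);
        List<String> splits =
            Arrays.asList("fragment-0@v" + resolved, "fragment-1@v" + resolved);
        return new SketchPlannedScan(userOptions.withVersion(resolved), splits);
    }
}
```

In this model, if a writer bumps `SketchTable.latest` after `build()` returns, a later open with `pinnedOptions` still resolves the planned version, whereas an open with the original unpinned options would resolve the newer one. When the user options already carry a version, the pin reproduces that same version.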
### `LanceScan`
- Constructor accepts `precomputedSplits` and
`precomputedFragmentRowCounts`.
- `planInputPartitions()` no longer opens the dataset; it consumes the
pre-computed result and skips the redundant `withVersion` wrap (the
options are already pinned upstream).
- The per-fragment row-count map is marked `transient` so it does not
bloat plan-tree serialization or affect `BatchScanExec` /
`ReusedExchange` comparisons.
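
The effect of `transient` can be illustrated with plain Java serialization. This is a sketch under that assumption only: `SketchSerializableScan` is a hypothetical class, and it makes no claim about how Spark serializes or compares the real `LanceScan`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a scan object that travels with the serialized plan.
final class SketchSerializableScan implements Serializable {
    private static final long serialVersionUID = 1L;

    final ArrayList<String> splits;
    // Driver-only planning aid: Java serialization skips transient fields.
    final transient Map<Integer, Long> fragmentRowCounts;

    SketchSerializableScan(List<String> splits, Map<Integer, Long> fragmentRowCounts) {
        this.splits = new ArrayList<>(splits);
        this.fragmentRowCounts = new HashMap<>(fragmentRowCounts);
    }

    // Serializes the scan to bytes and reads it back, as a remote receiver would.
    static SketchSerializableScan roundTrip(SketchSerializableScan scan) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(scan);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchSerializableScan) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After a round trip the splits survive, while `fragmentRowCounts` comes back as `null`, so any code that runs on the deserialized copy must not rely on it.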
### Tests
- `LanceScanTest` updated for the new constructor parameters.
- `LanceSplitTest` unchanged — both overloads remain covered.
## Why this is safe
- `LanceSparkReadOptions.withVersion(...)` returns a new instance via
the existing builder; the user-supplied options object is never mutated.
- When the user explicitly pinned a version upstream,
`dataset.getVersion().getId()` returns that same version, so the pin is
a no-op.
- All existing pruning paths (`pruneByRowAddrFilters`,
`pruneByZonemapStats`, `pruneByLimit`) continue to operate on the same
`List<LanceSplit>` shape; only the source of the list changed.
## Performance impact
- Driver `Dataset.open()` calls per scan: **2 → 1**.
- Manifest reads per scan: **2 → 1**.
- No change to executor-side IO.
## Correctness impact
- All tasks of a single query are now guaranteed to read the same
dataset version, even under concurrent writes.

1 parent 20345de commit 1e862f9
8 files changed
Lines changed: 253 additions & 126 deletions
File tree
- lance-spark-base_2.12/src
- main/java/org/lance/spark
- read
- utils
- test/java/org/lance/spark/read
Lines changed: 6 additions & 6 deletions
Lines changed: 31 additions & 8 deletions
Lines changed: 116 additions & 100 deletions