Commit 4932bfd
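[data] DataSourceV2: support _block_udf + tensor_column_schema; per-fragment reads for variable-shape tensors (ray-project#63174)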
``read_parquet`` was raising ``NotImplementedError`` for ``_block_udf``
and ``tensor_column_schema`` on the V2 path, so a batch of V1 tests had
to be skipped. Wire both through:
- ``ReadFiles`` gets an optional ``block_udf: Callable[[Block], Block]``
field. ``plan_read_files_op`` applies it after
``reader.read(manifest)`` and before column renames so the UDF sees
on-disk column names (V1 ``ParquetDatasource`` semantics).
- ``_read_datasource_v2`` accepts a ``block_udf`` kwarg and stores it on
the logical op.
- ``ReadFiles.infer_schema`` probes the UDF's schema effect via a dummy
empty table (mirrors V1's ``dummy_table`` trick) so ``ds.schema()``
reflects post-transform types before materialization. The scanner
keeps the *pre-UDF* schema so pyarrow sees the raw on-disk types.
- ``read_parquet`` drops the two ``NotImplementedError`` raises;
``tensor_column_schema`` is already folded into ``_block_udf`` by
``_resolve_parquet_args`` so no extra handling is needed.
While un-skipping V1 tests, a second issue surfaced:
``test_multiple_files_with_ragged_arrays`` was failing because
``pds.dataset(paths).scanner().scan_batches()`` forces a cross-fragment
schema unification inside pyarrow. That unification casts each file's
``ArrowTensorTypeV2(shape=X)`` to the unified type, and pyarrow refuses
extension-to-extension casts: "One can first cast to the storage
type, then to the extension type". V1 avoids this by iterating
``fragment.to_batches`` per fragment.
Port the pattern: ``FileReader._read_fragment_batches`` builds a
per-fragment scanner with that fragment's ``physical_schema`` so
pyarrow keeps the native per-file type. Downstream concat handles
heterogeneous block schemas, same as V1. The caller-supplied
``file_dataset_schema`` still applies for the common all-null
first-column case, and steps aside when any extension column is
present.
Tests: V2 unit 64/64, parquet broad slice 103 pass / 1 skip / 0 fail,
checkpoint suite 63 pass / 3 pre-existing ``ModuleNotFoundError``
failures (reproduced on master).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Goutam <goutam@anyscale.com>
Co-authored-by: Goutam V. <>
(ray-project#63174) 1 parent 60ecbed commit 4932bfd
10 files changed
Lines changed: 221 additions & 47 deletions
File tree
- python/ray/data
- _internal
- datasource_v2
- readers
- scanners
- tests
- logical/operators
- planner
- tests
- datasource
- unit/datasource_v2
Lines changed: 13 additions & 0 deletions
Lines changed: 117 additions & 23 deletions
Lines changed: 4 additions & 0 deletions
Lines changed: 4 additions & 0 deletions
Lines changed: 34 additions & 0 deletions
Lines changed: 19 additions & 0 deletions