Commit 8e92de5
This PR is generally just a refactoring of the `vortex-scalar` crate and
all of its dependents.
Currently, we store an opaque `InnerScalarValue` inside a `ScalarValue`,
and the `InnerScalarValue` is allowed to be `Null`.
This PR removes the `InnerScalarValue::Null` variant, so a nullable
scalar is now an `Option<ScalarValue>` (instead of just a
`ScalarValue`). It also completely removes `InnerScalarValue` in
favor of a public `ScalarValue` enum:
```rust
// Before:
pub struct ScalarValue(pub(crate) InnerScalarValue);
// After:
pub enum ScalarValue {
    // No `Null` variant!
    /// A boolean value.
    Bool(bool),
    /// A primitive numeric value.
    Primitive(PValue),
    /// A decimal value.
    Decimal(DecimalValue),
    /// A UTF-8 encoded string value.
    Utf8(BufferString),
    /// A binary (byte array) value.
    Binary(ByteBuffer),
    /// A list of potentially null scalar values.
    List(Vec<Option<ScalarValue>>),
    // TODO?
    // Extension(ExtScalarRef), // ?
}
```
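To illustrate what moving nullability into `Option` looks like for callers, here is a minimal sketch. The types are simplified stand-ins, not the real `vortex-scalar` definitions: `Primitive` holds a plain `i64` and `Utf8` a `String`, and the `describe` helper is hypothetical.

```rust
// Simplified stand-in for illustration only; the real enum has more
// variants and different payload types (PValue, BufferString, ...).
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    Bool(bool),
    Primitive(i64),
    Utf8(String),
    List(Vec<Option<ScalarValue>>),
}

/// Hypothetical helper: render a possibly-null value as text.
/// `None` takes over the role that the removed `Null` variant used to play.
pub fn describe(value: Option<&ScalarValue>) -> String {
    match value {
        None => "null".to_string(),
        Some(ScalarValue::Bool(b)) => format!("bool({b})"),
        Some(ScalarValue::Primitive(p)) => format!("primitive({p})"),
        Some(ScalarValue::Utf8(s)) => format!("utf8({s:?})"),
        Some(ScalarValue::List(items)) => {
            // List elements are themselves `Option<ScalarValue>`, so a null
            // element needs no special variant either.
            let parts: Vec<String> = items.iter().map(|v| describe(v.as_ref())).collect();
            format!("list[{}]", parts.join(", "))
        }
    }
}
```

With this shape, a `match` on a non-null `ScalarValue` never has to handle a null arm; nullness is handled once, at the `Option` layer.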
(**IMPORTANT CHANGE**) Additionally, all `Scalar`s are verified to be
sound on construction by checking that the `DType` of the `Scalar`
`is_compatible` with the `Option<&ScalarValue>` that is passed.
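As a rough sketch of what such a construction-time check could look like (not the actual implementation): the `DType` and `ScalarValue` enums below are heavily simplified stand-ins, and the `try_new` constructor name and the free-function form of `is_compatible` are assumptions made for illustration.

```rust
// Simplified stand-ins; the real `DType` and `ScalarValue` are much richer.
#[derive(Debug, Clone, PartialEq)]
pub enum DType {
    Bool { nullable: bool },
    I64 { nullable: bool },
    Utf8 { nullable: bool },
}

impl DType {
    pub fn is_nullable(&self) -> bool {
        match self {
            DType::Bool { nullable } | DType::I64 { nullable } | DType::Utf8 { nullable } => {
                *nullable
            }
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    Bool(bool),
    Primitive(i64),
    Utf8(String),
}

/// Assumed shape of the check: a null is only valid for a nullable dtype,
/// and a non-null value must match the dtype's kind.
pub fn is_compatible(dtype: &DType, value: Option<&ScalarValue>) -> bool {
    match (dtype, value) {
        (_, None) => dtype.is_nullable(),
        (DType::Bool { .. }, Some(ScalarValue::Bool(_))) => true,
        (DType::I64 { .. }, Some(ScalarValue::Primitive(_))) => true,
        (DType::Utf8 { .. }, Some(ScalarValue::Utf8(_))) => true,
        _ => false,
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct Scalar {
    dtype: DType,
    value: Option<ScalarValue>,
}

impl Scalar {
    /// Hypothetical fallible constructor that rejects unsound combinations.
    pub fn try_new(dtype: DType, value: Option<ScalarValue>) -> Result<Self, String> {
        if !is_compatible(&dtype, value.as_ref()) {
            return Err(format!("value {value:?} is not compatible with dtype {dtype:?}"));
        }
        Ok(Self { dtype, value })
    }

    pub fn dtype(&self) -> &DType {
        &self.dtype
    }

    pub fn value(&self) -> Option<&ScalarValue> {
        self.value.as_ref()
    }
}
```

Validating once at construction means every later consumer of a `Scalar` can rely on the dtype and value agreeing, at the cost of making construction fallible.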
The stricter construction checks mean that we need to change how
deserialization of scalars works. The protobuf format is not exactly
one-to-one with our `Scalar`s (notably, it only supports 64-bit
integers). This means that we might write an 8-bit integer and get a
64-bit integer back from the round trip. So this PR also changes some
APIs to allow us to pass a `DType` to the statistics deserializers. TBD
whether this needs to happen in more places (`FileStatistics`?).
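The round-trip problem can be sketched as follows. This is an illustrative model only: `IntType`, `IntValue`, and `narrow_to_dtype` are hypothetical names, and the real deserializers operate on the protobuf-generated types rather than a bare `i64`.

```rust
/// Hypothetical stand-in for the integer part of a `DType`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum IntType {
    I8,
    I16,
    I32,
    I64,
}

/// Hypothetical stand-in for a typed primitive value.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum IntValue {
    I8(i8),
    I16(i16),
    I32(i32),
    I64(i64),
}

fn out_of_range(wire: i64, target: IntType) -> String {
    format!("{wire} does not fit in {target:?}")
}

/// Narrow a 64-bit integer read from the wire back to the width that the
/// expected dtype asks for, failing instead of silently truncating.
pub fn narrow_to_dtype(wire: i64, target: IntType) -> Result<IntValue, String> {
    Ok(match target {
        IntType::I8 => {
            IntValue::I8(i8::try_from(wire).map_err(|_| out_of_range(wire, target))?)
        }
        IntType::I16 => {
            IntValue::I16(i16::try_from(wire).map_err(|_| out_of_range(wire, target))?)
        }
        IntType::I32 => {
            IntValue::I32(i32::try_from(wire).map_err(|_| out_of_range(wire, target))?)
        }
        IntType::I64 => IntValue::I64(wire),
    })
}
```

Without the target type, the deserializer could only produce the 64-bit value, which a strict compatibility check against an 8-bit dtype would then reject; passing the dtype in lets the value be restored to its original width first.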
For reviewers: please try to look over all of the diffs. The large
majority are **not** just variable renames; they are semantic changes
that I am not fully confident are right.
## Breaks
Breaks the old `file_stats` method on `VortexFile` to return
`FileStatistics` instead of the array of `StatsSet`. We needed to do
this in order to correctly read statistics from DataFusion, specifically
in the case of schema evolution.
## TODO
Some benchmarks are failing, and also I still need to review everything
myself to make sure everything is correct. I also want to add more tests
in certain places where I'm very scared things are wrong.
---------
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
1 parent b8d106c commit 8e92de5
File tree
148 files changed: +6298 -5968 lines changed
- encodings
- alp/src
- alp_rd/compute
- alp
- compute
- datetime-parts/src/compute
- decimal-byte-parts/src/decimal_byte_parts
- compute
- fastlanes/src
- bitpacking/array
- for
- array
- compute
- vtable
- rle/vtable
- fsst/src
- compute
- runend/src
- compute
- sequence/src
- compute
- sparse/src
- compute
- zigzag/src
- zstd/src
- fuzz/src/array
- vortex-array/src
- arrays
- bool/compute
- chunked/compute
- constant
- compute
- vtable
- decimal
- compute
- vtable
- dict
- compute
- masked/compute
- null/compute
- primitive
- array
- compute
- take
- struct_/compute
- varbin
- compute
- array
- arrow
- executor
- builders
- compute
- expr
- exprs
- stats
- stats
- vortex-btrblocks/src/compressor
- vortex-buffer/src
- vortex-datafusion/src
- convert
- persistent
- vortex-dtype/src
- vortex-duckdb/src
- convert
- exporter
- vortex-file/src
- vortex-jni/src
- vortex-layout/src/layouts
- flat
- zoned
- vortex-python/src/scalar
- vortex-scalar/src
- arrow
- convert
- decimal
- tests
- typed_view
- decimal
- primitive