Commit 932d671

rfc: file compat testing
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
1 parent 390cd49 commit 932d671

1 file changed

Lines changed: 20 additions & 17 deletions

proposed/0023-file-compat-testing.md

@@ -34,35 +34,35 @@ Two binaries in a standalone crate (`vortex-test/compat-gen/`), not a workspace
 └────────────┘
 ```

-| Binary | Purpose |
-| -------------- | -------------------------------------------------------------------------------- |
-| `compat-gen` | Write fixture `.vortex` files + a `manifest.json` listing them |
-| `compat-test` | Fetch fixtures from S3, read them, rebuild expected arrays, `assert_arrays_eq!` |
+| Binary | Purpose |
+| ------------- | ------------------------------------------------------------------------------- |
+| `compat-gen` | Write fixture `.vortex` files + a `manifest.json` listing them |
+| `compat-test` | Fetch fixtures from S3, read them, rebuild expected arrays, `assert_arrays_eq!` |

 When cherry-picked onto an old release branch the only thing that changes is a thin API adapter layer (~20 lines that call the version's write/read API). Everything else — fixture definitions, correctness checks — stays identical.

 ### Fixture Suite

 **Synthetic fixtures** (deterministic, hardcoded values):

-| File | Schema | Data | Purpose |
-| ---------------------- | ---------------------------------------------- | --------------------------------------- | ------------------------ |
-| `primitives.vortex` | `Struct{u8, u16, u32, u64, i32, i64, f32, f64}` | Boundary values (0, min, max) per type | Primitive type round-trip |
-| `strings.vortex` | `Struct{Utf8}` | `["", "hello", "こんにちは", "🦀"]` | String encoding round-trip |
-| `booleans.vortex` | `Struct{Bool}` | `[true, false, true, true, false]` | Bool round-trip |
-| `nullable.vortex` | `Struct{Nullable<i32>, Nullable<Utf8>}` | Mix of values and nulls | Null handling |
-| `struct_nested.vortex` | `Struct{Struct{i32, Utf8}, f64}` | Nested struct | Nested type round-trip |
-| `chunked.vortex` | Chunked `Struct{u32}` | 3 chunks of 1000 rows each | Multi-chunk files |
+| File | Schema | Data | Purpose |
+| ---------------------- | ----------------------------------------------- | -------------------------------------- | -------------------------- |
+| `primitives.vortex` | `Struct{u8, u16, u32, u64, i32, i64, f32, f64}` | Boundary values (0, min, max) per type | Primitive type round-trip |
+| `strings.vortex` | `Struct{Utf8}` | `["", "hello", "こんにちは", "🦀"]` | String encoding round-trip |
+| `booleans.vortex` | `Struct{Bool}` | `[true, false, true, true, false]` | Bool round-trip |
+| `nullable.vortex` | `Struct{Nullable<i32>, Nullable<Utf8>}` | Mix of values and nulls | Null handling |
+| `struct_nested.vortex` | `Struct{Struct{i32, Utf8}, f64}` | Nested struct | Nested type round-trip |
+| `chunked.vortex` | Chunked `Struct{u32}` | 3 chunks of 1000 rows each | Multi-chunk files |

 Every stable array encoding should also contribute a fixture file — a struct with multiple columns, each using a different encoding of that array type. This ensures that encoding-specific read paths are exercised across versions.

 **Realistic fixtures** (real-world schemas and data distributions):

-| File | Source | Rows | Purpose |
-| ---------------------------- | ----------------------------------- | ----- | ---------------------------------------------- |
-| `tpch_lineitem.vortex` | TPC-H SF 0.01, `lineitem` table | ~60K | Real-world numeric + string schema |
-| `tpch_orders.vortex` | TPC-H SF 0.01, `orders` table | ~15K | Date + decimal types |
-| `clickbench_hits_1k.vortex` | First 1000 rows of ClickBench `hits` | 1000 | Wide table (105 columns), deep nested types |
+| File | Source | Rows | Purpose |
+| --------------------------- | ------------------------------------ | ---- | ------------------------------------------- |
+| `tpch_lineitem.vortex` | TPC-H SF 0.01, `lineitem` table | ~60K | Real-world numeric + string schema |
+| `tpch_orders.vortex` | TPC-H SF 0.01, `orders` table | ~15K | Date + decimal types |
+| `clickbench_hits_1k.vortex` | First 1000 rows of ClickBench `hits` | 1000 | Wide table (105 columns), deep nested types |

 SF 0.01 is used instead of 0.1 to keep fixture file sizes small (~few MB) so downloads in tests are fast.

@@ -81,6 +81,7 @@ trait Fixture {
 ```

 A single `Fixture` impl is sufficient for both generation and validation:
+
 - `compat-gen` calls `build()` and writes the result to disk
 - `compat-test` calls the same `build()` to produce the expected array and compares it against what was read from the old file via `assert_arrays_eq!`
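
The two bullets above can be illustrated with a minimal sketch. It is not the RFC's actual code: the trait body is not shown in this diff, so the `name()` method, the `Array` alias (a plain `Vec<i64>`), the `BoundaryI64` fixture, and the `write_array`/`read_array` helpers are all hypothetical stand-ins for the real Vortex types and the version-specific adapter layer. Only `build()` and the idea of sharing it between both binaries come from the text.

```rust
// Hypothetical stand-in for a Vortex array; the real type is not shown in this diff.
type Array = Vec<i64>;

// Assumed shape of the trait: only `build()` is named in the RFC text.
trait Fixture {
    /// File name the fixture is written under (assumed method).
    fn name(&self) -> &'static str;
    /// Deterministically build the expected array.
    fn build(&self) -> Array;
}

/// Hypothetical fixture holding boundary values for one primitive type.
struct BoundaryI64;

impl Fixture for BoundaryI64 {
    fn name(&self) -> &'static str {
        "boundary_i64.vortex"
    }
    fn build(&self) -> Array {
        vec![0, i64::MIN, i64::MAX]
    }
}

/// Stand-in for the version-specific write API: little-endian bytes.
fn write_array(array: &Array) -> Vec<u8> {
    array.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Stand-in for the current reader; trailing partial values are ignored.
fn read_array(bytes: &[u8]) -> Array {
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            i64::from_le_bytes(buf)
        })
        .collect()
}

/// Roughly what `compat-gen` does: build, then serialize for writing to disk.
fn generate(fixture: &dyn Fixture) -> (String, Vec<u8>) {
    (fixture.name().to_string(), write_array(&fixture.build()))
}

/// Roughly what `compat-test` does: read old bytes, rebuild, compare in memory.
fn validate(fixture: &dyn Fixture, old_bytes: &[u8]) -> bool {
    read_array(old_bytes) == fixture.build()
}
```

Under these assumptions, `validate(&BoundaryI64, &generate(&BoundaryI64).1)` returns `true`, while bytes that decode to a different array make it return `false`. Because both paths call the same `build()`, the expected data is defined in exactly one place.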

@@ -115,6 +116,7 @@ impl Fixture for TpchLineitemFixture {
 Correctness is validated by **comparing arrays in memory** — no checksums or spot-checks needed.

 For every fixture in every version:
+
 1. Download the old `.vortex` file from S3 (written by an older Vortex version)
 2. Read it into an array with the current reader
 3. Call `fixture.build()` to produce the expected array at the current version
@@ -214,6 +216,7 @@ v0.65.0/manifest.json → ["primitives.vortex", "strings.vortex", ..., "list.v
 ```

 Adding a new fixture:
+
 1. Add the builder function in `fixtures/` (e.g., `build_list_array()`)
 2. Register it in `fixtures/mod.rs` so `compat-gen` includes it
 3. Tag a release — the pre-release CI job generates fixtures including the new one
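
Steps 1 and 2 might look roughly like the sketch below. Everything except the `build_list_array()` name is an assumption made for illustration: the `Array` alias, the `name()` method, the `all_fixtures()` registry function, the `example_list.vortex` file name, and the hand-rolled `manifest_json` helper (which assumes file names need no JSON escaping) are hypothetical and may not match how `fixtures/mod.rs` or `manifest.json` generation is actually written.

```rust
// Hypothetical stand-in for a Vortex array.
type Array = Vec<i64>;

// Assumed trait shape; only `build()` is named in the RFC text.
trait Fixture {
    fn name(&self) -> &'static str;
    fn build(&self) -> Array;
}

/// An existing fixture, already registered.
struct Primitives;

impl Fixture for Primitives {
    fn name(&self) -> &'static str {
        "primitives.vortex"
    }
    fn build(&self) -> Array {
        vec![0, i64::MIN, i64::MAX]
    }
}

/// Step 1: the new builder function (placeholder data).
fn build_list_array() -> Array {
    vec![1, 2, 3]
}

/// Wrapper so the new builder plugs into the same trait.
struct ExampleList;

impl Fixture for ExampleList {
    fn name(&self) -> &'static str {
        "example_list.vortex" // hypothetical file name
    }
    fn build(&self) -> Array {
        build_list_array()
    }
}

/// Step 2: a single registry (hypothetical) that the generator iterates over.
fn all_fixtures() -> Vec<Box<dyn Fixture>> {
    vec![Box::new(Primitives), Box::new(ExampleList)]
}

/// Render the registered file names as a JSON array, as a manifest might list them.
fn manifest_json(fixtures: &[Box<dyn Fixture>]) -> String {
    let names: Vec<String> = fixtures
        .iter()
        .map(|f| format!("\"{}\"", f.name()))
        .collect();
    format!("[{}]", names.join(", "))
}
```

With a registry like this, adding a fixture touches only the builder and one registry line; the manifest for the next tagged release picks up the new file name automatically, while manifests of earlier versions simply do not list it.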
