@@ -34,35 +34,35 @@ Two binaries in a standalone crate (`vortex-test/compat-gen/`), not a workspace
3434 └────────────┘
3535```
3636
37- | Binary | Purpose |
38- | -------------- | -------------------------------------------------------------------------------- |
39- | `compat-gen` | Write fixture `.vortex` files + a `manifest.json` listing them |
40- | `compat-test` | Fetch fixtures from S3, read them, rebuild expected arrays, `assert_arrays_eq!` |
37+ | Binary | Purpose |
38+ | ------------- | ------------------------------------------------------------------------------- |
39+ | `compat-gen` | Write fixture `.vortex` files + a `manifest.json` listing them |
40+ | `compat-test` | Fetch fixtures from S3, read them, rebuild expected arrays, `assert_arrays_eq!` |
4141
4242When cherry-picked onto an old release branch, the only thing that changes is a thin API adapter layer (~20 lines that call that version's write/read API). Everything else (fixture definitions, correctness checks) stays identical.
4343
4444### Fixture Suite
4545
4646**Synthetic fixtures** (deterministic, hardcoded values):
4747
48- | File | Schema | Data | Purpose |
49- | ---------------------- | ---------------------------------------------- | --------------------------------------- | ------------------------ |
50- | `primitives.vortex` | `Struct{u8, u16, u32, u64, i32, i64, f32, f64}` | Boundary values (0, min, max) per type | Primitive type round-trip |
51- | `strings.vortex` | `Struct{Utf8}` | `["", "hello", "こんにちは", "🦀"]` | String encoding round-trip |
52- | `booleans.vortex` | `Struct{Bool}` | `[true, false, true, true, false]` | Bool round-trip |
53- | `nullable.vortex` | `Struct{Nullable<i32>, Nullable<Utf8>}` | Mix of values and nulls | Null handling |
54- | `struct_nested.vortex` | `Struct{Struct{i32, Utf8}, f64}` | Nested struct | Nested type round-trip |
55- | `chunked.vortex` | Chunked `Struct{u32}` | 3 chunks of 1000 rows each | Multi-chunk files |
48+ | File | Schema | Data | Purpose |
49+ | ---------------------- | ----------------------------------------------- | -------------------------------------- | -------------------------- |
50+ | `primitives.vortex` | `Struct{u8, u16, u32, u64, i32, i64, f32, f64}` | Boundary values (0, min, max) per type | Primitive type round-trip |
51+ | `strings.vortex` | `Struct{Utf8}` | `["", "hello", "こんにちは", "🦀"]` | String encoding round-trip |
52+ | `booleans.vortex` | `Struct{Bool}` | `[true, false, true, true, false]` | Bool round-trip |
53+ | `nullable.vortex` | `Struct{Nullable<i32>, Nullable<Utf8>}` | Mix of values and nulls | Null handling |
54+ | `struct_nested.vortex` | `Struct{Struct{i32, Utf8}, f64}` | Nested struct | Nested type round-trip |
55+ | `chunked.vortex` | Chunked `Struct{u32}` | 3 chunks of 1000 rows each | Multi-chunk files |
5656
5757Every stable array encoding should also contribute a fixture file — a struct with multiple columns, each using a different encoding of that array type. This ensures that encoding-specific read paths are exercised across versions.
5858
5959**Realistic fixtures** (real-world schemas and data distributions):
6060
61- | File | Source | Rows | Purpose |
62- | ---------------------------- | ----------------------------------- | ----- | ---------------------------------------------- |
63- | `tpch_lineitem.vortex` | TPC-H SF 0.01, `lineitem` table | ~60K | Real-world numeric + string schema |
64- | `tpch_orders.vortex` | TPC-H SF 0.01, `orders` table | ~15K | Date + decimal types |
65- | `clickbench_hits_1k.vortex` | First 1000 rows of ClickBench `hits` | 1000 | Wide table (105 columns), deep nested types |
61+ | File | Source | Rows | Purpose |
62+ | --------------------------- | ------------------------------------ | ---- | ------------------------------------------- |
63+ | `tpch_lineitem.vortex` | TPC-H SF 0.01, `lineitem` table | ~60K | Real-world numeric + string schema |
64+ | `tpch_orders.vortex` | TPC-H SF 0.01, `orders` table | ~15K | Date + decimal types |
65+ | `clickbench_hits_1k.vortex` | First 1000 rows of ClickBench `hits` | 1000 | Wide table (105 columns), deep nested types |
6666
6767SF 0.01 is used instead of 0.1 to keep fixture files small (a few MB each) so that test downloads stay fast.
6868
@@ -81,6 +81,7 @@ trait Fixture {
8181```
8282
8383A single `Fixture` impl is sufficient for both generation and validation:
84+
8485- `compat-gen` calls `build()` and writes the result to disk
8586- `compat-test` calls the same `build()` to produce the expected array and compares it against what was read from the old file via `assert_arrays_eq!`
8687
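
The dual use of one `Fixture` impl described above can be sketched as follows. This is only an illustration under stated assumptions: `Column` is a hypothetical stand-in for a Vortex array, the trait shape beyond `build()` is guessed, and `generate`/`validate` are made-up helper names rather than functions from the crate. The real write/read calls and `assert_arrays_eq!` are replaced by plain values and `==`.

```rust
// Hypothetical stand-in for a Vortex array; the real array type is not shown in this section.
#[derive(Debug, Clone, PartialEq)]
struct Column(Vec<i64>);

// Assumed trait shape: the source only confirms that a `build()` method exists.
trait Fixture {
    fn name(&self) -> &'static str;
    fn build(&self) -> Column;
}

// Example fixture with deterministic boundary values (file name is hypothetical).
struct BoundaryI64;

impl Fixture for BoundaryI64 {
    fn name(&self) -> &'static str {
        "example.vortex"
    }

    fn build(&self) -> Column {
        Column(vec![0, i64::MIN, i64::MAX])
    }
}

// compat-gen side: build the array; writing it to disk is stubbed out here.
fn generate(fixture: &dyn Fixture) -> Column {
    fixture.build()
}

// compat-test side: rebuild the expected array with the same `build()` and
// compare it in memory against whatever was read back from the old file.
fn validate(fixture: &dyn Fixture, read_back: &Column) -> bool {
    fixture.build() == *read_back
}
```

Because both sides call the same `build()`, a fixture definition cannot drift between what was generated and what is expected; only the file contents and the reader can differ.
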
@@ -115,6 +116,7 @@ impl Fixture for TpchLineitemFixture {
115116Correctness is validated by **comparing arrays in memory**; no checksums or spot-checks are needed.
116117
117118For every fixture in every version:
119+
1181201. Download the old `.vortex` file from S3 (written by an older Vortex version)
1191212. Read it into an array with the current reader
1201223. Call `fixture.build()` to produce the expected array at the current version
@@ -214,6 +216,7 @@ v0.65.0/manifest.json → ["primitives.vortex", "strings.vortex", ..., "list.v
214216```
215217
216218Adding a new fixture:
219+
2172201. Add the builder function in `fixtures/` (e.g., `build_list_array()`)
2182212. Register it in `fixtures/mod.rs` so `compat-gen` includes it
2192223. Tag a release; the pre-release CI job generates fixtures, including the new one
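
As a rough sketch of step 2, registration could amount to adding one entry to a central list from which the manifest is derived. Everything here is an assumption for illustration: `Named`, `all_fixtures`, `manifest_json`, and the file name `new_fixture.vortex` are hypothetical, and the actual contents of `fixtures/mod.rs` are not shown in this section.

```rust
// Assumed minimal trait: only the fixture's file name matters for the manifest.
trait Fixture {
    fn name(&self) -> &'static str;
}

// Hypothetical fixture type that carries nothing but its file name.
struct Named(&'static str);

impl Fixture for Named {
    fn name(&self) -> &'static str {
        self.0
    }
}

// Hypothetical registry: adding a fixture means adding one line here.
fn all_fixtures() -> Vec<Box<dyn Fixture>> {
    vec![
        Box::new(Named("primitives.vortex")),
        Box::new(Named("strings.vortex")),
        // Newly registered fixture (hypothetical name).
        Box::new(Named("new_fixture.vortex")),
    ]
}

// Render the manifest as a JSON array of file names, in registration order.
// Names are assumed to need no JSON escaping.
fn manifest_json(fixtures: &[Box<dyn Fixture>]) -> String {
    let names: Vec<String> = fixtures
        .iter()
        .map(|f| format!("\"{}\"", f.name()))
        .collect();
    format!("[{}]", names.join(", "))
}
```

Deriving the manifest from the same list that drives generation means a fixture that is registered is also listed, so older versions' manifests simply omit fixtures that did not exist yet.
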