## Summary
This PR improves the SLT test harness in the following ways:
1. Uses the `libtest-mimic` harness that the underlying `sqllogictest` crate
re-exports, which adds support for nextest and better filtering.
2. Adds a `--complete` flag that fills in expected results in place.
3. Replaces `$__TEST_DIR__` with `${WORK_DIR}`, a test-only
scratch dir that gets cleaned up at the end of the test.
4. Improves the README and other docs.
---------
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Changed file: `vortex-sqllogictest/README.md` (119 additions, 9 deletions)
@@ -1,36 +1,139 @@
1
1
# vortex-sqllogictest
2
2
3
-
This crate uses `sqllogictest-rs` to run `slt` based tests on both DF and DuckDB, both preconfigured to work with Vortex.
3
+
This crate uses [`sqllogictest-rs`](https://github.com/risinglightdb/sqllogictest-rs) to run
4
+
`.slt`-based tests against both DataFusion and DuckDB, each preconfigured to read Vortex files.
4
5
5
-
Different test files might run in parallel, but within the same file, the file will run for each query engine sequentially before the next one starts.
6
+
Every `.slt` file is turned into one independent test per engine, driven by `sqllogictest`'s
7
+
`libtest-mimic` harness. Tests run in parallel; within a single file the records execute
8
+
sequentially for one engine. Each test is named `slt::<engine>::<relative-path>`, for example
9
+
`slt::datafusion::integers.slt` or `slt::duckdb::duckdb/explain.slt`.
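
As a rough illustration of the naming scheme above, the following sketch builds a test name from an engine label and a file path. The function name `test_name` and the idea of stripping a root directory are assumptions made for this example; the actual harness may derive names differently.

```rust
use std::path::Path;

/// Hypothetical helper: builds `slt::<engine>::<relative-path>`.
/// `root` is assumed to be the directory that holds the `.slt` files.
fn test_name(engine: &str, root: &Path, file: &Path) -> Option<String> {
    // Returns None when `file` does not live under `root`.
    let relative = file.strip_prefix(root).ok()?;
    // Join components with `/` so the name looks the same on every platform.
    let parts: Vec<String> = relative
        .components()
        .map(|c| c.as_os_str().to_string_lossy().into_owned())
        .collect();
    Some(format!("slt::{}::{}", engine, parts.join("/")))
}
```

For instance, `test_name("duckdb", Path::new("slt"), Path::new("slt/duckdb/explain.slt"))` yields the `slt::duckdb::duckdb/explain.slt` name shown above.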
6
10
7
11
## Running tests
8
12
9
-
In order to run the tests, you first need to generate TPC-H data (scale factor 0.1), the commands are:
13
+
Some tests use TPC-H data at scale factor 0.1. Generate it first, then run the suite with
14
+
`cargo nextest`:
10
15
11
16
```shell
12
17
./vortex-sqllogictest/slt/tpch/generate_data.sh
18
+
cargo nextest run -p vortex-sqllogictest
19
+
# the built-in cargo test harness also works:
13
20
cargo test -p vortex-sqllogictest --test sqllogictests
14
21
```
15
22
16
-
Note that `nextest` isn't currently supported, but might be in the future.
23
+
The generated data lives under `slt/tpch/data/` (git-ignored). If it is missing, the TPC-H tests
24
+
are reported as **ignored** rather than failing, so the rest of the suite still runs. The TPC-H
25
+
`.slt` files load their tables through paths relative to the crate root, so run the tests via
26
+
`cargo nextest`/`cargo test`, which set the working directory accordingly.
27
+
28
+
Because the harness is `libtest-mimic`-based, the standard test flags work, including
29
+
`cargo nextest`, filtering, and listing:
30
+
31
+
```shell
32
+
# Run only DuckDB tests:
33
+
cargo nextest run -p vortex-sqllogictest -E 'test(/slt::duckdb::/)'
34
+
# Run a single file on both engines (substring filter):
35
+
cargo nextest run -p vortex-sqllogictest -E 'test(strings)'
36
+
# List every generated test without running:
37
+
cargo nextest list -p vortex-sqllogictest
38
+
```
39
+
40
+
## Scratch directory and `${WORK_DIR}`
41
+
42
+
Tests reference a per-test working directory through the `${WORK_DIR}` substitution variable. The
43
+
runner sets `WORK_DIR` to a constant, git-ignored scratch directory **inside this crate** —
44
+
`scratch/<test-name>/` — rather than an OS tempdir. The path is deterministic (named after the test,
45
+
not random), so it is easy to inspect, and each test gets its own directory so concurrent tests
46
+
never collide. The directory is recreated empty before each test and removed afterwards — whether
47
+
the test passed, failed, or panicked (cleanup errors are logged, not fatal).
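
The lifecycle described above (recreate empty, remove afterwards, log cleanup errors) can be sketched with a drop guard. This is a minimal sketch, not the runner's code: the `ScratchDir` type and its methods are hypothetical, and it assumes the directory name passed in is already safe to use as a single path component.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical guard that owns `<scratch_root>/<dir_name>/` for one test.
struct ScratchDir {
    path: PathBuf,
}

impl ScratchDir {
    /// Recreates the directory empty so leftovers from earlier runs never leak in.
    fn create(scratch_root: &Path, dir_name: &str) -> io::Result<Self> {
        let path = scratch_root.join(dir_name);
        match fs::remove_dir_all(&path) {
            Ok(()) => {}
            // A missing directory is the normal case on a first run.
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            Err(e) => return Err(e),
        }
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Drop runs on normal return and while unwinding from a panic
        // (not when the process aborts). Errors are logged, not fatal.
        if let Err(e) = fs::remove_dir_all(&self.path) {
            eprintln!("failed to clean up {}: {}", self.path.display(), e);
        }
    }
}
```

Tying cleanup to `Drop` is one way to get the "passed, failed, or panicked" behavior without repeating the cleanup call on every exit path.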
48
+
49
+
Query output is passed through a normalization step that rewrites the scratch path back to the
50
+
`${WORK_DIR}` token. This keeps expected output stable across machines and runs, and is what lets
51
+
`--complete` (below) write portable expected values instead of machine-specific paths.
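
A minimal sketch of such a normalization step, assuming it is a plain substring replacement on each output cell. The function name `normalize_output` is hypothetical, and the real normalizer may handle more cases (for example, differently formatted paths).

```rust
use std::path::Path;

/// Hypothetical normalizer: rewrites the concrete scratch path back to the
/// `${WORK_DIR}` token so expected output does not depend on the machine.
fn normalize_output(cell: &str, work_dir: &Path) -> String {
    let concrete = work_dir.display().to_string();
    cell.replace(concrete.as_str(), "${WORK_DIR}")
}
```

With a work dir of `/repo/scratch/strings`, a cell containing `/repo/scratch/strings/strings.vortex` becomes `${WORK_DIR}/strings.vortex`; cells that do not mention the path are returned unchanged.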
52
+
53
+
## Selecting which engine runs a test
54
+
55
+
There are two complementary mechanisms:
56
+
57
+
- **Per-file, by directory.** A file under a `datafusion/` directory runs **only** on DataFusion;
58
+
a file under a `duckdb/` directory runs **only** on DuckDB. Anything else runs on **both**. This
59
+
is how engine-specific features (e.g. DuckDB `EXPLAIN` plans) are kept isolated.
60
+
- **Per-record, by label.** Use `onlyif <label>` / `skipif <label>` on an individual record to
61
+
include or exclude it for one engine. The available labels are `datafusion` and `duckdb`:
62
+
63
+
```text
64
+
onlyif duckdb
65
+
query T
66
+
SELECT string_agg(str, ',') FROM '${WORK_DIR}/strings.vortex' WHERE prefix(str, 'He');
67
+
----
68
+
Hello,Hey
69
+
```
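
The per-file rule can be sketched as a small function over the file's path relative to the test root. The `Engine` enum and `engines_for` are names invented for this example. Checking `datafusion/` before `duckdb/` is also an assumption, since the text does not say what happens when a path contains both.

```rust
use std::ffi::OsStr;
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Engine {
    DataFusion,
    DuckDb,
}

/// Hypothetical selection rule: inspect only the directory part of the path,
/// so a file that merely has "duckdb" in its name still runs on both engines.
fn engines_for(relative: &Path) -> Vec<Engine> {
    let under = |dir_name: &str| {
        relative.parent().map_or(false, |dir| {
            dir.components()
                .any(|c| c.as_os_str() == OsStr::new(dir_name))
        })
    };
    if under("datafusion") {
        vec![Engine::DataFusion]
    } else if under("duckdb") {
        vec![Engine::DuckDb]
    } else {
        vec![Engine::DataFusion, Engine::DuckDb]
    }
}
```

Per-record `onlyif`/`skipif` labels are handled by `sqllogictest` itself and are not modeled here.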
70
+
71
+
## Regex assertions (DuckDB only)
72
+
73
+
For volatile output such as `EXPLAIN` plans, the DuckDB validator supports regex directives,
74
+
inspired by DuckDB's own `.test` files. When the expected block is a single line beginning with
75
+
one of these markers, the actual output (rows joined by newlines) is matched against the pattern
76
+
(`.` matches newlines):
77
+
78
+
- `<REGEX>:<pattern>` passes when the pattern matches.
79
+
- `<!REGEX>:<pattern>` passes when the pattern does **not** match.
80
+
81
+
```text
82
+
query TT
83
+
EXPLAIN (FORMAT json) SELECT strlen(str) FROM '${WORK_DIR}/pe-pushdown.vortex';
84
+
----
85
+
<REGEX>:SELECT projections
86
+
```
87
+
88
+
These markers are only honored by the DuckDB validator, which is why regex-based plan assertions
89
+
live under `slt/duckdb/`. A malformed pattern fails the assertion (it does not panic the run).
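
To make the directive handling concrete, here is a sketch of how a validator might recognize the two markers and fall back to a literal comparison. All names are hypothetical. The pattern matcher is passed in as a closure so the example stays dependency-free; a real validator would presumably compile the pattern with a regex engine configured so that `.` matches newlines, and treat a compile error as a failed assertion.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Expectation<'a> {
    Matches(&'a str),
    DoesNotMatch(&'a str),
    Literal,
}

/// A directive only applies when the expected block is exactly one line.
fn parse_expectation<'a>(expected: &[&'a str]) -> Expectation<'a> {
    if let &[line] = expected {
        if let Some(pattern) = line.strip_prefix("<REGEX>:") {
            return Expectation::Matches(pattern);
        }
        if let Some(pattern) = line.strip_prefix("<!REGEX>:") {
            return Expectation::DoesNotMatch(pattern);
        }
    }
    Expectation::Literal
}

/// `is_match(pattern, text)` stands in for the real regex engine.
fn validate(
    expected: &[&str],
    actual_rows: &[&str],
    is_match: impl Fn(&str, &str) -> bool,
) -> bool {
    // Rows are joined by newlines before matching, as described above.
    let joined = actual_rows.join("\n");
    match parse_expectation(expected) {
        Expectation::Matches(pattern) => is_match(pattern, &joined),
        Expectation::DoesNotMatch(pattern) => !is_match(pattern, &joined),
        Expectation::Literal => expected == actual_rows,
    }
}
```

Blocks with more than one expected line are always compared literally in this sketch, even if a line happens to start with a marker.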
90
+
91
+
## Regenerating expected output (`--complete`)
92
+
93
+
Passing `--complete` rewrites each `.slt` file **in place** so its expected output matches what the
94
+
engine currently produces, instead of comparing against it. This is useful after an intentional
95
+
change to query results or plan formatting.
96
+
97
+
```shell
98
+
# Complete every file (generate TPC-H data first if you want its result files updated):
99
+
cargo test -p vortex-sqllogictest --test sqllogictests -- --complete
100
+
# Complete only the files whose name matches a substring:
101
+
cargo test -p vortex-sqllogictest --test sqllogictests -- --complete strings
102
+
```
103
+
104
+
Notes and caveats:
105
+
106
+
- **It encodes whatever the engine outputs today, bugs included.** Always review the diff before
107
+
committing; a completion is not a substitute for knowing the correct answer.
108
+
- Each file is completed from a **single reference engine**: DuckDB for files under `slt/duckdb/`,
109
+
DataFusion for everything else (including files that also run on DuckDB). If DuckDB then diverges
110
+
from a shared file's DataFusion output, split the differing records out with
111
+
`onlyif`/`skipif`.
112
+
- Scratch paths in output are normalized to `${WORK_DIR}` before being written, so completed files
113
+
stay portable.
114
+
- `--complete` is intercepted before the test harness, so pass it after `--` (it is not a
115
+
`cargo nextest` flag).
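
The reference-engine rule from the caveats above can be sketched as a function over a path relative to `slt/`. The names `ReferenceEngine` and `reference_engine` are made up for illustration, and the sketch assumes "under `slt/duckdb/`" means the relative path starts with a `duckdb` component.

```rust
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
enum ReferenceEngine {
    DataFusion,
    DuckDb,
}

/// Hypothetical rule: files under `duckdb/` are completed from DuckDB,
/// everything else (shared files included) from DataFusion.
fn reference_engine(relative_to_slt: &Path) -> ReferenceEngine {
    // `Path::starts_with` compares whole components, so `duckdb_extra/`
    // would not count as `duckdb/`.
    if relative_to_slt.starts_with("duckdb") {
        ReferenceEngine::DuckDb
    } else {
        ReferenceEngine::DataFusion
    }
}
```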
17
116
18
117
## Writing a new test
19
118
20
-
Currently, tests must account for the differences between the engines, the general pattern that works for basic things is using views over files, as DuckDB as and DataFusion don't seem to have a shared syntax to create a table backed by an external storage format.
119
+
Tests must account for differences between the engines. The general pattern that works for basic
120
+
cases is a view over a file, since DuckDB and DataFusion don't share syntax for creating a table
121
+
backed by external storage.
21
122
22
-
`$__TEST_DIR__` is a special variable used to point to a tempdir, its only available if substitution is enabled, by using `control substitution on`.
123
+
`${WORK_DIR}` is a special variable pointing to a per-test working directory (the crate scratch
124
+
directory described above). It is only available when substitution is enabled via
125
+
`control substitution on` (see `slt/setup.slt.no`, included by most tests).
23
126
24
127
Here is a simple test that can be reused:
25
128
26
129
```text
27
130
query I
28
-
COPY (values (1, 2), (3, 4)) TO '$__TEST_DIR__/test.vortex';
131
+
COPY (values (1, 2), (3, 4)) TO '${WORK_DIR}/test.vortex';
29
132
----
30
133
2
31
134
32
135
statement ok
33
-
CREATE VIEW foo AS SELECT * FROM '$__TEST_DIR__/test.vortex';
136
+
CREATE VIEW foo AS SELECT * FROM '${WORK_DIR}/test.vortex';
34
137
35
138
query II
36
139
SELECT * FROM foo;
@@ -42,9 +145,16 @@ statement ok
42
145
DROP VIEW IF EXISTS foo;
43
146
```
44
147
148
+
Files ending in `.slt.no` are include fragments (pulled in via `include`), not standalone tests;
149
+
the runner only discovers `.slt` files.
150
+
45
151
## SLT Syntax
46
152
47
-
We generally use the default `slt` syntax as described in the [SQLite wiki](https://sqlite.org/sqllogictest/doc/trunk/about.wiki). The one difference is that we use the same column types as `datafusion-sqllogictest`'s, so when specifying expected query result column types, we support the following identifiers:
153
+
We generally use the default `slt` syntax as described in the
154
+
[SQLite wiki](https://sqlite.org/sqllogictest/doc/trunk/about.wiki) and the underlying crate's
155
+
[SLT Cookbook](https://github.com/risinglightdb/sqllogictest-rs#slt-test-file-format-cookbook). The
156
+
one difference is that we use the same column types as `datafusion-sqllogictest`'s, so when
157
+
specifying expected query result column types, we support the following identifiers: