docs/compatibility/yardstick.md (39 additions, 6 deletions)
@@ -1,6 +1,6 @@
# Yardstick Compatibility
-Sidemantic's Yardstick adapter parses SQL files containing `CREATE VIEW` statements that use the `AS MEASURE` syntax from Julian Hyde's ["Measures in SQL" proposal](https://arxiv.org/abs/2307.14009). It maps Yardstick concepts to Sidemantic's semantic model (Model, Dimension, Metric) and supports the `SEMANTIC SELECT`, `AGGREGATE()`, and `AT` query modifiers for measure-aware SQL queries.
+Sidemantic's Yardstick adapter parses SQL files containing `CREATE VIEW` statements that use the `AS MEASURE` syntax from Julian Hyde's ["Measures in SQL" proposal](https://arxiv.org/abs/2307.14009). It maps Yardstick concepts to Sidemantic's semantic model (Model, Dimension, Metric) and supports `SEMANTIC SELECT`, optional-prefix `AGGREGATE()`, and `AT` query modifiers for measure-aware SQL queries.
Features are marked **supported**, **partial support**, or **unsupported**. Partial support entries include notes explaining the limitation.
@@ -110,23 +110,27 @@ Derived measure detection works by scanning the expression's column references a
| `MODE(expr) AS MEASURE name` | Supported (stored as raw SQL expression metric with `agg=None`) |
| `PERCENTILE_CONT(n) WITHIN GROUP (ORDER BY expr) AS MEASURE name` | Supported (stored as raw SQL expression metric) |
| `CASE WHEN AGG(...) THEN ... END AS MEASURE name` | Supported (detected as having aggregate semantics; stored as raw SQL expression metric) |
-| Other aggregate functions not in the standard list | Supported (full expression preserved as `Metric.sql`) |
+| `PRODUCT(expr)`, `ENTROPY(expr)`, `KURTOSIS(expr)`, `SKEWNESS(expr)`, `LIST(expr)`, and related DuckDB aggregate functions | Supported (stored as raw SQL expression metrics with aggregate semantics) |
+| Other aggregate functions not in the standard list | Supported when sqlglot identifies them as aggregates; otherwise preserved as raw SQL only when aggregate semantics can be detected |
-When a measure expression contains aggregate functions (detected by walking the AST for `AggFunc` nodes or known anonymous aggregations like `mode`) but doesn't match a simple aggregation pattern, the full expression is preserved as-is for query-time evaluation.
+When a measure expression contains aggregate functions (detected by walking the AST for `AggFunc` nodes or known anonymous aggregations like `mode`, `product`, and `entropy`) but doesn't match a simple aggregation pattern, the full expression is preserved as-is for query-time evaluation.
---
## Query Semantics
-The Yardstick adapter works in tandem with Sidemantic's query rewriter to support the `SEMANTIC SELECT`, `AGGREGATE()`, and `AT` modifiers described in the Measures in SQL proposal.
+The Yardstick adapter works in tandem with Sidemantic's query rewriter to support `SEMANTIC SELECT`, optional-prefix `AGGREGATE()`, and `AT` modifiers described in the Measures in SQL proposal.
| Scalar `AGGREGATE()` without GROUP BY | Supported (produces a single grand-total row) |
-| `AGGREGATE()` without `SEMANTIC` prefix and without `AT` | Error: raises `ValueError` requiring the `SEMANTIC` prefix |
+| `AGGREGATE()` without `SEMANTIC` prefix and without `AT` | Supported |
+| Native DuckDB `aggregate(list, 'function')` | Supported (falls through to DuckDB; not treated as Yardstick syntax) |
+
+### Upstream Parity Tests
+
+The default test suite replays a vendored Yardstick `measures.test` fixture for stable CI coverage. To check against the live upstream Yardstick repository without copying fixtures into Sidemantic, run:
+
+```bash
+SIDEMANTIC_YARDSTICK_UPSTREAM_TESTS=1 uv run pytest -q tests/queries/test_yardstick_measures_replay.py -m yardstick_upstream
+```
+
+The same command runs in the `Yardstick Upstream Parity` GitHub Actions workflow. That workflow runs nightly, can be triggered manually with a Yardstick ref override, and runs on pull requests that touch Yardstick-specific code or tests.
+
+The live replay fetches `https://github.com/sidequery/yardstick.git` at `main` by default, checks all upstream `test/sql/*.test` files, and validates both facets:
+
+| Facet | Coverage |
+|-------|----------|
+| Model/metric definitions | Parses every upstream `CREATE VIEW ... AS MEASURE` statement and asserts model name, source table/base SQL, primary key, Yardstick metadata, dimension SQL/type/granularity, and metric `agg`/`sql`/`filters`/`type` |
+| Query execution | Replays every upstream query block against Sidemantic's Yardstick rewriter and compares result rows |
+
+The live definition check covers the `CREATE VIEW ... AS MEASURE` definitions used by Yardstick's SQL tests. Sidemantic's native SQL definition parser owns `MODEL(...)`, `METRIC(...)`, and `DIMENSION(...)` files separately from the Yardstick adapter; the live upstream replay does not treat Yardstick's top-level `yardstick_definitions.sql` helper file as part of the SQL-test corpus.
+
+Optional environment variables:
+
+| Variable | Purpose |
+|----------|---------|
+| `YARDSTICK_UPSTREAM_PATH` | Use an existing local Yardstick checkout instead of fetching |
+| `YARDSTICK_UPSTREAM_REPO` | Override the upstream Git URL |
+| `YARDSTICK_UPSTREAM_REF` | Override the ref fetched from upstream |
+| `YARDSTICK_UPSTREAM_CACHE_DIR` | Override the temporary checkout path |
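
As a rough sketch of how the optional variables might combine with the parity command, the two invocations below replay against a local checkout and against a non-default ref. The checkout path and the ref name are placeholders, not values from the repository; only the variable names and the pytest command come from the documentation being changed here.

```bash
# Option 1: replay against an existing local Yardstick checkout.
# "$HOME/src/yardstick" is a hypothetical path; point it at your own clone.
YARDSTICK_UPSTREAM_PATH="$HOME/src/yardstick" \
SIDEMANTIC_YARDSTICK_UPSTREAM_TESTS=1 \
  uv run pytest -q tests/queries/test_yardstick_measures_replay.py -m yardstick_upstream

# Option 2: fetch a ref other than main from the default upstream repository.
# "some-branch-or-tag" is a placeholder ref name.
YARDSTICK_UPSTREAM_REF="some-branch-or-tag" \
SIDEMANTIC_YARDSTICK_UPSTREAM_TESTS=1 \
  uv run pytest -q tests/queries/test_yardstick_measures_replay.py -m yardstick_upstream
```

Setting the variables as a prefix to the command scopes them to that single invocation, so a later run in the same shell goes back to the defaults.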