# Conformance Harness: Acceptance Criteria & Engine Matrix

Status: Approved for implementation on `feat/conformance-harness`.
Scope: a cross-engine conformance harness that proves every supported execution
backend reproduces the **same** typed-ingest / gold-derivation result as the
established Spark-direct oracle, byte-for-byte, under one shared canonicalization.

This document covers the **criteria-first** phase. It defines unambiguous,
machine-checkable acceptance criteria for each engine, the canonicalization
contract every engine MUST share, the fixture corpus and tags (including the cases
still to add), and the matrix assertion the harness enforces. Items marked `(NEW)`
do not exist yet; they are the deliverables of the later implementation phases on
this branch.

> **Run prefix** (ALL python/pytest/dbt/uv commands):
> `UV_PROJECT_ENVIRONMENT=/tmp/tsvenv JAVA_HOME=/home/linuxbrew/.linuxbrew/opt/openjdk@17 SPARK_LOCAL_IP=127.0.0.1 uv run <cmd>`
>
> PySpark 4.0 runs ONLY under `JAVA_HOME=openjdk@17` (the default JDK 26 crashes in
> `getSubject`). For any `dbt-spark` (session) leg, set an **isolated**
> `spark.sql.warehouse.dir` and metastore directory per case so that cases can run
> in parallel safely.

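The per-case isolation in the run prefix can be sketched as a small helper that derives Spark settings from a case directory. This is a sketch under stated assumptions: `run_env`, `isolated_spark_conf`, and the `<case_root>/<case_id>/` layout are hypothetical names. `spark.sql.warehouse.dir`, `spark.sql.session.timeZone`, and the Hive `javax.jdo.option.ConnectionURL` property (passed with the `spark.hadoop.` prefix so the embedded Derby metastore gets its own database) are standard Spark/Hive settings.

```python
from __future__ import annotations

from collections.abc import Mapping
from pathlib import Path

# Mirrors the run prefix above; merged over the caller's environment.
RUN_ENV = {
    "UV_PROJECT_ENVIRONMENT": "/tmp/tsvenv",
    "JAVA_HOME": "/home/linuxbrew/.linuxbrew/opt/openjdk@17",
    "SPARK_LOCAL_IP": "127.0.0.1",
}


def run_env(base: Mapping[str, str]) -> dict[str, str]:
    """Return a copy of `base` with the run-prefix variables applied (hypothetical helper)."""
    env = dict(base)
    env.update(RUN_ENV)
    return env


def isolated_spark_conf(case_root: Path, case_id: str) -> dict[str, str]:
    """Spark settings giving one case its own warehouse and Derby metastore (hypothetical helper)."""
    base = (case_root / case_id).resolve()
    return {
        # Managed tables go under the case directory instead of a shared ./spark-warehouse.
        "spark.sql.warehouse.dir": str(base / "warehouse"),
        # Embedded Derby metastore, one database per case; create=true creates it on first use.
        "spark.hadoop.javax.jdo.option.ConnectionURL": (
            f"jdbc:derby:;databaseName={base / 'metastore_db'};create=true"
        ),
        # Session time zone pinned to UTC, as the canonicalization contract requires.
        "spark.sql.session.timeZone": "UTC",
    }
```

For a leg that builds its own session, each entry would be applied with `SparkSession.builder.config(key, value)` before `getOrCreate()`. Child processes would receive the prefix through `subprocess.run([...], env=run_env(os.environ))`.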

---

## 1. The oracle (the "previous implementation")

The single source of truth is the **Spark-direct ingest baseline**:
`tablespec.generate_ingest_sql(umf)` executed on Delta-Spark
(`tests/ingest_parity/test_spark_baseline.py:210`). Its canonicalized output is
committed as the **corpus golden** under
`tests/golden/ingest_parity/<fixture>.spark.expected.json`. The gold-derivation
oracle is `SQLPlanGenerator` / `generate_sql_plan`
(`src/tablespec/schemas/sql_generator.py`); its golden is the canonicalized
result of executing the generated gold SQL on the oracle engine.

Every engine leg compares its canonicalized output to **that same corpus
golden** (never to itself, never to a freshly recomputed expectation), AND any
two engines that can both run a given case MUST agree **pairwise**. An engine
that cannot run a tier in this environment is `skipif`-gated with an explicit,
visible reason; it is never silently passed.

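The comparison rule above (always compare to the committed file; only the oracle may write it) can be sketched as follows. Assumptions: `check_against_golden`, `GoldenMismatch`, and `ORACLE` are hypothetical names. Only the golden path pattern comes from this document, and the real harness may report diffs differently.

```python
from __future__ import annotations

from pathlib import Path

GOLDEN_DIR = Path("tests/golden/ingest_parity")
ORACLE = "SparkDirect"  # the only engine allowed to (re)write the corpus golden


class GoldenMismatch(AssertionError):
    """Raised when an engine's canonical output differs from the committed golden."""


def golden_path(fixture: str, root: Path = GOLDEN_DIR) -> Path:
    return root / f"{fixture}.spark.expected.json"


def check_against_golden(
    engine: str,
    fixture: str,
    canonical_json: str,
    *,
    update_golden: bool,
    root: Path = GOLDEN_DIR,
) -> None:
    path = golden_path(fixture, root)
    if update_golden and engine == ORACLE:
        # Only the oracle regenerates the golden; every other leg still compares.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(canonical_json, encoding="utf-8")
        return
    if not path.exists():
        raise FileNotFoundError(f"{path} is missing; run {ORACLE} with --update-golden")
    # Compare against the committed file, never against a recomputed expectation.
    if canonical_json != path.read_text(encoding="utf-8"):
        raise GoldenMismatch(f"{engine} output diverges from {path.name}")
```

Because a non-oracle leg ignores `update_golden`, running the whole matrix with `--update-golden` cannot let a divergent engine overwrite the expectation it is being judged against.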

---

## 2. Engines × fidelity tier × what-it-compares-to × gate

| Engine | Fidelity tier | Executed here? | Compares to | Marker / skip gate |
| --- | --- | --- | --- | --- |
| **SparkDirect** | Oracle / executed (result-parity) | Yes (Delta-Spark, JVM) | IS the corpus golden (writes it under `--update-golden`); all others compare to it | `spark_only`; skip if there is no JVM or `JAVA_HOME` is not openjdk@17 |
| **DbtDuckDB** | Executed (result-parity) | Yes (in-process DuckDB) | corpus golden + pairwise vs. every other available engine | `no_spark`; `importorskip("duckdb")`, `importorskip("dbt")`, skip if the `dbt` CLI is absent |
| **DbtSparkSession** | Executed (result-parity) | Yes (local embedded `dbt-spark[session]`, `method: session`, embedded Hive/Derby) | corpus golden + pairwise | `slow`; skip if the `dbt-spark` adapter is missing or the JVM is unavailable; per-case isolated warehouse/metastore dir |
| **SQLPlanGeneratorGold** | Executed (result-parity), run on BOTH DuckDB AND the Spark session | Yes (both backends, via the dbt-generated gold project so the dialect layer applies) | corpus golden + Spark↔DuckDB equivalence proven pairwise (closes the "gold never run on Spark" gap) | DuckDB leg: `no_spark` + duckdb/dbt present; Spark leg: `slow` + JVM/`dbt-spark` present |
| **DbtDatabricks** | Compile-golden (no cluster) | Compile only | the committed compiled-SQL golden; cast-SQL parity to Spark via the shared renderer | `no_spark`; `dbt compile` only; `dbt run` is `skipif`-gated when no Databricks workspace is configured |
| **LDP** | Cast-parity + compile-golden + opt-in e2e | Cast-parity + emit-golden executed; e2e opt-in | (a) cast-parity: emitted cast SQL == Spark cast SQL; (b) compile-golden: emitted project text == `tests/golden/ldp/**`; (c) e2e: corpus golden | `no_spark` for (a)+(b); (c) gated behind the opt-in `databricks_e2e` marker (`skipif` no Databricks) |

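The gate column can be turned into explicit skip reasons by probing capabilities once and mapping each engine to what it needs. A sketch under assumptions: the capability names, the `REQUIREMENTS` table, and the `"openjdk@17" in JAVA_HOME` check are hypothetical simplifications of the gates above. `importlib.util.find_spec` and `shutil.which` are standard library calls.

```python
from __future__ import annotations

import importlib.util
import os
import shutil
from collections.abc import Mapping

# Hypothetical capability names derived from the "Marker / skip gate" column.
REQUIREMENTS: dict[str, tuple[str, ...]] = {
    "SparkDirect": ("jvm17",),
    "DbtDuckDB": ("duckdb", "dbt", "dbt_cli"),
    "DbtSparkSession": ("jvm17", "dbt_spark"),
    "SQLPlanGeneratorGold[duckdb]": ("duckdb", "dbt", "dbt_cli"),
    "SQLPlanGeneratorGold[spark]": ("jvm17", "dbt_spark"),
}


def _has_module(name: str) -> bool:
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages for dotted names and raises if they are missing.
        return False


def probe_capabilities(env: Mapping[str, str] | None = None) -> dict[str, bool]:
    env = os.environ if env is None else env
    return {
        "jvm17": "openjdk@17" in env.get("JAVA_HOME", ""),
        "duckdb": _has_module("duckdb"),
        "dbt": _has_module("dbt"),
        "dbt_cli": shutil.which("dbt") is not None,
        "dbt_spark": _has_module("dbt.adapters.spark"),
    }


def skip_reason(engine: str, caps: Mapping[str, bool]) -> str | None:
    """None means the engine can run; otherwise an explicit, reportable reason."""
    missing = [cap for cap in REQUIREMENTS[engine] if not caps.get(cap, False)]
    return f"{engine} unavailable: missing {', '.join(missing)}" if missing else None
```

In a test module, the reason would feed `pytest.mark.skipif(reason is not None, reason=reason or "")`. Running `pytest -rs` then lists each skip with its reason instead of letting a missing engine disappear.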

### 2.1 Tier definitions

- **Oracle / executed (result-parity):** generates SQL, executes it on a real
  engine against real CSV data, and canonicalizes the resulting table; that
  canonical form defines (SparkDirect) or must equal (all others) the corpus
  golden. No mocks for the behavior under test.
- **Compile-golden:** `dbt compile` (or LDP text emission) renders deterministic
  SQL/project text that is byte-compared to a committed golden. This proves the
  emitter, not a live run. Used where no cluster exists here (Databricks; the LDP
  Databricks runtime).
- **Cast-parity:** the per-column cast expression the backend emits is executed
  in isolation (or string-compared) and must reproduce the EXACT value/NULL
  behavior of the Spark `try_to_timestamp` + Java-token oracle, including the
  sub-second / width-boundary cases that the second-resolution canonical form
  would otherwise hide.

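Cast-parity compares parsed values, not canonical strings. That lets it catch a backend that silently drops fractional seconds, which a second-resolution golden would miss. A minimal sketch, assuming each backend's emitted cast can be wrapped as a Python callable from input text to `datetime` or `None` (NULL). `cast_parity_mismatches`, `strict_cast`, and `second_resolution_cast` are hypothetical stand-ins, not the Spark or DuckDB casts.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from datetime import datetime
from typing import Optional

# A cast under test: raw CSV text -> parsed value, or None for NULL.
CastFn = Callable[[str], Optional[datetime]]


def cast_parity_mismatches(
    inputs: Iterable[str], oracle: CastFn, candidate: CastFn
) -> list[tuple[str, Optional[datetime], Optional[datetime]]]:
    """Return (raw, oracle_value, candidate_value) for every input where they differ."""
    mismatches = []
    for raw in inputs:
        expected, actual = oracle(raw), candidate(raw)
        # datetime equality includes microseconds; None == None covers NULL parity.
        if expected != actual:
            mismatches.append((raw, expected, actual))
    return mismatches


def strict_cast(raw: str) -> Optional[datetime]:
    """Toy oracle stand-in: parse, or return None like a try_* cast."""
    try:
        return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f")
    except ValueError:
        return None


def second_resolution_cast(raw: str) -> Optional[datetime]:
    """Toy buggy backend: parses correctly but drops the fractional seconds."""
    parsed = strict_cast(raw)
    return None if parsed is None else parsed.replace(microsecond=0)


SAMPLES = ["2024-03-01 12:00:00.123", "2024-03-01 12:00:00.000", "not-a-timestamp"]
```

`cast_parity_mismatches(SAMPLES, strict_cast, second_resolution_cast)` reports only the `.123` input. The two casts agree on the whole-second value and on the NULL for the unparseable string.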

### 2.2 Marker plan `(NEW where noted)`

Reuse the existing markers (`slow`, `fast`, `no_spark`, `spark_only`, `acceptance`,
`contract`). Add ONE new marker:

- `databricks_e2e` `(NEW)`: opt-in; `skipif` unless a real Databricks workspace
  is configured. Deselected by default so the green suite never depends on a cluster.

Register it in `pyproject.toml` under `[tool.pytest.ini_options].markers`;
`--strict-markers` is on, so an undeclared marker is an error.

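The opt-in gate could be a `conftest.py` hook that skips every `databricks_e2e` test, with a reason, unless workspace credentials are present. A sketch under assumptions: treating `DATABRICKS_HOST` plus `DATABRICKS_TOKEN` as "workspace configured" and the helper name `databricks_e2e_skip_reason` are hypothetical choices. `pytest_collection_modifyitems`, `Item.get_closest_marker`, `Item.add_marker`, and `pytest.mark.skip(reason=...)` are standard pytest APIs.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Assumption: a workspace counts as "configured" when both variables are non-empty.
REQUIRED_ENV = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def databricks_e2e_skip_reason(env: Mapping[str, str]) -> str | None:
    missing = [name for name in REQUIRED_ENV if not env.get(name)]
    if not missing:
        return None
    return "databricks_e2e: no Databricks workspace configured (unset: " + ", ".join(missing) + ")"


def pytest_collection_modifyitems(config, items):
    """conftest.py hook: skip databricks_e2e tests, with a reason, when no workspace exists."""
    reason = databricks_e2e_skip_reason(os.environ)
    if reason is None:
        return
    import pytest  # imported lazily so this module also loads outside pytest

    skip = pytest.mark.skip(reason=reason)
    for item in items:
        if item.get_closest_marker("databricks_e2e") is not None:
            item.add_marker(skip)
```

This covers the skip-with-reason half. Default deselection is a separate choice; one option is an `addopts` entry such as `-m "not databricks_e2e"`.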

---

## 3. Canonicalization contract `(NEW)`: extend `tests/ingest_parity/canonical.py`

ALL engines MUST canonicalize through the identical `canonical.to_json`. Today
`render_value` pins timestamps to **second** resolution and assumes UTC, which
HIDES sub-second and timezone divergence between engines. The contract is
extended to make that divergence visible. Changing the default is not
byte-compatible with the committed second-resolution goldens, so item 4 below
specifies how those goldens are kept stable.

Contract (`canonical.to_json` / `render_value` / `canonical_rows`):

1. **Configurable timestamp precision.** `to_json(..., ts_precision: int = 6)`
   threads through to `render_value(value, *, ts_precision=6)`. A
   `datetime`/timestamp renders as `YYYY-MM-DD HH:MM:SS` when `ts_precision == 0`,
   else as `YYYY-MM-DD HH:MM:SS.ffffff` truncated (NOT rounded) to `ts_precision`
   fractional digits. **The default is microsecond (6)**, so sub-second divergence
   is visible by default; a case may pin `ts_precision=0` only with an explicit,
   documented reason.
2. **Explicit timezone handling.** TZ rendering is explicit, not implicit-UTC.
   A tz-aware `datetime` is first normalized to UTC, then rendered with a trailing
   `Z`; a naive `datetime` renders with NO suffix. The two are therefore NEVER
   byte-equal, so a tz-aware↔naive divergence cannot silently pass. Every engine
   leg pins its session to UTC (`SET TimeZone='UTC'` / Spark `spark.sql.session.timeZone=UTC`)
   so wall-clock values agree before this rendering step.
3. **Identical for all engines.** SparkDirect, DbtDuckDB, DbtSparkSession,
   SQLPlanGeneratorGold (both backends), and the LDP e2e leg import and call the
   SAME `to_json` with the SAME `ts_precision` and the SAME decimal `scales` map.
   Decimals stay fixed at their declared scale; booleans render as `true`/`false`;
   NULL renders as `"NULL"`; rows are sorted by all canonical columns. There is no
   per-engine canonicalization.
4. **Backward compatibility (explicit, not hand-waved).** Switching the default
   to `ts_precision=6` is NOT byte-identical to the current second-resolution
   goldens: a whole-second `...:SS` becomes `...:SS.000000`. There are two
   compatible paths, and exactly one MUST be chosen during implementation:
   - **(a) corpus default `ts_precision=0`:** the existing 10 fixtures keep
     pinning second resolution (their goldens are unchanged, byte-for-byte), and
     ONLY the NEW sub-second/tz cases opt into `ts_precision=6`. This preserves
     every committed golden with zero regeneration. **This is the recommended
     default**: the `to_json` signature default is `6`, but the ingest corpus
     parametrization passes `ts_precision=0` explicitly except for `tz`-tagged
     cases.
   - **(b) global `ts_precision=6` + one-time golden migration:** regenerate all
     goldens under `--update-golden` so whole seconds carry `.000000`. This is
     compatibility by MIGRATION (a single reviewed golden churn), not byte
     compatibility of the unchanged files.

   Under either path, the harness records the chosen precision per case so that
   the golden and every engine leg are compared at the same precision.

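Items 1 to 3 can be illustrated with a compact, self-contained version of the three functions. It is a sketch, not the current `canonical.py`. The keyword-only `scale` argument on `render_value`, the JSON layout (`columns` + `rows`), and ISO rendering of plain dates are assumptions. The timestamp, TZ, decimal, boolean, NULL, and sort rules follow the contract above.

```python
from __future__ import annotations

import json
from collections.abc import Mapping, Sequence
from datetime import date, datetime, timezone
from decimal import Context, Decimal
from typing import Any

# Wide enough for DECIMAL(38, s); the default 28-digit context would raise InvalidOperation.
_WIDE = Context(prec=76)


def render_value(value: Any, *, ts_precision: int = 6, scale: int | None = None) -> str:
    if not 0 <= ts_precision <= 6:
        raise ValueError("ts_precision must be in 0..6")
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # checked before anything int-like
        return "true" if value else "false"
    if isinstance(value, Decimal):
        if scale is not None:
            # Fix the declared scale, e.g. Decimal("1.5") at scale 2 -> "1.50".
            value = value.quantize(Decimal(1).scaleb(-scale), context=_WIDE)
        return format(value, "f")  # fixed-point, never scientific notation
    if isinstance(value, datetime):  # before `date`: datetime subclasses date
        suffix = ""
        if value.tzinfo is not None and value.utcoffset() is not None:
            # tz-aware: normalize to UTC and mark with Z; naive values get no suffix.
            value = value.astimezone(timezone.utc).replace(tzinfo=None)
            suffix = "Z"
        text = value.isoformat(sep=" ", timespec="seconds")
        if ts_precision:
            # Truncate (not round) the microseconds to ts_precision digits.
            text += "." + f"{value.microsecond:06d}"[:ts_precision]
        return text + suffix
    if isinstance(value, date):
        return value.isoformat()  # assumption: plain dates render as YYYY-MM-DD
    return str(value)


def canonical_rows(
    rows: Sequence[Mapping[str, Any]],
    columns: Sequence[str],
    *,
    ts_precision: int = 6,
    scales: Mapping[str, int] | None = None,
) -> list[list[str]]:
    scales = scales or {}
    rendered = [
        [render_value(row[c], ts_precision=ts_precision, scale=scales.get(c)) for c in columns]
        for row in rows
    ]
    return sorted(rendered)  # sort by all canonical (rendered) columns


def to_json(
    rows: Sequence[Mapping[str, Any]],
    columns: Sequence[str],
    *,
    ts_precision: int = 6,
    scales: Mapping[str, int] | None = None,
) -> str:
    body = canonical_rows(rows, columns, ts_precision=ts_precision, scales=scales)
    return json.dumps({"columns": list(columns), "rows": body}, indent=2) + "\n"
```

With `ts_precision=3`, `2024-01-02 03:04:05.987654` renders as `2024-01-02 03:04:05.987` (truncated, not `.988`). The same wall-clock instant at `-05:00` renders as `2024-01-02 08:04:05Z`, which can never equal the naive `2024-01-02 03:04:05`.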

---

## 4. Fixture corpus, tags, and cases to add

### 4.1 Existing ingest corpus (`tests/fixtures/ingest/`)

`claims_incremental_pk`, `currency_amounts`, `dates_formats`,
`events_incremental_nopk`, `members_snapshot_pk`, `messy_incremental_pk`,
`nopad_formats`, `parity_hardening`, `provider_snapshot`, `types_basic`.
Two-batch fixtures are tracked by `_TWO_BATCH` in `test_spark_baseline.py`.

### 4.2 Tag taxonomy `(NEW)`: a `tags:` list on each fixture UMF, surfaced as pytest marks/ids

- `types`: scalar type coverage (passthrough, numeric, boolean).
- `decimal`: decimal precision / scale / overflow boundaries.
- `datetime`: date/timestamp format parsing.
- `tz`: timezone-aware + sub-second timestamp behavior.
- `incremental`: incremental (merge / append) ingestion.
- `snapshot`: full-snapshot ingestion.
- `pk` / `nopk`: has / lacks a primary key (dedup vs. blind append).
- `multibatch`: 3+ batches / out-of-order `_load_ts` / tie-break / tombstone.
- `gold`: cross-table gold derivation (join, pivot, unpivot, window, etc.).

Because `--strict-markers` is on, any tag surfaced as a pytest mark must also be
registered in `pyproject.toml` (see 2.2); a tag surfaced only in the test id
needs no registration.

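One way the `tags:` list could drive parametrization under path (a) of section 3 is to derive both the pytest id and the case's `ts_precision` from it. Everything here is hypothetical: `Case`, `load_case`, the id format, and treating `pk`/`nopk` as mutually exclusive. The UMF is assumed to be already parsed into a dict, and only the tag names come from the list above.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any

KNOWN_TAGS = frozenset(
    {"types", "decimal", "datetime", "tz", "incremental", "snapshot",
     "pk", "nopk", "multibatch", "gold"}
)


@dataclass(frozen=True)
class Case:
    name: str
    tags: tuple[str, ...]

    @property
    def test_id(self) -> str:
        # Tags in the id let `pytest -k tz` select cases without registering new markers.
        if not self.tags:
            return self.name
        return f"{self.name}-{'+'.join(sorted(self.tags))}"

    @property
    def ts_precision(self) -> int:
        # Path (a): second resolution for the existing corpus, microseconds for tz-tagged cases.
        return 6 if "tz" in self.tags else 0


def load_case(name: str, umf: Mapping[str, Any]) -> Case:
    tags = tuple(umf.get("tags", ()))
    unknown = set(tags) - KNOWN_TAGS
    if unknown:
        raise ValueError(f"{name}: unknown tags {sorted(unknown)}")
    if {"pk", "nopk"} <= set(tags):
        raise ValueError(f"{name}: pk and nopk are mutually exclusive")
    return Case(name, tags)
```

Each case would then enter the matrix as `pytest.param(case, id=case.test_id)`, and `case.ts_precision` would be passed to `to_json` for the golden and for every engine leg alike.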

### 4.3 Missing cases to add `(NEW)`

Ingest tier:

1. **`decimal_boundaries`** (`decimal`): values at the `precision`/`scale` limits,
   rounding at the scale boundary, and OVERFLOW inputs that must yield NULL or an
   error identically across engines (the largest representable value and one just
   over the precision).
2. **`tz_subsecond_timestamps`** (`datetime,tz`): tz-aware offsets (`+00:00`,
   `-05:00`, `Z`) AND `.SSS`/`.SSSSSS` fractional seconds; exercises the
   microsecond + explicit-TZ canonicalization so that sub-second/TZ divergence is
   visible and must agree.
3. **`multibatch_ooo_tiebreak`** (`incremental,pk,multibatch`): 3+ batches with
   OUT-OF-ORDER `_load_ts`, an exact-tie `_load_ts` requiring a deterministic
   tie-break, and a **tombstone** (delete-marker) row that removes a prior key.

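The intended semantics of `multibatch_ooo_tiebreak` can be written down as a tiny in-memory reference model before the CSVs are built. The corpus golden still comes from the SparkDirect oracle, so this only helps design the fixture. The tie-break rule (a later batch wins on an exact `_load_ts` tie), the `deleted` tombstone flag, and all names are hypothetical; this document does not fix them.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Row:
    key: str
    load_ts: datetime  # the `_load_ts` column
    batch: int  # hypothetical: 1-based batch number, used only to break exact ties
    payload: str | None
    deleted: bool = False  # hypothetical tombstone (delete-marker) flag


def resolve_latest(rows: Iterable[Row]) -> dict[str, Row]:
    """Keep the latest row per key by (load_ts, batch), then drop keys whose winner is a tombstone."""
    winners: dict[str, Row] = {}
    for row in rows:
        current = winners.get(row.key)
        # Arrival order does not matter: an older _load_ts in a later batch never wins.
        # A tie on both load_ts and batch keeps the first row seen.
        if current is None or (row.load_ts, row.batch) > (current.load_ts, current.batch):
            winners[row.key] = row
    return {key: row for key, row in winners.items() if not row.deleted}
```

A fixture built to this model would include a key whose newest `_load_ts` arrives in an earlier batch, a key with an exact `_load_ts` tie across batches, and a key whose latest row is a tombstone.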

Gold pattern family (`gold`, executed via `generate_sql_plan` on BOTH backends):

4. **`gold_join`**: multi-table sequential join (member×claims). Generator path:
   `_generate_join_step` (direct/sequential join).
5. **`gold_pivot`**: pivot derivation. Generator path: `_generate_pivot_join`.
6. **`gold_unpivot`**: UNPIVOT base-view derivation. Generator path:
   `_generate_unpivot_base_view`.
7. **`gold_window_aggregation`**: window / pre-aggregation view (`ROW_NUMBER` /
   `RANK` / pre-aggregation). Generator path: `_generate_pre_aggregation_views`.
8. **`gold_survivorship_priority`**: survivorship across `union_sources` via the
   priority-sorted `COALESCE` candidate order (the generator's supported
   survivorship mechanism). Generator path: `_generate_member_universe_view` +
   priority `COALESCE`. (Most-recent / longest-value survivorship is NOT a named
   generator strategy and is out of scope for this case.)
9. **`gold_first_record`**: first-record-per-key selection. Generator path:
   `_generate_first_record_join` (`strategy in ("first", "first_record")`).
10. **`gold_fk_integrity`**: referential-integrity coverage. NOTE: orphan-FK
    validation is NOT emitted by `generate_sql_plan` (FK metadata there only
    drives join planning / join type). FK integrity is therefore tested at the
    **dbt `relationships` schema-test** tier: `generate_dbt_dag_project` emits the
    `relationships` test, and `dbt build`/`dbt test` is asserted to PASS on clean
    data and FAIL on an injected orphan row (the explicit negative). The SparkDirect
    gold join result for the clean data is still the corpus golden; the orphan
    negative is a dbt-test assertion, not a canonical-row comparison.

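The negative for `gold_fk_integrity` rests on what dbt's `relationships` test checks: every non-NULL child key must exist in the parent, and NULL child keys are ignored. The sketch below restates that check in Python to help design the injected orphan row. `relationship_orphans` and the `member_id` column are hypothetical, and the real assertion remains the `dbt build`/`dbt test` outcome described in item 10.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping
from typing import Any


def relationship_orphans(
    child_rows: Iterable[Mapping[str, Any]],
    fk: str,
    parent_rows: Iterable[Mapping[str, Any]],
    pk: str,
) -> list[Any]:
    """Child FK values with no matching parent key; NULL FKs are not orphans."""
    parent_keys = {row[pk] for row in parent_rows if row[pk] is not None}
    return [row[fk] for row in child_rows if row[fk] is not None and row[fk] not in parent_keys]


members = [{"member_id": "M1"}, {"member_id": "M2"}]
clean_claims = [{"member_id": "M1"}, {"member_id": "M2"}, {"member_id": None}]
# The explicit negative: one injected claim whose member does not exist.
orphan_claims = clean_claims + [{"member_id": "M999"}]
```

Clean data yields no orphans, so the `relationships` test is expected to PASS. The injected row yields `["M999"]`, so `dbt test` is expected to FAIL, which dbt signals with a non-zero exit code.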

Each new case ships with `<name>.umf.yaml` (including `tags:`), its CSV batch(es),
and a committed corpus golden produced by the SparkDirect oracle under `--update-golden`.

---

## 5. The matrix assertion

For the parametrized product **(case × available engine)** the harness asserts:

- **A. Golden conformance:** `canonical(engine, case) == read(case.golden)`,
  byte-identical, using the case's pinned `ts_precision` + decimal `scales`. The
  golden is the SparkDirect oracle output (the previous implementation).
- **B. Pairwise agreement:** for any two engines `e1`, `e2` both available for a
  case, `canonical(e1, case) == canonical(e2, case)`. B follows from A whenever
  both engines pass A. It is asserted explicitly so that, when A fails, the report
  shows which engine pairs still agree, localizing a divergent-render bug to a
  specific engine pair.
- **C. Gold Spark↔DuckDB equivalence:** for every `gold` case, the
  `SQLPlanGeneratorGold` output is executed on BOTH DuckDB and the Spark session
  **via the dbt-generated gold project** (so the dialect layer rewrites
  Spark-flavored constructs such as `SELECT * EXCEPT (rn)` / `UNPIVOT EXCLUDE NULLS`
  appropriately per backend), and the two canonical forms MUST be equal (and each
  equal to the golden). This explicitly closes the "gold never run on Spark" gap.
- **D. Compile-golden stability:** `DbtDatabricks` `dbt compile` output and LDP
  emitted project text are byte-equal to their committed goldens; LDP cast SQL ==
  Spark cast SQL (cast-parity).
- **E. Skip visibility:** any unavailable (engine, tier) emits a `skip` with an
  explicit reason; the run summary shows skips, so a silently missing engine is
  detectable (never reported as a pass).

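Assertions A, B, and E for one case can be sketched as a single function over the engines' canonical outputs, with skipped engines represented as `None`. `matrix_report`, the message format, and the engine keys are hypothetical. Splitting the gold leg into `SQLPlanGeneratorGold[duckdb]` and `SQLPlanGeneratorGold[spark]` would let B double as the Spark↔DuckDB check of C.

```python
from __future__ import annotations

from collections.abc import Mapping
from itertools import combinations


def matrix_report(
    case: str, golden: str, outputs: Mapping[str, str | None]
) -> tuple[list[str], list[str]]:
    """Return (failures, skipped) for one case; a None output means the engine was skipped."""
    ran = {engine: out for engine, out in outputs.items() if out is not None}
    skipped = sorted(engine for engine, out in outputs.items() if out is None)
    failures: list[str] = []
    for engine in sorted(ran):  # A: every available engine vs. the committed golden
        if ran[engine] != golden:
            failures.append(f"A {case}: {engine} != golden")
    for e1, e2 in combinations(sorted(ran), 2):  # B: every available pair
        if ran[e1] != ran[e2]:
            failures.append(f"B {case}: {e1} != {e2}")
    return failures, skipped
```

If only `DbtSparkSession` diverges, the report has one A failure and two B failures. Both B failures name `DbtSparkSession`, which is the localization B exists for. A skipped LDP leg shows up in `skipped` instead of counting as a pass (E).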

Encapsulation (`tests/test_core_encapsulation.py`) and `make check`
(lint + pyright + full suite) MUST stay green; the harness adds no core→dbt/ldp
import.