diff --git a/.github/workflows/README.md b/.github/workflows/README.md index 95e28726..c38d6918 100644 --- a/.github/workflows/README.md +++ b/.github/workflows/README.md @@ -107,7 +107,7 @@ All jobs run on `blacksmith-16vcpu-ubuntu-2204`. "PG set" follows the event | **test** (sharded) | `test:sqlx:partition` | Run the archived sqlx binaries (default features), hash-partitioned across shards | yes (per PG) | no (replays archive) | | **e2e** | `test:sqlx:e2e` | The `proptest-e2e` fresh-encryption property suite (`e2e_oracle`) — PG17 only, version-independent | yes (PG17) | **yes** | | **validate** (per PG) | `docs:validate:documented-sql` + `test:clean_install_v3` | DB-backed SQL doc-syntax check; clean-DB `eql_v3` install smoke | yes | no | -| **docs-static** | `docs:validate:source` | SQL doxygen coverage + required-tags (DB-free); **unconditional — runs on every PR incl. docs-only** | no | no | +| **docs-static** | `docs:validate:source` | SQL doxygen coverage + required-tags (DB-free); relevance-gated like the other heavy jobs (its inputs — `src/**`, the `crates/**` codegen build, `tasks/docs/**` — are a subset of the `relevant` filter) | no | no | | **schema** | `test:schema` | v2.2 / v2.3 payload JSON-schema validation | no | no | | **rust-crates** | `test:crates` + `types:check` | `cargo fmt --check`, clippy + `cargo test` for `eql-scalars` / `eql-codegen` / `eql-tests-macros` / `eql-types`; verify TS bindings + JSON schemas are fresh | no | no | | **codegen** | `codegen:parity` | Generated encrypted-domain SQL matches the golden output | no | no | @@ -145,19 +145,26 @@ non-default feature and needs `CS_*` at run time). stale `cargo expand` snapshot surfaces on the daily schedule, not on the PR that introduced it. Accepted trade-off. -2. **`docs/**` markdown is not content-validated.** The `docs-static` job - guarantees the SQL `--!` doxygen comments under `src/**` are always checked, - but nothing lints the prose/links in `docs/**` itself. 
A docs-only PR now runs - `docs-static` (so it is no longer un-gated), but that job validates *source* - documentation, not the markdown the PR changed. Adding a markdown - linter/link-checker is a separate, unfilled capability. +2. **`docs/**` markdown is not content-validated.** The `docs-static` job checks + the SQL `--!` doxygen comments under `src/**`, not the prose/links in `docs/**` + itself. A markdown-only PR leaves `relevant` false, so `docs-static` is skipped + along with the other heavy jobs — and that loses no coverage, because the job's + inputs (`src/**` `.sql`/`.template`, the `crates/**` codegen build, the + `tasks/docs/**` scripts) are all in the `relevant` filter, so a PR that doesn't + trip `relevant` cannot change its outcome. Linting the markdown the PR actually + changed (prose/links) is a separate, unfilled capability. ### Recently closed - *The e2e (fresh-encryption) suite never ran in CI.* Now covered by the **e2e** job (`test:sqlx:e2e`), PG17, on relevant PRs + the queue. -- *Docs-only PRs ran no doc validation.* The **docs-static** job now runs the - source-only doc checks unconditionally on every PR. +- *`docs-static` ran unconditionally on every PR.* It is now relevance-gated like + every other heavy job. Because its inputs are a strict subset of the `relevant` + filter, gating it both makes the workflow consistent (one uniform `if:`) and + drops a redundant codegen build on markdown-only PRs without losing any + coverage. A narrower bespoke `src/**`-only filter was rejected: it would risk a + silent false-green (`ci-required` counts `skipped` as pass) by skipping on a + real input change in `crates/**` or `tasks/docs/**`. --- @@ -175,9 +182,9 @@ Then verify (see `docs/plans/2026-06-09-ci-pr-feedback-sharding-rollout.md`): 4 `Validate …` jobs + `build-archive`, `e2e`, `docs-static`, `schema`, `rust-crates`, `codegen`, `self-contained-v3`, `matrix-coverage`, `splinter`) → `ci-required` green → PR merges. 
-- **Open a docs-only PR** → on its `pull_request` run the relevance-gated heavy - jobs skip, but `docs-static` still runs; `ci-required` reports **Success** (not - stuck *Pending*), so the PR can be queued. +- **Open a docs-only PR** → on its `pull_request` run every relevance-gated heavy + job skips (`docs-static` included); `ci-required` reports **Success** (not stuck + *Pending*) because it counts `skipped` as pass, so the PR can be queued. ## References diff --git a/.github/workflows/test-eql.yml b/.github/workflows/test-eql.yml index 4150ba6e..a4bcbd61 100644 --- a/.github/workflows/test-eql.yml +++ b/.github/workflows/test-eql.yml @@ -268,10 +268,10 @@ jobs: run: | mise run postgres:up postgres-${POSTGRES_VERSION} --extra-args "--detach --wait" - # Source-only doc checks (coverage + required-tags) moved to the - # unconditional `docs-static` job so they run on every PR (incl. docs-only) - # and exactly once, not per-Postgres. This step keeps only the DB-backed - # SQL-syntax validation, which genuinely needs the per-version Postgres. + # Source-only doc checks (coverage + required-tags) moved to the dedicated + # `docs-static` job so they run exactly once, not per-Postgres. This step + # keeps only the DB-backed SQL-syntax validation, which genuinely needs the + # per-version Postgres. - name: Validate documented SQL syntax (Postgres ${{ matrix.postgres-version }}) run: | mise run docs:validate:documented-sql diff --git a/CHANGELOG.md b/CHANGELOG.md index 7d04e1cc..a7cde151 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -33,7 +33,7 @@ Each entry that ships in a published release links to the PR that introduced it. 
- **`eql_v3.text` encrypted-domain family (`text`, `text_eq`, `text_match`, `text_ord`, `text_ord_ore`, `text_search`).** Adds equality (`=` / `<>` via HMAC), match (`@>` / `<@` via a new self-contained `eql_v3.bloom_filter` SEM index term), and ORE ordering (`<` `<=` `>` `>=`, `min` / `max`) for encrypted text, at parity with EQL v2 text — generated from the `text` row in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. `text` is the first scalar to add a new index `Term` (`Bloom`) and the first non-integer, unbounded ordered kind (lexicographic pivots, hand-written `impl ScalarType`). The combined **`text_search`** domain carries all three capabilities in one type — `=` / `<>` via HMAC, `<` `<=` `>` `>=` / `min` / `max` via ORE, and `@>` / `<@` via bloom filter. Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` / `eql_v3.match_term` extractors, not an operator class on the domain. Why: brings searchable encrypted text to the namespaced, `eql_v2`-free `eql_v3` surface. Match is exposed as bloom-filter containment on the `text_match` / `text_search` domains — deliberately *not* SQL `LIKE` (no wildcard/anchoring; probabilistic ngram containment) — and never backs equality. **Equality on the ordered text domains (`text_ord`, `text_ord_ore`) and on `text_search` always routes `=` / `<>` through `hm` (exact HMAC), never the ORE term — ORE is not exact-equality for text** (integer ordered domains keep exact ORE equality, which is lossless for them). 
([#260](https://github.com/cipherstash/encrypt-query-language/pull/260)) - **Self-contained `eql_v3` schema + standalone `release/cipherstash-encrypt-v3.sql` installer.** The `eql_v3` encrypted-domain surface no longer depends on `eql_v2` at runtime: it now owns its own copies of the searchable-encrypted-metadata (SEM) index-term types — `eql_v3.hmac_256` and `eql_v3.ore_block_256` (with its btree operator class) — so the `eql_v3.eq_term` / `eql_v3.ord_term` extractors return `eql_v3` types and no `eql_v2.` appears anywhere in the v3 SQL. The whole v3 surface relocated under a single `src/v3/` tree (`src/v3/sem/` for the hand-written SEM types, `src/v3/scalars/` for the generated domain families). A new build variant ships the `eql_v3` schema on its own as `release/cipherstash-encrypt-v3.sql`, installable into a database with no `eql_v2` present; a CI gate greps that artifact and its dependency closure to keep it `eql_v2`-free. Why: a clean foundation for the per-scalar encrypted-domain model to stand alone, ahead of it replacing the `eql_v2_encrypted` composite column type. This is additive — a new schema and a new artifact — and leaves `eql_v2` byte-for-byte unchanged. ([#255](https://github.com/cipherstash/encrypt-query-language/pull/255)) - **`eql_v3.text` encrypted-domain family (`text`, `text_eq`, `text_match`, `text_ord`, `text_ord_ore`).** Adds equality (`=` / `<>` via HMAC), match (`@>` / `<@` via a new self-contained `eql_v3.bloom_filter` SEM index term), and ORE ordering (`<` `<=` `>` `>=`, `min` / `max`) for encrypted text, at parity with EQL v2 text — generated from the `text` row in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. `text` is the first scalar to add a new index `Term` (`Bloom`) and the first non-integer, unbounded ordered kind (lexicographic pivots, hand-written `impl ScalarType`). 
Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` / `eql_v3.match_term` extractors, not an operator class on the domain. Why: brings searchable encrypted text to the namespaced, `eql_v2`-free `eql_v3` surface. Match is exposed as bloom-filter containment on the `text_match` domain — deliberately *not* SQL `LIKE` (no wildcard/anchoring; probabilistic ngram containment) — and never backs equality (which always routes through `Hm`). ([#260](https://github.com/cipherstash/encrypt-query-language/pull/260)) -- **Property-based tests for the `eql_v3` encrypted scalar domains.** A harness of three suites asserts SQL operator results agree with a plaintext oracle across a generated input space: a pure-Rust **catalog** suite (no database) over the term/scalar catalog, a **fixture** suite that runs an all-pairs oracle over the committed real-ciphertext fixtures (the curated catalog values per type), and an **e2e** suite (gated behind the `proptest-e2e` cargo feature) that batch-encrypts freshly generated plaintexts end-to-end through ZeroKMS each run. Beyond the operator oracles, the fixture suite drives **function-double** oracles — the generated `eql_v3.eq`/`neq`/`lt`/`lte`/`gt`/`gte` functions across all three overloads (domain–domain, domain–jsonb, jsonb–domain) — plus **term-extractor identity** (`eq_term`==`hm`, `ord_term`==`ob`) and an example-based bloom **match** smoke for the text `_match` domain. Covers the equality (`=`/`<>`) and ordering (`<`/`<=`/`>`/`>=`, `ord_term` sort order) operator and function oracles plus NULL/blocker/CHECK edge cases, across every fixtured scalar (`int2`/`int4`/`int8`/`date`/`timestamptz`/`numeric`/`text`). The e2e suite — which appends fresh duplicate plaintexts each run — is the one that exercises equality across two independent encryptions of one value. 
Why: the prior matrix exercised fixed pivots only; property tests over the whole fixture set catch operator/oracle disagreements across the value space, and the e2e suite adds defence in depth by re-encrypting every run rather than pinning a frozen ciphertext snapshot. ([#293](https://github.com/cipherstash/encrypt-query-language/pull/293)) +- **Property-based tests for the `eql_v3` encrypted scalar domains.** A harness of three suites asserts SQL operator results agree with a plaintext oracle across a generated input space: a pure-Rust **catalog** suite (no database) over the term/scalar catalog, a **fixture** suite that runs an all-pairs oracle over the committed real-ciphertext fixtures (the curated catalog values per type), and an **e2e** suite (gated behind the `proptest-e2e` cargo feature) that batch-encrypts freshly generated plaintexts end-to-end through ZeroKMS each run. Beyond the operator oracles, the fixture suite drives **function-double** oracles — the generated `eql_v3.eq`/`neq`/`lt`/`lte`/`gt`/`gte` functions across all three overloads (domain–domain, domain–jsonb, jsonb–domain) — plus **term-extractor identity** (`eq_term`==`hm`, `ord_term`==`ob`) and an example-based bloom **match** smoke for the text `_match` domain. Covers the equality (`=`/`<>`) and ordering (`<`/`<=`/`>`/`>=`, `ord_term` sort order) operator and function oracles plus NULL/blocker/CHECK edge cases, across every fixtured scalar (`int2`/`int4`/`int8`/`date`/`timestamptz`/`numeric`/`text`). Equality across two independent encryptions of one value is exercised credential-free by the fixture suite via committed per-type *doubles* fixtures (each plaintext encrypted twice — `property::cross_ciphertext`), through both the `hm` (`_eq`) and ORE (`_ord`/`_ord_ore`) equality paths, and additionally by the e2e suite via fresh duplicate plaintexts each run. 
Why: the prior matrix exercised fixed pivots only; property tests over the whole fixture set catch operator/oracle disagreements across the value space, and the e2e suite adds defence in depth by re-encrypting every run rather than pinning a frozen ciphertext snapshot. ([#293](https://github.com/cipherstash/encrypt-query-language/pull/293)) - **Self-contained `eql_v3` schema + standalone `release/cipherstash-encrypt-v3.sql` installer.** The `eql_v3` encrypted-domain surface no longer depends on `eql_v2` at runtime: it now owns its own copies of the searchable-encrypted-metadata (SEM) index-term types — `eql_v3.hmac_256` and `eql_v3.ore_block_u64_8_256` (with its btree operator class) — so the `eql_v3.eq_term` / `eql_v3.ord_term` extractors return `eql_v3` types and no `eql_v2.` appears anywhere in the v3 SQL. The whole v3 surface relocated under a single `src/v3/` tree (`src/v3/sem/` for the hand-written SEM types, `src/v3/scalars/` for the generated domain families). A new build variant ships the `eql_v3` schema on its own as `release/cipherstash-encrypt-v3.sql`, installable into a database with no `eql_v2` present; a CI gate greps that artifact and its dependency closure to keep it `eql_v2`-free. Why: a clean foundation for the per-scalar encrypted-domain model to stand alone, ahead of it replacing the `eql_v2_encrypted` composite column type. This is additive — a new schema and a new artifact — and leaves `eql_v2` byte-for-byte unchanged. ([#255](https://github.com/cipherstash/encrypt-query-language/pull/255)) - **`eql_v3.min` / `eql_v3.max` aggregates over `eql_v3.ste_vec_entry`.** SteVec document entries extracted at a selector (`doc -> 'sel'`) can now be aggregated like ordered scalars: `eql_v3.min(doc -> 'sel')` / `eql_v3.max(...)` return the entry with the smallest / largest ordered leaf. 
Ordering routes through the entry's `oc` (CLLW ORE) term via `eql_v3.ore_cllw` — the same comparator the entry `<` / `<=` / `>` / `>=` operators use, not the scalar Block-ORE `ord_term`. Only `oc`-carrying entries are orderable: an entry without an `oc` term (`eql_v3.ore_cllw` returns NULL) is non-orderable and is ignored by the aggregate — the same way the `eql_v3.ore_cllw` btree NULL-filters such rows — so a mix of `oc`-carrying and `oc`-less entries yields the extremum of the orderable subset rather than a corrupted result. Declared `PARALLEL = SAFE` with a combine function (the state function itself), so partial / parallel aggregation is available on large `GROUP BY` workloads. Why: brings encrypted-JSONB entry ordering to parity with the scalar encrypted-domain families' `MIN` / `MAX`, and lets the shared scalar behaviour matrix cover entry aggregation. Additive — the document and entry comparison surface is otherwise unchanged. ([#267](https://github.com/cipherstash/encrypt-query-language/pull/267)) - **`eql_v3.bool` encrypted-domain type family (storage-only / encryption-only).** A single jsonb-backed domain for encrypted `bool` columns — `eql_v3.bool` — generated from the `bool` row in `eql-scalars::CATALOG`. Unlike every other scalar family, `bool` is **encryption-only**: it carries no SEM index term and exposes **no** `_eq` / `_ord` domains, so the value is encrypted at rest and decrypted by the proxy but is **not searchable server-side**. This is deliberate — a two-value column has so little cardinality that any searchable index (even HMAC equality) would trivially leak the plaintext distribution. Every comparison / containment / path operator reachable through domain fallback (`=`, `<>`, `<`, `<=`, `>`, `>=`, `@>`, `<@`, `->`, `->>`, …) is blocked (raises rather than silently routing to plaintext-`jsonb` semantics); the domain `CHECK` still requires the EQL envelope (`v`, `i`), the ciphertext (`c`), and pins the payload version (`VALUE->>'v' = '2'`). 
The encrypted payload is `{v,i,c}` only — no `hm` / `ob` / `bf` term. Why: lets callers encrypt a low-cardinality boolean column at rest without offering a server-side search surface that would leak it; the first **storage-only** member of the generated scalar encrypted-domain family. ([#295](https://github.com/cipherstash/encrypt-query-language/pull/295)) diff --git a/docs/analysis/2026-06-11-v3-scalar-vs-jsonb-test-coverage.md b/docs/analysis/2026-06-11-v3-scalar-vs-jsonb-test-coverage.md index d4cac1ee..873455d0 100644 --- a/docs/analysis/2026-06-11-v3-scalar-vs-jsonb-test-coverage.md +++ b/docs/analysis/2026-06-11-v3-scalar-vs-jsonb-test-coverage.md @@ -130,6 +130,15 @@ Source of truth: `crates/eql-scalars/src` (`CATALOG`). Adding a type is one `Sca Domain → role: empty ⇒ Storage, first term `Hm` ⇒ Eq, `Ore` ⇒ Ord, `Bloom` ⇒ Match. +**Cross-ciphertext equality** ("two independent encryptions of one value compare +equal") is now covered credential-free for every comparison scalar by the fixture +suite, via committed per-type *doubles* tables (`fixtures.eql_v2__doubles` — each +plaintext encrypted twice) read by `property::cross_ciphertext`. It exercises both +equality mechanisms: the `hm` path (`_eq`) and the ORE `ob` path (`_ord` / +`_ord_ore`, where `=` routes through `compare_ore_block_256_terms(...) = 0`). The +matrix's curated fixtures have unique plaintexts, so the doubles tables are what +make the equality-across-distinct-ciphertext branch fire without fresh encryption. + **Generated surface per domain:** domain definition + CHECK, extractors (inlinable `LANGUAGE sql`), supported-op wrappers (inlinable), **blockers** (`LANGUAGE plpgsql`, NOT STRICT — opaque to planner so the `RAISE` always survives), 44 `CREATE OPERATOR`s, and `min`/`max` aggregates for ord-capable domains. Blocker count per domain: Storage 44, Eq 38, Ord 26, Match 38. 
**Behavioural blocker coverage caveat:** the scalar matrix does not execute every generated operator signature per domain. It covers the important caller-visible blocker classes: unsupported comparison/containment operators, typed-column `col op col` blockers, scalar path blockers (`->`, `->>`), and native-absent LIKE/ILIKE resolution. It does not sweep every generated JSON-style signature such as `?`, `?|`, `?&`, `@?`, `@@`, `#>`, `#>>`, `-`, `#-`, and `||` for every scalar domain. diff --git a/tasks/test/stub-fixtures.sh b/tasks/test/stub-fixtures.sh index 18d9c219..510cbd93 100644 --- a/tasks/test/stub-fixtures.sh +++ b/tasks/test/stub-fixtures.sh @@ -16,9 +16,13 @@ # The set is derived from the two sources of truth, not from parsing rustc # errors (an earlier preamble looped over compile-error text — brittle, coupled # to rustc's wording, capped at 12 retries): -# 1. Catalog scalar tokens (`eql-codegen list-types`) -> `eql_v2_.sql`, -# covering the `tests/sqlx/fixtures/eql_v2*` .gitignore glob. A new scalar -# is stubbed automatically. +# 1. Catalog scalar tokens (`eql-codegen list-types`) -> `eql_v2_.sql` +# AND `eql_v2__doubles.sql` (the per-type doubles fixture the +# cross-ciphertext oracle `include_str!`s), both covered by the +# `tests/sqlx/fixtures/eql_v2*` .gitignore glob. A new scalar is stubbed +# automatically. The doubles variant is stubbed for every token, not only +# the comparison-capable ones that have a real doubles fixture — a harmless +# extra under this helper's stub-the-complete-set policy. # 2. The literal `tests/sqlx/fixtures/*.sql` entries in `.gitignore` (the # non-catalog generated fixtures: `v3_ste_vec`, `v3_doc_int4`, # `v3_numeric_collision`). 
A newly-generated fixture is stubbed @@ -42,13 +46,15 @@ __eql_stub_dir="${__eql_stub_root}/tests/sqlx/fixtures" __eql_stub_created=$(mktemp) trap 'while IFS= read -r f; do [ -n "$f" ] && rm -f "$f"; done < "$__eql_stub_created"; rm -f "$__eql_stub_created"' EXIT -# (1) Catalog scalar tokens -> eql_v2_.sql. A failure here aborts under -# the caller's `set -e` with cargo's own error — no silent fallback. +# (1) Catalog scalar tokens -> eql_v2_.sql + eql_v2__doubles.sql. +# A failure here aborts under the caller's `set -e` with cargo's own error — no +# silent fallback. __eql_stub_paths="" __eql_stub_tokens=$(cd "$__eql_stub_root" && cargo run -q -p eql-codegen -- list-types) while IFS= read -r __eql_stub_t; do [ -n "$__eql_stub_t" ] || continue __eql_stub_paths="${__eql_stub_paths}${__eql_stub_dir}/eql_v2_${__eql_stub_t}.sql +${__eql_stub_dir}/eql_v2_${__eql_stub_t}_doubles.sql " done <` fixture is the curated `fixture_values()` set exactly (the +//! `scalars::*` matrix asserts that), so it has no room for duplicate +//! plaintexts. These tiny sibling tables (`fixtures.eql_v2__doubles`) exist +//! only so the credential-free fixture suite can prove "two independent +//! encryptions of one value compare equal" without any fresh test-time +//! encryption. Read ONLY by `property::cross_ciphertext`, never by the matrix. +//! +//! Plaintexts are the FIRST THREE of each type's curated `fixture_values()` +//! (guaranteed catalog-valid: text already excludes the empty string per #262, +//! temporals/numerics are already in-range), each duplicated once → 6 rows, 3 +//! equal-plaintext pairs. Each value is encrypted independently by the driver, +//! so a repeated plaintext lands as a distinct ciphertext row. +//! +//! Gitignored output: tests/sqlx/fixtures/eql_v2__doubles.sql +//! (regenerated by `mise run fixture:generate:all`). 
+ +use anyhow::Result; + +use crate::fixtures::driver::FixtureValue; +use crate::scalar_domains::ScalarType; + +/// The comparison-capable scalar tokens that get a doubles fixture. `bool` is +/// storage-only (no equality domain) and is excluded. +pub const DOUBLES_TOKENS: &[&str] = &[ + "int2", + "int4", + "int8", + "date", + "timestamptz", + "numeric", + "text", +]; + +/// How many distinct plaintexts to double. Small on purpose — the test only +/// needs a handful of equal-plaintext pairs. +const DISTINCT: usize = 3; + +/// Repeat each value once, preserving order: `[a, b, c] -> [a, a, b, b, c, c]`. +fn doubled(values: &[T]) -> Vec { + values.iter().flat_map(|v| [v.clone(), v.clone()]).collect() +} + +/// Generate `fixtures.eql_v2__doubles` — the first `DISTINCT` catalog values, +/// each encrypted twice. Generic over the type: the fixture name, the plaintext +/// source, and the bloom-index decision are all derived from `T` (and the +/// catalog), so there are no per-token strings to keep in sync. Indexes mirror +/// the type's catalog fixture so the payload carries the same terms (`hm` + `ob`, +/// plus `bf` for `text`) and the doubles cast cleanly to every comparison domain. +async fn generate_doubles_for() -> Result<()> +where + T: ScalarType + FixtureValue, +{ + let name = format!("eql_v2_{}_doubles", T::PG_TYPE); + let head: Vec = ::fixture_values() + .iter() + .take(DISTINCT) + .cloned() + .collect(); + let sample = doubled(&head); + let mut spec = super::spec::FixtureSpec::new(&name) + .with_index(super::index_kind::IndexKind::Unique) + .with_index(super::index_kind::IndexKind::Ore); + // text carries the Match (bloom) index too — derived from the catalog, not + // hardcoded — so its doubles cast to `text_match` / `text_search` as well. 
+ if crate::scalar_domains::token_has_bloom_term(T::PG_TYPE) { + spec = spec.with_index(super::index_kind::IndexKind::Match); + } + spec.with_column_type("jsonb") + .with_values(&sample) + .run() + .await +} + +/// Run the doubles generator for one catalog token. Loud catch-all so an +/// unwired token fails generation rather than silently skipping. +pub async fn generate(token: &str) -> Result<()> { + match token { + "int2" => generate_doubles_for::().await, + "int4" => generate_doubles_for::().await, + "int8" => generate_doubles_for::().await, + "date" => generate_doubles_for::().await, + "timestamptz" => generate_doubles_for::>().await, + "numeric" => generate_doubles_for::().await, + "text" => generate_doubles_for::().await, + other => anyhow::bail!( + "no doubles generator wired for token '{other}'; add it to \ + fixtures::eql_doubles (DOUBLES_TOKENS + the generate dispatch)" + ), + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn doubled_repeats_each_value_once_in_order() { + let src = [10i32, 20, 30]; + let out = doubled(&src); + // 3 distinct plaintexts, each appearing exactly twice. + assert_eq!(out, vec![10, 10, 20, 20, 30, 30]); + for v in src { + assert_eq!(out.iter().filter(|&&x| x == v).count(), 2); + } + } +} diff --git a/tests/sqlx/src/fixtures/mod.rs b/tests/sqlx/src/fixtures/mod.rs index 62e86713..f4d5683a 100644 --- a/tests/sqlx/src/fixtures/mod.rs +++ b/tests/sqlx/src/fixtures/mod.rs @@ -46,6 +46,10 @@ pub mod v3_doc_int4; // (committed-fixture) home instead of a creds-gated runtime encryption. pub mod v3_numeric_collision; +// Per-type "doubles" fixtures (each plaintext encrypted twice) for the +// cross-ciphertext-equality test. Non-catalog, like `v3_numeric_collision`. +pub mod eql_doubles; + // The per-type scalar fixture modules (`eql_v2_int4`, `eql_v2_int2`, …) are // generated from the harness list in `scalar_types.rs`. Each expands to // `pub mod eql_v2_ { … scalar_fixture! 
… }`, reading its plaintext values diff --git a/tests/sqlx/tests/encrypted_domain/property/README.md b/tests/sqlx/tests/encrypted_domain/property/README.md index bed3ae04..823af27c 100644 --- a/tests/sqlx/tests/encrypted_domain/property/README.md +++ b/tests/sqlx/tests/encrypted_domain/property/README.md @@ -17,11 +17,13 @@ named for what they operate on, not by an abstract tier letter: | **fixture** | [`fixture_oracle.rs`](./fixture_oracle.rs) | integration | committed fixture rows (real ciphertext) | isolated per-test DB (`#[sqlx::test]`) | | **e2e** | [`e2e_oracle.rs`](./e2e_oracle.rs) | integration | freshly generated plaintexts, encrypted each run | shared test DB **+ ZeroKMS creds** | -The `fixture` suite spans two files: [`fixture_oracle.rs`](./fixture_oracle.rs) -(the operator **and** function-double oracles + term-extractor identity) and -[`match_smoke.rs`](./match_smoke.rs) (example-based bloom containment for the text -`_match` domain). Both are un-gated — they read the already-encrypted fixtures, -no fresh ZeroKMS. +The `fixture` suite spans several files: [`fixture_oracle.rs`](./fixture_oracle.rs) +(the operator **and** function-double oracles + term-extractor identity), +[`cross_ciphertext.rs`](./cross_ciphertext.rs) (proves two independent +encryptions of one value compare equal, over the committed per-type *doubles* +fixtures), and [`match_smoke.rs`](./match_smoke.rs) (example-based bloom +containment for the text `_match` domain). All are un-gated — they read the +already-encrypted fixtures, no fresh ZeroKMS. Plus [`edge_cases.rs`](./edge_cases.rs): example-based unit tests for NULL propagation, blockers raising on unsupported operators (including the @@ -55,6 +57,16 @@ identity** — `eql_v3.eq_term` returns the payload's exact `hm`, `eql_v3.ord_te returns its exact `ob`. [`match_smoke.rs`](./match_smoke.rs) adds the example-based bloom containment (`@>`/`<@`) for the text `_match` domain. 
+Cross-ciphertext equality — "two independent encryptions of one value compare +equal" — needs equal-plaintext / distinct-ciphertext rows, which the curated +matrix fixture (unique plaintexts) has no room for. So each comparison type +carries a tiny sibling table, `fixtures.eql_v2__doubles` (generated by +[`fixtures::eql_doubles`](../../../src/fixtures/eql_doubles.rs)): the first three +catalog values, each encrypted twice. [`cross_ciphertext.rs`](./cross_ciphertext.rs) +reads ONLY those tables and proves the equality holds through both the `hm` +(`_eq`) and the ORE `ob` (`_ord` / `_ord_ore`) paths — credential-free, no fresh +test-time encryption. + [overload]: ../../../src/property.rs ### e2e — oracle over fresh end-to-end encryption @@ -73,9 +85,12 @@ Defence in depth over the fixture suite: the fixtures are encrypted once at `test:sqlx:prep`, so they pin behaviour against a *frozen* ciphertext snapshot; the e2e suite re-encrypts on **every run**, so it catches a live crypto-path regression (a `cipherstash-client` / ZeroKMS change) that leaves the committed -fixtures untouched. The e2e suite is also the one that exercises "same plaintext, -*different* ciphertext" (equality across independently-encrypted values), via the -fresh duplicate plaintexts it appends each run. +fixtures untouched. The e2e suite also exercises "same plaintext, *different* +ciphertext" (equality across independently-encrypted values) via the fresh +duplicate plaintexts it appends each run — but the fixture suite already covers +that case credential-free through the committed per-type *doubles* tables +(`cross_ciphertext.rs`), so the e2e run is defence in depth on it, not the only +home for it. ## The shared oracle engine @@ -129,11 +144,13 @@ encryption to reach inputs the fixtures can't. fine). [`match_smoke.rs`](./match_smoke.rs) is a plain `#[sqlx::test]` (not proptest-driven), loading the fixtures into its own isolated DB. 
- **Equality-true must actually fire.** Random distinct plaintexts almost never - collide, so the e2e suite injects deliberate duplicate plaintexts (plus signed - extremes and zero) each run to exercise the `a == b ⇒ eq` branch across - *distinct* ciphertexts. The fixture suite's curated rows have unique plaintexts, - so it exercises the equality-true branch on self-pairs (same ciphertext); the - cross-ciphertext case is the e2e suite's job. + collide, so both DB suites inject deliberate duplicate plaintexts to exercise + the `a == b ⇒ eq` branch across *distinct* ciphertexts: the fixture suite via + the committed per-type *doubles* tables (`cross_ciphertext.rs`), and the e2e + suite via fresh duplicates (plus signed extremes and zero) each run. The + matrix's own curated rows have unique plaintexts, so they exercise the + equality-true branch only on self-pairs (same ciphertext) — which is why the + doubles tables exist. ## Running @@ -144,7 +161,7 @@ cargo test -p eql-scalars proptest_invariants # fixture + edge-case suites (needs a prepared DB) mise run test:sqlx:prep cd tests/sqlx && cargo test --test encrypted_domain \ - property::fixture_oracle property::match_smoke property::edge_cases + property::fixture_oracle property::cross_ciphertext property::match_smoke property::edge_cases # all suites incl. e2e (needs DB + CS_* creds) mise run test:sqlx # enables --features proptest-e2e diff --git a/tests/sqlx/tests/encrypted_domain/property/cross_ciphertext.rs b/tests/sqlx/tests/encrypted_domain/property/cross_ciphertext.rs new file mode 100644 index 00000000..a6c52dc4 --- /dev/null +++ b/tests/sqlx/tests/encrypted_domain/property/cross_ciphertext.rs @@ -0,0 +1,114 @@ +//! fixture-suite (CIP-3141) cross-ciphertext equality test. +//! +//! Proves "two independent encryptions of one value compare equal" using the +//! committed `fixtures.eql_v2__doubles` tables — each plaintext encrypted +//! twice, so the table carries equal-plaintext / distinct-ciphertext rows. 
No +//! fresh encryption, no creds: it reads the already-encrypted doubles, so it +//! runs in the credential-free `mise run test:sqlx` path. Distinct from the +//! matrix (which reads the curated `fixtures.eql_v2_`) and from the e2e suite +//! (which re-encrypts fresh duplicates each run). +//! +//! Each type asserts, on its doubles rows: +//! 1. a distinct-ciphertext pair exists (an equal-plaintext pair whose +//! `payload_json` differs) — so the equality assertions below are non-trivial; +//! 2. `=` TRUE / `<>` FALSE across every pair through the `_eq` (hm/HMAC) domain +//! (`assert_eq_oracle`); +//! 3. the ordering operators agree with the plaintext oracle on both ordered +//! twins (`assert_ord_oracle`), PLUS `=` TRUE / `<>` FALSE on an equal pair +//! through `_ord` and `_ord_ore` — the ORE (`ob`) equality path, which routes +//! `=` through `compare_ore_block_256_terms(...) = 0` (GUARANTEED equal for +//! two independent encryptions of one value; see the ORE finding in the plan). +//! +//! `#[sqlx::test]` per type (its own migrated scratch DB), like the rest of the +//! fixture suite. + +use super::fixture_oracle::load_doubles_rows; +use anyhow::Result; +use eql_tests::property::{assert_eq_oracle, assert_ord_oracle, Row}; +use eql_tests::scalar_domains::{ScalarDomainSpec, ScalarType, Variant}; +use sqlx::PgPool; + +/// Find two rows with equal plaintext but DIFFERENT ciphertext, or fail. The +/// doubles fixture encrypts each plaintext independently, so an equal-plaintext +/// pair is expected to differ in ciphertext; a failure here means the fixture +/// was not regenerated. 
+fn first_distinct_ciphertext_pair<T: ScalarType>(rows: &[Row<T>]) -> Result<(&Row<T>, &Row<T>)> {
+    for i in 0..rows.len() {
+        for j in (i + 1)..rows.len() {
+            if rows[i].plaintext == rows[j].plaintext
+                && rows[i].payload_json != rows[j].payload_json
+            {
+                return Ok((&rows[i], &rows[j]));
+            }
+        }
+    }
+    anyhow::bail!(
+        "doubles fixture for {} has no equal-plaintext/distinct-ciphertext pair; \
+         regenerate via mise run test:sqlx:prep",
+        T::PG_TYPE
+    )
+}
+
+/// Assert `=` TRUE / `<>` FALSE for one equal-plaintext distinct-ciphertext pair
+/// on `variant`'s domain. Used for the ORE path (`Ord` / `OrdOre`), which routes
+/// `=` through `compare_ore_block_256_terms(...) = 0` — the assertion the
+/// plaintext ordering oracle does not itself make on the ordered twins.
+async fn assert_pair_eq_on<T: ScalarType>(
+    pool: &PgPool,
+    variant: Variant,
+    a: &Row<T>,
+    b: &Row<T>,
+) -> Result<()> {
+    let domain = ScalarDomainSpec::new::<T>(variant).sql_domain;
+    // `'<payload>'::jsonb::<domain>` for each side; escape single quotes the same
+    // way property.rs's `cast` does.
+    let a_cast = format!("'{}'::jsonb::{domain}", a.payload_json.replace('\'', "''"));
+    let b_cast = format!("'{}'::jsonb::{domain}", b.payload_json.replace('\'', "''"));
+    let sql = format!("SELECT ({a_cast}) = ({b_cast}), ({a_cast}) <> ({b_cast})");
+    let (eq, neq): (Option<bool>, Option<bool>) = sqlx::query_as(&sql).fetch_one(pool).await?;
+    anyhow::ensure!(
+        eq == Some(true),
+        "cross-ciphertext `=` on {domain} must be TRUE for equal plaintext, got {eq:?}"
+    );
+    anyhow::ensure!(
+        neq == Some(false),
+        "cross-ciphertext `<>` on {domain} must be FALSE for equal plaintext, got {neq:?}"
+    );
+    Ok(())
+}
+
+/// The full cross-ciphertext check for an ordered scalar `T`.
+async fn assert_cross_ciphertext<T: ScalarType>(pool: &PgPool) -> Result<()> {
+    let rows = load_doubles_rows::<T>(pool).await?;
+
+    // (1) the doubles really are distinct ciphertext.
+    let (a, b) = first_distinct_ciphertext_pair::<T>(&rows)?;
+
+    // (2) hm/HMAC equality path across all pairs.
+    assert_eq_oracle::<T>(pool, &rows).await?;
+
+    // (3) ordering oracle on both ordered twins, plus the explicit ORE-path
+    // equality on the distinct-ciphertext pair.
+    assert_ord_oracle::<T>(pool, Variant::Ord, &rows).await?;
+    assert_ord_oracle::<T>(pool, Variant::OrdOre, &rows).await?;
+    assert_pair_eq_on::<T>(pool, Variant::Ord, a, b).await?;
+    assert_pair_eq_on::<T>(pool, Variant::OrdOre, a, b).await?;
+    Ok(())
+}
+
+macro_rules! cross_ciphertext_test {
+    ($name:ident, $ty:ty) => {
+        #[sqlx::test]
+        async fn $name(pool: PgPool) -> Result<()> {
+            assert_cross_ciphertext::<$ty>(&pool).await
+        }
+    };
+}
+
+cross_ciphertext_test!(cross_ciphertext_int2, i16);
+cross_ciphertext_test!(cross_ciphertext_int4, i32);
+cross_ciphertext_test!(cross_ciphertext_int8, i64);
+cross_ciphertext_test!(cross_ciphertext_date, chrono::NaiveDate);
+cross_ciphertext_test!(cross_ciphertext_timestamptz, chrono::DateTime<chrono::Utc>);
+cross_ciphertext_test!(cross_ciphertext_numeric, rust_decimal::Decimal);
+cross_ciphertext_test!(cross_ciphertext_text, String);
diff --git a/tests/sqlx/tests/encrypted_domain/property/fixture_oracle.rs b/tests/sqlx/tests/encrypted_domain/property/fixture_oracle.rs
index 9c3b9c03..60edc603 100644
--- a/tests/sqlx/tests/encrypted_domain/property/fixture_oracle.rs
+++ b/tests/sqlx/tests/encrypted_domain/property/fixture_oracle.rs
@@ -85,6 +85,52 @@ pub(crate) fn embedded_fixture_sql() -> &'static str {
     }
 }
 
+/// The `_doubles` fixture SQL for `T`, `include_str!`-embedded at compile time
+/// (one arm per comparison-capable token). Same embed rationale as
+/// `embedded_fixture_sql` — the prebuilt nextest archive carries the gitignored
+/// fixtures into CI shards. The table is `fixtures.eql_v2_<T>_doubles`; the file
+/// is `fixtures/eql_v2_<T>_doubles.sql`. `bool` is storage-only and has no
+/// doubles fixture; the cross-ciphertext test never instantiates it, so its
+/// absence (caught by the loud catch-all) is correct.
+pub(crate) fn embedded_doubles_sql<T: ScalarType>() -> &'static str {
+    match T::PG_TYPE {
+        "int2" => include_str!(concat!(
+            env!("CARGO_MANIFEST_DIR"),
+            "/fixtures/eql_v2_int2_doubles.sql"
+        )),
+        "int4" => include_str!(concat!(
+            env!("CARGO_MANIFEST_DIR"),
+            "/fixtures/eql_v2_int4_doubles.sql"
+        )),
+        "int8" => include_str!(concat!(
+            env!("CARGO_MANIFEST_DIR"),
+            "/fixtures/eql_v2_int8_doubles.sql"
+        )),
+        "date" => include_str!(concat!(
+            env!("CARGO_MANIFEST_DIR"),
+            "/fixtures/eql_v2_date_doubles.sql"
+        )),
+        "timestamptz" => {
+            include_str!(concat!(
+                env!("CARGO_MANIFEST_DIR"),
+                "/fixtures/eql_v2_timestamptz_doubles.sql"
+            ))
+        }
+        "numeric" => include_str!(concat!(
+            env!("CARGO_MANIFEST_DIR"),
+            "/fixtures/eql_v2_numeric_doubles.sql"
+        )),
+        "text" => include_str!(concat!(
+            env!("CARGO_MANIFEST_DIR"),
+            "/fixtures/eql_v2_text_doubles.sql"
+        )),
+        other => panic!(
+            "no embedded doubles fixture for catalog token '{other}'; \
+             add an include_str! arm in embedded_doubles_sql"
+        ),
+    }
+}
+
 /// Load `T`'s committed fixtures into `pool`'s isolated scratch DB via the
 /// `include_str!`-embedded SQL. The fixture SQL is self-contained (`CREATE SCHEMA
 /// IF NOT EXISTS fixtures` / `CREATE` / `INSERT`); since each `#[sqlx::test]` DB
@@ -123,6 +169,30 @@ pub(crate) async fn load_rows(pool: &PgPool) -> Result_doubles` (NOT the matrix's `fixtures.eql_v2_`), so it
+/// carries the equal-plaintext / distinct-ciphertext rows the cross-ciphertext
+/// test needs.
+pub(crate) async fn load_doubles_rows<T: ScalarType>(pool: &PgPool) -> Result<Arc<Vec<Row<T>>>> {
+    sqlx::raw_sql(embedded_doubles_sql::<T>())
+        .execute(pool)
+        .await
+        .with_context(|| format!("loading doubles fixtures for {}", T::PG_TYPE))?;
+    let table = format!("fixtures.eql_v2_{}_doubles", T::PG_TYPE);
+    let sql = format!("SELECT plaintext, payload::text FROM {table} ORDER BY id");
+    let raw: Vec<(T, String)> = sqlx::query_as(&sql).fetch_all(pool).await?;
+    let rows: Vec<Row<T>> = raw
+        .into_iter()
+        .map(|(plaintext, payload_json)| Row {
+            plaintext,
+            payload_json,
+        })
+        .collect();
+    anyhow::ensure!(!rows.is_empty(), "doubles fixture {table} is empty");
+    Ok(Arc::new(rows))
+}
+
 /// Build a sample by selecting indices (with repeats) into the loaded fixtures.
 /// `idxs` are already bounded to `0..all.len()` by the proptest strategy.
 fn pick(all: &[Row], idxs: &[usize]) -> Vec> {
diff --git a/tests/sqlx/tests/encrypted_domain/property/mod.rs b/tests/sqlx/tests/encrypted_domain/property/mod.rs
index e52b891c..cb6ff651 100644
--- a/tests/sqlx/tests/encrypted_domain/property/mod.rs
+++ b/tests/sqlx/tests/encrypted_domain/property/mod.rs
@@ -27,6 +27,11 @@ mod edge_cases;
 mod fixture_oracle;
 // fixture suite: example-based bloom match smoke over the text `_match` fixtures.
 mod match_smoke;
+// fixture suite: cross-ciphertext equality over the per-type doubles fixtures
+// (each plaintext encrypted twice) — proves two independent encryptions of one
+// value compare equal through both the hm (`_eq`) and ORE (`_ord`/`_ord_ore`)
+// paths.
+mod cross_ciphertext;
 // e2e suite: oracle over freshly generated + batch-encrypted values.
 #[cfg(feature = "proptest-e2e")]
 mod e2e_oracle;
diff --git a/tests/sqlx/tests/generate_all_fixtures.rs b/tests/sqlx/tests/generate_all_fixtures.rs
index dcb50c27..bc419144 100644
--- a/tests/sqlx/tests/generate_all_fixtures.rs
+++ b/tests/sqlx/tests/generate_all_fixtures.rs
@@ -58,5 +58,17 @@ async fn generate_all() -> anyhow::Result<()> {
     eprintln!("Generating fixture v3_numeric_collision (1 == 1.0 ORE collision)...");
     eql_tests::fixtures::v3_numeric_collision::generate().await?;
     eprintln!("Regenerated v3_numeric_collision.");
+
+    // Per-type "doubles" fixtures (each plaintext encrypted twice) for the
+    // credential-free cross-ciphertext-equality test. Non-catalog (the catalog
+    // fixture is the curated set exactly), generated through the same pipeline.
+    for token in eql_tests::fixtures::eql_doubles::DOUBLES_TOKENS {
+        eprintln!("Generating fixture eql_v2_{token}_doubles...");
+        eql_tests::fixtures::eql_doubles::generate(token).await?;
+    }
+    eprintln!(
+        "Regenerated {} doubles fixture(s).",
+        eql_tests::fixtures::eql_doubles::DOUBLES_TOKENS.len()
+    );
     Ok(())
 }
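
Reviewer note: the precondition that the doubles fixtures must satisfy (at least one equal-plaintext pair whose payloads differ) can be tried out without a database. The following is a minimal sketch, not the code in this patch: `Row` here is a hypothetical stand-in for `eql_tests::property::Row`, the `row` helper and the payload strings are invented for illustration, and the function returns `Option` rather than `anyhow::Result` so the example needs no external crates.

```rust
// Hypothetical stand-in for the suite's row type: a plaintext value and the
// encrypted payload rendered as JSON text. The real type is not shown here.
#[derive(Debug, Clone, PartialEq)]
struct Row<T> {
    plaintext: T,
    payload_json: String,
}

/// Return the first pair of rows with equal plaintext but different payloads.
/// `None` means every equal-plaintext pair shares one ciphertext (or no two
/// rows share a plaintext), so an equality test over the rows would be trivial.
fn first_distinct_ciphertext_pair<T: PartialEq>(rows: &[Row<T>]) -> Option<(&Row<T>, &Row<T>)> {
    for (i, a) in rows.iter().enumerate() {
        // Only look at rows after `a`, so each unordered pair is visited once.
        for b in &rows[i + 1..] {
            if a.plaintext == b.plaintext && a.payload_json != b.payload_json {
                return Some((a, b));
            }
        }
    }
    None
}

/// Illustration-only constructor (not part of the patch).
fn row(plaintext: i32, payload_json: &str) -> Row<i32> {
    Row {
        plaintext,
        payload_json: payload_json.to_string(),
    }
}

fn main() {
    // A "doubles"-shaped table: plaintext 7 appears twice with different payloads.
    let doubles = vec![row(7, "ct-a"), row(9, "ct-b"), row(7, "ct-c")];
    let (a, b) = first_distinct_ciphertext_pair(&doubles).expect("doubles must contain a pair");
    println!("pair: {} / {}", a.payload_json, b.payload_json);

    // A curated-style table with unique plaintexts has no such pair.
    let curated = vec![row(1, "ct-x"), row(2, "ct-y")];
    assert!(first_distinct_ciphertext_pair(&curated).is_none());
}
```

The search is quadratic in the number of rows, which is acceptable for small committed fixtures; the sketch makes no claim about the size of the real doubles tables.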