Commit 9b75762

docs: describe doubles fixtures + cross-ciphertext test (hm + ORE)
1 parent 940ee8b commit 9b75762

3 files changed

Lines changed: 41 additions & 15 deletions

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -33,7 +33,7 @@ Each entry that ships in a published release links to the PR that introduced it.
3333
- **`eql_v3.text` encrypted-domain family (`text`, `text_eq`, `text_match`, `text_ord`, `text_ord_ore`, `text_search`).** Adds equality (`=` / `<>` via HMAC), match (`@>` / `<@` via a new self-contained `eql_v3.bloom_filter` SEM index term), and ORE ordering (`<` `<=` `>` `>=`, `min` / `max`) for encrypted text, at parity with EQL v2 text — generated from the `text` row in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. `text` is the first scalar to add a new index `Term` (`Bloom`) and the first non-integer, unbounded ordered kind (lexicographic pivots, hand-written `impl ScalarType`). The combined **`text_search`** domain carries all three capabilities in one type — `=` / `<>` via HMAC, `<` `<=` `>` `>=` / `min` / `max` via ORE, and `@>` / `<@` via bloom filter. Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` / `eql_v3.match_term` extractors, not an operator class on the domain. Why: brings searchable encrypted text to the namespaced, `eql_v2`-free `eql_v3` surface. Match is exposed as bloom-filter containment on the `text_match` / `text_search` domains — deliberately *not* SQL `LIKE` (no wildcard/anchoring; probabilistic ngram containment) — and never backs equality. **Equality on the ordered text domains (`text_ord`, `text_ord_ore`) and on `text_search` always routes `=` / `<>` through `hm` (exact HMAC), never the ORE term — ORE is not exact-equality for text** (integer ordered domains keep exact ORE equality, which is lossless for them). ([#260](https://github.com/cipherstash/encrypt-query-language/pull/260))
3434
- **Self-contained `eql_v3` schema + standalone `release/cipherstash-encrypt-v3.sql` installer.** The `eql_v3` encrypted-domain surface no longer depends on `eql_v2` at runtime: it now owns its own copies of the searchable-encrypted-metadata (SEM) index-term types — `eql_v3.hmac_256` and `eql_v3.ore_block_256` (with its btree operator class) — so the `eql_v3.eq_term` / `eql_v3.ord_term` extractors return `eql_v3` types and no `eql_v2.<symbol>` appears anywhere in the v3 SQL. The whole v3 surface relocated under a single `src/v3/` tree (`src/v3/sem/` for the hand-written SEM types, `src/v3/scalars/` for the generated domain families). A new build variant ships the `eql_v3` schema on its own as `release/cipherstash-encrypt-v3.sql`, installable into a database with no `eql_v2` present; a CI gate greps that artifact and its dependency closure to keep it `eql_v2`-free. Why: a clean foundation for the per-scalar encrypted-domain model to stand alone, ahead of it replacing the `eql_v2_encrypted` composite column type. This is additive — a new schema and a new artifact — and leaves `eql_v2` byte-for-byte unchanged. ([#255](https://github.com/cipherstash/encrypt-query-language/pull/255))
3535
- **`eql_v3.text` encrypted-domain family (`text`, `text_eq`, `text_match`, `text_ord`, `text_ord_ore`).** Adds equality (`=` / `<>` via HMAC), match (`@>` / `<@` via a new self-contained `eql_v3.bloom_filter` SEM index term), and ORE ordering (`<` `<=` `>` `>=`, `min` / `max`) for encrypted text, at parity with EQL v2 text — generated from the `text` row in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. `text` is the first scalar to add a new index `Term` (`Bloom`) and the first non-integer, unbounded ordered kind (lexicographic pivots, hand-written `impl ScalarType`). Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` / `eql_v3.match_term` extractors, not an operator class on the domain. Why: brings searchable encrypted text to the namespaced, `eql_v2`-free `eql_v3` surface. Match is exposed as bloom-filter containment on the `text_match` domain — deliberately *not* SQL `LIKE` (no wildcard/anchoring; probabilistic ngram containment) — and never backs equality (which always routes through `Hm`). ([#260](https://github.com/cipherstash/encrypt-query-language/pull/260))
36-
- **Property-based tests for the `eql_v3` encrypted scalar domains.** A harness of three suites asserts SQL operator results agree with a plaintext oracle across a generated input space: a pure-Rust **catalog** suite (no database) over the term/scalar catalog, a **fixture** suite that runs an all-pairs oracle over the committed real-ciphertext fixtures (the curated catalog values per type), and an **e2e** suite (gated behind the `proptest-e2e` cargo feature) that batch-encrypts freshly generated plaintexts end-to-end through ZeroKMS each run. Beyond the operator oracles, the fixture suite drives **function-double** oracles — the generated `eql_v3.eq`/`neq`/`lt`/`lte`/`gt`/`gte` functions across all three overloads (domain–domain, domain–jsonb, jsonb–domain) — plus **term-extractor identity** (`eq_term`==`hm`, `ord_term`==`ob`) and an example-based bloom **match** smoke for the text `_match` domain. Covers the equality (`=`/`<>`) and ordering (`<`/`<=`/`>`/`>=`, `ord_term` sort order) operator and function oracles plus NULL/blocker/CHECK edge cases, across every fixtured scalar (`int2`/`int4`/`int8`/`date`/`timestamptz`/`numeric`/`text`). The e2e suite — which appends fresh duplicate plaintexts each run — is the one that exercises equality across two independent encryptions of one value. Why: the prior matrix exercised fixed pivots only; property tests over the whole fixture set catch operator/oracle disagreements across the value space, and the e2e suite adds defence in depth by re-encrypting every run rather than pinning a frozen ciphertext snapshot. ([#293](https://github.com/cipherstash/encrypt-query-language/pull/293))
36+
- **Property-based tests for the `eql_v3` encrypted scalar domains.** A harness of three suites asserts SQL operator results agree with a plaintext oracle across a generated input space: a pure-Rust **catalog** suite (no database) over the term/scalar catalog, a **fixture** suite that runs an all-pairs oracle over the committed real-ciphertext fixtures (the curated catalog values per type), and an **e2e** suite (gated behind the `proptest-e2e` cargo feature) that batch-encrypts freshly generated plaintexts end-to-end through ZeroKMS each run. Beyond the operator oracles, the fixture suite drives **function-double** oracles — the generated `eql_v3.eq`/`neq`/`lt`/`lte`/`gt`/`gte` functions across all three overloads (domain–domain, domain–jsonb, jsonb–domain) — plus **term-extractor identity** (`eq_term`==`hm`, `ord_term`==`ob`) and an example-based bloom **match** smoke for the text `_match` domain. Covers the equality (`=`/`<>`) and ordering (`<`/`<=`/`>`/`>=`, `ord_term` sort order) operator and function oracles plus NULL/blocker/CHECK edge cases, across every fixtured scalar (`int2`/`int4`/`int8`/`date`/`timestamptz`/`numeric`/`text`). Equality across two independent encryptions of one value is exercised credential-free by the fixture suite via committed per-type *doubles* fixtures (each plaintext encrypted twice — `property::cross_ciphertext`), through both the `hm` (`_eq`) and ORE (`_ord`/`_ord_ore`) equality paths, and additionally by the e2e suite via fresh duplicate plaintexts each run. Why: the prior matrix exercised fixed pivots only; property tests over the whole fixture set catch operator/oracle disagreements across the value space, and the e2e suite adds defence in depth by re-encrypting every run rather than pinning a frozen ciphertext snapshot. ([#293](https://github.com/cipherstash/encrypt-query-language/pull/293))
3737
- **Self-contained `eql_v3` schema + standalone `release/cipherstash-encrypt-v3.sql` installer.** The `eql_v3` encrypted-domain surface no longer depends on `eql_v2` at runtime: it now owns its own copies of the searchable-encrypted-metadata (SEM) index-term types — `eql_v3.hmac_256` and `eql_v3.ore_block_u64_8_256` (with its btree operator class) — so the `eql_v3.eq_term` / `eql_v3.ord_term` extractors return `eql_v3` types and no `eql_v2.<symbol>` appears anywhere in the v3 SQL. The whole v3 surface relocated under a single `src/v3/` tree (`src/v3/sem/` for the hand-written SEM types, `src/v3/scalars/` for the generated domain families). A new build variant ships the `eql_v3` schema on its own as `release/cipherstash-encrypt-v3.sql`, installable into a database with no `eql_v2` present; a CI gate greps that artifact and its dependency closure to keep it `eql_v2`-free. Why: a clean foundation for the per-scalar encrypted-domain model to stand alone, ahead of it replacing the `eql_v2_encrypted` composite column type. This is additive — a new schema and a new artifact — and leaves `eql_v2` byte-for-byte unchanged. ([#255](https://github.com/cipherstash/encrypt-query-language/pull/255))
3838
- **`eql_v3.min` / `eql_v3.max` aggregates over `eql_v3.ste_vec_entry`.** SteVec document entries extracted at a selector (`doc -> 'sel'`) can now be aggregated like ordered scalars: `eql_v3.min(doc -> 'sel')` / `eql_v3.max(...)` return the entry with the smallest / largest ordered leaf. Ordering routes through the entry's `oc` (CLLW ORE) term via `eql_v3.ore_cllw` — the same comparator the entry `<` / `<=` / `>` / `>=` operators use, not the scalar Block-ORE `ord_term`. Only `oc`-carrying entries are orderable: an entry without an `oc` term (`eql_v3.ore_cllw` returns NULL) is non-orderable and is ignored by the aggregate — the same way the `eql_v3.ore_cllw` btree NULL-filters such rows — so a mix of `oc`-carrying and `oc`-less entries yields the extremum of the orderable subset rather than a corrupted result. Declared `PARALLEL = SAFE` with a combine function (the state function itself), so partial / parallel aggregation is available on large `GROUP BY` workloads. Why: brings encrypted-JSONB entry ordering to parity with the scalar encrypted-domain families' `MIN` / `MAX`, and lets the shared scalar behaviour matrix cover entry aggregation. Additive — the document and entry comparison surface is otherwise unchanged. ([#267](https://github.com/cipherstash/encrypt-query-language/pull/267))
3939
- **`eql_v3.bool` encrypted-domain type family (storage-only / encryption-only).** A single jsonb-backed domain for encrypted `bool` columns — `eql_v3.bool` — generated from the `bool` row in `eql-scalars::CATALOG`. Unlike every other scalar family, `bool` is **encryption-only**: it carries no SEM index term and exposes **no** `_eq` / `_ord` domains, so the value is encrypted at rest and decrypted by the proxy but is **not searchable server-side**. This is deliberate — a two-value column has so little cardinality that any searchable index (even HMAC equality) would trivially leak the plaintext distribution. Every comparison / containment / path operator reachable through domain fallback (`=`, `<>`, `<`, `<=`, `>`, `>=`, `@>`, `<@`, `->`, `->>`, …) is blocked (raises rather than silently routing to plaintext-`jsonb` semantics); the domain `CHECK` still requires the EQL envelope (`v`, `i`), the ciphertext (`c`), and pins the payload version (`VALUE->>'v' = '2'`). The encrypted payload is `{v,i,c}` only — no `hm` / `ob` / `bf` term. Why: lets callers encrypt a low-cardinality boolean column at rest without offering a server-side search surface that would leak it; the first **storage-only** member of the generated scalar encrypted-domain family. ([#295](https://github.com/cipherstash/encrypt-query-language/pull/295))

docs/analysis/2026-06-11-v3-scalar-vs-jsonb-test-coverage.md

Lines changed: 9 additions & 0 deletions
@@ -130,6 +130,15 @@ Source of truth: `crates/eql-scalars/src` (`CATALOG`). Adding a type is one `Sca
130130

131131
Domain → role: empty ⇒ Storage, first term `Hm` ⇒ Eq, `Ore` ⇒ Ord, `Bloom` ⇒ Match.
132132

133+
**Cross-ciphertext equality** ("two independent encryptions of one value compare
134+
equal") is now covered credential-free for every comparison scalar by the fixture
135+
suite, via committed per-type *doubles* tables (`fixtures.eql_v2_<T>_doubles` — each
136+
plaintext encrypted twice) read by `property::cross_ciphertext`. It exercises both
137+
equality mechanisms: the `hm` path (`_eq`) and the ORE `ob` path (`_ord` /
138+
`_ord_ore`, where `=` routes through `compare_ore_block_256_terms(...) = 0`). The
139+
matrix's curated fixtures have unique plaintexts, so the doubles tables are what
140+
make the equality-across-distinct-ciphertext branch fire without fresh encryption.
141+
133142
**Generated surface per domain:** domain definition + CHECK, extractors (inlinable `LANGUAGE sql`), supported-op wrappers (inlinable), **blockers** (`LANGUAGE plpgsql`, NOT STRICT — opaque to planner so the `RAISE` always survives), 44 `CREATE OPERATOR`s, and `min`/`max` aggregates for ord-capable domains. Blocker count per domain: Storage 44, Eq 38, Ord 26, Match 38.
134143

135144
**Behavioural blocker coverage caveat:** the scalar matrix does not execute every generated operator signature per domain. It covers the important caller-visible blocker classes: unsupported comparison/containment operators, typed-column `col op col` blockers, scalar path blockers (`->`, `->>`), and native-absent LIKE/ILIKE resolution. It does not sweep every generated JSON-style signature such as `?`, `?|`, `?&`, `@?`, `@@`, `#>`, `#>>`, `-`, `#-`, and `||` for every scalar domain.
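
Reviewer sketch of the two equality mechanisms the added paragraph describes (the `hm` path and the ORE `ob` path, where `=` is "comparator returns 0"). This is a toy model, not the EQL implementation: `ToyPayload`, `toy_encrypt`, `eq_via_hm`, `toy_compare_ob` and `eq_via_ob` are hypothetical names, `DefaultHasher` stands in for the keyed HMAC, and a sign-flipped big-endian encoding stands in for the ORE block term. It only illustrates why two encryptions of one plaintext can differ in `c` yet still compare equal on both paths.

```rust
use std::cmp::Ordering;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for an encrypted payload. Field names mirror the `c` / `hm` / `ob`
/// keys named in the docs; the contents are NOT real EQL terms.
#[derive(Debug, Clone)]
struct ToyPayload {
    c: Vec<u8>,  // "ciphertext": differs per encryption (nonce-prefixed here)
    hm: u64,     // deterministic equality term (assumption: stand-in for an HMAC)
    ob: [u8; 8], // order-preserving term (assumption: stand-in for an ORE block)
}

/// Hypothetical helper: "encrypts" a plaintext under a caller-chosen nonce.
fn toy_encrypt(plaintext: i64, nonce: u8) -> ToyPayload {
    let mut hasher = DefaultHasher::new();
    plaintext.hash(&mut hasher);
    // Flip the sign bit so big-endian byte order matches signed numeric order.
    let ob = ((plaintext as u64) ^ (1u64 << 63)).to_be_bytes();
    let mut c = vec![nonce];
    c.extend_from_slice(&plaintext.to_le_bytes());
    ToyPayload { c, hm: hasher.finish(), ob }
}

/// Equality as the `_eq` domains route it: exact match on the `hm` term.
fn eq_via_hm(a: &ToyPayload, b: &ToyPayload) -> bool {
    a.hm == b.hm
}

/// Toy comparator over the ordering term (byte-wise, which is NOT how ORE compares).
fn toy_compare_ob(a: &ToyPayload, b: &ToyPayload) -> Ordering {
    a.ob.cmp(&b.ob)
}

/// Equality as the `_ord` domains route it: the comparator returning "equal".
fn eq_via_ob(a: &ToyPayload, b: &ToyPayload) -> bool {
    toy_compare_ob(a, b) == Ordering::Equal
}

fn main() {
    for plaintext in [-7i64, 0, 42] {
        let first = toy_encrypt(plaintext, 1);
        let second = toy_encrypt(plaintext, 2);
        assert_ne!(first.c, second.c); // distinct ciphertexts
        assert!(eq_via_hm(&first, &second)); // equal through the hm path
        assert!(eq_via_ob(&first, &second)); // equal through the ordering path
    }
    println!("cross-ciphertext equality holds on both toy paths");
}
```

The real doubles test asserts the same three facts per pair, but against committed ciphertext in a database rather than an in-process model.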

tests/sqlx/tests/encrypted_domain/property/README.md

Lines changed: 31 additions & 14 deletions
@@ -17,11 +17,13 @@ named for what they operate on, not by an abstract tier letter:
1717
| **fixture** | [`fixture_oracle.rs`](./fixture_oracle.rs) | integration | committed fixture rows (real ciphertext) | isolated per-test DB (`#[sqlx::test]`) |
1818
| **e2e** | [`e2e_oracle.rs`](./e2e_oracle.rs) | integration | freshly generated plaintexts, encrypted each run | shared test DB **+ ZeroKMS creds** |
1919

20-
The `fixture` suite spans two files: [`fixture_oracle.rs`](./fixture_oracle.rs)
21-
(the operator **and** function-double oracles + term-extractor identity) and
22-
[`match_smoke.rs`](./match_smoke.rs) (example-based bloom containment for the text
23-
`_match` domain). Both are un-gated — they read the already-encrypted fixtures,
24-
no fresh ZeroKMS.
20+
The `fixture` suite spans several files: [`fixture_oracle.rs`](./fixture_oracle.rs)
21+
(the operator **and** function-double oracles + term-extractor identity),
22+
[`cross_ciphertext.rs`](./cross_ciphertext.rs) (proves two independent
23+
encryptions of one value compare equal, over the committed per-type *doubles*
24+
fixtures), and [`match_smoke.rs`](./match_smoke.rs) (example-based bloom
25+
containment for the text `_match` domain). All are un-gated — they read the
26+
already-encrypted fixtures, no fresh ZeroKMS.
2527

2628
Plus [`edge_cases.rs`](./edge_cases.rs): example-based unit tests for
2729
NULL propagation, blockers raising on unsupported operators (including the
@@ -55,6 +57,16 @@ identity** — `eql_v3.eq_term` returns the payload's exact `hm`, `eql_v3.ord_te
5557
returns its exact `ob`. [`match_smoke.rs`](./match_smoke.rs) adds the
5658
example-based bloom containment (`@>`/`<@`) for the text `_match` domain.
5759

60+
Cross-ciphertext equality — "two independent encryptions of one value compare
61+
equal" — needs equal-plaintext / distinct-ciphertext rows, which the curated
62+
matrix fixture (unique plaintexts) has no room for. So each comparison type
63+
carries a tiny sibling table, `fixtures.eql_v2_<T>_doubles` (generated by
64+
[`fixtures::eql_doubles`](../../../src/fixtures/eql_doubles.rs)): the first three
65+
catalog values, each encrypted twice. [`cross_ciphertext.rs`](./cross_ciphertext.rs)
66+
reads ONLY those tables and proves the equality holds through both the `hm`
67+
(`_eq`) and the ORE `ob` (`_ord` / `_ord_ore`) paths — credential-free, no fresh
68+
test-time encryption.
69+
5870
[overload]: ../../../src/property.rs
5971

6072
### e2e — oracle over fresh end-to-end encryption
@@ -73,9 +85,12 @@ Defence in depth over the fixture suite: the fixtures are encrypted once at
7385
`test:sqlx:prep`, so they pin behaviour against a *frozen* ciphertext snapshot;
7486
the e2e suite re-encrypts on **every run**, so it catches a live crypto-path
7587
regression (a `cipherstash-client` / ZeroKMS change) that leaves the committed
76-
fixtures untouched. The e2e suite is also the one that exercises "same plaintext,
77-
*different* ciphertext" (equality across independently-encrypted values), via the
78-
fresh duplicate plaintexts it appends each run.
88+
fixtures untouched. The e2e suite also exercises "same plaintext, *different*
89+
ciphertext" (equality across independently-encrypted values) via the fresh
90+
duplicate plaintexts it appends each run — but the fixture suite already covers
91+
that case credential-free through the committed per-type *doubles* tables
92+
(`cross_ciphertext.rs`), so the e2e run is defence in depth on it, not the only
93+
home for it.
7994

8095
## The shared oracle engine
8196

@@ -129,11 +144,13 @@ encryption to reach inputs the fixtures can't.
129144
fine). [`match_smoke.rs`](./match_smoke.rs) is a plain `#[sqlx::test]` (not
130145
proptest-driven), loading the fixtures into its own isolated DB.
131146
- **Equality-true must actually fire.** Random distinct plaintexts almost never
132-
collide, so the e2e suite injects deliberate duplicate plaintexts (plus signed
133-
extremes and zero) each run to exercise the `a == b ⇒ eq` branch across
134-
*distinct* ciphertexts. The fixture suite's curated rows have unique plaintexts,
135-
so it exercises the equality-true branch on self-pairs (same ciphertext); the
136-
cross-ciphertext case is the e2e suite's job.
147+
collide, so both DB suites inject deliberate duplicate plaintexts to exercise
148+
the `a == b ⇒ eq` branch across *distinct* ciphertexts: the fixture suite via
149+
the committed per-type *doubles* tables (`cross_ciphertext.rs`), and the e2e
150+
suite via fresh duplicates (plus signed extremes and zero) each run. The
151+
matrix's own curated rows have unique plaintexts, so they exercise the
152+
equality-true branch only on self-pairs (same ciphertext) — which is why the
153+
doubles tables exist.
137154

138155
## Running
139156

@@ -144,7 +161,7 @@ cargo test -p eql-scalars proptest_invariants
144161
# fixture + edge-case suites (needs a prepared DB)
145162
mise run test:sqlx:prep
146163
cd tests/sqlx && cargo test --test encrypted_domain \
147-
property::fixture_oracle property::match_smoke property::edge_cases
164+
property::fixture_oracle property::cross_ciphertext property::match_smoke property::edge_cases
148165

149166
# all suites incl. e2e (needs DB + CS_* creds)
150167
mise run test:sqlx # enables --features proptest-e2e
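
Reviewer sketch for the "Equality-true must actually fire" point in the README hunk above: a minimal count of how many all-pairs comparisons hit the `a == b` branch across *distinct* ciphertexts. It assumes a simplified row shape; `Row`, `cross_ciphertext_eq_pairs` and `with_doubles` are hypothetical names, and `ciphertext_id` merely stands in for "a different ciphertext". With unique plaintexts the count is zero, and a doubles table built from the first three values (each "encrypted" twice) makes it non-zero.

```rust
/// One fixture row: a plaintext plus an id standing in for its ciphertext.
#[derive(Clone, Copy, Debug)]
struct Row {
    ciphertext_id: u32,
    plaintext: i64,
}

/// Counts ordered pairs (a, b) of distinct rows whose plaintexts are equal,
/// i.e. how often an all-pairs oracle reaches equality-true across ciphertexts.
fn cross_ciphertext_eq_pairs(rows: &[Row]) -> usize {
    let mut count = 0;
    for a in rows {
        for b in rows {
            if a.ciphertext_id != b.ciphertext_id && a.plaintext == b.plaintext {
                count += 1;
            }
        }
    }
    count
}

/// Builds a toy doubles table: the first `take` curated plaintexts, each
/// appearing twice under fresh ciphertext ids.
fn with_doubles(curated: &[Row], take: usize) -> Vec<Row> {
    let mut doubles = Vec::new();
    let mut next_id = 1000; // arbitrary id range, chosen not to clash with curated ids
    for row in curated.iter().take(take) {
        for _ in 0..2 {
            doubles.push(Row { ciphertext_id: next_id, plaintext: row.plaintext });
            next_id += 1;
        }
    }
    doubles
}

fn main() {
    let curated: Vec<Row> = [-1i64, 0, 1, 7, 100]
        .iter()
        .enumerate()
        .map(|(i, &plaintext)| Row { ciphertext_id: i as u32, plaintext })
        .collect();

    // Unique plaintexts: equality-true only ever fires on self-pairs.
    assert_eq!(cross_ciphertext_eq_pairs(&curated), 0);

    // Three values, two rows each: 3 * 2 ordered cross-ciphertext pairs.
    let doubles = with_doubles(&curated, 3);
    assert_eq!(cross_ciphertext_eq_pairs(&doubles), 6);
}
```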
