Commit 3e57268

docs: align property-test docs to three-suite classification with function doubles (CIP-3141)
Update the property/README.md, tests/sqlx/README.md, and the CHANGELOG entry to describe the fixture suite's function-double oracles, term-extractor identity, randomized corpus (mandatory ∪ random ∪ duplicates) + corpus_invariants guard, and bloom match smoke — all within the catalog/fixture/e2e three-suite frame. e2e is documented as defence in depth (re-encrypts each run vs the frozen fixture snapshot).
1 parent 8497df6 commit 3e57268

3 files changed

Lines changed: 62 additions & 16 deletions

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ Each entry that ships in a published release links to the PR that introduced it.
3333
- **`eql_v3.text` encrypted-domain family (`text`, `text_eq`, `text_match`, `text_ord`, `text_ord_ore`, `text_search`).** Adds equality (`=` / `<>` via HMAC), match (`@>` / `<@` via a new self-contained `eql_v3.bloom_filter` SEM index term), and ORE ordering (`<` `<=` `>` `>=`, `min` / `max`) for encrypted text, at parity with EQL v2 text — generated from the `text` row in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. `text` is the first scalar to add a new index `Term` (`Bloom`) and the first non-integer, unbounded ordered kind (lexicographic pivots, hand-written `impl ScalarType`). The combined **`text_search`** domain carries all three capabilities in one type — `=` / `<>` via HMAC, `<` `<=` `>` `>=` / `min` / `max` via ORE, and `@>` / `<@` via bloom filter. Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` / `eql_v3.match_term` extractors, not an operator class on the domain. Why: brings searchable encrypted text to the namespaced, `eql_v2`-free `eql_v3` surface. Match is exposed as bloom-filter containment on the `text_match` / `text_search` domains — deliberately *not* SQL `LIKE` (no wildcard/anchoring; probabilistic ngram containment) — and never backs equality. **Equality on the ordered text domains (`text_ord`, `text_ord_ore`) and on `text_search` always routes `=` / `<>` through `hm` (exact HMAC), never the ORE term — ORE is not exact-equality for text** (integer ordered domains keep exact ORE equality, which is lossless for them). ([#260](https://github.com/cipherstash/encrypt-query-language/pull/260))
3434
- **Self-contained `eql_v3` schema + standalone `release/cipherstash-encrypt-v3.sql` installer.** The `eql_v3` encrypted-domain surface no longer depends on `eql_v2` at runtime: it now owns its own copies of the searchable-encrypted-metadata (SEM) index-term types — `eql_v3.hmac_256` and `eql_v3.ore_block_256` (with its btree operator class) — so the `eql_v3.eq_term` / `eql_v3.ord_term` extractors return `eql_v3` types and no `eql_v2.<symbol>` appears anywhere in the v3 SQL. The whole v3 surface relocated under a single `src/v3/` tree (`src/v3/sem/` for the hand-written SEM types, `src/v3/scalars/` for the generated domain families). A new build variant ships the `eql_v3` schema on its own as `release/cipherstash-encrypt-v3.sql`, installable into a database with no `eql_v2` present; a CI gate greps that artifact and its dependency closure to keep it `eql_v2`-free. Why: a clean foundation for the per-scalar encrypted-domain model to stand alone, ahead of it replacing the `eql_v2_encrypted` composite column type. This is additive — a new schema and a new artifact — and leaves `eql_v2` byte-for-byte unchanged. ([#255](https://github.com/cipherstash/encrypt-query-language/pull/255))
3535
- **`eql_v3.text` encrypted-domain family (`text`, `text_eq`, `text_match`, `text_ord`, `text_ord_ore`).** Adds equality (`=` / `<>` via HMAC), match (`@>` / `<@` via a new self-contained `eql_v3.bloom_filter` SEM index term), and ORE ordering (`<` `<=` `>` `>=`, `min` / `max`) for encrypted text, at parity with EQL v2 text — generated from the `text` row in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. `text` is the first scalar to add a new index `Term` (`Bloom`) and the first non-integer, unbounded ordered kind (lexicographic pivots, hand-written `impl ScalarType`). Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` / `eql_v3.match_term` extractors, not an operator class on the domain. Why: brings searchable encrypted text to the namespaced, `eql_v2`-free `eql_v3` surface. Match is exposed as bloom-filter containment on the `text_match` domain — deliberately *not* SQL `LIKE` (no wildcard/anchoring; probabilistic ngram containment) — and never backs equality (which always routes through `Hm`). ([#260](https://github.com/cipherstash/encrypt-query-language/pull/260))
36-
- **Property-based tests for the `eql_v3` encrypted scalar domains.** A harness of three suites asserts SQL operator results agree with a plaintext oracle across a generated input space: a pure-Rust **catalog** suite (no database) over the term/scalar catalog, a **fixture** suite that samples the committed fixture corpus (real ciphertext) and checks all ordered pairs in each sampled corpus, and an **e2e** suite (gated behind the `proptest-e2e` cargo feature) that batch-encrypts freshly generated plaintexts end-to-end through ZeroKMS each run. Covers the equality (`=`/`<>`) and ordering (`<`/`<=`/`>`/`>=`, `ord_term` sort order) oracles plus NULL/blocker/CHECK edge cases. Why: the prior matrix exercised fixed pivots only; property tests catch operator/oracle disagreements across the whole value space. ([#275](https://github.com/cipherstash/encrypt-query-language/pull/275))
36+
- **Property-based tests for the `eql_v3` encrypted scalar domains.** A harness of three suites asserts SQL operator results agree with a plaintext oracle across a generated input space: a pure-Rust **catalog** suite (no database) over the term/scalar catalog, a **fixture** suite that samples the committed fixture corpus (real ciphertext) and checks all ordered pairs in each sampled corpus, and an **e2e** suite (gated behind the `proptest-e2e` cargo feature) that batch-encrypts freshly generated plaintexts end-to-end through ZeroKMS each run. The fixture corpus is itself randomized at generation time — `mandatory ∪ random ∪ duplicates`: the curated catalog floor (`Min`/`Max`/`Zero`/pivots), a seeded per-type random sample, and deliberate duplicate plaintexts so equality across two independent encryptions of one value is exercised without fresh test-time encryption (`FIXTURE_SEED` makes any corpus reproducible; a `corpus_invariants` guard fails loudly if a regeneration drops the floor or the cross-ciphertext-equality rows). Beyond the operator oracles, the fixture suite drives **function-double** oracles — the generated `eql_v3.eq`/`neq`/`lt`/`lte`/`gt`/`gte` functions across all three overloads (domain–domain, domain–jsonb, jsonb–domain) — plus **term-extractor identity** (`eq_term`==`hm`, `ord_term`==`ob`) and an example-based bloom **match** smoke for the text `_match` domain. Covers the equality (`=`/`<>`) and ordering (`<`/`<=`/`>`/`>=`, `ord_term` sort order) operator and function oracles plus NULL/blocker/CHECK edge cases, across every fixtured scalar (`int2`/`int4`/`int8`/`date`/`timestamptz`/`numeric`/`text`). Why: the prior matrix exercised fixed pivots only; property tests over a randomized real-ciphertext corpus catch operator/oracle disagreements across the whole value space, and the e2e suite adds defence in depth by re-encrypting every run rather than pinning a frozen ciphertext snapshot. 
([#275](https://github.com/cipherstash/encrypt-query-language/pull/275))
3737
- **Self-contained `eql_v3` schema + standalone `release/cipherstash-encrypt-v3.sql` installer.** The `eql_v3` encrypted-domain surface no longer depends on `eql_v2` at runtime: it now owns its own copies of the searchable-encrypted-metadata (SEM) index-term types — `eql_v3.hmac_256` and `eql_v3.ore_block_u64_8_256` (with its btree operator class) — so the `eql_v3.eq_term` / `eql_v3.ord_term` extractors return `eql_v3` types and no `eql_v2.<symbol>` appears anywhere in the v3 SQL. The whole v3 surface relocated under a single `src/v3/` tree (`src/v3/sem/` for the hand-written SEM types, `src/v3/scalars/` for the generated domain families). A new build variant ships the `eql_v3` schema on its own as `release/cipherstash-encrypt-v3.sql`, installable into a database with no `eql_v2` present; a CI gate greps that artifact and its dependency closure to keep it `eql_v2`-free. Why: a clean foundation for the per-scalar encrypted-domain model to stand alone, ahead of it replacing the `eql_v2_encrypted` composite column type. This is additive — a new schema and a new artifact — and leaves `eql_v2` byte-for-byte unchanged. ([#255](https://github.com/cipherstash/encrypt-query-language/pull/255))
3838
- **`eql_v3.min` / `eql_v3.max` aggregates over `eql_v3.ste_vec_entry`.** SteVec document entries extracted at a selector (`doc -> 'sel'`) can now be aggregated like ordered scalars: `eql_v3.min(doc -> 'sel')` / `eql_v3.max(...)` return the entry with the smallest / largest ordered leaf. Ordering routes through the entry's `oc` (CLLW ORE) term via `eql_v3.ore_cllw` — the same comparator the entry `<` / `<=` / `>` / `>=` operators use, not the scalar Block-ORE `ord_term`. Only `oc`-carrying entries are orderable: an entry without an `oc` term (`eql_v3.ore_cllw` returns NULL) is non-orderable and is ignored by the aggregate — the same way the `eql_v3.ore_cllw` btree NULL-filters such rows — so a mix of `oc`-carrying and `oc`-less entries yields the extremum of the orderable subset rather than a corrupted result. Declared `PARALLEL = SAFE` with a combine function (the state function itself), so partial / parallel aggregation is available on large `GROUP BY` workloads. Why: brings encrypted-JSONB entry ordering to parity with the scalar encrypted-domain families' `MIN` / `MAX`, and lets the shared scalar behaviour matrix cover entry aggregation. Additive — the document and entry comparison surface is otherwise unchanged. ([#267](https://github.com/cipherstash/encrypt-query-language/pull/267))
3939
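
The `mandatory ∪ random ∪ duplicates` composition described in the property-test entry can be sketched as follows. This is a minimal illustration, not the repository's `fixtures::random` code: the `SplitMix64` generator, `mandatory_floor`, `build_corpus`, and `check_invariants` are hypothetical names, the floor is reduced to `Min`/`Max`/`Zero` for a 32-bit integer, and plaintexts stand in for rows that would each be encrypted independently.

```rust
use std::collections::BTreeSet;

/// Tiny deterministic PRNG (SplitMix64) so the sketch needs no external crates.
/// The real generator behind `FIXTURE_SEED` is not shown in the source.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical mandatory floor for a 32-bit integer scalar: Min, Max, Zero.
fn mandatory_floor() -> Vec<i32> {
    vec![i32::MIN, i32::MAX, 0]
}

/// Build `mandatory ∪ random ∪ duplicates` as a list of plaintexts.
/// Repeated plaintexts model equal-plaintext / distinct-ciphertext rows.
fn build_corpus(seed: u64, random_count: usize, duplicate_count: usize) -> Vec<i32> {
    let mut rng = SplitMix64(seed);
    let mut corpus = mandatory_floor();
    for _ in 0..random_count {
        // Truncating cast: keep the low 32 bits as a signed sample.
        corpus.push(rng.next_u64() as i32);
    }
    // Re-append values already present as deliberate duplicates.
    let base_len = corpus.len();
    for _ in 0..duplicate_count {
        let idx = (rng.next_u64() % base_len as u64) as usize;
        let value = corpus[idx];
        corpus.push(value);
    }
    corpus
}

/// Guard in the spirit of `corpus_invariants`: fail if the floor or the
/// duplicate rows are missing.
fn check_invariants(corpus: &[i32]) -> Result<(), String> {
    for v in mandatory_floor() {
        if !corpus.contains(&v) {
            return Err(format!("mandatory value {} missing from corpus", v));
        }
    }
    let distinct: BTreeSet<i32> = corpus.iter().copied().collect();
    if distinct.len() == corpus.len() {
        return Err("corpus has no duplicate plaintexts".to_string());
    }
    Ok(())
}
```

Calling `build_corpus` twice with the same seed yields the same plaintext list, which is the property a fixed seed is meant to provide; `check_invariants` rejects a corpus that lost either the floor or the duplicates.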

tests/sqlx/README.md

Lines changed: 6 additions & 3 deletions
@@ -277,8 +277,11 @@ Tests connect to PostgreSQL database configured by SQLx:
277277
-~~Convert remaining SQL tests~~ **COMPLETE!**
278278
- Property-based tests: implemented in `tests/encrypted_domain/property/` and
279279
`crates/eql-scalars/src/proptest_invariants.rs` (CIP-3141). One unit-level
280-
**catalog** suite (no DB) plus two integration suites — **fixture** (oracle
281-
over the committed fixture corpus) and **e2e** (oracle over fresh end-to-end
282-
encryption, `--features proptest-e2e`).
280+
**catalog** suite (no DB) plus two integration suites — **fixture** (operator +
281+
function-double oracles, term-extractor identity, corpus invariants, and bloom
282+
match smoke over a randomized real-ciphertext corpus, `mandatory ∪ random ∪
283+
duplicates`) and **e2e** (oracle over fresh end-to-end encryption each run,
284+
`--features proptest-e2e`). See
285+
`tests/encrypted_domain/property/README.md` for the full structure.
283286
- Performance benchmarks: Measure query performance with encrypted data
284287
- Integration tests: Test with CipherStash Proxy

tests/sqlx/tests/encrypted_domain/property/README.md

Lines changed: 55 additions & 12 deletions
@@ -17,10 +17,17 @@ named for what they operate on, not by an abstract tier letter:
1717
| **fixture** | [`fixture_oracle.rs`](./fixture_oracle.rs) | integration | committed fixture corpus (real ciphertext) | shared test DB |
1818
| **e2e** | [`e2e_oracle.rs`](./e2e_oracle.rs) | integration | freshly generated plaintexts, encrypted each run | shared test DB **+ ZeroKMS creds** |
1919

20+
The `fixture` suite spans several files: [`fixture_oracle.rs`](./fixture_oracle.rs)
21+
(the operator **and** function-double oracles + term-extractor identity),
22+
[`corpus_invariants.rs`](./corpus_invariants.rs) (structural guards on the
23+
randomized corpus), and [`match_smoke.rs`](./match_smoke.rs) (example-based bloom
24+
containment for the text `_match` domain). All three are un-gated — they read the
25+
already-encrypted corpus and make no fresh ZeroKMS calls.
26+
2027
Plus [`edge_cases.rs`](./edge_cases.rs): example-based unit tests for
2128
NULL propagation, blockers raising on unsupported operators (including the
22-
native-`jsonb` `->`/`@>` domain-fallback paths), `timestamptz` ordering
23-
deferral, and CHECK rejection of malformed payloads.
29+
native-`jsonb` `->`/`@>` domain-fallback paths) and CHECK rejection of malformed
30+
payloads.
2431

2532
### catalog — catalog invariants, no database
2633

@@ -40,16 +47,42 @@ so it runs whenever the fixtures are present, and it generalises across the whol
4047
catalog for free (every fixtured type gets an `eq_oracle`, ordered types also get
4148
an `ord_oracle`).
4249

50+
The corpus is itself **randomized at generation time** — `mandatory ∪ random ∪
51+
duplicates` (see [`fixtures::random`](../../../src/fixtures/random.rs)): the
52+
curated catalog floor (`Min`/`Max`/`Zero`/pivots), a seeded per-type random
53+
sample, and deliberate duplicate plaintexts. Each duplicate is encrypted
54+
independently, so the corpus carries **equal-plaintext / distinct-ciphertext**
55+
rows — the fixture suite therefore exercises "two independent encryptions of one
56+
value compare equal" without any fresh test-time encryption. `FIXTURE_SEED`
57+
(logged and embedded in the generated SQL) makes any corpus reproducible;
58+
[`corpus_invariants.rs`](./corpus_invariants.rs) fails loudly if a regeneration
59+
drops the mandatory floor or the cross-ciphertext-equality rows.
60+
61+
On top of the operator oracles, the fixture suite runs the **function-double
62+
oracles**: it calls the generated `eql_v3.eq`/`neq`/`lt`/`lte`/`gt`/`gte`
63+
**functions** by name across all three [`Overload`s][overload]
64+
(domain–domain, domain–jsonb, jsonb–domain) and asserts **term-extractor
65+
identity**: `eql_v3.eq_term` returns the payload's exact `hm`, `eql_v3.ord_term`
66+
returns its exact `ob`. [`match_smoke.rs`](./match_smoke.rs) adds the
67+
example-based bloom containment (`@>`/`<@`) for the text `_match` domain.
68+
69+
[overload]: ../../../src/property.rs
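
To make the three-overload coverage concrete, here is a hedged sketch of how the named function calls might be enumerated. The `Overload` enum below is a hypothetical stand-in for the type linked above (its real variants are not shown in this diff), `function_call_sql` and `all_calls` are illustrative helpers, and the `$1::…`/`$2::…` casts are an assumption about how a test could bind its arguments.

```rust
/// Hypothetical stand-in for the `Overload` type; variant names are assumed.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Overload {
    DomainDomain,
    DomainJsonb,
    JsonbDomain,
}

impl Overload {
    const ALL: [Overload; 3] = [
        Overload::DomainDomain,
        Overload::DomainJsonb,
        Overload::JsonbDomain,
    ];
}

/// The six generated comparison functions named in the README.
const FUNCTIONS: [&str; 6] = ["eq", "neq", "lt", "lte", "gt", "gte"];

/// Render one call to a named `eql_v3` comparison function for one overload.
/// `domain` is the fully qualified domain type; the casts are illustrative.
fn function_call_sql(func: &str, domain: &str, overload: Overload) -> String {
    let (lhs, rhs) = match overload {
        Overload::DomainDomain => (domain, domain),
        Overload::DomainJsonb => (domain, "jsonb"),
        Overload::JsonbDomain => ("jsonb", domain),
    };
    format!("SELECT eql_v3.{}($1::{}, $2::{})", func, lhs, rhs)
}

/// Every function crossed with every overload for one domain.
fn all_calls(domain: &str) -> Vec<String> {
    let mut calls = Vec::new();
    for func in FUNCTIONS {
        for overload in Overload::ALL {
            calls.push(function_call_sql(func, domain, overload));
        }
    }
    calls
}
```

Enumerating the cross product keeps the coverage claim mechanical: six functions times three overloads gives eighteen statements per domain, each of which would be compared against the same plaintext oracle as the corresponding operator.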
70+
4371
### e2e — oracle over fresh end-to-end encryption
4472

4573
Same oracle engine, but each case **generates fresh random plaintexts and
4674
encrypts them end-to-end through ZeroKMS** (one batched call per case) before
4775
querying. Gated behind the `proptest-e2e` cargo feature — `mise run test:sqlx`
48-
enables it (CI has the secrets); a bare `cargo test` compiles it out. It is the
49-
**only** suite that can exercise "same plaintext, *different* ciphertext"
50-
(equality across independently-encrypted values), because the committed fixture
51-
corpus has no duplicate plaintexts. Integer scalars only for now (random `T`
52-
generation is trivial for integers).
76+
enables it (CI has the secrets); a bare `cargo test` compiles it out. Integer
77+
scalars only for now (random `T` generation is trivial for integers).
78+
79+
Defence in depth over the fixture suite: the fixture corpus is encrypted once at
80+
`test:sqlx:prep`, so it pins behaviour against a *frozen* ciphertext snapshot;
81+
the e2e suite re-encrypts on **every run**, so it catches a live crypto-path
82+
regression (a `cipherstash-client` / ZeroKMS change) that leaves the committed
83+
fixtures untouched. Both suites exercise cross-ciphertext equality — the fixture
84+
suite via the corpus's duplicate plaintexts, the e2e suite via fresh duplicates
85+
each run.
5386

5487
## The shared oracle engine
5588

@@ -64,6 +97,13 @@ generation is trivial for integers).
6497
The fixture and e2e suites differ only in **where the rows come from**; the
6598
engine is identical.
6699

100+
The same file also holds the **function-double** helpers the fixture suite layers
101+
on: `assert_eq_fn_oracle` / `assert_ord_fn_oracle` (the named `eql_v3.*`
102+
functions across every [`Overload`][overload]), `assert_extractor_oracle`
103+
(`eq_term`==`hm` / `ord_term`==`ob` identity), and `assert_match_smoke` (bloom
104+
containment). They take the same `Row` corpus, so they ride the fixture corpus at
105+
zero marginal ZeroKMS cost.
106+
67107
## Why ciphertext can't be `Arbitrary`-derived
68108

69109
A valid payload's `hm`/`ob` terms are real ciphertext from `cipherstash-client`
@@ -89,10 +129,12 @@ encryption to reach inputs the fixtures can't.
89129
via `connect_pool()` (they cannot use `#[sqlx::test]`'s injected pool from a
90130
sync body). Operator evaluation is read-only `SELECT`, so no per-test schema
91131
isolation is needed.
92-
- **Equality-true must actually fire.** Random integer pairs almost never
93-
collide, so the e2e corpus injects deliberate duplicate plaintexts (plus signed
94-
extremes and zero) to exercise the `a == b ⇒ eq` branch across distinct
95-
ciphertexts.
132+
- **Equality-true must actually fire.** Random pairs almost never collide, so
133+
both DB suites inject deliberate duplicate plaintexts to exercise the
134+
`a == b ⇒ eq` branch across distinct ciphertexts: the fixture corpus carries
135+
duplicates from `fixtures::random` (`corpus_invariants` guards that they survive
136+
regeneration), and the e2e corpus appends fresh duplicates (plus signed
137+
extremes and zero) each run.
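
The "equality-true must actually fire" point can be illustrated with a small all-ordered-pairs check. This is a sketch under stated assumptions, not the shared oracle engine: `Row`, `ciphertext_id`, and `check_eq_oracle` are hypothetical names, and the `sql_eq` closure stands in for evaluating the SQL `=` operator against the database.

```rust
/// A corpus row: the plaintext plus an opaque id standing in for its ciphertext.
#[derive(Clone, Copy, Debug)]
struct Row {
    plaintext: i64,
    ciphertext_id: u32,
}

/// Compare `sql_eq` with the plaintext oracle over all ordered pairs.
/// Returns the number of equality-true pairs that span distinct ciphertexts,
/// or an error if the operator disagrees with the oracle or if that count is
/// zero (meaning the `a == b` branch was never exercised across ciphertexts).
fn check_eq_oracle<F>(rows: &[Row], sql_eq: F) -> Result<usize, String>
where
    F: Fn(&Row, &Row) -> bool,
{
    let mut cross_ciphertext_hits = 0;
    for a in rows {
        for b in rows {
            let expected = a.plaintext == b.plaintext;
            let actual = sql_eq(a, b);
            if actual != expected {
                return Err(format!(
                    "operator disagrees with oracle for {:?} vs {:?}",
                    a, b
                ));
            }
            if expected && a.ciphertext_id != b.ciphertext_id {
                cross_ciphertext_hits += 1;
            }
        }
    }
    if cross_ciphertext_hits == 0 {
        return Err("equality-true never fired across distinct ciphertexts".to_string());
    }
    Ok(cross_ciphertext_hits)
}
```

Without a duplicate plaintext in the corpus, the check fails even when the operator is correct, which mirrors why both DB suites inject duplicates rather than relying on random collisions.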
96138

97139
## Running
98140

@@ -102,7 +144,8 @@ cargo test -p eql-scalars proptest_invariants
102144

103145
# fixture + edge-case suites (needs a prepared DB)
104146
mise run test:sqlx:prep
105-
cd tests/sqlx && cargo test --test encrypted_domain property::fixture_oracle property::edge_cases
147+
cd tests/sqlx && cargo test --test encrypted_domain \
148+
property::fixture_oracle property::corpus_invariants property::match_smoke property::edge_cases
106149

107150
# all suites incl. e2e (needs DB + CS_* creds)
108151
mise run test:sqlx # enables --features proptest-e2e
