Commit 9768032

fix(test): compare cross-width float ORE terms via the SQL ORE operator (CIP-3141)
`float4_and_float8_share_index_terms_for_the_same_value` asserted byte-equality of the raw `ob` ORE arrays of two independently-encrypted payloads. That can never hold: a BlockORE term is `Left (deterministic) ++ Right (16-byte random per-ciphertext nonce + nonce-masked truth tables)`, so two encodings of the SAME value (same width, same cast) are byte-UNEQUAL by construction. Ordering is decided by the ORE compare function, not raw bytes. (The cast is irrelevant: `real`/`double` collapse to one f64 `ColumnType::Float` in cipherstash-client; the deterministic Left halves are byte-identical, which is what proves the two widths share an encoding.)

The bug stayed latent because the e2e suite is feature/creds-gated and, when it did run, an earlier `ob`-as-string extraction errored out before the assertion; fixing that extraction (5bbd959) unmasked the wrong assertion.

Compare the extracted `ord_term`s through the SQL `eql_v3.ore_block_256` `=` operator (the only correct ORE check) and keep the deterministic `hm` equality term as a direct byte comparison. Also correct the CHANGELOG claim of a "byte-identical ORE term" to "equal under the ORE comparator".

Verified: the test now passes against fresh ZeroKMS encryption.
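The byte-inequality argument in the commit message can be illustrated with a toy model. The sketch below is not BlockORE and is not secure: `ToyTerm`, `toy_mask`, `toy_encode`, and `toy_compare` are hypothetical names invented for this illustration, the left half is simply the big-endian value, and the nonce is passed in explicitly rather than drawn at random. It only shows how two terms for one value can differ byte-for-byte yet compare equal.

```rust
use std::cmp::Ordering;

/// Toy stand-in for an ORE term. NOT real BlockORE and NOT secure:
/// it only mirrors the "deterministic Left ++ nonce-masked Right" shape.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ToyTerm {
    left: [u8; 8],   // deterministic function of the value (here: the value itself)
    nonce: [u8; 16], // a real scheme draws this fresh for every ciphertext
    right: [u8; 8],  // value bytes XOR-masked with a nonce-derived pad
}

/// Derive an 8-byte pad from the nonce (an arbitrary toy mixing step).
fn toy_mask(nonce: &[u8; 16]) -> [u8; 8] {
    let mut pad = [0u8; 8];
    for i in 0..8 {
        pad[i] = nonce[i] ^ nonce[i + 8].rotate_left(3);
    }
    pad
}

/// Encode `value` under an explicit nonce (assumption: the caller supplies it).
fn toy_encode(value: u64, nonce: [u8; 16]) -> ToyTerm {
    let plain = value.to_be_bytes();
    let pad = toy_mask(&nonce);
    let mut right = [0u8; 8];
    for i in 0..8 {
        right[i] = plain[i] ^ pad[i];
    }
    ToyTerm { left: plain, nonce, right }
}

/// Compare via a function, never via raw bytes: unmask `b`'s right half
/// with `b`'s own nonce, then order `a`'s left half against it.
fn toy_compare(a: &ToyTerm, b: &ToyTerm) -> Ordering {
    let pad = toy_mask(&b.nonce);
    let mut b_plain = [0u8; 8];
    for i in 0..8 {
        b_plain[i] = b.right[i] ^ pad[i];
    }
    a.left.cmp(&b_plain)
}
```

Encoding the same value under two different nonces gives terms whose `left` halves match while the full terms differ, and `toy_compare` still reports `Ordering::Equal`. That is the same shape of check the fixed test delegates to the SQL operator.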
1 parent 5bbd959 commit 9768032

2 files changed

Lines changed: 41 additions & 24 deletions

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ Each entry that ships in a published release links to the PR that introduced it.
 - **Self-contained `eql_v3` schema + standalone `release/cipherstash-encrypt-v3.sql` installer.** The `eql_v3` encrypted-domain surface no longer depends on `eql_v2` at runtime: it now owns its own copies of the searchable-encrypted-metadata (SEM) index-term types — `eql_v3.hmac_256` and `eql_v3.ore_block_u64_8_256` (with its btree operator class) — so the `eql_v3.eq_term` / `eql_v3.ord_term` extractors return `eql_v3` types and no `eql_v2.<symbol>` appears anywhere in the v3 SQL. The whole v3 surface relocated under a single `src/v3/` tree (`src/v3/sem/` for the hand-written SEM types, `src/v3/scalars/` for the generated domain families). A new build variant ships the `eql_v3` schema on its own as `release/cipherstash-encrypt-v3.sql`, installable into a database with no `eql_v2` present; a CI gate greps that artifact and its dependency closure to keep it `eql_v2`-free. Why: a clean foundation for the per-scalar encrypted-domain model to stand alone, ahead of it replacing the `eql_v2_encrypted` composite column type. This is additive — a new schema and a new artifact — and leaves `eql_v2` byte-for-byte unchanged. ([#255](https://github.com/cipherstash/encrypt-query-language/pull/255))
 - **`eql_v3.min` / `eql_v3.max` aggregates over `eql_v3.ste_vec_entry`.** SteVec document entries extracted at a selector (`doc -> 'sel'`) can now be aggregated like ordered scalars: `eql_v3.min(doc -> 'sel')` / `eql_v3.max(...)` return the entry with the smallest / largest ordered leaf. Ordering routes through the entry's `oc` (CLLW ORE) term via `eql_v3.ore_cllw` — the same comparator the entry `<` / `<=` / `>` / `>=` operators use, not the scalar Block-ORE `ord_term`. Only `oc`-carrying entries are orderable: an entry without an `oc` term (`eql_v3.ore_cllw` returns NULL) is non-orderable and is ignored by the aggregate — the same way the `eql_v3.ore_cllw` btree NULL-filters such rows — so a mix of `oc`-carrying and `oc`-less entries yields the extremum of the orderable subset rather than a corrupted result. Declared `PARALLEL = SAFE` with a combine function (the state function itself), so partial / parallel aggregation is available on large `GROUP BY` workloads. Why: brings encrypted-JSONB entry ordering to parity with the scalar encrypted-domain families' `MIN` / `MAX`, and lets the shared scalar behaviour matrix cover entry aggregation. Additive — the document and entry comparison surface is otherwise unchanged. ([#267](https://github.com/cipherstash/encrypt-query-language/pull/267))
 - **`eql_v3.bool` encrypted-domain type family (storage-only / encryption-only).** A single jsonb-backed domain for encrypted `bool` columns — `eql_v3.bool` — generated from the `bool` row in `eql-scalars::CATALOG`. Unlike every other scalar family, `bool` is **encryption-only**: it carries no SEM index term and exposes **no** `_eq` / `_ord` domains, so the value is encrypted at rest and decrypted by the proxy but is **not searchable server-side**. This is deliberate — a two-value column has so little cardinality that any searchable index (even HMAC equality) would trivially leak the plaintext distribution. Every comparison / containment / path operator reachable through domain fallback (`=`, `<>`, `<`, `<=`, `>`, `>=`, `@>`, `<@`, `->`, `->>`, …) is blocked (raises rather than silently routing to plaintext-`jsonb` semantics); the domain `CHECK` still requires the EQL envelope (`v`, `i`), the ciphertext (`c`), and pins the payload version (`VALUE->>'v' = '2'`). The encrypted payload is `{v,i,c}` only — no `hm` / `ob` / `bf` term. Why: lets callers encrypt a low-cardinality boolean column at rest without offering a server-side search surface that would leak it; the first **storage-only** member of the generated scalar encrypted-domain family. ([#295](https://github.com/cipherstash/encrypt-query-language/pull/295))
-- **`eql_v3.float4` / `eql_v3.float8` encrypted-domain type families (ordered).** Four jsonb-backed domains each for encrypted `real` / `double precision` columns — `eql_v3.float4` / `eql_v3.float8` (storage-only), `eql_v3.<T>_eq` (`=` / `<>` via HMAC), and `eql_v3.<T>_ord` / `eql_v3.<T>_ord_ore` (also `<` `<=` `>` `>=`, `MIN` / `MAX` via 8-block ORE) — generated from the `float4` / `float8` rows in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. Both widths encrypt through a single f64 crypto path (`Plaintext::Float`): a `real` is widened to f64 before encryption (exact and monotonic), so `float4` vs `float8` is purely a Postgres-surface distinction and the ciphertext / ORE term are byte-identical. Ordering is correct for all non-NaN values via the standard monotonic IEEE-754 byte mapping (`f64::ENCODED_LEN == 8`, same as `int8`); `-0.0` canonicalizes to `+0.0` and `±Inf` order correctly. NaN is unordered and unspecified in the encoder — it can be encrypted and stored but is not given a meaningful comparison guarantee (any NaN rejection is client-side). Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` extractors, not an operator class on the domain. Why: a type-safe, per-capability encrypted IEEE-754 float column, closing the gap for `real` / `double` columns that had no v3 equivalent (the v3 `numeric` family is arbitrary-precision decimal, not binary float). ([#299](https://github.com/cipherstash/encrypt-query-language/pull/299))
+- **`eql_v3.float4` / `eql_v3.float8` encrypted-domain type families (ordered).** Four jsonb-backed domains each for encrypted `real` / `double precision` columns — `eql_v3.float4` / `eql_v3.float8` (storage-only), `eql_v3.<T>_eq` (`=` / `<>` via HMAC), and `eql_v3.<T>_ord` / `eql_v3.<T>_ord_ore` (also `<` `<=` `>` `>=`, `MIN` / `MAX` via 8-block ORE) — generated from the `float4` / `float8` rows in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. Both widths encrypt through a single f64 crypto path (`Plaintext::Float`): a `real` is widened to f64 before encryption (exact and monotonic), so `float4` vs `float8` is purely a Postgres-surface distinction — the `hm` equality term is byte-identical and the ORE terms compare equal under the `eql_v3.ore_block_256` operator (the ORE term itself is probabilistic — a fresh per-ciphertext nonce — so it is never byte-identical, even same-width; ordering is decided by the ORE comparator, not by raw bytes). Ordering is correct for all non-NaN values via the standard monotonic IEEE-754 byte mapping (`f64::ENCODED_LEN == 8`, same as `int8`); `-0.0` canonicalizes to `+0.0` and `±Inf` order correctly. NaN is unordered and unspecified in the encoder — it can be encrypted and stored but is not given a meaningful comparison guarantee (any NaN rejection is client-side). Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` extractors, not an operator class on the domain. Why: a type-safe, per-capability encrypted IEEE-754 float column, closing the gap for `real` / `double` columns that had no v3 equivalent (the v3 `numeric` family is arbitrary-precision decimal, not binary float). ([#299](https://github.com/cipherstash/encrypt-query-language/pull/299))
 
 ### Changed
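The CHANGELOG entry leans on two facts: widening a `real` to f64 is exact, and a monotonic byte mapping makes byte order agree with numeric order for non-NaN values. A minimal sketch of one commonly used mapping follows. The function name `order_preserving_bytes` is hypothetical, and the actual encoder in cipherstash-client may differ in detail; this only illustrates the idea.

```rust
/// Map an f64 to 8 bytes whose big-endian lexicographic order matches
/// numeric order for every non-NaN input. Illustrative sketch only; NaN is
/// passed through without any ordering guarantee.
fn order_preserving_bytes(x: f64) -> [u8; 8] {
    // Canonicalize -0.0 to +0.0 so both zeros share a single encoding
    // (-0.0 == 0.0 is true under IEEE-754 comparison).
    let x = if x == 0.0 { 0.0 } else { x };
    let bits = x.to_bits();
    let mapped = if bits >> 63 == 1 {
        // Negative: flip every bit, so larger magnitudes sort lower.
        !bits
    } else {
        // Non-negative: set the sign bit, so these sort above all negatives.
        bits | (1 << 63)
    };
    mapped.to_be_bytes()
}

/// Widen an f32 the way the entry describes (`x as f64`), then encode.
fn order_preserving_bytes_f32(x: f32) -> [u8; 8] {
    order_preserving_bytes(x as f64)
}
```

Because `2.25` is exactly representable as an f32, `order_preserving_bytes_f32(2.25)` and `order_preserving_bytes(2.25)` produce the same bytes, which is the sense in which the two widths feed one encoding before any encryption happens.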

tests/sqlx/tests/encrypted_domain/property/e2e_oracle.rs

Lines changed: 40 additions & 23 deletions
@@ -209,10 +209,19 @@ e2e_oracle_suite!(
 
 /// Both float widths encrypt through the SINGLE f64 crypto path
 /// (`F4::to_plaintext` widens `self.0 as f64`; `F8::to_plaintext` is the
-/// identity), so an f32 value and its exact f64 widening MUST produce identical
-/// index terms — this is the byte-identity the CHANGELOG claims. Encrypt the
-/// same value both ways (an f32-exact value, so `x as f64` is lossless) and
-/// assert the `hm` (HMAC equality) and `ob` (ORE) terms match across widths.
+/// identity), so an f32 value and its exact f64 widening are the SAME real
+/// number and are equality- and order-interchangeable across widths. The two
+/// index terms behave differently and so are checked differently:
+///
+/// - `hm` (HMAC equality) is a **deterministic** keyed hash of the value, so the
+/// two widths produce a **byte-identical** `hm` — assert that directly.
+/// - `ob` (ORE ordering) is **probabilistic**: each encryption draws a fresh
+/// per-ciphertext nonce (the random Right half of the BlockORE term), so two
+/// encodings of one value are byte-UNEQUAL *by construction* — even same-width,
+/// same-value. Ordering is decided by the ORE compare function, never by raw
+/// bytes, so the ONLY correct cross-width ORE check is the SQL
+/// `eql_v3.ore_block_256` `=` operator over the extracted `ord_term`s.
+///
 /// Creds/e2e-gated like the rest of this file.
 #[test]
 fn float4_and_float8_share_index_terms_for_the_same_value() -> Result<()> {
@@ -222,8 +231,8 @@ fn float4_and_float8_share_index_terms_for_the_same_value() -> Result<()> {
         .enable_all()
         .build()?;
 
-    // f32-exact value: `x as f64` is the same real number, so any term
-    // difference would be a width artifact, which is exactly what we forbid.
+    // f32-exact value: `x as f64` is the same real number, so both widths encode
+    // the identical f64 — any *value* difference would be a width artifact.
     let x: f32 = 2.25;
 
     let f4_payloads = rt.block_on(async {
@@ -241,34 +250,42 @@ fn float4_and_float8_share_index_terms_for_the_same_value() -> Result<()> {
         encrypt_store("xwidth_f8", "payload", &[F8(x as f64)], &cfg).await
     })?;
 
-    // Pull the scalar `hm` index term (a JSON string) from the EQL payload.
+    // `hm` (deterministic HMAC) is byte-identical across widths — compare directly.
     let hm = |p: &serde_json::Value| -> Result<String> {
         p.get("hm")
             .and_then(serde_json::Value::as_str)
             .map(str::to_string)
             .ok_or_else(|| anyhow::anyhow!("payload missing string `hm`: {p}"))
     };
-    // Pull the `ob` ORE term. Unlike `hm`, `ob` is a JSON array of block
-    // strings, so compare the arrays directly rather than coercing to a string.
-    let ob = |p: &serde_json::Value| -> Result<serde_json::Value> {
-        p.get("ob")
-            .filter(|v| v.is_array())
-            .cloned()
-            .ok_or_else(|| anyhow::anyhow!("payload missing array `ob`: {p}"))
-    };
-
-    // HMAC equality term: identical plaintext + key => identical hm, so the two
-    // widths are equality-interchangeable at the term level.
     assert_eq!(
         hm(&f4_payloads[0])?,
         hm(&f8_payloads[0])?,
         "float4 and float8 of the same value must share the hm equality term"
     );
-    // ORE term: same f64 input => same ORE ciphertext, so ordering is identical.
-    assert_eq!(
-        ob(&f4_payloads[0])?,
-        ob(&f8_payloads[0])?,
-        "float4 and float8 of the same value must share the ob ORE term"
+
+    // `ob` (probabilistic ORE) is NOT byte-comparable — the only correct check is
+    // the SQL ORE operator over the extracted `ord_term`s. Cast each payload to
+    // its width's `_ord_ore` domain, extract the `eql_v3.ore_block_256` term, and
+    // compare with `=` (eql_v3.ore_block_256_eq => compare_ore_block_256_terms = 0).
+    let pool: PgPool = rt.block_on(connect_pool())?;
+    rt.block_on(ensure_eql_installed(&pool, &super::migrator()))?;
+
+    let ord_term = |p: &serde_json::Value, domain: &str| -> String {
+        let lit = p.to_string().replace('\'', "''");
+        format!("eql_v3.ord_term('{lit}'::jsonb::{domain})")
+    };
+    let sql = format!(
+        "SELECT {} = {}",
+        ord_term(&f4_payloads[0], "eql_v3.float4_ord_ore"),
+        ord_term(&f8_payloads[0], "eql_v3.float8_ord_ore"),
+    );
+    let ore_equal: Option<bool> = rt
+        .block_on(sqlx::query_scalar(&sql).fetch_one(&pool))
+        .map_err(|e| anyhow::anyhow!("cross-width ORE compare query ({sql}): {e}"))?;
+    anyhow::ensure!(
+        ore_equal == Some(true),
+        "float4 and float8 of the same value must compare equal under the SQL ORE \
+         operator (eql_v3.ore_block_256 `=`); got {ore_equal:?}"
     );
     Ok(())
 }
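The query-building step in the new test can be looked at in isolation. The sketch below restates the `ord_term` closure as free functions over a JSON string, so it needs no `serde_json` or database. The function names `ord_term_expr` and `cross_width_compare_sql` are hypothetical, and the payloads used with it here are placeholders, not real EQL payloads.

```rust
/// Build the SQL expression that casts a JSON payload to `domain` and
/// extracts its ORE ordering term, mirroring the test's `ord_term` closure.
/// Single quotes are doubled so the payload is a valid SQL string literal.
fn ord_term_expr(payload_json: &str, domain: &str) -> String {
    let lit = payload_json.replace('\'', "''");
    format!("eql_v3.ord_term('{lit}'::jsonb::{domain})")
}

/// Build the cross-width comparison query: the float4 term on the left,
/// the float8 term on the right, compared with the SQL `=` operator.
fn cross_width_compare_sql(f4_json: &str, f8_json: &str) -> String {
    format!(
        "SELECT {} = {}",
        ord_term_expr(f4_json, "eql_v3.float4_ord_ore"),
        ord_term_expr(f8_json, "eql_v3.float8_ord_ore"),
    )
}
```

Interpolating the payload into the SQL text is workable here because the test controls both payloads and escapes the quote character; for untrusted input, bound parameters would be the safer choice.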
