Commit a2a31f6

Merge pull request #299 from cipherstash/eql_v3_float_domains
feat(v3): float4/float8 encrypted-domain types
2 parents 302a5f1 + 58e0749 commit a2a31f6

60 files changed

Lines changed: 7806 additions & 71 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ Each entry that ships in a published release links to the PR that introduced it.
- **Self-contained `eql_v3` schema + standalone `release/cipherstash-encrypt-v3.sql` installer.** The `eql_v3` encrypted-domain surface no longer depends on `eql_v2` at runtime: it now owns its own copies of the searchable-encrypted-metadata (SEM) index-term types — `eql_v3.hmac_256` and `eql_v3.ore_block_u64_8_256` (with its btree operator class) — so the `eql_v3.eq_term` / `eql_v3.ord_term` extractors return `eql_v3` types and no `eql_v2.<symbol>` appears anywhere in the v3 SQL. The whole v3 surface relocated under a single `src/v3/` tree (`src/v3/sem/` for the hand-written SEM types, `src/v3/scalars/` for the generated domain families). A new build variant ships the `eql_v3` schema on its own as `release/cipherstash-encrypt-v3.sql`, installable into a database with no `eql_v2` present; a CI gate greps that artifact and its dependency closure to keep it `eql_v2`-free. Why: a clean foundation for the per-scalar encrypted-domain model to stand alone, ahead of it replacing the `eql_v2_encrypted` composite column type. This is additive — a new schema and a new artifact — and leaves `eql_v2` byte-for-byte unchanged. ([#255](https://github.com/cipherstash/encrypt-query-language/pull/255))
- **`eql_v3.min` / `eql_v3.max` aggregates over `eql_v3.ste_vec_entry`.** SteVec document entries extracted at a selector (`doc -> 'sel'`) can now be aggregated like ordered scalars: `eql_v3.min(doc -> 'sel')` / `eql_v3.max(...)` return the entry with the smallest / largest ordered leaf. Ordering routes through the entry's `oc` (CLLW ORE) term via `eql_v3.ore_cllw` — the same comparator the entry `<` / `<=` / `>` / `>=` operators use, not the scalar Block-ORE `ord_term`. Only `oc`-carrying entries are orderable: an entry without an `oc` term (`eql_v3.ore_cllw` returns NULL) is non-orderable and is ignored by the aggregate — the same way the `eql_v3.ore_cllw` btree NULL-filters such rows — so a mix of `oc`-carrying and `oc`-less entries yields the extremum of the orderable subset rather than a corrupted result. Declared `PARALLEL = SAFE` with a combine function (the state function itself), so partial / parallel aggregation is available on large `GROUP BY` workloads. Why: brings encrypted-JSONB entry ordering to parity with the scalar encrypted-domain families' `MIN` / `MAX`, and lets the shared scalar behaviour matrix cover entry aggregation. Additive — the document and entry comparison surface is otherwise unchanged. ([#267](https://github.com/cipherstash/encrypt-query-language/pull/267))
- **`eql_v3.bool` encrypted-domain type family (storage-only / encryption-only).** A single jsonb-backed domain for encrypted `bool` columns — `eql_v3.bool` — generated from the `bool` row in `eql-scalars::CATALOG`. Unlike every other scalar family, `bool` is **encryption-only**: it carries no SEM index term and exposes **no** `_eq` / `_ord` domains, so the value is encrypted at rest and decrypted by the proxy but is **not searchable server-side**. This is deliberate — a two-value column has so little cardinality that any searchable index (even HMAC equality) would trivially leak the plaintext distribution. Every comparison / containment / path operator reachable through domain fallback (`=`, `<>`, `<`, `<=`, `>`, `>=`, `@>`, `<@`, `->`, `->>`, …) is blocked (raises rather than silently routing to plaintext-`jsonb` semantics); the domain `CHECK` still requires the EQL envelope (`v`, `i`), the ciphertext (`c`), and pins the payload version (`VALUE->>'v' = '2'`). The encrypted payload is `{v,i,c}` only — no `hm` / `ob` / `bf` term. Why: lets callers encrypt a low-cardinality boolean column at rest without offering a server-side search surface that would leak it; the first **storage-only** member of the generated scalar encrypted-domain family. ([#295](https://github.com/cipherstash/encrypt-query-language/pull/295))
+- **`eql_v3.float4` / `eql_v3.float8` encrypted-domain type families (ordered).** Four jsonb-backed domains each for encrypted `real` / `double precision` columns — `eql_v3.float4` / `eql_v3.float8` (storage-only), `eql_v3.<T>_eq` (`=` / `<>` via HMAC), and `eql_v3.<T>_ord` / `eql_v3.<T>_ord_ore` (also `<` `<=` `>` `>=`, `MIN` / `MAX` via 8-block ORE) — generated from the `float4` / `float8` rows in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. Both widths encrypt through a single f64 crypto path (`Plaintext::Float`): a `real` is widened to f64 before encryption (exact and monotonic), so `float4` vs `float8` is purely a Postgres-surface distinction and the ciphertext / ORE term are byte-identical. Ordering is correct for all non-NaN values via the standard monotonic IEEE-754 byte mapping (`f64::ENCODED_LEN == 8`, same as `int8`); `-0.0` canonicalizes to `+0.0` and `±Inf` order correctly. NaN is unordered and unspecified in the encoder — it can be encrypted and stored but is not given a meaningful comparison guarantee (any NaN rejection is client-side). Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` extractors, not an operator class on the domain. Why: a type-safe, per-capability encrypted IEEE-754 float column, closing the gap for `real` / `double` columns that had no v3 equivalent (the v3 `numeric` family is arbitrary-precision decimal, not binary float). ([#299](https://github.com/cipherstash/encrypt-query-language/pull/299))

### Changed
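The float changelog entry refers to "the standard monotonic IEEE-754 byte mapping" without showing it. The sketch below is one common form of that mapping, written as an illustration only: `encode_ordered` is a hypothetical name, and the encoder actually used by the float crypto path is not part of this diff and may differ in detail.

```rust
/// Hypothetical helper (not from the commit): map an f64 to 8 bytes whose
/// lexicographic order matches numeric order for all non-NaN inputs.
fn encode_ordered(x: f64) -> [u8; 8] {
    // Canonicalize -0.0 to +0.0 so both zeros share one encoding.
    let x = if x == 0.0 { 0.0 } else { x };
    let bits = x.to_bits();
    let mapped = if bits >> 63 == 1 {
        // Negative: flip every bit, which reverses the magnitude order.
        !bits
    } else {
        // Non-negative: set the sign bit so it sorts above all negatives.
        bits ^ (1 << 63)
    };
    mapped.to_be_bytes()
}

fn main() {
    let values = [
        f64::NEG_INFINITY, -1e300, -1.5, -0.001, 0.0, 0.001, 1.5, 1e300, f64::INFINITY,
    ];
    // Byte order follows numeric order across sign, magnitude and infinities.
    for pair in values.windows(2) {
        assert!(encode_ordered(pair[0]) < encode_ordered(pair[1]));
    }
    // Both zeros collapse to one encoding.
    assert_eq!(encode_ordered(-0.0), encode_ordered(0.0));
}
```

NaN is deliberately not handled here, in line with the entry's statement that NaN carries no comparison guarantee.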

crates/eql-scalars/src/fixture.rs

Lines changed: 1 addition & 0 deletions
@@ -34,6 +34,7 @@ impl Fixture {
             | Fixture::Jsonb(_)
             | Fixture::Date(_)
             | Fixture::Timestamptz(_)
+            | Fixture::Float(_)
             | Fixture::Bool(_) => None,
         }
     }

crates/eql-scalars/src/kind.rs

Lines changed: 12 additions & 0 deletions
@@ -72,6 +72,8 @@ impl ScalarKind {
             | ScalarKind::Text
             | ScalarKind::Jsonb
             | ScalarKind::Bool
+            | ScalarKind::F32
+            | ScalarKind::F64
             | ScalarKind::Date
             | ScalarKind::Timestamptz => None,
         }
@@ -97,6 +99,14 @@ impl ScalarKind {
         matches!(self, ScalarKind::Text)
     }

+    /// True for the IEEE-754 float kinds (`F32`, `F64`) — ordered, non-integer,
+    /// string-backed-fixture scalars whose `impl ScalarType` is hand-written in
+    /// `scalar_domains.rs` (like `text`/`numeric`). Keeps float classification in
+    /// the catalog crate alongside `is_int`/`is_temporal`/`is_text`.
+    pub const fn is_float(self) -> bool {
+        matches!(self, ScalarKind::F32 | ScalarKind::F64)
+    }
+
     /// A debug/identifier string for the kind: the canonical Rust plaintext type
     /// name (`"i32"`, `"chrono::NaiveDate"`, `"rust_decimal::Decimal"`). `Jsonb`
     /// has **no generated SQL surface** and no catalog row, so calling this on it
@@ -113,6 +123,8 @@ impl ScalarKind {
             ScalarKind::Timestamptz => "chrono::DateTime<Utc>",
             ScalarKind::Numeric => "rust_decimal::Decimal",
             ScalarKind::Bool => "bool",
+            ScalarKind::F32 => "f32",
+            ScalarKind::F64 => "f64",
             ScalarKind::Jsonb => {
                 panic!("ScalarKind::rust_type: jsonb has no generated surface yet")
             }

crates/eql-scalars/src/lib.rs

Lines changed: 76 additions & 1 deletion
@@ -80,6 +80,16 @@ pub enum ScalarKind {
     /// server-side. Like the other non-integer kinds, the bounded-numeric
     /// accessors are unreachable for it by construction.
     Bool,
+    /// 32-bit IEEE-754 binary float (`f32`, Postgres `real`/`float4`).
+    /// Ordered like the integer kinds via ORE, but with no i128 range
+    /// (`as_bounded_int()` returns `None`) and string-backed at the catalog
+    /// layer. Encrypts through the single f64 float crypto path
+    /// (`Plaintext::Float`) — the f32→f64 widening is exact and monotonic.
+    F32,
+    /// 64-bit IEEE-754 binary float (`f64`, Postgres `double precision`/
+    /// `float8`). The native width of the float crypto path (`F32` widens into
+    /// it); otherwise classified exactly like [`ScalarKind::F32`].
+    F64,
 }

 /// Always-present payload keys required by every generated domain CHECK,
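The `F32` doc comment states that the f32→f64 widening is exact and monotonic. A minimal sketch that checks both properties on a handful of assumed sample values follows; `widen` is a hypothetical wrapper around the standard `f64::from`, not a function from the crate.

```rust
/// Hypothetical wrapper: widen an f32 to f64. `f64::from` is lossless for
/// every f32 value, which is what "exact" means here.
fn widen(x: f32) -> f64 {
    f64::from(x)
}

fn main() {
    // Assumed samples in strictly increasing order; 0.1 is included to show
    // that exactness of the widening does not depend on the value being a
    // "nice" decimal.
    let samples: [f32; 7] = [
        f32::NEG_INFINITY, -1024.0, -0.25, 0.0, 0.1, 2.25, f32::INFINITY,
    ];

    // Exact: narrowing the widened value recovers the original f32.
    for &x in &samples {
        assert_eq!(widen(x) as f32, x);
    }

    // Monotonic: strict order before widening is strict order after.
    for pair in samples.windows(2) {
        assert!(pair[0] < pair[1]);
        assert!(widen(pair[0]) < widen(pair[1]));
    }
}
```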
@@ -178,6 +188,12 @@ pub enum Fixture {
     /// storage-only, so this fixture is encrypted (ciphertext only, no index
     /// term) and never participates in a comparison pivot. Distinct by value.
     Bool(bool),
+    /// An IEEE-754 float plaintext rendered as a string (`"0.5"`, `"-inf"`).
+    /// The catalog stays zero-dep, so the string is parsed into `f32`/`f64` in
+    /// the SQLx harness, not here. Distinct by parsed value (the harness
+    /// `float_fixtures_are_distinct_by_value` guard enforces this). NaN and
+    /// `-0.0` are deliberately excluded; `±Inf` (`"inf"`/`"-inf"`) ARE fixtures.
+    Float(&'static str),
 }

 /// One generated public domain: a suffix appended to the type token and the
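The `Fixture::Float` comment says the strings are parsed into `f32`/`f64` in the SQLx harness, which this diff does not show. As a rough sketch of what that parsing step could look like, using only the standard library's `FromStr` for floats (the helper names `parse_float4` and `parse_float8` are assumptions, not harness functions):

```rust
/// Hypothetical helper: parse a float4 fixture literal into f32.
fn parse_float4(s: &str) -> f32 {
    s.parse().expect("fixture must be a valid f32 literal")
}

/// Hypothetical helper: parse a float8 fixture literal into f64.
fn parse_float8(s: &str) -> f64 {
    s.parse().expect("fixture must be a valid f64 literal")
}

fn main() {
    // Rust's float parser accepts the infinity spellings used as pivots.
    assert_eq!(parse_float8("-inf"), f64::NEG_INFINITY);
    assert_eq!(parse_float8("inf"), f64::INFINITY);
    assert_eq!(parse_float4("inf"), f32::INFINITY);

    // Ordinary decimal literals parse as expected.
    assert_eq!(parse_float4("0.5"), 0.5_f32);
    assert_eq!(parse_float8("-1.5"), -1.5_f64);
}
```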
@@ -221,6 +237,7 @@ macro_rules! fixtures {
     (date; $($s:literal),* $(,)?) => { &[$(Fixture::Date($s)),*] };
     (timestamptz; $($s:literal),* $(,)?) => { &[$(Fixture::Timestamptz($s)),*] };
     (bool; $($b:literal),* $(,)?) => { &[$(Fixture::Bool($b)),*] };
+    (float; $($s:literal),* $(,)?) => { &[$(Fixture::Float($s)),*] };
 }

 /// Domains shared by every ordered-integer scalar, in manifest file order:
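To make the new `float` arm easier to read in isolation, here is a cut-down, self-contained version of the macro with only two arms and a two-variant `Fixture` enum. The enum and constants below are simplified stand-ins for illustration, not the crate's definitions.

```rust
// Simplified stand-in for the crate's `Fixture` enum (assumption: only the
// two variants needed for this illustration).
#[derive(Debug, PartialEq)]
enum Fixture {
    Bool(bool),
    Float(&'static str),
}

// Each arm matches a kind keyword, then a comma-separated list of literals
// (trailing comma allowed), and expands to a `&'static` slice of fixtures.
macro_rules! fixtures {
    (bool; $($b:literal),* $(,)?) => { &[$(Fixture::Bool($b)),*] };
    (float; $($s:literal),* $(,)?) => { &[$(Fixture::Float($s)),*] };
}

const BOOLS: &[Fixture] = fixtures!(bool; true, false);
const FLOATS: &[Fixture] = fixtures!(float; "-inf", "0", "inf");

fn main() {
    assert_eq!(BOOLS, &[Fixture::Bool(true), Fixture::Bool(false)]);
    assert_eq!(FLOATS.len(), 3);
    assert_eq!(FLOATS[0], Fixture::Float("-inf"));
}
```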
@@ -488,9 +505,67 @@ pub const TEXT: ScalarSpec = ScalarSpec {
     fixtures: TEXT_FIXTURES,
 };

+/// `float4` fixture plaintexts — IEEE-754 strings parsed into `f32` in the SQLx
+/// harness (the catalog stays zero-dep). EVERY value is exactly representable in
+/// f32 — each is a dyadic rational `n/2^k` (e.g. `2.25 = 9/4`, `0.25 = 1/4`,
+/// `1024 = 2^10`), the value class `real` stores losslessly — so the `real`
+/// round-trip is lossless and the f32→f64 widening before encryption is exact.
+/// Keep new fixtures dyadic: a value like `0.1` is NOT f32-exact, and the
+/// oracle's expected order (parsed `f32`) would then disagree with the value the
+/// `real` column actually rounds to. The three pivots MUST be present
+/// verbatim: `"-inf"` (min_pivot), `"0"` (origin/mid), `"inf"` (max_pivot).
+/// NaN and `-0.0` are deliberately excluded (see the `float_special` suite).
+/// Distinctness is enforced by `Fixture::Float` (above) and its guard test.
+const FLOAT4_FIXTURES: &[Fixture] = fixtures!(float;
+    "-inf", "-1024", "-2.25", "-1", "-0.5", "-0.25",
+    "0", "0.25", "0.5", "1", "2.25", "1024", "inf");
+
+/// `float8` fixture plaintexts — IEEE-754 strings parsed into `f64` in the SQLx
+/// harness. The native width of the float crypto path; values span sign and
+/// magnitude including subnormal-free interior points. The three pivots MUST be
+/// present verbatim: `"-inf"` (min_pivot), `"0"` (origin/mid), `"inf"`
+/// (max_pivot). NaN and `-0.0` are deliberately excluded.
+const FLOAT8_FIXTURES: &[Fixture] = fixtures!(float;
+    "-inf", "-1e300", "-1000000", "-1.5", "-1", "-0.001",
+    "0", "0.001", "1", "1.5", "1000000", "1e300", "inf");
+
+/// `float4` — an **ordered**, non-integer scalar (Postgres `real`). Reuses the
+/// four-domain ordered shape (`ORDERED_INT_DOMAINS`); only kind and fixtures
+/// differ. Both float widths encrypt through the SAME f64 crypto path
+/// (`Plaintext::Float`), so `float4` vs `float8` is purely a Postgres-surface
+/// distinction. Public (like `DATE`/`NUMERIC`) so the SQLx harness reads
+/// `FLOAT4.fixtures` directly to parse the strings into `f32`.
+pub const FLOAT4: ScalarSpec = ScalarSpec {
+    token: "float4",
+    kind: ScalarKind::F32,
+    domains: ORDERED_INT_DOMAINS,
+    fixtures: FLOAT4_FIXTURES,
+};
+
+/// `float8` — an **ordered**, non-integer scalar (Postgres `double precision`),
+/// the native width of the float crypto path. Reuses the ordered shape. Public
+/// so the SQLx harness reads `FLOAT8.fixtures` directly to parse into `f64`.
+pub const FLOAT8: ScalarSpec = ScalarSpec {
+    token: "float8",
+    kind: ScalarKind::F64,
+    domains: ORDERED_INT_DOMAINS,
+    fixtures: FLOAT8_FIXTURES,
+};
+
 /// The scalar catalog — the single source of truth. Order is significant (it
 /// drives generation order). New types are appended as their SQL surface lands.
-pub const CATALOG: &[ScalarSpec] = &[INT4, INT2, INT8, DATE, TIMESTAMPTZ, NUMERIC, TEXT, BOOL];
+pub const CATALOG: &[ScalarSpec] = &[
+    INT4,
+    INT2,
+    INT8,
+    DATE,
+    TIMESTAMPTZ,
+    NUMERIC,
+    TEXT,
+    BOOL,
+    FLOAT4,
+    FLOAT8,
+];

 /// Materialise an integer scalar's fixtures into a typed `&'static` slice at
 /// compile time. This is the **single-sourced** plaintext list the SQLx test
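The `FLOAT4_FIXTURES` comment asks that new fixtures stay f32-exact and names `0.1` as a counterexample. One way to check a candidate literal, sketched here with a hypothetical helper `is_f32_exact` that is not part of the commit, is to compare the f32 parse (widened) against the f64 parse. The check is only as precise as f64, so a literal that is not exactly representable in f64 either could in principle slip through.

```rust
/// Hypothetical check (not from the commit): treat a literal as f32-exact when
/// parsing it as f32 and widening gives the same value as parsing it as f64.
fn is_f32_exact(s: &str) -> bool {
    let narrow: f32 = s.parse().expect("valid float literal");
    let wide: f64 = s.parse().expect("valid float literal");
    f64::from(narrow) == wide
}

fn main() {
    // The float4 fixture literals from the diff all pass.
    let float4 = [
        "-inf", "-1024", "-2.25", "-1", "-0.5", "-0.25",
        "0", "0.25", "0.5", "1", "2.25", "1024", "inf",
    ];
    for s in float4 {
        assert!(is_f32_exact(s), "{s} should be f32-exact");
    }

    // 0.1 is not a dyadic rational, so f32 and f64 round it differently.
    assert!(!is_f32_exact("0.1"));
}
```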

crates/eql-scalars/src/proptest_invariants.rs

Lines changed: 3 additions & 1 deletion
@@ -13,7 +13,7 @@ fn any_term() -> impl Strategy<Value = Term> {
     prop_oneof![Just(Term::Hm), Just(Term::Ore), Just(Term::Bloom)]
 }

-/// Strategy over the eight scalar kinds.
+/// Strategy over the ten scalar kinds.
 fn any_kind() -> impl Strategy<Value = ScalarKind> {
     prop_oneof![
         Just(ScalarKind::I16),
@@ -24,6 +24,8 @@ fn any_kind() -> impl Strategy<Value = ScalarKind> {
         Just(ScalarKind::Jsonb),
         Just(ScalarKind::Date),
         Just(ScalarKind::Timestamptz),
+        Just(ScalarKind::F32),
+        Just(ScalarKind::F64),
     ]
 }

crates/eql-scalars/src/tests.rs

Lines changed: 113 additions & 3 deletions
@@ -42,6 +42,8 @@ mod rust_tests {
         assert_eq!(ScalarKind::Jsonb.as_bounded_int(), None);
         assert_eq!(ScalarKind::Date.as_bounded_int(), None);
         assert_eq!(ScalarKind::Timestamptz.as_bounded_int(), None);
+        assert_eq!(ScalarKind::F32.as_bounded_int(), None);
+        assert_eq!(ScalarKind::F64.as_bounded_int(), None);
     }

     #[test]
@@ -80,6 +82,8 @@ mod rust_tests {
         assert!(!ScalarKind::Jsonb.is_int());
         assert!(!ScalarKind::Date.is_int());
         assert!(!ScalarKind::Timestamptz.is_int());
+        assert!(!ScalarKind::F32.is_int());
+        assert!(!ScalarKind::F64.is_int());
     }

     #[test]
@@ -93,6 +97,8 @@ mod rust_tests {
             ScalarKind::Jsonb,
             ScalarKind::Date,
             ScalarKind::Timestamptz,
+            ScalarKind::F32,
+            ScalarKind::F64,
         ] {
             assert!(!k.is_text());
         }
@@ -171,6 +177,8 @@ mod rust_tests {
         assert!(!ScalarKind::I16.is_temporal());
         assert!(!ScalarKind::I32.is_temporal());
         assert!(!ScalarKind::I64.is_temporal());
+        assert!(!ScalarKind::F32.is_temporal());
+        assert!(!ScalarKind::F64.is_temporal());
     }

     #[test]
@@ -523,7 +531,7 @@ mod catalog_tests {
     }

     #[test]
-    fn catalog_has_int4_int2_int8_date_timestamptz_numeric_text_bool_in_order() {
+    fn catalog_has_all_tokens_in_order() {
         let tokens: Vec<&str> = CATALOG.iter().map(|s| s.token).collect();
         assert_eq!(
             tokens,
@@ -535,7 +543,9 @@ mod catalog_tests {
                 "timestamptz",
                 "numeric",
                 "text",
-                "bool"
+                "bool",
+                "float4",
+                "float8"
             ]
         );
     }
@@ -896,6 +906,102 @@ mod values_tests {
     }
 }

+mod float_tests {
+    use crate::*;
+
+    fn scalar(token: &str) -> &'static ScalarSpec {
+        CATALOG
+            .iter()
+            .find(|s| s.token == token)
+            .unwrap_or_else(|| panic!("{token} missing from CATALOG"))
+    }
+
+    #[test]
+    fn float_specs_are_in_catalog_with_ordered_shape() {
+        for token in ["float4", "float8"] {
+            let s = scalar(token);
+            let suffixes: Vec<_> = s.domains.iter().map(|d| d.suffix).collect();
+            assert_eq!(suffixes, vec!["", "_eq", "_ord_ore", "_ord"]);
+        }
+        assert_eq!(scalar("float4").kind, ScalarKind::F32);
+        assert_eq!(scalar("float8").kind, ScalarKind::F64);
+    }
+
+    #[test]
+    fn float_kinds_are_not_bounded_int_temporal_or_text() {
+        for k in [ScalarKind::F32, ScalarKind::F64] {
+            assert_eq!(k.as_bounded_int(), None);
+            assert!(!k.is_int());
+            assert!(!k.is_temporal());
+            assert!(!k.is_text());
+            assert!(k.is_float());
+        }
+    }
+
+    #[test]
+    fn float_rust_types_are_f32_and_f64() {
+        assert_eq!(ScalarKind::F32.rust_type(), "f32");
+        assert_eq!(ScalarKind::F64.rust_type(), "f64");
+    }
+
+    /// NaN and -0.0 must never be fixtures: NaN is unordered/unspecified in the
+    /// encoder; -0.0 canonicalizes to +0.0 and would duplicate the +0.0 row.
+    /// ±Inf MUST be present (the boundary pivots).
+    #[test]
+    fn float_fixtures_exclude_nan_and_negative_zero_and_include_infinities() {
+        for token in ["float4", "float8"] {
+            let s = scalar(token);
+            let strings: Vec<&str> = s
+                .fixtures
+                .iter()
+                .map(|f| match f {
+                    Fixture::Float(v) => *v,
+                    other => panic!("{token} fixture must be Fixture::Float, got {other:?}"),
+                })
+                .collect();
+            for v in &strings {
+                let parsed: f64 = v
+                    .parse()
+                    .unwrap_or_else(|_| panic!("{token} fixture {v:?} must parse as f64"));
+                assert!(!parsed.is_nan(), "{token} fixture {v:?} is NaN");
+                assert!(
+                    !(parsed == 0.0 && parsed.is_sign_negative()),
+                    "{token} fixture {v:?} is -0.0"
+                );
+            }
+            assert!(strings.contains(&"inf"), "{token} must include +inf pivot");
+            assert!(strings.contains(&"-inf"), "{token} must include -inf pivot");
+            assert!(strings.contains(&"0"), "{token} must include 0 (origin)");
+        }
+    }
+
+    /// Distinct by parsed f64 value (the catalog dedupes only by literal string;
+    /// the fixture table keys on the value, so an aliasing pair would break
+    /// fetch_fixture_payload's fetch_one).
+    #[test]
+    fn float_fixtures_are_distinct_by_value() {
+        for token in ["float4", "float8"] {
+            let s = scalar(token);
+            let parsed: Vec<u64> = s
+                .fixtures
+                .iter()
+                .map(|f| match f {
+                    Fixture::Float(v) => {
+                        let x: f64 = v.parse().unwrap();
+                        // total_cmp bit key; -0.0 already excluded so +0.0 is unique.
+                        x.to_bits()
+                    }
+                    other => panic!("non-float fixture: {other:?}"),
+                })
+                .collect();
+            let mut sorted = parsed.clone();
+            sorted.sort_unstable();
+            sorted.dedup();
+            assert_eq!(sorted.len(), parsed.len(), "{token} has duplicate fixtures");
+        }
+    }
+}
+
 mod invariant_tests {
     use crate::*;
     use std::collections::HashMap;
@@ -936,7 +1042,11 @@ mod invariant_tests {
                 | Fixture::Text(s)
                 | Fixture::Jsonb(s)
                 | Fixture::Date(s)
-                | Fixture::Timestamptz(s) => DistinctKey::Str(s),
+                | Fixture::Timestamptz(s)
+                // Float fixtures dedupe by their literal here, like the other
+                // string-backed kinds (every float literal is distinct; the harness
+                // `float_fixtures_are_distinct_by_value` guard pins value-distinctness).
+                | Fixture::Float(s) => DistinctKey::Str(s),
                 // `bool` is storage-only and string-backed for distinctness: the two
                 // values dedupe by their literal, like the other non-numeric kinds.
                 Fixture::Bool(b) => DistinctKey::Str(if b { "true" } else { "false" }),
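The comments in this file distinguish between deduplicating float fixtures by literal string and by parsed value. The sketch below shows why the two differ, keying on `f64::to_bits` as the test in the diff does. `distinct_by_value` is a hypothetical helper written for this illustration, not a function from the crate.

```rust
use std::collections::HashSet;

/// Hypothetical helper: true when every literal parses to a different f64
/// bit pattern. Assumes all literals are valid, non-NaN floats.
fn distinct_by_value(literals: &[&str]) -> bool {
    let mut seen = HashSet::new();
    literals.iter().all(|s| {
        let x: f64 = s.parse().expect("valid float literal");
        // `insert` returns false when the bit pattern was already present.
        seen.insert(x.to_bits())
    })
}

fn main() {
    // Distinct strings and distinct values.
    assert!(distinct_by_value(&["-inf", "0", "0.5", "inf"]));

    // Distinct strings, same value: a string-keyed dedupe would miss this.
    assert!(!distinct_by_value(&["1", "1.0"]));

    // +0.0 and -0.0 have different bit patterns, so a bit key treats them as
    // distinct even though they compare equal. Excluding -0.0 from the
    // fixtures sidesteps that mismatch.
    assert!(distinct_by_value(&["0", "-0.0"]));
}
```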
