
Commit 3eca6c5

Merge pull request #260 from cipherstash/v3-domain-type-text
feat(scalars): add eql_v3.text encrypted-domain family (eq / match / ord)
2 parents 49e6d01 + 7df91e0 commit 3eca6c5

28 files changed

Lines changed: 1615 additions & 259 deletions


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -28,6 +28,7 @@ Each entry that ships in a published release links to the PR that introduced it.
 - **`eql_v3.date` encrypted-domain type family.** Four jsonb-backed domains for encrypted `date` columns — `eql_v3.date` (storage-only), `eql_v3.date_eq` (`=` / `<>` via HMAC), and `eql_v3.date_ord` / `eql_v3.date_ord_ore` (also `<` `<=` `>` `>=` via ORE block terms, with `MIN` / `MAX` aggregates) — generated from the `date` row in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. Plaintexts encrypt under the `date` cast and compare via the same ORE block terms as the integer scalars (ORE is plaintext-agnostic — dates order like integers). Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` extractors, not an operator class on the domain. Why: the first **non-integer ordered** scalar encrypted-domain type — a type-safe, per-capability encrypted `date` column — proving the generator and SQLx test matrix generalize beyond fixed-width integers. ([#256](https://github.com/cipherstash/encrypt-query-language/pull/256))
 - **`eql_v3.timestamptz` encrypted-domain type family (equality-only).** Two jsonb-backed domains for encrypted `timestamptz` columns — `eql_v3.timestamptz` (storage-only) and `eql_v3.timestamptz_eq` (`=` / `<>` via HMAC) — generated from the `timestamptz` row in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.date` family. Values are **UTC-normalized** (cipherstash has no timezone-preserving type): plaintexts encrypt under the `timestamp` cast. Index via a functional index on the `eql_v3.eq_term` extractor, not an operator class on the domain. **Ordering (`<` `<=` `>` `>=`, `MIN` / `MAX`) is deferred:** cipherstash encrypts `Plaintext::Timestamp` at native 12-block ORE width, but EQL's only ORE comparator (`eql_v2.compare_ore_block_u64_8_256_term`) is hardcoded to 8 blocks, so ordered timestamptz domains would silently mis-order. There are no `eql_v3.timestamptz_ord` / `_ord_ore` domains and no timestamptz `MIN` / `MAX` aggregates until a wide-ORE (12-block) term lands — tracked in [#241](https://github.com/cipherstash/encrypt-query-language/issues/241). Why: a type-safe, equality-searchable encrypted UTC-timestamp column, stacking on the `date` temporal-scalar foundation; ordering follows once the comparator supports the native ciphertext width. ([#257](https://github.com/cipherstash/encrypt-query-language/pull/257))
 - **Per-domain `MIN` / `MAX` aggregates for the encrypted-domain family.** `eql_v3.min(eql_v3.<T>_ord)` / `eql_v3.max(eql_v3.<T>_ord)` (and the `_ord_ore` twin) are generated for every ord-capable scalar variant, giving type-safe extrema on domain-typed columns — comparison routes through the variant's `<` / `>` operator (ORE block term, no decryption). The aggregates are declared `PARALLEL = SAFE` with a combine function (the state function itself — min/max are associative), so PostgreSQL can use partial/parallel aggregation on large `GROUP BY` workloads. Why: the new domain types previously had no equivalent of the composite-type aggregates. The existing `eql_v2.min(eql_v2_encrypted)` / `eql_v2.max(eql_v2_encrypted)` aggregates are **retained** and continue to work on `eql_v2_encrypted` columns; the per-domain aggregates are additive and coexist with them. ([#239](https://github.com/cipherstash/encrypt-query-language/pull/239))
+- **`eql_v3.text` encrypted-domain family (`text`, `text_eq`, `text_match`, `text_ord`, `text_ord_ore`).** Adds equality (`=` / `<>` via HMAC), match (`@>` / `<@` via a new self-contained `eql_v3.bloom_filter` SEM index term), and ORE ordering (`<` `<=` `>` `>=`, `min` / `max`) for encrypted text, at parity with EQL v2 text — generated from the `text` row in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. `text` is the first scalar to add a new index `Term` (`Bloom`) and the first non-integer, unbounded ordered kind (lexicographic pivots, hand-written `impl ScalarType`). Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` / `eql_v3.match_term` extractors, not an operator class on the domain. Why: brings searchable encrypted text to the namespaced, `eql_v2`-free `eql_v3` surface. Match is exposed as bloom-filter containment on the `text_match` domain — deliberately *not* SQL `LIKE` (no wildcard/anchoring; probabilistic ngram containment) — and never backs equality (which always routes through `Hm`). ([#260](https://github.com/cipherstash/encrypt-query-language/pull/260))
 - **Self-contained `eql_v3` schema + standalone `release/cipherstash-encrypt-v3.sql` installer.** The `eql_v3` encrypted-domain surface no longer depends on `eql_v2` at runtime: it now owns its own copies of the searchable-encrypted-metadata (SEM) index-term types — `eql_v3.hmac_256` and `eql_v3.ore_block_u64_8_256` (with its btree operator class) — so the `eql_v3.eq_term` / `eql_v3.ord_term` extractors return `eql_v3` types and no `eql_v2.<symbol>` appears anywhere in the v3 SQL. The whole v3 surface relocated under a single `src/v3/` tree (`src/v3/sem/` for the hand-written SEM types, `src/v3/scalars/` for the generated domain families). A new build variant ships the `eql_v3` schema on its own as `release/cipherstash-encrypt-v3.sql`, installable into a database with no `eql_v2` present; a CI gate greps that artifact and its dependency closure to keep it `eql_v2`-free. Why: a clean foundation for the per-scalar encrypted-domain model to stand alone, ahead of it replacing the `eql_v2_encrypted` composite column type. This is additive — a new schema and a new artifact — and leaves `eql_v2` byte-for-byte unchanged. ([#255](https://github.com/cipherstash/encrypt-query-language/pull/255))
 
 ### Changed
```
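The `text` entry describes match as bloom-filter containment over ngrams rather than SQL `LIKE`. The sketch below illustrates that idea on plaintext only, under stated assumptions that do not come from this PR: 3-character grams, a 256-bit filter, three bit positions per gram, and `std`'s `DefaultHasher` standing in for whatever keyed hash the real `eql_v3.bloom_filter` term uses. `a @> b` is modelled as "every bit set in `b` is also set in `a`", with `<@` as its mirror.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative parameters only (assumptions): the real term's width,
// hash count and ngram scheme are not shown in this PR.
const M_BITS: usize = 256;
const K_HASHES: u64 = 3;
const NGRAM: usize = 3;

/// A fixed-width bloom filter stored as 64-bit words.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Bloom([u64; M_BITS / 64]);

/// Character ngrams of `s`; strings shorter than `NGRAM` yield none.
fn ngrams(s: &str) -> Vec<String> {
    let chars: Vec<char> = s.chars().collect();
    chars
        .windows(NGRAM)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Bit position for one (gram, seed) pair. `DefaultHasher` is a stand-in
/// for a keyed HMAC and is NOT secret-keyed.
fn bit_index(gram: &str, seed: u64) -> usize {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    gram.hash(&mut h);
    (h.finish() % M_BITS as u64) as usize
}

/// Build a filter by setting `K_HASHES` bits for every ngram.
pub fn bloom(s: &str) -> Bloom {
    let mut words = [0u64; M_BITS / 64];
    for gram in ngrams(s) {
        for seed in 0..K_HASHES {
            let bit = bit_index(&gram, seed);
            words[bit / 64] |= 1u64 << (bit % 64);
        }
    }
    Bloom(words)
}

/// `a @> b`: every bit set in `b` is also set in `a`.
pub fn contains(a: &Bloom, b: &Bloom) -> bool {
    a.0.iter().zip(b.0.iter()).all(|(x, y)| (x & y) == *y)
}

/// `a <@ b` is `b @> a`, which is why each operator is the other's commutator.
pub fn contained_by(a: &Bloom, b: &Bloom) -> bool {
    contains(b, a)
}
```

With these helpers, `bloom("aardvark")` contains `bloom("aard")` because every 3-gram of `"aard"` is also a 3-gram of `"aardvark"`. Since only bit positions are compared, containment can report false positives (hash collisions) but never a false negative for a true ngram subset, and there is no notion of anchoring or wildcards. In this sketch a string shorter than three characters produces an empty filter, which every filter contains.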

crates/eql-codegen/src/context.rs

Lines changed: 38 additions & 14 deletions
```diff
@@ -310,19 +310,43 @@ mod tests {
     #[test]
     fn operator_entry_emits_metadata_only_when_supported() {
         use crate::operator_surface::operator;
-        // Supported comparison operator carries its planner metadata.
-        let eq = operator_entry(&operator("="), "eql_v3.int4_eq", "eql_v3.int4_eq", true);
-        assert_eq!(eq.symbol, "=");
-        assert_eq!(eq.function_name, "eq");
-        assert_eq!(
-            eq.metadata.as_deref(),
-            Some("COMMUTATOR = =, NEGATOR = <>, RESTRICT = eqsel, JOIN = eqjoinsel")
-        );
-        // The same operator, unsupported on this domain → no metadata line.
-        let eq_unsupported = operator_entry(&operator("="), "eql_v3.int4", "eql_v3.int4", false);
-        assert_eq!(eq_unsupported.metadata, None);
-        // Supported but metadata-less operator (`@>`) → still no metadata line.
-        let contains = operator_entry(&operator("@>"), "eql_v3.int4_eq", "eql_v3.int4_eq", true);
-        assert_eq!(contains.metadata, None);
+
+        // (symbol, domain, supported) -> expected `CREATE OPERATOR` metadata
+        // clause. Adding a term that carries operator metadata is one new row
+        // here, not another hand-rolled assertion block.
+        let cases: &[(&str, &str, bool, Option<&str>)] = &[
+            // Supported comparison operator carries its planner metadata.
+            (
+                "=",
+                "eql_v3.int4_eq",
+                true,
+                Some("COMMUTATOR = =, NEGATOR = <>, RESTRICT = eqsel, JOIN = eqjoinsel"),
+            ),
+            // The same operator, unsupported on this domain → no metadata line.
+            ("=", "eql_v3.int4", false, None),
+            // Supported but metadata-less operator (`->`) → still no metadata.
+            ("->", "eql_v3.int4_eq", true, None),
+            // `@>` carries containment metadata when supported (the Bloom
+            // `text_match` path).
+            (
+                "@>",
+                "eql_v3.text_match",
+                true,
+                Some("COMMUTATOR = <@, RESTRICT = contsel, JOIN = contjoinsel"),
+            ),
+            // ... but suppressed when `@>` is a blocker (non-Bloom domains),
+            // which is why the int4 golden is unchanged.
+            ("@>", "eql_v3.int4_eq", false, None),
+        ];
+
+        for (symbol, dom, supported, expected) in cases {
+            let entry = operator_entry(&operator(symbol), dom, dom, *supported);
+            assert_eq!(entry.symbol, *symbol);
+            assert_eq!(
+                entry.metadata.as_deref(),
+                *expected,
+                "operator {symbol} on {dom} (supported={supported})",
+            );
+        }
     }
 }
```

crates/eql-codegen/src/operator_surface.rs

Lines changed: 34 additions & 4 deletions
```diff
@@ -33,7 +33,7 @@ impl OperatorMetadata {
     }
 
     /// Render the `CREATE OPERATOR` metadata clause, or `None` when no hint is
-    /// present (the `@>`/`<@` symmetric-but-empty case collapses to `None`).
+    /// present (e.g. the path-selector operators, which carry no metadata).
     pub fn render(self) -> Option<String> {
         let mut extras = Vec::new();
         if let Some(c) = self.commutator {
@@ -208,6 +208,18 @@ const fn cmp_metadata(
     }
 }
 
+/// Containment-operator metadata (`@>` / `<@`): commutator is the mirror
+/// operator, no negator (a non-containment is not another listed operator),
+/// containment selectivity estimators.
+const fn containment_metadata(commutator: &'static str) -> OperatorMetadata {
+    OperatorMetadata {
+        restrict: Some("contsel"),
+        join: Some("contjoinsel"),
+        commutator: Some(commutator),
+        negator: None,
+    }
+}
+
 /// The 20-operator catalog. Order is: comparison operators, then path-selector
 /// operators, then the remaining native jsonb operators.
 pub const OPERATORS: &[Operator] = &[
@@ -251,13 +263,13 @@ pub const OPERATORS: &[Operator] = &[
         symbol: "@>",
         function_name: "contains",
         signatures: BOOL_SYMMETRIC_SIGNATURES,
-        metadata: OperatorMetadata::none(),
+        metadata: containment_metadata("<@"),
     },
     Operator {
         symbol: "<@",
         function_name: "contained_by",
         signatures: BOOL_SYMMETRIC_SIGNATURES,
-        metadata: OperatorMetadata::none(),
+        metadata: containment_metadata("@>"),
     },
     Operator {
         symbol: "->",
@@ -519,7 +531,25 @@ mod tests {
             "COMMUTATOR = =, NEGATOR = <>, RESTRICT = eqsel, JOIN = eqjoinsel"
         );
         assert_eq!(operator("->").metadata.render(), None);
-        assert_eq!(operator("@>").metadata.render(), None);
+        // `@>`/`<@` now carry containment metadata (no negator).
+        assert_eq!(
+            operator("@>").metadata.render().unwrap(),
+            "COMMUTATOR = <@, RESTRICT = contsel, JOIN = contjoinsel"
+        );
+    }
+
+    #[test]
+    fn containment_operators_have_containment_metadata() {
+        let c = operator("@>");
+        assert_eq!(c.metadata.commutator, Some("<@"));
+        assert_eq!(c.metadata.restrict, Some("contsel"));
+        assert_eq!(c.metadata.join, Some("contjoinsel"));
+        assert_eq!(c.metadata.negator, None);
+        let cb = operator("<@");
+        assert_eq!(cb.metadata.commutator, Some("@>"));
+        assert_eq!(cb.metadata.restrict, Some("contsel"));
+        assert_eq!(cb.metadata.join, Some("contjoinsel"));
+        assert_eq!(cb.metadata.negator, None);
     }
 
     #[test]
```
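The golden strings in these tests fix the clause order as `COMMUTATOR`, `NEGATOR`, `RESTRICT`, `JOIN`, with absent hints omitted. Only the first few lines of the real `render` appear in the diff, so the following is a hypothetical, self-contained reimplementation consistent with those strings. `containment_metadata` is copied from the diff; `metadata_for` is an invented helper that mirrors the supported/unsupported gating checked by the `context.rs` test. `contsel` and `contjoinsel` are PostgreSQL's built-in containment selectivity estimators.

```rust
/// Planner hints for one `CREATE OPERATOR` (hypothetical mirror of the
/// crate's struct; field names follow the diff's tests).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OperatorMetadata {
    pub commutator: Option<&'static str>,
    pub negator: Option<&'static str>,
    pub restrict: Option<&'static str>,
    pub join: Option<&'static str>,
}

impl OperatorMetadata {
    pub const fn none() -> Self {
        OperatorMetadata { commutator: None, negator: None, restrict: None, join: None }
    }

    /// Emit present hints in the order the golden strings use.
    pub fn render(self) -> Option<String> {
        let mut extras = Vec::new();
        if let Some(c) = self.commutator {
            extras.push(format!("COMMUTATOR = {c}"));
        }
        if let Some(n) = self.negator {
            extras.push(format!("NEGATOR = {n}"));
        }
        if let Some(r) = self.restrict {
            extras.push(format!("RESTRICT = {r}"));
        }
        if let Some(j) = self.join {
            extras.push(format!("JOIN = {j}"));
        }
        // No hints at all collapses to `None` (no metadata line).
        if extras.is_empty() { None } else { Some(extras.join(", ")) }
    }
}

/// As in the diff: mirror commutator, no negator, containment estimators.
pub const fn containment_metadata(commutator: &'static str) -> OperatorMetadata {
    OperatorMetadata {
        restrict: Some("contsel"),
        join: Some("contjoinsel"),
        commutator: Some(commutator),
        negator: None,
    }
}

/// Hypothetical helper: metadata is only emitted when the operator is
/// supported on the domain being generated.
pub fn metadata_for(meta: OperatorMetadata, supported: bool) -> Option<String> {
    if supported { meta.render() } else { None }
}
```

Leaving out `NEGATOR` is deliberate in the diff: PostgreSQL's `NEGATOR` must name another operator, and "not contained" is not one of the catalogued operators.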

crates/eql-scalars/src/fixture.rs

Lines changed: 4 additions & 1 deletion
```diff
@@ -25,7 +25,10 @@ impl Fixture {
                 Some(k) => Some(k.max_value()),
                 None => None,
             },
-            Fixture::Zero => Some(0),
+            Fixture::Zero => match kind.as_bounded_int() {
+                Some(_) => Some(0),
+                None => None,
+            },
             Fixture::Int(n) => Some(n),
             Fixture::Numeric(_)
             | Fixture::Text(_)
```
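After this change `Fixture::Zero` resolves the same way `Min`/`Max` do: only kinds with a bounded integer range have one, so for `text` (no numeric origin) it yields `None` instead of `Some(0)`. A self-contained sketch with stand-in types; `Kind`, `BoundedInt` and the method name `as_int` are hypothetical, and only the match shape follows the diff. Spelling the `Option` handling out as a `match` (rather than `Option::map` with a closure) also keeps the sketch usable as a `const fn`.

```rust
/// Stand-in for a bounded integer kind (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BoundedInt {
    I16,
    I32,
    I64,
}

impl BoundedInt {
    pub const fn min_value(self) -> i64 {
        match self {
            BoundedInt::I16 => i16::MIN as i64,
            BoundedInt::I32 => i32::MIN as i64,
            BoundedInt::I64 => i64::MIN,
        }
    }

    pub const fn max_value(self) -> i64 {
        match self {
            BoundedInt::I16 => i16::MAX as i64,
            BoundedInt::I32 => i32::MAX as i64,
            BoundedInt::I64 => i64::MAX,
        }
    }
}

/// Stand-in for the scalar kind (hypothetical subset).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind {
    Int2,
    Int4,
    Int8,
    Text,
}

impl Kind {
    /// `None` for kinds with no integer range, such as text.
    pub const fn as_bounded_int(self) -> Option<BoundedInt> {
        match self {
            Kind::Int2 => Some(BoundedInt::I16),
            Kind::Int4 => Some(BoundedInt::I32),
            Kind::Int8 => Some(BoundedInt::I64),
            Kind::Text => None,
        }
    }
}

#[derive(Clone, Copy, Debug)]
pub enum Fixture {
    Min,
    Max,
    Zero,
    Int(i64),
}

impl Fixture {
    /// Resolve a fixture to an integer for `kind`, if that kind has one.
    pub const fn as_int(self, kind: Kind) -> Option<i64> {
        match self {
            Fixture::Min => match kind.as_bounded_int() {
                Some(k) => Some(k.min_value()),
                None => None,
            },
            Fixture::Max => match kind.as_bounded_int() {
                Some(k) => Some(k.max_value()),
                None => None,
            },
            // The change in this diff: Zero is gated on a bounded int kind.
            Fixture::Zero => match kind.as_bounded_int() {
                Some(_) => Some(0),
                None => None,
            },
            Fixture::Int(n) => Some(n),
        }
    }
}
```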

crates/eql-scalars/src/lib.rs

Lines changed: 86 additions & 1 deletion
```diff
@@ -83,6 +83,7 @@ pub enum ScalarKind {
 pub enum Term {
     Hm,
     Ore,
+    Bloom,
 }
 
 /// A single fixture plaintext value, value-kind tagged: `Min`/`Max`/`Zero` are
@@ -298,9 +299,61 @@ pub const TIMESTAMPTZ: ScalarSpec = ScalarSpec {
     fixtures: TIMESTAMPTZ_FIXTURES,
 };
 
+/// Domains for `text`: the ordered shape plus a `_match` domain backed by the
+/// `Bloom` term (`@>`/`<@` containment). The ordered subset (`""`, `_eq`,
+/// `_ord_ore`, `_ord`) is identical to `ORDERED_INT_DOMAINS`; `_match` is the
+/// only addition, so text still runs the standard ordered matrix.
+const TEXT_DOMAINS: &[DomainSpec] = &[
+    DomainSpec {
+        suffix: "",
+        terms: &[],
+    },
+    DomainSpec {
+        suffix: "_eq",
+        terms: &[Term::Hm],
+    },
+    DomainSpec {
+        suffix: "_match",
+        terms: &[Term::Bloom],
+    },
+    DomainSpec {
+        suffix: "_ord_ore",
+        terms: &[Term::Ore],
+    },
+    DomainSpec {
+        suffix: "_ord",
+        terms: &[Term::Ore],
+    },
+];
+
+/// `text` fixture plaintexts — curated so eq/ord give a lexicographic spread
+/// and the match suite has a known substring pair (`"aardvark"`/`"aard"`,
+/// sharing 3-grams) and a disjoint value (`"zzzz"`, no shared 3-grams).
+/// `"aard"` is the lexicographic `min_pivot`, `"zzzz"` the `max_pivot`, and
+/// `"frank"` the interior `mid_pivot`; all three must be present verbatim so the
+/// matrix can fetch their ciphertext. All distinct.
+///
+/// The empty string is deliberately **not** a fixture: text is an ordered, not
+/// signed, scalar (no numeric origin), and `""` encrypts to an empty ORE term
+/// whose comparison is undefined (see issue #262). The interior pivot is a real
+/// median value, not `String::default()`.
+const TEXT_FIXTURES: &[Fixture] = fixtures!(text;
+    "aard", "aardvark", "alice", "bob", "carol",
+    "dave", "erin", "frank", "mallory", "trent", "zzzz");
+
+/// `text` — an ordered, non-integer, unbounded scalar. Adds a `_match` domain
+/// (the `Bloom` term) on top of the ordered shape. Public because the SQLx
+/// harness reads `TEXT_VALUES` (materialised below).
+pub const TEXT: ScalarSpec = ScalarSpec {
+    token: "text",
+    kind: ScalarKind::Text,
+    domains: TEXT_DOMAINS,
+    fixtures: TEXT_FIXTURES,
+};
+
 /// The scalar catalog — the single source of truth. Order is significant (it
 /// drives generation order). New types are appended as their SQL surface lands.
-pub const CATALOG: &[ScalarSpec] = &[INT4, INT2, INT8, DATE, TIMESTAMPTZ];
+pub const CATALOG: &[ScalarSpec] = &[INT4, INT2, INT8, DATE, TIMESTAMPTZ, TEXT];
 
 /// Materialise an integer scalar's fixtures into a typed `&'static` slice at
 /// compile time. This is the **single-sourced** plaintext list the SQLx test
```
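The `TEXT_FIXTURES` doc comment states several invariants: `"aard"` is the lexicographic minimum and `"zzzz"` the maximum, `"frank"` sits strictly between them, `"aardvark"`/`"aard"` share 3-grams while `"zzzz"` shares none with `"aardvark"`, every value is distinct, and `""` is absent. A small sketch that checks them. It assumes Rust's bytewise `str` ordering matches the ordering the ORE pivots rely on (the diff only says "lexicographic"), and it uses plain character 3-grams for the "shared 3-grams" claim; the helper names are hypothetical.

```rust
use std::collections::HashSet;

/// Copy of the fixture plaintexts from the diff.
pub const TEXT_FIXTURES: &[&str] = &[
    "aard", "aardvark", "alice", "bob", "carol",
    "dave", "erin", "frank", "mallory", "trent", "zzzz",
];

/// Character 3-grams of `s` as a set.
fn trigrams(s: &str) -> HashSet<String> {
    let chars: Vec<char> = s.chars().collect();
    chars.windows(3).map(|w| w.iter().collect::<String>()).collect()
}

/// Lexicographic (min, max) of the fixtures, or `None` if empty.
pub fn lexicographic_pivots(values: &[&'static str]) -> Option<(&'static str, &'static str)> {
    let min = values.iter().copied().min()?;
    let max = values.iter().copied().max()?;
    Some((min, max))
}

/// All values distinct and none of them the empty string.
pub fn all_distinct_and_non_empty(values: &[&str]) -> bool {
    let set: HashSet<&str> = values.iter().copied().collect();
    set.len() == values.len() && !set.contains("")
}

/// Number of 3-grams two strings have in common.
pub fn shared_trigrams(a: &str, b: &str) -> usize {
    trigrams(a).intersection(&trigrams(b)).count()
}
```

`"aard"` sorts before `"aardvark"` because a proper prefix compares less, so the minimum pivot does not depend on the other fixtures' first letters.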
```diff
@@ -353,5 +406,37 @@ int_values!(INT4_VALUES, i32, INT4);
 int_values!(INT2_VALUES, i16, INT2);
 int_values!(INT8_VALUES, i64, INT8);
 
+/// Materialise a `text` scalar's fixtures into a `&'static [&'static str]` at
+/// compile time — the single-sourced plaintext list the SQLx matrix reads via
+/// `ScalarType::fixture_values()` and the fixture generator encrypts. Unlike
+/// `date` (chrono is not `const`-friendly), a `Fixture::Text(&'static str)` is
+/// already const, so text materialises a typed slice like the integer kinds.
+/// A non-text fixture is a const-eval panic (compile-time guard).
+macro_rules! text_values {
+    ($name:ident, $spec:expr) => {
+        #[doc = concat!("Distinct plaintext fixture values for `", stringify!($spec), "`, ")]
+        #[doc = "materialised from its `CATALOG` row (see `text_values!`)."]
+        pub const $name: &[&'static str] = {
+            const SPEC: ScalarSpec = $spec;
+            const N: usize = SPEC.fixtures.len();
+            const ARR: [&'static str; N] = {
+                let mut out = [""; N];
+                let mut i = 0;
+                while i < N {
+                    out[i] = match SPEC.fixtures[i] {
+                        Fixture::Text(s) => s,
+                        _ => panic!("text scalar fixture must be Fixture::Text"),
+                    };
+                    i += 1;
+                }
+                out
+            };
+            &ARR
+        };
+    };
+}
+
+text_values!(TEXT_VALUES, TEXT);
+
 #[cfg(test)]
 mod tests;
```
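Stripped to its essentials, the `text_values!` pattern can be exercised with stub types. `ScalarSpec` and `Fixture` below are trimmed stand-ins (one spec with three fixtures), not the crate's definitions; the macro body follows the diff minus its `#[doc]` attributes. `FIRST` and the `const _` assertion are added here to show that the materialised slice is usable in const position.

```rust
/// Trimmed stand-in for the crate's fixture enum.
#[derive(Clone, Copy, Debug)]
pub enum Fixture {
    Int(i64),
    Text(&'static str),
}

/// Trimmed stand-in for the crate's `ScalarSpec` (fixtures only).
pub struct ScalarSpec {
    pub fixtures: &'static [Fixture],
}

pub const TEXT: ScalarSpec = ScalarSpec {
    fixtures: &[Fixture::Text("aard"), Fixture::Text("aardvark"), Fixture::Text("zzzz")],
};

// Same shape as the diff's macro: copy each `Fixture::Text` payload into a
// fixed-size array during const evaluation, then expose it as a slice.
macro_rules! text_values {
    ($name:ident, $spec:expr) => {
        pub const $name: &[&'static str] = {
            const SPEC: ScalarSpec = $spec;
            const N: usize = SPEC.fixtures.len();
            const ARR: [&'static str; N] = {
                let mut out = [""; N];
                let mut i = 0;
                while i < N {
                    out[i] = match SPEC.fixtures[i] {
                        Fixture::Text(s) => s,
                        // Any other variant aborts const evaluation.
                        _ => panic!("text scalar fixture must be Fixture::Text"),
                    };
                    i += 1;
                }
                out
            };
            &ARR
        };
    };
}

text_values!(TEXT_VALUES, TEXT);

// The slice is a true constant: it can be indexed and checked at compile time.
pub const FIRST: &str = TEXT_VALUES[0];
const _: () = assert!(TEXT_VALUES.len() == 3);
```

Adding, say, a `Fixture::Int(0)` to `TEXT.fixtures` makes const evaluation reach the `panic!`, which rustc reports as a compile error rather than a test-time failure; that is the compile-time guard the doc comment describes.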

crates/eql-scalars/src/term.rs

Lines changed: 6 additions & 0 deletions
```diff
@@ -11,6 +11,7 @@ impl Term {
         match self {
             Term::Hm => "hm",
             Term::Ore => "ob",
+            Term::Bloom => "bf",
         }
     }
 
@@ -19,6 +20,7 @@ impl Term {
         match self {
             Term::Hm => "eq_term",
             Term::Ore => "ord_term",
+            Term::Bloom => "match_term",
         }
     }
 
@@ -27,6 +29,7 @@ impl Term {
         match self {
             Term::Hm => "hmac_256",
             Term::Ore => "ore_block_u64_8_256",
+            Term::Bloom => "bloom_filter",
         }
     }
 
@@ -35,6 +38,7 @@ impl Term {
         match self {
             Term::Hm => "eq",
             Term::Ore => "ord",
+            Term::Bloom => "match",
         }
     }
 
@@ -43,6 +47,7 @@ impl Term {
         match self {
             Term::Hm => &["=", "<>"],
             Term::Ore => &["=", "<>", "<", "<=", ">", ">="],
+            Term::Bloom => &["@>", "<@"],
         }
     }
 
@@ -54,6 +59,7 @@ impl Term {
                 "src/v3/sem/ore_block_u64_8_256/functions.sql",
                 "src/v3/sem/ore_block_u64_8_256/operators.sql",
             ],
+            Term::Bloom => &["src/v3/sem/bloom_filter/functions.sql"],
         }
     }
 }
```
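Read together with `TEXT_DOMAINS`, these tables suggest how each domain gets its operator surface: a domain supports an operator when one of its terms lists it. The sketch below models that rule; the per-term tables are copied from the diff, while `supported_operators` and `is_supported` are hypothetical helpers, not functions from `eql-scalars`. It shows why `text_match` (`[Term::Bloom]`) gets `@>` / `<@` but not `=`, in line with the changelog's note that match never backs equality. How the real generator emits unsupported operators (the `context.rs` test calls them blockers) is not modelled.

```rust
/// Index terms, mirroring the enum extended in this PR.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Term {
    Hm,
    Ore,
    Bloom,
}

impl Term {
    /// Short key, as in the diff.
    pub const fn key(self) -> &'static str {
        match self {
            Term::Hm => "hm",
            Term::Ore => "ob",
            Term::Bloom => "bf",
        }
    }

    /// Extractor function name, as in the diff.
    pub const fn extractor(self) -> &'static str {
        match self {
            Term::Hm => "eq_term",
            Term::Ore => "ord_term",
            Term::Bloom => "match_term",
        }
    }

    /// Operators the term can back, as in the diff.
    pub const fn operators(self) -> &'static [&'static str] {
        match self {
            Term::Hm => &["=", "<>"],
            Term::Ore => &["=", "<>", "<", "<=", ">", ">="],
            Term::Bloom => &["@>", "<@"],
        }
    }
}

/// Hypothetical helper: union of the terms' operators, in first-seen order.
pub fn supported_operators(terms: &[Term]) -> Vec<&'static str> {
    let mut out: Vec<&'static str> = Vec::new();
    for term in terms {
        for op in term.operators() {
            if !out.contains(op) {
                out.push(*op);
            }
        }
    }
    out
}

/// Hypothetical helper: does any of the domain's terms back `symbol`?
pub fn is_supported(terms: &[Term], symbol: &str) -> bool {
    terms.iter().any(|t| t.operators().contains(&symbol))
}
```

A storage-only domain (`terms: &[]`) supports nothing under this rule, and combining `Hm` with `Ore` adds no duplicate `=` / `<>` entries.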
