Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -227,6 +227,7 @@ tests/sqlx/migrations/001_install_eql.sql
tests/sqlx/fixtures/eql_v2*
tests/sqlx/fixtures/v3_ste_vec.sql
tests/sqlx/fixtures/v3_doc_int4.sql
+tests/sqlx/fixtures/v3_numeric_collision.sql

# Generated encrypted-domain SQL — regenerated by `tasks/build.sh` from the
# eql-scalars::CATALOG via `cargo run -p eql-codegen` on every build. The
11 changes: 8 additions & 3 deletions CHANGELOG.md

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions CLAUDE.md
@@ -53,7 +53,7 @@ This project uses `mise` for task management. Common commands:
This is the **Encrypt Query Language (EQL)** - a PostgreSQL extension for searchable encryption. Key architectural components:

### Core Structure
-- **Schema**: Core EQL functions/types are in the `eql_v2` PostgreSQL schema. The encrypted-domain type families (`int4` and future scalar domains) live in a separate `eql_v3` schema (see below). The `eql_v3` surface is **self-contained**: it owns its own copies of the searchable-encrypted-metadata (SEM) index-term types (`eql_v3.hmac_256`, `eql_v3.ore_block_u64_8_256`, hand-written under `src/v3/sem/`) and has no runtime dependency on `eql_v2`. `eql_v2` is unchanged and remains the documented public API.
+- **Schema**: Core EQL functions/types are in the `eql_v2` PostgreSQL schema. The encrypted-domain type families (`int4` and future scalar domains) live in a separate `eql_v3` schema (see below). The `eql_v3` surface is **self-contained**: it owns its own copies of the searchable-encrypted-metadata (SEM) index-term types (`eql_v3.hmac_256`, `eql_v3.ore_block_256`, hand-written under `src/v3/sem/`) and has no runtime dependency on `eql_v2`. `eql_v2` is unchanged and remains the documented public API.
- **Main Type**: `eql_v2_encrypted` - composite type for encrypted columns (stored as JSONB)
- **Configuration**: `eql_v2_configuration` table tracks encryption configs
- **Index Types**: Various encrypted index types (blake3, hmac_256, bloom_filter, ore variants)
@@ -64,7 +64,7 @@ This is the **Encrypt Query Language (EQL)** - a PostgreSQL extension for search
- `src/operators/` - SQL operators for encrypted data comparisons
- `src/config/` - Configuration management functions
- `src/blake3/`, `src/hmac_256/`, `src/bloom_filter/`, `src/ore_*` - Index implementations
-- `src/v3/` - Self-contained `eql_v3` surface: `src/v3/schema.sql`, forked `src/v3/crypto.sql` / `src/v3/common.sql`, hand-written SEM index-term types under `src/v3/sem/` (`hmac_256`, `ore_block_u64_8_256`), and the generated scalar encrypted-domain families under `src/v3/scalars/<T>/` (plus the shared blocker `src/v3/scalars/functions.sql`)
+- `src/v3/` - Self-contained `eql_v3` surface: `src/v3/schema.sql`, forked `src/v3/crypto.sql` / `src/v3/common.sql`, hand-written SEM index-term types under `src/v3/sem/` (`hmac_256`, `ore_block_256`), and the generated scalar encrypted-domain families under `src/v3/scalars/<T>/` (plus the shared blocker `src/v3/scalars/functions.sql`)
- `tasks/` - mise task scripts
- `tests/sqlx/` - Rust/SQLx test framework (PostgreSQL 14-17 support)
- `release/` - Generated SQL installation files
@@ -78,7 +78,7 @@ This is the **Encrypt Query Language (EQL)** - a PostgreSQL extension for search

### Encrypted-Domain Types

-`src/v3/scalars/` holds the generated **encrypted-domain type families** — jsonb-backed PostgreSQL domains in the **`eql_v3` schema**, one domain per operator/index capability (`eql_v3.<T>` storage-only, `eql_v3.<T>_eq`, `eql_v3.<T>_ord`). The schema qualifier replaces the old version-prefixed name, so the domains are `eql_v3.int4`, `eql_v3.int4_eq`, `eql_v3.int4_ord`, `eql_v3.int4_ord_ore` — created in `eql_v3`, not `public`. Their extractors/wrappers/aggregates (`eql_v3.eq_term`, `eql_v3.ord_term`, `eql_v3.eq`/`lt`/…, `eql_v3.min`/`max`) also live in `eql_v3`, and the SEM index-term types they return and construct (`eql_v3.hmac_256`, `eql_v3.ore_block_u64_8_256`) are **also `eql_v3`** — hand-written under `src/v3/sem/` so the whole v3 surface is self-contained (no `eql_v2.<symbol>` appears anywhere in v3 SQL; CI gates this via `mise run test:self_contained_v3` and the standalone `release/cipherstash-encrypt-v3.sql` installer). `eql_v3.int4` (PR #239, supersedes #225) is the reference scalar implementation; future scalar types such as `int8`, `bool`, `date`, `float`, `numeric`, `timestamp`, `text`, and `jsonb` follow this materializer pattern. `text`, `numeric`, and `jsonb` are planned but have no generated SQL surface yet — `jsonb` in particular needs a separate SQL design beyond the ordered-scalar materializer. The `eql-scalars` fixture catalog (`crates/eql-scalars`) already models their fixture values ahead of the SQL surface.
+`src/v3/scalars/` holds the generated **encrypted-domain type families** — jsonb-backed PostgreSQL domains in the **`eql_v3` schema**, one domain per operator/index capability (`eql_v3.<T>` storage-only, `eql_v3.<T>_eq`, `eql_v3.<T>_ord`). The schema qualifier replaces the old version-prefixed name, so the domains are `eql_v3.int4`, `eql_v3.int4_eq`, `eql_v3.int4_ord`, `eql_v3.int4_ord_ore` — created in `eql_v3`, not `public`. Their extractors/wrappers/aggregates (`eql_v3.eq_term`, `eql_v3.ord_term`, `eql_v3.eq`/`lt`/…, `eql_v3.min`/`max`) also live in `eql_v3`, and the SEM index-term types they return and construct (`eql_v3.hmac_256`, `eql_v3.ore_block_256`) are **also `eql_v3`** — hand-written under `src/v3/sem/` so the whole v3 surface is self-contained (no `eql_v2.<symbol>` appears anywhere in v3 SQL; CI gates this via `mise run test:self_contained_v3` and the standalone `release/cipherstash-encrypt-v3.sql` installer). `eql_v3.int4` (PR #239, supersedes #225) is the reference scalar implementation; future scalar types such as `int8`, `bool`, `date`, `float`, `numeric`, `timestamp`, `text`, and `jsonb` follow this materializer pattern. `text`, `numeric`, and `jsonb` are planned but have no generated SQL surface yet — `jsonb` in particular needs a separate SQL design beyond the ordered-scalar materializer. The `eql-scalars` fixture catalog (`crates/eql-scalars`) already models their fixture values ahead of the SQL surface.

Adding a scalar encrypted-domain type is one row in the Rust catalog `eql-scalars::CATALOG` (`crates/eql-scalars/src/lib.rs`): a `ScalarSpec` giving the type `token` (e.g. `int8`), its `ScalarKind` (the `kind` field), the `DomainSpec`s mapping each generated domain suffix to its fixed index `Term`s (`_eq => [Hm]`, `_ord`/`_ord_ore => [Ore]`), and the `Fixture` value list. Term capabilities are fixed in the `Term` enum's `impl` methods (with unit tests): `Hm` provides equality, and `Ore` provides equality plus ordering. There is no TOML manifest and no Python — the catalog is the source of truth, validated by the compiler (an undefined term or unknown scalar is a compile error) plus catalog `#[test]`s. `mise run build` runs `cargo run -p eql-codegen`, which regenerates the scalar SQL surface into `src/v3/scalars/<T>/` from `CATALOG` at the start of every build; that surface includes supported comparison wrappers plus blockers for native `jsonb` operators that would otherwise be reachable through domain fallback. `cargo run -p eql-codegen` regenerates every type at once (the same call `mise run build` uses; there is no per-type codegen task). The generated `*_types.sql` / `*_functions.sql` / `*_operators.sql` / `*_aggregates.sql` files are gitignored and never committed. The per-type plaintext fixture lists the SQLx matrix consumes are **not** a generated file — they are materialised from each `CATALOG` row at compile time as `eql_scalars::INT4_VALUES` / `INT2_VALUES` (the `int_values!` macro) and read directly by `ScalarType::FIXTURE_VALUES`; a Rust source of truth no longer round-trips through a committed generated `.rs`. Generated SQL carries a `-- AUTOMATICALLY GENERATED FILE` header (the project-wide marker `docs:validate` greps on); change the catalog and rebuild, never hand-edit. Hand-written SQL beyond the fixed surface goes in `src/v3/scalars/<T>/<T>_extensions.sql` with no auto-generated header and explicit `-- REQUIRE:` edges — that file IS committed. 
`text` and `jsonb` are out of scope for this scalar materializer.
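
As a rough illustration of the catalog row described above, the sketch below models one `ScalarSpec` with simplified stand-in types. The field names `token`, `kind`, `domains`, and `fixtures` follow the prose, and the `int4` / `ScalarKind::I32` pairing appears in the catalog itself. Everything else is an assumption made for illustration: the `terms` field name, the string-typed fixtures, the fixture values, and the `provides_equality` / `provides_ordering` / `domain_names` helpers are hypothetical and are not the real `eql-scalars` definitions.

```rust
/// Index terms a domain can carry (variant names as used in the catalog).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Term {
    Hm,
    Ore,
}

impl Term {
    /// Hypothetical helper: both terms support equality.
    const fn provides_equality(self) -> bool {
        match self {
            Term::Hm | Term::Ore => true,
        }
    }

    /// Hypothetical helper: only `Ore` supports ordering.
    const fn provides_ordering(self) -> bool {
        matches!(self, Term::Ore)
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ScalarKind {
    I32,
}

/// Maps one generated domain suffix to its fixed index terms.
/// The `terms` field name is an assumption.
struct DomainSpec {
    suffix: &'static str,
    terms: &'static [Term],
}

/// One catalog row (simplified: fixtures are plain strings here).
struct ScalarSpec {
    token: &'static str,
    kind: ScalarKind,
    domains: &'static [DomainSpec],
    fixtures: &'static [&'static str],
}

/// The four-domain ordered shape: storage, `_eq`, `_ord`, `_ord_ore`.
const ORDERED_DOMAINS: &[DomainSpec] = &[
    DomainSpec { suffix: "", terms: &[] },
    DomainSpec { suffix: "_eq", terms: &[Term::Hm] },
    DomainSpec { suffix: "_ord", terms: &[Term::Ore] },
    DomainSpec { suffix: "_ord_ore", terms: &[Term::Ore] },
];

/// Illustrative row; the fixture values are made up.
const INT4: ScalarSpec = ScalarSpec {
    token: "int4",
    kind: ScalarKind::I32,
    domains: ORDERED_DOMAINS,
    fixtures: &["-1", "0", "1"],
};

/// Schema-qualified domain names a row would generate.
fn domain_names(spec: &ScalarSpec) -> Vec<String> {
    spec.domains
        .iter()
        .map(|d| format!("eql_v3.{}{}", spec.token, d.suffix))
        .collect()
}
```

Under these assumptions, `domain_names(&INT4)` yields the four names listed earlier in this section (`eql_v3.int4`, `eql_v3.int4_eq`, `eql_v3.int4_ord`, `eql_v3.int4_ord_ore`).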

4 changes: 4 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion crates/eql-codegen/src/context.rs
@@ -133,7 +133,7 @@ pub struct FunctionsContext {
/// Build the inlinable index-extractor entry for a domain term.
///
/// The `RETURNS` type name equals the constructor name (`hmac_256`,
-/// `ore_block_u64_8_256`); qualify it with `SCHEMA` — the same schema as the
+/// `ore_block_256`); qualify it with `SCHEMA` — the same schema as the
/// body's constructor call — so the declared return type and the call stay in
/// lockstep. `Term::returns()` is intentionally not used.
pub fn extractor_entry(term: Term) -> FnEntry {
2 changes: 1 addition & 1 deletion crates/eql-codegen/src/generate.rs
Original file line number Diff line number Diff line change
Expand Up @@ -468,7 +468,7 @@ mod tests {
let sql = render_functions_file(s.token, domain(s, "_ord"));
assert_eq!(sql.matches("CREATE FUNCTION").count(), 45);
assert!(sql.contains("CREATE FUNCTION eql_v3.ord_term(a eql_v3.int4_ord)"));
-assert!(sql.contains("RETURNS eql_v3.ore_block_u64_8_256"));
+assert!(sql.contains("RETURNS eql_v3.ore_block_256"));
assert_eq!(
sql.matches("LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE")
.count(),
15 changes: 8 additions & 7 deletions crates/eql-scalars/src/kind.rs
@@ -97,11 +97,11 @@ impl ScalarKind {
}

/// A debug/identifier string for the kind: the canonical Rust plaintext type
-/// name (`"i32"`, `"chrono::NaiveDate"`). `Numeric`/`Jsonb` have **no
-/// generated SQL surface** and no catalog row, so calling this on them is a
-/// programming error and panics loudly rather than returning a plausible SQL
-/// token a premature caller might feed into codegen. Only call site today is
-/// `crates/eql-scalars/src/tests.rs`.
+/// name (`"i32"`, `"chrono::NaiveDate"`, `"rust_decimal::Decimal"`). `Jsonb`
+/// has **no generated SQL surface** and no catalog row, so calling this on it
+/// is a programming error and panics loudly rather than returning a plausible
+/// SQL token a premature caller might feed into codegen. Only call site today
+/// is `crates/eql-scalars/src/tests.rs`.
pub const fn rust_type(self) -> &'static str {
match self {
ScalarKind::I16 => "i16",
@@ -110,8 +110,9 @@ impl ScalarKind {
ScalarKind::Text => "text",
ScalarKind::Date => "chrono::NaiveDate",
ScalarKind::Timestamptz => "chrono::DateTime<Utc>",
-ScalarKind::Numeric | ScalarKind::Jsonb => {
-panic!("ScalarKind::rust_type: numeric/jsonb have no generated surface yet")
+ScalarKind::Numeric => "rust_decimal::Decimal",
+ScalarKind::Jsonb => {
+panic!("ScalarKind::rust_type: jsonb has no generated surface yet")
}
}
}
70 changes: 49 additions & 21 deletions crates/eql-scalars/src/lib.rs
@@ -230,13 +230,15 @@ const ORDERED_INT_DOMAINS: &[DomainSpec] = &[
},
];

-/// Equality-only domains: storage (no terms) + `_eq` (hm). Used by scalar types
-/// that can hash for equality but cannot (yet) be ordered. `timestamptz` is the
-/// first such type: cipherstash encrypts `Plaintext::Timestamp` at native
-/// 12-block ORE width, but EQL's only ORE comparator
-/// (`eql_v2.compare_ore_block_u64_8_256_term`) is hardcoded to 8 blocks, so an
-/// ordered domain would silently mis-order. Ordering is deferred until a
-/// wide-ORE (12-block) term exists.
+/// Equality-only domains: storage (no terms) + `_eq` (hm). The canonical shape
+/// for a scalar type that can hash for equality but is not ORE-orderable.
+/// **Currently unused:** `timestamptz` (the previous sole user) was promoted to
+/// the ordered shape once `eql_v3.compare_ore_block_256_term` generalized to N
+/// blocks and could order its native 12-block ORE width. Retained — and still
+/// validated as a known-valid shape by `every_type_uses_a_known_domain_shape` —
+/// so a future non-orderable scalar (e.g. a hash-only type) can reuse it without
+/// reconstructing the shape.
+#[allow(dead_code)]
const EQ_ONLY_DOMAINS: &[DomainSpec] = &[
DomainSpec {
suffix: "",
@@ -296,6 +298,17 @@ const TIMESTAMPTZ_FIXTURES: &[Fixture] = fixtures!(timestamptz;
"2012-06-30T11:59:59Z", "2016-03-15T08:15:30Z", "2020-10-21T14:45:00Z",
"2024-02-29T17:30:45Z", "2038-01-19T03:14:07Z", "2099-12-31T23:59:59Z");

+/// `numeric` fixture plaintexts — distinct by `Decimal` value, spanning sign,
+/// magnitude, and scale, and including `0` plus the min/max pivots
+/// (`-1000000000000` / `1000000000000`). They mirror `ore-rs`'s own
+/// order-pinning vectors so the 14-block ORE edges (sign + high/low blocks) are
+/// exercised. Each literal is distinct by parsed value (no `"1"`/`"1.0"`
+/// aliasing) — the harness `numeric_fixtures_distinct_by_value` guard enforces
+/// this, since the zero-dep catalog only dedupes by literal string.
+const NUMERIC_FIXTURES: &[Fixture] = fixtures!(numeric;
+"-1000000000000", "-1000000", "-1.001", "-1", "-0.5", "-0.001",
+"0", "0.001", "0.5", "0.999999999", "1", "1.001", "1000000", "1000000000000");

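The distinct-by-value requirement on the `numeric` fixtures can be illustrated with a dependency-free sketch. The real guard (`numeric_fixtures_distinct_by_value`) lives in the harness and, per the comment above, works on parsed `Decimal` values; its body is not shown in this diff. The `canonical` and `distinct_by_value` helpers below are hypothetical stand-ins that normalize plain decimal literals (optional sign, digits, optional fraction) as strings, so that `"1"` and `"1.0"` map to the same key. They do not handle exponents or malformed input.

```rust
use std::collections::HashSet;

/// Hypothetical helper: reduce a plain decimal literal to a canonical string
/// so that literals with the same numeric value produce the same key.
fn canonical(lit: &str) -> String {
    // Split off an optional sign.
    let (negative, body) = match lit.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, lit.strip_prefix('+').unwrap_or(lit)),
    };
    // Split integer and fractional digits; a missing fraction is empty.
    let (int_part, frac_part) = body.split_once('.').unwrap_or((body, ""));
    // Leading integer zeros and trailing fractional zeros carry no value.
    let int_part = int_part.trim_start_matches('0');
    let frac_part = frac_part.trim_end_matches('0');

    let is_zero = int_part.is_empty() && frac_part.is_empty();
    let mut out = String::new();
    // Treat "-0" and "0" as the same value.
    if negative && !is_zero {
        out.push('-');
    }
    out.push_str(if int_part.is_empty() { "0" } else { int_part });
    if !frac_part.is_empty() {
        out.push('.');
        out.push_str(frac_part);
    }
    out
}

/// True when no two literals share a canonical (value-level) form.
fn distinct_by_value(literals: &[&str]) -> bool {
    let mut seen = HashSet::new();
    literals.iter().all(|lit| seen.insert(canonical(lit)))
}
```

With this normalization the fourteen literals in `NUMERIC_FIXTURES` are pairwise distinct, while a list such as `["1", "1.0"]` is rejected even though a literal-string dedupe would accept it.
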
const INT4: ScalarSpec = ScalarSpec {
token: "int4",
kind: ScalarKind::I32,
@@ -332,27 +345,42 @@ pub const DATE: ScalarSpec = ScalarSpec {
fixtures: DATE_FIXTURES,
};

-/// `timestamptz` — an **equality-only** (UTC-normalized) non-integer scalar.
-/// Uses `EQ_ONLY_DOMAINS` (storage + `_eq`) rather than the four-domain ordered
-/// shape: cipherstash encrypts `Plaintext::Timestamp` at native 12-block ORE
-/// width, but EQL's only ORE comparator
-/// (`eql_v2.compare_ore_block_u64_8_256_term`) is hardcoded to 8 blocks, so an
-/// ordered timestamptz domain would silently mis-order. Ordering is deferred to
-/// a future PR that adds a wide-ORE (12-block) term. The three "pivot" fixture
-/// values are retained as equality pivots; the kind stays ordered-shaped
-/// (carries a rust type, no i128 range) so the harness can parse them.
+/// `timestamptz` — an **ordered**, UTC-normalized non-integer scalar. Uses the
+/// four-domain ordered shape (storage, `_eq`, `_ord`, `_ord_ore`): cipherstash
+/// encrypts `Plaintext::Timestamp` at native 12-block ORE width, which the
+/// generalized `eql_v3.compare_ore_block_256_term` comparator orders correctly.
+/// Values are UTC-normalized (cipherstash has no tz-preserving type) and encrypt
+/// under the `timestamp` cast.
///
/// Public (like `DATE`) because the SQLx harness reads `TIMESTAMPTZ.fixtures`
-/// directly to parse the RFC3339 strings into `chrono::DateTime<Utc>` at
-/// runtime — there is no `TIMESTAMPTZ_VALUES` const (chrono is not
-/// `const`-friendly and `eql-scalars` stays zero-dep).
+/// directly to parse the RFC3339 strings into `chrono::DateTime<Utc>` at runtime
+/// (no `TIMESTAMPTZ_VALUES` const; `eql-scalars` stays zero-dep).
pub const TIMESTAMPTZ: ScalarSpec = ScalarSpec {
token: "timestamptz",
kind: ScalarKind::Timestamptz,
-domains: EQ_ONLY_DOMAINS,
+domains: ORDERED_INT_DOMAINS,
fixtures: TIMESTAMPTZ_FIXTURES,
};

+/// `numeric` — an **ordered** non-integer scalar backed by
+/// `rust_decimal::Decimal`. Uses the four-domain ordered shape: cipherstash
+/// encrypts `Plaintext::Decimal` at native 14-block ORE width, which the
+/// generalized `eql_v3.compare_ore_block_256_term` comparator orders correctly.
+/// `numeric_value` returns `None` (no i128 range); ordering is supplied by the
+/// harness `Decimal: Ord`, which `ore-rs` guarantees agrees with the ciphertext
+/// order (equivalent scales collide, like `Decimal`'s own `Ord`).
+///
+/// Public (like `DATE` / `TIMESTAMPTZ`) so the SQLx harness reads
+/// `NUMERIC.fixtures` directly to parse the decimal strings into
+/// `rust_decimal::Decimal` at runtime (the catalog stays zero-dep: no
+/// `rust_decimal`).
+pub const NUMERIC: ScalarSpec = ScalarSpec {
+token: "numeric",
+kind: ScalarKind::Numeric,
+domains: ORDERED_INT_DOMAINS,
+fixtures: NUMERIC_FIXTURES,
+};

/// Domains for `text`: the ordered shape (with exact `hm` equality on the
/// ordered domains), a `_match` domain (`Bloom` containment), and a combined
/// `_search` domain carrying equality + ordering + match in one type.
@@ -417,7 +445,7 @@ pub const TEXT: ScalarSpec = ScalarSpec {

/// The scalar catalog — the single source of truth. Order is significant (it
/// drives generation order). New types are appended as their SQL surface lands.
-pub const CATALOG: &[ScalarSpec] = &[INT4, INT2, INT8, DATE, TIMESTAMPTZ, TEXT];
+pub const CATALOG: &[ScalarSpec] = &[INT4, INT2, INT8, DATE, TIMESTAMPTZ, NUMERIC, TEXT];

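One way to picture a catalog-wide check such as `every_type_uses_a_known_domain_shape` (named in the `EQ_ONLY_DOMAINS` comment; its body is not shown in this diff) is the sketch below. It uses a simplified stand-in `ScalarSpec` that records only domain suffixes, and it assumes shapes are compared by their suffix lists. The `ORDERED`, `EQ_ONLY`, and `KNOWN_SHAPES` constants and the `unknown_shapes` helper are hypothetical names, and the separate shape used by `text` is left out for brevity.

```rust
/// Simplified stand-in for a catalog row (assumption: only the fields this
/// check needs, with domains reduced to their suffixes).
struct ScalarSpec {
    token: &'static str,
    /// Domain suffixes, in declaration order.
    suffixes: &'static [&'static str],
}

/// Four-domain ordered shape: storage, `_eq`, `_ord`, `_ord_ore`.
const ORDERED: &[&str] = &["", "_eq", "_ord", "_ord_ore"];
/// Equality-only shape: storage plus `_eq`. Unused by any row in this
/// sketch, but still listed as a known shape.
const EQ_ONLY: &[&str] = &["", "_eq"];

const KNOWN_SHAPES: &[&[&str]] = &[ORDERED, EQ_ONLY];

/// Tokens of every row whose suffix list matches no known shape.
fn unknown_shapes(catalog: &[ScalarSpec]) -> Vec<&'static str> {
    catalog
        .iter()
        .filter(|spec| !KNOWN_SHAPES.contains(&spec.suffixes))
        .map(|spec| spec.token)
        .collect()
}
```

A test built on this would assert that `unknown_shapes` returns an empty list for the whole catalog, which keeps a retained but currently unused shape like the equality-only one in the set of accepted shapes.
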
/// Materialise an integer scalar's fixtures into a typed `&'static` slice at
/// compile time. This is the **single-sourced** plaintext list the SQLx test
6 changes: 3 additions & 3 deletions crates/eql-scalars/src/term.rs
@@ -28,7 +28,7 @@ impl Term {
pub const fn ctor(self) -> &'static str {
match self {
Term::Hm => "hmac_256",
-Term::Ore => "ore_block_u64_8_256",
+Term::Ore => "ore_block_256",
Term::Bloom => "bloom_filter",
}
}
@@ -57,8 +57,8 @@ impl Term {
match self {
Term::Hm => &["src/v3/sem/hmac_256/functions.sql"],
Term::Ore => &[
-"src/v3/sem/ore_block_u64_8_256/functions.sql",
-"src/v3/sem/ore_block_u64_8_256/operators.sql",
+"src/v3/sem/ore_block_256/functions.sql",
+"src/v3/sem/ore_block_256/operators.sql",
],
Term::Bloom => &["src/v3/sem/bloom_filter/functions.sql"],
}
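
The `extractor_entry` comment in `crates/eql-codegen/src/context.rs` says the extractor's `RETURNS` type is the constructor name qualified with `SCHEMA`, so a rename in `Term::ctor` flows through to the declared return type. A minimal sketch of that relationship follows. The `ctor` arms mirror the ones shown above; the value of `SCHEMA` (`eql_v3`) is inferred from the `generate.rs` assertion, and `returns_type` is a hypothetical helper rather than the actual `eql-codegen` API.

```rust
#[derive(Clone, Copy)]
enum Term {
    Hm,
    Ore,
    Bloom,
}

impl Term {
    /// Constructor name for the term (arms as shown in `term.rs`).
    const fn ctor(self) -> &'static str {
        match self {
            Term::Hm => "hmac_256",
            Term::Ore => "ore_block_256",
            Term::Bloom => "bloom_filter",
        }
    }
}

/// Assumed schema qualifier for the v3 surface.
const SCHEMA: &str = "eql_v3";

/// Hypothetical helper: the schema-qualified type an extractor would declare
/// in its `RETURNS` clause, derived from the same name as the constructor.
fn returns_type(term: Term) -> String {
    format!("{}.{}", SCHEMA, term.ctor())
}
```

Because the return type is derived from `ctor` instead of being spelled out separately, the rename to `ore_block_256` only has to happen in one match arm for the generated `RETURNS eql_v3.ore_block_256` clause to follow.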