31 changes: 19 additions & 12 deletions .github/workflows/README.md
@@ -107,7 +107,7 @@ All jobs run on `blacksmith-16vcpu-ubuntu-2204`. "PG set" follows the event
| **test** (sharded) | `test:sqlx:partition` | Run the archived sqlx binaries (default features), hash-partitioned across shards | yes (per PG) | no (replays archive) |
| **e2e** | `test:sqlx:e2e` | The `proptest-e2e` fresh-encryption property suite (`e2e_oracle`) — PG17 only, version-independent | yes (PG17) | **yes** |
| **validate** (per PG) | `docs:validate:documented-sql` + `test:clean_install_v3` | DB-backed SQL doc-syntax check; clean-DB `eql_v3` install smoke | yes | no |
-| **docs-static** | `docs:validate:source` | SQL doxygen coverage + required-tags (DB-free); **unconditional — runs on every PR incl. docs-only** | no | no |
+| **docs-static** | `docs:validate:source` | SQL doxygen coverage + required-tags (DB-free); relevance-gated like the other heavy jobs (its inputs — `src/**`, the `crates/**` codegen build, `tasks/docs/**` — are a subset of the `relevant` filter) | no | no |
| **schema** | `test:schema` | v2.2 / v2.3 payload JSON-schema validation | no | no |
| **rust-crates** | `test:crates` + `types:check` | `cargo fmt --check`, clippy + `cargo test` for `eql-scalars` / `eql-codegen` / `eql-tests-macros` / `eql-types`; verify TS bindings + JSON schemas are fresh | no | no |
| **codegen** | `codegen:parity` | Generated encrypted-domain SQL matches the golden output | no | no |
@@ -145,19 +145,26 @@ non-default feature and needs `CS_*` at run time).
stale `cargo expand` snapshot surfaces on the daily schedule, not on the PR
that introduced it. Accepted trade-off.

-2. **`docs/**` markdown is not content-validated.** The `docs-static` job
-guarantees the SQL `--!` doxygen comments under `src/**` are always checked,
-but nothing lints the prose/links in `docs/**` itself. A docs-only PR now runs
-`docs-static` (so it is no longer un-gated), but that job validates *source*
-documentation, not the markdown the PR changed. Adding a markdown
-linter/link-checker is a separate, unfilled capability.
+2. **`docs/**` markdown is not content-validated.** The `docs-static` job checks
+the SQL `--!` doxygen comments under `src/**`, not the prose/links in `docs/**`
+itself. A markdown-only PR leaves `relevant` false, so `docs-static` is skipped
+along with the other heavy jobs — and that loses no coverage, because the job's
+inputs (`src/**` `.sql`/`.template`, the `crates/**` codegen build, the
+`tasks/docs/**` scripts) are all in the `relevant` filter, so a PR that doesn't
+trip `relevant` cannot change its outcome. Linting the markdown the PR actually
+changed (prose/links) is a separate, unfilled capability.
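
The no-lost-coverage argument in item 2 is a subset check, which can be sketched in Python. This is an illustrative sketch, not the workflow's actual filter: the three `docs-static` input globs come from the text above, while the extra `RELEVANT` entry, the helper names, and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase

# From the text: the inputs that can change the docs-static outcome.
DOCS_STATIC_INPUTS = ["src/**", "crates/**", "tasks/docs/**"]
# Hypothetical: the real `relevant` filter is not shown here. The only
# property relied on is that it contains every docs-static input.
RELEVANT = DOCS_STATIC_INPUTS + ["tests/**"]


def touches(changed_files, patterns):
    """True if any changed path matches any glob (fnmatch `*` also spans `/`)."""
    return any(fnmatchcase(f, p) for f in changed_files for p in patterns)


def docs_static_should_run(changed_files):
    # The gate: the same relevance condition as the other heavy jobs.
    return touches(changed_files, RELEVANT)


def docs_static_outcome_can_change(changed_files):
    return touches(changed_files, DOCS_STATIC_INPUTS)


# Made-up example PRs: markdown-only, codegen change, SQL source change.
for pr in (["docs/guide.md"], ["crates/eql-codegen/src/lib.rs"], ["src/v3/sem/hmac.sql"]):
    run = docs_static_should_run(pr)
    matters = docs_static_outcome_can_change(pr)
    # Skipping is only unsafe if the job is skipped although its inputs changed.
    assert run or not matters
    print(pr, "run" if run else "skip")
```

As long as every input glob is also in the gate's filter, the assertion holds for any set of changed files, which is the property the paragraph relies on.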

### Recently closed

- *The e2e (fresh-encryption) suite never ran in CI.* Now covered by the **e2e**
job (`test:sqlx:e2e`), PG17, on relevant PRs + the queue.
-- *Docs-only PRs ran no doc validation.* The **docs-static** job now runs the
-source-only doc checks unconditionally on every PR.
+- *`docs-static` ran unconditionally on every PR.* It is now relevance-gated like
+every other heavy job. Because its inputs are a strict subset of the `relevant`
+filter, gating it both makes the workflow consistent (one uniform `if:`) and
+drops a redundant codegen build on markdown-only PRs without losing any
+coverage. A narrower bespoke `src/**`-only filter was rejected: it would risk a
+silent false-green (`ci-required` counts `skipped` as pass) by skipping on a
+real input change in `crates/**` or `tasks/docs/**`.

---

@@ -175,9 +182,9 @@ Then verify (see `docs/plans/2026-06-09-ci-pr-feedback-sharding-rollout.md`):
4 `Validate …` jobs + `build-archive`, `e2e`, `docs-static`, `schema`,
`rust-crates`, `codegen`, `self-contained-v3`, `matrix-coverage`, `splinter`) →
`ci-required` green → PR merges.
-- **Open a docs-only PR** → on its `pull_request` run the relevance-gated heavy
-jobs skip, but `docs-static` still runs; `ci-required` reports **Success** (not
-stuck *Pending*), so the PR can be queued.
+- **Open a docs-only PR** → on its `pull_request` run every relevance-gated heavy
+job skips (`docs-static` included); `ci-required` reports **Success** (not stuck
+*Pending*) because it counts `skipped` as pass, so the PR can be queued.
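
A minimal sketch of the `skipped`-counts-as-pass rule, assuming `ci-required` aggregates the per-job results that GitHub Actions exposes as `needs.<job>.result` (`success`, `failure`, `cancelled`, `skipped`). The function name and the shortened job list are illustrative and not copied from the workflow.

```python
# Results that do not block the merge. Treating "skipped" as passing is what
# lets a docs-only PR report Success instead of hanging in Pending.
PASSING = {"success", "skipped"}


def ci_required(job_results):
    """Return (verdict, blocking_jobs) for a mapping of job name -> result."""
    blocking = sorted(name for name, result in job_results.items()
                      if result not in PASSING)
    return ("failure", blocking) if blocking else ("success", [])


# Docs-only PR: every relevance-gated heavy job is skipped.
docs_only_pr = {
    "build-archive": "skipped",
    "e2e": "skipped",
    "docs-static": "skipped",
    "schema": "skipped",
}
print(ci_required(docs_only_pr))   # ('success', [])

# Relevant PR where one job fails: the aggregate goes red.
relevant_pr = dict(docs_only_pr, **{"docs-static": "failure", "schema": "success"})
print(ci_required(relevant_pr))    # ('failure', ['docs-static'])
```

Under this rule a skipped job is indistinguishable from a passing one, so a gate that skips a job on a real change to its inputs would yield a silent false-green. That is why the gate has to be at least as wide as the job's inputs.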

## References

8 changes: 4 additions & 4 deletions .github/workflows/test-eql.yml
@@ -268,10 +268,10 @@ jobs:
run: |
mise run postgres:up postgres-${POSTGRES_VERSION} --extra-args "--detach --wait"

-# Source-only doc checks (coverage + required-tags) moved to the
-# unconditional `docs-static` job so they run on every PR (incl. docs-only)
-# and exactly once, not per-Postgres. This step keeps only the DB-backed
-# SQL-syntax validation, which genuinely needs the per-version Postgres.
+# Source-only doc checks (coverage + required-tags) moved to the dedicated
+# `docs-static` job so they run exactly once, not per-Postgres. This step
+# keeps only the DB-backed SQL-syntax validation, which genuinely needs the
+# per-version Postgres.
- name: Validate documented SQL syntax (Postgres ${{ matrix.postgres-version }})
run: |
mise run docs:validate:documented-sql
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -33,7 +33,7 @@ Each entry that ships in a published release links to the PR that introduced it.
- **`eql_v3.text` encrypted-domain family (`text`, `text_eq`, `text_match`, `text_ord`, `text_ord_ore`, `text_search`).** Adds equality (`=` / `<>` via HMAC), match (`@>` / `<@` via a new self-contained `eql_v3.bloom_filter` SEM index term), and ORE ordering (`<` `<=` `>` `>=`, `min` / `max`) for encrypted text, at parity with EQL v2 text — generated from the `text` row in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. `text` is the first scalar to add a new index `Term` (`Bloom`) and the first non-integer, unbounded ordered kind (lexicographic pivots, hand-written `impl ScalarType`). The combined **`text_search`** domain carries all three capabilities in one type — `=` / `<>` via HMAC, `<` `<=` `>` `>=` / `min` / `max` via ORE, and `@>` / `<@` via bloom filter. Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` / `eql_v3.match_term` extractors, not an operator class on the domain. Why: brings searchable encrypted text to the namespaced, `eql_v2`-free `eql_v3` surface. Match is exposed as bloom-filter containment on the `text_match` / `text_search` domains — deliberately *not* SQL `LIKE` (no wildcard/anchoring; probabilistic ngram containment) — and never backs equality. **Equality on the ordered text domains (`text_ord`, `text_ord_ore`) and on `text_search` always routes `=` / `<>` through `hm` (exact HMAC), never the ORE term — ORE is not exact-equality for text** (integer ordered domains keep exact ORE equality, which is lossless for them). ([#260](https://github.com/cipherstash/encrypt-query-language/pull/260))
- **Self-contained `eql_v3` schema + standalone `release/cipherstash-encrypt-v3.sql` installer.** The `eql_v3` encrypted-domain surface no longer depends on `eql_v2` at runtime: it now owns its own copies of the searchable-encrypted-metadata (SEM) index-term types — `eql_v3.hmac_256` and `eql_v3.ore_block_256` (with its btree operator class) — so the `eql_v3.eq_term` / `eql_v3.ord_term` extractors return `eql_v3` types and no `eql_v2.<symbol>` appears anywhere in the v3 SQL. The whole v3 surface relocated under a single `src/v3/` tree (`src/v3/sem/` for the hand-written SEM types, `src/v3/scalars/` for the generated domain families). A new build variant ships the `eql_v3` schema on its own as `release/cipherstash-encrypt-v3.sql`, installable into a database with no `eql_v2` present; a CI gate greps that artifact and its dependency closure to keep it `eql_v2`-free. Why: a clean foundation for the per-scalar encrypted-domain model to stand alone, ahead of it replacing the `eql_v2_encrypted` composite column type. This is additive — a new schema and a new artifact — and leaves `eql_v2` byte-for-byte unchanged. ([#255](https://github.com/cipherstash/encrypt-query-language/pull/255))
- **`eql_v3.text` encrypted-domain family (`text`, `text_eq`, `text_match`, `text_ord`, `text_ord_ore`).** Adds equality (`=` / `<>` via HMAC), match (`@>` / `<@` via a new self-contained `eql_v3.bloom_filter` SEM index term), and ORE ordering (`<` `<=` `>` `>=`, `min` / `max`) for encrypted text, at parity with EQL v2 text — generated from the `text` row in `eql-scalars::CATALOG` by the same materializer as the `eql_v3.int4` reference. `text` is the first scalar to add a new index `Term` (`Bloom`) and the first non-integer, unbounded ordered kind (lexicographic pivots, hand-written `impl ScalarType`). Index via a functional index on the `eql_v3.eq_term` / `eql_v3.ord_term` / `eql_v3.match_term` extractors, not an operator class on the domain. Why: brings searchable encrypted text to the namespaced, `eql_v2`-free `eql_v3` surface. Match is exposed as bloom-filter containment on the `text_match` domain — deliberately *not* SQL `LIKE` (no wildcard/anchoring; probabilistic ngram containment) — and never backs equality (which always routes through `Hm`). ([#260](https://github.com/cipherstash/encrypt-query-language/pull/260))
-- **Property-based tests for the `eql_v3` encrypted scalar domains.** A harness of three suites asserts SQL operator results agree with a plaintext oracle across a generated input space: a pure-Rust **catalog** suite (no database) over the term/scalar catalog, a **fixture** suite that runs an all-pairs oracle over the committed real-ciphertext fixtures (the curated catalog values per type), and an **e2e** suite (gated behind the `proptest-e2e` cargo feature) that batch-encrypts freshly generated plaintexts end-to-end through ZeroKMS each run. Beyond the operator oracles, the fixture suite drives **function-double** oracles — the generated `eql_v3.eq`/`neq`/`lt`/`lte`/`gt`/`gte` functions across all three overloads (domain–domain, domain–jsonb, jsonb–domain) — plus **term-extractor identity** (`eq_term`==`hm`, `ord_term`==`ob`) and an example-based bloom **match** smoke for the text `_match` domain. Covers the equality (`=`/`<>`) and ordering (`<`/`<=`/`>`/`>=`, `ord_term` sort order) operator and function oracles plus NULL/blocker/CHECK edge cases, across every fixtured scalar (`int2`/`int4`/`int8`/`date`/`timestamptz`/`numeric`/`text`). The e2e suite — which appends fresh duplicate plaintexts each run — is the one that exercises equality across two independent encryptions of one value. Why: the prior matrix exercised fixed pivots only; property tests over the whole fixture set catch operator/oracle disagreements across the value space, and the e2e suite adds defence in depth by re-encrypting every run rather than pinning a frozen ciphertext snapshot. ([#293](https://github.com/cipherstash/encrypt-query-language/pull/293))
+- **Property-based tests for the `eql_v3` encrypted scalar domains.** A harness of three suites asserts SQL operator results agree with a plaintext oracle across a generated input space: a pure-Rust **catalog** suite (no database) over the term/scalar catalog, a **fixture** suite that runs an all-pairs oracle over the committed real-ciphertext fixtures (the curated catalog values per type), and an **e2e** suite (gated behind the `proptest-e2e` cargo feature) that batch-encrypts freshly generated plaintexts end-to-end through ZeroKMS each run. Beyond the operator oracles, the fixture suite drives **function-double** oracles — the generated `eql_v3.eq`/`neq`/`lt`/`lte`/`gt`/`gte` functions across all three overloads (domain–domain, domain–jsonb, jsonb–domain) — plus **term-extractor identity** (`eq_term`==`hm`, `ord_term`==`ob`) and an example-based bloom **match** smoke for the text `_match` domain. Covers the equality (`=`/`<>`) and ordering (`<`/`<=`/`>`/`>=`, `ord_term` sort order) operator and function oracles plus NULL/blocker/CHECK edge cases, across every fixtured scalar (`int2`/`int4`/`int8`/`date`/`timestamptz`/`numeric`/`text`). Equality across two independent encryptions of one value is exercised credential-free by the fixture suite via committed per-type *doubles* fixtures (each plaintext encrypted twice — `property::cross_ciphertext`), through both the `hm` (`_eq`) and ORE (`_ord`/`_ord_ore`) equality paths, and additionally by the e2e suite via fresh duplicate plaintexts each run. Why: the prior matrix exercised fixed pivots only; property tests over the whole fixture set catch operator/oracle disagreements across the value space, and the e2e suite adds defence in depth by re-encrypting every run rather than pinning a frozen ciphertext snapshot. ([#293](https://github.com/cipherstash/encrypt-query-language/pull/293))
- **Self-contained `eql_v3` schema + standalone `release/cipherstash-encrypt-v3.sql` installer.** The `eql_v3` encrypted-domain surface no longer depends on `eql_v2` at runtime: it now owns its own copies of the searchable-encrypted-metadata (SEM) index-term types — `eql_v3.hmac_256` and `eql_v3.ore_block_u64_8_256` (with its btree operator class) — so the `eql_v3.eq_term` / `eql_v3.ord_term` extractors return `eql_v3` types and no `eql_v2.<symbol>` appears anywhere in the v3 SQL. The whole v3 surface relocated under a single `src/v3/` tree (`src/v3/sem/` for the hand-written SEM types, `src/v3/scalars/` for the generated domain families). A new build variant ships the `eql_v3` schema on its own as `release/cipherstash-encrypt-v3.sql`, installable into a database with no `eql_v2` present; a CI gate greps that artifact and its dependency closure to keep it `eql_v2`-free. Why: a clean foundation for the per-scalar encrypted-domain model to stand alone, ahead of it replacing the `eql_v2_encrypted` composite column type. This is additive — a new schema and a new artifact — and leaves `eql_v2` byte-for-byte unchanged. ([#255](https://github.com/cipherstash/encrypt-query-language/pull/255))
- **`eql_v3.min` / `eql_v3.max` aggregates over `eql_v3.ste_vec_entry`.** SteVec document entries extracted at a selector (`doc -> 'sel'`) can now be aggregated like ordered scalars: `eql_v3.min(doc -> 'sel')` / `eql_v3.max(...)` return the entry with the smallest / largest ordered leaf. Ordering routes through the entry's `oc` (CLLW ORE) term via `eql_v3.ore_cllw` — the same comparator the entry `<` / `<=` / `>` / `>=` operators use, not the scalar Block-ORE `ord_term`. Only `oc`-carrying entries are orderable: an entry without an `oc` term (`eql_v3.ore_cllw` returns NULL) is non-orderable and is ignored by the aggregate — the same way the `eql_v3.ore_cllw` btree NULL-filters such rows — so a mix of `oc`-carrying and `oc`-less entries yields the extremum of the orderable subset rather than a corrupted result. Declared `PARALLEL = SAFE` with a combine function (the state function itself), so partial / parallel aggregation is available on large `GROUP BY` workloads. Why: brings encrypted-JSONB entry ordering to parity with the scalar encrypted-domain families' `MIN` / `MAX`, and lets the shared scalar behaviour matrix cover entry aggregation. Additive — the document and entry comparison surface is otherwise unchanged. ([#267](https://github.com/cipherstash/encrypt-query-language/pull/267))
- **`eql_v3.bool` encrypted-domain type family (storage-only / encryption-only).** A single jsonb-backed domain for encrypted `bool` columns — `eql_v3.bool` — generated from the `bool` row in `eql-scalars::CATALOG`. Unlike every other scalar family, `bool` is **encryption-only**: it carries no SEM index term and exposes **no** `_eq` / `_ord` domains, so the value is encrypted at rest and decrypted by the proxy but is **not searchable server-side**. This is deliberate — a two-value column has so little cardinality that any searchable index (even HMAC equality) would trivially leak the plaintext distribution. Every comparison / containment / path operator reachable through domain fallback (`=`, `<>`, `<`, `<=`, `>`, `>=`, `@>`, `<@`, `->`, `->>`, …) is blocked (raises rather than silently routing to plaintext-`jsonb` semantics); the domain `CHECK` still requires the EQL envelope (`v`, `i`), the ciphertext (`c`), and pins the payload version (`VALUE->>'v' = '2'`). The encrypted payload is `{v,i,c}` only — no `hm` / `ob` / `bf` term. Why: lets callers encrypt a low-cardinality boolean column at rest without offering a server-side search surface that would leak it; the first **storage-only** member of the generated scalar encrypted-domain family. ([#295](https://github.com/cipherstash/encrypt-query-language/pull/295))
9 changes: 9 additions & 0 deletions docs/analysis/2026-06-11-v3-scalar-vs-jsonb-test-coverage.md
@@ -130,6 +130,15 @@ Source of truth: `crates/eql-scalars/src` (`CATALOG`). Adding a type is one `Sca

Domain → role: empty ⇒ Storage, first term `Hm` ⇒ Eq, `Ore` ⇒ Ord, `Bloom` ⇒ Match.
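
The domain-to-role rule above can be sketched as a small lookup. This is a Python illustration only: the real mapping lives in the Rust `CATALOG`, and the function and constant names here are hypothetical.

```python
# First index term of a domain -> role, as stated in the rule above.
ROLE_BY_FIRST_TERM = {"Hm": "Eq", "Ore": "Ord", "Bloom": "Match"}


def domain_role(terms):
    """Map a domain's ordered term list to its role; no terms means storage-only."""
    if not terms:
        return "Storage"
    return ROLE_BY_FIRST_TERM[terms[0]]


print(domain_role([]))         # Storage
print(domain_role(["Hm"]))     # Eq
print(domain_role(["Ore"]))    # Ord
print(domain_role(["Bloom"]))  # Match
```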

+**Cross-ciphertext equality** ("two independent encryptions of one value compare
+equal") is now covered credential-free for every comparison scalar by the fixture
+suite, via committed per-type *doubles* tables (`fixtures.eql_v2_<T>_doubles` — each
+plaintext encrypted twice) read by `property::cross_ciphertext`. It exercises both
+equality mechanisms: the `hm` path (`_eq`) and the ORE `ob` path (`_ord` /
+`_ord_ore`, where `=` routes through `compare_ore_block_256_terms(...) = 0`). The
+matrix's curated fixtures have unique plaintexts, so the doubles tables are what
+make the equality-across-distinct-ciphertext branch fire without fresh encryption.
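
What a doubles fixture exercises can be illustrated with a toy model of the `hm` path. This is a sketch under stated assumptions, not EQL's encryption: the "ciphertext" is a randomized stand-in, the key is a demo constant, and `toy_encrypt` / `hm_equal` are made-up names. The ORE `ob` path is not modelled.

```python
import hashlib
import hmac
import os

DEMO_KEY = b"demo-key-not-a-real-secret"  # assumption: illustration only


def toy_encrypt(plaintext):
    """Return a payload with a randomized `c` and a deterministic `hm` term."""
    nonce = os.urandom(16)
    # Stand-in for a ciphertext: differs on every call. NOT real encryption.
    c = nonce.hex() + hashlib.sha256(nonce + plaintext).hexdigest()
    # Keyed, deterministic equality term: same plaintext -> same value.
    hm = hmac.new(DEMO_KEY, plaintext, hashlib.sha256).hexdigest()
    return {"c": c, "hm": hm}


def hm_equal(a, b):
    """Equality via the `hm` term only, never via the ciphertext bytes."""
    return hmac.compare_digest(a["hm"], b["hm"])


# A "double": one plaintext encrypted twice, independently.
first, second = toy_encrypt(b"alice"), toy_encrypt(b"alice")
assert first["c"] != second["c"]   # distinct ciphertexts
assert hm_equal(first, second)     # still compare equal
assert not hm_equal(first, toy_encrypt(b"bob"))
```

A fixture set with unique plaintexts never reaches the second assertion's branch, which is the gap the doubles tables close.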

**Generated surface per domain:** domain definition + CHECK, extractors (inlinable `LANGUAGE sql`), supported-op wrappers (inlinable), **blockers** (`LANGUAGE plpgsql`, NOT STRICT — opaque to planner so the `RAISE` always survives), 44 `CREATE OPERATOR`s, and `min`/`max` aggregates for ord-capable domains. Blocker count per domain: Storage 44, Eq 38, Ord 26, Match 38.

**Behavioural blocker coverage caveat:** the scalar matrix does not execute every generated operator signature per domain. It covers the important caller-visible blocker classes: unsupported comparison/containment operators, typed-column `col op col` blockers, scalar path blockers (`->`, `->>`), and native-absent LIKE/ILIKE resolution. It does not sweep every generated JSON-style signature such as `?`, `?|`, `?&`, `@?`, `@@`, `#>`, `#>>`, `-`, `#-`, and `||` for every scalar domain.