Summary
The eql_v2.hash_encrypted chain underpins GROUP BY / JOIN / DISTINCT on eql_v2_encrypted columns. PR #196 wired up the hash operator class correctly — the planner picks HashAggregate / engages bench_text_hmac_idx on equality joins — but per-row hashing is dominated by eql_v2.to_ste_vec_value's JSONB inspection and reconstruction. Even fully inlined (LANGUAGE sql IMMUTABLE), to_ste_vec_value's body parses the JSONB once to check val ? 'sv' AND jsonb_array_length(val->'sv') = 1, and on the THEN branch builds a new JSONB via meta_data(val) || (val->'sv'->0). That's the residual cost.
Fix is a fast-path read of root-level hm with to_ste_vec_value reserved for the rare single-element ste_vec-wrapped case.
Measured impact
10K-row bench fixture, PG 17, EXPLAIN ANALYZE. Each value is a fresh-DB run with the patch applied via CREATE OR REPLACE:
| Query |
current main (plpgsql + SET search_path) |
naive inline (plpgsql → SQL, body unchanged) |
fast-path (coalesce(val.data ->> 'hm', …)) |
SELECT count(*) FROM bench GROUP BY encrypted_text |
~309 ms |
~336 ms (no improvement) |
~73 ms |
SELECT count(*) FROM bench a JOIN bench b ON a.encrypted_text = b.encrypted_text |
~308 ms |
~285 ms (marginal) |
~185 ms |
SELECT DISTINCT encrypted_text FROM bench |
~515 ms |
~315 ms (some win from removing plpgsql call overhead, but cost is now dominated by HashAggregate output materialization × to_ste_vec_value) |
~72 ms |
Plan shapes don't change in any column — both naive-inline and fast-path produce the same HashAggregate / Nested Loop + Memoize structure as current main. The win is purely per-row CPU.
Fix shape
Same pattern as #189 / #201 plus a body change to hash_encrypted:
-
Rewrite eql_v2.hash_encrypted(eql_v2_encrypted) to short-circuit on root-level hm:
CREATE OR REPLACE FUNCTION eql_v2.hash_encrypted(val eql_v2_encrypted)
RETURNS integer
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE
AS $$
SELECT pg_catalog.hashtext(
coalesce(
val.data ->> 'hm',
eql_v2.hmac_256(eql_v2.to_ste_vec_value(val))::text
)
)
$$;
The coalesce is short-circuit-evaluating — when hm is present at root (the common case, including everything the bench fixture emits and everything @cipherstash/protect ships today), the second arm is never evaluated and to_ste_vec_value is never called. When hm is absent at root (single-element ste_vec wrapping), the unwrap kicks in and the existing semantics are preserved.
If hm is missing entirely, the function returns the hash of NULL rather than raising. Three options for that case, in increasing strictness:
- Accept it (NULL hashes to a stable bucket; equality will sort it out).
- Raise an explicit error inside the
coalesce via a CASE WHEN … THEN … ELSE raise_exception() END shim.
- Add a separate
CHECK constraint enforcing hm presence at config time.
Worth deciding as part of this issue.
-
Convert helpers to inlinable SQL so the planner can splat the chain into the calling query. These are no-op semantically (function bodies stay the same):
eql_v2.hmac_256(jsonb) and eql_v2.hmac_256(eql_v2_encrypted)
eql_v2.to_ste_vec_value(jsonb) and eql_v2.to_ste_vec_value(eql_v2_encrypted)
-
Allowlist hash_encrypted + the helpers in tasks/pin_search_path.sql so the post-install loop doesn't re-pin them.
-
Add splinter justifications in tasks/test/splinter.sh (same shape as the entries added in #201).
bloom_filter extractors are in the same plpgsql + SET search_path shape — worth doing in the same pass since they're called per row by inlined ~~ and the GIN functional index path. Probably negligible at the index-bound baseline (~0.20 ms for bare LIKE) but no reason not to bundle.
Out of scope (separate Phase 3)
Verification
Bench plan + regression assertions covering GROUP BY / JOIN / DISTINCT already exist in #203 (in-repo SQLx tests), #[ignore]'d pending this issue with thresholds set to the measured fast-path numbers + ~2x CI headroom. Remove the #[ignore] markers when this lands.
Summary
The
eql_v2.hash_encryptedchain underpinsGROUP BY/JOIN/DISTINCToneql_v2_encryptedcolumns. PR #196 wired up the hash operator class correctly — the planner picksHashAggregate/ engagesbench_text_hmac_idxon equality joins — but per-row hashing is dominated byeql_v2.to_ste_vec_value's JSONB inspection and reconstruction. Even fully inlined (LANGUAGE sql IMMUTABLE),to_ste_vec_value's body parses the JSONB once to checkval ? 'sv' AND jsonb_array_length(val->'sv') = 1, and on the THEN branch builds a new JSONB viameta_data(val) || (val->'sv'->0). That's the residual cost.Fix is a fast-path read of root-level
hmwithto_ste_vec_valuereserved for the rare single-element ste_vec-wrapped case.Measured impact
10K-row bench fixture, PG 17, EXPLAIN ANALYZE. Each value is a fresh-DB run with the patch applied via
CREATE OR REPLACE:SET search_path)coalesce(val.data ->> 'hm', …))SELECT count(*) FROM bench GROUP BY encrypted_textSELECT count(*) FROM bench a JOIN bench b ON a.encrypted_text = b.encrypted_textSELECT DISTINCT encrypted_text FROM benchto_ste_vec_value)Plan shapes don't change in any column — both naive-inline and fast-path produce the same HashAggregate / Nested Loop + Memoize structure as current main. The win is purely per-row CPU.
Fix shape
Same pattern as #189 / #201 plus a body change to
hash_encrypted:Rewrite
eql_v2.hash_encrypted(eql_v2_encrypted)to short-circuit on root-levelhm:The
coalesceis short-circuit-evaluating — whenhmis present at root (the common case, including everything the bench fixture emits and everything@cipherstash/protectships today), the second arm is never evaluated andto_ste_vec_valueis never called. Whenhmis absent at root (single-element ste_vec wrapping), the unwrap kicks in and the existing semantics are preserved.If
hmis missing entirely, the function returns the hash of NULL rather than raising. Three options for that case, in increasing strictness:coalescevia aCASE WHEN … THEN … ELSE raise_exception() ENDshim.CHECKconstraint enforcinghmpresence at config time.Worth deciding as part of this issue.
Convert helpers to inlinable SQL so the planner can splat the chain into the calling query. These are no-op semantically (function bodies stay the same):
eql_v2.hmac_256(jsonb)andeql_v2.hmac_256(eql_v2_encrypted)eql_v2.to_ste_vec_value(jsonb)andeql_v2.to_ste_vec_value(eql_v2_encrypted)Allowlist
hash_encrypted+ the helpers intasks/pin_search_path.sqlso the post-install loop doesn't re-pin them.Add splinter justifications in
tasks/test/splinter.sh(same shape as the entries added in #201).bloom_filterextractors are in the same plpgsql + SET search_path shape — worth doing in the same pass since they're called per row by inlined~~and the GIN functional index path. Probably negligible at the index-bound baseline (~0.20 ms for bare LIKE) but no reason not to bundle.Out of scope (separate Phase 3)
</<=/>/>=— still plpgsql VOLATILE, but they need acmp_semextractor design before they're an inlining task. Issue perf: eql_v2.like / eql_v2.ilike are VOLATILE — blocks planner inlining and bloom_filter index match #189 flagged this.eql_v2.jsonb_path_exists/eql_v2.jsonb_path_query_first— tracked in Drizzle jsonbPathQueryFirst / jsonbGet typed as predicates but return eql_v2_encrypted stack#423.eql_v2.order_by— no Supabase functional-index path either; separate scope.Verification
Bench plan + regression assertions covering
GROUP BY/JOIN/DISTINCTalready exist in #203 (in-repo SQLx tests),#[ignore]'d pending this issue with thresholds set to the measured fast-path numbers + ~2x CI headroom. Remove the#[ignore]markers when this lands.