You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
GROUP BY on a JSON field extracted from an encrypted column — the canonical pattern for "tell me how many users I have in each region" against ste_vec encryption — is dominated by per-row plpgsql cost in eql_v2."->", eql_v2."->>", and eql_v2.jsonb_path_query_first. Closing #202 doesn't fix this on its own — even with hash_encrypted short-circuited, the field extractor is still a multi-statement plpgsql function with SET search_path, run per row.
Same theme as Phase 1 (#196, equality/containment operators), Phase 2 (#202, hash_encrypted short-circuit), this is Phase 3 — JSONB path extractors.
Measured impact
10K-row synthetic bench_json fixture (existing bench data overlaid with hm at the $.hello selector, simulating a unique index on that path), PG 17:
GROUP BY eql_v2.jsonb_path_query_first(col, '<sel>') (docs-recommended)
~496 ms
~234 ms
—
GROUP BY (col).data->'sv'->N->>'hm' (proxy emits raw JSONB path)
—
—
~7 ms
¹ The proxy could in principle emit (col).data->'sv'->N->>'hm' to bypass all eql_v2 plpgsql machinery, but this leaks the positional sv index and the field's hmac value into the SQL — fragile and a layering violation. A selector-keyed variant ((SELECT elem->>'hm' FROM jsonb_array_elements((col).data->'sv') elem WHERE elem->>'s' = '<sel>')) would be more robust but still plpgsql-free and very fast. Either way, this column shows the lower bound the EQL-side forms should be aiming for.
Fix shape
Convert the field extractors to inlinable SQL bodies:
eql_v2."->"(eql_v2_encrypted, text) — current body is plpgsql with a FOR idx IN 1..array_length(sv, 1) LOOP that finds the matching selector. Rewrite as a single-statement SQL subquery: SELECT (eql_v2.meta_data(e.data) || elem)::eql_v2_encrypted FROM jsonb_array_elements(e.data->'sv') elem WHERE elem->>'s' = selector LIMIT 1.
eql_v2."->"(eql_v2_encrypted, eql_v2_encrypted) and eql_v2."->"(eql_v2_encrypted, integer) — same treatment.
eql_v2."->>" (text extraction variant) — same.
eql_v2.jsonb_path_query_first(...) — already wraps eql_v2.jsonb_path_query LIMIT 1. Rewrite as plain SQL: SELECT e FROM eql_v2.jsonb_path_query(val.data, selector) e LIMIT 1.
Allowlist all of the above in tasks/pin_search_path.sql and add splinter justifications in tasks/test/splinter.sh. Same pattern as #189 / #201.
Open design question: should the planner be able to engage HashAggregate directly on eql_v2."->"(e, text), or does it always pick Sort + GroupAggregate? Even with -> inlined, the planner today picks the Sort path for this shape. Inlining may flip it to HashAggregate via the hash op class on eql_v2_encrypted — worth verifying when the fix lands.
Out of scope
The proxy-side decision to emit jsonb_path_query_first vs raw data->'sv'->N->>'hm' extraction. This issue is about making the EQL-side form perform; the proxy can decide later whether to keep wrapping or emit the raw form once EQL is fast enough that the difference doesn't matter.
Bench plan + regression assertions for GROUP BY eql_v2.jsonb_path_query_first(col, '<sel>') will land alongside this issue's fix in #203 (currently scoped to root-level GROUP BY / JOIN / DISTINCT; will extend to field-level). The new tests will be #[ignore]'d pending this issue, with thresholds set to the measured-post-fix numbers + ~2x CI headroom.
Summary
GROUP BYon a JSON field extracted from an encrypted column — the canonical pattern for "tell me how many users I have in each region" against ste_vec encryption — is dominated by per-row plpgsql cost ineql_v2."->",eql_v2."->>", andeql_v2.jsonb_path_query_first. Closing #202 doesn't fix this on its own — even withhash_encryptedshort-circuited, the field extractor is still a multi-statement plpgsql function withSET search_path, run per row.Same theme as Phase 1 (#196, equality/containment operators), Phase 2 (#202,
hash_encryptedshort-circuit), this is Phase 3 — JSONB path extractors.Measured impact
10K-row synthetic
bench_jsonfixture (existingbenchdata overlaid withhmat the$.helloselector, simulating auniqueindex on that path), PG 17:GROUP BY encrypted_col(root, hmac at root)GROUP BY encrypted_col -> '<sel>'::textGROUP BY eql_v2.jsonb_path_query_first(col, '<sel>')(docs-recommended)GROUP BY (col).data->'sv'->N->>'hm'(proxy emits raw JSONB path)¹ The proxy could in principle emit
(col).data->'sv'->N->>'hm'to bypass all eql_v2 plpgsql machinery, but this leaks the positional sv index and the field's hmac value into the SQL — fragile and a layering violation. A selector-keyed variant ((SELECT elem->>'hm' FROM jsonb_array_elements((col).data->'sv') elem WHERE elem->>'s' = '<sel>')) would be more robust but still plpgsql-free and very fast. Either way, this column shows the lower bound the EQL-side forms should be aiming for.Fix shape
Convert the field extractors to inlinable SQL bodies:
eql_v2."->"(eql_v2_encrypted, text)— current body is plpgsql with aFOR idx IN 1..array_length(sv, 1) LOOPthat finds the matching selector. Rewrite as a single-statement SQL subquery:SELECT (eql_v2.meta_data(e.data) || elem)::eql_v2_encrypted FROM jsonb_array_elements(e.data->'sv') elem WHERE elem->>'s' = selector LIMIT 1.eql_v2."->"(eql_v2_encrypted, eql_v2_encrypted)andeql_v2."->"(eql_v2_encrypted, integer)— same treatment.eql_v2."->>"(text extraction variant) — same.eql_v2.jsonb_path_query_first(...)— already wrapseql_v2.jsonb_path_queryLIMIT 1. Rewrite as plain SQL:SELECT e FROM eql_v2.jsonb_path_query(val.data, selector) e LIMIT 1.Allowlist all of the above in
tasks/pin_search_path.sqland add splinter justifications intasks/test/splinter.sh. Same pattern as #189 / #201.Open design question: should the planner be able to engage
HashAggregatedirectly oneql_v2."->"(e, text), or does it always pickSort + GroupAggregate? Even with->inlined, the planner today picks the Sort path for this shape. Inlining may flip it to HashAggregate via the hash op class oneql_v2_encrypted— worth verifying when the fix lands.Out of scope
jsonb_path_query_firstvs rawdata->'sv'->N->>'hm'extraction. This issue is about making the EQL-side form perform; the proxy can decide later whether to keep wrapping or emit the raw form once EQL is fast enough that the difference doesn't matter.</<=/>/>=— Phase 4, separatecmp_semextractor design needed (Phase 3 of Drizzle jsonbPathQueryFirst / jsonbGet typed as predicates but return eql_v2_encrypted stack#423 picks it up).Verification
Bench plan + regression assertions for
GROUP BY eql_v2.jsonb_path_query_first(col, '<sel>')will land alongside this issue's fix in #203 (currently scoped to root-level GROUP BY / JOIN / DISTINCT; will extend to field-level). The new tests will be#[ignore]'d pending this issue, with thresholds set to the measured-post-fix numbers + ~2x CI headroom.