Skip to content

perf: inline JSONB field extractors (->, ->>, jsonb_path_query_first) for field-level GROUP BY #204

Description

@coderdan

Summary

GROUP BY on a JSON field extracted from an encrypted column — the canonical pattern for "tell me how many users I have in each region" against ste_vec encryption — is dominated by per-row plpgsql cost in eql_v2."->", eql_v2."->>", and eql_v2.jsonb_path_query_first. Closing #202 doesn't fix this on its own — even with hash_encrypted short-circuited, the field extractor is still a multi-statement plpgsql function with SET search_path, run per row.

Same theme as Phase 1 (#196, equality/containment operators), Phase 2 (#202, hash_encrypted short-circuit), this is Phase 3 — JSONB path extractors.

Measured impact

10K-row synthetic bench_json fixture (existing bench data overlaid with hm at the $.hello selector, simulating a unique index on that path), PG 17:

Form current main with #202 fast-path proxy-emitted raw extraction¹
GROUP BY encrypted_col (root, hmac at root) ~229 ms ~109 ms
GROUP BY encrypted_col -> '<sel>'::text ~331 ms ~287 ms
GROUP BY eql_v2.jsonb_path_query_first(col, '<sel>') (docs-recommended) ~496 ms ~234 ms
GROUP BY (col).data->'sv'->N->>'hm' (proxy emits raw JSONB path) ~7 ms

¹ The proxy could in principle emit (col).data->'sv'->N->>'hm' to bypass all eql_v2 plpgsql machinery, but this leaks the positional sv index and the field's hmac value into the SQL — fragile and a layering violation. A selector-keyed variant ((SELECT elem->>'hm' FROM jsonb_array_elements((col).data->'sv') elem WHERE elem->>'s' = '<sel>')) would be more robust but still plpgsql-free and very fast. Either way, this column shows the lower bound the EQL-side forms should be aiming for.

Fix shape

Convert the field extractors to inlinable SQL bodies:

  • eql_v2."->"(eql_v2_encrypted, text) — current body is plpgsql with a FOR idx IN 1..array_length(sv, 1) LOOP that finds the matching selector. Rewrite as a single-statement SQL subquery: SELECT (eql_v2.meta_data(e.data) || elem)::eql_v2_encrypted FROM jsonb_array_elements(e.data->'sv') elem WHERE elem->>'s' = selector LIMIT 1.
  • eql_v2."->"(eql_v2_encrypted, eql_v2_encrypted) and eql_v2."->"(eql_v2_encrypted, integer) — same treatment.
  • eql_v2."->>" (text extraction variant) — same.
  • eql_v2.jsonb_path_query_first(...) — already wraps eql_v2.jsonb_path_query LIMIT 1. Rewrite as plain SQL: SELECT e FROM eql_v2.jsonb_path_query(val.data, selector) e LIMIT 1.

Allowlist all of the above in tasks/pin_search_path.sql and add splinter justifications in tasks/test/splinter.sh. Same pattern as #189 / #201.

Open design question: should the planner be able to engage HashAggregate directly on eql_v2."->"(e, text), or does it always pick Sort + GroupAggregate? Even with -> inlined, the planner today picks the Sort path for this shape. Inlining may flip it to HashAggregate via the hash op class on eql_v2_encrypted — worth verifying when the fix lands.

Out of scope

  • The proxy-side decision to emit jsonb_path_query_first vs raw data->'sv'->N->>'hm' extraction. This issue is about making the EQL-side form perform; the proxy can decide later whether to keep wrapping or emit the raw form once EQL is fast enough that the difference doesn't matter.
  • Comparison operators < / <= / > / >= — Phase 4, separate cmp_sem extractor design needed (Phase 3 of Drizzle jsonbPathQueryFirst / jsonbGet typed as predicates but return eql_v2_encrypted stack#423 picks it up).

Verification

Bench plan + regression assertions for GROUP BY eql_v2.jsonb_path_query_first(col, '<sel>') will land alongside this issue's fix in #203 (currently scoped to root-level GROUP BY / JOIN / DISTINCT; will extend to field-level). The new tests will be #[ignore]'d pending this issue, with thresholds set to the measured-post-fix numbers + ~2x CI headroom.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions