
Replace PgProtocolHandler string-rewrite hacks with Calcite Babel parser + ScalarFunctions #560

@brittleboye


Context

PgProtocolHandler.rewriteQuery() in convex-db rewrites incoming PostgreSQL SQL into Calcite-compatible SQL via string replacement. Each of the 10 hacks carries a "TODO: Remove hack" comment and names the proper fix.

String-munging is fragile. For example, the regex-operator check returns an empty result for any query containing ~, which also catches legitimate SQL. It also discards query semantics, and the set of hacks will keep growing as more PostgreSQL clients exercise it.
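Here is a concrete case of the false match. A substring test cannot tell the regex operator apart from a tilde inside a quoted literal. The sketch below is hypothetical. The real check at line 256 is not reproduced here, and `naiveHasTilde` only assumes that the check behaves like a plain substring test. The quote-aware variant still ignores comments, E'...' backslash escapes and dollar-quoted strings. That gap is the reason the check belongs in a parser rather than in more string code.

```java
/** Hypothetical illustration only; not the actual PgProtocolHandler code. */
final class TildeCheck {
    private TildeCheck() {}

    /** Stand-in for a substring-based hack: flags any '~' anywhere in the text. */
    static boolean naiveHasTilde(String sql) {
        return sql.indexOf('~') >= 0;
    }

    /** Flags '~' only outside single-quoted literals and double-quoted identifiers. */
    static boolean tildeOutsideQuotes(String sql) {
        char quote = 0; // 0 means "not inside a quoted token"
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (quote != 0) {
                // A doubled quote ('') closes and immediately reopens the literal,
                // so toggling on every quote character handles it correctly.
                if (c == quote) {
                    quote = 0;
                }
            } else if (c == '\'' || c == '"') {
                quote = c;
            } else if (c == '~') {
                return true;
            }
        }
        return false;
    }
}
```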

Proposal — phased refactor

Phase 1: Adopt Calcite Babel parser (biggest leverage)

Switch the Calcite SqlParser.Config to use SqlBabelParserImpl.FACTORY with PostgreSQL conformance. Babel natively supports:

  • POSIX regex operators ~, ~*, !~, !~* → removes line 256 hack
  • ::type cast syntax → removes line 320 hack (and the follow-up ::integer, ::int4, ::text, ::regclass, ::oid etc. stripping)

One parser config change eliminates two of the most aggressive rewrites.
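This is a minimal sketch of the Phase 1 parser change. It assumes a recent Calcite release where `SqlParser.config()` is available, with the `calcite-babel` artifact on the classpath. As far as I know, Calcite's `SqlConformanceEnum` has no dedicated PostgreSQL value, so the sketch uses `SqlConformanceEnum.BABEL`. The class name `PgParserConfig` is hypothetical. This issue also does not show how convex-db currently builds its `SqlParser.Config`.

```java
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.sql.parser.babel.SqlBabelParserImpl;
import org.apache.calcite.sql.validate.SqlConformanceEnum;

// Hypothetical helper; requires calcite-core and calcite-babel on the classpath.
final class PgParserConfig {
    private PgParserConfig() {}

    /** Swaps the default Calcite parser for Babel; BABEL conformance is an assumption. */
    static SqlParser.Config pgParserConfig() {
        return SqlParser.config()
            .withParserFactory(SqlBabelParserImpl.FACTORY)
            .withConformance(SqlConformanceEnum.BABEL);
    }

    /** Parses one statement, e.g. "SELECT '42'::integer", with no string rewriting first. */
    static SqlNode parse(String sql) throws SqlParseException {
        return SqlParser.create(sql, pgParserConfig()).parseStmt();
    }
}
```

If the planner is built through `Frameworks.newConfigBuilder()`, the same object would be passed to `.parserConfig(...)`. For casts like `::regclass` and `::oid`, parsing is only half the job. The validator also has to resolve those type names, which probably ties into the Phase 3 catalog work.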

Phase 2: Register PostgreSQL system functions as Calcite ScalarFunctions

Replace 6 regex substitutions with proper function registrations against the Calcite root schema. Each reads from session context (user, database, processId) rather than being baked into the rewritten string:

TODO line | Function | Replaces
289 | CURRENT_SCHEMA() | regex → 'public'
295 | CURRENT_DATABASE() | regex → database literal
300 | CURRENT_USER | regex → 'convex'
305 | SESSION_USER | regex → 'convex'
310 | version() | regex → version literal
315 | pg_backend_pid() | regex → processId

Suggest a new convex.db.calcite.pgcatalog.PgSystemFunctions class that registers all six in one place, analogous to the existing pg_catalog table setup.
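One possible shape for `PgSystemFunctions` is sketched below under explicit assumptions. The `PgSession` record, its field names and the `ThreadLocal` binding are all hypothetical. The approach only works if the protocol handler binds the session on the same thread that executes the Calcite query. Registration appears only as a comment because it needs Calcite on the classpath. `SchemaPlus.add(String, Function)` and `ScalarFunctionImpl.create(Class, String)` are Calcite calls, and the upper-case names assume Calcite's default upper-casing of unquoted identifiers. Calcite already treats `CURRENT_USER`, `SESSION_USER` and `CURRENT_SCHEMA` as built-in niladic functions. Before relying on the registrations for those three, check how the built-ins are evaluated and whether a same-named schema function is reached at all.

```java
import java.util.Objects;

/** Hypothetical sketch of convex.db.calcite.pgcatalog.PgSystemFunctions. */
public final class PgSystemFunctions {
    /** Per-connection values; field names are assumptions, not existing convex-db types. */
    public record PgSession(String user, String database, int processId, String serverVersion) {}

    // Assumption: the protocol handler binds the session on the query-executing thread.
    private static final ThreadLocal<PgSession> SESSION = new ThreadLocal<>();

    private PgSystemFunctions() {}

    public static void bind(PgSession session) {
        SESSION.set(Objects.requireNonNull(session, "session"));
    }

    public static void unbind() {
        SESSION.remove();
    }

    private static PgSession session() {
        PgSession s = SESSION.get();
        if (s == null) {
            throw new IllegalStateException("no PostgreSQL session bound to this thread");
        }
        return s;
    }

    // Registration sketch (needs Calcite), run once when the root schema is built:
    //   rootSchema.add("CURRENT_DATABASE",
    //       ScalarFunctionImpl.create(PgSystemFunctions.class, "currentDatabase"));
    //   rootSchema.add("PG_BACKEND_PID",
    //       ScalarFunctionImpl.create(PgSystemFunctions.class, "pgBackendPid"));

    public static String currentSchema() { return "public"; } // same value the regex hard-codes today
    public static String currentDatabase() { return session().database(); }
    public static String currentUser() { return session().user(); }
    public static String sessionUser() { return session().user(); }
    public static String version() { return session().serverVersion(); }
    public static int pgBackendPid() { return session().processId(); }
}
```

`currentSchema()` keeps the hard-coded `'public'` from the table. Once search_path is handled in Phase 3, it would presumably read the session's effective schema instead.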

Phase 3: Expand pg_catalog virtual tables and fix search_path

Partial implementations already exist under convex-db/src/main/java/convex/db/calcite/pgcatalog/ (PgClassTable, PgAttributeTable, PgNamespaceTable, PgDatabaseTable, PgTypeTable, PgTablesTable). Phase 3 fills in the gaps:

Once Phase 3 lands, rewriteQuery() should be removable entirely (or reduced to a no-op).
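For the search_path part of Phase 3, PostgreSQL's documented resolution rule works roughly as follows:
- walk the listed schemas in order;
- substitute `$user` with the current user;
- silently skip schemas that do not exist;
- search `pg_catalog` first unless it is listed explicitly.

The sketch below models only that rule. It ignores `pg_temp` and most of the finer points of identifier quoting. `SearchPathResolver` and its methods are hypothetical names, not existing convex-db classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical, simplified model of PostgreSQL search_path resolution. */
final class SearchPathResolver {
    private SearchPathResolver() {}

    /** Expands a search_path setting into the ordered list of schemas to search. */
    static List<String> effectiveSchemas(String searchPath, String currentUser, Set<String> existingSchemas) {
        List<String> schemas = new ArrayList<>();
        for (String raw : searchPath.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue;
            }
            if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                name = name.substring(1, name.length() - 1); // quoted: keep case as written
            } else {
                name = name.toLowerCase(Locale.ROOT); // unquoted: fold to lower case
            }
            if (name.equals("$user")) {
                name = currentUser;
            }
            // Schemas that do not exist are skipped rather than reported as errors.
            if (existingSchemas.contains(name) && !schemas.contains(name)) {
                schemas.add(name);
            }
        }
        // pg_catalog is searched first unless the path places it explicitly.
        if (!schemas.contains("pg_catalog")) {
            schemas.add(0, "pg_catalog");
        }
        return schemas;
    }

    /** Returns the first schema on the path that contains the unqualified table name. */
    static Optional<String> resolveTable(String table, List<String> schemas, Map<String, Set<String>> tablesBySchema) {
        for (String schema : schemas) {
            if (tablesBySchema.getOrDefault(schema, Set.of()).contains(table)) {
                return Optional.of(schema);
            }
        }
        return Optional.empty();
    }
}
```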

Why do this

  • Real PostgreSQL clients (pgAdmin, DBeaver, psql \d) issue introspection queries that the current empty-result hack silently breaks.
  • Rewriting via regex drops query semantics: for example, any query that compares a user-table column with ~ currently returns no rows.
  • The :: cast stripping silently changes query meaning when a cast was semantically required.
  • Babel ships with Calcite as an officially supported parser (the calcite-babel module), so Phase 1 is low-risk.
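Here is a small worked case of the cast-stripping point. In PostgreSQL, `'10'::int > '9'::int` is true. With the casts stripped, `'10' > '9'` compares text and is false, because `'1'` sorts before `'9'`. The Java below only models those two evaluations and does not call into convex-db or Calcite. It uses `String.compareTo`, which orders these ASCII digits the same way a typical text collation does.

```java
/** Models the two evaluations; not convex-db or Calcite code. */
final class CastStripDemo {
    private CastStripDemo() {}

    /** Models '10'::int > '9'::int: both sides are cast, so the comparison is numeric. */
    static boolean greaterWithCasts(String left, String right) {
        return Integer.parseInt(left) > Integer.parseInt(right);
    }

    /** Models '10' > '9' after the casts are stripped: a lexicographic text comparison. */
    static boolean greaterCastsStripped(String left, String right) {
        return left.compareTo(right) > 0;
    }
}
```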

Out of scope

  • Full PostgreSQL dialect parity (procedures, arrays, ranges, JSON operators). This issue is strictly about removing the existing string-rewrite layer.

Related

Part of the broader TODO cleanup identified in the repo review. See also #559 (Local op cache).
