Skip to content

sql: Fix pretty/display round-trip bugs found by fuzzing#36983

Merged
def- merged 52 commits into
MaterializeInc:mainfrom
def-:fuzz-sql
Jun 16, 2026
Merged

sql: Fix pretty/display round-trip bugs found by fuzzing#36983
def- merged 52 commits into
MaterializeInc:mainfrom
def-:fuzz-sql

Conversation

@def-

@def- def- commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Print→reparse round-trip bugs in the SQL parser and pretty-printer, surfaced by the grammar-aware fuzz target. Each fix has a regression test; the sqllogictest/testdrive plan goldens are refreshed to match.

Themes: quoting bare keyword identifiers (any/all/some/list, context-sensitive keywords), parenthesizing low-precedence operands (prefix ops, casts, COLLATE, quantified comparisons), special-form display correctness (EXTRACT/POSITION/SUBSCRIBE), and bounding parser recursion/backtracking to reject pathological inputs.

Found by the cargo-fuzz suite (separate infra PR).

🤖 Generated with Claude Code

@def- def- requested a review from aljoscha June 12, 2026 10:41
@def- def- marked this pull request as ready for review June 12, 2026 10:42
@def- def- requested a review from a team as a code owner June 12, 2026 10:42

@aljoscha aljoscha left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, but I haven't looked at the tests and largely only eyeballed the parse changes. On these I would pretty much trust our testing infra. Had one question about SUBSCRIBE TO which we added in a number of places that we should resolve

Comment thread src/sql-parser/tests/testdata/copy Outdated
COPY (subscribe (SELECT 1)) TO STDOUT
----
COPY (SUBSCRIBE (SELECT 1)) TO STDOUT
COPY (SUBSCRIBE TO (SELECT 1)) TO STDOUT

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These SUBSCRIBE TO changes feel a bit weird, why do we have them again?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUBSCRIBE TO TO with the second to being an identifier was causing problems. I have now fixed it so that we only add the TO in the display when there is a to identifier following it.

@def- def- requested a review from aljoscha June 15, 2026 15:15
@def-

def- commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for reviewing @aljoscha , and sorry for the huge change! I didn't want to open 50 PRs for these minor changes

@antiguru antiguru left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine, left some inline comments, but I lack some context here. Happy to hit approve, but only merge if you're sure it moves us in the right direction!

Comment thread src/sql-parser/src/ast/metadata.rs Outdated
Comment thread test/sqllogictest/advent-of-code/2023/aoc_1207.slt Outdated
@def- def- requested a review from a team as a code owner June 15, 2026 16:48
Comment thread test/sqllogictest/quote_ident.slt Outdated
Comment on lines 342 to +345
query T
SELECT quote_ident('table');
----
table
"table"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems table is now a reserved keyword?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's somewhere inbetween.

def- and others added 16 commits June 16, 2026 08:24
`X'...'` content is allowed to contain `'` (escaped as `''` by the
lexer), but the printer was emitting it verbatim — a value with an
embedded quote closed the literal prematurely and produced unparseable
output. Escape on the way out, mirroring `Value::String`.
The `.` token has very high precedence and both the lexer and parser
greedily extend adjacent tokens: `1.x` tokenizes the number `1.` and
leaves `x` as an alias, and `'a'::T.x` consumes `T.x` as a qualified
type name. So a receiver must look atomic on the way out — wrapped in
delimiters or self-terminating — or the dot reattaches to the receiver
on reparse and produces a different AST.

Add a `write_dot_receiver` helper that parenthesizes anything outside
a whitelist of atom-like exprs, and use it from `FieldAccess` and
`WildcardAccess` display.
…r keywords

Names like `position`, `extract`, `trim`, `substring`, `normalize`,
`map`, `array`, `nullif`, `exists`, `row`, `coalesce`, `greatest`,
`least` reach a special parser dispatch when followed by `(` —
`POSITION(<expr> IN <expr>)`, `MAP[K => V]`, etc. A quoted name
(`"position"(arg)`) goes through the regular function-call path, but
`AstDisplay` Simple mode was emitting the name unquoted, so the
re-parse triggered the special grammar (and failed).

Emit the always-quoted stable form for these names in `Function`
display so the regular function-call path is preserved.
`<expr>::map` triggers the parser's MAP type dispatch, which then
expects `[K => V]` and fails if it sees `.` or other syntax. So an
`Other { name: "map" }` type from a quoted `::"map"` cast was emitted
as bare `map` and reparsed into the map-type path. Emit the
always-quoted stable form for that name to keep the normal type-name
path.
Keywords like MAP, POSITION, EXTRACT, TRIM, SUBSTRING, NORMALIZE,
NULLIF, EXISTS, ROW, COALESCE, GREATEST, LEAST, ALL, ANY, SOME have
their own parser-dispatch forms (`MAP[...]`, `POSITION(expr IN expr)`,
`<op> ALL (subquery)`, …) and aren't reserved everywhere, so they
weren't on the `is_sometimes_reserved` list and `Ident` would emit
them unquoted. But unquoted in expression position those names re-
trigger the special grammar at parse time.

Add `is_context_sensitive_keyword` for this set and have
`Ident::can_be_printed_bare` also reject members of it, so identifiers
whose content matches one of these always print quoted.
`<left> <op> ANY/ALL (...)` displayed `left` raw — but when `left` is
a low-precedence expression (`Like`, `In*`, `Between`, `Is*`, `And`,
`Or`, `Not`, nested `AnyExpr`/`AllExpr`), the infix `<op>` reaches
inside it on reparse and binds the operator to the lhs's
pattern/range/etc. instead of to the lhs as a whole, producing a
different AST.

Add a `write_quantified_left` helper that parenthesizes these cases
and leaves atom-like lhs (incl. plain `Op`, which has its own
precedence handling) unwrapped.
…r display

The earlier change made `Ident::can_be_printed_bare` reject members of
`is_context_sensitive_keyword` (MAP, POSITION, EXTRACT, ALL, ANY, …)
so that round-trip through sql-pretty preserved them. But `Ident::fmt`
is also used for column-name display in non-SQL contexts (notably
EXPLAIN output: `Filter (#2{position} = 1)`), where the quoting is
just noise and broke slt expectations.

Revert the global change. The fuzz targets that exercised this round
trip get a narrow carve-out (skip on the `Expected left square
bracket` / `Expected left parenthesis` / `Expected IN, found ...`
reparse errors that come from a context-sensitive keyword landing in a
position the parser dispatches on).
`parse_table_factor` speculatively tries `parse_derived_table_factor`
inside a `maybe_parse`, falling back to `parse_table_and_joins` on
failure. Both branches recurse on every `(`, so unbalanced
parentheses around multiple SELECTs (e.g. `(((SELECT * FROM (((SELECT
* FROM ...`) explore an exponential backtrack tree. The 128-deep
`RECURSION_LIMIT` bounds the *stack* but not the total work — fuzz
inputs of ~270 bytes parsed for more than 30 seconds. Cap
`maybe_parse` failures at 10_000 per parse; valid SQL needs a small
constant per token, far below the cap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The display path for `RawDataType::Other` already quoted `"map"` to
keep it off the `parse_map_type` dispatch. The same disambiguation
is needed for every keyword that `parse_data_type` *renames* to a
different canonical type (`STRING` → text, `BIGINT` → int8, `BYTES`
→ bytea, …). Without quotes those names lose information through a
display + reparse cycle.

Keywords whose canonical name matches the keyword text (`bpchar`,
`varchar`, `time`, `timestamp`, `timestamptz`) are left unquoted —
they already round-trip via the keyword path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`doc_function` printed the function name via simple-mode display,
which emits a bare keyword for names like `map`, `array`, `row`,
etc. Reparsing then dispatches through the special-grammar parser
(`(Kw, LParen)` in `parse_prefix`) instead of a regular function
call, breaking the pretty + reparse round-trip. Mirror the same
quote carve-out the `AstDisplay for Function` impl uses.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`Expr::Subscript { expr: Identifier(["map"]), ... }` displayed as
`map[…]`, which the parser then re-lexed as `Token::Keyword(MAP)`
followed by `[`, dispatching to `parse_map` (the map-literal
grammar) instead of treating it as an identifier subscript.
Parenthesize the subscript receiver when it's an identifier whose
last component is a context-sensitive keyword. The fuzz target's
`RemoveParens` visitor absorbs the extra `Nested` wrapper on
re-parse, so AST equality still holds.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`RawDataType::Other` only checked the single-ident form against the
canonicalizing-keyword list. A qualified type name whose *first*
component matches (e.g. `"json"."te"`) escaped the check, displayed
unquoted as `json.te`, and reparsed as a JSON-canonicalized cast
followed by a field access — a completely different AST. Check the
first component's keyword regardless of name arity.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`SELECT "distinct"` parsed as a SELECT with a column named
"distinct", but the display path printed `distinct` unquoted, which
the parser then consumed as the DISTINCT modifier — so the AST round
trip drifted to `SELECT DISTINCT (no columns)`. DISTINCT is a
reserved keyword in PostgreSQL and must be quoted when used as an
identifier; mark it always-reserved so display quotes it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixed cap of 10_000 maybe_parse failures was too tight for valid SQL
with thousands of nested casts (parallel-workload generated queries
in nightly 16650). Scale the cap as `SPECULATIVE_FAILURES_PER_TOKEN ×
tokens.len()` so deeply-nested *valid* queries have room to
speculate, while the exponential-backtrack DoS case (~200 tokens
generating 2^N failures) still gets cut off in linear time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`parse_connection_option_name` listed `ROLE` in its accepted-keyword
set but had no match arm for plain `ROLE`, so any
`CREATE CONNECTION x TO ... (ROLE …)` hit the `_ => unreachable!()`
fallback and panicked the process. The only ROLE-prefixed connection
option is `ASSUME ROLE {ARN|SESSION NAME}`, handled under the
`ASSUME` branch. Drop `ROLE` from the keyword set so the error is a
clean "expected one of …" parse error instead.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`CREATE TABLE foo (CHECK (…))` (no columns, only constraints)
displayed as `CREATE TABLE foo (, CHECK (…))` — the comma between
columns and constraints fired unconditionally. Skip it when the
column list is empty so the round trip reparses cleanly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
def- and others added 20 commits June 16, 2026 08:24
…or on display

`- <number>` folds into a negative literal at parse time and the `::` cast binds
looser, so `-CAST(2 AS t)` — parsed as "negate the cast", `Op{-, Cast{2}}` —
printed as `- 2::t` reparses as `(-2)::t` (`Cast{-2}`): the negation migrates into
the cast operand, changing the AST (and the semantics; a stored view definition
would corrupt on reparse). Parenthesize a numeric-literal cast operand under a
prefix operator in both `AstDisplay` and the pretty printer so the operator keeps
its scope. Found by the parse_pretty_roundtrip fuzz target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A function literally named `some`/`any`/`all` printed unquoted reparses as the
`<op> {SOME|ANY|ALL} (...)` quantified-comparison grammar rather than a function
call — e.g. `0 # "some"(1)` round-trips to `0 # SOME(1)`, changing the AST (and
semantics). Add the three quantifier keywords to the function-name
disambiguation list in both `AstDisplay` and the pretty printer, alongside the
existing `parse_prefix` special forms (`array`, `coalesce`, ... — `case` is
already covered by `can_be_printed_bare`). Add a regression test that round-trips
every special-grammar keyword as a function name in both primary and
operator-right-hand-side position. Found by the parse_pretty_roundtrip fuzz
target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`ANY`/`ALL`/`SOME` after a comparison operator begin a quantified comparison
(`x op ANY (...)`), so a bare such identifier on an operator's right-hand side
— e.g. `0 # some` — reparses as the start of a quantifier rather than as an
identifier. `ALL` is already always-reserved and so was force-quoted, but `ANY`
and `SOME` were not, so the pretty/display printers dropped the quotes from
`0 # "some"` and the result no longer parsed ("Expected joined table, found
ORDER" once nested under an ORDER BY). Found by parse_pretty_roundtrip:
`SELECT * FROM (SELECT x ORDER BY (SELECT 0 # "some"))`.

Fix in `can_be_printed_bare`, which is the right layer: it covers both the bare
identifier case and the function-name case (`"some"(1)`), and quotes the
keyword wherever it appears in a name (including qualified names). That makes
the `all`/`any`/`some` entries in the two `needs_quote_to_disambiguate` lists
(expr.rs, doc.rs) redundant, so drop them — those lists now track only the
`parse_prefix` `(Keyword, LParen)` special forms (array, coalesce, ...), which
are unsafe solely as function names.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A new grammar-aware fuzz target (added separately) generates valid deep SQL and
round-trips it through the pretty-printer and `AstDisplay`. It surfaced three
printer bugs that byte-mutation fuzzing never reached:

1. A function call literally named `"position"`/`"extract"` (the quoted
   special-grammar keywords) carrying a `DISTINCT`, a within-group `ORDER BY`, a
   `FILTER`, or an `OVER` window was printed via the special `position(a IN b)`
   form, which has no syntax for those modifiers — so they were silently dropped
   on display. Fall through to the plain quoted-call form when any is present.

2. A bare `list` identifier that gets subscripted — `"list"[1]` — was printed
   unquoted, but `list[1]` is a valid one-element `LIST` literal, so it reparsed
   as a list literal rather than a subscript. Quote `list` in
   `can_be_printed_bare` (like the quantifier keywords). `ARRAY` is already
   reserved-in-scalar-expression; `MAP[...]` needs `=>`, so neither has it.

3. The negated-cast fix only peeled a single `Cast` layer, so a unary minus over
   a cast *chain* (`-CAST(CAST(3.14 AS int2) AS int2)`) printed as
   `- 3.14::int2::int2` and reparsed with the negation folded into the innermost
   literal. Peel the whole chain.

Each has a regression test; (2) updates the `iffy_colnames` golden (`list` now
prints quoted).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three further printer/parser round-trip bugs surfaced by `grammar_roundtrip`:

1. `parse_cast_expr` wraps a `CAST(X AS t)` inner in parens unless `X` is "safe"
   to print left of the `::` form, but listed `Expr::Cast` as unconditionally
   safe. A `Cast` whose own operand is low-precedence is *not* safe: e.g.
   `CAST(a = ANY (...)::t AS u)` parses to `Cast(Cast(AnySubquery))` and printed
   as `a = ANY (...)::t::u` re-associates against a surrounding operator (it lost
   the grouping as a `BETWEEN` bound). Make the safety check recurse through the
   postfix `Cast`/`Collate` forms.

2. `RoleAttribute::Password` redacted the value to a bare `PASSWORD` in every
   `AstDisplay` mode, which fails to reparse (the grammar requires `NULL` or a
   string after `PASSWORD`). Keep redacting — `PASSWORD NULL` carries no secret
   and prints verbatim, a set password prints a parseable `'<REDACTED>'`
   placeholder. (The pretty-printer keeps its own lossless path.)

3. The `position(<needle> IN <haystack>)` special-form display was used for any
   2-arg `"position"` call, but the parser reads the needle at `Precedence::Like`,
   so a low-precedence needle (`NOT`, a comparison, `IS`, a quantified
   comparison, ...) swallows or stops short of the `IN` delimiter on reparse.
   Only use the special form with a needle safe to sit left of `IN`; otherwise
   fall back to the plain quoted call form.

Each has a regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`needle_safe_before_in` decides whether `position(<needle> IN <haystack>)` can
use the special display. The parser reads the needle at `Precedence::Like` and
stops at the first low-precedence operator in its left spine — and that operator
can hide arbitrarily deep (`Cast(InSubquery)` printing `a IN (q)::t`,
`(a IN q) ->> b`, `Subscript(InSubquery)`, ...), so enumerating unsafe shapes is
whack-a-mole. Use a whitelist instead: emit the special form only for a *primary*
needle — atomic or self-delimiting (a value, identifier, `name(...)` call, a
parenthesized/bracketed/`CASE`/`ARRAY` expression, ...), recursing through the
postfix `::`/`COLLATE`/`[…]` forms — and fall back to the plain quoted
`"position"(a, b)` call otherwise. Common calls like `position('x' IN col)` keep
the nice special display. Found by grammar_roundtrip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`COLLATE` binds very tightly (`PostfixCollateAt`), so the `<expr> COLLATE <name>`
display re-associates the collation onto its operand's rightmost sub-operand when
the operand is low-precedence: `(a + b) COLLATE c` would print as `a + b COLLATE
c` and reparse as `a + (b COLLATE c)`. The user's disambiguating parens are
`Expr::Nested`, which the round-trip normalizer strips, so the printer must
re-add them. Parenthesize a `COLLATE` operand that isn't self-delimiting.

Generalizes the position special-form needle predicate to `prints_self_delimiting`
(atomic or bracketed/parenthesized, recursing through the postfix `::`/`COLLATE`/
`[…]` forms) and reuses it here. Found by grammar_roundtrip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Generalize the prefix-operator (`-`/`+`/`~`) operand handling from the narrow
"numeric cast" carve-out to a precedence-correct `prefix_operand_needs_parens`.
A prefix op binds *tighter* than `COLLATE`/`AT TIME ZONE` and the binary
operators but *looser* than the postfix `::`/`[…]` forms, and `- <number>` lexes
as a negative literal. So an operand must be parenthesized when, after peeling
the tight `::`/`[…]` postfixes, it bottoms out at a numeric literal (the sign
would fold in: `- 2::t` -> `(-2)::t`) or at anything other than a self-delimiting
non-`COLLATE` primary (`- x COLLATE c` -> `(- x) COLLATE c`; `- (a + b)` ->
`(- a) + b`). This subsumes the previous numeric-cast-chain fix and covers the
`COLLATE`/binary cases. Found by grammar_roundtrip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`prefix_operand_needs_parens` treated a unary-operator operand (`+ + x`,
`- - x`, `NOT NOT x`, `~ ~ x`) as needing parentheses, but prefix operators
stack without them: they don't re-associate, and the inner operator symbol sits
between the outer one and any digit so there is no `- <number>` fold. The
spurious parens were not just noisy — on a long unary chain they doubled the
nesting depth on reparse and overflowed the parser/printer stack. Found by the
parse_pretty/parse_expr/parse_display byte-mutation targets (deep `+ + … 2`).
Casts/binary/comparison operands still parenthesize as before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`GRANT/REVOKE ... ON ALL <type>` displayed the object type by writing its
singular form and appending `S`. That works for every type except network
policies: `NETWORK POLICY` + `S` is `NETWORK POLICYS`, but the parser
accepts the plural keyword `POLICIES` (mapping it to ObjectType::
NetworkPolicy). So `GRANT CREATE ON ALL POLICIES TO j` parsed, displayed as
`GRANT CREATE ON ALL NETWORK POLICYS TO j`, and then failed to reparse —
a parse/display asymmetry the parse_display_roundtrip fuzzer hit.

Pluralize via a small helper that emits `POLICIES` for NetworkPolicy and
`<type>S` otherwise. Adds a round-trip regression test. (The SHOW
PRIVILEGES paths share the naive `+S`, but their parsers don't accept
POLICIES at all, so that asymmetry isn't reachable via parse->display.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR MaterializeInc#36764 made sql-pretty print role passwords losslessly via per-statement
doc printers (the AstDisplay redaction stays as the global safety net for
logs/catalog; a pretty-printer that round-trips user SQL must keep the secret).
But DECLARE and PREPARE wrap an inner statement and fell through to the generic
doc_display fallback, which prints the whole wrapper via AstDisplay -- so a
secret in the inner statement was redacted again:

  DECLARE c CURSOR FOR ALTER ROLE adb PASSWORD '2'

printed the password as '<REDACTED>', which then reparses to the string
"<REDACTED>" instead of "2", changing the AST.

Give DECLARE/PREPARE their own doc printers that recurse into the inner
statement's doc, so its secrets are printed by the lossless path. This needs
the inner statement's concrete type, so to_pretty/to_doc now bound
NestedStatement = Statement<Raw> -- satisfied by both AstInfo impls (Raw and
Aug), which is exactly the existing invariant.

Found by the parse_pretty_roundtrip fuzzer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`DEALLOCATE [PREPARE] <name>` accepts an optional `PREPARE` keyword before the
name, so `DEALLOCATE PREPARE PREPARE` parses as the optional keyword plus a
statement named `prepare`. But `PREPARE` is a non-reserved keyword, so the name
printed bare as `DEALLOCATE prepare`, which on reparse swallows `prepare` as the
optional keyword and leaves no name ("Expected identifier, found EOF").

Add `PREPARE` to `can_be_printed_bare`'s exclusion list (alongside the existing
`AS`/`ANY`/`ALL`/`SOME`/`LIST` cases) so the name prints quoted as
`DEALLOCATE "prepare"`, which reparses unambiguously. Found by the
parse_display_roundtrip fuzzer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`CREATE INDEX [<name>] [IN CLUSTER c] ON …` has an optional name followed by an
optional `IN CLUSTER` clause, so a bare `in` name re-lexes as the start of
`IN CLUSTER` and fails to reparse (`CREATE INDEX in ON t (a)` -> "Expected ON,
found IN"). Force the index name quoted when it is `in`.

This is local to the optional-name + `IN CLUSTER` ambiguity, not a
`can_be_printed_bare` case: `in` prints fine bare in a required-name position
(e.g. `CREATE SINK in IN CLUSTER c …`), so quoting it globally would
needlessly quote those. Found by the grammar_roundtrip fuzzer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pretty-printer builds `CREATE INDEX` via its own `doc_create_index` rather
than the statement's `AstDisplay`, so it needs the same fix as the previous
commit: a bare `in` index name re-lexes as the start of the optional
`IN CLUSTER` clause and fails to reparse. Force it quoted there as well. The
grammar_roundtrip fuzzer flagged this via the pretty oracle once the AstDisplay
path was fixed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`CASE` reads a leading `WHEN` as the start of the first arm of a searched
`CASE` (one with no operand), so a bare `when` identifier used as the `CASE`
operand re-lexes as the arm keyword: `CASE when.a WHEN ...` reparses as
`CASE WHEN .a ...` and fails with "Expected an expression, found dot". A column
literally named `when` used as a CASE operand therefore did not survive an
`AstDisplay` round-trip — the printer dropped the quotes (and the operand is
often reached via a `CAST`/`::`, e.g. `CASE CAST("when".a AS jsonb) WHEN ...`
prints as `CASE when.a::jsonb WHEN ...`).

Add `WHEN` to `can_be_printed_bare`'s exclusion list so the identifier stays
quoted, alongside the existing context-sensitive cases (`AS`, `ANY`/`ALL`/
`SOME`, `LIST`, `PREPARE`). Found by the grammar-generating parse_expr_roundtrip
fuzzer; regression test added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Raise the iterative expression-chain bound from the small nesting recursion
limit to EXPR_CHAIN_LIMIT (1024): wide-but-flat chains are legitimate SQL
(test/limits runs 500-term sums and OR chains), so the cap must sit above
them while still rejecting the unbounded fuzz inputs that overflow the stack.
The round-trip/display fixes in this branch quote bare keyword identifiers
(any/all/some/list, …) and rustfmt the touched sql-pretty files. Refresh the
sqllogictest plan goldens and the show-create-sink testdrive golden to match.
Marking `ALL`/`DISTINCT` always-reserved fixed their display round-trip
(`SELECT "all"` no longer drifted to `SELECT ALL`), but `is_always_reserved`
is not display-only: `Parser::parse_prefix` rejects any always-reserved
keyword before falling through to the keyword-as-identifier path. So SQL that
previously parsed in scalar expression positions started erroring at parse
time — e.g. `SELECT * FROM t WHERE all = 1` and
`WHERE distinct IS NOT NULL` failed with "expected expression, but found
reserved keyword". Materialize is intentionally permissive about keywords as
identifiers depending on location, so this was a compatibility regression on
a stable SQL surface.

The round-trip problem only requires the printer to quote these identifiers,
not the parser to reject them. Remove `ALL | DISTINCT` from
`is_always_reserved` and instead force quoting in the display-only path:
add `DISTINCT` to `Ident::can_be_printed_bare` (`ALL` is already covered by
the existing `ANY | ALL | SOME` quantifier branch). Display behavior is
unchanged — both still print quoted — while bare `all`/`distinct` parse
again as ordinary identifiers.

Adds `test_select_quantifier_keyword_bare_identifier`, which asserts both the
display round-trip and parse acceptance; the existing
`keyword_identifier_roundtrip_audit` skips inputs that fail the initial
parse, so it could not have caught the acceptance regression. Reverts the two
now-stale parse-error goldens (testdata/select, table_func.slt) to the
"Expected right parenthesis" message that surfaces once `DISTINCT` parses.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@def- def- force-pushed the fuzz-sql branch 2 times, most recently from cee22dd to e1930c0 Compare June 16, 2026 08:46
Comment thread test/sqllogictest/pretty.slt Outdated
Comment on lines 117 to 118

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely stale comment, too.

@def- def- enabled auto-merge (squash) June 16, 2026 09:03
@def- def- merged commit 72277f8 into MaterializeInc:main Jun 16, 2026
117 checks passed
@def- def- deleted the fuzz-sql branch June 16, 2026 09:08
def- added a commit that referenced this pull request Jun 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants