fix(unparser): make BigQueryDialect more robust #21296

Merged: alamb merged 3 commits into apache:main from spiceai:sgrebnov/0410-bigquery-unparsing-improve on Apr 19, 2026

@sgrebnov (Member) commented Apr 1, 2026

## Which issue does this PR close?

This PR improves `BigQueryDialect` so that the generated SQL is BigQuery-compatible, fixing execution errors.

## What changes are included in this PR?

Eight `Dialect` trait overrides are added to `BigQueryDialect`. BigQuery data types reference:

https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/data-types

1. `date_field_extract_style` → `Extract` + `scalar_function_to_sql_overrides`

BigQuery does not support `date_part()`. TPC-H Q7, Q8, and Q9 fail with `Function not found: date_part`.

| Before (error) | After |
|---|---|
| `date_part('YEAR', l_shipdate)` | `EXTRACT(YEAR FROM l_shipdate)` |

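
The effect of switching the extract style can be illustrated with a small, self-contained sketch. This is not the DataFusion implementation: `ExtractStyle` and `render_date_part` are hypothetical names introduced only for this example, and the expression is treated as an already-rendered SQL string.

```rust
/// Hypothetical stand-in for a dialect setting; not a DataFusion type.
#[derive(Clone, Copy)]
enum ExtractStyle {
    /// Function-call form, which BigQuery rejects.
    DatePart,
    /// SQL-standard form, which BigQuery accepts.
    Extract,
}

/// Renders a date-field extraction for the given style.
/// `field` is a date part such as "YEAR"; `expr` is already-rendered SQL.
fn render_date_part(style: ExtractStyle, field: &str, expr: &str) -> String {
    match style {
        ExtractStyle::DatePart => format!("date_part('{field}', {expr})"),
        ExtractStyle::Extract => format!("EXTRACT({field} FROM {expr})"),
    }
}
```

Calling `render_date_part(ExtractStyle::Extract, "YEAR", "l_shipdate")` yields the "After" form from the table above.
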

2. `interval_style` → `SQLStandard`

BigQuery does not support PostgreSQL-style interval abbreviations. TPC-H Q4 and Q20 fail with `Syntax error: Unexpected ")"`.

| Before (error) | After |
|---|---|
| `INTERVAL '3 MONS'` | `INTERVAL '3' MONTH` |

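
As a rough illustration of the difference between the two interval styles, the sketch below converts a single-unit PostgreSQL-style interval string into the SQL-standard form. It is an assumption-laden toy, not how the unparser works internally: `to_sql_standard_interval` is a hypothetical helper, it handles only one integer value with one unit, and the unit abbreviations it recognizes are chosen for the example.

```rust
/// Hypothetical helper: converts a single-unit PostgreSQL-style interval
/// such as "3 MONS" into SQL-standard syntax such as "INTERVAL '3' MONTH".
/// Returns None for multi-part intervals, non-integer values, or unknown units.
fn to_sql_standard_interval(pg_style: &str) -> Option<String> {
    let mut parts = pg_style.split_whitespace();
    let value = parts.next()?;
    let unit = parts.next()?;
    // Anything beyond "<value> <unit>" is out of scope for this sketch.
    if parts.next().is_some() {
        return None;
    }
    // Only plain integer quantities are accepted.
    value.parse::<i64>().ok()?;
    let unit = match unit.to_ascii_uppercase().as_str() {
        "YEAR" | "YEARS" => "YEAR",
        "MON" | "MONS" | "MONTH" | "MONTHS" => "MONTH",
        "DAY" | "DAYS" => "DAY",
        "HOUR" | "HOURS" => "HOUR",
        "MIN" | "MINS" | "MINUTE" | "MINUTES" => "MINUTE",
        "SEC" | "SECS" | "SECOND" | "SECONDS" => "SECOND",
        _ => return None,
    };
    Some(format!("INTERVAL '{value}' {unit}"))
}
```

With this helper, `to_sql_standard_interval("3 MONS")` produces the "After" form from the table above, while inputs it does not understand return `None` instead of emitting SQL that might be invalid.
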

3. `float64_ast_dtype` → `Float64`

BigQuery does not support `DOUBLE`. Fails with `Type not found: DOUBLE`.

| Before (error) | After |
|---|---|
| `CAST(a AS DOUBLE)` | `CAST(a AS FLOAT64)` |


4. `supports_column_alias_in_table_alias` → `false`

BigQuery does not support column aliases in table alias definitions. Fails with `Expected ")" but got "("`.

| Before (error) | After |
|---|---|
| `SELECT c.key FROM (...) AS c(key)` | `SELECT c.key FROM (SELECT o_orderkey AS key FROM orders) AS c` |

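
One way to picture this rewrite is that the column aliases move out of the table alias and into the subquery's projection. The sketch below does this on plain strings under simplifying assumptions: `inline_column_aliases` is a hypothetical helper, the subquery is a single `SELECT ... FROM <table>`, and the real unparser operates on a logical plan rather than on text.

```rust
/// Hypothetical helper: builds a derived table whose column aliases are
/// attached to the projected columns instead of the table alias.
/// Returns None when the number of aliases does not match the columns.
fn inline_column_aliases(
    columns: &[&str],
    aliases: &[&str],
    from: &str,
    table_alias: &str,
) -> Option<String> {
    if columns.len() != aliases.len() {
        return None;
    }
    // Pair each projected column with its alias: "col AS alias".
    let projection: Vec<String> = columns
        .iter()
        .zip(aliases)
        .map(|(column, alias)| format!("{column} AS {alias}"))
        .collect();
    Some(format!(
        "(SELECT {} FROM {from}) AS {table_alias}",
        projection.join(", ")
    ))
}
```

For the TPC-H style example above, `inline_column_aliases(&["o_orderkey"], &["key"], "orders", "c")` returns the derived-table fragment used in the "After" column.
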

5. `utf8_cast_dtype` + `large_utf8_cast_dtype` → `String`

BigQuery does not support `VARCHAR`/`TEXT`. Fails with `Type not found: VARCHAR` and `Type not found: Text`.

| Before (error) | After |
|---|---|
| `CAST(a AS VARCHAR)` | `CAST(a AS STRING)` |
| `CAST(a AS TEXT)` | `CAST(a AS STRING)` |


6. ~`int64_cast_dtype` → `Int64`~

7. `timestamp_cast_dtype` → `Timestamp` (no timezone qualifier)

https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type

BigQuery does not support `TIMESTAMP WITH TIME ZONE`. Fails with `Syntax error: Expected ')' or keyword FORMAT but got keyword WITH`. `TIMESTAMP` should be used instead (it preserves time zone information).

| Before (error) | After |
|---|---|
| `CAST(a AS TIMESTAMP WITH TIME ZONE)` | `CAST(a AS TIMESTAMP)` |

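
The cast-type overrides listed above (`DOUBLE`, `VARCHAR`, `TEXT`, `TIMESTAMP WITH TIME ZONE`) amount to a mapping from generic type names to BigQuery type names. The sketch below shows that mapping as a lookup on strings. It is illustrative only: `bigquery_cast_type` and `cast_sql` are hypothetical helpers, and the actual overrides return AST data types rather than text.

```rust
/// Hypothetical helper: maps a generic SQL type name to the BigQuery
/// spelling described in this PR. Unknown names are passed through unchanged.
fn bigquery_cast_type(generic: &str) -> &str {
    match generic.to_ascii_uppercase().as_str() {
        "DOUBLE" => "FLOAT64",
        "VARCHAR" | "TEXT" => "STRING",
        "TIMESTAMP WITH TIME ZONE" => "TIMESTAMP",
        _ => generic,
    }
}

/// Renders a CAST expression using the BigQuery type name.
/// `expr` is already-rendered SQL.
fn cast_sql(expr: &str, generic_type: &str) -> String {
    format!("CAST({expr} AS {})", bigquery_cast_type(generic_type))
}
```

For example, `cast_sql("a", "DOUBLE")` gives `CAST(a AS FLOAT64)`, and a type without an override, such as `DATE`, is emitted unchanged.
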

## Are these changes tested?

Yes. Added the `test_bigquery_dialect_overrides` unit test covering all eight overrides, verified against BigQuery before and after.

## Are there any user-facing changes?

No API changes. `BigQueryDialect` now generates valid BigQuery SQL for the affected expressions.

@github-actions github-actions Bot added the sql SQL Planner label Apr 3, 2026
@nuno-faria (Contributor) left a comment

Thanks @sgrebnov, I double-checked with sqlglot and LGTM.

Comment thread datafusion/sql/src/unparser/expr.rs Outdated
@sgrebnov sgrebnov requested a review from nuno-faria April 17, 2026 08:25
@alamb alamb added this pull request to the merge queue Apr 19, 2026
Merged via the queue into apache:main with commit 7e1a710 Apr 19, 2026
36 checks passed
Rich-T-kid pushed a commit to Rich-T-kid/datafusion that referenced this pull request Apr 21, 2026