Skip to content

[SPARK-57780][SQL] Do not classify Postgres permission-denied as a syntax error#56899

Open
vladanvasi-db wants to merge 1 commit into
apache:masterfrom
vladanvasi-db:effort/postgres-syntax-error
Open

[SPARK-57780][SQL] Do not classify Postgres permission-denied as a syntax error#56899
vladanvasi-db wants to merge 1 commit into
apache:masterfrom
vladanvasi-db:effort/postgres-syntax-error

Conversation

@vladanvasi-db

@vladanvasi-db vladanvasi-db commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

PostgresDialect.isSyntaxErrorBestEffort classified any SQLState in class 42
as a syntax error:

Option(exception.getSQLState).exists(_.startsWith("42"))

Per the Postgres error code reference,
class 42 is "Syntax Error or Access Rule Violation", so it bundles genuine
syntax errors together with authorization failures. In particular it covers
42501 (insufficient_privilege, e.g. "permission denied for table ..."),
which is an access rule violation, not a syntax error.

This PR:

  • Excludes 42501 from the class-42 check.
  • Additionally guards by message, so access-control failures
    ("permission denied", "access denied", "unauthorized") that some
    Postgres-compatible engines surface under the generic 42000 state are never
    classified as syntax errors.

This restores the method's documented contract that a true result is
guaranteed to be a syntax error.

Why are the changes needed?

When a federated foreign-table schema-resolution probe (SELECT * FROM ... WHERE 1=0)
hits a table the connection role cannot read (e.g. a pg_catalog system table),
the resulting permission error was classified as a syntax error. This both
misled users (an authorization failure reported as a syntax error) and polluted
syntax-error monitoring with non-actionable noise.

Does this PR introduce any user-facing change?

No. Genuine syntax errors are still classified as such; only Postgres
permission/access-control failures are no longer misreported as syntax errors.

How was this patch tested?

Added unit tests to PostgresDialectSuite covering:

  • a genuine syntax error (42601) — still classified as a syntax error;
  • access-control failures under both 42501 and the generic 42000 state
    (permission denied, access denied, unauthorized) — not syntax errors;
  • a non-class-42 SQLState (28000) — not a syntax error;
  • a null SQLState — not a syntax error.

The suite passes (14 tests, 0 failures).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8

…ntax error

### What changes were proposed in this pull request?

`PostgresDialect.isSyntaxErrorBestEffort` classified any SQLState in class
`42` as a syntax error. Postgres class 42 is "Syntax Error or Access Rule
Violation", so it also covers `42501` (`insufficient_privilege` ->
"permission denied for table ..."). This excludes `42501` from the check, and
additionally guards by message so access-control failures ("permission
denied", "access denied", "unauthorized") that some Postgres-compatible
engines report under the generic `42000` state are never treated as syntax
errors.

### How was this patch tested?

Added unit tests to `PostgresDialectSuite` covering a genuine syntax error
(`42601`), access-control failures under `42501`/`42000`, a non-class-42
SQLState, and a null SQLState. Ran the suite: 14 tests pass.

Co-authored-by: Isaac
@vladanvasi-db vladanvasi-db changed the title [SPARK-57780][SQL] Do not classify Postgres permission-denied as a sy… [SPARK-57780][SQL] Do not classify Postgres permission-denied as a syntax error Jun 30, 2026

// See https://www.postgresql.org/docs/current/errcodes-appendix.html
// Class 42 is "Syntax Error or Access Rule Violation", so it bundles genuine syntax errors
// together with authorization failures. 42501 is the insufficient_privilege state (e.g.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too verbose, can we make this more concise, e.g.

// Class 42 mixes syntax and auth errors (https://www.postgresql.org/docs/current/errcodes-appendix.html).
// Filter out 42501 and check message keywords because some Postgres-compatible engines 
// use generic 42000 for auth failures. Never wrap these as JDBC_EXTERNAL_ENGINE_SYNTAX_ERROR.

// must never be wrapped as JDBC_EXTERNAL_ENGINE_SYNTAX_ERROR.
override def isSyntaxErrorBestEffort(exception: SQLException): Boolean = {
Option(exception.getSQLState).exists(_.startsWith("42"))
lazy val isAccessControlError = Option(exception.getMessage)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we get permission error, we should report table not found as well. Can you take a look on method isObjectNotFoundException and check whether we need to update it?

verify(conn).setAutoCommit(false)
}

test("isSyntaxErrorBestEffort: genuine syntax error (42601) is classified as a syntax error") {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have docker instance here, can we create new user and do integration testing as well

@MaxGekk MaxGekk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 blocking, 2 non-blocking, 0 nits.
Correct, well-tested, strict improvement — it only removes true results (42501 + access-control), narrowing the over-broad class-42 match toward the method's contract with no regression. Two non-blocking notes from tracing the consumers and the sibling dialects:

Suggestions (2)

  • Same bug unfixed in five sibling dialects — the exact pattern this PR fixes (Option(exception.getSQLState).exists(_.startsWith("42"))) is copy-identical in DatabricksDialect, TeradataDialect, DerbyDialect, H2Dialect, and DB2Dialect, which all still classify class-42 access-rule violations (incl. 42501 insufficient_privilege) as syntax errors. Consider lifting the 42501 + access-control exclusion into the base JdbcDialect.isSyntaxErrorBestEffort (or a shared helper) so every class-42 dialect benefits, rather than patching Postgres alone. (Snowflake/MySQL/Oracle use exact "42000" and MsSqlServer matches by message, so those are unaffected.)
  • PostgresDialect.scala:276: the "restores the contract" framing is only partial — non-syntax class-42 codes (42P01, 42703) still return true, and that's live at the query-execution path — see inline.

Verification

  • Logic: 42501 and 42000+access-control excluded; genuine 42601 still true; non-42 and null false; Locale is imported. Strict narrowing (only removes true results) — no regression vs the prior startsWith("42").
  • Consumer trace: isSyntaxErrorBestEffort is consumed at JDBCRDD:82 (resolveTable) and JDBCRDD:373 (query execution). In resolveTable, the preceding isObjectNotFoundException (Postgres: 42P01/3F000/42704) masks 42P01; but JDBCRDD:373 has no not-found pre-check, so the residual 42P01/42703 mislabel as ...SYNTAX_ERROR.DURING_QUERY_EXECUTION is reachable there. A 42501 permission error now matches neither case and propagates to the generic JDBC error path (the de-classification this PR intends).

message.contains("access denied") ||
message.contains("unauthorized")
}
Option(exception.getSQLState).exists(s => s.startsWith("42") && s != "42501") &&

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base contract (JdbcDialect Scaladoc) is "if this method returns true, the exception is guaranteed to be a syntax error." This denylist (class 42 minus 42501 minus access-control messages) still returns true for non-syntax class-42 codes — e.g. 42P01 (undefined_table), 42703 (undefined_column) — so the contract is only partially restored.

It's not purely cosmetic: in resolveTable the preceding isObjectNotFoundException (Postgres matches 42P01/3F000/42704) masks 42P01, but the query-execution path (JDBCRDD ~L373) has no not-found pre-check, so there a 42P01/42703 is still wrapped as JDBC_EXTERNAL_ENGINE_SYNTAX_ERROR.DURING_QUERY_EXECUTION.

For a genuine "true ⟹ syntax" guarantee, consider allowlisting the real syntax states (e.g. 42601, 42P02) rather than denylisting from class 42. Non-blocking — this extends @urosstan-db's thread about class-42 codes beyond 42501.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants