[SPARK-57780][SQL] Do not classify Postgres permission-denied as a syntax error#56899
[SPARK-57780][SQL] Do not classify Postgres permission-denied as a syntax error#56899vladanvasi-db wants to merge 1 commit into
Conversation
…ntax error
### What changes were proposed in this pull request?
`PostgresDialect.isSyntaxErrorBestEffort` classified any SQLState in class
`42` as a syntax error. Postgres class 42 is "Syntax Error or Access Rule
Violation", so it also covers `42501` (`insufficient_privilege` ->
"permission denied for table ..."). This excludes `42501` from the check, and
additionally guards by message so access-control failures ("permission
denied", "access denied", "unauthorized") that some Postgres-compatible
engines report under the generic `42000` state are never treated as syntax
errors.
### How was this patch tested?
Added unit tests to `PostgresDialectSuite` covering a genuine syntax error
(`42601`), access-control failures under `42501`/`42000`, a non-class-42
SQLState, and a null SQLState. Ran the suite: 14 tests pass.
Co-authored-by: Isaac
|
|
||
| // See https://www.postgresql.org/docs/current/errcodes-appendix.html | ||
| // Class 42 is "Syntax Error or Access Rule Violation", so it bundles genuine syntax errors | ||
| // together with authorization failures. 42501 is the insufficient_privilege state (e.g. |
There was a problem hiding this comment.
Too verbose, can we make this more concise, e.g.
// Class 42 mixes syntax and auth errors (https://www.postgresql.org/docs/current/errcodes-appendix.html).
// Filter out 42501 and check message keywords because some Postgres-compatible engines
// use generic 42000 for auth failures. Never wrap these as JDBC_EXTERNAL_ENGINE_SYNTAX_ERROR.
| // must never be wrapped as JDBC_EXTERNAL_ENGINE_SYNTAX_ERROR. | ||
| override def isSyntaxErrorBestEffort(exception: SQLException): Boolean = { | ||
| Option(exception.getSQLState).exists(_.startsWith("42")) | ||
| lazy val isAccessControlError = Option(exception.getMessage) |
There was a problem hiding this comment.
When we get permission error, we should report table not found as well. Can you take a look on method isObjectNotFoundException and check whether we need to update it?
| verify(conn).setAutoCommit(false) | ||
| } | ||
|
|
||
| test("isSyntaxErrorBestEffort: genuine syntax error (42601) is classified as a syntax error") { |
There was a problem hiding this comment.
Since we have docker instance here, can we create new user and do integration testing as well
MaxGekk
left a comment
There was a problem hiding this comment.
0 blocking, 2 non-blocking, 0 nits.
Correct, well-tested, strict improvement — it only removes true results (42501 + access-control), narrowing the over-broad class-42 match toward the method's contract with no regression. Two non-blocking notes from tracing the consumers and the sibling dialects:
Suggestions (2)
- Same bug unfixed in five sibling dialects — the exact pattern this PR fixes (
Option(exception.getSQLState).exists(_.startsWith("42"))) is copy-identical inDatabricksDialect,TeradataDialect,DerbyDialect,H2Dialect, andDB2Dialect, which all still classify class-42 access-rule violations (incl.42501 insufficient_privilege) as syntax errors. Consider lifting the42501+ access-control exclusion into the baseJdbcDialect.isSyntaxErrorBestEffort(or a shared helper) so every class-42 dialect benefits, rather than patching Postgres alone. (Snowflake/MySQL/Oracle use exact"42000"and MsSqlServer matches by message, so those are unaffected.) PostgresDialect.scala:276: the "restores the contract" framing is only partial — non-syntax class-42 codes (42P01,42703) still returntrue, and that's live at the query-execution path — see inline.
Verification
- Logic:
42501and42000+access-control excluded; genuine42601stilltrue; non-42 and null false;Localeis imported. Strict narrowing (only removestrueresults) — no regression vs the priorstartsWith("42"). - Consumer trace:
isSyntaxErrorBestEffortis consumed atJDBCRDD:82(resolveTable) andJDBCRDD:373(query execution). InresolveTable, the precedingisObjectNotFoundException(Postgres:42P01/3F000/42704) masks42P01; butJDBCRDD:373has no not-found pre-check, so the residual42P01/42703mislabel as...SYNTAX_ERROR.DURING_QUERY_EXECUTIONis reachable there. A42501permission error now matches neither case and propagates to the generic JDBC error path (the de-classification this PR intends).
| message.contains("access denied") || | ||
| message.contains("unauthorized") | ||
| } | ||
| Option(exception.getSQLState).exists(s => s.startsWith("42") && s != "42501") && |
There was a problem hiding this comment.
The base contract (JdbcDialect Scaladoc) is "if this method returns true, the exception is guaranteed to be a syntax error." This denylist (class 42 minus 42501 minus access-control messages) still returns true for non-syntax class-42 codes — e.g. 42P01 (undefined_table), 42703 (undefined_column) — so the contract is only partially restored.
It's not purely cosmetic: in resolveTable the preceding isObjectNotFoundException (Postgres matches 42P01/3F000/42704) masks 42P01, but the query-execution path (JDBCRDD ~L373) has no not-found pre-check, so there a 42P01/42703 is still wrapped as JDBC_EXTERNAL_ENGINE_SYNTAX_ERROR.DURING_QUERY_EXECUTION.
For a genuine "true ⟹ syntax" guarantee, consider allowlisting the real syntax states (e.g. 42601, 42P02) rather than denylisting from class 42. Non-blocking — this extends @urosstan-db's thread about class-42 codes beyond 42501.
What changes were proposed in this pull request?
PostgresDialect.isSyntaxErrorBestEffortclassified any SQLState in class42as a syntax error:
Per the Postgres error code reference,
class 42 is "Syntax Error or Access Rule Violation", so it bundles genuine
syntax errors together with authorization failures. In particular it covers
42501(insufficient_privilege, e.g."permission denied for table ..."),which is an access rule violation, not a syntax error.
This PR:
42501from the class-42 check.(
"permission denied","access denied","unauthorized") that somePostgres-compatible engines surface under the generic
42000state are neverclassified as syntax errors.
This restores the method's documented contract that a
trueresult isguaranteed to be a syntax error.
Why are the changes needed?
When a federated foreign-table schema-resolution probe (
SELECT * FROM ... WHERE 1=0)hits a table the connection role cannot read (e.g. a
pg_catalogsystem table),the resulting permission error was classified as a syntax error. This both
misled users (an authorization failure reported as a syntax error) and polluted
syntax-error monitoring with non-actionable noise.
Does this PR introduce any user-facing change?
No. Genuine syntax errors are still classified as such; only Postgres
permission/access-control failures are no longer misreported as syntax errors.
How was this patch tested?
Added unit tests to
PostgresDialectSuitecovering:42601) — still classified as a syntax error;42501and the generic42000state(
permission denied,access denied,unauthorized) — not syntax errors;28000) — not a syntax error;The suite passes (14 tests, 0 failures).
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.8