Summary
DatabricksStatement.execute(String sql) in databricks-jdbc 3.3.1 returns true (JDBC-spec "first result is a ResultSet") for DML statements whose body contains the set operators UNION, INTERSECT, or EXCEPT, even when those keywords appear inside a subquery or CTE of an INSERT / UPDATE / DELETE / MERGE statement. This violates JDBC spec and breaks frameworks that use !execute() as the DML detector, such as Slick (a Scala ORM).
Environment
- databricks-jdbc:
3.3.1
- Observed on both SEA and Thrift backends (
UseThriftClient=1 does not change the behaviour — the decision is made client-side before any backend call)
- Does not reproduce on Simba legacy driver
2.7.5
Root cause (from source)
src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java:724-755 shouldReturnResultSet(String, List<String>) pre-parses the SQL string with the regex list in src/main/java/com/databricks/jdbc/common/DatabricksJdbcConstants.java:139-177:
public static final Pattern UNION_PATTERN =
Pattern.compile("\\s+UNION\\s+", Pattern.CASE_INSENSITIVE);
public static final Pattern INTERSECT_PATTERN =
Pattern.compile("\\s+INTERSECT\\s+", Pattern.CASE_INSENSITIVE);
public static final Pattern EXCEPT_PATTERN =
Pattern.compile("\\s+EXCEPT\\s+", Pattern.CASE_INSENSITIVE);
These patterns are non-anchored and matched via find(). Any occurrence anywhere in the SQL — even deep inside a CTE or subquery of an outer DML — forces execute() to return true.
Minimal reproduction
INSERT INTO my_catalog.my_schema.target_table
SELECT *
FROM (
SELECT col1, col2 FROM my_catalog.my_schema.source_table WHERE 1 = 0
UNION ALL
SELECT col1, col2 FROM my_catalog.my_schema.source_table WHERE 1 = 0
) t
Java test:
try (Statement st = conn.createStatement()) {
boolean hasResultSet = st.execute(sql);
System.out.println("hasResultSet = " + hasResultSet);
System.out.println("updateCount = " + st.getUpdateCount());
}
Observed:
hasResultSet = true
updateCount = -1
Expected (per JDBC spec, since this is a DML):
hasResultSet = false
updateCount = <number of rows affected>
Removing the UNION ALL (keeping one of the two subqueries) makes the same INSERT return hasResultSet = false as expected.
The same issue triggers on:
INSERT INTO t WITH cte AS (... UNION ...) SELECT ...
INSERT OVERWRITE DIRECTORY 's3://...' USING CSV ... SELECT ... INTERSECT SELECT ...
- Any DML that uses
SELECT * EXCEPT (col) column-exclusion syntax — \s+EXCEPT\s+ matches there too even though it is not a set operator in that position
Impact on Slick (sqlu)
Slick's StatementInvoker.results uses:
val doUpdate = !st.execute()
if (doUpdate) Left(st.getUpdateCount)
else Right(iterator over st.getResultSet) // extracted with GetUpdateValue
where GetResult.GetUpdateValue unconditionally throws:
object GetUpdateValue extends GetResult[Int] {
def apply(pr: PositionedResult) =
throw new SlickException("Update statements should not return a ResultSet")
}
So every Slick sqlu against Databricks whose SQL body contains UNION / INTERSECT / EXCEPT throws on execute() = true, regardless of the actual shape of the returned ResultSet:
- Non-empty ResultSet (e.g.
INSERT INTO t ... UNION ... returning num_affected_rows metrics) → "Update statements should not return a ResultSet"
- Empty ResultSet (e.g.
INSERT OVERWRITE DIRECTORY 's3://...' ... INTERSECT ...) → NoSuchElementException("empty result set")
This is a regression from the Simba 2.x driver, which returned execute() = false for the same statements.
Why SQL-side workaround is often not feasible
INSERT OVERWRITE DIRECTORY combined with set operators has no equivalent rewrite that preserves both semantics and performance.
SELECT * EXCEPT (col) is a core Databricks SQL feature with no widely-used alternative.
Proposed fix
- Remove the non-anchored set-operator patterns from
shouldReturnResultSet's match list. A top-level SELECT ... UNION ... or (SELECT ...) UNION (SELECT ...) is already covered by SELECT_PATTERN / FROM_PATTERN / WITH_PATTERN via anchored prefix match after comment stripping. The non-anchored patterns only matter in subquery / CTE positions, where they should not change the enclosing statement's classification.
- Alternatively, if prefix-based heuristics must be retained: either switch to a real SQL tokenizer that understands CTE / subquery boundaries, or anchor all three patterns to the statement start after prefix-stripping parentheses.
- Add
INSERT_PATTERN (already defined in constants) to the negative-match path of shouldReturnResultSet so INSERT-prefixed DMLs cannot be promoted to ResultSet mode.
- Document the behaviour and provide a URL parameter (e.g.
NonRowcountQueryPrefixes=-UNION,-INTERSECT,-EXCEPT) to let applications opt out of the legacy behaviour without code changes.
Related changelog
v3.2.1 "getResultSet() now returns null in case of DML statements to honour JDBC spec" — this fix only applies when shouldReturnResultSet == false, so it does not help the cases reported here.
Summary
DatabricksStatement.execute(String sql)in databricks-jdbc3.3.1returnstrue(JDBC-spec "first result is a ResultSet") for DML statements whose body contains the set operatorsUNION,INTERSECT, orEXCEPT, even when those keywords appear inside a subquery or CTE of an INSERT / UPDATE / DELETE / MERGE statement. This violates JDBC spec and breaks frameworks that use!execute()as the DML detector, such as Slick (a Scala ORM).Environment
3.3.1UseThriftClient=1does not change the behaviour — the decision is made client-side before any backend call)2.7.5Root cause (from source)
src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java:724-755shouldReturnResultSet(String, List<String>)pre-parses the SQL string with the regex list insrc/main/java/com/databricks/jdbc/common/DatabricksJdbcConstants.java:139-177:These patterns are non-anchored and matched via
find(). Any occurrence anywhere in the SQL — even deep inside a CTE or subquery of an outer DML — forcesexecute()to returntrue.Minimal reproduction
Java test:
Observed:
Expected (per JDBC spec, since this is a DML):
Removing the
UNION ALL(keeping one of the two subqueries) makes the same INSERT returnhasResultSet = falseas expected.The same issue triggers on:
INSERT INTO t WITH cte AS (... UNION ...) SELECT ...INSERT OVERWRITE DIRECTORY 's3://...' USING CSV ... SELECT ... INTERSECT SELECT ...SELECT * EXCEPT (col)column-exclusion syntax —\s+EXCEPT\s+matches there too even though it is not a set operator in that positionImpact on Slick (
sqlu)Slick's
StatementInvoker.resultsuses:where
GetResult.GetUpdateValueunconditionally throws:So every Slick
sqluagainst Databricks whose SQL body containsUNION/INTERSECT/EXCEPTthrows onexecute() = true, regardless of the actual shape of the returned ResultSet:INSERT INTO t ... UNION ...returningnum_affected_rowsmetrics) →"Update statements should not return a ResultSet"INSERT OVERWRITE DIRECTORY 's3://...' ... INTERSECT ...) →NoSuchElementException("empty result set")This is a regression from the Simba
2.xdriver, which returnedexecute() = falsefor the same statements.Why SQL-side workaround is often not feasible
INSERT OVERWRITE DIRECTORYcombined with set operators has no equivalent rewrite that preserves both semantics and performance.SELECT * EXCEPT (col)is a core Databricks SQL feature with no widely-used alternative.Proposed fix
shouldReturnResultSet's match list. A top-levelSELECT ... UNION ...or(SELECT ...) UNION (SELECT ...)is already covered bySELECT_PATTERN/FROM_PATTERN/WITH_PATTERNvia anchored prefix match after comment stripping. The non-anchored patterns only matter in subquery / CTE positions, where they should not change the enclosing statement's classification.INSERT_PATTERN(already defined in constants) to the negative-match path ofshouldReturnResultSetso INSERT-prefixed DMLs cannot be promoted to ResultSet mode.NonRowcountQueryPrefixes=-UNION,-INTERSECT,-EXCEPT) to let applications opt out of the legacy behaviour without code changes.Related changelog
v3.2.1 "
getResultSet()now returns null in case of DML statements to honour JDBC spec" — this fix only applies whenshouldReturnResultSet == false, so it does not help the cases reported here.