Skip to content

Commit f91aacc

Browse files
authored
Fix SQLInterpolator treating ? in comments and literals as placeholders + shorten next_changelog (#1424)
## Summary Fixes #1331 + some cosmetic change on next changelog When `supportManyParameters=1`, `SQLInterpolator.interpolateSQL` split the SQL on every `?` and counted every occurrence as a placeholder. Question marks inside line/block comments, single-quoted strings, double-quoted identifiers, and backtick-quoted identifiers were all counted, producing spurious `"Parameter count does not match"` errors for queries such as: ```sql -- does this work? select 'hello?', * from mytable where id = ? ``` This had 3 `?` characters but only 1 real placeholder, so the driver rejected the perfectly valid binding of 1 parameter. ## What changed - Added `SqlCommentParser.findPlaceholderPositions(sql)` which walks the SQL with the existing comment/literal state machine and returns the source indices of `?` characters that appear in `State.NORMAL` only. - Reworked `SQLInterpolator.interpolateSQL` to use those positions: a single pass that slices the original SQL between placeholders and substitutes formatted parameter values. Comments, string literals, and quoted identifiers are preserved verbatim in the output. - Removed the now-redundant `countPlaceholders` helper that scanned every `?`. `DatabricksParameterMetaData.countParameters` already used `SqlCommentParser.forEachNonCommentChar` and is unaffected. ## Test plan - [x] Existing `SQLInterpolatorTest` cases (basic interpolation, mixed types, escaped quotes, mismatch errors, etc.) all pass. - [x] Added unit tests for `SQLInterpolator` covering `?` inside line comments, block comments, single-quoted strings, double-quoted identifiers, backtick identifiers, and a combined case. - [x] Added unit tests for `SqlCommentParser.findPlaceholderPositions` covering null/empty input, basic positions, comments, literals, quoted identifiers, and escaped quotes. - [x] `mvn spotless:apply` clean. --------- Signed-off-by: samikshya-chand_data <samikshya.chand@databricks.com>
1 parent 547b38f commit f91aacc

5 files changed

Lines changed: 331 additions & 66 deletions

File tree

NEXT_CHANGELOG.md

Lines changed: 2 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -19,29 +19,9 @@
1919
- Arrow schema deserialization failures (Thrift metadata path) now surface a dedicated driver error code `ARROW_SCHEMA_PARSING_ERROR` (vendor code `22000`) and a proper SQLSTATE `22000` (Data Exception) on the thrown `SQLException`, instead of the generic `RESULT_SET_ERROR` (1004) and the enum name as SQLSTATE. The exception message is unchanged.
2020

2121
### Fixed
22+
- Fixed `?` characters inside SQL comments, string literals, and quoted identifiers being incorrectly counted as parameter placeholders when `supportManyParameters=1`. `SQLInterpolator` now uses `SqlCommentParser` to locate only real placeholders. Fixes #1331.
2223
- Fixed `MetadataOperationTimeout` not being applied when metadata operations use SHOW commands. Operations like `getTables`, `getSchemas`, and `getColumns` now respect the `MetadataOperationTimeout` connection property instead of hanging indefinitely with no timeout.
23-
24-
- Reclassify transient/mis-categorized server errors so callers can identify
25-
retryable failures. The remap is applied at all Thrift error sites
26-
(`checkResponseForErrors`, `executeAsync`, `verifySuccessStatus`, and the
27-
polling status handler) so the same server failure surfaces with the same
28-
SQL state regardless of which response carries it.
29-
- Unity Catalog unavailability (`[UC_CLIENT_EXCEPTION]`, previously `XXUCC`)
30-
and parquet read / connection-acquisition deadlines
31-
(`[PARQUET_FAILED_READ_FOOTER]`, `DEADLINE_EXCEEDED: acquiring connection`)
32-
are now reported with SQL state `08S01` (communication link failure).
33-
- Server-side `java.util.ConcurrentModificationException` is now reported
34-
with SQL state `40001` (serialization failure) instead of the misleading
35-
`42000`. The remap only applies when the original SQL state is `42000` so
36-
unrelated `42xxx` states (e.g. `42501` insufficient privilege) are
37-
preserved.
38-
Notes for callers and operators:
39-
- Callers branching on the legacy `XXUCC`/`42000` states for these failures
40-
must update to `08S01`/`40001`. The driver logs the original→remapped
41-
state at `INFO` level for traceability.
42-
- The driver's failure telemetry uses `sqlState` as the error-name field,
43-
so dashboards/alerts keyed on `XXUCC` or `42000` for these specific
44-
failure modes will need to be updated to the new states.
24+
- Reclassify transient server errors to standard SQL states (08S01, 40001) across all Thrift error sites. This ensures UC unavailability and concurrent modification errors surface consistently for better retry handling. Note: Dashboards and branching logic keyed on legacy XXUCC or 42000 must be updated.
4525

4626
---
4727
*Note: When making changes, please add your change under the appropriate section

src/main/java/com/databricks/jdbc/common/util/SQLInterpolator.java

Lines changed: 30 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -6,8 +6,6 @@
66
import com.databricks.jdbc.exception.DatabricksValidationException;
77
import com.databricks.jdbc.model.core.ColumnInfoTypeName;
88
import java.util.Map;
9-
import java.util.regex.Matcher;
10-
import java.util.regex.Pattern;
119

1210
public class SQLInterpolator {
1311
protected static String escapeInputs(String input) {
@@ -65,22 +63,12 @@ private static String formatObject(ImmutableSqlParameter object) {
6563
}
6664
}
6765

68-
private static int countPlaceholders(String sql) {
69-
int count = 0;
70-
for (char c : sql.toCharArray()) {
71-
if (c == '?') {
72-
count++;
73-
}
74-
}
75-
return count;
76-
}
77-
7866
/**
7967
* Interpolates the given SQL string by replacing placeholders with the provided parameters.
8068
*
81-
* <p>This method splits the SQL string by placeholders (question marks) and replaces each
82-
* placeholder with the corresponding parameter from the provided map. The map keys are 1-based
83-
* indexes, aligning with the SQL parameter positions.
69+
* <p>Only '?' characters that appear outside of comments, string literals, and quoted identifiers
70+
* are treated as placeholders. The map keys are 1-based indexes, aligning with the SQL parameter
71+
* positions.
8472
*
8573
* @param sql the SQL string containing placeholders ('?') to be replaced.
8674
* @param params a map of parameters where the key is the 1-based index of the placeholder in the
@@ -91,37 +79,48 @@ private static int countPlaceholders(String sql) {
9179
*/
9280
public static String interpolateSQL(String sql, Map<Integer, ImmutableSqlParameter> params)
9381
throws DatabricksValidationException {
94-
String[] parts = sql.split("\\?");
95-
if (countPlaceholders(sql) != params.size()) {
82+
int[] positions = SqlCommentParser.findPlaceholderPositions(sql);
83+
if (positions.length != params.size()) {
9684
throw new DatabricksValidationException(
9785
"Parameter count does not match. Provide equal number of parameters as placeholders. SQL "
9886
+ sql);
9987
}
100-
StringBuilder sb = new StringBuilder();
101-
for (int i = 0; i < parts.length; i++) {
102-
sb.append(parts[i]);
103-
if (i < params.size()) {
104-
sb.append(formatObject(params.get(i + 1))); // because we have 1 based index in params
105-
}
88+
StringBuilder sb = new StringBuilder(sql.length() + 16);
89+
int last = 0;
90+
for (int i = 0; i < positions.length; i++) {
91+
int pos = positions[i];
92+
sb.append(sql, last, pos);
93+
sb.append(formatObject(params.get(i + 1))); // 1-based index in params
94+
last = pos + 1;
95+
}
96+
if (last < sql.length()) {
97+
sb.append(sql, last, sql.length());
10698
}
10799
return sb.toString();
108100
}
109101

110102
/**
111-
* Surrounds unquoted placeholders (?) with single quotes, preserving already quoted ones. This is
112-
* crucial for DESCRIBE QUERY commands as unquoted placeholders will cause a parse_syntax_error.
103+
* Surrounds real placeholders (?) with single quotes, leaving '?' characters that appear inside
104+
* comments, string literals, or quoted identifiers untouched. This is crucial for DESCRIBE QUERY
105+
* commands, where bare placeholders cause a parse_syntax_error.
113106
*/
114107
public static String surroundPlaceholdersWithQuotes(String sql) {
115108
if (sql == null || sql.isEmpty()) {
116109
return sql;
117110
}
118-
// This pattern matches any '?' that is NOT already inside single quotes
119-
StringBuilder sb = new StringBuilder();
120-
Matcher m = Pattern.compile("(?<!')\\?(?!')").matcher(sql);
121-
while (m.find()) {
122-
m.appendReplacement(sb, "'?'");
111+
int[] positions = SqlCommentParser.findPlaceholderPositions(sql);
112+
if (positions.length == 0) {
113+
return sql;
114+
}
115+
StringBuilder sb = new StringBuilder(sql.length() + positions.length * 2);
116+
int last = 0;
117+
for (int pos : positions) {
118+
sb.append(sql, last, pos).append("'?'");
119+
last = pos + 1;
120+
}
121+
if (last < sql.length()) {
122+
sb.append(sql, last, sql.length());
123123
}
124-
m.appendTail(sb);
125124
return sb.toString();
126125
}
127126
}

src/main/java/com/databricks/jdbc/common/util/SqlCommentParser.java

Lines changed: 58 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,8 @@
11
package com.databricks.jdbc.common.util;
22

3+
import java.util.ArrayList;
4+
import java.util.List;
5+
36
/** Utility class for parsing out comments from SQL strings */
47
public class SqlCommentParser {
58

@@ -17,6 +20,16 @@ public interface SqlCharConsumer {
1720
void accept(State state, char c);
1821
}
1922

23+
/**
24+
* Index-aware variant of {@link SqlCharConsumer}. The third argument is the source-string index
25+
* of the emitted character. For synthetic characters (the space emitted after a comment), the
26+
* index is the position of the character that ended the comment.
27+
*/
28+
@FunctionalInterface
29+
public interface IndexedSqlCharConsumer {
30+
void accept(State state, char c, int sourceIndex);
31+
}
32+
2033
/**
2134
* Iterates over each character in the SQL string while keeping track of comment, string literal,
2235
* and identifier state. Each character that is not part of a comment calls the consumer with the
@@ -26,6 +39,15 @@ public interface SqlCharConsumer {
2639
* @param consumer called for each visible character with its parsing state
2740
*/
2841
public static void forEachNonCommentChar(String sql, SqlCharConsumer consumer) {
42+
forEachNonCommentChar(sql, (state, c, sourceIndex) -> consumer.accept(state, c));
43+
}
44+
45+
/**
46+
* Same as {@link #forEachNonCommentChar(String, SqlCharConsumer)} but the consumer also receives
47+
* the source-string index of each emitted character. Useful when callers need to relate emitted
48+
* characters back to positions in the original SQL (e.g. parameter placeholder discovery).
49+
*/
50+
public static void forEachNonCommentChar(String sql, IndexedSqlCharConsumer consumer) {
2951
if (sql == null || sql.isEmpty()) {
3052
return;
3153
}
@@ -48,42 +70,42 @@ public static void forEachNonCommentChar(String sql, SqlCharConsumer consumer) {
4870
i++; // skip '*'
4971
} else if (c == '\'') {
5072
state = State.IN_SINGLE_QUOTE;
51-
consumer.accept(state, c);
73+
consumer.accept(state, c, i);
5274
} else if (c == '"') {
5375
state = State.IN_DOUBLE_QUOTE;
54-
consumer.accept(state, c);
76+
consumer.accept(state, c, i);
5577
} else if (c == '`') {
5678
state = State.IN_BACKTICK;
57-
consumer.accept(state, c);
79+
consumer.accept(state, c, i);
5880
} else {
59-
consumer.accept(state, c);
81+
consumer.accept(state, c, i);
6082
}
6183
break;
6284

6385
case IN_SINGLE_QUOTE:
64-
consumer.accept(state, c);
86+
consumer.accept(state, c, i);
6587
if (c == '\'' && next == '\'') {
66-
consumer.accept(state, next);
88+
consumer.accept(state, next, i + 1);
6789
i++; // skip escaped quote
6890
} else if (c == '\'') {
6991
state = State.NORMAL;
7092
}
7193
break;
7294

7395
case IN_DOUBLE_QUOTE:
74-
consumer.accept(state, c);
96+
consumer.accept(state, c, i);
7597
if (c == '"' && next == '"') {
76-
consumer.accept(state, next);
98+
consumer.accept(state, next, i + 1);
7799
i++; // skip escaped quote
78100
} else if (c == '"') {
79101
state = State.NORMAL;
80102
}
81103
break;
82104

83105
case IN_BACKTICK:
84-
consumer.accept(state, c);
106+
consumer.accept(state, c, i);
85107
if (c == '`' && next == '`') {
86-
consumer.accept(state, next);
108+
consumer.accept(state, next, i + 1);
87109
i++; // skip escaped backtick
88110
} else if (c == '`') {
89111
state = State.NORMAL;
@@ -93,7 +115,7 @@ public static void forEachNonCommentChar(String sql, SqlCharConsumer consumer) {
93115
case IN_LINE_COMMENT:
94116
if (c == '\n' || c == '\r') {
95117
state = State.NORMAL;
96-
consumer.accept(state, ' ');
118+
consumer.accept(state, ' ', i);
97119
if (c == '\r' && next == '\n') {
98120
i++; // Treat \r\n as a single line ending
99121
}
@@ -110,7 +132,7 @@ public static void forEachNonCommentChar(String sql, SqlCharConsumer consumer) {
110132
i++; // skip '/'
111133
if (blockCommentDepth == 0) {
112134
state = State.NORMAL;
113-
consumer.accept(state, ' ');
135+
consumer.accept(state, ' ', i);
114136
}
115137
}
116138
// else: skip character (part of the comment)
@@ -119,6 +141,30 @@ public static void forEachNonCommentChar(String sql, SqlCharConsumer consumer) {
119141
}
120142
}
121143

144+
/**
145+
* Finds the source-string indices of placeholder ('?') characters that appear in {@link
146+
* State#NORMAL} state — i.e. not inside comments, string literals, or quoted identifiers. Used by
147+
* parameter binding to locate the real placeholders in a SQL statement.
148+
*
149+
* @param sql the SQL string to scan
150+
* @return source-string indices of placeholder '?' characters, in order
151+
*/
152+
public static int[] findPlaceholderPositions(String sql) {
153+
List<Integer> positions = new ArrayList<>();
154+
forEachNonCommentChar(
155+
sql,
156+
(state, c, sourceIndex) -> {
157+
if (state == State.NORMAL && c == '?') {
158+
positions.add(sourceIndex);
159+
}
160+
});
161+
int[] out = new int[positions.size()];
162+
for (int k = 0; k < out.length; k++) {
163+
out[k] = positions.get(k);
164+
}
165+
return out;
166+
}
167+
122168
/**
123169
* Removes all comments and extra whitespace from the SQL string.
124170
*

src/test/java/com/databricks/jdbc/common/util/SQLInterpolatorTest.java

Lines changed: 103 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -126,6 +126,78 @@ public void testStringWithNewline() throws DatabricksValidationException {
126126
assertEquals(expected, SQLInterpolator.interpolateSQL(sql, params));
127127
}
128128

129+
@Test
130+
public void testQuestionMarkInLineCommentNotTreatedAsPlaceholder()
131+
throws DatabricksValidationException {
132+
String sql = "-- does this work?\nselect * from mytable where id = ?";
133+
Map<Integer, ImmutableSqlParameter> params = new HashMap<>();
134+
params.put(1, getSqlParam(1, 42, DatabricksTypeUtil.INT));
135+
String expected = "-- does this work?\nselect * from mytable where id = 42";
136+
assertEquals(expected, SQLInterpolator.interpolateSQL(sql, params));
137+
}
138+
139+
@Test
140+
public void testQuestionMarkInBlockCommentNotTreatedAsPlaceholder()
141+
throws DatabricksValidationException {
142+
String sql = "select /* maybe? or ? */ * from t where id = ?";
143+
Map<Integer, ImmutableSqlParameter> params = new HashMap<>();
144+
params.put(1, getSqlParam(1, 1, DatabricksTypeUtil.INT));
145+
String expected = "select /* maybe? or ? */ * from t where id = 1";
146+
assertEquals(expected, SQLInterpolator.interpolateSQL(sql, params));
147+
}
148+
149+
@Test
150+
public void testQuestionMarkInStringLiteralNotTreatedAsPlaceholder()
151+
throws DatabricksValidationException {
152+
String sql = "select 'hello?', * from mytable where id = ?";
153+
Map<Integer, ImmutableSqlParameter> params = new HashMap<>();
154+
params.put(1, getSqlParam(1, 5, DatabricksTypeUtil.INT));
155+
String expected = "select 'hello?', * from mytable where id = 5";
156+
assertEquals(expected, SQLInterpolator.interpolateSQL(sql, params));
157+
}
158+
159+
@Test
160+
public void testQuestionMarkInQuotedIdentifierNotTreatedAsPlaceholder()
161+
throws DatabricksValidationException {
162+
String sql = "select `col?` from t where id = ? and name = \"who?\"";
163+
Map<Integer, ImmutableSqlParameter> params = new HashMap<>();
164+
params.put(1, getSqlParam(1, 7, DatabricksTypeUtil.INT));
165+
String expected = "select `col?` from t where id = 7 and name = \"who?\"";
166+
assertEquals(expected, SQLInterpolator.interpolateSQL(sql, params));
167+
}
168+
169+
@Test
170+
public void testCombinedCommentsAndLiteralsWithQuestionMarks()
171+
throws DatabricksValidationException {
172+
String sql =
173+
"-- ?\n/* ? */ select 'a?' as x, ? as y, \"b?\" as z, `c?` as w from t where id = ?";
174+
Map<Integer, ImmutableSqlParameter> params = new HashMap<>();
175+
params.put(1, getSqlParam(1, "alice", DatabricksTypeUtil.STRING));
176+
params.put(2, getSqlParam(2, 99, DatabricksTypeUtil.INT));
177+
String expected =
178+
"-- ?\n/* ? */ select 'a?' as x, 'alice' as y, \"b?\" as z, `c?` as w from t where id = 99";
179+
assertEquals(expected, SQLInterpolator.interpolateSQL(sql, params));
180+
}
181+
182+
@Test
183+
public void testInterpolateAdjacentPlaceholders() throws DatabricksValidationException {
184+
String sql = "select ?,?,? from t";
185+
Map<Integer, ImmutableSqlParameter> params = new HashMap<>();
186+
params.put(1, getSqlParam(1, 1, DatabricksTypeUtil.INT));
187+
params.put(2, getSqlParam(2, 2, DatabricksTypeUtil.INT));
188+
params.put(3, getSqlParam(3, 3, DatabricksTypeUtil.INT));
189+
assertEquals("select 1,2,3 from t", SQLInterpolator.interpolateSQL(sql, params));
190+
}
191+
192+
@Test
193+
public void testInterpolatePlaceholderAtStartAndEnd() throws DatabricksValidationException {
194+
Map<Integer, ImmutableSqlParameter> params = new HashMap<>();
195+
params.put(1, getSqlParam(1, 7, DatabricksTypeUtil.INT));
196+
assertEquals("7 ", SQLInterpolator.interpolateSQL("? ", params));
197+
assertEquals(" 7", SQLInterpolator.interpolateSQL(" ?", params));
198+
assertEquals("7", SQLInterpolator.interpolateSQL("?", params));
199+
}
200+
129201
@Test
130202
public void testEscapeInputs() {
131203
// Simple apostrophe doubling
@@ -177,7 +249,37 @@ private static Stream<Arguments> providePlaceholderQuotingTestCases() {
177249
Arguments.of(
178250
"SELECT * FROM table WHERE id = ? AND (name = ? OR age = ?) AND status = ?",
179251
"SELECT * FROM table WHERE id = '?' AND (name = '?' OR age = '?') AND status = '?'",
180-
"Complex query with multiple conditions"));
252+
"Complex query with multiple conditions"),
253+
254+
// ? inside line comment must not be quoted
255+
Arguments.of(
256+
"-- is this a ?\nSELECT ? FROM t",
257+
"-- is this a ?\nSELECT '?' FROM t",
258+
"Question mark inside line comment is preserved"),
259+
260+
// ? inside block comment must not be quoted
261+
Arguments.of(
262+
"SELECT /* ? */ ? FROM t",
263+
"SELECT /* ? */ '?' FROM t",
264+
"Question mark inside block comment is preserved"),
265+
266+
// ? inside double-quoted identifier must not be quoted
267+
Arguments.of(
268+
"SELECT \"col?\" FROM t WHERE id = ?",
269+
"SELECT \"col?\" FROM t WHERE id = '?'",
270+
"Question mark inside double-quoted identifier is preserved"),
271+
272+
// ? inside backtick identifier must not be quoted
273+
Arguments.of(
274+
"SELECT `col?` FROM t WHERE id = ?",
275+
"SELECT `col?` FROM t WHERE id = '?'",
276+
"Question mark inside backtick identifier is preserved"),
277+
278+
// Adjacent placeholders — both must be quoted independently
279+
Arguments.of(
280+
"SELECT ?,?,? FROM t",
281+
"SELECT '?','?','?' FROM t",
282+
"Adjacent placeholders are each quoted"));
181283
}
182284

183285
@ParameterizedTest(name = "{2}")

0 commit comments

Comments
 (0)