Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions NEXT_CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ upgrading. These changes do not affect metadata on All-Purpose Clusters.
- Fixed `?` characters inside SQL comments, string literals, and quoted identifiers being incorrectly counted as parameter placeholders when `supportManyParameters=1`. `SQLInterpolator` now uses `SqlCommentParser` to locate only real placeholders. Fixes #1331.
- Fixed `MetadataOperationTimeout` not being applied when metadata operations use SHOW commands. Operations like `getTables`, `getSchemas`, and `getColumns` now respect the `MetadataOperationTimeout` connection property instead of hanging indefinitely with no timeout.
- Reclassify transient server errors to standard SQL states (08S01, 40001) across all Thrift error sites. This ensures UC unavailability and concurrent modification errors surface consistently for better retry handling. Note: Dashboards and branching logic keyed on legacy XXUCC or 42000 must be updated.
- Fixed client-side enforcement of `maxRows` limit. When `statement.setMaxRows()` is set, `ResultSet.next()` now returns false once the row limit is reached, even if the server returns more rows. Applies to all result types (Thrift, SEA, inline, CloudFetch).

---
*Note: When making changes, please add your change under the appropriate section
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,16 @@ enum ResultSetType {
// for the lifetime of a result set.
private final TelemetryCollector cachedTelemetryCollector;

// Client-side maxRows enforcement. This central check in next() is the single
// enforcement point using the full long precision from getLargeMaxRows().
// No per-impl maxRows enforcement exists — all implementations delegate to this check.
private final long maxRowsLimit;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F5 — High] setMaxRows() called after executeQuery() is silently ignored 🟧

maxRowsLimit is final and captured at construction. PostgreSQL JDBC and major drivers read maxRows lazily on each next(). ORMs (Hibernate, JdbcTemplate) commonly call stmt.setMaxRows(N) after obtaining the ResultSet — and this PR cements snapshot-at-construction as canonical behavior.

Live-verified against PECO:

rs = stmt.executeQuery("SELECT * FROM range(10)");
stmt.setMaxRows(3);                  // called AFTER executeQuery
while (rs.next()) { ... }            // delivered 10 rows ❌  (Postgres: 3)

The pre-existing per-impl enforcement had the same flaw, but the PR was an opportunity to fix it for InlineJsonResult / ArrowStreamResult (the new paths).

Fix options (either is acceptable):

  1. Read parentStatement.getLargeMaxRows() lazily inside next() — one virtual call per row, negligible cost.
  2. Document the construction-time caching in the next() javadoc and tell users setMaxRows must be called before executeQuery.

code-review-squad (architecture + devils-advocate; live confirmed)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JDBC spec says setMaxRows affects the next execution, not an already-open ResultSet.

private long rowsReturned = 0;
private boolean truncatedByMaxRows = false; // tracks client-side truncation for cursor methods
// Flag to bypass maxRows check during getUpdateCount() internal iteration,
// which calls next() to sum affected-row counts for DML statements.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F7 — Medium] countingUpdateRows mutable-flag back-channel pattern 🟨

Using a mutable instance field to alter the semantics of next() (set in getUpdateCount(), read here) entangles the two methods and makes the contract of next() depend on hidden state. It works today only because JDBC ResultSet is single-threaded; fragile to refactors (e.g. return inside the DML loop, future internal-iteration paths, etc.).

Fix: Extract a private nextRaw() (or nextInternalNoLimit()) helper containing only executionResult.next() + telemetry. Public next() checks maxRows then calls nextRaw(). getUpdateCount() calls nextRaw() directly — the flag and try/finally go away. Removes ~10 lines and one field, with no behavior change.

code-review-squad (language + maintainability + architecture)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bigger change, can be done in followup PR

private boolean countingUpdateRows = false;

// Constructor for SEA result set
public DatabricksResultSet(
StatementStatus statementStatus,
Expand Down Expand Up @@ -121,6 +131,7 @@ public DatabricksResultSet(
this.updateCount = null;
this.parentStatement = parentStatement;
this.cachedTelemetryCollector = resolveTelemetryCollector(parentStatement);
this.maxRowsLimit = resolveMaxRowsLimit(parentStatement);
this.isClosed = false;
this.wasNull = false;
}
Expand All @@ -142,6 +153,7 @@ public DatabricksResultSet(
this.updateCount = null;
this.parentStatement = parentStatement;
this.cachedTelemetryCollector = resolveTelemetryCollector(parentStatement);
this.maxRowsLimit = resolveMaxRowsLimit(parentStatement);
this.isClosed = false;
this.wasNull = false;
this.complexDatatypeSupport = complexDatatypeSupport;
Expand Down Expand Up @@ -189,6 +201,7 @@ public DatabricksResultSet(
this.updateCount = null;
this.parentStatement = parentStatement;
this.cachedTelemetryCollector = resolveTelemetryCollector(parentStatement);
this.maxRowsLimit = resolveMaxRowsLimit(parentStatement);
this.isClosed = false;
this.wasNull = false;
}
Expand Down Expand Up @@ -220,6 +233,7 @@ public DatabricksResultSet(
this.updateCount = null;
this.parentStatement = null;
this.cachedTelemetryCollector = null;
this.maxRowsLimit = 0;
this.isClosed = false;
this.wasNull = false;
}
Expand Down Expand Up @@ -251,6 +265,7 @@ public DatabricksResultSet(
this.updateCount = null;
this.parentStatement = null;
this.cachedTelemetryCollector = null;
this.maxRowsLimit = 0;
this.isClosed = false;
this.wasNull = false;
}
Expand All @@ -271,14 +286,52 @@ public DatabricksResultSet(
this.updateCount = null;
this.parentStatement = null;
this.cachedTelemetryCollector = null;
this.maxRowsLimit = 0;
this.isClosed = false;
this.wasNull = false;
}

/**
* Advances the cursor to the next row of the result set. Returns {@code false} when no more rows
* are available or when the client-side {@code maxRows} limit has been reached.
*
* <p>{@code rowsReturned} tracks how many rows have been delivered to the caller through this
* ResultSet instance. It is only incremented during normal (non-DML-counting) iteration and
* assumes single-threaded access, which is the standard JDBC contract.
*
* @return {@code true} if the new current row is valid; {@code false} if there are no more rows
* @throws SQLException if the result set is closed or an error occurs
*/
@Override
public boolean next() throws SQLException {
checkIfClosed();
// Client-side maxRows truncation: stop before delegating to the underlying result
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F10 — Medium] "Defense-in-depth" comment is misleading 🟨

The comment claims per-implementation enforcement is retained as defense-in-depth. Verified by grep: InlineJsonResult and arrow/ArrowStreamResult contain zero maxRows references. Defense-in-depth presumes two independent layers that both work; one being known-absent isn't defense, it's the mechanism. The named classes (StreamingInlineArrowResult, StreamingColumnarResult) DO enforce it, but the list is also incomplete (LazyThriftInlineArrowResult is missing) and the two layers use divergent counting semantics (globalRowIndex + 1 >= maxRows int-typed vs rowsReturned >= maxRowsLimit long-typed).

Fix: Either (a) reword the comment to reflect reality — "central enforcement; some streaming impls also enforce locally", or (b) consider removing per-impl enforcement now that tier-1 covers all paths (after integration tests prove it). Single source of truth is cleaner.

code-review-squad (maintainability + architecture + devils-advocate)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

// implementation when the limit has been reached. This is skipped during
// getUpdateCount() internal iteration (countingUpdateRows) to avoid breaking DML
// row counting. This is the single maxRows enforcement point for all implementations.
if (maxRowsLimit > 0 && rowsReturned >= maxRowsLimit && !countingUpdateRows) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F3 — Critical] isAfterLast() / isLast() / getRow() violate JDBC contract after client-side truncation 🟥

After client-side truncation, executionResult.getCurrentRow() stays at the last delivered row and executionResult.hasNext() may still return true. So isAfterLast() (this line) returns false even though next() just returned false. JDBC requires isAfterLast()==true after the cursor passes the last row. Hibernate / MyBatis / Spring JdbcTemplate pagination patterns commonly use isAfterLast() to detect EOF and will misbehave.

Live-verified against PECO:

stmt.setMaxRows(3);
rs = stmt.executeQuery("SELECT * FROM range(10)");
while (rs.next()) { /* 3 rows */ }
rs.isAfterLast();   // false ❌ (JDBC spec: true)
rs.isLast();        // true
rs.getRow();        // 3

This is a new contract divergence for InlineJsonResult and ArrowStreamResult (the two impls this PR adds central truncation for).

Fix: Track truncation explicitly. Add private boolean truncated; set to true when the limit is first hit in next(). Update isAfterLast():

public boolean isAfterLast() throws SQLException {
  checkIfClosed();
  return truncated || executionResult.getCurrentRow() >= resultSetMetaData.getTotalRows();
}

Similarly update isLast(). Add tests covering all three.

code-review-squad (consensus: architecture + devils-advocate; live confirmed)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

LOGGER.debug(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F9 — Medium] Truncation DEBUG log lacks statementId — un-correlatable in fleet logs 🟨

In a connection-pooled environment with thousands of concurrent queries, this log line can't be tied back to a specific statement. The statementId field is already on the instance.

Fix:

LOGGER.debug("maxRows limit ({}) reached for statement {}; returning false from next() after {} rows",
    maxRowsLimit, statementId, rowsReturned);

code-review-squad (ops)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"maxRows limit ({}) reached for statement {}; returning false after {} rows",
maxRowsLimit,
statementId,
rowsReturned);
if (!truncatedByMaxRows) {
truncatedByMaxRows = true;
// Record telemetry for truncated queries so dashboards reflect the truncation
if (cachedTelemetryCollector != null) {
cachedTelemetryCollector.recordResultSetIteration(
statementId.toSQLExecStatementId(), resultSetMetaData.getChunkCount(), false);
}
}
return false;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F1 — Critical] Result-set consumption telemetry is silently lost on every truncated query 🟥

The early return false on the maxRows path skips the cachedTelemetryCollector.recordResultSetIteration(..., hasNext=false) call at L327-330. ResultLatency.markResultSetConsumption(false) (and therefore resultSetConsumptionLatencyMillis) only fires when hasNext==false is reported. For every truncated query the consumption-end signal is silently dropped.

Live-verified via mitmproxy: ran setMaxRows(3) on range(100) against a real PECO warehouse and captured the telemetry POSTs to /telemetry-ext. Of 8 telemetry payloads, 7 had "result_latency":{} (empty) — only 1 contained result_set_consumption_latency_millis. Dashboards keyed on this metric will undercount for the exact queries the PR was designed to fix.

Fix: Call cachedTelemetryCollector.recordResultSetIteration(statementId.toSQLExecStatementId(), resultSetMetaData.getChunkCount(), false) before the early return, e.g.:

if (maxRowsLimit > 0 && rowsReturned >= maxRowsLimit && !countingUpdateRows) {
  LOGGER.debug("maxRows limit ({}) reached for statement {}; returning false from next() after {} rows",
      maxRowsLimit, statementId, rowsReturned);
  if (cachedTelemetryCollector != null) {
    cachedTelemetryCollector.recordResultSetIteration(
        statementId.toSQLExecStatementId(), resultSetMetaData.getChunkCount(), false);
  }
  return false;
}

— from code-review-squad (multi-reviewer consensus: ops, architecture, devils-advocate)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

}
boolean hasNext = this.executionResult.next();
// Only count rows for customer iteration, not internal DML counting
// (getUpdateCount() sets countingUpdateRows=true to iterate over affected-row counts
// without inflating the user-visible row counter).
if (hasNext && !countingUpdateRows) {
rowsReturned++;
}
if (cachedTelemetryCollector != null) {
cachedTelemetryCollector.recordResultSetIteration(
statementId.toSQLExecStatementId(), resultSetMetaData.getChunkCount(), hasNext);
Expand Down Expand Up @@ -321,6 +374,20 @@ private static TelemetryCollector resolveTelemetryCollector(
return null;
}

private static long resolveMaxRowsLimit(IDatabricksStatementInternal parentStatement) {
try {
if (parentStatement != null) {
// Use getLargeMaxRows() to preserve full long precision.
// getMaxRows() returns int and silently truncates values > Integer.MAX_VALUE.
return parentStatement.getLargeMaxRows();
}
} catch (SQLException e) {
// Narrow to SQLException (the only checked exception from getLargeMaxRows).
LOGGER.warn("Error resolving maxRows limit: {}", e.getMessage());
}
return 0;
}

@Override
public boolean wasNull() throws SQLException {
checkIfClosed();
Expand Down Expand Up @@ -662,7 +729,9 @@ public boolean isBeforeFirst() throws SQLException {
@Override
public boolean isAfterLast() throws SQLException {
checkIfClosed();
return executionResult.getCurrentRow() >= resultSetMetaData.getTotalRows();
// Account for client-side maxRows truncation
return truncatedByMaxRows
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[F3 / HIGH] isLast() not updated to mirror this isAfterLast() fix — JDBC contract violation after truncation

This change correctly adds truncatedByMaxRows || to isAfterLast(). But isLast() at line 753-762 is unchanged. For streaming result types (LazyThriftResult, StreamingColumnarResult, LazyThriftInlineArrowResult, StreamingInlineArrowResult), isLast() returns:

executionResult.getCurrentRow() >= 0 && !executionResult.hasNext()

After client-side truncation, the underlying execution result still has more rows → executionResult.hasNext() returns trueisLast() returns false on the maxRows-th row, even though the next next() call will return false.

Per the JDBC spec, isLast() MUST return true on the last row next() will deliver. The fix to isAfterLast() here is therefore asymmetric and the JDBC contract is broken in the non-isAfterLast direction.

Recommendation: also add the truncation check to isLast():

if (maxRowsLimit > 0 && rowsReturned >= maxRowsLimit) {
  return true;  // cursor is on the last row we will deliver
}

…at the top of both branches of isLast(). This same condition is the canonical "next call to next() will return false" predicate already used in next() at line 315.

Flagged by 4 reviewers (architecture, language, maintainability, devils-advocate).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. Added maxRows truncation check in isLast():
```java
if (maxRowsLimit > 0 && rowsReturned >= maxRowsLimit) {
return true;
}
```
This mirrors the isAfterLast() fix so both cursor methods are consistent after client-side truncation.

|| executionResult.getCurrentRow() >= resultSetMetaData.getTotalRows();
}

@Override
Expand Down Expand Up @@ -691,6 +760,11 @@ public boolean isFirst() throws SQLException {
@Override
public boolean isLast() throws SQLException {
checkIfClosed();
// Account for client-side maxRows truncation: if the next next() call would
// hit the limit, this is the last row the caller will see.
if (maxRowsLimit > 0 && rowsReturned >= maxRowsLimit) {
return true;
}
if (executionResult instanceof LazyThriftResult
|| executionResult instanceof StreamingColumnarResult
|| executionResult instanceof LazyThriftInlineArrowResult
Expand Down Expand Up @@ -731,6 +805,10 @@ public boolean last() throws SQLException {
@Override
public int getRow() throws SQLException {
checkIfClosed();
// JDBC spec: getRow() returns 0 when cursor is not on a valid row (after last)
if (truncatedByMaxRows) {
return 0;
}
return (int) executionResult.getCurrentRow() + 1;
}

Expand Down Expand Up @@ -1958,8 +2036,13 @@ public long getUpdateCount() throws SQLException {
updateCount = 0L;
} else if (hasUpdateCount()) {
long rowsUpdated = 0;
while (next()) {
rowsUpdated += this.getLong(AFFECTED_ROWS_COUNT);
countingUpdateRows = true;
try {
while (next()) {
rowsUpdated += this.getLong(AFFECTED_ROWS_COUNT);
}
} finally {
countingUpdateRows = false;
}
updateCount = rowsUpdated;
} else {
Expand Down
16 changes: 0 additions & 16 deletions src/main/java/com/databricks/jdbc/api/impl/LazyThriftResult.java
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
package com.databricks.jdbc.api.impl;

import static com.databricks.jdbc.common.EnvironmentVariables.DEFAULT_RESULT_ROW_LIMIT;
import static com.databricks.jdbc.common.util.DatabricksThriftUtil.createColumnarView;

import com.databricks.jdbc.api.internal.IDatabricksSession;
Expand All @@ -21,7 +20,6 @@ public class LazyThriftResult implements IExecutionResult {
private long globalRowIndex;
private final IDatabricksSession session;
private final IDatabricksStatementInternal statement;
private final int maxRows;
private boolean hasReachedEnd;
private boolean isClosed;
private long totalRowsFetched;
Expand All @@ -42,7 +40,6 @@ public LazyThriftResult(
this.currentResponse = initialResponse;
this.statement = statement;
this.session = session;
this.maxRows = statement != null ? statement.getMaxRows() : DEFAULT_RESULT_ROW_LIMIT;
this.globalRowIndex = -1;
this.currentBatchIndex = -1;
this.hasReachedEnd = false;
Expand Down Expand Up @@ -115,13 +112,6 @@ public boolean next() throws SQLException {
return false;
}

// Check if we've reached the maxRows limit
boolean hasRowLimit = maxRows > 0;
if (hasRowLimit && globalRowIndex + 1 >= maxRows) {
hasReachedEnd = true;
return false;
}

// Move to next row in current batch
currentBatchIndex++;
globalRowIndex++;
Expand Down Expand Up @@ -163,12 +153,6 @@ public boolean hasNext() {
return false;
}

// Check maxRows limit
boolean hasRowLimit = maxRows > 0;
if (hasRowLimit && globalRowIndex + 1 >= maxRows) {
return false;
}

// Check if there are more rows in current batch
if (currentBatchIndex + 1 < currentBatch.getRowCount()) {
return true;
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
package com.databricks.jdbc.api.impl.arrow;

import static com.databricks.jdbc.common.EnvironmentVariables.DEFAULT_RESULT_ROW_LIMIT;
import static com.databricks.jdbc.common.util.ArrowUtil.createArrowByteStream;
import static com.databricks.jdbc.common.util.ArrowUtil.getColumnInfoList;
import static com.databricks.jdbc.common.util.ArrowUtil.getSerializedSchema;
Expand Down Expand Up @@ -35,7 +34,6 @@ public class LazyThriftInlineArrowResult implements IExecutionResult {
private long globalRowIndex;
private final IDatabricksSession session;
private final IDatabricksStatementInternal statement;
private final int maxRows;
private boolean hasReachedEnd;
private boolean isClosed;
private long totalRowsFetched;
Expand All @@ -58,7 +56,6 @@ public LazyThriftInlineArrowResult(
this.currentResponse = initialResponse;
this.statement = statement;
this.session = session;
this.maxRows = statement != null ? statement.getMaxRows() : DEFAULT_RESULT_ROW_LIMIT;
this.globalRowIndex = -1;
this.hasReachedEnd = false;
this.isClosed = false;
Expand Down Expand Up @@ -163,13 +160,6 @@ public boolean next() throws SQLException {
return false;
}

// Check if we've reached the maxRows limit
boolean hasRowLimit = maxRows > 0;
if (hasRowLimit && globalRowIndex + 1 >= maxRows) {
hasReachedEnd = true;
return false;
}

// Try to advance in current chunk
if (currentChunkIterator != null && currentChunkIterator.hasNextRow()) {
boolean advanced = currentChunkIterator.nextRow();
Expand Down Expand Up @@ -209,12 +199,6 @@ public boolean hasNext() {
return false;
}

// Check maxRows limit
boolean hasRowLimit = maxRows > 0;
if (hasRowLimit && globalRowIndex + 1 >= maxRows) {
return false;
}

// Check if there are more rows in current chunk
if (currentChunkIterator != null && currentChunkIterator.hasNextRow()) {
return true;
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
package com.databricks.jdbc.api.impl.arrow;

import static com.databricks.jdbc.common.EnvironmentVariables.DEFAULT_RESULT_ROW_LIMIT;
import static com.databricks.jdbc.common.EnvironmentVariables.DEFAULT_STREAMING_BATCH_TIMEOUT_SECONDS;
import static com.databricks.jdbc.common.util.ArrowUtil.getColumnInfoList;

Expand Down Expand Up @@ -54,7 +53,6 @@ public class StreamingInlineArrowResult implements IExecutionResult {

// Metadata
private List<ColumnInfo> columnInfos;
private final int maxRows;

// State
private boolean hasReachedEnd;
Expand All @@ -78,8 +76,6 @@ public StreamingInlineArrowResult(
throws DatabricksSQLException {

this.session = session;
this.maxRows = statement != null ? statement.getMaxRows() : DEFAULT_RESULT_ROW_LIMIT;

this.globalRowIndex = -1;
this.hasReachedEnd = false;
this.isClosed = false;
Expand All @@ -105,9 +101,8 @@ public StreamingInlineArrowResult(
}

LOGGER.debug(
"StreamingInlineArrowResult initialized - firstBatchRows={}, maxRows={}, maxBatchesInMemory={}",
"StreamingInlineArrowResult initialized - firstBatchRows={}, maxBatchesInMemory={}",
currentBatch != null ? currentBatch.getRowCount() : 0,
maxRows,
session.getConnectionContext().getThriftMaxBatchesInMemory());
}

Expand Down Expand Up @@ -184,13 +179,6 @@ public boolean next() throws DatabricksSQLException {
return false;
}

// Check maxRows limit
boolean hasRowLimit = maxRows > 0;
if (hasRowLimit && globalRowIndex + 1 >= maxRows) {
hasReachedEnd = true;
return false;
}

globalRowIndex++;

// Try to move to next row in current chunk
Expand Down Expand Up @@ -243,12 +231,6 @@ public boolean hasNext() {
return false;
}

// Check maxRows limit
boolean hasRowLimit = maxRows > 0;
if (hasRowLimit && globalRowIndex + 1 >= maxRows) {
return false;
}

// Check current chunk
if (currentChunkIterator != null && currentChunkIterator.hasNextRow()) {
return true;
Expand Down
Loading
Loading