Commit a8f94f3

Use async wait timeout for SEA when direct results disabled (fix hybrid-path truncation) (#1476)
## Description

When direct results are disabled on the SEA (SQL Execution API) path, the driver built the `ExecuteStatement` request with `waitTimeout=10s` (`SYNC_TIMEOUT_VALUE`) and `onWaitTimeout=CONTINUE`, which is the old SEA *hybrid* direct-results path. For results that span **multiple chunks but compress small** (highly-compressible payloads), the server returns only the **first chunk inline** with **no external links** (`hasInlineAttachment=true, numExternalLinks=0`). The driver's inline path returns just that first chunk and never fetches the rest, so the result is **silently truncated**.

This change sets `waitTimeout=0s` (`ASYNC_TIMEOUT_VALUE`) when direct results are disabled, avoiding the hybrid inline path. The server then delivers results via **external links**, which the driver downloads in full.

Resulting contract (SEA):

- `DirectResults = 0` → `WaitTimeout = 0` (async)
- `DirectResults = 1` → `WaitTimeout` unset (true direct results)

Thrift is unaffected: it has no `waitTimeout` and paginates correctly.

Related: **ES-1714092** (server-side compressed-vs-uncompressed byte limit on the hybrid path). This driver change stops using that path; the server-side fix is tracked separately.

## Requirement / Motivation

Reported via Slack: SEA with direct results disabled silently truncated a large, highly-compressible result (200MB logical → ~0.8MB compressed) to a handful of rows, while Thrift returned all rows.
## Testing done

**Repro (manual, against a serverless SQL warehouse):** query `SELECT repeat('A', 1024 * 1024) AS payload FROM range(200)` with direct results disabled:

| | Before | After |
|---|---|---|
| SEA | 20 / 200 rows, `isCloudFetchUsed=false` (inline) ❌ | 200 / 200 rows, `isCloudFetchUsed=true` (external links) ✅ |
| Thrift | 200 / 200 ✅ | 200 / 200 ✅ |

Additional disambiguation runs (repeating the real table via `UNION ALL` up to 64×, 9.6M rows, 40s execution) confirmed the trigger is the **inline-vs-external-links delivery (compressed size)**, not query time or polling: a fast no-poll query truncated while a slow 6-poll query did not.

**Regression: SEA unit + fakeservice suites (all pass, 0 failures):**

- `DatabricksSdkClientTest` (42)
- `SqlExecApiHybridResultsIntegrationTests` (2)
- `DatabricksMetadataQueryClientTest` (57)
- `CommandBuilderTest` (21)
- `DatabricksEmptyMetadataClientTest` (11)
- `SeaCircuitBreakerManagerTest` (13)

## Notes / caveats

- **Latency:** with direct results disabled, queries now always execute → poll → fetch (one extra round-trip for small results) instead of returning inline on the first response.
- The underlying server-side bug (ES-1714092) is separate; this change avoids the affected path rather than fixing the server.

---------

Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
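
To make the failure mode concrete, here is a minimal, self-contained simulation of the two delivery paths described in the commit message. It does not call the driver or the SQL Execution API. `TruncationSketch`, `buildChunks`, `readInlineFirstChunkOnly`, and `readAllChunks` are hypothetical names, and the 20-rows-per-chunk size is an assumption chosen only so the numbers line up with the 20 / 200 rows observed in the repro.

```java
import java.util.ArrayList;
import java.util.List;

public class TruncationSketch {

  /** Hypothetical stand-in for a chunked result: each inner list is one chunk of rows. */
  public static List<List<String>> buildChunks(int totalRows, int rowsPerChunk) {
    if (rowsPerChunk <= 0) {
      throw new IllegalArgumentException("rowsPerChunk must be positive");
    }
    List<List<String>> chunks = new ArrayList<>();
    for (int start = 0; start < totalRows; start += rowsPerChunk) {
      List<String> chunk = new ArrayList<>();
      int end = Math.min(start + rowsPerChunk, totalRows);
      for (int row = start; row < end; row++) {
        chunk.add("row-" + row);
      }
      chunks.add(chunk);
    }
    return chunks;
  }

  /** Models the hybrid inline path: only the first chunk is returned, the rest is never fetched. */
  public static List<String> readInlineFirstChunkOnly(List<List<String>> chunks) {
    List<String> rows = new ArrayList<>();
    if (!chunks.isEmpty()) {
      rows.addAll(chunks.get(0));
    }
    return rows;
  }

  /** Models the external-links path: every chunk is fetched and concatenated. */
  public static List<String> readAllChunks(List<List<String>> chunks) {
    List<String> rows = new ArrayList<>();
    for (List<String> chunk : chunks) {
      rows.addAll(chunk);
    }
    return rows;
  }

  public static void main(String[] args) {
    // Assumption: 200 rows split into chunks of 20 rows each.
    List<List<String>> chunks = buildChunks(200, 20);
    System.out.println(readInlineFirstChunkOnly(chunks).size()); // prints 20
    System.out.println(readAllChunks(chunks).size()); // prints 200
  }
}
```

The point of the sketch is that neither reader reports an error: the inline reader simply returns fewer rows, which is why the truncation was silent.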
1 parent 41d90d4 commit a8f94f3

3 files changed

Lines changed: 61 additions & 2 deletions

NEXT_CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@
 ### Fixed
 - Fixed `setCatalog()` and `setSchema()` producing invalid SQL (e.g. `SET CATALOG ``name``) when the catalog or schema name was passed already wrapped in backticks. Backticks are now stripped before wrapping, and `getCatalog()`/`getSchema()` return the bare identifier name.
 - Fixed metadata SQL generation for catalog, schema, and table identifiers containing backticks.
+- Fixed SEA result truncation when direct results are disabled. Large, highly-compressible results that span multiple chunks were delivered inline via the old hybrid path and truncated to the first chunk. The SQL Execution path now uses an async (`0s`) wait timeout when direct results are disabled, so results are returned via external links and fetched in full.

 ---
 *Note: When making changes, please add your change under the appropriate section

src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClient.java

Lines changed: 2 additions & 2 deletions
@@ -740,9 +740,9 @@ private ExecuteStatementRequest getRequest(
     if (executeAsync) {
       request.setWaitTimeout(ASYNC_TIMEOUT_VALUE);
     } else {
-      // Only set timeout if direct results mode is not enabled
+      // DirectResults off -> async (0s); avoids truncation (ES-1714092)
       if (!connectionContext.getDirectResultMode()) {
-        request.setWaitTimeout(SYNC_TIMEOUT_VALUE);
+        request.setWaitTimeout(ASYNC_TIMEOUT_VALUE);
       }
       request.setOnWaitTimeout(ExecuteStatementRequestOnWaitTimeout.CONTINUE);
     }
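
After this change, the wait-timeout branch in `getRequest` reduces to a small decision table. The following standalone sketch restates it under some assumptions: `WaitTimeoutSketch` and `waitTimeoutFor` are hypothetical names that do not exist in the driver, the `"0s"` literal for `ASYNC_TIMEOUT_VALUE` is taken from the assertion in the accompanying unit test, and an empty `Optional` stands in for "wait timeout left unset on the request".

```java
import java.util.Optional;

public class WaitTimeoutSketch {

  // Assumed literal; the commit's unit test asserts "0s" for the async case.
  static final String ASYNC_TIMEOUT_VALUE = "0s";

  /**
   * Hypothetical helper mirroring the branch in the diff:
   * async execution or direct results disabled -> "0s"; otherwise unset.
   */
  public static Optional<String> waitTimeoutFor(boolean executeAsync, boolean directResultMode) {
    if (executeAsync || !directResultMode) {
      return Optional.of(ASYNC_TIMEOUT_VALUE);
    }
    // Direct results enabled on a synchronous execute: leave the timeout unset.
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(waitTimeoutFor(false, false)); // prints Optional[0s]
    System.out.println(waitTimeoutFor(false, true)); // prints Optional.empty
    System.out.println(waitTimeoutFor(true, true)); // prints Optional[0s]
  }
}
```

The sketch covers only the timeout value. In the real method the non-async branch also sets `onWaitTimeout` to `CONTINUE`, which this change leaves as it was.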

src/test/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClientTest.java

Lines changed: 58 additions & 0 deletions
@@ -1402,4 +1402,62 @@ public void testCheckStatementAlive_exceptionWrapped() throws Exception {
             () -> databricksSdkClient.checkStatementAlive(STATEMENT_ID));
     assertTrue(exception.getMessage().contains("Heartbeat status check failed"));
   }
+
+  @Test
+  public void testWaitTimeout_directResultsDisabled_usesAsyncZero() throws Exception {
+    setupClientMocks(true, false);
+    // EnableDirectResults=0 -> getDirectResultMode() is false
+    IDatabricksConnectionContext connectionContext =
+        DatabricksConnectionContext.parse(JDBC_URL + "EnableDirectResults=0", new Properties());
+    DatabricksSdkClient databricksSdkClient =
+        new DatabricksSdkClient(connectionContext, statementExecutionService, apiClient);
+    DatabricksConnection connection =
+        new DatabricksConnection(connectionContext, databricksSdkClient);
+    connection.open();
+    DatabricksStatement statement = new DatabricksStatement(connection);
+
+    databricksSdkClient.executeStatement(
+        STATEMENT,
+        warehouse,
+        sqlParams,
+        StatementType.QUERY,
+        connection.getSession(),
+        statement,
+        null);
+
+    ArgumentCaptor<ExecuteStatementRequest> captor =
+        ArgumentCaptor.forClass(ExecuteStatementRequest.class);
+    verify(apiClient, atLeastOnce()).serialize(captor.capture());
+    // Direct results disabled -> async (0s), not the hybrid 10s path that truncates (ES-1714092).
+    assertEquals("0s", captor.getValue().getWaitTimeout());
+  }
+
+  @Test
+  public void testWaitTimeout_directResultsEnabled_leftUnset() throws Exception {
+    setupClientMocks(true, false);
+    // Default JDBC_URL has direct results enabled -> getDirectResultMode() is true
+    IDatabricksConnectionContext connectionContext =
+        DatabricksConnectionContext.parse(JDBC_URL, new Properties());
+    DatabricksSdkClient databricksSdkClient =
+        new DatabricksSdkClient(connectionContext, statementExecutionService, apiClient);
+    DatabricksConnection connection =
+        new DatabricksConnection(connectionContext, databricksSdkClient);
+    connection.open();
+    DatabricksStatement statement = new DatabricksStatement(connection);
+
+    databricksSdkClient.executeStatement(
+        STATEMENT,
+        warehouse,
+        sqlParams,
+        StatementType.QUERY,
+        connection.getSession(),
+        statement,
+        null);
+
+    ArgumentCaptor<ExecuteStatementRequest> captor =
+        ArgumentCaptor.forClass(ExecuteStatementRequest.class);
+    verify(apiClient, atLeastOnce()).serialize(captor.capture());
+    // Direct results enabled -> WaitTimeout left unset (true SEA direct results).
+    assertNull(captor.getValue().getWaitTimeout());
+  }
 }
