Skip to content

Commit 3e4f21c

Browse files
gopalldbclaude
andauthored
Fix Arrow field metadata not available for queries with 0 rows (databricks#1177)
### Problem When executing queries that return 0 rows (e.g., `WHERE 1=0`), complex types (ARRAY, MAP, STRUCT) showed only generic type names instead of detailed type information: **Before:** - `ARRAY` instead of `ARRAY<INT>` - `MAP` instead of `MAP<STRING,STRING>` - `STRUCT` instead of `STRUCT<field: TYPE>` **After:** - Detailed type information is correctly preserved for all row counts ### Root Cause In `AbstractArrowResultChunk.java`, Arrow field metadata was only extracted inside the `while(arrowStreamReader.loadNextBatch())` loop. For queries with 0 rows, no batches are loaded, so the loop never executes and metadata is never extracted. **Code location:** `/src/main/java/com/databricks/jdbc/api/impl/arrow/AbstractArrowResultChunk.java:338-359` ### Solution Extract metadata from `VectorSchemaRoot` immediately after obtaining it, **before** the `loadNextBatch()` loop. The Arrow IPC format always sends the schema message first (before any record batches), so field metadata is available even when there are 0 rows. `VectorSchemaRoot` contains field vectors with metadata regardless of row count. **Key changes:** 1. Moved metadata extraction from inside the while loop to before it 2. Added defensive null checks for `VectorSchemaRoot` and field vectors 3. Added debug logging to track metadata extraction ### Testing #### Unit Test Coverage - ✅ Added `testMetadataExtractionWithZeroRows()` to `ArrowResultChunkTest` - ✅ Verifies Arrow field metadata is extracted correctly with 0 rows - ✅ Tests complex types: `ARRAY<INT>`, `MAP<STRING,STRING>` - ✅ All 2,693 unit tests pass #### Manual Verification Tested with queries returning 0 rows: ```sql SELECT array_col, map_col, struct_col FROM table WHERE 1=0 Result: Metadata now correctly shows detailed type information Impact - Scope: Both SQL Exec API and Thrift Server (shared code path) - Risk: Low - backward compatible change, only affects metadata extraction timing - Benefits: - Fixes schema discovery for WHERE 1=0 pattern - Improves metadata availability for empty result sets - Aligns with Arrow IPC specification behavior Additional Context - Arrow IPC specification guarantees schema is sent before record batches - VectorSchemaRoot.getFieldVectors() is available immediately after ArrowStreamReader.getVectorSchemaRoot() - No performance impact: metadata extraction is now done once upfront instead of conditionally on first batch --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent 0e8b1ad commit 3e4f21c

3 files changed

Lines changed: 70 additions & 5 deletions

File tree

NEXT_CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@
77
### Updated
88

99
### Fixed
10+
- Fixed complex data type metadata support when retrieving 0 rows in Arrow format
1011

1112
---
1213
*Note: When making changes, please add your change under the appropriate section

src/main/java/com/databricks/jdbc/api/impl/arrow/AbstractArrowResultChunk.java

Lines changed: 15 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -337,13 +337,23 @@ private ArrowData getRecordBatchList(
337337
long rowCount = 0L;
338338
try (ArrowStreamReader arrowStreamReader = new ArrowStreamReader(inputStream, rootAllocator)) {
339339
VectorSchemaRoot vectorSchemaRoot = arrowStreamReader.getVectorSchemaRoot();
340-
boolean fetchedMetadata = false;
340+
341+
// Extract metadata from VectorSchemaRoot before loading any batches.
342+
// The Arrow IPC format sends the schema first (before any record batches),
343+
// so field metadata is available even when there are 0 rows.
344+
// VectorSchemaRoot will contain field vectors with metadata, but rowCount will be 0.
345+
if (vectorSchemaRoot != null && vectorSchemaRoot.getFieldVectors() != null) {
346+
metadata = getMetadataInformationFromSchemaRoot(vectorSchemaRoot);
347+
LOGGER.debug(
348+
"Extracted metadata from VectorSchemaRoot before loading batches. "
349+
+ "Schema has {} fields. Statement: {}, Chunk: {}",
350+
vectorSchemaRoot.getFieldVectors().size(),
351+
statementId,
352+
chunkIndex);
353+
}
354+
341355
while (arrowStreamReader.loadNextBatch()) {
342356
rowCount += vectorSchemaRoot.getRowCount();
343-
if (!fetchedMetadata) {
344-
metadata = getMetadataInformationFromSchemaRoot(vectorSchemaRoot);
345-
fetchedMetadata = true;
346-
}
347357
recordBatchList.add(getVectorsFromSchemaRoot(vectorSchemaRoot, rootAllocator));
348358
vectorSchemaRoot.clear();
349359
}

src/test/java/com/databricks/jdbc/api/impl/arrow/ArrowResultChunkTest.java

Lines changed: 54 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,9 @@
1616
import java.io.IOException;
1717
import java.util.ArrayList;
1818
import java.util.Collections;
19+
import java.util.HashMap;
1920
import java.util.List;
21+
import java.util.Map;
2022
import java.util.Random;
2123
import org.apache.arrow.memory.RootAllocator;
2224
import org.apache.arrow.vector.*;
@@ -261,4 +263,56 @@ public void testEmptyRecordBatches() throws DatabricksSQLException {
261263
10, iterator.getColumnObjectAtCurrentRow(0, ColumnInfoTypeName.INT, "INT", intColumnInfo));
262264
assertFalse(iterator.hasNextRow());
263265
}
266+
267+
@Test
268+
public void testMetadataExtractionWithZeroRows() throws Exception {
269+
// Arrange - Create schema with Arrow metadata
270+
// This test verifies that VectorSchemaRoot metadata is available even when there are 0 rows
271+
Map<String, String> metadata1 = new HashMap<>();
272+
metadata1.put("Spark:DataType:SqlName", "ARRAY<INT>");
273+
FieldType fieldType1 = new FieldType(false, Types.MinorType.INT.getType(), null, metadata1);
274+
275+
Map<String, String> metadata2 = new HashMap<>();
276+
metadata2.put("Spark:DataType:SqlName", "MAP<STRING,STRING>");
277+
FieldType fieldType2 = new FieldType(false, Types.MinorType.INT.getType(), null, metadata2);
278+
279+
List<Field> fieldList = new ArrayList<>();
280+
fieldList.add(new Field("col1", fieldType1, null));
281+
fieldList.add(new Field("col2", fieldType2, null));
282+
Schema schema = new Schema(fieldList);
283+
284+
// Create Arrow file with 0 rows
285+
Object[][] emptyData = new Object[2][0]; // 2 columns, 0 rows
286+
File arrowFile =
287+
createTestArrowFile(
288+
"TestZeroRowsMetadata", schema, emptyData, new RootAllocator(Integer.MAX_VALUE));
289+
290+
// Create chunk info for 0 rows
291+
BaseChunkInfo chunkInfo =
292+
new BaseChunkInfo().setChunkIndex(0L).setByteCount(200L).setRowOffset(0L).setRowCount(0L);
293+
294+
ArrowResultChunk arrowResultChunk =
295+
ArrowResultChunk.builder()
296+
.withStatementId(TEST_STATEMENT_ID)
297+
.withChunkInfo(chunkInfo)
298+
.withChunkStatus(ChunkStatus.PROCESSING_SUCCEEDED)
299+
.build();
300+
301+
// Act
302+
arrowResultChunk.initializeData(new FileInputStream(arrowFile));
303+
304+
// Assert - Metadata should be available even with 0 rows
305+
List<String> metadata = arrowResultChunk.getArrowMetadata();
306+
assertNotNull(metadata, "Metadata should not be null even with 0 rows");
307+
assertEquals(2, metadata.size(), "Should have metadata for 2 columns");
308+
assertEquals("ARRAY<INT>", metadata.get(0), "First column metadata should be ARRAY<INT>");
309+
assertEquals(
310+
"MAP<STRING,STRING>",
311+
metadata.get(1),
312+
"Second column metadata should be MAP<STRING,STRING>");
313+
314+
// Cleanup
315+
arrowResultChunk.releaseChunk();
316+
arrowFile.delete();
317+
}
264318
}

0 commit comments

Comments
 (0)