[ES-1804970] Fix CloudFetch returning stale column names from cached results #346

Open
sreekanth-db wants to merge 2 commits into databricks:main from sreekanth-db:fix/ES-1804970-cloudfetch-stale-column-names

@sreekanth-db
Collaborator

Summary

Fixes a bug where arrow.Record.Schema() returns stale column aliases when CloudFetch serves cached Arrow IPC files from a structurally identical prior query with different AS aliases.

  • Root cause: NewCloudBatchIterator did not receive the authoritative schema bytes from GetResultSetMetadata, unlike the local batch path, which already did. CloudFetch Arrow IPC files have column names baked in from the original query, and the driver read them as-is.
  • Fix: Pass arrowSchemaBytes (the authoritative schema from GetResultSetMetadata) into NewCloudBatchIterator. After records are deserialized from the IPC stream, replace the stale schema with the authoritative one using array.NewRecord(). This is zero-copy: the new record shares the underlying column data and only the metadata is swapped.
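
To illustrate the zero-copy idea behind the fix, here is a minimal sketch in plain Go. It deliberately does not use the Arrow library: `Schema`, `Record`, and `withSchema` are hypothetical stand-ins for `arrow.Schema`, `arrow.Record`, and the `array.NewRecord()` call, and returning an error on a field-count mismatch is an assumption rather than the PR's confirmed behavior.

```go
package main

import "fmt"

// Schema and Record are simplified, hypothetical stand-ins for arrow.Schema
// and arrow.Record. They are not the Arrow Go API.
type Schema struct {
	Names []string
}

type Record struct {
	Schema  *Schema
	Columns [][]int64 // each inner slice holds one column's values
}

// withSchema returns a record that reports the authoritative schema while
// reusing the same column slices as rec. No column values are copied.
func withSchema(rec Record, authoritative *Schema) (Record, error) {
	if authoritative == nil {
		// No override available: pass the record through unchanged.
		return rec, nil
	}
	// Guard against a mismatch instead of building an inconsistent record.
	// Returning an error here is an assumption made for this sketch.
	if len(authoritative.Names) != len(rec.Columns) {
		return Record{}, fmt.Errorf("schema has %d fields, record has %d columns",
			len(authoritative.Names), len(rec.Columns))
	}
	// Only the schema pointer changes; Columns still refers to the same data.
	return Record{Schema: authoritative, Columns: rec.Columns}, nil
}
```

Calling `withSchema` on a record whose schema names are `["id","name"]` with an authoritative schema of `["x","y"]` yields a record that reports the new names while its columns still point at the original values.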

Changes

  • arrowRecordIterator.go — Pass ri.arrowSchemaBytes to NewCloudBatchIterator in newBatchIterator()
  • arrowRows.go — Pass schemaBytes to NewCloudBatchIterator in NewArrowRowScanner()
  • batchloader.go — Core fix:
    • NewCloudBatchIterator accepts arrowSchemaBytes, parses into *arrow.Schema, stores on batchIterator
    • batchIterator.Next() applies override schema to CloudFetch records only (local path is untouched, overrideSchema is nil)
    • Added schemaFromIPCBytes() helper
    • Field count validation guard to prevent panics on schema mismatch
    • Schema parse failure logged at Warn level
  • batchloader_test.go — Added TestCloudFetchSchemaOverride with two subtests:
    • Verifies stale column names ["id","name"] are overridden to ["x","y"]
    • Verifies nil schema bytes pass through original names unchanged
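
The iterator behavior listed above can be sketched roughly as follows. This is a dependency-free model, not the driver's code: `batch`, `cloudIterator`, `newCloudIterator`, and `parseSchema` are hypothetical names, the comma-separated "schema bytes" merely stand in for a real Arrow IPC schema message, and both the fallback after a parse failure and the error on a field-count mismatch are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"log"
	"strings"
)

// batch is a hypothetical stand-in for a deserialized Arrow record.
type batch struct {
	names []string
	cols  [][]int64
}

// parseSchema stands in for the PR's schemaFromIPCBytes helper. Real schema
// bytes are an Arrow IPC message; a comma-separated list is used here only
// to keep the sketch free of dependencies.
func parseSchema(b []byte) ([]string, error) {
	if len(b) == 0 {
		return nil, errors.New("empty schema bytes")
	}
	return strings.Split(string(b), ","), nil
}

type cloudIterator struct {
	pending  []batch
	override []string // nil means records pass through unchanged
}

// newCloudIterator parses the schema bytes once, at construction time.
func newCloudIterator(pending []batch, schemaBytes []byte) *cloudIterator {
	it := &cloudIterator{pending: pending}
	if schemaBytes == nil {
		return it
	}
	names, err := parseSchema(schemaBytes)
	if err != nil {
		// Assumed fallback: warn and keep the names embedded in the file.
		log.Printf("WARN: could not parse schema bytes: %v", err)
		return it
	}
	it.override = names
	return it
}

// next returns the following batch, applying the override names if present.
func (it *cloudIterator) next() (batch, error) {
	if len(it.pending) == 0 {
		return batch{}, io.EOF
	}
	b := it.pending[0]
	it.pending = it.pending[1:]
	if it.override == nil {
		return b, nil
	}
	if len(it.override) != len(b.cols) {
		return batch{}, fmt.Errorf("override schema has %d fields, batch has %d columns",
			len(it.override), len(b.cols))
	}
	return batch{names: it.override, cols: b.cols}, nil
}
```

The two paths mirror the subtests in TestCloudFetchSchemaOverride: with schema bytes supplied, a batch named `["id","name"]` comes back as `["x","y"]`; with nil schema bytes, the original names are returned unchanged.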

Who is affected

Go driver users with CloudFetch enabled (WithCloudFetch(true)) who read arrow.Record.Schema() directly. Python, ODBC, and JDBC drivers are not affected.

Test plan

  • All existing unit tests pass (37 tests in internal/rows/arrowbased/)
  • New unit test TestCloudFetchSchemaOverride covers the override and no-override paths
  • Verified end-to-end against a real Databricks warehouse using samples.tpch.lineitem (~30M rows) with two queries differing only in column aliases — confirmed arrow.Record.Schema() now returns correct aliases

This pull request was AI-assisted by Isaac.

…results

When the server result cache serves Arrow IPC files from a prior query,
the embedded schema contains stale column aliases. The Go driver's
CloudFetch path read these stale names directly, while the local path
already used the authoritative schema from GetResultSetMetadata.

Pass the authoritative schema bytes into NewCloudBatchIterator and
replace stale column names on deserialized records using
array.NewRecord, which is zero-copy (shares underlying column data).

Co-authored-by: Isaac
Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
@sreekanth-db force-pushed the fix/ES-1804970-cloudfetch-stale-column-names branch from ce777da to 65a8750 on April 10, 2026 13:05
Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
@sreekanth-db requested a review from gopalldb on April 10, 2026 13:40