feat(csharp): implement SEA metadata using shared hiveserver2 utilities#256

Closed
msrathore-db wants to merge 10 commits into main from feat/sea-metadata
@msrathore-db (Collaborator)

Summary

Implements all metadata methods for Statement Execution API (SEA) by reusing shared utilities from hiveserver2 PR adbc-drivers/hiveserver2#20, addressing the core review feedback from PR #105: "let's do the refactor work first" and "reuse HiveServer2 code".

New SEA metadata methods:

  • GetObjects (all 4 depths): IGetObjectsDataProvider + BuildGetObjectsResult orchestrator
  • GetTableSchema: ColumnMetadataHelper.GetArrowType for shared type mapping
  • GetTableTypes: TABLE, VIEW
  • GetCatalogs/Schemas/Tables/Columns: via SHOW SQL commands + shared builders
  • GetPrimaryKeys: SHOW KEYS + MetadataSchemaFactory
  • GetCrossReference: SHOW FOREIGN KEYS + MetadataSchemaFactory

SQL command classes (one per command, per reviewer feedback):

  • MetadataCommandBase, ShowCatalogsCommand, ShowSchemasCommand, ShowTablesCommand, ShowColumnsCommand, ShowKeysCommand, ShowForeignKeysCommand

Existing code wired to shared utilities (-132 lines):

  • DatabricksStatement.EmptyPrimaryKeysResult/EmptyCrossReferenceResult → MetadataSchemaFactory
  • DatabricksStatement.ShouldReturnEmptyPkFkResult → MetadataUtilities
  • DescTableExtendedResult.DataType/DecimalDigits/ColumnSize/IsNumber → ColumnMetadataHelper

NOT needed (eliminated by shared utilities from hiveserver2):

  • DatabricksTypeMapper.cs (487 lines) → ColumnMetadataHelper
  • ColumnMetadataSchemas.cs (284 lines) → MetadataSchemaFactory + BuildFlatColumnsResult
  • StatementExecInfoArrowStream.cs → HiveInfoArrowStream
  • ColumnInfo.cs → TableInfo

Test plan

  • All 472 databricks unit tests pass
  • 61 Thrift E2E regression assertions pass (unchanged)
  • CompareMetadata Thrift vs SEA validation against live warehouse:

| Test | Match | Notes |
|---|---|---|
| GetCatalogs | 100% (14/14) | Perfect |
| GetPrimaryKeys | 100% (6/6) | Perfect |
| GetCrossReference | 100% (14/14) | Perfect |
| GetObjects Catalogs | 100% (2/2) | Perfect |
| GetObjects DbSchemas | 100% (2/2) | Perfect |
| GetObjects Tables | 100% (32/32) | Perfect |
| GetObjects All | 94.4% (1267/1342) | SEA computes xdbc values where Thrift returns null |
| GetSchemas | 59.8% (61/102) | Same 51 rows, ordering difference |
| GetColumns | 79.4% (381/480) | SEA computes COLUMN_SIZE/DECIMAL_DIGITS from type name |
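
To illustrate what "computes COLUMN_SIZE/DECIMAL_DIGITS from type name" can mean, here is a minimal, language-neutral sketch in Python. It is not the driver's C# code: the function name `size_and_digits` and the regular expression are hypothetical, and only the cases mentioned in this PR (DECIMAL precision/scale, STRING → int.MaxValue, BINARY → 0) are covered. Returning `None` for everything else is an assumption made to keep the example short.

```python
import re

INT32_MAX = 2_147_483_647  # value of C# int.MaxValue

# Base type name with optional "(precision[, scale])" arguments,
# e.g. "DECIMAL(10,2)", "decimal(38)", "STRING".
_TYPE_RE = re.compile(
    r"^\s*([A-Za-z_ ]+?)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?\s*$"
)


def size_and_digits(type_name):
    """Return (column_size, decimal_digits) derived from a SQL type name.

    Hypothetical helper: unknown or complex types yield (None, None).
    """
    m = _TYPE_RE.match(type_name)
    if m is None:
        return (None, None)
    base = m.group(1).upper()
    precision = int(m.group(2)) if m.group(2) else None
    scale = int(m.group(3)) if m.group(3) else None
    if base == "DECIMAL" and precision is not None:
        # DECIMAL(p, s): size is the precision, digits is the scale (0 if omitted).
        return (precision, scale if scale is not None else 0)
    if base == "STRING":
        return (INT32_MAX, None)
    if base == "BINARY":
        return (0, None)
    return (None, None)
```

A Thrift result that leaves these fields null for a `DECIMAL(10,2)` column would differ from a SEA result carrying `(10, 2)`, which is the kind of mismatch the table reports.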

Depends on: adbc-drivers/hiveserver2#20

🤖 Generated with Claude Code

msrathore-db and others added 4 commits February 27, 2026 10:37
…tTableTypes

Implement connection-level metadata for Statement Execution API using
shared utilities from hiveserver2 PR #20:

- StatementExecutionConnection implements IGetObjectsDataProvider
  (GetCatalogs, GetSchemas, GetTables, PopulateColumnInfo)
- GetObjects delegates to GetObjectsResultBuilder.BuildGetObjectsResult
- GetTableSchema uses ColumnMetadataHelper.GetArrowType for type mapping
- GetTableTypes returns TABLE, VIEW

SQL command classes (one per command per reviewer feedback):
- MetadataCommandBase (shared pattern conversion, identifier quoting)
- ShowCatalogsCommand, ShowSchemasCommand, ShowTablesCommand,
  ShowColumnsCommand, ShowKeysCommand, ShowForeignKeysCommand
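
The "shared pattern conversion, identifier quoting" role of MetadataCommandBase can be pictured with the following Python sketch. It is an illustration only, not the PR's C# implementation: the function names are hypothetical, the conversion rules ('%' to '*', '_' to '.', backslash as escape) are an assumption about the target SHOW ... LIKE syntax, and the generated SQL text is a guess at the general shape of such a command.

```python
def quote_identifier(name):
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def convert_pattern(pattern):
    """Translate a JDBC-style search pattern to a '*' / '.' wildcard pattern.

    Assumed rules: '%' -> '*', '_' -> '.', and a backslash keeps the
    following character literal.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(pattern[i + 1])  # escaped character, copied as-is
            i += 2
            continue
        if ch == "%":
            out.append("*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def show_schemas_sql(catalog, schema_pattern=None):
    """Build a SHOW SCHEMAS statement for one catalog (hypothetical shape)."""
    sql = "SHOW SCHEMAS IN " + quote_identifier(catalog)
    if schema_pattern is not None:
        if "'" in schema_pattern:
            # String-literal escaping is out of scope for this sketch.
            raise ValueError("single quotes in patterns are not handled here")
        sql += " LIKE '" + convert_pattern(schema_pattern) + "'"
    return sql
```

Keeping quoting and pattern conversion in one base class means each per-command class only has to assemble its own SQL text.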

MetadataUtilities shared between Thrift and SEA:
- NormalizeSparkCatalog, IsInvalidPKFKCatalog, ShouldReturnEmptyPKFKResult
- BuildQualifiedTableName

Validated against live Databricks warehouse:
- GetObjects Catalogs: 100% match with Thrift
- GetObjects DbSchemas: 100% match
- GetObjects Tables: 100% match
- GetObjects All: 94.4% match (SEA provides computed values where Thrift returns null)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement metadata command routing in StatementExecutionStatement:
- SetOption handling for ApacheParameters (catalog, schema, table, etc.)
- IsMetadataCommand flag and command routing
- GetCatalogs: SHOW CATALOGS → TABLE_CAT (100% match with Thrift)
- GetSchemas: SHOW SCHEMAS → TABLE_SCHEM, TABLE_CATALOG
- GetTables: SHOW TABLES → 5-column JDBC result
- GetColumns: SHOW COLUMNS → PopulateTableInfoFromTypeName + BuildFlatColumnsResult
- GetPrimaryKeys: SHOW KEYS with correct server column names (100% match)
- GetCrossReference: SHOW FOREIGN KEYS with correct column mapping (100% match)

Server column names for SHOW KEYS/FOREIGN KEYS:
  catalogName, namespace, tableName, col_name, keySeq, constraintName,
  constraintType, parentCatalogName, parentNamespace, parentTableName,
  parentColName, updateRule, deleteRule, deferrability
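
As a rough picture of "parse server response into tuples", the Python sketch below maps rows keyed by the server column names above onto JDBC-style primary key columns (TABLE_CAT, TABLE_SCHEM, TABLE_NAME, COLUMN_NAME, KEY_SEQ, PK_NAME). The pairing of each server column with a result column is an assumption, and `to_primary_key_rows` is a hypothetical name; the actual driver builds Arrow arrays in C#.

```python
# Assumed pairing: (result column, server column returned by SHOW KEYS).
PK_COLUMN_MAP = [
    ("TABLE_CAT", "catalogName"),
    ("TABLE_SCHEM", "namespace"),
    ("TABLE_NAME", "tableName"),
    ("COLUMN_NAME", "col_name"),
    ("KEY_SEQ", "keySeq"),
    ("PK_NAME", "constraintName"),
]


def to_primary_key_rows(server_rows):
    """Convert server rows (dicts) into tuples ordered like PK_COLUMN_MAP.

    Missing server columns become None rather than raising.
    """
    return [
        tuple(row.get(server_col) for _, server_col in PK_COLUMN_MAP)
        for row in server_rows
    ]
```

Tuples in a fixed column order can then be handed to a single shared builder instead of each method managing its own array builders.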

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e specified

When metadata commands (GetSchemas, GetTables, GetColumns) have no
catalog option set, fall back to the connection's default catalog
(from adbc.connection.catalog config). This matches Thrift behavior
where null catalog defaults to the connection catalog context.

Also fix TABLE_CATALOG in GetSchemas response to use EffectiveCatalog
instead of empty string when server doesn't return catalog column.

GetSchemas: Thrift=51, SEA=51 (same row count, ordering differences)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t to shared utilities

DatabricksStatement:
- EmptyPrimaryKeysResult → MetadataSchemaFactory.CreateEmptyPrimaryKeysResult()
- EmptyCrossReferenceResult → MetadataSchemaFactory.CreateEmptyCrossReferenceResult()
- ShouldReturnEmptyPkFkResult → MetadataUtilities.ShouldReturnEmptyPKFKResult()
  Eliminates 60+ lines of inline schema/array construction and catalog validation

DescTableExtendedResult:
- DataType → ColumnMetadataHelper.GetDataTypeCode (REAL→FLOAT compat preserved)
- IsNumber → ColumnMetadataHelper.GetNumPrecRadix != null
- DecimalDigits → prefer server Type.Scale, fallback to GetDecimalDigitsDefault
- ColumnSize → prefer server Type.Precision/Length, fallback to GetColumnSizeDefault
  Special cases preserved: STRING→MaxValue, BINARY→0, INTERVAL→StartUnit-based
  Eliminates 70+ lines of duplicated type mapping switches

All 472 unit tests pass. 61 Thrift E2E assertions pass unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…FK builders

DescTableExtendedResult:
- DecimalDigits → single line: ColumnMetadataHelper.GetDecimalDigitsDefault
- ColumnSize → single line: ColumnMetadataHelper.GetColumnSizeDefault
  (with INTERVAL StartUnit-only fallback for DESC TABLE EXTENDED)
- Remove all inline switch statements, delegate to shared helpers

Statement PK/FK:
- GetPrimaryKeys → parse server response into tuples, delegate to
  MetadataSchemaFactory.BuildPrimaryKeysResult
- GetCrossReference → parse server response into tuples, delegate to
  MetadataSchemaFactory.BuildCrossReferenceResult
- Eliminates 14 inline Arrow array builders per method

All 472 unit tests pass. PK/FK 100% match with Thrift confirmed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
msrathore-db and others added 2 commits February 27, 2026 19:35
…reuse shared GetArrowType

- Move ColumnMetadataHelper to databricks (SEA-only, not used by hiveserver2)
- Move BuildFlatColumnsResult to FlatColumnsResultBuilder in databricks
- Remove duplicate GetArrowType — SEA now calls HiveServer2Connection.GetArrowType
- Wire DatabricksStatement to shared MetadataSchemaFactory.CreateColumnMetadataSchema
- Fix BUFFER_LENGTH type from Int8 to Int32 in DatabricksStatement empty schema
- Add LONGVARBINARY to ColumnMetadataHelper type map (matches HiveServer2 ColumnTypeId)
- Move ColumnMetadataHelperTests to databricks (100 tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, fix metadata values

- Implement GetInfo for SEA using shared MetadataSchemaFactory.BuildGetInfoResult
- Implement GetColumnsExtended for SEA using DESC TABLE EXTENDED AS JSON,
  reusing DatabricksStatement.CreateExtendedColumnsResult (shared with Thrift)
- Wire DatabricksStatement to shared MetadataSchemaFactory for Catalogs/Schemas/Tables
- Add 5 missing columns to SEA GetTables schema (TYPE_CAT, TYPE_SCHEM, etc.)
- Fix REMARKS to use empty string (column comment) instead of type name
- Fix COLUMN_DEF to return null instead of empty string
- Fix ORDINAL_POSITION to be 0-based for flat GetColumns (matches Thrift)
- Fix BINARY column size from int.MaxValue to 0 (matches Thrift)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Wrap all 11 SEA metadata methods in TraceActivity with input/output tags
- Connection: GetObjects, GetInfo, GetTableTypes, GetTableSchema
- Statement: GetCatalogs, GetSchemas, GetTables, GetColumns,
  GetColumnsExtended, GetPrimaryKeys, GetCrossReference
- Tags include: sql_query, catalog/schema/table patterns, result counts
- Use shared MetadataSchemaFactory for Catalogs/Schemas schemas

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add cancellation token parameter to ExecuteMetadataSql
- Create CreateMetadataTimeoutToken using _waitTimeoutSeconds
- Thread timeout token through all IGetObjectsDataProvider methods
- Thread timeout token through all statement-level metadata methods
- Thread timeout token through GetTableSchema

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Wire NormalizeSparkCatalog into EffectiveCatalog property so "SPARK"
  catalog is converted to null and falls back to the connection default.
  Without this, SEA would query for literal catalog "SPARK" which
  doesn't exist on the server.
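
The catalog resolution described in these commits (normalize "SPARK" to null, then fall back to the connection default) can be sketched in Python as follows. This is illustrative only and not the driver's C# code: the function names are hypothetical, and treating the comparison as case-insensitive is an assumption.

```python
def normalize_spark_catalog(catalog):
    """Map the legacy "SPARK" catalog name to None (case-insensitivity assumed)."""
    if catalog is not None and catalog.upper() == "SPARK":
        return None
    return catalog


def effective_catalog(requested, connection_default):
    """Pick the catalog to query: the requested one, else the connection default."""
    normalized = normalize_spark_catalog(requested)
    return normalized if normalized is not None else connection_default
```

With this ordering, a request for "SPARK" and a request with no catalog both resolve to the connection's default catalog.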

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>