feat(csharp): implement SEA metadata using shared hiveserver2 utilities #256
Closed
msrathore-db wants to merge 10 commits into
…tTableTypes

Implement connection-level metadata for Statement Execution API using shared utilities from hiveserver2 PR #20:
- StatementExecutionConnection implements IGetObjectsDataProvider (GetCatalogs, GetSchemas, GetTables, PopulateColumnInfo)
- GetObjects delegates to GetObjectsResultBuilder.BuildGetObjectsResult
- GetTableSchema uses ColumnMetadataHelper.GetArrowType for type mapping
- GetTableTypes returns TABLE, VIEW

SQL command classes (one per command per reviewer feedback):
- MetadataCommandBase (shared pattern conversion, identifier quoting)
- ShowCatalogsCommand, ShowSchemasCommand, ShowTablesCommand, ShowColumnsCommand, ShowKeysCommand, ShowForeignKeysCommand

MetadataUtilities shared between Thrift and SEA:
- NormalizeSparkCatalog, IsInvalidPKFKCatalog, ShouldReturnEmptyPKFKResult
- BuildQualifiedTableName

Validated against live Databricks warehouse:
- GetObjects Catalogs: 100% match with Thrift
- GetObjects DbSchemas: 100% match
- GetObjects Tables: 100% match
- GetObjects All: 94.4% match (SEA provides computed values where Thrift returns null)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
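The commit above mentions identifier quoting in MetadataCommandBase and a BuildQualifiedTableName utility without showing either. As a rough illustration only, the sketch below assumes Spark SQL style backtick quoting, where an embedded backtick is escaped by doubling it. The class and method names are hypothetical and are not taken from the PR; the real helpers may behave differently, for example in how they treat a missing schema.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical sketch; not the PR's MetadataCommandBase or MetadataUtilities code.
public static class IdentifierQuotingSketch
{
    // Wrap an identifier in backticks, doubling any embedded backtick.
    public static string QuoteIdentifier(string identifier)
    {
        if (identifier == null) throw new ArgumentNullException(nameof(identifier));
        return "`" + identifier.Replace("`", "``") + "`";
    }

    // Join the parts that are present into a dotted, quoted name.
    // Assumption: null or empty catalog/schema parts are simply skipped.
    public static string BuildQualifiedName(string? catalog, string? schema, string table)
    {
        var parts = new List<string>();
        if (!string.IsNullOrEmpty(catalog)) parts.Add(QuoteIdentifier(catalog!));
        if (!string.IsNullOrEmpty(schema)) parts.Add(QuoteIdentifier(schema!));
        parts.Add(QuoteIdentifier(table));
        return string.Join(".", parts);
    }
}
```

Quoting every part separately keeps names that contain dots or backticks from being split or from terminating the quoted identifier early when the name is embedded in a SHOW command.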
Implement metadata command routing in StatementExecutionStatement:
- SetOption handling for ApacheParameters (catalog, schema, table, etc.)
- IsMetadataCommand flag and command routing
- GetCatalogs: SHOW CATALOGS → TABLE_CAT (100% match with Thrift)
- GetSchemas: SHOW SCHEMAS → TABLE_SCHEM, TABLE_CATALOG
- GetTables: SHOW TABLES → 5-column JDBC result
- GetColumns: SHOW COLUMNS → PopulateTableInfoFromTypeName + BuildFlatColumnsResult
- GetPrimaryKeys: SHOW KEYS with correct server column names (100% match)
- GetCrossReference: SHOW FOREIGN KEYS with correct column mapping (100% match)

Server column names for SHOW KEYS/FOREIGN KEYS: catalogName, namespace, tableName, col_name, keySeq, constraintName, constraintType, parentCatalogName, parentNamespace, parentTableName, parentColName, updateRule, deleteRule, deferrability

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e specified

When metadata commands (GetSchemas, GetTables, GetColumns) have no catalog option set, fall back to the connection's default catalog (from adbc.connection.catalog config). This matches Thrift behavior, where a null catalog defaults to the connection catalog context.

Also fix TABLE_CATALOG in GetSchemas response to use EffectiveCatalog instead of empty string when the server doesn't return a catalog column.

GetSchemas: Thrift=51, SEA=51 (same row count, ordering differences)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t to shared utilities

DatabricksStatement:
- EmptyPrimaryKeysResult → MetadataSchemaFactory.CreateEmptyPrimaryKeysResult()
- EmptyCrossReferenceResult → MetadataSchemaFactory.CreateEmptyCrossReferenceResult()
- ShouldReturnEmptyPkFkResult → MetadataUtilities.ShouldReturnEmptyPKFKResult()
Eliminates 60+ lines of inline schema/array construction and catalog validation

DescTableExtendedResult:
- DataType → ColumnMetadataHelper.GetDataTypeCode (REAL→FLOAT compat preserved)
- IsNumber → ColumnMetadataHelper.GetNumPrecRadix != null
- DecimalDigits → prefer server Type.Scale, fallback to GetDecimalDigitsDefault
- ColumnSize → prefer server Type.Precision/Length, fallback to GetColumnSizeDefault
Special cases preserved: STRING→MaxValue, BINARY→0, INTERVAL→StartUnit-based
Eliminates 70+ lines of duplicated type mapping switches

All 472 unit tests pass. 61 Thrift E2E assertions pass unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…FK builders

DescTableExtendedResult:
- DecimalDigits → single line: ColumnMetadataHelper.GetDecimalDigitsDefault
- ColumnSize → single line: ColumnMetadataHelper.GetColumnSizeDefault (with INTERVAL StartUnit-only fallback for DESC TABLE EXTENDED)
- Remove all inline switch statements, delegate to shared helpers

Statement PK/FK:
- GetPrimaryKeys → parse server response into tuples, delegate to MetadataSchemaFactory.BuildPrimaryKeysResult
- GetCrossReference → parse server response into tuples, delegate to MetadataSchemaFactory.BuildCrossReferenceResult
- Eliminates 14 inline Arrow array builders per method

All 472 unit tests pass. PK/FK 100% match with Thrift confirmed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…reuse shared GetArrowType

- Move ColumnMetadataHelper to databricks (SEA-only, not used by hiveserver2)
- Move BuildFlatColumnsResult to FlatColumnsResultBuilder in databricks
- Remove duplicate GetArrowType; SEA now calls HiveServer2Connection.GetArrowType
- Wire DatabricksStatement to shared MetadataSchemaFactory.CreateColumnMetadataSchema
- Fix BUFFER_LENGTH type from Int8 to Int32 in DatabricksStatement empty schema
- Add LONGVARBINARY to ColumnMetadataHelper type map (matches HiveServer2 ColumnTypeId)
- Move ColumnMetadataHelperTests to databricks (100 tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, fix metadata values

- Implement GetInfo for SEA using shared MetadataSchemaFactory.BuildGetInfoResult
- Implement GetColumnsExtended for SEA using DESC TABLE EXTENDED AS JSON, reusing DatabricksStatement.CreateExtendedColumnsResult (shared with Thrift)
- Wire DatabricksStatement to shared MetadataSchemaFactory for Catalogs/Schemas/Tables
- Add 5 missing columns to SEA GetTables schema (TYPE_CAT, TYPE_SCHEM, etc.)
- Fix REMARKS to use empty string (column comment) instead of type name
- Fix COLUMN_DEF to return null instead of empty string
- Fix ORDINAL_POSITION to be 0-based for flat GetColumns (matches Thrift)
- Fix BINARY column size from int.MaxValue to 0 (matches Thrift)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Wrap all 11 SEA metadata methods in TraceActivity with input/output tags
  - Connection: GetObjects, GetInfo, GetTableTypes, GetTableSchema
  - Statement: GetCatalogs, GetSchemas, GetTables, GetColumns, GetColumnsExtended, GetPrimaryKeys, GetCrossReference
- Tags include: sql_query, catalog/schema/table patterns, result counts
- Use shared MetadataSchemaFactory for Catalogs/Schemas schemas

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add cancellation token parameter to ExecuteMetadataSql
- Create CreateMetadataTimeoutToken using _waitTimeoutSeconds
- Thread timeout token through all IGetObjectsDataProvider methods
- Thread timeout token through all statement-level metadata methods
- Thread timeout token through GetTableSchema

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
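For readers unfamiliar with the pattern, threading a timeout token through metadata calls might look roughly like the sketch below. It relies only on the standard CancellationTokenSource timeout constructor. The class, the method names, and the rule that a non-positive timeout means "no timeout" are assumptions made for illustration; they are not the PR's CreateMetadataTimeoutToken or ExecuteMetadataSql.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch; not the PR's implementation.
public static class MetadataTimeoutSketch
{
    // Returns a source that cancels itself after waitTimeoutSeconds.
    // Assumption: zero or a negative value disables the timeout.
    public static CancellationTokenSource CreateTimeoutSource(int waitTimeoutSeconds)
    {
        return waitTimeoutSeconds > 0
            ? new CancellationTokenSource(TimeSpan.FromSeconds(waitTimeoutSeconds))
            : new CancellationTokenSource();
    }

    // Stand-in for a metadata SQL call: it only waits briefly and echoes the SQL,
    // but it observes the token the same way a real request would.
    public static async Task<string> ExecuteMetadataSqlAsync(
        string sql, CancellationToken cancellationToken)
    {
        await Task.Delay(TimeSpan.FromMilliseconds(10), cancellationToken);
        return sql;
    }
}
```

A caller would create one source per metadata operation, pass source.Token to every nested call, and dispose the source when the operation finishes, so a single deadline covers the whole operation instead of restarting for each SQL command.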
- Wire NormalizeSparkCatalog into EffectiveCatalog property so "SPARK" catalog is converted to null and falls back to the connection default. Without this, SEA would query for literal catalog "SPARK" which doesn't exist on the server.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
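The fallback described in this commit (and in the earlier default-catalog commit) can be pictured with a small sketch. The function bodies are guesses at the described behavior, not the PR's code: the case-insensitive comparison and the treatment of an empty string as "not set" are both assumptions, and the class name is hypothetical.

```csharp
#nullable enable
using System;

// Hypothetical sketch of the described fallback; not the PR's implementation.
public static class EffectiveCatalogSketch
{
    // Map the legacy "SPARK" placeholder to null so it is never sent to the server.
    // Assumption: the comparison ignores case.
    public static string? NormalizeSparkCatalog(string? catalog)
    {
        return string.Equals(catalog, "SPARK", StringComparison.OrdinalIgnoreCase)
            ? null
            : catalog;
    }

    // Use the requested catalog when one is set; otherwise use the connection default.
    // Assumption: an empty string counts as "not set".
    public static string? ResolveEffectiveCatalog(string? requested, string? connectionDefault)
    {
        string? normalized = NormalizeSparkCatalog(requested);
        return string.IsNullOrEmpty(normalized) ? connectionDefault : normalized;
    }
}
```

Under these assumptions, ResolveEffectiveCatalog("SPARK", "main") and ResolveEffectiveCatalog(null, "main") both yield "main", while an explicit catalog such as "samples" is passed through unchanged.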
Summary
Implements all metadata methods for Statement Execution API (SEA) by reusing shared utilities from hiveserver2 PR adbc-drivers/hiveserver2#20, addressing the core review feedback from PR #105: "let's do the refactor work first" and "reuse HiveServer2 code".
New SEA metadata methods:
- IGetObjectsDataProvider + BuildGetObjectsResult orchestrator
- ColumnMetadataHelper.GetArrowType for shared type mapping
- SHOW KEYS + MetadataSchemaFactory
- SHOW FOREIGN KEYS + MetadataSchemaFactory

SQL command classes (one per command, per reviewer feedback):
- MetadataCommandBase, ShowCatalogsCommand, ShowSchemasCommand, ShowTablesCommand, ShowColumnsCommand, ShowKeysCommand, ShowForeignKeysCommand

Existing code wired to shared utilities (-132 lines):
- DatabricksStatement.EmptyPrimaryKeysResult/EmptyCrossReferenceResult → MetadataSchemaFactory
- DatabricksStatement.ShouldReturnEmptyPkFkResult → MetadataUtilities
- DescTableExtendedResult.DataType/DecimalDigits/ColumnSize/IsNumber → ColumnMetadataHelper

NOT needed (eliminated by shared utilities from hiveserver2):
- DatabricksTypeMapper.cs (487 lines) → ColumnMetadataHelper
- ColumnMetadataSchemas.cs (284 lines) → MetadataSchemaFactory + BuildFlatColumnsResult
- StatementExecInfoArrowStream.cs → HiveInfoArrowStream
- ColumnInfo.cs → TableInfo

Test plan
Depends on: adbc-drivers/hiveserver2#20
🤖 Generated with Claude Code