feat(csharp): extract reusable metadata utilities from HiveServer2Connection #20
Draft
msrathore-db wants to merge 16 commits into
…nection

Extract common metadata computation and result-building code from HiveServer2Connection into standalone reusable classes, so future consumers can share this logic without duplicating it.

Part A - Column Metadata Helper:
- Add ColumnMetadataHelper static utility class with functions for computing XDBC metadata field values from type name strings (GetDataTypeCode, GetBaseTypeName, GetColumnSizeDefault, GetDecimalDigitsDefault, GetBufferLength, GetNumPrecRadix, GetCharOctetLength)
- Add 100 unit tests for ColumnMetadataHelper

Part B - GetObjects Result Builder Extraction:
- Extract GetDbSchemas, GetTableSchemas, GetColumnSchema and catalog assembly into GetObjectsResultBuilder static class
- Promote TableInfo from nested struct to standalone file
- Promote HiveInfoArrowStream from nested class to standalone file
- Add IGetObjectsDataProvider interface for data-fetching contract
- Centralize metadata column name constants into MetadataColumnNames

All existing behavior is preserved. No correctness regressions were found in E2E Thrift metadata tests run against a live Databricks warehouse.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
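The following sketch shows how XDBC fields can be computed from nothing but a type name string. It derives a base type name and a java.sql.Types-style DATA_TYPE code. The class name, the mapping table, and the fallback to OTHER (1111) are all assumptions made for illustration. The PR's actual ColumnMetadataHelper signatures and mappings are not shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; not the PR's actual ColumnMetadataHelper.
public static class ColumnMetadataSketch
{
    // java.sql.Types codes for an assumed subset of Databricks SQL type names.
    private static readonly Dictionary<string, short> s_dataTypeCodeMap =
        new Dictionary<string, short>(StringComparer.OrdinalIgnoreCase)
        {
            ["BOOLEAN"] = 16,   // Types.BOOLEAN
            ["TINYINT"] = -6,   // Types.TINYINT
            ["SMALLINT"] = 5,   // Types.SMALLINT
            ["INT"] = 4,        // Types.INTEGER
            ["INTEGER"] = 4,    // Types.INTEGER
            ["BIGINT"] = -5,    // Types.BIGINT
            ["FLOAT"] = 6,      // Types.FLOAT
            ["DOUBLE"] = 8,     // Types.DOUBLE
            ["DECIMAL"] = 3,    // Types.DECIMAL
            ["CHAR"] = 1,       // Types.CHAR
            ["VARCHAR"] = 12,   // Types.VARCHAR
            ["STRING"] = 12,    // Types.VARCHAR
            ["BINARY"] = -2,    // Types.BINARY
            ["DATE"] = 91,      // Types.DATE
            ["TIMESTAMP"] = 93, // Types.TIMESTAMP
            ["ARRAY"] = 2003,   // Types.ARRAY
            ["STRUCT"] = 2002,  // Types.STRUCT
        };

    // Strips parameters such as "(10,2)" or "<INT>" and upper-cases the rest.
    public static string GetBaseTypeName(string typeName)
    {
        string trimmed = typeName.Trim();
        int cut = trimmed.IndexOfAny(new[] { '(', '<' });
        string baseName = cut >= 0 ? trimmed.Substring(0, cut) : trimmed;
        return baseName.Trim().ToUpperInvariant();
    }

    // Returns the java.sql.Types code, or 1111 (Types.OTHER) for unknown names.
    public static short GetDataTypeCode(string typeName) =>
        s_dataTypeCodeMap.TryGetValue(GetBaseTypeName(typeName), out short code) ? code : (short)1111;
}
```

With this sketch, "decimal(10,2)" yields the base name DECIMAL and the code 3, while an unrecognized name such as GEOGRAPHY falls back to 1111.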
…ts orchestrator

- Use ColumnMetadataHelper.GetBaseTypeName in the SetPrecisionScaleAndTypeName methods of SparkConnection and HiveServer2ExtendedConnection, replacing direct SqlTypeNameParser calls for base type name extraction
- Use ColumnMetadataHelper.GetColumnSizeDefault and GetDecimalDigitsDefault in SparkConnection for DECIMAL/CHAR/VARCHAR precision and scale
- HiveServer2Connection now implements IGetObjectsDataProvider with Thrift-based data fetching (GetCatalogs, GetSchemas, GetTables, PopulateColumnInfo)
- Add a BuildFromProvider orchestrator in GetObjectsResultBuilder that takes an IGetObjectsDataProvider and builds the complete GetObjects result, so each consumer no longer needs its own catalog/schema/table/column assembly loop
- Simplify HiveServer2Connection.GetObjects to delegate to BuildFromProvider(this, ...)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
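The provider/orchestrator split can be sketched as follows. A protocol supplies only the data-fetching calls, and one shared loop assembles the catalog → schema → table → column tree up to the requested depth. The interface shape, the depth enum, and the nested-dictionary result are hypothetical simplifications. The real builder produces Arrow output, and its signatures may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using CatalogMap = System.Collections.Generic.Dictionary<string,
    System.Collections.Generic.Dictionary<string,
        System.Collections.Generic.Dictionary<string, System.Collections.Generic.List<string>>>>;

// Hypothetical depth levels, modeled on ADBC GetObjects depth.
public enum ObjectsDepth { Catalogs, DbSchemas, Tables, All }

// Hypothetical data-fetching contract; the PR's IGetObjectsDataProvider may differ.
public interface IGetObjectsDataProviderSketch
{
    IReadOnlyList<string> GetCatalogs();
    IReadOnlyList<string> GetSchemas(string catalog);
    IReadOnlyList<string> GetTables(string catalog, string schema);
    // Mutates catalogMap in place: fills the column list of every table already present.
    void PopulateColumnInfo(CatalogMap catalogMap);
}

public static class GetObjectsOrchestratorSketch
{
    // One shared assembly loop; each protocol only supplies the provider.
    public static CatalogMap BuildFromProvider(IGetObjectsDataProviderSketch provider, ObjectsDepth depth)
    {
        var catalogMap = new CatalogMap();
        foreach (string catalog in provider.GetCatalogs())
        {
            var schemas = new Dictionary<string, Dictionary<string, List<string>>>();
            catalogMap[catalog] = schemas;
            if (depth == ObjectsDepth.Catalogs) continue;
            foreach (string schema in provider.GetSchemas(catalog))
            {
                var tables = new Dictionary<string, List<string>>();
                schemas[schema] = tables;
                if (depth == ObjectsDepth.DbSchemas) continue;
                foreach (string table in provider.GetTables(catalog, schema))
                {
                    tables[table] = new List<string>();
                }
            }
        }
        if (depth == ObjectsDepth.All)
        {
            provider.PopulateColumnInfo(catalogMap);
        }
        return catalogMap;
    }
}

// In-memory provider used to exercise the orchestrator without a server.
public sealed class InMemoryProviderSketch : IGetObjectsDataProviderSketch
{
    private readonly CatalogMap _data = new CatalogMap
    {
        ["main"] = new Dictionary<string, Dictionary<string, List<string>>>
        {
            ["default"] = new Dictionary<string, List<string>>
            {
                ["orders"] = new List<string> { "id", "amount" },
            },
        },
    };

    public IReadOnlyList<string> GetCatalogs() => _data.Keys.ToList();
    public IReadOnlyList<string> GetSchemas(string catalog) => _data[catalog].Keys.ToList();
    public IReadOnlyList<string> GetTables(string catalog, string schema) => _data[catalog][schema].Keys.ToList();

    public void PopulateColumnInfo(CatalogMap catalogMap)
    {
        foreach (var catalog in catalogMap)
            foreach (var schema in catalog.Value)
                foreach (var table in schema.Value)
                    table.Value.AddRange(_data[catalog.Key][schema.Key][table.Key]);
    }
}
```

In this design, column population happens in a single bulk call after the tree is built. That lets a provider fetch all columns with one request instead of one request per table.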
…pen ProcessRelationshipDataSafe

- Add MetadataSchemaFactory with CreatePrimaryKeysSchema (6 columns), CreateCrossReferenceSchema (14 columns), and corresponding empty-result factory methods for reuse across protocol implementations
- Extract ColumnsResultEnhancer from HiveServer2Statement.EnhanceGetColumnsResult. It computes BASE_TYPE_NAME and corrected precision/scale for flat GetColumns results using a protocol-agnostic delegate pattern.
- Make ProcessRelationshipDataSafe internal for reuse by consumers implementing GetColumnsExtended with PK/FK alignment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
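The "protocol-agnostic delegate pattern" can be sketched as follows. The enhancer never touches a Thrift or SEA result directly. Instead, it asks a caller-supplied Func<int, string> for the TYPE_NAME of row i and derives the base type name, precision, and scale from that string. All names here are hypothetical. The DECIMAL(10,0) default for a bare DECIMAL is an assumption based on Spark's default decimal type.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical result row; not the PR's ColumnsResultEnhancer output type.
public sealed class EnhancedColumn
{
    public EnhancedColumn(string baseTypeName, int? precision, int? scale)
    {
        BaseTypeName = baseTypeName;
        Precision = precision;
        Scale = scale;
    }

    public string BaseTypeName { get; }
    public int? Precision { get; }
    public int? Scale { get; }
}

public static class ColumnsResultEnhancerSketch
{
    // Captures the base name and up to two numeric parameters, e.g. "DECIMAL(12,3)".
    private static readonly Regex s_typePattern = new Regex(
        @"^\s*([A-Za-z_]+)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?", RegexOptions.Compiled);

    // typeNameAt is the only protocol-specific part: it reads TYPE_NAME for row i.
    public static List<EnhancedColumn> Enhance(int rowCount, Func<int, string> typeNameAt)
    {
        var enhanced = new List<EnhancedColumn>(rowCount);
        for (int i = 0; i < rowCount; i++)
        {
            string typeName = typeNameAt(i);
            Match m = s_typePattern.Match(typeName);
            string baseName = m.Success ? m.Groups[1].Value.ToUpperInvariant() : typeName;
            int? p = m.Groups[2].Success ? int.Parse(m.Groups[2].Value) : (int?)null;
            int? s = m.Groups[3].Success ? int.Parse(m.Groups[3].Value) : (int?)null;

            if (baseName == "DECIMAL")
            {
                // Assumed default: a bare DECIMAL is treated as DECIMAL(10,0).
                enhanced.Add(new EnhancedColumn(baseName, p ?? 10, s ?? 0));
            }
            else if (baseName == "CHAR" || baseName == "VARCHAR")
            {
                enhanced.Add(new EnhancedColumn(baseName, p, null));
            }
            else
            {
                enhanced.Add(new EnhancedColumn(baseName, null, null));
            }
        }
        return enhanced;
    }
}
```

A Thrift caller would pass a lambda that indexes its TYPE_NAME string array, and an SEA caller would pass one over its own result rows. The parsing logic stays shared.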
…ds in BuildColumns

- Add PopulateTableInfoFromTypeName to ColumnMetadataHelper. It populates all TableInfo fields from just (columnName, typeName, ordinalPosition), using the existing helper functions for type code, base type, column size, and decimal digits. This lets SEA populate TableInfo without calling 7+ individual helper functions per column.
- Add GetSqlDatetimeSub to ColumnMetadataHelper (DATE→1, TIMESTAMP→3)
- Wire GetNumPrecRadix, GetCharOctetLength, and GetSqlDatetimeSub into GetObjectsResultBuilder.BuildColumns instead of hard-coded nulls, making GetObjects results more complete for both protocols

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
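The following sketch shows the one-call-per-column population described above. A single method derives every per-column field from (columnName, typeName, ordinalPosition) by composing small helpers, including the DATE→1 / TIMESTAMP→3 mapping. That mapping matches the ODBC SQL_CODE_DATE and SQL_CODE_TIMESTAMP subcodes. The TableInfoSketch type and the fixed column sizes are hypothetical stand-ins, not the PR's TableInfo or its defaults.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical TableInfo stand-in: one list entry per column, appended in order.
public sealed class TableInfoSketch
{
    public List<string> ColumnName { get; } = new List<string>();
    public List<string> TypeName { get; } = new List<string>();
    public List<string> BaseTypeName { get; } = new List<string>();
    public List<int?> ColumnSize { get; } = new List<int?>();
    public List<short?> SqlDatetimeSub { get; } = new List<short?>();
    public List<int> OrdinalPosition { get; } = new List<int>();
}

public static class PopulateSketch
{
    // Assumed fixed sizes for a few types (illustrative values only).
    private static readonly Dictionary<string, int> s_fixedColumnSizes = new Dictionary<string, int>
    {
        ["TINYINT"] = 3, ["SMALLINT"] = 5, ["INT"] = 10, ["BIGINT"] = 19, ["DATE"] = 10,
    };

    // ODBC SQL_DATETIME_SUB codes: SQL_CODE_DATE = 1, SQL_CODE_TIMESTAMP = 3.
    public static short? GetSqlDatetimeSub(string baseTypeName)
    {
        switch (baseTypeName)
        {
            case "DATE": return (short)1;
            case "TIMESTAMP": return (short)3;
            default: return null;
        }
    }

    // Appends one column to info using only its name, type string and position.
    public static void PopulateTableInfoFromTypeName(
        TableInfoSketch info, string columnName, string typeName, int ordinalPosition)
    {
        string baseName = GetBaseTypeName(typeName);
        info.ColumnName.Add(columnName);
        info.TypeName.Add(typeName);
        info.BaseTypeName.Add(baseName);
        info.ColumnSize.Add(GetColumnSizeDefault(baseName, typeName));
        info.SqlDatetimeSub.Add(GetSqlDatetimeSub(baseName));
        info.OrdinalPosition.Add(ordinalPosition);
    }

    private static string GetBaseTypeName(string typeName)
    {
        string trimmed = typeName.Trim();
        int cut = trimmed.IndexOfAny(new[] { '(', '<' });
        return (cut >= 0 ? trimmed.Substring(0, cut) : trimmed).Trim().ToUpperInvariant();
    }

    private static int? GetColumnSizeDefault(string baseName, string typeName)
    {
        if (s_fixedColumnSizes.TryGetValue(baseName, out int size))
        {
            return size;
        }
        // Only DECIMAL(p,s), CHAR(n) and VARCHAR(n) take their size from a parameter.
        if (baseName != "DECIMAL" && baseName != "CHAR" && baseName != "VARCHAR")
        {
            return null;
        }
        int open = typeName.IndexOf('(');
        if (open < 0)
        {
            return baseName == "DECIMAL" ? 10 : (int?)null; // assumed bare-DECIMAL default
        }
        string first = typeName.Substring(open + 1).TrimEnd(')', ' ').Split(',')[0].Trim();
        return int.TryParse(first, out int parsed) ? parsed : (int?)null;
    }
}
```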
…s results

Add BuildFlatColumnsResult to GetObjectsResultBuilder. It takes a list of (catalog, schema, table, TableInfo) tuples and produces the standard JDBC 24+1 column flat RecordBatch (TABLE_CAT through BASE_TYPE_NAME). This enables SEA to build flat GetColumns results by:
1. Calling PopulateTableInfoFromTypeName per column (from SHOW COLUMNS)
2. Calling BuildFlatColumnsResult with the populated TableInfo

SEA no longer has to construct 24 Arrow array builders and the full schema definition by hand. The shared builder handles both, using values already computed by ColumnMetadataHelper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
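The flattening step can be sketched without Arrow. Each (catalog, schema, table, TableInfo) tuple contributes one row per column, written column by column, which is the same shape an Arrow RecordBatch builder fills. The sketch covers only an assumed subset of the output columns. The FlatTableInfo type, the 1-based ordinal, and the dictionary-of-lists output are hypothetical simplifications.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal TableInfo stand-in: parallel per-column lists.
public sealed class FlatTableInfo
{
    public List<string> ColumnName { get; } = new List<string>();
    public List<string> TypeName { get; } = new List<string>();
    public List<string> BaseTypeName { get; } = new List<string>();
}

public static class FlatColumnsSketch
{
    // Assumed subset of the flat GetColumns columns, in output order.
    private static readonly string[] s_columns =
    {
        "TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME", "COLUMN_NAME",
        "TYPE_NAME", "ORDINAL_POSITION", "BASE_TYPE_NAME",
    };

    // Flattens (catalog, schema, table, info) tuples into one column-oriented result.
    public static Dictionary<string, List<object>> BuildFlatColumnsResult(
        IEnumerable<(string Catalog, string Schema, string Table, FlatTableInfo Info)> tables)
    {
        var result = new Dictionary<string, List<object>>();
        foreach (string name in s_columns)
        {
            result[name] = new List<object>();
        }

        foreach (var (catalog, schema, table, info) in tables)
        {
            for (int i = 0; i < info.ColumnName.Count; i++)
            {
                result["TABLE_CAT"].Add(catalog);
                result["TABLE_SCHEM"].Add(schema);
                result["TABLE_NAME"].Add(table);
                result["COLUMN_NAME"].Add(info.ColumnName[i]);
                result["TYPE_NAME"].Add(info.TypeName[i]);
                result["ORDINAL_POSITION"].Add(i + 1); // assumed 1-based, JDBC style
                result["BASE_TYPE_NAME"].Add(info.BaseTypeName[i]);
            }
        }
        return result;
    }
}
```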
…aHelper

Refactor ColumnMetadataHelper to delegate base type name extraction to the existing SqlTypeNameParser instead of a custom regex + dictionary. SqlTypeNameParser already handles all type formats with regex-based parsing and result caching. No type hint is needed because it tries every parser.

- GetBaseTypeName now calls SqlTypeNameParser.TryParse first
- Remove the custom s_parameterSuffix regex and NormalizeTypeName
- Key s_dataTypeCodeMap by canonical base type names only
- Use canonical base names in s_numericTypes/s_charTypes
- Keep only 3 Databricks-specific aliases (BYTE, SHORT, LONG) as a fallback. These are not in SqlTypeNameParser because Thrift never encounters them (only DESC TABLE EXTENDED and SHOW COLUMNS return them). All other aliases (INT, DEC, TIMESTAMP_NTZ/LTZ) are already handled by SqlTypeNameParser.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
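The following sketch shows the parser-first, alias-fallback lookup order. Because SqlTypeNameParser's real signature is not shown in this PR, the sketch substitutes a hypothetical stand-in (TryParseStandIn) that recognizes a few canonical names and strips parameters. The stand-in is followed by the three Databricks-only aliases. Returning the upper-cased input for unknown names is an assumed policy.

```csharp
using System;
using System.Collections.Generic;

public static class BaseTypeNameSketch
{
    // Hypothetical stand-in for SqlTypeNameParser: knows only a few canonical names.
    private static readonly HashSet<string> s_canonicalNames = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "BOOLEAN", "TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE",
        "DECIMAL", "STRING", "VARCHAR", "CHAR", "DATE", "TIMESTAMP", "ARRAY", "MAP", "STRUCT",
    };

    // Databricks-specific aliases that the parser does not recognize.
    private static readonly Dictionary<string, string> s_aliasToBaseType =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["BYTE"] = "TINYINT",
            ["SHORT"] = "SMALLINT",
            ["LONG"] = "BIGINT",
        };

    private static bool TryParseStandIn(string typeName, out string baseTypeName)
    {
        string trimmed = typeName.Trim();
        int cut = trimmed.IndexOfAny(new[] { '(', '<' });
        string candidate = (cut >= 0 ? trimmed.Substring(0, cut) : trimmed).Trim().ToUpperInvariant();
        bool known = s_canonicalNames.Contains(candidate);
        baseTypeName = known ? candidate : null;
        return known;
    }

    public static string GetBaseTypeName(string typeName)
    {
        // 1. Parser first: handles canonical and parameterized names.
        if (TryParseStandIn(typeName, out string parsed))
        {
            return parsed;
        }
        // 2. Alias fallback for names that only some commands return.
        string key = typeName.Trim();
        if (s_aliasToBaseType.TryGetValue(key, out string aliased))
        {
            return aliased;
        }
        // 3. Unknown: return the upper-cased input (assumed policy).
        return key.ToUpperInvariant();
    }
}
```

Consulting the parser before the alias table keeps the alias table small. It only needs the names the parser cannot handle.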
…FromProvider

- Remove verbose comment on s_aliasToBaseType
- Rename BuildFromProvider to BuildGetObjectsResult for clarity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e→Arrow mapping

Add GetArrowType(string typeName), which maps a SQL type name string to its corresponding Apache Arrow IArrowType. It extracts the logic of HiveServer2Connection.GetArrowType (which takes an int columnTypeId) into a type-name-based version that both Thrift and SEA can use for GetTableSchema without duplicating the type-mapping switch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move the 24-column JDBC GetColumns schema definition into MetadataSchemaFactory.CreateColumnMetadataSchema() so it is defined once and reusable by both protocols. BuildFlatColumnsResult now uses CreateColumnMetadataSchema() instead of an inline schema definition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
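The following sketch shows the define-once idea for the GetColumns layout. The column list lives in one static array, and a factory method hands out a fresh copy so no caller can mutate the shared definition. The names below follow the JDBC DatabaseMetaData.getColumns specification (24 columns). The PR's actual names may differ in detail (for example, a Hive-style IS_AUTO_INCREMENT). The includeBaseTypeName parameter is a hypothetical way to model the extra BASE_TYPE_NAME column. The real factory returns an Arrow Schema rather than a list of names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: one place that defines the flat GetColumns column layout.
public static class ColumnSchemaSketch
{
    // Column names as listed by JDBC DatabaseMetaData.getColumns (24 columns).
    private static readonly string[] s_jdbcColumns =
    {
        "TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME", "COLUMN_NAME", "DATA_TYPE",
        "TYPE_NAME", "COLUMN_SIZE", "BUFFER_LENGTH", "DECIMAL_DIGITS", "NUM_PREC_RADIX",
        "NULLABLE", "REMARKS", "COLUMN_DEF", "SQL_DATA_TYPE", "SQL_DATETIME_SUB",
        "CHAR_OCTET_LENGTH", "ORDINAL_POSITION", "IS_NULLABLE", "SCOPE_CATALOG", "SCOPE_SCHEMA",
        "SCOPE_TABLE", "SOURCE_DATA_TYPE", "IS_AUTOINCREMENT", "IS_GENERATEDCOLUMN",
    };

    // Returns a fresh list so callers cannot mutate the shared definition.
    public static List<string> CreateColumnMetadataSchema(bool includeBaseTypeName)
    {
        var columns = new List<string>(s_jdbcColumns);
        if (includeBaseTypeName)
        {
            columns.Add("BASE_TYPE_NAME"); // extension column appended after the JDBC set
        }
        return columns;
    }
}
```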
…o ColumnMetadataHelper

Restore SparkConnection and HiveServer2ExtendedConnection to their original SqlTypeNameParser-based implementations. The Thrift server provides all values, so there is no need to compute them with ColumnMetadataHelper.

Restore the BuildColumns XDBC fields (numPrecRadix, datetimeSub, charOctetLength) to null, matching the original Thrift GetObjects behavior.

ColumnMetadataHelper is for SEA and for consumers that only have type name strings. Thrift code uses server-provided values plus SqlTypeNameParser.

Verified: all Thrift flat GetColumns values match exactly before and after the change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…connection changes

- Make HiveServer2Connection.GetArrowType internal static (shared with SEA)
- Add backward-compatible constant aliases in HiveServer2Connection
- Revert the Nullable->NullableColumn rename in MetadataColumnNames
- Revert changes to ImpalaConnection, SparkConnection, and HiveServer2ExtendedConnection
- Remove ColumnMetadataHelper (moving to the databricks repo, SEA-only)
- Remove BuildFlatColumnsResult from GetObjectsResultBuilder (SEA-only)
- Remove ColumnMetadataHelperTests (moving to the databricks repo)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
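The following sketch shows one way to implement backward-compatible constant aliases. The old type keeps its constant names, but each one forwards to the single shared definition. Both class names and the choice of const fields are assumptions made for illustration. The PR's actual aliases may be declared differently.

```csharp
using System;

// Hypothetical central constants class.
public static class MetadataColumnNamesSketch
{
    public const string TableCat = "TABLE_CAT";
    public const string ColumnName = "COLUMN_NAME";
    public const string TypeName = "TYPE_NAME";
}

// Hypothetical legacy type: existing references keep compiling,
// while the values come from the shared definition.
public class LegacyConnectionSketch
{
    internal const string TableCat = MetadataColumnNamesSketch.TableCat;
    internal const string ColumnName = MetadataColumnNamesSketch.ColumnName;
    internal const string TypeName = MetadataColumnNamesSketch.TypeName;
}
```

C# const values are copied into the calling code at compile time, so a change to a shared value only reaches other assemblies after they are recompiled. For internal constants used within one assembly, this does not matter.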
…dataSchemaFactory

- Add CreateCatalogsSchema, CreateSchemasSchema, and CreateTablesSchema with corresponding CreateEmpty*Result helpers
- Add BuildGetInfoResult, a shared DenseUnionArray builder for GetInfo that accepts a dictionary mapping info codes to values
- Refactor HiveServer2Connection.GetInfo to use BuildGetInfoResult, replacing ~150 lines of inline DenseUnionArray building

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
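The following sketch shows what a dictionary-driven GetInfo builder has to produce, using plain lists instead of Apache Arrow. In a dense union, each slot records a type id and an offset into the child array for that type. The type ids 0 (string), 1 (bool), and 2 (int64) follow the ADBC GetInfo info_value union as we understand it. Only those three members are handled. The class and method shapes are hypothetical, not the PR's BuildGetInfoResult.

```csharp
using System;
using System.Collections.Generic;

public sealed class DenseUnionInfoSketch
{
    public List<uint> InfoCodes { get; } = new List<uint>();
    public List<sbyte> TypeIds { get; } = new List<sbyte>(); // which child each slot uses
    public List<int> Offsets { get; } = new List<int>();     // index within that child
    public List<string> Strings { get; } = new List<string>(); // child 0
    public List<bool> Bools { get; } = new List<bool>();       // child 1
    public List<long> Int64s { get; } = new List<long>();      // child 2

    public static DenseUnionInfoSketch BuildGetInfoResult(IReadOnlyDictionary<uint, object> infoValues)
    {
        var r = new DenseUnionInfoSketch();
        foreach (KeyValuePair<uint, object> entry in infoValues)
        {
            r.InfoCodes.Add(entry.Key);
            switch (entry.Value)
            {
                case string s:
                    r.TypeIds.Add(0); r.Offsets.Add(r.Strings.Count); r.Strings.Add(s);
                    break;
                case bool b:
                    r.TypeIds.Add(1); r.Offsets.Add(r.Bools.Count); r.Bools.Add(b);
                    break;
                case long l:
                    r.TypeIds.Add(2); r.Offsets.Add(r.Int64s.Count); r.Int64s.Add(l);
                    break;
                default:
                    throw new ArgumentException($"Unsupported info value type for code {entry.Key}");
            }
        }
        return r;
    }
}
```

Because offsets index into the per-type child arrays, each child holds only values of its own type. This is what distinguishes a dense union from a sparse one, where every child has one slot per row.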
- Fix GetInfo lazy evaluation: only resolve VendorName/VendorVersion when they are requested, avoiding unnecessary forcing of Lazy properties
- Restore the Schema.Validate call in BuildGetInfoResult for safety
- Add XML documentation to GetArrowType noting that it can throw NotImplementedException
- Add XML documentation to IGetObjectsDataProvider.PopulateColumnInfo explaining the catalogMap mutation contract

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
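The following sketch shows the lazy-evaluation fix. Vendor values sit behind Lazy<string>, and the collection loop touches .Value only for codes that were actually requested. A request for the driver name therefore never triggers the vendor lookups. The info codes 0 (vendor name), 1 (vendor version), and 100 (driver name) follow the ADBC specification as we understand it. The class, the fetch delegates, and the driver-name constant are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public sealed class LazyInfoSketch
{
    // ADBC info codes (assumed): VendorName = 0, VendorVersion = 1, DriverName = 100.
    private const uint VendorNameCode = 0;
    private const uint VendorVersionCode = 1;
    private const uint DriverNameCode = 100;

    private readonly Lazy<string> _vendorName;
    private readonly Lazy<string> _vendorVersion;

    // The fetch delegates stand in for potentially expensive server round trips.
    public LazyInfoSketch(Func<string> fetchVendorName, Func<string> fetchVendorVersion)
    {
        _vendorName = new Lazy<string>(fetchVendorName);
        _vendorVersion = new Lazy<string>(fetchVendorVersion);
    }

    public bool VendorInfoResolved => _vendorName.IsValueCreated || _vendorVersion.IsValueCreated;

    // Resolves only the requested codes; unknown codes are skipped.
    public Dictionary<uint, object> CollectInfo(IEnumerable<uint> requestedCodes)
    {
        var values = new Dictionary<uint, object>();
        foreach (uint code in requestedCodes)
        {
            switch (code)
            {
                case VendorNameCode:
                    values[code] = _vendorName.Value; // forces the lookup only here
                    break;
                case VendorVersionCode:
                    values[code] = _vendorVersion.Value;
                    break;
                case DriverNameCode:
                    values[code] = "sketch-driver"; // hypothetical constant
                    break;
            }
        }
        return values;
    }
}
```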
Collaborator:
@msrathore-db - What is the status of this PR? Is it a duplicate of the other changes?
Summary
- Extract common metadata computation and result-building code from HiveServer2Connection into standalone reusable classes
- Add ColumnMetadataHelper static utility class with functions for computing XDBC metadata field values from type name strings (type code mapping, base type extraction, column size, decimal digits, buffer length, etc.)
- Extract GetObjectsResultBuilder with BuildResult, BuildDbSchemas, BuildTables, and BuildColumns from private static methods
- Promote TableInfo and HiveInfoArrowStream from nested types to standalone files
- Add IGetObjectsDataProvider interface defining the data-fetching contract for GetObjects
- Centralize metadata column name constants into MetadataColumnNames
- Add unit tests for ColumnMetadataHelper

Test plan
🤖 Generated with Claude Code