feat(csharp): extract reusable metadata utilities from HiveServer2Connection #20

Draft
msrathore-db wants to merge 16 commits into adbc-drivers:main from msrathore-db:feat/extract-metadata-utilities
@msrathore-db
Contributor

Summary

  • Extract common metadata computation and result-building code from HiveServer2Connection into standalone reusable classes
  • Add ColumnMetadataHelper static utility class with functions for computing XDBC metadata field values from type name strings (type code mapping, base type extraction, column size, decimal digits, buffer length, etc.)
  • Extract GetObjectsResultBuilder with BuildResult, BuildDbSchemas, BuildTables, BuildColumns from private static methods
  • Promote TableInfo and HiveInfoArrowStream from nested types to standalone files
  • Add IGetObjectsDataProvider interface defining data-fetching contract for GetObjects
  • Centralize metadata column name constants into MetadataColumnNames
  • Add 100 unit tests for ColumnMetadataHelper

Test plan

  • All 100 ColumnMetadataHelper unit tests pass
  • All 591 SqlTypeNameParser tests pass (unchanged)
  • hiveserver2 project builds with 0 errors, 0 warnings on all 3 target frameworks
  • databricks project builds with 0 errors, 0 warnings
  • All 472 databricks unit tests pass
  • E2E Thrift metadata regression test (61 assertions) passes against live Databricks warehouse:
    • GetCatalogs, GetSchemas, GetTables, GetColumns, GetPrimaryKeys, GetCrossReference
    • GetObjects at all 4 depths (Catalogs, DbSchemas, Tables, All)
  • Zero correctness regression — all metadata values identical before and after refactoring

🤖 Generated with Claude Code

…nection

Extract common metadata computation and result-building code from
HiveServer2Connection into standalone reusable classes, enabling
future consumers to share this logic without duplicating it.

Part A - Column Metadata Helper:
- Add ColumnMetadataHelper static utility class with functions for
  computing XDBC metadata field values from type name strings
  (GetDataTypeCode, GetBaseTypeName, GetColumnSizeDefault,
  GetDecimalDigitsDefault, GetBufferLength, GetNumPrecRadix,
  GetCharOctetLength)
- Add 100 unit tests for ColumnMetadataHelper

Part B - GetObjects Result Builder Extraction:
- Extract GetDbSchemas, GetTableSchemas, GetColumnSchema and catalog
  assembly into GetObjectsResultBuilder static class
- Promote TableInfo from nested struct to standalone file
- Promote HiveInfoArrowStream from nested class to standalone file
- Add IGetObjectsDataProvider interface for data-fetching contract
- Centralize metadata column name constants into MetadataColumnNames

All existing behavior preserved. Zero correctness regression verified
via E2E Thrift metadata tests against live Databricks warehouse.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@msrathore-db msrathore-db marked this pull request as draft February 18, 2026 20:14
msrathore-db and others added 7 commits February 19, 2026 03:28
…ts orchestrator

- Use ColumnMetadataHelper.GetBaseTypeName in SparkConnection and
  HiveServer2ExtendedConnection SetPrecisionScaleAndTypeName, replacing
  direct SqlTypeNameParser calls for base type name extraction
- Use ColumnMetadataHelper.GetColumnSizeDefault and GetDecimalDigitsDefault
  in SparkConnection for DECIMAL/CHAR/VARCHAR precision and scale
- HiveServer2Connection now implements IGetObjectsDataProvider with
  Thrift-based data fetching (GetCatalogs, GetSchemas, GetTables,
  PopulateColumnInfo)
- Add BuildFromProvider orchestrator in GetObjectsResultBuilder that
  takes an IGetObjectsDataProvider and builds the complete GetObjects
  result, eliminating the need for each consumer to write its own
  catalog/schema/table/column assembly loop
- Simplify HiveServer2Connection.GetObjects to delegate to
  BuildFromProvider(this, ...)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
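
The orchestrator idea described in this commit can be illustrated with a minimal sketch. It is not the code from this PR: the real builder produces Arrow results and the real interface exposes PopulateColumnInfo with a catalogMap mutation contract. Here `IDataProviderSketch`, `DepthSketch`, `InMemoryProvider`, and the `GetColumns` method are hypothetical simplifications that use plain collections, assuming the four depths named in the test plan (Catalogs, DbSchemas, Tables, All).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the four GetObjects depths mentioned in the PR.
public enum DepthSketch { Catalogs, DbSchemas, Tables, All }

// Hypothetical, simplified data-fetching contract (not the PR's IGetObjectsDataProvider).
public interface IDataProviderSketch
{
    IReadOnlyList<string> GetCatalogs();
    IReadOnlyList<string> GetSchemas(string catalog);
    IReadOnlyList<string> GetTables(string catalog, string schema);
    IReadOnlyList<string> GetColumns(string catalog, string schema, string table);
}

// Toy provider with fixed data, used only to exercise the sketch.
public sealed class InMemoryProvider : IDataProviderSketch
{
    public IReadOnlyList<string> GetCatalogs() => new[] { "main" };
    public IReadOnlyList<string> GetSchemas(string catalog) => new[] { "default" };
    public IReadOnlyList<string> GetTables(string catalog, string schema) => new[] { "t1" };
    public IReadOnlyList<string> GetColumns(string catalog, string schema, string table) =>
        new[] { "id", "name" };
}

public static class GetObjectsSketch
{
    // Builds catalog -> schema -> table -> columns, stopping at the requested depth.
    public static Dictionary<string, Dictionary<string, Dictionary<string, List<string>>>> Build(
        IDataProviderSketch provider, DepthSketch depth)
    {
        var result = new Dictionary<string, Dictionary<string, Dictionary<string, List<string>>>>();
        foreach (string catalog in provider.GetCatalogs())
        {
            var schemas = new Dictionary<string, Dictionary<string, List<string>>>();
            result[catalog] = schemas;
            if (depth == DepthSketch.Catalogs) continue;

            foreach (string schema in provider.GetSchemas(catalog))
            {
                var tables = new Dictionary<string, List<string>>();
                schemas[schema] = tables;
                if (depth == DepthSketch.DbSchemas) continue;

                foreach (string table in provider.GetTables(catalog, schema))
                {
                    var columns = new List<string>();
                    tables[table] = columns;
                    if (depth == DepthSketch.Tables) continue;
                    columns.AddRange(provider.GetColumns(catalog, schema, table));
                }
            }
        }
        return result;
    }
}
```

The point of the shape is that the nested loop lives in one place: a consumer on a different protocol would only supply the provider, under the assumption that its fetch methods can be expressed per catalog, schema, and table.
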
…pen ProcessRelationshipDataSafe

- Add MetadataSchemaFactory with CreatePrimaryKeysSchema (6 cols),
  CreateCrossReferenceSchema (14 cols), and corresponding empty result
  factory methods for reuse across protocol implementations
- Extract ColumnsResultEnhancer from HiveServer2Statement.EnhanceGetColumnsResult
  — computes BASE_TYPE_NAME, corrected precision/scale for flat GetColumns
  results using a protocol-agnostic delegate pattern
- Make ProcessRelationshipDataSafe internal for reuse by consumers
  implementing GetColumnsExtended with PK/FK alignment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ds in BuildColumns

- Add PopulateTableInfoFromTypeName to ColumnMetadataHelper — populates
  all TableInfo fields from just (columnName, typeName, ordinalPosition),
  using existing helper functions for type code, base type, column size,
  decimal digits. Enables SEA to populate TableInfo without calling 7+
  individual helper functions per column.
- Add GetSqlDatetimeSub to ColumnMetadataHelper (DATE→1, TIMESTAMP→3)
- Wire GetNumPrecRadix, GetCharOctetLength, GetSqlDatetimeSub into
  GetObjectsResultBuilder.BuildColumns instead of hard-coded nulls,
  making GetObjects results more complete for both protocols.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
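
As a rough illustration of the GetSqlDatetimeSub mapping mentioned above (DATE→1, TIMESTAMP→3), a minimal sketch follows. The signature, the `short?` return type, the case-insensitive matching, and the null result for other types are assumptions; only the two mapped values come from the commit message.

```csharp
using System;

public static class DatetimeSubSketch
{
    // Hypothetical signature: returns the SQL_DATETIME_SUB value for a type name,
    // or null when the type is not a datetime type.
    public static short? GetSqlDatetimeSub(string typeName)
    {
        if (string.IsNullOrWhiteSpace(typeName))
        {
            return null;
        }

        switch (typeName.Trim().ToUpperInvariant())
        {
            case "DATE":
                return (short)1; // value stated in the commit message
            case "TIMESTAMP":
                return (short)3; // value stated in the commit message
            default:
                return null;     // assumption: non-datetime types carry no subcode
        }
    }
}
```
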
…s results

Add BuildFlatColumnsResult to GetObjectsResultBuilder — takes a list of
(catalog, schema, table, TableInfo) tuples and produces the standard
JDBC 24+1 column flat RecordBatch (TABLE_CAT through BASE_TYPE_NAME).

This enables SEA to build flat GetColumns results by:
1. Calling PopulateTableInfoFromTypeName per column (from SHOW COLUMNS)
2. Calling BuildFlatColumnsResult with the populated TableInfo

Eliminates the need for SEA to manually construct 24 Arrow array
builders and the full schema definition — all handled by the shared
builder using values already computed by ColumnMetadataHelper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…aHelper

Refactor ColumnMetadataHelper to delegate base type name extraction to
the existing SqlTypeNameParser instead of a custom regex + dictionary.
SqlTypeNameParser already handles all type formats with proper regex
parsing and result caching, so no hint is needed: it iterates over all parsers.

- GetBaseTypeName now calls SqlTypeNameParser.TryParse first
- Remove custom s_parameterSuffix regex and NormalizeTypeName
- s_dataTypeCodeMap keyed by canonical base type names only
- s_numericTypes/s_charTypes use canonical base names
- Only 3 Databricks-specific aliases (BYTE, SHORT, LONG) kept as
  fallback — these are not in SqlTypeNameParser since Thrift never
  encounters them (only DESC TABLE EXTENDED and SHOW COLUMNS return
  these). All other aliases (INT, DEC, TIMESTAMP_NTZ/LTZ) are already
  handled by SqlTypeNameParser.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
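
The "parser first, alias fallback" order described in this commit might look roughly like the sketch below. The actual SqlTypeNameParser API is not shown in this conversation, so it is represented by a hypothetical `TryParseBaseName` delegate. The alias targets (TINYINT, SMALLINT, BIGINT) are also an assumption based on the usual Spark SQL type equivalences; the commit only names the three aliases.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the parser call; the real SqlTypeNameParser signature may differ.
public delegate bool TryParseBaseName(string typeName, out string baseName);

public static class BaseTypeNameSketch
{
    // The three aliases named in the commit; the mapped base names are assumed.
    private static readonly Dictionary<string, string> s_aliasToBaseType =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "BYTE", "TINYINT" },
            { "SHORT", "SMALLINT" },
            { "LONG", "BIGINT" },
        };

    public static string GetBaseTypeName(string typeName, TryParseBaseName parser)
    {
        string trimmed = (typeName ?? string.Empty).Trim();

        // 1. Prefer the shared parser, which knows the canonical type formats.
        if (parser(trimmed, out string parsed))
        {
            return parsed;
        }

        // 2. Fall back to the small alias table for names the parser does not know.
        if (s_aliasToBaseType.TryGetValue(trimmed, out string aliased))
        {
            return aliased;
        }

        // 3. Assumption: otherwise return the upper-cased input unchanged.
        return trimmed.ToUpperInvariant();
    }
}
```

Keeping the alias table as a fallback rather than a pre-pass means the parser stays the single source of truth for every name it already recognizes.
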
…FromProvider

- Remove verbose comment on s_aliasToBaseType
- Rename BuildFromProvider to BuildGetObjectsResult for clarity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e→Arrow mapping

Add GetArrowType(string typeName) that maps a SQL type name string to
its corresponding Apache Arrow IArrowType. Extracts the logic from
HiveServer2Connection.GetArrowType (which takes int columnTypeId) into
a type-name-based version that both Thrift and SEA can use for
GetTableSchema without duplicating the type mapping switch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
msrathore-db and others added 8 commits February 27, 2026 16:41
Move the 24-column JDBC GetColumns schema definition into
MetadataSchemaFactory.CreateColumnMetadataSchema() so it's defined
once and reusable by both protocols.

BuildFlatColumnsResult now uses CreateColumnMetadataSchema() instead
of inline schema definition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…o ColumnMetadataHelper

Restore SparkConnection and HiveServer2ExtendedConnection back to
their original SqlTypeNameParser-based implementations. Thrift server
provides all values — no need to compute with ColumnMetadataHelper.

Restore BuildColumns xdbc fields (numPrecRadix, datetimeSub,
charOctetLength) back to null matching original Thrift GetObjects.

ColumnMetadataHelper is for SEA and consumers that only have type
name strings. Thrift code uses server values + SqlTypeNameParser.

Verified: all Thrift flat GetColumns values match exactly before/after.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…connection changes

- Make HiveServer2Connection.GetArrowType internal static (shared with SEA)
- Add backward-compatible constant aliases in HiveServer2Connection
- Revert Nullable->NullableColumn rename in MetadataColumnNames
- Revert changes to ImpalaConnection, SparkConnection, HiveServer2ExtendedConnection
- Remove ColumnMetadataHelper (moving to databricks repo, SEA-only)
- Remove BuildFlatColumnsResult from GetObjectsResultBuilder (SEA-only)
- Remove ColumnMetadataHelperTests (moving to databricks repo)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dataSchemaFactory

- Add CreateCatalogsSchema, CreateSchemasSchema, CreateTablesSchema
  with corresponding CreateEmpty*Result helpers
- Add BuildGetInfoResult — shared DenseUnionArray builder for GetInfo,
  accepts a dictionary of info code to value mappings
- Refactor HiveServer2Connection.GetInfo to use BuildGetInfoResult,
  replacing ~150 lines of inline DenseUnionArray building

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix GetInfo lazy evaluation: only resolve VendorName/VendorVersion
  when requested, avoiding unnecessary Lazy property forcing
- Restore Schema.Validate call in BuildGetInfoResult for safety
- Add XML documentation to GetArrowType noting NotImplementedException
- Add XML documentation to IGetObjectsDataProvider.PopulateColumnInfo
  explaining the catalogMap mutation contract

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
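
The lazy-evaluation fix in this commit can be sketched as follows: values are wrapped in suppliers and only invoked for the info codes that were actually requested, so a `Lazy<string>` behind an unrequested code is never forced. The class name, the `Resolve` method, and the integer codes used in the usage note are placeholders, not the PR's BuildGetInfoResult signature.

```csharp
using System;
using System.Collections.Generic;

public static class LazyGetInfoSketch
{
    // Invokes a supplier only when its info code was requested.
    // Codes without a registered supplier are skipped (an assumption of this sketch).
    public static Dictionary<int, object> Resolve(
        IEnumerable<int> requestedCodes,
        IReadOnlyDictionary<int, Func<object>> suppliers)
    {
        var values = new Dictionary<int, object>();
        foreach (int code in requestedCodes)
        {
            if (suppliers.TryGetValue(code, out Func<object> supplier))
            {
                values[code] = supplier();
            }
        }
        return values;
    }
}
```

A caller would register something like `{ vendorNameCode, () => vendorName.Value }`, where `vendorName` is a `Lazy<string>`. Requesting only other codes then leaves `vendorName.IsValueCreated` false.
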
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@birschick-bq
Collaborator

@msrathore-db - What is the status of this PR? Is it a duplicate of the other changes?
