feat(hive): Add partition key detection via DESCRIBE FORMATTED #26712 #27278
mohitjeswani01 wants to merge 6 commits into open-metadata:main from
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
Pull request overview
Adds Hive partition key detection by parsing DESCRIBE FORMATTED output so partition columns can be exposed as TablePartition metadata during ingestion.
Changes:
- Added `HiveSource.get_table_partition_details()` that runs `DESCRIBE FORMATTED` and parses the `# Partition Information` section into `TablePartition`.
- Added unit tests covering successful extraction, missing partition section, metastore-skip behavior, and engine errors.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `ingestion/src/metadata/ingestion/source/database/hive/metadata.py` | Implements partition-key extraction via `DESCRIBE FORMATTED` and maps results to `TablePartition`. |
| `ingestion/tests/unit/topology/database/test_hive.py` | Adds unit tests for the new Hive partition parsing logic and skip/error behaviors. |
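
For readers following the review, here is a minimal sketch of the parsing step described in the overview. It is not the PR's implementation: the function name `extract_partition_keys`, the sample rows, and the stop conditions are assumptions for illustration, and the real method maps the result to `TablePartition` rather than returning a plain list.

```python
from typing import Iterable, List, Optional, Sequence

PARTITION_HEADER = "# Partition Information"


def extract_partition_keys(rows: Iterable[Sequence[Optional[str]]]) -> List[str]:
    """Collect column names listed under the partition section (hypothetical helper)."""
    keys: List[str] = []
    in_section = False
    for row in rows:
        # The first field of each DESCRIBE FORMATTED row holds the column name or a header.
        name = (row[0] or "").strip() if row else ""
        if not in_section:
            in_section = name == PARTITION_HEADER
            continue
        if not name or name.startswith("# col_name"):
            continue  # blank separator or the column header line inside the section
        if name.startswith("#") or name.endswith(":"):
            break  # next section (or a line like "Database:") ends the partition block
        keys.append(name)
    return keys


# Assumed shape of DESCRIBE FORMATTED output: (col_name, data_type, comment) rows.
SAMPLE_ROWS = [
    ("# col_name", "data_type", "comment"),
    ("", None, None),
    ("id", "int", ""),
    ("", None, None),
    ("# Partition Information", None, None),
    ("# col_name", "data_type", "comment"),
    ("", None, None),
    ("dt", "string", ""),
    ("region", "string", ""),
    ("", None, None),
    ("# Detailed Table Information", None, None),
    ("Database:", "default", None),
    ("Owner:", "hive", None),
]

print(extract_partition_keys(SAMPLE_ROWS))  # ['dt', 'region']
```

Stopping at the next `#` header or at a `key:` line is one way to keep entries such as `Database:` and `Owner:` out of the result; the PR may use a different rule.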
All bot review comments from Gitar Bot and Copilot have been addressed in commit b2d7645. Summary of fixes:
@harshach @pmbrull could you please add the |
The Python checkstyle failed. Please run You can install the pre-commit hooks with |
🟡 Playwright Results: all passed (10 flaky)
✅ 3987 passed · ❌ 0 failed · 🟡 10 flaky · ⏭️ 86 skipped
🟡 10 flaky test(s) (passed on retry)
How to debug locally:
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace
- Fix `self.connection_config` → `self.service_connection`
- Add `identifier_preparer` quoting to prevent SQL injection
- Add pylint disable for unused `inspector` argument
- Fix `Mock` → `MagicMock` in tests for `__getitem__` support
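
To illustrate why the quoting fix matters, here is a small standalone sketch. The PR itself relies on the SQLAlchemy dialect's `identifier_preparer`; the helpers below (`quote_hive_identifier`, `build_describe_query`) are hypothetical stand-ins, and the backtick-doubling escape is assumed from Hive's quoted-identifier convention.

```python
def quote_hive_identifier(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick (assumed Hive rule)."""
    if not name:
        raise ValueError("identifier must be non-empty")
    return "`" + name.replace("`", "``") + "`"


def build_describe_query(schema: str, table: str) -> str:
    """Build the DESCRIBE FORMATTED statement from quoted parts (hypothetical helper)."""
    return (
        "DESCRIBE FORMATTED "
        f"{quote_hive_identifier(schema)}.{quote_hive_identifier(table)}"
    )


print(build_describe_query("sales", "orders"))
```

With quoting in place, a table name containing a backtick or a semicolon stays inside the identifier instead of terminating it.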
- Replace raw `getattr` truthiness check with `_get_validated_metastore_connection()`
- Use `MagicMock` for engine to support context manager protocol
- Patch `_get_validated_metastore_connection` in skip detection test
- All 40 tests pass locally (`pytest -v`)
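
The `Mock` → `MagicMock` change can be shown with a short sketch. The `fetch_rows` function is a hypothetical stand-in for the code under test and only assumes an engine used as `with engine.connect() as conn`; it is not taken from the PR.

```python
from unittest.mock import MagicMock, Mock


def fetch_rows(engine, query):
    """Hypothetical code under test: run a query inside a connection context."""
    with engine.connect() as conn:
        return list(conn.execute(query))


# MagicMock preconfigures magic methods such as __enter__, __exit__ and __getitem__.
engine = MagicMock()
conn = engine.connect.return_value.__enter__.return_value
conn.execute.return_value = [("dt", "string", "")]
rows = fetch_rows(engine, "DESCRIBE FORMATTED `db`.`tbl`")


def plain_mock_error_name():
    """Return the exception type raised when a plain Mock is used as the engine."""
    try:
        fetch_rows(Mock(), "DESCRIBE FORMATTED `db`.`tbl`")
    except (AttributeError, TypeError) as exc:
        # The exact exception type depends on the Python version.
        return type(exc).__name__
    return None
```

A plain `Mock` does not implement the context manager protocol or subscripting, which is why the tests needed `MagicMock` for both the engine and row access.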
61b08af to de33001
return None. Without the guard, dialect.identifier_preparer raises a misleading AttributeError silently caught by the broad except block.
Code Review ✅ Approved
4 resolved / 4 findings
Implements partition key detection using DESCRIBE FORMATTED with sanitized query inputs and robust attribute access. Resolved potential SQL injection risks, attribute errors, and parsing inconsistencies.
✅ 4 resolved
✅ Bug:
Hi @PubChimps, @ulixius9, @harshach, I have resolved all the merge conflicts, addressed the latest bot comments, and the test cases pass locally. Could you please re-review this PR? Thanks 🙏




Fixes #26712
Describe your changes:
Added `get_table_partition_details()` to `HiveSource` to identify Hive partition key columns and expose them as `TablePartition` metadata.
- Runs `DESCRIBE FORMATTED` via the HiveServer2 engine to fetch the table description
- Parses the `# Partition Information` section to extract partition key column names
- Prevents metadata lines such as `Database:` and `Owner:` from being included as partition keys
- `metastoreConnection`: skips engine-based detection when a metastore is configured
- Maps results to `TablePartition` with `PartitionColumnDetails` using the `COLUMN_VALUE` interval type

Type of change:
Checklist:
- feat(hive): Add partition key detection via DESCRIBE FORMATTED #26712

Tests:
- `test_partition_keys_extracted_correctly`
- `test_no_partition_section_returns_false`
- `test_metastore_connection_skips_detection`
- `test_engine_exception_returns_false`