fix: resolve path-based lineage for Databricks external tables (#27561) #27648
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
…-FQN resolution Reverts the path-based fallback in DATABRICKS_GET_TABLE_LINEAGE and DATABRICKS_GET_COLUMN_LINEAGE queries since DatabricksClient lacks the external_path_to_fqn map needed to resolve paths to FQNs. Without this map, relaxing the IS NOT NULL constraints creates dict keys containing None values that never match downstream lookups.
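To illustrate the failure mode this commit message describes, here is a minimal sketch. The cache shape (target table name mapped to a list of source names), the `build_cache` helper, and the sample rows are all hypothetical; the actual structure used by `DatabricksClient` may differ.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def build_cache(rows: List[Row]) -> Dict[Optional[str], List[Optional[str]]]:
    """Hypothetical lineage cache: target table name -> list of source table names."""
    cache: Dict[Optional[str], List[Optional[str]]] = {}
    for row in rows:
        cache.setdefault(row["target_table_full_name"], []).append(
            row["source_table_full_name"]
        )
    return cache


rows: List[Row] = [
    # Managed table: both names are populated.
    {"source_table_full_name": "main.raw.events",
     "target_table_full_name": "main.core.events"},
    # External table row admitted by a relaxed filter: the name is null, only a path exists.
    {"source_table_full_name": "main.raw.orders",
     "target_table_full_name": None},
]

cache = build_cache(rows)

# The second row is stored under the key None ...
assert None in cache
# ... so a downstream lookup by a real table name never finds it.
assert cache.get("main.ext.orders", []) == []
```

Without a path-to-FQN map on the client side, rows admitted by the relaxed filter would only add unreachable entries like the `None` key above, which is consistent with the reason given for the revert.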
|
The Python checkstyle failed. Please run You can install the pre-commit hooks with |
|
#27561 (comment) |
Sure, I'll attach it.
🟡 Playwright Results: all passed (25 flaky)
✅ 4230 passed · ❌ 0 failed · 🟡 25 flaky · ⏭️ 87 skipped
🟡 25 flaky test(s) (passed on retry)
How to debug locally:
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace
|
Thanks for the PR. This needs to be updated against the latest Could you please rebase on |
Sure, sorry, I got busy with some other things. I'll do it ASAP.
Code Review ✅ Approved 2 resolved / 2 findings
Resolves Databricks lineage gaps by enabling path-based resolution for external tables and updating query filters. Addresses issues with path fallback in lineage caching and removes residual merge conflict markers.
✅ 2 resolved
✅ Bug: DatabricksClient column lineage caching ignores path fallback
✅ Bug: Unresolved merge conflict markers in production code


Describe your changes:
Fixes #27561
External tables in Databricks are referenced using cloud storage paths (e.g. `` delta.`abfss://...` ``) instead of table names. In this case, Databricks system tables populate `source_path`/`target_path` and leave `source_table_full_name`/`target_table_full_name` as null. The lineage processor was filtering out these rows entirely, resulting in missing lineage for all external tables.

Changes:
- `databricks/queries.py` + `unitycatalog/queries.py`: Added `source_path` and `target_path` to SELECT; relaxed WHERE filter from hard `IS NOT NULL` on name columns to `(name IS NOT NULL OR path IS NOT NULL)`
- `databricks/client.py`: Pass `source_path` and `target_path` through the lineage cache dict
- `unitycatalog/lineage.py`: Build a reverse `path → table_fqn` map from the external locations cache; fall back to path resolution when `full_name` is null; ensure `_cache_external_locations()` runs before `_cache_lineage()` so the reverse map is available
- `test_unity_catalog_lineage.py`: Updated mock row definitions to include path fields; added tests for path resolution, unresolvable path skipping, and reverse map construction

Type of change:
Checklist:
Fixes #27561: resolve path-based lineage for Databricks external tables
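
For reviewers, the path fallback described under Changes can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `unitycatalog/lineage.py`: the helper names, the `{table_fqn: storage_path}` shape of the external locations cache, and the trailing-slash normalization are all hypothetical.

```python
from typing import Dict, Optional


def normalize_path(path: str) -> str:
    """Assumed normalization: trim whitespace and trailing slashes so equal locations match."""
    return path.strip().rstrip("/")


def build_path_to_fqn(external_locations: Dict[str, str]) -> Dict[str, str]:
    """Invert a hypothetical {table_fqn: storage_path} cache into {storage_path: table_fqn}."""
    return {
        normalize_path(path): fqn
        for fqn, path in external_locations.items()
        if path
    }


def resolve_table(
    full_name: Optional[str],
    path: Optional[str],
    path_to_fqn: Dict[str, str],
) -> Optional[str]:
    """Prefer the table name; fall back to the path; return None when neither resolves."""
    if full_name:
        return full_name
    if path:
        return path_to_fqn.get(normalize_path(path))
    return None


# Hypothetical cache contents for illustration only.
locations = {"main.sales.orders": "abfss://data@acct.dfs.core.windows.net/orders/"}
path_to_fqn = build_path_to_fqn(locations)

# A row with a null name but a known path resolves to the table FQN.
assert resolve_table(
    None, "abfss://data@acct.dfs.core.windows.net/orders", path_to_fqn
) == "main.sales.orders"
# A row whose path is not in the map is skipped (resolves to None).
assert resolve_table(None, "abfss://data@acct.dfs.core.windows.net/unknown", path_to_fqn) is None
```

The reverse map has to exist before lineage rows are processed, which is why the description calls out running `_cache_external_locations()` before `_cache_lineage()`.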