Fixes #27419: Trino cross-database lineage for case-insensitive table names #27495
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
… schema Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
Pull request overview
This PR fixes Trino cross-database lineage resolution when the upstream/source database uses case-sensitive (e.g., uppercase) identifiers, while Trino normalizes identifiers to lowercase—preventing valid lineage edges from being created.
Changes:
- Updates Trino cross-database table matching to be case-insensitive for table and column names, with a schema-scoped fallback lookup and caching.
- Adds a regression unit test covering case-insensitive matching and ensuring schema-level lookup is used for cross-database resolution.
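The matching change listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `lineage.py`: `TableRef`, `find_table`, and `match_columns` are hypothetical names, and the real implementation works on OpenMetadata entities rather than plain dataclasses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TableRef:
    """Hypothetical stand-in for a catalog table entity."""

    schema: str
    name: str
    columns: List[str] = field(default_factory=list)


def find_table(tables: List[TableRef], schema: str, name: str) -> Optional[TableRef]:
    """Prefer an exact match, then fall back to a case-insensitive
    match that is still scoped to the same schema."""
    for table in tables:
        if table.schema == schema and table.name == name:
            return table
    wanted = (schema.casefold(), name.casefold())
    for table in tables:
        if (table.schema.casefold(), table.name.casefold()) == wanted:
            return table
    return None


def match_columns(trino_cols: List[str], source_cols: List[str]) -> Dict[str, str]:
    """Map each Trino column to the source column with the same case-folded name."""
    by_folded = {col.casefold(): col for col in source_cols}
    return {
        col: by_folded[col.casefold()]
        for col in trino_cols
        if col.casefold() in by_folded
    }


source_tables = [
    TableRef("public", "CUSTOMER", ["ID", "NAME"]),
    TableRef("other", "customer", ["id"]),
]
hit = find_table(source_tables, "public", "customer")
column_map = match_columns(["id", "name", "extra"], hit.columns)
```

Keeping the schema in the comparison key is what stops a lowercase `customer` in one schema from being matched to an unrelated `CUSTOMER` in another.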
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| ingestion/src/metadata/ingestion/source/database/trino/lineage.py | Implements case-insensitive matching and schema-scoped cached fallback lookup for cross-database lineage. |
| ingestion/tests/unit/source/database/trino/test_lineage.py | Adds regression tests validating case-insensitive matching and schema-scoped cross-db lookup behavior. |
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: hassaansaleem28 <iamhassaans@gmail.com>
def list_all_entities_side_effect(entity, params=None, **_kwargs):
    if entity is Database and params == {"service": "repro_trino"}:
        return [trino_database]
    if entity is Database and params == {"service": "repro_postgres"}:
        return [source_database]
    if entity is Table and params == {"database": "repro_trino.postgres"}:
        return [trino_table]
    return []
💡 Bug: Test doesn't verify trino_table has columns/databaseSchema populated
The test test_yield_cross_database_lineage_finds_uppercase_source_table mocks list_all_entities to return trino_table which is a MagicMock with explicitly set columns and databaseSchema. This masks the production bug where list_all_entities is called without fields and these attributes would be empty. Consider adding an assertion that list_all_entities was called with the expected fields parameter.
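One way to add the suggested assertion is sketched below. It is an assumption-laden illustration rather than the PR's test: `Table` is a local stand-in class, `fetch_trino_tables` is a hypothetical helper standing in for the production call site, and the `fields` keyword and the `columns`/`databaseSchema` names are taken from the review comment above.

```python
from unittest.mock import MagicMock


class Table:
    """Hypothetical stand-in for the Table entity class."""


def fetch_trino_tables(metadata, database_fqn):
    """Hypothetical production helper: requests the fields lineage needs."""
    return list(
        metadata.list_all_entities(
            entity=Table,
            params={"database": database_fqn},
            fields=["columns", "databaseSchema"],
        )
    )


metadata = MagicMock()
metadata.list_all_entities.return_value = [MagicMock(name="trino_table")]

fetch_trino_tables(metadata, "repro_trino.postgres")

# Inspect the recorded calls: a pre-populated MagicMock table would hide a
# missing `fields` argument, so the check is made on the call itself.
table_calls = [
    call
    for call in metadata.list_all_entities.call_args_list
    if call.kwargs.get("entity") is Table
]
assert table_calls, "expected at least one Table lookup"
for call in table_calls:
    requested = set(call.kwargs.get("fields") or [])
    assert {"columns", "databaseSchema"} <= requested
```

If the production code passes `entity` positionally, as the side effect above accepts, the filter would need to read `call.args[0]` instead of `call.kwargs`.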
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Code Review 👍 Approved with suggestions
4 resolved / 5 findings
Improves Trino cross-database lineage by resolving case-insensitive matching, redundant cache miss lookups, and schema FQN sensitivity. Ensure the test suite explicitly verifies the population of columns and databaseSchema for trino_table entities.
💡 Bug: Test doesn't verify trino_table has columns/databaseSchema populated
📄 ingestion/tests/unit/source/database/trino/test_lineage.py:90-97
✅ 4 resolved
✅ Bug: Case-insensitive fallback ignores schema, may match wrong table
✅ Performance: Fallback lists all tables per database on every cache miss
✅ Edge Case: Schema FQN lookup may still be case-sensitive
✅ Performance: No negative caching for schema ES lookup causes repeated queries
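The negative-caching finding can be illustrated with a small sketch. This is not the PR's implementation: `SchemaLookupCache` and `fake_search` are hypothetical names, and the search backend is reduced to a plain callable that returns `None` on a miss.

```python
from typing import Callable, Dict, Optional


class SchemaLookupCache:
    """Caches schema lookups, including misses, keyed by case-folded FQN."""

    def __init__(self, fetch: Callable[[str], Optional[str]]):
        self._fetch = fetch
        self._cache: Dict[str, Optional[str]] = {}

    def get(self, schema_fqn: str) -> Optional[str]:
        key = schema_fqn.casefold()
        if key not in self._cache:
            # A None result is stored as well, so a schema that does not
            # exist is only searched for once.
            self._cache[key] = self._fetch(schema_fqn)
        return self._cache[key]


calls = []


def fake_search(schema_fqn: str) -> Optional[str]:
    """Hypothetical search backend that never finds the schema."""
    calls.append(schema_fqn)
    return None


cache = SchemaLookupCache(fake_search)
first = cache.get("repro_postgres.postgres.PUBLIC")
second = cache.get("repro_postgres.postgres.public")
```

Testing membership with `key not in self._cache` rather than checking for a falsy value is what lets the miss be remembered; the second lookup above never reaches `fake_search`.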
Describe your changes:
Fixes #27419
Trino lowercases identifiers in query history, but OpenMetadata was matching cross-database table names too strictly. That caused the upstream Postgres CUSTOMER table to be dropped from the lineage graph.
What I worked on
I worked on Trino lineage cross-database matching because Trino lowercases identifiers while OpenMetadata was comparing table names too strictly. I added a regression test and updated the matching logic so the Postgres CUSTOMER table now links into the Trino lineage graph instead of being dropped.
Before
customer. CUSTOMER node was missing.
After
CUSTOMER -> customer -> customer_copy.
What changed
Type of change:
Checklist:
Fixes #27419: Trino cross-database lineage for case-insensitive table names
Summary by Gitar
metadata.list_all_entities calls for database and table discovery to improve consistency.
This will update automatically on new commits.