Skip to content

Fixes #26670: Resolve QuickSight column aliases to correct upstream column in lineage#27301

Open
mohitjeswani01 wants to merge 5 commits intoopen-metadata:mainfrom
mohitjeswani01:fix/26670-quicksight-column-alias-lineage
Open

Fixes #26670: Resolve QuickSight column aliases to correct upstream column in lineage#27301
mohitjeswani01 wants to merge 5 commits intoopen-metadata:mainfrom
mohitjeswani01:fix/26670-quicksight-column-alias-lineage

Conversation

@mohitjeswani01
Copy link
Copy Markdown

Fixes #26670

Description

I worked on this because QuickSight datasets using CustomSql with column
aliases (e.g. SELECT id AS relation_id) were producing incorrect
column-level lineage. OpenMetadata was matching by column name instead
of tracing the alias back through the SQL expression — so relation_id
would get linked to any other upstream column named relation_id in the
catalog, rather than the actual source column id.

The fix was simpler than it looks — LineageParser was already being
instantiated with the CustomSql query in _yield_lineage_from_query(),
but its column_lineage property was never used for column-level lineage.
The code was falling back to name-based matching instead.

What I did:

  • Added _build_column_lineage_from_parser() method that uses
    lineage_parser.column_lineage to resolve alias mappings correctly
  • Used src_col._parent to filter by parent table — prevents wrong
    lineage when multiple upstream tables share the same column name
  • Extracted column name from raw_name via .split(".")[-1] to handle
    fully-qualified column names from the parser
  • Guarded against single-element tuple edge cases (len(col_pair) < 2)
  • Added graceful fallback to name-based matching when parser returns
    no column lineage (SQL too complex or parsing failed)
  • Added get_column_fqn import which was only in the base class before

Type of change:

  • Bug fix

Checklist:

  • I have read the CONTRIBUTING document.

  • My PR title is Fixes #26670: Resolve QuickSight column aliases to correct upstream column in lineage

  • I have commented on my code, particularly in hard-to-understand areas.

  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

  • I have added a test that covers the exact scenario we are fixing.

Tests added (test_quicksight.py):

  • test_build_column_lineage_from_parser_resolves_alias — verifies alias resolution
  • test_build_column_lineage_from_parser_multi_table_filters_correctly — verifies parent table filtering prevents wrong lineage in multi-table joins
  • test_build_column_lineage_from_parser_falls_back_when_empty — verifies graceful fallback to name-based matching

@mohitjeswani01 mohitjeswani01 requested a review from a team as a code owner April 12, 2026 21:54
Copilot AI review requested due to automatic review settings April 12, 2026 21:54
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Resolves incorrect QuickSight CustomSql column-level lineage when projected columns use aliases (e.g., SELECT id AS relation_id) by leveraging LineageParser.column_lineage mappings instead of falling back to name-based matching.

Changes:

  • Added _build_column_lineage_from_parser() to build ColumnLineage entries from SQL parser column mappings with parent-table filtering and fallback behavior.
  • Updated _yield_lineage_from_query() to use the new parser-based column lineage builder.
  • Added unit tests covering alias resolution, multi-table filtering, and fallback behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
ingestion/src/metadata/ingestion/source/dashboard/quicksight/metadata.py Builds column-level lineage using SQL parser alias mappings and uses it during lineage emission.
ingestion/tests/unit/topology/dashboard/test_quicksight.py Adds unit tests to validate alias resolution, multi-table filtering, and fallback behavior.

Comment thread ingestion/src/metadata/ingestion/source/dashboard/quicksight/metadata.py Outdated
Comment thread ingestion/tests/unit/topology/dashboard/test_quicksight.py
Comment thread ingestion/src/metadata/ingestion/source/dashboard/quicksight/metadata.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@mohitjeswani01
Copy link
Copy Markdown
Author

All bot review comments from Copilot and Gitar Bot have been addressed in commit 106c664:

  1. _parent iterable handling — getattr + list normalisation + any() check
  2. Added test_build_column_lineage_from_parser_iterable_parent test (order 12)
  3. Added logger.debug() when _parent is None for visibility
  4. Fixed Python checkstyle violation — f-string line was 97 chars, now split under 88

@harshach @pmbrull could you please add the safe-to-test label? Thank you! 🙏

Copilot AI review requested due to automatic review settings April 13, 2026 12:09
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread ingestion/src/metadata/ingestion/source/dashboard/quicksight/metadata.py Outdated
Comment on lines +500 to +504
@pytest.mark.order(12)
def test_build_column_lineage_from_parser_iterable_parent(self):
"""
When src_col._parent is an iterable of parent tables (as some
parser outputs produce), _build_column_lineage_from_parser must
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new tests use @pytest.mark.order(12) and then later @pytest.mark.order(11), which makes the ordered test sequence harder to follow and can be confusing when diagnosing failures. Consider keeping the order markers monotonically increasing in the file (swap these two order values, or reorder the tests to match).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Corrected the @pytest.mark.order tags to maintain a monotonically increasing sequence.

@PubChimps
Copy link
Copy Markdown
Contributor

@mohitjeswani01 could you take a look at copilots last 2 comments?

@mohitjeswani01
Copy link
Copy Markdown
Author

@mohitjeswani01 could you take a look at copilots last 2 comments?

yes @PubChimps i will definitely fix those and also check the CI failures!

@mohitjeswani01 mohitjeswani01 force-pushed the fix/26670-quicksight-column-alias-lineage branch from b501e67 to edfdf80 Compare April 16, 2026 20:45
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@mohitjeswani01
Copy link
Copy Markdown
Author

@PubChimps I have addressed the Copilot bot findings. The fallback logic in metadata.py has been updated to prevent manufacturing incorrect cross-table lineage, and the corresponding regression test is added. The pytest order markers in test_quicksight.py are also corrected to a monotonic sequence.

The branch is rebased and ready. Could you please add the safe to test label so I can verify the CI pipelines? Thanks.🙏

@mohitjeswani01 mohitjeswani01 force-pushed the fix/26670-quicksight-column-alias-lineage branch from 7735d55 to 9ae5cbc Compare April 29, 2026 17:52
Copilot AI review requested due to automatic review settings April 29, 2026 17:52
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment thread ingestion/tests/unit/topology/dashboard/test_quicksight.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread ingestion/src/metadata/ingestion/source/dashboard/quicksight/metadata.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 29, 2026

Code Review ✅ Approved 2 resolved / 2 findings

QuickSight column aliases are now correctly resolved to their upstream lineage, addressing issues with parent-table filters and incomplete test coverage. No further issues found.

✅ 2 resolved
Edge Case: No parent-table filter when _parent is None in multi-table SQL

📄 ingestion/src/metadata/ingestion/source/dashboard/quicksight/metadata.py:300-307
When src_col._parent is None (line 300), the parent-table filter is skipped entirely. For single-table queries this is correct, but if the parser fails to populate _parent for a multi-table JOIN query, the same column pair will pass the filter for every from_entity in the outer loop of _yield_lineage_from_query, potentially producing duplicate or incorrect column lineage.

get_column_fqn provides a secondary guard (the column must exist in the entity), so impact is limited to cases where both upstream tables have a column with the same name. This is unlikely to cause real issues given the fallback, but worth being aware of.

Bug: Test doesn't exercise parent-table name comparison it claims to

📄 ingestion/tests/unit/topology/dashboard/test_quicksight.py:489-503
In test_build_column_lineage_no_fallback_when_parser_has_global_lineage, other_src_col._parent is assigned a bare MagicMock() (line 492). In the production code (metadata.py line 280), hasattr(parent, "__iter__") is True for MagicMock, so list(parent) is called — which yields an empty list. The any(...) filter at line 284 then returns False and continue is hit, meaning the pair is skipped due to an empty parent list, not because "users_table" != "orders_table" as the test docstring implies. The __str__ configured at line 493 is never invoked.

The test still validates the key invariant (no fallback when parser has global lineage), and test_build_column_lineage_from_parser_multi_table_filters_correctly does properly test the name comparison using _FakeTable. So there's no gap in coverage, but this test is misleading about what it exercises.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@mohitjeswani01
Copy link
Copy Markdown
Author

mohitjeswani01 commented Apr 29, 2026

@PubChimps , @harshach sir i have solved the merge conflicts also resolved bots comments and ran the tests locally could you please add a safe to test label and review this ? thanks 🙏!

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

QuickSight column aliases are resolved to the wrong upstream column in lineage when the alias name exists elsewhere

3 participants