Skip to content

fix(quicksight): resolve column aliases to correct upstream column in lineage (#26670)#27031

Closed
SaaiAravindhRaja wants to merge 5 commits intoopen-metadata:mainfrom
SaaiAravindhRaja:fix/issue-26670-quicksight-column-aliases
Closed

fix(quicksight): resolve column aliases to correct upstream column in lineage (#26670)#27031
SaaiAravindhRaja wants to merge 5 commits intoopen-metadata:mainfrom
SaaiAravindhRaja:fix/issue-26670-quicksight-column-aliases

Conversation

@SaaiAravindhRaja
Copy link
Copy Markdown
Contributor

Fixes #26670

QuickSight datasets backed by CustomSql have aliased columns (e.g. SELECT id AS relation_id). The connector was matching by column name instead of tracing the alias back through the SQL expression, leading to incorrect column-level lineage.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 4, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@SaaiAravindhRaja SaaiAravindhRaja changed the title fix(quicksight): resolve column aliases to correct upstream column in lineage (#26670) fix(quicksight): resolve column aliases to correct upstream column in lineage (#26670) [WIP] Apr 4, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 4, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment thread ingestion/tests/unit/topology/dashboard/test_quicksight.py Outdated
@SaaiAravindhRaja SaaiAravindhRaja marked this pull request as ready for review April 4, 2026 11:16
@SaaiAravindhRaja SaaiAravindhRaja requested a review from a team as a code owner April 4, 2026 11:16
Copilot AI review requested due to automatic review settings April 4, 2026 11:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix incorrect QuickSight column-level lineage for datasets backed by CustomSql by using SQL parser-derived column mappings (including aliases) instead of relying solely on name-based matching.

Changes:

  • Route CustomSql-based lineage generation through SQL parser column mappings to resolve aliased target columns to their true upstream source columns.
  • Add unit tests for alias-based lineage mapping and fallback behavior when the parser produces no column lineage.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
ingestion/src/metadata/ingestion/source/dashboard/quicksight/metadata.py Introduces _build_column_lineage_from_parser and uses it for CustomSql query lineage to map source→alias columns via parsed SQL.
ingestion/tests/unit/topology/dashboard/test_quicksight.py Adds unit tests to validate alias resolution and fallback-to-name-matching behavior.

Comment on lines +598 to +608
for col_pair in lineage_parser.column_lineage or []:
try:
src_col = col_pair[0]
tgt_col = col_pair[-1]
from_col = get_column_fqn(
table_entity=from_entity, column=src_col.raw_name
)
to_col = self._get_data_model_column_fqn(
data_model_entity=data_model_entity,
column=tgt_col.raw_name,
)
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_build_column_lineage_from_parser matches parsed columns to from_entity using only src_col.raw_name. In multi-table CustomSql queries, different source tables often share column names (e.g., id), so this can incorrectly attach a lineage pair to the wrong from_entity (because get_column_fqn will happily resolve id on any table that has it). Consider filtering lineage_parser.column_lineage by the source column’s parent table (e.g., src_col._parent) or by passing the current source table context into this method and using get_column_fqn(..., table=..., schema=..., database=...) so only pairs belonging to the current upstream table are emitted.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SaaiAravindhRaja this is a valid comment

Comment on lines +392 to +463
@pytest.mark.order(9)
def test_build_column_lineage_from_parser_uses_aliases(self):
"""
When a CustomSql query uses column aliases (SELECT src_col AS alias_col),
_build_column_lineage_from_parser must resolve the source column name and
map it to the aliased data model column, not drop the lineage pair.
"""
from unittest.mock import MagicMock

src_col = MagicMock()
src_col.raw_name = "original_col"

tgt_col = MagicMock()
tgt_col.raw_name = "alias_col"

mock_parser = MagicMock()
mock_parser.column_lineage = [(src_col, tgt_col)]

src_fqn = "db.schema.my_table.original_col"
alias_fqn = "qs_service.dataset.alias_col"

mock_from_entity = MagicMock()
mock_data_model = MagicMock()

with patch(
"metadata.ingestion.source.dashboard.quicksight.metadata.get_column_fqn",
return_value=src_fqn,
) as mock_get_col_fqn:
with patch.object(
self.quicksight,
"_get_data_model_column_fqn",
return_value=alias_fqn,
) as mock_get_dm_col_fqn:
result = self.quicksight._build_column_lineage_from_parser(
mock_parser, mock_from_entity, mock_data_model
)

mock_get_col_fqn.assert_called_once_with(
table_entity=mock_from_entity, column="original_col"
)
mock_get_dm_col_fqn.assert_called_once_with(
data_model_entity=mock_data_model, column="alias_col"
)
assert len(result) == 1
assert result[0].fromColumns == [src_fqn]
assert result[0].toColumn == alias_fqn

@pytest.mark.order(10)
def test_build_column_lineage_from_parser_falls_back_when_no_parser_results(self):
"""
When lineage_parser.column_lineage is empty, _build_column_lineage_from_parser
must fall back to name-based matching via _get_column_lineage.
"""
mock_parser = MagicMock()
mock_parser.column_lineage = []

fallback_lineage = [MagicMock()]
mock_from_entity = MagicMock()
mock_data_model = MagicMock()
mock_data_model.columns = [MagicMock(name=MagicMock(root="col_a"))]

with patch.object(
self.quicksight,
"_get_column_lineage",
return_value=fallback_lineage,
) as mock_get_col_lineage:
result = self.quicksight._build_column_lineage_from_parser(
mock_parser, mock_from_entity, mock_data_model
)

mock_get_col_lineage.assert_called_once()
assert result is fallback_lineage
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added unit tests validate alias mapping and the empty-parser fallback, but they don’t cover the most failure-prone scenario for this change: a CustomSql query that references multiple upstream tables where the same source column name exists in more than one table (e.g., t1.id and t2.id). Adding a test that asserts _build_column_lineage_from_parser only produces column lineage for the correct upstream table would prevent regressions of incorrect column-level lineage resolution.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SaaiAravindhRaja valid comment as well

@SaaiAravindhRaja SaaiAravindhRaja changed the title fix(quicksight): resolve column aliases to correct upstream column in lineage (#26670) [WIP] fix(quicksight): resolve column aliases to correct upstream column in lineage (#26670) Apr 4, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 4, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@ulixius9 ulixius9 added the safe to test Add this label to run secure Github workflows on PRs label Apr 6, 2026
Copilot AI review requested due to automatic review settings April 6, 2026 11:39
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment on lines +401 to +410
src_col = MagicMock()
src_col.raw_name = "original_col"

tgt_col = MagicMock()
tgt_col.raw_name = "alias_col"

mock_parser = MagicMock()
mock_parser.column_lineage = [(src_col, tgt_col)]

src_fqn = "db.schema.my_table.original_col"
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test sets src_col.raw_name to a bare column name (original_col), but in real LineageParser.column_lineage output raw_name is frequently fully-qualified (e.g. testdb.public.users.id). As written, the test can pass even if _build_column_lineage_from_parser fails to handle fully-qualified raw_name values (and would then fall back to name-based matching).

To prevent regressions, consider updating the fixture to use a fully-qualified raw_name (and/or a quoted identifier) and assert that the implementation still resolves the correct source column FQN.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SaaiAravindhRaja valid comment as well

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 6, 2026

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 6, 2026

🔴 Playwright Results — 1 failure(s), 22 flaky

✅ 3601 passed · ❌ 1 failed · 🟡 22 flaky · ⏭️ 207 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 451 0 6 2
🟡 Shard 2 640 0 3 32
🟡 Shard 3 647 0 2 26
🟡 Shard 4 624 0 3 47
🔴 Shard 5 606 1 2 67
🟡 Shard 6 633 0 6 33

Genuine Failures (failed on all attempts)

Pages/Glossary.spec.ts › Add and Remove Assets (shard 5)
�[31mTest timeout of 180000ms exceeded.�[39m
🟡 22 flaky test(s) (passed on retry)
  • Features/CustomizeDetailPage.spec.ts › Topic - customization should work (shard 1, 1 retry)
  • Features/CustomizeDetailPage.spec.ts › Search Index - customization should work (shard 1, 1 retry)
  • Flow/Tour.spec.ts › Tour should work from help section (shard 1, 1 retry)
  • Flow/Tour.spec.ts › Tour should work from welcome screen (shard 1, 1 retry)
  • Flow/Tour.spec.ts › Tour should work from URL directly (shard 1, 1 retry)
  • Pages/SearchIndexApplication.spec.ts › Search Index Application (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/ChangeSummaryBadge.spec.ts › Automated badge should appear on entity description with Automated source (shard 2, 1 retry)
  • Features/Glossary/GlossaryStatusFilterNestedTerms.spec.ts › only leaf nodes match filter - parent chain does not (shard 2, 1 retry)
  • Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
  • Flow/ExploreDiscovery.spec.ts › Should display deleted assets when showDeleted is checked and deleted is not present in queryFilter (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContractInheritance.spec.ts › Edit Asset Contract - Add SLA when inheriting SLA from Data Product (PATCH should use /add not /replace) (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Delete Spreadsheet (shard 4, 1 retry)
  • Pages/EntityDataSteward.spec.ts › Glossary Term Add, Update and Remove (shard 5, 1 retry)
  • Pages/ExploreTree.spec.ts › Verify Database and Database Schema available in explore tree (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/ProfilerConfigurationPage.spec.ts › Non admin user (shard 6, 1 retry)
  • Pages/Tag.spec.ts › Verify Owner Add Delete (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Create team with domain and verify visibility of inherited domain in user profile after team removal (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 12, 2026

Code Review 👍 Approved with suggestions 1 resolved / 2 findings

Fixes QuickSight column alias resolution in lineage tracking by correctly mapping upstream columns. The MagicMock name kwarg issue has been resolved, though consider handling single-element tuples in _build_column_lineage_from_parser to avoid src_col and tgt_col aliasing to the same object.

💡 Edge Case: col_pair[0] and col_pair[-1] alias to same object for 1-element tuples

📄 ingestion/src/metadata/ingestion/source/dashboard/quicksight/metadata.py:600-601

In _build_column_lineage_from_parser, src_col = col_pair[0] and tgt_col = col_pair[-1] will reference the same Column when the parser returns a single-element tuple. This would create a self-referencing lineage entry (source == target). The upstream LineageParser.column_lineage already uses the same [0]/[-1] pattern and documents that intermediate columns may exist, but a single-element edge case is not guarded against.

Suggested fix
Add a guard at the top of the loop body:
    if len(col_pair) < 2:
        continue
✅ 1 resolved
Bug: MagicMock name kwarg sets mock name, not .name attribute

📄 ingestion/tests/unit/topology/dashboard/test_quicksight.py:451
In test_build_column_lineage_from_parser_falls_back_when_no_parser_results, MagicMock(name=MagicMock(root="col_a")) does not set the .name attribute on the mock — the name keyword argument to MagicMock is special and sets the mock's internal _mock_name instead.

This means mock_data_model.columns[0].name.root will return a new auto-generated MagicMock, not "col_a". The test still passes because _get_column_lineage is fully patched and doesn't inspect its columns argument, but the columns list built inside the fallback path will contain garbage values rather than ["col_a"].

If you later add assert_called_once_with(...) to validate the exact arguments, the test will fail unexpectedly.

🤖 Prompt for agents
Code Review: Fixes QuickSight column alias resolution in lineage tracking by correctly mapping upstream columns. The MagicMock `name` kwarg issue has been resolved, though consider handling single-element tuples in `_build_column_lineage_from_parser` to avoid `src_col` and `tgt_col` aliasing to the same object.

1. 💡 Edge Case: col_pair[0] and col_pair[-1] alias to same object for 1-element tuples
   Files: ingestion/src/metadata/ingestion/source/dashboard/quicksight/metadata.py:600-601

   In `_build_column_lineage_from_parser`, `src_col = col_pair[0]` and `tgt_col = col_pair[-1]` will reference the same Column when the parser returns a single-element tuple. This would create a self-referencing lineage entry (source == target). The upstream `LineageParser.column_lineage` already uses the same `[0]`/`[-1]` pattern and documents that intermediate columns may exist, but a single-element edge case is not guarded against.

   Suggested fix:
   Add a guard at the top of the loop body:
       if len(col_pair) < 2:
           continue

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@github-actions
Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown
Collaborator

@TeddyCr TeddyCr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check copilot comments, all valid

Comment on lines +598 to +608
for col_pair in lineage_parser.column_lineage or []:
try:
src_col = col_pair[0]
tgt_col = col_pair[-1]
from_col = get_column_fqn(
table_entity=from_entity, column=src_col.raw_name
)
to_col = self._get_data_model_column_fqn(
data_model_entity=data_model_entity,
column=tgt_col.raw_name,
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SaaiAravindhRaja this is a valid comment

Comment on lines +401 to +410
src_col = MagicMock()
src_col.raw_name = "original_col"

tgt_col = MagicMock()
tgt_col.raw_name = "alias_col"

mock_parser = MagicMock()
mock_parser.column_lineage = [(src_col, tgt_col)]

src_fqn = "db.schema.my_table.original_col"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SaaiAravindhRaja valid comment as well

Comment on lines +392 to +463
@pytest.mark.order(9)
def test_build_column_lineage_from_parser_uses_aliases(self):
"""
When a CustomSql query uses column aliases (SELECT src_col AS alias_col),
_build_column_lineage_from_parser must resolve the source column name and
map it to the aliased data model column, not drop the lineage pair.
"""
from unittest.mock import MagicMock

src_col = MagicMock()
src_col.raw_name = "original_col"

tgt_col = MagicMock()
tgt_col.raw_name = "alias_col"

mock_parser = MagicMock()
mock_parser.column_lineage = [(src_col, tgt_col)]

src_fqn = "db.schema.my_table.original_col"
alias_fqn = "qs_service.dataset.alias_col"

mock_from_entity = MagicMock()
mock_data_model = MagicMock()

with patch(
"metadata.ingestion.source.dashboard.quicksight.metadata.get_column_fqn",
return_value=src_fqn,
) as mock_get_col_fqn:
with patch.object(
self.quicksight,
"_get_data_model_column_fqn",
return_value=alias_fqn,
) as mock_get_dm_col_fqn:
result = self.quicksight._build_column_lineage_from_parser(
mock_parser, mock_from_entity, mock_data_model
)

mock_get_col_fqn.assert_called_once_with(
table_entity=mock_from_entity, column="original_col"
)
mock_get_dm_col_fqn.assert_called_once_with(
data_model_entity=mock_data_model, column="alias_col"
)
assert len(result) == 1
assert result[0].fromColumns == [src_fqn]
assert result[0].toColumn == alias_fqn

@pytest.mark.order(10)
def test_build_column_lineage_from_parser_falls_back_when_no_parser_results(self):
"""
When lineage_parser.column_lineage is empty, _build_column_lineage_from_parser
must fall back to name-based matching via _get_column_lineage.
"""
mock_parser = MagicMock()
mock_parser.column_lineage = []

fallback_lineage = [MagicMock()]
mock_from_entity = MagicMock()
mock_data_model = MagicMock()
mock_data_model.columns = [MagicMock(name=MagicMock(root="col_a"))]

with patch.object(
self.quicksight,
"_get_column_lineage",
return_value=fallback_lineage,
) as mock_get_col_lineage:
result = self.quicksight._build_column_lineage_from_parser(
mock_parser, mock_from_entity, mock_data_model
)

mock_get_col_lineage.assert_called_once()
assert result is fallback_lineage
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SaaiAravindhRaja valid comment as well

@PubChimps
Copy link
Copy Markdown
Contributor

hi @SaaiAravindhRaja, thank you for this pr, but this issue was assigned to another pr in progress here, we will reopen this pr if needed

@PubChimps PubChimps closed this Apr 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

QuickSight column aliases are resolved to the wrong upstream column in lineage when the alias name exists elsewhere

5 participants