fix(quicksight): resolve column aliases to correct upstream column in lineage (#26670) by SaaiAravindhRaja · Pull Request #27031 · open-metadata/OpenMetadata

SaaiAravindhRaja · 2026-04-04T07:03:03Z

QuickSight datasets backed by CustomSql have aliased columns (e.g. SELECT id AS relation_id). The connector was matching by column name instead of tracing the alias back through the SQL expression, leading to incorrect column-level lineage.

github-actions · 2026-04-04T07:03:28Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

…pen-metadata#26670)

github-actions · 2026-04-04T09:13:19Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copilot

Pull request overview

This PR aims to fix incorrect QuickSight column-level lineage for datasets backed by CustomSql by using SQL parser-derived column mappings (including aliases) instead of relying solely on name-based matching.

Changes:

Route CustomSql-based lineage generation through SQL parser column mappings to resolve aliased target columns to their true upstream source columns.
Add unit tests for alias-based lineage mapping and fallback behavior when the parser produces no column lineage.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`ingestion/src/metadata/ingestion/source/dashboard/quicksight/metadata.py`	Introduces `_build_column_lineage_from_parser` and uses it for CustomSql query lineage to map source→alias columns via parsed SQL.
`ingestion/tests/unit/topology/dashboard/test_quicksight.py`	Adds unit tests to validate alias resolution and fallback-to-name-matching behavior.

Copilot · 2026-04-04T11:21:31Z

+        for col_pair in lineage_parser.column_lineage or []:
+            try:
+                src_col = col_pair[0]
+                tgt_col = col_pair[-1]
+                from_col = get_column_fqn(
+                    table_entity=from_entity, column=src_col.raw_name
+                )
+                to_col = self._get_data_model_column_fqn(
+                    data_model_entity=data_model_entity,
+                    column=tgt_col.raw_name,
+                )


_build_column_lineage_from_parser matches parsed columns to from_entity using only src_col.raw_name. In multi-table CustomSql queries, different source tables often share column names (e.g., id), so this can incorrectly attach a lineage pair to the wrong from_entity (because get_column_fqn will happily resolve id on any table that has it). Consider filtering lineage_parser.column_lineage by the source column’s parent table (e.g., src_col._parent) or by passing the current source table context into this method and using get_column_fqn(..., table=..., schema=..., database=...) so only pairs belonging to the current upstream table are emitted.

@SaaiAravindhRaja this is a valid comment

Copilot · 2026-04-04T11:21:31Z

+    @pytest.mark.order(9)
+    def test_build_column_lineage_from_parser_uses_aliases(self):
+        """
+        When a CustomSql query uses column aliases (SELECT src_col AS alias_col),
+        _build_column_lineage_from_parser must resolve the source column name and
+        map it to the aliased data model column, not drop the lineage pair.
+        """
+        from unittest.mock import MagicMock
+
+        src_col = MagicMock()
+        src_col.raw_name = "original_col"
+
+        tgt_col = MagicMock()
+        tgt_col.raw_name = "alias_col"
+
+        mock_parser = MagicMock()
+        mock_parser.column_lineage = [(src_col, tgt_col)]
+
+        src_fqn = "db.schema.my_table.original_col"
+        alias_fqn = "qs_service.dataset.alias_col"
+
+        mock_from_entity = MagicMock()
+        mock_data_model = MagicMock()
+
+        with patch(
+            "metadata.ingestion.source.dashboard.quicksight.metadata.get_column_fqn",
+            return_value=src_fqn,
+        ) as mock_get_col_fqn:
+            with patch.object(
+                self.quicksight,
+                "_get_data_model_column_fqn",
+                return_value=alias_fqn,
+            ) as mock_get_dm_col_fqn:
+                result = self.quicksight._build_column_lineage_from_parser(
+                    mock_parser, mock_from_entity, mock_data_model
+                )
+
+        mock_get_col_fqn.assert_called_once_with(
+            table_entity=mock_from_entity, column="original_col"
+        )
+        mock_get_dm_col_fqn.assert_called_once_with(
+            data_model_entity=mock_data_model, column="alias_col"
+        )
+        assert len(result) == 1
+        assert result[0].fromColumns == [src_fqn]
+        assert result[0].toColumn == alias_fqn
+
+    @pytest.mark.order(10)
+    def test_build_column_lineage_from_parser_falls_back_when_no_parser_results(self):
+        """
+        When lineage_parser.column_lineage is empty, _build_column_lineage_from_parser
+        must fall back to name-based matching via _get_column_lineage.
+        """
+        mock_parser = MagicMock()
+        mock_parser.column_lineage = []
+
+        fallback_lineage = [MagicMock()]
+        mock_from_entity = MagicMock()
+        mock_data_model = MagicMock()
+        mock_data_model.columns = [MagicMock(name=MagicMock(root="col_a"))]
+
+        with patch.object(
+            self.quicksight,
+            "_get_column_lineage",
+            return_value=fallback_lineage,
+        ) as mock_get_col_lineage:
+            result = self.quicksight._build_column_lineage_from_parser(
+                mock_parser, mock_from_entity, mock_data_model
+            )
+
+        mock_get_col_lineage.assert_called_once()
+        assert result is fallback_lineage


The added unit tests validate alias mapping and the empty-parser fallback, but they don’t cover the most failure-prone scenario for this change: a CustomSql query that references multiple upstream tables where the same source column name exists in more than one table (e.g., t1.id and t2.id). Adding a test that asserts _build_column_lineage_from_parser only produces column lineage for the correct upstream table would prevent regressions of incorrect column-level lineage resolution.

@SaaiAravindhRaja valid comment as well

…ata#27031)

github-actions · 2026-04-04T11:48:04Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Copilot · 2026-04-06T11:43:43Z

+        src_col = MagicMock()
+        src_col.raw_name = "original_col"
+
+        tgt_col = MagicMock()
+        tgt_col.raw_name = "alias_col"
+
+        mock_parser = MagicMock()
+        mock_parser.column_lineage = [(src_col, tgt_col)]
+
+        src_fqn = "db.schema.my_table.original_col"


This test sets src_col.raw_name to a bare column name (original_col), but in real LineageParser.column_lineage output raw_name is frequently fully-qualified (e.g. testdb.public.users.id). As written, the test can pass even if _build_column_lineage_from_parser fails to handle fully-qualified raw_name values (and would then fall back to name-based matching).

To prevent regressions, consider updating the fixture to use a fully-qualified raw_name (and/or a quoted identifier) and assert that the implementation still resolves the correct source column FQN.

@SaaiAravindhRaja valid comment as well

github-actions · 2026-04-06T11:45:40Z

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

github-actions · 2026-04-06T13:20:11Z

🔴 Playwright Results — 1 failure(s), 22 flaky

✅ 3601 passed · ❌ 1 failed · 🟡 22 flaky · ⏭️ 207 skipped

Shard	Passed	Failed	Flaky	Skipped
🟡 Shard 1	451	0	6	2
🟡 Shard 2	640	0	3	32
🟡 Shard 3	647	0	2	26
🟡 Shard 4	624	0	3	47
🔴 Shard 5	606	1	2	67
🟡 Shard 6	633	0	6	33

Genuine Failures (failed on all attempts)

❌ Pages/Glossary.spec.ts › Add and Remove Assets (shard 5)

�[31mTest timeout of 180000ms exceeded.�[39m

🟡 22 flaky test(s) (passed on retry)

Features/CustomizeDetailPage.spec.ts › Topic - customization should work (shard 1, 1 retry)
Features/CustomizeDetailPage.spec.ts › Search Index - customization should work (shard 1, 1 retry)
Flow/Tour.spec.ts › Tour should work from help section (shard 1, 1 retry)
Flow/Tour.spec.ts › Tour should work from welcome screen (shard 1, 1 retry)
Flow/Tour.spec.ts › Tour should work from URL directly (shard 1, 1 retry)
Pages/SearchIndexApplication.spec.ts › Search Index Application (shard 1, 1 retry)
Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
Features/ChangeSummaryBadge.spec.ts › Automated badge should appear on entity description with Automated source (shard 2, 1 retry)
Features/Glossary/GlossaryStatusFilterNestedTerms.spec.ts › only leaf nodes match filter - parent chain does not (shard 2, 1 retry)
Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
Flow/ExploreDiscovery.spec.ts › Should display deleted assets when showDeleted is checked and deleted is not present in queryFilter (shard 3, 1 retry)
Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
Pages/DataContractInheritance.spec.ts › Edit Asset Contract - Add SLA when inheriting SLA from Data Product (PATCH should use /add not /replace) (shard 4, 1 retry)
Pages/Entity.spec.ts › Delete Spreadsheet (shard 4, 1 retry)
Pages/EntityDataSteward.spec.ts › Glossary Term Add, Update and Remove (shard 5, 1 retry)
Pages/ExploreTree.spec.ts › Verify Database and Database Schema available in explore tree (shard 5, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
Pages/ProfilerConfigurationPage.spec.ts › Non admin user (shard 6, 1 retry)
Pages/Tag.spec.ts › Verify Owner Add Delete (shard 6, 1 retry)
Pages/UserDetails.spec.ts › Create team with domain and verify visibility of inherited domain in user profile after team removal (shard 6, 1 retry)
Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

gitar-bot · 2026-04-12T12:11:49Z

Code Review 👍 Approved with suggestions 1 resolved / 2 findings

Fixes QuickSight column alias resolution in lineage tracking by correctly mapping upstream columns. The MagicMock name kwarg issue has been resolved, though consider handling single-element tuples in _build_column_lineage_from_parser to avoid src_col and tgt_col aliasing to the same object.

💡 Edge Case: col_pair[0] and col_pair[-1] alias to same object for 1-element tuples

📄 ingestion/src/metadata/ingestion/source/dashboard/quicksight/metadata.py:600-601

In _build_column_lineage_from_parser, src_col = col_pair[0] and tgt_col = col_pair[-1] will reference the same Column when the parser returns a single-element tuple. This would create a self-referencing lineage entry (source == target). The upstream LineageParser.column_lineage already uses the same [0]/[-1] pattern and documents that intermediate columns may exist, but a single-element edge case is not guarded against.

Suggested fix

Add a guard at the top of the loop body:
    if len(col_pair) < 2:
        continue

✅ 1 resolved

✅ Bug: MagicMock name kwarg sets mock name, not .name attribute

📄 ingestion/tests/unit/topology/dashboard/test_quicksight.py:451
In test_build_column_lineage_from_parser_falls_back_when_no_parser_results, MagicMock(name=MagicMock(root="col_a")) does not set the .name attribute on the mock — the name keyword argument to MagicMock is special and sets the mock's internal _mock_name instead.

This means mock_data_model.columns[0].name.root will return a new auto-generated MagicMock, not "col_a". The test still passes because _get_column_lineage is fully patched and doesn't inspect its columns argument, but the columns list built inside the fallback path will contain garbage values rather than ["col_a"].

If you later add assert_called_once_with(...) to validate the exact arguments, the test will fail unexpectedly.

🤖 Prompt for agents

Code Review: Fixes QuickSight column alias resolution in lineage tracking by correctly mapping upstream columns. The MagicMock `name` kwarg issue has been resolved, though consider handling single-element tuples in `_build_column_lineage_from_parser` to avoid `src_col` and `tgt_col` aliasing to the same object.

1. 💡 Edge Case: col_pair[0] and col_pair[-1] alias to same object for 1-element tuples
   Files: ingestion/src/metadata/ingestion/source/dashboard/quicksight/metadata.py:600-601

   In `_build_column_lineage_from_parser`, `src_col = col_pair[0]` and `tgt_col = col_pair[-1]` will reference the same Column when the parser returns a single-element tuple. This would create a self-referencing lineage entry (source == target). The upstream `LineageParser.column_lineage` already uses the same `[0]`/`[-1]` pattern and documents that intermediate columns may exist, but a single-element edge case is not guarded against.

   Suggested fix:
   Add a guard at the top of the loop body:
       if len(col_pair) < 2:
           continue

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

github-actions · 2026-04-12T12:16:19Z

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

sonarqubecloud · 2026-04-12T13:10:35Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
84.6% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

TeddyCr

can you check copilot comments, all valid

TeddyCr · 2026-04-14T17:36:41Z

+        for col_pair in lineage_parser.column_lineage or []:
+            try:
+                src_col = col_pair[0]
+                tgt_col = col_pair[-1]
+                from_col = get_column_fqn(
+                    table_entity=from_entity, column=src_col.raw_name
+                )
+                to_col = self._get_data_model_column_fqn(
+                    data_model_entity=data_model_entity,
+                    column=tgt_col.raw_name,
+                )


@SaaiAravindhRaja this is a valid comment

TeddyCr · 2026-04-14T17:37:16Z

+        src_col = MagicMock()
+        src_col.raw_name = "original_col"
+
+        tgt_col = MagicMock()
+        tgt_col.raw_name = "alias_col"
+
+        mock_parser = MagicMock()
+        mock_parser.column_lineage = [(src_col, tgt_col)]
+
+        src_fqn = "db.schema.my_table.original_col"


@SaaiAravindhRaja valid comment as well

TeddyCr · 2026-04-14T17:37:38Z

+    @pytest.mark.order(9)
+    def test_build_column_lineage_from_parser_uses_aliases(self):
+        """
+        When a CustomSql query uses column aliases (SELECT src_col AS alias_col),
+        _build_column_lineage_from_parser must resolve the source column name and
+        map it to the aliased data model column, not drop the lineage pair.
+        """
+        from unittest.mock import MagicMock
+
+        src_col = MagicMock()
+        src_col.raw_name = "original_col"
+
+        tgt_col = MagicMock()
+        tgt_col.raw_name = "alias_col"
+
+        mock_parser = MagicMock()
+        mock_parser.column_lineage = [(src_col, tgt_col)]
+
+        src_fqn = "db.schema.my_table.original_col"
+        alias_fqn = "qs_service.dataset.alias_col"
+
+        mock_from_entity = MagicMock()
+        mock_data_model = MagicMock()
+
+        with patch(
+            "metadata.ingestion.source.dashboard.quicksight.metadata.get_column_fqn",
+            return_value=src_fqn,
+        ) as mock_get_col_fqn:
+            with patch.object(
+                self.quicksight,
+                "_get_data_model_column_fqn",
+                return_value=alias_fqn,
+            ) as mock_get_dm_col_fqn:
+                result = self.quicksight._build_column_lineage_from_parser(
+                    mock_parser, mock_from_entity, mock_data_model
+                )
+
+        mock_get_col_fqn.assert_called_once_with(
+            table_entity=mock_from_entity, column="original_col"
+        )
+        mock_get_dm_col_fqn.assert_called_once_with(
+            data_model_entity=mock_data_model, column="alias_col"
+        )
+        assert len(result) == 1
+        assert result[0].fromColumns == [src_fqn]
+        assert result[0].toColumn == alias_fqn
+
+    @pytest.mark.order(10)
+    def test_build_column_lineage_from_parser_falls_back_when_no_parser_results(self):
+        """
+        When lineage_parser.column_lineage is empty, _build_column_lineage_from_parser
+        must fall back to name-based matching via _get_column_lineage.
+        """
+        mock_parser = MagicMock()
+        mock_parser.column_lineage = []
+
+        fallback_lineage = [MagicMock()]
+        mock_from_entity = MagicMock()
+        mock_data_model = MagicMock()
+        mock_data_model.columns = [MagicMock(name=MagicMock(root="col_a"))]
+
+        with patch.object(
+            self.quicksight,
+            "_get_column_lineage",
+            return_value=fallback_lineage,
+        ) as mock_get_col_lineage:
+            result = self.quicksight._build_column_lineage_from_parser(
+                mock_parser, mock_from_entity, mock_data_model
+            )
+
+        mock_get_col_lineage.assert_called_once()
+        assert result is fallback_lineage


@SaaiAravindhRaja valid comment as well

PubChimps · 2026-04-16T01:41:24Z

hi @SaaiAravindhRaja, thank you for this pr, but this issue was assigned to another pr in progress here, we will reopen this pr if needed

WIP (open-metadata#26670)

2d23162

SaaiAravindhRaja changed the title ~~fix(quicksight): resolve column aliases to correct upstream column in lineage (#26670)~~ fix(quicksight): resolve column aliases to correct upstream column in lineage (#26670) [WIP] Apr 4, 2026

fix(quicksight): resolve column aliases via SQL expression parsing (o…

8cac467

…pen-metadata#26670)

gitar-bot Bot reviewed Apr 4, 2026

View reviewed changes

Comment thread ingestion/tests/unit/topology/dashboard/test_quicksight.py Outdated

SaaiAravindhRaja marked this pull request as ready for review April 4, 2026 11:16

SaaiAravindhRaja requested a review from a team as a code owner April 4, 2026 11:16

Copilot AI review requested due to automatic review settings April 4, 2026 11:16

Copilot started reviewing on behalf of SaaiAravindhRaja April 4, 2026 11:17 View session

Copilot AI reviewed Apr 4, 2026

View reviewed changes

SaaiAravindhRaja changed the title ~~fix(quicksight): resolve column aliases to correct upstream column in lineage (#26670) [WIP]~~ fix(quicksight): resolve column aliases to correct upstream column in lineage (#26670) Apr 4, 2026

fix(quicksight): correct MagicMock name attribute in test (open-metad…

db4c661

…ata#27031)

ulixius9 added the safe to test Add this label to run secure Github workflows on PRs label Apr 6, 2026

Merge branch 'main' into fix/issue-26670-quicksight-column-aliases

abbfab2

Copilot AI review requested due to automatic review settings April 6, 2026 11:39

Copilot started reviewing on behalf of ulixius9 April 6, 2026 11:40 View session

Copilot AI reviewed Apr 6, 2026

View reviewed changes

ulixius9 temporarily deployed to test April 6, 2026 11:49 — with GitHub Actions Inactive

Merge branch 'main' into fix/issue-26670-quicksight-column-aliases

f0e443a

SaaiAravindhRaja temporarily deployed to test April 12, 2026 12:22 — with GitHub Actions Inactive

SaaiAravindhRaja had a problem deploying to test April 12, 2026 12:22 — with GitHub Actions Failure

blaackstring mentioned this pull request Apr 12, 2026

QuickSight column aliases are resolved to the wrong upstream column in lineage when the alias name exists elsewhere #26670

Closed

TeddyCr requested changes Apr 14, 2026

View reviewed changes

PubChimps closed this Apr 16, 2026

Conversation

SaaiAravindhRaja commented Apr 4, 2026

Uh oh!

github-actions Bot commented Apr 4, 2026

Uh oh!

github-actions Bot commented Apr 4, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

TeddyCr Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

Uh oh!

TeddyCr Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Apr 4, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

TeddyCr Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Apr 6, 2026

Uh oh!

github-actions Bot commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔴 Playwright Results — 1 failure(s), 22 flaky

Genuine Failures (failed on all attempts)

Uh oh!

gitar-bot Bot commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Apr 12, 2026

Uh oh!

sonarqubecloud Bot commented Apr 12, 2026

Quality Gate passed for 'open-metadata-ingestion'

Uh oh!

TeddyCr left a comment

Choose a reason for hiding this comment

Uh oh!

TeddyCr Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

TeddyCr Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

TeddyCr Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

PubChimps commented Apr 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

github-actions Bot commented Apr 6, 2026 •

edited

Loading

gitar-bot Bot commented Apr 12, 2026 •

edited

Loading