
Fix/issue 26670 quicksight column aliases #27047

Closed
Esvanth wants to merge 17 commits into open-metadata:main from Esvanth:fix/issue-26670-quicksight-column-aliases


Esvanth commented Apr 4, 2026

Problem

QuickSight datasets backed by CustomSql queries that use column aliases (e.g., SELECT src AS alias) lost their column-level lineage, because the connector could not map the aliased data model column (alias) back to its source column (src).

Solution

  • Refactored _yield_lineage_from_query to prioritize results from lineage_parser.
  • Implemented _build_column_lineage_from_parser with multi-table matching that filters candidate columns by the source column's parent table.
  • Fixed Windows-specific environment issues in setup.py (uvloop) and scripts/datamodel_generation.py (Unicode handling and case sensitivity).
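A minimal sketch of the alias-aware, per-table matching described in the first two bullets, assuming the parser's column lineage can be flattened into (source table, source column, target column) triples. `ParsedColumnPair` and `build_alias_map` are hypothetical names for illustration; they are not the connector's actual `_build_column_lineage_from_parser` or the `lineage_parser` API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ParsedColumnPair:
    """Hypothetical flattened column edge produced by a SQL lineage parser."""

    source_table: str  # unqualified table the source column belongs to
    source_column: str  # column name in that table, e.g. "src"
    target_column: str  # output (possibly aliased) column name, e.g. "alias"


def build_alias_map(
    pairs: List[ParsedColumnPair], from_table: str
) -> Dict[str, List[str]]:
    """Map each data model column name to its source columns in `from_table`."""
    mapping: Dict[str, List[str]] = {}
    for pair in pairs:
        # Skip edges from other tables so that a multi-table query does not
        # attribute a column to the wrong upstream table.
        if pair.source_table.lower() != from_table.lower():
            continue
        # Key on the alias: that is the name the data model column carries.
        mapping.setdefault(pair.target_column.lower(), []).append(pair.source_column)
    return mapping


# SELECT o.src AS alias, c.name AS customer
# FROM orders o JOIN customers c ON o.customer_id = c.id
pairs = [
    ParsedColumnPair("orders", "src", "alias"),
    ParsedColumnPair("customers", "name", "customer"),
]
print(build_alias_map(pairs, "orders"))  # {'alias': ['src']}
```

Keying on the target (alias) name is what lets the data model column "alias" find its upstream "src"; a plain name comparison between the two would never match.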

Verification

  • Verified with 10 passing unit tests in ingestion/tests/unit/topology/dashboard/test_quicksight.py.

Esvanth and others added 16 commits April 3, 2026 01:48
- Restore numpy vectorization for string types in max_length/min_length
  update_accumulator (as requested by TeddyCr: the performance optimization
  must not be removed)
- Add a separate elif branch for complex types using pandas str.len()
- Extend the fn() SQL path to compute min/max length for complex types
  via LenFn (which already casts to text for Postgres and similar)
- Replace return None in distinct_count.fn() with an actual DISTINCT count
  over a CAST to Text, so complex types now produce real metric values
- Add an is_length_computable() helper in registry.py to consolidate the
  repeated is_concatenable/is_collection/is_struct/is_complex pattern

Moves `from numpy import vectorize` from inside `update_accumulator`
to module-level imports in max_length.py and min_length.py, fixing
the isort checkstyle validation failure.

The import line exceeded isort's 88-char limit (profile=black).
Wraps is_complex_type, is_concatenable, is_length_computable in
parentheses across multiple lines in max_length.py and min_length.py.
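The is_length_computable() consolidation from the first commit can be sketched as follows. The real predicates in registry.py operate on the profiler's column types; here a hypothetical `TypeCategory` enum stands in for them, so the enum, its members, and the predicate bodies are assumptions for illustration only.

```python
from enum import Enum, auto


class TypeCategory(Enum):
    """Hypothetical simplified stand-in for the column types the profiler sees."""

    STRING = auto()
    ARRAY = auto()
    STRUCT = auto()
    JSON = auto()
    INTEGER = auto()


# Hypothetical predicates that reuse the names from the commit message.
def is_concatenable(cat: TypeCategory) -> bool:
    return cat is TypeCategory.STRING


def is_collection(cat: TypeCategory) -> bool:
    return cat is TypeCategory.ARRAY


def is_struct(cat: TypeCategory) -> bool:
    return cat is TypeCategory.STRUCT


def is_complex(cat: TypeCategory) -> bool:
    return cat is TypeCategory.JSON


def is_length_computable(cat: TypeCategory) -> bool:
    """Single check replacing the repeated four-way `or` at each call site."""
    return (
        is_concatenable(cat)
        or is_collection(cat)
        or is_struct(cat)
        or is_complex(cat)
    )


print([c.name for c in TypeCategory if is_length_computable(c)])
# ['STRING', 'ARRAY', 'STRUCT', 'JSON']
```

In this sketch, metrics such as max_length and min_length test one predicate, so supporting another length-capable type only touches the helper.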
Esvanth requested a review from a team as a code owner April 4, 2026 20:52
github-actions Bot commented Apr 4, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Review comment thread (outdated): ingestion/src/metadata/ingestion/source/dashboard/quicksight/metadata.py
Review comment thread (outdated): scripts/datamodel_generation.py
gitar-bot Bot commented Apr 4, 2026

Code Review ✅ Approved 2 resolved / 2 findings

Fixes QuickSight column alias resolution in lineage matching by addressing substring check false-positives and encoding issues with invalid bytes. All findings resolved.

✅ 2 resolved
Bug: Substring check for parent table causes false-positive lineage matches

ingestion/src/metadata/ingestion/source/dashboard/quicksight/metadata.py:607
At line 607, str(src_col.parent).lower() not in from_entity.name.root.lower() uses Python's in operator, which tests substring containment, not equality. This means a source column from table orders would incorrectly match a from_entity named orders_backup (since "orders" is a substring of "orders_backup"), producing bogus lineage links.

Additionally, str(src_col.parent) may produce a qualified name like <schema>.orders, so even the intended match against from_entity.name.root (which is just the table name) could fail for legitimate cases.

Use an equality check on the table name portion instead of substring containment.
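A minimal sketch of the suggested fix, assuming str(src_col.parent) yields a dot-separated name whose last segment is the table (e.g. <schema>.orders) and that from_entity.name.root is the bare table name. `table_name_matches` is a hypothetical helper, not code from the PR.

```python
def table_name_matches(source_parent: str, entity_table_name: str) -> bool:
    """Exact, case-insensitive match on the unqualified table name."""
    # Keep only the last dotted segment: "db.schema.orders" -> "orders".
    table_part = source_parent.rsplit(".", 1)[-1]
    # Drop common identifier quoting such as "orders", `orders` or [orders].
    table_part = table_part.strip('"`[]')
    return table_part.lower() == entity_table_name.lower()


# The substring check from the finding accepts a different table:
print("orders" in "orders_backup")  # True
print(table_name_matches("orders", "orders_backup"))  # False
print(table_name_matches("public.orders", "orders"))  # True
```

Splitting on the last dot handles both failure modes named above, but it would mis-split a quoted identifier that itself contains a dot; a parsed fully qualified name would be more robust if one is available.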

Quality: errors='ignore' silently drops invalid bytes in generated files

scripts/datamodel_generation.py:49, :62, :81, :94, :111
In datamodel_generation.py, four open() calls and one glob-based sanitization pass use errors='ignore', which silently drops any bytes that can't be decoded as UTF-8. This can corrupt file content (e.g., removing characters from string literals or comments) without any warning. Using errors='replace' or errors='backslashreplace' would be safer as it preserves visibility of encoding issues while still preventing crashes.
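The difference between the error handlers is easy to see on a byte string that is not valid UTF-8; open() accepts the same errors argument. The `read_text_checked` helper is a hypothetical sketch of a safer pattern: decode strictly, and only on failure warn and fall back to backslashreplace, so nothing is dropped silently.

```python
import tempfile
import warnings
from pathlib import Path

bad = b"caf\xe9"  # Latin-1 byte for "e acute", invalid as UTF-8

print(bad.decode("utf-8", errors="ignore"))  # caf  (byte silently dropped)
print(ascii(bad.decode("utf-8", errors="replace")))  # 'caf\ufffd'
print(bad.decode("utf-8", errors="backslashreplace"))  # caf\xe9


def read_text_checked(path: Path) -> str:
    """Read UTF-8 text, warning instead of silently losing undecodable bytes."""
    raw = path.read_bytes()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        warnings.warn(f"{path}: invalid UTF-8 at byte offset {exc.start}")
        # Escape the bad bytes as \xNN so they stay visible in the output.
        return raw.decode("utf-8", errors="backslashreplace")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "model.py"
    sample.write_bytes(b"NAME = 'caf\xe9'\n")
    # Emits a UserWarning, then prints: NAME = 'caf\xe9'
    print(read_text_checked(sample))
```

Well-formed files take the strict path unchanged; only files with bad bytes are escaped and reported.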



Esvanth commented Apr 4, 2026

Hi, could a maintainer please add the appropriate labels (e.g., type:bug-fix, component:ingestion) to trigger the CI checks? Thank you!


ulixius9 commented Apr 6, 2026

@Esvanth, it looks like you mixed two issues in one PR. Can you please check?

I'm referring to the profiler changes here.


Esvanth commented Apr 6, 2026

Please add the safe to test label so it will be easier to check.


Hi @Esvanth, thank you for this PR, but this issue was assigned to a PR in progress here. We will reopen this PR if needed.

PubChimps closed this Apr 16, 2026

Labels

None yet

Projects

None yet


3 participants