Skip to content

Fix: Empty entity name/special name fixes in powerbi#28092

Merged
harshsoni2024 merged 20 commits into
mainfrom
powerbi_entity_name_fixes
May 14, 2026
Merged

Fix: Empty entity name/special name fixes in powerbi#28092
harshsoni2024 merged 20 commits into
mainfrom
powerbi_entity_name_fixes

Conversation

@harshsoni2024
Copy link
Copy Markdown
Contributor

@harshsoni2024 harshsoni2024 commented May 13, 2026

Describe your changes:

  • Entities with empty name ("") coming as API response should be skipped to ingest with DEBUG log to not fail at further steps.
  • Entities with column names with special characters (\n, \t, \r) fails with 400 error on bulk create API, should be addressed with reserve_keyword in custom basemodel validation

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

High-level design:

N/A — small change.

Tests:

Use cases covered

Unit tests

Backend integration tests

Ingestion integration tests

Playwright (UI) tests

Manual testing performed

UI screen recording / screenshots:

Not applicable.

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • My PR is linked to a GitHub issue via Fixes #<issue-number> above.
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
  • For UI changes: I attached a screen recording and/or screenshots above.
  • I have added tests (unit / integration / Playwright as applicable) and listed them above.

Summary by Gitar

  • PowerBI ingestion lifecycle:
    • Deferred self.state.add_dashboard_chart until after a successful CreateChartRequest to ensure state consistency.
  • Code quality:
    • Added pyright ignore comment to table_entity.name.root to silence type-checking errors during lineage logging.
  • Entity name validation:
    • Updated custom_basemodel_validation to include , , and in reserved character escaping to prevent ingest failures.
  • PowerBI ingestion resilience:
    • Enhanced PowerBI client to skip workspaces with missing IDs and added robustness in metadata.py to prevent data model processing from failing the entire ingestion.
  • Refactored internals:
    • Removed legacy global state workspace_data and helper search methods in favor of the centralized WorkspaceState repository.
    • Standardized cross-workspace lineage resolution using _emit_om_target_lineage.

This will update automatically on new commits.

Copilot AI review requested due to automatic review settings May 13, 2026 12:39
@harshsoni2024 harshsoni2024 requested a review from a team as a code owner May 13, 2026 12:39
@github-actions github-actions Bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels May 13, 2026
@harshsoni2024 harshsoni2024 changed the title Fix: Empty entity name fixes in powerbi Fix: Empty entity name/special name fixes in powerbi May 13, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the PowerBI ingestion connector against production scan payloads that contain missing/empty entity names and PowerBI/DAX identifiers that violate OpenMetadata’s columnName schema constraint (no ::), preventing batch failures during ingestion.

Changes:

  • Make multiple PowerBI scan-response model name fields (and PowerBIDashboard.displayName) optional to tolerate nullable values.
  • Sanitize PowerBI column/measure/table/entity names by replacing :: before truncation, and skip entities with empty names to avoid API rejections.
  • Improve admin scan-result parsing by parsing workspaces individually (skip malformed workspaces) and add unit tests covering the resilience scenarios.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
ingestion/tests/unit/topology/dashboard/test_powerbi_resilience.py Adds unit tests for nullable names, scan-result resilience, and :: sanitization/truncation behavior.
ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py Loosens several name fields (and dashboard displayName) to Optional to handle nulls from PowerBI APIs.
ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py Adds name sanitization helper, skips nameless entities, and applies sanitization before truncation when building OM columns/measures/tables.
ingestion/src/metadata/ingestion/source/dashboard/powerbi/client.py Parses scan-result workspaces individually so one malformed workspace doesn’t drop the entire batch.
Comments suppressed due to low confidence (1)

ingestion/tests/unit/topology/dashboard/test_powerbi_resilience.py:117

  • monkeypatch is declared as a test parameter but never used. This will be reported as an unused argument by Ruff's pylint rules and can fail CI. Remove the unused fixture parameter (or use it).
def test_fetch_workspace_scan_result_handles_empty_response(monkeypatch):

Comment thread ingestion/tests/unit/topology/dashboard/test_powerbi_resilience.py
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/client.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py
@github-actions
Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/client.py
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/client.py
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/client.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 13, 2026

🟡 Playwright Results — all passed (15 flaky)

✅ 4055 passed · ❌ 0 failed · 🟡 15 flaky · ⏭️ 103 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 298 0 1 4
🟡 Shard 2 745 0 5 25
🟡 Shard 3 782 0 2 7
🟡 Shard 4 789 0 1 18
🟡 Shard 5 707 0 2 41
🟡 Shard 6 734 0 4 8
🟡 15 flaky test(s) (passed on retry)
  • Flow/Tour.spec.ts › Tour should work from help section (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/KnowledgeCenterList.spec.ts › Knowledge Center List - Test infinite scroll/pagination (shard 2, 1 retry)
  • Features/KnowledgeCenterTextEditor.spec.ts › Rich Text Editor - Text Formatting (shard 2, 1 retry)
  • Features/KnowledgeCenterTextEditor.spec.ts › Rich Text Editor - Text Formatting (shard 2, 1 retry)
  • Features/KnowledgeCenterTextEditor.spec.ts › Rich Text Editor - Text Formatting (shard 2, 1 retry)
  • Features/Permissions/DataProductPermissions.spec.ts › Data Product allow operations (shard 3, 1 retry)
  • Features/UserProfileOnlineStatus.spec.ts › Should not show online status for inactive users (shard 3, 1 retry)
  • Pages/CustomProperties.spec.ts › Should clear search and show all properties for apiCollection in right panel (shard 4, 1 retry)
  • Pages/EntityDataSteward.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
  • Pages/ExplorePageRightPanel_KnowledgeCenter.spec.ts › Should remove user owner for knowledgeCenter (shard 5, 1 retry)
  • Pages/GlossaryImportExport.spec.ts › Glossary CSV import preserves typed relations (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Admin user can get all the roles hierarchy and edit roles (shard 6, 1 retry)
  • Pages/Users.spec.ts › User Performance across different entities pages (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Copilot AI review requested due to automatic review settings May 14, 2026 04:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/client.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py
Comment thread ingestion/tests/unit/topology/dashboard/test_powerbi_resilience.py
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/client.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py
Comment thread ingestion/tests/unit/topology/dashboard/test_powerbi_resilience.py Outdated
Copilot AI review requested due to automatic review settings May 14, 2026 15:24
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py Outdated
try:
pages: Optional[List[ReportPage]] = self.client.api_client.fetch_report_pages(workspace_id, dashboard_id) # noqa: UP006, UP045
if len(pages) >= 1:
if pages and pages[0].name: # if there are pages and page has name then only add page id in url: # if there are pages and page has name then only add page id in url
Copy link
Copy Markdown

@gitar-bot gitar-bot Bot May 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Quality: Duplicated comment text on line 384

The comment on line 384 is duplicated: # if there are pages and page has name then only add page id in url: # if there are pages and page has name then only add page id in url. This appears to be a copy-paste artifact. Per project guidelines, this comment is also unnecessary since the code (if pages and pages[0].name) is self-documenting.

Remove the redundant/duplicated comment entirely — the code is self-explanatory.:

if pages and pages[0].name:

Was this helpful? React with 👍 / 👎

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Comment thread ingestion/tests/unit/topology/dashboard/test_powerbi_resilience.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/client.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copilot AI review requested due to automatic review settings May 14, 2026 15:38
harshsoni2024 and others added 2 commits May 14, 2026 21:10
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Comment thread ingestion/src/metadata/ingestion/source/dashboard/powerbi/client.py
pmbrull
pmbrull previously approved these changes May 14, 2026
pmbrull
pmbrull previously approved these changes May 14, 2026
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented May 14, 2026

Code Review 👍 Approved with suggestions 2 resolved / 3 findings

Improves PowerBI ingestion resilience by filtering empty entity names and sanitizing special characters in columns, while resolving potential null pointer exceptions in dashboard filtering. Please remove the duplicated comment on line 384.

💡 Quality: Duplicated comment text on line 384

📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py:384

The comment on line 384 is duplicated: # if there are pages and page has name then only add page id in url: # if there are pages and page has name then only add page id in url. This appears to be a copy-paste artifact. Per project guidelines, this comment is also unnecessary since the code (if pages and pages[0].name) is self-documenting.

Remove the redundant/duplicated comment entirely — the code is self-explanatory.
if pages and pages[0].name:
✅ 2 resolved
Bug: Missing None guard in get_dashboard allows None names into filtering

📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py:292-306
The get_dashboard method (line 303) calls get_dashboard_name() which can now return None (since PowerBIDashboard.displayName and PowerBIReport.name are both Optional[str]). This None value is passed to filter_by_dashboard and self.status.filter without any guard.

While yield_dashboard (line 410) has a skip-check for empty names, get_dashboard runs first and populates self.filtered_dashboards. If a filter pattern is configured, the regex match inside _filter will attempt to match against None, which may raise a TypeError. Even without a pattern, dashboards with None names silently pass through into filtered_dashboards only to be skipped later in yield_dashboard — this is fragile and inconsistent.

The fix should add a None guard in get_dashboard before the filter call, consistent with the approach used in yield_dashboard.

Bug: len(pages) crashes with TypeError when pages is None

📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py:384
On line 384, pages is typed Optional[List[ReportPage]]. If fetch_report_pages returns None, calling len(pages) raises a TypeError before the new pages[0].name guard is ever evaluated. The crash is caught by the broad except Exception on line 389, so it doesn't propagate, but it generates a misleading warning message ("Error building report page url: object of type 'NoneType' has no len()") instead of silently falling through to an empty page_id.

🤖 Prompt for agents
Code Review: Improves PowerBI ingestion resilience by filtering empty entity names and sanitizing special characters in columns, while resolving potential null pointer exceptions in dashboard filtering. Please remove the duplicated comment on line 384.

1. 💡 Quality: Duplicated comment text on line 384
   Files: ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py:384

   The comment on line 384 is duplicated: `# if there are pages and page has name then only add page id in url:  # if there are pages and page has name then only add page id in url`. This appears to be a copy-paste artifact. Per project guidelines, this comment is also unnecessary since the code (`if pages and pages[0].name`) is self-documenting.

   Fix (Remove the redundant/duplicated comment entirely — the code is self-explanatory.):
   if pages and pages[0].name:

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants