Fix: Empty entity name/special name fixes in powerbi#28092
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens the PowerBI ingestion connector against production scan payloads that contain missing/empty entity names and PowerBI/DAX identifiers that violate OpenMetadata’s columnName schema constraint (no ::), preventing batch failures during ingestion.
Changes:
- Make multiple PowerBI scan-response model
namefields (andPowerBIDashboard.displayName) optional to tolerate nullable values. - Sanitize PowerBI column/measure/table/entity names by replacing
::before truncation, and skip entities with empty names to avoid API rejections. - Improve admin scan-result parsing by parsing workspaces individually (skip malformed workspaces) and add unit tests covering the resilience scenarios.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
ingestion/tests/unit/topology/dashboard/test_powerbi_resilience.py |
Adds unit tests for nullable names, scan-result resilience, and :: sanitization/truncation behavior. |
ingestion/src/metadata/ingestion/source/dashboard/powerbi/models.py |
Loosens several name fields (and dashboard displayName) to Optional to handle nulls from PowerBI APIs. |
ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py |
Adds name sanitization helper, skips nameless entities, and applies sanitization before truncation when building OM columns/measures/tables. |
ingestion/src/metadata/ingestion/source/dashboard/powerbi/client.py |
Parses scan-result workspaces individually so one malformed workspace doesn’t drop the entire batch. |
Comments suppressed due to low confidence (1)
ingestion/tests/unit/topology/dashboard/test_powerbi_resilience.py:117
monkeypatchis declared as a test parameter but never used. This will be reported as an unused argument by Ruff's pylint rules and can fail CI. Remove the unused fixture parameter (or use it).
def test_fetch_workspace_scan_result_handles_empty_response(monkeypatch):
|
The Python checkstyle failed. Please run You can install the pre-commit hooks with |
🟡 Playwright Results — all passed (15 flaky)✅ 4055 passed · ❌ 0 failed · 🟡 15 flaky · ⏭️ 103 skipped
🟡 15 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
| try: | ||
| pages: Optional[List[ReportPage]] = self.client.api_client.fetch_report_pages(workspace_id, dashboard_id) # noqa: UP006, UP045 | ||
| if len(pages) >= 1: | ||
| if pages and pages[0].name: # if there are pages and page has name then only add page id in url: # if there are pages and page has name then only add page id in url |
There was a problem hiding this comment.
💡 Quality: Duplicated comment text on line 384
The comment on line 384 is duplicated: # if there are pages and page has name then only add page id in url: # if there are pages and page has name then only add page id in url. This appears to be a copy-paste artifact. Per project guidelines, this comment is also unnecessary since the code (if pages and pages[0].name) is self-documenting.
Remove the redundant/duplicated comment entirely — the code is self-explanatory.:
if pages and pages[0].name:
Was this helpful? React with 👍 / 👎
|
The Python checkstyle failed. Please run You can install the pre-commit hooks with |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Code Review 👍 Approved with suggestions 2 resolved / 3 findingsImproves PowerBI ingestion resilience by filtering empty entity names and sanitizing special characters in columns, while resolving potential null pointer exceptions in dashboard filtering. Please remove the duplicated comment on line 384. 💡 Quality: Duplicated comment text on line 384📄 ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py:384 The comment on line 384 is duplicated: Remove the redundant/duplicated comment entirely — the code is self-explanatory.✅ 2 resolved✅ Bug: Missing None guard in get_dashboard allows None names into filtering
✅ Bug: len(pages) crashes with TypeError when pages is None
🤖 Prompt for agentsOptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|



Describe your changes:
Type of change:
High-level design:
N/A — small change.
Tests:
Use cases covered
Unit tests
Backend integration tests
Ingestion integration tests
Playwright (UI) tests
Manual testing performed
UI screen recording / screenshots:
Not applicable.
Checklist:
Fixes <issue-number>: <short explanation>Fixes #<issue-number>above.Summary by Gitar
self.state.add_dashboard_chartuntil after a successfulCreateChartRequestto ensure state consistency.pyrightignore comment totable_entity.name.rootto silence type-checking errors during lineage logging.custom_basemodel_validationto include,, andin reserved character escaping to prevent ingest failures.PowerBIclient to skip workspaces with missing IDs and added robustness inmetadata.pyto prevent data model processing from failing the entire ingestion.workspace_dataand helper search methods in favor of the centralizedWorkspaceStaterepository._emit_om_target_lineage.This will update automatically on new commits.