refactor(powerbi): scope per-workspace caches to workspace lifetime#28098
Conversation
Move per-workspace state (filtered dashboards/datamodels, dashboard charts, dataflow exports) into `WorkspaceState` with an explicit `enter`/`exit` lifecycle so caches release between workspace iterations. The cross-workspace report registry persists so tile-pinned lineage can still resolve reports from other workspaces. Also in the same refactor: - Extract `_emit_om_target_lineage` shared helper used by the four `create_*_lineage` methods. - Add per-workspace exception isolation: a failing workspace is logged and recorded in status, the next workspace continues. - Tighten `Optional[str]` handling around `fqn.build()` so a missing FQN is logged and skipped before `get_by_name`. - Fix wrong return-type annotation on `create_report_dashboard_lineage` (`Either[CreateDashboardRequest]` -> `Either[AddLineageRequest]`).
There was a problem hiding this comment.
Pull request overview
This PR refactors the PowerBI dashboard ingestion to scope per-workspace caches to an explicit workspace lifecycle, reducing cross-workspace state bleed and memory retention while keeping a cross-workspace report registry for tile lineage resolution. It also consolidates repeated lineage-emission logic into a shared helper and adjusts unit tests accordingly.
Changes:
- Introduce
WorkspaceStatewithenter/exitlifecycle to manage per-workspace caches (filtered dashboards/datamodels, dashboard→chart mapping, dataflow exports) and a cross-workspace report registry. - Add
_emit_om_target_lineagehelper to unify target-entity resolution + lineage emission across multiplecreate_*_lineagemethods. - Isolate per-workspace failures so one broken workspace is recorded/logged and subsequent workspaces continue processing.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| ingestion/src/metadata/ingestion/source/dashboard/powerbi/metadata.py | Wire in WorkspaceState, add workspace-level try/finally lifecycle, and refactor lineage creation via _emit_om_target_lineage. |
| ingestion/src/metadata/ingestion/source/dashboard/powerbi/workspace_state.py | New workspace-scoped state container with explicit lifecycle and operation-shaped APIs for caches. |
| ingestion/tests/unit/topology/dashboard/test_powerbi.py | Update tests to use WorkspaceState APIs and adjust mocks for dataset lookup / dashboard chart mapping. |
| ingestion/tests/unit/topology/dashboard/test_powerbi_workspace_state.py | New unit tests locking the WorkspaceState lifecycle contract and cache behavior. |
| tile_report_ids = [ | ||
| report.id | ||
| for chart in dashboard_details.tiles or [] | ||
| for report in [self.state.find_report(chart.reportId)] | ||
| if report is not None | ||
| ] |
…d_lineage The refactor that consolidated the four `create_*_lineage` methods into `_emit_om_target_lineage` dropped the outer try/except wrapper on `create_report_dashboard_lineage`. The helper covers per-target failures, but this method has pre-loop work (resolving the dashboard's FQN + entity) that needs its own guard. Restored, yielding `Either(left=StackTraceError(...))` on resolution failures.
Code Review ✅ ApprovedRefactors PowerBI workspace state management to improve memory handling and adds per-workspace exception isolation. No issues found. OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|
🟡 Playwright Results — all passed (14 flaky)✅ 4062 passed · ❌ 0 failed · 🟡 14 flaky · ⏭️ 97 skipped
🟡 14 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
The base class method `register_record_chart` was added on main in PR #27309 ("Mark Charts deleted") but never backported to 1.12.8. The cherry-pick of #28098 incidentally pulled in the call site via `--theirs` conflict resolution, breaking chart ingestion at runtime with `AttributeError: 'PowerbiSource' object has no attribute 'register_record_chart'`. Removing the call restores the cherry-pick to its semantically correct shape for the 1.12.8 base class. No corresponding change is needed on main, where the method exists.
The base class method `register_record_chart` was added on main in PR #27309 ("Mark Charts deleted") but never backported to 1.12.8. The cherry-pick of #28098 incidentally pulled in the call site via `--theirs` conflict resolution, breaking chart ingestion at runtime with `AttributeError: 'PowerbiSource' object has no attribute 'register_record_chart'`. Removing the call restores the cherry-pick to its semantically correct shape for the 1.12.8 base class. No corresponding change is needed on main, where the method exists.



Summary
WorkspaceStatewith explicitenter/exitlifecycle so caches are released between workspace iterations. The cross-workspace report registry persists so tile-pinned report-to-dashboard lineage still resolves across workspaces._emit_om_target_lineageshared helper used by the fourcreate_*_lineagemethods.status.failed; the next workspace continues. Previously a single workspace error aborted the rest of the dashboard stage.Optional[str]handling aroundfqn.build()(log + skip a missing FQN beforeget_by_name) and fix a wrong return-type annotation oncreate_report_dashboard_lineage(Either[CreateDashboardRequest]->Either[AddLineageRequest]).Test plan
cd ingestion && python -m pytest tests/unit/topology/dashboard/test_powerbi.py tests/unit/topology/dashboard/test_powerbi_workspace_state.py(59 passing locally)cd ingestion && make static-checks(powerbi-scoped: 0 errors / 0 warnings; full-repo errors unchanged from main on Linux runner)cd ingestion && ruff check src/metadata/ingestion/source/dashboard/powerbi/ && ruff format --check src/metadata/ingestion/source/dashboard/powerbi/