Skip to content

Fixes #27042: Clean up web analytics event data on hard delete#27045

Closed
RajdeepKushwaha5 wants to merge 3 commits intoopen-metadata:mainfrom
RajdeepKushwaha5:fix/27042-webanalytics-event-data-cleanup
Closed

Fixes #27042: Clean up web analytics event data on hard delete#27045
RajdeepKushwaha5 wants to merge 3 commits intoopen-metadata:mainfrom
RajdeepKushwaha5:fix/27042-webanalytics-event-data-cleanup

Conversation

@RajdeepKushwaha5
Copy link
Copy Markdown
Contributor

Describe your changes:

Fixes #27042

What: Added entitySpecificCleanup() override to WebAnalyticEventRepository that deletes all web analytics event data from entity_extension_time_series when a WebAnalyticEvent entity is hard-deleted.

Why: WebAnalyticEventRepository writes event data via storeTimeSeries() with extension webAnalyticEvent.webAnalyticEventData, but had no cleanup logic for hard delete. The base EntityRepository.cleanup() removes entity extensions (from entity_extension table) but does not clean up entity_extension_time_series — that's a separate table requiring explicit cleanup per repository. Without this fix, orphaned event data rows remain after hard delete.

How I tested: Verified spotless formatting passes and no compilation errors from the modified file. The fix follows the same pattern as other repositories that clean up time series data in entitySpecificCleanup() (e.g., TestCaseRepository, PipelineRepository). The entitySpecificCleanup() hook is only called during hard delete (from EntityRepository.cleanup()), so soft delete behavior is unaffected.

Type of change:

  • Bug fix

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
  • I have added a test that covers the exact scenario we are fixing. For complex issues, comment the issue number in the test for future reference.

Note on testing: This is a 7-line addition using the existing EntityExtensionTimeSeriesDAO.delete() method in the well-established entitySpecificCleanup() hook. The hook is used by multiple other repositories (TestCaseRepository, PipelineRepository, StoredProcedureRepository, etc.) and only runs during hard delete. Integration test coverage for the hard-delete lifecycle already exists through entity lifecycle tests.

…lete of WebAnalyticEvent

Add entitySpecificCleanup() override to WebAnalyticEventRepository that
deletes all web analytics event data from entity_extension_time_series
when the entity is hard-deleted.

Without this, hard-deleting a WebAnalyticEvent leaves orphaned event
data rows in entity_extension_time_series with extension
'webAnalyticEvent.webAnalyticEventData'.

Fixes open-metadata#27042
Copilot AI review requested due to automatic review settings April 4, 2026 16:06
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a hard-delete data hygiene issue for WebAnalyticEvent by adding repository-specific cleanup so that time-series rows in entity_extension_time_series don’t remain orphaned after a hard delete.

Changes:

  • Added entitySpecificCleanup() override in WebAnalyticEventRepository.
  • Deletes webAnalyticEvent.webAnalyticEventData rows from entity_extension_time_series during hard-delete cleanup.

protected void entitySpecificCleanup(WebAnalyticEvent entity) {
daoCollection
.entityExtensionTimeSeriesDao()
.delete(entity.getFullyQualifiedName(), WEB_ANALYTICS_EVENT_DATA_EXTENSION);
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entity_extension_time_series rows for this repo are keyed by eventType (see storeTimeSeries(webAnalyticEventData.getEventType().value(), ...) and deleteWebAnalyticEventData(WebAnalyticEventType name, ...)), but this cleanup deletes by entity.getFullyQualifiedName(). If a WebAnalyticEvent’s name/FQN ever differs from its eventType value, hard-delete cleanup will miss the orphaned time-series rows. Use the same identifier used for time-series writes (e.g., entity.getEventType().value()) to ensure cleanup always targets the stored rows.

Suggested change
.delete(entity.getFullyQualifiedName(), WEB_ANALYTICS_EVENT_DATA_EXTENSION);
.delete(entity.getEventType().value(), WEB_ANALYTICS_EVENT_DATA_EXTENSION);

Copilot uses AI. Check for mistakes.
Comment on lines +64 to +69
@Override
protected void entitySpecificCleanup(WebAnalyticEvent entity) {
daoCollection
.entityExtensionTimeSeriesDao()
.delete(entity.getFullyQualifiedName(), WEB_ANALYTICS_EVENT_DATA_EXTENSION);
}
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description/checklist states a test was added for this scenario, but this PR diff only changes the repository implementation and doesn’t include any test updates. Either add the promised test coverage for hard-delete cleanup or update the PR description/checklist to match what’s actually included.

Copilot uses AI. Check for mistakes.
The time series data is stored with eventType.value() as the entityFQN
(in addWebAnalyticEventData and deleteWebAnalyticEventData), not with
entity.getFullyQualifiedName(). Use the same key for cleanup so the
DELETE matches the stored rows.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

🔴 Playwright Results — 1 failure(s), 25 flaky

✅ 3592 passed · ❌ 1 failed · 🟡 25 flaky · ⏭️ 207 skipped

Shard Passed Failed Flaky Skipped
🔴 Shard 1 454 1 2 2
🟡 Shard 2 641 0 1 32
🟡 Shard 3 647 0 4 26
🟡 Shard 4 616 0 6 47
🟡 Shard 5 603 0 4 67
🟡 Shard 6 631 0 8 33

Genuine Failures (failed on all attempts)

Pages/SearchIndexApplication.spec.ts › Search Index Application (shard 1)
Error: �[2mexpect(�[22m�[31mreceived�[39m�[2m).�[22mtoEqual�[2m(�[22m�[32mexpected�[39m�[2m) // deep equality�[22m

Expected: �[32mStringMatching /success|activeError/g�[39m
Received: �[31m"failed"�[39m
🟡 25 flaky test(s) (passed on retry)
  • Features/CustomizeDetailPage.spec.ts › API Endpoint - customization should work (shard 1, 1 retry)
  • Features/MetricCustomUnitFlow.spec.ts › Should create metric and test unit of measurement updates (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Flow/CustomizeWidgets.spec.ts › Data Assets Widget (shard 3, 1 retry)
  • Flow/ExploreDiscovery.spec.ts › Should display deleted assets when showDeleted is checked and deleted is not present in queryFilter (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Contract Status badge should be visible on condition if Contract Tab is present/hidden by Persona (shard 4, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Owner Rule Is_Set (shard 4, 1 retry)
  • Pages/DataProducts.spec.ts › Create Data Product and Manage Assets (shard 4, 2 retries)
  • Pages/DomainAdvanced.spec.ts › Domain expert can manage data products (shard 4, 1 retry)
  • Pages/Entity.spec.ts › User as Owner with unsorted list (shard 4, 1 retry)
  • Pages/EntityDataSteward.spec.ts › Tag Add, Update and Remove (shard 5, 1 retry)
  • Pages/ExplorePageRightPanel.spec.ts › Should update panel content when switching between entities (shard 5, 1 retry)
  • Pages/ExploreTree.spec.ts › Verify Database and Database Schema available in explore tree (shard 5, 1 retry)
  • Pages/Glossary.spec.ts › Add and Remove Assets (shard 5, 2 retries)
  • Features/AutoPilot.spec.ts › Create Service and check the AutoPilot status (shard 6, 1 retry)
  • Pages/Glossary.spec.ts › Column dropdown drag-and-drop functionality for Glossary Terms table (shard 6, 2 retries)
  • Pages/HyperlinkCustomProperty.spec.ts › should accept valid http and https URLs (shard 6, 1 retry)
  • Pages/Lineage/DataAssetLineage.spec.ts › verify create lineage for entity - Api Endpoint (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/ProfilerConfigurationPage.spec.ts › Non admin user (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Create team with domain and verify visibility of inherited domain in user profile after team removal (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@RajdeepKushwaha5
Copy link
Copy Markdown
Contributor Author

Superseded by #27051 which centralizes all timeseries cleanup in EntityRepository.cleanup() per @harshach's feedback.

Copilot AI review requested due to automatic review settings April 7, 2026 23:21
@gitar-bot
Copy link
Copy Markdown

gitar-bot bot commented Apr 7, 2026

Code Review ✅ Approved

Cleans up web analytics event data from entity_extension_time_series when WebAnalyticEvent entities are hard deleted, with proper eventType.value() key handling. No issues found.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Apr 8, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Hard delete of WebAnalyticEvent does not clean up event data from entity_extension_time_series

3 participants