Fixes #27041, #27042: Centralize timeseries data cleanup on entity hard-delete #27051

Open
RajdeepKushwaha5 wants to merge 7 commits into open-metadata:main from RajdeepKushwaha5:fix/centralized-timeseries-cleanup-on-entity-delete

Conversation

@RajdeepKushwaha5
Contributor

@RajdeepKushwaha5 RajdeepKushwaha5 commented Apr 5, 2026

Describe your changes:

Fixes #27041, #27042

Per @harshach's feedback, this consolidates the three individual timeseries cleanup PRs (#27043, #27044, #27045) into a single centralized fix in EntityRepository.cleanup().

Root cause: When entities are hard-deleted, their associated timeseries data (profiler metrics, column profiles, test case results, web analytic events, etc.) in entity_extension_time_series and profiler_data_time_series tables is not cleaned up, causing orphaned rows that resurface when an entity with the same FQN is re-created.
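
To make the failure mode concrete, here is a minimal in-memory sketch in Java. It is not OpenMetadata code: the class, its two maps, and the `cleanupTimeSeries` flag are hypothetical stand-ins for the entity table and a time-series table keyed by FQN. It only illustrates why rows keyed by FQN can reappear when an entity with the same FQN is re-created.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model: entities and their time-series rows live in separate stores,
// and the time-series rows are looked up by FQN rather than by entity id.
class OrphanedProfileSketch {
  final Map<String, String> entitiesByFqn = new HashMap<>();
  final Map<String, String> latestProfileByFqn = new HashMap<>();

  void createTable(String fqn) {
    entitiesByFqn.put(fqn, "table");
  }

  void addProfile(String fqn, String profileJson) {
    latestProfileByFqn.put(fqn, profileJson);
  }

  // cleanupTimeSeries = false models the behavior described as the root cause;
  // true models a hard-delete that also removes the time-series rows.
  void hardDelete(String fqn, boolean cleanupTimeSeries) {
    entitiesByFqn.remove(fqn);
    if (cleanupTimeSeries) {
      latestProfileByFqn.remove(fqn);
    }
  }

  // A profile is only visible while an entity with that FQN exists.
  Optional<String> getLatestProfile(String fqn) {
    if (!entitiesByFqn.containsKey(fqn)) {
      return Optional.empty();
    }
    return Optional.ofNullable(latestProfileByFqn.get(fqn));
  }
}
```

In this model, creating a table, adding a profile, hard-deleting without cleanup, and re-creating the table under the same FQN makes the old profile visible again. With cleanup enabled, the re-created table starts with no profile.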

Fix:

  • Added deleteAllByEntityFQNPrefix(String entityFQN) to the base EntityTimeSeriesDAO interface — a prefix-based delete that removes all timeseries rows for a given entity and its children (e.g. column profiles stored under <table_fqn>.<column_name>). Internally it hashes the FQN and uses entityFQNHash = :hash OR entityFQNHash LIKE :hash.% so that both the parent and child rows are deleted in one statement.
  • Called from EntityRepository.cleanup() for both entityExtensionTimeSeriesDao and profilerDataTimeSeriesDao tables — this handles cleanup for all entity types (tables, web analytics, test suites, etc.) in one place.
  • Extracted entityInterface.getFullyQualifiedName() to a local variable to avoid redundant calls.
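
The matching rule in the first bullet can be sketched in plain Java. This is an illustration under stated assumptions, not the DAO code from this PR: it assumes the stored hash is built by hashing each dot-separated FQN part separately and joining the results with `.` (which is what lets a `LIKE :hash.%` prefix match reach child rows), it uses MD5 only as an example digest, and it ignores quoted FQN parts that contain dots. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical in-memory stand-in for a time-series table keyed by entityFQNHash.
class PrefixCleanupSketch {

  // Assumption: every FQN part is hashed on its own and the hashes are joined with ".".
  static String hashFqn(String fqn) {
    StringBuilder sb = new StringBuilder();
    for (String part : fqn.split("\\.")) {
      if (sb.length() > 0) {
        sb.append('.');
      }
      sb.append(md5Hex(part));
    }
    return sb.toString();
  }

  static String md5Hex(String s) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      StringBuilder hex = new StringBuilder();
      for (byte b : md.digest(s.getBytes(StandardCharsets.UTF_8))) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // Mirrors: entityFQNHash = :hash OR entityFQNHash LIKE :hash.%
  static boolean matches(String rowHash, String entityHash) {
    return rowHash.equals(entityHash) || rowHash.startsWith(entityHash + ".");
  }

  // Removes the entity's own rows and its children's rows; returns the number removed.
  static int deleteAllByEntityFqnPrefix(List<String> rowHashes, String entityFqn) {
    String hash = hashFqn(entityFqn);
    int before = rowHashes.size();
    rowHashes.removeIf(rowHash -> matches(rowHash, hash));
    return before - rowHashes.size();
  }
}
```

Appending the `.` separator before the prefix comparison is what keeps a sibling such as `orders_archive` from being swept up when `orders` is deleted, while a column row stored under `orders.id` is still removed.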

This approach ensures any future entity type with timeseries data is automatically covered without needing per-entity cleanup logic.

Integration tests added (in TableResourceIT):

| Test | Validates |
|---|---|
| hardDelete_cleansUpProfilerTimeSeriesData | Table profile rows are deleted on hard-delete; re-created table has no stale profile |
| hardDelete_cleansUpColumnProfileTimeSeriesData | Column profile rows (child FQN) are deleted via prefix match; re-created table has no stale column profiles |

Type of change:

  • Bug fix
  • Improvement

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes #27041, #27042: Centralize timeseries data cleanup on entity hard-delete
  • For JSON Schema changes: N/A — no schema changes.
  • I have added integration tests that cover the exact scenario we are fixing.

Summary by Gitar

  • Code Refactoring:
    • Changed fieldName resolution in EntityRepository to use TypeRegistry.getPropertyName instead of substring manipulation.

Copilot AI review requested due to automatic review settings April 5, 2026 06:28
@github-actions
Contributor

github-actions bot commented Apr 5, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Contributor

Copilot AI left a comment

Pull request overview

This PR aims to centralize cleanup of time-series rows when an entity is hard-deleted, by adding a generic DAO delete method and invoking it from the base EntityRepository.cleanup() path. This is intended to prevent orphaned rows in the time-series tables from resurfacing when entities are re-created with the same FQN.

Changes:

  • Added deleteAllByEntityFQN(...) to EntityTimeSeriesDAO to delete all rows for an entity FQN hash (regardless of extension).
  • Updated EntityRepository.cleanup() to delete time-series data from both entity_extension_time_series and profiler_data_time_series for the entity being hard-deleted.
  • Minor refactor to store getFullyQualifiedName() in a local variable and reuse it for multiple DAO calls.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityTimeSeriesDAO.java | Adds a generic “delete all rows by entityFQNHash” DAO operation for time-series tables. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java | Invokes centralized time-series cleanup during cleanup() (hard-delete) for both time-series tables. |

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

@RajdeepKushwaha5 RajdeepKushwaha5 changed the title from "Fixes #27040, #27041, #27042: Centralize timeseries data cleanup on entity delete" to "Fixes #27041, #27042: Centralize timeseries data cleanup on entity hard-delete" Apr 5, 2026
@harshach harshach added the "safe to test" label (Add this label to run secure Github workflows on PRs) Apr 5, 2026
@harshach
Collaborator

harshach commented Apr 5, 2026

@RajdeepKushwaha5 this looks good, can you add integration tests for this please

@github-actions
Contributor

github-actions bot commented Apr 5, 2026

🔴 Playwright Results — 1 failure(s), 21 flaky

✅ 3665 passed · ❌ 1 failed · 🟡 21 flaky · ⏭️ 89 skipped

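| Shard | Passed | Failed | Flaky | Skipped |
|---|---|---|---|---|
| 🟡 Shard 1 | 480 | 0 | 1 | 4 |
| 🟡 Shard 2 | 651 | 0 | 2 | 7 |
| 🔴 Shard 3 | 652 | 1 | 6 | 1 |
| 🟡 Shard 4 | 630 | 0 | 4 | 27 |
| ✅ Shard 5 | 611 | 0 | 0 | 42 |
| 🟡 Shard 6 | 641 | 0 | 8 | 8 |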

Genuine Failures (failed on all attempts)

Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3)
Error: expect(received).toBe(expected) // Object.is equality

Expected: 200
Received: 400

🟡 21 flaky test(s) (passed on retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/DomainFilterQueryFilter.spec.ts › Domain page assets tab should show only domain assets (shard 2, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Flow/AddRoleAndAssignToUser.spec.ts › Verify assigned role to new user (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Pipeline (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Container (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Domain owner should able to edit description of domain (shard 4, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Login.spec.ts › Refresh should work (shard 6, 2 retries)
  • Pages/ODCSImportExport.spec.ts › Multi-object ODCS contract - object selector shows all schema objects (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Create team with domain and verify visibility of inherited domain in user profile after team removal (shard 6, 1 retry)
  • Pages/Users.spec.ts › Create and Delete user (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)
  • VersionPages/EntityVersionPages.spec.ts › Topic (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@harshach
Collaborator

harshach commented Apr 8, 2026

@RajdeepKushwaha5 can you address comment and resolve the conflict

Copilot AI review requested due to automatic review settings April 9, 2026 04:04
…tralize timeseries cleanup on entity delete

Add deleteAllByEntityFQN to EntityTimeSeriesDAO base interface and call
it from EntityRepository.cleanup() for both entityExtensionTimeSeries
and profilerDataTimeSeries tables.

This ensures all timeseries data (profiler metrics, test results, etc.)
is cleaned up when any entity is hard-deleted, rather than requiring
each entity subclass to handle it individually.

Also extract entityInterface.getFullyQualifiedName() to a local variable
to avoid redundant calls in cleanup().

Closes open-metadata#27040, closes open-metadata#27041, closes open-metadata#27042

…entities

Copilot review identified that exact-match delete on entityFQNHash
misses child entity timeseries (e.g. column profiles stored under
table.column FQN). Switch to deleteAllByEntityFQNPrefix which deletes
both the entity's own rows (exact hash match) and all child rows
(LIKE hash.%), matching the pattern used by fieldRelationshipDAO and
tagUsageDAO.

- Remove dead deleteAllByEntityFQN() method (cleanup uses
  deleteAllByEntityFQNPrefix instead)
- Narrow test section comment to reference only open-metadata#27041 since
  the tests cover profiler data cleanup, not entity_extension
  time series
Copilot AI review requested due to automatic review settings April 17, 2026 01:59
@RajdeepKushwaha5 RajdeepKushwaha5 force-pushed the fix/centralized-timeseries-cleanup-on-entity-delete branch from f92690b to ff86588 Compare April 17, 2026 01:59
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@gitar-bot

gitar-bot bot commented Apr 19, 2026

Code Review ✅ Approved

Centralizes timeseries data cleanup logic during entity hard-deletion to resolve issues #27041 and #27042. No issues found.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Labels

safe to test: Add this label to run secure Github workflows on PRs

Development

Successfully merging this pull request may close these issues.

[BUG] Hard delete of Table does not clean up profiler data from profiler_data_time_series — stale profiles resurface on re-creation

4 participants