Fixes #27041, #27042: Centralize timeseries data cleanup on entity hard-delete #27051
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help!
Pull request overview
This PR aims to centralize cleanup of time-series rows when an entity is hard-deleted, by adding a generic DAO delete method and invoking it from the base EntityRepository.cleanup() path. This is intended to prevent orphaned rows in the time-series tables from resurfacing when entities are re-created with the same FQN.
Changes:
- Added `deleteAllByEntityFQN(...)` to `EntityTimeSeriesDAO` to delete all rows for an entity FQN hash (regardless of extension).
- Updated `EntityRepository.cleanup()` to delete time-series data from both `entity_extension_time_series` and `profiler_data_time_series` for the entity being hard-deleted.
- Minor refactor to store `getFullyQualifiedName()` in a local variable and reuse it for multiple DAO calls.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityTimeSeriesDAO.java | Adds a generic “delete all rows by entityFQNHash” DAO operation for time-series tables. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java | Invokes centralized time-series cleanup during cleanup() (hard-delete) for both time-series tables. |
@RajdeepKushwaha5 this looks good, can you add integration tests for this please |
🔴 Playwright Results: 1 failure(s), 21 flaky
✅ 3665 passed · ❌ 1 failed · 🟡 21 flaky · ⏭️ 89 skipped
Genuine Failures (failed on all attempts) ❌
@RajdeepKushwaha5 can you address comment and resolve the conflict |
…tralize timeseries cleanup on entity delete

Add deleteAllByEntityFQN to EntityTimeSeriesDAO base interface and call it from EntityRepository.cleanup() for both entityExtensionTimeSeries and profilerDataTimeSeries tables. This ensures all timeseries data (profiler metrics, test results, etc.) is cleaned up when any entity is hard-deleted, rather than requiring each entity subclass to handle it individually.

Also extract entityInterface.getFullyQualifiedName() to a local variable to avoid redundant calls in cleanup().

Closes open-metadata#27040, closes open-metadata#27041, closes open-metadata#27042
…entities

Copilot review identified that exact-match delete on entityFQNHash misses child entity timeseries (e.g. column profiles stored under table.column FQN). Switch to deleteAllByEntityFQNPrefix, which deletes both the entity's own rows (exact hash match) and all child rows (LIKE hash.%), matching the pattern used by fieldRelationshipDAO and tagUsageDAO.
- Remove dead deleteAllByEntityFQN() method (cleanup uses deleteAllByEntityFQNPrefix instead)
- Narrow test section comment to reference only open-metadata#27041, since the tests cover profiler data cleanup, not entity_extension time series
f92690b to ff86588
Code Review ✅ Approved
Centralizes timeseries data cleanup logic during entity hard-deletion to resolve issues #27041 and #27042. No issues found.
Describe your changes:
Fixes #27041, #27042

Per @harshach's feedback, this consolidates the three individual timeseries cleanup PRs (#27043, #27044, #27045) into a single centralized fix in `EntityRepository.cleanup()`.

Root cause: When entities are hard-deleted, their associated timeseries data (profiler metrics, column profiles, test case results, web analytic events, etc.) in the `entity_extension_time_series` and `profiler_data_time_series` tables is not cleaned up, causing orphaned rows that resurface when an entity with the same FQN is re-created.

Fix:
- Added `deleteAllByEntityFQNPrefix(String entityFQN)` to the base `EntityTimeSeriesDAO` interface: a prefix-based delete that removes all timeseries rows for a given entity and its children (e.g. column profiles stored under `<table_fqn>.<column_name>`). Internally it hashes the FQN and uses `entityFQNHash = :hash OR entityFQNHash LIKE :hash.%` so that both the parent and child rows are deleted in one statement.
- Called it from `EntityRepository.cleanup()` for both `entityExtensionTimeSeriesDao` and `profilerDataTimeSeriesDao`. This handles cleanup for all entity types (tables, web analytics, test suites, etc.) in one place.
- Extracted `entityInterface.getFullyQualifiedName()` to a local variable to avoid redundant calls.

This approach ensures any future entity type with timeseries data is automatically covered without needing per-entity cleanup logic.
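For reviewers who want to reason about the match semantics without a database, here is a minimal in-memory sketch of the predicate described above. It is not the DAO code from this PR. The class `PrefixDeleteSketch`, the helpers `hashFqn`, `md5Hex` and `matches`, and the hashing scheme (MD5 of each dot-separated FQN part, rejoined with `.`) are assumptions made for illustration; quoted FQN parts that contain dots are ignored here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

class PrefixDeleteSketch {

  // Assumed hashing scheme: hash every FQN part on its own and rejoin with '.',
  // so a child's hash always starts with "<parent hash>.".
  static String hashFqn(String fqn) {
    StringBuilder joined = new StringBuilder();
    for (String part : fqn.split("\\.")) {
      if (joined.length() > 0) {
        joined.append('.');
      }
      joined.append(md5Hex(part));
    }
    return joined.toString();
  }

  static String md5Hex(String value) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      StringBuilder hex = new StringBuilder();
      for (byte b : md.digest(value.getBytes(StandardCharsets.UTF_8))) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is not available", e);
    }
  }

  // In-memory equivalent of: entityFQNHash = :hash OR entityFQNHash LIKE :hash.%
  static boolean matches(String rowHash, String entityHash) {
    return rowHash.equals(entityHash) || rowHash.startsWith(entityHash + ".");
  }

  // Removes the entity's own rows and its children's rows; returns how many were removed.
  static int deleteAllByEntityFQNPrefix(List<String> rowHashes, String entityFqn) {
    String entityHash = hashFqn(entityFqn);
    int before = rowHashes.size();
    rowHashes.removeIf(rowHash -> matches(rowHash, entityHash));
    return before - rowHashes.size();
  }
}
```

Under these assumptions, deleting by the FQN of a table removes the table's own rows and the rows of its columns, while rows for a sibling table in the same schema are left alone. An exact-match delete would remove only the first of these, which is the gap the Copilot review pointed out.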
Integration tests added (in `TableResourceIT`):
- `hardDelete_cleansUpProfilerTimeSeriesData`
- `hardDelete_cleansUpColumnProfileTimeSeriesData`

Type of change:
Checklist:
Summary by Gitar
- `fieldName` resolution in `EntityRepository` to use `TypeRegistry.getPropertyName` instead of substring manipulation.

This will update automatically on new commits.