Fixes #27041: Clean up profiler time series data on hard delete of Table #27044
Add entitySpecificCleanup() override to TableRepository that deletes all profiler data from profiler_data_time_series when a table is hard-deleted:
- Table profiles (table.tableProfile)
- System profiles (table.systemProfile)
- Column profiles (table.columnProfile) for all columns including nested children

Without this, hard-deleting a table leaves orphaned profiler records that resurface when a table with the same FQN is re-ingested.

Fixes open-metadata#27041
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
Pull request overview
This PR addresses orphaned profiler time series records by adding Table-specific hard-delete cleanup logic so that profiler_data_time_series entries don’t persist and reappear when a Table with the same FQN is re-created/re-ingested.
Changes:
- Add TableRepository.entitySpecificCleanup() to delete table-level (table.tableProfile) and system-level (table.systemProfile) profiler time series for the table FQN.
- Enumerate column FQNs (including nested children) and delete corresponding column-level (table.columnProfile) profiler time series entries during hard delete.
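The column enumeration in the second change can be illustrated with a small standalone sketch. The `Column` class and the FQN values here are hypothetical stand-ins, not the OpenMetadata types, and the real collectColumnFqns() may differ in detail; the point is only that a depth-first walk picks up nested children as well as top-level columns.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnFqnSketch {
  // Hypothetical stand-in for a column that may carry nested children (e.g. struct fields).
  static final class Column {
    final String fullyQualifiedName;
    final List<Column> children; // null when the column has no nested fields

    Column(String fullyQualifiedName, List<Column> children) {
      this.fullyQualifiedName = fullyQualifiedName;
      this.children = children;
    }
  }

  // Depth-first walk: record each column's FQN, then recurse into its children.
  static void collectColumnFqns(List<Column> columns, List<String> out) {
    if (columns == null) {
      return;
    }
    for (Column column : columns) {
      out.add(column.fullyQualifiedName);
      collectColumnFqns(column.children, out);
    }
  }

  public static void main(String[] args) {
    Column city = new Column("svc.db.schema.orders.address.city", null);
    Column address = new Column("svc.db.schema.orders.address", List.of(city));
    Column id = new Column("svc.db.schema.orders.id", null);

    List<String> fqns = new ArrayList<>();
    collectColumnFqns(List.of(id, address), fqns);
    fqns.forEach(System.out::println);
  }
}
```

Each collected FQN then costs one DELETE call, so the number of statements grows with the number of columns.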
    List<String> columnFqns = new ArrayList<>();
    collectColumnFqns(table.getColumns(), columnFqns);
    for (String columnFqn : columnFqns) {
      daoCollection.profilerDataTimeSeriesDao().delete(columnFqn, TABLE_COLUMN_PROFILE_EXTENSION);
    }
Column profile cleanup only deletes profiles for column FQNs currently present in table.getColumns(). If a column previously existed (and had profiler data) but was removed from the table schema before the table is hard-deleted, its table.columnProfile rows will not be deleted here and can remain orphaned (and potentially resurface if the column name is reintroduced on a later re-ingestion). Consider deleting column-profile rows by the table reference embedded in the JSON payload (e.g., entityReference.id/fullyQualifiedName) so all column profiles linked to the table are removed, not just the current column list.
    @Override
    protected void entitySpecificCleanup(Table table) {
      String fqn = table.getFullyQualifiedName();
      daoCollection.profilerDataTimeSeriesDao().delete(fqn, TABLE_PROFILE_EXTENSION);
      daoCollection.profilerDataTimeSeriesDao().delete(fqn, SYSTEM_PROFILE_EXTENSION);
PR description/checklist says a test was added for this scenario, but there are no test changes in this PR branch/repo state that exercise the new hard-delete cleanup for profiler time series. Please either add an explicit regression test for hard-deleting a table and asserting profiler_data_time_series is cleaned, or update the PR description to point to the existing test that covers this behavior.
Replace N+1 per-column DELETE calls with a single LIKE-based query on entityFQNHash prefix. This fixes two issues:
1. Orphaned column profiles: columns dropped before the table is hard-deleted now get their profiler data cleaned up too, since the prefix match catches all column FQNs under the table, not just the current schema.
2. Performance: one DELETE statement replaces potentially hundreds of individual calls for wide tables.

Added EntityTimeSeriesDAO.deleteByFQNPrefix() which builds the hashed FQN prefix and uses LIKE matching, following the same pattern used by CollectionDAO for prefix-based relationship queries.
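A rough sketch of why a prefix match can work on hashed FQNs. It assumes each dot-separated FQN part is hashed separately (MD5 here) and the hashes are rejoined with dots, so a table's hash is a literal prefix of every column hash beneath it. That hashing scheme, the naive split on dots (quoted names containing dots are ignored), the in-memory map standing in for profiler_data_time_series, and all names in the code are illustrative assumptions, not the actual deleteByFQNPrefix() implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class FqnPrefixDeleteSketch {
  static String md5Hex(String value) {
    try {
      byte[] digest =
          MessageDigest.getInstance("MD5").digest(value.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // Assumption: hash every FQN part on its own and keep the dot separators.
  static String hashFqn(String fqn) {
    StringBuilder hashed = new StringBuilder();
    for (String part : fqn.split("\\.")) {
      if (hashed.length() > 0) {
        hashed.append('.');
      }
      hashed.append(md5Hex(part));
    }
    return hashed.toString();
  }

  // In-memory equivalent of:
  //   DELETE ... WHERE entityFQNHash LIKE '<tableHash>.%' AND extension = '<extension>'
  // The map goes from entityFQNHash to extension. Returns the number of rows removed.
  static int deleteByFqnPrefix(Map<String, String> rows, String tableFqn, String extension) {
    String prefix = hashFqn(tableFqn) + ".";
    int before = rows.size();
    rows.entrySet()
        .removeIf(row -> row.getKey().startsWith(prefix) && row.getValue().equals(extension));
    return before - rows.size();
  }

  public static void main(String[] args) {
    Map<String, String> rows = new HashMap<>();
    rows.put(hashFqn("svc.db.schema.orders.id"), "table.columnProfile");
    // Column no longer in the schema, but its profile rows are still stored.
    rows.put(hashFqn("svc.db.schema.orders.dropped_col"), "table.columnProfile");
    rows.put(hashFqn("svc.db.schema.customers.id"), "table.columnProfile");

    int removed = deleteByFqnPrefix(rows, "svc.db.schema.orders", "table.columnProfile");
    System.out.println("removed=" + removed + ", remaining=" + rows.size());
  }
}
```

Because the match is on the stored hash rather than on the table's current column list, rows for columns that were dropped earlier are removed as well.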
Code Review ✅ Approved 1 resolved / 1 findings
Cleans up profiler time series data when a table is hard deleted by using prefix-based DELETE queries, addressing the N+1 database calls issue with column profile deletion. No additional issues found.
✅ 1 resolved
✅ Performance: Column profile deletion issues N+1 DB calls per column
@RajdeepKushwaha5 Please do not open multiple PRs on this topic. Merge all of these into a single PR, since they address the common issue of entity time series data not getting deleted. Also try to do this in EntityRepository when there is time series data associated with an entity; that way one piece of code can handle all entity time series data, wherever it is associated with an entity.
@harshach Thanks for the feedback — that makes sense. I initially opened separate PRs since they were filed as individual issues (#27040, #27041, #27042), but I see your point that they share a common root cause: timeseries data not being cleaned up when entities are deleted. I'll close these individual PRs and open a single consolidated PR that adds a generic Working on it now. |
Describe your changes:
Fixes #27041
What: Added entitySpecificCleanup() override to TableRepository that deletes all profiler data from profiler_data_time_series when a table is hard-deleted:
- table.tableProfile
- table.systemProfile
- table.columnProfile

Why: When a Table entity is hard-deleted, the base EntityRepository.cleanup() removes entity extensions, relationships, tags, etc., but it does not clean up profiler_data_time_series, which is a separate table. Without this fix, orphaned profiler records remain and resurface when a table with the same FQN is re-ingested.

Column profiles are stored with column-level FQNs (e.g., service.db.schema.table.column), so prefix-based deletion isn't possible on hashed FQN columns. Instead, the fix uses the existing collectColumnFqns() utility to enumerate all column FQNs (including nested struct children) and deletes each one individually.

How I tested: Verified spotless formatting passes and no compilation errors from the modified file. The fix follows the same pattern as TestCaseRepository.entitySpecificCleanup(), which similarly cleans up time series data on hard delete. The entitySpecificCleanup() hook is only called during hard delete (from EntityRepository.cleanup()), so soft delete behavior is unaffected.
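The hard-delete hook described above follows a template-method shape, which can be sketched as below. Every class and method body here is a hypothetical simplification for illustration, not the OpenMetadata EntityRepository code; it only shows how a subclass override runs on hard delete and is skipped on soft delete.

```java
import java.util.ArrayList;
import java.util.List;

public class CleanupHookSketch {
  // Hypothetical base repository: fixed delete flow with an overridable hook.
  abstract static class Repo<T> {
    final List<String> log = new ArrayList<>();

    final void delete(T entity, boolean hardDelete) {
      if (!hardDelete) {
        log.add("soft-delete"); // entity is only marked deleted; no cleanup runs
        return;
      }
      cleanup(entity);
    }

    private void cleanup(T entity) {
      log.add("base-cleanup"); // stands in for removing extensions, relationships, tags
      entitySpecificCleanup(entity);
    }

    // No-op by default; subclasses override to remove data the base class does not know about.
    protected void entitySpecificCleanup(T entity) {}
  }

  // Hypothetical table repository; the entity is represented by its FQN string.
  static class TableRepo extends Repo<String> {
    @Override
    protected void entitySpecificCleanup(String tableFqn) {
      log.add("delete-profiler-data:" + tableFqn);
    }
  }

  public static void main(String[] args) {
    TableRepo repo = new TableRepo();
    repo.delete("svc.db.schema.orders", false);
    repo.delete("svc.db.schema.orders", true);
    repo.log.forEach(System.out::println);
  }
}
```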