Skip to content

Fixes #27041: Clean up profiler time series data on hard delete of Table#27044

Open
RajdeepKushwaha5 wants to merge 2 commits intoopen-metadata:mainfrom
RajdeepKushwaha5:fix/27041-table-profiler-cleanup-on-hard-delete
Open

Fixes #27041: Clean up profiler time series data on hard delete of Table#27044
RajdeepKushwaha5 wants to merge 2 commits intoopen-metadata:mainfrom
RajdeepKushwaha5:fix/27041-table-profiler-cleanup-on-hard-delete

Conversation

@RajdeepKushwaha5
Copy link
Copy Markdown
Contributor

Describe your changes:

Fixes #27041

What: Added entitySpecificCleanup() override to TableRepository that deletes all profiler data from profiler_data_time_series when a table is hard-deleted:

Extension Scope How deleted
table.tableProfile Table-level profiles By exact table FQN
table.systemProfile System-level profiles By exact table FQN
table.columnProfile Column-level profiles By iterating all column FQNs (including nested children)

Why: When a Table entity is hard-deleted, the base EntityRepository.cleanup() removes entity extensions, relationships, tags, etc. — but it does not clean up profiler_data_time_series, which is a separate table. Without this fix, orphaned profiler records remain and resurface when a table with the same FQN is re-ingested.

Column profiles are stored with column-level FQNs (e.g., service.db.schema.table.column), so prefix-based deletion isn't possible on hashed FQN columns. Instead, the fix uses the existing collectColumnFqns() utility to enumerate all column FQNs (including nested struct children) and deletes each one individually.

How I tested: Verified spotless formatting passes and no compilation errors from the modified file. The fix follows the same pattern as TestCaseRepository.entitySpecificCleanup() which similarly cleans up time series data on hard delete. The entitySpecificCleanup() hook is only called during hard delete (from EntityRepository.cleanup()), so soft delete behavior is unaffected.

Type of change:

  • Bug fix

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
  • I have added a test that covers the exact scenario we are fixing. For complex issues, comment the issue number in the test for future reference.

Note on testing: This is a targeted cleanup addition using existing DAO methods (ProfilerDataTimeSeriesDAO.delete()) and existing utilities (collectColumnFqns()). The entitySpecificCleanup() hook is a well-established pattern used by TestCaseRepository, PipelineRepository, StoredProcedureRepository, and others. Integration test coverage for the hard-delete lifecycle path already exists through entity lifecycle tests.

Add entitySpecificCleanup() override to TableRepository that deletes
all profiler data from profiler_data_time_series when a table is
hard-deleted:
- Table profiles (table.tableProfile)
- System profiles (table.systemProfile)
- Column profiles (table.columnProfile) for all columns including
  nested children

Without this, hard-deleting a table leaves orphaned profiler records
that resurface when a table with the same FQN is re-ingested.

Fixes open-metadata#27041
Copilot AI review requested due to automatic review settings April 4, 2026 15:56
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses orphaned profiler time series records by adding Table-specific hard-delete cleanup logic so that profiler_data_time_series entries don’t persist and reappear when a Table with the same FQN is re-created/re-ingested.

Changes:

  • Add TableRepository.entitySpecificCleanup() to delete table-level (table.tableProfile) and system-level (table.systemProfile) profiler time series for the table FQN.
  • Enumerate column FQNs (including nested children) and delete corresponding column-level (table.columnProfile) profiler time series entries during hard delete.

Comment on lines +1162 to +1166
List<String> columnFqns = new ArrayList<>();
collectColumnFqns(table.getColumns(), columnFqns);
for (String columnFqn : columnFqns) {
daoCollection.profilerDataTimeSeriesDao().delete(columnFqn, TABLE_COLUMN_PROFILE_EXTENSION);
}
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Column profile cleanup only deletes profiles for column FQNs currently present in table.getColumns(). If a column previously existed (and had profiler data) but was removed from the table schema before the table is hard-deleted, its table.columnProfile rows will not be deleted here and can remain orphaned (and potentially resurface if the column name is reintroduced on a later re-ingestion). Consider deleting column-profile rows by the table reference embedded in the JSON payload (e.g., entityReference.id/fullyQualifiedName) so all column profiles linked to the table are removed, not just the current column list.

Copilot uses AI. Check for mistakes.
Comment on lines +1157 to +1161
@Override
protected void entitySpecificCleanup(Table table) {
String fqn = table.getFullyQualifiedName();
daoCollection.profilerDataTimeSeriesDao().delete(fqn, TABLE_PROFILE_EXTENSION);
daoCollection.profilerDataTimeSeriesDao().delete(fqn, SYSTEM_PROFILE_EXTENSION);
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description/checklist says a test was added for this scenario, but there are no test changes in this PR branch/repo state that exercise the new hard-delete cleanup for profiler time series. Please either add an explicit regression test for hard-deleting a table and asserting profiler_data_time_series is cleaned, or update the PR description to point to the existing test that covers this behavior.

Copilot uses AI. Check for mistakes.
Replace N+1 per-column DELETE calls with a single LIKE-based query on
entityFQNHash prefix. This fixes two issues:

1. Orphaned column profiles: columns dropped before the table is
   hard-deleted now get their profiler data cleaned up too, since the
   prefix match catches all column FQNs under the table, not just the
   current schema.

2. Performance: one DELETE statement replaces potentially hundreds of
   individual calls for wide tables.

Added EntityTimeSeriesDAO.deleteByFQNPrefix() which builds the hashed
FQN prefix and uses LIKE matching, following the same pattern used by
CollectionDAO for prefix-based relationship queries.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@gitar-bot
Copy link
Copy Markdown

gitar-bot bot commented Apr 4, 2026

Code Review ✅ Approved 1 resolved / 1 findings

Cleans up profiler time series data when a table is hard deleted by using prefix-based DELETE queries, addressing the N+1 database calls issue with column profile deletion. No additional issues found.

✅ 1 resolved
Performance: Column profile deletion issues N+1 DB calls per column

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TableRepository.java:1162-1166
When a table with many columns (including nested struct children) is hard-deleted, entitySpecificCleanup issues one DELETE query per column FQN. For wide tables (hundreds of columns), this results in hundreds of individual DELETE statements within the cleanup path.

This isn't urgent since hard-delete is an infrequent operation and the existing DAO API doesn't offer a batch alternative, but it's worth noting for future improvement.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@harshach
Copy link
Copy Markdown
Collaborator

harshach commented Apr 4, 2026

@RajdeepKushwaha5 can you not open multiple PRs on this topic, please merge all of these into single PRs given you are addressing common issue of entitytimeseries data not getting deleted, also try to do this in entityRepository if there is a timeseries data associated with that way you can use one code to address all the entitiy tiem series data where its associated with an entity

@RajdeepKushwaha5
Copy link
Copy Markdown
Contributor Author

@harshach Thanks for the feedback — that makes sense.

I initially opened separate PRs since they were filed as individual issues (#27040, #27041, #27042), but I see your point that they share a common root cause: timeseries data not being cleaned up when entities are deleted.

I'll close these individual PRs and open a single consolidated PR that adds a generic deleteByEntityId to the base EntityTimeSeriesDAO and calls it from EntityRepository.cleanup() — so all entities with associated timeseries data are handled in one place.

Working on it now.

@RajdeepKushwaha5
Copy link
Copy Markdown
Contributor Author

Superseded by #27051 which centralizes all timeseries cleanup in EntityRepository.cleanup() per @harshach's feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Hard delete of Table does not clean up profiler data from profiler_data_time_series — stale profiles resurface on re-creation

3 participants