Skip to content

feat(dataRetention): sweep orphan time-series rows for per-type tables#28384

Merged
harshach merged 3 commits into
mainfrom
harshach/clean-orphaned-timeseries-rows
May 24, 2026
Merged

feat(dataRetention): sweep orphan time-series rows for per-type tables#28384
harshach merged 3 commits into
mainfrom
harshach/clean-orphaned-timeseries-rows

Conversation

@harshach

Copy link
Copy Markdown
Collaborator

Describe your changes:

Adds cleanOrphanedTimeSeriesRows() to DataRetention alongside the existing orphan-relationship and orphan-tag sweeps. Five DAOs each get a deleteOrphanedRecords(int limit) query (MySQL + PostgreSQL) that left-joins to the parent and deletes rows whose parent is gone: testCaseResolutionStatus (via entity_relationship PARENT_OF), agentExecution (agentIdai_application_entity.id), mcpExecution (serverIdmcp_server_entity.id), profile_data (entityFQNHashtable_entity.fqnHash), and query_cost_time_series (entityFQNHashquery_entity.fqnHash).

Pairs with #28367 — that PR makes the bulk hard-delete cascade skip time-series children (they can hold millions of rows per parent and a synchronous DELETE would stall the request), and explicitly defers orphan reclamation to DataRetention. The new sweep is wired to run after cleanOrphanedRelationshipsAndHierarchies() so the testCaseResolutionStatus check sees the post-cleanup entity_relationship state.

Type of change:

  • Improvement

Tests:

Use cases covered

  • After a recursive hard-delete of an aiApplication / mcpServer / testCase / table / query, the DataRetention job reclaims the dangling agentExecution / mcpExecution / testCaseResolutionStatus / profile_data / query_cost rows on its next run.
  • Time-series rows whose parent still resolves are never touched by the sweep.

Backend integration tests

  • Added openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/OrphanedTimeSeriesCleanupIT.java with one test per DAO. Each seeds a valid + an orphan row through the existing DAO insert(...) paths, runs deleteOrphanedRecords(BATCH), and asserts orphan-gone / valid-preserved via raw JDBI count queries.

UI screen recording / screenshots:

Not applicable.

Checklist:

  • I have read the CONTRIBUTING document.
  • I have added tests (unit / integration / Playwright as applicable) and listed them above.

🤖 Generated with Claude Code

Adds `cleanOrphanedTimeSeriesRows()` to `DataRetention` alongside the existing
orphan-relationship and orphan-tag sweeps. Each of the five affected DAOs gets a
`deleteOrphanedRecords(int limit)` query (MySQL + PostgreSQL) that left-joins
to its parent and deletes rows the parent no longer covers:

- `testCaseResolutionStatus`: parent link via `entity_relationship` PARENT_OF
- `agentExecution`: `agentId` → `ai_application_entity.id`
- `mcpExecution`: `serverId` → `mcp_server_entity.id`
- `profile_data`: `entityFQNHash` → `table_entity.fqnHash`
- `query_cost_time_series`: `entityFQNHash` → `query_entity.fqnHash`

The sweep runs after `cleanOrphanedRelationshipsAndHierarchies()` so the
PARENT_OF check sees the post-cleanup `entity_relationship` state. Pairs with
PR #28367, where the bulk hard-delete cascade now skips time-series children
and relies on `DataRetention` to reclaim them out-of-band.

Adds `OrphanedTimeSeriesCleanupIT` covering all five per-type queries:
inserts a real-parent row and a bogus-parent row through the existing DAO
`insert(...)` paths, runs `deleteOrphanedRecords(BATCH)`, asserts the orphan
is gone and the valid row is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 22, 2026 18:07
@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels May 22, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

…ion table arg

- McpExecutionDAO.insertWithoutExtension uses `<table>` placeholder; the test
  was passing `null`, which made Jdbi fail to render the SQL template. Pass
  the literal table name `mcp_execution_entity`.
- `ns.prefix(...)` embeds class + method names, so chaining it through
  database -> schema -> table -> auto-created test_suite pushed the test_suite
  `name` column past its VARCHAR(256) bound. Use `ns.uniqueShortId()` for the
  hierarchy components and shorten the test method names so the resulting
  FQN stays well under the column limit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rify PARENT_OF=9

Addresses two PR review findings:

- `profiler_data_time_series.deleteOrphanedRecords` previously LIMITed distinct
  `entityFQNHash` values, then deleted every row for each hash — a batch could
  delete tens of millions of rows if many orphan hashes each had thousands of
  rows. Switch to row-level limiting: single-table `DELETE ... WHERE NOT EXISTS
  (...) LIMIT N` on MySQL, and `ctid IN (SELECT ... LIMIT N)` on PostgreSQL
  (the table has no `id` column, so we use Postgres ctid for the inner subquery).
  This matches the row-count cap used by the other four orphan-cleanup queries.

- Annotate `er.relation = 9` in the testCaseResolutionStatus query with a
  `// 9 = Relationship.PARENT_OF` inline comment plus a leading block comment
  noting the ordinal is stable because the enum appends new values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 22, 2026 19:03
@gitar-bot

gitar-bot Bot commented May 22, 2026

Copy link
Copy Markdown
Code Review ✅ Approved 2 resolved / 2 findings

Implements orphaned time-series data cleanup in the DataRetention job to reclaim storage after entity deletions, with row-level limits and clarified parent-relationship constants.

✅ 2 resolved
Performance: Profiler delete unbounded: LIMIT caps hashes, not rows deleted

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:9745-9759
The profiler_data_time_series.deleteOrphanedRecords query limits the number of distinct entityFQNHash values (up to 10,000), but deletes ALL rows matching those hashes in a single statement. As noted in the PR description, a single parent table can have millions of profiler rows. If 10,000 orphaned hashes each have thousands of rows, a single batch could lock and delete tens of millions of rows in one transaction, causing long lock holds and potential replication lag.

The other four DAOs (agent_execution, mcp_execution, query_cost, test_case_resolution_status) correctly limit by individual row id, bounding the actual rows deleted per statement. The profiler DAO should follow the same pattern.

Quality: Magic number 9 for PARENT_OF relation in SQL annotation

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:10043-10044 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:10057-10058
The testCaseResolutionStatus orphan query uses er.relation = 9 which is the ordinal value of Relationship.PARENT_OF. While the codebase has precedent for this in annotations (can't use runtime expressions), adding a comment clarifying the constant would aid maintainability, since an enum reordering would silently break this query.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@sonarqubecloud

Copy link
Copy Markdown

@github-actions

Copy link
Copy Markdown
Contributor

🟡 Playwright Results — all passed (14 flaky)

✅ 4241 passed · ❌ 0 failed · 🟡 14 flaky · ⏭️ 87 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 298 0 1 4
🟡 Shard 2 802 0 1 8
🟡 Shard 3 795 0 5 8
🟡 Shard 4 839 0 2 12
🟡 Shard 5 718 0 1 47
🟡 Shard 6 789 0 4 8
🟡 14 flaky test(s) (passed on retry)
  • Features/TagsSuggestion.spec.ts › should decline suggested tags for a container column (shard 1, 1 retry)
  • Features/DataProductRenameConsolidation.spec.ts › Rename then change owner - assets should be preserved (shard 2, 1 retry)
  • Features/LandingPageWidgets/DomainDataProductsWidgets.spec.ts › Domain asset count should update when assets are removed (shard 3, 1 retry)
  • Features/OntologyExplorerE2E.spec.ts › switching back from Hierarchy to Overview restores stats (shard 3, 1 retry)
  • Features/OntologyExplorerE2E.spec.ts › toggling edge labels off and back on leaves the graph and cardinality map intact (shard 3, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Pages/CustomProperties.spec.ts › Time Interval (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Chart (shard 4, 1 retry)
  • Pages/ExplorePageRightPanel_KnowledgeCenter.spec.ts › Should remove user owner for knowledgeCenter (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/ODCSImportExport.spec.ts › Multi-object ODCS contract - object selector shows all schema objects (shard 6, 1 retry)
  • Pages/ServiceEntity.spec.ts › User as Owner Add, Update and Remove (shard 6, 1 retry)
  • Pages/Users.spec.ts › Token generation & revocation for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@harshach harshach merged commit 5f5a123 into main May 24, 2026
60 of 63 checks passed
@harshach harshach deleted the harshach/clean-orphaned-timeseries-rows branch May 24, 2026 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants