feat: implement fast FQN prefix-based bulk deletion to fix race conditions and slow cascade #27108
mohityadav766 wants to merge 30 commits into main from
Conversation
Pull request overview
This PR introduces a faster, prefix-based hard deletion flow aimed at preventing orphaned relationships when ingestion and cascading deletes run concurrently (particularly for large service hierarchies). It does so by adding FQN-hash metadata to relationships, providing a new API endpoint for prefix deletion, and implementing a server-side bulk deletion orchestrator.
Changes:
- Added `/prefix/{id}` hard-delete endpoint for `DatabaseService` that triggers an async prefix-based deletion job.
- Added `fromFQNHash`/`toFQNHash` columns + indexes to `entity_relationship` and a v1.14.1 migration utility to backfill existing rows.
- Introduced `PrefixDeletionService` plus supporting DAO/repository changes for prefix ID discovery and batch deletes.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/resources/services/database/DatabaseServiceResource.java | Adds new REST endpoint to start prefix-based hard delete for DatabaseService. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/EntityResource.java | Adds async orchestration method deletePrefixHardById calling PrefixDeletionService. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PrefixDeletionService.java | New service coordinating bulk prefix-based deletion across tables. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityDAO.java | Adds DAO helpers for finding IDs by FQN-hash prefix and batch deletes. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java | Extends relationship insertion to persist relationship FQN hashes; adds hook helpers used by prefix deletion. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java | Extends entity_relationship inserts to include new hash columns; adds delete-by-prefix helpers and thread-entity batch lookup. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityTimeSeriesDAO.java | Adds delete-by-FQN-hash-prefix helper for time-series tables. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/FeedRepository.java | Adds batch delete of threads by about-entity IDs (used by prefix deletion). |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TableRepository.java | Updates CONTAINS relationship insertion to populate relationship FQN hashes. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseSchemaRepository.java | Updates CONTAINS relationship insertion to populate relationship FQN hashes. |
| openmetadata-service/src/main/java/org/openmetadata/service/initialization/LockManagerInitializer.java | Initializes PrefixDeletionService alongside the existing hierarchical lock manager. |
| openmetadata-service/src/main/java/org/openmetadata/service/migration/utils/v1141/MigrationUtil.java | Adds backfill logic for populating new relationship hash columns for existing rows. |
| openmetadata-service/src/main/java/org/openmetadata/service/migration/mysql/v1141/Migration.java | Runs v1141 backfill for MySQL migrations. |
| openmetadata-service/src/main/java/org/openmetadata/service/migration/postgres/v1141/Migration.java | Runs v1141 backfill for Postgres migrations. |
| bootstrap/sql/schema/mysql.sql | Adds fromFQNHash/toFQNHash + indexes to base MySQL schema. |
| bootstrap/sql/schema/postgres.sql | Adds fromfqnhash/tofqnhash + indexes to base Postgres schema. |
| bootstrap/sql/migrations/native/1.14.1/mysql/schemaChanges.sql | Adds columns + indexes to entity_relationship for MySQL migrations. |
| bootstrap/sql/migrations/native/1.14.1/postgres/schemaChanges.sql | Adds columns + indexes to entity_relationship for Postgres migrations. |
```java
ExecutorService executorService = AsyncService.getInstance().getExecutorService();
executorService.submit(
    RequestLatencyContext.wrapWithContext(
        () -> {
          try {
            PrefixDeletionService.getInstance().deletePrefixHard(entity, userName);
```
PrefixDeletionService.getInstance() can return null if the lock manager initializer didn’t run (or CollectionDAO was unavailable), which will NPE here. Please either make getInstance() lazily initialize a no-lock instance, or guard here and return a clear 503/500 explaining prefix deletion isn’t available until initialization completes.
Suggested change:

```diff
-ExecutorService executorService = AsyncService.getInstance().getExecutorService();
-executorService.submit(
-    RequestLatencyContext.wrapWithContext(
-        () -> {
-          try {
-            PrefixDeletionService.getInstance().deletePrefixHard(entity, userName);
+PrefixDeletionService prefixDeletionService = PrefixDeletionService.getInstance();
+if (prefixDeletionService == null) {
+  return Response.status(Response.Status.SERVICE_UNAVAILABLE)
+      .type(MediaType.TEXT_PLAIN)
+      .entity("Prefix deletion is not available until service initialization completes.")
+      .build();
+}
+ExecutorService executorService = AsyncService.getInstance().getExecutorService();
+executorService.submit(
+    RequestLatencyContext.wrapWithContext(
+        () -> {
+          try {
+            prefixDeletionService.deletePrefixHard(entity, userName);
```
```java
private void deleteDependencyTables(String rootFqn, String fqnHashPrefix, List<String> allIds) {
  CollectionDAO dao = Entity.getCollectionDAO();
  dao.relationshipDAO().deleteAllByFqnHashPrefix(fqnHashPrefix);
  dao.fieldRelationshipDAO().deleteAllByPrefix(rootFqn);
  dao.entityExtensionDAO().deleteAllBatch(allIds);
  dao.tagUsageDAO().deleteTagLabelsByTargetPrefix(rootFqn);
  dao.usageDAO().deleteBatch(allIds);
  List<UUID> allUuids = allIds.stream().map(UUID::fromString).toList();
  Entity.getFeedRepository().deleteByAboutBatch(allUuids);
```
This code uses deleteAllByFqnHashPrefix(...) as the sole cleanup for entity_relationship. Any relationship rows with NULL fromFQNHash/toFQNHash (e.g., inserted via existing addRelationship(...) overloads that don’t pass FQNs) will not be deleted, leaving orphaned relationships. Consider adding a fallback delete by descendant IDs (batch delete) or ensuring all relationship inserts always populate the hash columns.
```java
private void deleteEntityTables(
    Map<String, List<UUID>> descendantsByType, EntityInterface rootEntity) {
  for (Map.Entry<String, List<UUID>> entry : descendantsByType.entrySet()) {
    String entityType = entry.getKey();
    List<String> ids = entry.getValue().stream().map(UUID::toString).toList();
    try {
      Entity.getEntityRepository(entityType).getDao().deleteBatch(ids);
    } catch (Exception e) {
      LOG.warn(
          "Failed to delete {} entities of type {}: {}", ids.size(), entityType, e.getMessage());
    }
  }
```
deleteBatch(ids) is implemented as a single WHERE id IN (<ids>) statement. For large services this can exceed DB parameter limits / max packet size and become the new bottleneck or fail outright. Please chunk batch deletes (e.g., 500/1000 IDs per statement) similarly to RelationshipDAO.batchDeleteRelationships(...), for both entity tables and dependent tables that delete by IN (<ids>).
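A minimal sketch of the chunking suggested above. The `BatchChunker` class and its method names are hypothetical (not part of the PR); it only illustrates splitting a large id list into bounded sublists so each `WHERE id IN (...)` stays under parameter limits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper illustrating chunked batch deletes; names are not from the PR.
public class BatchChunker {

  /** Splits ids into sublists of at most batchSize elements each. */
  public static List<List<String>> chunk(List<String> ids, int batchSize) {
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < ids.size(); i += batchSize) {
      batches.add(ids.subList(i, Math.min(i + batchSize, ids.size())));
    }
    return batches;
  }

  /** Issues one bounded DELETE ... WHERE id IN (batch) per chunk instead of one huge IN-list. */
  public static void deleteInChunks(List<String> ids, Consumer<List<String>> deleteBatch) {
    for (List<String> batch : chunk(ids, 1000)) {
      deleteBatch.accept(batch); // e.g. dao.deleteBatch(batch) in the real code
    }
  }
}
```

The same helper could back every IN-list delete in `deleteDependencyTables` so all tables share one chunk size.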
```java
    Entity.getEntityRepository(entityType).deleteTimeSeriesByFqnPrefix(fqnHashPrefix);
  } catch (Exception e) {
    LOG.debug("deleteTimeSeriesByFqnPrefix failed for type {}: {}", entityType, e.getMessage());
  }
}
}
```
deleteTimeSeriesByFqnPrefix(...) is a no-op in EntityRepository, and there are currently no repository overrides, so this loop won’t delete any time-series rows (e.g., profiler/data quality/query cost/workflow time-series). Prefix hard-delete will leave orphaned time-series data. Implement overrides in the repositories that own time-series tables to call the appropriate EntityTimeSeriesDAO.deleteByFqnHashPrefix(...) (or equivalent).
Suggested change:

```diff
-      Entity.getEntityRepository(entityType).deleteTimeSeriesByFqnPrefix(fqnHashPrefix);
-    } catch (Exception e) {
-      LOG.debug("deleteTimeSeriesByFqnPrefix failed for type {}: {}", entityType, e.getMessage());
-    }
-  }
-}
+      EntityRepository<?> repository = Entity.getEntityRepository(entityType);
+      if (hasTimeSeriesPrefixDeletionOverride(repository)) {
+        repository.deleteTimeSeriesByFqnPrefix(fqnHashPrefix);
+      } else {
+        LOG.warn(
+            "Repository {} for entity type {} does not override deleteTimeSeriesByFqnPrefix; "
+                + "prefix hard delete may leave orphaned time-series data",
+            repository.getClass().getName(),
+            entityType);
+      }
+    } catch (Exception e) {
+      LOG.debug("deleteTimeSeriesByFqnPrefix failed for type {}: {}", entityType, e.getMessage());
+    }
+  }
+}
+
+private boolean hasTimeSeriesPrefixDeletionOverride(EntityRepository<?> repository) {
+  try {
+    return repository
+            .getClass()
+            .getMethod("deleteTimeSeriesByFqnPrefix", String.class)
+            .getDeclaringClass()
+        != EntityRepository.class;
+  } catch (NoSuchMethodException e) {
+    return false;
+  }
+}
```
```java
/**
 * Orchestrates fast prefix-based hard deletion of an entity and all its descendants.
 *
 * <p>Uses 7 phases to atomically delete an entire subtree by FQN prefix rather than walking the
 * entity tree one-by-one. This is orders of magnitude faster for large hierarchies and eliminates
 * the race condition where concurrent ingestion creates orphaned entities during slow cascade
 * deletion.
 */
```
The class-level Javadoc says this deletion is "atomically" deleting the subtree, but the implementation isn’t wrapped in a single transaction and also catches/logs exceptions while continuing. This can leave the DB in a partially-deleted state while still sending a "complete" notification. Either adjust the documentation/notification semantics to reflect best-effort behavior, or introduce a transactional boundary and fail-fast semantics where feasible.
```java
public void deleteByAboutBatch(List<UUID> entityIds) {
  if (nullOrEmpty(entityIds)) {
    return;
  }
  List<String> ids = entityIds.stream().map(UUID::toString).toList();
  List<String> threadIds = listOrEmpty(dao.feedDAO().findByEntityIds(ids));
  for (String threadId : threadIds) {
    try {
      deleteThreadInternal(UUID.fromString(threadId));
    } catch (Exception ex) {
      // Continue deletion
    }
  }
```
deleteByAboutBatch silently swallows exceptions per-thread. This makes it very hard to diagnose partial deletions during prefix hard-delete and can leave threads/relationships behind without any signal. At minimum log at DEBUG/WARN with the threadId/exception, or aggregate failures and report them back to the caller/job notification.
```java
/**
 * Hard-deletes the given root entity and all descendants whose FQN starts with the root's FQN.
 * Works at any hierarchy level (service, database, schema, etc.).
 */
public void deletePrefixHard(EntityInterface rootEntity, String deletedBy) {
  String rootFqn = rootEntity.getFullyQualifiedName();
  String fqnHashPrefix = FullyQualifiedName.buildHash(rootFqn);
  DeletionLock lock = acquireLock(rootEntity, deletedBy);
  try {
    Map<String, List<UUID>> descendantsByType = collectDescendantIds(fqnHashPrefix);
    List<String> allIds = buildAllIds(rootEntity.getId(), descendantsByType);
    deleteDependencyTables(rootFqn, fqnHashPrefix, allIds);
    runEntityHooks(rootEntity, deletedBy, fqnHashPrefix);
    deleteEntityTables(descendantsByType, rootEntity);
    emitDeleteEvent(rootEntity);
  } finally {
    releaseLock(lock, rootEntity);
  }
}
```
This introduces a new deletion code path (prefix hard-delete) with non-trivial DB side effects across many tables, but there are no unit/integration tests validating correctness (e.g., no orphan rows in entity_relationship/field_relationship/time-series, and behavior vs existing recursive hard delete). Please add focused tests for PrefixDeletionService (mock DAO + verification) and at least one integration test covering database service subtree deletion.
```java
private DeletionLock acquireLock(EntityInterface entity, String deletedBy) {
  if (lockManager == null) {
    return null;
  }
  try {
    return lockManager.acquireDeletionLock(entity, deletedBy, true);
  } catch (Exception e) {
    LOG.warn(
        "Could not acquire deletion lock for {}: {}",
        entity.getFullyQualifiedName(),
        e.getMessage());
    return null;
  }
}
```
🚨 Bug: Lock failure silently ignored — deletion proceeds unprotected
The acquireLock method (lines 84-97) catches all exceptions (including EntityLockedException thrown on contention) and returns null. The caller deletePrefixHard (line 71) never checks whether the lock was actually acquired — it proceeds with all 7 deletion phases regardless. This completely defeats the stated goal of eliminating race conditions with concurrent ingestion.
If the lock cannot be acquired (e.g., another deletion is in progress, or the lock manager is unavailable), the deletion runs without protection, allowing ingestion to create new entities in the subtree while it's being deleted — exactly the orphan scenario this PR aims to fix.
Suggested fix:
```java
private DeletionLock acquireLock(EntityInterface entity, String deletedBy) {
  if (lockManager == null) {
    throw new IllegalStateException("Lock manager not initialized");
  }
  // Let EntityLockedException propagate to the caller
  return lockManager.acquireDeletionLock(entity, deletedBy, true);
}
```
🔴 Playwright Results — 1 failure(s), 30 flaky
✅ 3593 passed · ❌ 1 failed · 🟡 30 flaky · ⏭️ 207 skipped
Genuine Failures (failed on all attempts) ❌
The Java checkstyle failed. Please run … You can install the pre-commit hooks with …
Pull request overview
Copilot reviewed 22 out of 22 changed files in this pull request and generated 11 comments.
Comments suppressed due to low confidence (1)
bootstrap/sql/migrations/native/1.13.0/mysql/schemaChanges.sql:142
- These schema changes are being appended to the existing 1.13.0 migration. If a database was already upgraded past 1.13.0, the migration framework will not re-run 1.13.0, so the new columns/indexes will never be applied.
To ensure upgrades work correctly, the DDL should be added to the current/new migration version (e.g. 1.14.1) rather than modifying an older migration script.
```sql
-- Add FQN hash columns to entity_relationship to enable fast prefix-based bulk deletion.
-- This allows deleting all relationships for an entire entity subtree in a single indexed query
-- instead of walking the tree entity-by-entity.
ALTER TABLE entity_relationship
ADD COLUMN IF NOT EXISTS fromFQNHash VARCHAR(768) DEFAULT NULL,
ADD COLUMN IF NOT EXISTS toFQNHash VARCHAR(768) DEFAULT NULL;
CREATE INDEX IF NOT EXISTS idx_er_from_fqn_hash ON entity_relationship (fromFQNHash(768));
CREATE INDEX IF NOT EXISTS idx_er_to_fqn_hash ON entity_relationship (toFQNHash(768));
```
```java
private void deleteDependencyTables(String rootFqn, String fqnHashPrefix, List<String> allIds) {
  CollectionDAO dao = Entity.getCollectionDAO();
  dao.relationshipDAO().deleteAllByFqnHashPrefix(fqnHashPrefix);
  dao.fieldRelationshipDAO().deleteAllByPrefix(rootFqn);
  dao.entityExtensionDAO().deleteAllBatch(allIds);
  dao.tagUsageDAO().deleteTagLabelsByTargetPrefix(rootFqn);
  dao.usageDAO().deleteBatch(allIds);
  List<UUID> allUuids = allIds.stream().map(UUID::fromString).toList();
  Entity.getFeedRepository().deleteByAboutBatch(allUuids);
}
```
This deletion path builds allIds for the entire subtree and then issues multiple ... WHERE id IN (<ids>) calls (entity_extension, entity_usage, and thread lookup via deleteByAboutBatch). For large services (the stated 1M+ tables use case), these unchunked IN-lists can exceed database parameter limits and/or cause very large allocations.
Recommend chunking allIds consistently (similar to batchDeleteRelationships), or adding DAO methods that delete by FQN hash prefix directly to avoid materializing/passing millions of IDs through the application.
```java
@Transaction
public void deleteByAboutBatch(List<UUID> entityIds) {
  if (nullOrEmpty(entityIds)) {
    return;
  }
  List<String> ids = entityIds.stream().map(UUID::toString).toList();
  List<String> threadIds = listOrEmpty(dao.feedDAO().findByEntityIds(ids));
  for (String threadId : threadIds) {
    try {
      deleteThreadInternal(UUID.fromString(threadId));
    } catch (Exception ex) {
      // Continue deletion
    }
  }
}
```
deleteByAboutBatch still performs per-thread deletes in a loop and first fetches thread IDs with a single IN (<entityIds>) query. With prefix deletion, entityIds can be extremely large, which can exceed DB bind limits and makes this method a bottleneck.
Consider chunking entityIds for findByEntityIds, and then using the existing deleteThreadsInBatch(...) to delete threads/relationships in bulk instead of calling deleteThreadInternal(...) repeatedly.
```java
private static void backfillForEntityType(Handle handle, String entityType) {
  EntityRepository<?> repo = Entity.getEntityRepository(entityType);
  String hashCol = repo.getDao().getNameHashColumn();
  if (!FQNHASH_COL.equals(hashCol) && !NAMEHASH_COL.equals(hashCol)) {
    return;
  }
  String tableName = repo.getDao().getTableName();
  int offset = 0;
  int processed;
  do {
    processed = processEntityBatch(handle, tableName, hashCol, offset);
    offset += processed;
  } while (processed == BATCH_SIZE);
  LOG.info("Backfilled FQN hashes for entity type {}: {} rows", entityType, offset);
}

private static int processEntityBatch(Handle handle, String tableName, String hashCol, int offset) {
  String sql =
      "SELECT id, "
          + hashCol
          + " FROM "
          + tableName
          + " WHERE "
          + hashCol
          + " IS NOT NULL LIMIT :limit OFFSET :offset";
  List<Map<String, Object>> rows =
      handle.createQuery(sql).bind("limit", BATCH_SIZE).bind("offset", offset).mapToMap().list();
  for (Map<String, Object> row : rows) {
    backfillRow(handle, row, hashCol);
  }
  return rows.size();
```
This migration utility uses LIMIT/OFFSET without an ORDER BY. That makes pagination non-deterministic (rows can be skipped/duplicated) and OFFSET becomes very slow on large tables.
For a backfill intended to handle very large installations, prefer a deterministic keyset pagination (e.g., WHERE id > :lastId ORDER BY id LIMIT :limit) and consider restricting to entities that actually have fqnHash (since prefix deletion is based on FQN hashes).
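The keyset pagination recommended above can be sketched as follows. The real migration would run `WHERE id > :lastId ORDER BY id LIMIT :limit` through Jdbi; here the query is simulated over an in-memory sorted id list so the cursor logic is visible on its own. Class and method names are illustrative, not from the PR.

```java
import java.util.List;

// Sketch of keyset (cursor) pagination: the cursor is the last id seen, so
// pages are deterministic and no OFFSET scan is needed. Names are hypothetical.
public class KeysetBackfill {

  /** Returns the next page: ids strictly greater than lastId, in order, up to limit. */
  static List<String> nextPage(List<String> sortedIds, String lastId, int limit) {
    return sortedIds.stream()
        .filter(id -> lastId == null || id.compareTo(lastId) > 0)
        .limit(limit)
        .toList();
  }

  /** Walks all pages; rows never shift under the cursor the way they can with OFFSET. */
  static int backfillAll(List<String> sortedIds, int limit) {
    int processed = 0;
    String lastId = null;
    List<String> page;
    while (!(page = nextPage(sortedIds, lastId, limit)).isEmpty()) {
      processed += page.size(); // backfillRow(...) would run here per row
      lastId = page.get(page.size() - 1);
    }
    return processed;
  }
}
```

In SQL terms each page is `SELECT id, <hashCol> FROM <table> WHERE <hashCol> IS NOT NULL AND id > :lastId ORDER BY id LIMIT :limit`, with `:lastId` taken from the previous page's final row.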
```sql
-- Add FQN hash columns to entity_relationship to enable fast prefix-based bulk deletion.
-- This allows deleting all relationships for an entire entity subtree in a single indexed query
-- instead of walking the tree entity-by-entity.
ALTER TABLE entity_relationship
ADD COLUMN IF NOT EXISTS fromFQNHash VARCHAR(768) DEFAULT NULL,
ADD COLUMN IF NOT EXISTS toFQNHash VARCHAR(768) DEFAULT NULL;

CREATE INDEX IF NOT EXISTS idx_er_from_fqn_hash ON entity_relationship (fromFQNHash);
CREATE INDEX IF NOT EXISTS idx_er_to_fqn_hash ON entity_relationship (toFQNHash);
```
These schema changes are being appended to the existing 1.13.0 migration. If a database was already upgraded past 1.13.0, the migration framework will not re-run 1.13.0, so the new columns/indexes will never be applied.
To ensure upgrades work correctly, the DDL should be added to the current/new migration version (e.g. 1.14.1) rather than modifying an older migration script.
Suggested change:

```diff
--- Add FQN hash columns to entity_relationship to enable fast prefix-based bulk deletion.
--- This allows deleting all relationships for an entire entity subtree in a single indexed query
--- instead of walking the tree entity-by-entity.
-ALTER TABLE entity_relationship
-ADD COLUMN IF NOT EXISTS fromFQNHash VARCHAR(768) DEFAULT NULL,
-ADD COLUMN IF NOT EXISTS toFQNHash VARCHAR(768) DEFAULT NULL;
-CREATE INDEX IF NOT EXISTS idx_er_from_fqn_hash ON entity_relationship (fromFQNHash);
-CREATE INDEX IF NOT EXISTS idx_er_to_fqn_hash ON entity_relationship (toFQNHash);
```
```diff
@@ -0,0 +1 @@
+-- No schema changes in this version.
```
The PR description indicates the new entity_relationship columns/indexes are introduced in 1.14.1, but this 1.14.1 migration is marked as having no schema changes. As written, upgrading from 1.14.0 → 1.14.1 would not apply the required DDL.
Move the ALTER TABLE ... ADD COLUMN fromFQNHash/toFQNHash + index creation into the 1.14.1 schemaChanges files (or whichever version is intended to ship this feature).
```java
public class Migration extends MigrationProcessImpl {

  public Migration(MigrationFile migrationFile) {
    super(migrationFile);
  }
```
The backfill is currently wired into the v1130 migration path, but v1141 (1.14.1) migration has no runDataMigration() and therefore won't backfill fromFQNHash/toFQNHash when upgrading to 1.14.1. Combined with the DDL living in older native migrations, this risks leaving entity_relationship hashes unpopulated on many upgrade paths.
Consider moving the backfill invocation into the v1141 migration (and keeping older migrations immutable).
Suggested change:

```diff
 public class Migration extends MigrationProcessImpl {
+  private final MigrationFile migrationFile;
+
   public Migration(MigrationFile migrationFile) {
     super(migrationFile);
+    this.migrationFile = migrationFile;
   }
+
+  @Override
+  public void runDataMigration() {
+    new org.openmetadata.service.migration.postgres.v1130.Migration(migrationFile)
+        .runDataMigration();
+  }
```
```java
public Migration(MigrationFile migrationFile) {
  super(migrationFile);
}
```
The backfill is currently wired into the v1130 migration path, but v1141 (1.14.1) migration has no runDataMigration() and therefore won't backfill fromFQNHash/toFQNHash when upgrading to 1.14.1. Combined with the DDL living in older native migrations, this risks leaving entity_relationship hashes unpopulated on many upgrade paths.
Consider moving the backfill invocation into the v1141 migration (and keeping older migrations immutable).
Suggested change:

```diff
   }
 }
+
+  @Override
+  public void runDataMigration() {
+    super.runDataMigration();
+    executeSQL(
+        """
+        UPDATE entity_relationship
+        SET
+          fromFQNHash = MD5(fromFQN),
+          toFQNHash = MD5(toFQN)
+        WHERE
+          (fromFQNHash IS NULL OR fromFQNHash = '')
+          OR (toFQNHash IS NULL OR toFQNHash = '')
+        """);
+  }
```
```java
if (lockManager == null) {
  return null;
}
try {
  return lockManager.acquireDeletionLock(entity, deletedBy, true);
} catch (Exception e) {
  LOG.warn(
      "Could not acquire deletion lock for {}: {}",
      entity.getFullyQualifiedName(),
      e.getMessage());
  return null;
```
If the deletion lock cannot be acquired, the code currently logs and proceeds with the bulk delete anyway. That re-opens the ingestion race condition this feature is intended to eliminate (and could still create orphans while the delete is running).
Consider failing the operation when the deletion lock cannot be acquired (or retrying/acquiring in blocking mode), so callers get a clear failure instead of a best-effort delete.
Suggested change:

```diff
-if (lockManager == null) {
-  return null;
-}
-try {
-  return lockManager.acquireDeletionLock(entity, deletedBy, true);
-} catch (Exception e) {
-  LOG.warn(
-      "Could not acquire deletion lock for {}: {}",
-      entity.getFullyQualifiedName(),
-      e.getMessage());
-  return null;
+String entityFqn = entity.getFullyQualifiedName();
+if (lockManager == null) {
+  throw new IllegalStateException(
+      String.format("Cannot hard delete %s because the deletion lock manager is not available", entityFqn));
+}
+try {
+  DeletionLock deletionLock = lockManager.acquireDeletionLock(entity, deletedBy, true);
+  if (deletionLock == null) {
+    throw new IllegalStateException(
+        String.format("Cannot hard delete %s because the deletion lock could not be acquired", entityFqn));
+  }
+  return deletionLock;
+} catch (Exception e) {
+  throw new IllegalStateException(
+      String.format("Cannot hard delete %s because the deletion lock could not be acquired", entityFqn), e);
```
```java
try {
  MigrationUtil.backfillRelationshipFqnHashes(handle);
} catch (Exception e) {
  LOG.error(
      "Failed to backfill FQN hashes in entity_relationship during v1130 migration. "
          + "Fast prefix deletion may not work correctly for pre-existing relationships.",
      e);
}
```
Running the relationship FQN-hash backfill in the v1130 migration means it will not run for environments that are already beyond 1.13.0 (the common case for a 1.14.1 upgrade). This can leave fromFQNHash/toFQNHash NULL and break prefix deletion.
To make upgrades reliable, wire the backfill into the migration version that actually introduces the columns (e.g. v1141) rather than v1130.
Suggested change:

```diff
-try {
-  MigrationUtil.backfillRelationshipFqnHashes(handle);
-} catch (Exception e) {
-  LOG.error(
-      "Failed to backfill FQN hashes in entity_relationship during v1130 migration. "
-          + "Fast prefix deletion may not work correctly for pre-existing relationships.",
-      e);
-}
```
```java
try {
  MigrationUtil.backfillRelationshipFqnHashes(handle);
} catch (Exception e) {
  LOG.error(
      "Failed to backfill FQN hashes in entity_relationship during v1130 migration. "
          + "Fast prefix deletion may not work correctly for pre-existing relationships.",
      e);
}
```
Running the relationship FQN-hash backfill in the v1130 migration means it will not run for environments that are already beyond 1.13.0 (the common case for a 1.14.1 upgrade). This can leave fromFQNHash/toFQNHash NULL and break prefix deletion.
To make upgrades reliable, wire the backfill into the migration version that actually introduces the columns (e.g. v1141) rather than v1130.
Suggested change:

```diff
-try {
-  MigrationUtil.backfillRelationshipFqnHashes(handle);
-} catch (Exception e) {
-  LOG.error(
-      "Failed to backfill FQN hashes in entity_relationship during v1130 migration. "
-          + "Fast prefix deletion may not work correctly for pre-existing relationships.",
-      e);
-}
```
Adds 11 unit tests covering the correlated-subquery backfill logic introduced in MigrationUtil.backfillRelationshipFqnHashes():

- MigrationUtilTest: 7 tests verifying empty entity list no-ops, fqnHash/nameHash entity updates, unrecognised hash column skip, time-series entity exception swallowing, multi-entity failure isolation, and correct CAST SQL generation
- mysql/v1130/MigrationTest: 2 tests verifying runDataMigration delegates to backfillRelationshipFqnHashes and swallows errors
- postgres/v1130/MigrationTest: same 2 tests for Postgres variant

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PrefixDeletionIT - three integration tests verifying the
DELETE /v1/services/databaseServices/prefix/{id} endpoint:
- single service with 5 tables is fully erased (service, database,
schema, all tables return 404 after deletion)
- service with 2×2 databases/schemas and 12 tables all erased
- empty service (no databases) is deleted cleanly
PrefixDeletionBenchmarkIT - disabled manual benchmark tagged
@benchmark that creates equal hierarchies and times:
- old: DELETE /{id}?hardDelete=true&recursive=true
- new: DELETE /prefix/{id}
Logs speedup factor; configurable table count via -Dtest.scenario=N.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously only DatabaseService had DELETE /prefix/{id}. Added the same
endpoint to DatabaseResource and DatabaseSchemaResource so callers can
bulk-hard-delete at any level of the service hierarchy:
DELETE /v1/services/databaseServices/prefix/{id} (existing)
DELETE /v1/databases/prefix/{id} (new)
DELETE /v1/databaseSchemas/prefix/{id} (new)
All three delegate to EntityResource.deletePrefixHardById() which runs
PrefixDeletionService.deletePrefixHard() — same FQN hash prefix logic,
same async execution, same auth checks.
Updated PrefixDeletionIT with two new integration tests that verify the
sibling-isolation guarantee: deleting a database leaves sibling databases
intact, and deleting a schema leaves sibling schemas and their tables intact.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously DELETE /prefix/{id} was only wired in DatabaseServiceResource
(and manually added to Database/DatabaseSchema). Adding it to each of the
60+ entity resource subclasses individually would be maintenance-heavy and
error-prone.
JAX-RS inherits endpoint annotations from parent classes when the
subclass method has no annotations of its own, so adding @DELETE
@Path("/prefix/{id}") once to EntityResource.deletePrefixHardById()
exposes the endpoint on every entity resource automatically:
DELETE /v1/databases/prefix/{id}
DELETE /v1/databaseSchemas/prefix/{id}
DELETE /v1/tables/prefix/{id}
DELETE /v1/services/databaseServices/prefix/{id}
DELETE /v1/glossaries/prefix/{id}
... (every EntityResource subclass)
Removed the now-redundant overrides from DatabaseServiceResource,
DatabaseResource, and DatabaseSchemaResource.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The prefix delete endpoint returns 202 immediately and processes the deletion in a background thread. Asserting right after the response would always race against the job and produce false failures.

Replace assertEntityGone() (which asserted synchronously) with awaitGone() backed by Awaitility — polls every 1s for up to 30s until the GET throws, matching the async job lifecycle.

Also separate the mixed descendant ID list into typed lists (dbIds, schemaIds, tableIds) so each is fetched via the correct SDK client.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
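The poll-until-404 shape this commit describes, stripped of the Awaitility dependency, looks roughly like the sketch below; the real test uses Awaitility (await().atMost(...).pollInterval(...)), and the `AwaitGone` class and its method here are hypothetical stand-ins.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Plain-Java sketch of awaitGone(): keep polling until the check passes
// (e.g. "GET on the deleted entity threw 404") or the deadline expires.
public class AwaitGone {

  /** Polls check every interval until it returns true or timeout elapses. */
  static boolean await(Supplier<Boolean> check, Duration timeout, Duration interval)
      throws InterruptedException {
    Instant deadline = Instant.now().plus(timeout);
    while (Instant.now().isBefore(deadline)) {
      if (check.get()) {
        return true; // entity is gone; async deletion finished
      }
      Thread.sleep(interval.toMillis());
    }
    return false; // deletion did not finish within the timeout
  }
}
```

The key point is that the assertion is time-bounded and repeated, so a 202 handoff followed by a slow background job no longer produces a false failure.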
…rvice is not null
LockManagerInitializer.initialize() was never called, leaving
PrefixDeletionService.getInstance() returning null. Any call to
DELETE /prefix/{id} would immediately NullPointerException in
EntityResource.deletePrefixHardById().
Call LockManagerInitializer.initialize() in OpenMetadataApplication.run()
after Entity.initializeRepositories() so that both CollectionDAO and all
entity repositories are available when the lock manager and
PrefixDeletionService are wired up.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nflicts
The old code called DatabaseTestFactory.create(ns, ...) and
DatabaseSchemaTestFactory.create(ns, ...) inside loops — both always
produced ns.prefix("db") / ns.prefix("schema"), so the second iteration
hit a 409 ConflictException: Entity already exists.
Fix:
- Add DatabaseTestFactory.createWithName(ns, serviceFqn, baseName) and
DatabaseSchemaTestFactory.createWithName(ns, databaseFqn, baseName) so
callers can provide a distinct base name that gets namespaced.
- Rewrite every loop in PrefixDeletionIT and PrefixDeletionBenchmarkIT
to pass a loop-index-qualified base name ("db0", "db1", "sc00", ...).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…on time
- Default topology changed to 5 × 5 × 400 = 10,000 tables per service (~10,031 total entities), all configurable via -Dtest.databases/schemas/tables
- timeNewDelete now polls with Awaitility until the service is actually gone so the reported elapsed time covers real deletion, not just the 202 handoff
- Log messages improved to show total entity count and seeding time estimate
- Removed unused HashMap/Map imports, replaced with Duration/UUID/Awaitility

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace sequential triple-nested loop with a 3-phase parallel approach:
1. All databases created concurrently
2. All schemas created concurrently (after databases resolve)
3. All tables created concurrently (after schemas resolve)

Default 32 threads (tunable via -Dtest.seedThreads). At ~50ms/REST call and 32 threads, 10k tables seed in ~20s instead of ~8 min sequentially. Also logs per-entity average seed time for diagnostics.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
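The phased approach can be sketched with plain java.util.concurrent. This is a toy version with a tiny 3-database × 2-schema topology (class name, sizes, and the entity "creation" are hypothetical stand-ins for the REST calls); the key point is that invokeAll blocks until every task in a phase resolves, so children are only created after all parents exist:

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PhasedSeedSketch {
    // Seed a toy hierarchy in two parallel phases (the PR uses three:
    // databases, then schemas, then tables).
    static Set<String> seed(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Set<String> created = ConcurrentHashMap.newKeySet();
        try {
            // Phase 1: all databases concurrently.
            runPhase(pool, IntStream.range(0, 3)
                .mapToObj(d -> (Callable<Void>) () -> { created.add("db" + d); return null; })
                .collect(Collectors.toList()));
            // Phase 2: all schemas concurrently, only after every database resolved.
            runPhase(pool, created.stream().flatMap(db ->
                IntStream.range(0, 2).mapToObj(s -> (Callable<Void>) () -> {
                    created.add(db + ".sc" + s); return null; }))
                .collect(Collectors.toList()));
        } finally {
            pool.shutdown();
        }
        return created;
    }

    static void runPhase(ExecutorService pool, List<Callable<Void>> tasks) {
        try {
            for (Future<Void> f : pool.invokeAll(tasks)) f.get(); // surface any task failure
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("seeded " + seed(8).size() + " entities"); // 3 dbs + 6 schemas = 9
    }
}
```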
…greSQL The rest of the codebase uses VARCHAR(36) for UUID columns in PostgreSQL. BindUUID binds UUID values as strings (via UUID.toString()), which works with VARCHAR(36) but fails with native UUID columns because PostgreSQL rejects the implicit text = uuid comparison: Hint: No operator matches the given name and argument types. Changing id and entityId to VARCHAR(36) makes the schema consistent with every other entity table and removes the type cast error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… on PostgreSQL" This reverts commit 87aa7f3.
…greSQL The entity_deletion_lock table was created in 1.9.0 with native UUID columns for id and entityId. The rest of the codebase uses VARCHAR(36) for all UUID columns so that BindUUID (which binds via UUID.toString()) can compare them without an explicit type cast. Without this, PostgreSQL rejects the WHERE clause in DeletionLockDAO: Hint: No operator matches the given name and argument types. You might need to add explicit type casts. ALTER TABLE migrates existing rows in place — UUID::VARCHAR produces the standard 8-4-4-4-12 string representation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
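The VARCHAR(36) choice works because java.util.UUID.toString() always emits the canonical 8-4-4-4-12 form: 32 hex digits plus 4 hyphens, exactly 36 characters. A quick demonstration (class name hypothetical):

```java
import java.util.UUID;

public class UuidStringDemo {
    // What a string-binding layer like BindUUID sends to the JDBC driver.
    static String bindString(UUID id) {
        return id.toString();
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        String s = bindString(id);
        // 8-4-4-4-12 hex groups => 32 hex chars + 4 hyphens = 36 characters,
        // which is why VARCHAR(36) columns compare cleanly against it.
        System.out.println(s.length()); // 36
        System.out.println(s);          // 123e4567-e89b-12d3-a456-426614174000
    }
}
```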
PostgreSQL limits prepared statements to 65,535 parameters. With 10k+ entities each having multiple extension rows, deleteAllBatch was generating statements with 100k+ parameters and failing with: PSQLException: PreparedStatement can have at most 65,535 parameters. Wrap both EntityExtensionDAO.deleteAllBatch and UsageDAO.deleteBatch with default interface methods that chunk into 50,000-ID slices before delegating to the raw @SqlUpdate methods. 50k is safely below the limit while keeping the number of round trips minimal. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
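The chunking wrapper described above can be sketched as a plain default-method-style helper: slice the ID list and delegate each slice to the raw single-statement delete. Names are hypothetical; the raw delete is modeled as a function so the sketch is self-contained:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BatchChunkSketch {
    static final int CHUNK = 50_000; // safely under PostgreSQL's 65,535-parameter cap

    // Slice a large ID list into CHUNK-sized views and delegate each slice to
    // the raw delete (in the PR, the @SqlUpdate method); returns total deleted.
    static int deleteAllBatch(List<String> ids, ToIntFunction<List<String>> rawDelete) {
        int deleted = 0;
        for (int i = 0; i < ids.size(); i += CHUNK) {
            deleted += rawDelete.applyAsInt(ids.subList(i, Math.min(i + CHUNK, ids.size())));
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<String> ids = IntStream.range(0, 120_000)
            .mapToObj(Integer::toString).collect(Collectors.toList());
        List<Integer> sliceSizes = new ArrayList<>();
        int total = deleteAllBatch(ids, slice -> { sliceSizes.add(slice.size()); return slice.size(); });
        System.out.println(sliceSizes); // [50000, 50000, 20000]
        System.out.println(total);      // 120000
    }
}
```

50k per slice keeps each statement well under the limit while minimizing round trips, matching the trade-off the commit describes.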
The codebase pattern is: EntityResource provides the implementation with no JAX-RS annotations; each subclass explicitly declares the endpoint with @DELETE/@Path/@Operation so it appears in the Swagger UI and is routed correctly by JAX-RS.
- Strip @DELETE/@Path/@Operation from EntityResource.deletePrefixHardById so it becomes a plain implementation method (same as delete/deleteByIdAsync)
- Add explicit @DELETE /prefix/{id} override in DatabaseServiceResource that calls the base implementation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…asses Add the hard-delete-by-FQN-prefix endpoint to all 69 EntityResource subclasses following the established pattern: each resource class explicitly declares @DELETE @Path("/prefix/{id}") with entity-specific @Operation metadata and delegates to EntityResource.deletePrefixHardById.

Resources covered:
- 13 service resources (database, storage, messaging, pipeline, ml, metadata, search, security, mcp, llm, drive, dashboard, api)
- Database hierarchy (database, schema, table, stored procedure)
- Data entities (topic, container, search index, pipeline, ml model, dashboard, chart, dashboard data model, api collection, api endpoint)
- Team/user/role (team, user, role, persona, policy, bot)
- Classification/taxonomy (classification, tag, glossary, glossary term, domain, data product)
- Tests/quality (test suite, test case, test definition, test connection definition, data contract, kpi, metric, query)
- Apps/workflows (app, marketplace, workflow, workflow definition, ingestion pipeline, event subscription, notification template)
- AI (ai application, ai governance policy, llm model, mcp server, prompt template)
- Drives (directory, file, spreadsheet, worksheet)
- Misc (type, report, doc store, data insight chart, system chart, web analytic event, learning resource)

Also adds missing imports (DELETE, PathParam, Parameter, ApiResponse, Schema) to the 5 files that lacked them.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DataInsightSystemChartResource was missing java.util.UUID and TestConnectionDefinitionResource was missing jakarta.ws.rs.core.Response, both required by the new deletePrefixHardById endpoint. Spotted during mvn clean package compilation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The 70 subclass overrides were calling deletePrefixHardById(uriInfo,
securityContext, id) which resolved to the overriding method itself
(infinite recursion), not the base class implementation. This caused:
java.lang.StackOverflowError
at DatabaseSchemaResource.deletePrefixHardById(...)
at DatabaseSchemaResource.deletePrefixHardById(...)
...
Changed all 70 call sites to super.deletePrefixHardById(...) so they
delegate to EntityResource's implementation as intended.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r limit With 72k+ entity IDs from a large service hierarchy, findByEntityIds was generating a prepared statement with 72,157 parameters, exceeding PostgreSQL's 65,535 limit: PSQLException: PreparedStatement can have at most 65,535 parameters. Same fix as EntityExtensionDAO and UsageDAO: rename the raw @SqlQuery to findByEntityIdsChunk and wrap it with a default method that collects results across 50,000-ID slices. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After bulk-deleting entities from the DB, call SearchRepository.deleteByEntityTypeFqnPrefix for the root entity type and every descendant type found during the scan. This removes all matching documents from each type's ES index using a single FQN prefix wildcard query per index, preventing deleted entities from continuing to appear in search results. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…letion Previously only the root entity's preDelete hook ran. This adds two new extension points to EntityRepository:
- preDeleteByFqnHashPrefix(fqnHashPrefix, deletedBy): called before DB deletion for each entity type; entities are still in the DB so external systems can be cleaned up. IngestionPipelineRepository overrides this to call pipelineServiceClient.deletePipeline() for every matching Airflow pipeline and removes pipeline status time-series records.
- postDeleteByFqnHashPrefix(fqnHashPrefix): called after DB deletion for post-deletion metadata work that does not require loading deleted entities.

PrefixDeletionService.runEntityHooks now invokes preDeleteByFqnHashPrefix for every entity type (alongside the existing deleteTimeSeriesByFqnPrefix loop), and a new runPostEntityHooks phase fires postDeleteByFqnHashPrefix for every type after the entity tables are cleared.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds two new extension points to EntityRepository:
- preDeleteByFqnHashPrefix(fqnHashPrefix, deletedBy): runs before DB deletion; entities still exist so external systems can be cleaned up
- postDeleteByFqnHashPrefix(fqnHashPrefix): runs after DB deletion for metadata-only post-deletion work

Concrete overrides added:
- IngestionPipelineRepository: calls Airflow pipelineServiceClient.deletePipeline() and removes pipeline status time-series records for each matching pipeline
- DataContractRepository: deletes the linked TestSuite for each matching contract that has quality expectations (prevents orphaned TestSuite entities)
- WorkflowDefinitionRepository: calls WorkflowHandler.deleteWorkflowDefinition() for each matching workflow definition
- WorkflowRepository: calls SecretsManagerFactory.deleteSecretsFromWorkflow() for each matching workflow (prevents orphaned secrets)

Also adds dao.entityExtensionTimeSeriesDao().deleteByFqnHashPrefix() to deleteDependencyTables so all entity_extension_time_series rows (pipeline status, data contract results, etc.) are bulk-cleaned in one pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
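The hook structure described here is a template-method pattern: the base repository ships no-op hooks and concrete repositories override only what they need. A minimal self-contained sketch (class names mirror the PR's extension points, but the bodies are hypothetical stand-ins for the real cleanup calls):

```java
public class HookSketch {
    // Base repository with no-op bulk-deletion hooks.
    static class EntityRepo {
        void preDeleteByFqnHashPrefix(String prefix, String deletedBy) {}
        void postDeleteByFqnHashPrefix(String prefix) {}
    }

    // A concrete repository overrides only the hook it needs.
    static class IngestionPipelineRepo extends EntityRepo {
        int externalCleanups = 0;

        @Override
        void preDeleteByFqnHashPrefix(String prefix, String deletedBy) {
            // Entities still exist at this point, so external cleanup (e.g.
            // deleting the Airflow DAG) can still look them up.
            externalCleanups++;
            System.out.println("cleaning external pipelines under " + prefix);
        }
    }

    public static void main(String[] args) {
        EntityRepo[] repos = { new EntityRepo(), new IngestionPipelineRepo() };
        // Orchestrator phase 1: pre-delete hooks for every entity type.
        for (EntityRepo r : repos) r.preDeleteByFqnHashPrefix("abc123", "admin");
        // ... bulk DB deletion would happen here ...
        // Orchestrator phase 2: post-delete hooks for every entity type.
        for (EntityRepo r : repos) r.postDeleteByFqnHashPrefix("abc123");
    }
}
```

Because the orchestrator iterates every repository and calls the same two methods, adding cleanup for a new entity type only requires overriding a hook, not touching the deletion service.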
…tion MySQL 8.x does not support ADD COLUMN IF NOT EXISTS (that is a MariaDB extension). Replace with the SET @SQL / PREPARE / EXECUTE / DEALLOCATE pattern already used elsewhere in this migration file to conditionally add fromFQNHash and toFQNHash columns to entity_relationship. CREATE INDEX IF NOT EXISTS is supported by MySQL 8.0+ so those lines are left unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
```java
default List<UUID> findIdsByFqnHashPrefix(String fqnHashPrefix) {
  if (!"fqnHash".equals(getNameHashColumn())) {
    return List.of();
  }
  return findIdsByFqnHashPrefixInternal(getTableName(), "fqnHash", fqnHashPrefix + ".%").stream()
      .map(UUID::fromString)
      .toList();
}
```
findIdsByFqnHashPrefix currently returns IDs only when getNameHashColumn() equals "fqnHash", but many entity tables use nameHash while still storing the FQN hash via @BindFQN (i.e., FullyQualifiedName.buildHash). This means prefix deletion will miss descendants for those entity types (e.g., teams/sub-teams) even though the hashes are prefix-searchable. Consider using getNameHashColumn() as the column to LIKE against (or allow both fqnHash and nameHash).
```java
// Set it on EntityRepository
EntityRepository.setLockManager(lockManager);

// Initialize PrefixDeletionService with the same lock manager
PrefixDeletionService.initialize(lockManager);

initialized = true;
LOG.info("Hierarchical lock manager initialized successfully");
```
PrefixDeletionService.initialize(lockManager) is only invoked on the successful lock-manager path. If initialization fails, PrefixDeletionService.getInstance() can remain null and prefix-delete endpoints will NPE. To keep prefix deletion working even without locking, initialize PrefixDeletionService in the failure/skip paths as well (e.g., with a null lock manager) or make getInstance() lazily create a no-lock instance.
```java
public void deleteByEntityTypeFqnPrefix(String entityType, String fqnPrefix) {
  if (!checkIfIndexingIsSupported(entityType)) {
    return;
  }
  if (!getSearchClient().isClientAvailable()) {
    return;
  }
```
deleteByEntityTypeFqnPrefix returns early when the search client is unavailable, which can leave stale documents with no retry path. Other delete methods enqueue to SearchIndexRetryQueue when ES/OS is down; consider doing the same here (at least for the root delete request) so prefix deletions eventually clean up search indexes.
```java
operationId = "deleteApiServicePrefixHard",
summary = "Hard-delete a API service and all descendants by FQN prefix",
description =
    "Bulk hard-delete this API service and all descendants whose FQN starts with this "
        + "entity's FQN. Significantly faster than recursive delete for large hierarchies.",
```
Grammar: the OpenAPI summary/description uses "Hard-delete a API service"; since "API" starts with a vowel sound, this should be "an API service" (same for the description string).
```java
operationId = "deleteAppPrefixHard",
summary = "Hard-delete a app and all descendants by FQN prefix",
description =
    "Bulk hard-delete this app and all descendants whose FQN starts with this "
        + "entity's FQN. Significantly faster than recursive delete for large hierarchies.",
```
Grammar: the OpenAPI summary/description uses "Hard-delete a app"; this should be "an app" (and likewise in the description) for correct English in generated API docs.
```java
operationId = "deleteIngestionPipelinePrefixHard",
summary = "Hard-delete a ingestion pipeline and all descendants by FQN prefix",
description =
    "Bulk hard-delete this ingestion pipeline and all descendants whose FQN starts with this "
        + "entity's FQN. Significantly faster than recursive delete for large hierarchies.",
```
Grammar: the OpenAPI summary/description uses "Hard-delete a ingestion pipeline"; this should be "an ingestion pipeline" (and likewise in the description) for correct English in generated API docs.
```java
operationId = "deleteEventSubscriptionPrefixHard",
summary = "Hard-delete a event subscription and all descendants by FQN prefix",
description =
    "Bulk hard-delete this event subscription and all descendants whose FQN starts with this "
        + "entity's FQN. Significantly faster than recursive delete for large hierarchies.",
```
Grammar: the OpenAPI summary/description uses "Hard-delete a event subscription"; this should be "an event subscription" (and likewise in the description) for correct English in generated API docs.
The internal migration runner handles duplicate index checks. Plain CREATE INDEX is simpler and works across all MySQL 8.x versions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove the information_schema conditional workarounds. The internal migration runner handles idempotency, so plain DDL is sufficient. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 104 out of 104 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (2)
openmetadata-service/src/main/java/org/openmetadata/service/initialization/LockManagerInitializer.java:57
- If CollectionDAO is unavailable (or initialization throws), PrefixDeletionService is never initialized, but the new /prefix/{id} endpoints still exist and will NPE when invoked. Consider initializing PrefixDeletionService even when the lock manager can't be constructed (e.g., with a null lock manager), or exposing an availability check so the REST layer can return a clear error instead of accepting the job and failing later.
```java
try {
  LOG.info("Initializing hierarchical lock manager for entity deletion optimization");
  // Get the collection DAO
  var collectionDAO = Entity.getCollectionDAO();
  if (collectionDAO == null) {
    LOG.warn("CollectionDAO not available, skipping lock manager initialization");
    return;
  }
  // Initialize the lock manager
  HierarchicalLockManager lockManager =
      new HierarchicalLockManager(collectionDAO.deletionLockDAO());
  // Set it on EntityRepository
  EntityRepository.setLockManager(lockManager);
  // Initialize PrefixDeletionService with the same lock manager
  PrefixDeletionService.initialize(lockManager);
  initialized = true;
  LOG.info("Hierarchical lock manager initialized successfully");
} catch (Exception e) {
  LOG.error("Failed to initialize hierarchical lock manager: {}", e.getMessage(), e);
  // Continue without locking for backward compatibility
}
```
bootstrap/sql/migrations/native/1.13.0/mysql/schemaChanges.sql:142
- Adding new DDL to an older, already-shipped migration version (1.13.0) is unsafe for upgrades: databases that have already applied 1.13.0 will not re-run this file, so they'll never get the new fromFQNHash/toFQNHash columns/indexes required by prefix deletion. Please introduce this DDL in the latest migration version (and keep 1.13.0 immutable) so upgrades are deterministic.
```sql
-- Add FQN hash columns to entity_relationship to enable fast prefix-based bulk deletion.
-- This allows deleting all relationships for an entire entity subtree in a single indexed query
-- instead of walking the tree entity-by-entity.
ALTER TABLE entity_relationship
    ADD COLUMN fromFQNHash VARCHAR(768) DEFAULT NULL,
    ADD COLUMN toFQNHash VARCHAR(768) DEFAULT NULL;
CREATE INDEX idx_er_from_fqn_hash ON entity_relationship (fromFQNHash(768));
CREATE INDEX idx_er_to_fqn_hash ON entity_relationship (toFQNHash(768));
```
```java
if (!"fqnHash".equals(getNameHashColumn())) {
  return List.of();
}
return findIdsByFqnHashPrefixInternal(getTableName(), "fqnHash", fqnHashPrefix + ".%").stream()
```
findIdsByFqnHashPrefix currently bails out unless getNameHashColumn() is exactly "fqnHash". Several entity tables use nameHash while still hashing the entity FQN (e.g., team_entity has only nameHash), so descendant collection for prefix deletion will silently miss those entity types. Consider querying against getNameHashColumn() (for both fqnHash and nameHash) or otherwise documenting/enforcing that only fqnHash-backed entities are eligible for prefix deletion.
Suggested change:
```java
String nameHashColumn = getNameHashColumn();
if (!"fqnHash".equals(nameHashColumn) && !"nameHash".equals(nameHashColumn)) {
  return List.of();
}
return findIdsByFqnHashPrefixInternal(getTableName(), nameHashColumn, fqnHashPrefix + ".%")
    .stream()
```
```java
import jakarta.ws.rs.*;
import jakarta.ws.rs.DELETE;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.core.*;
```
Redundant imports: this file already imports jakarta.ws.rs.*, so explicitly importing jakarta.ws.rs.DELETE and jakarta.ws.rs.PathParam is unnecessary and can violate checkstyle/spotless rules. Please remove the redundant explicit imports.
```java
import jakarta.ws.rs.*;
import jakarta.ws.rs.DELETE;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.core.*;
```
Redundant imports: this file already imports jakarta.ws.rs.*, so explicitly importing jakarta.ws.rs.DELETE and jakarta.ws.rs.PathParam is unnecessary and can violate checkstyle/spotless rules. Please remove the redundant explicit imports.
```java
operationId = "deleteApiServicePrefixHard",
summary = "Hard-delete a API service and all descendants by FQN prefix",
description =
    "Bulk hard-delete this API service and all descendants whose FQN starts with this "
        + "entity's FQN. Significantly faster than recursive delete for large hierarchies.",
```
Grammar in the OpenAPI summary: "Hard-delete a API service..." should be "Hard-delete an API service...".
```java
@Operation(
    operationId = "deleteIngestionPipelinePrefixHard",
    summary = "Hard-delete a ingestion pipeline and all descendants by FQN prefix",
    description =
        "Bulk hard-delete this ingestion pipeline and all descendants whose FQN starts with this "
            + "entity's FQN. Significantly faster than recursive delete for large hierarchies.",
```
Grammar in the OpenAPI summary: "Hard-delete a ingestion pipeline..." should be "Hard-delete an ingestion pipeline...".
```sql
-- Add FQN hash columns to entity_relationship to enable fast prefix-based bulk deletion.
-- This allows deleting all relationships for an entire entity subtree in a single indexed query
-- instead of walking the tree entity-by-entity.
ALTER TABLE entity_relationship
    ADD COLUMN IF NOT EXISTS fromFQNHash VARCHAR(768) DEFAULT NULL,
    ADD COLUMN IF NOT EXISTS toFQNHash VARCHAR(768) DEFAULT NULL;

CREATE INDEX IF NOT EXISTS idx_er_from_fqn_hash ON entity_relationship (fromFQNHash);
CREATE INDEX IF NOT EXISTS idx_er_to_fqn_hash ON entity_relationship (toFQNHash);
```
Adding new DDL to an older, already-shipped migration version (1.13.0) is unsafe for upgrades: databases that have already applied 1.13.0 will not re-run this file, so they’ll never get the new fromFQNHash/toFQNHash columns/indexes required by prefix deletion. Please introduce this DDL in the latest migration version (and keep 1.13.0 immutable) so upgrades are deterministic.
Code Review 🚫 Blocked — 6 resolved / 8 findings

Fast FQN prefix-based bulk deletion implementation fixes race conditions but is blocked by a critical lock failure that silently ignores contention and allows unprotected deletion to proceed, plus a minor redundant search index operation for the root entity type.

🚨 Bug: Lock failure silently ignored — deletion proceeds unprotected
📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PrefixDeletionService.java:84-97
📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PrefixDeletionService.java:68-81

If the lock cannot be acquired (e.g., another deletion is in progress, or the lock manager is unavailable), the deletion runs without protection, allowing ingestion to create new entities in the subtree while it's being deleted — exactly the orphan scenario this PR aims to fix. Suggested fix

💡 Performance: Redundant search index delete for root entity type
📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PrefixDeletionService.java:175-186

Not a correctness bug (the call is idempotent), but it's a wasted round-trip to the search cluster. Suggested fix

✅ 6 resolved
- Performance: Migration backfill does row-by-row UPDATEs on entity_relationship
- Bug: Async prefix delete test has no wait — assertions are racy
- Bug: Test uses tables().get() for databases and schemas, masking failures
- Bug: SLF4J does not support {:.2f} — speedup prints as literal text
- Bug: Prefix deletion misses relationships with NULL FQN hashes
- ...and 1 more resolved from earlier reviews
Summary
service.database.schema.table). A single indexed `LIKE 'prefix.%'` query deletes an entire subtree atomically. Works at any hierarchy level (deleting a Schema uses the same path as deleting a full DatabaseService).

Changes
Schema
- `entity_relationship`: added `fromFQNHash` and `toFQNHash` columns with indexes (MySQL + Postgres)
- `1.14.1`: schema DDL + Java backfill migration that populates hashes for all existing rows

DAO Layer
- `EntityRelationshipDAO`: 8-arg `insert()` with COALESCE (existing callers unaffected via 6-arg default overload), backfill methods, `deleteAllByFqnHashPrefix()`
- `EntityDAO`: `findIdsByFqnHashPrefix()` and `deleteBatch()` default methods (all entity DAOs inherit automatically)
- `EntityTimeSeriesDAO`: `deleteByFqnHashPrefix()` default method
- `CollectionDAO`: `UsageDAO.deleteBatch()`, `FeedDAO.findByEntityIds()`

Repository Layer
- `EntityRepository`: FQN-bearing `addRelationship()` overload, `addServiceRelationship()` now populates FQN hashes, `deleteTimeSeriesByFqnPrefix()` hook, `getLockManager()`/`callPreDelete()`/`invalidateEntity()` bridge methods
- `TableRepository`, `DatabaseSchemaRepository`: `storeRelationships()` passes FQN strings
- `FeedRepository`: `deleteByAboutBatch()` for bulk thread deletion

Orchestration
- `PrefixDeletionService` (new): 7-phase hard-delete orchestration
  - `HierarchicalLockManager` deletion lock (closes ingestion race window in <1ms)
  - `findIdsByFqnHashPrefix()` across all entity repos
  - `entity_relationship`, `field_relationship`, `entity_extension`, `tag_usage`, `entity_usage`, threads
  - `preDelete` on root, `deleteTimeSeriesByFqnPrefix` per repo
- `LockManagerInitializer`: now also initializes `PrefixDeletionService` on startup

REST Endpoint
- `EntityResource`: `deletePrefixHardById()` protected base method (async, 202 Accepted, WebSocket notification on completion)
- `DatabaseServiceResource`: `DELETE /api/v1/services/databaseServices/prefix/{id}` endpoint

Implementation Notes
- `CONTAINS` relationships (written via `addServiceRelationship` and `storeRelationships`) get FQN hashes on insert to avoid adding 2 extra DB lookups per row during bulk ingestion. The migration backfills all existing data.
- The `entity_relationship` insert uses `COALESCE(:fromFQNHash, fromFQNHash)` so re-ingestion never overwrites already-populated hashes with null.

Test plan
- `mvn test-compile -pl openmetadata-service -DskipTests` — compiles clean
- `mvn spotless:apply -pl openmetadata-service` — no formatting changes
- `DELETE /prefix/{id}`, verify all entities return 404 and `entity_relationship` has no rows with the service's FQN hash prefix