feat: implement fast FQN prefix-based bulk deletion to fix race conditions and slow cascade#27108

Open
mohityadav766 wants to merge 30 commits into `main` from `mohit/35dc-improve-deletion`

Conversation

Member

@mohityadav766 mohityadav766 commented Apr 6, 2026

Summary

  • Problem 1 — Speed: Recursive cascade deletion walks the entity tree one entity at a time. For large services (1M+ tables), this takes hours. Each entity triggers individual queries for relationships, extensions, tags, threads, usage, and search events.
  • Problem 2 — Race condition: Ingestion can run concurrently with deletion. While deletion slowly walks the tree, ingestion creates new entities under the service. When deletion finally removes the root, those newly-created entities have a broken parent reference — orphaned, invisible in the UI, and impossible to clean up.
  • Solution: FQN prefix-based bulk deletion. Every entity's FQN encodes its full ancestry (service.database.schema.table). A single indexed LIKE 'prefix.%' query deletes an entire subtree atomically. Works at any hierarchy level (deleting a Schema uses the same path as deleting a full DatabaseService).
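The subtree-matching rule behind this can be sketched in a few lines (class and method names below are hypothetical, and the real implementation compares MD5 hashes of FQN parts rather than raw FQNs):

```java
// Minimal sketch of FQN-prefix subtree matching. Class and method names are
// hypothetical; the real code works on hashed FQNs, not raw strings.
public class FqnPrefix {

  // An entity belongs to the subtree rooted at rootFqn if it IS the root,
  // or its FQN starts with the root FQN followed by the "." separator.
  static boolean inSubtree(String fqn, String rootFqn) {
    return fqn.equals(rootFqn) || fqn.startsWith(rootFqn + ".");
  }

  // The equivalent SQL LIKE pattern for descendants, escaping LIKE
  // wildcards that may legally appear in entity names.
  static String likePattern(String rootFqn) {
    return rootFqn.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_") + ".%";
  }

  public static void main(String[] args) {
    if (!inSubtree("svc.db.schema.table", "svc.db")) throw new AssertionError();
    if (inSubtree("svc.db2.schema", "svc.db")) throw new AssertionError(); // sibling, not descendant
    System.out.println(likePattern("svc.db"));
  }
}
```

Requiring the trailing separator is what keeps `svc.db2` out of the `svc.db` subtree; the same rule applies unchanged at any hierarchy depth.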

Changes

Schema

  • entity_relationship: added fromFQNHash and toFQNHash columns with indexes (MySQL + Postgres)
  • SQL migration 1.14.1: schema DDL + Java backfill migration that populates hashes for all existing rows

DAO Layer

  • EntityRelationshipDAO: 8-arg insert() with COALESCE (existing callers unaffected via 6-arg default overload), backfill methods, deleteAllByFqnHashPrefix()
  • EntityDAO: findIdsByFqnHashPrefix() and deleteBatch() default methods (all entity DAOs inherit automatically)
  • EntityTimeSeriesDAO: deleteByFqnHashPrefix() default method
  • CollectionDAO: UsageDAO.deleteBatch(), FeedDAO.findByEntityIds()
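The 6-arg/8-arg overload trick can be sketched as follows (interface and argument names are illustrative, not the actual DAO signatures):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the default-overload pattern: the new 8-arg insert carries the two
// FQN hash columns, while a default 6-arg method delegates with nulls so every
// existing call site keeps compiling unchanged. Names are illustrative.
interface RelationshipDao {
  void insert(String fromId, String toId, String fromEntity, String toEntity,
      int relation, String json, String fromFqnHash, String toFqnHash);

  default void insert(String fromId, String toId, String fromEntity, String toEntity,
      int relation, String json) {
    insert(fromId, toId, fromEntity, toEntity, relation, json, null, null);
  }
}

// Test double that records what reached the 8-arg primitive.
class RecordingDao implements RelationshipDao {
  final List<String> rows = new ArrayList<>();

  @Override
  public void insert(String fromId, String toId, String fromEntity, String toEntity,
      int relation, String json, String fromFqnHash, String toFqnHash) {
    rows.add(fromId + "->" + toId + " fromHash=" + fromFqnHash + " toHash=" + toFqnHash);
  }
}

public class OverloadSketch {
  public static void main(String[] args) {
    RecordingDao dao = new RecordingDao();
    dao.insert("f1", "t1", "database", "table", 10, null);             // legacy caller
    dao.insert("f1", "t1", "database", "table", 10, null, "h1", "h2"); // FQN-bearing caller
    if (!dao.rows.get(0).endsWith("fromHash=null toHash=null")) throw new AssertionError();
    if (!dao.rows.get(1).endsWith("fromHash=h1 toHash=h2")) throw new AssertionError();
    System.out.println(dao.rows);
  }
}
```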

Repository Layer

  • EntityRepository: FQN-bearing addRelationship() overload, addServiceRelationship() now populates FQN hashes, deleteTimeSeriesByFqnPrefix() hook, getLockManager() / callPreDelete() / invalidateEntity() bridge methods
  • TableRepository, DatabaseSchemaRepository: storeRelationships() passes FQN strings
  • FeedRepository: deleteByAboutBatch() for bulk thread deletion

Orchestration

  • PrefixDeletionService (new): 7-phase hard-delete orchestration
    1. Acquire HierarchicalLockManager deletion lock (closes ingestion race window in <1ms)
    2. Collect all descendant UUIDs via findIdsByFqnHashPrefix() across all entity repos
    3. Bulk-delete dependency tables: entity_relationship, field_relationship, entity_extension, tag_usage, entity_usage, threads
    4. Run entity-specific hooks (preDelete on root, deleteTimeSeriesByFqnPrefix per repo)
    5. Bulk-delete from entity tables per type
    6. Emit search deletion event for root entity
    7. Release lock
  • LockManagerInitializer: now also initializes PrefixDeletionService on startup
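The seven phases can be walked through with an in-memory stand-in (all names and data structures below are illustrative; the real service operates on SQL tables and the HierarchicalLockManager):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative in-memory walk-through of the 7-phase prefix deletion.
public class PrefixDeleteSketch {
  static final ReentrantLock DELETION_LOCK = new ReentrantLock(); // stand-in for phase 1/7
  static final Map<String, String> ENTITIES = new HashMap<>();    // fqn -> entityType
  static final Set<String> RELATIONSHIPS = new HashSet<>();       // "fromFqn->toFqn"

  static void deletePrefixHard(String rootFqn) {
    DELETION_LOCK.lock();                                   // 1. acquire deletion lock
    try {
      List<String> doomed = ENTITIES.keySet().stream()      // 2. collect descendants by prefix
          .filter(f -> f.equals(rootFqn) || f.startsWith(rootFqn + "."))
          .toList();
      RELATIONSHIPS.removeIf(r -> {                         // 3. bulk-delete dependency rows
        String[] ends = r.split("->");
        return doomed.contains(ends[0]) || doomed.contains(ends[1]);
      });
      // 4. entity-specific hooks would run here (preDelete, time-series cleanup)
      doomed.forEach(ENTITIES::remove);                     // 5. bulk-delete entity rows
      // 6. a search deletion event for the root would be emitted here
    } finally {
      DELETION_LOCK.unlock();                               // 7. release lock
    }
  }

  public static void main(String[] args) {
    ENTITIES.put("svc", "databaseService");
    ENTITIES.put("svc.db", "database");
    ENTITIES.put("svc.db.s.t", "table");
    ENTITIES.put("svc.db2", "database");
    RELATIONSHIPS.add("svc->svc.db");
    RELATIONSHIPS.add("svc.db->svc.db.s.t");
    RELATIONSHIPS.add("svc->svc.db2");
    deletePrefixHard("svc.db");
    if (ENTITIES.containsKey("svc.db.s.t") || !ENTITIES.containsKey("svc.db2"))
      throw new AssertionError();
    System.out.println(ENTITIES.keySet() + " " + RELATIONSHIPS);
  }
}
```

The key property to note: deleting the `svc.db` subtree leaves the sibling `svc.db2` and its relationship untouched, which is exactly what the prefix rule guarantees at any hierarchy level.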

REST Endpoint

  • EntityResource: deletePrefixHardById() protected base method (async, 202 Accepted, WebSocket notification on completion)
  • DatabaseServiceResource: DELETE /api/v1/services/databaseServices/prefix/{id} endpoint

Implementation Notes

  • Hard delete only — soft delete still uses the existing tree walk to preserve relationship data for restoration
  • FQN hashes on new relationships: Only CONTAINS relationships (written via addServiceRelationship and storeRelationships) get FQN hashes on insert to avoid adding 2 extra DB lookups per row during bulk ingestion. The migration backfills all existing data.
  • COALESCE on conflict: The entity_relationship insert uses COALESCE(:fromFQNHash, fromFQNHash) so re-ingestion never overwrites already-populated hashes with null.
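The conflict rule reduces to a null-preferring merge, sketched here (names illustrative; in SQL this runs inside the ON CONFLICT / ON DUPLICATE KEY clause):

```java
// Sketch of the COALESCE(:fromFQNHash, fromFQNHash) conflict rule: a re-insert
// carrying a null hash must never wipe a hash that is already stored.
public class CoalesceOnConflict {

  static String coalesce(String incoming, String existing) {
    return incoming != null ? incoming : existing;
  }

  public static void main(String[] args) {
    String stored = "abc123";                 // hash populated by the backfill migration
    stored = coalesce(null, stored);          // legacy 6-arg insert fires again on re-ingestion
    if (!"abc123".equals(stored)) throw new AssertionError(); // hash survives
    stored = coalesce("def456", stored);      // an FQN-bearing insert may still update it
    System.out.println(stored);
  }
}
```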

Test plan

  • `mvn test-compile -pl openmetadata-service -DskipTests` — compiles clean
  • `mvn spotless:apply -pl openmetadata-service` — no formatting changes
  • Create a DatabaseService with nested Databases → Schemas → Tables, call DELETE /prefix/{id}, verify all entities return 404 and entity_relationship has no rows with the service's FQN hash prefix
  • Verify concurrent ingestion during deletion leaves no orphaned entities
  • Compare deletion timing against recursive delete on same dataset

Copilot AI review requested due to automatic review settings April 6, 2026 20:26
@github-actions github-actions bot added the `backend` and `safe to test` labels Apr 6, 2026
@mohityadav766 mohityadav766 changed the title Improve Deletion Process (vibe-kanban) feat: implement fast FQN prefix-based bulk deletion to fix race conditions and slow cascade (Vibe Kanban) Apr 6, 2026
@mohityadav766 mohityadav766 changed the title feat: implement fast FQN prefix-based bulk deletion to fix race conditions and slow cascade (Vibe Kanban) feat: implement fast FQN prefix-based bulk deletion to fix race conditions and slow cascade Apr 6, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a faster, prefix-based hard deletion flow aimed at preventing orphaned relationships when ingestion and cascading deletes run concurrently (particularly for large service hierarchies). It does so by adding FQN-hash metadata to relationships, providing a new API endpoint for prefix deletion, and implementing a server-side bulk deletion orchestrator.

Changes:

  • Added /prefix/{id} hard-delete endpoint for DatabaseService that triggers an async prefix-based deletion job.
  • Added fromFQNHash / toFQNHash columns + indexes to entity_relationship and a v1.14.1 migration utility to backfill existing rows.
  • Introduced PrefixDeletionService plus supporting DAO/repository changes for prefix ID discovery and batch deletes.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 8 comments.

Summary per file:

| File | Description |
| --- | --- |
| `openmetadata-service/src/main/java/org/openmetadata/service/resources/services/database/DatabaseServiceResource.java` | Adds new REST endpoint to start prefix-based hard delete for DatabaseService. |
| `openmetadata-service/src/main/java/org/openmetadata/service/resources/EntityResource.java` | Adds async orchestration method `deletePrefixHardById` calling `PrefixDeletionService`. |
| `openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PrefixDeletionService.java` | New service coordinating bulk prefix-based deletion across tables. |
| `openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityDAO.java` | Adds DAO helpers for finding IDs by FQN-hash prefix and batch deletes. |
| `openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java` | Extends relationship insertion to persist relationship FQN hashes; adds hook helpers used by prefix deletion. |
| `openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java` | Extends entity_relationship inserts to include new hash columns; adds delete-by-prefix helpers and thread-entity batch lookup. |
| `openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityTimeSeriesDAO.java` | Adds delete-by-FQN-hash-prefix helper for time-series tables. |
| `openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/FeedRepository.java` | Adds batch delete of threads by about-entity IDs (used by prefix deletion). |
| `openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TableRepository.java` | Updates CONTAINS relationship insertion to populate relationship FQN hashes. |
| `openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseSchemaRepository.java` | Updates CONTAINS relationship insertion to populate relationship FQN hashes. |
| `openmetadata-service/src/main/java/org/openmetadata/service/initialization/LockManagerInitializer.java` | Initializes PrefixDeletionService alongside the existing hierarchical lock manager. |
| `openmetadata-service/src/main/java/org/openmetadata/service/migration/utils/v1141/MigrationUtil.java` | Adds backfill logic for populating new relationship hash columns for existing rows. |
| `openmetadata-service/src/main/java/org/openmetadata/service/migration/mysql/v1141/Migration.java` | Runs v1141 backfill for MySQL migrations. |
| `openmetadata-service/src/main/java/org/openmetadata/service/migration/postgres/v1141/Migration.java` | Runs v1141 backfill for Postgres migrations. |
| `bootstrap/sql/schema/mysql.sql` | Adds fromFQNHash/toFQNHash + indexes to base MySQL schema. |
| `bootstrap/sql/schema/postgres.sql` | Adds fromfqnhash/tofqnhash + indexes to base Postgres schema. |
| `bootstrap/sql/migrations/native/1.14.1/mysql/schemaChanges.sql` | Adds columns + indexes to entity_relationship for MySQL migrations. |
| `bootstrap/sql/migrations/native/1.14.1/postgres/schemaChanges.sql` | Adds columns + indexes to entity_relationship for Postgres migrations. |

Comment on lines +717 to +722

```java
ExecutorService executorService = AsyncService.getInstance().getExecutorService();
executorService.submit(
    RequestLatencyContext.wrapWithContext(
        () -> {
          try {
            PrefixDeletionService.getInstance().deletePrefixHard(entity, userName);
```

Copilot AI Apr 6, 2026


PrefixDeletionService.getInstance() can return null if the lock manager initializer didn’t run (or CollectionDAO was unavailable), which will NPE here. Please either make getInstance() lazily initialize a no-lock instance, or guard here and return a clear 503/500 explaining prefix deletion isn’t available until initialization completes.

Suggested change

```java
PrefixDeletionService prefixDeletionService = PrefixDeletionService.getInstance();
if (prefixDeletionService == null) {
  return Response.status(Response.Status.SERVICE_UNAVAILABLE)
      .type(MediaType.TEXT_PLAIN)
      .entity("Prefix deletion is not available until service initialization completes.")
      .build();
}
ExecutorService executorService = AsyncService.getInstance().getExecutorService();
executorService.submit(
    RequestLatencyContext.wrapWithContext(
        () -> {
          try {
            prefixDeletionService.deletePrefixHard(entity, userName);
```

Comment on lines +126 to +134

```java
private void deleteDependencyTables(String rootFqn, String fqnHashPrefix, List<String> allIds) {
  CollectionDAO dao = Entity.getCollectionDAO();
  dao.relationshipDAO().deleteAllByFqnHashPrefix(fqnHashPrefix);
  dao.fieldRelationshipDAO().deleteAllByPrefix(rootFqn);
  dao.entityExtensionDAO().deleteAllBatch(allIds);
  dao.tagUsageDAO().deleteTagLabelsByTargetPrefix(rootFqn);
  dao.usageDAO().deleteBatch(allIds);
  List<UUID> allUuids = allIds.stream().map(UUID::fromString).toList();
  Entity.getFeedRepository().deleteByAboutBatch(allUuids);
```

Copilot AI Apr 6, 2026


This code uses deleteAllByFqnHashPrefix(...) as the sole cleanup for entity_relationship. Any relationship rows with NULL fromFQNHash/toFQNHash (e.g., inserted via existing addRelationship(...) overloads that don’t pass FQNs) will not be deleted, leaving orphaned relationships. Consider adding a fallback delete by descendant IDs (batch delete) or ensuring all relationship inserts always populate the hash columns.

Comment on lines +154 to +165

```java
private void deleteEntityTables(
    Map<String, List<UUID>> descendantsByType, EntityInterface rootEntity) {
  for (Map.Entry<String, List<UUID>> entry : descendantsByType.entrySet()) {
    String entityType = entry.getKey();
    List<String> ids = entry.getValue().stream().map(UUID::toString).toList();
    try {
      Entity.getEntityRepository(entityType).getDao().deleteBatch(ids);
    } catch (Exception e) {
      LOG.warn(
          "Failed to delete {} entities of type {}: {}", ids.size(), entityType, e.getMessage());
    }
  }
```

Copilot AI Apr 6, 2026


deleteBatch(ids) is implemented as a single WHERE id IN (<ids>) statement. For large services this can exceed DB parameter limits / max packet size and become the new bottleneck or fail outright. Please chunk batch deletes (e.g., 500/1000 IDs per statement) similarly to RelationshipDAO.batchDeleteRelationships(...), for both entity tables and dependent tables that delete by IN (<ids>).
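The chunking asked for here can be sketched as (names illustrative; the real OpenMetadata code may use a different helper):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

// Sketch of bounded chunking: split an ID list into fixed-size slices so
// each "WHERE id IN (<chunk>)" stays under DB bind-parameter limits.
public class BatchChunks {
  static <T> List<List<T>> partition(List<T> items, int size) {
    List<List<T>> chunks = new ArrayList<>();
    for (int i = 0; i < items.size(); i += size) {
      chunks.add(items.subList(i, Math.min(i + size, items.size())));
    }
    return chunks;
  }

  public static void main(String[] args) {
    List<String> ids = IntStream.range(0, 2500).mapToObj(i -> "id-" + i).toList();
    List<List<String>> chunks = partition(ids, 1000);
    if (chunks.size() != 3 || chunks.get(2).size() != 500) throw new AssertionError();
    // Each chunk would back one "DELETE ... WHERE id IN (<chunk>)" statement.
    System.out.println(chunks.size() + " chunks");
  }
}
```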

Comment on lines +147 to +153

```java
      Entity.getEntityRepository(entityType).deleteTimeSeriesByFqnPrefix(fqnHashPrefix);
    } catch (Exception e) {
      LOG.debug("deleteTimeSeriesByFqnPrefix failed for type {}: {}", entityType, e.getMessage());
    }
  }
}
```


Copilot AI Apr 6, 2026


deleteTimeSeriesByFqnPrefix(...) is a no-op in EntityRepository, and there are currently no repository overrides, so this loop won’t delete any time-series rows (e.g., profiler/data quality/query cost/workflow time-series). Prefix hard-delete will leave orphaned time-series data. Implement overrides in the repositories that own time-series tables to call the appropriate EntityTimeSeriesDAO.deleteByFqnHashPrefix(...) (or equivalent).

Suggested change

```java
      EntityRepository<?> repository = Entity.getEntityRepository(entityType);
      if (hasTimeSeriesPrefixDeletionOverride(repository)) {
        repository.deleteTimeSeriesByFqnPrefix(fqnHashPrefix);
      } else {
        LOG.warn(
            "Repository {} for entity type {} does not override deleteTimeSeriesByFqnPrefix; "
                + "prefix hard delete may leave orphaned time-series data",
            repository.getClass().getName(),
            entityType);
      }
    } catch (Exception e) {
      LOG.debug("deleteTimeSeriesByFqnPrefix failed for type {}: {}", entityType, e.getMessage());
    }
  }
}

private boolean hasTimeSeriesPrefixDeletionOverride(EntityRepository<?> repository) {
  try {
    return repository
            .getClass()
            .getMethod("deleteTimeSeriesByFqnPrefix", String.class)
            .getDeclaringClass()
        != EntityRepository.class;
  } catch (NoSuchMethodException e) {
    return false;
  }
}
```

Comment on lines +28 to +35

```java
/**
 * Orchestrates fast prefix-based hard deletion of an entity and all its descendants.
 *
 * <p>Uses 7 phases to atomically delete an entire subtree by FQN prefix rather than walking the
 * entity tree one-by-one. This is orders of magnitude faster for large hierarchies and eliminates
 * the race condition where concurrent ingestion creates orphaned entities during slow cascade
 * deletion.
 *
```

Copilot AI Apr 6, 2026


The class-level Javadoc says this deletion is "atomically" deleting the subtree, but the implementation isn’t wrapped in a single transaction and also catches/logs exceptions while continuing. This can leave the DB in a partially-deleted state while still sending a "complete" notification. Either adjust the documentation/notification semantics to reflect best-effort behavior, or introduce a transactional boundary and fail-fast semantics where feasible.

Comment on lines +776 to +788

```java
public void deleteByAboutBatch(List<UUID> entityIds) {
  if (nullOrEmpty(entityIds)) {
    return;
  }
  List<String> ids = entityIds.stream().map(UUID::toString).toList();
  List<String> threadIds = listOrEmpty(dao.feedDAO().findByEntityIds(ids));
  for (String threadId : threadIds) {
    try {
      deleteThreadInternal(UUID.fromString(threadId));
    } catch (Exception ex) {
      // Continue deletion
    }
  }
```

Copilot AI Apr 6, 2026


deleteByAboutBatch silently swallows exceptions per-thread. This makes it very hard to diagnose partial deletions during prefix hard-delete and can leave threads/relationships behind without any signal. At minimum log at DEBUG/WARN with the threadId/exception, or aggregate failures and report them back to the caller/job notification.
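One way to surface the failures, as a sketch (names illustrative, not the actual FeedRepository API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of failure aggregation: keep deleting on error, but record which
// thread IDs failed so the caller or job notification can report them.
public class DeleteWithReport {
  static List<String> deleteAll(List<String> threadIds, Consumer<String> delete) {
    List<String> failed = new ArrayList<>();
    for (String id : threadIds) {
      try {
        delete.accept(id);
      } catch (Exception e) {
        failed.add(id); // keep going, but remember what failed
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    List<String> failed = deleteAll(
        List.of("t1", "t2", "t3"),
        id -> { if (id.equals("t2")) throw new RuntimeException("boom"); });
    if (!failed.equals(List.of("t2"))) throw new AssertionError();
    System.out.println("failed=" + failed);
  }
}
```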

Comment on lines +64 to +82

```java
/**
 * Hard-deletes the given root entity and all descendants whose FQN starts with the root's FQN.
 * Works at any hierarchy level (service, database, schema, etc.).
 */
public void deletePrefixHard(EntityInterface rootEntity, String deletedBy) {
  String rootFqn = rootEntity.getFullyQualifiedName();
  String fqnHashPrefix = FullyQualifiedName.buildHash(rootFqn);
  DeletionLock lock = acquireLock(rootEntity, deletedBy);
  try {
    Map<String, List<UUID>> descendantsByType = collectDescendantIds(fqnHashPrefix);
    List<String> allIds = buildAllIds(rootEntity.getId(), descendantsByType);
    deleteDependencyTables(rootFqn, fqnHashPrefix, allIds);
    runEntityHooks(rootEntity, deletedBy, fqnHashPrefix);
    deleteEntityTables(descendantsByType, rootEntity);
    emitDeleteEvent(rootEntity);
  } finally {
    releaseLock(lock, rootEntity);
  }
}
```

Copilot AI Apr 6, 2026


This introduces a new deletion code path (prefix hard-delete) with non-trivial DB side effects across many tables, but there are no unit/integration tests validating correctness (e.g., no orphan rows in entity_relationship/field_relationship/time-series, and behavior vs existing recursive hard delete). Please add focused tests for PrefixDeletionService (mock DAO + verification) and at least one integration test covering database service subtree deletion.

Comment on lines +84 to +97

```java
private DeletionLock acquireLock(EntityInterface entity, String deletedBy) {
  if (lockManager == null) {
    return null;
  }
  try {
    return lockManager.acquireDeletionLock(entity, deletedBy, true);
  } catch (Exception e) {
    LOG.warn(
        "Could not acquire deletion lock for {}: {}",
        entity.getFullyQualifiedName(),
        e.getMessage());
    return null;
  }
}
```


🚨 Bug: Lock failure silently ignored — deletion proceeds unprotected

The acquireLock method (line 84-97) catches all exceptions (including EntityLockedException thrown on contention) and returns null. The caller deletePrefixHard (line 71) never checks whether the lock was actually acquired — it proceeds with all 7 deletion phases regardless. This completely defeats the stated goal of eliminating race conditions with concurrent ingestion.

If the lock cannot be acquired (e.g., another deletion is in progress, or the lock manager is unavailable), the deletion runs without protection, allowing ingestion to create new entities in the subtree while it's being deleted — exactly the orphan scenario this PR aims to fix.

Suggested fix:

```java
private DeletionLock acquireLock(EntityInterface entity, String deletedBy) {
  if (lockManager == null) {
    throw new IllegalStateException("Lock manager not initialized");
  }
  // Let EntityLockedException propagate to the caller
  return lockManager.acquireDeletionLock(entity, deletedBy, true);
}
```


@github-actions
Contributor

github-actions bot commented Apr 6, 2026

🔴 Playwright Results — 1 failure, 30 flaky

✅ 3593 passed · ❌ 1 failed · 🟡 30 flaky · ⏭️ 207 skipped

| Shard | Passed | Failed | Flaky | Skipped |
| --- | --- | --- | --- | --- |
| 🟡 Shard 1 | 454 | 0 | 3 | 2 |
| 🟡 Shard 2 | 633 | 0 | 10 | 32 |
| 🔴 Shard 3 | 641 | 1 | 7 | 26 |
| 🟡 Shard 4 | 623 | 0 | 4 | 47 |
| 🟡 Shard 5 | 608 | 0 | 1 | 67 |
| 🟡 Shard 6 | 634 | 0 | 5 | 33 |

Genuine Failures (failed on all attempts)

Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3)
Test timeout of 180000ms exceeded.
🟡 30 flaky test(s) (passed on retry)
  • Flow/Tour.spec.ts › Tour should work from welcome screen (shard 1, 1 retry)
  • Flow/Tour.spec.ts › Tour should work from URL directly (shard 1, 1 retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 2 retries)
  • Features/BulkImport.spec.ts › Keyboard Delete selection (shard 2, 1 retry)
  • Features/ChangeSummaryBadge.spec.ts › AI badge should NOT appear for manually-edited descriptions (shard 2, 1 retry)
  • Features/DataProductRenameConsolidation.spec.ts › Rename then change owner - assets should be preserved (shard 2, 1 retry)
  • Features/DataQuality/TestCaseImportExportE2eFlow.spec.ts › Admin: Complete export-import-validate flow (shard 2, 1 retry)
  • Features/DataQuality/TestCaseResultPermissions.spec.ts › User with only VIEW cannot PATCH results (shard 2, 1 retry)
  • Features/DomainTierCertificationVoting.spec.ts › DataProduct - Certification assign, update, and remove (shard 2, 1 retry)
  • Features/DomainTierCertificationVoting.spec.ts › DataProduct - UpVote and DownVote (shard 2, 1 retry)
  • Features/ExploreQuickFilters.spec.ts › tier with assigned asset appears in dropdown, tier without asset does not (shard 2, 1 retry)
  • Features/Glossary/GlossaryHierarchy.spec.ts › should cancel move operation (shard 2, 1 retry)
  • Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 2 retries)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › Date Time (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Owner Rule Any_In (shard 4, 1 retry)
  • Pages/Entity.spec.ts › User as Owner Add, Update and Remove (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Update displayName (shard 4, 1 retry)
  • Pages/ExploreTree.spec.ts › Verify Database and Database Schema available in explore tree (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/ODCSImportExport.spec.ts › Multi-object ODCS contract - object selector shows all schema objects (shard 6, 1 retry)
  • Pages/ServiceEntity.spec.ts › Tier Add, Update and Remove (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)
  • Pages/Users.spec.ts › Reset Password for Data Steward (shard 6, 1 retry)

📦 Download artifacts

How to debug locally:

```shell
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace
```

@github-actions
Contributor

github-actions bot commented Apr 9, 2026

The Java checkstyle failed.

Please run `mvn spotless:apply` in the root of your repository and commit the changes to this PR.
You can also use `pre-commit` to automate the Java code formatting.

You can install the pre-commit hooks with `make install_test precommit_install`.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 11 comments.

Comments suppressed due to low confidence (1)

bootstrap/sql/migrations/native/1.13.0/mysql/schemaChanges.sql:142

  • These schema changes are being appended to the existing 1.13.0 migration. If a database was already upgraded past 1.13.0, the migration framework will not re-run 1.13.0, so the new columns/indexes will never be applied.

To ensure upgrades work correctly, the DDL should be added to the current/new migration version (e.g. 1.14.1) rather than modifying an older migration script.

```sql
-- Add FQN hash columns to entity_relationship to enable fast prefix-based bulk deletion.
-- This allows deleting all relationships for an entire entity subtree in a single indexed query
-- instead of walking the tree entity-by-entity.
ALTER TABLE entity_relationship
    ADD COLUMN IF NOT EXISTS fromFQNHash VARCHAR(768) DEFAULT NULL,
    ADD COLUMN IF NOT EXISTS toFQNHash   VARCHAR(768) DEFAULT NULL;

CREATE INDEX IF NOT EXISTS idx_er_from_fqn_hash ON entity_relationship (fromFQNHash(768));
CREATE INDEX IF NOT EXISTS idx_er_to_fqn_hash   ON entity_relationship (toFQNHash(768));
```

Comment on lines +126 to +135

```java
private void deleteDependencyTables(String rootFqn, String fqnHashPrefix, List<String> allIds) {
  CollectionDAO dao = Entity.getCollectionDAO();
  dao.relationshipDAO().deleteAllByFqnHashPrefix(fqnHashPrefix);
  dao.fieldRelationshipDAO().deleteAllByPrefix(rootFqn);
  dao.entityExtensionDAO().deleteAllBatch(allIds);
  dao.tagUsageDAO().deleteTagLabelsByTargetPrefix(rootFqn);
  dao.usageDAO().deleteBatch(allIds);
  List<UUID> allUuids = allIds.stream().map(UUID::fromString).toList();
  Entity.getFeedRepository().deleteByAboutBatch(allUuids);
}
```

Copilot AI Apr 9, 2026


This deletion path builds allIds for the entire subtree and then issues multiple ... WHERE id IN (<ids>) calls (entity_extension, entity_usage, and thread lookup via deleteByAboutBatch). For large services (the stated 1M+ tables use case), these unchunked IN-lists can exceed database parameter limits and/or cause very large allocations.

Recommend chunking allIds consistently (similar to batchDeleteRelationships), or adding DAO methods that delete by FQN hash prefix directly to avoid materializing/passing millions of IDs through the application.

Comment on lines +775 to +789

```java
@Transaction
public void deleteByAboutBatch(List<UUID> entityIds) {
  if (nullOrEmpty(entityIds)) {
    return;
  }
  List<String> ids = entityIds.stream().map(UUID::toString).toList();
  List<String> threadIds = listOrEmpty(dao.feedDAO().findByEntityIds(ids));
  for (String threadId : threadIds) {
    try {
      deleteThreadInternal(UUID.fromString(threadId));
    } catch (Exception ex) {
      // Continue deletion
    }
  }
}
```

Copilot AI Apr 9, 2026


deleteByAboutBatch still performs per-thread deletes in a loop and first fetches thread IDs with a single IN (<entityIds>) query. With prefix deletion, entityIds can be extremely large, which can exceed DB bind limits and makes this method a bottleneck.

Consider chunking entityIds for findByEntityIds, and then using the existing deleteThreadsInBatch(...) to delete threads/relationships in bulk instead of calling deleteThreadInternal(...) repeatedly.

Comment on lines +35 to +65

```java
private static void backfillForEntityType(Handle handle, String entityType) {
  EntityRepository<?> repo = Entity.getEntityRepository(entityType);
  String hashCol = repo.getDao().getNameHashColumn();
  if (!FQNHASH_COL.equals(hashCol) && !NAMEHASH_COL.equals(hashCol)) {
    return;
  }
  String tableName = repo.getDao().getTableName();
  int offset = 0;
  int processed;
  do {
    processed = processEntityBatch(handle, tableName, hashCol, offset);
    offset += processed;
  } while (processed == BATCH_SIZE);
  LOG.info("Backfilled FQN hashes for entity type {}: {} rows", entityType, offset);
}

private static int processEntityBatch(Handle handle, String tableName, String hashCol, int offset) {
  String sql =
      "SELECT id, "
          + hashCol
          + " FROM "
          + tableName
          + " WHERE "
          + hashCol
          + " IS NOT NULL LIMIT :limit OFFSET :offset";
  List<Map<String, Object>> rows =
      handle.createQuery(sql).bind("limit", BATCH_SIZE).bind("offset", offset).mapToMap().list();
  for (Map<String, Object> row : rows) {
    backfillRow(handle, row, hashCol);
  }
  return rows.size();
```

Copilot AI Apr 9, 2026


This migration utility uses LIMIT/OFFSET without an ORDER BY. That makes pagination non-deterministic (rows can be skipped/duplicated) and OFFSET becomes very slow on large tables.

For a backfill intended to handle very large installations, prefer a deterministic keyset pagination (e.g., WHERE id > :lastId ORDER BY id LIMIT :limit) and consider restricting to entities that actually have fqnHash (since prefix deletion is based on FQN hashes).
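Keyset pagination can be sketched with an in-memory stand-in for the id index (names illustrative; each loop iteration corresponds to one `WHERE id > :lastId ORDER BY id LIMIT :limit` query):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Sketch of keyset pagination: page on "id > lastId ORDER BY id" so each
// page is deterministic and index-driven, unlike LIMIT/OFFSET.
public class KeysetPages {
  static List<List<String>> pages(TreeSet<String> ids, int pageSize) {
    List<List<String>> out = new ArrayList<>();
    String lastId = null;
    while (true) {
      NavigableSet<String> tail = (lastId == null) ? ids : ids.tailSet(lastId, false);
      List<String> page = tail.stream().limit(pageSize).toList();
      if (page.isEmpty()) break;
      out.add(page);
      lastId = page.get(page.size() - 1); // next query: WHERE id > :lastId
    }
    return out;
  }

  public static void main(String[] args) {
    TreeSet<String> ids = new TreeSet<>(List.of("a", "b", "c", "d", "e"));
    List<List<String>> result = pages(ids, 2);
    if (result.size() != 3 || !result.get(2).equals(List.of("e"))) throw new AssertionError();
    System.out.println(result);
  }
}
```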

Comment on lines +153 to +162

```sql
-- Add FQN hash columns to entity_relationship to enable fast prefix-based bulk deletion.
-- This allows deleting all relationships for an entire entity subtree in a single indexed query
-- instead of walking the tree entity-by-entity.
ALTER TABLE entity_relationship
    ADD COLUMN IF NOT EXISTS fromFQNHash VARCHAR(768) DEFAULT NULL,
    ADD COLUMN IF NOT EXISTS toFQNHash VARCHAR(768) DEFAULT NULL;

CREATE INDEX IF NOT EXISTS idx_er_from_fqn_hash ON entity_relationship (fromFQNHash);
CREATE INDEX IF NOT EXISTS idx_er_to_fqn_hash ON entity_relationship (toFQNHash);
```

Copilot AI Apr 9, 2026


These schema changes are being appended to the existing 1.13.0 migration. If a database was already upgraded past 1.13.0, the migration framework will not re-run 1.13.0, so the new columns/indexes will never be applied.

To ensure upgrades work correctly, the DDL should be added to the current/new migration version (e.g. 1.14.1) rather than modifying an older migration script.

Suggested change (remove this block from the 1.13.0 script so the DDL can move to 1.14.1):

```sql
-- Add FQN hash columns to entity_relationship to enable fast prefix-based bulk deletion.
-- This allows deleting all relationships for an entire entity subtree in a single indexed query
-- instead of walking the tree entity-by-entity.
ALTER TABLE entity_relationship
    ADD COLUMN IF NOT EXISTS fromFQNHash VARCHAR(768) DEFAULT NULL,
    ADD COLUMN IF NOT EXISTS toFQNHash VARCHAR(768) DEFAULT NULL;

CREATE INDEX IF NOT EXISTS idx_er_from_fqn_hash ON entity_relationship (fromFQNHash);
CREATE INDEX IF NOT EXISTS idx_er_to_fqn_hash ON entity_relationship (toFQNHash);
```

@@ -0,0 +1 @@
-- No schema changes in this version.

Copilot AI Apr 9, 2026


The PR description indicates the new entity_relationship columns/indexes are introduced in 1.14.1, but this 1.14.1 migration is marked as having no schema changes. As written, upgrading from 1.14.0 → 1.14.1 would not apply the required DDL.

Move the ALTER TABLE ... ADD COLUMN fromFQNHash/toFQNHash + index creation into the 1.14.1 schemaChanges files (or whichever version is intended to ship this feature).

Comment on lines +19 to +23
public class Migration extends MigrationProcessImpl {

public Migration(MigrationFile migrationFile) {
super(migrationFile);
}

Copilot AI Apr 9, 2026

The backfill is currently wired into the v1130 migration path, but the v1141 (1.14.1) migration has no runDataMigration() and therefore won't backfill fromFQNHash/toFQNHash when upgrading to 1.14.1. Combined with the DDL living in older native migrations, this risks leaving entity_relationship hashes unpopulated on many upgrade paths.

Consider moving the backfill invocation into the v1141 migration (and keeping older migrations immutable).

Suggested change
public class Migration extends MigrationProcessImpl {
public Migration(MigrationFile migrationFile) {
super(migrationFile);
}
public class Migration extends MigrationProcessImpl {
private final MigrationFile migrationFile;
public Migration(MigrationFile migrationFile) {
super(migrationFile);
this.migrationFile = migrationFile;
}
@Override
public void runDataMigration() {
new org.openmetadata.service.migration.postgres.v1130.Migration(migrationFile)
.runDataMigration();
}


public Migration(MigrationFile migrationFile) {
super(migrationFile);
}

Copilot AI Apr 9, 2026

The backfill is currently wired into the v1130 migration path, but the v1141 (1.14.1) migration has no runDataMigration() and therefore won't backfill fromFQNHash/toFQNHash when upgrading to 1.14.1. Combined with the DDL living in older native migrations, this risks leaving entity_relationship hashes unpopulated on many upgrade paths.

Consider moving the backfill invocation into the v1141 migration (and keeping older migrations immutable).

Suggested change
}
}
@Override
public void runDataMigration() {
super.runDataMigration();
executeSQL(
"""
UPDATE entity_relationship
SET
fromFQNHash = MD5(fromFQN),
toFQNHash = MD5(toFQN)
WHERE
(fromFQNHash IS NULL OR fromFQNHash = '')
OR (toFQNHash IS NULL OR toFQNHash = '')
""");
}
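For reference, the suggested SQL computes plain `MD5()` hex digests of the whole FQN. A self-contained Java equivalent of that digest form is below — note it is an assumption that this matches what the application writes at insert time (OpenMetadata's `FullyQualifiedName.buildHash` may hash FQN parts individually, which would produce different values, so the two sides should be verified to agree):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Hex {
    // Plain MD5 hex digest of the full input string — the same output
    // format as the MySQL/PostgreSQL MD5() function used in the backfill.
    static String md5Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            // %032x left-pads with zeros in case the digest has leading zero bytes.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always present in the JDK
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```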

Comment on lines +85 to +95
if (lockManager == null) {
return null;
}
try {
return lockManager.acquireDeletionLock(entity, deletedBy, true);
} catch (Exception e) {
LOG.warn(
"Could not acquire deletion lock for {}: {}",
entity.getFullyQualifiedName(),
e.getMessage());
return null;

Copilot AI Apr 9, 2026

If the deletion lock cannot be acquired, the code currently logs and proceeds with the bulk delete anyway. That re-opens the ingestion race condition this feature is intended to eliminate (and could still create orphans while the delete is running).

Consider failing the operation when the deletion lock cannot be acquired (or retrying/acquiring in blocking mode), so callers get a clear failure instead of a best-effort delete.

Suggested change
if (lockManager == null) {
return null;
}
try {
return lockManager.acquireDeletionLock(entity, deletedBy, true);
} catch (Exception e) {
LOG.warn(
"Could not acquire deletion lock for {}: {}",
entity.getFullyQualifiedName(),
e.getMessage());
return null;
String entityFqn = entity.getFullyQualifiedName();
if (lockManager == null) {
throw new IllegalStateException(
String.format("Cannot hard delete %s because the deletion lock manager is not available", entityFqn));
}
try {
DeletionLock deletionLock = lockManager.acquireDeletionLock(entity, deletedBy, true);
if (deletionLock == null) {
throw new IllegalStateException(
String.format("Cannot hard delete %s because the deletion lock could not be acquired", entityFqn));
}
return deletionLock;
} catch (Exception e) {
throw new IllegalStateException(
String.format("Cannot hard delete %s because the deletion lock could not be acquired", entityFqn), e);

Comment on lines +20 to +27
try {
MigrationUtil.backfillRelationshipFqnHashes(handle);
} catch (Exception e) {
LOG.error(
"Failed to backfill FQN hashes in entity_relationship during v1130 migration. "
+ "Fast prefix deletion may not work correctly for pre-existing relationships.",
e);
}

Copilot AI Apr 9, 2026

Running the relationship FQN-hash backfill in the v1130 migration means it will not run for environments that are already beyond 1.13.0 (the common case for a 1.14.1 upgrade). This can leave fromFQNHash/toFQNHash NULL and break prefix deletion.

To make upgrades reliable, wire the backfill into the migration version that actually introduces the columns (e.g. v1141) rather than v1130.

Suggested change
try {
MigrationUtil.backfillRelationshipFqnHashes(handle);
} catch (Exception e) {
LOG.error(
"Failed to backfill FQN hashes in entity_relationship during v1130 migration. "
+ "Fast prefix deletion may not work correctly for pre-existing relationships.",
e);
}

mohityadav766 and others added 23 commits April 11, 2026 01:16
Adds 11 unit tests covering the correlated-subquery backfill logic
introduced in MigrationUtil.backfillRelationshipFqnHashes():

- MigrationUtilTest: 7 tests verifying empty entity list no-ops,
  fqnHash/nameHash entity updates, unrecognised hash column skip,
  time-series entity exception swallowing, multi-entity failure
  isolation, and correct CAST SQL generation
- mysql/v1130/MigrationTest: 2 tests verifying runDataMigration
  delegates to backfillRelationshipFqnHashes and swallows errors
- postgres/v1130/MigrationTest: same 2 tests for Postgres variant

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PrefixDeletionIT - three integration tests verifying the
DELETE /v1/services/databaseServices/prefix/{id} endpoint:
- single service with 5 tables is fully erased (service, database,
  schema, all tables return 404 after deletion)
- service with 2×2 databases/schemas and 12 tables all erased
- empty service (no databases) is deleted cleanly

PrefixDeletionBenchmarkIT - disabled manual benchmark tagged
@benchmark that creates equal hierarchies and times:
- old: DELETE /{id}?hardDelete=true&recursive=true
- new: DELETE /prefix/{id}
Logs speedup factor; configurable table count via -Dtest.scenario=N.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously only DatabaseService had DELETE /prefix/{id}. Added the same
endpoint to DatabaseResource and DatabaseSchemaResource so callers can
bulk-hard-delete at any level of the service hierarchy:

  DELETE /v1/services/databaseServices/prefix/{id}  (existing)
  DELETE /v1/databases/prefix/{id}                  (new)
  DELETE /v1/databaseSchemas/prefix/{id}            (new)

All three delegate to EntityResource.deletePrefixHardById() which runs
PrefixDeletionService.deletePrefixHard() — same FQN hash prefix logic,
same async execution, same auth checks.

Updated PrefixDeletionIT with two new integration tests that verify the
sibling-isolation guarantee: deleting a database leaves sibling databases
intact, and deleting a schema leaves sibling schemas and their tables intact.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously DELETE /prefix/{id} was only wired in DatabaseServiceResource
(and manually added to Database/DatabaseSchema). Adding it to each of the
60+ entity resource subclasses individually would be maintenance-heavy and
error-prone.

JAX-RS inherits endpoint annotations from parent classes when the
subclass method has no annotations of its own, so adding @DELETE
@Path("/prefix/{id}") once to EntityResource.deletePrefixHardById()
exposes the endpoint on every entity resource automatically:

  DELETE /v1/databases/prefix/{id}
  DELETE /v1/databaseSchemas/prefix/{id}
  DELETE /v1/tables/prefix/{id}
  DELETE /v1/services/databaseServices/prefix/{id}
  DELETE /v1/glossaries/prefix/{id}
  ... (every EntityResource subclass)

Removed the now-redundant overrides from DatabaseServiceResource,
DatabaseResource, and DatabaseSchemaResource.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The prefix delete endpoint returns 202 immediately and processes the
deletion in a background thread. Asserting right after the response
would always race against the job and produce false failures.

Replace assertEntityGone() (which asserted synchronously) with
awaitGone() backed by Awaitility — polls every 1s for up to 30s until
the GET throws, matching the async job lifecycle.

Also separate the mixed descendant ID list into typed lists (dbIds,
schemaIds, tableIds) so each is fetched via the correct SDK client.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rvice is not null

LockManagerInitializer.initialize() was never called, leaving
PrefixDeletionService.getInstance() returning null. Any call to
DELETE /prefix/{id} would immediately NullPointerException in
EntityResource.deletePrefixHardById().

Call LockManagerInitializer.initialize() in OpenMetadataApplication.run()
after Entity.initializeRepositories() so that both CollectionDAO and all
entity repositories are available when the lock manager and
PrefixDeletionService are wired up.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nflicts

The old code called DatabaseTestFactory.create(ns, ...) and
DatabaseSchemaTestFactory.create(ns, ...) inside loops — both always
produced ns.prefix("db") / ns.prefix("schema"), so the second iteration
hit a 409 ConflictException: Entity already exists.

Fix:
- Add DatabaseTestFactory.createWithName(ns, serviceFqn, baseName) and
  DatabaseSchemaTestFactory.createWithName(ns, databaseFqn, baseName) so
  callers can provide a distinct base name that gets namespaced.
- Rewrite every loop in PrefixDeletionIT and PrefixDeletionBenchmarkIT
  to pass a loop-index-qualified base name ("db0", "db1", "sc00", ...).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…on time

- Default topology changed to 5 × 5 × 400 = 10,000 tables per service
  (~10,031 total entities), all configurable via -Dtest.databases/schemas/tables
- timeNewDelete now polls with Awaitility until the service is actually gone
  so the reported elapsed time covers real deletion, not just the 202 handoff
- Log messages improved to show total entity count and seeding time estimate
- Removed unused HashMap/Map imports, replaced with Duration/UUID/Awaitility

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace sequential triple-nested loop with a 3-phase parallel approach:
1. All databases created concurrently
2. All schemas created concurrently (after databases resolve)
3. All tables created concurrently (after schemas resolve)

Default 32 threads (tunable via -Dtest.seedThreads). At ~50ms/REST call
and 32 threads, 10k tables seed in ~20s instead of ~8 min sequentially.
Also logs per-entity average seed time for diagnostics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…greSQL

The rest of the codebase uses VARCHAR(36) for UUID columns in PostgreSQL.
BindUUID binds UUID values as strings (via UUID.toString()), which works
with VARCHAR(36) but fails with native UUID columns because PostgreSQL
rejects the implicit text = uuid comparison:

  Hint: No operator matches the given name and argument types.

Changing id and entityId to VARCHAR(36) makes the schema consistent with
every other entity table and removes the type cast error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…greSQL

The entity_deletion_lock table was created in 1.9.0 with native UUID columns
for id and entityId. The rest of the codebase uses VARCHAR(36) for all UUID
columns so that BindUUID (which binds via UUID.toString()) can compare them
without an explicit type cast.

Without this, PostgreSQL rejects the WHERE clause in DeletionLockDAO:
  Hint: No operator matches the given name and argument types.
  You might need to add explicit type casts.

ALTER TABLE migrates existing rows in place — UUID::VARCHAR produces the
standard 8-4-4-4-12 string representation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…greSQL

The entity_deletion_lock table was created in 1.9.0 with native UUID columns
for id and entityId. The rest of the codebase uses VARCHAR(36) for all UUID
columns so that BindUUID (which binds via UUID.toString()) can compare them
without an explicit type cast.

Without this, PostgreSQL rejects the WHERE clause in DeletionLockDAO:
  Hint: No operator matches the given name and argument types.
  You might need to add explicit type casts.

ALTER TABLE migrates existing rows in place — UUID::VARCHAR produces the
standard 8-4-4-4-12 string representation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PostgreSQL limits prepared statements to 65,535 parameters. With 10k+
entities each having multiple extension rows, deleteAllBatch was
generating statements with 100k+ parameters and failing with:

  PSQLException: PreparedStatement can have at most 65,535 parameters.

Wrap both EntityExtensionDAO.deleteAllBatch and UsageDAO.deleteBatch
with default interface methods that chunk into 50,000-ID slices before
delegating to the raw @SqlUpdate methods. 50k is safely below the limit
while keeping the number of round trips minimal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
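The chunking wrapper described in this commit is essentially list partitioning. A hedged sketch (the 50,000 constant comes from the commit message; the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BatchChunker {
    // Safely below PostgreSQL's 65,535 prepared-statement parameter cap.
    static final int MAX_CHUNK = 50_000;

    // Partition ids into slices of at most MAX_CHUNK, so each delegated
    // statement stays under the parameter limit with few round trips.
    static <T> List<List<T>> chunks(List<T> ids) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += MAX_CHUNK) {
            out.add(ids.subList(i, Math.min(i + MAX_CHUNK, ids.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ids = Collections.nCopies(120_000, 1);
        System.out.println(chunks(ids).size()); // 3
    }
}
```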
The codebase pattern is: EntityResource provides the implementation with
no JAX-RS annotations; each subclass explicitly declares the endpoint with
@DELETE/@Path/@Operation so it appears in the Swagger UI and is routed
correctly by JAX-RS.

- Strip @DELETE/@Path/@Operation from EntityResource.deletePrefixHardById
  so it becomes a plain implementation method (same as delete/deleteByIdAsync)
- Add explicit @DELETE /prefix/{id} override in DatabaseServiceResource
  that calls the base implementation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…asses

Add the hard-delete-by-FQN-prefix endpoint to all 69 EntityResource
subclasses following the established pattern: each resource class
explicitly declares @DELETE @Path("/prefix/{id}") with entity-specific
@Operation metadata and delegates to EntityResource.deletePrefixHardById.

Resources covered:
- 13 service resources (database, storage, messaging, pipeline, ml,
  metadata, search, security, mcp, llm, drive, dashboard, api)
- Database hierarchy (database, schema, table, stored procedure)
- Data entities (topic, container, search index, pipeline, ml model,
  dashboard, chart, dashboard data model, api collection, api endpoint)
- Team/user/role (team, user, role, persona, policy, bot)
- Classification/taxonomy (classification, tag, glossary, glossary term,
  domain, data product)
- Tests/quality (test suite, test case, test definition, test connection
  definition, data contract, kpi, metric, query)
- Apps/workflows (app, marketplace, workflow, workflow definition,
  ingestion pipeline, event subscription, notification template)
- AI (ai application, ai governance policy, llm model, mcp server,
  prompt template)
- Drives (directory, file, spreadsheet, worksheet)
- Misc (type, report, doc store, data insight chart, system chart,
  web analytic event, learning resource)

Also adds missing imports (DELETE, PathParam, Parameter, ApiResponse,
Schema) to the 5 files that lacked them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DataInsightSystemChartResource was missing java.util.UUID and
TestConnectionDefinitionResource was missing jakarta.ws.rs.core.Response,
both required by the new deletePrefixHardById endpoint. Spotted during
mvn clean package compilation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The 70 subclass overrides were calling deletePrefixHardById(uriInfo,
securityContext, id) which resolved to the overriding method itself
(infinite recursion), not the base class implementation. This caused:

  java.lang.StackOverflowError
    at DatabaseSchemaResource.deletePrefixHardById(...)
    at DatabaseSchemaResource.deletePrefixHardById(...)
    ...

Changed all 70 call sites to super.deletePrefixHardById(...) so they
delegate to EntityResource's implementation as intended.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r limit

With 72k+ entity IDs from a large service hierarchy, findByEntityIds was
generating a prepared statement with 72,157 parameters, exceeding
PostgreSQL's 65,535 limit:

  PSQLException: PreparedStatement can have at most 65,535 parameters.

Same fix as EntityExtensionDAO and UsageDAO: rename the raw @SqlQuery
to findByEntityIdsChunk and wrap it with a default method that collects
results across 50,000-ID slices.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After bulk-deleting entities from the DB, call
SearchRepository.deleteByEntityTypeFqnPrefix for the root entity type
and every descendant type found during the scan. This removes all
matching documents from each type's ES index using a single FQN prefix
wildcard query per index, preventing deleted entities from continuing
to appear in search results.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…letion

Previously only the root entity's preDelete hook ran. This adds two new
extension points to EntityRepository:

- preDeleteByFqnHashPrefix(fqnHashPrefix, deletedBy): called before DB
  deletion for each entity type; entities are still in the DB so external
  systems can be cleaned up. IngestionPipelineRepository overrides this to
  call pipelineServiceClient.deletePipeline() for every matching Airflow
  pipeline and removes pipeline status time-series records.

- postDeleteByFqnHashPrefix(fqnHashPrefix): called after DB deletion for
  post-deletion metadata work that does not require loading deleted entities.

PrefixDeletionService.runEntityHooks now invokes preDeleteByFqnHashPrefix
for every entity type (alongside the existing deleteTimeSeriesByFqnPrefix
loop), and a new runPostEntityHooks phase fires postDeleteByFqnHashPrefix
for every type after the entity tables are cleared.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds two new extension points to EntityRepository:
- preDeleteByFqnHashPrefix(fqnHashPrefix, deletedBy): runs before DB
  deletion; entities still exist so external systems can be cleaned up
- postDeleteByFqnHashPrefix(fqnHashPrefix): runs after DB deletion for
  metadata-only post-deletion work

Concrete overrides added:
- IngestionPipelineRepository: calls Airflow pipelineServiceClient.deletePipeline()
  and removes pipeline status time-series records for each matching pipeline
- DataContractRepository: deletes the linked TestSuite for each matching
  contract that has quality expectations (prevents orphaned TestSuite entities)
- WorkflowDefinitionRepository: calls WorkflowHandler.deleteWorkflowDefinition()
  for each matching workflow definition
- WorkflowRepository: calls SecretsManagerFactory.deleteSecretsFromWorkflow()
  for each matching workflow (prevents orphaned secrets)

Also adds dao.entityExtensionTimeSeriesDao().deleteByFqnHashPrefix() to
deleteDependencyTables so all entity_extension_time_series rows (pipeline
status, data contract results, etc.) are bulk-cleaned in one pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
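The two extension points follow a template-method shape: default no-ops on the base repository, overridden by repositories that must clean up external systems. A minimal sketch under assumed simplifications (hook signatures from the commit message; everything else illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class DeletionHooks {
    // Base repository with the two no-op extension points.
    static class Repository {
        void preDeleteByFqnHashPrefix(String fqnHashPrefix, String deletedBy) {}
        void postDeleteByFqnHashPrefix(String fqnHashPrefix) {}
    }

    // Concrete repository overriding both hooks, e.g. to tear down external
    // pipelines while their metadata rows still exist.
    static class IngestionRepo extends Repository {
        final List<String> calls = new ArrayList<>();

        @Override
        void preDeleteByFqnHashPrefix(String fqnHashPrefix, String deletedBy) {
            calls.add("pre:" + fqnHashPrefix); // entities still in the DB here
        }

        @Override
        void postDeleteByFqnHashPrefix(String fqnHashPrefix) {
            calls.add("post:" + fqnHashPrefix); // entities gone; metadata-only work
        }
    }

    static void deleteSubtree(Repository repo, String fqnHashPrefix) {
        repo.preDeleteByFqnHashPrefix(fqnHashPrefix, "admin");
        // ... the FQN-prefix bulk DB delete would run here ...
        repo.postDeleteByFqnHashPrefix(fqnHashPrefix);
    }

    public static void main(String[] args) {
        IngestionRepo repo = new IngestionRepo();
        deleteSubtree(repo, "svcHash");
        System.out.println(repo.calls); // [pre:svcHash, post:svcHash]
    }
}
```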
…tion

MySQL 8.x does not support ADD COLUMN IF NOT EXISTS (that is a MariaDB
extension). Replace with the SET @SQL / PREPARE / EXECUTE / DEALLOCATE
pattern already used elsewhere in this migration file to conditionally
add fromFQNHash and toFQNHash columns to entity_relationship.

CREATE INDEX IF NOT EXISTS is supported by MySQL 8.0+ so those lines
are left unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 104 out of 104 changed files in this pull request and generated 7 comments.

Comment on lines +198 to +205
default List<UUID> findIdsByFqnHashPrefix(String fqnHashPrefix) {
if (!"fqnHash".equals(getNameHashColumn())) {
return List.of();
}
return findIdsByFqnHashPrefixInternal(getTableName(), "fqnHash", fqnHashPrefix + ".%").stream()
.map(UUID::fromString)
.toList();
}

Copilot AI Apr 10, 2026

findIdsByFqnHashPrefix currently returns IDs only when getNameHashColumn() equals "fqnHash", but many entity tables use nameHash while still storing the FQN hash via @BindFQN (i.e., FullyQualifiedName.buildHash). This means prefix deletion will miss descendants for those entity types (e.g., teams/sub-teams) even though the hashes are prefix-searchable. Consider using getNameHashColumn() as the column to LIKE against (or allow both fqnHash and nameHash).

Comment on lines 45 to 53
// Set it on EntityRepository
EntityRepository.setLockManager(lockManager);

// Initialize PrefixDeletionService with the same lock manager
PrefixDeletionService.initialize(lockManager);

initialized = true;
LOG.info("Hierarchical lock manager initialized successfully");

Copilot AI Apr 10, 2026

PrefixDeletionService.initialize(lockManager) is only invoked on the successful lock-manager path. If initialization fails, PrefixDeletionService.getInstance() can remain null and prefix-delete endpoints will NPE. To keep prefix deletion working even without locking, initialize PrefixDeletionService in the failure/skip paths as well (e.g., with a null lock manager) or make getInstance() lazily create a no-lock instance.

Comment on lines +2181 to +2187
public void deleteByEntityTypeFqnPrefix(String entityType, String fqnPrefix) {
if (!checkIfIndexingIsSupported(entityType)) {
return;
}
if (!getSearchClient().isClientAvailable()) {
return;
}

Copilot AI Apr 10, 2026

deleteByEntityTypeFqnPrefix returns early when the search client is unavailable, which can leave stale documents with no retry path. Other delete methods enqueue to SearchIndexRetryQueue when ES/OS is down; consider doing the same here (at least for the root delete request) so prefix deletions eventually clean up search indexes.

Comment on lines +528 to +532
operationId = "deleteApiServicePrefixHard",
summary = "Hard-delete a API service and all descendants by FQN prefix",
description =
"Bulk hard-delete this API service and all descendants whose FQN starts with this "
+ "entity's FQN. Significantly faster than recursive delete for large hierarchies.",

Copilot AI Apr 10, 2026

Grammar: the OpenAPI summary/description uses "Hard-delete a API service"; since "API" starts with a vowel sound, this should be "an API service" (same for the description string).

Comment on lines +927 to +931
operationId = "deleteAppPrefixHard",
summary = "Hard-delete a app and all descendants by FQN prefix",
description =
"Bulk hard-delete this app and all descendants whose FQN starts with this "
+ "entity's FQN. Significantly faster than recursive delete for large hierarchies.",

Copilot AI Apr 10, 2026

Grammar: the OpenAPI summary/description uses "Hard-delete a app"; this should be "an app" (and likewise in the description) for correct English in generated API docs.

Comment on lines +910 to +914
operationId = "deleteIngestionPipelinePrefixHard",
summary = "Hard-delete a ingestion pipeline and all descendants by FQN prefix",
description =
"Bulk hard-delete this ingestion pipeline and all descendants whose FQN starts with this "
+ "entity's FQN. Significantly faster than recursive delete for large hierarchies.",

Copilot AI Apr 10, 2026

Grammar: the OpenAPI summary/description uses "Hard-delete a ingestion pipeline"; this should be "an ingestion pipeline" (and likewise in the description) for correct English in generated API docs.

Comment on lines +567 to +571
operationId = "deleteEventSubscriptionPrefixHard",
summary = "Hard-delete a event subscription and all descendants by FQN prefix",
description =
"Bulk hard-delete this event subscription and all descendants whose FQN starts with this "
+ "entity's FQN. Significantly faster than recursive delete for large hierarchies.",

Copilot AI Apr 10, 2026

Grammar: the OpenAPI summary/description uses "Hard-delete a event subscription"; this should be "an event subscription" (and likewise in the description) for correct English in generated API docs.

mohityadav766 and others added 2 commits April 11, 2026 22:38
The internal migration runner handles duplicate index checks.
Plain CREATE INDEX is simpler and works across all MySQL 8.x versions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove the information_schema conditional workarounds. The internal
migration runner handles idempotency, so plain DDL is sufficient.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 104 out of 104 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (2)

openmetadata-service/src/main/java/org/openmetadata/service/initialization/LockManagerInitializer.java:57

  • If CollectionDAO is unavailable (or initialization throws), PrefixDeletionService is never initialized, but the new /prefix/{id} endpoints still exist and will NPE when invoked. Consider initializing PrefixDeletionService even when the lock manager can't be constructed (e.g., with null lock manager), or exposing an availability check so the REST layer can return a clear error instead of accepting the job and failing later.
      try {
        LOG.info("Initializing hierarchical lock manager for entity deletion optimization");

        // Get the collection DAO
        var collectionDAO = Entity.getCollectionDAO();
        if (collectionDAO == null) {
          LOG.warn("CollectionDAO not available, skipping lock manager initialization");
          return;
        }

        // Initialize the lock manager
        HierarchicalLockManager lockManager =
            new HierarchicalLockManager(collectionDAO.deletionLockDAO());

        // Set it on EntityRepository
        EntityRepository.setLockManager(lockManager);

        // Initialize PrefixDeletionService with the same lock manager
        PrefixDeletionService.initialize(lockManager);

        initialized = true;
        LOG.info("Hierarchical lock manager initialized successfully");

      } catch (Exception e) {
        LOG.error("Failed to initialize hierarchical lock manager: {}", e.getMessage(), e);
        // Continue without locking for backward compatibility
      }

bootstrap/sql/migrations/native/1.13.0/mysql/schemaChanges.sql:142

  • Adding new DDL to an older, already-shipped migration version (1.13.0) is unsafe for upgrades: databases that have already applied 1.13.0 will not re-run this file, so they’ll never get the new fromFQNHash/toFQNHash columns/indexes required by prefix deletion. Please introduce this DDL in the latest migration version (and keep 1.13.0 immutable) so upgrades are deterministic.
-- Add FQN hash columns to entity_relationship to enable fast prefix-based bulk deletion.
-- This allows deleting all relationships for an entire entity subtree in a single indexed query
-- instead of walking the tree entity-by-entity.
ALTER TABLE entity_relationship
    ADD COLUMN fromFQNHash VARCHAR(768) DEFAULT NULL,
    ADD COLUMN toFQNHash   VARCHAR(768) DEFAULT NULL;

CREATE INDEX idx_er_from_fqn_hash ON entity_relationship (fromFQNHash(768));
CREATE INDEX idx_er_to_fqn_hash   ON entity_relationship (toFQNHash(768));

Comment on lines +199 to +202
if (!"fqnHash".equals(getNameHashColumn())) {
return List.of();
}
return findIdsByFqnHashPrefixInternal(getTableName(), "fqnHash", fqnHashPrefix + ".%").stream()

Copilot AI Apr 11, 2026

findIdsByFqnHashPrefix currently bails out unless getNameHashColumn() is exactly "fqnHash". Several entity tables use nameHash while still hashing the entity FQN (e.g., team_entity has only nameHash), so descendant collection for prefix deletion will silently miss those entity types. Consider querying against getNameHashColumn() (for both fqnHash and nameHash) or otherwise documenting/enforcing that only fqnHash-backed entities are eligible for prefix deletion.

Suggested change
if (!"fqnHash".equals(getNameHashColumn())) {
return List.of();
}
return findIdsByFqnHashPrefixInternal(getTableName(), "fqnHash", fqnHashPrefix + ".%").stream()
String nameHashColumn = getNameHashColumn();
if (!"fqnHash".equals(nameHashColumn) && !"nameHash".equals(nameHashColumn)) {
return List.of();
}
return findIdsByFqnHashPrefixInternal(getTableName(), nameHashColumn, fqnHashPrefix + ".%")
.stream()

Comment on lines 32 to 35
import jakarta.ws.rs.*;
import jakarta.ws.rs.DELETE;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.core.*;

Copilot AI Apr 11, 2026

Redundant imports: this file already imports jakarta.ws.rs.*, so explicitly importing jakarta.ws.rs.DELETE and jakarta.ws.rs.PathParam is unnecessary and can violate checkstyle/spotless rules. Please remove the redundant explicit imports.

Comment on lines 16 to 19
import jakarta.ws.rs.*;
import jakarta.ws.rs.DELETE;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.core.*;
Copilot AI Apr 11, 2026
Redundant imports: this file already imports jakarta.ws.rs.*, so explicitly importing jakarta.ws.rs.DELETE and jakarta.ws.rs.PathParam is unnecessary and can violate checkstyle/spotless rules. Please remove the redundant explicit imports.

Comment on lines +528 to +532
operationId = "deleteApiServicePrefixHard",
summary = "Hard-delete a API service and all descendants by FQN prefix",
description =
"Bulk hard-delete this API service and all descendants whose FQN starts with this "
+ "entity's FQN. Significantly faster than recursive delete for large hierarchies.",
Copilot AI Apr 11, 2026
Grammar in the OpenAPI summary: "Hard-delete a API service..." should be "Hard-delete an API service...".

Comment on lines +909 to +914
@Operation(
operationId = "deleteIngestionPipelinePrefixHard",
summary = "Hard-delete a ingestion pipeline and all descendants by FQN prefix",
description =
"Bulk hard-delete this ingestion pipeline and all descendants whose FQN starts with this "
+ "entity's FQN. Significantly faster than recursive delete for large hierarchies.",
Copilot AI Apr 11, 2026
Grammar in the OpenAPI summary: "Hard-delete a ingestion pipeline..." should be "Hard-delete an ingestion pipeline...".

Comment on lines +154 to +163
-- Add FQN hash columns to entity_relationship to enable fast prefix-based bulk deletion.
-- This allows deleting all relationships for an entire entity subtree in a single indexed query
-- instead of walking the tree entity-by-entity.
ALTER TABLE entity_relationship
ADD COLUMN IF NOT EXISTS fromFQNHash VARCHAR(768) DEFAULT NULL,
ADD COLUMN IF NOT EXISTS toFQNHash VARCHAR(768) DEFAULT NULL;

CREATE INDEX IF NOT EXISTS idx_er_from_fqn_hash ON entity_relationship (fromFQNHash);
CREATE INDEX IF NOT EXISTS idx_er_to_fqn_hash ON entity_relationship (toFQNHash);

Copilot AI Apr 11, 2026
Adding new DDL to an older, already-shipped migration version (1.13.0) is unsafe for upgrades: databases that have already applied 1.13.0 will not re-run this file, so they’ll never get the new fromFQNHash/toFQNHash columns/indexes required by prefix deletion. Please introduce this DDL in the latest migration version (and keep 1.13.0 immutable) so upgrades are deterministic.

Suggested change
-- Add FQN hash columns to entity_relationship to enable fast prefix-based bulk deletion.
-- This allows deleting all relationships for an entire entity subtree in a single indexed query
-- instead of walking the tree entity-by-entity.
ALTER TABLE entity_relationship
ADD COLUMN IF NOT EXISTS fromFQNHash VARCHAR(768) DEFAULT NULL,
ADD COLUMN IF NOT EXISTS toFQNHash VARCHAR(768) DEFAULT NULL;
CREATE INDEX IF NOT EXISTS idx_er_from_fqn_hash ON entity_relationship (fromFQNHash);
CREATE INDEX IF NOT EXISTS idx_er_to_fqn_hash ON entity_relationship (toFQNHash);

@gitar-bot

gitar-bot bot commented Apr 13, 2026

Code Review 🚫 Blocked 6 resolved / 8 findings

Fast FQN prefix-based bulk deletion implementation fixes race conditions but is blocked by a critical lock failure that silently ignores contention and allows unprotected deletion to proceed, plus a minor redundant search index operation for the root entity type.

🚨 Bug: Lock failure silently ignored — deletion proceeds unprotected

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PrefixDeletionService.java:84-97 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PrefixDeletionService.java:68-81

The acquireLock method (line 84-97) catches all exceptions (including EntityLockedException thrown on contention) and returns null. The caller deletePrefixHard (line 71) never checks whether the lock was actually acquired — it proceeds with all 7 deletion phases regardless. This completely defeats the stated goal of eliminating race conditions with concurrent ingestion.

If the lock cannot be acquired (e.g., another deletion is in progress, or the lock manager is unavailable), the deletion runs without protection, allowing ingestion to create new entities in the subtree while it's being deleted — exactly the orphan scenario this PR aims to fix.

Suggested fix
private DeletionLock acquireLock(EntityInterface entity, String deletedBy) {
  if (lockManager == null) {
    throw new IllegalStateException("Lock manager not initialized");
  }
  // Let EntityLockedException propagate to the caller
  return lockManager.acquireDeletionLock(entity, deletedBy, true);
}
💡 Performance: Redundant search index delete for root entity type

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PrefixDeletionService.java:175-186

cleanSearchIndex calls deleteByEntityTypeFqnPrefix for rootType explicitly, then iterates descendantTypes which already includes the root type (since collectDescendantIds uses findIdsByFqnHashPrefix and the root entity's FQN matches its own prefix). This results in two identical delete-by-prefix calls to the search backend for the root entity type.

Not a correctness bug (the call is idempotent), but it's a wasted round-trip to the search cluster.

Suggested fix
private void cleanSearchIndex(EntityInterface rootEntity, Set<String> descendantTypes) {
    SearchRepository searchRepo = Entity.getSearchRepository();
    if (searchRepo == null) {
      return;
    }
    String rootFqn = rootEntity.getFullyQualifiedName();
    // descendantTypes already includes the root's own type
    for (String entityType : descendantTypes) {
      searchRepo.deleteByEntityTypeFqnPrefix(entityType, rootFqn);
    }
  }
✅ 6 resolved
Performance: Migration backfill does row-by-row UPDATEs on entity_relationship

📄 openmetadata-service/src/main/java/org/openmetadata/service/migration/utils/v1141/MigrationUtil.java:75-89 📄 openmetadata-service/src/main/java/org/openmetadata/service/migration/utils/v1141/MigrationUtil.java:62-76
MigrationUtil.backfillRow() issues 2 individual UPDATE statements per entity row (one for fromFQNHash, one for toFQNHash). For deployments with millions of entities, this results in millions of individual UPDATE roundtrips. Consider using batch UPDATE with JOIN to update many rows at once, e.g.:

UPDATE entity_relationship er
JOIN <entity_table> e ON er.fromId = e.id
SET er.fromFQNHash = e.fqnHash
WHERE er.fromFQNHash IS NULL

This would reduce millions of queries to one per entity type.

Bug: Async prefix delete test has no wait — assertions are racy

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/PrefixDeletionIT.java:68-76 📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/PrefixDeletionIT.java:96-102 📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/PrefixDeletionBenchmarkIT.java:124-135
The prefix delete endpoint returns 202 Accepted and performs deletion asynchronously via an ExecutorService. However, all three tests in PrefixDeletionIT immediately assert that entities are gone after prefixDeleteService() returns, with no polling, retry, or sleep mechanism. This will cause flaky test failures whenever the background deletion task hasn't completed by the time assertions run.

The benchmark test (PrefixDeletionBenchmarkIT) has the same issue — timeNewDelete() measures only the HTTP round-trip time (always ~milliseconds for a 202), not the actual deletion duration, making the benchmark meaningless.
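The fix this finding implies is to poll for completion after the 202 response instead of asserting immediately. A minimal sketch of such a polling helper (hypothetical `AwaitUtil`; in the real tests a library like Awaitility would serve the same purpose):

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public class AwaitUtil {
    // Poll the condition at a fixed interval until it holds or the timeout elapses.
    static boolean await(BooleanSupplier condition, Duration timeout, Duration interval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(interval.toMillis());
        }
        return condition.getAsBoolean();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // The condition flips true ~50ms in, simulating the async deletion finishing.
        boolean done = await(() -> System.currentTimeMillis() - start > 50,
                Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println(done);
    }
}
```

In the tests, the condition would be "fetching each descendant id returns 404"; the benchmark should time until that condition holds, not the HTTP round-trip.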

Bug: Test uses tables().get() for databases and schemas, masking failures

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/PrefixDeletionIT.java:99-102
In prefixDelete_serviceWithMultipleDatabasesAndSchemas, the allDescendantIds list contains IDs of databases, schemas, and tables. But line 101 calls tables().get(finalId.toString()) for every ID. Fetching a Database ID via the Tables API will always throw a 404/not-found exception regardless of whether the entity was actually deleted. This means the assertion passes even if databases and schemas weren't deleted, making the test ineffective for verifying cross-type bulk deletion.

Bug: SLF4J does not support {:.2f} — speedup prints as literal text

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/PrefixDeletionBenchmarkIT.java:89
SLF4J only recognizes {} as a placeholder. The {:.2f} on line 89 is Python/Rust format syntax and will be printed literally (e.g., Speedup : {:.2f}x) instead of formatting the double to 2 decimal places. Lines 86-88 correctly use {}.
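Because SLF4J's `{}` carries no format specification, the value has to be pre-formatted before logging. A minimal sketch (`Locale.ROOT` avoids locale-dependent decimal separators):

```java
import java.util.Locale;

public class SpeedupFormat {
    public static void main(String[] args) {
        double speedup = 12.3456;
        // SLF4J substitutes {} verbatim, so format the number first, e.g.:
        // LOG.info("Speedup : {}x", String.format(Locale.ROOT, "%.2f", speedup));
        System.out.println("Speedup : " + String.format(Locale.ROOT, "%.2f", speedup) + "x");
    }
}
```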

Bug: Prefix deletion misses relationships with NULL FQN hashes

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:4893-4898 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:1585-1588 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PrefixDeletionService.java:128 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PrefixDeletionService.java:126-135 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:4896-4898 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PrefixDeletionService.java:172-181 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PrefixDeletionService.java:154-168 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PrefixDeletionService.java:172-180 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PrefixDeletionService.java:154-165 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java:6186-6187
Only 3 out of ~45 addRelationship call sites were updated to pass FQN strings (addServiceRelationship, TableRepository.storeRelationships, DatabaseSchemaRepository.storeRelationships). The remaining ~42 call sites across 24 repositories still use the old 7-parameter overload, which passes null for fromFQNHash/toFQNHash. While the migration backfills existing rows, any NEW relationships created after migration through the unchanged code paths will have NULL hash columns. deleteAllByFqnHashPrefix() will not match these rows, leaving orphaned relationships — the exact problem this PR aims to solve.

Consider either: (a) updating the default insert() overload to look up FQN hashes from entity tables at insert time, or (b) updating all addRelationship call sites, or (c) adding a trigger/interceptor that populates the hash columns on INSERT when they're NULL.

...and 1 more resolved from earlier reviews




Labels

backend safe to test Add this label to run secure Github workflows on PRs
