Skip to content

Fixes #4003: bulk + async restore for large entity hierarchies#27997

Merged
harshach merged 48 commits into
mainfrom
harshach/review-issue-4004
May 21, 2026
Merged

Fixes #4003: bulk + async restore for large entity hierarchies#27997
harshach merged 48 commits into
mainfrom
harshach/review-issue-4004

Conversation

@harshach
Copy link
Copy Markdown
Collaborator

@harshach harshach commented May 8, 2026

Describe your changes:

Fixes #4003

EntityRepository.restoreEntity walked the descendant tree synchronously inside one HTTP request — on a database with 12k tables / 1.1k stored procedures the user reported 4m37s response times that exceeded typical proxy / ALB idle timeouts. This PR replaces the per-entity recursion with an iterative bulk path and adds a server-side async option for hierarchies large enough that the synchronous response would still hit timeouts.

Type of change:

  • Bug fix
  • Improvement

High-level design:

Bulk synchronous path. restoreChildren now groups CONTAINS children by entity type and dispatches a single bulkRestoreSubtree(ids, updatedBy) per type rather than looping Entity.restoreEntity per descendant. bulkRestoreSubtree reuses the existing deferred-store bulk infrastructure (updateMany, entityExtensionDAO.insertMany, invalidateMany, insertChangeEventsBatch) for one batched DB write per type. Per-descendant ES writes go through EntityLifecycleEventDispatcher.onEntitiesUpdated so SearchIndexHandler.updateEntitiesIndex picks up HAS-style children (charts) that the top-level cascade query doesn't reach. Subclass extension hooks restoreAdditionalChildren / softDeleteAdditionalChildren are provided for repos that link non-CONTAINS related entities.

Symmetric bulk soft-delete. bulkSoftDeleteSubtree mirrors the restore path — same shape, same hook contract — and is dispatched from deleteChildren(List, hardDelete=false, …).

Async option. New EntityResource.restoreEntityAsync(...) returns 202 Accepted with a RestoreEntityResponse(jobId, message) and runs the restore on AsyncService. AsyncService is now a one-line wrapper around Executors.newThreadPerTaskExecutor(virtual) — back-pressure under load comes from the JDBI connection pool (with its own timeout), not from a custom semaphore wrapper. Completion / failure is broadcast on the RESTORE_ENTITY_CHANNEL WebSocket channel; the user id is resolved on the request thread before submitting so the notification doesn't depend on the request-scoped SecurityContext. The single-arg restoreEntity(uriInfo, sec, id) helper reads ?async=true directly from uriInfo, so SDK callers get universal async support regardless of which Resource subclass forwarded the parameter.

SDK updates. Java SDK adds an AsyncJobResponse model and EntityServiceBase.restoreServerAsync(id); new fluent builders on Tables and Databases give Tables.find(id).restore().execute() (sync, returns Table) and Tables.find(id).restore().async().execute() (returns AsyncJobResponse) with type-safe terminal operations. Python SDK mirrors this with Tables.restore_async(id) and the fluent Tables.restore_request(id).with_async().execute() plus an AsyncJobResponse dataclass. Both SDKs validate the response carries a non-null jobId and raise a clear error if not (in case an older server ignores the param).

Ingestion integration. Legacy OpenMetadata client gets delete_async / restore_async helpers. DeleteEntity carries a dispatch_async flag and MetadataRestSink.delete_entity routes to the async client method when set. delete_entity_from_source / delete_entity_by_name read the OM_INGESTION_DELETE_ASYNC env var by default so operators can flip an ingestion run to fire-and-forget without per-connector schema changes.

Tests:

Use cases covered

  • Synchronously restore a Database with 3 schemas / 12 tables and verify all descendants come back un-deleted.
  • Async-restore the same hierarchy via ?async=true, observe 202 + jobId, then poll until the database flips to deleted=false.
  • restoreChildren correctly groups mixed-type children (DatabaseSchema + StoredProcedure) and dispatches one bulkRestoreSubtree per type with no per-entity Entity.restoreEntity calls.
  • bulkRestoreSubtree is a no-op for null / empty IDs; when entities exist but none are deleted, the additional-children hook still fires (re-entered cascade reconciliation).
  • Java SDK Tables.find(id).restore().async().execute() and Databases.find(id).restore().async().execute() route to restoreServerAsync; without .async() they go to the sync path.
  • Python SDK Tables.restore_async(id) appends ?async=true, returns an AsyncJobResponse; the fluent restore_request(id).with_async().execute() does the same.
  • Legacy OpenMetadata.delete_async / restore_async URL shape coverage.

Unit tests

  • I added unit tests for the new/changed logic.
  • Files added/updated:
    • openmetadata-service/src/test/java/.../jdbi3/EntityRepositoryRestoreTest.java
    • openmetadata-service/src/test/java/.../util/AsyncServiceTest.java
    • openmetadata-sdk/src/test/java/.../fluent/RestoreFluentAPITest.java
    • ingestion/tests/unit/sdk/test_restore_async.py
    • ingestion/tests/unit/test_ometa_restore.py

Backend integration tests

  • I added integration tests in openmetadata-integration-tests/ for new/changed API endpoints.
  • Files added/updated:
    • openmetadata-integration-tests/src/test/java/.../tests/RestoreHierarchyIT.java

Ingestion integration tests

  • Not applicable (covered by the existing mark-deletion suite; async-dispatch flag is feature-flagged behind the env var).

Playwright (UI) tests

  • Not applicable (no UI changes).

Manual testing performed

  1. mvn -pl openmetadata-service,openmetadata-sdk -DskipTests compile — clean build.
  2. mvn -pl openmetadata-integration-tests test-compile — clean build.
  3. mvn -pl openmetadata-service test -Dtest='EntityRepositoryRestoreTest,AsyncServiceTest' — 20/20 pass.
  4. mvn -pl openmetadata-sdk test -Dtest=RestoreFluentAPITest — 4/4 pass.

UI screen recording / screenshots:

Not applicable.

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • My PR is linked to a GitHub issue via Fixes #<issue-number> above.
  • I have added tests (unit / integration as applicable) and listed them above.
  • I have added a test that covers the exact scenario we are fixing (RestoreHierarchyIT builds the multi-level hierarchy and exercises both the bulk-sync and async paths).

🤖 Generated with Claude Code


Summary by Gitar

  • Feed Cleanup:
    • Introduced LinkedHashSet in FeedRepository.deleteByAbout to defensively de-duplicate thread IDs across chunks during legacy feed cleanup.
    • Added DISTINCT to CollectionDAO.findByEntityIds as a defensive measure to ensure unique thread ID retrieval regardless of future schema changes.

This will update automatically on new commits.

EntityRepository.restoreEntity walked descendants synchronously, taking
4+ minutes on a 12k-table database and exceeding typical proxy timeouts.
restoreChildren now groups CONTAINS children by type and dispatches one
bulkRestoreSubtree per type, batching DB writes, version history,
change events, and cache invalidation; the existing ES cascade handles
descendant index updates in one update_by_query.

Adds an async option (?async=true) on the deep-hierarchy restore
endpoints that returns 202 Accepted with a job id and runs the restore
on AsyncService, emitting WebSocket notifications on
restoreEntityChannel. Java SDK adds .restore().async().execute() fluent
builders on Tables/Databases plus restoreServerAsync on
EntityServiceBase; Python SDK mirrors this with
restore_request().with_async().execute() and restore_async() helpers
on BaseEntity, exposing a new AsyncJobResponse type.

Tests: EntityRepositoryRestoreTest verifies the per-type grouping and
bulk dispatch path; RestoreFluentAPITest covers the Java SDK fluent
behavior; RestoreHierarchyIT exercises sync and async restore against a
real DB→schemas→tables tree end-to-end; test_restore_async.py covers
the Python SDK paths.

Fixes #4003

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 8, 2026 18:18
@harshach harshach requested a review from a team as a code owner May 8, 2026 18:18
@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels May 8, 2026
Comment thread openmetadata-sdk/src/main/java/org/openmetadata/sdk/fluent/Tables.java Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses restore timeouts for large entity hierarchies by replacing per-entity recursive restores with a bulk restore path, and adds an optional server-side async restore mode that returns 202 Accepted and notifies completion via WebSockets. It also updates the Java and Python SDKs to expose the async restore option and adds unit/integration tests covering the new behavior.

Changes:

  • Implement bulk subtree restore (bulkRestoreSubtree) and type-grouped child restore dispatch to reduce DB/ES work per restore.
  • Add ?async=true restore option on deep-hierarchy endpoints, backed by AsyncService and a new restoreEntityChannel WebSocket notification channel.
  • Update Java/Python SDKs with async restore helpers + fluent builders, and add unit/integration tests for sync/async restore flows.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java Adds grouped restoreChildren and new bulkRestoreSubtree bulk restore implementation plus extension hook.
openmetadata-service/src/main/java/org/openmetadata/service/resources/EntityResource.java Adds restoreEntity(..., async) overload and server-side async restore execution + 202 response.
openmetadata-service/src/main/java/org/openmetadata/service/util/WebsocketNotificationHandler.java Adds restore completion/failure WebSocket notification helpers.
openmetadata-service/src/main/java/org/openmetadata/service/socket/WebSocketManager.java Introduces RESTORE_ENTITY_CHANNEL constant.
openmetadata-service/src/main/java/org/openmetadata/service/util/RestoreEntityResponse.java New 202 response DTO for async restore requests.
openmetadata-service/src/main/java/org/openmetadata/service/util/RestoreEntityMessage.java New WebSocket payload DTO for restore status notifications.
openmetadata-service/src/main/java/org/openmetadata/service/resources/databases/DatabaseResource.java Wires ?async=true into database restore endpoint + OpenAPI 202 response.
openmetadata-service/src/main/java/org/openmetadata/service/resources/databases/DatabaseSchemaResource.java Wires ?async=true into schema restore endpoint + OpenAPI 202 response.
openmetadata-service/src/main/java/org/openmetadata/service/resources/databases/TableResource.java Wires ?async=true into table restore endpoint + OpenAPI 202 response.
openmetadata-service/src/main/java/org/openmetadata/service/resources/databases/StoredProcedureResource.java Wires ?async=true into stored procedure restore endpoint + OpenAPI 202 response.
openmetadata-service/src/main/java/org/openmetadata/service/resources/services/database/DatabaseServiceResource.java Wires ?async=true into database service restore endpoint + OpenAPI 202 response.
openmetadata-service/src/main/java/org/openmetadata/service/resources/dashboards/DashboardResource.java Wires ?async=true into dashboard restore endpoint + OpenAPI 202 response.
openmetadata-service/src/main/java/org/openmetadata/service/resources/datamodels/DashboardDataModelResource.java Wires ?async=true into data model restore endpoint + OpenAPI 202 response.
openmetadata-service/src/main/java/org/openmetadata/service/resources/storages/ContainerResource.java Wires ?async=true into container restore endpoint + OpenAPI 202 response.
openmetadata-sdk/src/main/java/org/openmetadata/sdk/services/EntityServiceBase.java Adds restoreServerAsync(...) calling PUT /restore?async=true.
openmetadata-sdk/src/main/java/org/openmetadata/sdk/models/AsyncJobResponse.java New Java SDK model for 202 async job response.
openmetadata-sdk/src/main/java/org/openmetadata/sdk/fluent/Tables.java Adds fluent restore builder with .async().execute() for server-side async restore.
openmetadata-sdk/src/main/java/org/openmetadata/sdk/fluent/Databases.java Adds fluent restore builder with .async().execute() for server-side async restore.
ingestion/src/metadata/sdk/entities/base.py Adds Python AsyncJobResponse + restore fluent operation + restore_async.
openmetadata-service/src/test/java/org/openmetadata/service/jdbi3/EntityRepositoryRestoreTest.java Unit tests for grouped restore children + bulk restore no-op behaviors.
openmetadata-sdk/src/test/java/org/openmetadata/sdk/fluent/RestoreFluentAPITest.java Unit tests validating fluent restore routes to sync vs server-async SDK calls.
ingestion/tests/unit/sdk/test_restore_async.py Python unit tests for restore_async and fluent restore request behavior.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/RestoreHierarchyIT.java Integration test validating sync restore and async restore (202 + eventual restored state).
Comments suppressed due to low confidence (1)

openmetadata-service/src/main/java/org/openmetadata/service/resources/EntityResource.java:790

  • repository.restoreEntity(...) can return null when the entity is not in the deleted state (see EntityRepository.restoreEntity catch). The sync restore path immediately dereferences response.getEntity() which will throw NPE and return 500. Handle the null case explicitly (e.g., return 404/400 with a clear message) before calling restoreFromSearch/addHref.
    OperationContext operationContext =
        new OperationContext(entityType, MetadataOperation.EDIT_ALL);
    authorizer.authorize(securityContext, operationContext, getResourceContextById(id));
    PutResponse<T> response =
        repository.restoreEntity(securityContext.getUserPrincipal().getName(), id);
    repository.restoreFromSearch(response.getEntity());
    addHref(uriInfo, response.getEntity());
    LOG.info(

Comment thread openmetadata-sdk/src/main/java/org/openmetadata/sdk/fluent/Tables.java Outdated
Comment thread openmetadata-sdk/src/main/java/org/openmetadata/sdk/fluent/Databases.java Outdated
Comment thread ingestion/src/metadata/sdk/entities/base.py Outdated
…cade

Two follow-up improvements to the bulk restore introduced for #4003.

Batched findTo per tree level: bulkRestoreSubtree previously issued one
findTo per parent during recursion, which at the schemas → tables level
of a 12k-table database meant 12k DB round trips just to enumerate
children. The new bulkRestoreContainedChildren helper does one
findToBatchAllTypes per tree level regardless of fan-out, then groups
the results by child type and dispatches to each repo's
bulkRestoreSubtree. DashboardRepository's chart-restore logic moves
from the now-bypassed restoreChildren override to the existing
restoreAdditionalChildren extension hook so it still runs both for
direct dashboard restores and when dashboards are descendants of a
larger restore.

Symmetric bulk soft-delete cascade: deleteByName/deleteById had the
same per-entity recursion that this PR fixed for restore — soft-
deleting a database with 12k tables ran 12k recursive
Entity.deleteEntity calls, each writing one row + one ES update + one
change event. New bulkSoftDeleteSubtree mirrors bulkRestoreSubtree:
one batched findToBatchAllTypes per level, deferred-store DB writes,
batched version history, batched change events, batched cache
invalidation; per-descendant ES writes are skipped because the existing
deleteFromSearch cascade flips the deleted flag on descendant indexes
in one update_by_query. deleteChildren(List, hardDelete=false, ...)
now dispatches to the bulk path; hard-delete keeps the existing
batchDeleteChildren path. New softDeleteAdditionalChildren extension
hook mirrors restoreAdditionalChildren; DashboardRepository's chart
soft-delete migrates onto it for the same reason.

Tests: extends EntityRepositoryRestoreTest with cases that verify
findToBatchAllTypes is invoked exactly once per level (not once per
parent) for both bulk operations, plus the existing grouping/dispatch
shape for the soft-delete entry point. Extends RestoreHierarchyIT
with a recursive soft-delete cascade assertion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🟡 Playwright Results — all passed (12 flaky)

✅ 4146 passed · ❌ 0 failed · 🟡 12 flaky · ⏭️ 90 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 298 0 1 4
🟡 Shard 2 775 0 5 11
🟡 Shard 3 776 0 3 8
🟡 Shard 4 787 0 1 12
✅ Shard 5 773 0 0 47
🟡 Shard 6 737 0 2 8
🟡 12 flaky test(s) (passed on retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Features/DataQuality/ColumnLevelTests.spec.ts › Column Values Sum To Be Between (shard 2, 1 retry)
  • Features/DataQuality/ColumnLevelTests.spec.ts › Column Values Missing Count To Be Equal (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2, 1 retry)
  • Features/IncidentManager.spec.ts › Complete Incident lifecycle with table owner (shard 2, 1 retry)
  • Features/KnowledgeCenter.spec.ts › Article mentions in description should working for Knowledge Center (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Flow/ObservabilityAlerts.spec.ts › Table alert (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Pages/DomainUIInteractions.spec.ts › Rename subdomain via UI (shard 4, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/ODCSImportExport.spec.ts › Replace mode - replaces existing contract completely and verifies export (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mohityadav766
mohityadav766 previously approved these changes May 19, 2026
…eadsInBatch

mohityadav766 flagged that the new deleteByAbout(List<UUID>) overload was a
fake batch — it just looped over the single-entity deleteByAbout(UUID), which
itself looped over deleteThreadInternal(threadId) one thread at a time. So a
DatabaseService with N descendants × M threads each would still produce
N×M individual DELETEs against thread_entity / entity_relationship /
field_relationship.

Switch both overloads to a single batched path:

- Add CollectionDAO.FeedDAO.findByEntityIds(tableName, List<String>) so we
  pull every thread id for the whole input batch in one IN-list query.
- Reuse the existing deleteThreadsInBatch(List<UUID>) helper (which already
  fires one bulk DELETE per table — thread_entity, entity_relationship,
  field_relationship) instead of looping deleteThreadInternal.
- Collapse the single-entity overload into deleteByAbout(List.of(id)) so both
  callers go through the same code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 1 comment.

harshach and others added 2 commits May 18, 2026 22:16
…ram limits

Reviewer flagged that the new findByEntityIds + deleteThreadsInBatch path
passes the full id list straight into an IN-clause, where JDBI's @BindList
expands every element into its own bind parameter. For the 12k-descendant
hierarchies this PR specifically targets, that blows past SQL Server's
~2100-parameter ceiling and bloats MySQL's max_allowed_packet budget.

Chunk both paths to 500 ids per query (matches the existing
EntityRepository.RELATION_DELETE_BATCH_SIZE used for the same reason on the
relationship side):

- deleteByAbout(List<UUID>) walks entityIds in 500-id chunks against
  findByEntityIds, accumulates thread ids across chunks, then hands them to
  deleteThreadsInBatch.
- deleteThreadsInBatch(List<UUID>) walks threadIds in 500-id chunks against
  deleteAllByThreadIds / deleteAllByPrefixes / deleteByIds and sums the
  per-chunk delete counts.

A per-chunk find failure no longer aborts the whole cleanup — log it and
move on, matching the original "skip if thread storage is unavailable"
contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot reviewer flagged that the deleteByAbout(List<UUID>) refactor lost the
per-thread try/catch that previously made legacy feed cleanup best-effort.
A malformed thread id (UUID.fromString throws IllegalArgumentException) or a
DAO/runtime failure inside deleteThreadsInBatch would now propagate up the
caller's hard-delete @transaction and roll the whole cascade back — even
though the old per-thread loop deliberately swallowed those so legacy feed
issues never blocked entity deletion.

Re-add both guards:

- Parse thread ids defensively: skip + LOG.warn on each malformed id rather
  than throwing IllegalArgumentException out of the cleanup.
- Wrap deleteThreadsInBatch in try/catch so a bulk DELETE failure logs at
  warn level and lets the parent hard-delete commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 1 comment.

Copilot reviewer flagged EntityDAO.deleteByIds(List<UUID>) as an unbounded
IN-clause: JDBI's @BindList expands every id into a separate bind parameter,
and the bulk hard-delete path now walks 12k+ entity hierarchies, so a single
DELETE statement would blow past SQL Server's ~2100-parameter ceiling and
MySQL's max_allowed_packet budget.

Mirror the findEntitiesByIds chunking already used in this file — 30k ids
per chunk, short-circuit when the list fits in one chunk, sum per-chunk
delete counts otherwise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityDAO.java Outdated
Comment thread openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityDAO.java Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 1 comment.

harshach and others added 2 commits May 19, 2026 08:18
Address two Copilot quality flags:

1. The 30k IN-list chunk size was duplicated as a magic number in four
   EntityDAO methods (findReferencesByFqns, deleteByIds, findEntitiesByIds,
   findEntityByNames). Hoist to a single MAX_IN_LIST_CHUNK_SIZE constant on
   the interface so all four stay in sync if the limit needs tuning.

2. EntityDAO.updateFqn interpolated user-supplied prefixes into raw SQL with
   only apostrophe + double-quote escaping. MySQL's default mode treats `\`
   as a string-literal escape char, and Postgres does too when
   standard_conforming_strings is off, so an unescaped `\n` in newPrefix
   would be parsed as a newline before the REGEXP_REPLACE / REPLACE ever
   saw it. Name validation currently forbids these characters, but
   defense-in-depth is cheap.

   Added ListFilter.escapeBackslashAndApostrophe (backslash first, then
   apostrophe — order matters, otherwise the `\\` we just inserted gets
   re-doubled). escape() now composes through this helper, so the existing
   LIKE-aware path also picks up backslash escaping. updateFqn's MySQL
   replacement string and both Postgres prefixes route through the new
   helper before any further (LIKE / double-quote) escape pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…fore bulk restore/soft-delete updaters

Copilot reviewer flagged that bulkRestoreSubtree (and by extension
bulkSoftDeleteSubtree) builds EntityUpdaters from entities loaded via raw
storage JSON (loadForBulk -> dao.findEntitiesByIds -> no setFields call).
Entities whose updater rewrites relationships unconditionally — e.g.
DashboardRepository.DashboardUpdater.entitySpecificUpdate which does
deleteFrom(... HAS ...) + addRelationship for every charts/dataModels entry
— would diff a null original.charts against a null updated.charts, fire
the compareAndUpdate lambda (shouldCompare returns true for PUT/SOFT_DELETE
with patchedFields=null), wipe the HAS rows, and re-add nothing. Net: every
chart link disappears as a side effect of restoring or soft-deleting the
parent — well before restoreAdditionalChildren / softDeleteAdditionalChildren
walk the relation set.

This is exactly the failure mode the single-entity restoreEntity already
documents and guards against by calling setFieldsInternal(original, putFields,
ALL) before building its updater. The bulk replacement needs the same
contract.

- New helper hydrateRelationsForBulkUpdater(List<T>) loops setFieldsInternal
  with Include.ALL. Per-entity rather than setFieldsInBulk because the bulk
  fetchers (e.g. DashboardRepository.batchFetchCharts) hard-code NON_DELETED
  when batching HAS lookups, which would still hide cascade-deleted charts
  from the restore path. Restore batches are typically one subtree level,
  so the extra DB hits are acceptable.
- Hooked into bulkRestoreSubtree right before buildBulkUpdaters, and into
  applyBulkSoftDelete with the same rationale.

EntityRepositoryRestoreTest 13/13, AsyncServiceTest 11/11 still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 2 comments.

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityDAO.java Outdated
harshach and others added 2 commits May 19, 2026 09:07
- EntityRepository.hydrateRelationsForBulkUpdater: log hydration failures
  at warn level with entity id/type instead of swallowing them silently, so
  any subsequent HAS-row wipe is correlatable in production logs
  (gitar-bot empty-catch flag).
- EntityDAO.MAX_IN_LIST_CHUNK_SIZE Javadoc: drop the misleading SQL Server
  reference (not a supported connection type — its 2100-parameter ceiling
  would need a much smaller constant) and document the MySQL / PostgreSQL
  constraints that actually apply to this codebase (Copilot doc accuracy
  flag).
- FeedRepository.FEED_IN_BATCH_SIZE comment: same — replace SQL Server
  hedging with MySQL/PostgreSQL framing and call out why the per-statement
  budget is tighter than the EntityDAO constant (three IN-list statements
  per chunk).
- EntityResource.restoreEntity null branch: drop the redundant find(id, ALL)
  probe — EntityRepository.restoreEntity now calls find(id, ALL) up front
  and throws EntityNotFoundException for missing ids, so a null response can
  only mean "exists but not deleted" → 400. Updated the comment to match
  (Copilot stale-comment flag).

EntityRepositoryRestoreTest 13/13, AsyncServiceTest 11/11 still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 1 comment.

harshach and others added 2 commits May 20, 2026 10:25
bulkHardDeleteSubtree was walking the cascade, deleting DB rows, but never
firing the per-entity Elasticsearch removal that the legacy Entity.deleteEntity
path triggered via delete()'s top-level deleteFromSearch dispatch. The bulk
replacement is the only path that walks cascaded children now, so a missing
call leaves stale ES docs that survive the hard delete.

Concrete symptom: Playwright Domains.spec.ts:533 ("Should clear assets from
data products after deletion of data product in Domain") consistently failed
on this branch. The test creates Domain + DataProduct(PW_DataProduct_Sales),
recursively hard-deletes the Domain, then re-creates Domain + DataProduct
with the same names. The UI search for PW_DataProduct_Sales then returned
TWO rows: the new DataProduct from the re-create plus a stale ES doc for the
"deleted" cascade child whose DB row was actually gone. The test died on
selectDataProduct's strict-mode locator match.

Add deleteFromSearch(entity, true) alongside postDelete inside the
end-of-walk loop. Mirrors the per-entity contract — legacy delete() called
postDelete then deleteFromSearch in that order, the bulk path now does the
same per cascaded entity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 1 comment.

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityDAO.java Outdated
harshach and others added 2 commits May 20, 2026 11:59
…lus regression tests

Copilot reviewer flagged that EntityDAO.updateFqn's MySQL path used
escapeBackslashAndApostrophe(newPrefix), which only covers the SQL
string-literal layer. The value is then fed to REGEXP_REPLACE's
replacement argument, which has its own escape semantics — `\X` is a
backreference / escape sequence, so a backslash that survived the SQL layer
gets re-interpreted by the regex engine. For an input like "foo\1bar",
"\1" would be treated as a capture-group backreference rather than a
literal "\1" in the rewritten FQN; for an input like "foo\bar", the regex
engine would consume the backslash as the start of an undefined escape.

Add ListFilter.escapeForMySqlRegexReplacement(s) that composes both layers:
  Step 1 — regex replacement: \  → \\   (regex engine emits literal \)
  Step 2 — SQL string-literal: \\ → \\\\, ' → ''
Net: one input \ → four \ in the SQL statement text → two \ for the regex
engine → one literal \ in the replacement output. Apostrophes only matter
for the SQL layer, so they're left to step 2's existing single-pass
doubling (composing escapeBackslashAndApostrophe twice would re-escape
apostrophes — chose explicit two-step composition to avoid that pitfall).

EntityDAO.updateFqn's MySQL replacement now routes through the new helper.
The Postgres path stays on escapeBackslashAndApostrophe + escapeDoubleQuotes
because PG embeds the value in a static REPLACE call, not a regex
replacement — the regex layer doesn't apply there.

Test coverage in ListFilterTest:
  - test_escapeBackslashAndApostrophe_*           — existing primitive
  - test_escape_alsoDoublesBackslashesViaBackslashAndApostrophe — regression
    guard that escape() picks up the backslash hardening transparently
  - test_escapeForMySqlRegexReplacement_*         — new helper, including
    a "\1 backreference look-alike" case to guard the original concern
    directly

All 22 ListFilterTest cases pass. 46/46 across
EntityRepositoryRestoreTest + AsyncServiceTest + ListFilterTest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… generic EntityRestorer

The fluent restore API previously existed only on Tables and Databases. User
flagged the gap — restore (sync + server-side async via PUT /restore?async=true)
is supported by the server for every entity-type, and the Java SDK base service
(EntityServiceBase.restore / restoreServerAsync) is already shared across all
of them. The fluent layer was the only thing missing.

Rather than copy-paste TableRestorer/AsyncTableRestorer for each entity, hoist
the pattern into a single generic helper:

- common/EntityRestorer<T>    — sync .execute() returns T, .async() switches mode
- common/AsyncEntityRestorer<T> — .execute() returns AsyncJobResponse

Tables and Databases now wire the generic helper too (drops the entity-specific
restorer pair on each — ~80 lines removed).

restore() now lives on 26 additional fluent classes' Finder builders:
  AIApplications, AIGovernancePolicies, Charts, Classifications, Containers,
  DashboardDataModels, Dashboards, DataContracts, DataProducts,
  DatabaseSchemas, Domains, Glossaries, GlossaryTerms, LLMServices,
  McpServers, Metrics, MlModels, Pipelines, PromptTemplates, Queries,
  SearchIndexes, StoredProcedures, Tags, Teams, Topics, Users.

Each is a 3-line wrap-and-delegate; future restore-capable entity-types are a
one-method addition.

Tests in RestoreFluentAPITest extended from 4 → 12: kept the original
Tables / Databases sync+async pairs, added per-fluent route-through assertions
for DatabaseSchemas, Dashboards, Pipelines, Topics, MlModels, Containers,
Glossaries, Domains. They all go through the same EntityRestorer<T>, so the
representative sample is enough to catch a typo in any single fluent's wire-up
without re-asserting the same plumbing 26 times.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 69 out of 69 changed files in this pull request and generated 2 comments.

`findByEntityIds` now selects DISTINCT ids — currently a no-op given
the PK + single-valued entityId column, but cheap insurance against
future joins or schema shifts that could fan out duplicates.

`FeedRepository.deleteByAbout` accumulates into a LinkedHashSet so a
caller passing an entityIds list with duplicates (or any future change
that allows per-chunk overlap) can't inflate the downstream IN lists
in `deleteThreadsInBatch` (3 DELETEs per 500-id chunk).

Addresses Copilot review comments 3276652759 and 3276652802 on #27997.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented May 20, 2026

Code Review ✅ Approved 14 resolved / 14 findings

Implements bulk-restore and async entity hierarchy processing to resolve timeout issues, while hardening MySQL regex escaping in FQN updates. All identified architectural and safety concerns have been successfully addressed.

✅ 14 resolved
Bug: bulkRestoreSubtree calls restoreChildren outside @transaction boundary

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:5585-5587
In bulkRestoreSubtree (line 5585-5587), restoreChildren is called for each entity in a loop before the batched DB writes (updateMany, insertMany, etc.) at lines 5629-5641. While JDBI's @Transaction on both methods means they participate in the same transaction when called on the same thread, the recursive descent through restoreChildrenbulkRestoreSubtree means children are restored (DB committed) before parents in a bottom-up fashion. If the process crashes mid-way through a large hierarchy, some children may be restored while their parent remains deleted, leaving the tree in an inconsistent state.

This is acceptable for the use case (restore is idempotent and can be retried), but worth documenting.

Bug: uriInfo used in async lambda after request scope ends

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/EntityResource.java:822
In restoreEntityAsync, the request-scoped uriInfo object is captured by the lambda submitted to the executor (line 822: addHref(uriInfo, response.getEntity())). After the HTTP response is sent (202 Accepted), the JAX-RS container may invalidate or recycle the UriInfo instance, leading to IllegalStateException or incorrect URL generation when addHref calls uriInfo.getBaseUri().

The existing deleteByIdAsync method does NOT use uriInfo inside its async lambda, confirming this is an inconsistency introduced by this PR. The addHref call is non-essential for the async path since the restored entity is only sent via WebSocket notification (not as an HTTP response body), and the WebSocket message uses entity.getName() not HREFs.

Quality: Fluent restorer classes duplicate identical pattern across entities

📄 openmetadata-sdk/src/main/java/org/openmetadata/sdk/fluent/Tables.java:389-403 📄 openmetadata-sdk/src/main/java/org/openmetadata/sdk/fluent/Databases.java:295-309
The TableRestorer/AsyncTableRestorer and DatabaseRestorer/AsyncDatabaseRestorer classes are structurally identical — only the entity type and service accessor differ. This will grow linearly as more entity types gain the async restore feature (the PR already lists 8 resources). Consider extracting a generic EntityRestorer<T> to avoid duplication per the project's 'No Duplication' principle.

Quality: bulkRestore/SoftDeleteContainedChildren are near-identical duplicates

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:5677-5691 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:5825-5839
The two private methods bulkRestoreContainedChildren (lines 5677-5705) and bulkSoftDeleteContainedChildren (lines 5825-5853) share identical logic — collect parent IDs, call findToBatchAllTypes, filter by entityType, group by child type, dispatch to the child repo. The only difference is the terminal call (bulkRestoreSubtree vs bulkSoftDeleteSubtree). Per the 'No Duplication' principle, extract a shared helper that accepts the dispatch function.

Quality: bulkSoftDeleteSubtree exceeds 15-line method limit (83 lines)

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:5735-5749
bulkSoftDeleteSubtree spans lines 5736-5818 (83 lines), well above the project's 15-line method guideline. The method has clear phases (guard/load, pre-delete hooks, child cascade, updater creation, version history, persistence, cache invalidation, change events) that can each be extracted into focused private methods, matching the phase-based structure already hinted at by the try (var ignored = phase(...)) calls.

...and 9 more resolved from earlier reviews

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants