
[#11040] fix(core): make Iceberg import idempotent under concurrent load #11041

Open

yuqi1129 wants to merge 3 commits into apache:main from yuqi1129:review-11012

Conversation

yuqi1129 (Contributor) commented May 11, 2026

What changes were proposed in this pull request?

Make Iceberg schema and table import idempotent when multiple Gravitino nodes try to import the same object concurrently. If EntityStore.put(..., true) reports that the entity already exists, the dispatcher now reuses the existing Gravitino entity instead of failing the request.
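The swallow-the-race behavior can be sketched roughly as follows. The `EntityStore`, `EntityAlreadyExistsException`, and `importSchema` names below are simplified stand-ins for the Gravitino types mentioned above, not the actual dispatcher code:

```java
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-ins for the Gravitino types named in the PR description;
// the real EntityStore persists entities, this one just keeps them in a map.
class EntityAlreadyExistsException extends RuntimeException {}

class EntityStore {
  private final ConcurrentHashMap<String, String> entities = new ConcurrentHashMap<>();

  // Throws when the identifier is already present, mirroring the
  // "entity already exists" signal the dispatcher now handles.
  void put(String ident, String entity) {
    if (entities.putIfAbsent(ident, entity) != null) {
      throw new EntityAlreadyExistsException();
    }
  }

  String get(String ident) {
    return entities.get(ident);
  }
}

public class IdempotentImport {
  // Import is idempotent: if another node already persisted the entity,
  // reuse the stored one instead of failing the request.
  static String importSchema(EntityStore store, String ident, String entity) {
    try {
      store.put(ident, entity);
      return entity;
    } catch (EntityAlreadyExistsException e) {
      return store.get(ident); // another node won the race; reuse its entity
    }
  }

  public static void main(String[] args) {
    EntityStore store = new EntityStore();
    String first = importSchema(store, "catalog.db", "entity-from-node-1");
    String second = importSchema(store, "catalog.db", "entity-from-node-2");
    System.out.println(first.equals(second)); // prints "true"
  }
}
```

Either node may win the race; the point is that both callers end up observing the same stored entity rather than one of them failing.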

Also reconcile Gravitino EntityStore metadata after Iceberg REST Catalog table/view drop and rename operations. After the Iceberg backend operation succeeds, the hook checks the current backend state and then either imports the table/view again or removes the stale Gravitino entity.
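The reconciliation step can be sketched as follows. The two maps stand in for the Iceberg backend catalog and the Gravitino EntityStore; all names are illustrative rather than the actual hook-dispatcher API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of post-drop/rename reconciliation: after the backend
// operation succeeds, re-check what the Iceberg backend currently holds and
// bring Gravitino's metadata in line with it.
public class ReconcileSketch {
  // backend: tables that currently exist in the Iceberg REST backend
  // store: Gravitino's entity metadata for the same identifiers
  static void reconcile(Map<String, String> backend, Map<String, String> store, String ident) {
    String backendTable = backend.get(ident);
    if (backendTable != null) {
      // Another node recreated the table between our backend operation and
      // this hook: import the backend's current version into Gravitino.
      store.put(ident, backendTable);
    } else {
      // The table is gone from the backend: drop the stale Gravitino entity.
      store.remove(ident);
    }
  }

  public static void main(String[] args) {
    Map<String, String> backend = new HashMap<>();
    Map<String, String> store = new HashMap<>();
    store.put("db.t", "stale-entity"); // Gravitino still holds a dropped table
    reconcile(backend, store, "db.t");
    System.out.println(store.containsKey("db.t")); // prints "false"
  }
}
```

The real hook runs this check only on the rename/drop paths described above, where the window between the backend operation and the EntityStore update is the risk.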

Why are the changes needed?

In HA deployments, another node can finish the same import first. The current code treats that race as an error, even though the entity is already present in Gravitino. That makes already-existing schema/table imports fail spuriously.

Iceberg REST requests can also be served by different Gravitino nodes. Because the current TreeLock is process-local, another node may drop or recreate the same Iceberg table/view between the backend operation and the hook's direct EntityStore update/delete. Without reconciliation, Gravitino may keep stale/orphan table or view metadata that no longer matches the Iceberg backend. This PR does not introduce a distributed TreeLock; it adds a lightweight backend-state reconciliation step for the high-risk IRC rename/drop paths.

Fix: #11040

Does this PR introduce any user-facing change?

Yes. Concurrent import/load requests become idempotent instead of failing with EntityAlreadyExistsException in the import path. IRC table/view rename and drop operations now reconcile against the Iceberg backend state, reducing stale/orphan Gravitino metadata in multi-node deployments.

How was this patch tested?

./gradlew :core:test --tests org.apache.gravitino.catalog.TestTableOperationDispatcher --tests org.apache.gravitino.catalog.TestSchemaOperationDispatcher -PskipITs

./gradlew :iceberg:iceberg-rest-server:test --tests org.apache.gravitino.iceberg.service.dispatcher.TestIcebergTableHookDispatcher --tests org.apache.gravitino.iceberg.service.dispatcher.TestIcebergViewHookDispatcher -PskipITs

Copilot AI review requested due to automatic review settings May 11, 2026 12:54

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

github-actions Bot commented May 11, 2026

Code Coverage Report

Overall Project 66.09% +0.14% 🟢
Files changed 81.11% 🟢

Module Coverage
aliyun 1.72% 🔴
api 47.13% 🟢
authorization-common 85.96% 🟢
aws 1.08% 🔴
azure 2.47% 🔴
catalog-common 10.2% 🔴
catalog-fileset 80.02% 🟢
catalog-glue 83.41% 🟢
catalog-hive 81.83% 🟢
catalog-jdbc-clickhouse 79.18% 🟢
catalog-jdbc-common 43.93% 🟢
catalog-jdbc-doris 80.28% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.05% 🟢
catalog-jdbc-starrocks 78.27% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 45.14% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 87.08% 🟢
catalog-lakehouse-paimon 76.85% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 77.96% 🟢
common 50.0% 🟢
core 82.3% -0.04% 🟢
filesystem-hadoop3 76.97% 🟢
flink 0.0% 🔴
flink-common 43.17% 🟢
flink-runtime 0.0% 🔴
gcp 14.12% 🔴
hadoop-common 10.39% 🔴
hive-metastore-common 46.83% 🟢
iceberg-common 55.46% 🟢
iceberg-rest-server 69.89% +0.72% 🟢
idp-basic 94.68% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 19.95% 🔴
lance-rest-server 62.78% 🟢
lineage 53.02% 🟢
optimizer 82.87% 🟢
optimizer-api 21.95% 🔴
server 85.83% 🟢
server-common 71.23% 🟢
spark 32.79% 🔴
spark-common 39.09% 🔴
trino-connector 35.14% 🔴
Files
Module File Coverage
core SchemaOperationDispatcher.java 80.75% 🟢
TableOperationDispatcher.java 80.7% 🟢
iceberg-rest-server IcebergViewHookDispatcher.java 86.36% 🟢
IcebergTableHookDispatcher.java 78.31% 🟢

@yuqi1129 yuqi1129 force-pushed the review-11012 branch 4 times, most recently from 21649a4 to 1e99961 on May 12, 2026 13:38
yuqi1129 and others added 2 commits May 14, 2026 16:11
…x sites

Reviewers had no inline context for what the new catch block /
reconcile call is solving. Add short comments at each modified spot
pointing at the multi-node race the change is addressing:

- SchemaOperationDispatcher / TableOperationDispatcher: explain the
  HA import race that turns EntityAlreadyExistsException into a no-op
  instead of a "managed by multiple catalogs" failure.
- IcebergTableHookDispatcher / IcebergViewHookDispatcher: explain
  why drop and rename reconcile against the Iceberg backend now that
  TreeLock is only process-local.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

[Improvement] Iceberg catalog concurrent import can fail with EntityAlreadyExistsException in HA deployments
