REST: treat HTTP 400 commit-validation failures as CommitFailedException #16644

Closed
martinskeem wants to merge 1 commit into apache:main from martinskeem:fix/databricks-bug

@martinskeem martinskeem commented Jun 1, 2026

Some REST catalog implementations (e.g., Databricks Unity Catalog) return HTTP 400 with a "commit validation failed" message for concurrent-write conflicts instead of the spec-mandated HTTP 409. Because CommitErrorHandler previously mapped all 400 responses to BadRequestException, these conflicts escaped SnapshotProducer's retry-with-refresh loop entirely and propagated as fatal errors. For instance:

Coordinator iceberg-sink-connector-epe-log-topics-v1-0 failed to commit for commit 51a29b27-d1b1-45f4-b6f3-61228b4c8481, will try again next cycle
org.apache.iceberg.exceptions.BadRequestException: Malformed request: Commit validation failed. Please contact Databricks support for assistance. [ErrorCode: 2010]
	at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:341)
	at org.apache.iceberg.rest.ErrorHandlers$CommitErrorHandler.accept(ErrorHandlers.java:137)
	at org.apache.iceberg.rest.ErrorHandlers$CommitErrorHandler.accept(ErrorHandlers.java:119)
	at org.apache.iceberg.rest.HTTPClient.throwFailure(HTTPClient.java:242)
	at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:347)
	at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:299)
	at org.apache.iceberg.rest.BaseHTTPClient.post(BaseHTTPClient.java:112)
	at org.apache.iceberg.rest.RESTClient.post(RESTClient.java:150)
	at org.apache.iceberg.rest.RESTTableOperations.commit(RESTTableOperations.java:206)
	at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:501)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
	at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:473)
	at org.apache.iceberg.connect.channel.Coordinator.commitToTable(Coordinator.java:286)
	at org.apache.iceberg.connect.channel.Coordinator.lambda$doCommit$1(Coordinator.java:173)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
	at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:315)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

Add a case 400 check in CommitErrorHandler that recognises the conflict pattern and raises CommitFailedException instead, restoring normal optimistic-concurrency retry behaviour for non-compliant catalogs. Responses that do not match the pattern still fall through to the default 400 handler and raise BadRequestException as before.
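
For readers without the diff at hand, the mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the two exception classes are local stand-ins for Iceberg's BadRequestException and CommitFailedException, the class and method names are hypothetical, and the matched text is an assumption taken from the log excerpt above.

```java
import java.util.Locale;

public class CommitErrorSketch {
  // Local stand-ins for org.apache.iceberg.exceptions.*, so the sketch compiles on its own.
  static class BadRequestException extends RuntimeException {
    BadRequestException(String message) {
      super(message);
    }
  }

  static class CommitFailedException extends RuntimeException {
    CommitFailedException(String message) {
      super(message);
    }
  }

  // Assumed marker text, copied from the log excerpt; the pattern used in the PR is not shown.
  private static final String CONFLICT_MARKER = "commit validation failed";

  // Hypothetical helper: true when a 400 body looks like a concurrent-write conflict.
  static boolean looksLikeCommitConflict(String message) {
    return message != null && message.toLowerCase(Locale.ROOT).contains(CONFLICT_MARKER);
  }

  // Maps a status code and error message to the exception a commit handler would raise.
  static RuntimeException toException(int code, String message) {
    switch (code) {
      case 400:
        if (looksLikeCommitConflict(message)) {
          // Retryable: the caller can refresh table metadata and try the commit again.
          return new CommitFailedException("Commit failed: " + message);
        }
        // Any other 400 keeps the existing behaviour.
        return new BadRequestException("Malformed request: " + message);
      case 409:
        // The status the spec assigns to commit conflicts.
        return new CommitFailedException("Commit failed: " + message);
      default:
        return new RuntimeException("Unhandled status " + code + ": " + message);
    }
  }

  public static void main(String[] args) {
    String body =
        "Commit validation failed. Please contact Databricks support for assistance. [ErrorCode: 2010]";
    System.out.println(toException(400, body).getClass().getSimpleName());
    System.out.println(toException(400, "Invalid table identifier").getClass().getSimpleName());
  }
}
```

The match is a case-insensitive substring test on the error message, so it depends on the catalog keeping that wording stable.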

@github-actions github-actions Bot added the core label Jun 1, 2026
@martinskeem martinskeem marked this pull request as draft June 1, 2026 08:10
@RussellSpitzer RussellSpitzer left a comment

Member

We shouldn't put in special handling for Catalogs which aren't following the spec, especially if it involves us doing string matching since we can't guarantee that behavior will be constant in the future.

@martinskeem

Author

Yea, I agree. I will try to see if I can get this resolved through Databricks.

@martinskeem martinskeem closed this Jun 8, 2026