Retry on REPLICATE_VIOLATION for global cluster region switch #1773

Merged: sre-ci-robot merged 1 commit into milvus-io:master from yhmo:patch_ma on Mar 6, 2026

@yhmo yhmo commented Mar 5, 2026

No description provided.

Copilot AI review requested due to automatic review settings March 5, 2026 10:55
@sre-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yhmo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yhmo yhmo changed the title from "retry on REPLICATE_VIOLATION for global cluster region switch" to "Retry on REPLICATE_VIOLATION for global cluster region switch" Mar 5, 2026
@mergify mergify Bot added the dco-passed label Mar 5, 2026
Copilot AI left a comment

Pull request overview

This PR adds retry support for STREAMING_CODE_REPLICATE_VIOLATION errors in the RpcUtils retry loop, enabling automatic topology refresh and retry when a global cluster region switch occurs. Previously, only gRPC UNAVAILABLE errors triggered topology refresh; now, server-side replicate violation errors also trigger refresh and allow the retry loop to continue.

Changes:

  • Added handleGlobalRoutingError method to detect STREAMING_CODE_REPLICATE_VIOLATION in exception messages, trigger topology refresh, and signal that retry should continue.
  • Refactored inline UNAVAILABLE error handling into a dedicated handleGlobalConnectionError method for better separation of concerns.
  • Added three new tests covering: routing error triggers refresh, normal errors don't trigger refresh, and no NPE when no refresh trigger is configured.
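
The changes listed above can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the code from `RpcUtils.java`: the method names `handleGlobalRoutingError` and `handleGlobalConnectionError` and the constant name `GLOBAL_ROUTING_ERROR` come from the review summary, but their signatures, the constant's value, the `UnavailableException` stand-in for a gRPC UNAVAILABLE status, the `Runnable` refresh trigger, and the `retry` helper are all assumptions made for the example. It also assumes both error kinds are retryable and omits any backoff between attempts.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

class ReplicateViolationRetrySketch {
    // Marker text named in the review summary; the real constant's value is an assumption.
    static final String GLOBAL_ROUTING_ERROR = "STREAMING_CODE_REPLICATE_VIOLATION";

    // Hypothetical stand-in for a gRPC UNAVAILABLE failure, so the sketch needs no gRPC dependency.
    static class UnavailableException extends RuntimeException {
        UnavailableException(String message) { super(message); }
    }

    // Returns true if the message carries the replicate-violation marker.
    // A null trigger is tolerated, mirroring the "no NPE" test case in the summary.
    static boolean handleGlobalRoutingError(Exception e, Runnable refreshTrigger) {
        String message = e.getMessage();
        if (message == null || !message.contains(GLOBAL_ROUTING_ERROR)) {
            return false;
        }
        if (refreshTrigger != null) {
            refreshTrigger.run(); // ask for a topology refresh before the next attempt
        }
        return true;
    }

    // Returns true if the failure is a connection-level (UNAVAILABLE-like) error.
    static boolean handleGlobalConnectionError(Exception e, Runnable refreshTrigger) {
        if (!(e instanceof UnavailableException)) {
            return false;
        }
        if (refreshTrigger != null) {
            refreshTrigger.run();
        }
        return true;
    }

    // Hypothetical retry loop: retries only when one of the handlers accepts the error.
    static <T> T retry(Callable<T> call, int maxAttempts, Runnable refreshTrigger) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                boolean retryable = handleGlobalRoutingError(e, refreshTrigger)
                        || handleGlobalConnectionError(e, refreshTrigger);
                if (!retryable) {
                    throw e; // ordinary errors propagate without a refresh
                }
            }
        }
        throw last; // attempts exhausted on a retryable error
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        AtomicInteger refreshes = new AtomicInteger();
        // The first call fails with a replicate violation; the second succeeds.
        String result = retry(() -> {
            if (calls.incrementAndGet() == 1) {
                throw new RuntimeException("rejected: " + GLOBAL_ROUTING_ERROR);
            }
            return "ok";
        }, 3, refreshes::incrementAndGet);
        System.out.println(result + " after " + calls.get() + " calls, "
                + refreshes.get() + " refresh(es)");
    }
}
```

Keeping each handler as a boolean-returning method lets the loop decide in one place whether to continue, which matches the separation of concerns described in the second bullet.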

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| sdk-core/src/main/java/io/milvus/v2/utils/RpcUtils.java | Added GLOBAL_ROUTING_ERROR constant, extracted error handling into two helper methods, and restructured retry logic to also retry on replicate violation errors. |
| sdk-core/src/test/java/io/milvus/v2/client/globalcluster/GlobalClusterTest.java | Updated comment for existing test and added three new tests for the replicate violation retry behavior. |

Comment thread sdk-core/src/main/java/io/milvus/v2/utils/RpcUtils.java Outdated
Signed-off-by: yhmo <yihua.mo@zilliz.com>
@mergify mergify Bot added the ci-passed label Mar 5, 2026
@yhmo yhmo added the lgtm label Mar 6, 2026
@sre-ci-robot sre-ci-robot merged commit ef6938d into milvus-io:master Mar 6, 2026
5 checks passed
@yhmo yhmo deleted the patch_ma branch March 9, 2026 08:21