Skip syncing number_of_replicas when follower has auto_expand_replica…#1662

Open
monusingh-1 wants to merge 1 commit into opensearch-project:main from monusingh-1:fix/1661-auto-expand-replicas-sync

monusingh-1 (Collaborator) commented Apr 28, 2026

Description

When the follower index has index.auto_expand_replicas active (any value other than the literal "false"), the follower's local OpenSearch is responsible for deriving index.number_of_replicas from its own data-node count. CCR's 60s metadata sync was blindly copying the leader's number_of_replicas onto the follower, which in topologies where the leader has fewer data nodes than the follower caused a destructive cycle:

  1. CCR pushes leader's lower number_of_replicas -> STARTED replica shards are destroyed on the follower.
  2. OpenSearch's adaptAutoExpandReplicas() immediately corrects the count back up -> new UNASSIGNED shards enter peer recovery -> cluster goes YELLOW.
  3. 60s later CCR syncs again -> destroys the recovering shards before recovery completes -> cycle repeats forever.
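
The cycle can be reproduced in a toy model. The sketch below is not CCR or OpenSearch code: `expandedReplicas` and `simulateUnguardedSync` are hypothetical names, and clamping to "data nodes minus one" is a simplified assumption about how auto-expand picks a count (allocation filtering and awareness are ignored).

```kotlin
// Toy model only: plain values instead of OpenSearch's settings classes.
// Every name in this block is hypothetical and exists purely for illustration.

/** Replica count a follower would derive from a "min-max" auto-expand range (simplified). */
fun expandedReplicas(autoExpand: String, dataNodes: Int): Int {
    val (minPart, maxPart) = autoExpand.split("-", limit = 2)
    val min = minPart.toInt()
    val max = if (maxPart == "all") dataNodes - 1 else maxPart.toInt()
    // At most one copy per data node, clamped into the configured range.
    return (dataNodes - 1).coerceIn(min, maxOf(min, max))
}

/** Follower replica count after each step of `cycles` unguarded sync rounds. */
fun simulateUnguardedSync(
    leaderReplicas: Int,
    followerAutoExpand: String,
    followerDataNodes: Int,
    cycles: Int
): List<Int> {
    val history = mutableListOf<Int>()
    repeat(cycles) {
        // Step 1: the metadata sync copies the leader's value onto the follower.
        history += leaderReplicas
        // Step 2: the follower's auto-expand raises the count again.
        history += expandedReplicas(followerAutoExpand, followerDataNodes)
    }
    return history
}
```

With a leader value of 1 and a four-data-node follower on "0-all", `simulateUnguardedSync(1, "0-all", 4, 2)` returns `[1, 3, 1, 3]`: the count never settles, which corresponds to the repeated shard teardown and recovery described above.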

Fix: in IndexReplicationTask.pollForMetadata(), after computing the desired settings from leader+overrides and before diffing against the follower, if the follower's auto_expand_replicas is active, strip number_of_replicas from BOTH the desired and follower settings so the diff loops neither add nor remove it. The leader's auto_expand_replicas itself continues to be synced, so once the user disables auto-expand on the leader the next sync cycle resumes syncing number_of_replicas.

Any explicit number_of_replicas override set via the replication metadata is also suppressed while auto-expand is active on the follower; this is intentional because a fixed replica count and auto-expand are contradictory and the follower's active auto-expand takes precedence.
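
A minimal sketch of the guard, assuming settings are modelled as a plain `Map<String, String>` rather than OpenSearch's `Settings` class. The two helpers borrow the names used in this PR, but their signatures here are stand-ins, and `settingsToUpdate` is a hypothetical simplification of the diff step that only models added or changed keys, not removals.

```kotlin
// Sketch under stated assumptions: Map-based stand-ins, not the PR's actual code.
const val AUTO_EXPAND = "index.auto_expand_replicas"
const val NUM_REPLICAS = "index.number_of_replicas"

/** Auto-expand counts as active for any value other than "false" (case-insensitive). */
fun isAutoExpandReplicasActive(settings: Map<String, String>): Boolean {
    val value = settings[AUTO_EXPAND] ?: return false
    return !value.equals("false", ignoreCase = true)
}

/** Returns a copy of the settings without the replica count; a no-op if the key is absent. */
fun filterOutNumberOfReplicas(settings: Map<String, String>): Map<String, String> =
    settings - NUM_REPLICAS

/** Hypothetical diff: keys whose desired value differs from the follower's current value. */
fun settingsToUpdate(
    desired: Map<String, String>,
    follower: Map<String, String>
): Map<String, String> {
    var want = desired
    var have = follower
    if (isAutoExpandReplicasActive(follower)) {
        // Strip the key from BOTH sides so the diff neither adds nor removes it.
        want = filterOutNumberOfReplicas(want)
        have = filterOutNumberOfReplicas(have)
    }
    return want.filter { (key, value) -> have[key] != value }
}
```

In this model, a desired count of 1 (whether it comes from the leader or from an override) against a follower on "0-all" with 3 replicas produces an empty update. If the desired settings carry `index.auto_expand_replicas` set to "false", the first pass updates only that key, and a later pass, once the follower also reads "false", resumes syncing the replica count.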

Added unit tests for the two new companion-object helpers isAutoExpandReplicasActive and filterOutNumberOfReplicas covering: numeric range, "0-all", "false"/"False"/absent, key removal, no-op when absent, and empty settings.
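
The listed cases for the predicate can be written as a small table. This is an illustrative sketch, not the PR's test code: `autoExpandActive`, `cases`, and `failingCases` are hypothetical names, and settings are again modelled as a plain map.

```kotlin
// Illustrative table of cases; all names are hypothetical, not taken from the PR's tests.
const val AUTO_EXPAND_KEY = "index.auto_expand_replicas"

/** True when the key is present and its value is anything other than "false" (any case). */
fun autoExpandActive(settings: Map<String, String>): Boolean =
    settings[AUTO_EXPAND_KEY]?.equals("false", ignoreCase = true) == false

/** Each entry pairs an input with the result the predicate is expected to give. */
val cases: List<Pair<Map<String, String>, Boolean>> = listOf(
    mapOf(AUTO_EXPAND_KEY to "0-3") to true,     // numeric range
    mapOf(AUTO_EXPAND_KEY to "0-all") to true,   // open-ended range
    mapOf(AUTO_EXPAND_KEY to "false") to false,  // disabled sentinel
    mapOf(AUTO_EXPAND_KEY to "False") to false,  // sentinel, different case
    emptyMap<String, String>() to false          // key absent
)

/** Inputs for which the predicate disagrees with the expectation. */
fun failingCases(): List<Map<String, String>> =
    cases.filter { (settings, expected) -> autoExpandActive(settings) != expected }
        .map { it.first }
```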

Related Issues

Resolves #1661

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

monusingh-1 force-pushed the fix/1661-auto-expand-replicas-sync branch 2 times, most recently from f8ed51e to a613e4a on April 28, 2026 07:18
…s active

Fixes opensearch-project#1661.

Signed-off-by: Monu Singh <msnghgw@amazon.com>
monusingh-1 force-pushed the fix/1661-auto-expand-replicas-sync branch from a613e4a to 03a002a on April 28, 2026 07:19
monusingh-1 marked this pull request as ready for review on April 28, 2026 07:53
internal fun isAutoExpandReplicasActive(settings: Settings): Boolean {
    val value = settings.get(IndexMetadata.INDEX_AUTO_EXPAND_REPLICAS_SETTING.key) ?: return false
    // OpenSearch stores the disabled sentinel as the literal string "false".
    return !value.equals("false", ignoreCase = true)
Member commented:

Wouldn't it be simpler to check value.equals("true")?

// `number_of_replicas` are contradictory settings, and the follower's active auto-expand
// takes precedence until it is disabled.
if (isAutoExpandReplicasActive(followerSettings)) {
    desiredSettings = filterOutNumberOfReplicas(desiredSettings)
Member commented:

Can we add this logic above at line 543, where we are already iterating over the settings?

We can add it at line 535, where the desiredSettingsBuilder is initialised. I have done the same fix in PR #1664.

Development

Successfully merging this pull request may close these issues.

[BUG] CCR settings sync overwrites number_of_replicas on follower, causing perpetual yellow state when leader has fewer data nodes

3 participants