CASSANDRA-21426: AssertionError in hasReplicaWithOngoingRepair when parallel_repair_count > 1 #4858

Open
patrickclee0207 wants to merge 1 commit into apache:cassandra-5.0 from patrickclee0207:auto-repair-assertion

@patrickclee0207

An assertion error is thrown from AutoRepairUtils.java when calling StorageService.instance.getTokenMetadata(). This works fine when only 1 node is allowed to repair at a time (parallel_repair_count: 1). When you increase this above 1, you start getting the assertion error.

The change is to use StorageService.instance.getTokenMetadata().cachedOnlyTokenMap() instead.

patch by Patrick Lee; reviewed by TBD for CASSANDRA-21426

Using CCM with 9 nodes and the following auto-repair config:

auto_repair:
  enabled: true
  repair_check_interval: 30s
  repair_task_min_duration: 5s
  repair_type_overrides:
    incremental:
      enabled: true
      min_repair_interval: 1m
  global_settings:
    parallel_repair_count: 3
    allow_parallel_replica_repair: false

With this config, you can reproduce the assertion error. Running the same setup in CCM with the patched version confirms that there are no assertion errors and that 2 nodes were running incremental repairs concurrently.
