Skip to content

Fix monotonic property assertion failure during delta application#179

Merged
fulmicoton merged 4 commits into
mainfrom
fulmicoton/178-monotonicity-assert-failing
Apr 17, 2026
Merged

Fix monotonic property assertion failure during delta application#179
fulmicoton merged 4 commits into
mainfrom
fulmicoton/178-monotonicity-assert-failing

Conversation

@fulmicoton-dd
Copy link
Copy Markdown
Collaborator

Summary

  • Fixes the monotonic_property_after >= monotonic_property_before assertion failure in apply_delta
  • Changes monotonic_property() from (max(max_version, last_gc_version), max_version) to (last_gc_version, max_version)
  • Adds a regression test exercising the exact scenario that triggers the bug

Root Cause

A race condition between concurrent gossip exchanges: between the time a node sends its digest and receives the response delta, another peer may advance its state. The responding peer computes a reset delta from the stale digest, with last_gc_version == receiver.max_version but node_delta.max_version < receiver.max_version. The old monotonic property's first element max(mv, gc) would remain equal while max_version regressed, failing the assertion.

Fix

The new monotonic property (last_gc_version, max_version) is the correct invariant because last_gc_version only ever increases (on GC and on reset, where check_delta_status guarantees node_delta.last_gc_version > self.last_gc_version). With the first tuple element tracking last_gc_version independently, a reset always strictly increases it, so max_version can safely be lower (e.g. due to MTU truncation) without violating monotonicity.

Closes #178

The monotonic_property() check in apply_delta was using
(max(max_version, last_gc_version), max_version), which could be
violated when a reset delta had last_gc_version == self.max_version
but node_delta.max_version < self.max_version.

This happens in practice due to a race condition: between the time a
node sends its digest and receives the response delta, another peer
may have already advanced its state. The responding peer computes its
delta from the stale digest, triggering a reset with a max_version
lower than the receiver's current max_version. The old monotonic
property's first element (max(mv, gc)) would remain equal while
max_version regressed, failing the assertion.

The fix changes monotonic_property from (max(max_version,
last_gc_version), max_version) to (last_gc_version, max_version).
This is the correct invariant because last_gc_version only ever
increases: it advances on GC and on reset (where check_delta_status
guarantees node_delta.last_gc_version > self.last_gc_version).
With the first tuple element tracking last_gc_version independently,
a reset always strictly increases it, so max_version can safely be
lower (e.g. due to MTU truncation) without violating monotonicity.

Closes #178
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an assertion panic during delta application by redefining the node “monotonic property” invariant to stay monotonic across reset deltas where max_version can legitimately regress (e.g., MTU truncation), and adds a regression test for the reported race scenario (Issue #178).

Changes:

  • Redefine NodeState::monotonic_property() to return (last_gc_version, max_version) instead of (max(max_version, last_gc_version), max_version).
  • Improve the monotonicity assertion failure message during ClusterState::apply_delta.
  • Add a regression unit test covering reset deltas where last_gc_version increases while max_version decreases.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread chitchat/src/state.rs Outdated
Copy link
Copy Markdown
Contributor

@trinity-1686a trinity-1686a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a sneaky one

Comment thread chitchat/src/state.rs
Comment on lines 197 to 198
// Apply delta and returns true if a reset was necessary
fn apply_delta(&mut self, node_delta: NodeDelta, now: Instant) -> DeltaStatus {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the documentation for this function isn't ideal (not related to this PR, i was just trying to understand when max_version could go down)

@fulmicoton-dd fulmicoton-dd force-pushed the fulmicoton/178-monotonicity-assert-failing branch from 4ddca03 to 0e03b8d Compare April 17, 2026 09:03
@fulmicoton fulmicoton merged commit 18fc08c into main Apr 17, 2026
5 checks passed
@fulmicoton fulmicoton deleted the fulmicoton/178-monotonicity-assert-failing branch April 17, 2026 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Chitchat monotonicity assert failing.

4 participants