Fix monotonic property assertion failure during delta application#179
Merged
Conversation
The monotonic_property() check in apply_delta was using (max(max_version, last_gc_version), max_version), which could be violated when a reset delta had last_gc_version == self.max_version but node_delta.max_version < self.max_version. This happens in practice due to a race condition: between the time a node sends its digest and receives the response delta, another peer may have already advanced its state. The responding peer computes its delta from the stale digest, triggering a reset with a max_version lower than the receiver's current max_version. The old monotonic property's first element (max(mv, gc)) would remain equal while max_version regressed, failing the assertion. The fix changes monotonic_property from (max(max_version, last_gc_version), max_version) to (last_gc_version, max_version). This is the correct invariant because last_gc_version only ever increases: it advances on GC and on reset (where check_delta_status guarantees node_delta.last_gc_version > self.last_gc_version). With the first tuple element tracking last_gc_version independently, a reset always strictly increases it, so max_version can safely be lower (e.g. due to MTU truncation) without violating monotonicity. Closes #178
There was a problem hiding this comment.
Pull request overview
Fixes an assertion panic during delta application by redefining the node “monotonic property” invariant to stay monotonic across reset deltas where max_version can legitimately regress (e.g., MTU truncation), and adds a regression test for the reported race scenario (Issue #178).
Changes:
- Redefine
NodeState::monotonic_property()to return(last_gc_version, max_version)instead of(max(max_version, last_gc_version), max_version). - Improve the monotonicity assertion failure message during
ClusterState::apply_delta. - Add a regression unit test covering reset deltas where
last_gc_versionincreases whilemax_versiondecreases.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
trinity-1686a
approved these changes
Apr 16, 2026
Contributor
trinity-1686a
left a comment
There was a problem hiding this comment.
that's a sneaky one
Comment on lines
197
to
198
| // Apply delta and returns true if a reset was necessary | ||
| fn apply_delta(&mut self, node_delta: NodeDelta, now: Instant) -> DeltaStatus { |
Contributor
There was a problem hiding this comment.
the documentation for this function isn't ideal (not related to this PR, i was just trying to understand when max_version could go down)
4ddca03 to
0e03b8d
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
monotonic_property_after >= monotonic_property_beforeassertion failure inapply_deltamonotonic_property()from(max(max_version, last_gc_version), max_version)to(last_gc_version, max_version)Root Cause
A race condition between concurrent gossip exchanges: between the time a node sends its digest and receives the response delta, another peer may advance its state. The responding peer computes a reset delta from the stale digest, with
last_gc_version == receiver.max_versionbutnode_delta.max_version < receiver.max_version. The old monotonic property's first elementmax(mv, gc)would remain equal whilemax_versionregressed, failing the assertion.Fix
The new monotonic property
(last_gc_version, max_version)is the correct invariant becauselast_gc_versiononly ever increases (on GC and on reset, wherecheck_delta_statusguaranteesnode_delta.last_gc_version > self.last_gc_version). With the first tuple element trackinglast_gc_versionindependently, a reset always strictly increases it, somax_versioncan safely be lower (e.g. due to MTU truncation) without violating monotonicity.Closes #178