fix(event cache): don't deadlock when event-focused cache and redecryption are invoked#6542
Merged
jmartinesp merged 1 commit intomatrix-org:mainfrom May 6, 2026
Conversation
…ption are invoked The following ABBA situation would happen before this fix: - an event-focused cache starts pagination (holding onto the event-focused inner lock), and then takes the event cache room state lock (while saving the events it's received in the pagination). - redecryption takes the event cache room state lock, to update internal room events; then it would try to go through all the event-focused caches to replace their UTDs (thus taking all the inner locks, in turn). So, having an event-focused cache paginating while the redecryption happens would cause a deadlock. The fix consists in changing the order of operations when re-decrypting: instead of holding onto the event cache room state lock all the time, first perform all the replacements necessary in the room cache, *and* clone the event-focused / pinned-event locks; then, replace UTDs in those (after the event cache room state lock has been released).
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6542 +/- ##
==========================================
+ Coverage 89.92% 89.97% +0.04%
==========================================
Files 381 381
Lines 105914 105989 +75
Branches 105914 105989 +75
==========================================
+ Hits 95243 95362 +119
+ Misses 7029 6994 -35
+ Partials 3642 3633 -9 ☔ View full report in Codecov by Sentry. |
Contributor
|
Thanks, I'll test it on Element X Android, but your explanation matches what I saw when trying to reproduce the issue, so I have high hopes! I wasn't able to reproduce with my previous setup, which was almost guaranteed to make the issue happen, so I think it's fixed! |
jmartinesp
approved these changes
May 6, 2026
Contributor
jmartinesp
left a comment
There was a problem hiding this comment.
LGTM, but a 2nd review from someone more familiar with r2d2 or the event caches would be appreciated!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The following ABBA situation would happen before this fix:
So, having an event-focused cache paginating while the redecryption happens would cause a deadlock. The fix consists in changing the order of operations when re-decrypting: instead of holding onto the event cache room state lock all the time, first perform all the replacements necessary in the room cache, and clone the event-focused / pinned-event locks; then, replace UTDs in those (after the event cache room state lock has been released).
(Likely) fixes #6524.
cc @jmartinesp @poljar
Signed-off-by: benjamin@bouvier.cc