Skip to content

Remove the lock around the ClientsMap in the SubscriptionManager#2821

Merged
gefjon merged 5 commits into
masterfrom
phoebe/clients-map-two-copies
Jun 6, 2025
Merged

Remove the lock around the ClientsMap in the SubscriptionManager#2821
gefjon merged 5 commits into
masterfrom
phoebe/clients-map-two-copies

Conversation

@gefjon

@gefjon gefjon commented Jun 2, 2025

Copy link
Copy Markdown
Contributor

Description of Changes

We were seeing a deadlock where at least one Tokio task was blocked on this lock. @jsdt had some hypotheses about the specific Tokio behaviors that were causing this, but the important thing is that the lock is not necessary at all, and so we can remove it without delving into the dark depths of the Tokio scheduler.

As part of this change, I split the send worker into being its own struct. That struct now holds its own copy of the clients map. This also simplifies the code that cleans up the send queue length metric.

Closes #2820 .

API and ABI breaking changes

N/a

Expected complexity level and risk

3: I may have missed a place where the client map is updated, which needs to notify the SendWorker.

Testing

  • @kim to test with replication-proto.
  • Bot test to see if we hit the deadlock again.

We were seeing a deadlock where at least one Tokio task was blocked on this lock.
@jsdt had some hypotheses about the specific Tokio behaviors that were causing this,
but the important thing is that the lock is not necessary at all,
and so we can remove it without delving into the dark depths of the Tokio scheduler.

As part of this change, I split the send worker into being its own struct.
That struct now holds its own copy of the clients map.
This also simplifies the code that cleans up the send queue length metric.
@gefjon gefjon requested review from joshua-spacetime and jsdt June 2, 2025 17:52

@Centril Centril left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments, mostly requests for more docs, and otherwise LGTM.

Comment thread crates/core/src/subscription/module_subscription_manager.rs
Comment thread crates/core/src/subscription/module_subscription_manager.rs Outdated
Comment thread crates/core/src/subscription/module_subscription_manager.rs
Comment thread crates/core/src/subscription/module_subscription_manager.rs
Comment thread crates/core/src/subscription/module_subscription_manager.rs Outdated
@gefjon

gefjon commented Jun 3, 2025

Copy link
Copy Markdown
Contributor Author

Kim reports out-of-band that in his testing, he could not induce a deadlock with this branch.

@Centril Centril left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@jsdt jsdt left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets rid of the lock, but I think the send worker still has a race condition, which I explained in a comment.

Comment thread crates/core/src/subscription/module_subscription_manager.rs
Comment thread crates/core/src/subscription/module_subscription_manager.rs
@bfops bfops added the release-any Can land in any release window. Will not block a release deployment. label Jun 3, 2025
@gefjon gefjon enabled auto-merge June 6, 2025 20:34
@gefjon gefjon added this pull request to the merge queue Jun 6, 2025
Merged via the queue into master with commit c99dc82 Jun 6, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-any Can land in any release window. Will not block a release deployment.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Avoid taking locks in send_worker

4 participants