Resilience to RabbitMQ cluster outage/behavior #465

PaulKVCare · 2026-04-17T11:22:04Z

We are currently evaluating SMB as the messaging framework for an application. We use a 3-node RabbitMQ cluster with quorum queues.

Basic behavior seems fine for our application, although we haven't yet tested with the 100+ exchanges/queues that we intend to use.

Our main focus right now is on how SMB behaves when the RabbitMQ cluster (with HAProxy in front) has issues, which effectively comes down to two scenarios:

One node goes down
Partial split-brain between 2 of the 3 nodes

We want recovery preferably within seconds, since we use the messages for operational flow control and delays of more than 5 seconds are directly visible to our users.

The partial split-brain scenario seems to behave well, which is surprising since other messaging frameworks seem to give a whole lot of trouble there.
However, when a node goes down we experience all sorts of problems. Detection of the disconnect is almost instant, and with the default behavior the connection recovers (to a different node) after a few seconds.
However, publishers hang and never seem to recover. Using a timeout on the publish prevents the hang, but after that publishes seem 'flaky'.
Some consumers don't recover at all: message consumption simply stops. Running multiple instances of the consumers partly solves this (it actually only masks the issue), since the next disconnect will likely block the remaining consumers as well.
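
For reference, the publish timeout workaround mentioned above can be sketched roughly as follows. This is a minimal, transport-agnostic illustration in Python, not SMB or RabbitMQ client code: `publish_with_timeout` and the `publish` callable it wraps are hypothetical names, and the timeout, attempt count and backoff values are assumptions picked only for the example.

```python
import asyncio


async def publish_with_timeout(publish, message, timeout_s=2.0, attempts=3, backoff_s=0.1):
    """Await the hypothetical async callable `publish(message)` with a per-attempt timeout.

    Retries on timeout or connection error, waiting a little longer after each
    failed attempt. Returns the number of the attempt that succeeded, or
    re-raises the last error once `attempts` is exhausted.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # wait_for cancels the pending publish if it exceeds the timeout,
            # so a hung publish cannot block the caller indefinitely.
            await asyncio.wait_for(publish(message), timeout=timeout_s)
            return attempt
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts:
                # Linear backoff gives the connection time to recover.
                await asyncio.sleep(backoff_s * attempt)
    raise last_error
```

Note that a publish that times out may still have reached the broker, so a retry like this can produce duplicates; it only makes sense if the consumers tolerate receiving the same message twice.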

Are there any suggestions for what we could try to improve the behavior?

Best regards,

Paul

zarusz (Maintainer) · 2026-04-19T10:38:43Z

Hey, thanks for raising this scenario. I would suggest using the latest version, 3.4.0 (there were a few improvements to RMQ, and we've moved to the latest RMQ client). My expectation is that the consumers should resume. Are you using an outbox plugin or the health check circuit breaker plugin? They might play a role.

At the end of the day, turning on INFO/DEBUG logging for the SlimMessageBus.Host.* loggers should help.
SMB uses the MS Extensions Logging facade, which is picked up from MSDI.
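
As a rough illustration of the logging suggestion: assuming the host application reads its logging configuration from a standard appsettings.json (an assumption, since the thread does not say how the app is configured), the level for the SlimMessageBus.Host category prefix could be raised like this. The "Default" level shown is just a placeholder.

```json
{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "SlimMessageBus.Host": "Debug"
    }
  }
}
```

Microsoft.Extensions.Logging matches categories by prefix, so this entry should also apply to loggers under SlimMessageBus.Host.*.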

I'm curious whether the issue is isolated to SMB or to the RMQ client. We're on the latest version as of this time.

Let me know how I could help here.

It would help if you could share the logs or a sample. I'm also happy to take in any PRs for a load/resiliency test that others would benefit from.
