Describe the bug
We are seeing intermittent timeouts when calling ServiceBusSessionReceiverClient.acceptNextSession().
After waiting ~245 seconds, the call fails with the following exception:
java.lang.IllegalStateException: Timeout on blocking read for 245600000000 NANOSECONDS (client-timeout)
The call blocks while waiting for a session and eventually throws the exception. This behavior started recently, without any application code changes. CPU usage spikes at around the same time as these exceptions, and the pod is then killed and restarted, which interrupts the application.
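As a quick sanity check, the nanosecond value in the exception message matches the roughly 245 seconds we observed before the failure. The class name `TimeoutCheck` below is only illustrative; the constant is copied from the message.

```java
import java.time.Duration;

public class TimeoutCheck {
    // Value copied verbatim from the exception message.
    static final long TIMEOUT_NANOS = 245_600_000_000L;

    // Wraps the raw nanosecond count in a Duration for easier reading.
    static Duration timeout() {
        return Duration.ofNanos(TIMEOUT_NANOS);
    }

    public static void main(String[] args) {
        Duration d = timeout();
        // Prints the timeout in milliseconds and in ISO-8601 duration form.
        System.out.println(d.toMillis() + " ms (" + d + ")");
    }
}
```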
Exception/Stack Trace
java.lang.IllegalStateException: Timeout on blocking read for 245600000000 NANOSECONDS (client-timeout)
    at com.azure.messaging.servicebus.ServiceBusSessionReceiverClient.lambda$acceptNextSession$2(ServiceBusSessionReceiverClient.java:171)
    at reactor.core.publisher.Mono.lambda$onErrorMap$28(Mono.java:3848)
    at reactor.core.publisher.Mono.lambda$onErrorResume$30(Mono.java:3938)
    at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94)
    at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onError(TracingSubscriber.java:85)
    at reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)
    at reactor.core.publisher.FluxTimeout$TimeoutOtherSubscriber.onError(FluxTimeout.java:342)
    at reactor.core.publisher.Operators.error(Operators.java:198)
    at reactor.core.publisher.MonoErrorSupplied.subscribe(MonoErrorSupplied.java:56)
    at reactor.core.publisher.Mono.subscribe(Mono.java:4576)
    at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:302)
    at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:281)
    at reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:420)
    at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext(TracingSubscriber.java:68)
    at reactor.core.publisher.FluxOnErrorReturn$ReturnSubscriber.onNext(FluxOnErrorReturn.java:162)
    at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.TracingSubscriber.onNext(TracingSubscriber.java:68)
    at reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:270)
    at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:285)
    at io.opentelemetry.javaagent.shaded.instrumentation.reactor.v3_1.ContextPropagationOperator$RunnableWrapper.run(ContextPropagationOperator.java:373)
    at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
    at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
    Suppressed: java.lang.Exception: #block terminated with an error
        at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:104)
        at reactor.core.publisher.Mono.block(Mono.java:1779)
        at com.azure.messaging.servicebus.ServiceBusSessionReceiverClient.acceptNextSession(ServiceBusSessionReceiverClient.java:172)
To Reproduce
Create a Service Bus session pump that reads messages from a topic subscription using sessions.
We see the issue in a Kubernetes pod when there is no traffic on the pump and the sessions are idle.
Code Snippet
protected ServiceBusSessionReceiverClient createServiceBusSessionReceiverClient(String topicName, String subscriptionName) {
    return serviceBusClientBuilder()
        .sessionReceiver()
        .topicName(topicName)
        .subscriptionName(subscriptionName)
        .receiveMode(ServiceBusReceiveMode.PEEK_LOCK)
        .disableAutoComplete()
        .maxAutoLockRenewDuration(Duration.ofMinutes(2))
        .buildClient();
}
ServiceBusSessionReceiverClient.acceptNextSession()
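For context, here is a minimal sketch of the kind of loop that could drive this call while pausing between timeouts instead of retrying immediately. It is not our production code and not a fix for the underlying problem. The `Supplier` stands in for `receiverClient::acceptNextSession` so the sketch has no Azure dependency; `SessionAcceptLoop`, `acceptWithBackoff`, the attempt limit, and the backoff values are all hypothetical.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

public class SessionAcceptLoop {

    /**
     * Calls the supplier until it returns a value or maxAttempts is reached.
     * An IllegalStateException (the type seen in the stack trace) is treated as
     * "no session available yet" and is followed by a pause that doubles each
     * time, so the loop does not spin on repeated timeouts.
     */
    static <T> Optional<T> acceptWithBackoff(Supplier<T> acceptNextSession,
                                             int maxAttempts,
                                             Duration initialBackoff) throws InterruptedException {
        Duration backoff = initialBackoff;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return Optional.ofNullable(acceptNextSession.get());
            } catch (IllegalStateException timeout) {
                if (attempt == maxAttempts) {
                    break; // give up; the caller decides what to do next
                }
                Thread.sleep(backoff.toMillis());
                backoff = backoff.multipliedBy(2);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulated accept call: times out twice, then yields a session id.
        Optional<String> session = acceptWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("Timeout on blocking read (simulated)");
            }
            return "session-1";
        }, 5, Duration.ofMillis(10));
        System.out.println(session.orElse("none") + " after " + calls[0] + " attempts");
    }
}
```

In a real pump the supplier would be the blocking `acceptNextSession()` call itself, and the returned receiver would need to be closed once the session has been processed.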
Expected behavior
We expect acceptNextSession() to handle blocked threads or I/O resources gracefully and not cause CPU spikes.
Setup (please complete the following information):
- OS: Linux (Kubernetes)
- Library/Libraries: com.azure:azure-messaging-servicebus (7.17.12)
- Java version: 21
- App Server/Environment: Tomcat
- Frameworks: Spring Boot
Information Checklist
Kindly make sure that you have added all of the information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.