Skip to content

Replace sync-over-async Task.Delay().GetAwaiter().GetResult() with Thread.Sleep in sync pumps #4073

@thomhurst

Description

@thomhurst

Summary

Several sync code paths use Task.Delay(x).GetAwaiter().GetResult() (or .Wait()) to pause a dedicated thread. On a dedicated, sync-only thread this is strictly worse than Thread.Sleep(x): it allocates a Task, registers a timer on the TimerQueue, parks on a ManualResetEventSlim, and takes a HandleNonSuccessAndDebuggerNotification roundtrip to do what Thread.Sleep does with a single OS primitive.

Replacing them would:

  • Remove unnecessary allocations and TimerQueue registrations from hot pump loops.
  • Eliminate the TimerQueue as a variable when diagnosing pump-shutdown hangs (see context below).
  • Match the honest intent of these sites — they explicitly block a dedicated sync thread; there is nothing to yield to.

Call sites

All of these run on dedicated LongRunning threads or the Dispatcher control thread. None are on an async code path:

  1. src/Paramore.Brighter.ServiceActivator/Reactor.cs:123Task.Delay(ChannelFailureDelay).GetAwaiter().GetResult() (broken-circuit retry)
  2. src/Paramore.Brighter.ServiceActivator/Reactor.cs:131Task.Delay(ChannelFailureDelay).GetAwaiter().GetResult() (channel-failure retry)
  3. src/Paramore.Brighter.ServiceActivator/Reactor.cs:155Task.Delay(EmptyChannelDelay).GetAwaiter().GetResult() (empty-channel pause)
  4. src/Paramore.Brighter.ServiceActivator/Dispatcher.cs:348-350Task.Delay(100).GetAwaiter().GetResult() (Dispatcher.Start spin-wait for DS_RUNNING)
  5. src/Paramore.Brighter.MessagingGateway.RMQ.Sync/RmqMessageGatewayConnectionPool.cs:151Task.Delay(jitter.Next(5, 100)).Wait() (connect-retry jitter)
  6. src/Paramore.Brighter.MessagingGateway.RMQ.Sync/PullConsumer.cs:78Task.Delay(pause).Wait() (pull-consumer pause)

Proposed change

Replace each with Thread.Sleep(delay).

Do not touch Proactor.cs or any other await Task.Delay(...) in genuinely async paths — those correctly yield their pump thread back to the BrighterAsyncContext.

Context

During investigation of a CI-only test hang (When_A_Message_Dispatcher_Shuts_A_Connection, 15-minute timeout before hangdump), the Reactor pumps were observed parked in Task.InternalWait(Timeout.Infinite) on a DelayPromise at Reactor.cs:155 for the full timeout window. TimerQueue thread was alive, threadpool had idle workers, and no lock contention was visible in the dump. We cannot inspect the Task object directly (the Linux coredump's GC heap is non-walkable), so TimerQueue pathology cannot be ruled out.

Even if TimerQueue is not the cause, these sites have no reason to use async plumbing. Thread.Sleep is the right primitive here and may also help narrow down the hang investigation.

Acceptance

  • All six sites use Thread.Sleep(delay) instead of Task.Delay(delay).GetAwaiter().GetResult() / .Wait().
  • No new Task.Delay().GetAwaiter().GetResult() or .Wait() calls introduced elsewhere in sync code.
  • Proactor.cs and async paths unchanged.
  • Existing tests pass.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions