
Race condition: assertion failure in stream_engine_base.cpp: out_event() when _io_error is set #4841

@p4l1ly


Problem:
With ZMQ_HEARTBEAT_IVL and ZMQ_HEARTBEAT_TIMEOUT enabled (e.g. ROUTER/DEALER over TCP), the process can crash with:

Assertion failed: !_io_error (src/stream_engine_base.cpp:316)

Cause: in_event_internal() sets _io_error = true and removes the fd from the poll set when the receive pipe hits backpressure (e.g. RCVHWM) or on other input-stop paths. The I/O thread’s poller can still deliver a POLLOUT and call out_event() before the engine is torn down. out_event() asserts !_io_error, so the process aborts. This is a race between teardown and a stale/speculative out_event callback.

Reproduction is more likely when the application stops reading from the socket (e.g. under load or in a “stuck” state): the receive pipe fills, backpressure sets _input_stopped then _io_error, and the poller may still invoke out_event().
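To make the sequence above concrete, here is a minimal toy model of it. This is only a sketch under assumptions: ToyEngine, run_stalled_reader and the field names are hypothetical and are not libzmq's real classes; the model just follows the order described above (pipe fills, input is stopped, then the error flag is set and the fd leaves the poll set).

```cpp
#include <cstddef>
#include <deque>

// Toy model only: names echo the issue text, not libzmq's real classes.
struct ToyEngine {
    std::size_t rcvhwm;           // stand-in for ZMQ_RCVHWM
    std::deque<int> recv_pipe;    // messages the application has not read yet
    bool input_stopped = false;   // set when the pipe refuses a message
    bool io_error = false;        // set on the input-stop path
    bool fd_polled = true;        // whether the fd is still in the poll set

    explicit ToyEngine(std::size_t hwm) : rcvhwm(hwm) {}

    // Called once per "socket readable" event carrying one decoded message.
    void in_event(int msg) {
        if (input_stopped) {
            // Input-stop path: drop the fd from the poll set, flag the error.
            fd_polled = false;
            io_error = true;
            return;
        }
        if (recv_pipe.size() >= rcvhwm) {
            input_stopped = true; // backpressure: pipe is at the high-water mark
            return;
        }
        recv_pipe.push_back(msg);
    }
};

// Feed `count` readable events while the application never calls recv.
inline ToyEngine run_stalled_reader(std::size_t hwm, int count) {
    ToyEngine e(hwm);
    for (int i = 0; i < count; ++i)
        e.in_event(i);
    return e;
}
```

With a high-water mark of 2, run_stalled_reader(2, 4) ends with io_error set and fd_polled cleared, which is the state in which a late out_event() would trip the assertion.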

Solution:
In stream_engine_base.cpp, in out_event(), replace the assert with an early return so that when _io_error is already set we no-op and let teardown proceed:

void zmq::stream_engine_base_t::out_event ()
{
    if (_io_error)
        return;
    // ... rest unchanged
}

The line zmq_assert (!_io_error); is removed. Whenever _io_error is true, the correct behavior is to skip the rest of out_event(); the assert encoded an invariant that this race violates.
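The effect of the guard can be checked in isolation with a small stand-in. This is a sketch, not the libzmq implementation: ToyStreamEngine, ToyEvent and dispatch are hypothetical names, and the dispatcher simply assumes that an output event can still be delivered after the error flag was set.

```cpp
#include <vector>

// Toy stand-in for the engine; only the flag and the guard matter here.
struct ToyStreamEngine {
    bool io_error = false;
    int writes = 0;   // how many times out_event() actually did output work

    void out_event() {
        if (io_error)
            return;   // stale callback after an I/O error: do nothing
        ++writes;     // placeholder for encoding and sending data
    }
};

enum class ToyEvent { Out, InputStop };

// Hypothetical dispatcher: delivers an already-collected batch in order,
// so an Out event may arrive after the error flag has been set.
inline int dispatch(ToyStreamEngine &e, const std::vector<ToyEvent> &batch) {
    for (ToyEvent ev : batch) {
        if (ev == ToyEvent::InputStop)
            e.io_error = true;   // models the input-stop path
        else
            e.out_event();
    }
    return e.writes;
}
```

For the batch {Out, InputStop, Out, Out} only the first Out does any work; the two late ones return immediately instead of aborting the process.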

Environment:

Steps to reproduce:
See #4364 (PUB/SUB with small heartbeat timeout). In our case: ROUTER with ZMQ_HEARTBEAT_IVL and ZMQ_HEARTBEAT_TIMEOUT set; multiple DEALER clients; stop calling recv on the ROUTER for several seconds (simulating overload). The receive pipe hits HWM, backpressure triggers the path that sets _io_error, and the assertion in out_event() can fire.

Expected result:
No crash. When _io_error is set, out_event() should return immediately; teardown continues without aborting.
