[BUG] dlt-receive -r reconnect does not work when server connection hangs (no socket timeout / keepalive) #817

@MedAmb

Description

When running dlt-receive with the -r option (reconnect interval), the tool is expected to reconnect automatically after the connection to the DLT daemon/server is lost.

In practice, -r does not work reliably if the server-side socket is not closed properly (e.g., server crash, network disruption, or any failure mode where the TCP connection does not transition cleanly to a closed/error state from the client perspective). In these scenarios, the client socket can remain in a “connected” state and socket reads can block indefinitely.

Because dlt-receive does not configure any socket read timeout (and also does not appear to rely on TCP keepalive probing / dead peer detection), the receive loop can hang forever waiting for data. As a result, the code path that triggers reconnect is never reached, and dlt-receive never reconnects.

- Expected behavior

If the server becomes unreachable or unresponsive (including cases where the server does not close the socket properly), dlt-receive should eventually detect the stalled connection and reconnect after the interval specified by -r. The reconnect feature should work not only for clean disconnects but also for “half-open” / stalled TCP sessions.
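For illustration, a minimal sketch of what "detecting the stalled connection" could look like at the read level. It waits for readability with `poll()` before calling `recv()`, so an idle or half-open socket produces a distinct timeout result instead of blocking forever. The helper name `recv_with_timeout` and the `RECV_TIMED_OUT` code are hypothetical and are not taken from the dlt-receive sources; the timeout value would be the caller's choice.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical return code for "no data within the timeout". */
enum { RECV_TIMED_OUT = -2 };

/*
 * Hypothetical helper (not part of dlt-daemon).
 * Returns: >0 bytes read, 0 if the peer closed the connection,
 *          -1 on error (errno set), RECV_TIMED_OUT if nothing arrived
 *          within timeout_ms milliseconds.
 */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;

    /* Retry if a signal interrupts poll(); note that each retry
     * restarts the full timeout in this simple sketch. */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready == 0)
        return RECV_TIMED_OUT;   /* stalled or idle connection */
    if (ready < 0)
        return -1;               /* poll() itself failed */

    /* Readable, closed, or in error: recv() reports which one. */
    return recv(fd, buf, len, 0);
}
```

A caller could treat `RECV_TIMED_OUT`, `0`, and `-1` alike: close the socket and enter the reconnect path after the `-r` interval. Whether an idle but healthy connection should also be recycled is a design decision for the maintainers.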

- Actual behavior

dlt-receive can block indefinitely on socket I/O when the server disappears without properly closing the connection. No reconnect attempt occurs, even when -r <msecs> is provided.

- Steps to reproduce (example)

  • Start a DLT daemon/server.

  • Start dlt-receive with reconnect enabled, e.g.: dlt-receive -r 1000 <server-host>

  • Cause the server to become unresponsive in a way that does not cleanly close the TCP session.

  • Observe that dlt-receive hangs waiting for socket activity and does not reconnect.

- Impact

dlt-receive becomes stuck and must be restarted manually before it receives logs again. Automated setups that rely on -r for resilience do not recover from common failure modes (half-open/stalled sockets).

- Suggested direction / possible fixes

  • Add a configurable socket receive timeout so stalled reads eventually fail/timeout and reconnect logic can run.

  • Optionally enable TCP keepalive (and configure keepalive parameters where possible) to let the kernel detect dead peers.

  • Ensure reconnect is triggered not only on explicit disconnect/error, but also when the connection is idle/unresponsive beyond a reasonable threshold (especially when -r is set).
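As a rough sketch of the first two suggestions, the snippet below sets a receive timeout (`SO_RCVTIMEO`) and enables TCP keepalive (`SO_KEEPALIVE`) on an existing TCP socket. The function name `configure_stall_detection` is hypothetical and the keepalive values are illustrative only; neither comes from the dlt-daemon code base. The per-socket keepalive tuning options (`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT`) are not available on every platform, so they are guarded.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not part of dlt-daemon): make a connected or
 * not-yet-connected TCP socket fail stalled reads instead of blocking
 * forever. Returns 0 on success, -1 on error (errno set by setsockopt).
 */
static int configure_stall_detection(int fd, int recv_timeout_sec)
{
    struct timeval tv = { .tv_sec = recv_timeout_sec, .tv_usec = 0 };
    int on = 1;

    /* Blocking reads give up after recv_timeout_sec seconds. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        return -1;

    /* Let the kernel probe an idle connection for a dead peer. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;

#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    /* Illustrative values (assumption): first probe after 10 s idle,
     * then every 5 s, give up after 3 unanswered probes. */
    int idle = 10, interval = 5, count = 3;

    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0)
        return -1;
#endif

    return 0;
}
```

With `SO_RCVTIMEO` set, a blocking `recv()` that times out returns `-1` with `errno` set to `EAGAIN` or `EWOULDBLOCK`, which the receive loop could treat as a trigger for the reconnect path. Keepalive covers the dead-peer case independently of application traffic, so the two mechanisms complement each other.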
