Send multiple pings during large downloads, resetting self._ping_t by JamesParrott · Pull Request #923 · eclipse-paho/paho.mqtt.python

JamesParrott · 2026-06-15T14:48:43Z

Hi everyone, I'm very interested, but am not too invested in this PR - I'm keen to hear everyone's thoughts. All feedback is welcome. First and foremost, I think the intention of the spec to fix this problem (MQTT being a Lightweight messaging protocol) is that clients should pick a large enough keep alive interval, to cover the size of downloads they expect to receive on topics they're subscribed to. However subscribers don't control what others publish, and a broker may require a shorter keepalive than the client requested.

From local testing, on a mini slow network connection (between my dev laptop and a Beaglebone Black over a USB cable in my case), I can't speak for @Uezzi but I've found this PR fixes issue #914.

There are two separate checks that must be passed in loop_misc to avoid the client disconnecting itself (calling self._sock_close).

1), if we are connected, but pingreq(s) have gone out and a ping response has not been received after the most recent one left (self._ping_t == 0), and either no other msg packets have been sent within the keep alive, or none have been received during the keep alive interval. If a pingreq has not gone out recently, this check also sends one.
2) if a Ping Response has not been received within the keep alive interval of the last Ping request going out.

It looks to me like this second one is a straightforward implementation of "If a Client does not receive a PINGRESP packet within a reasonable amount of time after it has sent a PINGREQ, it SHOULD close the Network Connection to the Server." from the spec.

Within the first check however (you may have already spotted it) is an edge case that forms a grey area - what should the client do if a ping req has gone out, no other packets have gone out since during the keep alive interval, but in that time it's been happily downloading the partial msg packets that form another packet.

The server / broker may well have sent a PINGRESP, but currently on a slow network, for large downloads and short keep alive intervals, it is possible that the client does not read it in time, to avoid either self disconnection mechanism above. The connection may be fine, but the PINGRESP may simply be stuck behind too many unread partial packets of a large payload. About 100MB to 250MB can recreate precisely this situation with a 30 second keep alive.

In this situation, before the large, slow download completes, a Paho.mqtt.Python client will disconnect itself. Regardless of the keepalive values they set and/or received, some users many find this astonishing. The spec (v5) does state:

If a Client does not receive a PINGRESP packet within a reasonable amount of time after it has sent a PINGREQ, it SHOULD close the Network Connection to the Server.

The essence of this PR is to change the definition of "reasonable amount of time" from the keep alive interval, to "the maximum of the keep alive interval and however much time is needed to complete the current download" The maximum payload size is ~ 255MB, and as long as the network connection can pass traffic at an average speed v > 0, there is a finite upper bound of ~255 / v. for this time. Furthermore, it is likely not an unreasonable amount of time, in relation to the expectations and experience of users of slow networks.

This PR fixes this simply by altering current behaviour, to ping again, even if a prior ping request has not received a ping response. This also resets self._ping_t avoiding the second check.
The two new situations when (if this PR is accepted) we would now ping again, even when no ping response has been received since the most recent ping req:

i) If the last message out was too long ago (beyond keep alive), but the Client knows other packets have been received. In this case we ping again to avoid the server kicking the client for inactivity, resetting self._ping_t. The check whether the Client should disconnect due to no ping resp being received for the previous pingreq is left to 2) with the latest value of self._ping_t, once no packets have been received for longer than self._keepalive.
ii) if the last packet in was too long ago (beyond keep alive). A ping resp is a normal packet, so has not been received recently enough either. But this case will also be (or would have already been) handled by 2).

pswsm · 2026-06-16T12:32:17Z

After reading the MQTT specification, the part relating to the Keep Alive, this is still in-spec IMO. As you've stated, the spec says "SHOULD", and this is a valid reason to skip this recommendation, I think. I don't feel like I'm the most qualified guy to make the final decision, though.

PierreF

I've re-read the MQTT spec and agree with your conclusion:

The client must send Ping request every keep-alive interval if not sending MQTT packet (we already did this, and continue to do this)
The client should disconnect after reasonable delay after sending pingreq if pingresp isn't received.
- If data is actively receive, I do believe the reasonable delay could be multiple minutes

Maybe a infinite delay is a bit too much (well depend on how slow the download is, but at 1400 bytes every 10 seconds, 255 MiB is 22 days ;)), but I think have a secondary delay (ping non-responded pingresp) and secondary settings (max keep-alive delay during large download) feels like too complex but this corner case.

PierreF · 2026-06-16T19:50:47Z

 on:
  pull_request:
-    branches: [master]
+    branches: [master, Do_not_self_disconnect_during_large_download]


Suggested change

branches: [master, Do_not_self_disconnect_during_large_download]

branches: [master]

(apply to other similar changes)

I'm not sure what the intend.

it's shouldn't work (i.e. shouldn't change CI behavior on this PR), or else there is a security issue with the CI

it's shouldn't be necessary to run CI jobs on PR. AFAIK they run because this is a PR targeting the branch master (and it's not your very first contribution)

If this change is really needed, we can't hard-code the same of a branch

Apologies. I should've reverted that branch name. Is it possible to relax the branches: [master] everywhere, so we can work on other branches on our forks?

PierreF · 2026-06-16T21:35:56Z

I think following test should pass but don't... it might a bit a "test" issue, but unsure.

Your PR rely on _last_msg_in being updated "soon" after any bytes received on the socket, however during large download, it only get updated once 100 retry were done, and its much more likely to reach the BlockingIO error (only way to not reach it, is the network send data fast enough to always fill the socket buffer, but slow enough to hit the download-too-large issue).

I do think we should review a bit when _last_msg_in is updated... and maybe renamed it (or add new tracker) _last_bytes_received. It would be more clear that is means "any bytes received from network". _last_msg_in sound more like "last MQTT packet received" - which is not what we need for this PR.

The test I've try to write (I hope it's correct, but can't guarantee):

class TestLargeDownload:
    def test_no_disconnect_during_large_download(self, fake_broker: FakeBroker) -> None:
        mqttc = client.Client(
            CallbackAPIVersion.VERSION2,
            "test_no_disconnect_during_large_download",
            transport=fake_broker.transport,
        )

        payload_data = b"very Very Big big payload"
        is_connected = threading.Event()
        is_subscribed = threading.Event()
        publish_received = threading.Event()

        def on_connect(mqttc, obj, flags, rc, properties):
            assert rc == 0
            is_connected.set()

        def on_subscribe(cl, userdata, mid, reason_code_list, properties):
            is_subscribed.set()
        
        def on_message(cl, userdata, message):
            assert isinstance(cl, client.Client)
            assert isinstance(message, client.MQTTMessage)
            assert message.payload == payload_data
            publish_received.set()

        mqttc.on_connect = on_connect
        mqttc.on_subscribe = on_subscribe
        mqttc.on_message = on_message

        mqttc.enable_logger()
        mqttc.connect("localhost", fake_broker.port, keepalive=1)
        mqttc.loop_start()

        try:
            fake_broker.start()

            connect_packet = paho_test.gen_connect(
                "test_no_disconnect_during_large_download", keepalive=1)
            fake_broker.expect_packet("connect", connect_packet)

            connack_packet = paho_test.gen_connack(rc=0)
            count = fake_broker.send_packet(connack_packet)
            assert count == len(connack_packet)

            is_connected.wait(timeout=5)
            rc, mid = mqttc.subscribe("topic")
            assert rc == MQTTErrorCode.MQTT_ERR_SUCCESS
            assert mid is not None

            subscribe_packet = paho_test.gen_subscribe(mid, "topic", 0)
            fake_broker.expect_packet("subscribe", subscribe_packet)

            suback_packet = paho_test.gen_suback(2, 0)
            count = fake_broker.send_packet(suback_packet)
            assert count == len(suback_packet)

            is_subscribed.wait(timeout=5)

            publish_packet = paho_test.gen_publish("topic", 0, payload_data, mid=3)
            assert len(publish_packet) >= 32
            publish_packet_1 = publish_packet[:5] # Enough to cover the packet type
            publish_packet_2 = publish_packet[5:10]
            publish_packet_3 = publish_packet[10:15]
            publish_packet_4 = publish_packet[15:20]
            publish_packet_5 = publish_packet[20:25]
            publish_packet_6 = publish_packet[25:]

            count = fake_broker.send_packet(publish_packet_1)
            assert count == len(publish_packet_1)
            time.sleep(0.8)

            count = fake_broker.send_packet(publish_packet_2)
            assert count == len(publish_packet_2)
            time.sleep(0.8)

            # 1.6 seconds elapsed, client MUST have sent pingreq
            ping_req_packet = paho_test.gen_pingreq()
            fake_broker.expect_packet("pingreq", ping_req_packet)

            count = fake_broker.send_packet(publish_packet_3)
            assert count == len(publish_packet_3)
            time.sleep(0.8)

            count = fake_broker.send_packet(publish_packet_4)
            assert count == len(publish_packet_4)
            time.sleep(0.8)

            # Another 1.6 seconds elapsed, client MUST have sent another pingreq
            ping_req_packet = paho_test.gen_pingreq()
            fake_broker.expect_packet("2nd pingreq", ping_req_packet)

            count = fake_broker.send_packet(publish_packet_5)
            assert count == len(publish_packet_5)
            time.sleep(0.8)

            count = fake_broker.send_packet(publish_packet_6)
            assert count == len(publish_packet_6)

            publish_received.wait(timeout=5)
        finally:
            mqttc.loop_stop()
            fake_broker.finish()

Don't disconnect. Send multiple pings during large downloads. Only disconn if _last_mg_in outside keep alive Don't close socket if _last_mg_in was within keep alive (don't rely on _ping_t only) Send out pingreq, do not disconnect, if downloading, or pingreq overdue. REvert loop misc but make it only disconnect on _last_msg_in, not _ping_t Run workflows on this branch This branch, not the other Disconnect if no ping response, but if downloading ping again, resetting _ping_t Run workflows on this branch Only re-ping if we are downloading

JamesParrott closed this Jun 15, 2026

JamesParrott reopened this Jun 15, 2026

PierreF requested changes Jun 16, 2026

View reviewed changes

JamesParrott force-pushed the Do_not_self_disconnect_during_large_download branch from 4d2d15b to 6d46856 Compare June 17, 2026 09:09

JamesParrott force-pushed the Do_not_self_disconnect_during_large_download branch from 6d46856 to 731607f Compare June 17, 2026 09:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Send multiple pings during large downloads, resetting self._ping_t#923

Send multiple pings during large downloads, resetting self._ping_t#923
JamesParrott wants to merge 1 commit into
eclipse-paho:masterfrom
JamesParrott:Do_not_self_disconnect_during_large_download

JamesParrott commented Jun 15, 2026 •

edited

Loading

Uh oh!

pswsm commented Jun 16, 2026

Uh oh!

PierreF left a comment

Uh oh!

PierreF Jun 16, 2026

Uh oh!

JamesParrott Jun 17, 2026

Uh oh!

PierreF commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	branches: [master, Do_not_self_disconnect_during_large_download]
	branches: [master]

Conversation

JamesParrott commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pswsm commented Jun 16, 2026

Uh oh!

PierreF left a comment

Choose a reason for hiding this comment

Uh oh!

PierreF Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

JamesParrott Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

PierreF commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

JamesParrott commented Jun 15, 2026 •

edited

Loading