Skip to content

PINGREQ still going out even when publishes are happening within keep alive period#903

Open
neel-infocusp wants to merge 2 commits into
eclipse-paho:masterfrom
neel-infocusp:fix/unnecessary-ping-on-active-connection
Open

PINGREQ still going out even when publishes are happening within keep alive period#903
neel-infocusp wants to merge 2 commits into
eclipse-paho:masterfrom
neel-infocusp:fix/unnecessary-ping-on-active-connection

Conversation

@neel-infocusp

Copy link
Copy Markdown

@JamesParrott

Copy link
Copy Markdown
Contributor

@PierreF This PR should fail review and be rejected. The original issue (#813) is not an unreasonable feature request, (even though the spec clearly states:

The Client can send PINGREQ at any time, irrespective of the Keep Alive value, and check for a corresponding PINGRESP to determine that the network and the Server are available.

and this PR looks simple on the surface. But there are no tests for it, and it should be closed.

The original issue reasonably points out, that keep alive PINGREQs are unnecessary, when PUBACK or PUBREC responses are being recieved. But this is only true for QoS 1 and 2 messages. This PR affects QoS 0 messages too, and introduces a breaking change for them.

Under this PR, a client making a large write, will delay the keep alive PINGREQ until after it finishes writing all the data to the socket even if nothing's listening on the end of it. This would seem astonishing to me as a user, especially if I've deliberately set that keep alive timeout to a shorter time period than the anticipated upload time.

@PierreF

PierreF commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

I don't understand the fix from the issue description. I'm not sure to understand which bug we try to fix here. With or without this PR, I still have issue #813 using the reproduce method.

The issue #813 is about send pingreq even when there is a flow of PUBLISH / PUBACK. This change only impact the outgoing path (PUBLISH) and slightly change the meaning of _last_msg_out from last message (== MQTT packet) out, to last bytes out.

Could you explain what issue this change fix ?

@PierreF

PierreF commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

I had an issue in my test process (I tested the same version, not the PR). I do have a change with your PR and it fix #813.
It do change a bit the (expected) meaning of _last_msg_out from the expected last MQTT packet sent to last bytes out. But currently on main it's just broken and had no meaning (it's never updated on successful packet write).

@JamesParrott I'm not sure what is broken with QoS 0 ?

I partially agree with "client making a large write, will delay the keep alive PINGREQ". Pingreq will be sent if not packet are received (regardless of _last_msg_out). On condition that trigger a pingreq, only the sending part will delay pingreq due to this change, the receiving part of the condition will still trigger the pingreq. Which is totally wanted, as you said: we do want to emit a pingreq to be sure someone is listenning on the other end.

I think we should make a better change:

  • We should fix the meaning we expect for _last_msg_out. I think meaning an MQTT packet is full sent is the expected meaning
  • Therefore we should only update it as soon as to_process == 0, which will cover the disconnect case
  • But I don't get why we update it when write failed, and strongly think it should be removed.

@JamesParrott

Copy link
Copy Markdown
Contributor

I'm not sure what is broken with QoS 0 ?

My concern is this PR will cause QoS 0 Clients to self disconnect. Clients publishing under QoS 0 don't receive PUBACKs, so pinging shouldn't be "turned off" for them.

I now realise that there is other code by which such a QoS 0 Client will still ping the broker (based on self._last_msg_in being too long ago). But if the broker is delayed, or if the download speed is slow, or for whatever reason if loop_misc gets called again, before the PINGRESP is received, it alls _check_keepalive again, self._last_msg_in has exactly the same value as before (and now it was even longer ago), then as self._ping_t is now !=0 (from the PINGREQ), the code path will hit self._sock_close().

Maybe I'm wrong (that's certainly possible) or maybe the above just can't happen. But this is a potential breaking change with a large impact, and at the very least there should be a test for this QoS 0 scenario (plus in my opinion, stronger justification than the "nice to have" of avoiding unnecessary pings any compliant MQTT Client is are quite entitled to make).

@JamesParrott

Copy link
Copy Markdown
Contributor

OK, I tested this PR a little bit and couldn't create a self disconnect.

What I should've realised before is that when payload uploads are complete, QoS 1 and 2 publishers should receive PUBREC, PUBREL or PUBACK from the broker, which resets self._last_msg_in. QoS 0 publishers that don't receive those packets, have only existing mechanisms to update self._last_msg_in, so eventually should still ping.

It's your call about the risk/value @PierreF . I've thrown a lot at this, and I have to admit this PR's withstood it all. It's unreasonable of me to stand in the way now - I withdraw all my objections.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants