Skip to content

nxd_mqtt_client_disconnect() does not release received, but not yet dequeued, packets leading to eventual exhaustion of the packet pool #384

@jespertragardh

Description

@jespertragardh

Describe the bug
https://github.com/eclipse-threadx/rtos-docs-asciidoc/blob/15b1e5ea0176897f7e321e9bf5e99529ef452f22/rtos-docs/netx-duo/modules/ROOT/pages/netx-duo-mqtt/chapter1.adoc states:

To disconnect and terminate the MQTT client service, the application shall use the service nxd_mqtt_client_disconnect() and nxd_mqtt_client_delete(). Calling nxd_mqtt_client_disconnect() simply disconnects the TCP connection to the broker. It releases messages already received and stored in the receive queue.

However, this does not actually work.

_nxd_mqtt_process_disconnect() in nxd_mqtt_client.c L1932–L1936 calls _nxd_mqtt_release_receive_packet(client_ptr, client_ptr -> message_receive_queue_head, NX_NULL); in a loop. But _nxd_mqtt_release_receive_packet() never calls nx_packet_release(). So the packet chain is dropped from client_ptr -> message_receive_queue_head but remains allocated in the packet pool without any pointers to it from the MQTT code. Effectively, the packet chain is leaked.

Compare with the code in client_ptr -> message_receive_queue_head() that does call nx_packet_release() after removing the packet chain.

So if the client calls disconnect when there are still messages not dequeued with _nxd_mqtt_client_message_get() - e.g. because the client needs to give up and retry the connection - packets will leak. If this is repeated, eventually there will not be any free packets in the pool (depending on the capacity of the pool) and neither will any be released. And the TCP/IP layer will run out of packets and network traffic will stop completely. Effectively deadlocking the system.

This cannot be worked around by nxd_mqtt_client_delete() and creating a new one as the leak is in the packet pool provided to nxd_mqtt_client_create() and processing the delete event also calls the broken _nxd_mqtt_release_receive_packet(). The pool could be deleted and re-created but this requires stopping and re-initializing the TCP/IP-stack.

The problem can be fixed by adding the call to nx_packet_release() at nxd_mqtt_client.c L1095 (after the if-ladder in _nxd_mqtt_release_receive_packet(). Unfortunately, I don't have the resources to submit a PR at the moment.

https://github.com/eclipse-threadx/netxduo/blob/8b6e03ac30ab688bec02c69d42f2304b7f72a202/addons/mqtt/nxd_mqtt_client.c

Expected behavior
Disconnecting and re-connecting should not lead to memory leaks.

Impact
Annoyance as we would prefer to not have to patch the NetXDuo code (obviously, the solution to that is to contribute a PR).

For others this can be a showstopper if they only disconnect in some error scenario which is not frequent and then you get unresponsive devices late in the development cycle. Hard to asses how fast a reasonably competent developer can root cause it to the memory leak.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type
No fields configured for issues without a type.

Projects

Status
Evaluation

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions