All connections stop processing events at the same time #1292

@morland96

Hi guys!
I've recently run into an issue where all my connections enter a strange state at almost the same time.

So basically, I have a custom "telemetry" packet, a little under 1 KB in size, that is sent every 5 seconds. After running for a couple of hours, all of these "telemetry" events start timing out. I double-checked the server side and can confirm that none of those events were received (I log at the first line of my event handler).

The strangest part is that while these events keep timing out, the Socket.IO ping/pong packets are still being exchanged as normal. The log looks like this:

2023-12-31 03:15:51,394 - INFO - Sending packet MESSAGE data 2/node,1659["report_telemetry","{\"system\": {\"current_config_tag\": \"2afa24ceba44e4ac\", \"cpu_
2023-12-31 03:15:51,469 - INFO - Received packet MESSAGE data 3/node,1659[{"return_code":0,"ts":"2023-12-30T19:15:51.380215","message":"Telemetry resolved"}]  
2023-12-31 03:15:56,494 - INFO - Sending packet MESSAGE data 2/node,1660["report_telemetry","{\"system\": {\"current_config_tag\": \"2afa24ceba44e4ac\", \"cpu_
2023-12-31 03:15:56,570 - INFO - Received packet MESSAGE data 3/node,1660[{"return_code":0,"ts":"2023-12-30T19:15:56.480691","message":"Telemetry resolved"}]  
2023-12-31 03:15:58,958 - INFO - Received packet PING data                                                                                                     
2023-12-31 03:15:58,958 - INFO - Sending packet PONG data                                                                                                      
2023-12-31 03:16:01,598 - INFO - Sending packet MESSAGE data 2/node,1661["report_telemetry","{\"system\": {\"current_config_tag\": \"2afa24ceba44e4ac\", \"cpu_
2023-12-31 03:16:01,674 - INFO - Received packet MESSAGE data 3/node,1661[{"return_code":0,"ts":"2023-12-30T19:16:01.584726","message":"Telemetry resolved"}]  
2023-12-31 03:16:06,690 - INFO - Sending packet MESSAGE data 2/node,1662["report_telemetry","{\"system\": {\"current_config_tag\": \"2afa24ceba44e4ac\", \"cpu_
2023-12-31 03:16:26,725 - INFO - Sending packet MESSAGE data 2/node,1663["report_telemetry","{\"system\": {\"current_config_tag\": \"2afa24ceba44e4ac\", \"cpu_
2023-12-31 03:16:46,767 - INFO - Sending packet MESSAGE data 2/node,1664["report_telemetry","{\"system\": {\"current_config_tag\": \"2afa24ceba44e4ac\", \"cpu_
2023-12-31 03:16:59,013 - INFO - Received packet PING data                                                                                                     
2023-12-31 03:16:59,022 - INFO - Sending packet PONG data                                                                                                      
2023-12-31 03:17:06,802 - INFO - Sending packet MESSAGE data 2/node,1665["report_telemetry","{\"system\": {\"current_config_tag\": \"2afa24ceba44e4ac\", \"cpu_
2023-12-31 03:17:26,821 - INFO - Sending packet MESSAGE data 2/node,1666["report_telemetry","{\"system\": {\"current_config_tag\": \"2afa24ceba44e4ac\", \"cpu_

As you can see, the server stops sending ACKs from 2023-12-31 03:16:06,690 onward, and I can confirm that the event handler is not being triggered. Why does this happen?
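To make the point where ACKs stop easier to spot in longer logs, here is a minimal sketch that pairs each sent event with its ACK. It assumes the client log format shown above, where the payload starts with the Socket.IO packet type (2 = EVENT, 3 = ACK), followed by the namespace and the numeric ack id. The names `PACKET_RE` and `unacknowledged_ids` are made up for this illustration and are not part of python-socketio, and the sample lines are abbreviated copies of the log above.

```python
import re

# Matches the Socket.IO payload as logged by the client: packet type digit,
# namespace, then the numeric ack id, e.g. "2/node,1659[" or "3/node,1659[".
PACKET_RE = re.compile(
    r"(?P<direction>Sending|Received) packet MESSAGE data "
    r"(?P<type>\d)(?P<namespace>/[^,]*),(?P<ack_id>\d+)\["
)

EVENT, ACK = "2", "3"  # Socket.IO packet types: 2 = EVENT, 3 = ACK


def unacknowledged_ids(log_lines):
    """Return the ids of sent events that never got an ACK, in send order."""
    sent = []
    acked = set()
    for line in log_lines:
        match = PACKET_RE.search(line)
        if match is None:
            continue  # PING/PONG and unrelated lines carry no ack id
        key = (match["namespace"], int(match["ack_id"]))
        if match["direction"] == "Sending" and match["type"] == EVENT:
            sent.append(key)
        elif match["direction"] == "Received" and match["type"] == ACK:
            acked.add(key)
    return [ack_id for namespace, ack_id in sent if (namespace, ack_id) not in acked]


# Abbreviated lines from the log above (payloads shortened for illustration).
sample = [
    '2023-12-31 03:16:01,598 - INFO - Sending packet MESSAGE data 2/node,1661["report_telemetry", ...',
    '2023-12-31 03:16:01,674 - INFO - Received packet MESSAGE data 3/node,1661[{"return_code":0, ...',
    '2023-12-31 03:16:06,690 - INFO - Sending packet MESSAGE data 2/node,1662["report_telemetry", ...',
    '2023-12-31 03:16:26,725 - INFO - Sending packet MESSAGE data 2/node,1663["report_telemetry", ...',
    '2023-12-31 03:16:59,013 - INFO - Received packet PING data',
]
missing = unacknowledged_ids(sample)
print(missing)
```

Run against the full excerpt above, this kind of check would list every id from 1662 onward, which matches the point where the server stops answering.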

More information about this issue:

  • I have an nginx proxy, but I also tried letting the client talk to the server directly. Same result.
  • All clients (roughly 10) are supposed to be long-lived. This worked for a really long time (about a month) but started failing recently.
  • It happens every day, and for all clients at almost the same time (within a few seconds).
  • All clients are forced to use the WebSocket transport directly, rather than starting with polling and then upgrading.
  • I have changed the ping interval to 60s.
  • I'm actually using flask-socketio, but I don't think this is related to flask-socketio itself.
  • The backend is served via uwsgi with gevent; the config looks like this:
[uwsgi]
strict = true
master = true
enable-threads = true
vacuum = true                          ; Delete sockets during shutdown
single-interpreter = true
die-on-term = true                     ; Shutdown when receiving SIGTERM (default is respawn)
need-app = true

wsgi-file = wsgi.py
callable = app
http-websockets = true
gevent = 1024

disable-logging = true
log-4xx = true
log-5xx = true

Versions:
flask-socketio==5.2.0
python-socketio[client]==5.9.1
python-engineio==4.7.1
