Improve websocket stability, timeouts and logging #83

Open
pyroch wants to merge 1 commit into olijeffers0n:master from pyroch:master


@pyroch pyroch commented Apr 7, 2026

This PR improves websocket reliability and reduces resource usage.

  • Added websocket keepalive:
      • ping_interval=20
      • ping_timeout=10
  • Added a recv timeout (30s) to prevent hanging connections
  • Handle asyncio.TimeoutError and close stale connections
  • Handle ConnectionClosedOK (normal close)
  • Reduced logging overhead:
      • replaced logger.exception with logger.error
      • removed unnecessary traceback logging
  • Simplified connection error handling
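
For readers skimming the thread, here is a minimal sketch of the recv-timeout pattern listed above, using only `asyncio`. It is not the PR's actual diff: `StubConnection` and `read_until_stale` are hypothetical names, the stub stands in for a real websocket, and the timeout is shortened from the PR's 30 seconds so the sketch finishes quickly.

```python
import asyncio

RECV_TIMEOUT = 0.05  # the PR uses 30 seconds; shortened here so the sketch runs quickly


class StubConnection:
    """Hypothetical stand-in for a websocket: yields queued frames, then goes silent."""

    def __init__(self, frames):
        self._frames = list(frames)
        self.closed = False

    async def recv(self):
        if self._frames:
            return self._frames.pop(0)
        await asyncio.Event().wait()  # simulate a peer that never sends again

    async def close(self):
        self.closed = True


async def read_until_stale(connection):
    """Collect frames until recv() exceeds RECV_TIMEOUT, then close the connection."""
    received = []
    while True:
        try:
            data = await asyncio.wait_for(connection.recv(), timeout=RECV_TIMEOUT)
        except asyncio.TimeoutError:
            await connection.close()  # silence is treated as a stale connection
            return received
        received.append(data)


stub = StubConnection([b"a", b"b"])
frames = asyncio.run(read_until_stale(stub))
```

Without the `asyncio.wait_for` wrapper, the third `recv()` call in this sketch would wait forever, which is the hang the description refers to.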

Why?
Between Rust server restarts, the websocket emits a huge volume of log messages (even when they are suppressed in the logger, producing them still costs CPU).
This causes large CPU spikes while waiting for the Rust server to restart, which usually happens every day.
This fix resolves that behaviour.

Previously:

  • websocket could hang indefinitely on recv()
  • dead connections were not detected
  • excessive traceback logging increased CPU usage

Now:

  • stale connections are detected and closed
  • keepalive ensures connection health
  • logging is lighter and cleaner
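
To illustrate the logging change, the sketch below shows the difference between the two standard-library calls: `logger.exception` appends the formatted traceback to the record, while `logger.error` logs only the message. The logger names, the `capture` helper, and the simulated `ConnectionError` are assumptions made for the example, not code from this PR.

```python
import io
import logging


def capture(log_call_name: str) -> str:
    """Log a caught exception with the named Logger method and return the output."""
    stream = io.StringIO()
    logger = logging.getLogger(f"demo.{log_call_name}")  # hypothetical logger name
    logger.propagate = False  # keep the demo output out of the root logger
    logger.addHandler(logging.StreamHandler(stream))
    try:
        raise ConnectionError("server restarting")  # simulated failure
    except ConnectionError as exc:
        getattr(logger, log_call_name)("websocket error: %s", exc)
    return stream.getvalue()


with_traceback = capture("exception")  # message plus multi-line traceback
without_traceback = capture("error")   # message only
```

How much CPU the traceback formatting costs in practice depends on how often the error fires; the py-spy numbers in this description are the author's own measurement.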

I tested this on my server with py-spy: about 4400 samples without the fix versus about 40 samples with it.
The Grafana graph also shows that CPU usage no longer rises between Rust server restarts.
[image: Grafana CPU usage graph]

No breaking changes.

@olijeffers0n
Owner

Have you noticed issues from sending ping packets? I believe I explicitly disabled this in the past due to it causing issues where the server wouldn't actually reply.

app_message.parse(data)

except ConnectionClosedError as e:
except asyncio.TimeoutError:
Owner


If you are only waiting 30 seconds for a packet, but you just are not sending anything at that current moment, that is going to cause you to cycle through connections super fast despite not actually needing to, surely?
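
The concern above can be reproduced with a small simulation. This is a sketch under stated assumptions: `IdleConnection` and `count_reconnects` are hypothetical names, the connection is healthy but has nothing to send, and the timeout is scaled down from 30 seconds so the demo runs quickly.

```python
import asyncio

RECV_TIMEOUT = 0.05  # stands in for the PR's 30 s, scaled down for the demo


class IdleConnection:
    """Hypothetical healthy connection on which the server has nothing to send."""

    async def recv(self) -> bytes:
        await asyncio.Event().wait()  # never resolves: no traffic, but not dead
        return b""


async def count_reconnects(cycles: int) -> int:
    """Count how often a timeout-means-dead policy would reconnect while idle."""
    reconnects = 0
    for _ in range(cycles):
        connection = IdleConnection()  # each pass models a fresh connection
        try:
            await asyncio.wait_for(connection.recv(), timeout=RECV_TIMEOUT)
        except asyncio.TimeoutError:
            reconnects += 1  # the policy cannot tell "idle" from "dead"
    return reconnects


reconnects = asyncio.run(count_reconnects(cycles=5))
```

Every idle period trips the timeout, so a quiet client would reconnect once per timeout window even though nothing is wrong with the socket.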

self.connection = None

async def run(self) -> None:
RECV_TIMEOUT = 30
Owner


This should be a file-level static variable
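
A minimal sketch of what that suggestion might look like, assuming a module-level constant in place of the local assignment inside `run()`. `Listener`, `recv_once`, and `_Echo` are hypothetical names for illustration, not the project's actual classes.

```python
import asyncio

RECV_TIMEOUT = 30  # seconds; defined once at module level and shared by all methods


class Listener:  # hypothetical class name
    async def recv_once(self, connection):
        # Reads the module-level constant instead of redefining it per call.
        return await asyncio.wait_for(connection.recv(), timeout=RECV_TIMEOUT)


class _Echo:
    """Hypothetical stub connection that answers immediately."""

    async def recv(self):
        return b"ok"


result = asyncio.run(Listener().recv_once(_Echo()))
```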

