fix(server): prevent crash (Sessions stop responding) #569
|
PR builds are available as GitHub Actions artifacts: https://github.com/NeuralNomadsAI/CodeNomad/actions/runs/27925515181 Artifacts expire in 7 days.
|
When a workspace was cleaned up or deleted, the background process manager threw a 'Workspace not found' error while trying to finalize records, which crashed the server in an infinite loop. The fix wraps the finalization operations in try-catch blocks so that a workspace that no longer exists in the workspace manager is handled gracefully. The server keeps running even when old sessions reference deleted workspaces. Fixes the infinite loop crash that occurred when sessions were terminated.
|
PR builds are available as GitHub Actions artifacts: https://github.com/NeuralNomadsAI/CodeNomad/actions/runs/27991683159 Artifacts expire in 7 days.
|
|
Thanks for the PR. Thanks |
Steps to Reproduce
Prerequisites:
Reproduction Steps:
Error in logs:
Why This Happens
The race condition occurs because:
User Impact
Before fix:
After fix:
Verification
Tested on production server:
|
|
I want an actual way to reproduce the issue while you are using it. AI-generated steps aren't helpful. Thanks |
|
@shantur
Root cause: The infinite response loop. While trying to fix that loop, sessions became completely unresponsive. The "Workspace not found" error appeared during that debugging process. I can't reproduce it reliably because I don't know what caused the original infinite loop. |
|
Thanks. @JDis03 I don't think we should merge this without a reliable way to reproduce it. The issue you are describing is most probably related to the opencode instance, not CodeNomad. We can close this for now and reopen it if it happens again. |
|
@shantur |
Problem
Sessions stop responding and become unusable. When you try to interact with them, the server hangs because the BackgroundProcessManager throws:
What happens:
Why workspaces get deleted:
The bug:
finalizeRecord() doesn't handle missing workspaces gracefully.
Impact
Before fix:
After fix:
Solution
Wrap finalization operations in try-catch blocks:
removeFromIndex() - lines 602-606
removeProcessDir() - lines 607-611
upsertIndex() + publishUpdate() - lines 621-627
When the workspace is missing, these operations now log the failure and continue instead of throwing and hanging the server.
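The pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual CodeNomad code: the `WorkspaceManager` stand-in, the `ProcessRecord` shape, the `attempt` helper, the `warnings` array, and the order in which the steps run are all assumptions made for the example. Only the method names `finalizeRecord`, `removeFromIndex`, `removeProcessDir`, `upsertIndex`, and `publishUpdate` come from this PR.

```typescript
// Hypothetical stand-in for the workspace manager; the real API may differ.
class WorkspaceManager {
  private workspaces = new Map<string, { path: string }>();

  add(id: string, path: string): void {
    this.workspaces.set(id, { path });
  }

  remove(id: string): void {
    this.workspaces.delete(id);
  }

  // Throws when the workspace has already been cleaned up.
  get(id: string): { path: string } {
    const workspace = this.workspaces.get(id);
    if (!workspace) throw new Error("Workspace not found");
    return workspace;
  }
}

// Assumed record shape, for illustration only.
interface ProcessRecord {
  id: string;
  workspaceId: string;
  status: "running" | "stopped";
}

class BackgroundProcessManager {
  // Collected here so the example is easy to inspect; a real server would use its logger.
  readonly warnings: string[] = [];
  private readonly workspaces: WorkspaceManager;

  constructor(workspaces: WorkspaceManager) {
    this.workspaces = workspaces;
  }

  // Each step resolves the workspace first, so each one throws once the workspace is gone.
  private removeFromIndex(record: ProcessRecord): void {
    this.workspaces.get(record.workspaceId);
  }

  private removeProcessDir(record: ProcessRecord): void {
    this.workspaces.get(record.workspaceId);
  }

  private upsertIndex(record: ProcessRecord): void {
    this.workspaces.get(record.workspaceId);
  }

  private publishUpdate(record: ProcessRecord): void {
    this.workspaces.get(record.workspaceId);
  }

  // Runs one finalization step; a failure is recorded instead of propagating.
  private attempt(step: string, record: ProcessRecord, fn: () => void): void {
    try {
      fn();
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      this.warnings.push(`${step} skipped for ${record.id}: ${message}`);
    }
  }

  finalizeRecord(record: ProcessRecord): void {
    record.status = "stopped";
    this.attempt("removeFromIndex", record, () => this.removeFromIndex(record));
    this.attempt("removeProcessDir", record, () => this.removeProcessDir(record));
    this.attempt("upsertIndex+publishUpdate", record, () => {
      this.upsertIndex(record);
      this.publishUpdate(record);
    });
  }
}

// Usage: the workspace is deleted before the background process finalizes.
const workspaces = new WorkspaceManager();
workspaces.add("ws-1", "/tmp/ws-1");
const manager = new BackgroundProcessManager(workspaces);
const record: ProcessRecord = { id: "proc-1", workspaceId: "ws-1", status: "running" };

workspaces.remove("ws-1");
manager.finalizeRecord(record); // does not throw
console.log(manager.warnings);
```

Wrapping each step separately means a failure in one step does not prevent the others from running. The trade-off is that a genuinely unexpected error is also only logged, so narrowing the catch to the missing-workspace case may be worth considering.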
Testing
Why This Bug is Frustrating
The root cause: background processes try to finalize after their workspace is automatically cleaned up, causing uncaught exceptions that hang the server.