fix(plugin): detach daemon from job control + add --max-time to hook curls #457
Open
ANGELES00004 wants to merge 1 commit into
…-max-time

The SessionStart hook starts `engram serve` with a bare `&`, leaving the daemon in the parent shell's process group and attached to its controlling terminal. On Linux/WSL2 a Ctrl-Z (SIGTSTP) or terminal close can suspend the daemon into state T: it keeps port 7437 bound but stops answering. Since the Stop hook's curl has no --max-time, every session close then hangs forever and leaks a bash+curl process, accumulating one per stop.

- session-start.sh: launch the daemon via setsid (fallback nohup) in its own session with no controlling terminal, immune to SIGTSTP/SIGHUP.
- session-stop.sh, session-start.sh, post-compaction.sh: add --max-time 3 to the remaining hook curls so an unresponsive daemon fails fast instead of hanging and leaking processes.
Problem

On Linux/WSL2 the engram daemon can end up suspended (`State: T (stopped)`), keeping port `7437` bound but never answering. Every Claude Code session close then hangs and leaks a process, and these accumulate over time.

In my environment I found dozens of stuck processes, one `bash` + one `curl` per session close, the oldest ~1h36m old. `gentle-ai doctor` flagged `engram:reachable` as unhealthy because of this.

Root cause
Two compounding issues in the Claude Code plugin hooks:

- `session-start.sh` starts the daemon with a bare `&`. This leaves it in the parent shell's process group, attached to the controlling terminal (`TT: pts/N`). A `Ctrl-Z` (`SIGTSTP`), or the terminal being suspended/closed, can then stop the daemon, which freezes in state `T` while still owning the port.
- `session-stop.sh`'s curl has no `--max-time`: `curl -sf "${ENGRAM_URL}/sessions/${SESSION_ID}/end" ...` Against a frozen daemon the TCP handshake completes (kernel backlog) but no HTTP response ever arrives, so the curl hangs forever. The hook's `"timeout": 5` + `"async": true` kills the `bash` wrapper, but the `curl` child is reparented to init and survives, leaving one leaked process per session close.

Fix
- `session-start.sh`: launch the daemon with `setsid` (fallback `nohup`) in its own session with no controlling terminal, so `SIGTSTP`/`SIGHUP` from a terminal can no longer suspend it. This is the portable equivalent of running it as a service.
- `session-stop.sh`, `session-start.sh`, `post-compaction.sh`: add `--max-time 3` to the remaining hook curls so an unreachable/unresponsive daemon fails fast instead of hanging and leaking processes. The other curls in these scripts already used `--max-time`; this just makes the rest consistent.

`bash -n` passes on all three scripts.

Honest disclaimer
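For reference, the launch change boils down to something like the sketch below. It is not the literal diff: `launch_detached` is a hypothetical helper name, and redirecting stdin/stdout/stderr to `/dev/null` is an assumption made here so the daemon holds no terminal file descriptors.

```shell
#!/usr/bin/env bash

# Hypothetical helper (name is illustrative, not from the plugin):
# start a command so that terminal job control cannot reach it.
launch_detached() {
  if command -v setsid >/dev/null 2>&1; then
    # setsid runs the command in a new session with no controlling
    # terminal, so Ctrl-Z (SIGTSTP) on the terminal is not delivered to it.
    setsid "$@" </dev/null >/dev/null 2>&1 &
  else
    # Fallback: nohup makes the command ignore SIGHUP, but the process
    # stays in the caller's session (weaker than setsid).
    nohup "$@" </dev/null >/dev/null 2>&1 &
  fi
}
```

In the hook this would be called as `launch_detached engram serve`. As far as I understand it, the `nohup` fallback only protects against `SIGHUP` and does not create a new session, so `setsid` is the path that actually takes the daemon out of job control.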
I'm not 100% sure this is the ideal fix. It's how the issue got diagnosed and resolved with the help of Claude Code in a real WSL2 environment. In my own setup I additionally moved the daemon to a `systemd --user` service to recover immediately, but that isn't portable so it's not proposed here. Feedback is very welcome, and I'm happy to adjust the approach: for example, daemonizing inside the `engram serve` binary itself (double-fork + `setsid`) might be a more robust long-term fix than patching the shell hooks.
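For anyone who wants to check the hang described under Root cause without touching a real daemon, a rough reproduction could look like the sketch below. Everything in it is a stand-in: `demo_frozen_listener` is a hypothetical name, port `18437` is arbitrary, and `python3 -m http.server` plays the role of the daemon (this assumes `python3` and `curl` are installed). The listener is frozen with `SIGSTOP`, and the function returns curl's exit status.

```shell
#!/usr/bin/env bash

# Hypothetical reproduction: freeze a local HTTP listener and show that
# curl gives up after --max-time instead of waiting indefinitely.
demo_frozen_listener() {
  local port="${1:-18437}" pid rc i

  # Stand-in for the daemon: a throwaway HTTP server on localhost.
  python3 -m http.server "$port" --bind 127.0.0.1 >/dev/null 2>&1 &
  pid=$!

  # Wait (up to ~5 s) until the listener answers requests.
  for i in 1 2 3 4 5; do
    curl -s -o /dev/null --max-time 1 "http://127.0.0.1:${port}/" && break
    sleep 1
  done

  # Freeze the listener: it enters state T but keeps the port bound.
  kill -STOP "$pid"

  # The kernel still completes the TCP handshake, but no HTTP response
  # arrives, so only --max-time ends the request.
  curl -sf -o /dev/null --max-time 3 "http://127.0.0.1:${port}/"
  rc=$?

  # SIGKILL is delivered even to a stopped process.
  kill -KILL "$pid" 2>/dev/null
  wait "$pid" 2>/dev/null
  return "$rc"
}
```

Running `demo_frozen_listener; echo $?` should print `28`, curl's exit code for an operation timeout, after about three seconds. Dropping `--max-time 3` from the second curl shows the original behavior: the request never returns while the listener stays stopped.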