fix(telegram): stop EPIPE uncaughtException loop that pegs a CPU core after the owner exits #2081
Closed
flenard wants to merge 1 commit into
… after the owner exits

When the Claude Code process that owns the server's stdio exits, stderr becomes a broken pipe. The uncaughtException / unhandledRejection handlers wrote to stderr, so the write threw EPIPE, which surfaced as another uncaughtException, whose handler wrote to stderr again: a tight synchronous loop that pinned a CPU core and was immune to SIGTERM and the orphan watchdog (the event loop never idled to run them). Orphans accumulated across session restarts and saturated the host (forcing reboots), and held the bot's getUpdates slot, causing 409 Conflict for new sessions.

Route every stderr write through a crash-safe helper that swallows the error and exits when the pipe is dead, and short-circuit the two global handlers on EPIPE so a dead pipe ends the process instead of re-entering.

Confirmed via strace that the loop emitted ~14k failed writes/sec (matches the capture in anthropics#1049).

Fixes anthropics#1049. Addresses the root cause behind anthropics#1794, anthropics#1500, and anthropics#1713.
Thanks for your interest! This repo only accepts contributions from Anthropic team members. If you'd like to submit a plugin to the marketplace, please submit your plugin here.
Summary
Fixes the zombie `bun server.ts` that pegs a CPU core at ~100% after a Claude Code session exits: the bug reported in #1049, and the root cause behind the "stuck event loop survives SIGTERM" symptom in #1794 (and #1500, #1713).

When the Claude Code process that owns the server's stdio exits, `stderr` becomes a broken pipe. The crash handlers log through that same broken stream, so the sequence is:

1. Something writes to `stderr` (a handler, the poll loop, or `shutdown()`) → EPIPE.
2. The EPIPE surfaces as an `uncaughtException` → the handler writes to `stderr` again → EPIPE again → re-enters → …

This is a tight synchronous loop, so the event loop never idles. Every safeguard in the file (the `ppid`/stdin orphan watchdog, the `SIGTERM`/`SIGINT` handlers, the `setTimeout(process.exit, 2000)` in `shutdown()`) depends on the event loop turning, so none of them ever run. The process is unkillable via `SIGTERM`, keeps holding the bot's `getUpdates` slot (→ 409 Conflict for new sessions), and pins a core until `SIGKILL`. On multi-session hosts the orphans accumulate across restarts and can saturate the machine.

Reproduced via strace
The handler is literally logging its own EPIPE in a loop. Matches the capture in #1049.
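As a rough illustration of the sequence described in the Summary, here is a toy TypeScript model of the feedback loop. It is not the server's code and does not touch the real process: `brokenWrite` is a hypothetical stand-in for a `stderr` whose reader is gone, and `dispatchUncaught` is a hypothetical stand-in for the runtime re-delivering an error raised while the handler is running. The loop is capped at `maxRounds` only so the demo terminates.

```typescript
// Toy model of the EPIPE feedback loop. All names are illustrative
// assumptions, not identifiers from server.ts.

interface ErrnoLike extends Error {
  code?: string;
}

function epipe(): ErrnoLike {
  const err: ErrnoLike = new Error("write EPIPE");
  err.code = "EPIPE";
  return err;
}

let failedWrites = 0;

// Once the pipe is dead, every write fails the same way.
function brokenWrite(_msg: string): void {
  failedWrites++;
  throw epipe();
}

// Naive crash handler: logs the error through the same broken stream.
function naiveHandler(err: ErrnoLike): void {
  brokenWrite(`uncaught exception: ${err.message}\n`);
}

// Re-deliver whatever the handler throws. Without the cap this never ends,
// because each round produces the error that triggers the next one.
function dispatchUncaught(
  first: ErrnoLike,
  handler: (e: ErrnoLike) => void,
  maxRounds: number,
): number {
  let pending: ErrnoLike | undefined = first;
  let rounds = 0;
  while (pending !== undefined && rounds < maxRounds) {
    const current = pending;
    pending = undefined;
    rounds++;
    try {
      handler(current);
    } catch (e) {
      pending = e as ErrnoLike; // the handler's own failure becomes the next exception
    }
  }
  return rounds;
}

const rounds = dispatchUncaught(epipe(), naiveHandler, 1000);
console.log(`handler ran ${rounds} times, ${failedWrites} failed writes`);
```

In this model the handler only stops because of the cap; nothing inside the loop ever yields, which is the property that starves the watchdog and signal handlers in the real process.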
Fix
Route every `stderr` write through a crash-safe helper that swallows the error and exits when the pipe is dead (a dead `stderr` means the owning process is gone: there is nothing left to serve and nowhere to log), and short-circuit the two global handlers on `EPIPE` so they exit instead of re-entering.

This removes the cause rather than papering over it: the event loop never jams, so the existing `SIGTERM` handler and orphan watchdog work as designed.

Why this is safe
- `stderr` breaking is an unambiguous "owner is gone" signal. Unlike `stdin`, which Claude Code closes momentarily during context compaction (the reason this file deliberately avoids exiting on `stdin` close), the `stderr` pipe only breaks when the parent process actually dies. So exiting on `stderr` EPIPE won't false-fire during compaction.
- `shutdown()`'s log write now goes through the helper too, so a broken pipe there can no longer abort shutdown before the exit path.

Testing
- `bun build server.ts` succeeds (243 modules bundled, no type errors).

Notes
- Touches `external_plugins/telegram/server.ts`; no version bump. Happy to bump `0.0.6 → 0.0.7` or fold into #1424 (fix(telegram): v0.0.7 reliability rollup: state-dir, PID guard, ppid watchdog, install stdout) if maintainers prefer.
- The `EPIPE` short-circuit uses `process.exit(0)` (graceful teardown when the owner is gone); easy to switch to a non-zero code if that's the house style.
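For reviewers who want something concrete, the following is a minimal sketch of the shape of the fix described under Fix; it is not the diff from this PR. The names `makeSafeLog`, `makeCrashHandler`, `isEpipe`, `Sink`, and `Exit` are hypothetical, and the write and exit functions are injected so the logic can be exercised without killing a real process. The sketch only models a write that throws; whether a runtime reports a dead pipe as a thrown error or as a stream `error` event is an assumption to verify against the runtime in use.

```typescript
// Hedged sketch: a crash-safe log helper plus an EPIPE short-circuit for the
// global handlers. All identifiers are illustrative, not from server.ts.

interface ErrnoLike extends Error {
  code?: string;
}
type Sink = (msg: string) => void;
type Exit = (code: number) => void;

function isEpipe(err: unknown): boolean {
  return err instanceof Error && (err as ErrnoLike).code === "EPIPE";
}

// Crash-safe log: a write failure never escapes to the global handlers.
function makeSafeLog(write: Sink, exit: Exit): Sink {
  return (msg) => {
    try {
      write(msg);
    } catch (err) {
      // Dead pipe means the owner is gone: stop instead of logging again.
      if (isEpipe(err)) exit(0);
      // Any other write error is swallowed; there is nowhere to report it.
    }
  };
}

// Global handler: an EPIPE ends the process without touching the stream.
function makeCrashHandler(log: Sink, exit: Exit): (err: unknown) => void {
  return (err) => {
    if (isEpipe(err)) {
      exit(0);
      return;
    }
    log(`uncaught exception: ${String(err)}\n`);
  };
}

// Demo with fakes: a sink that always fails with EPIPE and an exit recorder.
let writes = 0;
const exitCodes: number[] = [];
const deadPipe: Sink = () => {
  writes++;
  const e: ErrnoLike = new Error("write EPIPE");
  e.code = "EPIPE";
  throw e;
};
const fakeExit: Exit = (code) => {
  exitCodes.push(code);
};

const log = makeSafeLog(deadPipe, fakeExit);
const onCrash = makeCrashHandler(log, fakeExit);

onCrash(new Error("boom")); // one write attempt, which fails and requests exit

const pipeError: ErrnoLike = new Error("write EPIPE");
pipeError.code = "EPIPE";
onCrash(pipeError); // short-circuits: exit requested, no write attempted

console.log(`writes=${writes}, exits=${JSON.stringify(exitCodes)}`);
```

In a real server the wiring would presumably pass `process.stderr.write` and `process.exit` and register the handler with `process.on("uncaughtException", ...)` and `process.on("unhandledRejection", ...)`. The exit code `0` mirrors the note above and is trivially swapped for a non-zero value.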