Commit 1ee2cd8

fix(hermes-adapter): kill old bridge process on reconnect to prevent zombie accumulation

When the MemOS bridge fails to initialize (e.g., stuck on orphan recovery due to external API rate limiting), _reconnect_bridge() would call close() on the old bridge, which only closes stdin without killing the process. The old process, stuck in its init loop, never reads stdin EOF and lives forever. Each failed reconnect cycle spawned new node processes, accumulating zombies until the OOM killer was invoked.

Changes:
- bridge_client.py: add terminate() method (SIGKILL + wait)
- __init__.py: call old_bridge.terminate() during reconnect

1 parent e0ef84d commit 1ee2cd8

2 files changed

Lines changed: 17 additions & 0 deletions

apps/memos-local-plugin/adapters/hermes/memos_provider/__init__.py

Lines changed: 5 additions & 0 deletions
@@ -1641,6 +1641,11 @@ def _reconnect_bridge(self, session_id: str = "", *, timeout: float = 30.0) -> N
         if old_bridge:
             with contextlib.suppress(Exception):
                 old_bridge.close()
+            # Kill the old process to prevent zombie accumulation.
+            # close() alone only closes stdin — headless bridges may
+            # not notice and keep running forever.
+            with contextlib.suppress(Exception):
+                old_bridge.terminate()
         ensure_bridge_running()
         self._bridge = MemosBridgeClient()
         self._bridge.register_host_handler(

apps/memos-local-plugin/adapters/hermes/memos_provider/bridge_client.py

Lines changed: 12 additions & 0 deletions
@@ -218,6 +218,18 @@ def register_host_handler(
         """
         self._host_handlers[method] = handler
 
+    def terminate(self) -> None:
+        """Kill the bridge subprocess (SIGKILL). Use when the bridge is
+        stuck/unresponsive and must be forcefully cleaned up before
+        starting a new one."""
+        self._closed = True
+        with contextlib.suppress(Exception):
+            self._proc.stdin.close()
+        with contextlib.suppress(Exception):
+            self._proc.kill()
+        with contextlib.suppress(Exception):
+            self._proc.wait(timeout=5)
+
     def close(self) -> None:
         if self._closed:
             return
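
The failure mode described in the commit message can be reproduced outside the adapter. The following is a minimal sketch, not the adapter's actual code: `spawn_stuck_child`, `close_only`, and `force_terminate` are hypothetical names, and a Python child that sleeps in a loop stands in for a node bridge stuck in its init loop. It shows that closing stdin alone leaves such a child running, while kill() followed by wait() ends it and reaps it.

```python
import contextlib
import subprocess
import sys

# Hypothetical stand-in for a stuck bridge: a child that never reads stdin,
# so it can never observe EOF when the parent closes the pipe.
STUCK_CHILD = "import time\nwhile True:\n    time.sleep(0.1)\n"


def spawn_stuck_child() -> subprocess.Popen:
    """Start a child process that loops forever and ignores stdin."""
    return subprocess.Popen(
        [sys.executable, "-c", STUCK_CHILD],
        stdin=subprocess.PIPE,
    )


def close_only(proc: subprocess.Popen, grace: float = 0.5) -> bool:
    """Close stdin and wait briefly. Return True if the child exited."""
    proc.stdin.close()
    try:
        proc.wait(timeout=grace)
        return True
    except subprocess.TimeoutExpired:
        return False


def force_terminate(proc: subprocess.Popen, timeout: float = 5.0):
    """Mirror the shape of terminate() in the diff: close stdin, kill, wait.

    Each step is best-effort, so a failure in one does not skip the next.
    Returns the child's return code, or None if it could not be reaped.
    """
    with contextlib.suppress(Exception):
        proc.stdin.close()
    with contextlib.suppress(Exception):
        proc.kill()
    with contextlib.suppress(Exception):
        proc.wait(timeout=timeout)
    return proc.returncode


if __name__ == "__main__":
    child = spawn_stuck_child()
    print("exited after close():", close_only(child))
    print("return code after kill():", force_terminate(child))
```

The wait() after kill() matters on POSIX systems: without it the killed child remains in the process table as a defunct entry until the parent collects its exit status.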
