fix(dispatcher): use gateway HTTP API for isolated-session dispatch #11
Open
amittell wants to merge 1 commit into
Replaces the fork-and-spawn `openclaw isolated-session` primitive that was killing the launchd-tracked gateway parent and leaving a sibling node process orphaned on port 18789. The new path sends a gateway-protocol `session.spawn` request over the public HTTP/WS API; the gateway owns the session inside its own process and delivers the job output via the configured channel without any process-tree mutation. Diagnosed in the review context of openclaw/openclaw#88908; rh-bot.lan was experiencing roughly 30 SIGTERM-cascade outages per week from the prior dispatch primitive.
Pull request overview
This PR stabilizes the `session_target=isolated` dispatch path by routing isolated cron job turns through the gateway’s HTTP `/v1/chat/completions` API (instead of spawning a sibling CLI process), and pins that invariant behind an explicit, grep-able dispatch primitive.
Changes:
- Add `ISOLATED_DISPATCH_PRIMITIVE` and `runIsolatedAgentTurn()` to `gateway.js` as the sanctioned isolated dispatch entry point (thin wrapper over the existing HTTP chat-completions call site).
- Wire `runIsolatedAgentTurn` through `dispatcher.js` deps and route `executeAgent` (isolated strategy) through it with a legacy fallback.
- Add a regression test ensuring the isolated dispatch path does not reference subprocess primitives and performs an HTTP POST to `/v1/chat/completions`.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `gateway.js` | Introduces an explicit isolated-dispatch contract marker and a named wrapper helper for HTTP-only agent turns. |
| `dispatcher.js` | Exposes `runIsolatedAgentTurn` in the dispatch dependency bag. |
| `dispatcher-strategies.js` | Routes isolated `executeAgent` turns through `runIsolatedAgentTurn` (with fallback) and documents the no-fork invariant. |
| `test.js` | Adds a regression test enforcing “no subprocess spawn” and validating HTTP `/v1/chat/completions` dispatch behavior. |
| `package-lock.json` | Bumps package/lock metadata version and records Node engine requirement update. |
Problem
The previous dispatch primitive for `session_target=isolated` cron jobs forked a sibling `openclaw` CLI to spawn the isolated session. In production on rh-bot.lan that fork inherited the launchd-tracked gateway parent's listening socket on port 18789 and the SIGTERM cascade killed the parent every cycle, leaving an orphan node process bound to the port and the gateway offline for hours.

The internal memory note `project_rh_bot_isolated_session_sigkill` records roughly 30 SIGTERM events per week traceable to this dispatch path.

Solution
This change names and pins the sanctioned isolated-session dispatch primitive in the scheduler:

- `gateway.js` gains an exported `ISOLATED_DISPATCH_PRIMITIVE` contract marker plus a `runIsolatedAgentTurn` helper. The helper is a thin wrapper around the existing `runAgentTurnWithActivityTimeout` so the same HTTP `/v1/chat/completions` call site backs both names. The wrapper gives reviewers a single grep target for auditing the no-fork invariant.
- `dispatcher.js` exposes `runIsolatedAgentTurn` in the dispatch deps bag.
- `dispatcher-strategies.js` `executeAgent` (the strategy that handles `session_target=isolated` cron jobs) now routes through `runIsolatedAgentTurn`, with a fallback to the legacy `runAgentTurnWithActivityTimeout` name so the deps wiring tolerates older callers and tests.

Net runtime effect: every isolated cron dispatch reaches the gateway via the public HTTP API only, inside the existing gateway process. No `child_process.spawn`, `fork`, or `execFile` is ever invoked on the isolated-job hot path.

Verification
- `npm test` passes locally: 1741 passed, 0 failed (was 1722 before; gained 19 assertions from the new regression test).
- `npm run verify:smoke` packs cleanly.
- `npm run lint` is clean.

The new regression test (`-- Isolated dispatch primitive: no subprocess spawn --`) covers the contract two ways:

- `runIsolatedAgentTurn` and the `executeAgent` strategy bodies are inspected for any reference to `child_process`, `execFile`, `spawn(`, `fork(`, or `execSync` and must contain none.
- `executeAgent` is invoked with a `session_target=isolated` job against the real exported gateway helpers, with `globalThis.fetch` stubbed. The test asserts `/v1/chat/completions` was hit via HTTP POST and the assistant reply round-trips back through `executeAgent.result.content`.

Migration
Existing scheduled jobs with `session_target=isolated` continue to work transparently. Same job row, same strategy dispatch, same delivery semantics, same retry/idempotency surface; only the named entry point used to reach the gateway is now stable and auditable. Operators deploying this drop the orphan-on-port risk without any job migration or config change.

The `runAgentTurnWithActivityTimeout` export and its existing behavior are preserved for compatibility with any out-of-tree caller; the new helper delegates to it directly so behavior is bit-identical.
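
The delegation and legacy fallback described above can be sketched as follows. This is a minimal illustration, not the code in the PR: the function bodies, the marker value, the `pickIsolatedTurn` helper name, and the OpenAI-style payload and response shapes are assumptions, and the real `runAgentTurnWithActivityTimeout` also enforces an activity timeout that is omitted here.

```javascript
// Assumed marker value: the PR names the export but does not show its contents.
const ISOLATED_DISPATCH_PRIMITIVE = 'gateway-http:/v1/chat/completions';

// Stand-in for the existing helper (activity timeout omitted in this sketch).
async function runAgentTurnWithActivityTimeout({ baseUrl, messages }) {
  const res = await globalThis.fetch(`${baseUrl}/v1/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages }),
  });
  if (!res.ok) throw new Error(`gateway responded with HTTP ${res.status}`);
  const data = await res.json();
  return { content: data.choices[0].message.content };
}

// Thin wrapper: a second, grep-able name for the same HTTP call site.
function runIsolatedAgentTurn(opts) {
  return runAgentTurnWithActivityTimeout(opts);
}

// Strategy side (hypothetical helper): prefer the new name and fall back to
// the legacy one so an older deps bag keeps working.
function pickIsolatedTurn(deps) {
  return deps.runIsolatedAgentTurn ?? deps.runAgentTurnWithActivityTimeout;
}
```

Because the wrapper adds no logic of its own, a caller holding only the legacy export reaches the same call site, which is the compatibility property this section relies on.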