Problem
When swarm workers get stuck in the blocked or executing state (e.g. after a mission timeout or gateway restart), there is no API endpoint to reset them. The only workaround is to manually edit each worker's ~/.hermes/profiles/<workerId>/runtime.json.
/api/swarm-lifecycle with action auto-sweep does not reset stale workers; it returns action: none for all entries.
/api/conductor-stop resets only workers that belong to the missions passed in missionIds; it does not handle workers stuck from other dispatch flows.
Proposed Solution
Add a new endpoint:
POST /api/swarm-runtime/reset
Body: { "workerIds": ["augur", "consul", ...] } // or omit for all
This would write state: idle, phase: cancelled to the relevant runtime.json files, essentially exposing the existing internal resetNativeWorkerRuntime() function from conductor-stop.ts as a standalone endpoint.
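To make the intended semantics concrete, here is a minimal Python sketch of the logic such an endpoint might run. It is not the actual resetNativeWorkerRuntime() implementation, which lives in conductor-stop.ts and is not shown in this issue. The function name reset_runtimes, the "reset"/"missing" result values, and the assumption that every directory under the profiles folder is a worker are all hypothetical and for illustration only.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional

# Fields written on reset, taken from the proposal above.
RESET_FIELDS = {"state": "idle", "phase": "cancelled"}


def reset_runtimes(profiles_dir: Path, worker_ids: Optional[list] = None) -> dict:
    """Reset runtime.json for the given workers, or for every profile if omitted.

    Returns a mapping of worker ID -> "reset" or "missing" (hypothetical shape).
    """
    if worker_ids is None:
        # "omit for all": assume every profile directory is a worker.
        worker_ids = sorted(p.name for p in profiles_dir.iterdir() if p.is_dir())

    results = {}
    for worker_id in worker_ids:
        runtime_path = profiles_dir / worker_id / "runtime.json"
        if not runtime_path.is_file():
            results[worker_id] = "missing"
            continue
        data = json.loads(runtime_path.read_text())
        data.update(RESET_FIELDS)  # other fields are left untouched
        # Write to a temp file, then rename, so readers never see a partial file.
        fd, tmp = tempfile.mkstemp(dir=runtime_path.parent, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, runtime_path)
        results[worker_id] = "reset"
    return results
```

Calling reset_runtimes(Path("~/.hermes/profiles").expanduser(), ["augur", "consul"]) would correspond to the request body shown above, and omitting the second argument would correspond to omitting workerIds. A real endpoint would also need to validate worker IDs before building paths from them.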
Current Workaround
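import json
from pathlib import Path

# For each stuck worker (IDs from the Context section below):
for worker in ["augur", "explorator", "consul", "cursor", "scrutator", "speculator"]:
    # expanduser() is needed because open() does not expand "~" itself
    path = Path(f"~/.hermes/profiles/{worker}/runtime.json").expanduser()
    with open(path) as f:
        d = json.load(f)
    d.update({'state': 'idle', 'phase': 'cancelled', 'currentTask': None})  # ... further fields elided
    with open(path, 'w') as f:
        json.dump(d, f)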
Context
Encountered while managing a Castra swarm setup where 6 workers (augur, explorator, consul, cursor, scrutator, speculator) were stuck after a Wave 1 mission timeout. All other configuration and management was done via API — this was the only operation that required direct filesystem access.