Feature: POST /api/swarm-runtime/reset endpoint for worker state cleanup #462

@8jth7gk8c7-stack

Description

Problem

When swarm workers get stuck in the blocked or executing state (e.g. after a mission timeout or gateway restart), there is no API endpoint to reset them. The only workaround is to manually edit each worker's ~/.hermes/profiles/<workerId>/runtime.json.

/api/swarm-lifecycle with action auto-sweep does not reset stale workers; it returns action: none for all entries.
/api/conductor-stop only resets workers that belong to a specific mission (selected via missionIds); it does not handle workers stuck from other dispatch flows.

Proposed Solution

Add a new endpoint:

POST /api/swarm-runtime/reset
Body: { "workerIds": ["augur", "consul", ...] }   // omit workerIds to reset all workers

This would write state: idle, phase: cancelled to the relevant runtime.json files, essentially exposing the existing internal resetNativeWorkerRuntime() function from conductor-stop.ts as a standalone endpoint.
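
For illustration, a client call to the proposed endpoint might look roughly like the sketch below. It only builds the request and does not send it. The base URL is a placeholder, the helper name build_reset_request is hypothetical, and the "omit workerIds for all" behaviour is taken from the proposal above rather than from any existing implementation.

```python
import json
import urllib.request


def build_reset_request(base_url, worker_ids=None):
    """Build (but do not send) a POST request for the proposed reset endpoint.

    Passing worker_ids=None leaves "workerIds" out of the body, which the
    proposal describes as "reset all workers".
    """
    body = {} if worker_ids is None else {"workerIds": list(worker_ids)}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/swarm-runtime/reset",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and port; substitute the real gateway address.
req = build_reset_request("http://localhost:3000", ["augur", "consul"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Once the endpoint exists, the request could be sent with urllib.request.urlopen(req). The response shape is not specified in this issue, so none is assumed here.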

Current Workaround

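import json
import os

stuck_workers = ['augur', 'consul']  # IDs of the workers to reset

# For each stuck worker:
for worker in stuck_workers:
    # open() does not expand '~', so resolve the path first
    path = os.path.expanduser(f'~/.hermes/profiles/{worker}/runtime.json')
    with open(path) as f:
        d = json.load(f)
    d.update({'state': 'idle', 'phase': 'cancelled', 'currentTask': None})  # ... plus other fields, elided
    with open(path, 'w') as f:
        json.dump(d, f)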
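
The same reset could be wrapped in a reusable helper that mirrors what the proposed endpoint would do. This is a sketch under assumptions, not the behaviour of resetNativeWorkerRuntime(), whose internals are not shown in this issue. The function names are hypothetical. Only touching workers in the blocked or executing state and writing via a temporary file are design choices made here, and only the three fields named in the workaround are reset.

```python
import json
import os
import tempfile
from pathlib import Path

STUCK_STATES = {"blocked", "executing"}  # the two states named in this issue


def reset_worker_runtime(profiles_dir, worker_id):
    """Reset one worker's runtime.json to idle; return True if it was changed."""
    path = Path(profiles_dir) / worker_id / "runtime.json"
    runtime = json.loads(path.read_text())
    if runtime.get("state") not in STUCK_STATES:
        return False  # assumption: leave workers that are not stuck untouched
    runtime.update({"state": "idle", "phase": "cancelled", "currentTask": None})
    # Write to a temp file in the same directory, then swap it in with
    # os.replace so a concurrent reader never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(runtime, f)
    os.replace(tmp, path)
    return True


def reset_workers(profiles_dir, worker_ids=None):
    """Reset the given workers, or every profile found when worker_ids is None.

    Returns the IDs of the workers that were actually reset.
    """
    root = Path(profiles_dir)
    if worker_ids is None:
        worker_ids = sorted(p.parent.name for p in root.glob("*/runtime.json"))
    return [w for w in worker_ids if reset_worker_runtime(root, w)]
```

A call such as reset_workers(Path.home() / ".hermes" / "profiles", ["augur", "consul"]) would then cover the manual steps above. Passing no worker IDs corresponds to the "omit for all" case in the proposal.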

Context

Encountered while managing a Castra swarm setup in which 6 workers (augur, explorator, consul, cursor, scrutator, speculator) were stuck after a Wave 1 mission timeout. All other configuration and management was done via the API; this was the only operation that required direct filesystem access.
