| 1 | +# `@relayflows/core` — `@agent-relay/sdk` v7 → v8 migration plan |
| 2 | + |
| 3 | +> Status: **proposed, not started.** This plan was produced from a read-only |
| 4 | +> investigation of `packages/core` and the relay monorepo HEAD (all `@agent-relay/*` |
| 5 | +> at `8.0.4` source / `8.1.2` npm). Review before any code is written. |
| 6 | + |
| 7 | +## 1. Why core doesn't compile today |
| 8 | + |
| 9 | +`packages/core` straddles two incompatible majors of the relay SDK family: |
| 10 | + |
| 11 | +- Its **RelayAuth** code (`provisioner.ts`, `runner.ts`) imports symbols |
| 12 | + (`mintAgentToken`, `resolveAgentPermissions`, `createLocalJwksKeyPair`, |
| 13 | + `compileAgentScopes`, …) that **only exist in `@agent-relay/cloud@8.x`**. |
| 14 | +- Its **agent-spawn + broker-event** code (`runner.ts`) is written against the |
| 15 | + **`@agent-relay/sdk@7.x` API**, all of which was removed/redesigned in 8.x. |
| 16 | + |
| 17 | +`cloud` and `sdk` move in lockstep, so no published version set satisfies both. Bumping |
| 18 | +`@agent-relay/{cloud,config,sdk}` to `^8.0.4` (done) cleared all 16 cloud errors and |
| 19 | +surfaced the real work: **82 sdk errors, 80 in `runner.ts`.** |
| 20 | + |
| 21 | +## 2. The key realization — the broker surface became `HarnessDriverClient` |
| 22 | + |
| 23 | +> **REVISED after tracing the v8 source.** An earlier draft assumed the runner |
| 24 | +> had to be re-modelled onto the SDK's *messaging* event system (`message.created`, |
| 25 | +> session/status events). That is **not** the right target and would have been a |
| 26 | +> large rewrite. The correct, far smaller target: |
| 27 | + |
| 28 | +v7 `AgentRelay` was one object that did **both** messaging **and** local |
| 29 | +process/PTY spawning + broker lifecycle. v8 split those: |
| 30 | +- **Messaging** → `@agent-relay/sdk` (`AgentRelay`). **Core doesn't use this for the |
| 31 | + broker** — it does messaging via `@relaycast/sdk` already. Every `this.relay.*` |
| 32 | + call in `runner.ts` is a *broker* concern (verified: `addListener`×7, `spawnPty`, |
| 33 | + `onBrokerStderr`, `listAgents`/`listAgentsRaw`, `human`). |
| 34 | +- **Broker / PTY / lifecycle** → `@agent-relay/harness-driver` `HarnessDriverClient`. |
| 35 | + |
| 36 | +`HarnessDriverClient` is essentially what v7 `AgentRelay`'s broker half became, and |
| 37 | +it preserves core's model **almost verbatim**: |
| 38 | + |
| 39 | +| v7 `AgentRelay` (broker) | v8 `HarnessDriverClient` | Match | |
| 40 | +|---|---|---| |
| 41 | +| `addListener('workerOutput'\|'messageReceived'\|'agentSpawned'\|'agentReleased'\|'agentExited'\|'agentIdle'\|'deliveryUpdate', …)` | same `addListener<K extends keyof HarnessDriverEvents>(event, handler)` — **same event names** | ✅ verbatim | |
| 42 | +| `workerOutput {name, chunk}` / `messageReceived {eventId,from,to,text,threadId}` / `agentIdle {name,idleSecs}` payloads | `WorkerOutputPayload` / `DriverMessage` / `AgentIdlePayload` — **same field names** | ✅ verbatim | |
| 43 | +| `onBrokerStderr(cb)` | `onStderr` **construction option** | ✅ moved to ctor | |
| 44 | +| `spawnPty(opts)` | `spawnPty(SpawnPtyInput)` — same field names | ✅ | |
| 45 | +| `listAgents()` / `listAgentsRaw()` | `listAgents(): ListAgent[]` (one method) | ✅ | |
| 46 | +| `shutdown()` | `shutdown()` | ✅ | |
| 47 | +| `BrokerEvent` re-emit | `onEvent((e: BrokerEvent)=>…)` — **same `BrokerEvent` union/kinds** | ✅ | |
| 48 | +| spawn return = rich `Agent` (`.release()`,`.waitForExit()`,`.waitForIdle()`,`.send()`,`.exitCode`) | `SpawnAgentResult` = plain `{name,runtime,sessionId?,pid?}` | ⚠️ **needs adapter** | |
| 49 | +| `relay.human({name}).sendMessage(...)` (idle nudge) | not on driver — route through relaycast messaging or `createHuman` | ⚠️ small | |
| 50 | + |
| 51 | +CLI utils (`getCliDefinition`/`resolveCliSync`/`resolveSpawnPolicy`) and `stripAnsi` |
| 52 | +are gone from the SDK → vendored into `core/src/cli-registry.ts` / `strip-ansi` pkg |
| 53 | +(done in WS-0). |
| 54 | + |
| 55 | +So the migration is **swap `this.relay: AgentRelay` → a `HarnessDriverClient`**, with |
| 56 | +two real pieces of work: (1) an agent-handle adapter, (2) the idle-nudge sender. |
| 57 | + |
| 58 | +## 3. Blockers / decisions |
| 59 | + |
| 60 | +1. **`@agent-relay/harnesses` — ✅ RESOLVED.** It is now published |
| 61 | + (`@agent-relay/harnesses@8.1.2`, confirmed resolvable on `registry.npmjs.org`). |
| 62 | + **Decision: core depends on `@agent-relay/harnesses` directly** (option a). It |
| 63 | + provides the `claude`/`codex`/`gemini`/`opencode` PTY harnesses and `createHuman`. |
| 64 | +2. **Direction confirmation.** Two coherent end-states: |
| 65 | + - **Forward (recommended):** migrate core's sdk usage to 8.x so it matches the |
| 66 | + cloud 8.x RelayAuth it already uses. Aligns with relay HEAD. |
| 67 | + - **Backward:** pin sdk+cloud to 7.1.1 and **remove the RelayAuth/provisioner |
| 68 | + feature** (scoped agent-token minting). Only viable if that feature is |
| 69 | + droppable — it appears intentional, so this is likely not acceptable. |
| 70 | +3. **Broker process ownership — ✅ largely addressed.** In v8 the broker is owned |
| 71 | + by the harness layer: `harness.create({ relay })` starts/attaches a `BrokerDriver` |
| 72 | + bound to the relay's workspace (`harnesses/src/broker-binding.ts`). So core's |
| 73 | + manual `brokerName` / `relay.shutdown()` / broker bootstrap mostly **goes away**. |
| 74 | + - **Remaining open question:** `onBrokerStderr` (broker diagnostic lines) has no |
| 75 | +     obvious 1:1 equivalent on the `BrokerDriver` surface. Decide whether core still needs raw |
| 76 | + broker stderr, or whether session/status events suffice. Low priority — affects |
| 77 | + only diagnostic logging at `runner.ts:3336`. |
| 78 | + |
| 79 | +## 4. Workstreams (assuming "forward" + a resolved harness source) |
| 80 | + |
| 81 | +Ordered to minimize churn; each workstream can be typechecked independently. |
| 82 | + |
| 83 | +### WS-0 — Utilities (small, do first) |
| 84 | +- `stripAnsi`: add `strip-ansi` dependency (or a 3-line local helper); update |
| 85 | + `runner.ts:28`, `channel-messenger.ts:1`, and static wrapper `runner.ts:7712`. |
| 86 | +- `getCliDefinition` / `resolveCliSync` / `resolveSpawnPolicy`: these were CLI- |
| 87 | + registry helpers (`runner.ts:31,32,30`, `process-spawner.ts:4,5`, |
| 88 | + `proxy-env.ts:1`). Confirm whether equivalents exist in `@agent-relay/config` / |
| 89 | + `harness-driver`; if not, **vendor** them into a new `core/src/cli-registry.ts` |
| 90 | + (they're self-contained PATH/known-dir resolution + arg/env policy). |
| 91 | +- Move `BrokerEvent` / `AgentSpawner` type imports to `@agent-relay/harness-driver` |
| 92 | + (`runner.ts:29,126`). |
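
For the "local helper" option in the first bullet, a minimal sketch is shown here. It assumes that stripping CSI sequences (colors, cursor movement) and OSC sequences (window titles, hyperlinks) is enough for the PTY output core logs; the `strip-ansi` package covers more edge cases, so this is not a drop-in equivalent.

```typescript
// Local stand-in for the `stripAnsi` export that the 8.x SDK no longer provides.
// Assumption: only CSI and OSC escape sequences need to be removed.
const ANSI_PATTERN =
  /\u001B\][^\u0007\u001B]*(?:\u0007|\u001B\\)|\u001B\[[0-?]*[ -/]*[@-~]/g;

export function stripAnsi(text: string): string {
  // First alternative: OSC (ESC ] ... terminated by BEL or ESC \).
  // Second alternative: CSI (ESC [ params, intermediates, final byte).
  return text.replace(ANSI_PATTERN, "");
}
```

Because the signature is `(text: string) => string`, the call sites listed above would only need their import path changed.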
| 93 | + |
| 94 | +### WS-1 — AgentRelay construction & options (`runner.ts:3145`, types in `builder.ts:4`, `run.ts:1`) |
| 95 | +- v7 `new AgentRelay({ brokerName, channels, env, requestTimeoutMs })` → |
| 96 | + v8 `new AgentRelay({ workspaceKey, baseUrl, retryPolicy, harness })`. |
| 97 | +- `env` / `brokerName` / `requestTimeoutMs` are gone from options. Decide where |
| 98 | + each goes: `env` → harness-driver `SpawnRuntimeInput`; `brokerName` → broker |
| 99 | + transport config; `requestTimeoutMs` → `retryPolicy`. |
| 100 | +- Update the `AgentRelayOptions` type references in `builder.ts`, `run.ts`, |
| 101 | + `runner.ts:481`. |
| 102 | + |
| 103 | +### WS-2 — Agent spawning (`runner.ts:444-456`, `6739`, `6742`, `7200`) |
| 104 | +Grounded against the now-available `@agent-relay/harnesses` API: |
| 105 | +- `getWorkflowSdkSpawner()` switch over `relay.claude/.codex/.gemini/.opencode` → |
| 106 | + lookup into `{ claude, codex, gemini, opencode }` imported from |
| 107 | + `@agent-relay/harnesses` (each is a `PtyHarness`). |
| 108 | +- `sdkSpawner.spawn(opts)` / `relay.spawnPty(opts)` → |
| 109 | + `await harness.create(input)` where |
| 110 | + `input: HarnessCreateInput = { name, model, args, task, cwd, env, channels, relay }` |
| 111 | + and the returned `HarnessAgent` (extends `RelayAgentClient`: `id`, `name`, |
| 112 | + `handle`, `sendMessage`, plus `cli`/`runtime`/`definition`) replaces the v7 |
| 113 | + `Agent` handle. **The current `spawnOptions` map ~1:1 onto `HarnessCreateInput`.** |
| 114 | +- `harness.create({ relay })` internally attaches/starts the broker for the |
| 115 | + workspace — remove core's manual broker bootstrap. |
| 116 | +- `relay.human({ name })` (idle-nudge sender, `7200`) → |
| 117 | + `createHuman({ relay, name })` from `@agent-relay/harnesses`. |
| 118 | +- Replace the `Agent` type (`runner.ts:126`, fields at `320/332/507`) with |
| 119 | + `HarnessAgent` / `RelayAgentClient`. |
| 120 | + |
| 121 | +### WS-3 — Event stream rewrite (the bulk: `runner.ts:3158-3343`) |
| 122 | +Re-model the 7 listeners onto the v8 surface. Mapping target per listener: |
| 123 | + |
| 124 | +| v7 listener (fields) | v8 source | |
| 125 | +|---|---| |
| 126 | +| `workerOutput` `{name, chunk}` (3158) | session event `terminal.output`/`transcript.chunk` → `event.text`/`chunk` | |
| 127 | +| `messageReceived` `{eventId,from,to,text,threadId}` (3206) | `addListener('message.created')` → `event.message.text`, `event.envelope.from/to`, `event.message.parentId` | |
| 128 | +| `agentSpawned` `{name,runtime}` (3254) | session `status.active` / spawn ack from driver runtime | |
| 129 | +| `agentReleased` `{name}` (3272) | driver `release()` / session `status.offline` | |
| 130 | +| `agentExited` `{name,exitCode,exitSignal}` (3285) | session `command.completed` `{exitCode}` / `status.offline` | |
| 131 | +| `deliveryUpdate` (3305) | `addListener` delivery events (`deliveries` surface) | |
| 132 | +| `agentIdle` `{name,idleSecs}` (3311) | `agent.status.becomes('idle')` predicate (note: `idleSecs` may be unavailable — see open question) | |
| 133 | +| `onBrokerStderr(line)` (3336) | no direct equivalent — see Blocker #3 | |
| 134 | + |
| 135 | +- Preserve the internal `BrokerEvent` re-emission contract (the |
| 136 | + `{type:'broker:event', runId, event}` shape consumed by the CLI) by **adapting** |
| 137 | + v8 events into the existing `BrokerEvent` union, so downstream (`WorkflowEvent`, |
| 138 | + CLI logging) is unchanged. This keeps the blast radius inside `runner.ts`. |
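
If `onEvent` really delivers the same `BrokerEvent` union (as the table in section 2 states), the re-emission contract can be kept with a thin wrapper. The sketch below is written under that assumption: `BrokerEventLike`, `EventSourceLike`, and `forwardBrokerEvents` are hypothetical stand-ins, and it assumes `onEvent` returns an unsubscribe function, which should be checked against the real driver.

```typescript
// Hypothetical stand-in for the SDK's `BrokerEvent` union.
interface BrokerEventLike {
  kind: string;
  [key: string]: unknown;
}

// The envelope shape the CLI already consumes, per this plan.
interface BrokerWorkflowEvent {
  type: "broker:event";
  runId: string;
  event: BrokerEventLike;
}

// Assumed slice of the driver: `onEvent` registers a handler and returns
// a function that removes it again.
interface EventSourceLike {
  onEvent(handler: (event: BrokerEventLike) => void): () => void;
}

// Wraps every driver event in the existing envelope so that downstream
// consumers (`WorkflowEvent`, CLI logging) see no change.
function forwardBrokerEvents(
  source: EventSourceLike,
  runId: string,
  emit: (event: BrokerWorkflowEvent) => void,
): () => void {
  return source.onEvent((event) => {
    emit({ type: "broker:event", runId, event });
  });
}
```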
| 139 | + |
| 140 | +### WS-4 — Agent listing & teardown (`runner.ts:3486`, `5234`, `5250`, `6810`) |
| 141 | +- `relay.listAgents()` / `listAgentsRaw()` → `relay.messaging.agents.list()`. |
| 142 | + Rework the stale-retry-agent cleanup + the wait-for-cleanup poll accordingly. |
| 143 | +- `relay.shutdown()` → broker/transport lifecycle teardown (depends on Blocker #3). |
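
Whichever listing call ends up being used, the wait-for-cleanup poll can be written against a plain function so it does not depend on the final API. This is a sketch only: `waitForAgentGone` and `ListedAgent` are hypothetical names, and the default timeout and interval are placeholders, not values taken from `runner.ts`.

```typescript
// Minimal view of a listed agent; the real `ListAgent` carries more fields.
interface ListedAgent {
  name: string;
}

// Polls `listAgents` until `name` no longer appears or the timeout elapses.
// Resolves to true when the agent is gone, false on timeout.
async function waitForAgentGone(
  listAgents: () => ListedAgent[] | Promise<ListedAgent[]>,
  name: string,
  options: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<boolean> {
  const { timeoutMs = 5_000, intervalMs = 100 } = options;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const agents = await listAgents();
    if (!agents.some((agent) => agent.name === name)) return true;
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Passing the lister as a callback keeps the poll testable with a fake and lets the sync or async nature of the real `listAgents()` be settled later.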
| 144 | + |
| 145 | +### WS-5 — Implicit-`any` cleanup (9 × TS7006, `runner.ts:2437…7077`) |
| 146 | +- Trivial once the surrounding types resolve; annotate the `.map`/callback params. |
| 147 | + Deferred to last because the types they touch change in WS-2/WS-3. |
| 148 | + |
| 149 | +## 5. Open questions for the SDK owners |
| 150 | +- Is per-agent **idle duration** (`idleSecs`) still observable, or only a boolean |
| 151 | + `idle` status? (Affects the idle-nudge debounce at `runner.ts:3311-3328`.) |
| 152 | +- Is there a supported way to get **broker stderr** / diagnostics off the |
| 153 | + `BrokerDriver` in v8, or should core drop raw broker-stderr logging? |
| 154 | +- ~~Will `@agent-relay/harnesses` be published?~~ ✅ Published at `8.1.2`. |
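
If the answer to the first question is that only a boolean idle status is available, core could derive the duration locally. The sketch below is one possible fallback, not a confirmed design: `IdleTracker` and its method names are hypothetical, and it assumes core receives some idle and some activity signal per agent.

```typescript
// Tracks when each agent was first seen idle so that a duration can be
// computed without relying on an `idleSecs` field from the SDK.
class IdleTracker {
  private readonly idleSince = new Map<string, number>();
  private readonly now: () => number;

  // The clock is injectable so the debounce logic can be tested.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Record the first idle observation; repeated idle signals keep the
  // original timestamp.
  markIdle(name: string): void {
    if (!this.idleSince.has(name)) this.idleSince.set(name, this.now());
  }

  // Any activity (output, message, spawn) resets the idle clock.
  markActive(name: string): void {
    this.idleSince.delete(name);
  }

  // Whole seconds since the agent was first seen idle, or 0 if active.
  idleSecs(name: string): number {
    const since = this.idleSince.get(name);
    return since === undefined ? 0 : Math.floor((this.now() - since) / 1000);
  }

  shouldNudge(name: string, thresholdSecs: number): boolean {
    return this.idleSince.has(name) && this.idleSecs(name) >= thresholdSecs;
  }
}
```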
| 155 | + |
| 156 | +## 6. Suggested sequencing & checkpoints |
| 157 | +1. Resolve Blockers #1–#3 (decisions). |
| 158 | +2. WS-0 (utils) → typecheck: cloud + util errors gone. |
| 159 | +3. WS-1 (construction) → typecheck. |
| 160 | +4. WS-2 (spawn) → typecheck + a smoke spawn of one CLI. |
| 161 | +5. WS-3 (events) → typecheck + observe a real run's event stream. |
| 162 | +6. WS-4, WS-5 → full `tsc` clean. |
| 163 | +7. Run `packages/core` test suite (`vitest`) + one end-to-end workflow via the CLI. |
| 164 | + |
| 165 | +## 7. Effort estimate (REVISED DOWN) |
| 166 | +The `HarnessDriverClient` discovery collapses most of the original risk: |
| 167 | +- **WS-0 — done & verified** (85→77 errors, SDK unified on 8.x, utils vendored). |
| 168 | +- **WS-1/WS-3** — largely a `this.relay → HarnessDriverClient` swap; listeners and |
| 169 | + payloads are verbatim. The only event remap is `agentExited` exit code/signal |
| 170 | + (source it from the `agent_exited` `BrokerEvent` via `onEvent`, since the named |
| 171 | + `agentExited` payload is a method-less `DriverAgent`). |
| 172 | +- **WS-2** — the one real build: a `WorkflowAgentHandle` adapter wrapping |
| 173 | + `SpawnAgentResult` + the driver client to restore `.release()` / `.waitForExit()` |
| 174 | + / `.waitForIdle()` / `.exitCode` (driven off the driver event bus), plus routing |
| 175 | + the idle-nudge sender. Unify the two spawn paths (`getWorkflowSdkSpawner().spawn()` |
| 176 | + and `spawnPty()`) into one `spawnPty()` call. |
| 177 | +- **WS-4/5** — `listAgents()` + `client.release(name)` (note: `ListAgent` is data, |
| 178 | + has no `.release()`), `shutdown()`, implicit-`any`s. |
| 179 | + |
| 180 | +Revised estimate: **~1–1.5 days**, the bulk in the WS-2 handle adapter and a |
| 181 | +real-run validation of the event stream + spawn lifecycle. |