Commit a151360

feat(runner): harden structured CLI adapters and finalization
1 parent 6885d3f commit a151360

9 files changed

Lines changed: 1094 additions & 166 deletions

CONTRIBUTING.md

Lines changed: 7 additions & 5 deletions
@@ -11,12 +11,11 @@ local execution behavior for the DevAgent stack.
 - Node `20+`
 - the four sibling repos checked out side by side

-For the supported setup path, start from [`devagent-hub`](../devagent-hub/README.md):
+For local development, bootstrap the sibling repos directly:

 ```bash
-cd ../devagent-hub
-bun install
-bun run bootstrap:local
+cd ../devagent-sdk && bun install
+cd ../devagent-runner && bun install
 ```

 ## Local checks before opening a PR
@@ -28,11 +27,14 @@ bun run test
 bun run check:oss
 ```

-If your change affects the live path, also run the Hub baseline checks from `../devagent-hub`.
+If your change affects a downstream integration path, run that consumer's baseline checks in
+addition to the runner checks above.

 ## Contribution rules

 - Keep the DevAgent path stable first.
 - Treat other adapters as experimental unless live validation proves parity.
+- Keep non-DevAgent adapter command resolution aligned with adapter constructor overrides and runner
+  env overrides.
 - Keep PRs small and lifecycle-focused.
 - Update docs if you change setup, adapter maturity, or validation claims.

README.md

Lines changed: 64 additions & 10 deletions
@@ -5,7 +5,7 @@ Local execution substrate for DevAgent workflow tasks.
 ## Maturity

 Public alpha component. The repo is public, but the packages remain unpublished and are consumed
-through the sibling-repo bootstrap path documented in `devagent-hub`.
+through local workspace dependencies during development.

 ## Responsibilities

@@ -54,40 +54,94 @@ devagent-runner run --request /tmp/request-plan.json
 devagent-runner inspect <run-id>
 ```

+## Command Resolution
+
+The runner adapters resolve `codex`, `claude`, and `opencode` commands in this order:
+
+1. adapter constructor override or resolver
+2. runner env overrides
+3. PATH defaults
+
+Runner env overrides for the standalone CLI:
+
+```bash
+DEVAGENT_RUNNER_CODEX_BIN=/path/to/codex
+DEVAGENT_RUNNER_CLAUDE_BIN=/path/to/claude
+DEVAGENT_RUNNER_OPENCODE_BIN=/path/to/opencode
+```
+
+Default PATH command names are `codex`, `claude`, and `opencode`.
+
+If the runner is embedded as a library, callers can pass either a fixed command string or a
+request-aware resolver function to `CodexAdapter`, `ClaudeAdapter`, or `OpenCodeAdapter`.
+
 ## Local Development Wiring

 For local MVP work this repo consumes `@devagent-sdk/*` through file dependencies from
-`../devagent-sdk`, and `devagent-hub` consumes this runner through file dependencies from
-`../devagent-runner/packages/*`.
+`../devagent-sdk`. Downstream consumers can depend on `../devagent-runner/packages/*` during local
+development.

-The supported local setup path is the bootstrap flow documented in
-[`devagent-hub/README.md`](../devagent-hub/README.md) and
-[`devagent-hub/BASELINE_VALIDATION.md`](../devagent-hub/BASELINE_VALIDATION.md).
+Keep the runner repo self-contained: setup, validation, and support claims should be documented
+here rather than delegated to a consumer repo.

 ## Validated Flow

 The runner has been validated in the canonical path:

 ```text
-devagent-hub -> LocalRunnerClient -> LocalRunner -> DevAgentAdapter -> devagent execute
+TaskExecutionRequest -> LocalRunner -> DevAgentAdapter -> devagent execute
 ```

 Adapter maturity today:

 - `DevAgentAdapter`
   - live-validated and supported for the MVP path
 - `CodexAdapter`
+  - structured CLI integration with machine-readable event parsing
 - `ClaudeAdapter`
+  - structured CLI integration with streamed JSON event parsing
 - `OpenCodeAdapter`
-  - adapter-present and smoke-tested, but still experimental
+  - structured CLI integration with JSON event parsing
+
+All non-DevAgent adapters now normalize machine-readable CLI output into the SDK event/result
+model, write standard markdown artifacts, and rely on runner-side read-only enforcement for review
+and verify stages. Support claims still depend on live validation evidence.
+
+## Validation
+
+Use the shared SDK fixture shape or a generated request JSON and validate each executor through the
+debug CLI.
+
+Examples:
+
+```bash
+devagent-runner run --request /tmp/codex-request.json
+devagent-runner run --request /tmp/claude-request.json
+DEVAGENT_RUNNER_OPENCODE_BIN=/Applications/OpenCode.app/Contents/MacOS/opencode-cli \
+  devagent-runner run --request /tmp/opencode-request.json
+```
+
+The supported bar for promoting an executor path beyond experimental is:
+
+- live CLI validation for `triage`, `plan`, `implement`, `verify`, `review`, and `repair`
+- downstream integration validation through PR handoff
+- cancellation and failure drills still passing
+
+Current CLI smoke-validation snapshot as of 2026-03-11:
+
+- `devagent`: `triage`, `verify`
+- `codex`: `implement`, `review`
+- `claude`: `plan`, `repair`
+- `opencode`: `triage`, `plan`, `review`, `verify`

-Treat the experimental adapters as development surfaces, not production-equivalent executor paths.
+Those smoke passes confirm current CLI interoperability and artifact persistence. They do not by
+themselves promote `codex`, `claude`, or `opencode` beyond experimental status.

 ## Limitations

 - packages are not published to a registry yet
 - the supported contributor path is the four-repo sibling checkout flow
-- only the DevAgent adapter is live-validated today
+- only executor paths with current live validation evidence should be described as supported

 ## Development

SUPPORT.md

Lines changed: 10 additions & 2 deletions
@@ -12,8 +12,16 @@ Use the bug report template for workspace, eventing, cancellation, or adapter is

 Supported:

-- the DevAgent adapter path exercised by `devagent-hub -> devagent-runner -> devagent execute`
+- the DevAgent adapter path exercised through `devagent-runner -> devagent execute`

 Experimental:

-- `codex`, `claude`, and `opencode` adapters until they have comparable live validation
+- `codex`, `claude`, and `opencode` adapters remain validation-gated until they have comparable
+  live evidence through both `devagent-runner` CLI and at least one downstream integration
+- runner CLI smoke passes alone are not enough to treat those adapters as supported
+
+Binary overrides for standalone runner usage:
+
+- `DEVAGENT_RUNNER_CODEX_BIN`
+- `DEVAGENT_RUNNER_CLAUDE_BIN`
+- `DEVAGENT_RUNNER_OPENCODE_BIN`
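
Reviewer note: the binary overrides listed above are the middle step of the resolution order the README change describes (constructor override or resolver, then runner env override, then PATH default). The sketch below shows one way that order could be expressed. It is not the runner's code: `CommandResolver`, `ResolveRequest`, and `resolveCommand` are hypothetical names, and falling through when a resolver returns `undefined` is an assumption.

```typescript
// Hypothetical names throughout; only the env variable names and the
// three-step order come from the documentation in this commit.
type ExecutorId = "codex" | "claude" | "opencode";

interface ResolveRequest {
  executorId: ExecutorId;
}

// A constructor override is either a fixed command string or a request-aware function.
type CommandResolver = string | ((request: ResolveRequest) => string | undefined);

const ENV_OVERRIDES: Record<ExecutorId, string> = {
  codex: "DEVAGENT_RUNNER_CODEX_BIN",
  claude: "DEVAGENT_RUNNER_CLAUDE_BIN",
  opencode: "DEVAGENT_RUNNER_OPENCODE_BIN",
};

function resolveCommand(
  request: ResolveRequest,
  override?: CommandResolver,
  env: Record<string, string | undefined> = {},
): string {
  // 1. Adapter constructor override or resolver.
  const fromOverride = typeof override === "function" ? override(request) : override;
  if (fromOverride) return fromOverride;

  // 2. Runner env override.
  const fromEnv = env[ENV_OVERRIDES[request.executorId]];
  if (fromEnv) return fromEnv;

  // 3. PATH default: the bare command name, left for the shell to locate.
  return request.executorId;
}
```

In a Node process the third argument would typically be `process.env`. It is passed explicitly here so the sketch stays free of global state.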

WORKFLOW.md

Lines changed: 3 additions & 2 deletions
@@ -27,12 +27,13 @@ bun run check:oss

 ## Done means

-- the DevAgent adapter path still works through Hub baseline smoke
+- the DevAgent adapter path still passes runner validation and downstream smoke coverage
 - artifacts and events are written predictably
 - cancellation, timeout, and cleanup behavior remain test-covered
+- non-DevAgent adapters keep structured event parsing and read-only enforcement intact
 - docs do not overstate experimental adapter maturity

 ## Supported vs experimental

-- Supported: `DevAgentAdapter` in the current Hub -> Runner -> DevAgent path
+- Supported: `DevAgentAdapter` in the current runner -> DevAgent path
 - Experimental: `CodexAdapter`, `ClaudeAdapter`, and `OpenCodeAdapter` until they have matching live validation
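
Reviewer note: the read-only enforcement named in the done criteria is exercised by the OpenCode smoke test in this commit. Its stub rejects the run unless `OPENCODE_PERMISSION` contains `"*":"deny"`, `"read":"allow"`, `"edit":"deny"`, and `"write":"deny"`. The sketch below builds a value that would satisfy those checks. The helper name is hypothetical, and the diff does not show the exact JSON the adapter emits, so treat the shape as an assumption.

```typescript
// Hypothetical helper; buildReadOnlyPermissionEnv is not a name from the runner source.
// The key/value pairs mirror the substrings the OpenCode stub in the adapter tests checks for.
function buildReadOnlyPermissionEnv(readOnly: boolean | undefined): Record<string, string> {
  // Assumption: a request without readOnly adds no extra environment variables.
  if (!readOnly) return {};

  // Deny everything by default, then allow reads while keeping edits and writes denied.
  const permissions = { "*": "deny", read: "allow", edit: "deny", write: "deny" };
  return { OPENCODE_PERMISSION: JSON.stringify(permissions) };
}
```

A caller could merge the returned object into the child process environment before spawning the CLI.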

packages/adapters/src/index.test.ts

Lines changed: 132 additions & 10 deletions
@@ -2,7 +2,7 @@ import assert from "node:assert/strict";
 import { join } from "node:path";
 import { chmod, mkdtemp, mkdir, readFile, writeFile } from "node:fs/promises";
 import { tmpdir } from "node:os";
-import { test } from "vitest";
+import { afterEach, test } from "vitest";
 import {
   ClaudeAdapter,
   CodexAdapter,
@@ -22,7 +22,10 @@ async function createWorkspace(): Promise<{ root: string; artifactDir: string; w
   return { root, artifactDir, workspacePath };
 }

-function createRequest(executorId: TaskExecutionRequest["executor"]["executorId"]): TaskExecutionRequest {
+function createRequest(
+  executorId: TaskExecutionRequest["executor"]["executorId"],
+  options: { model?: string; provider?: string; readOnly?: boolean } = {},
+): TaskExecutionRequest {
   return {
     protocolVersion: PROTOCOL_VERSION,
     taskId: `task-${executorId}`,
@@ -33,14 +36,23 @@ function createRequest(executorId: TaskExecutionRequest["executor"]["executorId"
       sourceRepoPath: "/tmp/repo",
       workBranch: `devagent/${executorId}/task`,
       isolation: "temp-copy",
+      readOnly: options.readOnly,
+    },
+    executor: {
+      executorId,
+      model: options.model ?? "test-model",
+      provider: options.provider,
     },
-    executor: { executorId, model: "test-model" },
     constraints: {},
     context: { summary: "smoke" },
     expectedArtifacts: ["triage-report"],
   };
 }

+afterEach(() => {
+  delete process.env.DEVAGENT_RUNNER_CODEX_BIN;
+});
+
 async function createStub(path: string, contents: string): Promise<void> {
   await writeFile(path, contents);
   await chmod(path, 0o755);
@@ -185,26 +197,31 @@ const fs = require("fs");
 const args = process.argv.slice(2);
 const outIndex = args.indexOf("-o");
 if (outIndex >= 0) fs.writeFileSync(args[outIndex + 1], "stub codex output\\n");
-process.stdout.write("{\\"type\\":\\"result\\",\\"message\\":\\"ok\\"}\\n");
+process.stdout.write("{\\"type\\":\\"thread.started\\"}\\n");
+process.stdout.write("{\\"type\\":\\"turn.started\\"}\\n");
+process.stdout.write("{\\"type\\":\\"item.completed\\",\\"item\\":{\\"type\\":\\"agent_message\\",\\"text\\":\\"stub codex output\\"}}\\n");
+process.stdout.write("{\\"type\\":\\"turn.completed\\"}\\n");
 `);

+  process.env.DEVAGENT_RUNNER_CODEX_BIN = `${process.execPath} ${stubPath}`;
   const { events, result } = await collectEvents(
-    new CodexAdapter(`${process.execPath} ${stubPath}`),
+    new CodexAdapter(),
     createRequest("codex"),
     workspacePath,
     artifactDir,
   );

   assert.equal(result.status, "success");
-  assert.equal(events.at(-1)?.type, "completed");
+  assert.deepEqual(events.map((event) => event.type), ["started", "progress", "progress", "progress", "progress"]);
   assert.match(await readFile(join(artifactDir, "triage-report.md"), "utf8"), /stub codex output/);
 });

 test("ClaudeAdapter smoke test with stub executable", async () => {
   const { root, artifactDir, workspacePath } = await createWorkspace();
   const stubPath = join(root, "claude-stub.js");
   await createStub(stubPath, `#!/usr/bin/env node
-process.stdout.write("claude stub output\\n");
+process.stdout.write(JSON.stringify({ type: "assistant", message: { content: [{ type: "text", text: "claude stub output" }] } }) + "\\n");
+process.stdout.write(JSON.stringify({ type: "result", subtype: "success", result: "claude stub output" }) + "\\n");
 `);

   const { events, result } = await collectEvents(
@@ -215,23 +232,128 @@ process.stdout.write("claude stub output\\n");
   );

   assert.equal(result.status, "success");
-  assert.equal(events.at(-1)?.type, "completed");
+  assert.deepEqual(events.map((event) => event.type), ["started", "progress", "progress"]);
+  assert.match(await readFile(join(artifactDir, "triage-report.md"), "utf8"), /claude stub output/);
 });

 test("OpenCodeAdapter smoke test with stub executable", async () => {
   const { root, artifactDir, workspacePath } = await createWorkspace();
   const stubPath = join(root, "opencode-stub.js");
   await createStub(stubPath, `#!/usr/bin/env node
-process.stdout.write("opencode stub output\\n");
+const args = process.argv.slice(2);
+const agentIndex = args.indexOf("--agent");
+if (agentIndex === -1 || args[agentIndex + 1] !== "build") {
+  throw new Error("expected build agent");
+}
+const permissions = process.env.OPENCODE_PERMISSION || "";
+if (!permissions.includes('"*":"deny"') || !permissions.includes('"read":"allow"') || !permissions.includes('"edit":"deny"') || !permissions.includes('"write":"deny"')) {
+  throw new Error("expected read-only permissions");
+}
+if (process.argv.includes("--model")) {
+  throw new Error("unexpected --model flag");
+}
+process.stdout.write(JSON.stringify({ type: "step_start", part: { type: "step-start" } }) + "\\n");
+process.stdout.write(JSON.stringify({ type: "text", part: { type: "text", text: "opencode stub output" } }) + "\\n");
+process.stdout.write(JSON.stringify({ type: "step_finish", part: { type: "step-finish" } }) + "\\n");
 `);

   const { events, result } = await collectEvents(
+    new OpenCodeAdapter(`${process.execPath} ${stubPath}`),
+    createRequest("opencode", { readOnly: true }),
+    workspacePath,
+    artifactDir,
+  );
+
+  assert.equal(result.status, "success");
+  assert.deepEqual(events.map((event) => event.type), ["started", "progress", "progress", "progress"]);
+  assert.match(await readFile(join(artifactDir, "triage-report.md"), "utf8"), /opencode stub output/);
+});
+
+test("OpenCodeAdapter passes provider-qualified model names through", async () => {
+  const { root, artifactDir, workspacePath } = await createWorkspace();
+  const stubPath = join(root, "opencode-model-stub.js");
+  await createStub(stubPath, `#!/usr/bin/env node
+const args = process.argv.slice(2);
+const modelIndex = args.indexOf("--model");
+if (modelIndex === -1 || args[modelIndex + 1] !== "opencode/big-pickle") {
+  throw new Error("expected provider-qualified --model");
+}
+process.stdout.write(JSON.stringify({ type: "text", part: { type: "text", text: "opencode model output" } }) + "\\n");
+`);
+
+  const { result } = await collectEvents(
+    new OpenCodeAdapter(`${process.execPath} ${stubPath}`),
+    createRequest("opencode", { provider: "opencode", model: "big-pickle" }),
+    workspacePath,
+    artifactDir,
+  );
+
+  assert.equal(result.status, "success");
+  assert.match(await readFile(join(artifactDir, "triage-report.md"), "utf8"), /opencode model output/);
+});
+
+test("OpenCodeAdapter surfaces structured errors without mislabeling them as permissions", async () => {
+  const { root, artifactDir, workspacePath } = await createWorkspace();
+  const stubPath = join(root, "opencode-error-stub.js");
+  await createStub(stubPath, `#!/usr/bin/env node
+process.stdout.write(JSON.stringify({
+  type: "error",
+  error: { data: { message: "Model not found: opencode/missing-model" } }
+}) + "\\n");
+process.exit(1);
+`);
+
+  const { result } = await collectEvents(
     new OpenCodeAdapter(`${process.execPath} ${stubPath}`),
     createRequest("opencode"),
     workspacePath,
     artifactDir,
   );

+  assert.equal(result.status, "failed");
+  assert.equal(result.error?.message, "Model not found: opencode/missing-model");
+});
+
+test("ClaudeAdapter fails when no final assistant text is produced", async () => {
+  const { root, artifactDir, workspacePath } = await createWorkspace();
+  const stubPath = join(root, "claude-empty-stub.js");
+  await createStub(stubPath, `#!/usr/bin/env node
+process.stdout.write(JSON.stringify({ type: "result", subtype: "success", result: "" }) + "\\n");
+`);
+
+  const { result } = await collectEvents(
+    new ClaudeAdapter(`${process.execPath} ${stubPath}`),
+    createRequest("claude"),
+    workspacePath,
+    artifactDir,
+  );
+
+  assert.equal(result.status, "failed");
+  assert.equal(result.artifacts.length, 0);
+});
+
+test("ClaudeAdapter captures plan-mode file output when no final assistant text is emitted", async () => {
+  const { root, artifactDir, workspacePath } = await createWorkspace();
+  const stubPath = join(root, "claude-plan-stub.js");
+  await createStub(stubPath, `#!/usr/bin/env node
+process.stdout.write(JSON.stringify({
+  type: "user",
+  tool_use_result: {
+    type: "create",
+    filePath: "/Users/test/.claude/plans/example-plan.md",
+    content: "# Example Plan\\n\\nExecutor claude handled task type plan."
+  }
+}) + "\\n");
+process.stdout.write(JSON.stringify({ type: "result", subtype: "success", result: "" }) + "\\n");
+`);
+
+  const { result } = await collectEvents(
+    new ClaudeAdapter(`${process.execPath} ${stubPath}`),
+    createRequest("claude", { readOnly: true }),
+    workspacePath,
+    artifactDir,
+  );
+
   assert.equal(result.status, "success");
-  assert.equal(events.at(-1)?.type, "completed");
+  assert.match(await readFile(join(artifactDir, "triage-report.md"), "utf8"), /Example Plan/);
 });
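
Reviewer note: the Claude tests above pin down three behaviors. Each JSON line yields one `progress` event after `started`, an empty final result fails the run, and plan-mode file content is used as a fallback when no assistant text appears. The sketch below is one parser that would satisfy those expectations. It is an illustration under assumptions, not the adapter's implementation: `parseClaudeStyleStream`, `AdapterEvent`, and `ParseOutcome` are hypothetical names, and only the JSON line shapes are taken from the stubs.

```typescript
// Hypothetical types and function; only the JSON line shapes come from the test stubs.
interface AdapterEvent {
  type: "started" | "progress";
  raw?: unknown;
}

interface ParseOutcome {
  events: AdapterEvent[];
  status: "success" | "failed";
  finalText: string;
}

function parseClaudeStyleStream(stdout: string): ParseOutcome {
  const events: AdapterEvent[] = [{ type: "started" }];
  let assistantText = "";
  let fallbackText = "";

  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;

    let parsed: any;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      // Assumption: non-JSON lines are skipped rather than treated as failures.
      continue;
    }
    events.push({ type: "progress", raw: parsed });

    if (parsed?.type === "assistant" && Array.isArray(parsed.message?.content)) {
      // Concatenate the text parts of an assistant message.
      const text = parsed.message.content
        .filter((part: any) => part?.type === "text" && typeof part.text === "string")
        .map((part: any) => part.text)
        .join("");
      if (text !== "") assistantText = text;
    } else if (parsed?.type === "result" && typeof parsed.result === "string" && parsed.result !== "") {
      // A non-empty final result takes precedence over earlier assistant text.
      assistantText = parsed.result;
    } else if (parsed?.type === "user" && typeof parsed.tool_use_result?.content === "string") {
      // Plan-mode file output, kept as a fallback when no assistant text is emitted.
      fallbackText = parsed.tool_use_result.content;
    }
  }

  const finalText = assistantText !== "" ? assistantText : fallbackText;
  return { events, status: finalText !== "" ? "success" : "failed", finalText };
}
```

Keeping the fallback separate from the assistant text means a run that produces both still reports the assistant's final answer, which is a design choice of this sketch rather than something the diff confirms.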
