Commit 29f04a8

ScriptedAlchemy authored
feat(bdd): opencode/codex CLI adapters for the general agent (#2677)
* feat(bdd)!: replace callAI general agent with opencode/codex CLI adapters

  The general agent for `# @agent`/`$skill` steps is now a real CLI coding agent spawned per step: opencode by default, Codex via `generalAgent.type: 'codex'`. Both adapters reuse the Midscene model endpoint/key (MIDSCENE_MODEL_*, legacy OPENAI_* fallback) with zero config (opencode through a generated OPENCODE_CONFIG_CONTENT provider, codex through `-c model_providers.*` TOML overrides) and degrade to the CLI's own auth (`opencode auth login` / `codex login`) when overridden or unavailable.

  New config knobs: model, env, cwd, timeoutMs, permissions (read-only/workspace/all sandbox mapping), reuseMidsceneModelEnv, sessionPerScenario.

  CallAiGeneralAgent and generalAgent.modelEnv are removed (modelEnv now fails validation with a migration hint); the pure prompt/verdict helpers moved to agents/general-prompt.ts unchanged.

  BREAKING CHANGE: generalAgent.modelEnv is rejected; CallAiGeneralAgent is gone. Use generalAgent.env/model/type or generalAgent.factory.

* refactor(bdd): deslop cli general-agent adapters

  Dedupe the verdict-extraction block shared by both adapters into toGeneralResult, share DEFAULT_TIMEOUT_MS and the temp-file path construction in cli-agent.ts, inline the single-caller findAuthFailureHint, drop the unused outputTail limit param, and replace the sessionId `as string` cast with proper narrowing. No behavior change.

* refactor(bdd): harden cli adapters per simplify review

  Fix the stdin EPIPE crash path and split-multibyte UTF-8 decoding in runCli, rethrow non-ENOENT errors when reading codex's last-message file, and flatten the opencode model mapping into exclusive cases (an explicit provider/model no longer injects an unused midscene provider).

  Share planCommon/throwOnNonZeroExit between the adapters, make opencode permissions handling exhaustive, type the resolved model env as a discriminated union, import the env key names from @midscene/shared/env, prune already-sent skills from resumed-session prompts, and add a lastIndexOf backstop so an unbalanced brace in prose cannot hide a trailing verdict.

---------

Co-authored-by: ScriptedAlchemy <zack@module-federation.io>
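
The "lastIndexOf backstop" mentioned in the last bullet can be illustrated with a minimal sketch. This is not the code from `agents/general-prompt.ts`, which this view does not show: the `Verdict` shape and the names `parseVerdict`, `balancedObjectFrom`, and `extractVerdict` are hypothetical. It assumes the primary pass scans for a balanced `{...}` starting at the first brace, so a stray `{` in the agent's prose would otherwise swallow the real verdict.

```typescript
// Hypothetical verdict shape; the real schema is not shown in this commit view.
interface Verdict {
  pass: boolean;
  reason: string | undefined;
}

// Parse a candidate slice; reject anything that is not an object with a boolean `pass`.
function parseVerdict(candidate: string): Verdict | undefined {
  try {
    const value: unknown = JSON.parse(candidate);
    if (typeof value === 'object' && value !== null) {
      const record = value as Record<string, unknown>;
      const pass = record['pass'];
      const reason = record['reason'];
      if (typeof pass === 'boolean') {
        return { pass, reason: typeof reason === 'string' ? reason : undefined };
      }
    }
  } catch {
    // Not valid JSON: fall through and report no verdict.
  }
  return undefined;
}

// Return the balanced {...} slice starting at `start`, skipping braces inside JSON strings.
function balancedObjectFrom(text: string, start: number): string | undefined {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}' && --depth === 0) return text.slice(start, i + 1);
  }
  return undefined; // ran off the end: the first brace was never closed
}

function extractVerdict(output: string): Verdict | undefined {
  const first = output.indexOf('{');
  if (first === -1) return undefined;

  // Primary pass (assumed): balanced object starting at the first brace.
  const balanced = balancedObjectFrom(output, first);
  if (balanced !== undefined) {
    const verdict = parseVerdict(balanced);
    if (verdict !== undefined) return verdict;
  }

  // Backstop: slice from the last '{' to the last '}' so an unbalanced
  // brace earlier in the prose cannot hide a trailing verdict.
  const open = output.lastIndexOf('{');
  const close = output.lastIndexOf('}');
  return close > open ? parseVerdict(output.slice(open, close + 1)) : undefined;
}
```

A caller that treats `undefined` as a failure keeps the fail-closed behavior the README describes for Then steps.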
1 parent f0e1f79 commit 29f04a8

19 files changed

Lines changed: 2024 additions & 250 deletions

packages/bdd/README.md

Lines changed: 30 additions & 8 deletions
@@ -26,7 +26,7 @@ Every statement is routed to exactly one executor:
 | Rule | Marker | Who executes the statement |
 | --- | --- | --- |
 | **Default** | none | Midscene UI agent — the vision model drives the page (`aiAct`) for Given/When and judges Then steps (`aiAssert`, fail-closed) |
-| **Agent** | `# [agent]` comment directly above the line, or a `$skill-name` token in it | A general-purpose coding agent (Codex app-server via `codex login`, or any OpenAI-compatible endpoint) — for behavior you cannot see in the browser: server logs, files, databases. Then steps must return a JSON verdict; a missing verdict fails (fail-closed) |
+| **Agent** | `# [agent]` comment directly above the line, or a `$skill-name` token in it | A real CLI coding agent ([opencode](https://opencode.ai) by default, [Codex](https://github.com/openai/codex) opt-in) spawned per step — for behavior you cannot see in the browser: server logs, files, databases. Then steps must return a JSON verdict; a missing verdict fails (fail-closed) |
 | **No AI** | `# [no-ai]` comment above the line (or `@no-ai` scenario/feature tag) | Classic BDD: a callback registered with `Given`/`When`/`Then`/`defineStep` from `@midscene/bdd` must match. An unimplemented step fails with a ready-to-paste snippet |

 All three in one scenario:
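
The routing table in this hunk can be read as a small decision function. The sketch below is an illustration only, not the package's router: `routeStep`, `StepContext`, and the `Executor` labels are hypothetical names, and both the precedence (no-AI markers win over agent markers) and the exact `$skill-name` token pattern are assumptions that the README text does not state.

```typescript
// Hypothetical labels for the three executors in the README table.
type Executor = 'ui' | 'agent' | 'no-ai';

interface StepContext {
  text: string; // the Gherkin step line
  commentAbove: string | undefined; // comment directly above the step, if any
  tags: string[]; // scenario/feature tags in effect
}

// Assumed token shape: `$` followed by a letter, then letters, digits, `_` or `-`.
const SKILL_TOKEN = /(^|\s)\$[A-Za-z][\w-]*/;

function routeStep(step: StepContext): Executor {
  const comment = (step.commentAbove ?? '').trim();

  // Assumption: an explicit no-AI marker takes precedence over everything else.
  if (comment === '# [no-ai]' || step.tags.includes('@no-ai')) return 'no-ai';

  // Agent: marker comment directly above the line, or a $skill token in it.
  if (comment === '# [agent]' || SKILL_TOKEN.test(step.text)) return 'agent';

  // Default: the Midscene UI agent.
  return 'ui';
}
```

Whatever the real precedence is, the property the table promises is that the function returns exactly one executor for every statement.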
@@ -142,10 +142,11 @@ import { defineProfile } from '@midscene/bdd/profile';
 export default defineProfile().default;
 ```

-Model setup, either:
+Model setup:

-- `codex login` once, then point the general agent at it with `MIDSCENE_MODEL_BASE_URL=codex://app-server`, or
-- set the `MIDSCENE_MODEL_*` environment variables for any OpenAI-compatible endpoint (at minimum `MIDSCENE_MODEL_BASE_URL`, `MIDSCENE_MODEL_API_KEY`, `MIDSCENE_MODEL_NAME`).
+- The UI agent needs the `MIDSCENE_MODEL_*` environment variables for an OpenAI-compatible vision endpoint (at minimum `MIDSCENE_MODEL_BASE_URL`, `MIDSCENE_MODEL_API_KEY`, `MIDSCENE_MODEL_NAME`).
+- The general agent (`# [agent]`/`$skill` steps) is the [opencode](https://opencode.ai) CLI: `npm i -g opencode-ai`. With zero extra config it reuses your `MIDSCENE_MODEL_*` endpoint and key. Since that endpoint is tuned for vision, consider pointing the agent at a strong coding model instead via `generalAgent.model` (`'provider/model'`) plus `generalAgent.env`, or `opencode auth login`.
+- Prefer Codex? Set `generalAgent: { type: 'codex' }` and install it with `npm i -g @openai/codex`. Same endpoint reuse applies; without a usable endpoint it falls back to your `codex login` account.

 Run:

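
The endpoint reuse described in this hunk, combined with the commit message's note that the resolved model env is typed as a discriminated union, might look roughly like the sketch below. It is an assumption-laden illustration: `resolveModelEnv` and `ResolvedModelEnv` are hypothetical names, and `OPENAI_BASE_URL`/`OPENAI_API_KEY` are assumed spellings of the legacy keys (the commit only says `OPENAI_*`). The environment is passed in as a plain object so the sketch stays self-contained.

```typescript
type Env = Record<string, string | undefined>;

// Discriminated union: either a usable endpoint, or a reason to fall back
// to the CLI's own auth (`opencode auth login` / `codex login`).
type ResolvedModelEnv =
  | {
      kind: 'endpoint';
      source: 'midscene' | 'legacy-openai';
      baseUrl: string;
      apiKey: string;
      modelName: string | undefined;
    }
  | { kind: 'none'; reason: string };

function resolveModelEnv(env: Env, reuseMidsceneModelEnv = true): ResolvedModelEnv {
  if (!reuseMidsceneModelEnv) {
    return { kind: 'none', reason: 'reuseMidsceneModelEnv is false' };
  }
  const modelName = env['MIDSCENE_MODEL_NAME'];

  // Preferred: the MIDSCENE_MODEL_* variables named in the README.
  const baseUrl = env['MIDSCENE_MODEL_BASE_URL'];
  const apiKey = env['MIDSCENE_MODEL_API_KEY'];
  if (baseUrl && apiKey) {
    return { kind: 'endpoint', source: 'midscene', baseUrl, apiKey, modelName };
  }

  // Legacy fallback (key names are an assumption).
  const legacyUrl = env['OPENAI_BASE_URL'];
  const legacyKey = env['OPENAI_API_KEY'];
  if (legacyUrl && legacyKey) {
    return { kind: 'endpoint', source: 'legacy-openai', baseUrl: legacyUrl, apiKey: legacyKey, modelName };
  }

  return { kind: 'none', reason: 'no endpoint/key pair found in env' };
}
```

Checking `kind` before reading `baseUrl` or `apiKey` lets the compiler enforce that a caller handles the fallback-to-CLI-auth case.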
@@ -198,10 +199,31 @@ interface BddConfig {
   uiAgentOptions?: UiAgentOptions;

   generalAgent?: {
-    // MIDSCENE_MODEL_* overrides for the general agent, resolved in an
-    // isolated model config (never leaks into the UI agent). Defaults to
-    // process env; MIDSCENE_MODEL_BASE_URL=codex://app-server is supported.
-    modelEnv?: Record<string, string>;
+    // Which CLI coding agent runs `[agent]`/`$skill` steps.
+    type?: 'opencode' | 'codex'; // default: 'opencode'
+    // Model override. opencode: 'provider/model' or a bare name mapped onto
+    // the generated provider; codex: passed as -m. Default: the resolved
+    // MIDSCENE_MODEL_NAME (recommend a strong coding model over the
+    // vision default).
+    model?: string;
+    // Extra env for the spawned CLI, merged over process.env.
+    env?: Record<string, string>;
+    // Working directory the agent runs (and executes shell!) in.
+    cwd?: string; // default: the config file's directory
+    // Hard kill timeout per invocation.
+    timeoutMs?: number; // default: 600_000 (10 min)
+    // SECURITY: the agent runs shell commands in cwd, driven by prose in
+    // your feature files. 'read-only' denies edits/shell writes,
+    // 'workspace' (default) allows workspace writes, 'all' disables
+    // sandboxing/permission prompts entirely — only use 'all' in an
+    // externally sandboxed environment.
+    permissions?: 'read-only' | 'workspace' | 'all';
+    // Reuse the Midscene endpoint/key (MIDSCENE_MODEL_*, legacy OPENAI_*
+    // fallback) for the CLI agent.
+    reuseMidsceneModelEnv?: boolean; // default: true
+    // Continue one CLI session across the steps of a scenario, so later
+    // `[agent]` steps see what earlier ones found.
+    sessionPerScenario?: boolean; // default: false
     // Escape hatch mirroring the uiAgent factory (e.g. for tests).
     factory?: () => Promise<GeneralAgent>;
   };
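
The defaults documented in the comments of this hunk, together with the rejection of the removed `modelEnv` key, can be summarized in a short sketch. It is a minimal illustration under stated assumptions, not the package's validation code: `resolveGeneralAgentConfig` and `ResolvedGeneralAgentConfig` are hypothetical names, the error wording is invented, and `factory` is left out for brevity.

```typescript
type Permissions = 'read-only' | 'workspace' | 'all';

// Mirrors the generalAgent fields from the diff (factory omitted).
interface GeneralAgentConfig {
  type?: 'opencode' | 'codex';
  model?: string;
  env?: Record<string, string>;
  cwd?: string;
  timeoutMs?: number;
  permissions?: Permissions;
  reuseMidsceneModelEnv?: boolean;
  sessionPerScenario?: boolean;
}

interface ResolvedGeneralAgentConfig {
  type: 'opencode' | 'codex';
  model: string | undefined;
  env: Record<string, string>;
  cwd: string;
  timeoutMs: number;
  permissions: Permissions;
  reuseMidsceneModelEnv: boolean;
  sessionPerScenario: boolean;
}

function resolveGeneralAgentConfig(
  raw: GeneralAgentConfig & { modelEnv?: unknown },
  configDir: string, // directory of the config file
  midsceneModelName?: string, // the resolved MIDSCENE_MODEL_NAME, if any
): ResolvedGeneralAgentConfig {
  // The commit says modelEnv now fails validation with a migration hint;
  // this message text is illustrative only.
  if ('modelEnv' in raw) {
    throw new Error(
      'generalAgent.modelEnv was removed; use generalAgent.env/model/type or generalAgent.factory',
    );
  }
  return {
    type: raw.type ?? 'opencode',
    model: raw.model ?? midsceneModelName,
    env: raw.env ?? {}, // merged over process.env by the spawner (not shown)
    cwd: raw.cwd ?? configDir,
    timeoutMs: raw.timeoutMs ?? 600_000, // 10 minutes
    permissions: raw.permissions ?? 'workspace',
    reuseMidsceneModelEnv: raw.reuseMidsceneModelEnv ?? true,
    sessionPerScenario: raw.sessionPerScenario ?? false,
  };
}
```

Under this reading, a config such as `{ type: 'codex', permissions: 'read-only' }` overrides only those two fields and keeps every other default.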

packages/bdd/example/README.md

Lines changed: 8 additions & 4 deletions
@@ -6,10 +6,14 @@ needed) driven by Gherkin features through Midscene.
 ## Prerequisites

 1. Build the repo once from the root: `pnpm install && pnpm run build`.
-2. A model for the agents — either `codex login` (the general agent then uses
-   `MIDSCENE_MODEL_BASE_URL=codex://app-server`) or set the `MIDSCENE_MODEL_*`
-   environment variables (at minimum `MIDSCENE_MODEL_BASE_URL`,
-   `MIDSCENE_MODEL_API_KEY`, `MIDSCENE_MODEL_NAME`).
+2. A model for the UI agent: set the `MIDSCENE_MODEL_*` environment variables
+   (at minimum `MIDSCENE_MODEL_BASE_URL`, `MIDSCENE_MODEL_API_KEY`,
+   `MIDSCENE_MODEL_NAME`).
+3. The general agent (`# [agent]`/`$skill` steps): install the
+   [opencode](https://opencode.ai) CLI with `npm i -g opencode-ai` — with
+   zero extra config it reuses the `MIDSCENE_MODEL_*` endpoint above. Or set
+   `generalAgent: { type: 'codex' }` in `midscene.config.ts` and use
+   `npm i -g @openai/codex` + `codex login`.

 ## Run

packages/bdd/example/midscene.config.ts

Lines changed: 3 additions & 0 deletions
@@ -7,4 +7,7 @@ export default defineBddConfig({
     type: 'web',
     url: pathToFileURL(join(__dirname, 'demo-app/index.html')).href,
   },
+  // General agent for `# [agent]` / `$skill` steps — zero config reuses the
+  // MIDSCENE_MODEL_* endpoint/key. Uncomment to customize:
+  // generalAgent: { type: 'opencode' },
 });

packages/bdd/package.json

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
     "url": "https://github.com/web-infra-dev/midscene.git",
     "directory": "packages/bdd"
   },
-  "description": "AI-native BDD test runner: standard Gherkin driven by cucumber-js, executed by Midscene. Midscene by default, coding agent on @agent/$skill, classic callbacks on @no-ai.",
+  "description": "AI-native BDD test runner: standard Gherkin driven by cucumber-js, executed by Midscene. Midscene by default, coding agent on [agent]/$skill, classic callbacks on @no-ai.",
   "author": "midscene team",
   "license": "MIT",
   "main": "./dist/lib/index.js",
