feat(bdd): opencode/codex CLI adapters for the general agent (#2677)
* feat(bdd)!: replace callAI general agent with opencode/codex CLI adapters
The general agent for `# @agent`/`$skill` steps is now a real CLI coding
agent spawned per step: opencode by default, Codex via
`generalAgent.type: 'codex'`. Both adapters reuse the Midscene model
endpoint/key (MIDSCENE_MODEL_*, legacy OPENAI_* fallback) with zero
config — opencode through a generated OPENCODE_CONFIG_CONTENT provider,
codex through `-c model_providers.*` TOML overrides — and degrade to the
CLI's own auth (`opencode auth login` / `codex login`) when overridden
or unavailable. New config knobs: model, env, cwd, timeoutMs,
permissions (read-only/workspace/all sandbox mapping), reuseMidsceneModelEnv,
sessionPerScenario. CallAiGeneralAgent and generalAgent.modelEnv are
removed (modelEnv now fails validation with a migration hint); the pure
prompt/verdict helpers moved to agents/general-prompt.ts unchanged.
BREAKING CHANGE: generalAgent.modelEnv is rejected; CallAiGeneralAgent
is gone. Use generalAgent.env/model/type or generalAgent.factory.
* refactor(bdd): deslop cli general-agent adapters
Dedupe the verdict-extraction block shared by both adapters into
toGeneralResult, share DEFAULT_TIMEOUT_MS and the temp-file path
construction in cli-agent.ts, inline the single-caller
findAuthFailureHint, drop the unused outputTail limit param, and
replace the sessionId `as string` cast with proper narrowing.
No behavior change.
* refactor(bdd): harden cli adapters per simplify review
Fix the stdin EPIPE crash path and split-multibyte UTF-8 decoding in
runCli, rethrow non-ENOENT errors when reading codex's last-message
file, and flatten the opencode model mapping into exclusive cases (an
explicit provider/model no longer injects an unused midscene provider).
Share planCommon/throwOnNonZeroExit between the adapters, make opencode
permissions handling exhaustive, type the resolved model env as a
discriminated union, import the env key names from @midscene/shared/env,
prune already-sent skills from resumed-session prompts, and add a
lastIndexOf backstop so an unbalanced brace in prose cannot hide a
trailing verdict.
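
To illustrate the `lastIndexOf` backstop mentioned in the last item, here is a minimal sketch of how a trailing JSON verdict could be recovered when prose contains an unmatched `{`. It is not the adapters' actual code: the `Verdict` shape (`pass`, `reason`) and the names `tryParseVerdict` and `extractVerdict` are hypothetical, and braces inside JSON strings are not handled.

```typescript
// Hypothetical verdict shape; the commit only says Then steps return a JSON verdict.
type Verdict = { pass: boolean; reason?: string };

// Parse a candidate span; anything that is not an object with a boolean `pass` is rejected.
function tryParseVerdict(text: string): Verdict | undefined {
  try {
    const value: unknown = JSON.parse(text);
    if (
      typeof value === 'object' &&
      value !== null &&
      typeof (value as Record<string, unknown>).pass === 'boolean'
    ) {
      return value as Verdict;
    }
  } catch {
    // Not valid JSON: treat as "no verdict here".
  }
  return undefined;
}

function extractVerdict(output: string): Verdict | undefined {
  // Primary pass: brace-balance each top-level {...} span, keep the last one that parses.
  let found: Verdict | undefined;
  let depth = 0;
  let start = -1;
  for (let i = 0; i < output.length; i++) {
    const ch = output[i];
    if (ch === '{') {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === '}' && depth > 0) {
      depth--;
      if (depth === 0) {
        found = tryParseVerdict(output.slice(start, i + 1)) ?? found;
      }
    }
  }
  if (found) return found;

  // Backstop: an unmatched '{' earlier in the prose keeps depth above zero,
  // so retry from the last '{' to the last '}'.
  const open = output.lastIndexOf('{');
  const close = output.lastIndexOf('}');
  if (open === -1 || close < open) return undefined;
  return tryParseVerdict(output.slice(open, close + 1));
}
```

In this sketch a missing or malformed verdict yields `undefined`, which a caller would treat as a failure to stay fail-closed.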
---------
Co-authored-by: ScriptedAlchemy <zack@module-federation.io>
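
The split-multibyte UTF-8 fix in `runCli` mentioned above concerns a general pitfall: decoding each stdout chunk on its own corrupts any character whose bytes straddle a chunk boundary. The commit does not say how `runCli` handles this; the sketch below shows one common approach using the standard `TextDecoder` in streaming mode. The function names are hypothetical.

```typescript
// Decode byte chunks without corrupting characters split across chunk boundaries.
function decodeChunks(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder('utf-8');
  let text = '';
  for (const chunk of chunks) {
    // stream: true buffers an incomplete trailing byte sequence until the next chunk.
    text += decoder.decode(chunk, { stream: true });
  }
  // Final call without arguments flushes anything still buffered.
  text += decoder.decode();
  return text;
}

// For contrast: a fresh decoder per chunk replaces the split bytes with U+FFFD.
function decodeChunksNaively(chunks: Uint8Array[]): string {
  return chunks.map((chunk) => new TextDecoder('utf-8').decode(chunk)).join('');
}
```

Feeding both functions the bytes of `héllo` split between the two bytes of `é` shows the difference: the streaming version returns the original string, while the naive version contains replacement characters.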
packages/bdd/README.md: 30 additions & 8 deletions
@@ -26,7 +26,7 @@ Every statement is routed to exactly one executor:
 | Rule | Marker | Who executes the statement |
 | --- | --- | --- |
 | **Default** | none | Midscene UI agent — the vision model drives the page (`aiAct`) for Given/When and judges Then steps (`aiAssert`, fail-closed) |
-| **Agent** | `# [agent]` comment directly above the line, or a `$skill-name` token in it | A general-purpose coding agent (Codex app-server via `codex login`, or any OpenAI-compatible endpoint) — for behavior you cannot see in the browser: server logs, files, databases. Then steps must return a JSON verdict; a missing verdict fails (fail-closed) |
+| **Agent** | `# [agent]` comment directly above the line, or a `$skill-name` token in it | A real CLI coding agent ([opencode](https://opencode.ai) by default, [Codex](https://github.com/openai/codex) opt-in) spawned per step — for behavior you cannot see in the browser: server logs, files, databases. Then steps must return a JSON verdict; a missing verdict fails (fail-closed) |
 | **No AI** | `# [no-ai]` comment above the line (or `@no-ai` scenario/feature tag) | Classic BDD: a callback registered with `Given`/`When`/`Then`/`defineStep` from `@midscene/bdd` must match. An unimplemented step fails with a ready-to-paste snippet |
 
 All three in one scenario:
@@ -142,10 +142,11 @@ import { defineProfile } from '@midscene/bdd/profile';
 export default defineProfile().default;
 ```
 
-Model setup, either:
+Model setup:
 
-- `codex login` once, then point the general agent at it with `MIDSCENE_MODEL_BASE_URL=codex://app-server`, or
-- set the `MIDSCENE_MODEL_*` environment variables for any OpenAI-compatible endpoint (at minimum `MIDSCENE_MODEL_BASE_URL`, `MIDSCENE_MODEL_API_KEY`, `MIDSCENE_MODEL_NAME`).
+- The UI agent needs the `MIDSCENE_MODEL_*` environment variables for an OpenAI-compatible vision endpoint (at minimum `MIDSCENE_MODEL_BASE_URL`, `MIDSCENE_MODEL_API_KEY`, `MIDSCENE_MODEL_NAME`).
+- The general agent (`# [agent]`/`$skill` steps) is the [opencode](https://opencode.ai) CLI: `npm i -g opencode-ai`. With zero extra config it reuses your `MIDSCENE_MODEL_*` endpoint and key. Since that endpoint is tuned for vision, consider pointing the agent at a strong coding model instead via `generalAgent.model` (`'provider/model'`) plus `generalAgent.env`, or `opencode auth login`.
+- Prefer Codex? Set `generalAgent: { type: 'codex' }` and install it with `npm i -g @openai/codex`. Same endpoint reuse applies; without a usable endpoint it falls back to your `codex login` account.
 
 Run:
 
@@ -198,10 +199,31 @@ interface BddConfig {
   uiAgentOptions?: UiAgentOptions;
 
   generalAgent?: {
-    // MIDSCENE_MODEL_* overrides for the general agent, resolved in an
-    // isolated model config (never leaks into the UI agent). Defaults to
-    // process env; MIDSCENE_MODEL_BASE_URL=codex://app-server is supported.
-    modelEnv?: Record<string, string>;
+    // Which CLI coding agent runs `[agent]`/`$skill` steps.
+    type?: 'opencode' | 'codex'; // default: 'opencode'
+    // Model override. opencode: 'provider/model' or a bare name mapped onto
+    // the generated provider; codex: passed as -m. Default: the resolved
+    // MIDSCENE_MODEL_NAME (recommend a strong coding model over the
+    // vision default).
+    model?: string;
+    // Extra env for the spawned CLI, merged over process.env.
+    env?: Record<string, string>;
+    // Working directory the agent runs (and executes shell!) in.
+    cwd?: string; // default: the config file's directory
+    // Hard kill timeout per invocation.
+    timeoutMs?: number; // default: 600_000 (10 min)
+    // SECURITY: the agent runs shell commands in cwd, driven by prose in
+    // your feature files. 'read-only' denies edits/shell writes,
-  "description": "AI-native BDD test runner: standard Gherkin driven by cucumber-js, executed by Midscene. Midscene by default, coding agent on @agent/$skill, classic callbacks on @no-ai.",
+  "description": "AI-native BDD test runner: standard Gherkin driven by cucumber-js, executed by Midscene. Midscene by default, coding agent on [agent]/$skill, classic callbacks on @no-ai.",
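
For readers migrating a config, the `generalAgent` defaults listed in the `BddConfig` hunk and the `modelEnv` rejection described in the commit message can be sketched as below. This is an assumption-laden illustration, not the package's validator: `normalizeGeneralAgent`, `ResolvedGeneralAgent`, and the error wording are hypothetical, and only the fields visible in the diff (`type`, `model`, `env`, `cwd`, `timeoutMs`) are modeled.

```typescript
// Fields and defaults taken from the BddConfig diff; everything else is hypothetical.
type GeneralAgentType = 'opencode' | 'codex';

interface GeneralAgentConfig {
  type?: GeneralAgentType;
  model?: string;
  env?: Record<string, string>;
  cwd?: string;
  timeoutMs?: number;
}

interface ResolvedGeneralAgent {
  type: GeneralAgentType;
  model: string | undefined;
  env: Record<string, string>;
  cwd: string;
  timeoutMs: number;
}

const DEFAULT_TIMEOUT_MS = 600_000; // 10 min, per the diff comment

function normalizeGeneralAgent(
  raw: Record<string, unknown>,
  configDir: string,
): ResolvedGeneralAgent {
  // The removed option is rejected with a migration hint (wording is illustrative).
  if ('modelEnv' in raw) {
    throw new Error(
      'generalAgent.modelEnv is no longer supported; use generalAgent.env, ' +
        'generalAgent.model, generalAgent.type, or generalAgent.factory instead',
    );
  }
  const cfg = raw as GeneralAgentConfig;
  return {
    type: cfg.type ?? 'opencode',
    model: cfg.model, // undefined means: fall back to the resolved MIDSCENE_MODEL_NAME
    env: cfg.env ?? {},
    cwd: cfg.cwd ?? configDir, // default: the config file's directory
    timeoutMs: cfg.timeoutMs ?? DEFAULT_TIMEOUT_MS,
  };
}
```

Under these assumptions, `normalizeGeneralAgent({ type: 'codex' }, '/repo')` resolves to the Codex adapter with the 10-minute timeout, while any config still carrying `modelEnv` throws before a CLI is spawned.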