From bde88d93f2893d9ec9d1daee3f8b2b925f86c608 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Wed, 20 May 2026 11:04:53 +0800
Subject: [PATCH 01/33] docs(site): add UI testing framework topic

---
 apps/site/docs/en/ui-testing-engineering.mdx  | 149 ++++++++++++++++++
 apps/site/docs/en/ui-testing-framework.mdx    | 120 ++++++++++++++
 .../docs/en/ui-testing-yaml-quick-start.mdx   | 149 ++++++++++++++++++
 apps/site/docs/zh/ui-testing-engineering.mdx  | 149 ++++++++++++++++++
 apps/site/docs/zh/ui-testing-framework.mdx    | 120 ++++++++++++++
 .../docs/zh/ui-testing-yaml-quick-start.mdx   | 149 ++++++++++++++++++
 apps/site/rspress.config.ts                   |  30 ++++
 7 files changed, 866 insertions(+)
 create mode 100644 apps/site/docs/en/ui-testing-engineering.mdx
 create mode 100644 apps/site/docs/en/ui-testing-framework.mdx
 create mode 100644 apps/site/docs/en/ui-testing-yaml-quick-start.mdx
 create mode 100644 apps/site/docs/zh/ui-testing-engineering.mdx
 create mode 100644 apps/site/docs/zh/ui-testing-framework.mdx
 create mode 100644 apps/site/docs/zh/ui-testing-yaml-quick-start.mdx

diff --git a/apps/site/docs/en/ui-testing-engineering.mdx b/apps/site/docs/en/ui-testing-engineering.mdx
new file mode 100644
index 0000000000..11545f802f
--- /dev/null
+++ b/apps/site/docs/en/ui-testing-engineering.mdx
@@ -0,0 +1,149 @@
+# Engineering UI Tests with Midscene
+
+YAML gets a UI test running quickly. Engineering practice keeps it reliable after the test suite grows. The main rule is simple: keep user-facing UI intent in YAML, and keep deterministic environment and business logic in code.
+
+## Recommended project shape
+
+```text
+.
+  .env
+  midscene.config.yaml
+  setup.ts
+  e2e/
+    login.yaml
+    checkout.yaml
+    mobile-smoke.yaml
+  fixtures/
+    account.ts
+    device.ts
+  reports/
+```
+
+Use this shape as a guide, not a strict requirement. Small projects can start with one YAML file. Larger projects usually need separate setup, fixtures, and CI configuration.
+
+## Keep the boundary clear
+
+| Concern | Put it in | Why |
+| --- | --- | --- |
+| User path, visual state, popup handling, navigation | YAML | Natural language is concise and resilient to UI changes |
+| Login, cookies, SSO, accounts, environment preparation | Setup scripts or fixtures | The logic is project-specific and often needs secrets or internal tools |
+| API responses, database records, analytics events, amount calculation | JavaScript/TypeScript assertions | These checks need deterministic data and exact failure messages |
+| Batch execution, concurrency, summaries, report artifacts | CLI and CI configuration | They are execution concerns, not test intent |
+
+## Environment configuration
+
+The Midscene CLI loads `.env` from the directory where the command is run. A typical file contains model settings:
+
+```ini filename=.env
+MIDSCENE_MODEL_BASE_URL="https://your-model-service.example.com/v1"
+MIDSCENE_MODEL_API_KEY="your API Key"
+MIDSCENE_MODEL_NAME="your model name"
+MIDSCENE_MODEL_FAMILY="your model family"
+```
+
+In CI, store sensitive values in the CI secret manager and expose them as environment variables. Use `--dotenv-debug` when you need to inspect how local variables are loaded, and `--dotenv-override` only when the `.env` file should replace existing process variables.
+
+## Login and setup
+
+Most real tests should not repeat the login flow through the UI on every run. Prepare state before the YAML flow starts:
+
+- create or select a test account;
+- complete SSO and inject cookies;
+- install or launch the app under test;
+- select an Android or iOS device;
+- configure a lane, feature flag, or internal environment;
+- seed backend data needed by the test.
+
+After setup, keep the YAML focused on the business behavior:
+
+```yaml
+web:
+  url: https://internal.example.com/dashboard
+
+tasks:
+  - name: Check dashboard
+    flow:
+      - aiAssert: The dashboard is loaded and user information is visible
+```
+
+## Batch runs and CI
+
+Use the CLI for suite-level execution:
+
+```bash
+midscene --files './e2e/**/*.yaml' --concurrent 4 --continue-on-error --summary index.json
+```
+
+Recommended CI artifacts:
+
+- the summary JSON file;
+- each YAML run result;
+- visual report HTML files;
+- logs from the application or device when available.
+
+Keep reports even for successful scheduled runs. They make UI drift and flaky behavior easier to investigate later.
+
+## Browser sessions
+
+For Web tests, choose the connection mode based on the state you need:
+
+| Mode | Best for |
+| --- | --- |
+| Default browser launch | Clean and repeatable tests |
+| Headed mode | Local debugging |
+| CDP connection | Remote browsers or managed browser services |
+| Chrome bridge mode | Reusing an existing desktop Chrome session, cookies, extensions, or internal login |
+
+See [YAML script runner](./yaml-script-runner) and [Bridge to the desktop Chrome](./bridge-mode) for the exact configuration.
+
+## Mobile devices
+
+Mobile tests usually need more setup than Web tests. Keep device management outside the YAML journey when possible:
+
+- reserve a device from a device pool;
+- install the target build;
+- clear or seed app state;
+- configure network, region, or account data;
+- collect device logs after failure.
+
+Use platform-specific YAML helpers for small local actions, such as `runAdbShell`, `runWdaRequest`, `launch`, and `terminate`. Use setup scripts for broader device orchestration.
+
+## Deterministic checks
+
+AI assertions are best for UI state: visible text, layout meaning, workflow completion, and visual conditions. Use code when correctness depends on exact values:
+
+```ts
+expect(createOrderResponse.status).toBe(200);
+expect(order.total).toBe(expectedTotal);
+expect(analyticsEvents).toContainEqual({
+  name: 'checkout_submit',
+  source: 'recommendation',
+});
+```
+
+This keeps failures actionable. A visual assertion explains what the user saw. A code assertion explains which business invariant failed.
+
+## Reports and debugging
+
+When a test fails, inspect the report before editing the prompt. Check:
+
+- the screenshot before the failed step;
+- the AI action and assertion text;
+- whether the page or app was still loading;
+- whether login or setup state was missing;
+- whether the test should use code for an exact business rule.
+
+Prompt changes should clarify intent. They should not encode brittle layout details unless the layout itself is what you are testing.
+
+## Agent orchestration
+
+Some internal workflows require more than a fixed runner: device pools, SSO, logs, network tools, backend queries, and report analysis. In those cases, treat YAML as the stable test asset and let a coding agent or internal runner orchestrate the surrounding tools.
+
+Midscene remains responsible for UI understanding, actions, screenshots, and reports. The orchestrator prepares the environment, calls the right tests, gathers external evidence, and summarizes failures.
+
+## Reference pages
+
+- [Write UI tests with YAML](./ui-testing-yaml-quick-start)
+- [Workflow in YAML format](./automate-with-scripts-in-yaml)
+- [YAML script runner](./yaml-script-runner)
+- [Integrate Midscene with any interface](./integrate-with-any-interface)
diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
new file mode 100644
index 0000000000..3651409bb0
--- /dev/null
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -0,0 +1,120 @@
+# AI-native UI Testing Framework
+
+Midscene helps teams build UI tests around user intent instead of fragile selectors. Natural-language steps describe what a user wants to do, while scripts, setup code, CLI execution, and reports keep the workflow maintainable enough for real projects.
+
+This guide introduces the framework-level view. If you want to run your first test immediately, start with [Write UI tests with YAML](./ui-testing-yaml-quick-start).
+
+## From Smoke Tests to Test Projects
+
+Most teams do not start with a complete test project. They start with smoke tests: the fastest possible way to verify that a key business path still works. At that stage, getting a useful case running matters more than designing every abstraction. Once those cases run repeatedly and become part of daily regression, login state, accounts, cookies, devices, environment variables, and report configuration naturally show up.
+
+That is why Midscene's UI Testing Framework is designed around three steps:
+
+- First, make smoke tests lightweight: business or QA users can write YAML cases, run the core path, and inspect a replayable report.
+- Next, provide the right amount of configuration: a test project must handle model settings, runtime options, login state, test data, devices, and report output without squeezing those details into natural-language steps.
+- Finally, support a flexible test project: when the team needs CI, internal tools, fixtures, API checks, and database checks, it can export and own a standard test project.
+
+The point is not to expose every capability on day one. The point is to give users the right shape at each stage.
+
+| Stage | Best for | Recommended shape |
+| --- | --- | --- |
+| Smoke test | Verify a key business path quickly | YAML + case. Write the entry point, steps, and assertions in YAML, then run it from the CLI and inspect the report |
+| Simple customization | Tests need login state, accounts, cookies, devices, or a small amount of environment preparation | YAML + `setup.js`. Keep the business path in YAML and put pre-run preparation in setup |
+| Fully custom | The team needs to integrate an existing test project, CI, internal tools, data checks, or a custom runner | Use `midscene emit` to export an independent Rstest project, then maintain it like a standard test project |
+
+### 1. Smoke test: YAML + case
+
+This stage is about one thing: make the core path easy to express, run, and review. The test author writes YAML for the entry point, actions, and expected result. Midscene executes the flow, captures screenshots, and generates the report.
+
+```yaml
+web:
+  url: https://shop.example.com
+
+tasks:
+  - name: Guest checkout smoke test
+    flow:
+      - aiAct: Search for "running shoes"
+      - aiAct: Open the first product, add it to the cart, and open the cart page
+      - aiAssert: The cart page shows one product and the checkout button
+```
+
+### 2. Simple customization: YAML + setup.js
+
+Once tests become part of daily work, they usually need login state, test accounts, cookies, device preparation, staging lanes, or other environment details. Those concerns should not be squeezed into natural-language steps. They belong in `setup.js`.
+
+Here, `setup.js` means the pre-run setup script for the test project. It runs before YAML cases and prepares the browser, device, account, or backend data for the test. A simple customized project can look like this:
+
+```text
+.
+  setup.js
+  midscene.config.yaml
+  e2e/
+    dashboard.yaml
+    checkout.yaml
+```
+
+`midscene.config.yaml` manages model settings, runtime options, report output, and selected cases. `setup.js` manages project-specific preparation. `e2e/*.yaml` only describes the business path.
+
+YAML still describes the business path:
+
+```yaml
+web:
+  url: https://internal.example.com/dashboard
+
+tasks:
+  - name: Check dashboard
+    flow:
+      - aiAssert: The dashboard is loaded and user information is visible
+```
+
+`setup.js` gets the test to the right starting point, such as logging in, injecting cookies, preparing data, or connecting a device. Its value is giving deterministic preparation logic a clear home while the YAML case stays lightweight:
+
+```js
+export default async function setup({ browser, context, device }) {
+  // loginByTestAccount and prepareTestData are project-specific helpers, not Midscene APIs.
+  const cookies = await loginByTestAccount(process.env.TEST_ACCOUNT);
+  await context.addCookies(cookies);
+  await prepareTestData({
+    user: process.env.TEST_ACCOUNT,
+    scenario: 'dashboard-smoke',
+  });
+}
+```
+
+The code above shows the boundary: login, accounts, data, and devices belong in setup. What the user should accomplish after the page opens stays in YAML.
+
+### 3. Fully custom: emit an independent Rstest project
+
+When a team already has a test platform, CI rules, internal fixtures, data checks, and report systems, Midscene should not force everything through a fixed YAML runner. The core capability for fully custom projects is `emit`: export a lightweight Midscene project into an independent Rstest project.
+
+```bash
+midscene emit ./project-folder
+```
+
+The emitted project can look like this:
+
+```text
+project-folder/
+  package.json
+  rstest.config.ts
+  setup.ts
+  e2e/
+    dashboard.test.ts
+    checkout.test.ts
+  fixtures/
+    account.ts
+    device.ts
+  reports/
+    midscene-report/
+```
+
+At this stage, YAML is no longer the boundary of what the framework can do. It is the migration entry point. Teams can keep the business expression from YAML cases, or move complex logic into Rstest test files, fixtures, and internal tools. Midscene handles UI actions and visual assertions, while your own code handles environment orchestration, API checks, database checks, and failure analysis.
+
+The point of this stage is not to write everything in YAML. The point is to let Midscene generate a standard test project that the team can fully own.
+
+## Next steps
+
+- Run your first test: [Write UI tests with YAML](./ui-testing-yaml-quick-start)
+- Add setup, CI, reports, and deterministic checks: [Engineering UI tests with Midscene](./ui-testing-engineering)
+- Look up every YAML field: [Workflow in YAML format](./automate-with-scripts-in-yaml)
+- Look up every CLI flag: [YAML script runner](./yaml-script-runner)
diff --git a/apps/site/docs/en/ui-testing-yaml-quick-start.mdx b/apps/site/docs/en/ui-testing-yaml-quick-start.mdx
new file mode 100644
index 0000000000..3c4528d177
--- /dev/null
+++ b/apps/site/docs/en/ui-testing-yaml-quick-start.mdx
@@ -0,0 +1,149 @@
+import SetupEnv from './common/setup-env.mdx';
+
+# Write UI Tests with YAML
+
+YAML is the fastest way to turn a user journey into a runnable Midscene test. You describe the target and the flow. Midscene handles UI understanding, execution, screenshots, and the report.
+
+This page focuses on the shortest path. For the complete YAML schema, see [Workflow in YAML format](./automate-with-scripts-in-yaml).
+
+<SetupEnv />
+
+## Create your first Web test
+
+Create `bing-search.yaml`:
+
+```yaml
+web:
+  url: https://www.bing.com
+
+tasks:
+  - name: Search weather
+    flow:
+      - aiAct: Search for "today's weather"
+      - aiAssert: The result page shows weather information
+```
+
+Run it:
+
+```bash
+midscene ./bing-search.yaml
+```
+
+Midscene prints execution progress and generates a visual report after the run. The report is the main debugging surface: it records screenshots, task status, AI actions, and failed assertions.
+
+## YAML structure
+
+A YAML test has three top-level parts: a target, an optional `agent` block, and `tasks`. Each task holds a `flow`:
+
+```yaml
+web:
+  url: https://www.bing.com
+
+agent:
+  aiActContext: If a cookie banner appears, accept it.
+
+tasks:
+  - name: Search weather
+    flow:
+      - aiAct: Search for "today's weather"
+      - aiWaitFor: The result page is loaded
+      - aiAssert: The result page shows weather information
+```
+
+- `web`, `android`, `ios`, or `computer` selects the target.
+- `agent` configures Midscene Agent behavior, reports, and cache options.
+- `tasks` contains named flows.
+- `flow` contains the actual UI steps.
+
+## Common steps
+
+| Step | Use it for |
+| --- | --- |
+| `aiAct` | Ask Midscene to perform a UI action from natural language |
+| `aiAssert` | Check that the UI satisfies a natural-language condition |
+| `aiWaitFor` | Wait until a UI condition becomes true |
+| `aiQuery` | Extract structured information from the UI |
+| `sleep` | Wait for a fixed number of milliseconds |
+| `javascript` | Run JavaScript in a Web page context |
+
+Write steps as user intent, not implementation details. Prefer "open the first product" over "click the third `.card` element".
+
+## Android example
+
+```yaml
+android:
+  deviceId: ${ANDROID_DEVICE_ID}
+  launch: com.example.app
+
+tasks:
+  - name: Verify home page
+    flow:
+      - aiAct: Close any permission popup if it appears
+      - aiAssert: The home page main content is visible
+```
+
+Android flows can also use Android-specific actions:
+
+```yaml
+tasks:
+  - name: Reset app
+    flow:
+      - runAdbShell: pm clear com.example.app
+      - launch: com.example.app
+      - aiAssert: The onboarding page is visible
+```
+
+## iOS example
+
+```yaml
+ios:
+  wdaPort: 8100
+  launch: com.apple.mobilesafari
+
+tasks:
+  - name: Open a website
+    flow:
+      - aiAct: Focus the address bar and open https://www.bing.com
+      - aiAssert: The Bing search page is visible
+```
+
+## Computer example
+
+```yaml
+computer: {}
+
+tasks:
+  - name: Search from desktop browser
+    flow:
+      - aiAct: Press Cmd+Space
+      - aiAct: Type "Safari" and press Enter
+      - aiAct: Focus the address bar and open https://www.bing.com
+      - aiAct: Search for "today's weather"
+      - aiAssert: The result page shows weather information
+```
+
+Adjust keyboard shortcuts for Windows or Linux when driving those systems.
+
+## Run multiple tests
+
+```bash
+midscene --files ./login.yaml './smoke/**/*.yaml'
+```
+
+Run independent files concurrently and continue after failures:
+
+```bash
+midscene --files './smoke/**/*.yaml' --concurrent 4 --continue-on-error
+```
+
+For full CLI behavior, including `.env`, headed mode, CDP, Chrome bridge mode, summary files, and config files, see [YAML script runner](./yaml-script-runner).
+
+## When to move beyond YAML
+
+YAML should describe the UI journey. Move logic out of YAML when it depends on exact data rather than on what is visible in the UI:
+
+- Login, SSO, cookies, accounts, and devices belong in setup scripts or fixtures.
+- API, database, analytics, and amount checks belong in code assertions.
+- CI, report retention, and batch execution belong in CLI or CI configuration.
+
+See [Engineering UI tests with Midscene](./ui-testing-engineering) for the project-level workflow.
diff --git a/apps/site/docs/zh/ui-testing-engineering.mdx b/apps/site/docs/zh/ui-testing-engineering.mdx
new file mode 100644
index 0000000000..4d1aeeffa7
--- /dev/null
+++ b/apps/site/docs/zh/ui-testing-engineering.mdx
@@ -0,0 +1,149 @@
+# 用 Midscene 工程化 UI 测试
+
+YAML 能让 UI 测试快速跑起来。工程化实践负责在测试套件增长后继续保持可靠。核心规则很简单：用户可见的 UI 意图留在 YAML；确定性的环境准备和业务逻辑放到代码里。
+
+## 推荐项目形态
+
+```text
+.
+  .env
+  midscene.config.yaml
+  setup.ts
+  e2e/
+    login.yaml
+    checkout.yaml
+    mobile-smoke.yaml
+  fixtures/
+    account.ts
+    device.ts
+  reports/
+```
+
+这只是推荐形态，不是强制结构。小项目可以从一个 YAML 文件开始；更大的项目通常需要独立的 setup、fixture 和 CI 配置。
+
+## 保持边界清晰
+
+| 关注点 | 放在哪里 | 原因 |
+| --- | --- | --- |
+| 用户路径、视觉状态、弹窗处理、导航 | YAML | 自然语言简洁，并且更能适应 UI 变化 |
+| 登录、Cookie、SSO、账号、环境准备 | setup 脚本或 fixture | 这些逻辑和项目强相关，通常需要密钥或内部工具 |
+| 接口响应、数据库记录、埋点、金额计算 | JavaScript/TypeScript 断言 | 这些检查需要确定数据和精确失败信息 |
+| 批量执行、并发、summary、报告产物 | CLI 和 CI 配置 | 这是执行层关注点，不是测试意图 |
+
+## 环境配置
+
+Midscene CLI 会从命令执行目录加载 `.env`。典型文件包含模型配置：
+
+```ini filename=.env
+MIDSCENE_MODEL_BASE_URL="https://your-model-service.example.com/v1"
+MIDSCENE_MODEL_API_KEY="your API Key"
+MIDSCENE_MODEL_NAME="your model name"
+MIDSCENE_MODEL_FAMILY="your model family"
+```
+
+在 CI 中，应把敏感值放进 CI secret manager，再暴露成环境变量。需要排查本地变量加载逻辑时使用 `--dotenv-debug`；只有当 `.env` 应该覆盖已有进程环境变量时才使用 `--dotenv-override`。
+
+## 登录和 setup
+
+大多数真实测试不应该每次都手动完成登录。应在 YAML flow 开始前准备状态：
+
+- 创建或选择测试账号；
+- 完成 SSO 并注入 Cookie；
+- 安装或启动被测 App；
+- 选择 Android 或 iOS 设备；
+- 配置泳道、功能开关或内部环境；
+- 准备测试所需的后端数据。
+
+setup 完成后，让 YAML 专注于业务行为：
+
+```yaml
+web:
+  url: https://internal.example.com/dashboard
+
+tasks:
+  - name: 检查首页
+    flow:
+      - aiAssert: Dashboard 已加载，用户信息可见
+```
+
+## 批量运行和 CI
+
+使用 CLI 执行测试套件：
+
+```bash
+midscene --files './e2e/**/*.yaml' --concurrent 4 --continue-on-error --summary index.json
+```
+
+推荐保留的 CI 产物：
+
+- summary JSON 文件；
+- 每个 YAML 的运行结果；
+- 可视化报告 HTML；
+- 可用时保留应用或设备日志。
+
+即使定时任务成功，也建议保留报告。后续排查 UI 漂移和偶发失败时会更容易。
+
+## 浏览器会话
+
+Web 测试应根据所需状态选择连接模式：
+
+| 模式 | 适合场景 |
+| --- | --- |
+| 默认启动浏览器 | 干净、可重复的测试 |
+| Headed 模式 | 本地调试 |
+| CDP 连接 | 远程浏览器或托管浏览器服务 |
+| Chrome bridge 模式 | 复用桌面 Chrome 会话、Cookie、插件或内部登录态 |
+
+具体配置见 [YAML 脚本运行器](./yaml-script-runner) 和 [桥接到桌面 Chrome](./bridge-mode)。
+
+## 移动端设备
+
+移动端测试通常比 Web 测试需要更多 setup。尽量把设备管理放在 YAML 旅程之外：
+
+- 从设备池预定设备；
+- 安装目标构建产物；
+- 清理或注入 App 状态；
+- 配置网络、地区或账号数据；
+- 失败后收集设备日志。
+
+小的本地动作可以使用平台专属 YAML helper，例如 `runAdbShell`、`runWdaRequest`、`launch` 和 `terminate`。更大的设备编排应放到 setup 脚本里。
+
+## 确定性校验
+
+AI 断言适合 UI 状态：可见文本、布局含义、流程完成和视觉条件。正确性依赖精确值时，使用代码：
+
+```ts
+expect(createOrderResponse.status).toBe(200);
+expect(order.total).toBe(expectedTotal);
+expect(analyticsEvents).toContainEqual({
+  name: 'checkout_submit',
+  source: 'recommendation',
+});
+```
+
+这样失败信息会更可行动。视觉断言解释用户看到什么；代码断言解释哪个业务不变量失败。
+
+## 报告和调试
+
+测试失败时，先看报告，再改 prompt。重点检查：
+
+- 失败步骤前的截图；
+- AI 动作和断言文本；
+- 页面或 App 是否仍在加载；
+- 登录或 setup 状态是否缺失；
+- 当前检查是否应该改用代码表达精确业务规则。
+
+Prompt 修改应该澄清意图，不应该编码脆弱的布局细节，除非布局本身就是测试目标。
+
+## Agent 编排
+
+有些内部流程超过固定 runner 的边界：设备池、SSO、日志、网络工具、后端查询和报告分析。此时可以把 YAML 作为稳定测试资产，让 coding agent 或内部 runner 编排周边工具。
+
+Midscene 继续负责 UI 理解、动作执行、截图和报告。编排层负责准备环境、调用测试、收集外部证据并总结失败。
+
+## 参考文档
+
+- [使用 YAML 编写 UI 测试](./ui-testing-yaml-quick-start)
+- [YAML 格式的工作流](./automate-with-scripts-in-yaml)
+- [YAML 脚本运行器](./yaml-script-runner)
+- [将 Midscene 集成到任意界面](./integrate-with-any-interface)
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
new file mode 100644
index 0000000000..041a0e67cb
--- /dev/null
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -0,0 +1,120 @@
+# AI 原生 UI Testing Framework
+
+Midscene 帮助团队围绕用户意图构建 UI 测试，而不是围绕脆弱的选择器构建测试。自然语言步骤描述用户想完成什么，脚本、setup、CLI 和报告则把这套流程支撑成可维护的测试工程。
+
+这篇文档介绍框架视角。如果你想直接跑第一个测试，可以从[使用 YAML 编写 UI 测试](./ui-testing-yaml-quick-start)开始。
+
+## 从 Smoke Test 到测试工程
+
+大多数业务不会一开始就建设完整测试工程。更常见的起点是 Smoke Test：先用最小成本验证关键路径是否可用，快速跑通比完整抽象更重要。只有当这些 case 开始反复运行、进入日常回归，登录态、账号、Cookie、设备、环境变量和报告配置才会自然浮现出来。
+
+所以 Midscene 的 UI Testing Framework 应该沿着三步展开：
+
+- 先让 Smoke Test 足够轻：业务同学或测试同学可以直接写 YAML case，把核心路径跑起来，并拿到可回放报告。
+- 再提供适当的配置能力：测试项目需要能处理模型配置、运行参数、登录态、测试数据、设备和报告输出，而不是把这些工程细节塞进自然语言步骤。
+- 最后支持灵活定制的测试项目：当团队要接 CI、内部工具、fixture、接口校验和数据库校验时，可以导出并接管一个标准测试工程。
+
+这条路径的关键不是把所有能力一次性暴露出来，而是让用户在每个阶段都有刚好够用的形态。
+
+| 阶段 | 适合场景 | 推荐形态 |
+| --- | --- | --- |
+| Smoke Test | 快速验证一个关键业务路径是否可用 | YAML + case。用 YAML 写出页面入口、任务步骤和断言，直接通过 CLI 运行并查看报告 |
+| 简单定制 | 测试需要登录态、账号、Cookie、设备或少量环境准备 | YAML + `setup.js`。业务路径继续留在 YAML，运行前准备放到 setup |
+| 完全自定义 | 团队要接入现有测试工程、CI、内部工具、数据校验或自定义 runner | `midscene emit` 导出独立 Rstest 工程，再按标准测试项目维护 |
+
+### 1. Smoke Test：YAML + case
+
+这个阶段只关心一件事：核心路径能不能被快速表达、运行和复盘。测试作者写 YAML，描述入口、操作和期望结果。Midscene 负责执行、截图和生成报告。
+
+```yaml
+web:
+  url: https://shop.example.com
+
+tasks:
+  - name: Guest checkout smoke test
+    flow:
+      - aiAct: Search for "running shoes"
+      - aiAct: Open the first product, add it to the cart, and open the cart page
+      - aiAssert: The cart page shows one product and the checkout button
+```
+
+### 2. 简单定制：YAML + setup.js
+
+当测试开始进入日常使用，通常会遇到登录态、测试账号、Cookie、设备准备、灰度环境等问题。这些内容不应该写进自然语言步骤里，而应该放在 `setup.js` 中完成。
+
+这里的 `setup.js` 指的是测试项目的运行前准备脚本。它在 YAML case 执行前运行，负责把浏览器、设备、账号或后端数据准备到正确状态。一个简单定制项目可以长这样：
+
+```text
+.
+  setup.js
+  midscene.config.yaml
+  e2e/
+    dashboard.yaml
+    checkout.yaml
+```
+
+`midscene.config.yaml` 管理模型、运行参数、报告输出和需要执行的 case；`setup.js` 管理项目自己的准备逻辑；`e2e/*.yaml` 只描述业务路径。
+
+YAML 仍然描述业务路径：
+
+```yaml
+web:
+  url: https://internal.example.com/dashboard
+
+tasks:
+  - name: Check dashboard
+    flow:
+      - aiAssert: The dashboard is loaded and user information is visible
+```
+
+`setup.js` 负责把测试带到正确的起点，例如登录、注入 Cookie、准备测试数据或连接设备。它的价值是给“确定性准备逻辑”一个明确位置，让 YAML case 仍然保持轻量：
+
+```js
+export default async function setup({ browser, context, device }) {
+  // loginByTestAccount and prepareTestData are project-specific helpers, not Midscene APIs.
+  const cookies = await loginByTestAccount(process.env.TEST_ACCOUNT);
+  await context.addCookies(cookies);
+  await prepareTestData({
+    user: process.env.TEST_ACCOUNT,
+    scenario: 'dashboard-smoke',
+  });
+}
+```
+
+上面的代码展示的是职责边界：登录、账号、数据、设备这类工程逻辑放进 setup；“用户进入页面后要完成什么”继续留在 YAML。
+
+### 3. 完全自定义：emit 为独立 Rstest 工程
+
+当团队已经有测试平台、CI 规范、内部 fixture、数据校验和报告系统时，Midscene 不应该强迫你停留在固定 YAML runner 里。完全自定义场景的核心能力是 `emit`：把轻量 Midscene 项目导出成一个独立的 Rstest 工程。
+
+```bash
+midscene emit ./project-folder
+```
+
+导出后的项目形态类似：
+
+```text
+project-folder/
+  package.json
+  rstest.config.ts
+  setup.ts
+  e2e/
+    dashboard.test.ts
+    checkout.test.ts
+  fixtures/
+    account.ts
+    device.ts
+  reports/
+    midscene-report/
+```
+
+在这个阶段，YAML 不再是能力边界，而是迁移入口。团队可以继续保留 YAML case 的业务表达，也可以把复杂逻辑拆进 Rstest 测试文件、fixture 和内部工具里。UI 操作和视觉断言交给 Midscene，环境编排、接口校验、数据库校验和失败归因交给团队自己的代码。
+
+这个阶段的重点不是“把所有东西都写成 YAML”，而是让 Midscene 生成一个能被团队完全接管的标准测试工程。
+
+## 下一步
+
+- 跑第一个测试：[使用 YAML 编写 UI 测试](./ui-testing-yaml-quick-start)
+- 加入 setup、CI、报告和确定性校验：[用 Midscene 工程化 UI 测试](./ui-testing-engineering)
+- 查询完整 YAML 字段：[YAML 格式的工作流](./automate-with-scripts-in-yaml)
+- 查询完整 CLI 参数：[YAML 脚本运行器](./yaml-script-runner)
diff --git a/apps/site/docs/zh/ui-testing-yaml-quick-start.mdx b/apps/site/docs/zh/ui-testing-yaml-quick-start.mdx
new file mode 100644
index 0000000000..e90121f183
--- /dev/null
+++ b/apps/site/docs/zh/ui-testing-yaml-quick-start.mdx
@@ -0,0 +1,149 @@
+import SetupEnv from './common/setup-env.mdx';
+
+# 使用 YAML 编写 UI 测试
+
+YAML 是把用户路径变成可运行 Midscene 测试的最快方式。你描述 target 和 flow，Midscene 负责 UI 理解、执行、截图和报告。
+
+这篇文档只讲最短路径。完整 YAML schema 请查看 [YAML 格式的工作流](./automate-with-scripts-in-yaml)。
+
+<SetupEnv />
+
+## 创建第一个 Web 测试
+
+创建 `bing-search.yaml`：
+
+```yaml
+web:
+  url: https://www.bing.com
+
+tasks:
+  - name: 搜索天气
+    flow:
+      - aiAct: 搜索 "今日天气"
+      - aiAssert: 结果页展示了天气信息
+```
+
+运行：
+
+```bash
+midscene ./bing-search.yaml
+```
+
+Midscene 会打印执行进度，并在运行结束后生成可视化报告。报告是主要调试入口：它记录截图、任务状态、AI 动作和失败断言。
+
+## YAML 结构
+
+一个 YAML 测试主要包含三个顶层部分：target、可选的 `agent` 和 `tasks`；每个 task 内包含一个 `flow`：
+
+```yaml
+web:
+  url: https://www.bing.com
+
+agent:
+  aiActContext: 如果出现 Cookie 弹窗，点击接受。
+
+tasks:
+  - name: 搜索天气
+    flow:
+      - aiAct: 搜索 "今日天气"
+      - aiWaitFor: 结果页加载完成
+      - aiAssert: 结果页展示了天气信息
+```
+
+- `web`、`android`、`ios` 或 `computer` 选择 target。
+- `agent` 配置 Midscene Agent 行为、报告和缓存选项。
+- `tasks` 包含命名流程。
+- `flow` 包含实际 UI 步骤。
+
+## 常用步骤
+
+| 步骤 | 适用场景 |
+| --- | --- |
+| `aiAct` | 用自然语言让 Midscene 执行 UI 动作 |
+| `aiAssert` | 检查 UI 是否满足自然语言条件 |
+| `aiWaitFor` | 等待某个 UI 条件成立 |
+| `aiQuery` | 从 UI 中提取结构化信息 |
+| `sleep` | 固定等待一段毫秒数 |
+| `javascript` | 在 Web 页面上下文中运行 JavaScript |
+
+步骤应描述用户意图，而不是实现细节。优先写“打开第一个商品”，不要写“点击第三个 `.card` 元素”。
+
+## Android 示例
+
+```yaml
+android:
+  deviceId: ${ANDROID_DEVICE_ID}
+  launch: com.example.app
+
+tasks:
+  - name: 验证首页
+    flow:
+      - aiAct: 如果出现权限弹窗就关闭
+      - aiAssert: 首页主内容区域可见
+```
+
+Android flow 也可以使用平台专属动作：
+
+```yaml
+tasks:
+  - name: 重置应用
+    flow:
+      - runAdbShell: pm clear com.example.app
+      - launch: com.example.app
+      - aiAssert: 新手引导页可见
+```
+
+## iOS 示例
+
+```yaml
+ios:
+  wdaPort: 8100
+  launch: com.apple.mobilesafari
+
+tasks:
+  - name: 打开网站
+    flow:
+      - aiAct: 聚焦地址栏并打开 https://www.bing.com
+      - aiAssert: Bing 搜索页可见
+```
+
+## Computer 示例
+
+```yaml
+computer: {}
+
+tasks:
+  - name: 从桌面浏览器搜索
+    flow:
+      - aiAct: 按下 Cmd+Space
+      - aiAct: 输入 "Safari" 并按回车
+      - aiAct: 聚焦地址栏并打开 https://www.bing.com
+      - aiAct: 搜索 "今日天气"
+      - aiAssert: 结果页展示了天气信息
+```
+
+如果驱动 Windows 或 Linux，请调整对应键盘快捷键。
+
+## 运行多个测试
+
+```bash
+midscene --files ./login.yaml './smoke/**/*.yaml'
+```
+
+并发运行互不依赖的文件，并在失败后继续：
+
+```bash
+midscene --files './smoke/**/*.yaml' --concurrent 4 --continue-on-error
+```
+
+完整 CLI 行为，包括 `.env`、headed 模式、CDP、Chrome bridge 模式、summary 文件和 config 文件，请查看 [YAML 脚本运行器](./yaml-script-runner)。
+
+## 什么时候超出 YAML
+
+YAML 应该描述 UI 旅程。当逻辑比视觉判断更确定时，把它移出 YAML：
+
+- 登录、SSO、Cookie、账号和设备放到 setup 脚本或 fixture。
+- 接口、数据库、埋点和金额校验放到代码断言。
+- CI、报告留存和批量执行放到 CLI 或 CI 配置。
+
+项目级工作流请继续阅读[用 Midscene 工程化 UI 测试](./ui-testing-engineering)。
diff --git a/apps/site/rspress.config.ts b/apps/site/rspress.config.ts
index 07fb03b593..21a54cd368 100644
--- a/apps/site/rspress.config.ts
+++ b/apps/site/rspress.config.ts
@@ -86,6 +86,21 @@ export default defineConfig({
           text: 'Showcases',
           link: '/showcases',
         },
+        {
+          sectionHeaderText: 'UI Testing Framework',
+        },
+        {
+          text: 'Overview',
+          link: '/ui-testing-framework',
+        },
+        {
+          text: 'Write tests with YAML',
+          link: '/ui-testing-yaml-quick-start',
+        },
+        {
+          text: 'Engineering guide',
+          link: '/ui-testing-engineering',
+        },
         {
           sectionHeaderText: 'Web browser',
         },
@@ -266,6 +281,21 @@ export default defineConfig({
           text: '案例展示',
           link: '/zh/showcases',
         },
+        {
+          sectionHeaderText: 'UI Testing Framework',
+        },
+        {
+          text: '专题总览',
+          link: '/zh/ui-testing-framework',
+        },
+        {
+          text: '使用 YAML 编写 UI 测试',
+          link: '/zh/ui-testing-yaml-quick-start',
+        },
+        {
+          text: '工程化指南',
+          link: '/zh/ui-testing-engineering',
+        },
         {
           sectionHeaderText: 'Web 浏览器',
         },

From 962d713b8c51ef840eb30cc8d739ff188546ea32 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Wed, 20 May 2026 16:08:01 +0800
Subject: [PATCH 02/33] docs(site): refine UI testing framework overview

---
 apps/site/docs/en/ui-testing-framework.mdx | 49 ++++++++++++----------
 apps/site/docs/zh/ui-testing-framework.mdx | 49 ++++++++++++----------
 2 files changed, 54 insertions(+), 44 deletions(-)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 3651409bb0..f3e2e508cd 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -1,30 +1,30 @@
 # AI-native UI Testing Framework
 
-Midscene helps teams build UI tests around user intent instead of fragile selectors. Natural-language steps describe what a user wants to do, while scripts, setup code, CLI execution, and reports keep the workflow maintainable enough for real projects.
+Midscene helps teams write UI tests as intuitive cases instead of scripts built around fragile selectors and page implementation details. A case should stay simple: describe what the user wants to do, what the user should see, and what must be verified. AI drives the interaction and assertions, while YAML gives those cases a clear, lightweight structure for collaboration.
 
 This guide introduces the framework-level view. If you want to run your first test immediately, start with [Write UI tests with YAML](./ui-testing-yaml-quick-start).
 
-## From Smoke Tests to Test Projects
+## Intuitive Cases, Maintainable Projects
 
-Most teams do not start with a complete test project. They start with smoke tests: the fastest possible way to verify that a key business path still works. At that stage, getting a useful case running matters more than designing every abstraction. Once those cases run repeatedly and become part of daily regression, login state, accounts, cookies, devices, environment variables, and report configuration naturally show up.
+UI test cases should not be complicated, and most teams do not start with a complete test platform. They start with smoke tests: the lowest-cost way to verify that a key business path still works. At this stage, the most important thing is to get the core path running quickly and turn it into a case that can be repeated, replayed, and inspected.
 
-That is why Midscene's UI Testing Framework is designed around three steps:
+But a test project cannot stay as “a few loose cases” forever. Once tests become part of daily regression, project complexity naturally appears: cookies, login state, accounts, environments, devices, test data, CI, report archiving, and failure analysis all need a clear home. Midscene balances these two requirements: keep cases intuitive while keeping the test project well-organized and maintainable.
 
-- First, make smoke tests lightweight: business or QA users can write YAML cases, run the core path, and inspect a replayable report.
-- Next, provide the right amount of configuration: a test project must handle model settings, runtime options, login state, test data, devices, and report output without squeezing those details into natural-language steps.
-- Finally, support a flexible test project: when the team needs CI, internal tools, fixtures, API checks, and database checks, it can export and own a standard test project.
+That is why Midscene's UI Testing Framework is designed around three layers:
 
-The point is not to expose every capability on day one. The point is to give users the right shape at each stage.
+- Write intuitive cases with YAML: business and QA users can describe the entry point, actions, and assertions, then let AI handle UI interaction and visual judgment.
+- Control project complexity with `setup.js` and a small amount of code: login, cookies, accounts, test data, and device preparation have their own place; individual special cases can use `case.test.ts`.
+- Use `midscene emit` to turn long-running projects into an Rstest project template: when the project grows to need fixtures, CI, internal tools, and deterministic checks, the team can export and own a standard test project.
 
-| Stage | Best for | Recommended shape |
+| Layer | Best for | Recommended shape |
 | --- | --- | --- |
-| Smoke test | Verify a key business path quickly | YAML + case. Write the entry point, steps, and assertions in YAML, then run it from the CLI and inspect the report |
-| Simple customization | Tests need login state, accounts, cookies, devices, or a small amount of environment preparation | YAML + `setup.js`. Keep the business path in YAML and put pre-run preparation in setup |
-| Fully custom | The team needs to integrate an existing test project, CI, internal tools, data checks, or a custom runner | Use `midscene emit` to export an independent Rstest project, then maintain it like a standard test project |
+| Intuitive cases | Verify a key business path quickly | YAML + case. Write the entry point, steps, and assertions in YAML, then run it from the CLI and inspect the replay report |
+| Refined, maintainable project | Tests need login state, cookies, accounts, devices, or a small amount of environment preparation | YAML + `setup.js`, with `case.test.ts` available for individual special cases |
+| Long-running engineering | The project needs CI, internal tools, fixtures, data checks, or custom flows | Use `midscene emit` to export an independent Rstest project, then maintain it like a standard test project |
 
-### 1. Smoke test: YAML + case
+### 1. Intuitive cases: YAML + case
 
-This stage is about one thing: make the core path easy to express, run, and review. The test author writes YAML for the entry point, actions, and expected result. Midscene executes the flow, captures screenshots, and generates the report.
+This stage is about one thing: make the core path natural to express, quick to run, and easy to review. The test author writes YAML for the entry point, actions, and expected result. Midscene uses AI to perform UI actions, evaluate visual assertions, capture screenshots, and generate a replayable report.
 
 ```yaml
 web:
@@ -38,11 +38,13 @@ tasks:
       - aiAssert: The cart page shows one product and the checkout button
 ```
 
-### 2. Simple customization: YAML + setup.js
+The value of YAML is not that it makes the framework smaller, but that it makes “what this user path should do” clear enough for review, business confirmation, and team collaboration. It also keeps simple cases from being wrapped in test framework boilerplate on day one.
 
-Once tests become part of daily work, they usually need login state, test accounts, cookies, device preparation, staging lanes, or other environment details. Those concerns should not be squeezed into natural-language steps. They belong in `setup.js`.
+### 2. Refined, maintainable project: YAML + setup.js
 
-Here, `setup.js` means the pre-run setup script for the test project. It runs before YAML cases and prepares the browser, device, account, or backend data for the test. A simple customized project can look like this:
+Once tests become part of daily work, they usually need login state, test accounts, cookies, device preparation, staging lanes, or other environment details. Those concerns should not be squeezed into natural-language steps, and they should not be copied into every YAML case. They belong in project-level configuration and `setup.js`.
+
+Here, `setup.js` means the pre-run setup script for the test project. It runs before YAML cases and prepares the browser, device, account, or backend data for the test. A refined but lightweight project can look like this:
 
 ```text
 .
@@ -51,9 +53,10 @@ Here, `setup.js` means the pre-run setup script for the test project. It runs be
   e2e/
     dashboard.yaml
     checkout.yaml
+    checkout-edge-case.test.ts
 ```
 
-`midscene.config.yaml` manages model settings, runtime options, report output, and selected cases. `setup.js` manages project-specific preparation. `e2e/*.yaml` only describes the business path.
+`midscene.config.yaml` manages model settings, runtime options, report output, and selected cases. `setup.js` manages project-specific preparation. `e2e/*.yaml` only describes the business path. A small number of cases that cannot be expressed cleanly in YAML can be written directly as `case.test.ts`.
 
 YAML still describes the business path:
 
@@ -81,11 +84,13 @@ export default async function setup({ browser, context, device }) {
 }
 ```
 
-The code above shows the boundary: login, accounts, data, and devices belong in setup. What the user should accomplish after the page opens stays in YAML.
+The code above shows the boundary: login, accounts, data, and devices belong in setup. What the user should accomplish after the page opens stays in YAML. The project can handle real engineering constraints, while the case itself stays close to business intuition.
+
+If a case needs complex branches, internal SDK calls, mixed API checks, or existing test utilities, it can be written as `case.test.ts`. This is not the main path. It is an escape hatch for special cases, so the project does not become awkward just to keep everything in YAML.
 
-### 3. Fully custom: emit an independent Rstest project
+### 3. Long-running engineering: emit an Rstest project template
 
-When a team already has a test platform, CI rules, internal fixtures, data checks, and report systems, Midscene should not force everything through a fixed YAML runner. The core capability for fully custom projects is `emit`: export a lightweight Midscene project into an independent Rstest project.
+Long-running test projects inevitably grow: there are more cases, fixtures become more complex, CI needs parallelism and grouping, and failure analysis needs to connect with team systems. Midscene should not force all of that into a fixed YAML runner. For this stage, the core capability is `emit`: export a lightweight Midscene project into an independent Rstest project template.
 
 ```bash
 midscene emit ./project-folder
@@ -110,7 +115,7 @@ project-folder/
 
 At this stage, YAML is no longer the boundary of what the framework can do. It is the migration entry point. Teams can keep the business expression from YAML cases, or move complex logic into Rstest test files, fixtures, and internal tools. Midscene handles UI actions and visual assertions, while your own code handles environment orchestration, API checks, database checks, and failure analysis.
 
-The point of this stage is not to write everything in YAML. The point is to let Midscene generate a standard test project that the team can fully own.
+Rstest is the underlying test runner because it is fast and reliable, which makes it a good foundation for a long-running test project. On top of that base, Midscene provides AI UI actions, visual assertions, screenshots, replay reports, and debugging information, so the project can scale without giving up a good developer experience.
 
 ## Next steps
 
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 041a0e67cb..6ad9f05634 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -1,30 +1,30 @@
 # AI 原生 UI Testing Framework
 
-Midscene 帮助团队围绕用户意图构建 UI 测试，而不是围绕脆弱的选择器构建测试。自然语言步骤描述用户想完成什么，脚本、setup、CLI 和报告则把这套流程支撑成可维护的测试工程。
+Midscene 帮助团队把 UI 测试写成符合直觉的用例，而不是围绕脆弱选择器和页面实现细节维护脚本。用例本身应该简单：描述用户想完成什么、看到什么、确认什么。AI 驱动负责理解和执行这些意图，YAML 则提供一种清晰、轻量、适合协作的组织形式。
 
 这篇文档介绍框架视角。如果你想直接跑第一个测试，可以从[使用 YAML 编写 UI 测试](./ui-testing-yaml-quick-start)开始。
 
-## 从 Smoke Test 到测试工程
+## 用例直觉化，工程精致化
 
-大多数业务不会一开始就建设完整测试工程。更常见的起点是 Smoke Test：先用最小成本验证关键路径是否可用，快速跑通比完整抽象更重要。只有当这些 case 开始反复运行、进入日常回归，登录态、账号、Cookie、设备、环境变量和报告配置才会自然浮现出来。
+UI Test 的用例表达不应该复杂。大多数业务的起点也不是完整测试平台，而是 Smoke Test：先用最小成本验证关键路径是否可用。这个阶段最重要的是快速跑通，把核心业务路径变成可以重复执行、可以回放分析的 case。
 
-所以 Midscene 的 UI Testing Framework 应该沿着三步展开：
+但测试项目不能一直停留在“随手写几个 case”的状态。只要进入日常回归，工程复杂度就会自然出现：Cookie、登录态、账号、环境、设备、测试数据、CI、报告归档和失败排查都需要有明确位置。Midscene 的原则是平衡这两件事：让用例保持直觉化，同时让测试工程足够精致、可维护。
 
-- 先让 Smoke Test 足够轻：业务同学或测试同学可以直接写 YAML case，把核心路径跑起来，并拿到可回放报告。
-- 再提供适当的配置能力：测试项目需要能处理模型配置、运行参数、登录态、测试数据、设备和报告输出，而不是把这些工程细节塞进自然语言步骤。
-- 最后支持灵活定制的测试项目：当团队要接 CI、内部工具、fixture、接口校验和数据库校验时，可以导出并接管一个标准测试工程。
+所以 Midscene 的 UI Testing Framework 沿着三层形态展开：
 
-这条路径的关键不是把所有能力一次性暴露出来，而是让用户在每个阶段都有刚好够用的形态。
+- 用 YAML 写直觉化用例：业务同学或测试同学可以直接描述入口、动作和断言，让 AI 完成 UI 操作和视觉判断。
+- 用 `setup.js` 和少量代码控制工程复杂度：登录、Cookie、账号、测试数据和设备准备有自己的位置；少数特殊 case 也可以写成 `case.test.ts`。
+- 长期项目可以 `emit` 成 Rstest 工程模板：当项目膨胀到需要 fixture、CI、内部工具和更多确定性校验时，团队可以接管一个标准测试工程。
 
-| 阶段 | 适合场景 | 推荐形态 |
+| 层级 | 适合场景 | 推荐形态 |
 | --- | --- | --- |
-| Smoke Test | 快速验证一个关键业务路径是否可用 | YAML + case。用 YAML 写出页面入口、任务步骤和断言，直接通过 CLI 运行并查看报告 |
-| 简单定制 | 测试需要登录态、账号、Cookie、设备或少量环境准备 | YAML + `setup.js`。业务路径继续留在 YAML，运行前准备放到 setup |
-| 完全自定义 | 团队要接入现有测试工程、CI、内部工具、数据校验或自定义 runner | `midscene emit` 导出独立 Rstest 工程，再按标准测试项目维护 |
+| 直觉化用例 | 快速验证一个关键业务路径是否可用 | YAML + case。用 YAML 写入口、任务步骤和断言，直接通过 CLI 运行并查看回放报告 |
+| 精致可维护的项目 | 测试需要登录态、Cookie、账号、设备或少量环境准备 | YAML + `setup.js`，必要时为个别特殊 case 增加 `case.test.ts` |
+| 长期工程化 | 项目开始接入 CI、内部工具、fixture、数据校验或自定义流程 | `midscene emit` 导出独立 Rstest 工程，再按标准测试项目维护 |
 
-### 1. Smoke Test：YAML + case
+### 1. 直觉化用例：YAML + case
 
-这个阶段只关心一件事：核心路径能不能被快速表达、运行和复盘。测试作者写 YAML，描述入口、操作和期望结果。Midscene 负责执行、截图和生成报告。
+这个阶段只关心一件事：核心路径能不能被自然表达、快速运行和复盘。测试作者写 YAML，描述入口、操作和期望结果。Midscene 负责用 AI 执行 UI 操作、完成视觉断言、截图并生成可回放报告。
 
 ```yaml
 web:
@@ -38,11 +38,13 @@ tasks:
       - aiAssert: The cart page shows one product and the checkout button
 ```
 
-### 2. 简单定制：YAML + setup.js
+YAML 的价值不是把测试能力做小，而是把“一个用户路径应该是什么样”组织得足够清楚。它天然适合 code review、业务确认和团队协作，也能避免简单用例一开始就被测试框架样板代码包住。
 
-当测试开始进入日常使用，通常会遇到登录态、测试账号、Cookie、设备准备、灰度环境等问题。这些内容不应该写进自然语言步骤里，而应该放在 `setup.js` 中完成。
+### 2. 精致可维护的项目：YAML + setup.js
 
-这里的 `setup.js` 指的是测试项目的运行前准备脚本。它在 YAML case 执行前运行，负责把浏览器、设备、账号或后端数据准备到正确状态。一个简单定制项目可以长这样：
+当测试开始进入日常使用，通常会遇到登录态、测试账号、Cookie、设备准备、灰度环境等问题。这些内容不应该写进自然语言步骤里，也不应该复制到每个 YAML case 中。它们应该进入项目级配置和 `setup.js`。
+
+这里的 `setup.js` 指的是测试项目的运行前准备脚本。它在 YAML case 执行前运行，负责把浏览器、设备、账号或后端数据准备到正确状态。一个精致但不重的项目可以长这样：
 
 ```text
 .
@@ -51,9 +53,10 @@ tasks:
   e2e/
     dashboard.yaml
     checkout.yaml
+    checkout-edge-case.test.ts
 ```
 
-`midscene.config.yaml` 管理模型、运行参数、报告输出和需要执行的 case；`setup.js` 管理项目自己的准备逻辑；`e2e/*.yaml` 只描述业务路径。
+`midscene.config.yaml` 管理模型、运行参数、报告输出和需要执行的 case；`setup.js` 管理项目自己的准备逻辑；`e2e/*.yaml` 只描述业务路径；少数无法优雅表达成 YAML 的特殊场景，可以用 `case.test.ts` 直接写成测试代码。
 
 YAML 仍然描述业务路径：
 
@@ -81,11 +84,13 @@ export default async function setup({ browser, context, device }) {
 }
 ```
 
-上面的代码展示的是职责边界：登录、账号、数据、设备这类工程逻辑放进 setup；“用户进入页面后要完成什么”继续留在 YAML。
+上面的代码展示的是职责边界：登录、账号、数据、设备这类工程逻辑放进 setup；“用户进入页面后要完成什么”继续留在 YAML。这样项目可以处理真实工程问题，但用例本身仍然接近业务直觉。
+
+如果某个 case 需要复杂分支、调用内部 SDK、混合接口校验或复用现有测试工具，可以把它写成 `case.test.ts`。这不是主路径，而是给少数特殊场景留出的逃生口，避免为了保持 YAML 纯度反而让项目变得别扭。
 
-### 3. 完全自定义：emit 为独立 Rstest 工程
+### 3. 长期工程化：emit 为 Rstest 项目模板
 
-当团队已经有测试平台、CI 规范、内部 fixture、数据校验和报告系统时，Midscene 不应该强迫你停留在固定 YAML runner 里。完全自定义场景的核心能力是 `emit`：把轻量 Midscene 项目导出成一个独立的 Rstest 工程。
+测试项目长期运行后一定会膨胀：case 会变多，fixture 会变复杂，CI 会有并发和分组策略，失败分析也需要接入团队自己的系统。Midscene 不应该把这些需求都压进固定 YAML runner。对应这类长期项目，核心能力是 `emit`：把轻量 Midscene 项目导出成一个独立的 Rstest 工程模板。
 
 ```bash
 midscene emit ./project-folder
@@ -110,7 +115,7 @@ project-folder/
 
 在这个阶段，YAML 不再是能力边界，而是迁移入口。团队可以继续保留 YAML case 的业务表达，也可以把复杂逻辑拆进 Rstest 测试文件、fixture 和内部工具里。UI 操作和视觉断言交给 Midscene，环境编排、接口校验、数据库校验和失败归因交给团队自己的代码。
 
-这个阶段的重点不是“把所有东西都写成 YAML”，而是让 Midscene 生成一个能被团队完全接管的标准测试工程。
+底层选择 Rstest，是因为它足够快、稳定，并且适合承载一个长期维护的测试工程。Midscene 在这个底座上提供 AI UI 操作、视觉断言、截图、回放报告和调试信息，保证项目既能工程化扩展，也有足够好的开发体验。
 
 ## 下一步
 

From 1fc24eaba67b423ad4474261ae6f8f57f4e72bd7 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Wed, 20 May 2026 16:54:20 +0800
Subject: [PATCH 03/33] docs(site): keep UI testing framework as single page

---
 apps/site/docs/en/ui-testing-engineering.mdx  | 149 ------------------
 apps/site/docs/en/ui-testing-framework.mdx    |   4 +-
 .../docs/en/ui-testing-yaml-quick-start.mdx   | 149 ------------------
 apps/site/docs/zh/ui-testing-engineering.mdx  | 149 ------------------
 apps/site/docs/zh/ui-testing-framework.mdx    |   4 +-
 .../docs/zh/ui-testing-yaml-quick-start.mdx   | 149 ------------------
 apps/site/rspress.config.ts                   |  16 --
 7 files changed, 2 insertions(+), 618 deletions(-)
 delete mode 100644 apps/site/docs/en/ui-testing-engineering.mdx
 delete mode 100644 apps/site/docs/en/ui-testing-yaml-quick-start.mdx
 delete mode 100644 apps/site/docs/zh/ui-testing-engineering.mdx
 delete mode 100644 apps/site/docs/zh/ui-testing-yaml-quick-start.mdx

diff --git a/apps/site/docs/en/ui-testing-engineering.mdx b/apps/site/docs/en/ui-testing-engineering.mdx
deleted file mode 100644
index 11545f802f..0000000000
--- a/apps/site/docs/en/ui-testing-engineering.mdx
+++ /dev/null
@@ -1,149 +0,0 @@
-# Engineering UI Tests with Midscene
-
-YAML gets a UI test running quickly. Engineering practice keeps it reliable after the test suite grows. The main rule is simple: keep user-facing UI intent in YAML, and keep deterministic environment and business logic in code.
-
-## Recommended project shape
-
-```text
-.
-  .env
-  midscene.config.yaml
-  setup.ts
-  e2e/
-    login.yaml
-    checkout.yaml
-    mobile-smoke.yaml
-  fixtures/
-    account.ts
-    device.ts
-  reports/
-```
-
-Use this shape as a guide, not a strict requirement. Small projects can start with one YAML file. Larger projects usually need separate setup, fixtures, and CI configuration.
-
-## Keep the boundary clear
-
-| Concern | Put it in | Why |
-| --- | --- | --- |
-| User path, visual state, popup handling, navigation | YAML | Natural language is concise and resilient to UI changes |
-| Login, cookies, SSO, accounts, environment preparation | Setup scripts or fixtures | The logic is project-specific and often needs secrets or internal tools |
-| API responses, database records, analytics events, amount calculation | JavaScript/TypeScript assertions | These checks need deterministic data and exact failure messages |
-| Batch execution, concurrency, summaries, report artifacts | CLI and CI configuration | They are execution concerns, not test intent |
-
-## Environment configuration
-
-Midscene CLI loads `.env` from the command working directory. A typical file contains model settings:
-
-```ini filename=.env
-MIDSCENE_MODEL_BASE_URL="https://your-model-service.example.com/v1"
-MIDSCENE_MODEL_API_KEY="your API Key"
-MIDSCENE_MODEL_NAME="your model name"
-MIDSCENE_MODEL_FAMILY="your model family"
-```
-
-In CI, store sensitive values in the CI secret manager and expose them as environment variables. Use `--dotenv-debug` when you need to inspect how local variables are loaded, and `--dotenv-override` only when the `.env` file should replace existing process variables.
-
-## Login and setup
-
-Most real tests should not spend every run manually completing login. Prepare state before the YAML flow starts:
-
-- create or select a test account;
-- complete SSO and inject cookies;
-- install or launch the app under test;
-- select an Android or iOS device;
-- configure a lane, feature flag, or internal environment;
-- seed backend data needed by the test.
-
-After setup, keep the YAML focused on the business behavior:
-
-```yaml
-web:
-  url: https://internal.example.com/dashboard
-
-tasks:
-  - name: Check dashboard
-    flow:
-      - aiAssert: The dashboard is loaded and user information is visible
-```
-
-## Batch runs and CI
-
-Use the CLI for suite-level execution:
-
-```bash
-midscene --files './e2e/**/*.yaml' --concurrent 4 --continue-on-error --summary index.json
-```
-
-Recommended CI artifacts:
-
-- the summary JSON file;
-- each YAML run result;
-- visual report HTML files;
-- logs from the application or device when available.
-
-Keep reports even for successful scheduled runs. They make UI drift and flaky behavior easier to investigate later.
-
-## Browser sessions
-
-For Web tests, choose the connection mode based on the state you need:
-
-| Mode | Best for |
-| --- | --- |
-| Default browser launch | Clean and repeatable tests |
-| Headed mode | Local debugging |
-| CDP connection | Remote browsers or managed browser services |
-| Chrome bridge mode | Reusing an existing desktop Chrome session, cookies, extensions, or internal login |
-
-See [YAML script runner](./yaml-script-runner) and [Bridge to the desktop Chrome](./bridge-mode) for the exact configuration.
-
-## Mobile devices
-
-Mobile tests usually need more setup than Web tests. Keep device management outside the YAML journey when possible:
-
-- reserve a device from a device pool;
-- install the target build;
-- clear or seed app state;
-- configure network, region, or account data;
-- collect device logs after failure.
-
-Use platform-specific YAML helpers for small local actions, such as `runAdbShell`, `runWdaRequest`, `launch`, and `terminate`. Use setup scripts for broader device orchestration.
-
-## Deterministic checks
-
-AI assertions are best for UI state: visible text, layout meaning, workflow completion, and visual conditions. Use code when correctness depends on exact values:
-
-```ts
-expect(createOrderResponse.status).toBe(200);
-expect(order.total).toBe(expectedTotal);
-expect(analyticsEvents).toContainEqual({
-  name: 'checkout_submit',
-  source: 'recommendation',
-});
-```
-
-This keeps failures actionable. A visual assertion explains what the user saw. A code assertion explains which business invariant failed.
-
-## Reports and debugging
-
-When a test fails, inspect the report before editing the prompt. Check:
-
-- the screenshot before the failed step;
-- the AI action and assertion text;
-- whether the page or app was still loading;
-- whether login or setup state was missing;
-- whether the test should use code for an exact business rule.
-
-Prompt changes should clarify intent. They should not encode brittle layout details unless the layout itself is what you are testing.
-
-## Agent orchestration
-
-Some internal workflows require more than a fixed runner: device pools, SSO, logs, network tools, backend queries, and report analysis. In those cases, treat YAML as the stable test asset and let a coding agent or internal runner orchestrate the surrounding tools.
-
-Midscene remains responsible for UI understanding, actions, screenshots, and reports. The orchestrator prepares the environment, calls the right tests, gathers external evidence, and summarizes failures.
-
-## Reference pages
-
-- [Write UI tests with YAML](./ui-testing-yaml-quick-start)
-- [Workflow in YAML format](./automate-with-scripts-in-yaml)
-- [YAML script runner](./yaml-script-runner)
-- [Integrate Midscene with any interface](./integrate-with-any-interface)
diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index f3e2e508cd..a1c24aee91 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -2,7 +2,7 @@
 
 Midscene helps teams write UI tests as intuitive cases instead of scripts built around fragile selectors and page implementation details. A case should stay simple: describe what the user wants to do, what the user should see, and what must be verified. AI drives the interaction and assertions, while YAML gives those cases a clear, lightweight structure for collaboration.
 
-This guide introduces the framework-level view. If you want to run your first test immediately, start with [Write UI tests with YAML](./ui-testing-yaml-quick-start).
+This guide introduces the design view and project shapes of the Midscene UI Testing Framework.
 
 ## Intuitive Cases, Maintainable Projects
 
@@ -119,7 +119,5 @@ Rstest is the underlying choice because it is fast, reliable, and a good foundat
 
 ## Next steps
 
-- Run your first test: [Write UI tests with YAML](./ui-testing-yaml-quick-start)
-- Add setup, CI, reports, and deterministic checks: [Engineering UI tests with Midscene](./ui-testing-engineering)
 - Look up every YAML field: [Workflow in YAML format](./automate-with-scripts-in-yaml)
 - Look up every CLI flag: [YAML script runner](./yaml-script-runner)
diff --git a/apps/site/docs/en/ui-testing-yaml-quick-start.mdx b/apps/site/docs/en/ui-testing-yaml-quick-start.mdx
deleted file mode 100644
index 3c4528d177..0000000000
--- a/apps/site/docs/en/ui-testing-yaml-quick-start.mdx
+++ /dev/null
@@ -1,149 +0,0 @@
-import SetupEnv from './common/setup-env.mdx';
-
-# Write UI Tests with YAML
-
-YAML is the fastest way to turn a user journey into a runnable Midscene test. You describe the target and the flow. Midscene handles UI understanding, execution, screenshots, and the report.
-
-This page focuses on the shortest path. For the complete YAML schema, see [Workflow in YAML format](./automate-with-scripts-in-yaml).
-
-<SetupEnv />
-
-## Create your first Web test
-
-Create `bing-search.yaml`:
-
-```yaml
-web:
-  url: https://www.bing.com
-
-tasks:
-  - name: Search weather
-    flow:
-      - aiAct: Search for "today's weather"
-      - aiAssert: The result page shows weather information
-```
-
-Run it:
-
-```bash
-midscene ./bing-search.yaml
-```
-
-Midscene prints execution progress and generates a visual report after the run. The report is the main debugging surface: it records screenshots, task status, AI actions, and failed assertions.
-
-## YAML structure
-
-A YAML test has three main parts:
-
-```yaml
-web:
-  url: https://www.bing.com
-
-agent:
-  aiActContext: If a cookie banner appears, accept it.
-
-tasks:
-  - name: Search weather
-    flow:
-      - aiAct: Search for "today's weather"
-      - aiWaitFor: The result page is loaded
-      - aiAssert: The result page shows weather information
-```
-
-- `web`, `android`, `ios`, or `computer` selects the target.
-- `agent` configures Midscene Agent behavior, reports, and cache options.
-- `tasks` contains named flows.
-- `flow` contains the actual UI steps.
-
-## Common steps
-
-| Step | Use it for |
-| --- | --- |
-| `aiAct` | Ask Midscene to perform a UI action from natural language |
-| `aiAssert` | Check that the UI satisfies a natural-language condition |
-| `aiWaitFor` | Wait until a UI condition becomes true |
-| `aiQuery` | Extract structured information from the UI |
-| `sleep` | Wait for a fixed number of milliseconds |
-| `javascript` | Run JavaScript in a Web page context |
-
-Write steps as user intent, not implementation details. Prefer "open the first product" over "click the third `.card` element".
-
-## Android example
-
-```yaml
-android:
-  deviceId: ${ANDROID_DEVICE_ID}
-  launch: com.example.app
-
-tasks:
-  - name: Verify home page
-    flow:
-      - aiAct: Close any permission popup if it appears
-      - aiAssert: The home page main content is visible
-```
-
-Android flows can also use Android-specific actions:
-
-```yaml
-tasks:
-  - name: Reset app
-    flow:
-      - runAdbShell: pm clear com.example.app
-      - launch: com.example.app
-      - aiAssert: The onboarding page is visible
-```
-
-## iOS example
-
-```yaml
-ios:
-  wdaPort: 8100
-  launch: com.apple.mobilesafari
-
-tasks:
-  - name: Open a website
-    flow:
-      - aiAct: Focus the address bar and open https://www.bing.com
-      - aiAssert: The Bing search page is visible
-```
-
-## Computer example
-
-```yaml
-computer: {}
-
-tasks:
-  - name: Search from desktop browser
-    flow:
-      - aiAct: Press Cmd+Space
-      - aiAct: Type "Safari" and press Enter
-      - aiAct: Focus the address bar and open https://www.bing.com
-      - aiAct: Search for "today's weather"
-      - aiAssert: The result page shows weather information
-```
-
-Adjust keyboard shortcuts for Windows or Linux when driving those systems.
-
-## Run multiple tests
-
-```bash
-midscene --files ./login.yaml './smoke/**/*.yaml'
-```
-
-Run independent files concurrently and continue after failures:
-
-```bash
-midscene --files './smoke/**/*.yaml' --concurrent 4 --continue-on-error
-```
-
-For full CLI behavior, including `.env`, headed mode, CDP, Chrome bridge mode, summary files, and config files, see [YAML script runner](./yaml-script-runner).
-
-## When to move beyond YAML
-
-YAML should describe the UI journey. Move other logic out when it becomes more deterministic than visual:
-
-- Login, SSO, cookies, accounts, and devices belong in setup scripts or fixtures.
-- API, database, analytics, and amount checks belong in code assertions.
-- CI, report retention, and batch execution belong in CLI or CI configuration.
-
-See [Engineering UI tests with Midscene](./ui-testing-engineering) for the project-level workflow.
diff --git a/apps/site/docs/zh/ui-testing-engineering.mdx b/apps/site/docs/zh/ui-testing-engineering.mdx
deleted file mode 100644
index 4d1aeeffa7..0000000000
--- a/apps/site/docs/zh/ui-testing-engineering.mdx
+++ /dev/null
@@ -1,149 +0,0 @@
-# 用 Midscene 工程化 UI 测试
-
-YAML 能让 UI 测试快速跑起来。工程化实践负责在测试套件增长后继续保持可靠。核心规则很简单：用户可见的 UI 意图留在 YAML；确定性的环境准备和业务逻辑放到代码里。
-
-## 推荐项目形态
-
-```text
-.
-  .env
-  midscene.config.yaml
-  setup.ts
-  e2e/
-    login.yaml
-    checkout.yaml
-    mobile-smoke.yaml
-  fixtures/
-    account.ts
-    device.ts
-  reports/
-```
-
-这只是推荐形态，不是强制结构。小项目可以从一个 YAML 文件开始；更大的项目通常需要独立的 setup、fixture 和 CI 配置。
-
-## 保持边界清晰
-
-| 关注点 | 放在哪里 | 原因 |
-| --- | --- | --- |
-| 用户路径、视觉状态、弹窗处理、导航 | YAML | 自然语言简洁，并且更能适应 UI 变化 |
-| 登录、Cookie、SSO、账号、环境准备 | setup 脚本或 fixture | 这些逻辑和项目强相关，通常需要密钥或内部工具 |
-| 接口响应、数据库记录、埋点、金额计算 | JavaScript/TypeScript 断言 | 这些检查需要确定数据和精确失败信息 |
-| 批量执行、并发、summary、报告产物 | CLI 和 CI 配置 | 这是执行层关注点，不是测试意图 |
-
-## 环境配置
-
-Midscene CLI 会从命令执行目录加载 `.env`。典型文件包含模型配置：
-
-```ini filename=.env
-MIDSCENE_MODEL_BASE_URL="https://your-model-service.example.com/v1"
-MIDSCENE_MODEL_API_KEY="your API Key"
-MIDSCENE_MODEL_NAME="your model name"
-MIDSCENE_MODEL_FAMILY="your model family"
-```
-
-在 CI 中，应把敏感值放进 CI secret manager，再暴露成环境变量。需要排查本地变量加载逻辑时使用 `--dotenv-debug`；只有当 `.env` 应该覆盖已有进程环境变量时才使用 `--dotenv-override`。
-
-## 登录和 setup
-
-大多数真实测试不应该每次都手动完成登录。应在 YAML flow 开始前准备状态：
-
-- 创建或选择测试账号；
-- 完成 SSO 并注入 Cookie；
-- 安装或启动被测 App；
-- 选择 Android 或 iOS 设备；
-- 配置泳道、功能开关或内部环境；
-- 准备测试所需的后端数据。
-
-setup 完成后，让 YAML 专注于业务行为：
-
-```yaml
-web:
-  url: https://internal.example.com/dashboard
-
-tasks:
-  - name: 检查首页
-    flow:
-      - aiAssert: Dashboard 已加载，用户信息可见
-```
-
-## 批量运行和 CI
-
-使用 CLI 执行测试套件：
-
-```bash
-midscene --files './e2e/**/*.yaml' --concurrent 4 --continue-on-error --summary index.json
-```
-
-推荐保留的 CI 产物：
-
-- summary JSON 文件；
-- 每个 YAML 的运行结果；
-- 可视化报告 HTML；
-- 可用时保留应用或设备日志。
-
-即使定时任务成功，也建议保留报告。后续排查 UI 漂移和偶发失败时会更容易。
-
-## 浏览器会话
-
-Web 测试应根据所需状态选择连接模式：
-
-| 模式 | 适合场景 |
-| --- | --- |
-| 默认启动浏览器 | 干净、可重复的测试 |
-| Headed 模式 | 本地调试 |
-| CDP 连接 | 远程浏览器或托管浏览器服务 |
-| Chrome bridge 模式 | 复用桌面 Chrome 会话、Cookie、插件或内部登录态 |
-
-具体配置见 [YAML 脚本运行器](./yaml-script-runner) 和 [桥接到桌面 Chrome](./bridge-mode)。
-
-## 移动端设备
-
-移动端测试通常比 Web 测试需要更多 setup。尽量把设备管理放在 YAML 旅程之外：
-
-- 从设备池预定设备；
-- 安装目标构建产物；
-- 清理或注入 App 状态；
-- 配置网络、地区或账号数据；
-- 失败后收集设备日志。
-
-小的本地动作可以使用平台专属 YAML helper，例如 `runAdbShell`、`runWdaRequest`、`launch` 和 `terminate`。更大的设备编排应放到 setup 脚本里。
-
-## 确定性校验
-
-AI 断言适合 UI 状态：可见文本、布局含义、流程完成和视觉条件。正确性依赖精确值时，使用代码：
-
-```ts
-expect(createOrderResponse.status).toBe(200);
-expect(order.total).toBe(expectedTotal);
-expect(analyticsEvents).toContainEqual({
-  name: 'checkout_submit',
-  source: 'recommendation',
-});
-```
-
-这样失败信息会更可行动。视觉断言解释用户看到什么；代码断言解释哪个业务不变量失败。
-
-## 报告和调试
-
-测试失败时，先看报告，再改 prompt。重点检查：
-
-- 失败步骤前的截图；
-- AI 动作和断言文本；
-- 页面或 App 是否仍在加载；
-- 登录或 setup 状态是否缺失；
-- 当前检查是否应该改用代码表达精确业务规则。
-
-Prompt 修改应该澄清意图，不应该编码脆弱的布局细节，除非布局本身就是测试目标。
-
-## Agent 编排
-
-有些内部流程超过固定 runner 的边界：设备池、SSO、日志、网络工具、后端查询和报告分析。此时可以把 YAML 作为稳定测试资产，让 coding agent 或内部 runner 编排周边工具。
-
-Midscene 继续负责 UI 理解、动作执行、截图和报告。编排层负责准备环境、调用测试、收集外部证据并总结失败。
-
-## 参考文档
-
-- [使用 YAML 编写 UI 测试](./ui-testing-yaml-quick-start)
-- [YAML 格式的工作流](./automate-with-scripts-in-yaml)
-- [YAML 脚本运行器](./yaml-script-runner)
-- [将 Midscene 集成到任意界面](./integrate-with-any-interface)
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 6ad9f05634..37c6f58364 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -2,7 +2,7 @@
 
 Midscene 帮助团队把 UI 测试写成符合直觉的用例，而不是围绕脆弱选择器和页面实现细节维护脚本。用例本身应该简单：描述用户想完成什么、看到什么、确认什么。AI 驱动负责理解和执行这些意图，YAML 则提供一种清晰、轻量、适合协作的组织形式。
 
-这篇文档介绍框架视角。如果你想直接跑第一个测试，可以从[使用 YAML 编写 UI 测试](./ui-testing-yaml-quick-start)开始。
+这篇文档介绍 Midscene UI Testing Framework 的设计视角和项目形态。
 
 ## 用例直觉化，工程精致化
 
@@ -119,7 +119,5 @@ project-folder/
 
 ## 下一步
 
-- 跑第一个测试：[使用 YAML 编写 UI 测试](./ui-testing-yaml-quick-start)
-- 加入 setup、CI、报告和确定性校验：[用 Midscene 工程化 UI 测试](./ui-testing-engineering)
 - 查询完整 YAML 字段：[YAML 格式的工作流](./automate-with-scripts-in-yaml)
 - 查询完整 CLI 参数：[YAML 脚本运行器](./yaml-script-runner)
diff --git a/apps/site/docs/zh/ui-testing-yaml-quick-start.mdx b/apps/site/docs/zh/ui-testing-yaml-quick-start.mdx
deleted file mode 100644
index e90121f183..0000000000
--- a/apps/site/docs/zh/ui-testing-yaml-quick-start.mdx
+++ /dev/null
@@ -1,149 +0,0 @@
-import SetupEnv from './common/setup-env.mdx';
-
-# 使用 YAML 编写 UI 测试
-
-YAML 是把用户路径变成可运行 Midscene 测试的最快方式。你描述 target 和 flow，Midscene 负责 UI 理解、执行、截图和报告。
-
-这篇文档只讲最短路径。完整 YAML schema 请查看 [YAML 格式的工作流](./automate-with-scripts-in-yaml)。
-
-<SetupEnv />
-
-## 创建第一个 Web 测试
-
-创建 `bing-search.yaml`：
-
-```yaml
-web:
-  url: https://www.bing.com
-
-tasks:
-  - name: 搜索天气
-    flow:
-      - aiAct: 搜索 "今日天气"
-      - aiAssert: 结果页展示了天气信息
-```
-
-运行：
-
-```bash
-midscene ./bing-search.yaml
-```
-
-Midscene 会打印执行进度，并在运行结束后生成可视化报告。报告是主要调试入口：它记录截图、任务状态、AI 动作和失败断言。
-
-## YAML 结构
-
-一个 YAML 测试主要包含三部分：
-
-```yaml
-web:
-  url: https://www.bing.com
-
-agent:
-  aiActContext: 如果出现 Cookie 弹窗，点击接受。
-
-tasks:
-  - name: 搜索天气
-    flow:
-      - aiAct: 搜索 "今日天气"
-      - aiWaitFor: 结果页加载完成
-      - aiAssert: 结果页展示了天气信息
-```
-
-- `web`、`android`、`ios` 或 `computer` 选择 target。
-- `agent` 配置 Midscene Agent 行为、报告和缓存选项。
-- `tasks` 包含命名流程。
-- `flow` 包含实际 UI 步骤。
-
-## 常用步骤
-
-| 步骤 | 适用场景 |
-| --- | --- |
-| `aiAct` | 用自然语言让 Midscene 执行 UI 动作 |
-| `aiAssert` | 检查 UI 是否满足自然语言条件 |
-| `aiWaitFor` | 等待某个 UI 条件成立 |
-| `aiQuery` | 从 UI 中提取结构化信息 |
-| `sleep` | 固定等待一段毫秒数 |
-| `javascript` | 在 Web 页面上下文中运行 JavaScript |
-
-步骤应描述用户意图，而不是实现细节。优先写“打开第一个商品”，不要写“点击第三个 `.card` 元素”。
-
-## Android 示例
-
-```yaml
-android:
-  deviceId: ${ANDROID_DEVICE_ID}
-  launch: com.example.app
-
-tasks:
-  - name: 验证首页
-    flow:
-      - aiAct: 如果出现权限弹窗就关闭
-      - aiAssert: 首页主内容区域可见
-```
-
-Android flow 也可以使用平台专属动作：
-
-```yaml
-tasks:
-  - name: 重置应用
-    flow:
-      - runAdbShell: pm clear com.example.app
-      - launch: com.example.app
-      - aiAssert: 新手引导页可见
-```
-
-## iOS 示例
-
-```yaml
-ios:
-  wdaPort: 8100
-  launch: com.apple.mobilesafari
-
-tasks:
-  - name: 打开网站
-    flow:
-      - aiAct: 聚焦地址栏并打开 https://www.bing.com
-      - aiAssert: Bing 搜索页可见
-```
-
-## Computer 示例
-
-```yaml
-computer: {}
-
-tasks:
-  - name: 从桌面浏览器搜索
-    flow:
-      - aiAct: 按下 Cmd+Space
-      - aiAct: 输入 "Safari" 并按回车
-      - aiAct: 聚焦地址栏并打开 https://www.bing.com
-      - aiAct: 搜索 "今日天气"
-      - aiAssert: 结果页展示了天气信息
-```
-
-如果驱动 Windows 或 Linux，请调整对应键盘快捷键。
-
-## 运行多个测试
-
-```bash
-midscene --files ./login.yaml './smoke/**/*.yaml'
-```
-
-并发运行互不依赖的文件，并在失败后继续：
-
-```bash
-midscene --files './smoke/**/*.yaml' --concurrent 4 --continue-on-error
-```
-
-完整 CLI 行为，包括 `.env`、headed 模式、CDP、Chrome bridge 模式、summary 文件和 config 文件，请查看 [YAML 脚本运行器](./yaml-script-runner)。
-
-## 什么时候超出 YAML
-
-YAML 应该描述 UI 旅程。当逻辑比视觉判断更确定时，把它移出 YAML：
-
-- 登录、SSO、Cookie、账号和设备放到 setup 脚本或 fixture。
-- 接口、数据库、埋点和金额校验放到代码断言。
-- CI、报告留存和批量执行放到 CLI 或 CI 配置。
-
-项目级工作流请继续阅读[用 Midscene 工程化 UI 测试](./ui-testing-engineering)。
diff --git a/apps/site/rspress.config.ts b/apps/site/rspress.config.ts
index 21a54cd368..d39448e0b6 100644
--- a/apps/site/rspress.config.ts
+++ b/apps/site/rspress.config.ts
@@ -93,14 +93,6 @@ export default defineConfig({
           text: 'Overview',
           link: '/ui-testing-framework',
         },
-        {
-          text: 'Write tests with YAML',
-          link: '/ui-testing-yaml-quick-start',
-        },
-        {
-          text: 'Engineering guide',
-          link: '/ui-testing-engineering',
-        },
         {
           sectionHeaderText: 'Web browser',
         },
@@ -288,14 +280,6 @@ export default defineConfig({
           text: '专题总览',
           link: '/zh/ui-testing-framework',
         },
-        {
-          text: '使用 YAML 编写 UI 测试',
-          link: '/zh/ui-testing-yaml-quick-start',
-        },
-        {
-          text: '工程化指南',
-          link: '/zh/ui-testing-engineering',
-        },
         {
           sectionHeaderText: 'Web 浏览器',
         },

From 8366f57e17a3257311cd8fb9bd542031475341ee Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Wed, 20 May 2026 16:55:38 +0800
Subject: [PATCH 04/33] docs(site): rename complete test project section

---
 apps/site/docs/en/ui-testing-framework.mdx | 10 +++++-----
 apps/site/docs/zh/ui-testing-framework.mdx |  8 ++++----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index a1c24aee91..0986977314 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -14,13 +14,13 @@ That is why Midscene's UI Testing Framework is designed around three layers:
 
 - Write intuitive cases with YAML: business and QA users can describe the entry point, actions, and assertions, then let AI handle UI interaction and visual judgment.
 - Control project complexity with `setup.js` and a small amount of code: login, cookies, accounts, test data, and device preparation have their own place; individual special cases can use `case.test.ts`.
-- Emit long-running projects into an Rstest template: when the project grows to need fixtures, CI, internal tools, and deterministic checks, the team can own a standard test project.
+- Emit a complete test project from an Rstest template: when the project needs fixtures, CI, internal tools, and deterministic checks, the team can own a standard test project.
 
 | Layer | Best for | Recommended shape |
 | --- | --- | --- |
 | Intuitive cases | Verify a key business path quickly | YAML + case. Write the entry point, steps, and assertions in YAML, then run it from the CLI and inspect the replay report |
 | Refined, maintainable project | Tests need login state, cookies, accounts, devices, or a small amount of environment preparation | YAML + `setup.js`, with `case.test.ts` available for individual special cases |
-| Long-running engineering | The project needs CI, internal tools, fixtures, data checks, or custom flows | Use `midscene emit` to export an independent Rstest project, then maintain it like a standard test project |
+| Complete test project | The project needs CI, internal tools, fixtures, data checks, or custom flows | Use `midscene emit` to export an independent Rstest project, then maintain it like a standard test project |
 
 ### 1. Intuitive cases: YAML + case
 
@@ -88,9 +88,9 @@ The code above shows the boundary: login, accounts, data, and devices belong in
 
 If a case needs complex branches, internal SDK calls, mixed API checks, or existing test utilities, it can be written as `case.test.ts`. This is not the main path. It is an escape hatch for special cases, so the project does not become awkward just to keep everything in YAML.
 
-### 3. Long-running engineering: emit an Rstest project template
+### 3. Complete test project: emit an Rstest project template
 
-Long-running test projects inevitably grow: there are more cases, fixtures become more complex, CI needs parallelism and grouping, and failure analysis needs to connect with team systems. Midscene should not force all of that into a fixed YAML runner. For this stage, the core capability is `emit`: export a lightweight Midscene project into an independent Rstest project template.
+Once a test project needs complete engineering capabilities, it usually has more cases, more complex fixtures, CI that needs parallelism and grouping, and failure analysis that has to connect with team systems. Midscene should not force all of that into a fixed YAML runner. For this stage, the core capability is `emit`: export a lightweight Midscene project into an independent Rstest project template.
 
 ```bash
 midscene emit ./project-folder
@@ -115,7 +115,7 @@ project-folder/
 
 At this stage, YAML is no longer the boundary of what the framework can do. It is the migration entry point. Teams can keep the business expression from YAML cases, or move complex logic into Rstest test files, fixtures, and internal tools. Midscene handles UI actions and visual assertions, while your own code handles environment orchestration, API checks, database checks, and failure analysis.
 
-Rstest is the underlying choice because it is fast, reliable, and a good foundation for a long-running test project. On top of that base, Midscene provides AI UI actions, visual assertions, screenshots, replay reports, and debugging information, so the project can scale without giving up a good developer experience.
+Rstest is the underlying choice because it is fast, reliable, and a good foundation for a complete test project. On top of that base, Midscene provides AI UI actions, visual assertions, screenshots, replay reports, and debugging information, so the project can scale without giving up a good developer experience.
 
 ## Next steps
 
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 37c6f58364..9d6a9247ff 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -14,13 +14,13 @@ UI Test 的用例表达不应该复杂。大多数业务的起点也不是完整
 
 - 用 YAML 写直觉化用例：业务同学或测试同学可以直接描述入口、动作和断言，让 AI 完成 UI 操作和视觉判断。
 - 用 `setup.js` 和少量代码控制工程复杂度：登录、Cookie、账号、测试数据和设备准备有自己的位置；少数特殊 case 也可以写成 `case.test.ts`。
-- 长期项目可以 `emit` 成 Rstest 工程模板：当项目膨胀到需要 fixture、CI、内部工具和更多确定性校验时，团队可以接管一个标准测试工程。
+- 完备的测试工程可以由 `emit` 生成：当项目需要 fixture、CI、内部工具和更多确定性校验时，团队可以接管一个标准测试工程。
 
 | 层级 | 适合场景 | 推荐形态 |
 | --- | --- | --- |
 | 直觉化用例 | 快速验证一个关键业务路径是否可用 | YAML + case。用 YAML 写入口、任务步骤和断言，直接通过 CLI 运行并查看回放报告 |
 | 精致可维护的项目 | 测试需要登录态、Cookie、账号、设备或少量环境准备 | YAML + `setup.js`，必要时为个别特殊 case 增加 `case.test.ts` |
-| 长期工程化 | 项目开始接入 CI、内部工具、fixture、数据校验或自定义流程 | `midscene emit` 导出独立 Rstest 工程，再按标准测试项目维护 |
+| 完备的测试工程 | 项目开始接入 CI、内部工具、fixture、数据校验或自定义流程 | `midscene emit` 导出独立 Rstest 工程，再按标准测试项目维护 |
 
 ### 1. 直觉化用例：YAML + case
 
@@ -88,9 +88,9 @@ export default async function setup({ browser, context, device }) {
 
 如果某个 case 需要复杂分支、调用内部 SDK、混合接口校验或复用现有测试工具，可以把它写成 `case.test.ts`。这不是主路径，而是给少数特殊场景留出的逃生口，避免为了保持 YAML 纯度反而让项目变得别扭。
 
-### 3. 长期工程化：emit 为 Rstest 项目模板
+### 3. 完备的测试工程：emit 为 Rstest 项目模板
 
-测试项目长期运行后一定会膨胀：case 会变多，fixture 会变复杂，CI 会有并发和分组策略，失败分析也需要接入团队自己的系统。Midscene 不应该把这些需求都压进固定 YAML runner。对应这类长期项目，核心能力是 `emit`：把轻量 Midscene 项目导出成一个独立的 Rstest 工程模板。
+当测试项目需要完整工程能力时，case 会变多，fixture 会变复杂，CI 会有并发和分组策略，失败分析也需要接入团队自己的系统。Midscene 不应该把这些需求都压进固定 YAML runner。对应这类完备测试工程，核心能力是 `emit`：把轻量 Midscene 项目导出成一个独立的 Rstest 工程模板。
 
 ```bash
 midscene emit ./project-folder

From 6dd7a8288c8a4c050564d0321cf542c846f1011e Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Wed, 20 May 2026 17:00:45 +0800
Subject: [PATCH 05/33] docs(site): mention remote device setup

---
 apps/site/docs/en/ui-testing-framework.mdx | 21 ++++++++++++++-------
 apps/site/docs/zh/ui-testing-framework.mdx | 21 ++++++++++++++-------
 2 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 0986977314..365385e371 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -13,13 +13,13 @@ But a test project cannot stay as “a few loose cases” forever. Once tests be
 That is why Midscene's UI Testing Framework is designed around three layers:
 
 - Write intuitive cases with YAML: business and QA users can describe the entry point, actions, and assertions, then let AI handle UI interaction and visual judgment.
-- Control project complexity with `setup.js` and a small amount of code: login, cookies, accounts, test data, and device preparation have their own place; individual special cases can use `case.test.ts`.
+- Control project complexity with `setup.js` and a small amount of code: login, cookies, accounts, test data, remote device connections, and device preparation have their own place; individual special cases can use `case.test.ts`.
 - Emit a complete test project from an Rstest template: when the project needs fixtures, CI, internal tools, and deterministic checks, the team can own a standard test project.
 
 | Layer | Best for | Recommended shape |
 | --- | --- | --- |
 | Intuitive cases | Verify a key business path quickly | YAML + case. Write the entry point, steps, and assertions in YAML, then run it from the CLI and inspect the replay report |
-| Refined, maintainable project | Tests need login state, cookies, accounts, devices, or a small amount of environment preparation | YAML + `setup.js`, with `case.test.ts` available for individual special cases |
+| Refined, maintainable project | Tests need login state, cookies, accounts, remote devices, or a small amount of environment preparation | YAML + `setup.js`, with `case.test.ts` available for individual special cases |
 | Complete test project | The project needs CI, internal tools, fixtures, data checks, or custom flows | Use `midscene emit` to export an independent Rstest project, then maintain it like a standard test project |
 
 ### 1. Intuitive cases: YAML + case
@@ -42,9 +42,9 @@ The value of YAML is not making the framework smaller. Its value is making “wh
 
 ### 2. Refined, maintainable project: YAML + setup.js
 
-Once tests become part of daily work, they usually need login state, test accounts, cookies, device preparation, staging lanes, or other environment details. Those concerns should not be squeezed into natural-language steps, and they should not be copied into every YAML case. They belong in project-level configuration and `setup.js`.
+Once tests become part of daily work, they usually need login state, test accounts, cookies, remote device connections, device preparation, staging lanes, or other environment details. Those concerns should not be squeezed into natural-language steps, and they should not be copied into every YAML case. They belong in project-level configuration and `setup.js`.
 
-Here, `setup.js` means the pre-run setup script for the test project. It runs before YAML cases and prepares the browser, device, account, or backend data for the test. A refined but lightweight project can look like this:
+Here, `setup.js` means the pre-run setup script for the test project. It runs before YAML cases and prepares the browser, remote device, account, or backend data for the test. A refined but lightweight project can look like this:
 
 ```text
 .
@@ -70,13 +70,20 @@ tasks:
       - aiAssert: The dashboard is loaded and user information is visible
 ```
 
-`setup.js` gets the test to the right starting point, such as logging in, injecting cookies, preparing data, or connecting a device. Its value is giving deterministic preparation logic a clear home while the YAML case stays lightweight:
+`setup.js` gets the test to the right starting point, such as logging in, injecting cookies, preparing data, connecting a remote device, or selecting the target device. Its value is giving deterministic preparation logic a clear home while the YAML case stays lightweight:
 
 ```js
-export default async function setup({ browser, context, device }) {
+import { connectRemoteDevice } from './remote-device';
+
+export default async function setup({ context }) {
   const cookies = await loginByTestAccount(process.env.TEST_ACCOUNT);
   await context.addCookies(cookies);
 
+  await connectRemoteDevice({
+    provider: 'remote-device-lab',
+    deviceId: process.env.REMOTE_DEVICE_ID,
+  });
+
   await prepareTestData({
     user: process.env.TEST_ACCOUNT,
     scenario: 'dashboard-smoke',
@@ -84,7 +91,7 @@ export default async function setup({ browser, context, device }) {
 }
 ```
 
-The code above shows the boundary: login, accounts, data, and devices belong in setup. What the user should accomplish after the page opens stays in YAML. The project can handle real engineering constraints, while the case itself stays close to business intuition.
+The code above shows the boundary: login, accounts, data, and remote device connection belong in setup. What the user should accomplish after the page opens stays in YAML. The project can handle real engineering constraints, while the case itself stays close to business intuition.
 
 If a case needs complex branches, internal SDK calls, mixed API checks, or existing test utilities, it can be written as `case.test.ts`. This is not the main path. It is an escape hatch for special cases, so the project does not become awkward just to keep everything in YAML.
 
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 9d6a9247ff..b95e2d2f18 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -13,13 +13,13 @@ UI Test 的用例表达不应该复杂。大多数业务的起点也不是完整
 所以 Midscene 的 UI Testing Framework 沿着三层形态展开：
 
 - 用 YAML 写直觉化用例：业务同学或测试同学可以直接描述入口、动作和断言，让 AI 完成 UI 操作和视觉判断。
-- 用 `setup.js` 和少量代码控制工程复杂度：登录、Cookie、账号、测试数据和设备准备有自己的位置；少数特殊 case 也可以写成 `case.test.ts`。
+- 用 `setup.js` 和少量代码控制工程复杂度：登录、Cookie、账号、测试数据、远程设备连接和设备准备有自己的位置；少数特殊 case 也可以写成 `case.test.ts`。
 - 完备的测试工程可以由 `emit` 生成：当项目需要 fixture、CI、内部工具和更多确定性校验时，团队可以接管一个标准测试工程。
 
 | 层级 | 适合场景 | 推荐形态 |
 | --- | --- | --- |
 | 直觉化用例 | 快速验证一个关键业务路径是否可用 | YAML + case。用 YAML 写入口、任务步骤和断言，直接通过 CLI 运行并查看回放报告 |
-| 精致可维护的项目 | 测试需要登录态、Cookie、账号、设备或少量环境准备 | YAML + `setup.js`，必要时为个别特殊 case 增加 `case.test.ts` |
+| 精致可维护的项目 | 测试需要登录态、Cookie、账号、远程设备或少量环境准备 | YAML + `setup.js`，必要时为个别特殊 case 增加 `case.test.ts` |
 | 完备的测试工程 | 项目开始接入 CI、内部工具、fixture、数据校验或自定义流程 | `midscene emit` 导出独立 Rstest 工程，再按标准测试项目维护 |
 
 ### 1. 直觉化用例：YAML + case
@@ -42,9 +42,9 @@ YAML 的价值不是把测试能力做小，而是把“一个用户路径应该
 
 ### 2. 精致可维护的项目：YAML + setup.js
 
-当测试开始进入日常使用，通常会遇到登录态、测试账号、Cookie、设备准备、灰度环境等问题。这些内容不应该写进自然语言步骤里，也不应该复制到每个 YAML case 中。它们应该进入项目级配置和 `setup.js`。
+当测试开始进入日常使用，通常会遇到登录态、测试账号、Cookie、远程设备连接、设备准备、灰度环境等问题。这些内容不应该写进自然语言步骤里，也不应该复制到每个 YAML case 中。它们应该进入项目级配置和 `setup.js`。
 
-这里的 `setup.js` 指的是测试项目的运行前准备脚本。它在 YAML case 执行前运行，负责把浏览器、设备、账号或后端数据准备到正确状态。一个精致但不重的项目可以长这样：
+这里的 `setup.js` 指的是测试项目的运行前准备脚本。它在 YAML case 执行前运行，负责把浏览器、远程设备、账号或后端数据准备到正确状态。一个精致但不重的项目可以长这样：
 
 ```text
 .
@@ -70,13 +70,20 @@ tasks:
       - aiAssert: The dashboard is loaded and user information is visible
 ```
 
-`setup.js` 负责把测试带到正确的起点，例如登录、注入 Cookie、准备测试数据或连接设备。它的价值是给“确定性准备逻辑”一个明确位置，让 YAML case 仍然保持轻量：
+`setup.js` 负责把测试带到正确的起点，例如登录、注入 Cookie、准备测试数据、连接远程设备或选择目标设备。它的价值是给“确定性准备逻辑”一个明确位置，让 YAML case 仍然保持轻量：
 
 ```js
-export default async function setup({ browser, context, device }) {
+import { connectRemoteDevice } from './remote-device';
+
+export default async function setup({ context }) {
   const cookies = await loginByTestAccount(process.env.TEST_ACCOUNT);
   await context.addCookies(cookies);
 
+  await connectRemoteDevice({
+    provider: 'remote-device-lab',
+    deviceId: process.env.REMOTE_DEVICE_ID,
+  });
+
   await prepareTestData({
     user: process.env.TEST_ACCOUNT,
     scenario: 'dashboard-smoke',
@@ -84,7 +91,7 @@ export default async function setup({ browser, context, device }) {
 }
 ```
 
-上面的代码展示的是职责边界：登录、账号、数据、设备这类工程逻辑放进 setup；“用户进入页面后要完成什么”继续留在 YAML。这样项目可以处理真实工程问题，但用例本身仍然接近业务直觉。
+上面的代码展示的是职责边界：登录、账号、数据、远程设备连接这类工程逻辑放进 setup；“用户进入页面后要完成什么”继续留在 YAML。这样项目可以处理真实工程问题，但用例本身仍然接近业务直觉。
 
 如果某个 case 需要复杂分支、调用内部 SDK、混合接口校验或复用现有测试工具，可以把它写成 `case.test.ts`。这不是主路径，而是给少数特殊场景留出的逃生口，避免为了保持 YAML 纯度反而让项目变得别扭。
 

From 79f8a168704c5660af0774ac00fad9784d9eaa40 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Wed, 20 May 2026 17:01:37 +0800
Subject: [PATCH 06/33] docs(site): expand setup js guidance

---
 apps/site/docs/en/ui-testing-framework.mdx | 10 ++++++++++
 apps/site/docs/zh/ui-testing-framework.mdx | 10 ++++++++++
 2 files changed, 20 insertions(+)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 365385e371..3e1ecfc053 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -58,6 +58,16 @@ Here, `setup.js` means the pre-run setup script for the test project. It runs be
 
 `midscene.config.yaml` manages model settings, runtime options, report output, and selected cases. `setup.js` manages project-specific preparation. `e2e/*.yaml` only describes the business path. A small number of cases that cannot be expressed cleanly in YAML can be written directly as `case.test.ts`.
 
+The content of `setup.js` should focus on getting the test to an executable starting point. It usually covers these responsibilities:
+
+- Read environment variables, test accounts, target environment, and device IDs.
+- Complete login, or convert an existing login state into browser cookies.
+- Connect to a remote device platform and select the device for the current case.
+- Prepare backend test data, such as creating an order, clearing a cart, or enabling a staging flag.
+- Register project-level hooks, such as adding logs, screenshots, or business trace IDs when a case fails.
+
+It should not carry the user path itself. Business actions like “search for a product”, “open order details”, or “verify the invoice entry is visible” should still stay in the YAML case.
+
 YAML still describes the business path:
 
 ```yaml
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index b95e2d2f18..9f02618f67 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -58,6 +58,16 @@ YAML 的价值不是把测试能力做小，而是把“一个用户路径应该
 
 `midscene.config.yaml` 管理模型、运行参数、报告输出和需要执行的 case；`setup.js` 管理项目自己的准备逻辑；`e2e/*.yaml` 只描述业务路径；少数无法优雅表达成 YAML 的特殊场景，可以用 `case.test.ts` 直接写成测试代码。
 
+`setup.js` 的内容应该围绕“把测试带到可执行的起点”展开。它通常包含这些职责：
+
+- 读取环境变量、测试账号、目标环境和设备 ID。
+- 完成登录，或者把已有登录态转换成浏览器 Cookie。
+- 连接远程设备平台，选择本次 case 要运行的设备。
+- 准备后端测试数据，例如创建订单、清理购物车、打开灰度开关。
+- 注册项目级 hook，例如失败时补充日志、截图或业务侧 trace id。
+
+它不适合承载用户路径本身。比如“搜索商品”“打开订单详情”“确认页面展示了发票入口”这类业务动作，仍然应该留在 YAML case 里。
+
 YAML 仍然描述业务路径：
 
 ```yaml

From 182aa4dd47f852bff03b220eb684009cb6c5e7f0 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Wed, 20 May 2026 17:09:37 +0800
Subject: [PATCH 07/33] docs(site): describe setup ts target contract

---
 apps/site/docs/en/ui-testing-framework.mdx | 92 ++++++++++++++--------
 apps/site/docs/zh/ui-testing-framework.mdx | 92 ++++++++++++++--------
 2 files changed, 114 insertions(+), 70 deletions(-)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 3e1ecfc053..52815d01ad 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -13,13 +13,13 @@ But a test project cannot stay as “a few loose cases” forever. Once tests be
 That is why Midscene's UI Testing Framework is designed around three layers:
 
 - Write intuitive cases with YAML: business and QA users can describe the entry point, actions, and assertions, then let AI handle UI interaction and visual judgment.
-- Control project complexity with `setup.js` and a small amount of code: login, cookies, accounts, test data, remote device connections, and device preparation have their own place; individual special cases can use `case.test.ts`.
+- Control project complexity with `setup.ts` and a small amount of code: project configuration, login, cookies, accounts, test data, remote device connections, and device preparation have their own place; individual special cases can use `case.test.ts`.
 - Emit a complete test project from an Rstest template: when the project needs fixtures, CI, internal tools, and deterministic checks, the team can own a standard test project.
 
 | Layer | Best for | Recommended shape |
 | --- | --- | --- |
-| Intuitive cases | Verify a key business path quickly | YAML + case. Write the entry point, steps, and assertions in YAML, then run it from the CLI and inspect the replay report |
-| Refined, maintainable project | Tests need login state, cookies, accounts, remote devices, or a small amount of environment preparation | YAML + `setup.js`, with `case.test.ts` available for individual special cases |
+| Intuitive cases | Verify a key business path quickly | YAML + case. Write `target`, steps, and assertions in YAML, then run it from the CLI and inspect the replay report |
+| Refined, maintainable project | Tests need login state, cookies, accounts, remote devices, or a small amount of environment preparation | YAML + `setup.ts`, with `case.test.ts` available for individual special cases |
 | Complete test project | The project needs CI, internal tools, fixtures, data checks, or custom flows | Use `midscene emit` to export an independent Rstest project, then maintain it like a standard test project |
 
 ### 1. Intuitive cases: YAML + case
@@ -27,81 +27,103 @@ That is why Midscene's UI Testing Framework is designed around three layers:
 This stage is about one thing: make the core path natural to express, quick to run, and easy to review. The test author writes YAML for the entry point, actions, and expected result. Midscene uses AI to perform UI actions, evaluate visual assertions, capture screenshots, and generate a replayable report.
 
 ```yaml
-web:
+target:
+  type: web
   url: https://shop.example.com
 
-tasks:
-  - name: Guest checkout smoke test
-    flow:
-      - aiAct: Search for "running shoes"
-      - aiAct: Open the first product
-      - aiAssert: The cart page shows one product and the checkout button
+name: Guest checkout smoke test
+flow:
+  - aiAct: Search for "running shoes"
+  - aiAct: Open the first product
+  - aiAssert: The cart page shows one product and the checkout button
 ```
 
 The value of YAML is not making the framework smaller. Its value is making “what this user path should do” clear enough for review, business confirmation, and team collaboration. It also keeps simple cases from being wrapped in test framework boilerplate on day one.
 
-### 2. Refined, maintainable project: YAML + setup.js
+### 2. Refined, maintainable project: YAML + setup.ts
 
-Once tests become part of daily work, they usually need login state, test accounts, cookies, remote device connections, device preparation, staging lanes, or other environment details. Those concerns should not be squeezed into natural-language steps, and they should not be copied into every YAML case. They belong in project-level configuration and `setup.js`.
+Once tests become part of daily work, they usually need login state, test accounts, cookies, remote device connections, device preparation, staging lanes, or other environment details. Those concerns should not be squeezed into natural-language steps, and they should not be copied into every YAML case. They belong in project-level configuration and `setup.ts`.
 
-Here, `setup.js` means the pre-run setup script for the test project. It runs before YAML cases and prepares the browser, remote device, account, or backend data for the test. A refined but lightweight project can look like this:
+Here, `setup.ts` is the engineering entry point of the test project. It absorbs the project configuration previously carried by `midscene.config.yaml`, and it also defines the preparation logic that runs before each YAML case. The runner passes the current case's `target` into setup, and setup returns the `agent` or other runtime resources used to execute the `flow` that follows. A refined but lightweight project can look like this:
 
 ```text
 .
-  setup.js
-  midscene.config.yaml
+  setup.ts
   e2e/
     dashboard.yaml
     checkout.yaml
     checkout-edge-case.test.ts
 ```
 
-`midscene.config.yaml` manages model settings, runtime options, report output, and selected cases. `setup.js` manages project-specific preparation. `e2e/*.yaml` only describes the business path. A small number of cases that cannot be expressed cleanly in YAML can be written directly as `case.test.ts`.
+`setup.ts` exports project-level configuration such as model settings, runtime options, report output, and selected cases. Its default export manages project-specific preparation. `e2e/*.yaml` only describes the business path. A small number of cases that cannot be expressed cleanly in YAML can be written directly as `case.test.ts`.
 
-The content of `setup.js` should focus on getting the test to an executable starting point. It usually covers these responsibilities:
+The content of `setup.ts` should focus on getting the test to an executable starting point. It usually covers these responsibilities:
 
 - Read environment variables, test accounts, target environment, and device IDs.
 - Complete login, or convert an existing login state into browser cookies.
 - Connect to a remote device platform and select the device for the current case.
 - Prepare backend test data, such as creating an order, clearing a cart, or enabling a staging flag.
 - Register project-level hooks, such as adding logs, screenshots, or business trace IDs when a case fails.
+- Return the `agent` for the current case, so the runner can execute the YAML `flow`.
 
-It should not carry the user path itself. Business actions like “search for a product”, “open order details”, or “verify the invoice entry is visible” should still stay in the YAML case.
+It should not carry the user path itself. Business actions like “search for a product”, “open order details”, or “verify the invoice entry is visible” should still stay in the YAML case. Through `target`, YAML declares where the case should run, including the environment and account; `setup.ts` turns that declaration into a real browser, remote device, or Agent.
 
 YAML still describes the business path:
 
 ```yaml
-web:
+target:
+  type: web
   url: https://internal.example.com/dashboard
+  env: staging
+  account: smoke-user
 
-tasks:
-  - name: Check dashboard
-    flow:
-      - aiAssert: The dashboard is loaded and user information is visible
+name: Check dashboard
+flow:
+  - aiAssert: The dashboard is loaded and user information is visible
 ```
 
-`setup.js` gets the test to the right starting point, such as logging in, injecting cookies, preparing data, connecting a remote device, or selecting the target device. Its value is giving deterministic preparation logic a clear home while the YAML case stays lightweight:
+`setup.ts` gets the test to the right starting point by logging in, injecting cookies, preparing data, connecting a remote device, or selecting the target device. Its value is giving deterministic preparation logic a clear home while the YAML case stays lightweight:
 
-```js
+```ts
+import { createMidsceneAgent } from './agent';
+import { createBrowserRuntime } from './browser-runtime';
 import { connectRemoteDevice } from './remote-device';
 
-export default async function setup({ context }) {
-  const cookies = await loginByTestAccount(process.env.TEST_ACCOUNT);
-  await context.addCookies(cookies);
-
-  await connectRemoteDevice({
-    provider: 'remote-device-lab',
-    deviceId: process.env.REMOTE_DEVICE_ID,
-  });
+export const config = {
+  cases: ['e2e/**/*.yaml', 'e2e/**/*.test.ts'],
+  report: {
+    outputDir: 'reports/midscene',
+    replay: true,
+  },
+};
+
+export default async function setup({ target }) {
+  const account = target.account ?? process.env.TEST_ACCOUNT;
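+  const cookies = await loginByTestAccount(account, target.env); // loginByTestAccount and prepareTestData are project-defined helpers; their imports are omitted here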
+
+  const runtime =
+    target.type === 'remote-device'
+      ? await connectRemoteDevice({
+          provider: 'remote-device-lab',
+          deviceId: target.deviceId,
+        })
+      : await createBrowserRuntime({
+          url: target.url,
+          cookies,
+        });
 
   await prepareTestData({
-    user: process.env.TEST_ACCOUNT,
+    user: account,
     scenario: 'dashboard-smoke',
   });
+
+  return {
+    agent: createMidsceneAgent(runtime),
+  };
 }
 ```
 
-The code above shows the boundary: login, accounts, data, and remote device connection belong in setup. What the user should accomplish after the page opens stays in YAML. The project can handle real engineering constraints, while the case itself stays close to business intuition.
+The code above shows the boundary: configuration, login, accounts, data, and remote device connection belong in `setup.ts`. What the user should accomplish after the page opens stays in YAML. The project can handle real engineering constraints, while the case itself stays close to business intuition.
 
 If a case needs complex branches, internal SDK calls, mixed API checks, or existing test utilities, it can be written as `case.test.ts`. This is not the main path. It is an escape hatch for special cases, so the project does not become awkward just to keep everything in YAML.
 
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 9f02618f67..4fb91a3dbc 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -13,13 +13,13 @@ UI Test 的用例表达不应该复杂。大多数业务的起点也不是完整
 所以 Midscene 的 UI Testing Framework 沿着三层形态展开：
 
 - 用 YAML 写直觉化用例：业务同学或测试同学可以直接描述入口、动作和断言，让 AI 完成 UI 操作和视觉判断。
-- 用 `setup.js` 和少量代码控制工程复杂度：登录、Cookie、账号、测试数据、远程设备连接和设备准备有自己的位置；少数特殊 case 也可以写成 `case.test.ts`。
+- 用 `setup.ts` 和少量代码控制工程复杂度：项目配置、登录、Cookie、账号、测试数据、远程设备连接和设备准备有自己的位置；少数特殊 case 也可以写成 `case.test.ts`。
 - 完备的测试工程可以由 `emit` 生成：当项目需要 fixture、CI、内部工具和更多确定性校验时，团队可以接管一个标准测试工程。
 
 | 层级 | 适合场景 | 推荐形态 |
 | --- | --- | --- |
-| 直觉化用例 | 快速验证一个关键业务路径是否可用 | YAML + case。用 YAML 写入口、任务步骤和断言，直接通过 CLI 运行并查看回放报告 |
-| 精致可维护的项目 | 测试需要登录态、Cookie、账号、远程设备或少量环境准备 | YAML + `setup.js`，必要时为个别特殊 case 增加 `case.test.ts` |
+| 直觉化用例 | 快速验证一个关键业务路径是否可用 | YAML + case。用 YAML 写 `target`、任务步骤和断言，直接通过 CLI 运行并查看回放报告 |
+| 精致可维护的项目 | 测试需要登录态、Cookie、账号、远程设备或少量环境准备 | YAML + `setup.ts`，必要时为个别特殊 case 增加 `case.test.ts` |
 | 完备的测试工程 | 项目开始接入 CI、内部工具、fixture、数据校验或自定义流程 | `midscene emit` 导出独立 Rstest 工程，再按标准测试项目维护 |
 
 ### 1. 直觉化用例：YAML + case
@@ -27,81 +27,103 @@ UI Test 的用例表达不应该复杂。大多数业务的起点也不是完整
 这个阶段只关心一件事：核心路径能不能被自然表达、快速运行和复盘。测试作者写 YAML，描述入口、操作和期望结果。Midscene 负责用 AI 执行 UI 操作、完成视觉断言、截图并生成可回放报告。
 
 ```yaml
-web:
+target:
+  type: web
   url: https://shop.example.com
 
-tasks:
-  - name: Guest checkout smoke test
-    flow:
-      - aiAct: Search for "running shoes"
-      - aiAct: Open the first product
-      - aiAssert: The cart page shows one product and the checkout button
+name: Guest checkout smoke test
+flow:
+  - aiAct: Search for "running shoes"
+  - aiAct: Open the first product
+  - aiAssert: The cart page shows one product and the checkout button
 ```
 
 YAML 的价值不是把测试能力做小，而是把“一个用户路径应该是什么样”组织得足够清楚。它天然适合 code review、业务确认和团队协作，也能避免简单用例一开始就被测试框架样板代码包住。
 
-### 2. 精致可维护的项目：YAML + setup.js
+### 2. 精致可维护的项目：YAML + setup.ts
 
-当测试开始进入日常使用，通常会遇到登录态、测试账号、Cookie、远程设备连接、设备准备、灰度环境等问题。这些内容不应该写进自然语言步骤里，也不应该复制到每个 YAML case 中。它们应该进入项目级配置和 `setup.js`。
+当测试开始进入日常使用，通常会遇到登录态、测试账号、Cookie、远程设备连接、设备准备、灰度环境等问题。这些内容不应该写进自然语言步骤里，也不应该复制到每个 YAML case 中。它们应该进入项目级配置和 `setup.ts`。
 
-这里的 `setup.js` 指的是测试项目的运行前准备脚本。它在 YAML case 执行前运行，负责把浏览器、远程设备、账号或后端数据准备到正确状态。一个精致但不重的项目可以长这样：
+这里的 `setup.ts` 是测试项目的工程入口。它合并了承载在 `midscene.config.yaml` 里的项目配置，也定义了每个 YAML case 执行前的准备逻辑。runner 会把当前 case 的 `target` 传给 setup，setup 再返回后续执行 `flow` 所需的 `agent` 或其他运行时资源。一个精致但不重的项目可以长这样：
 
 ```text
 .
-  setup.js
-  midscene.config.yaml
+  setup.ts
   e2e/
     dashboard.yaml
     checkout.yaml
     checkout-edge-case.test.ts
 ```
 
-`midscene.config.yaml` 管理模型、运行参数、报告输出和需要执行的 case；`setup.js` 管理项目自己的准备逻辑；`e2e/*.yaml` 只描述业务路径；少数无法优雅表达成 YAML 的特殊场景，可以用 `case.test.ts` 直接写成测试代码。
+`setup.ts` 负责 export 项目级配置，例如模型、运行参数、报告输出和需要执行的 case；它的默认导出负责项目自己的准备逻辑；`e2e/*.yaml` 只描述业务路径；少数无法优雅表达成 YAML 的特殊场景，可以用 `case.test.ts` 直接写成测试代码。
 
-`setup.js` 的内容应该围绕“把测试带到可执行的起点”展开。它通常包含这些职责：
+`setup.ts` 的内容应该围绕“把测试带到可执行的起点”展开。它通常包含这些职责：
 
 - 读取环境变量、测试账号、目标环境和设备 ID。
 - 完成登录，或者把已有登录态转换成浏览器 Cookie。
 - 连接远程设备平台，选择本次 case 要运行的设备。
 - 准备后端测试数据，例如创建订单、清理购物车、打开灰度开关。
 - 注册项目级 hook，例如失败时补充日志、截图或业务侧 trace id。
+- 返回当前 case 的 `agent`，让 runner 用它继续执行 YAML 里的 `flow`。
 
-它不适合承载用户路径本身。比如“搜索商品”“打开订单详情”“确认页面展示了发票入口”这类业务动作，仍然应该留在 YAML case 里。
+它不适合承载用户路径本身。比如“搜索商品”“打开订单详情”“确认页面展示了发票入口”这类业务动作，仍然应该留在 YAML case 里。YAML 通过 `target` 声明“跑在哪里、用什么环境和账号”，`setup.ts` 把这份声明落地成真实的浏览器、远程设备或 Agent。
 
 YAML 仍然描述业务路径：
 
 ```yaml
-web:
+target:
+  type: web
   url: https://internal.example.com/dashboard
+  env: staging
+  account: smoke-user
 
-tasks:
-  - name: Check dashboard
-    flow:
-      - aiAssert: The dashboard is loaded and user information is visible
+name: Check dashboard
+flow:
+  - aiAssert: The dashboard is loaded and user information is visible
 ```
 
-`setup.js` 负责把测试带到正确的起点，例如登录、注入 Cookie、准备测试数据、连接远程设备或选择目标设备。它的价值是给“确定性准备逻辑”一个明确位置，让 YAML case 仍然保持轻量：
+`setup.ts` 负责把测试带到正确的起点，例如登录、注入 Cookie、准备测试数据、连接远程设备或选择目标设备。它的价值是给“确定性准备逻辑”一个明确位置，让 YAML case 仍然保持轻量：
 
-```js
+```ts
+import { createMidsceneAgent } from './agent';
+import { createBrowserRuntime } from './browser-runtime';
 import { connectRemoteDevice } from './remote-device';
 
-export default async function setup({ context }) {
-  const cookies = await loginByTestAccount(process.env.TEST_ACCOUNT);
-  await context.addCookies(cookies);
-
-  await connectRemoteDevice({
-    provider: 'remote-device-lab',
-    deviceId: process.env.REMOTE_DEVICE_ID,
-  });
+export const config = {
+  cases: ['e2e/**/*.yaml', 'e2e/**/*.test.ts'],
+  report: {
+    outputDir: 'reports/midscene',
+    replay: true,
+  },
+};
+
+export default async function setup({ target }) {
+  const account = target.account ?? process.env.TEST_ACCOUNT;
+  const cookies = await loginByTestAccount(account, target.env);
+
+  const runtime =
+    target.type === 'remote-device'
+      ? await connectRemoteDevice({
+          provider: 'remote-device-lab',
+          deviceId: target.deviceId,
+        })
+      : await createBrowserRuntime({
+          url: target.url,
+          cookies,
+        });
 
   await prepareTestData({
-    user: process.env.TEST_ACCOUNT,
+    user: account,
     scenario: 'dashboard-smoke',
   });
+
+  return {
+    agent: createMidsceneAgent(runtime),
+  };
 }
 ```
 
-上面的代码展示的是职责边界：登录、账号、数据、远程设备连接这类工程逻辑放进 setup；“用户进入页面后要完成什么”继续留在 YAML。这样项目可以处理真实工程问题，但用例本身仍然接近业务直觉。
+上面的代码展示的是职责边界：配置、登录、账号、数据、远程设备连接这类工程逻辑放进 `setup.ts`；“用户进入页面后要完成什么”继续留在 YAML。这样项目可以处理真实工程问题，但用例本身仍然接近业务直觉。
 
 如果某个 case 需要复杂分支、调用内部 SDK、混合接口校验或复用现有测试工具，可以把它写成 `case.test.ts`。这不是主路径，而是给少数特殊场景留出的逃生口，避免为了保持 YAML 纯度反而让项目变得别扭。
 

From 01f31c3422a21289a237c7a3d38319462bded352 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Wed, 20 May 2026 17:12:27 +0800
Subject: [PATCH 08/33] docs(site): split setup and test case rows

---
 apps/site/docs/en/ui-testing-framework.mdx | 6 ++++--
 apps/site/docs/zh/ui-testing-framework.mdx | 6 ++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 52815d01ad..3a8adc51c0 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -13,13 +13,15 @@ But a test project cannot stay as “a few loose cases” forever. Once tests be
 That is why Midscene's UI Testing Framework is designed around three layers:
 
 - Write intuitive cases with YAML: business and QA users can describe the entry point, actions, and assertions, then let AI handle UI interaction and visual judgment.
-- Control project complexity with `setup.ts` and a small amount of code: project configuration, login, cookies, accounts, test data, remote device connections, and device preparation have their own place; individual special cases can use `case.test.ts`.
+- Control project complexity with `setup.ts` and a small amount of code: project configuration, login, cookies, accounts, test data, remote device connections, and device preparation have their own place.
+- Use `case.test.ts` for individual special cases: when a case needs complex branches, internal SDKs, or deterministic checks, test code covers the parts that YAML should not express.
 - Emit a complete test project from an Rstest template: when the project needs fixtures, CI, internal tools, and deterministic checks, the team can own a standard test project.
 
 | Layer | Best for | Recommended shape |
 | --- | --- | --- |
 | Intuitive cases | Verify a key business path quickly | YAML + case. Write `target`, steps, and assertions in YAML, then run it from the CLI and inspect the replay report |
-| Refined, maintainable project | Tests need login state, cookies, accounts, remote devices, or a small amount of environment preparation | YAML + `setup.ts`, with `case.test.ts` available for individual special cases |
+| Refined, maintainable project | Tests need login state, cookies, accounts, remote devices, or a small amount of environment preparation | YAML + `setup.ts`. Keep the business path in YAML and put project configuration plus pre-run preparation in `setup.ts` |
+| Refined, maintainable project | Individual cases need complex branches, internal SDKs, API checks, or existing test utilities | `case.test.ts`. Use it only for special cases and keep it in the same project as regular YAML cases |
 | Complete test project | The project needs CI, internal tools, fixtures, data checks, or custom flows | Use `midscene emit` to export an independent Rstest project, then maintain it like a standard test project |
 
 ### 1. Intuitive cases: YAML + case
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 4fb91a3dbc..005b9049f1 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -13,13 +13,15 @@ UI Test 的用例表达不应该复杂。大多数业务的起点也不是完整
 所以 Midscene 的 UI Testing Framework 沿着三层形态展开：
 
 - 用 YAML 写直觉化用例：业务同学或测试同学可以直接描述入口、动作和断言，让 AI 完成 UI 操作和视觉判断。
-- 用 `setup.ts` 和少量代码控制工程复杂度：项目配置、登录、Cookie、账号、测试数据、远程设备连接和设备准备有自己的位置；少数特殊 case 也可以写成 `case.test.ts`。
+- 用 `setup.ts` 和少量代码控制工程复杂度：项目配置、登录、Cookie、账号、测试数据、远程设备连接和设备准备有自己的位置。
+- 对个别特殊 case 使用 `case.test.ts`：当用例需要复杂分支、内部 SDK 或确定性校验时，用测试代码补足 YAML 不适合表达的部分。
 - 完备的测试工程可以由 `emit` 生成：当项目需要 fixture、CI、内部工具和更多确定性校验时，团队可以接管一个标准测试工程。
 
 | 层级 | 适合场景 | 推荐形态 |
 | --- | --- | --- |
 | 直觉化用例 | 快速验证一个关键业务路径是否可用 | YAML + case。用 YAML 写 `target`、任务步骤和断言，直接通过 CLI 运行并查看回放报告 |
-| 精致可维护的项目 | 测试需要登录态、Cookie、账号、远程设备或少量环境准备 | YAML + `setup.ts`，必要时为个别特殊 case 增加 `case.test.ts` |
+| 精致可维护的项目 | 测试需要登录态、Cookie、账号、远程设备或少量环境准备 | YAML + `setup.ts`。业务路径继续留在 YAML，项目级配置和运行前准备放到 `setup.ts` |
+| 精致可维护的项目 | 个别 case 需要复杂分支、内部 SDK、接口校验或复用现有测试工具 | `case.test.ts`。只给少数特殊场景使用，让它和普通 YAML case 放在同一个项目里维护 |
 | 完备的测试工程 | 项目开始接入 CI、内部工具、fixture、数据校验或自定义流程 | `midscene emit` 导出独立 Rstest 工程，再按标准测试项目维护 |
 
 ### 1. 直觉化用例：YAML + case

From 09d479c5ce6f46ae9e56187360ae1baa50c53910 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Wed, 20 May 2026 17:18:41 +0800
Subject: [PATCH 09/33] docs(site): simplify special case table row

---
 apps/site/docs/en/ui-testing-framework.mdx | 2 +-
 apps/site/docs/zh/ui-testing-framework.mdx | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 3a8adc51c0..f339ffbbc1 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -21,7 +21,7 @@ That is why Midscene's UI Testing Framework is designed around three layers:
 | --- | --- | --- |
 | Intuitive cases | Verify a key business path quickly | YAML + case. Write `target`, steps, and assertions in YAML, then run it from the CLI and inspect the replay report |
 | Refined, maintainable project | Tests need login state, cookies, accounts, remote devices, or a small amount of environment preparation | YAML + `setup.ts`. Keep the business path in YAML and put project configuration plus pre-run preparation in `setup.ts` |
-| Refined, maintainable project | Individual cases need complex branches, internal SDKs, API checks, or existing test utilities | `case.test.ts`. Use it only for special cases and keep it in the same project as regular YAML cases |
+|  | Individual cases need complex branches, internal SDKs, API checks, or existing test utilities | `case.test.ts`. Use it only for special cases and keep it in the same project as regular YAML cases |
 | Complete test project | The project needs CI, internal tools, fixtures, data checks, or custom flows | Use `midscene emit` to export an independent Rstest project, then maintain it like a standard test project |
 
 ### 1. Intuitive cases: YAML + case
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 005b9049f1..ff90056029 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -21,7 +21,7 @@ UI Test 的用例表达不应该复杂。大多数业务的起点也不是完整
 | --- | --- | --- |
 | 直觉化用例 | 快速验证一个关键业务路径是否可用 | YAML + case。用 YAML 写 `target`、任务步骤和断言，直接通过 CLI 运行并查看回放报告 |
 | 精致可维护的项目 | 测试需要登录态、Cookie、账号、远程设备或少量环境准备 | YAML + `setup.ts`。业务路径继续留在 YAML，项目级配置和运行前准备放到 `setup.ts` |
-| 精致可维护的项目 | 个别 case 需要复杂分支、内部 SDK、接口校验或复用现有测试工具 | `case.test.ts`。只给少数特殊场景使用，让它和普通 YAML case 放在同一个项目里维护 |
+|  | 个别 case 需要复杂分支、内部 SDK、接口校验或复用现有测试工具 | `case.test.ts`。只给少数特殊场景使用，让它和普通 YAML case 放在同一个项目里维护 |
 | 完备的测试工程 | 项目开始接入 CI、内部工具、fixture、数据校验或自定义流程 | `midscene emit` 导出独立 Rstest 工程，再按标准测试项目维护 |
 
 ### 1. 直觉化用例：YAML + case

From ab4dd583ee3bad20de79f03b1670603507a5744f Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Thu, 21 May 2026 10:48:12 +0800
Subject: [PATCH 10/33] chore(core): update docs

---
 apps/site/docs/en/ui-testing-framework.mdx | 55 +++++++++++++---------
 apps/site/docs/zh/ui-testing-framework.mdx | 55 +++++++++++++---------
 2 files changed, 66 insertions(+), 44 deletions(-)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index f339ffbbc1..170e9e9903 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -1,32 +1,32 @@
-# AI-native UI Testing Framework
+# AI-native UI Testing Framework for Natural-language Cases
 
-Midscene helps teams write UI tests as intuitive cases instead of scripts built around fragile selectors and page implementation details. A case should stay simple: describe what the user wants to do, what the user should see, and what must be verified. AI drives the interaction and assertions, while YAML gives those cases a clear, lightweight structure for collaboration.
+Midscene is an AI-native UI testing framework for natural-language-driven automation cases. It turns UI actions, observations, and assertions into structured cases that can be saved, rerun, shared, and diagnosed.
+
+YAML is the default human-friendly case format for describing targets, environments, steps, assertions, and outputs. The YAML runner executes those cases and produces replayable Midscene reports. TypeScript APIs provide engineering control for setup, data, devices, deterministic checks, and integration with existing test projects.
 
 This guide introduces the design view and project shapes of Midscene UI Testing Framework.
 
-## Intuitive Cases, Maintainable Projects
+## Refined Cases
 
-UI test cases should not be complicated. Most teams also do not start with a complete test platform. They start with smoke tests: the lowest-cost way to verify that a key business path still works. At this stage, the most important thing is to get the core path running quickly and turn it into a case that can be repeated, replayed, and inspected.
+UI test cases should not be complicated. For most smoke tests and lightweight regression projects, the most important thing is to get the core path running quickly and turn it into a case that can be repeated, replayed, and inspected.
 
-But a test project cannot stay as “a few loose cases” forever. Once tests become part of daily regression, project complexity naturally appears: cookies, login state, accounts, environments, devices, test data, CI, report archiving, and failure analysis all need a clear home. Midscene balances these two requirements: keep cases intuitive while keeping the test project refined and maintainable.
+That simple shape should cover roughly 80% of projects. When a project needs more preparation, or when a few cases need stronger engineering control, Midscene provides clear outlets for that complexity: keep the business path in YAML, put reusable preparation in `setup.ts`, and use `case.test.ts` for special cases.
 
-That is why Midscene's UI Testing Framework is designed around three layers:
+That is why Midscene's UI Testing Framework starts with three everyday authoring shapes:
 
 - Write intuitive cases with YAML: business and QA users can describe the entry point, actions, and assertions, then let AI handle UI interaction and visual judgment.
-- Control project complexity with `setup.ts` and a small amount of code: project configuration, login, cookies, accounts, test data, remote device connections, and device preparation have their own place.
+- Keep YAML clean with `setup.ts` and a small amount of code: project configuration, login, cookies, accounts, test data, remote device connections, and device preparation have their own place outside the case flow.
 - Use `case.test.ts` for individual special cases: when a case needs complex branches, internal SDKs, or deterministic checks, test code covers the parts that YAML should not express.
-- Emit a complete test project from an Rstest template: when the project needs fixtures, CI, internal tools, and deterministic checks, the team can own a standard test project.
 
-| Layer | Best for | Recommended shape |
+| Project shape | Best for | Recommended shape |
 | --- | --- | --- |
 | Intuitive cases | Verify a key business path quickly | YAML + case. Write `target`, steps, and assertions in YAML, then run it from the CLI and inspect the replay report |
-| Refined, maintainable project | Tests need login state, cookies, accounts, remote devices, or a small amount of environment preparation | YAML + `setup.ts`. Keep the business path in YAML and put project configuration plus pre-run preparation in `setup.ts` |
-|  | Individual cases need complex branches, internal SDKs, API checks, or existing test utilities | `case.test.ts`. Use it only for special cases and keep it in the same project as regular YAML cases |
-| Complete test project | The project needs CI, internal tools, fixtures, data checks, or custom flows | Use `midscene emit` to export an independent Rstest project, then maintain it like a standard test project |
+| Clean YAML with setup | Tests need login state, cookies, accounts, remote devices, or a small amount of environment preparation | YAML + `setup.ts`. Keep the business path in YAML and put project configuration plus pre-run preparation in `setup.ts` |
+| Coded special cases | Individual cases need complex branches, internal SDKs, API checks, or existing test utilities | `case.test.ts`. Use it only for special cases and keep it in the same project as regular YAML cases |
 
 ### 1. Intuitive cases: YAML + case
 
-This stage is about one thing: make the core path natural to express, quick to run, and easy to review. The test author writes YAML for the entry point, actions, and expected result. Midscene uses AI to perform UI actions, evaluate visual assertions, capture screenshots, and generate a replayable report.
+This shape is about one thing: make the core path natural to express, quick to run, and easy to review. The test author writes YAML for the entry point, actions, and expected result. Midscene uses AI to perform UI actions, evaluate visual assertions, capture screenshots, and generate a replayable report.
 
 ```yaml
 target:
@@ -42,11 +42,11 @@ flow:
 
 The value of YAML is not making the framework smaller. Its value is making “what this user path should do” clear enough for review, business confirmation, and team collaboration. It also keeps simple cases from being wrapped in test framework boilerplate on day one.
 
-### 2. Refined, maintainable project: YAML + setup.ts
+### 2. Clean YAML with setup: YAML + setup.ts
 
-Once tests become part of daily work, they usually need login state, test accounts, cookies, remote device connections, device preparation, staging lanes, or other environment details. Those concerns should not be squeezed into natural-language steps, and they should not be copied into every YAML case. They belong in project-level configuration and `setup.ts`.
+Some projects need login state, test accounts, cookies, remote device connections, device preparation, staging lanes, or other environment details. Those concerns should not be squeezed into natural-language steps, and they should not be copied into every YAML case. They belong in project-level configuration and `setup.ts`.
 
-Here, `setup.ts` is the engineering entry point of the test project. It absorbs the project configuration previously carried by `midscene.config.yaml`, and it also defines the preparation logic that runs before each YAML case. The runner passes the current case `target` into setup, and setup returns the `agent` or other runtime resources used to execute the following `flow`. A refined but lightweight project can look like this:
+Here, `setup.ts` is a practical enhancement for keeping YAML focused on the case flow. It defines the preparation logic that runs before each YAML case, and can also hold project-level options that are awkward to repeat in every file. The runner passes the current case `target` into setup, and setup returns the `agent` or other runtime resources used to execute the following `flow`. A lightweight project can look like this:
 
 ```text
 .
@@ -57,7 +57,7 @@ Here, `setup.ts` is the engineering entry point of the test project. It absorbs
     checkout-edge-case.test.ts
 ```
 
-`setup.ts` exports project-level configuration such as model settings, runtime options, report output, and selected cases. Its default export manages project-specific preparation. `e2e/*.yaml` only describes the business path. A small number of cases that cannot be expressed cleanly in YAML can be written directly as `case.test.ts`.
+`setup.ts` can export project-level options such as model settings, runtime options, report output, and selected cases. Its default export manages project-specific preparation. `e2e/*.yaml` only describes the business path. A small number of cases that cannot be expressed cleanly in YAML can be written directly as `case.test.ts`.
 
 The content of `setup.ts` should focus on getting the test to an executable starting point. It usually covers these responsibilities:
 
@@ -129,9 +129,22 @@ The code above shows the boundary: configuration, login, accounts, data, and rem
 
 If a case needs complex branches, internal SDK calls, mixed API checks, or existing test utilities, it can be written as `case.test.ts`. This is not the main path. It is an escape hatch for special cases, so the project does not become awkward just to keep everything in YAML.
 
-### 3. Complete test project: emit an Rstest project template
+## Designed on Rstest
+
+Most users do not need to study the details of an Rstest project. Midscene uses it underneath so that the YAML runner, `setup.ts`, and `case.test.ts` are not just a lightweight script system. They sit on top of a standard test framework from the start.
+
+In other words, Midscene does not replace a test runner with YAML. YAML is the user-friendly case layer; Rstest is the underlying test framework; Midscene adds AI UI actions, visual assertions, screenshots, replay reports, and diagnostics on top.
+
+### Why Rstest helps here
 
-When a test project needs complete engineering capabilities, there are more cases, fixtures become more complex, CI needs parallelism and grouping, and failure analysis needs to connect with team systems. Midscene should not force all of that into a fixed YAML runner. For this stage, the core capability is `emit`: export a lightweight Midscene project into an independent Rstest project template.
+- Case execution is backed by a standard test lifecycle, including setup, teardown, hooks, fixtures, and parallel execution.
+- Projects can naturally connect to CI, test filtering, failure reporting, and existing team testing habits.
+- YAML cases and `case.test.ts` do not become two disconnected systems; they share the same underlying runtime model.
+- When a project grows from lightweight smoke tests into a long-lived regression suite, the team does not need to rewrite the testing foundation.
+
+### Emit a standard Rstest project
+
+For complex projects, there may be more cases, richer fixtures, CI parallelism and grouping, and failure analysis that connects with team systems. Because the lightweight Midscene project already runs on Rstest, `emit` is not a migration to another framework. It makes the underlying Rstest project explicit, so the team can own the test runner, fixtures, and integration code directly.
 
 ```bash
 midscene emit ./project-folder
@@ -154,9 +167,7 @@ project-folder/
     midscene-report/
 ```
 
-At this stage, YAML is no longer the boundary of what the framework can do. It is the migration entry point. Teams can keep the business expression from YAML cases, or move complex logic into Rstest test files, fixtures, and internal tools. Midscene handles UI actions and visual assertions, while your own code handles environment orchestration, API checks, database checks, and failure analysis.
-
-Rstest is the underlying choice because it is fast, reliable, and a good foundation for a complete test project. On top of that base, Midscene provides AI UI actions, visual assertions, screenshots, replay reports, and debugging information, so the project can scale without giving up a good developer experience.
+In this project shape, YAML can remain the human-friendly expression for natural-language cases, while Rstest provides the visible test project structure. Teams can keep business paths in YAML and place complex logic in Rstest test files, fixtures, and internal tools. Midscene handles AI UI actions and visual assertions, while your own code handles environment orchestration, API checks, database checks, and failure analysis.
 
 ## Next steps
 
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index ff90056029..40d5ca56f0 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -1,32 +1,32 @@
-# AI 原生 UI Testing Framework
+# 面向自然语言用例的 AI 原生 UI Testing Framework
 
-Midscene 帮助团队把 UI 测试写成符合直觉的用例，而不是围绕脆弱选择器和页面实现细节维护脚本。用例本身应该简单：描述用户想完成什么、看到什么、确认什么。AI 驱动负责理解和执行这些意图，YAML 则提供一种清晰、轻量、适合协作的组织形式。
+Midscene 是面向自然语言自动化用例的 AI 原生 UI Testing Framework。它把 UI 操作、观察和断言组织成结构化、可保存、可复跑、可协作、可诊断的用例。
+
+YAML 是默认的人类友好用例声明格式，用来描述目标、环境、步骤、断言和输出。YAML runner 负责执行这些用例并生成可回放的 Midscene 报告。TypeScript API 提供工程控制能力，用于接入启动准备、数据、设备、确定性校验和现有测试工程。
 
 这篇文档介绍 Midscene UI Testing Framework 的设计视角和项目形态。
 
-## 用例直觉化，工程精致化
+## 用例精致化
 
-UI Test 的用例表达不应该复杂。大多数业务的起点也不是完整测试平台，而是 Smoke Test：先用最小成本验证关键路径是否可用。这个阶段最重要的是快速跑通，把核心业务路径变成可以重复执行、可以回放分析的 case。
+UI Test 的用例表达不应该复杂。对于大多数 Smoke Test 和轻量回归项目，最重要的是快速跑通核心路径，把它变成可以重复执行、可以回放分析的 case。
 
-但测试项目不能一直停留在“随手写几个 case”的状态。只要进入日常回归，工程复杂度就会自然出现：Cookie、登录态、账号、环境、设备、测试数据、CI、报告归档和失败排查都需要有明确位置。Midscene 的原则是平衡这两件事：让用例保持直觉化，同时让测试工程足够精致、可维护。
+这种简单形态应该覆盖大约 80% 的项目。当项目需要更多准备逻辑，或者少数 case 需要更强的工程控制时，Midscene 也提供清晰的复杂度出口：业务路径继续留在 YAML，可复用的准备逻辑放进 `setup.ts`，特殊 case 使用 `case.test.ts`。
 
-所以 Midscene 的 UI Testing Framework 沿着三层形态展开：
+所以 Midscene 的 UI Testing Framework 先围绕三种日常编写形态展开：
 
 - 用 YAML 写直觉化用例：业务同学或测试同学可以直接描述入口、动作和断言，让 AI 完成 UI 操作和视觉判断。
-- 用 `setup.ts` 和少量代码控制工程复杂度：项目配置、登录、Cookie、账号、测试数据、远程设备连接和设备准备有自己的位置。
+- 用 `setup.ts` 和少量代码保持 YAML 干净：项目配置、登录、Cookie、账号、测试数据、远程设备连接和设备准备放在 case flow 之外。
 - 对个别特殊 case 使用 `case.test.ts`：当用例需要复杂分支、内部 SDK 或确定性校验时，用测试代码补足 YAML 不适合表达的部分。
-- 完备的测试工程可以由 `emit` 生成：当项目需要 fixture、CI、内部工具和更多确定性校验时，团队可以接管一个标准测试工程。
 
-| 层级 | 适合场景 | 推荐形态 |
+| 项目形态 | 适合场景 | 推荐形态 |
 | --- | --- | --- |
 | 直觉化用例 | 快速验证一个关键业务路径是否可用 | YAML + case。用 YAML 写 `target`、任务步骤和断言，直接通过 CLI 运行并查看回放报告 |
-| 精致可维护的项目 | 测试需要登录态、Cookie、账号、远程设备或少量环境准备 | YAML + `setup.ts`。业务路径继续留在 YAML，项目级配置和运行前准备放到 `setup.ts` |
-|  | 个别 case 需要复杂分支、内部 SDK、接口校验或复用现有测试工具 | `case.test.ts`。只给少数特殊场景使用，让它和普通 YAML case 放在同一个项目里维护 |
-| 完备的测试工程 | 项目开始接入 CI、内部工具、fixture、数据校验或自定义流程 | `midscene emit` 导出独立 Rstest 工程，再按标准测试项目维护 |
+| 保持 YAML 干净的 setup | 测试需要登录态、Cookie、账号、远程设备或少量环境准备 | YAML + `setup.ts`。业务路径继续留在 YAML，项目级配置和运行前准备放到 `setup.ts` |
+| 代码化特殊 case | 个别 case 需要复杂分支、内部 SDK、接口校验或复用现有测试工具 | `case.test.ts`。只给少数特殊场景使用，让它和普通 YAML case 放在同一个项目里维护 |
 
 ### 1. 直觉化用例：YAML + case
 
-这个阶段只关心一件事：核心路径能不能被自然表达、快速运行和复盘。测试作者写 YAML，描述入口、操作和期望结果。Midscene 负责用 AI 执行 UI 操作、完成视觉断言、截图并生成可回放报告。
+这种形态只关心一件事：核心路径能不能被自然表达、快速运行和复盘。测试作者写 YAML，描述入口、操作和期望结果。Midscene 负责用 AI 执行 UI 操作、完成视觉断言、截图并生成可回放报告。
 
 ```yaml
 target:
@@ -42,11 +42,11 @@ flow:
 
 YAML 的价值不是把测试能力做小，而是把“一个用户路径应该是什么样”组织得足够清楚。它天然适合 code review、业务确认和团队协作，也能避免简单用例一开始就被测试框架样板代码包住。
 
-### 2. 精致可维护的项目：YAML + setup.ts
+### 2. 保持 YAML 干净的 setup：YAML + setup.ts
 
-当测试开始进入日常使用，通常会遇到登录态、测试账号、Cookie、远程设备连接、设备准备、灰度环境等问题。这些内容不应该写进自然语言步骤里，也不应该复制到每个 YAML case 中。它们应该进入项目级配置和 `setup.ts`。
+有些项目会遇到登录态、测试账号、Cookie、远程设备连接、设备准备、灰度环境等问题。这些内容不应该写进自然语言步骤里，也不应该复制到每个 YAML case 中。它们应该进入项目级配置和 `setup.ts`。
 
-这里的 `setup.ts` 是测试项目的工程入口。它合并了承载在 `midscene.config.yaml` 里的项目配置，也定义了每个 YAML case 执行前的准备逻辑。runner 会把当前 case 的 `target` 传给 setup，setup 再返回后续执行 `flow` 所需的 `agent` 或其他运行时资源。一个精致但不重的项目可以长这样：
+这里的 `setup.ts` 是让 YAML 专注于 case flow 的实用增强能力。它定义每个 YAML case 执行前的准备逻辑，也可以承载不适合在每个文件里重复书写的项目级选项。runner 会把当前 case 的 `target` 传给 setup，setup 再返回后续执行 `flow` 所需的 `agent` 或其他运行时资源。一个轻量项目可以长这样：
 
 ```text
 .
@@ -57,7 +57,7 @@ YAML 的价值不是把测试能力做小，而是把“一个用户路径应该
     checkout-edge-case.test.ts
 ```
 
-`setup.ts` 负责 export 项目级配置，例如模型、运行参数、报告输出和需要执行的 case；它的默认导出负责项目自己的准备逻辑；`e2e/*.yaml` 只描述业务路径；少数无法优雅表达成 YAML 的特殊场景，可以用 `case.test.ts` 直接写成测试代码。
+`setup.ts` 可以 export 项目级选项，例如模型、运行参数、报告输出和需要执行的 case；它的默认导出负责项目自己的准备逻辑；`e2e/*.yaml` 只描述业务路径；少数无法优雅表达成 YAML 的特殊场景，可以用 `case.test.ts` 直接写成测试代码。
 
 `setup.ts` 的内容应该围绕“把测试带到可执行的起点”展开。它通常包含这些职责：
 
@@ -129,9 +129,22 @@ export default async function setup({ target }) {
 
 如果某个 case 需要复杂分支、调用内部 SDK、混合接口校验或复用现有测试工具，可以把它写成 `case.test.ts`。这不是主路径，而是给少数特殊场景留出的逃生口，避免为了保持 YAML 纯度反而让项目变得别扭。
 
-### 3. 完备的测试工程：emit 为 Rstest 项目模板
+## 基于 Rstest 设计
+
+大多数用户不需要关心 Rstest 的项目细节。Midscene 把它放在底层，是为了让 YAML runner、`setup.ts` 和 `case.test.ts` 不只是一个轻量脚本系统，而是天然站在标准测试框架之上。
+
+换句话说，Midscene 不是用 YAML 替代测试 Runner。YAML 是面向人的用例表达层，Rstest 是底层测试框架，Midscene 在这个底座上提供 AI UI 操作、视觉断言、截图、回放报告和诊断信息。
+
+### Rstest 用在这里的优势
 
-当测试项目需要完整工程能力时，case 会变多，fixture 会变复杂，CI 会有并发和分组策略，失败分析也需要接入团队自己的系统。Midscene 不应该把这些需求都压进固定 YAML runner。对应这类完备测试工程，核心能力是 `emit`：把轻量 Midscene 项目导出成一个独立的 Rstest 工程模板。
+- 用例执行有标准测试生命周期支撑，包括 setup、teardown、hook、fixture 和并发能力。
+- 项目可以自然接入 CI、测试过滤、失败报告和团队已有的测试工程习惯。
+- YAML case 和 `case.test.ts` 不会变成两套割裂系统，它们共享同一个底层运行模型。
+- 当项目从轻量 Smoke Test 长成长期维护的回归工程时，不需要重写测试体系。
+
+### 支持 emit 成一个标准的 Rstest 项目
+
+对于复杂项目，case 可能更多，fixture 更复杂，CI 需要并发和分组策略，失败分析也需要接入团队自己的系统。由于轻量 Midscene 项目本来就运行在 Rstest 之上，`emit` 不是迁移到另一套框架，而是把底层 Rstest 工程显式暴露出来，让团队可以直接接管测试 Runner、fixture 和集成代码。
 
 ```bash
 midscene emit ./project-folder
@@ -154,9 +167,7 @@ project-folder/
     midscene-report/
 ```
 
-在这个阶段，YAML 不再是能力边界，而是迁移入口。团队可以继续保留 YAML case 的业务表达，也可以把复杂逻辑拆进 Rstest 测试文件、fixture 和内部工具里。UI 操作和视觉断言交给 Midscene，环境编排、接口校验、数据库校验和失败归因交给团队自己的代码。
-
-底层选择 Rstest，是因为它足够快、稳定，并且适合承载一个长期维护的测试工程。Midscene 在这个底座上提供 AI UI 操作、视觉断言、截图、回放报告和调试信息，保证项目既能工程化扩展，也有足够好的开发体验。
+在这种项目形态里，YAML 仍然可以作为自然语言用例的人类友好表达，Rstest 则提供显式的测试工程结构。团队可以把业务路径继续保留在 YAML 中，把复杂逻辑放进 Rstest 测试文件、fixture 和内部工具里。AI UI 操作和视觉断言交给 Midscene，环境编排、接口校验、数据库校验和失败归因交给团队自己的代码。
 
 ## 下一步
 

From 4b9148233baeaa7ff6600457072e7762b06d91b1 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Thu, 21 May 2026 12:03:59 +0800
Subject: [PATCH 11/33] docs(site): refine UI testing framework guide

---
 apps/site/docs/en/ui-testing-framework.mdx | 166 +++++++++------------
 apps/site/docs/zh/ui-testing-framework.mdx | 166 +++++++++------------
 2 files changed, 140 insertions(+), 192 deletions(-)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 170e9e9903..5acdfb8405 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -1,150 +1,123 @@
 # AI-native UI Testing Framework for Natural-language Cases
 
-Midscene is an AI-native UI testing framework for natural-language-driven automation cases. It turns UI actions, observations, and assertions into structured cases that can be saved, rerun, shared, and diagnosed.
+UI tests are valuable only when teams can keep writing and maintaining them. In many projects, the first few browser scripts are easy, but the suite soon fills with selectors, waits, login helpers, data setup, and failure screenshots that only test specialists can understand.
 
-YAML is the default human-friendly case format for describing targets, environments, steps, assertions, and outputs. The YAML runner executes those cases and produces replayable Midscene reports. TypeScript APIs provide engineering control for setup, data, devices, deterministic checks, and integration with existing test projects.
+Midscene provides an AI-native UI Testing Framework for this problem. It lets teams describe user paths in natural language, run them as structured cases, and review the result through screenshots, replay reports, and diagnostics. Test authors can start with YAML before adding test framework boilerplate, while engineers still have TypeScript extension points for setup, data, devices, deterministic checks, and existing test projects.
 
-This guide introduces the design view and project shapes of Midscene UI Testing Framework.
+The goal is simple: make lightweight cases easy to write, and make serious test projects possible without changing the authoring model later.
 
-## Refined Cases
+## From YAML to an Engineering-ready Project
 
-UI test cases should not be complicated. For most smoke tests and lightweight regression projects, the most important thing is to get the core path running quickly and turn it into a case that can be repeated, replayed, and inspected.
+Midscene is designed for teams that want UI tests to stay close to user behavior:
 
-That simple shape should cover roughly 80% of projects. When a project needs more preparation, or when a few cases need stronger engineering control, Midscene provides clear outlets for that complexity: keep the business path in YAML, put reusable preparation in `setup.ts`, and use `case.test.ts` for special cases.
+- QA and business teams can review a case by reading the YAML flow.
+- Frontend and test engineers can keep login, cookies, accounts, devices, and test data in code, leaving natural-language steps focused on the user path.
+- CI jobs get repeatable cases, replayable Midscene reports, screenshots, and failure details.
+- Growing projects can move from lightweight YAML cases to a standard Rstest project without rewriting the testing foundation.
 
-That is why Midscene's UI Testing Framework starts with three everyday authoring shapes:
+Midscene lets a test project start from natural-language cases: core paths stay readable and maintainable, and the first case stays lightweight. When the project needs login state, test data, device connections, complex assertions, or CI management, it still has enough room for engineering extension.
 
-- Write intuitive cases with YAML: business and QA users can describe the entry point, actions, and assertions, then let AI handle UI interaction and visual judgment.
-- Keep YAML clean with `setup.ts` and a small amount of code: project configuration, login, cookies, accounts, test data, remote device connections, and device preparation have their own place outside the case flow.
-- Use `case.test.ts` for individual special cases: when a case needs complex branches, internal SDKs, or deterministic checks, test code covers the parts that YAML should not express.
+### Start with YAML
 
-| Project shape | Best for | Recommended shape |
-| --- | --- | --- |
-| Intuitive cases | Verify a key business path quickly | YAML + case. Write `target`, steps, and assertions in YAML, then run it from the CLI and inspect the replay report |
-| Clean YAML with setup | Tests need login state, cookies, accounts, remote devices, or a small amount of environment preparation | YAML + `setup.ts`. Keep the business path in YAML and put project configuration plus pre-run preparation in `setup.ts` |
-| Coded special cases | Individual cases need complex branches, internal SDKs, API checks, or existing test utilities | `case.test.ts`. Use it only for special cases and keep it in the same project as regular YAML cases |
+For most smoke tests and lightweight regression projects, the first useful milestone is to get a core path running quickly and turn it into a case that can be repeated, replayed, and inspected.
 
-### 1. Intuitive cases: YAML + case
-
-This shape is about one thing: make the core path natural to express, quick to run, and easy to review. The test author writes YAML for the entry point, actions, and expected result. Midscene uses AI to perform UI actions, evaluate visual assertions, capture screenshots, and generate a replayable report.
+A YAML case keeps the path readable:
 
 ```yaml
 target:
   type: web
   url: https://shop.example.com
 
-name: Guest checkout smoke test
 flow:
   - aiAct: Search for "running shoes"
   - aiAct: Open the first product
-  - aiAssert: The cart page shows one product and the checkout button
+  - aiQuery: product name and price
+  - aiAssert: The product detail page shows a visible Add to cart button
 ```
 
-The value of YAML is not making the framework smaller. Its value is making “what this user path should do” clear enough for review, business confirmation, and team collaboration. It also keeps simple cases from being wrapped in test framework boilerplate on day one.
-
-### 2. Clean YAML with setup: YAML + setup.ts
+YAML makes "what this user path should do" clear enough for review, business confirmation, and team collaboration. Midscene handles the AI UI actions, visual understanding, assertions, screenshots, and report generation around that case.
 
-Some projects need login state, test accounts, cookies, remote device connections, device preparation, staging lanes, or other environment details. Those concerns should not be squeezed into natural-language steps, and they should not be copied into every YAML case. They belong in project-level configuration and `setup.ts`.
-
-Here, `setup.ts` is a practical enhancement for keeping YAML focused on the case flow. It defines the preparation logic that runs before each YAML case, and can also hold project-level options that are awkward to repeat in every file. The runner passes the current case `target` into setup, and setup returns the `agent` or other runtime resources used to execute the following `flow`. A lightweight project can look like this:
+That simple shape should cover most early projects:
 
 ```text
 .
-  setup.ts
   e2e/
     dashboard.yaml
     checkout.yaml
-    checkout-edge-case.test.ts
+    pricing.yaml
 ```
 
-`setup.ts` can export project-level options such as model settings, runtime options, report output, and selected cases. Its default export manages project-specific preparation. `e2e/*.yaml` only describes the business path. A small number of cases that cannot be expressed cleanly in YAML can be written directly as `case.test.ts`.
+The case remains close to business language, while the runner gives it a repeatable execution and a report that can be inspected after success or failure.
 
-The content of `setup.ts` should focus on getting the test to an executable starting point. It usually covers these responsibilities:
+### Keep Environment Complexity in Setup
 
-- Read environment variables, test accounts, target environment, and device IDs.
-- Complete login, or convert an existing login state into browser cookies.
-- Connect to a remote device platform and select the device for the current case.
-- Prepare backend test data, such as creating an order, clearing a cart, or enabling a staging flag.
-- Register project-level hooks, such as adding logs, screenshots, or business trace IDs when a case fails.
-- Return the `agent` for the current case, so the runner can execute the YAML `flow`.
+In professional UI testing projects, the case steps are usually the clearest part: open a page, perform a user path, and check the expected result. The engineering complexity tends to sit around that path, in environment configuration such as login state, cookies, test accounts, staging lanes, backend data, cloud device connections, and device initialization.
 
-It should not carry the user path itself. Business actions like “search for a product”, “open order details”, or “verify the invoice entry is visible” should still stay in the YAML case. YAML declares where the case should run through `target`, including the environment and account; `setup.ts` turns that declaration into a real browser, remote device, or Agent.
+Those concerns should not be copied into every YAML case. Midscene provides `setup.ts` so preparation logic can live at the project layer:
 
-YAML still describes the business path:
+```ts
+export default async function setup() {
+  const account = await getSmokeAccount();
+  const cookies = await loginAndGetCookies(account);
+  const device = await connectCloudDevice(process.env.DEVICE_ID);
 
-```yaml
-target:
-  type: web
-  url: https://internal.example.com/dashboard
-  env: staging
-  account: smoke-user
+  return {
+    agent: await createAgent({
+      device,
+      cookies,
+    }),
+  };
+}
+```
 
-name: Check dashboard
-flow:
-  - aiAssert: The dashboard is loaded and user information is visible
+With setup in place, the project can stay direct:
+
+```text
+.
+  setup.ts
+  e2e/
+    dashboard.yaml
+    checkout.yaml
 ```
 
-`setup.ts` gets the test to the right starting point, such as logging in, injecting cookies, preparing data, connecting a remote device, or selecting the target device. Its value is giving deterministic preparation logic a clear home while the YAML case stays lightweight:
+The boundary is clear: `e2e/*.yaml` describes what the user should accomplish after the page opens, while `setup.ts` gets the test to an executable starting point. The project can handle real engineering constraints, while the case itself remains friendly to code review, business confirmation, and team collaboration.
 
-```ts
-import { createMidsceneAgent } from './agent';
-import { createBrowserRuntime } from './browser-runtime';
-import { connectRemoteDevice } from './remote-device';
-
-export const config = {
-  cases: ['e2e/**/*.yaml', 'e2e/**/*.test.ts'],
-  report: {
-    outputDir: 'reports/midscene',
-    replay: true,
-  },
-};
-
-export default async function setup({ target }) {
-  const account = target.account ?? process.env.TEST_ACCOUNT;
-  const cookies = await loginByTestAccount(account, target.env);
-
-  const runtime =
-    target.type === 'remote-device'
-      ? await connectRemoteDevice({
-          provider: 'remote-device-lab',
-          deviceId: target.deviceId,
-        })
-      : await createBrowserRuntime({
-          url: target.url,
-          cookies,
-        });
-
-  await prepareTestData({
-    user: account,
-    scenario: 'dashboard-smoke',
-  });
+### Use TypeScript for Special Cases
 
-  return {
-    agent: createMidsceneAgent(runtime),
-  };
-}
+YAML should be the main path while still leaving room for code. Some cases need complex branches, internal SDK calls, mixed API checks, database assertions, or existing test utilities. Those cases can be written as `case.test.ts`.
+
+This is an engineering capability for special cases:
+
+```text
+.
+  setup.ts
+  e2e/
+    dashboard.yaml
+    checkout.yaml
+    checkout-risk-control.test.ts
 ```
 
-The code above shows the boundary: configuration, login, accounts, data, and remote device connection belong in `setup.ts`. What the user should accomplish after the page opens stays in YAML. The project can handle real engineering constraints, while the case itself stays close to business intuition.
+Teams can keep ordinary business paths in YAML and put the few truly complex cases in TypeScript. They still live in the same project and share the same runtime model.
 
-If a case needs complex branches, internal SDK calls, mixed API checks, or existing test utilities, it can be written as `case.test.ts`. This is not the main path. It is an escape hatch for special cases, so the project does not become awkward just to keep everything in YAML.
+## Built on Rstest
 
-## Designed on Rstest
+Midscene is built as a higher-level testing framework on top of Rstest. Rstest provides the underlying lifecycle, fixture model, parallel execution, filtering, and CI-friendly runtime. It is also written in Rust for strong execution performance, so Midscene users get a high-performance test foundation by default. Midscene wraps those capabilities with natural-language cases, AI UI actions, visual assertions, screenshots, replay reports, and diagnostics.
 
-Most users do not need to study the details of an Rstest project. Midscene uses it underneath so that the YAML runner, `setup.ts`, and `case.test.ts` are not just a lightweight script system. They sit on top of a standard test framework from the start.
+Most users can rely on that foundation through Midscene's YAML runner, `setup.ts`, and `case.test.ts` without learning Rstest project details. Those details only become necessary when a team enters the `emit` workflow and wants to own the generated Rstest project directly.
 
-In other words, Midscene does not replace a test runner with YAML. YAML is the user-friendly case layer; Rstest is the underlying test framework; Midscene adds AI UI actions, visual assertions, screenshots, replay reports, and diagnostics on top.
+### Why Rstest Helps
 
-### Why Rstest helps here
+Rstest gives the Midscene project a reliable engineering base:
 
 - Case execution is backed by a standard test lifecycle, including setup, teardown, hooks, fixtures, and parallel execution.
+- The Rust-based runtime gives Midscene projects a performance-oriented execution layer by default.
 - Projects can naturally connect to CI, test filtering, failure reporting, and existing team testing habits.
-- YAML cases and `case.test.ts` do not become two disconnected systems; they share the same underlying runtime model.
-- When a project grows from lightweight smoke tests into a long-lived regression suite, the team does not need to rewrite the testing foundation.
+- YAML cases and `case.test.ts` share the same underlying runtime model.
+- Teams can start lightweight and still keep a path toward long-lived regression suites.
 
-### Emit a standard Rstest project
+### Emit a Standard Project
 
-For complex projects, there may be more cases, richer fixtures, CI parallelism and grouping, and failure analysis that connects with team systems. Because the lightweight Midscene project already runs on Rstest, `emit` is not a migration to another framework. It makes the underlying Rstest project explicit, so the team can own the test runner, fixtures, and integration code directly.
+For complex projects, there may be more cases, richer fixtures, CI parallelism and grouping, and failure analysis that connects with team systems. Because the lightweight Midscene project already runs on Rstest, `emit` makes the underlying Rstest project explicit, so the team can own the test runner, fixtures, and integration code directly.
 
 ```bash
 midscene emit ./project-folder
@@ -169,7 +142,8 @@ project-folder/
 
 In this project shape, YAML can remain the human-friendly expression for natural-language cases, while Rstest provides the visible test project structure. Teams can keep business paths in YAML and place complex logic in Rstest test files, fixtures, and internal tools. Midscene handles AI UI actions and visual assertions, while your own code handles environment orchestration, API checks, database checks, and failure analysis.
 
-## Next steps
+### Next Steps
 
+- Run a YAML case from the command line: [YAML script runner](./yaml-script-runner)
 - Look up every YAML field: [Workflow in YAML format](./automate-with-scripts-in-yaml)
-- Look up every CLI flag: [YAML script runner](./yaml-script-runner)
+- Start from platform guides: [Android](./android-getting-started), [iOS](./ios-getting-started), [Computer](./computer-getting-started)
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 40d5ca56f0..136af7f376 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -1,150 +1,123 @@
 # 面向自然语言用例的 AI 原生 UI Testing Framework
 
-Midscene 是面向自然语言自动化用例的 AI 原生 UI Testing Framework。它把 UI 操作、观察和断言组织成结构化、可保存、可复跑、可协作、可诊断的用例。
+UI Test 的价值，取决于团队能不能持续编写和维护它。很多项目一开始写几条浏览器脚本并不难，但用例很快就会被选择器、等待逻辑、登录辅助函数、测试数据准备和失败截图塞满，最后只有少数测试工程师能看懂。
 
-YAML 是默认的人类友好用例声明格式，用来描述目标、环境、步骤、断言和输出。YAML runner 负责执行这些用例并生成可回放的 Midscene 报告。TypeScript API 提供工程控制能力，用于接入启动准备、数据、设备、确定性校验和现有测试工程。
+Midscene 提供了面向这个问题的 AI 原生 UI Testing Framework。它让团队用自然语言描述用户路径，把这些路径作为结构化 case 执行，并通过截图、回放报告和诊断信息复盘结果。测试作者可以先从 YAML 开始，后续再补充测试框架样板代码；工程同学仍然可以用 TypeScript 扩展点接入启动准备、数据、设备、确定性校验和现有测试工程。
 
-这篇文档介绍 Midscene UI Testing Framework 的设计视角和项目形态。
+目标很简单：轻量 case 要容易写，严肃测试工程也要能继续长大，而且不需要中途换掉用例表达方式。
 
-## 用例精致化
+## 从 YAML 到工程化项目
 
-UI Test 的用例表达不应该复杂。对于大多数 Smoke Test 和轻量回归项目，最重要的是快速跑通核心路径，把它变成可以重复执行、可以回放分析的 case。
+Midscene 适合希望 UI Test 更接近用户行为的团队：
 
-这种简单形态应该覆盖大约 80% 的项目。当项目需要更多准备逻辑，或者少数 case 需要更强的工程控制时，Midscene 也提供清晰的复杂度出口：业务路径继续留在 YAML，可复用的准备逻辑放进 `setup.ts`，特殊 case 使用 `case.test.ts`。
+- QA 和业务同学可以直接阅读 YAML flow 来 review 用例。
+- 前端和测试工程师可以把登录、Cookie、账号、设备和测试数据留在代码里，让自然语言步骤专注描述用户路径。
+- CI 任务可以获得可重复执行的 case、可回放的 Midscene 报告、截图和失败详情。
+- 项目从轻量 YAML case 长成标准 Rstest 工程时，不需要重写测试底座。
 
-所以 Midscene 的 UI Testing Framework 先围绕三种日常编写形态展开：
+Midscene 让测试项目可以从自然语言用例开始：核心路径可读、可维护，第一条 case 启动很轻；当项目需要登录态、测试数据、设备连接、复杂断言或 CI 管理时，也有足够的工程扩展空间。
 
-- 用 YAML 写直觉化用例：业务同学或测试同学可以直接描述入口、动作和断言，让 AI 完成 UI 操作和视觉判断。
-- 用 `setup.ts` 和少量代码保持 YAML 干净：项目配置、登录、Cookie、账号、测试数据、远程设备连接和设备准备放在 case flow 之外。
-- 对个别特殊 case 使用 `case.test.ts`：当用例需要复杂分支、内部 SDK 或确定性校验时，用测试代码补足 YAML 不适合表达的部分。
+### 从 YAML 开始
 
-| 项目形态 | 适合场景 | 推荐形态 |
-| --- | --- | --- |
-| 直觉化用例 | 快速验证一个关键业务路径是否可用 | YAML + case。用 YAML 写 `target`、任务步骤和断言，直接通过 CLI 运行并查看回放报告 |
-| 保持 YAML 干净的 setup | 测试需要登录态、Cookie、账号、远程设备或少量环境准备 | YAML + `setup.ts`。业务路径继续留在 YAML，项目级配置和运行前准备放到 `setup.ts` |
-| 代码化特殊 case | 个别 case 需要复杂分支、内部 SDK、接口校验或复用现有测试工具 | `case.test.ts`。只给少数特殊场景使用，让它和普通 YAML case 放在同一个项目里维护 |
+对于大多数 Smoke Test 和轻量回归项目，第一个有价值的里程碑，是快速跑通核心路径，并把它变成可以重复执行、可以回放分析的 case。
 
-### 1. 直觉化用例：YAML + case
-
-这种形态只关心一件事：核心路径能不能被自然表达、快速运行和复盘。测试作者写 YAML，描述入口、操作和期望结果。Midscene 负责用 AI 执行 UI 操作、完成视觉断言、截图并生成可回放报告。
+YAML case 可以让路径保持可读：
 
 ```yaml
 target:
   type: web
   url: https://shop.example.com
 
-name: Guest checkout smoke test
 flow:
   - aiAct: Search for "running shoes"
   - aiAct: Open the first product
-  - aiAssert: The cart page shows one product and the checkout button
+  - aiQuery: product name and price
+  - aiAssert: The product detail page shows a visible Add to cart button
 ```
 
-YAML 的价值不是把测试能力做小，而是把“一个用户路径应该是什么样”组织得足够清楚。它天然适合 code review、业务确认和团队协作，也能避免简单用例一开始就被测试框架样板代码包住。
-
-### 2. 保持 YAML 干净的 setup：YAML + setup.ts
+YAML 可以把“一个用户路径应该是什么样”组织得足够清楚，便于 code review、业务确认和团队协作。围绕这个 case，Midscene 负责 AI UI 操作、视觉理解、断言、截图和报告生成。
 
-有些项目会遇到登录态、测试账号、Cookie、远程设备连接、设备准备、灰度环境等问题。这些内容不应该写进自然语言步骤里，也不应该复制到每个 YAML case 中。它们应该进入项目级配置和 `setup.ts`。
-
-这里的 `setup.ts` 是让 YAML 专注于 case flow 的实用增强能力。它定义每个 YAML case 执行前的准备逻辑，也可以承载不适合在每个文件里重复书写的项目级选项。runner 会把当前 case 的 `target` 传给 setup，setup 再返回后续执行 `flow` 所需的 `agent` 或其他运行时资源。一个轻量项目可以长这样：
+这种简单形态可以覆盖大多数早期项目：
 
 ```text
 .
-  setup.ts
   e2e/
     dashboard.yaml
     checkout.yaml
-    checkout-edge-case.test.ts
+    pricing.yaml
 ```
 
-`setup.ts` 可以 export 项目级选项，例如模型、运行参数、报告输出和需要执行的 case；它的默认导出负责项目自己的准备逻辑；`e2e/*.yaml` 只描述业务路径；少数无法优雅表达成 YAML 的特殊场景，可以用 `case.test.ts` 直接写成测试代码。
+用例本身仍然接近业务语言，runner 则提供可重复执行的过程，以及成功或失败后都可以检查的报告。
 
-`setup.ts` 的内容应该围绕“把测试带到可执行的起点”展开。它通常包含这些职责：
+### 环境配置复杂度交给 setup
 
-- 读取环境变量、测试账号、目标环境和设备 ID。
-- 完成登录，或者把已有登录态转换成浏览器 Cookie。
-- 连接远程设备平台，选择本次 case 要运行的设备。
-- 准备后端测试数据，例如创建订单、清理购物车、打开灰度开关。
-- 注册项目级 hook，例如失败时补充日志、截图或业务侧 trace id。
-- 返回当前 case 的 `agent`，让 runner 用它继续执行 YAML 里的 `flow`。
+在专业 UI Test 项目里，用例步骤通常是最清楚的部分：打开页面、执行用户路径、检查预期结果。真正的工程复杂度往往出现在用户路径周边，集中在登录态、Cookie、测试账号、灰度环境、后端测试数据、云端设备连接和设备初始化这些环境配置里。
 
-它不适合承载用户路径本身。比如“搜索商品”“打开订单详情”“确认页面展示了发票入口”这类业务动作，仍然应该留在 YAML case 里。YAML 通过 `target` 声明“跑在哪里、用什么环境和账号”，`setup.ts` 把这份声明落地成真实的浏览器、远程设备或 Agent。
+这些内容不应该复制到每个 YAML case 里。Midscene 提供 `setup.ts`，让准备逻辑进入统一的项目层：
 
-YAML 仍然描述业务路径：
+```ts
+export default async function setup() {
+  const account = await getSmokeAccount();
+  const cookies = await loginAndGetCookies(account);
+  const device = await connectCloudDevice(process.env.DEVICE_ID);
 
-```yaml
-target:
-  type: web
-  url: https://internal.example.com/dashboard
-  env: staging
-  account: smoke-user
+  return {
+    agent: await createAgent({
+      device,
+      cookies,
+    }),
+  };
+}
+```
 
-name: Check dashboard
-flow:
-  - aiAssert: The dashboard is loaded and user information is visible
+有了 setup 之后，项目结构仍然可以保持直接：
+
+```text
+.
+  setup.ts
+  e2e/
+    dashboard.yaml
+    checkout.yaml
 ```
 
-`setup.ts` 负责把测试带到正确的起点，例如登录、注入 Cookie、准备测试数据、连接远程设备或选择目标设备。它的价值是给“确定性准备逻辑”一个明确位置，让 YAML case 仍然保持轻量：
+这里的职责边界很清楚：`e2e/*.yaml` 描述用户进入页面后要完成什么，`setup.ts` 负责把测试带到可执行的起点。这样项目可以处理真实工程问题，但用例本身仍然适合 code review、业务确认和团队协作。
 
-```ts
-import { createMidsceneAgent } from './agent';
-import { createBrowserRuntime } from './browser-runtime';
-import { connectRemoteDevice } from './remote-device';
-
-export const config = {
-  cases: ['e2e/**/*.yaml', 'e2e/**/*.test.ts'],
-  report: {
-    outputDir: 'reports/midscene',
-    replay: true,
-  },
-};
-
-export default async function setup({ target }) {
-  const account = target.account ?? process.env.TEST_ACCOUNT;
-  const cookies = await loginByTestAccount(account, target.env);
-
-  const runtime =
-    target.type === 'remote-device'
-      ? await connectRemoteDevice({
-          provider: 'remote-device-lab',
-          deviceId: target.deviceId,
-        })
-      : await createBrowserRuntime({
-          url: target.url,
-          cookies,
-        });
-
-  await prepareTestData({
-    user: account,
-    scenario: 'dashboard-smoke',
-  });
+### 特殊 case 使用 TypeScript
 
-  return {
-    agent: createMidsceneAgent(runtime),
-  };
-}
+YAML 应该是主路径，同时也要给代码留出空间。少数 case 可能需要复杂分支、内部 SDK 调用、混合接口校验、数据库断言或复用现有测试工具，这类用例可以写成 `case.test.ts`。
+
+这是给特殊场景保留的工程能力：
+
+```text
+.
+  setup.ts
+  e2e/
+    dashboard.yaml
+    checkout.yaml
+    checkout-risk-control.test.ts
 ```
 
-上面的代码展示的是职责边界：配置、登录、账号、数据、远程设备连接这类工程逻辑放进 `setup.ts`；“用户进入页面后要完成什么”继续留在 YAML。这样项目可以处理真实工程问题，但用例本身仍然接近业务直觉。
+团队可以把普通业务路径继续保留在 YAML 中，把少数真正复杂的 case 放进 TypeScript。它们仍然在同一个项目里维护，并共享同一个运行时模型。
 
-如果某个 case 需要复杂分支、调用内部 SDK、混合接口校验或复用现有测试工具，可以把它写成 `case.test.ts`。这不是主路径，而是给少数特殊场景留出的逃生口，避免为了保持 YAML 纯度反而让项目变得别扭。
+## 基于 Rstest 构建
 
-## 基于 Rstest 设计
+Midscene 是基于 Rstest 封装构建的上层测试框架。Rstest 在底层提供测试生命周期、fixture 模型、并发执行、用例过滤和适合 CI 的运行时能力。它基于 Rust 编写，具备更高的执行性能，因此 Midscene 用户默认就能获得高性能的测试底座。Midscene 则把这些能力封装成自然语言用例、AI UI 操作、视觉断言、截图、回放报告和诊断信息。
 
-大多数用户不需要关心 Rstest 的项目细节。Midscene 把它放在底层，是为了让 YAML runner、`setup.ts` 和 `case.test.ts` 不只是一个轻量脚本系统，而是天然站在标准测试框架之上。
+绝大多数用户可以通过 Midscene 的 YAML runner、`setup.ts` 和 `case.test.ts` 直接使用这套底座，无需了解 Rstest 的项目细节。只有团队进入 `emit` 流程，希望直接接管生成后的 Rstest 项目时，才需要关注这些细节。
 
-换句话说，Midscene 不是用 YAML 替代测试 Runner。YAML 是面向人的用例表达层，Rstest 是底层测试框架，Midscene 在这个底座上提供 AI UI 操作、视觉断言、截图、回放报告和诊断信息。
+### Rstest 提供的工程能力
 
-### Rstest 用在这里的优势是...
+Rstest 为 Midscene 项目提供可靠的工程底座：
 
 - 用例执行有标准测试生命周期支撑，包括 setup、teardown、hook、fixture 和并发能力。
+- 基于 Rust 的运行时让 Midscene 项目默认拥有面向性能优化的执行层。
 - 项目可以自然接入 CI、测试过滤、失败报告和团队已有的测试工程习惯。
-- YAML case 和 `case.test.ts` 不会变成两套割裂系统，它们共享同一个底层运行模型。
-- 当项目从轻量 Smoke Test 长成长期维护的回归工程时，不需要重写测试体系。
+- YAML case 和 `case.test.ts` 共享同一个底层运行模型。
+- 团队可以从轻量项目开始，同时保留成长为长期回归套件的路径。
 
-### 支持 emit 成一个标准的 Rstest 项目
+### 导出标准项目
 
-对于复杂项目，case 可能更多，fixture 更复杂，CI 需要并发和分组策略，失败分析也需要接入团队自己的系统。由于轻量 Midscene 项目本来就运行在 Rstest 之上，`emit` 不是迁移到另一套框架，而是把底层 Rstest 工程显式暴露出来，让团队可以直接接管测试 Runner、fixture 和集成代码。
+对于复杂项目，case 可能更多，fixture 更复杂，CI 需要并发和分组策略，失败分析也需要接入团队自己的系统。由于轻量 Midscene 项目本来就运行在 Rstest 之上，`emit` 会把底层 Rstest 工程显式暴露出来，让团队可以直接接管测试 Runner、fixture 和集成代码。
 
 ```bash
 midscene emit ./project-folder
@@ -169,7 +142,8 @@ project-folder/
 
 在这种项目形态里，YAML 仍然可以作为自然语言用例的人类友好表达，Rstest 则提供显式的测试工程结构。团队可以把业务路径继续保留在 YAML 中，把复杂逻辑放进 Rstest 测试文件、fixture 和内部工具里。AI UI 操作和视觉断言交给 Midscene，环境编排、接口校验、数据库校验和失败归因交给团队自己的代码。
 
-## 下一步
+### 下一步
 
+- 从命令行运行 YAML case：[YAML 脚本运行器](./yaml-script-runner)
 - 查询完整 YAML 字段：[YAML 格式的工作流](./automate-with-scripts-in-yaml)
-- 查询完整 CLI 参数：[YAML 脚本运行器](./yaml-script-runner)
+- 从平台指南开始：[Android](./android-getting-started)、[iOS](./ios-getting-started)、[Computer](./computer-getting-started)

From 549bb2ea7a7d0452d510547fd4bb67cc138ecb35 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Fri, 22 May 2026 17:15:13 +0800
Subject: [PATCH 12/33] docs(site): update ui testing framework config docs

---
 apps/site/docs/en/ui-testing-framework.mdx | 72 ++++++++++++++--------
 apps/site/docs/zh/ui-testing-framework.mdx | 72 ++++++++++++++--------
 2 files changed, 96 insertions(+), 48 deletions(-)
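
Reviewer sketch for the `include` / `exclude` fields this patch adds to `midscene.config.ts`: a minimal illustration of how such globs could narrow the case list. It assumes the patterns are matched against paths relative to `testDir` and supports only `**/` and `*`. `CaseSelection`, `globToRegExp`, and `selectCases` are hypothetical names for this note, not part of `@midscene/testing-framework` or Rstest, and the real runner may resolve globs differently.

```typescript
// Hypothetical shape mirroring the include/exclude fields in the patch.
interface CaseSelection {
  include: string[];
  exclude?: string[];
}

// Convert a small glob subset ("**/" and "*") into a RegExp. Illustrative only.
function globToRegExp(glob: string): RegExp {
  let pattern = '';
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith('**/', i)) {
      pattern += '(?:.*/)?'; // zero or more leading directories
      i += 3;
    } else if (glob[i] === '*') {
      pattern += '[^/]*'; // any characters within one path segment
      i += 1;
    } else {
      // Escape regex metacharacters so "." in ".yaml" matches literally.
      pattern += glob[i].replace(/[.+?^${}()|[\]\\]/g, '\\$&');
      i += 1;
    }
  }
  return new RegExp(`^${pattern}$`);
}

// Keep files that match at least one include glob and no exclude glob.
function selectCases(
  files: string[],
  { include, exclude = [] }: CaseSelection,
): string[] {
  const matchesAny = (file: string, globs: string[]): boolean =>
    globs.some((glob) => globToRegExp(glob).test(file));
  return files.filter(
    (file) => matchesAny(file, include) && !matchesAny(file, exclude),
  );
}

const selected = selectCases(
  [
    'dashboard.yaml',
    'checkout.yaml',
    'pricing.draft.yaml',
    'checkout-risk-control.test.ts',
    'README.md',
  ],
  {
    include: ['**/*.yaml', '**/*.test.ts'],
    exclude: ['**/*.draft.yaml'],
  },
);

console.log(selected);
```

Under these assumptions, draft cases stay in the repository but are skipped, and YAML cases and `*.test.ts` cases are selected by the same rule, which is the point of keeping discovery in one config file.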

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 5acdfb8405..05afed9b4e 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -11,11 +11,11 @@ The goal is simple: make lightweight cases easy to write, and make serious test
 Midscene is designed for teams that want UI tests to stay close to user behavior:
 
 - QA and business teams can review a case by reading the YAML flow.
-- Frontend and test engineers can keep login, cookies, accounts, devices, and test data in code, leaving natural-language steps focused on the user path.
+- Frontend and test engineers can keep case matching, concurrency, reports, login, cookies, accounts, devices, and test data in code, leaving natural-language steps focused on the user path.
 - CI jobs get repeatable cases, replayable Midscene reports, screenshots, and failure details.
 - Growing projects can move from lightweight YAML cases to a standard Rstest project without rewriting the testing foundation.
 
-Midscene lets a test project start from natural-language cases: core paths stay readable and maintainable, and the first case stays lightweight. When the project needs login state, test data, device connections, complex assertions, or CI management, it still has enough room for engineering extension.
+Midscene lets a test project start from natural-language cases: core paths stay readable and maintainable, and the first case stays lightweight. When the project needs case filtering, concurrency control, login state, test data, device connections, complex assertions, reports, or CI management, it still has enough room for engineering extension.
 
 ### Start with YAML
 
@@ -49,38 +49,62 @@ That simple shape should cover most early projects:
 
 The case remains close to business language, while the runner gives it a repeatable execution and a report that can be inspected after success or failure.
 
-### Keep Environment Complexity in Setup
+### Configure the Project in TypeScript
 
-In professional UI testing projects, the case steps are usually the clearest part: open a page, perform a user path, and check the expected result. The engineering complexity tends to sit around that path, in environment configuration such as login state, cookies, test accounts, staging lanes, backend data, cloud device connections, and device initialization.
+In professional UI testing projects, the case steps are usually the clearest part: open a page, perform a user path, and check the expected result. The engineering complexity tends to sit around that path, in project configuration: which cases to run, how much concurrency to use, and where reports are written, plus login state, cookies, test accounts, staging lanes, backend data, cloud device connections, and device initialization.
 
-Those concerns should not be copied into every YAML case. Midscene provides `setup.ts` so preparation logic can live at the project layer:
+Those concerns should not be copied into every YAML case. Midscene provides `midscene.config.ts` as the project-level config-as-code entry point. It replaces the lightweight `config.yml` shape as a suite grows: test discovery, execution policy, output, and runtime setup all live in one typed file.
 
 ```ts
-export default async function setup() {
-  const account = await getSmokeAccount();
-  const cookies = await loginAndGetCookies(account);
-  const device = await connectCloudDevice(process.env.DEVICE_ID);
-
-  return {
-    agent: await createAgent({
-      device,
-      cookies,
-    }),
-  };
-}
+import { defineMidsceneConfig } from '@midscene/testing-framework';
+
+export default defineMidsceneConfig({
+  testDir: './e2e',
+  include: ['**/*.yaml', '**/*.test.ts'],
+  exclude: ['**/*.draft.yaml'],
+
+  maxConcurrency: 1,
+  bail: 0,
+  testTimeout: 120_000,
+
+  output: {
+    summary: './midscene_run/output/summary.json',
+    reportDir: './midscene_run/report',
+  },
+
+  use: {
+    baseUrl: process.env.DEMO_SITE_URL ?? 'https://shop.example.com',
+    viewport: { width: 1280, height: 800 },
+    headless: process.env.CI === 'true',
+  },
+
+  async setup({ use }) { // the helpers called below are project-specific and not shown here
+    const account = await getSmokeAccount();
+    const cookies = await loginAndGetCookies(account);
+    const device = await connectCloudDevice(process.env.DEVICE_ID);
+
+    return {
+      agent: await createAgent({
+        device,
+        cookies,
+        baseUrl: use.baseUrl,
+      }),
+    };
+  },
+});
 ```
 
-With setup in place, the project can stay direct:
+With this config in place, the project can stay direct:
 
 ```text
 .
-  setup.ts
+  midscene.config.ts
   e2e/
     dashboard.yaml
     checkout.yaml
 ```
 
-The boundary is clear: `e2e/*.yaml` describes what the user should accomplish after the page opens, while `setup.ts` gets the test to an executable starting point. The project can handle real engineering constraints, while the case itself remains friendly to code review, business confirmation, and team collaboration.
+The boundary is clear: `e2e/*.yaml` describes what the user should accomplish, while `midscene.config.ts` describes how the suite is discovered, scheduled, reported, and prepared for execution. The project can handle real engineering constraints, while the case itself remains friendly to code review, business confirmation, and team collaboration.
 
 ### Use TypeScript for Special Cases
 
@@ -90,7 +114,7 @@ This is an engineering capability for special cases:
 
 ```text
 .
-  setup.ts
+  midscene.config.ts
   e2e/
     dashboard.yaml
     checkout.yaml
@@ -103,7 +127,7 @@ Teams can keep ordinary business paths in YAML and put the few truly complex cas
 
 Midscene is built as a higher-level testing framework on top of Rstest. Rstest provides the underlying lifecycle, fixture model, parallel execution, filtering, and CI-friendly runtime. It is also written in Rust for strong execution performance, so Midscene users get a high-performance test foundation by default. Midscene wraps those capabilities with natural-language cases, AI UI actions, visual assertions, screenshots, replay reports, and diagnostics.
 
-Most users can rely on that foundation through Midscene's YAML runner, `setup.ts`, and `case.test.ts` without learning Rstest project details. Those details only become necessary when a team enters the `emit` workflow and wants to own the generated Rstest project directly.
+Most users can rely on that foundation through Midscene's YAML runner, `midscene.config.ts`, and `case.test.ts` without learning Rstest project details. The `midscene.config.ts` fields are intentionally aligned with Rstest concepts such as `include`/`exclude`, `maxConcurrency`, retry, timeout, setup, teardown, and reporters, while keeping Midscene-specific agent setup in the same place. Rstest details only become necessary when a team enters the `emit` workflow and wants to own the generated Rstest project directly.
 
 ### Why Rstest Helps
 
@@ -128,8 +152,8 @@ The emitted project can look like this:
 ```text
 project-folder/
   package.json
+  midscene.config.ts
   rstest.config.ts
-  setup.ts
   e2e/
     dashboard.test.ts
     checkout.test.ts
@@ -140,7 +164,7 @@ project-folder/
     midscene-report/
 ```
 
-In this project shape, YAML can remain the human-friendly expression for natural-language cases, while Rstest provides the visible test project structure. Teams can keep business paths in YAML and place complex logic in Rstest test files, fixtures, and internal tools. Midscene handles AI UI actions and visual assertions, while your own code handles environment orchestration, API checks, database checks, and failure analysis.
+In this project shape, YAML can remain the human-friendly expression for natural-language cases, `midscene.config.ts` remains the Midscene-facing source of truth, and Rstest provides the visible test project structure. Teams can keep business paths in YAML and place complex logic in Rstest test files, fixtures, and internal tools. Midscene handles AI UI actions and visual assertions, while your own code handles environment orchestration, API checks, database checks, and failure analysis.
 
 ### Next Steps
 
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 136af7f376..71d8ae4b3a 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -11,11 +11,11 @@ Midscene 提供了面向这个问题的 AI 原生 UI Testing Framework。它让
 Midscene 适合希望 UI Test 更接近用户行为的团队：
 
 - QA 和业务同学可以直接阅读 YAML flow 来 review 用例。
-- 前端和测试工程师可以把登录、Cookie、账号、设备和测试数据留在代码里，让自然语言步骤专注描述用户路径。
+- 前端和测试工程师可以把用例匹配、并发、报告、登录、Cookie、账号、设备和测试数据留在代码里，让自然语言步骤专注描述用户路径。
 - CI 任务可以获得可重复执行的 case、可回放的 Midscene 报告、截图和失败详情。
 - 项目从轻量 YAML case 长成标准 Rstest 工程时，不需要重写测试底座。
 
-Midscene 让测试项目可以从自然语言用例开始：核心路径可读、可维护，第一条 case 启动很轻；当项目需要登录态、测试数据、设备连接、复杂断言或 CI 管理时，也有足够的工程扩展空间。
+Midscene 让测试项目可以从自然语言用例开始：核心路径可读、可维护，第一条 case 启动很轻；当项目需要用例过滤、并发控制、登录态、测试数据、设备连接、复杂断言、报告输出或 CI 管理时，也有足够的工程扩展空间。
 
 ### 从 YAML 开始
 
@@ -49,38 +49,62 @@ YAML 可以把“一个用户路径应该是什么样”组织得足够清楚，
 
 用例本身仍然接近业务语言，runner 则提供可重复执行的过程，以及成功或失败后都可以检查的报告。
 
-### 环境配置复杂度交给 setup
+### 用 TypeScript 配置项目
 
-在专业 UI Test 项目里，用例步骤通常是最清楚的部分：打开页面、执行用户路径、检查预期结果。真正的工程复杂度往往出现在用户路径周边，集中在登录态、Cookie、测试账号、灰度环境、后端测试数据、云端设备连接和设备初始化这些环境配置里。
+在专业 UI Test 项目里，用例步骤通常是最清楚的部分：打开页面、执行用户路径、检查预期结果。真正的工程复杂度往往出现在用户路径周边，集中在要运行哪些 case、并发怎么控制、报告写到哪里、登录态、Cookie、测试账号、灰度环境、后端测试数据、云端设备连接和设备初始化这些项目配置里。
 
-这些内容不应该复制到每个 YAML case 里。Midscene 提供 `setup.ts`，让准备逻辑进入统一的项目层：
+这些内容不应该复制到每个 YAML case 里。Midscene 提供 `midscene.config.ts` 作为项目级 config-as-code 入口。它是轻量 `config.yml` 形态长大之后的替代：用例发现、执行策略、输出位置和运行时 setup 都放进同一个类型化文件里。
 
 ```ts
-export default async function setup() {
-  const account = await getSmokeAccount();
-  const cookies = await loginAndGetCookies(account);
-  const device = await connectCloudDevice(process.env.DEVICE_ID);
-
-  return {
-    agent: await createAgent({
-      device,
-      cookies,
-    }),
-  };
-}
+import { defineMidsceneConfig } from '@midscene/testing-framework';
+
+export default defineMidsceneConfig({
+  testDir: './e2e',
+  include: ['**/*.yaml', '**/*.test.ts'],
+  exclude: ['**/*.draft.yaml'],
+
+  maxConcurrency: 1,
+  bail: 0,
+  testTimeout: 120_000,
+
+  output: {
+    summary: './midscene_run/output/summary.json',
+    reportDir: './midscene_run/report',
+  },
+
+  use: {
+    baseUrl: process.env.DEMO_SITE_URL ?? 'https://shop.example.com',
+    viewport: { width: 1280, height: 800 },
+    headless: process.env.CI === 'true',
+  },
+
+  async setup({ use }) {
+    const account = await getSmokeAccount();
+    const cookies = await loginAndGetCookies(account);
+    const device = await connectCloudDevice(process.env.DEVICE_ID);
+
+    return {
+      agent: await createAgent({
+        device,
+        cookies,
+        baseUrl: use.baseUrl,
+      }),
+    };
+  },
+});
 ```
 
-有了 setup 之后，项目结构仍然可以保持直接：
+有了这个配置之后，项目结构仍然可以保持直接：
 
 ```text
 .
-  setup.ts
+  midscene.config.ts
   e2e/
     dashboard.yaml
     checkout.yaml
 ```
 
-这里的职责边界很清楚：`e2e/*.yaml` 描述用户进入页面后要完成什么，`setup.ts` 负责把测试带到可执行的起点。这样项目可以处理真实工程问题，但用例本身仍然适合 code review、业务确认和团队协作。
+这里的职责边界很清楚：`e2e/*.yaml` 描述用户要完成什么，`midscene.config.ts` 描述 suite 如何被发现、调度、报告和准备执行。这样项目可以处理真实工程问题，但用例本身仍然适合 code review、业务确认和团队协作。
 
 ### 特殊 case 使用 TypeScript
 
@@ -90,7 +114,7 @@ YAML 应该是主路径，同时也要给代码留出空间。少数 case 可能
 
 ```text
 .
-  setup.ts
+  midscene.config.ts
   e2e/
     dashboard.yaml
     checkout.yaml
@@ -103,7 +127,7 @@ YAML 应该是主路径，同时也要给代码留出空间。少数 case 可能
 
 Midscene 是基于 Rstest 封装构建的上层测试框架。Rstest 在底层提供测试生命周期、fixture 模型、并发执行、用例过滤和适合 CI 的运行时能力。它基于 Rust 编写，具备更高的执行性能，因此 Midscene 用户默认就能获得高性能的测试底座。Midscene 则把这些能力封装成自然语言用例、AI UI 操作、视觉断言、截图、回放报告和诊断信息。
 
-绝大多数用户可以通过 Midscene 的 YAML runner、`setup.ts` 和 `case.test.ts` 直接使用这套底座，无需了解 Rstest 的项目细节。只有团队进入 `emit` 流程，希望直接接管生成后的 Rstest 项目时，才需要关注这些细节。
+绝大多数用户可以通过 Midscene 的 YAML runner、`midscene.config.ts` 和 `case.test.ts` 直接使用这套底座，无需了解 Rstest 的项目细节。`midscene.config.ts` 的字段会刻意和 Rstest 的概念对齐，例如 include/exclude、maxConcurrency、retry、timeout、setup、teardown 和 reporters，同时把 Midscene 特有的 agent setup 留在同一个入口里。只有团队进入 `emit` 流程，希望直接接管生成后的 Rstest 项目时，才需要关注 Rstest 的细节。
 
 ### Rstest 提供的工程能力
 
@@ -128,8 +152,8 @@ midscene emit ./project-folder
 ```text
 project-folder/
   package.json
+  midscene.config.ts
   rstest.config.ts
-  setup.ts
   e2e/
     dashboard.test.ts
     checkout.test.ts
@@ -140,7 +164,7 @@ project-folder/
     midscene-report/
 ```
 
-在这种项目形态里，YAML 仍然可以作为自然语言用例的人类友好表达，Rstest 则提供显式的测试工程结构。团队可以把业务路径继续保留在 YAML 中，把复杂逻辑放进 Rstest 测试文件、fixture 和内部工具里。AI UI 操作和视觉断言交给 Midscene，环境编排、接口校验、数据库校验和失败归因交给团队自己的代码。
+在这种项目形态里，YAML 仍然可以作为自然语言用例的人类友好表达，`midscene.config.ts` 仍然是面向 Midscene 的配置事实来源，Rstest 则提供显式的测试工程结构。团队可以把业务路径继续保留在 YAML 中，把复杂逻辑放进 Rstest 测试文件、fixture 和内部工具里。AI UI 操作和视觉断言交给 Midscene，环境编排、接口校验、数据库校验和失败归因交给团队自己的代码。
 
 ### 下一步
 

From 3b45c9b12193d6ab57575c1a9338f74dfddfe4ec Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Fri, 22 May 2026 18:56:25 +0800
Subject: [PATCH 13/33] docs(site): refine midscene config schema
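
The `setup` example in this patch pulls `deviceId` out of
`runtimeOptions` and spreads the remaining keys over `agentOptions`.
The sketch below only illustrates that destructure-and-merge step with
plain objects. `splitAndMerge`, the two type aliases, and the sample
values such as `emulator-5554` are hypothetical and are not part of
any Midscene package.

```ts
// Hypothetical illustration; option names mirror the docs example only.
type RuntimeOptions = { deviceId?: string; [key: string]: unknown };
type AgentOptions = Record<string, unknown>;

function splitAndMerge(
  agentOptions: AgentOptions,
  runtimeOptions: RuntimeOptions,
) {
  // Separate the device id from the rest of the runtime options.
  const { deviceId, ...deviceOptions } = runtimeOptions;

  // Later spreads win, so a runtime option overrides an agent option
  // that uses the same key.
  return { deviceId, options: { ...agentOptions, ...deviceOptions } };
}

// Sample values are made up for the sketch.
const merged = splitAndMerge(
  { cache: true, generateReport: true },
  { deviceId: 'emulator-5554', autoDismissKeyboard: false, cache: false },
);
```

Because `deviceOptions` is spread last, any key defined in both
objects takes its value from `runtimeOptions`.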

---
 apps/site/docs/en/ui-testing-framework.mdx | 39 +++++++++++++---------
 apps/site/docs/zh/ui-testing-framework.mdx | 39 +++++++++++++---------
 2 files changed, 48 insertions(+), 30 deletions(-)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 05afed9b4e..2e15e601c0 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -56,38 +56,47 @@ In professional UI testing projects, the case steps are usually the clearest par
 Those concerns should not be copied into every YAML case. Midscene provides `midscene.config.ts` as the project-level config-as-code entry. It replaces the lightweight `config.yml` shape when a suite grows up: test discovery, execution policy, output, and runtime setup all live in one typed file.
 
 ```ts
+import { agentFromAdbDevice } from '@midscene/android';
 import { defineMidsceneConfig } from '@midscene/testing-framework';
 
 export default defineMidsceneConfig({
+  platform: 'android',
   testDir: './e2e',
   include: ['**/*.yaml', '**/*.test.ts'],
   exclude: ['**/*.draft.yaml'],
 
-  maxConcurrency: 1,
-  bail: 0,
-  testTimeout: 120_000,
+  testRunner: {
+    maxConcurrency: 1,
+    bail: 0,
+    testTimeout: 120_000,
+  },
 
   output: {
     summary: './midscene_run/output/summary.json',
     reportDir: './midscene_run/report',
   },
 
-  use: {
-    baseUrl: process.env.DEMO_SITE_URL ?? 'https://shop.example.com',
-    viewport: { width: 1280, height: 800 },
-    headless: process.env.CI === 'true',
+  agentOptions: {
+    aiActionContext: 'The user is already signed in as a smoke-test account.',
+    cache: true,
+    generateReport: true,
+  },
+
+  runtimeOptions: {
+    deviceId: process.env.ANDROID_DEVICE_ID,
+    androidAdbPath: process.env.ANDROID_ADB_PATH,
+    autoDismissKeyboard: false,
   },
 
-  async setup({ use }) {
+  async setup({ agentOptions, runtimeOptions }) {
+    const { deviceId, ...deviceOptions } = runtimeOptions;
     const account = await getSmokeAccount();
-    const cookies = await loginAndGetCookies(account);
-    const device = await connectCloudDevice(process.env.DEVICE_ID);
+    await prepareTestData(account);
 
     return {
-      agent: await createAgent({
-        device,
-        cookies,
-        baseUrl: use.baseUrl,
+      agent: await agentFromAdbDevice(deviceId, {
+        ...agentOptions,
+        ...deviceOptions,
       }),
     };
   },
@@ -104,7 +113,7 @@ With this config in place, the project can stay direct:
     checkout.yaml
 ```
 
-The boundary is clear: `e2e/*.yaml` describes what the user should accomplish, while `midscene.config.ts` describes how the suite is discovered, scheduled, reported, and prepared for execution. The project can handle real engineering constraints, while the case itself remains friendly to code review, business confirmation, and team collaboration.
+The boundary is clear: `e2e/*.yaml` describes what the user should accomplish, while `midscene.config.ts` describes the platform, testRunner behavior, shared Agent options, runtime connection options, reporting, and setup logic. The project can handle real engineering constraints, while the case itself remains friendly to code review, business confirmation, and team collaboration.
 
 ### Use TypeScript for Special Cases
 
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 71d8ae4b3a..44b97fecee 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -56,38 +56,47 @@ YAML 可以把“一个用户路径应该是什么样”组织得足够清楚，
 这些内容不应该复制到每个 YAML case 里。Midscene 提供 `midscene.config.ts` 作为项目级 config-as-code 入口。它是轻量 `config.yml` 形态长大之后的替代：用例发现、执行策略、输出位置和运行时 setup 都放进同一个类型化文件里。
 
 ```ts
+import { agentFromAdbDevice } from '@midscene/android';
 import { defineMidsceneConfig } from '@midscene/testing-framework';
 
 export default defineMidsceneConfig({
+  platform: 'android',
   testDir: './e2e',
   include: ['**/*.yaml', '**/*.test.ts'],
   exclude: ['**/*.draft.yaml'],
 
-  maxConcurrency: 1,
-  bail: 0,
-  testTimeout: 120_000,
+  testRunner: {
+    maxConcurrency: 1,
+    bail: 0,
+    testTimeout: 120_000,
+  },
 
   output: {
     summary: './midscene_run/output/summary.json',
     reportDir: './midscene_run/report',
   },
 
-  use: {
-    baseUrl: process.env.DEMO_SITE_URL ?? 'https://shop.example.com',
-    viewport: { width: 1280, height: 800 },
-    headless: process.env.CI === 'true',
+  agentOptions: {
+    aiActionContext: 'The user is already signed in as a smoke-test account.',
+    cache: true,
+    generateReport: true,
+  },
+
+  runtimeOptions: {
+    deviceId: process.env.ANDROID_DEVICE_ID,
+    androidAdbPath: process.env.ANDROID_ADB_PATH,
+    autoDismissKeyboard: false,
   },
 
-  async setup({ use }) {
+  async setup({ agentOptions, runtimeOptions }) {
+    const { deviceId, ...deviceOptions } = runtimeOptions;
     const account = await getSmokeAccount();
-    const cookies = await loginAndGetCookies(account);
-    const device = await connectCloudDevice(process.env.DEVICE_ID);
+    await prepareTestData(account);
 
     return {
-      agent: await createAgent({
-        device,
-        cookies,
-        baseUrl: use.baseUrl,
+      agent: await agentFromAdbDevice(deviceId, {
+        ...agentOptions,
+        ...deviceOptions,
       }),
     };
   },
@@ -104,7 +113,7 @@ export default defineMidsceneConfig({
     checkout.yaml
 ```
 
-这里的职责边界很清楚：`e2e/*.yaml` 描述用户要完成什么，`midscene.config.ts` 描述 suite 如何被发现、调度、报告和准备执行。这样项目可以处理真实工程问题，但用例本身仍然适合 code review、业务确认和团队协作。
+这里的职责边界很清楚：`e2e/*.yaml` 描述用户要完成什么，`midscene.config.ts` 描述平台、testRunner 行为、共享 Agent 参数、运行时连接参数、报告和 setup 逻辑。这样项目可以处理真实工程问题，但用例本身仍然适合 code review、业务确认和团队协作。
 
 ### 特殊 case 使用 TypeScript
 

From de7142f23cd8c237af39ebb3c28dae29461ce514 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Mon, 25 May 2026 16:41:55 +0800
Subject: [PATCH 14/33] chore(core): update docs

Replace the top-level `platform` and `runtimeOptions` fields in the
`midscene.config.ts` example with a single `target` block (`type` plus
`options`). `setup` now receives `createAgent` and no longer imports
`agentFromAdbDevice` directly. Only the en and zh pages under
apps/site/docs change.

---
 apps/site/docs/en/ui-testing-framework.mdx | 27 ++++++++++------------
 apps/site/docs/zh/ui-testing-framework.mdx | 27 ++++++++++------------
 2 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 2e15e601c0..78ce2bbcff 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -56,11 +56,18 @@ In professional UI testing projects, the case steps are usually the clearest par
 Those concerns should not be copied into every YAML case. Midscene provides `midscene.config.ts` as the project-level config-as-code entry. It replaces the lightweight `config.yml` shape when a suite grows up: test discovery, execution policy, output, and runtime setup all live in one typed file.
 
 ```ts
-import { agentFromAdbDevice } from '@midscene/android';
 import { defineMidsceneConfig } from '@midscene/testing-framework';
 
 export default defineMidsceneConfig({
-  platform: 'android',
+  target: {
+    type: 'android',
+    options: {
+      deviceId: process.env.ANDROID_DEVICE_ID,
+      androidAdbPath: process.env.ANDROID_ADB_PATH,
+      autoDismissKeyboard: false,
+    },
+  },
+
   testDir: './e2e',
   include: ['**/*.yaml', '**/*.test.ts'],
   exclude: ['**/*.draft.yaml'],
@@ -82,22 +89,12 @@ export default defineMidsceneConfig({
     generateReport: true,
   },
 
-  runtimeOptions: {
-    deviceId: process.env.ANDROID_DEVICE_ID,
-    androidAdbPath: process.env.ANDROID_ADB_PATH,
-    autoDismissKeyboard: false,
-  },
-
-  async setup({ agentOptions, runtimeOptions }) {
-    const { deviceId, ...deviceOptions } = runtimeOptions;
+  async setup({ createAgent }) {
     const account = await getSmokeAccount();
     await prepareTestData(account);
 
     return {
-      agent: await agentFromAdbDevice(deviceId, {
-        ...agentOptions,
-        ...deviceOptions,
-      }),
+      agent: await createAgent(),
     };
   },
 });
@@ -113,7 +110,7 @@ With this config in place, the project can stay direct:
     checkout.yaml
 ```
 
-The boundary is clear: `e2e/*.yaml` describes what the user should accomplish, while `midscene.config.ts` describes the platform, testRunner behavior, shared Agent options, runtime connection options, reporting, and setup logic. The project can handle real engineering constraints, while the case itself remains friendly to code review, business confirmation, and team collaboration.
+The boundary is clear: `e2e/*.yaml` describes what the user should accomplish, while `midscene.config.ts` describes the target type and platform connection options, testRunner behavior, shared Agent options, reporting, and setup logic. By default, the framework creates the Agent from `target.type` and `target.options`; projects only need to override that step in `setup` when they use custom devices, remote services, or internal team fixtures. The project can handle real engineering constraints, while the case itself remains friendly to code review, business confirmation, and team collaboration.
 
 ### Use TypeScript for Special Cases
 
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 44b97fecee..51963af6ec 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -56,11 +56,18 @@ YAML 可以把“一个用户路径应该是什么样”组织得足够清楚，
 这些内容不应该复制到每个 YAML case 里。Midscene 提供 `midscene.config.ts` 作为项目级 config-as-code 入口。它是轻量 `config.yml` 形态长大之后的替代：用例发现、执行策略、输出位置和运行时 setup 都放进同一个类型化文件里。
 
 ```ts
-import { agentFromAdbDevice } from '@midscene/android';
 import { defineMidsceneConfig } from '@midscene/testing-framework';
 
 export default defineMidsceneConfig({
-  platform: 'android',
+  target: {
+    type: 'android',
+    options: {
+      deviceId: process.env.ANDROID_DEVICE_ID,
+      androidAdbPath: process.env.ANDROID_ADB_PATH,
+      autoDismissKeyboard: false,
+    },
+  },
+
   testDir: './e2e',
   include: ['**/*.yaml', '**/*.test.ts'],
   exclude: ['**/*.draft.yaml'],
@@ -82,22 +89,12 @@ export default defineMidsceneConfig({
     generateReport: true,
   },
 
-  runtimeOptions: {
-    deviceId: process.env.ANDROID_DEVICE_ID,
-    androidAdbPath: process.env.ANDROID_ADB_PATH,
-    autoDismissKeyboard: false,
-  },
-
-  async setup({ agentOptions, runtimeOptions }) {
-    const { deviceId, ...deviceOptions } = runtimeOptions;
+  async setup({ createAgent }) {
     const account = await getSmokeAccount();
     await prepareTestData(account);
 
     return {
-      agent: await agentFromAdbDevice(deviceId, {
-        ...agentOptions,
-        ...deviceOptions,
-      }),
+      agent: await createAgent(),
     };
   },
 });
@@ -113,7 +110,7 @@ export default defineMidsceneConfig({
     checkout.yaml
 ```
 
-这里的职责边界很清楚：`e2e/*.yaml` 描述用户要完成什么，`midscene.config.ts` 描述平台、testRunner 行为、共享 Agent 参数、运行时连接参数、报告和 setup 逻辑。这样项目可以处理真实工程问题，但用例本身仍然适合 code review、业务确认和团队协作。
+这里的职责边界很清楚：`e2e/*.yaml` 描述用户要完成什么，`midscene.config.ts` 描述 target 类型和平台连接参数、testRunner 行为、共享 Agent 参数、报告和 setup 逻辑。默认情况下，框架会根据 `target.type` 和 `target.options` 创建 Agent；只有项目需要接入自定义设备、远程服务或团队内部 fixture 时，才需要在 `setup` 里覆盖这一步。这样项目可以处理真实工程问题，但用例本身仍然适合 code review、业务确认和团队协作。
 
 ### 特殊 case 使用 TypeScript
 

From 474431992d92068aaa8690df171213ffe634d5fa Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Mon, 25 May 2026 16:51:56 +0800
Subject: [PATCH 15/33] docs(site): update UI testing target config design

---
 apps/site/docs/en/ui-testing-framework.mdx | 41 ++++++++++++++++------
 apps/site/docs/zh/ui-testing-framework.mdx | 41 ++++++++++++++++------
 2 files changed, 62 insertions(+), 20 deletions(-)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 78ce2bbcff..55d94f57de 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -88,15 +88,6 @@ export default defineMidsceneConfig({
     cache: true,
     generateReport: true,
   },
-
-  async setup({ createAgent }) {
-    const account = await getSmokeAccount();
-    await prepareTestData(account);
-
-    return {
-      agent: await createAgent(),
-    };
-  },
 });
 ```
 
@@ -110,7 +101,37 @@ With this config in place, the project can stay direct:
     checkout.yaml
 ```
 
-The boundary is clear: `e2e/*.yaml` describes what the user should accomplish, while `midscene.config.ts` describes the target type and platform connection options, testRunner behavior, shared Agent options, reporting, and setup logic. By default, the framework creates the Agent from `target.type` and `target.options`; projects only need to override that step in `setup` when they use custom devices, remote services, or internal team fixtures. The project can handle real engineering constraints, while the case itself remains friendly to code review, business confirmation, and team collaboration.
+The boundary is clear: `e2e/*.yaml` describes what the user should accomplish, while `midscene.config.ts` describes the target type and platform connection options, testRunner behavior, shared Agent options, and reporting. By default, the framework creates the Agent from `target.type` and `target.options`, so ordinary projects do not need to write `setup`. If a project needs custom devices, remote services, or internal team fixtures, it can create the Agent entirely inside `setup`; in that case, `target` can be omitted to avoid defining the runtime target twice in the same config. The project can handle real engineering constraints, while the case itself remains friendly to code review, business confirmation, and team collaboration.
+
+When a project really needs a fully custom runtime target, it can put Agent creation in `setup`:
+
+```ts
+import { agentFromAdbDevice } from '@midscene/android';
+import { defineMidsceneConfig } from '@midscene/testing-framework';
+
+export default defineMidsceneConfig({
+  testDir: './e2e',
+
+  agentOptions: {
+    aiActionContext: 'The user is already signed in as a smoke-test account.',
+    cache: true,
+    generateReport: true,
+  },
+
+  async setup({ agentOptions }) {
+    const account = await getSmokeAccount();
+    await prepareTestData(account);
+
+    return {
+      agent: await agentFromAdbDevice(process.env.ANDROID_DEVICE_ID, {
+        ...agentOptions,
+        androidAdbPath: process.env.ANDROID_ADB_PATH,
+        autoDismissKeyboard: false,
+      }),
+    };
+  },
+});
+```
 
 ### Use TypeScript for Special Cases
 
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 51963af6ec..6d4aced1fb 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -88,15 +88,6 @@ export default defineMidsceneConfig({
     cache: true,
     generateReport: true,
   },
-
-  async setup({ createAgent }) {
-    const account = await getSmokeAccount();
-    await prepareTestData(account);
-
-    return {
-      agent: await createAgent(),
-    };
-  },
 });
 ```
 
@@ -110,7 +101,37 @@ export default defineMidsceneConfig({
     checkout.yaml
 ```
 
-这里的职责边界很清楚：`e2e/*.yaml` 描述用户要完成什么，`midscene.config.ts` 描述 target 类型和平台连接参数、testRunner 行为、共享 Agent 参数、报告和 setup 逻辑。默认情况下，框架会根据 `target.type` 和 `target.options` 创建 Agent；只有项目需要接入自定义设备、远程服务或团队内部 fixture 时，才需要在 `setup` 里覆盖这一步。这样项目可以处理真实工程问题，但用例本身仍然适合 code review、业务确认和团队协作。
+这里的职责边界很清楚：`e2e/*.yaml` 描述用户要完成什么，`midscene.config.ts` 描述 target 类型和平台连接参数、testRunner 行为、共享 Agent 参数和报告。默认情况下，框架会根据 `target.type` 和 `target.options` 创建 Agent，普通项目不需要编写 `setup`。如果项目需要接入自定义设备、远程服务或团队内部 fixture，可以在 `setup` 里完全自行创建 Agent；这时 `target` 可以省略，避免同一份配置里出现两套运行目标定义。这样项目可以处理真实工程问题，但用例本身仍然适合 code review、业务确认和团队协作。
+
+当项目确实需要完全自定义运行目标时，可以把 Agent 创建逻辑放进 `setup`：
+
+```ts
+import { agentFromAdbDevice } from '@midscene/android';
+import { defineMidsceneConfig } from '@midscene/testing-framework';
+
+export default defineMidsceneConfig({
+  testDir: './e2e',
+
+  agentOptions: {
+    aiActionContext: 'The user is already signed in as a smoke-test account.',
+    cache: true,
+    generateReport: true,
+  },
+
+  async setup({ agentOptions }) {
+    const account = await getSmokeAccount();
+    await prepareTestData(account);
+
+    return {
+      agent: await agentFromAdbDevice(process.env.ANDROID_DEVICE_ID, {
+        ...agentOptions,
+        androidAdbPath: process.env.ANDROID_ADB_PATH,
+        autoDismissKeyboard: false,
+      }),
+    };
+  },
+});
+```
 
 ### 特殊 case 使用 TypeScript
 

From 349bd80e8e8a1ac6dd946c96cf69daea490d256d Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Mon, 25 May 2026 17:26:43 +0800
Subject: [PATCH 16/33] docs(site): document custom YAML steps
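
The new section describes a small lookup rule: a `flow` step key that
matches a built-in step runs the built-in behavior, a key registered
under `yamlSteps` calls the registered handler, and anything else is
reported as an unknown step. A rough sketch of that rule follows. It
is an illustration under assumptions, not the runner's implementation:
`resolveStep`, `BUILTIN_STEPS`, and the types are hypothetical names,
and the built-in list holds only the two keys used in the YAML example.

```ts
// Hypothetical sketch; none of these names are Midscene APIs.
type StepHandler = (value: unknown, ctx: unknown) => void | Promise<void>;

type ResolvedStep =
  | { kind: 'builtin'; key: string; value: unknown }
  | { kind: 'custom'; key: string; value: unknown; handler: StepHandler };

// Assumption: only the two built-in keys that appear in the YAML example.
const BUILTIN_STEPS: ReadonlySet<string> = new Set(['aiAct', 'aiAssert']);

function resolveStep(
  step: Record<string, unknown>,
  yamlSteps: Record<string, StepHandler>,
): ResolvedStep {
  const keys = Object.keys(step);

  // Built-in keys are checked first, so a custom step cannot shadow them.
  for (const key of keys) {
    if (BUILTIN_STEPS.has(key)) {
      return { kind: 'builtin', key, value: step[key] };
    }
  }

  // Otherwise look for a handler registered under `yamlSteps`.
  for (const key of keys) {
    const handler = yamlSteps[key];
    if (Object.prototype.hasOwnProperty.call(yamlSteps, key) && handler) {
      return { kind: 'custom', key, value: step[key], handler };
    }
  }

  // Neither side knows the key: report an unknown step.
  throw new Error(`Unknown step: ${keys.join(', ')}`);
}
```

Checking built-ins before `yamlSteps` is one way to honor the
statement that custom steps should not override built-in step names.
A real runner could instead reject the conflict at registration time.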

---
 apps/site/docs/en/ui-testing-framework.mdx | 60 ++++++++++++++++++++++
 apps/site/docs/zh/ui-testing-framework.mdx | 60 ++++++++++++++++++++++
 2 files changed, 120 insertions(+)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 55d94f57de..f07e440847 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -133,6 +133,66 @@ export default defineMidsceneConfig({
 });
 ```
 
+### Extend YAML Steps
+
+YAML should stay readable, but some teams need a few project-specific actions around the natural-language path: seeding test data, switching accounts, calling an internal API, checking a database, or wrapping a repeated business action. These actions can be registered as custom YAML steps in `midscene.config.ts`.
+
+```ts
+import { defineMidsceneConfig } from '@midscene/testing-framework';
+
+export default defineMidsceneConfig({
+  target: {
+    type: 'web',
+    options: {
+      url: 'http://127.0.0.1:3000',
+    },
+  },
+
+  testDir: './e2e',
+  include: ['**/*.yaml'],
+
+  yamlSteps: {
+    async useAccount(value, ctx) {
+      if (typeof value !== 'string') {
+        throw new Error('useAccount expects an account name');
+      }
+
+      const account = await getAccount(value);
+      ctx.state.account = account;
+      await ctx.agent.aiAct(`Sign in as ${account.email}`);
+    },
+
+    async assertOrderStatus(value) {
+      const { orderId, status } = value as {
+        orderId: string;
+        status: string;
+      };
+
+      const order = await queryOrder(orderId);
+      if (order.status !== status) {
+        throw new Error(
+          `Expected order ${orderId} to be ${status}, got ${order.status}`,
+        );
+      }
+    },
+  },
+});
+```
+
+The YAML file can then mix Midscene built-in steps with project steps:
+
+```yaml
+flow:
+  - useAccount: smoke-buyer
+  - aiAct: Open my orders
+  - assertOrderStatus:
+      orderId: E2E-10001
+      status: paid
+  - aiAssert: The order detail page shows Paid
+```
+
+The rule is intentionally small: if a `flow` step key matches a built-in Midscene step, Midscene runs the built-in behavior; if it matches `yamlSteps`, Midscene calls the registered function with the YAML value and the current step context. If neither exists, the runner reports an unknown step. Custom steps should not override built-in step names. Midscene does not require a schema for custom step values; teams can validate values inside the handler in whatever style fits their project.
+
 ### Use TypeScript for Special Cases
 
 YAML should be the main path while still leaving room for code. Some cases need complex branches, internal SDK calls, mixed API checks, database assertions, or existing test utilities. Those cases can be written as `case.test.ts`.
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 6d4aced1fb..19e24abbcf 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -133,6 +133,66 @@ export default defineMidsceneConfig({
 });
 ```
 
+### 扩展 YAML Step
+
+YAML 应该保持可读，但有些团队需要在自然语言路径周围放少量项目专属动作：准备测试数据、切换账号、调用内部 API、检查数据库，或封装重复的业务动作。这类动作可以在 `midscene.config.ts` 里注册成自定义 YAML step。
+
+```ts
+import { defineMidsceneConfig } from '@midscene/testing-framework';
+
+export default defineMidsceneConfig({
+  target: {
+    type: 'web',
+    options: {
+      url: 'http://127.0.0.1:3000',
+    },
+  },
+
+  testDir: './e2e',
+  include: ['**/*.yaml'],
+
+  yamlSteps: {
+    async useAccount(value, ctx) {
+      if (typeof value !== 'string') {
+        throw new Error('useAccount expects an account name');
+      }
+
+      const account = await getAccount(value);
+      ctx.state.account = account;
+      await ctx.agent.aiAct(`Sign in as ${account.email}`);
+    },
+
+    async assertOrderStatus(value) {
+      const { orderId, status } = value as {
+        orderId: string;
+        status: string;
+      };
+
+      const order = await queryOrder(orderId);
+      if (order.status !== status) {
+        throw new Error(
+          `Expected order ${orderId} to be ${status}, got ${order.status}`,
+        );
+      }
+    },
+  },
+});
+```
+
+YAML 文件可以混用 Midscene 内置 step 和项目自己的 step：
+
+```yaml
+flow:
+  - useAccount: smoke-buyer
+  - aiAct: Open my orders
+  - assertOrderStatus:
+      orderId: E2E-10001
+      status: paid
+  - aiAssert: The order detail page shows Paid
+```
+
+规则刻意保持很小：如果 `flow` 里的 step key 命中 Midscene 内置 step，就执行内置行为；如果命中 `yamlSteps`，就把 YAML value 和当前 step context 传给注册函数；两边都没有命中时，runner 报 unknown step。自定义 step 不应该覆盖内置 step 名。Midscene 不要求为自定义 step value 编写 schema；团队可以在 handler 里按自己的工程习惯做参数校验。
+
 ### 特殊 case 使用 TypeScript
 
 YAML 应该是主路径，同时也要给代码留出空间。少数 case 可能需要复杂分支、内部 SDK 调用、混合接口校验、数据库断言或复用现有测试工具，这类用例可以写成 `case.test.ts`。

From 8f6e8ef167196aa7d301c0870e2432f5f611afcc Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Tue, 2 Jun 2026 23:49:38 -0700
Subject: [PATCH 17/33] docs(site): update ui testing framework docs

---
 apps/site/docs/en/ui-testing-framework.mdx | 183 +++++++++++++--------
 apps/site/docs/zh/ui-testing-framework.mdx | 182 ++++++++++++--------
 2 files changed, 225 insertions(+), 140 deletions(-)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index f07e440847..5386552ed9 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -2,9 +2,9 @@
 
 UI tests are valuable only when teams can keep writing and maintaining them. In many projects, the first few browser scripts are easy, but the suite soon fills with selectors, waits, login helpers, data setup, and failure screenshots that only test specialists can understand.
 
-Midscene provides an AI-native UI Testing Framework for this problem. It lets teams describe user paths in natural language, run them as structured cases, and review the result through screenshots, replay reports, and diagnostics. Test authors can start with YAML before adding test framework boilerplate, while engineers still have TypeScript extension points for setup, data, devices, deterministic checks, and existing test projects.
+Midscene provides an AI-native UI Testing Framework for this problem. It lets teams describe user paths in natural language, run them as structured cases, and review the result through screenshots, replay reports, and diagnostics. Test authors can start with YAML before adding test framework boilerplate, while engineers still have TypeScript extension points for UI Agent creation, data, devices, deterministic checks, and existing test projects.
 
-The goal is simple: make lightweight cases easy to write, and make serious test projects possible without changing the authoring model later.
+The goal is simple: make lightweight cases easy to write, and make serious test projects possible without changing the authoring model later. From there, Midscene moves UI testing toward Agentic Testing: everything starts from the UI, but testing should not stop at the UI. A trustworthy test conclusion should be able to connect page behavior with API responses, database state, logs, analytics, and the tools a team already relies on.
 
 ## From YAML to an Engineering-ready Project
 
@@ -13,7 +13,7 @@ Midscene is designed for teams that want UI tests to stay close to user behavior
 - QA and business teams can review a case by reading the YAML flow.
 - Frontend and test engineers can keep case matching, concurrency, reports, login, cookies, accounts, devices, and test data in code, leaving natural-language steps focused on the user path.
 - CI jobs get repeatable cases, replayable Midscene reports, screenshots, and failure details.
-- Growing projects can move from lightweight YAML cases to a standard Rstest project without rewriting the testing foundation.
+- Growing projects can move from lightweight YAML cases to long-lived regression suites without rewriting the testing foundation.
 
 Midscene lets a test project start from natural-language cases: core paths stay readable and maintainable, and the first case stays lightweight. When the project needs case filtering, concurrency control, login state, test data, device connections, complex assertions, reports, or CI management, it still has enough room for engineering extension.
 
@@ -29,10 +29,13 @@ target:
   url: https://shop.example.com
 
 flow:
-  - aiAct: Search for "running shoes"
-  - aiAct: Open the first product
-  - aiQuery: product name and price
-  - aiAssert: The product detail page shows a visible Add to cart button
+  - ui: Search for "running shoes"
+  - ui: Open the first product
+  - ui: |
+      Read the product name and price.
+
+      Record them in the conclusion.
+  - verify: The product detail page shows a visible Add to cart button
 ```
 
 YAML makes "what this user path should do" clear enough for review, business confirmation, and team collaboration. Midscene handles the AI UI actions, visual understanding, assertions, screenshots, and report generation around that case.
@@ -53,7 +56,7 @@ The case remains close to business language, while the runner gives it a repeata
 
 In professional UI testing projects, the case steps are usually the clearest part: open a page, perform a user path, and check the expected result. The engineering complexity tends to sit around that path, in project configuration such as which cases to run, how much concurrency to use, where reports should be written, login state, cookies, test accounts, staging lanes, backend data, cloud device connections, and device initialization.
 
-Those concerns should not be copied into every YAML case. Midscene provides `midscene.config.ts` as the project-level config-as-code entry. It replaces the lightweight `config.yml` shape when a suite grows up: test discovery, execution policy, output, and runtime setup all live in one typed file.
+Those concerns should not be copied into every YAML case. Midscene provides `midscene.config.ts` as the project-level config-as-code entry. It replaces the lightweight `config.yml` shape when a suite grows up: test discovery, execution policy, output, UI Agent creation, and runtime extensions all live in one typed file.
 
 ```ts
 import { defineMidsceneConfig } from '@midscene/testing-framework';
@@ -83,8 +86,8 @@ export default defineMidsceneConfig({
     reportDir: './midscene_run/report',
   },
 
-  agentOptions: {
-    aiActionContext: 'The user is already signed in as a smoke-test account.',
+  uiAgentOptions: {
+    aiActContext: 'The user is already signed in as a smoke-test account.',
     cache: true,
     generateReport: true,
   },
@@ -101,9 +104,9 @@ With this config in place, the project can stay direct:
     checkout.yaml
 ```
 
-The boundary is clear: `e2e/*.yaml` describes what the user should accomplish, while `midscene.config.ts` describes the target type and platform connection options, testRunner behavior, shared Agent options, and reporting. By default, the framework creates the Agent from `target.type` and `target.options`, so ordinary projects do not need to write `setup`. If a project needs custom devices, remote services, or internal team fixtures, it can create the Agent entirely inside `setup`; in that case, `target` can be omitted to avoid defining the runtime target twice in the same config. The project can handle real engineering constraints, while the case itself remains friendly to code review, business confirmation, and team collaboration.
+The boundary is clear: `e2e/*.yaml` describes what the user should accomplish, while `midscene.config.ts` describes the target type and platform connection options, testRunner behavior, shared UI Agent options, and reporting. By default, the framework creates the UI Agent from `target.type` and `target.options`, so ordinary projects do not need to write `createUIAgent`. If a project needs custom devices, remote services, or internal team fixtures, it can create the UI Agent entirely inside `createUIAgent`; in that case, `target` can be omitted to avoid defining the runtime target twice in the same config. The project can handle real engineering constraints, while the case itself remains friendly to code review, business confirmation, and team collaboration.
 
-When a project really needs a fully custom runtime target, it can put Agent creation in `setup`:
+When a project really needs a fully custom runtime target, it can put UI Agent creation in `createUIAgent`:
 
 ```ts
 import { agentFromAdbDevice } from '@midscene/android';
@@ -112,19 +115,19 @@ import { defineMidsceneConfig } from '@midscene/testing-framework';
 export default defineMidsceneConfig({
   testDir: './e2e',
 
-  agentOptions: {
-    aiActionContext: 'The user is already signed in as a smoke-test account.',
+  uiAgentOptions: {
+    aiActContext: 'The user is already signed in as a smoke-test account.',
     cache: true,
     generateReport: true,
   },
 
-  async setup({ agentOptions }) {
+  async createUIAgent({ uiAgentOptions }) {
     const account = await getSmokeAccount();
     await prepareTestData(account);
 
     return {
       agent: await agentFromAdbDevice(process.env.ANDROID_DEVICE_ID, {
-        ...agentOptions,
+        ...uiAgentOptions,
         androidAdbPath: process.env.ANDROID_ADB_PATH,
         autoDismissKeyboard: false,
       }),
@@ -133,12 +136,15 @@ export default defineMidsceneConfig({
 });
 ```
 
-### Extend YAML Steps
+### Extend YAML Nodes with Runtime
 
-YAML should stay readable, but some teams need a few project-specific actions around the natural-language path: seeding test data, switching accounts, calling an internal API, checking a database, or wrapping a repeated business action. These actions can be registered as custom YAML steps in `midscene.config.ts`.
+YAML should stay readable, but some teams need a few project-specific actions around the natural-language path: seeding test data, switching accounts, calling an internal API, sending notifications, or wrapping a repeated business action. These actions can be registered as custom runtimes in `midscene.config.ts` and used as new YAML nodes.
 
 ```ts
-import { defineMidsceneConfig } from '@midscene/testing-framework';
+import {
+  defineMidsceneConfig,
+  defineRuntime,
+} from '@midscene/testing-framework';
 
 export default defineMidsceneConfig({
   target: {
@@ -151,30 +157,21 @@ export default defineMidsceneConfig({
   testDir: './e2e',
   include: ['**/*.yaml'],
 
-  yamlSteps: {
-    async useAccount(value, ctx) {
+  runtime: {
+    useAccount: defineRuntime(async ({ input, context }) => {
+      const value = input;
       if (typeof value !== 'string') {
         throw new Error('useAccount expects an account name');
       }
 
       const account = await getAccount(value);
-      ctx.state.account = account;
-      await ctx.agent.aiAct(`Sign in as ${account.email}`);
-    },
+      context.state.account = account;
+      await signInWithAccount(context.agent, account);
 
-    async assertOrderStatus(value) {
-      const { orderId, status } = value as {
-        orderId: string;
-        status: string;
+      return {
+        conclusion: `Signed in as ${account.email}`,
       };
-
-      const order = await queryOrder(orderId);
-      if (order.status !== status) {
-        throw new Error(
-          `Expected order ${orderId} to be ${status}, got ${order.status}`,
-        );
-      }
-    },
+    }),
   },
 });
 ```
@@ -184,14 +181,87 @@ The YAML file can then mix Midscene built-in steps with project steps:
 ```yaml
 flow:
   - useAccount: smoke-buyer
-  - aiAct: Open my orders
-  - assertOrderStatus:
-      orderId: E2E-10001
-      status: paid
-  - aiAssert: The order detail page shows Paid
+  - ui: Open my orders
+  - verify: |
+      Use $database to verify that order E2E-10001 has status paid.
+  - verify: The order detail page shows Paid
+```
+
+The rule is intentionally small: `ui` is the unified entry for interface behavior, `verify` is the unified entry for assertions and external evidence, and `agent` is the entry for analysis, attribution, and cross-tool orchestration. A `$name` such as `$database` refers to a skill, and the runtime engine resolves and loads it automatically. `runtime` is for extending YAML with new nodes such as `useAccount` based on project needs. Teams decide which nodes to add and how many to add according to their own business and engineering habits. Midscene does not require a schema for custom node values; teams can validate values inside runtime handlers in whatever style fits their project.
+
+### Toward Agentic Testing
+
+The UI is the entry point to user experience, and it is the most natural place for a test to begin. But many real failures do not stay on the page: an order page may look successful while no record exists in the database; a click may finish without a visible error while logs contain an exception; a user path may complete while analytics, network requests, or downstream state are wrong.
+
+Agentic Testing means starting from UI behavior and extending the test into a cross-tool process for collecting evidence and making a judgment. Midscene understands the page, operates the interface, and produces replayable reports, while the project can register databases, logs, network requests, analytics, notification systems, and internal tools as testing capabilities. YAML still starts with the user path, but verification can continue into system state and business risk.
+
+A fuller case can look like this:
+
+```yaml
+name: Create Order
+
+flow:
+  - prepareOrderFixture:
+      scenario: paid-order
+  - ui: |
+      Sign in with a test account and create a test order.
+
+      Record in the conclusion:
+      - order id
+      - current page state
+      - whether the order was created successfully
+  - verify: |
+      Use $database to verify that the order id from the previous conclusion
+      really exists and that the order status is paid.
+  - verify: |
+      Use $logs to check whether any related ERROR appeared during the test.
+  - verify: The order detail page shows payment success
+  - agent: Analyze the risk of this test from all verification results
+  - notifySlack
+```
+
+The case still starts from the UI: how the user signs in, how the order is created, and what appears on the page. Projects can insert custom nodes for data preparation or notifications before or after the UI step. Later steps then pass the UI conclusion to project capabilities for deeper verification: whether the order exists in the database, whether error logs appeared during the test, and whether the final page state matches the business state.
+
+There are two kinds of capabilities here: `$database` and `$logs` are `$name` references that the runtime engine resolves and loads automatically; `prepareOrderFixture` and `notifySlack` are new YAML nodes registered by the project in `midscene.config.ts`:
+
+```ts
+import {
+  defineMidsceneConfig,
+  defineRuntime,
+} from '@midscene/testing-framework';
+
+export default defineMidsceneConfig({
+  target: {
+    type: 'web',
+    options: {
+      url: 'http://127.0.0.1:3000',
+    },
+  },
+
+  testDir: './e2e',
+
+  runtime: {
+    prepareOrderFixture: defineRuntime(async ({ input, context }) => {
+      const fixture = await createOrderFixture(input);
+      context.state.orderFixture = fixture;
+
+      return {
+        conclusion: `Prepared order fixture ${fixture.id}`,
+      };
+    }),
+
+    notifySlack: defineRuntime(async ({ context }) => {
+      await sendSlackSummary(context.result);
+
+      return {
+        conclusion: 'Slack notification sent',
+      };
+    }),
+  },
+});
 ```
 
-The rule is intentionally small: if a `flow` step key matches a built-in Midscene step, Midscene runs the built-in behavior; if it matches `yamlSteps`, Midscene calls the registered function with the YAML value and the current step context. If neither exists, the runner reports an unknown step. Custom steps should not override built-in step names. Midscene does not require a schema for custom step values; teams can validate values inside the handler in whatever style fits their project.
+This direction keeps the low-friction YAML-driven UI testing model intact. YAML remains the human-facing expression for the test, and TypeScript config remains the engineering entry for registering capabilities: ordinary paths stay in natural language, while places that need deterministic evidence can connect to the team's own tools.
 
 ### Use TypeScript for Special Cases
 
@@ -214,7 +284,7 @@ Teams can keep ordinary business paths in YAML and put the few truly complex cas
 
 Midscene is built as a higher-level testing framework on top of Rstest. Rstest provides the underlying lifecycle, fixture model, parallel execution, filtering, and CI-friendly runtime. It is also written in Rust for strong execution performance, so Midscene users get a high-performance test foundation by default. Midscene wraps those capabilities with natural-language cases, AI UI actions, visual assertions, screenshots, replay reports, and diagnostics.
 
-Most users can rely on that foundation through Midscene's YAML runner, `midscene.config.ts`, and `case.test.ts` without learning Rstest project details. The `midscene.config.ts` fields are intentionally aligned with Rstest concepts such as include/exclude, maxConcurrency, retry, timeout, setup, teardown, and reporters, while keeping Midscene-specific agent setup in the same place. Rstest details only become necessary when a team enters the `emit` workflow and wants to own the generated Rstest project directly.
+Most users can rely on that foundation through Midscene's YAML runner, `midscene.config.ts`, and `case.test.ts` without learning Rstest project details. The `midscene.config.ts` fields are intentionally aligned with Rstest concepts such as include/exclude, maxConcurrency, retry, timeout, setup, teardown, and reporters, while keeping Midscene-specific UI Agent creation in the same config.
 
 ### Why Rstest Helps
 
@@ -226,33 +296,6 @@ Rstest gives the Midscene project a reliable engineering base:
 - YAML cases and `case.test.ts` share the same underlying runtime model.
 - Teams can start lightweight and still keep a path toward long-lived regression suites.
 
-### Emit a Standard Project
-
-For complex projects, there may be more cases, richer fixtures, CI parallelism and grouping, and failure analysis that connects with team systems. Because the lightweight Midscene project already runs on Rstest, `emit` makes the underlying Rstest project explicit, so the team can own the test runner, fixtures, and integration code directly.
-
-```bash
-midscene emit ./project-folder
-```
-
-The emitted project can look like this:
-
-```text
-project-folder/
-  package.json
-  midscene.config.ts
-  rstest.config.ts
-  e2e/
-    dashboard.test.ts
-    checkout.test.ts
-  fixtures/
-    account.ts
-    device.ts
-  reports/
-    midscene-report/
-```
-
-In this project shape, YAML can remain the human-friendly expression for natural-language cases, `midscene.config.ts` remains the Midscene-facing source of truth, and Rstest provides the visible test project structure. Teams can keep business paths in YAML and place complex logic in Rstest test files, fixtures, and internal tools. Midscene handles AI UI actions and visual assertions, while your own code handles environment orchestration, API checks, database checks, and failure analysis.
-
 ### Next Steps
 
 - Run a YAML case from the command line: [YAML script runner](./yaml-script-runner)
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 19e24abbcf..fe8b42fa11 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -2,9 +2,9 @@
 
 UI Test 的价值，取决于团队能不能持续编写和维护它。很多项目一开始写几条浏览器脚本并不难，但用例很快就会被选择器、等待逻辑、登录辅助函数、测试数据准备和失败截图塞满，最后只有少数测试工程师能看懂。
 
-Midscene 提供了面向这个问题的 AI 原生 UI Testing Framework。它让团队用自然语言描述用户路径，把这些路径作为结构化 case 执行，并通过截图、回放报告和诊断信息复盘结果。测试作者可以先从 YAML 开始，后续再补充测试框架样板代码；工程同学仍然可以用 TypeScript 扩展点接入启动准备、数据、设备、确定性校验和现有测试工程。
+Midscene 提供了面向这个问题的 AI 原生 UI Testing Framework。它让团队用自然语言描述用户路径，把这些路径作为结构化 case 执行，并通过截图、回放报告和诊断信息复盘结果。测试作者可以先从 YAML 开始，后续再补充测试框架样板代码；工程同学仍然可以用 TypeScript 扩展点接入 UI Agent 创建、数据、设备、确定性校验和现有测试工程。
 
-目标很简单：轻量 case 要容易写，严肃测试工程也要能继续长大，而且不需要中途换掉用例表达方式。
+目标很简单：轻量 case 要容易写，严肃测试工程也要能继续长大，而且不需要中途换掉用例表达方式。更进一步，Midscene 希望把 UI Test 带向 Agentic Testing：一切从 UI 开始，但测试不该止步于 UI。真正可信的测试结论，应该能把页面行为、接口响应、数据库状态、日志、埋点和团队已有工具串在一起。
 
 ## 从 YAML 到工程化项目
 
@@ -13,7 +13,7 @@ Midscene 适合希望 UI Test 更接近用户行为的团队：
 - QA 和业务同学可以直接阅读 YAML flow 来 review 用例。
 - 前端和测试工程师可以把用例匹配、并发、报告、登录、Cookie、账号、设备和测试数据留在代码里，让自然语言步骤专注描述用户路径。
 - CI 任务可以获得可重复执行的 case、可回放的 Midscene 报告、截图和失败详情。
-- 项目从轻量 YAML case 长成标准 Rstest 工程时，不需要重写测试底座。
+- 项目从轻量 YAML case 长成长期回归套件时，不需要重写测试底座。
 
 Midscene 让测试项目可以从自然语言用例开始：核心路径可读、可维护，第一条 case 启动很轻；当项目需要用例过滤、并发控制、登录态、测试数据、设备连接、复杂断言、报告输出或 CI 管理时，也有足够的工程扩展空间。
 
@@ -29,10 +29,13 @@ target:
   url: https://shop.example.com
 
 flow:
-  - aiAct: Search for "running shoes"
-  - aiAct: Open the first product
-  - aiQuery: product name and price
-  - aiAssert: The product detail page shows a visible Add to cart button
+  - ui: Search for "running shoes"
+  - ui: Open the first product
+  - ui: |
+      Read the product name and price.
+
+      Record them in the conclusion.
+  - verify: The product detail page shows a visible Add to cart button
 ```
 
 YAML 可以把“一个用户路径应该是什么样”组织得足够清楚，便于 code review、业务确认和团队协作。围绕这个 case，Midscene 负责 AI UI 操作、视觉理解、断言、截图和报告生成。
@@ -53,7 +56,7 @@ YAML 可以把“一个用户路径应该是什么样”组织得足够清楚，
 
 在专业 UI Test 项目里，用例步骤通常是最清楚的部分：打开页面、执行用户路径、检查预期结果。真正的工程复杂度往往出现在用户路径周边，集中在要运行哪些 case、并发怎么控制、报告写到哪里、登录态、Cookie、测试账号、灰度环境、后端测试数据、云端设备连接和设备初始化这些项目配置里。
 
-这些内容不应该复制到每个 YAML case 里。Midscene 提供 `midscene.config.ts` 作为项目级 config-as-code 入口。它是轻量 `config.yml` 形态长大之后的替代：用例发现、执行策略、输出位置和运行时 setup 都放进同一个类型化文件里。
+这些内容不应该复制到每个 YAML case 里。Midscene 提供 `midscene.config.ts` 作为项目级 config-as-code 入口。它是轻量 `config.yml` 形态长大之后的替代：用例发现、执行策略、输出位置、UI Agent 创建和运行时扩展都放进同一个类型化文件里。
 
 ```ts
 import { defineMidsceneConfig } from '@midscene/testing-framework';
@@ -83,8 +86,8 @@ export default defineMidsceneConfig({
     reportDir: './midscene_run/report',
   },
 
-  agentOptions: {
-    aiActionContext: 'The user is already signed in as a smoke-test account.',
+  uiAgentOptions: {
+    aiActContext: 'The user is already signed in as a smoke-test account.',
     cache: true,
     generateReport: true,
   },
@@ -101,9 +104,9 @@ export default defineMidsceneConfig({
     checkout.yaml
 ```
 
-这里的职责边界很清楚：`e2e/*.yaml` 描述用户要完成什么，`midscene.config.ts` 描述 target 类型和平台连接参数、testRunner 行为、共享 Agent 参数和报告。默认情况下，框架会根据 `target.type` 和 `target.options` 创建 Agent，普通项目不需要编写 `setup`。如果项目需要接入自定义设备、远程服务或团队内部 fixture，可以在 `setup` 里完全自行创建 Agent；这时 `target` 可以省略，避免同一份配置里出现两套运行目标定义。这样项目可以处理真实工程问题，但用例本身仍然适合 code review、业务确认和团队协作。
+这里的职责边界很清楚：`e2e/*.yaml` 描述用户要完成什么，`midscene.config.ts` 描述 target 类型和平台连接参数、testRunner 行为、共享 UI Agent 参数和报告。默认情况下，框架会根据 `target.type` 和 `target.options` 创建 UI Agent，普通项目不需要编写 `createUIAgent`。如果项目需要接入自定义设备、远程服务或团队内部 fixture，可以在 `createUIAgent` 里完全自行创建 UI Agent；这时 `target` 可以省略，避免同一份配置里出现两套运行目标定义。这样项目可以处理真实工程问题，但用例本身仍然适合 code review、业务确认和团队协作。
 
-当项目确实需要完全自定义运行目标时，可以把 Agent 创建逻辑放进 `setup`：
+当项目确实需要完全自定义运行目标时，可以把 UI Agent 创建逻辑放进 `createUIAgent`：
 
 ```ts
 import { agentFromAdbDevice } from '@midscene/android';
@@ -112,19 +115,19 @@ import { defineMidsceneConfig } from '@midscene/testing-framework';
 export default defineMidsceneConfig({
   testDir: './e2e',
 
-  agentOptions: {
-    aiActionContext: 'The user is already signed in as a smoke-test account.',
+  uiAgentOptions: {
+    aiActContext: 'The user is already signed in as a smoke-test account.',
     cache: true,
     generateReport: true,
   },
 
-  async setup({ agentOptions }) {
+  async createUIAgent({ uiAgentOptions }) {
     const account = await getSmokeAccount();
     await prepareTestData(account);
 
     return {
       agent: await agentFromAdbDevice(process.env.ANDROID_DEVICE_ID, {
-        ...agentOptions,
+        ...uiAgentOptions,
         androidAdbPath: process.env.ANDROID_ADB_PATH,
         autoDismissKeyboard: false,
       }),
@@ -133,12 +136,15 @@ export default defineMidsceneConfig({
 });
 ```
 
-### 扩展 YAML Step
+### 用 Runtime 扩展 YAML 节点
 
-YAML 应该保持可读，但有些团队需要在自然语言路径周围放少量项目专属动作：准备测试数据、切换账号、调用内部 API、检查数据库，或封装重复的业务动作。这类动作可以在 `midscene.config.ts` 里注册成自定义 YAML step。
+YAML 应该保持可读，但有些团队需要在自然语言路径周围放少量项目专属动作：准备测试数据、切换账号、调用内部 API、发送通知，或封装重复的业务动作。这类动作可以在 `midscene.config.ts` 里注册成自定义 runtime，并作为新的 YAML 节点使用。
 
 ```ts
-import { defineMidsceneConfig } from '@midscene/testing-framework';
+import {
+  defineMidsceneConfig,
+  defineRuntime,
+} from '@midscene/testing-framework';
 
 export default defineMidsceneConfig({
   target: {
@@ -151,30 +157,21 @@ export default defineMidsceneConfig({
   testDir: './e2e',
   include: ['**/*.yaml'],
 
-  yamlSteps: {
-    async useAccount(value, ctx) {
+  runtime: {
+    useAccount: defineRuntime(async ({ input, context }) => {
+      const value = input;
       if (typeof value !== 'string') {
         throw new Error('useAccount expects an account name');
       }
 
       const account = await getAccount(value);
-      ctx.state.account = account;
-      await ctx.agent.aiAct(`Sign in as ${account.email}`);
-    },
+      context.state.account = account;
+      await signInWithAccount(context.agent, account);
 
-    async assertOrderStatus(value) {
-      const { orderId, status } = value as {
-        orderId: string;
-        status: string;
+      return {
+        conclusion: `Signed in as ${account.email}`,
       };
-
-      const order = await queryOrder(orderId);
-      if (order.status !== status) {
-        throw new Error(
-          `Expected order ${orderId} to be ${status}, got ${order.status}`,
-        );
-      }
-    },
+    }),
   },
 });
 ```
@@ -184,14 +181,86 @@ YAML 文件可以混用 Midscene 内置 step 和项目自己的 step：
 ```yaml
 flow:
   - useAccount: smoke-buyer
-  - aiAct: Open my orders
-  - assertOrderStatus:
-      orderId: E2E-10001
-      status: paid
-  - aiAssert: The order detail page shows Paid
+  - ui: Open my orders
+  - verify: |
+      使用 $database 验证订单 E2E-10001 的状态是 paid。
+  - verify: The order detail page shows Paid
+```
+
+规则刻意保持很小：`ui` 是面向界面行为的统一入口，`verify` 是面向断言和外部证据的统一入口，`agent` 是面向分析、归因和跨工具编排的入口。`$database` 这样的 `$name` 表示引用某个 skill，运行时引擎会自动解析和读取；`runtime` 则用于按项目需要扩展新的 YAML 节点，例如 `useAccount`。团队可以根据自己的业务和工程习惯决定要扩展哪些节点、扩展多少节点。Midscene 不要求为自定义节点 value 编写 schema；团队可以在 runtime handler 里按自己的工程习惯做参数校验。
+
+### 走向 Agentic Testing
+
+UI 是用户体验的入口，也是最自然的测试起点。但很多真实缺陷并不会只停留在页面上：订单页面可能显示成功，数据库里却没有记录；按钮点击没有报错，日志里却出现了异常；用户路径看似完成，埋点、网络请求或下游状态却不符合预期。
+
+Agentic Testing 的核心理念是：从 UI 行为出发，把测试扩展成一次跨工具的证据收集和判断过程。Midscene 负责理解页面、操作界面和产出可回放报告，同时允许项目把数据库、日志、网络请求、埋点、通知系统和内部工具注册为测试能力。这样 YAML 仍然从用户路径开始，但验证可以继续深入到系统状态和业务风险。
+
+一个更完整的 case 可以长成这样：
+
+```yaml
+name: Create Order
+
+flow:
+  - prepareOrderFixture:
+      scenario: paid-order
+  - ui: |
+      使用测试账号登录系统，创建一笔测试订单。
+
+      在结论中记录：
+      - 订单号
+      - 当前页面状态
+      - 是否创建成功
+  - verify: |
+      使用 $database 验证前面结论中的订单号是否真实存在，且订单状态是 paid。
+  - verify: |
+      使用 $logs 检查测试期间是否出现相关 ERROR。
+  - verify: 订单详情页展示支付成功
+  - agent: 根据所有验证结果分析本次测试风险
+  - notifySlack
+```
+
+这里的核心仍然从 UI 切入：用户如何登录、如何创建订单、页面上发生了什么。项目可以在 UI 步骤前后插入准备数据、发送通知等自定义节点；后续步骤则把 UI 结论交给项目自己的能力继续验证：订单是否真的进入数据库，测试期间是否有错误日志，最终页面状态是否和业务状态一致。
+
+这里有两类能力：`$database` 和 `$logs` 这样的 `$name` 引用由运行时引擎自动解析和读取；`prepareOrderFixture` 和 `notifySlack` 则是项目在 `midscene.config.ts` 里扩展出来的新 YAML 节点：
+
+```ts
+import {
+  defineMidsceneConfig,
+  defineRuntime,
+} from '@midscene/testing-framework';
+
+export default defineMidsceneConfig({
+  target: {
+    type: 'web',
+    options: {
+      url: 'http://127.0.0.1:3000',
+    },
+  },
+
+  testDir: './e2e',
+
+  runtime: {
+    prepareOrderFixture: defineRuntime(async ({ input, context }) => {
+      const fixture = await createOrderFixture(input);
+      context.state.orderFixture = fixture;
+
+      return {
+        conclusion: `Prepared order fixture ${fixture.id}`,
+      };
+    }),
+
+    notifySlack: defineRuntime(async ({ context }) => {
+      await sendSlackSummary(context.result);
+
+      return {
+        conclusion: 'Slack notification sent',
+      };
+    }),
+  },
+});
 ```
 
-规则刻意保持很小：如果 `flow` 里的 step key 命中 Midscene 内置 step，就执行内置行为；如果命中 `yamlSteps`，就把 YAML value 和当前 step context 传给注册函数；两边都没有命中时，runner 报 unknown step。自定义 step 不应该覆盖内置 step 名。Midscene 不要求为自定义 step value 编写 schema；团队可以在 handler 里按自己的工程习惯做参数校验。
+这条路线不会丢掉 YAML 驱动 UI Test 的低门槛。相反，它把 YAML 作为面向人的测试表达，把 TypeScript 配置作为面向工程的能力注册入口：普通路径继续用自然语言描述，真正需要确定性证据的地方再接入团队自己的工具。
 
 ### 特殊 case 使用 TypeScript
 
@@ -214,7 +283,7 @@ YAML 应该是主路径，同时也要给代码留出空间。少数 case 可能
 
 Midscene 是基于 Rstest 封装构建的上层测试框架。Rstest 在底层提供测试生命周期、fixture 模型、并发执行、用例过滤和适合 CI 的运行时能力。它基于 Rust 编写，具备更高的执行性能，因此 Midscene 用户默认就能获得高性能的测试底座。Midscene 则把这些能力封装成自然语言用例、AI UI 操作、视觉断言、截图、回放报告和诊断信息。
 
-绝大多数用户可以通过 Midscene 的 YAML runner、`midscene.config.ts` 和 `case.test.ts` 直接使用这套底座，无需了解 Rstest 的项目细节。`midscene.config.ts` 的字段会刻意和 Rstest 的概念对齐，例如 include/exclude、maxConcurrency、retry、timeout、setup、teardown 和 reporters，同时把 Midscene 特有的 agent setup 留在同一个入口里。只有团队进入 `emit` 流程，希望直接接管生成后的 Rstest 项目时，才需要关注 Rstest 的细节。
+绝大多数用户可以通过 Midscene 的 YAML runner、`midscene.config.ts` 和 `case.test.ts` 直接使用这套底座，无需了解 Rstest 的项目细节。`midscene.config.ts` 的字段会刻意和 Rstest 的概念对齐，例如 include/exclude、maxConcurrency、retry、timeout、setup、teardown 和 reporters，同时把 Midscene 特有的 UI Agent 创建入口留在同一个配置里。
 
 ### Rstest 提供的工程能力
 
@@ -226,33 +295,6 @@ Rstest 为 Midscene 项目提供可靠的工程底座：
 - YAML case 和 `case.test.ts` 共享同一个底层运行模型。
 - 团队可以从轻量项目开始，同时保留成长为长期回归套件的路径。
 
-### 导出标准项目
-
-对于复杂项目，case 可能更多，fixture 更复杂，CI 需要并发和分组策略，失败分析也需要接入团队自己的系统。由于轻量 Midscene 项目本来就运行在 Rstest 之上，`emit` 会把底层 Rstest 工程显式暴露出来，让团队可以直接接管测试 Runner、fixture 和集成代码。
-
-```bash
-midscene emit ./project-folder
-```
-
-导出后的项目形态类似：
-
-```text
-project-folder/
-  package.json
-  midscene.config.ts
-  rstest.config.ts
-  e2e/
-    dashboard.test.ts
-    checkout.test.ts
-  fixtures/
-    account.ts
-    device.ts
-  reports/
-    midscene-report/
-```
-
-在这种项目形态里，YAML 仍然可以作为自然语言用例的人类友好表达，`midscene.config.ts` 仍然是面向 Midscene 的配置事实来源，Rstest 则提供显式的测试工程结构。团队可以把业务路径继续保留在 YAML 中，把复杂逻辑放进 Rstest 测试文件、fixture 和内部工具里。AI UI 操作和视觉断言交给 Midscene，环境编排、接口校验、数据库校验和失败归因交给团队自己的代码。
-
 ### 下一步
 
 - 从命令行运行 YAML case：[YAML 脚本运行器](./yaml-script-runner)

From 3d07d63c9987a910d93979ed8411b1509cc2e036 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Tue, 2 Jun 2026 23:57:08 -0700
Subject: [PATCH 18/33] docs(site): clarify verify and agent nodes

---
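The context rule documented in this patch can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Midscene's implementation: `NodeRecord`, `JudgeContext`, and `buildJudgeContext` are hypothetical names that do not come from `@midscene/testing-framework`, and the screenshot is represented by a plain string placeholder.

```typescript
// Hypothetical types for illustration only; none of these names are Midscene APIs.
type NodeKind = 'ui' | 'verify' | 'agent' | 'runtime';

interface NodeRecord {
  kind: NodeKind;
  // Final output of the node, e.g. the conclusion recorded by a `ui` node.
  output: string;
  // Intermediate actions (clicks, typing, retries) performed while the node ran.
  trace: string[];
}

interface JudgeContext {
  previousOutputs: string[];
  // Placeholder for a reference to the current screenshot.
  screenshot: string;
}

// Build what a later `verify` / `agent` node would see under the documented rule:
// each earlier node's final output plus the current screenshot,
// and none of the intermediate execution traces.
function buildJudgeContext(
  history: NodeRecord[],
  screenshot: string,
): JudgeContext {
  return {
    previousOutputs: history.map((node) => node.output),
    screenshot,
  };
}

const history: NodeRecord[] = [
  {
    kind: 'ui',
    output: 'createOrder: orderId=E2E-10001, pageState=order created',
    trace: ['click Buy', 'retry click Buy', 'type card number'],
  },
];

const judgeContext = buildJudgeContext(
  history,
  'screenshot-after-create-order.png',
);
console.log(judgeContext.previousOutputs);
```

In this sketch the retries in `trace` never reach the later node, which is why the documentation asks earlier nodes to write values such as `orderId` explicitly into their output.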
 apps/site/docs/en/ui-testing-framework.mdx | 32 ++++++++++++++++++++++
 apps/site/docs/zh/ui-testing-framework.mdx | 31 +++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 5386552ed9..97b5184c73 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -189,6 +189,38 @@ flow:
 
 The rule is intentionally small: `ui` is the unified entry for interface behavior, `verify` is the unified entry for assertions and external evidence, and `agent` is the entry for analysis, attribution, and cross-tool orchestration. A `$name` such as `$database` refers to a skill, and the runtime engine resolves and loads it automatically. `runtime` is for extending YAML with new nodes such as `useAccount` based on project needs. Teams decide which nodes to add and how many to add according to their own business and engineering habits. Midscene does not require a schema for custom node values; teams can validate values inside runtime handlers in whatever style fits their project.
 
+### `verify` and `agent`
+
+`verify` and `agent` nodes are executed by Midscene's built-in Pi Agent. They do not add new ways to operate the UI; instead, they verify, analyze, and attribute results based on the current test context. Both use the same kind of Agent capability, but `verify` carries pass/fail semantics: it must reach a pass or fail decision, and a failed verification fails the current case. `agent` is better suited to summaries, analysis, attribution, and recommendations.
+
+When Pi Agent executes these nodes, it sees two kinds of context:
+
+- The output of each previous node, such as conclusions recorded by `ui` nodes or `conclusion` values returned by runtime nodes.
+- The current screenshot, so it can understand the current page or screen state.
+
+It does not see the full execution process of previous nodes. For example, a `ui` node may click, type, and retry several times to create an order, but later `verify` / `agent` nodes only see that node's final output and the current screenshot.
+
+If a later step needs a variable, ask the earlier node to write it explicitly into its output:
+
+```yaml
+flow:
+  - ui: |
+      Create a test order.
+
+      Name this step's output createOrder, and record:
+      - orderId: the order id
+      - pageState: the current page state
+
+  - verify: |
+      Use $database to verify that the orderId from the output named createOrder exists.
+
+  - agent: |
+      Analyze this test's risk from the output named createOrder, database verification result,
+      and current screenshot.
+```
+
+Here, `ui` still takes only natural-language input. `createOrder` is the output name requested in that natural-language instruction, and `orderId` is a field in that output. Later nodes can reference “the `orderId` from the output named `createOrder`” in natural language. External capabilities still use skill references such as `$database`.
+
 ### Toward Agentic Testing
 
 The UI is the entry point to user experience, and it is the most natural place for a test to begin. But many real failures do not stay on the page: an order page may look successful while no record exists in the database; a click may finish without a visible error while logs contain an exception; a user path may complete while analytics, network requests, or downstream state are wrong.
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index fe8b42fa11..1baa3704f8 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -189,6 +189,37 @@ flow:
 
 规则刻意保持很小：`ui` 是面向界面行为的统一入口，`verify` 是面向断言和外部证据的统一入口，`agent` 是面向分析、归因和跨工具编排的入口。`$database` 这样的 `$name` 表示引用某个 skill，运行时引擎会自动解析和读取；`runtime` 则用于按项目需要扩展新的 YAML 节点，例如 `useAccount`。团队可以根据自己的业务和工程习惯决定要扩展哪些节点、扩展多少节点。Midscene 不要求为自定义节点 value 编写 schema；团队可以在 runtime handler 里按自己的工程习惯做参数校验。
 
+### `verify` 和 `agent`
+
+`verify` 和 `agent` 节点由 Midscene 内置的 Pi Agent 执行。它们不是新的 UI 操作入口，而是基于当前测试上下文做验证、分析和归因。两者本质上使用同一类 Agent 能力，区别在于 `verify` 带有测试判定语义：它需要给出通过或不通过的判断，不通过时会让当前 case 失败；`agent` 更适合做总结、分析、归因和建议。
+
+Pi Agent 在执行这些节点时会看到两类上下文：
+
+- 前序每个节点的输出，例如 `ui` 节点记录的结论、runtime 节点返回的 `conclusion`。
+- 当前截图，用来理解此刻页面或屏幕上的状态。
+
+它不会看到前序节点的完整执行过程。比如一个 `ui` 节点为了创建订单可能经历了多次点击、输入和重试，后续 `verify` / `agent` 只能看到这个节点最终输出了什么，以及当前截图是什么样。
+
+因此，如果后续步骤需要使用某个变量，应该让前面的节点把它明确写进输出里：
+
+```yaml
+flow:
+  - ui: |
+      创建一笔测试订单。
+
+      将这一步的输出命名为 createOrder，并记录：
+      - orderId: 订单号
+      - pageState: 当前页面状态
+
+  - verify: |
+      使用 $database 验证名为 createOrder 的输出中的 orderId 是否真实存在。
+
+  - agent: |
+      根据名为 createOrder 的输出、数据库验证结果和当前截图，分析本次测试风险。
+```
+
+这里的 `ui` 仍然只有自然语言输入。`createOrder` 是这段自然语言要求 Pi Agent 记录的输出名称，`orderId` 是该输出里的字段。后续节点可以直接用自然语言引用“名为 `createOrder` 的输出中的 `orderId`”。如果要调用外部能力，仍然使用 `$database` 这样的 skill 引用。
+
 ### 走向 Agentic Testing
 
 UI 是用户体验的入口，也是最自然的测试起点。但很多真实缺陷并不会只停留在页面上：订单页面可能显示成功，数据库里却没有记录；按钮点击没有报错，日志里却出现了异常；用户路径看似完成，埋点、网络请求或下游状态却不符合预期。

From b3c3c8e83e7744284c2d775f9280e2781bfe373a Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Wed, 3 Jun 2026 00:09:19 -0700
Subject: [PATCH 19/33] docs(site): refine ui testing framework overview

---
 apps/site/docs/en/ui-testing-framework.mdx | 40 ++++++----------------
 apps/site/docs/zh/ui-testing-framework.mdx | 40 ++++++----------------
 2 files changed, 20 insertions(+), 60 deletions(-)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 97b5184c73..4e775dbf28 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -1,21 +1,18 @@
 # AI-native UI Testing Framework for Natural-language Cases
 
-UI tests are valuable only when teams can keep writing and maintaining them. In many projects, the first few browser scripts are easy, but the suite soon fills with selectors, waits, login helpers, data setup, and failure screenshots that only test specialists can understand.
+The hard part of UI testing is not writing the first browser script. It is keeping the suite readable, writable, and maintainable as the product changes. Traditional scripts quickly fill with selectors, waits, login helpers, data setup, and failure screenshots, until only a few test specialists can tell what they actually verify.
 
-Midscene provides an AI-native UI Testing Framework for this problem. It lets teams describe user paths in natural language, run them as structured cases, and review the result through screenshots, replay reports, and diagnostics. Test authors can start with YAML before adding test framework boilerplate, while engineers still have TypeScript extension points for UI Agent creation, data, devices, deterministic checks, and existing test projects.
+Midscene is designed around three core ideas:
 
-The goal is simple: make lightweight cases easy to write, and make serious test projects possible without changing the authoring model later. From there, Midscene moves UI testing toward Agentic Testing: everything starts from the UI, but testing should not stop at the UI. A trustworthy test conclusion should be able to connect page behavior with API responses, database state, logs, analytics, and the tools a team already relies on.
+- Cases must stay readable. Test authors write natural-language user paths in YAML, so QA, business teams, and engineers can review the case itself instead of first decoding a script implementation.
+- The engineering architecture must split responsibilities cleanly. YAML focuses on what the user should accomplish; `midscene.config.ts` manages target environments, UI Agent creation, execution policy, reporting, and runtime extensions; TypeScript code owns data setup, device integration, deterministic checks, and internal tools.
+- The architecture must be ready for Agentic Testing. Teams can start from the UI path, but conclusions do not have to stop at the UI. `ui`, `verify`, `agent`, skill references, and runtime extensions let tests connect API responses, database state, logs, analytics, and the tools a team already relies on.
 
-## From YAML to an Engineering-ready Project
+Midscene does not force a choice between lightweight YAML and serious test engineering. It keeps the first case lightweight while letting the same authoring model grow into a long-lived regression suite.
 
-Midscene is designed for teams that want UI tests to stay close to user behavior:
+## Progressive Adoption Path
 
-- QA and business teams can review a case by reading the YAML flow.
-- Frontend and test engineers can keep case matching, concurrency, reports, login, cookies, accounts, devices, and test data in code, leaving natural-language steps focused on the user path.
-- CI jobs get repeatable cases, replayable Midscene reports, screenshots, and failure details.
-- Growing projects can move from lightweight YAML cases to long-lived regression suites without rewriting the testing foundation.
-
-Midscene lets a test project start from natural-language cases: core paths stay readable and maintainable, and the first case stays lightweight. When the project needs case filtering, concurrency control, login state, test data, device connections, complex assertions, reports, or CI management, it still has enough room for engineering extension.
+Based on those principles, Midscene lets a project start from natural-language cases: first write the core path as a readable, runnable, replayable YAML case; then add case filtering, concurrency control, login state, test data, device connections, complex assertions, reports, or CI management when the suite needs them.
 
 ### Start with YAML
 
@@ -295,28 +292,11 @@ export default defineMidsceneConfig({
 
 This direction keeps the low-friction YAML-driven UI testing model intact. YAML remains the human-facing expression for the test, and TypeScript config remains the engineering entry for registering capabilities: ordinary paths stay in natural language, while places that need deterministic evidence can connect to the team's own tools.
 
-### Use TypeScript for Special Cases
-
-YAML should be the main path while still leaving room for code. Some cases need complex branches, internal SDK calls, mixed API checks, database assertions, or existing test utilities. Those cases can be written as `case.test.ts`.
-
-This is an engineering capability for special cases:
-
-```text
-.
-  midscene.config.ts
-  e2e/
-    dashboard.yaml
-    checkout.yaml
-    checkout-risk-control.test.ts
-```
-
-Teams can keep ordinary business paths in YAML and put the few truly complex cases in TypeScript. They still live in the same project and share the same runtime model.
-
 ## Built on Rstest
 
 Midscene is built as a higher-level testing framework on top of Rstest. Rstest provides the underlying lifecycle, fixture model, parallel execution, filtering, and CI-friendly runtime. It is also written in Rust for strong execution performance, so Midscene users get a high-performance test foundation by default. Midscene wraps those capabilities with natural-language cases, AI UI actions, visual assertions, screenshots, replay reports, and diagnostics.
 
-Most users can rely on that foundation through Midscene's YAML runner, `midscene.config.ts`, and `case.test.ts` without learning Rstest project details. The `midscene.config.ts` fields are intentionally aligned with Rstest concepts such as include/exclude, maxConcurrency, retry, timeout, setup, teardown, and reporters, while keeping Midscene-specific UI Agent creation in the same config.
+Most users can rely on that foundation through Midscene's YAML runner and `midscene.config.ts` without learning Rstest project details. The `midscene.config.ts` fields are intentionally aligned with Rstest concepts such as include/exclude, maxConcurrency, retry, timeout, setup, teardown, and reporters, while keeping Midscene-specific UI Agent creation in the same config.
 
 ### Why Rstest Helps
 
@@ -325,7 +305,7 @@ Rstest gives the Midscene project a reliable engineering base:
 - Case execution is backed by a standard test lifecycle, including setup, teardown, hooks, fixtures, and parallel execution.
 - The Rust-based runtime gives Midscene projects a performance-oriented execution layer by default.
 - Projects can naturally connect to CI, test filtering, failure reporting, and existing team testing habits.
-- YAML cases and `case.test.ts` share the same underlying runtime model.
+- YAML cases, runtime nodes, and config extensions share the same underlying runtime model.
 - Teams can start lightweight and still keep a path toward long-lived regression suites.
 
 ### Next Steps
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 1baa3704f8..ed799415e1 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -1,21 +1,18 @@
 # 面向自然语言用例的 AI 原生 UI Testing Framework
 
-UI Test 的价值，取决于团队能不能持续编写和维护它。很多项目一开始写几条浏览器脚本并不难，但用例很快就会被选择器、等待逻辑、登录辅助函数、测试数据准备和失败截图塞满，最后只有少数测试工程师能看懂。
+UI Test 真正的难点，不是写出第一条浏览器脚本，而是让团队长期愿意写、看得懂、维护得起。传统脚本很快会被选择器、等待逻辑、登录辅助函数、测试数据准备和失败截图塞满，最后只有少数测试工程师能理解它们到底在验证什么。
 
-Midscene 提供了面向这个问题的 AI 原生 UI Testing Framework。它让团队用自然语言描述用户路径，把这些路径作为结构化 case 执行，并通过截图、回放报告和诊断信息复盘结果。测试作者可以先从 YAML 开始，后续再补充测试框架样板代码；工程同学仍然可以用 TypeScript 扩展点接入 UI Agent 创建、数据、设备、确定性校验和现有测试工程。
+Midscene 的设计围绕三个核心要点展开：
 
-目标很简单：轻量 case 要容易写，严肃测试工程也要能继续长大，而且不需要中途换掉用例表达方式。更进一步，Midscene 希望把 UI Test 带向 Agentic Testing：一切从 UI 开始，但测试不该止步于 UI。真正可信的测试结论，应该能把页面行为、接口响应、数据库状态、日志、埋点和团队已有工具串在一起。
+- 用例必须可读。测试作者用 YAML 写自然语言用户路径，QA、业务同学和工程师都能直接 review case 本身，而不是先读懂一套脚本实现。
+- 工程架构必须优雅拆分职责。YAML 专注描述用户要完成什么；`midscene.config.ts` 管理目标环境、UI Agent 创建、运行策略、报告输出和 runtime 扩展；TypeScript 代码承接数据准备、设备接入、确定性校验和团队内部工具。
+- 架构必须面向 Agentic Testing。团队可以从 UI 路径切入测试，但结论不必止步于 UI。`ui`、`verify`、`agent`、skill 引用和 runtime 扩展让测试可以继续连接接口响应、数据库状态、日志、埋点和团队已有工具。
 
-## 从 YAML 到工程化项目
+Midscene 不是让团队在“轻量 YAML”和“严肃测试工程”之间二选一，而是让第一条 case 足够轻，同时让同一套表达方式继续长成长期回归套件。
 
-Midscene 适合希望 UI Test 更接近用户行为的团队：
+## 渐进式落地路径
 
-- QA 和业务同学可以直接阅读 YAML flow 来 review 用例。
-- 前端和测试工程师可以把用例匹配、并发、报告、登录、Cookie、账号、设备和测试数据留在代码里，让自然语言步骤专注描述用户路径。
-- CI 任务可以获得可重复执行的 case、可回放的 Midscene 报告、截图和失败详情。
-- 项目从轻量 YAML case 长成长期回归套件时，不需要重写测试底座。
-
-Midscene 让测试项目可以从自然语言用例开始：核心路径可读、可维护，第一条 case 启动很轻；当项目需要用例过滤、并发控制、登录态、测试数据、设备连接、复杂断言、报告输出或 CI 管理时，也有足够的工程扩展空间。
+基于这三个原则，Midscene 让项目可以从自然语言用例开始：先把核心路径写成可读、可运行、可回放的 YAML case；当项目需要更强的工程能力时，再接入用例过滤、并发控制、登录态、测试数据、设备连接、复杂断言、报告输出或 CI 管理。
 
 ### 从 YAML 开始
 
@@ -293,28 +290,11 @@ export default defineMidsceneConfig({
 
 这条路线不会丢掉 YAML 驱动 UI Test 的低门槛。相反，它把 YAML 作为面向人的测试表达，把 TypeScript 配置作为面向工程的能力注册入口：普通路径继续用自然语言描述，真正需要确定性证据的地方再接入团队自己的工具。
 
-### 特殊 case 使用 TypeScript
-
-YAML 应该是主路径，同时也要给代码留出空间。少数 case 可能需要复杂分支、内部 SDK 调用、混合接口校验、数据库断言或复用现有测试工具，这类用例可以写成 `case.test.ts`。
-
-这是给特殊场景保留的工程能力：
-
-```text
-.
-  midscene.config.ts
-  e2e/
-    dashboard.yaml
-    checkout.yaml
-    checkout-risk-control.test.ts
-```
-
-团队可以把普通业务路径继续保留在 YAML 中，把少数真正复杂的 case 放进 TypeScript。它们仍然在同一个项目里维护，并共享同一个运行时模型。
-
 ## 基于 Rstest 构建
 
 Midscene 是基于 Rstest 封装构建的上层测试框架。Rstest 在底层提供测试生命周期、fixture 模型、并发执行、用例过滤和适合 CI 的运行时能力。它基于 Rust 编写，具备更高的执行性能，因此 Midscene 用户默认就能获得高性能的测试底座。Midscene 则把这些能力封装成自然语言用例、AI UI 操作、视觉断言、截图、回放报告和诊断信息。
 
-绝大多数用户可以通过 Midscene 的 YAML runner、`midscene.config.ts` 和 `case.test.ts` 直接使用这套底座，无需了解 Rstest 的项目细节。`midscene.config.ts` 的字段会刻意和 Rstest 的概念对齐，例如 include/exclude、maxConcurrency、retry、timeout、setup、teardown 和 reporters，同时把 Midscene 特有的 UI Agent 创建入口留在同一个配置里。
+绝大多数用户可以通过 Midscene 的 YAML runner 和 `midscene.config.ts` 直接使用这套底座，无需了解 Rstest 的项目细节。`midscene.config.ts` 的字段会刻意和 Rstest 的概念对齐，例如 include/exclude、maxConcurrency、retry、timeout、setup、teardown 和 reporters，同时把 Midscene 特有的 UI Agent 创建入口留在同一个配置里。
 
 ### Rstest 提供的工程能力
 
@@ -323,7 +303,7 @@ Rstest 为 Midscene 项目提供可靠的工程底座：
 - 用例执行有标准测试生命周期支撑，包括 setup、teardown、hook、fixture 和并发能力。
 - 基于 Rust 的运行时让 Midscene 项目默认拥有面向性能优化的执行层。
 - 项目可以自然接入 CI、测试过滤、失败报告和团队已有的测试工程习惯。
-- YAML case 和 `case.test.ts` 共享同一个底层运行模型。
+- YAML case、runtime 节点和配置扩展共享同一个底层运行模型。
 - 团队可以从轻量项目开始，同时保留成长为长期回归套件的路径。
 
 ### 下一步

From 3ee649c3ac9349d5f4e92a9255c8b55f89a67a76 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Wed, 3 Jun 2026 00:26:24 -0700
Subject: [PATCH 20/33] docs(site): restructure ui testing framework docs

---
 apps/site/docs/en/ui-testing-framework.mdx | 201 +++++++--------------
 apps/site/docs/zh/ui-testing-framework.mdx | 197 +++++++-------------
 2 files changed, 136 insertions(+), 262 deletions(-)
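
The context rule this patch documents for `verify` / `agent` (later nodes see each previous node's final output plus the current screenshot, not the earlier execution steps) can be sketched roughly as below. This is only an illustrative model: `NodeRecord`, `VisibleContext`, and `buildVisibleContext` are hypothetical names, not Midscene APIs, and the real runtime may structure this differently.

```typescript
// Hypothetical model of the documented rule; none of these names are Midscene APIs.
interface NodeRecord {
  kind: string; // e.g. 'ui', 'verify', or a custom runtime node
  steps: string[]; // internal execution trace (clicks, typing, retries)
  output: string; // the final output / conclusion recorded by the node
}

interface VisibleContext {
  previousOutputs: string[];
  screenshot: string;
}

// Later verify/agent nodes receive only final outputs plus the current
// screenshot; the internal steps of earlier nodes are left out.
function buildVisibleContext(
  history: NodeRecord[],
  screenshot: string,
): VisibleContext {
  return {
    previousOutputs: history.map((node) => node.output),
    screenshot,
  };
}

const history: NodeRecord[] = [
  {
    kind: 'ui',
    steps: ['click Buy', 'type address', 'retry submit'],
    output: 'createOrder: orderId=E2E-10001, pageState=success',
  },
];

const visible = buildVisibleContext(history, 'screenshot-after-order.png');
```

Under this model, anything a later step needs, such as `orderId`, has to be written into the earlier node's output, which is why the YAML examples ask the `ui` node to record it explicitly.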

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 4e775dbf28..666fd8a02a 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -10,13 +10,9 @@ Midscene is designed around three core ideas:
 
 Midscene is not a choice between lightweight YAML and serious test engineering. It makes the first case lightweight while letting the same authoring model grow into a long-lived regression suite.
 
-## Progressive Adoption Path
+## Start with Simple UI Tasks
 
-Based on those principles, Midscene lets a project start from natural-language cases: first write the core path as a readable, runnable, replayable YAML case; then add case filtering, concurrency control, login state, test data, device connections, complex assertions, reports, or CI management when the suite needs them.
-
-### Start with YAML
-
-For most smoke tests and lightweight regression projects, the first useful milestone is to get a core path running quickly and turn it into a case that can be repeated, replayed, and inspected.
+Midscene starts by helping teams write a simple UI task clearly, run it, and replay it. For most smoke tests and lightweight regression projects, the first useful milestone is not setting up a complex testing project. It is turning a core user path into a readable, repeatable, inspectable case.
 
 A YAML case keeps the path readable:
 
@@ -49,11 +45,72 @@ That simple shape should cover most early projects:
 
 The case remains close to business language, while the runner gives it a repeatable execution and a report that can be inspected after success or failure.
 
-### Configure the Project in TypeScript
+## Connect External Context with `verify` and `agent`
+
+`verify` and `agent` nodes are executed by Midscene's built-in Pi Agent. They are not new UI operation entries; they verify, analyze, and attribute results from the current test context. They use the same kind of Agent capability, but `verify` carries test judgment semantics: it must decide pass or fail, and a failed verification fails the current case. `agent` is better suited for summaries, analysis, attribution, and recommendations.
+
+When Pi Agent executes these nodes, it sees two kinds of context:
+
+- The output of each previous node, such as conclusions recorded by `ui` nodes or `conclusion` values returned by runtime nodes.
+- The current screenshot, so it can understand the current page or screen state.
+
+It does not see the full execution process of previous nodes. For example, a `ui` node may click, type, and retry several times to create an order, but later `verify` / `agent` nodes only see that node's final output and the current screenshot.
+
+If a later step needs a variable, ask the earlier node to write it explicitly into its output:
+
+```yaml
+flow:
+  - ui: |
+      Create a test order.
+
+      Name this step's output createOrder, and record:
+      - orderId: the order id
+      - pageState: the current page state
+
+  - verify: |
+      Use $database to verify that the orderId from the output named createOrder exists.
+
+  - agent: |
+      Analyze this test's risk from the output named createOrder, database verification result,
+      and current screenshot.
+```
+
+Here, `ui` still takes only natural-language input. `createOrder` is the output name requested in that natural-language instruction, and `orderId` is a field in that output. Later nodes can reference "the `orderId` from the output named `createOrder`" in natural language.
+
+External systems stay in natural language as well. `$database`, `$logs`, and other `$name` references are resolved by the runtime engine as skills. Pi Agent uses skill results together with previous node outputs and the current screenshot when executing `verify` or `agent`.
+
+A fuller case can look like this:
+
+```yaml
+name: Create Order
+
+flow:
+  - prepareOrderFixture:
+      scenario: paid-order
+  - ui: |
+      Sign in with a test account and create a test order.
+
+      Record in the conclusion:
+      - order id
+      - current page state
+      - whether the order was created successfully
+  - verify: |
+      Use $database to verify that the order id from the previous conclusion
+      really exists and that the order status is paid.
+  - verify: |
+      Use $logs to check whether any related ERROR appeared during the test.
+  - verify: The order detail page shows payment success
+  - agent: Analyze the risk of this test from all verification results
+  - notifySlack
+```
+
+In this example, `ui` creates the order and records order information; `verify` uses `$database` and `$logs` for external checks and returns a pass or fail judgment; `agent` summarizes the verification results and current screenshot; `notifySlack` is a custom node added later through runtime.
+
+There are two kinds of extension here: `$database` and `$logs` are skill references for external context; `prepareOrderFixture` and `notifySlack` are new YAML nodes registered by the project in `midscene.config.ts`.
 
-In professional UI testing projects, the case steps are usually the clearest part: open a page, perform a user path, and check the expected result. The engineering complexity tends to sit around that path, in project configuration such as which cases to run, how much concurrency to use, where reports should be written, login state, cookies, test accounts, staging lanes, backend data, cloud device connections, and device initialization.
+## Extension and Integration
 
-Those concerns should not be copied into every YAML case. Midscene provides `midscene.config.ts` as the project-level config-as-code entry. It replaces the lightweight `config.yml` shape when a suite grows up: test discovery, execution policy, output, UI Agent creation, and runtime extensions all live in one typed file.
+As a project grows from lightweight cases into a long-lived regression suite, engineering complexity should move into configuration and extension layers instead of being copied into every YAML file. Midscene provides `midscene.config.ts` as the project-level config-as-code entry for test discovery, execution policy, output, UI Agent creation, and runtime extensions.
 
 ```ts
 import { defineMidsceneConfig } from '@midscene/testing-framework';
@@ -69,7 +126,7 @@ export default defineMidsceneConfig({
   },
 
   testDir: './e2e',
-  include: ['**/*.yaml', '**/*.test.ts'],
+  include: ['**/*.yaml'],
   exclude: ['**/*.draft.yaml'],
 
   testRunner: {
@@ -101,9 +158,7 @@ With this config in place, the project can stay direct:
     checkout.yaml
 ```
 
-The boundary is clear: `e2e/*.yaml` describes what the user should accomplish, while `midscene.config.ts` describes the target type and platform connection options, testRunner behavior, shared UI Agent options, and reporting. By default, the framework creates the UI Agent from `target.type` and `target.options`, so ordinary projects do not need to write `createUIAgent`. If a project needs custom devices, remote services, or internal team fixtures, it can create the UI Agent entirely inside `createUIAgent`; in that case, `target` can be omitted to avoid defining the runtime target twice in the same config. The project can handle real engineering constraints, while the case itself remains friendly to code review, business confirmation, and team collaboration.
-
-When a project really needs a fully custom runtime target, it can put UI Agent creation in `createUIAgent`:
+`e2e/*.yaml` describes what the user should accomplish, while `midscene.config.ts` describes the target type and platform connection options, testRunner behavior, shared UI Agent options, and reporting. By default, the framework creates the UI Agent from `target.type` and `target.options`. If a project needs custom devices, remote services, or internal team fixtures, it can create the UI Agent entirely inside `createUIAgent` and omit `target` to avoid defining the runtime target twice.
 
 ```ts
 import { agentFromAdbDevice } from '@midscene/android';
@@ -133,125 +188,7 @@ export default defineMidsceneConfig({
 });
 ```
 
-### Extend YAML Nodes with Runtime
-
-YAML should stay readable, but some teams need a few project-specific actions around the natural-language path: seeding test data, switching accounts, calling an internal API, sending notifications, or wrapping a repeated business action. These actions can be registered as custom runtimes in `midscene.config.ts` and used as new YAML nodes.
-
-```ts
-import {
-  defineMidsceneConfig,
-  defineRuntime,
-} from '@midscene/testing-framework';
-
-export default defineMidsceneConfig({
-  target: {
-    type: 'web',
-    options: {
-      url: 'http://127.0.0.1:3000',
-    },
-  },
-
-  testDir: './e2e',
-  include: ['**/*.yaml'],
-
-  runtime: {
-    useAccount: defineRuntime(async ({ input, context }) => {
-      const value = input;
-      if (typeof value !== 'string') {
-        throw new Error('useAccount expects an account name');
-      }
-
-      const account = await getAccount(value);
-      context.state.account = account;
-      await signInWithAccount(context.agent, account);
-
-      return {
-        conclusion: `Signed in as ${account.email}`,
-      };
-    }),
-  },
-});
-```
-
-The YAML file can then mix Midscene built-in steps with project steps:
-
-```yaml
-flow:
-  - useAccount: smoke-buyer
-  - ui: Open my orders
-  - verify: |
-      Use $database to verify that order E2E-10001 has status paid.
-  - verify: The order detail page shows Paid
-```
-
-The rule is intentionally small: `ui` is the unified entry for interface behavior, `verify` is the unified entry for assertions and external evidence, and `agent` is the entry for analysis, attribution, and cross-tool orchestration. A `$name` such as `$database` refers to a skill, and the runtime engine resolves and loads it automatically. `runtime` is for extending YAML with new nodes such as `useAccount` based on project needs. Teams decide which nodes to add and how many to add according to their own business and engineering habits. Midscene does not require a schema for custom node values; teams can validate values inside runtime handlers in whatever style fits their project.
-
-### `verify` and `agent`
-
-`verify` and `agent` nodes are executed by Midscene's built-in Pi Agent. They are not new UI operation entries; they verify, analyze, and attribute results from the current test context. They use the same kind of Agent capability, but `verify` carries test judgment semantics: it must decide pass or fail, and a failed verification fails the current case. `agent` is better suited for summaries, analysis, attribution, and recommendations.
-
-When Pi Agent executes these nodes, it sees two kinds of context:
-
-- The output of each previous node, such as conclusions recorded by `ui` nodes or `conclusion` values returned by runtime nodes.
-- The current screenshot, so it can understand the current page or screen state.
-
-It does not see the full execution process of previous nodes. For example, a `ui` node may click, type, and retry several times to create an order, but later `verify` / `agent` nodes only see that node's final output and the current screenshot.
-
-If a later step needs a variable, ask the earlier node to write it explicitly into its output:
-
-```yaml
-flow:
-  - ui: |
-      Create a test order.
-
-      Name this step's output createOrder, and record:
-      - orderId: the order id
-      - pageState: the current page state
-
-  - verify: |
-      Use $database to verify that the orderId from the output named createOrder exists.
-
-  - agent: |
-      Analyze this test's risk from the output named createOrder, database verification result,
-      and current screenshot.
-```
-
-Here, `ui` still takes only natural-language input. `createOrder` is the output name requested in that natural-language instruction, and `orderId` is a field in that output. Later nodes can reference “the `orderId` from the output named `createOrder`” in natural language. External capabilities still use skill references such as `$database`.
-
-### Toward Agentic Testing
-
-The UI is the entry point to user experience, and it is the most natural place for a test to begin. But many real failures do not stay on the page: an order page may look successful while no record exists in the database; a click may finish without a visible error while logs contain an exception; a user path may complete while analytics, network requests, or downstream state are wrong.
-
-Agentic Testing means starting from UI behavior and extending the test into a cross-tool process for collecting evidence and making a judgment. Midscene understands the page, operates the interface, and produces replayable reports, while the project can register databases, logs, network requests, analytics, notification systems, and internal tools as testing capabilities. YAML still starts with the user path, but verification can continue into system state and business risk.
-
-A fuller case can look like this:
-
-```yaml
-name: Create Order
-
-flow:
-  - prepareOrderFixture:
-      scenario: paid-order
-  - ui: |
-      Sign in with a test account and create a test order.
-
-      Record in the conclusion:
-      - order id
-      - current page state
-      - whether the order was created successfully
-  - verify: |
-      Use $database to verify that the order id from the previous conclusion
-      really exists and that the order status is paid.
-  - verify: |
-      Use $logs to check whether any related ERROR appeared during the test.
-  - verify: The order detail page shows payment success
-  - agent: Analyze the risk of this test from all verification results
-  - notifySlack
-```
-
-The core still starts from the UI mindset: how the user signs in, how the order is created, and what appears on the page. Projects can insert custom nodes for data preparation or notifications before or after the UI step; later steps pass the UI conclusion to project capabilities for deeper verification: whether the order exists in the database, whether error logs appeared during the test, and whether the final page state matches the business state.
-
-There are two kinds of capabilities here: `$database` and `$logs` are `$name` references that the runtime engine resolves and loads automatically; `prepareOrderFixture` and `notifySlack` are new YAML nodes registered by the project in `midscene.config.ts`:
+YAML can also gain new project-specific nodes. For example, `prepareOrderFixture` and `notifySlack` can be registered as custom runtimes:
 
 ```ts
 import {
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index ed799415e1..b70f376b98 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -10,13 +10,9 @@ Midscene 的设计围绕三个核心要点展开：
 
 Midscene 不是让团队在“轻量 YAML”和“严肃测试工程”之间二选一，而是让第一条 case 足够轻，同时让同一套表达方式继续长成长期回归套件。
 
-## 渐进式落地路径
+## 从简单 UI 任务开始
 
-基于这三个原则，Midscene 让项目可以从自然语言用例开始：先把核心路径写成可读、可运行、可回放的 YAML case；当项目需要更强的工程能力时，再接入用例过滤、并发控制、登录态、测试数据、设备连接、复杂断言、报告输出或 CI 管理。
-
-### 从 YAML 开始
-
-对于大多数 Smoke Test 和轻量回归项目，第一个有价值的里程碑，是快速跑通核心路径，并把它变成可以重复执行、可以回放分析的 case。
+Midscene 的第一步，是让团队用 YAML 把一个简单 UI 任务写清楚、跑起来、回放出来。对于大多数 Smoke Test 和轻量回归项目，第一个有价值的里程碑不是搭建复杂工程，而是把核心用户路径变成可读、可重复执行、可分析的 case。
 
 YAML case 可以让路径保持可读：
 
@@ -49,11 +45,70 @@ YAML 可以把“一个用户路径应该是什么样”组织得足够清楚，
 
 用例本身仍然接近业务语言，runner 则提供可重复执行的过程，以及成功或失败后都可以检查的报告。
 
-### 用 TypeScript 配置项目
+## 用 `verify` 和 `agent` 连接外部能力
+
+`verify` 和 `agent` 节点由 Midscene 内置的 Pi Agent 执行。它们不是新的 UI 操作入口，而是基于当前测试上下文做验证、分析和归因。两者本质上使用同一类 Agent 能力，区别在于 `verify` 带有测试判定语义：它需要给出通过或不通过的判断，不通过时会让当前 case 失败；`agent` 更适合做总结、分析、归因和建议。
+
+Pi Agent 在执行这些节点时会看到两类上下文：
+
+- 前序每个节点的输出，例如 `ui` 节点记录的结论、runtime 节点返回的 `conclusion`。
+- 当前截图，用来理解此刻页面或屏幕上的状态。
+
+它不会看到前序节点的完整执行过程。比如一个 `ui` 节点为了创建订单可能经历了多次点击、输入和重试，后续 `verify` / `agent` 只能看到这个节点最终输出了什么，以及当前截图是什么样。
+
+因此，如果后续步骤需要使用某个变量，应该让前面的节点把它明确写进输出里：
+
+```yaml
+flow:
+  - ui: |
+      创建一笔测试订单。
+
+      将这一步的输出命名为 createOrder，并记录：
+      - orderId: 订单号
+      - pageState: 当前页面状态
+
+  - verify: |
+      使用 $database 验证名为 createOrder 的输出中的 orderId 是否真实存在。
+
+  - agent: |
+      根据名为 createOrder 的输出、数据库验证结果和当前截图，分析本次测试风险。
+```
+
+这里的 `ui` 仍然只有自然语言输入。`createOrder` 是这段自然语言要求 Pi Agent 记录的输出名称，`orderId` 是该输出里的字段。后续节点可以直接用自然语言引用“名为 `createOrder` 的输出中的 `orderId`”。
+
+对外部系统的引用也保持在自然语言里。`$database`、`$logs` 这样的 `$name` 会被运行时引擎解析为对应 skill；Pi Agent 会把 skill 结果、前序节点输出和当前截图一起用于 `verify` 或 `agent`。
+
+一个更完整的 case 可以长成这样：
+
+```yaml
+name: Create Order
+
+flow:
+  - prepareOrderFixture:
+      scenario: paid-order
+  - ui: |
+      使用测试账号登录系统，创建一笔测试订单。
+
+      在结论中记录：
+      - 订单号
+      - 当前页面状态
+      - 是否创建成功
+  - verify: |
+      使用 $database 验证前面结论中的订单号是否真实存在，且订单状态是 paid。
+  - verify: |
+      使用 $logs 检查测试期间是否出现相关 ERROR。
+  - verify: 订单详情页展示支付成功
+  - agent: 根据所有验证结果分析本次测试风险
+  - notifySlack
+```
+
+这个例子里，`ui` 负责创建订单并输出订单信息；`verify` 用 `$database` 和 `$logs` 做外部验证，并给出通过或不通过的判断；`agent` 汇总验证结果和当前截图；`notifySlack` 是后面通过 runtime 扩展出来的自定义节点。
+
+这里有两类扩展方式：`$database` 和 `$logs` 这样的 `$name` 引用用来连接外部 skill；`prepareOrderFixture` 和 `notifySlack` 则是项目在 `midscene.config.ts` 里扩展出来的新 YAML 节点。
 
-在专业 UI Test 项目里，用例步骤通常是最清楚的部分：打开页面、执行用户路径、检查预期结果。真正的工程复杂度往往出现在用户路径周边，集中在要运行哪些 case、并发怎么控制、报告写到哪里、登录态、Cookie、测试账号、灰度环境、后端测试数据、云端设备连接和设备初始化这些项目配置里。
+## 扩展和集成能力
 
-这些内容不应该复制到每个 YAML case 里。Midscene 提供 `midscene.config.ts` 作为项目级 config-as-code 入口。它是轻量 `config.yml` 形态长大之后的替代：用例发现、执行策略、输出位置、UI Agent 创建和运行时扩展都放进同一个类型化文件里。
+当项目从轻量 case 长成长期回归套件时，工程复杂度应该进入配置和扩展层，而不是塞回每个 YAML 文件。Midscene 提供 `midscene.config.ts` 作为项目级 config-as-code 入口，用来管理用例发现、执行策略、输出位置、UI Agent 创建和 runtime 扩展。
 
 ```ts
 import { defineMidsceneConfig } from '@midscene/testing-framework';
@@ -69,7 +124,7 @@ export default defineMidsceneConfig({
   },
 
   testDir: './e2e',
-  include: ['**/*.yaml', '**/*.test.ts'],
+  include: ['**/*.yaml'],
   exclude: ['**/*.draft.yaml'],
 
   testRunner: {
@@ -101,9 +156,7 @@ export default defineMidsceneConfig({
     checkout.yaml
 ```
 
-这里的职责边界很清楚：`e2e/*.yaml` 描述用户要完成什么，`midscene.config.ts` 描述 target 类型和平台连接参数、testRunner 行为、共享 UI Agent 参数和报告。默认情况下，框架会根据 `target.type` 和 `target.options` 创建 UI Agent，普通项目不需要编写 `createUIAgent`。如果项目需要接入自定义设备、远程服务或团队内部 fixture，可以在 `createUIAgent` 里完全自行创建 UI Agent；这时 `target` 可以省略，避免同一份配置里出现两套运行目标定义。这样项目可以处理真实工程问题，但用例本身仍然适合 code review、业务确认和团队协作。
-
-当项目确实需要完全自定义运行目标时，可以把 UI Agent 创建逻辑放进 `createUIAgent`：
+`e2e/*.yaml` 描述用户要完成什么，`midscene.config.ts` 描述 target 类型和平台连接参数、testRunner 行为、共享 UI Agent 参数和报告。默认情况下，框架会根据 `target.type` 和 `target.options` 创建 UI Agent；如果项目需要接入自定义设备、远程服务或团队内部 fixture，可以在 `createUIAgent` 里完全自行创建 UI Agent，并省略 `target`，避免同一份配置里出现两套运行目标定义。
 
 ```ts
 import { agentFromAdbDevice } from '@midscene/android';
@@ -133,123 +186,7 @@ export default defineMidsceneConfig({
 });
 ```
 
-### 用 Runtime 扩展 YAML 节点
-
-YAML 应该保持可读，但有些团队需要在自然语言路径周围放少量项目专属动作：准备测试数据、切换账号、调用内部 API、发送通知，或封装重复的业务动作。这类动作可以在 `midscene.config.ts` 里注册成自定义 runtime，并作为新的 YAML 节点使用。
-
-```ts
-import {
-  defineMidsceneConfig,
-  defineRuntime,
-} from '@midscene/testing-framework';
-
-export default defineMidsceneConfig({
-  target: {
-    type: 'web',
-    options: {
-      url: 'http://127.0.0.1:3000',
-    },
-  },
-
-  testDir: './e2e',
-  include: ['**/*.yaml'],
-
-  runtime: {
-    useAccount: defineRuntime(async ({ input, context }) => {
-      const value = input;
-      if (typeof value !== 'string') {
-        throw new Error('useAccount expects an account name');
-      }
-
-      const account = await getAccount(value);
-      context.state.account = account;
-      await signInWithAccount(context.agent, account);
-
-      return {
-        conclusion: `Signed in as ${account.email}`,
-      };
-    }),
-  },
-});
-```
-
-YAML 文件可以混用 Midscene 内置 step 和项目自己的 step：
-
-```yaml
-flow:
-  - useAccount: smoke-buyer
-  - ui: Open my orders
-  - verify: |
-      使用 $database 验证订单 E2E-10001 的状态是 paid。
-  - verify: The order detail page shows Paid
-```
-
-规则刻意保持很小：`ui` 是面向界面行为的统一入口，`verify` 是面向断言和外部证据的统一入口，`agent` 是面向分析、归因和跨工具编排的入口。`$database` 这样的 `$name` 表示引用某个 skill，运行时引擎会自动解析和读取；`runtime` 则用于按项目需要扩展新的 YAML 节点，例如 `useAccount`。团队可以根据自己的业务和工程习惯决定要扩展哪些节点、扩展多少节点。Midscene 不要求为自定义节点 value 编写 schema；团队可以在 runtime handler 里按自己的工程习惯做参数校验。
-
-### `verify` 和 `agent`
-
-`verify` 和 `agent` 节点由 Midscene 内置的 Pi Agent 执行。它们不是新的 UI 操作入口，而是基于当前测试上下文做验证、分析和归因。两者本质上使用同一类 Agent 能力，区别在于 `verify` 带有测试判定语义：它需要给出通过或不通过的判断，不通过时会让当前 case 失败；`agent` 更适合做总结、分析、归因和建议。
-
-Pi Agent 在执行这些节点时会看到两类上下文：
-
-- 前序每个节点的输出，例如 `ui` 节点记录的结论、runtime 节点返回的 `conclusion`。
-- 当前截图，用来理解此刻页面或屏幕上的状态。
-
-它不会看到前序节点的完整执行过程。比如一个 `ui` 节点为了创建订单可能经历了多次点击、输入和重试，后续 `verify` / `agent` 只能看到这个节点最终输出了什么，以及当前截图是什么样。
-
-因此，如果后续步骤需要使用某个变量，应该让前面的节点把它明确写进输出里：
-
-```yaml
-flow:
-  - ui: |
-      创建一笔测试订单。
-
-      将这一步的输出命名为 createOrder，并记录：
-      - orderId: 订单号
-      - pageState: 当前页面状态
-
-  - verify: |
-      使用 $database 验证名为 createOrder 的输出中的 orderId 是否真实存在。
-
-  - agent: |
-      根据名为 createOrder 的输出、数据库验证结果和当前截图，分析本次测试风险。
-```
-
-这里的 `ui` 仍然只有自然语言输入。`createOrder` 是这段自然语言要求 Pi Agent 记录的输出名称，`orderId` 是该输出里的字段。后续节点可以直接用自然语言引用“名为 `createOrder` 的输出中的 `orderId`”。如果要调用外部能力，仍然使用 `$database` 这样的 skill 引用。
-
-### 走向 Agentic Testing
-
-UI 是用户体验的入口，也是最自然的测试起点。但很多真实缺陷并不会只停留在页面上：订单页面可能显示成功，数据库里却没有记录；按钮点击没有报错，日志里却出现了异常；用户路径看似完成，埋点、网络请求或下游状态却不符合预期。
-
-Agentic Testing 的核心理念是：从 UI 行为出发，把测试扩展成一次跨工具的证据收集和判断过程。Midscene 负责理解页面、操作界面和产出可回放报告，同时允许项目把数据库、日志、网络请求、埋点、通知系统和内部工具注册为测试能力。这样 YAML 仍然从用户路径开始，但验证可以继续深入到系统状态和业务风险。
-
-一个更完整的 case 可以长成这样：
-
-```yaml
-name: Create Order
-
-flow:
-  - prepareOrderFixture:
-      scenario: paid-order
-  - ui: |
-      使用测试账号登录系统，创建一笔测试订单。
-
-      在结论中记录：
-      - 订单号
-      - 当前页面状态
-      - 是否创建成功
-  - verify: |
-      使用 $database 验证前面结论中的订单号是否真实存在，且订单状态是 paid。
-  - verify: |
-      使用 $logs 检查测试期间是否出现相关 ERROR。
-  - verify: 订单详情页展示支付成功
-  - agent: 根据所有验证结果分析本次测试风险
-  - notifySlack
-```
-
-这里的核心仍然从 UI 切入：用户如何登录、如何创建订单、页面上发生了什么。项目可以在 UI 步骤前后插入准备数据、发送通知等自定义节点；后续步骤则把 UI 结论交给项目自己的能力继续验证：订单是否真的进入数据库，测试期间是否有错误日志，最终页面状态是否和业务状态一致。
-
-这里有两类能力：`$database` 和 `$logs` 这样的 `$name` 引用由运行时引擎自动解析和读取；`prepareOrderFixture` 和 `notifySlack` 则是项目在 `midscene.config.ts` 里扩展出来的新 YAML 节点：
+YAML 也可以按项目需要扩展新的节点。比如 `prepareOrderFixture` 和 `notifySlack` 可以注册成自定义 runtime：
 
 ```ts
 import {

From 9c50ba2c6e3b827f2b3f7a94e4c7c79e7a3d669c Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Wed, 3 Jun 2026 12:48:56 -0700
Subject: [PATCH 21/33] docs(site): refine agent node and pi agent description

---
 apps/site/docs/en/ui-testing-framework.mdx | 21 ++++++++++++++-------
 apps/site/docs/zh/ui-testing-framework.mdx | 21 ++++++++++++++-------
 2 files changed, 28 insertions(+), 14 deletions(-)
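
The semantic difference between `verify` and `agent` described in this patch can be sketched as a small model. The type `NodeResult` and the function `caseStatus` are hypothetical and exist only for illustration. The sketch also assumes that `agent` output never changes the case status, which the text implies ("no fixed judgment semantics") but does not state outright.

```typescript
// Hypothetical model; these types are not part of Midscene's API.
type NodeResult =
  | { kind: 'verify'; passed: boolean; reason: string }
  | { kind: 'agent'; report: string };

// A case fails if any verify node reports passed === false.
// Assumption: agent results only contribute reports and never affect pass/fail.
function caseStatus(results: NodeResult[]): 'passed' | 'failed' {
  const failedVerify = results.some(
    (result) => result.kind === 'verify' && !result.passed,
  );
  return failedVerify ? 'failed' : 'passed';
}
```

For example, a flow with one passing `verify` and one `agent` report yields `'passed'` under this model, while a single failing `verify` yields `'failed'` regardless of what `agent` reports.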

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index 666fd8a02a..ad64453607 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -47,7 +47,19 @@ The case remains close to business language, while the runner gives it a repeata
 
 ## Connect External Context with `verify` and `agent`
 
-`verify` and `agent` nodes are executed by Midscene's built-in Pi Agent. They are not new UI operation entries; they verify, analyze, and attribute results from the current test context. They use the same kind of Agent capability, but `verify` carries test judgment semantics: it must decide pass or fail, and a failed verification fails the current case. `agent` is better suited for summaries, analysis, attribution, and recommendations.
+`verify` and `agent` nodes are executed by Midscene's built-in Pi Agent. Pi is the lightweight agent framework used by OpenClaw (see [earendil-works/pi](https://github.com/earendil-works/pi)); Midscene embeds it to run these nodes. Neither node is a new entry point for UI operations; both work from the current test context and use the same kind of Agent capability, but they differ in semantics. `verify` carries test judgment semantics: it must decide pass or fail, and a failed verification fails the current case. `agent` is a free-running agent with no fixed judgment semantics: given a natural-language instruction, it can summarize, attribute, investigate further, propose follow-ups, and decide on its own what to inspect or analyze next.
+
+For example, you can let `agent` freely probe the current page for potential issues:
+
+```yaml
+flow:
+  - ui: Open the checkout page
+  - agent: |
+      Freely inspect the current checkout flow and find anything that looks off:
+      copy, prices, button states, potential usability problems.
+
+      List your findings with likely causes and follow-up suggestions.
+```
 
 When Pi Agent executes these nodes, it sees two kinds of context:
 
@@ -142,7 +154,6 @@ export default defineMidsceneConfig({
 
   uiAgentOptions: {
     aiActContext: 'The user is already signed in as a smoke-test account.',
-    cache: true,
     generateReport: true,
   },
 });
@@ -158,7 +169,7 @@ With this config in place, the project can stay direct:
     checkout.yaml
 ```
 
-`e2e/*.yaml` describes what the user should accomplish, while `midscene.config.ts` describes the target type and platform connection options, testRunner behavior, shared UI Agent options, and reporting. By default, the framework creates the UI Agent from `target.type` and `target.options`. If a project needs custom devices, remote services, or internal team fixtures, it can create the UI Agent entirely inside `createUIAgent` and omit `target` to avoid defining the runtime target twice.
+`e2e/*.yaml` describes what the user should accomplish, while `midscene.config.ts` describes the target type and platform connection options, testRunner behavior, shared UI Agent options, and reporting. By default, the framework creates the UI Agent from `target.type` and `target.options`. If a project needs custom devices, remote services, or custom agent construction logic, it can create the UI Agent entirely inside `createUIAgent` and omit `target` to avoid defining the runtime target twice.
 
 ```ts
 import { agentFromAdbDevice } from '@midscene/android';
@@ -169,14 +180,10 @@ export default defineMidsceneConfig({
 
   uiAgentOptions: {
     aiActContext: 'The user is already signed in as a smoke-test account.',
-    cache: true,
     generateReport: true,
   },
 
   async createUIAgent({ uiAgentOptions }) {
-    const account = await getSmokeAccount();
-    await prepareTestData(account);
-
     return {
       agent: await agentFromAdbDevice(process.env.ANDROID_DEVICE_ID, {
         ...uiAgentOptions,
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index b70f376b98..0f43e4bc1a 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -47,7 +47,19 @@ YAML 可以把“一个用户路径应该是什么样”组织得足够清楚，
 
 ## 用 `verify` 和 `agent` 连接外部能力
 
-`verify` 和 `agent` 节点由 Midscene 内置的 Pi Agent 执行。它们不是新的 UI 操作入口，而是基于当前测试上下文做验证、分析和归因。两者本质上使用同一类 Agent 能力，区别在于 `verify` 带有测试判定语义：它需要给出通过或不通过的判断，不通过时会让当前 case 失败；`agent` 更适合做总结、分析、归因和建议。
+`verify` 和 `agent` 节点由 Midscene 内置的 Pi Agent 执行。Pi 是 OpenClaw 采用的轻量 Agent 框架（参见 [earendil-works/pi](https://github.com/earendil-works/pi)），Midscene 内置它来执行这些节点。它们不是新的 UI 操作入口，而是基于当前测试上下文做判断或自由探索。两者本质上使用同一类 Agent 能力，区别在于语义：`verify` 带有测试判定语义，需要给出通过或不通过的判断，不通过时会让当前 case 失败；`agent` 则是一个自由运行的 Agent，没有固定的判定语义，可以在当前测试上下文里发挥创造力——总结、归因、深入排查、提出后续建议，甚至按自然语言要求自行决定接下来该看什么、分析什么。
+
+比如，可以让 `agent` 在当前页面上自由探查潜在问题：
+
+```yaml
+flow:
+  - ui: 打开结账页面
+  - agent: |
+      自由检查当前结账流程，找出任何看起来不合理的地方：
+      文案、价格、按钮状态、潜在的可用性问题。
+
+      列出你的发现，并给出可能的原因和后续建议。
+```
 
 Pi Agent 在执行这些节点时会看到两类上下文：
 
@@ -140,7 +152,6 @@ export default defineMidsceneConfig({
 
   uiAgentOptions: {
     aiActContext: 'The user is already signed in as a smoke-test account.',
-    cache: true,
     generateReport: true,
   },
 });
@@ -156,7 +167,7 @@ export default defineMidsceneConfig({
     checkout.yaml
 ```
 
-`e2e/*.yaml` 描述用户要完成什么，`midscene.config.ts` 描述 target 类型和平台连接参数、testRunner 行为、共享 UI Agent 参数和报告。默认情况下，框架会根据 `target.type` 和 `target.options` 创建 UI Agent；如果项目需要接入自定义设备、远程服务或团队内部 fixture，可以在 `createUIAgent` 里完全自行创建 UI Agent，并省略 `target`，避免同一份配置里出现两套运行目标定义。
+`e2e/*.yaml` 描述用户要完成什么，`midscene.config.ts` 描述 target 类型和平台连接参数、testRunner 行为、共享 UI Agent 参数和报告。默认情况下，框架会根据 `target.type` 和 `target.options` 创建 UI Agent；如果项目需要接入自定义设备、远程服务或自定义的 Agent 构造逻辑，可以在 `createUIAgent` 里完全自行创建 UI Agent，并省略 `target`，避免同一份配置里出现两套运行目标定义。
 
 ```ts
 import { agentFromAdbDevice } from '@midscene/android';
@@ -167,14 +178,10 @@ export default defineMidsceneConfig({
 
   uiAgentOptions: {
     aiActContext: 'The user is already signed in as a smoke-test account.',
-    cache: true,
     generateReport: true,
   },
 
   async createUIAgent({ uiAgentOptions }) {
-    const account = await getSmokeAccount();
-    await prepareTestData(account);
-
     return {
       agent: await agentFromAdbDevice(process.env.ANDROID_DEVICE_ID, {
         ...uiAgentOptions,

From 267b88771575666f106705c72c01cdde0428f42a Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Wed, 3 Jun 2026 14:45:36 -0700
Subject: [PATCH 22/33] docs(site): clarify v2 framework positioning and Pi
 context contract

- Mark this as a new v2 framework, distinct from the existing YAML player
- Reframe verify/agent: verify is the deterministic gate; agent is
  exploratory and non-deterministic, advisory only (no pass/fail effect)
- Present Pi as a swappable agent layer aligned with the community direction
- Define the Pi context contract: past steps + their outputs + current UI,
  nothing more; clarify naming, skill-result lifetime, and output vs
  context.state channels
- Rewrite the Rstest section to emphasize lifecycle/fixtures/concurrency
  over raw runtime speed
---
 apps/site/docs/en/ui-testing-framework.mdx | 55 +++++++++++++++-------
 apps/site/docs/zh/ui-testing-framework.mdx | 53 ++++++++++++++-------
 2 files changed, 73 insertions(+), 35 deletions(-)

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index ad64453607..f635e2264d 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -2,6 +2,12 @@
 
 The hard part of UI testing is not writing the first browser script. It is keeping the suite readable, writable, and maintainable as the product changes. Traditional scripts quickly fill with selectors, waits, login helpers, data setup, and failure screenshots, until only a few test specialists can tell what they actually verify.
 
+:::info This is a brand-new v2 framework
+
+This page describes Midscene's newly designed v2 testing framework. It is a separate framework whose authoring model and positioning differ from the existing YAML player. This page covers only the new framework; migration from and compatibility with the older version are out of scope.
+
+:::
+
 Midscene is designed around three core ideas:
 
 - Cases must stay readable. Test authors write natural-language user paths in YAML, so QA, business teams, and engineers can review the case itself instead of first decoding a script implementation.
@@ -47,7 +53,12 @@ The case remains close to business language, while the runner gives it a repeata
 
 ## Connect External Context with `verify` and `agent`
 
-`verify` and `agent` nodes are executed by Midscene's built-in Pi Agent. Pi is the lightweight agent framework used by OpenClaw (see [earendil-works/pi](https://github.com/earendil-works/pi)); Midscene embeds it to run these nodes. They are not new UI operation entries; they make judgments or explore freely from the current test context. They use the same kind of Agent capability, but differ in semantics: `verify` carries test judgment semantics — it must decide pass or fail, and a failed verification fails the current case. `agent` is a free-running agent with no fixed judgment semantics; it can be creative within the current test context — summarizing, attributing, investigating deeper, proposing follow-ups, and even deciding on its own what to look at or analyze next from a natural-language instruction.
+`verify` and `agent` nodes are not new entry points for UI operations; they make judgments or explore freely from the current test context. The split is deliberate: Midscene itself focuses on UI capabilities (`ui` nodes are executed by Midscene's UI Agent), while the nodes that need reasoning, orchestration, and external context (`verify` and `agent`) are handed to a **swappable, general-purpose agent framework**. The current built-in implementation is Pi, the lightweight agent framework used by OpenClaw (see [earendil-works/pi](https://github.com/earendil-works/pi)). This layer is intentionally swappable: it may later be replaced with the Codex Agent SDK or other community options, so Midscene's testing capabilities evolve alongside the community agent ecosystem instead of being locked to one implementation.
+
+`verify` and `agent` use the same kind of Agent capability, but differ in **semantics** and in their **effect** on the test conclusion:
+
+- `verify` carries test judgment semantics: it must decide pass or fail, and a failed verification fails the current case. It is the test's **deterministic gate** — the part a regression suite actually uses to gate CI.
+- `agent` is a free-running agent with no fixed judgment semantics. It leaves **room for creativity and exploration**: summarizing, attributing, investigating deeper, proposing follow-ups, and even deciding on its own what to look at or analyze next from a natural-language instruction. That freedom has a cost: its output is inherently non-deterministic, and the same case can surface different observations across two runs. So by default `agent` **does not participate in the case's pass/fail decision**; it produces human-facing diagnostics and suggestions, not regression assertions. When you need a stable, reproducible verdict, use `verify`; when you want the test to add a layer of exploration and insight beyond the UI, use `agent`.
 
 For example, you can let `agent` freely probe the current page for potential issues:
 
@@ -61,14 +72,15 @@ flow:
       List your findings with likely causes and follow-up suggestions.
 ```
 
-When Pi Agent executes these nodes, it sees two kinds of context:
+Every flow step produces an output. This forms an explicit **context contract**: when Pi Agent executes a `verify` or `agent` node, it can see only the following:
 
-- The output of each previous node, such as conclusions recorded by `ui` nodes or `conclusion` values returned by runtime nodes.
-- The current screenshot, so it can understand the current page or screen state.
+- **Every previous step itself**, that is, what each step was asked to do (its intent).
+- **The output of each previous step**, such as conclusions recorded by `ui` nodes or `conclusion` values returned by runtime nodes.
+- **The current UI screenshot**, so it can understand the current page or screen state.
 
-It does not see the full execution process of previous nodes. For example, a `ui` node may click, type, and retry several times to create an order, but later `verify` / `agent` nodes only see that node's final output and the current screenshot.
+Nothing else. It does not see the full execution process of previous nodes: a `ui` node may click, type, and retry several times to create an order, but later `verify` / `agent` nodes only see **what that node finally output**. It also cannot see historical screenshots — only the current one.
 
-If a later step needs a variable, ask the earlier node to write it explicitly into its output:
+This yields one rule that holds throughout: **the only channel that carries anything forward is the output.** If a later step needs something, the earlier step must write it explicitly into its own output:
 
 ```yaml
 flow:
@@ -87,9 +99,9 @@ flow:
       and current screenshot.
 ```
 
-Here, `ui` still takes only natural-language input. `createOrder` is the output name requested in that natural-language instruction, and `orderId` is a field in that output. Later nodes can reference "the `orderId` from the output named `createOrder`" in natural language.
+Here, `ui` still takes only natural-language input. `createOrder` is the output name requested in that natural-language instruction, and `orderId` is a field in that output. Note that every previous step's output is already in context, so an output does **not** need a name in order to be passed forward; the name exists so that one specific output can be referred to **unambiguously** among many. Later nodes can then reference "the `orderId` from the output named `createOrder`" in natural language.
 
-External systems stay in natural language as well. `$database`, `$logs`, and other `$name` references are resolved by the runtime engine as skills. Pi Agent uses skill results together with previous node outputs and the current screenshot when executing `verify` or `agent`.
+External systems stay in natural language as well. `$database`, `$logs`, and other `$name` references are resolved by the runtime engine as skills. Pi Agent uses skill results together with previous step outputs and the current screenshot for **that single** `verify` or `agent` run. But note: **a skill result belongs only to that run** and does not automatically enter the context of later nodes. If a later step needs it, the current node must write it into its own output.
 
 A fuller case can look like this:
 
@@ -118,7 +130,7 @@ flow:
 
 In this example, `ui` creates the order and records order information; `verify` uses `$database` and `$logs` for external checks and returns a pass or fail judgment; `agent` summarizes the verification results and current screenshot; `notifySlack` is a custom node added later through runtime.
 
-There are two kinds of extension here: `$database` and `$logs` are skill references for external context; `prepareOrderFixture` and `notifySlack` are new YAML nodes registered by the project in `midscene.config.ts`.
+The two kinds of extension here are **layered**, not competing. `$name` + skill is the **lightweight integration layer**: a reference such as `$database` or `$logs` only needs a registered skill, after which it can be used directly in natural language at very low cost. `defineRuntime` (as used for `prepareOrderFixture` and `notifySlack`) is the **lower-level extension** for defining standalone YAML nodes that own a whole step's execution. Use a `$name` skill when you just need to feed external context into `verify` / `agent`; use `defineRuntime` when you need full control over how a step runs.
 
 ## Extension and Integration
 
@@ -195,7 +207,7 @@ export default defineMidsceneConfig({
 });
 ```
 
-YAML can also gain new project-specific nodes. For example, `prepareOrderFixture` and `notifySlack` can be registered as custom runtimes:
+YAML can also gain new project-specific nodes. Compared with the lightweight `$name` skill integration, `defineRuntime` is the lower-level extension: it defines standalone YAML nodes that own a whole step's execution. For example, `prepareOrderFixture` and `notifySlack` can be registered as custom runtimes:
 
 ```ts
 import {
@@ -234,23 +246,30 @@ export default defineMidsceneConfig({
 });
 ```
 
+A runtime node has two channels, matching the context contract above — keep them distinct:
+
+- The `conclusion` in the return value is the **context-facing output**: like any other step's output, it enters the context of later `verify` / `agent` nodes.
+- `context.state` (such as `context.state.orderFixture`) is **engineering-facing TypeScript state** for passing structured data between runtime nodes, and **does not enter Pi Agent's context**. In other words, the agent cannot see `context.state`, only `conclusion`. To make a value available to a later `verify` / `agent`, put it in `conclusion`.
+
 This direction keeps the low-friction YAML-driven UI testing model intact. YAML remains the human-facing expression for the test, and TypeScript config remains the engineering entry for registering capabilities: ordinary paths stay in natural language, while places that need deterministic evidence can connect to the team's own tools.
 
 ## Built on Rstest
 
-Midscene is built as a higher-level testing framework on top of Rstest. Rstest provides the underlying lifecycle, fixture model, parallel execution, filtering, and CI-friendly runtime. It is also written in Rust for strong execution performance, so Midscene users get a high-performance test foundation by default. Midscene wraps those capabilities with natural-language cases, AI UI actions, visual assertions, screenshots, replay reports, and diagnostics.
+Midscene is built as a higher-level testing framework on top of Rstest. For an AI-driven UI testing framework, the runner's speed is not the real value, because each node's duration is dominated by model inference. What matters is whether the runner reliably provides what test engineering needs: lifecycle, fixtures, concurrency, filtering, failure reporting, and CI integration. Rstest provides these at the base layer, and Midscene wraps them with natural-language cases, AI UI actions, visual assertions, screenshots, replay reports, and diagnostics.
 
 Most users can rely on that foundation through Midscene's YAML runner and `midscene.config.ts` without learning Rstest project details. The `midscene.config.ts` fields are intentionally aligned with Rstest concepts such as include/exclude, maxConcurrency, retry, timeout, setup, teardown, and reporters, while keeping Midscene-specific UI Agent creation in the same config.
 
-### Why Rstest Helps
+### What Rstest Provides
+
+Rstest gives the Midscene project a reliable test engineering base:
 
-Rstest gives the Midscene project a reliable engineering base:
+- **Standard test lifecycle**: setup / teardown / hooks give login setup, test-data initialization, and cleanup explicit attachment points instead of pushing them into every case.
+- **Fixture model**: declare shared prerequisites (accounts, device connections, fixture data) as reusable, composable fixtures, injected per case as needed.
+- **Concurrency and isolation**: cases can run concurrently, with the runner handling scheduling and isolation so a regression suite's total CI time stays manageable.
+- **Filtering and failure reporting**: filter cases by file, name, or tag, paired with standard failure reports for easy triage and reruns.
+- **Unified runtime model**: YAML cases, runtime nodes, and config extensions share the same underlying runtime model, so teams can start lightweight and grow into a long-lived regression suite without switching frameworks.
 
-- Case execution is backed by a standard test lifecycle, including setup, teardown, hooks, fixtures, and parallel execution.
-- The Rust-based runtime gives Midscene projects a performance-oriented execution layer by default.
-- Projects can naturally connect to CI, test filtering, failure reporting, and existing team testing habits.
-- YAML cases, runtime nodes, and config extensions share the same underlying runtime model.
-- Teams can start lightweight and still keep a path toward long-lived regression suites.
+Rstest also has good execution performance (it is powered by Rspack, whose core is written in Rust), but for Midscene users the mature test engineering capabilities above matter more than the runner's raw speed: in AI testing, most of the time is spent on model inference.
 
 ### Next Steps
 
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 0f43e4bc1a..50006809a1 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -2,6 +2,12 @@
 
 UI Test 真正的难点，不是写出第一条浏览器脚本，而是让团队长期愿意写、看得懂、维护得起。传统脚本很快会被选择器、等待逻辑、登录辅助函数、测试数据准备和失败截图塞满，最后只有少数测试工程师能理解它们到底在验证什么。
 
+:::info 这是全新的 v2 测试框架
+
+本文描述的是 Midscene 全新设计的 v2 测试框架——一套独立的新事物，它的表达方式和定位都与现有 YAML player 不同。本文只介绍这套新框架本身，不涉及与旧版本的迁移或兼容。
+
+:::
+
 Midscene 的设计围绕三个核心要点展开：
 
 - 用例必须可读。测试作者用 YAML 写自然语言用户路径，QA、业务同学和工程师都能直接 review case 本身，而不是先读懂一套脚本实现。
@@ -47,7 +53,12 @@ YAML 可以把“一个用户路径应该是什么样”组织得足够清楚，
 
 ## 用 `verify` 和 `agent` 连接外部能力
 
-`verify` 和 `agent` 节点由 Midscene 内置的 Pi Agent 执行。Pi 是 OpenClaw 采用的轻量 Agent 框架（参见 [earendil-works/pi](https://github.com/earendil-works/pi)），Midscene 内置它来执行这些节点。它们不是新的 UI 操作入口，而是基于当前测试上下文做判断或自由探索。两者本质上使用同一类 Agent 能力，区别在于语义：`verify` 带有测试判定语义，需要给出通过或不通过的判断，不通过时会让当前 case 失败；`agent` 则是一个自由运行的 Agent，没有固定的判定语义，可以在当前测试上下文里发挥创造力——总结、归因、深入排查、提出后续建议，甚至按自然语言要求自行决定接下来该看什么、分析什么。
+`verify` 和 `agent` 节点不是新的 UI 操作入口，而是基于当前测试上下文做判断或自由探索。这里有一个有意为之的分工：Midscene 自身专注 UI 能力（`ui` 节点由 Midscene 的 UI Agent 执行）；而 `verify` 和 `agent` 这类需要推理、编排、连接外部上下文的节点，交给一个**可替换的通用 Agent 框架**来执行。当前内置的是 Pi——OpenClaw 采用的轻量 Agent 框架（参见 [earendil-works/pi](https://github.com/earendil-works/pi)）。这一层刻意做成可替换的：未来也可能换成 Codex Agent SDK 等社区方案，让 Midscene 的测试能力跟随社区 Agent 生态一起演进，而不是绑死在某一个实现上。
+
+`verify` 和 `agent` 使用同一类 Agent 能力，区别在于**语义**，以及它们对测试结论的**影响**：
+
+- `verify` 带有测试判定语义：它必须给出通过或不通过的结论，不通过会让当前 case 失败。它是测试的**确定性闸门**，是回归套件真正用来 gate CI 的部分。
+- `agent` 是一个自由运行的 Agent，没有固定判定语义，强调的是**创造和想象的空间**——总结、归因、深入排查、提出后续建议，甚至按自然语言要求自行决定接下来该看什么、分析什么。也正因为这种自由，要正视它的另一面：它的输出天然带有不确定性，同一个 case 两次运行可能给出不同的观察。因此 `agent` 默认**不参与 case 的通过/失败判定**，它产出的是供人阅读的诊断与建议，而不是回归断言。需要稳定、可复现地卡住结论时，用 `verify`；想让测试在 UI 之外多一层探索和洞察时，用 `agent`。
 
 比如，可以让 `agent` 在当前页面上自由探查潜在问题：
 
@@ -61,14 +72,15 @@ flow:
       列出你的发现，并给出可能的原因和后续建议。
 ```
 
-Pi Agent 在执行这些节点时会看到两类上下文：
+每个 flow 步骤都有输出。这构成了一条明确的**上下文契约**：当 Pi Agent 执行某个 `verify` 或 `agent` 节点时，它能看到的全部就是——
 
-- 前序每个节点的输出，例如 `ui` 节点记录的结论、runtime 节点返回的 `conclusion`。
-- 当前截图，用来理解此刻页面或屏幕上的状态。
+- **所有过往步骤本身**，也就是每一步要做什么（它的意图）。
+- **每个过往步骤的输出**，例如 `ui` 节点记录的结论、runtime 节点返回的 `conclusion`。
+- **当前 UI 截图**，用来理解此刻页面或屏幕上的状态。
 
-它不会看到前序节点的完整执行过程。比如一个 `ui` 节点为了创建订单可能经历了多次点击、输入和重试，后续 `verify` / `agent` 只能看到这个节点最终输出了什么，以及当前截图是什么样。
+除此之外，没有别的。它不会看到前序节点的完整执行过程：一个 `ui` 节点为了创建订单可能经历了多次点击、输入和重试，后续 `verify` / `agent` 只能看到这个节点**最终输出了什么**。它也看不到历史截图——只有当前这一张。
 
-因此，如果后续步骤需要使用某个变量，应该让前面的节点把它明确写进输出里：
+由此得到一条贯穿始终的规则：**唯一能往后传递的通道就是 output。** 后续步骤要用到某个东西，前面那一步就必须把它明确写进自己的输出里：
 
 ```yaml
 flow:
@@ -86,9 +98,9 @@ flow:
       根据名为 createOrder 的输出、数据库验证结果和当前截图，分析本次测试风险。
 ```
 
-这里的 `ui` 仍然只有自然语言输入。`createOrder` 是这段自然语言要求 Pi Agent 记录的输出名称，`orderId` 是该输出里的字段。后续节点可以直接用自然语言引用“名为 `createOrder` 的输出中的 `orderId`”。
+这里的 `ui` 仍然只有自然语言输入。`createOrder` 是这段自然语言要求 Pi Agent 记录的输出名称，`orderId` 是该输出里的字段。需要说明的是：既然所有过往步骤的输出本就都在上下文里，命名**不是**“不命名就传不过去”，而是为了在多个输出之间**无歧义地指代**某一个——后续节点可以直接用自然语言引用“名为 `createOrder` 的输出中的 `orderId`”。
 
-对外部系统的引用也保持在自然语言里。`$database`、`$logs` 这样的 `$name` 会被运行时引擎解析为对应 skill；Pi Agent 会把 skill 结果、前序节点输出和当前截图一起用于 `verify` 或 `agent`。
+对外部系统的引用也保持在自然语言里。`$database`、`$logs` 这样的 `$name` 会被运行时引擎解析为对应 skill；Pi Agent 会把 skill 结果、过往步骤的输出和当前截图一起，用于**当前这一次** `verify` 或 `agent`。但要注意：**skill 结果只属于这一次执行**，不会自动进入后续节点的上下文。如果后面还要用到，需由当前节点把它写进自己的输出。
 
 一个更完整的 case 可以长成这样：
 
@@ -116,7 +128,7 @@ flow:
 
 这个例子里，`ui` 负责创建订单并输出订单信息；`verify` 用 `$database` 和 `$logs` 做外部验证，并给出通过或不通过的判断；`agent` 汇总验证结果和当前截图；`notifySlack` 是后面通过 runtime 扩展出来的自定义节点。
 
-这里有两类扩展方式：`$database` 和 `$logs` 这样的 `$name` 引用用来连接外部 skill；`prepareOrderFixture` 和 `notifySlack` 则是项目在 `midscene.config.ts` 里扩展出来的新 YAML 节点。
+这里的两种扩展方式是**分层**的，并不冲突：`$name` + skill 是**轻量接入层**——像 `$database`、`$logs` 这样的 `$name` 引用，只要注册好对应 skill，就能在自然语言里直接引用，接入成本很低；`defineRuntime`（如 `prepareOrderFixture`、`notifySlack`）是**更底层的扩展方案**，用来定义独立的 YAML 节点、接管一整步的执行逻辑。需要快速把外部上下文喂给 `verify` / `agent`，就用 `$name` skill；需要完全掌控一个步骤怎么跑，就用 `defineRuntime`。
 
 ## 扩展和集成能力
 
@@ -193,7 +205,7 @@ export default defineMidsceneConfig({
 });
 ```
 
-YAML 也可以按项目需要扩展新的节点。比如 `prepareOrderFixture` 和 `notifySlack` 可以注册成自定义 runtime：
+YAML 也可以按项目需要扩展新的节点。相比 `$name` skill 的轻量接入，`defineRuntime` 是更底层的扩展方案：它定义独立的 YAML 节点、接管整步执行逻辑。比如 `prepareOrderFixture` 和 `notifySlack` 可以注册成自定义 runtime：
 
 ```ts
 import {
@@ -232,23 +244,30 @@ export default defineMidsceneConfig({
 });
 ```
 
+runtime 节点有两条信道，对应上面讲过的上下文契约，要分清：
+
+- 返回值里的 `conclusion` 是**面向上下文的输出**，会和其它步骤的输出一样进入后续 `verify` / `agent` 的上下文。
+- `context.state`（如 `context.state.orderFixture`）是**面向工程的 TypeScript 状态**，供 runtime 节点之间传递结构化数据，**不会进入 Pi Agent 的上下文**。换句话说，agent 看不到 `context.state`，只看得到 `conclusion`。要让某个值被后续的 `verify` / `agent` 用到，就得把它放进 `conclusion`。
+
 这条路线不会丢掉 YAML 驱动 UI Test 的低门槛。相反，它把 YAML 作为面向人的测试表达，把 TypeScript 配置作为面向工程的能力注册入口：普通路径继续用自然语言描述，真正需要确定性证据的地方再接入团队自己的工具。
 
 ## 基于 Rstest 构建
 
-Midscene 是基于 Rstest 封装构建的上层测试框架。Rstest 在底层提供测试生命周期、fixture 模型、并发执行、用例过滤和适合 CI 的运行时能力。它基于 Rust 编写，具备更高的执行性能，因此 Midscene 用户默认就能获得高性能的测试底座。Midscene 则把这些能力封装成自然语言用例、AI UI 操作、视觉断言、截图、回放报告和诊断信息。
+Midscene 是基于 Rstest 封装构建的上层测试框架。对一个 AI 驱动的 UI 测试框架来说，真正的价值不在 runner 本身有多快——每个节点的耗时主要由模型推理决定——而在于它能不能稳稳地接住一套测试工程该有的能力：生命周期、fixture、并发、用例过滤、失败上报和 CI 接入。Rstest 在底层提供了这些，Midscene 则把它们封装成自然语言用例、AI UI 操作、视觉断言、截图、回放报告和诊断信息。
 
 绝大多数用户可以通过 Midscene 的 YAML runner 和 `midscene.config.ts` 直接使用这套底座，无需了解 Rstest 的项目细节。`midscene.config.ts` 的字段会刻意和 Rstest 的概念对齐，例如 include/exclude、maxConcurrency、retry、timeout、setup、teardown 和 reporters，同时把 Midscene 特有的 UI Agent 创建入口留在同一个配置里。
 
 ### Rstest 提供的工程能力
 
-Rstest 为 Midscene 项目提供可靠的工程底座：
+Rstest 为 Midscene 项目提供可靠的测试工程底座：
+
+- **标准测试生命周期**：setup / teardown / hook 给登录态准备、测试数据初始化和清理提供明确的挂载点，而不必把这些塞进每个用例。
+- **Fixture 模型**：把共享的前置依赖（账号、设备连接、fixture 数据）声明成可复用、可组合的 fixture，并按用例需要注入。
+- **并发与隔离**：用例可以并发执行，由 runner 负责调度与隔离，让回归套件在 CI 上的整体耗时可控。
+- **用例过滤与失败上报**：按文件、名称或标签筛选用例，配合标准的失败报告，方便定位和重跑。
+- **统一运行模型**：YAML case、runtime 节点和配置扩展共享同一个底层运行模型，团队可以从轻量项目起步，再自然长成长期回归套件，而不必更换框架。
 
-- 用例执行有标准测试生命周期支撑，包括 setup、teardown、hook、fixture 和并发能力。
-- 基于 Rust 的运行时让 Midscene 项目默认拥有面向性能优化的执行层。
-- 项目可以自然接入 CI、测试过滤、失败报告和团队已有的测试工程习惯。
-- YAML case、runtime 节点和配置扩展共享同一个底层运行模型。
-- 团队可以从轻量项目开始，同时保留成长为长期回归套件的路径。
+Rstest 本身基于 Rust 编写、执行层性能良好；但对 Midscene 用户而言，更有价值的是上面这套成熟的测试工程能力，而不是 runner 的原始速度——毕竟在 AI 测试里，时间主要花在模型推理上。
 
 ### 下一步
 

From 1b78400b244cb2a1403740fe1300f9c5a7a47286 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Wed, 3 Jun 2026 15:25:53 -0700
Subject: [PATCH 23/33] docs(workflow): add v2 testing framework phase-0 RFC

Phase-0 design draft for the v2 AI-native testing framework: node model
(ui/verify/soft/agent + runtime nodes), midscene.config.ts with a single
uiAgent union (declarative or factory), defineRuntime, $name skills reusing
Pi's built-in Skills, the verify verdict contract (report_verdict customTool,
fail-closed), the natural-language output model, and the Pi context contract.
Records all settled decisions; only open item is Pi custom base-URL parity.
---
 rfcs/0001-v2-testing-framework-phase0.md | 446 +++++++++++++++++++++++
 1 file changed, 446 insertions(+)
 create mode 100644 rfcs/0001-v2-testing-framework-phase0.md

diff --git a/rfcs/0001-v2-testing-framework-phase0.md b/rfcs/0001-v2-testing-framework-phase0.md
new file mode 100644
index 0000000000..024b626025
--- /dev/null
+++ b/rfcs/0001-v2-testing-framework-phase0.md
@@ -0,0 +1,446 @@
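+# RFC 0001 · v2 Testing Framework: Phase 0 Design Draft
+
+Status: **Draft / open for discussion**
+Scope: covers Phase 0 only, namely the node model, `midscene.config.ts`, `defineRuntime` / `$name` skills, the verify verdict contract, the output contract and guardrails, and context assembly.
+Out of scope: Pi internals, Rstest wiring details, and v1→v2 migration tooling.
+
+> Goal of this draft: pin down the interfaces that must be settled before implementation starts, in a form that can be reviewed. The **🔶 To discuss** marker at the end of a section flags an open decision point.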
+
+---
+
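+## 0. Terminology and Layering Recap (agreed; taken as premises)
+
+- **New engine, `ScriptPlayer` untouched**: v2 ships as a new package, `@midscene/testing-framework`, and reuses `Agent` / Device / `ReportGenerator` from `@midscene/core` as libraries.
+- **Two kinds of Agent**: `ui` nodes → Midscene's **UI Agent** (`agent.aiAct` and related calls); `verify` / `agent` nodes → a **swappable general-purpose agent layer** (currently Pi).
+- **Context contract**: when any `verify` / `agent` node runs, the visible context = `all previous steps (including intent) + each step's output + the current UI`, and **nothing else**.
+- **Verdict semantics**: `verify` = deterministic gate (gates CI); `agent` = exploratory, non-deterministic, **does not participate in pass/fail**.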
+
+---
+
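+## 1. Case Files (v2 YAML schema)
+
+### 1.1 Top Level
+
+```yaml
+name: Create Order          # optional, human-readable name
+flow:                       # ordered list of steps
+  - <step>
+  - <step>
+```
+
+The v1 top-level environment fields such as `web:` / `android:` / `tasks:` are gone: **environment/target settings all move to `midscene.config.ts`**. A case file only describes what the user should accomplish.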
+
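+### 1.2 Steps
+
+Each step is a single-key map: the key is a node type or a custom node name, and the value is that node's input.
+
+**Built-in nodes (`ui` / `verify` / `agent`) take only natural-language input**, and their outputs are also described in natural language, with **no schema introduced**. YAML is meant to stay simple, so keep it plain text: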
+
+```yaml
+flow:
+  - ui: Search for "running shoes"
+  - ui: |
+      Create a test order.
+      Name this step's output createOrder, and record the order number orderId and the page state pageState.
+  - verify: The product detail page shows a visible Add to cart button
+  - agent: Freely inspect this page for anything that looks off
+```
+
+**Custom (runtime) nodes can take an object as input** (the instruction does not have to be text):
+
+```yaml
+flow:
+  - prepareOrderFixture:
+      scenario: paid-order
+```
+
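+> Rule: a built-in node's value is a string; a custom node's value can be a string or an object, and the whole value is passed to the runtime as `input` (see §3).
+
+### 1.3 Built-in Node Types
+
+| Node | Executor | Semantics | Gates the case? |
+|---|---|---|---|
+| `ui` | UI Agent | Natural-language UI operation | A failed operation throws → case fails |
+| `verify` | Pi Agent | Assertion with a verdict; must return pass/fail | **Yes** |
+| `soft` | Pi Agent | Soft assertion: same as `verify`, but a failure is only recorded as a warning (§6.1) | **No** |
+| `agent` | Pi Agent | Free exploration; produces diagnostics/suggestions | **No** (advisory) |
+| `<custom name>` | runtime (TS) | Project extension node (see §3) | Throws → case fails |
+
+**Decided: `verify` / `agent` are read-only with respect to the UI.** They only observe the current screenshot and call skills; they **do not drive the page** (no clicking, no typing). Only `ui` and runtime nodes drive the page. Rationale: gating stays controllable, and an agent cannot click the app somewhere else mid-run and break later steps. Letting an agent drive the UI autonomously for deeper investigation is a possible later extension and is not part of Phase 0.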
+
+---
+
+## 2. `midscene.config.ts`
+
+```ts
+// Shape of the object passed to defineMidsceneConfig() from '@midscene/testing-framework'.
+// Type sketch only: the interface name MidsceneConfig is illustrative.
+interface MidsceneConfig {
+  // Runtime target: a single uiAgent field covering both declarative and programmatic forms (see §2.1)
+  uiAgent:
+    | { type: 'web' | 'android' | 'ios' | 'computer'; options: Record<string, unknown> }
+    | ((ctx: UIAgentFactoryCtx) => Promise<{ agent: Agent }>);
+
+  // Case discovery
+  testDir: string;
+  include?: string[];                            // defaults to ['**/*.yaml']
+  exclude?: string[];
+
+  // Execution policy (aligned with Rstest concepts)
+  testRunner?: {
+    maxConcurrency?: number;
+    bail?: number;
+    testTimeout?: number;
+    retry?: number;
+  };
+
+  // Output
+  output?: {
+    summary?: string;
+    reportDir?: string;
+  };
+
+  // Shared UI Agent options
+  uiAgentOptions?: UIAgentOptions;               // aiActContext, generateReport, ...
+
+  // Extension points
+  runtime?: Record<string, RuntimeNode>;         // custom YAML nodes (§3)
+  agentRuntime?: AgentRuntimeAdapter;            // swap-in point for replacing Pi (§6)
+}
+```
+
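+**There is no `skills` field.** `$name` skills are not registered in the config: Pi discovers and loads them itself, and the framework is only responsible for recognizing `$name` and handing it to Pi (see §4).
+
+### 2.1 Runtime Target: a Single `uiAgent` Field (decided)
+
+**Decision: a single `uiAgent` key holds both the declarative and the programmatic form** (option b from the previous round), and the key is renamed from `target` to `uiAgent`. This aligns the naming with `uiAgentOptions` and `RuntimeNodeContext.uiAgent`, so it is obvious at a glance that this field defines where the UI Agent comes from.
+
+- The value is an **object** → declarative: the framework creates the UI Agent from `type + options`.
+- The value is a **function** → programmatic: the project fully controls construction.
+
+Both forms share one key, and at the type level it is simply a union, which removes the "two runtime target definitions" smell at the root. `options` (platform connection parameters such as url / deviceId) and `uiAgentOptions` (agent behavior such as aiActContext / generateReport) are two different kinds of settings, and both are kept.
+
+**Declarative example:**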
+
+```ts
+import { defineMidsceneConfig } from '@midscene/testing-framework';
+
+export default defineMidsceneConfig({
+  uiAgent: {
+    type: 'web',
+    options: { url: 'https://shop.example.com' },   // platform connection parameters
+  },
+
+  testDir: './e2e',
+  include: ['**/*.yaml'],
+
+  testRunner: { maxConcurrency: 2, testTimeout: 120_000 },
+  output: {
+    summary: './midscene_run/output/summary.json',
+    reportDir: './midscene_run/report',
+  },
+
+  uiAgentOptions: {                                  // agent behavior options
+    aiActContext: 'The user is already signed in as a smoke-test account.',
+    generateReport: true,
+  },
+});
+```
+
+**Programmatic example (the same `uiAgent` key, holding a factory function):**
+
+```ts
+import { agentFromAdbDevice } from '@midscene/android';
+import { defineMidsceneConfig } from '@midscene/testing-framework';
+
+export default defineMidsceneConfig({
+  uiAgent: async ({ uiAgentOptions, env }) => ({
+    agent: await agentFromAdbDevice(env.ANDROID_DEVICE_ID, {
+      ...uiAgentOptions,
+      androidAdbPath: env.ANDROID_ADB_PATH,
+      autoDismissKeyboard: false,
+    }),
+  }),
+
+  testDir: './e2e',
+  uiAgentOptions: {
+    aiActContext: 'The user is already signed in as a smoke-test account.',
+    generateReport: true,
+  },
+});
+```
+
+### 2.2 The Matching Complete Case YAML
+
+The case file contains **no environment/target settings at all**; those live in `midscene.config.ts`. `e2e/create-order.yaml` is pure flow:
+
+```yaml
+name: Create Order
+
+flow:
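+  - prepareOrderFixture:            # custom node; input is an object
+      scenario: paid-order
+
+  - ui: |                          # UI Agent, pure natural language
+      Sign in with the test account and create a test order.
+      Name this step's output createOrder, and record the order number orderId and whether creation succeeded.
+
+  - verify: |                      # Pi; $name skills are loaded by Pi on demand; a verdict is mandatory
+      Use $database to verify that the orderId in the output named createOrder really exists and has status paid.
+
+  - verify: |
+      Use $logs to check whether any related ERROR appeared during the test.
+
+  - verify: The order detail page shows that payment succeeded    # judged purely from the UI screenshot
+
+  - agent: Based on all the verification results above and the current screenshot, analyze the risks of this test run and suggest follow-ups  # advisory, does not gate
+
+  - notifySlack                    # custom node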
+```
+
+Directory layout:
+
+```text
+.
+  midscene.config.ts
+  e2e/
+    create-order.yaml
+    checkout.yaml
+```
+
+---
+
+## 3. `defineRuntime`: custom nodes (lower-level extension)
+
+```ts
+type RuntimeNode = (ctx: RuntimeNodeContext) => Promise<RuntimeNodeResult>;
+
+interface RuntimeNodeContext {
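+  input: unknown;                 // this node's YAML value (string or object)
+  uiAgent: Agent;                 // UI Agent; a runtime node can also drive the page
+  outputs: OutputStore;           // all past context-facing outputs (read-only)
+  state: Record<string, unknown>; // ★ TS-side state, invisible to the agent (see §7)
+  result: TestResultSoFar;        // results accumulated so far for the current case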
+  env: NodeJS.ProcessEnv;
+}
+
+interface RuntimeNodeResult {
+  conclusion: string;                       // ★ context-facing output; enters Pi's context
+  output?: Record<string, unknown>;         // optional structured output (also enters the context)
+}
+
+function defineRuntime(node: RuntimeNode): RuntimeNode;
+```
+
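+Key points:
+- `conclusion` (and the optional `output`) = the **context-facing channel**; it enters later `verify` / `agent` nodes.
+- `state` = the **engineering-facing channel**; it passes structured data between runtime nodes and **does not enter the agent's context**.
+- A runtime node that throws → the case fails.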
+
+---
+
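+## 4. `$name` skill: reuse Pi's own Skills mechanism
+
+**Pi's (earendil-works/pi) actual capabilities have been checked.** Conclusion: there is no need to reinvent the wheel; reuse Pi's built-in Skills directly.
+
+Pi's Skills follow the Anthropic Agent-Skills model: each skill is a directory containing a `SKILL.md` (YAML frontmatter with `name` + `description`, followed by markdown instructions). Disclosure is **progressive**: at startup only each skill's `name`/`description` goes into the system prompt, the **full instructions are loaded on demand**, and the model decides for itself which skill to load when a task matches. Sources include directory scanning, `skills/` from `package.json`, the `skills` array in settings, and the CLI `--skill` flag.
+
+Pi's embeddable SDK (`@earendil-works/pi-coding-agent`) provides every wiring point we need: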
+
+```ts
+import { createAgentSession, SessionManager, DefaultResourceLoader }
+  from '@earendil-works/pi-coding-agent';
+
+// 1) Hand the project's skills to Pi (so their descriptions enter the context)
+const loader = new DefaultResourceLoader({
+  skillsOverride: (cur) => ({ skills: [...cur.skills, ...projectSkills], diagnostics: cur.diagnostics }),
+});
+
+// 2) Create the session
+const { session } = await createAgentSession({
+  sessionManager: SessionManager.inMemory(),
+  resourceLoader: loader,
+});
+
+// 3) Run one verify/agent node: the current screenshot is passed directly as an image
+await session.prompt(assembledContext, {
+  images: [{ type: 'image', source: { type: 'base64', mediaType: 'image/png', data: screenshotBase64 } }],
+});
+```
+
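+**That is the framework's entire responsibility:**
+1. Hand the project's available skills to Pi through `resourceLoader` / `skillsOverride` (descriptions enter the context).
+2. Assemble the context (§7) plus the current screenshot (via the `images` option of `prompt`) and feed it to Pi.
+3. Treat `$name` tokens such as `$database` and `$logs` in a node's natural language as **guidance** that lets Pi load the matching skill on demand.
+
+After that, which skill to load, how to call it, and how many times are all decided by Pi; the framework does not intervene.
+
+**How `$name` is activated (matching your assessment)**: the Pi SDK has **no** programmatic entry point for "force-activate a skill by name"; skills are loaded on demand through model reasoning. So `$name` lands as **the approach you described: guide Pi in the prompt to load the skill itself**. The cost is one extra model decision and slightly more latency, but the implementation needs no special interface and fits the agentic model best. The explicit `$name` token happens to be a strong loading signal, more reliable than description matching alone.
+
+Optional enhancement (not required for Phase 0): the framework can statically extract the set of `$name` references to ① validate that each referenced skill exists (fail immediately if not, to avoid silently running with nothing loaded) and ② pin the referenced skills' descriptions at the top of the prompt for emphasis. **Activation itself is still on-demand loading by Pi.**
+
+Lifecycle (already in the user docs): **a skill's result belongs only to that one execution** and does not automatically enter later context; to keep it, the current node must write it into its own output.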
+
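+### 4.1 Pi wiring: confirmed items vs the single gap
+
+Checked against the Pi SDK docs, everything below **already exists** and is enough to support Phase 0:
+
+| Requirement | Pi SDK | Status |
+|---|---|---|
+| Run a full agent loop in a single node (multi-turn tool calls until done) | `session.prompt()` runs the full loop and resolves only when the turn ends | ✅ |
+| Read the agent's final result | the `turn_end` event from `subscribe`, carrying `message` + `toolResults` | ✅ |
+| Inject the current screenshot | `prompt(text, { images: [{ base64 png }] })` | ✅ |
+| Custom tool (the verdict tool for verify) | `customTools: [defineTool(...)]` or the extension API `pi.registerTool` | ✅ |
+| Skill injection | `DefaultResourceLoader` + `skillsOverride` | ✅ |
+| Model selection / auth | `getModel(provider, model)`; `AuthStorage.setRuntimeApiKey` or env | ✅ |
+
+🔶 **The only real integration item (C′)**: the Pi docs **show no base URL override**. Midscene uses `MIDSCENE_MODEL_BASE_URL` (custom / OpenAI-compatible endpoints). For `verify`/`agent` and `ui` to use **the same model endpoint**, we must confirm whether Pi can be pointed at a custom base URL or a compatible provider. This is the **only dependency on Pi that still needs to be settled**; everything else is in place.
+
+> So, to answer "what else needs confirming": the design is closed; what remains is this one base URL capability. Check the Pi source or latest docs, and if it is not supported, raise it with the Pi team.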
+
+---
+
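+## 5. output: plain natural language, no schema
+
+**Decision: output has no schema.** YAML exists for simplicity; a schema would confuse authors and defeat the purpose. Each step's output is a piece of natural language. What to name it and which fields to record are all stated in natural language: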
+
+```yaml
+- ui: |
+    Create a test order.
+    Name this step's output createOrder and record the order number orderId and the page state pageState.
+```
+
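+Later nodes refer to it in natural language too: "the orderId in the output named createOrder". Naming is for **unambiguous reference**, not for validation.
+
+**Known trade-off (explicitly accepted)**: output is LLM-generated natural language, so a missing field does not hard-fail. The risk of silently dropped fields gets **no engine-level guardrail** in Phase 0.
+
+**Fallback for later iterations**: when deterministic validation is truly needed, build a separate **validation code node** (a TS check in the shape of a runtime node that reads values from `outputs`, asserts in code, and fails on mismatch) rather than pushing a schema into YAML. Keep "deterministic evidence" on the TS side and keep the YAML side pure natural language. This is scheduled for after Phase 0.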
+
+---
+
+## 6. verify verdict contract
+
+`verify` runs the Pi Agent (free-form reasoning), but it must end in a structured judgment.
+
+**Proposal: a verify node must finish with a structured verdict.**
+
+```ts
+interface Verdict {
+  pass: boolean;
+  reason: string;            // human-readable basis for the judgment
+  evidence?: unknown;        // optional: screenshot references, skill result snippets, etc.
+}
+```
+
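+**Implementation (confirmed against the Pi SDK)**: Pi has no native "forced JSON output", but it does have `customTools`. So the engine registers a `report_verdict` tool for the `verify` run and requires in the prompt that the agent call it when finishing; the verdict is read from the `toolResults` of `turn_end`: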
+
+```ts
+const reportVerdict = defineTool({
+  name: 'report_verdict',
+  description: 'Call when the judgment is complete to submit the conclusion of this verify',
+  parameters: Type.Object({
+    pass: Type.Boolean(),
+    reason: Type.String(),
+    evidence: Type.Optional(Type.Unknown()),
+  }),
+  execute: async (_id, v) => v,        // the engine reads it back from toolResults
+});
+```
+
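+Failure model, **fail-closed**:
+- `pass === false` → the case fails;
+- the agent did not call `report_verdict`, or the verdict cannot be parsed → **also a failure** (anything uncertain is treated as failure);
+- `reason` is always written to the report.
+
+An `agent` node takes no verdict, and its output never changes the case's pass/fail.
+
+### 6.1 `soft`: transitional soft assertion (decided: do it)
+
+This gives a tier for "want to see it but not gate on it yet": `soft` is used exactly like `verify` and likewise produces a `Verdict` in the report. **The only difference is that a failure does not turn the case red and does not interrupt later steps** (it is recorded as a warning only).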
+
+```yaml
+flow:
+  - verify: The order detail page shows that payment succeeded        # failure → case red
+  - soft: The page has no obvious layout misalignment      # failure → warning only, does not gate
+```
+
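+Why a **separate node** rather than a `soft: true` flag on `verify`: §1.2 settled that built-in nodes take natural language input only, with no object or flag. Adding a `soft` node type preserves "pure natural language input" and makes soft vs hard obvious at a glance.
+
+Naming: **`soft`** (decided). It appears right next to `verify` in a flow, and `- soft: ...` reads naturally as "soft (verify)": short and clear enough.
+
+Failure model: `soft` with `pass:false` → record a warning and **do not change** the case's pass/fail; everything else matches `verify` (a missing verdict is also treated as a warning).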
+
+---
+
+## 7. Context assembly (formalizing the documented contract)
+
+When a `verify` / `agent` node executes, the context the engine injects into Pi is **exactly**:
+
+```
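+For each past step (in order):
+  - node type + instruction (natural language text, or the object input of a custom node)
+  - the step's output (natural language; for a runtime node, its conclusion)
+  - if it is a verify: its pass/fail and reason
++ the current UI screenshot (only this one)
++ the skills preloaded into Pi for this node (see §4)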
+```
+
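+**Explicitly excluded** ("nothing else"): the execution trace, historical screenshots, `context.state`, and intermediate results of past skill calls.
+
+**Decided: no truncation in Phase 0.** The context of a long flow grows linearly with "all past outputs", but we choose **predictability over compactness** (once we truncate, the "can be reasoned about" selling point breaks). A truncation or compression strategy is easy to add later; we skip it for now.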
+
+---
+
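+## 8. Failure model summary
+
+| Situation | Result |
+|---|---|
+| `ui` action fails and throws | case fails |
+| `verify` with `pass:false` | case fails |
+| `verify` produces no verdict / verdict cannot be parsed | case fails (fail-closed) |
+| `soft` with `pass:false` or no verdict | record a warning; **does not change** case pass/fail |
+| `agent` errors internally | recorded as a diagnostic; **does not change** case pass/fail |
+| runtime node throws | case fails |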
+
+---
+
+## 9. End-to-end example
+
+For a complete `midscene.config.ts` plus matching case YAML, see **§2.1 / §2.2** (covering the full chain of `uiAgent`, custom nodes, `ui`, `verify`, `agent`, and the `$name` skill). For `soft` usage, see **§6.1**.
+
+---
+
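+## 10. Decision status summary
+
+### Decided (settled this round)
+
+| Decision | Conclusion |
+|---|---|
+| `ui`/`verify`/`agent` input | pure natural language, no schema |
+| `verify` / `agent` and the UI | read-only; they do not drive the page (driving is left to a later extension) |
+| output | pure natural language, no schema; deterministic validation becomes a TS validation node later |
+| config `skills` field | none; skills are discovered and loaded by Pi itself |
+| framework's responsibility for skills | only recognize `$name` + call Pi's method to preload; the rest is left to Pi |
+| `RuntimeNodeContext` field name | `agent` → `uiAgent` |
+| verify verdict contract | do it: `report_verdict` customTool + `turn_end.toolResults`, fail-closed (§6) |
+| soft assertion (F) | do it: separate node **`soft`**; failure only records a warning (§6.1) |
+| run target (B) | single `uiAgent` field; a union holding the config-style object / programmatic factory (§2.1) |
+| skill mechanism (C) | reuse Pi's built-in Skills; the framework provides them via `resourceLoader`, and `$name` guides on-demand loading in the prompt (§4) |
+| Pi wiring (loop / screenshot / tool / model) | SDK support confirmed (§4.1) |
+| node instruction shape | built-in = text; custom = text or object |
+| long-flow context | no truncation (Phase 0) |
+
+### Only item pending integration
+
+| # | Item | Status |
+|---|---|---|
+| C′ | Whether Pi can be given a custom model **base URL** (aligned with `MIDSCENE_MODEL_BASE_URL`) so that verify/agent and ui share one endpoint | not found in the docs; **verify in the Pi source / latest docs, and coordinate with the Pi team if unsupported** (§4.1) |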
+
+---
+
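+## Appendix: after Phase 0 (out of scope for this draft; notes only)
+
+- A minimal interface for the Pi `AgentRuntimeAdapter` (so that the Codex Agent SDK and others can be swapped in).
+- Rstest wiring: case → virtual test module → lifecycle/fixture mapping.
+- Reporting: reuse the core `ReportGenerator`; how to present verify verdicts and agent diagnostics.
+- v1→v2 transpiler (optional, external add-on).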

From 8bf4ea579544c2dba0c380068f5970e6f7212a8a Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Wed, 3 Jun 2026 22:55:49 +0000
Subject: [PATCH 24/33] feat(testing-framework): implement v2 testing framework
 Phase 0 (RFC 0001)

Add the new package @midscene/testing-framework implementing the Phase 0
contracts from rfcs/0001-v2-testing-framework-phase0.md:

- defineMidsceneConfig / defineRuntime authoring helpers
- v2 YAML case schema parser (name + flow; single-key step nodes)
- node engine: ui / verify / soft / agent / custom runtime nodes
- verify verdict contract with fail-closed semantics; soft = warning-only
- context assembly (past intents + outputs + verdicts + current screenshot)
- output store and the output-as-only-forward-channel contract
- single `uiAgent` field (config object | factory) for the run target
- default Pi-backed agent runtime; swappable via agentRuntime
- lightweight runner + `midscene-tf` CLI (Rstest wiring out of scope)
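
To illustrate the verdict bullets above, here is a minimal sketch of how a gating `verify` and a warning-only `soft` could map a possibly missing verdict onto a step status. Only the `Verdict` shape follows the RFC; the names `resolveStep`, `NodeKind`, `StepStatus`, and `StepOutcome` are hypothetical and are not taken from the package source.

```typescript
// Hypothetical sketch; not the package's actual implementation.
interface Verdict {
  pass: boolean;
  reason: string;
  evidence?: unknown;
}

type NodeKind = 'verify' | 'soft';
type StepStatus = 'passed' | 'failed' | 'warning';

interface StepOutcome {
  status: StepStatus;
  reason: string;
}

// Type guard: accepts only objects with a boolean `pass` and a string `reason`.
function isVerdict(value: unknown): value is Verdict {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.pass === 'boolean' && typeof v.reason === 'string';
}

// Fail-closed: a missing or malformed verdict is never treated as a pass.
// For `verify` it fails the step; for `soft` it is downgraded to a warning.
function resolveStep(kind: NodeKind, raw: unknown): StepOutcome {
  const negative: StepStatus = kind === 'verify' ? 'failed' : 'warning';
  if (!isVerdict(raw)) {
    return { status: negative, reason: 'no parseable verdict was reported' };
  }
  return { status: raw.pass ? 'passed' : negative, reason: raw.reason };
}
```

Under this sketch, a case would fail only when some step resolves to `failed`; `warning` steps are recorded but never change the case result.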

Resolve decision C': point Pi at the same OpenAI-compatible endpoint as the
UI Agent via ModelRegistry.registerProvider({ baseUrl, apiKey, models }), so
verify/agent and ui share MIDSCENE_MODEL_BASE_URL.
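
A minimal sketch of the idea, assuming the provider config is derived from the same environment the UI Agent reads. The `{ baseUrl, apiKey, models }` shape comes from the message above; the helper name `providerConfigFromEnv`, the fields inside `models`, and the use of `MIDSCENE_MODEL_API_KEY` / `MIDSCENE_MODEL_NAME` alongside `MIDSCENE_MODEL_BASE_URL` are assumptions for illustration. The call into Pi's `ModelRegistry.registerProvider` itself is left out because its exact signature is not shown here.

```typescript
// Hypothetical helper; not taken from the package source.
interface ProviderConfig {
  baseUrl: string;
  apiKey: string;
  models: { id: string }[]; // assumed shape, for illustration only
}

type Env = Record<string, string | undefined>;

function providerConfigFromEnv(env: Env): ProviderConfig {
  const baseUrl = env.MIDSCENE_MODEL_BASE_URL;
  const apiKey = env.MIDSCENE_MODEL_API_KEY; // assumed variable name
  const model = env.MIDSCENE_MODEL_NAME; // assumed variable name
  if (!baseUrl || !apiKey || !model) {
    throw new Error(
      'MIDSCENE_MODEL_BASE_URL, MIDSCENE_MODEL_API_KEY and MIDSCENE_MODEL_NAME must all be set',
    );
  }
  // Strip trailing slashes so that joining paths onto the base URL stays predictable.
  return { baseUrl: baseUrl.replace(/\/+$/, ''), apiKey, models: [{ id: model }] };
}
```

Calling `providerConfigFromEnv(process.env)` once and passing the result to the provider registration would keep a single source of truth for the endpoint, so `ui` and `verify`/`agent` cannot drift apart.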

Add a copy-out demo under example/, unit tests, and smoke scripts (Pi
wiring + real-browser engine validated; model-backed smoke documented for
networked environments). Sync the en/zh design docs to the decided uiAgent
field and flattened runtime context.
---
 apps/site/docs/en/ui-testing-framework.mdx    |   52 +-
 apps/site/docs/zh/ui-testing-framework.mdx    |   51 +-
 example/.gitignore                            |    3 +
 example/README.md                             |   55 +
 example/e2e/add-to-cart.yaml                  |   26 +
 example/e2e/product-detail.yaml               |   12 +
 example/midscene.config.ts                    |   73 +
 example/package.json                          |   19 +
 example/site/index.html                       |   76 +
 example/skills/catalog/SKILL.md               |   29 +
 packages/testing-framework/README.md          |   74 +
 packages/testing-framework/bin/midscene-tf    |    2 +
 packages/testing-framework/package.json       |   72 +
 packages/testing-framework/rslib.config.ts    |   50 +
 .../src/agent-runtime/pi-runtime.ts           |  246 +++
 .../src/agent-runtime/skills.ts               |   23 +
 .../src/agent-runtime/types.ts                |   49 +
 packages/testing-framework/src/cli.ts         |   89 +
 .../testing-framework/src/config/index.ts     |   22 +
 .../testing-framework/src/config/types.ts     |   84 +
 .../src/context/assembler.ts                  |  101 ++
 .../src/engine/output-store.ts                |   27 +
 .../testing-framework/src/engine/run-case.ts  |  119 ++
 .../testing-framework/src/engine/run-node.ts  |  207 +++
 packages/testing-framework/src/index.ts       |   77 +
 packages/testing-framework/src/runner/glob.ts |   79 +
 .../src/runner/load-config.ts                 |   48 +
 packages/testing-framework/src/runner/run.ts  |  114 ++
 packages/testing-framework/src/runtime.ts     |   47 +
 packages/testing-framework/src/types.ts       |  100 ++
 .../testing-framework/src/ui-agent/factory.ts |  129 ++
 packages/testing-framework/src/yaml/parse.ts  |  120 ++
 packages/testing-framework/src/yaml/types.ts  |   23 +
 .../testing-framework/tests/smoke/README.md   |   14 +
 .../tests/smoke/browser-smoke.mjs             |  109 ++
 .../tests/smoke/model-smoke.mjs               |   70 +
 .../tests/smoke/pi-wiring.mjs                 |   92 +
 .../tests/unit-test/config.test.ts            |   42 +
 .../unit-test/context-and-skills.test.ts      |   79 +
 .../tests/unit-test/engine.test.ts            |  192 +++
 .../tests/unit-test/glob.test.ts              |   40 +
 .../tests/unit-test/yaml-parse.test.ts        |   88 +
 .../testing-framework/tsconfig.build.json     |    7 +
 packages/testing-framework/tsconfig.json      |   24 +
 packages/testing-framework/vitest.config.ts   |   22 +
 pnpm-lock.yaml                                | 1481 +++++++++++++++--
 rfcs/0001-v2-testing-framework-phase0.md      |   31 +-
 47 files changed, 4428 insertions(+), 161 deletions(-)
 create mode 100644 example/.gitignore
 create mode 100644 example/README.md
 create mode 100644 example/e2e/add-to-cart.yaml
 create mode 100644 example/e2e/product-detail.yaml
 create mode 100644 example/midscene.config.ts
 create mode 100644 example/package.json
 create mode 100644 example/site/index.html
 create mode 100644 example/skills/catalog/SKILL.md
 create mode 100644 packages/testing-framework/README.md
 create mode 100755 packages/testing-framework/bin/midscene-tf
 create mode 100644 packages/testing-framework/package.json
 create mode 100644 packages/testing-framework/rslib.config.ts
 create mode 100644 packages/testing-framework/src/agent-runtime/pi-runtime.ts
 create mode 100644 packages/testing-framework/src/agent-runtime/skills.ts
 create mode 100644 packages/testing-framework/src/agent-runtime/types.ts
 create mode 100644 packages/testing-framework/src/cli.ts
 create mode 100644 packages/testing-framework/src/config/index.ts
 create mode 100644 packages/testing-framework/src/config/types.ts
 create mode 100644 packages/testing-framework/src/context/assembler.ts
 create mode 100644 packages/testing-framework/src/engine/output-store.ts
 create mode 100644 packages/testing-framework/src/engine/run-case.ts
 create mode 100644 packages/testing-framework/src/engine/run-node.ts
 create mode 100644 packages/testing-framework/src/index.ts
 create mode 100644 packages/testing-framework/src/runner/glob.ts
 create mode 100644 packages/testing-framework/src/runner/load-config.ts
 create mode 100644 packages/testing-framework/src/runner/run.ts
 create mode 100644 packages/testing-framework/src/runtime.ts
 create mode 100644 packages/testing-framework/src/types.ts
 create mode 100644 packages/testing-framework/src/ui-agent/factory.ts
 create mode 100644 packages/testing-framework/src/yaml/parse.ts
 create mode 100644 packages/testing-framework/src/yaml/types.ts
 create mode 100644 packages/testing-framework/tests/smoke/README.md
 create mode 100644 packages/testing-framework/tests/smoke/browser-smoke.mjs
 create mode 100644 packages/testing-framework/tests/smoke/model-smoke.mjs
 create mode 100644 packages/testing-framework/tests/smoke/pi-wiring.mjs
 create mode 100644 packages/testing-framework/tests/unit-test/config.test.ts
 create mode 100644 packages/testing-framework/tests/unit-test/context-and-skills.test.ts
 create mode 100644 packages/testing-framework/tests/unit-test/engine.test.ts
 create mode 100644 packages/testing-framework/tests/unit-test/glob.test.ts
 create mode 100644 packages/testing-framework/tests/unit-test/yaml-parse.test.ts
 create mode 100644 packages/testing-framework/tsconfig.build.json
 create mode 100644 packages/testing-framework/tsconfig.json
 create mode 100644 packages/testing-framework/vitest.config.ts

diff --git a/apps/site/docs/en/ui-testing-framework.mdx b/apps/site/docs/en/ui-testing-framework.mdx
index f635e2264d..0e819e1eca 100644
--- a/apps/site/docs/en/ui-testing-framework.mdx
+++ b/apps/site/docs/en/ui-testing-framework.mdx
@@ -20,13 +20,9 @@ Midscene is not a choice between lightweight YAML and serious test engineering.
 
 Midscene starts by helping teams write a simple UI task clearly, run it, and replay it. For most smoke tests and lightweight regression projects, the first useful milestone is not setting up a complex testing project. It is turning a core user path into a readable, repeatable, inspectable case.
 
-A YAML case keeps the path readable:
+A YAML case keeps the path readable. The case only describes the user path — the target environment lives in `midscene.config.ts`, never in the case file:
 
 ```yaml
-target:
-  type: web
-  url: https://shop.example.com
-
 flow:
   - ui: Search for "running shoes"
   - ui: Open the first product
@@ -140,12 +136,13 @@ As a project grows from lightweight cases into a long-lived regression suite, en
 import { defineMidsceneConfig } from '@midscene/testing-framework';
 
 export default defineMidsceneConfig({
-  target: {
-    type: 'android',
+  // A single `uiAgent` field defines the run target. An object is config-style
+  // (the framework builds the UI Agent from `type` + `options`); a function is
+  // programmatic (you build it yourself, see below).
+  uiAgent: {
+    type: 'web',
     options: {
-      deviceId: process.env.ANDROID_DEVICE_ID,
-      androidAdbPath: process.env.ANDROID_ADB_PATH,
-      autoDismissKeyboard: false,
+      url: 'https://shop.example.com',
     },
   },
 
@@ -181,7 +178,7 @@ With this config in place, the project can stay direct:
     checkout.yaml
 ```
 
-`e2e/*.yaml` describes what the user should accomplish, while `midscene.config.ts` describes the target type and platform connection options, testRunner behavior, shared UI Agent options, and reporting. By default, the framework creates the UI Agent from `target.type` and `target.options`. If a project needs custom devices, remote services, or custom agent construction logic, it can create the UI Agent entirely inside `createUIAgent` and omit `target` to avoid defining the runtime target twice.
+`e2e/*.yaml` describes what the user should accomplish, while `midscene.config.ts` describes the target type and platform connection options, testRunner behavior, shared UI Agent options, and reporting. When `uiAgent` is an object, the framework creates the UI Agent from its `type` and `options`. If a project needs custom devices, remote services, or custom agent construction logic, set `uiAgent` to a factory function instead — the same single field, now holding the construction logic, so the runtime target is never defined twice. `options` (platform connection parameters like url / deviceId) and `uiAgentOptions` (Agent behavior like aiActContext / generateReport) stay distinct.
 
 ```ts
 import { agentFromAdbDevice } from '@midscene/android';
@@ -195,15 +192,14 @@ export default defineMidsceneConfig({
     generateReport: true,
   },
 
-  async createUIAgent({ uiAgentOptions }) {
-    return {
-      agent: await agentFromAdbDevice(process.env.ANDROID_DEVICE_ID, {
-        ...uiAgentOptions,
-        androidAdbPath: process.env.ANDROID_ADB_PATH,
-        autoDismissKeyboard: false,
-      }),
-    };
-  },
+  // Programmatic form: the same `uiAgent` field, filled with a factory.
+  uiAgent: async ({ uiAgentOptions, env }) => ({
+    agent: await agentFromAdbDevice(env.ANDROID_DEVICE_ID, {
+      ...uiAgentOptions,
+      androidAdbPath: env.ANDROID_ADB_PATH,
+      autoDismissKeyboard: false,
+    }),
+  }),
 });
 ```
 
@@ -216,7 +212,7 @@ import {
 } from '@midscene/testing-framework';
 
 export default defineMidsceneConfig({
-  target: {
+  uiAgent: {
     type: 'web',
     options: {
       url: 'http://127.0.0.1:3000',
@@ -226,17 +222,17 @@ export default defineMidsceneConfig({
   testDir: './e2e',
 
   runtime: {
-    prepareOrderFixture: defineRuntime(async ({ input, context }) => {
-      const fixture = await createOrderFixture(input);
-      context.state.orderFixture = fixture;
+    prepareOrderFixture: defineRuntime(async (ctx) => {
+      const fixture = await createOrderFixture(ctx.input);
+      ctx.state.orderFixture = fixture;
 
       return {
         conclusion: `Prepared order fixture ${fixture.id}`,
       };
     }),
 
-    notifySlack: defineRuntime(async ({ context }) => {
-      await sendSlackSummary(context.result);
+    notifySlack: defineRuntime(async (ctx) => {
+      await sendSlackSummary(ctx.result);
 
       return {
         conclusion: 'Slack notification sent',
@@ -246,10 +242,10 @@ export default defineMidsceneConfig({
 });
 ```
 
-A runtime node has two channels, matching the context contract above — keep them distinct:
+A runtime node receives a single context argument with `input`, `uiAgent`, `outputs`, `state`, `result`, and `env`. It has two channels, matching the context contract above — keep them distinct:
 
 - The `conclusion` in the return value is the **context-facing output**: like any other step's output, it enters the context of later `verify` / `agent` nodes.
-- `context.state` (such as `context.state.orderFixture`) is **engineering-facing TypeScript state** for passing structured data between runtime nodes, and **does not enter Pi Agent's context**. In other words, the agent cannot see `context.state`, only `conclusion`. To make a value available to a later `verify` / `agent`, put it in `conclusion`.
+- `ctx.state` (such as `ctx.state.orderFixture`) is **engineering-facing TypeScript state** for passing structured data between runtime nodes, and **does not enter Pi Agent's context**. In other words, the agent cannot see `ctx.state`, only `conclusion`. To make a value available to a later `verify` / `agent`, put it in `conclusion`.
 
 This direction keeps the low-friction YAML-driven UI testing model intact. YAML remains the human-facing expression for the test, and TypeScript config remains the engineering entry for registering capabilities: ordinary paths stay in natural language, while places that need deterministic evidence can connect to the team's own tools.
 
diff --git a/apps/site/docs/zh/ui-testing-framework.mdx b/apps/site/docs/zh/ui-testing-framework.mdx
index 50006809a1..caa8bd9017 100644
--- a/apps/site/docs/zh/ui-testing-framework.mdx
+++ b/apps/site/docs/zh/ui-testing-framework.mdx
@@ -20,13 +20,9 @@ Midscene 不是让团队在“轻量 YAML”和“严肃测试工程”之间二
 
 Midscene 的第一步，是让团队用 YAML 把一个简单 UI 任务写清楚、跑起来、回放出来。对于大多数 Smoke Test 和轻量回归项目，第一个有价值的里程碑不是搭建复杂工程，而是把核心用户路径变成可读、可重复执行、可分析的 case。
 
-YAML case 可以让路径保持可读：
+YAML case 可以让路径保持可读。case 只描述用户路径，运行目标环境放在 `midscene.config.ts` 里，绝不写进 case 文件：
 
 ```yaml
-target:
-  type: web
-  url: https://shop.example.com
-
 flow:
   - ui: Search for "running shoes"
   - ui: Open the first product
@@ -138,12 +134,12 @@ flow:
 import { defineMidsceneConfig } from '@midscene/testing-framework';
 
 export default defineMidsceneConfig({
-  target: {
-    type: 'android',
+  // 单个 `uiAgent` 字段定义运行目标。传对象是配置式（框架据 `type` + `options`
+  // 创建 UI Agent）；传函数是编程式（自行构造，见下文）。
+  uiAgent: {
+    type: 'web',
     options: {
-      deviceId: process.env.ANDROID_DEVICE_ID,
-      androidAdbPath: process.env.ANDROID_ADB_PATH,
-      autoDismissKeyboard: false,
+      url: 'https://shop.example.com',
     },
   },
 
@@ -179,7 +175,7 @@ export default defineMidsceneConfig({
     checkout.yaml
 ```
 
-`e2e/*.yaml` 描述用户要完成什么，`midscene.config.ts` 描述 target 类型和平台连接参数、testRunner 行为、共享 UI Agent 参数和报告。默认情况下，框架会根据 `target.type` 和 `target.options` 创建 UI Agent；如果项目需要接入自定义设备、远程服务或自定义的 Agent 构造逻辑，可以在 `createUIAgent` 里完全自行创建 UI Agent，并省略 `target`，避免同一份配置里出现两套运行目标定义。
+`e2e/*.yaml` 描述用户要完成什么，`midscene.config.ts` 描述 target 类型和平台连接参数、testRunner 行为、共享 UI Agent 参数和报告。当 `uiAgent` 是对象时，框架会据其 `type` 和 `options` 创建 UI Agent；如果项目需要接入自定义设备、远程服务或自定义的 Agent 构造逻辑，把 `uiAgent` 设为工厂函数即可——还是同一个字段，只是换成构造逻辑，从根上避免出现两套运行目标定义。`options`（平台连接参数，如 url / deviceId）与 `uiAgentOptions`（Agent 行为，如 aiActContext / generateReport）是两类不同的东西，都保留。
 
 ```ts
 import { agentFromAdbDevice } from '@midscene/android';
@@ -193,15 +189,14 @@ export default defineMidsceneConfig({
     generateReport: true,
   },
 
-  async createUIAgent({ uiAgentOptions }) {
-    return {
-      agent: await agentFromAdbDevice(process.env.ANDROID_DEVICE_ID, {
-        ...uiAgentOptions,
-        androidAdbPath: process.env.ANDROID_ADB_PATH,
-        autoDismissKeyboard: false,
-      }),
-    };
-  },
+  // 编程式：同一个 `uiAgent` 字段，填工厂函数。
+  uiAgent: async ({ uiAgentOptions, env }) => ({
+    agent: await agentFromAdbDevice(env.ANDROID_DEVICE_ID, {
+      ...uiAgentOptions,
+      androidAdbPath: env.ANDROID_ADB_PATH,
+      autoDismissKeyboard: false,
+    }),
+  }),
 });
 ```
 
@@ -214,7 +209,7 @@ import {
 } from '@midscene/testing-framework';
 
 export default defineMidsceneConfig({
-  target: {
+  uiAgent: {
     type: 'web',
     options: {
       url: 'http://127.0.0.1:3000',
@@ -224,17 +219,17 @@ export default defineMidsceneConfig({
   testDir: './e2e',
 
   runtime: {
-    prepareOrderFixture: defineRuntime(async ({ input, context }) => {
-      const fixture = await createOrderFixture(input);
-      context.state.orderFixture = fixture;
+    prepareOrderFixture: defineRuntime(async (ctx) => {
+      const fixture = await createOrderFixture(ctx.input);
+      ctx.state.orderFixture = fixture;
 
       return {
         conclusion: `Prepared order fixture ${fixture.id}`,
       };
     }),
 
-    notifySlack: defineRuntime(async ({ context }) => {
-      await sendSlackSummary(context.result);
+    notifySlack: defineRuntime(async (ctx) => {
+      await sendSlackSummary(ctx.result);
 
       return {
         conclusion: 'Slack notification sent',
@@ -244,10 +239,10 @@ export default defineMidsceneConfig({
 });
 ```
 
-runtime 节点有两条信道，对应上面讲过的上下文契约，要分清：
+runtime 节点接收单个上下文参数，包含 `input`、`uiAgent`、`outputs`、`state`、`result`、`env`。它有两条信道，对应上面讲过的上下文契约，要分清：
 
 - 返回值里的 `conclusion` 是**面向上下文的输出**，会和其它步骤的输出一样进入后续 `verify` / `agent` 的上下文。
-- `context.state`（如 `context.state.orderFixture`）是**面向工程的 TypeScript 状态**，供 runtime 节点之间传递结构化数据，**不会进入 Pi Agent 的上下文**。换句话说，agent 看不到 `context.state`，只看得到 `conclusion`。要让某个值被后续的 `verify` / `agent` 用到，就得把它放进 `conclusion`。
+- `ctx.state`（如 `ctx.state.orderFixture`）是**面向工程的 TypeScript 状态**，供 runtime 节点之间传递结构化数据，**不会进入 Pi Agent 的上下文**。换句话说，agent 看不到 `ctx.state`，只看得到 `conclusion`。要让某个值被后续的 `verify` / `agent` 用到，就得把它放进 `conclusion`。
 
 这条路线不会丢掉 YAML 驱动 UI Test 的低门槛。相反，它把 YAML 作为面向人的测试表达，把 TypeScript 配置作为面向工程的能力注册入口：普通路径继续用自然语言描述，真正需要确定性证据的地方再接入团队自己的工具。
 
diff --git a/example/.gitignore b/example/.gitignore
new file mode 100644
index 0000000000..86b63a7a6e
--- /dev/null
+++ b/example/.gitignore
@@ -0,0 +1,3 @@
+node_modules/
+midscene_run/
+.env
diff --git a/example/README.md b/example/README.md
new file mode 100644
index 0000000000..823d07c2eb
--- /dev/null
+++ b/example/README.md
@@ -0,0 +1,55 @@
+# Midscene v2 Testing Framework — Example
+
+A self-contained demo of [`@midscene/testing-framework`](../packages/testing-framework)
+(the AI-native v2 UI testing framework, Phase 0). Copy this folder out, install,
+set your model env vars, and run.
+
+## What it shows
+
+- A **config-style** `uiAgent` (web) in `midscene.config.ts` — environment lives
+  in config, never in the case YAML.
+- The full node model in `e2e/*.yaml`:
+  - `ui` — natural-language UI actions (run by Midscene's UI Agent)
+  - `verify` — gating judgment with a forced pass/fail verdict
+  - `soft` — non-gating soft assertion (failure → warning only)
+  - `agent` — advisory free exploration (never gates)
+  - custom **runtime** nodes (`prepareCartFixture`, `notify`) via `defineRuntime`
+- A `$name` **skill** reference (`$catalog`) backed by `skills/catalog/SKILL.md`.
+- The **output contract**: steps record natural-language conclusions that later
+  `verify` / `agent` nodes reference by name.
+
+## Run it
+
+```bash
+# 1. install
+pnpm install         # or npm install / yarn
+
+# 2. configure the model (UI Agent + Pi share one endpoint)
+cp .env.example .env # then edit, or export the vars in your shell
+
+# 3. run all cases
+pnpm test
+
+# run a single case
+pnpm test:one
+```
+
+By default the demo runs against the bundled static page in `site/index.html`
+(offline). Set `DEMO_URL` to point at your own app.
+
+Results are written to `midscene_run/output/summary.json`, and Midscene HTML
+reports for the UI steps land in `midscene_run/report/`.
+
+## Layout
+
+```text
+.
+  midscene.config.ts     # uiAgent + discovery + runtime nodes
+  e2e/
+    product-detail.yaml   # ui + verify + soft + agent
+    add-to-cart.yaml      # custom node + $catalog skill + verify + agent + notify
+  skills/
+    catalog/SKILL.md      # a $name skill (Pi discovers/loads it)
+  site/
+    index.html            # tiny static demo app
+```
diff --git a/example/e2e/add-to-cart.yaml b/example/e2e/add-to-cart.yaml
new file mode 100644
index 0000000000..df760db381
--- /dev/null
+++ b/example/e2e/add-to-cart.yaml
@@ -0,0 +1,26 @@
+name: Add a product to the cart
+
+flow:
+  - prepareCartFixture:
+      scenario: anonymous-checkout
+
+  - ui: Open the "Trail Backpack" product
+
+  - ui: |
+      Click "Add to cart".
+      Name this step's output cartResult and record whether the
+      "Added to cart" confirmation became visible.
+
+  - verify: |
+      Use the output named cartResult and the current screenshot to confirm
+      the product was added to the cart (the "Added to cart" badge is visible).
+
+  - verify: |
+      Use $catalog to confirm the "Trail Backpack" is a known catalog product
+      and report its expected price.
+
+  - agent: |
+      Summarize the risk of this add-to-cart flow based on all previous
+      verification results and the current screenshot.
+
+  - notify
diff --git a/example/e2e/product-detail.yaml b/example/e2e/product-detail.yaml
new file mode 100644
index 0000000000..34efd3dd44
--- /dev/null
+++ b/example/e2e/product-detail.yaml
@@ -0,0 +1,12 @@
+name: Open a product and check Add to cart
+
+flow:
+  - ui: Open the "Running Shoes" product
+  - ui: |
+      Read the product name and price on the detail page.
+      Name this step's output productInfo and record the name and price.
+  - verify: The product detail page shows a visible "Add to cart" button
+  - soft: The page has no obvious layout glitches
+  - agent: |
+      Briefly inspect this product detail page and note anything that looks
+      off (copy, pricing, button states) with follow-up suggestions.
diff --git a/example/midscene.config.ts b/example/midscene.config.ts
new file mode 100644
index 0000000000..f610e01a75
--- /dev/null
+++ b/example/midscene.config.ts
@@ -0,0 +1,73 @@
+import { join } from 'node:path';
+import { pathToFileURL } from 'node:url';
+import {
+  defineMidsceneConfig,
+  defineRuntime,
+} from '@midscene/testing-framework';
+
+// The demo ships a tiny static page so the example runs offline (only the model
+// endpoint needs network). Point `DEMO_URL` at your own app to try real flows.
+const demoUrl =
+  process.env.DEMO_URL ??
+  pathToFileURL(join(__dirname, 'site', 'index.html')).href;
+
+export default defineMidsceneConfig({
+  // —— run target: single `uiAgent` field (config-style object) ——
+  uiAgent: {
+    type: 'web',
+    options: {
+      url: demoUrl,
+    },
+  },
+
+  // —— case discovery ——
+  testDir: './e2e',
+  include: ['**/*.yaml'],
+  exclude: ['**/*.draft.yaml'],
+
+  // —— execution policy (aligned with Rstest concepts) ——
+  testRunner: {
+    maxConcurrency: 1,
+    bail: 0,
+    testTimeout: 120_000,
+  },
+
+  // —— output ——
+  output: {
+    summary: './midscene_run/output/summary.json',
+    reportDir: './midscene_run/report',
+  },
+
+  // —— shared UI Agent behavior ——
+  uiAgentOptions: {
+    aiActContext: 'The user is browsing a demo shop as an anonymous visitor.',
+    generateReport: true,
+  },
+
+  // —— custom YAML nodes (defineRuntime, RFC §3) ——
+  runtime: {
+    // A fixture-prep node: writes engineering state (not visible to the agent)
+    // and a natural-language conclusion (visible to later verify/agent).
+    prepareCartFixture: defineRuntime(async (ctx) => {
+      const input = (ctx.input ?? {}) as { scenario?: string };
+      const scenario = input.scenario ?? 'default';
+      ctx.state.cartFixture = { id: `cart-${Date.now()}`, scenario };
+
+      return {
+        conclusion: `Prepared a "${scenario}" cart fixture for this run.`,
+        output: { scenario },
+      };
+    }),
+
+    // A side-effect node that reads the accumulated case result.
+    notify: defineRuntime(async (ctx) => {
+      const failed = ctx.result.steps.filter((s) => s.status === 'failed');
+      return {
+        conclusion:
+          failed.length === 0
+            ? 'All gating checks passed; no alert needed.'
+            : `Would alert: ${failed.length} step(s) failed.`,
+      };
+    }),
+  },
+});
diff --git a/example/package.json b/example/package.json
new file mode 100644
index 0000000000..0e3eca7b11
--- /dev/null
+++ b/example/package.json
@@ -0,0 +1,19 @@
+{
+  "name": "midscene-testing-framework-example",
+  "version": "0.0.0",
+  "private": true,
+  "description": "Copy-out demo for @midscene/testing-framework (v2 Phase 0)",
+  "type": "module",
+  "scripts": {
+    "test": "midscene-tf run",
+    "test:one": "midscene-tf run e2e/product-detail.yaml"
+  },
+  "dependencies": {
+    "@midscene/testing-framework": "latest",
+    "@midscene/web": "latest",
+    "puppeteer": ">=20.0.0"
+  },
+  "engines": {
+    "node": ">=18.19.0"
+  }
+}
diff --git a/example/site/index.html b/example/site/index.html
new file mode 100644
index 0000000000..36daedba8b
--- /dev/null
+++ b/example/site/index.html
@@ -0,0 +1,76 @@
+<!doctype html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1" />
+    <title>Midscene Demo Shop</title>
+    <style>
+      * { box-sizing: border-box; font-family: system-ui, sans-serif; }
+      body { margin: 0; color: #1f2933; }
+      header { padding: 16px 24px; background: #0b5fff; color: #fff; }
+      main { padding: 24px; max-width: 720px; margin: 0 auto; }
+      .search { display: flex; gap: 8px; margin-bottom: 24px; }
+      .search input { flex: 1; padding: 10px; border: 1px solid #cbd2d9; border-radius: 6px; }
+      .grid { display: grid; grid-template-columns: 1fr 1fr; gap: 16px; }
+      .card { border: 1px solid #e4e7eb; border-radius: 10px; padding: 16px; cursor: pointer; }
+      .card h3 { margin: 0 0 8px; }
+      .price { color: #0b5fff; font-weight: 600; }
+      .detail { display: none; }
+      .detail.active { display: block; }
+      .add-to-cart { margin-top: 16px; padding: 12px 20px; background: #0b5fff; color: #fff; border: none; border-radius: 6px; font-size: 16px; cursor: pointer; }
+      .badge { display: inline-block; margin-left: 8px; padding: 2px 8px; border-radius: 99px; background: #def7ec; color: #03543f; font-size: 12px; }
+      button { cursor: pointer; }
+    </style>
+  </head>
+  <body>
+    <header><strong>Midscene Demo Shop</strong></header>
+    <main>
+      <section id="catalog">
+        <div class="search">
+          <input id="q" placeholder="Search products" />
+          <button id="searchBtn">Search</button>
+        </div>
+        <div class="grid">
+          <div class="card" data-id="sku-1">
+            <h3>Running Shoes</h3>
+            <div class="price">$89.00</div>
+          </div>
+          <div class="card" data-id="sku-2">
+            <h3>Trail Backpack</h3>
+            <div class="price">$129.00</div>
+          </div>
+        </div>
+      </section>
+
+      <section id="detail" class="detail">
+        <h2 id="detailName"></h2>
+        <div class="price" id="detailPrice"></div>
+        <p>Lightweight, breathable, built for everyday runs.</p>
+        <button class="add-to-cart" id="addToCart">Add to cart</button>
+        <span class="badge" id="cartStatus" hidden>Added to cart</span>
+      </section>
+    </main>
+
+    <script>
+      const products = {
+        'sku-1': { name: 'Running Shoes', price: '$89.00' },
+        'sku-2': { name: 'Trail Backpack', price: '$129.00' },
+      };
+      document.querySelectorAll('.card').forEach((card) => {
+        card.addEventListener('click', () => {
+          const p = products[card.dataset.id];
+          document.getElementById('detailName').textContent = p.name;
+          document.getElementById('detailPrice').textContent = p.price;
+          document.getElementById('catalog').style.display = 'none';
+          document.getElementById('detail').classList.add('active');
+        });
+      });
+      document.getElementById('addToCart').addEventListener('click', () => {
+        document.getElementById('cartStatus').hidden = false;
+      });
+      document.getElementById('searchBtn').addEventListener('click', () => {
+        // demo only: no-op filtering
+      });
+    </script>
+  </body>
+</html>
diff --git a/example/skills/catalog/SKILL.md b/example/skills/catalog/SKILL.md
new file mode 100644
index 0000000000..0e91c7fc40
--- /dev/null
+++ b/example/skills/catalog/SKILL.md
@@ -0,0 +1,29 @@
+---
+name: catalog
+description: >-
+  Look up demo-shop catalog products and their expected prices. Use this skill
+  whenever a test references $catalog to confirm a product exists and to report
+  its canonical price.
+---
+
+# Catalog skill
+
+This demo skill represents an external source of truth a real test might query
+(a database, an internal API, a price service). For the demo it is static.
+
+Known catalog products:
+
+| Product        | SKU   | Expected price |
+| -------------- | ----- | -------------- |
+| Running Shoes  | sku-1 | $89.00         |
+| Trail Backpack | sku-2 | $129.00        |
+
+When asked to confirm a product:
+
+1. Find the product by name in the table above.
+2. If it exists, report it as a known catalog product and state the expected
+   price.
+3. If it does not exist, say so clearly — the verification should fail.
+
+In a real project this skill would run a command or call an API to fetch the
+truth. Replace the static table with that lookup.
diff --git a/packages/testing-framework/README.md b/packages/testing-framework/README.md
new file mode 100644
index 0000000000..9f8a656f3f
--- /dev/null
+++ b/packages/testing-framework/README.md
@@ -0,0 +1,74 @@
+# @midscene/testing-framework
+
+AI-native v2 UI testing framework for natural-language cases (Phase 0).
+
+Write test cases as natural-language flows in YAML; let Midscene's UI Agent drive
+the UI and a swappable general-purpose agent (Pi by default) make gating
+judgments and free-form analysis.
+
+> This is the Phase 0 implementation of RFC 0001
+> (`rfcs/0001-v2-testing-framework-phase0.md`). It covers the node model,
+> `midscene.config.ts`, `defineRuntime` / `$name` skills, the verify verdict
+> contract, the output contract, and context assembly. Rstest wiring and
+> v1→v2 migration are out of scope for this phase.
+
+## Concepts
+
+- **Cases are natural language.** A case YAML has only a `name` and a `flow`;
+  the environment/target lives in `midscene.config.ts`.
+- **Node model:**
+  - `ui` — natural-language UI action, run by Midscene's UI Agent.
+  - `verify` — gating judgment; must produce a pass/fail verdict (fail-closed).
+  - `soft` — same as verify, but failure only records a warning.
+  - `agent` — advisory free exploration; never changes pass/fail.
+  - custom nodes — registered via `defineRuntime`, own a whole step.
+- **One model endpoint.** `verify`/`soft`/`agent` run on Pi, pointed at the same
+  `MIDSCENE_MODEL_BASE_URL` endpoint as the UI Agent (RFC decision C′).
+- **Output is the only channel forward.** Each step records a natural-language
+  conclusion; later nodes reference it by name. The current screenshot is the
+  only image; `state` (engineering-facing) never reaches the agent.
+
+## Quick start
+
+```ts
+// midscene.config.ts
+import { defineMidsceneConfig } from '@midscene/testing-framework';
+
+export default defineMidsceneConfig({
+  uiAgent: { type: 'web', options: { url: 'https://shop.example.com' } },
+  testDir: './e2e',
+  output: { summary: './midscene_run/output/summary.json' },
+  uiAgentOptions: { generateReport: true },
+});
+```
+
+```yaml
+# e2e/checkout.yaml
+name: Add to cart
+flow:
+  - ui: Open the first product
+  - verify: The product detail page shows a visible "Add to cart" button
+  - agent: Inspect the page for anything that looks off
+```
+
+```bash
+midscene-tf run            # run all discovered cases
+midscene-tf run e2e/x.yaml # run a specific case
+```
+
+See a runnable demo in the repository's `example/` directory.
+
+## Programmatic API
+
+```ts
+import { runAll, loadConfig } from '@midscene/testing-framework';
+
+const { config } = await loadConfig(process.cwd());
+const summary = await runAll(config);
+```
+
+## Swapping the agent layer
+
+The `verify`/`soft`/`agent` runtime is swappable. Provide your own
+`agentRuntime` (an `AgentRuntimeAdapter`) in `midscene.config.ts` to replace the
+default Pi-backed implementation.
diff --git a/packages/testing-framework/bin/midscene-tf b/packages/testing-framework/bin/midscene-tf
new file mode 100755
index 0000000000..0942d7aea8
--- /dev/null
+++ b/packages/testing-framework/bin/midscene-tf
@@ -0,0 +1,2 @@
+#!/usr/bin/env node
+require("../dist/lib/cli.js");
diff --git a/packages/testing-framework/package.json b/packages/testing-framework/package.json
new file mode 100644
index 0000000000..9789689aa4
--- /dev/null
+++ b/packages/testing-framework/package.json
@@ -0,0 +1,72 @@
+{
+  "name": "@midscene/testing-framework",
+  "version": "1.8.9",
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/web-infra-dev/midscene.git",
+    "directory": "packages/testing-framework"
+  },
+  "description": "AI-native v2 UI testing framework for natural-language cases (Phase 0)",
+  "keywords": [
+    "AI testing",
+    "UI testing",
+    "natural language testing",
+    "midscene",
+    "agentic testing"
+  ],
+  "main": "./dist/lib/index.js",
+  "module": "./dist/es/index.mjs",
+  "types": "./dist/types/index.d.ts",
+  "bin": {
+    "midscene-tf": "./bin/midscene-tf"
+  },
+  "files": ["bin", "dist", "README.md"],
+  "exports": {
+    ".": {
+      "types": "./dist/types/index.d.ts",
+      "import": "./dist/es/index.mjs",
+      "require": "./dist/lib/index.js"
+    },
+    "./package.json": "./package.json"
+  },
+  "scripts": {
+    "dev": "npm run build:watch",
+    "build": "rslib build",
+    "build:watch": "rslib build --watch --no-clean",
+    "test": "vitest --run",
+    "test:u": "vitest --run -u"
+  },
+  "dependencies": {
+    "@earendil-works/pi-ai": "^0.78.0",
+    "@earendil-works/pi-coding-agent": "^0.78.0",
+    "@midscene/core": "workspace:*",
+    "@midscene/shared": "workspace:*",
+    "jiti": "2.7.0",
+    "js-yaml": "4.1.0"
+  },
+  "peerDependencies": {
+    "@midscene/web": "workspace:*"
+  },
+  "peerDependenciesMeta": {
+    "@midscene/web": {
+      "optional": true
+    }
+  },
+  "devDependencies": {
+    "@midscene/web": "workspace:*",
+    "@rslib/core": "^0.18.3",
+    "@types/js-yaml": "4.0.9",
+    "@types/node": "^18.0.0",
+    "dotenv": "^16.4.5",
+    "typescript": "^5.8.3",
+    "vitest": "3.0.5"
+  },
+  "engines": {
+    "node": ">=18.19.0"
+  },
+  "publishConfig": {
+    "access": "public",
+    "registry": "https://registry.npmjs.org"
+  },
+  "license": "MIT"
+}
diff --git a/packages/testing-framework/rslib.config.ts b/packages/testing-framework/rslib.config.ts
new file mode 100644
index 0000000000..353840c6fd
--- /dev/null
+++ b/packages/testing-framework/rslib.config.ts
@@ -0,0 +1,50 @@
+import { defineConfig } from '@rslib/core';
+import { createTypeCheckPlugin } from '../../scripts/rsbuild-utils.ts';
+import { version } from './package.json';
+
+export default defineConfig({
+  lib: [
+    {
+      output: {
+        distPath: {
+          root: 'dist/lib',
+        },
+      },
+      format: 'cjs',
+      syntax: 'es2020',
+    },
+    {
+      output: {
+        distPath: {
+          root: 'dist/es',
+        },
+      },
+      dts: {
+        bundle: true,
+        distPath: 'dist/types',
+      },
+      format: 'esm',
+      syntax: 'es2020',
+    },
+  ],
+  source: {
+    tsconfigPath: 'tsconfig.build.json',
+    entry: {
+      index: './src/index.ts',
+      cli: './src/cli.ts',
+    },
+    define: {
+      __VERSION__: JSON.stringify(version),
+    },
+  },
+  output: {
+    // Pi and the platform UI agents are heavy runtime deps; keep them external.
+    externals: [
+      '@earendil-works/pi-coding-agent',
+      '@earendil-works/pi-ai',
+      '@midscene/web',
+      '@midscene/web/puppeteer-agent-launcher',
+    ],
+  },
+  plugins: [createTypeCheckPlugin()],
+});
diff --git a/packages/testing-framework/src/agent-runtime/pi-runtime.ts b/packages/testing-framework/src/agent-runtime/pi-runtime.ts
new file mode 100644
index 0000000000..76be8ebc8a
--- /dev/null
+++ b/packages/testing-framework/src/agent-runtime/pi-runtime.ts
@@ -0,0 +1,246 @@
+/**
+ * Default agent runtime, backed by Pi (`@earendil-works/pi-coding-agent`).
+ *
+ * This is the Phase 0 implementation of the swappable agent layer used by
+ * `verify` / `soft` / `agent` nodes.
+ *
+ * Decision C′ (RFC §4.1 / §10) — RESOLVED here. Pi exposes
+ * `ModelRegistry.registerProvider(name, { baseUrl, apiKey, models })`, which
+ * points Pi at the SAME OpenAI-compatible endpoint Midscene's UI Agent uses
+ * (`MIDSCENE_MODEL_BASE_URL` / `MIDSCENE_MODEL_API_KEY` / `MIDSCENE_MODEL_NAME`).
+ * So `verify`/`soft`/`agent` and `ui` share one endpoint, with no Pi changes.
+ */
+import { existsSync } from 'node:fs';
+import { join } from 'node:path';
+import { Type } from '@earendil-works/pi-ai';
+import type { Model } from '@earendil-works/pi-ai';
+import {
+  AuthStorage,
+  DefaultResourceLoader,
+  ModelRegistry,
+  SessionManager,
+  createAgentSession,
+  defineTool,
+  getAgentDir,
+} from '@earendil-works/pi-coding-agent';
+import { getDebug } from '@midscene/shared/logger';
+import type { Verdict } from '../types';
+import type {
+  AgentRunInput,
+  AgentRunResult,
+  AgentRuntimeAdapter,
+} from './types';
+
+const debug = getDebug('testing-framework:pi');
+
+const PROVIDER_NAME = 'midscene';
+
+export interface PiRuntimeOptions {
+  /** Endpoint base URL. Defaults to MIDSCENE_MODEL_BASE_URL. */
+  baseUrl?: string;
+  /** API key. Defaults to MIDSCENE_MODEL_API_KEY. */
+  apiKey?: string;
+  /** Model id/name. Defaults to MIDSCENE_MODEL_NAME. */
+  modelName?: string;
+  /** Context window hint passed to Pi. */
+  contextWindow?: number;
+  /** Max output tokens hint passed to Pi. */
+  maxTokens?: number;
+}
+
+interface PreparedModel {
+  authStorage: AuthStorage;
+  modelRegistry: ModelRegistry;
+  model: Model<'openai-completions'>;
+}
+
+/**
+ * Pi-backed implementation of {@link AgentRuntimeAdapter}.
+ */
+export class PiAgentRuntime implements AgentRuntimeAdapter {
+  private prepared?: PreparedModel;
+  private readonly loaderCache = new Map<string, DefaultResourceLoader>();
+
+  constructor(private readonly options: PiRuntimeOptions = {}) {}
+
+  async run(input: AgentRunInput): Promise<AgentRunResult> {
+    const prepared = this.prepareModel();
+    const loader = await this.getResourceLoader(input.projectRoot);
+
+    let capturedVerdict: Verdict | undefined;
+    const needsVerdict = input.kind === 'verify' || input.kind === 'soft';
+
+    const customTools = needsVerdict
+      ? [
+          defineTool({
+            name: 'report_verdict',
+            label: 'Report verdict',
+            description:
+              'Call this exactly once when your judgment is complete to submit ' +
+              'the pass/fail verdict for this verification.',
+            parameters: Type.Object({
+              pass: Type.Boolean({
+                description: 'Whether the verification passed.',
+              }),
+              reason: Type.String({
+                description: 'Human-readable rationale for the verdict.',
+              }),
+              evidence: Type.Optional(
+                Type.Unknown({
+                  description: 'Optional supporting evidence.',
+                }),
+              ),
+            }),
+            execute: async (_id, params) => {
+              capturedVerdict = {
+                pass: params.pass,
+                reason: params.reason,
+                evidence: params.evidence,
+              };
+              return {
+                content: [{ type: 'text', text: 'Verdict recorded.' }],
+                details: capturedVerdict,
+                terminate: true,
+              };
+            },
+          }),
+        ]
+      : [];
+
+    const { session } = await createAgentSession({
+      cwd: input.projectRoot,
+      model: prepared.model,
+      modelRegistry: prepared.modelRegistry,
+      authStorage: prepared.authStorage,
+      sessionManager: SessionManager.inMemory(),
+      resourceLoader: loader,
+      customTools,
+      // verify/agent only read the UI; they must not mutate the project files.
+      // `read` and `bash` stay enabled so skills can fetch external context.
+      excludeTools: ['edit', 'write'],
+    });
+
+    try {
+      const promptText = this.buildPrompt(input);
+      const images = input.screenshotBase64
+        ? [
+            {
+              type: 'image' as const,
+              data: input.screenshotBase64,
+              mimeType: input.screenshotMediaType ?? 'image/png',
+            },
+          ]
+        : undefined;
+
+      await session.prompt(promptText, images ? { images } : undefined);
+
+      const text = session.getLastAssistantText() ?? '';
+      debug('pi run finished', {
+        kind: input.kind,
+        hasVerdict: Boolean(capturedVerdict),
+      });
+
+      return { text, verdict: capturedVerdict };
+    } finally {
+      session.dispose();
+    }
+  }
+
+  private buildPrompt(input: AgentRunInput): string {
+    const parts = [input.context];
+    if (input.referencedSkills.length > 0) {
+      parts.push('');
+      parts.push(
+        `This task references the following skills: ${input.referencedSkills
+          .map((s) => `$${s}`)
+          .join(', ')}. Load and use them as needed to complete the task.`,
+      );
+    }
+    return parts.join('\n');
+  }
+
+  private prepareModel(): PreparedModel {
+    if (this.prepared) return this.prepared;
+
+    const baseUrl = this.options.baseUrl ?? process.env.MIDSCENE_MODEL_BASE_URL;
+    const apiKey = this.options.apiKey ?? process.env.MIDSCENE_MODEL_API_KEY;
+    const modelName = this.options.modelName ?? process.env.MIDSCENE_MODEL_NAME;
+
+    if (!baseUrl) {
+      throw new Error(
+        '[midscene] Pi agent runtime requires MIDSCENE_MODEL_BASE_URL ' +
+          '(or PiRuntimeOptions.baseUrl) so verify/agent share the UI Agent endpoint.',
+      );
+    }
+    if (!apiKey) {
+      throw new Error(
+        '[midscene] Pi agent runtime requires MIDSCENE_MODEL_API_KEY (or PiRuntimeOptions.apiKey).',
+      );
+    }
+    if (!modelName) {
+      throw new Error(
+        '[midscene] Pi agent runtime requires MIDSCENE_MODEL_NAME (or PiRuntimeOptions.modelName).',
+      );
+    }
+
+    const authStorage = AuthStorage.inMemory();
+    const modelRegistry = ModelRegistry.inMemory(authStorage);
+
+    modelRegistry.registerProvider(PROVIDER_NAME, {
+      baseUrl,
+      apiKey,
+      models: [
+        {
+          id: modelName,
+          name: modelName,
+          api: 'openai-completions',
+          reasoning: false,
+          input: ['text', 'image'],
+          cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+          contextWindow: this.options.contextWindow ?? 128_000,
+          maxTokens: this.options.maxTokens ?? 8_192,
+        },
+      ],
+    });
+
+    const model = modelRegistry.find(PROVIDER_NAME, modelName) as
+      | Model<'openai-completions'>
+      | undefined;
+    if (!model) {
+      throw new Error(
+        `[midscene] Failed to register Pi model "${modelName}" at ${baseUrl}.`,
+      );
+    }
+
+    this.prepared = { authStorage, modelRegistry, model };
+    return this.prepared;
+  }
+
+  private async getResourceLoader(
+    projectRoot: string,
+  ): Promise<DefaultResourceLoader> {
+    const cached = this.loaderCache.get(projectRoot);
+    if (cached) return cached;
+
+    // Convention: project skills live under `<projectRoot>/skills`. Pi also
+    // discovers its own default skill locations relative to cwd. The framework
+    // only POINTS Pi at the skills — discovery/activation stays Pi's job.
+    const additionalSkillPaths: string[] = [];
+    const conventionalSkillsDir = join(projectRoot, 'skills');
+    if (existsSync(conventionalSkillsDir)) {
+      additionalSkillPaths.push(conventionalSkillsDir);
+    }
+
+    const loader = new DefaultResourceLoader({
+      cwd: projectRoot,
+      agentDir: getAgentDir(),
+      additionalSkillPaths,
+      noExtensions: true,
+      noThemes: true,
+      noPromptTemplates: true,
+    });
+    await loader.reload();
+    this.loaderCache.set(projectRoot, loader);
+    return loader;
+  }
+}
diff --git a/packages/testing-framework/src/agent-runtime/skills.ts b/packages/testing-framework/src/agent-runtime/skills.ts
new file mode 100644
index 0000000000..5d5863cef6
--- /dev/null
+++ b/packages/testing-framework/src/agent-runtime/skills.ts
@@ -0,0 +1,23 @@
+/**
+ * `$name` skill references (RFC §4).
+ *
+ * The framework does NOT register or activate skills. It only statically
+ * extracts the `$name` tokens from a node's instruction so it can:
+ *   1. surface them to the agent as a strong load hint, and
+ *   2. optionally validate that the referenced skills exist.
+ *
+ * Actual loading/activation is Pi's job (progressive disclosure).
+ */
+
+// $name: starts with `$`, then a letter/underscore, then word chars or hyphens.
+const SKILL_TOKEN = /\$([A-Za-z_][A-Za-z0-9_-]*)/g;
+
+/** Extract the unique set of `$name` skill references from an instruction. */
+export function extractSkillReferences(instruction: string): string[] {
+  if (!instruction) return [];
+  const found = new Set<string>();
+  for (const match of instruction.matchAll(SKILL_TOKEN)) {
+    found.add(match[1]);
+  }
+  return [...found];
+}
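
Note on `skills.ts`: a minimal standalone sketch of how the `$name` extraction above behaves. The regex and function are copied from the file so the snippet runs on its own; the instruction strings are made up for illustration.

```typescript
// Copied from skills.ts so this sketch is self-contained.
const SKILL_TOKEN = /\$([A-Za-z_][A-Za-z0-9_-]*)/g;

function extractSkillReferences(instruction: string): string[] {
  if (!instruction) return [];
  const found = new Set<string>();
  for (const match of instruction.matchAll(SKILL_TOKEN)) {
    found.add(match[1]);
  }
  return [...found];
}

// Hypothetical instructions, for illustration only.
const refs = extractSkillReferences(
  'Use $catalog to confirm the price, then ask $catalog and $price-service again',
);
// refs is ['catalog', 'price-service']: duplicates collapse, first-seen order is kept.

const none = extractSkillReferences('The total is $89.00');
// none is []: a digit cannot start a skill name, so prices are not matched.

const hyphenated = extractSkillReferences('Run a $catalog-based check');
// hyphenated is ['catalog-based']: hyphens are part of the token.
```

The last case is worth keeping in mind when writing cases: a `$name` followed directly by a hyphen is read as one longer name, so leave a space or punctuation after the skill token.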
diff --git a/packages/testing-framework/src/agent-runtime/types.ts b/packages/testing-framework/src/agent-runtime/types.ts
new file mode 100644
index 0000000000..b335f590ce
--- /dev/null
+++ b/packages/testing-framework/src/agent-runtime/types.ts
@@ -0,0 +1,49 @@
+/**
+ * AgentRuntimeAdapter — the swappable general-purpose agent layer (RFC §6,
+ * design doc "swappable agent framework"). The default implementation wraps
+ * Pi; teams can replace it with another agent SDK via `agentRuntime` in
+ * `midscene.config.ts`.
+ *
+ * Phase 0 keeps this interface deliberately minimal: a single `run` entry that
+ * the engine calls for `verify` / `soft` / `agent` nodes.
+ */
+import type { Verdict } from '../types';
+
+export interface AgentRunInput {
+  /**
+   * Node kind. `verify` and `soft` must both produce a verdict; `agent` is
+   * advisory and never produces one.
+   */
+  kind: 'verify' | 'soft' | 'agent';
+  /** The natural-language instruction from the YAML node. */
+  instruction: string;
+  /**
+   * The assembled context (RFC §7): every past step's intent + output +
+   * verify verdicts. Plain text.
+   */
+  context: string;
+  /** Current UI screenshot, bare base64 with no data: prefix. */
+  screenshotBase64?: string;
+  /** Screenshot media type; defaults to image/png. */
+  screenshotMediaType?: string;
+  /** `$name` tokens referenced by the instruction (RFC §4). */
+  referencedSkills: string[];
+  /** Project root, used for skill discovery and the agent's cwd. */
+  projectRoot: string;
+}
+
+export interface AgentRunResult {
+  /** The agent's final natural-language message. */
+  text: string;
+  /**
+   * For verify/soft: the structured verdict, or undefined when the agent
+   * never reported one (the engine treats undefined as fail-closed, RFC §6).
+   */
+  verdict?: Verdict;
+}
+
+export interface AgentRuntimeAdapter {
+  run(input: AgentRunInput): Promise<AgentRunResult>;
+  /** Release any underlying resources. */
+  dispose?(): Promise<void>;
+}
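
Note on `agent-runtime/types.ts`: a minimal sketch of what a replacement adapter could look like. The interfaces are local copies so the snippet compiles on its own. `Verdict` is assumed to be `{ pass, reason, evidence? }`, inferred from how `pi-runtime.ts` builds it. `StubAgentRuntime` and `toGatingVerdict` are hypothetical names, not part of this patch, and `toGatingVerdict` only illustrates the fail-closed rule described in the comments rather than the engine's actual code.

```typescript
// Local copies of the interfaces above so the sketch stands alone.
// Verdict shape is an assumption inferred from pi-runtime.ts.
interface Verdict {
  pass: boolean;
  reason: string;
  evidence?: unknown;
}

interface AgentRunInput {
  kind: 'verify' | 'soft' | 'agent';
  instruction: string;
  context: string;
  screenshotBase64?: string;
  screenshotMediaType?: string;
  referencedSkills: string[];
  projectRoot: string;
}

interface AgentRunResult {
  text: string;
  verdict?: Verdict;
}

interface AgentRuntimeAdapter {
  run(input: AgentRunInput): Promise<AgentRunResult>;
  dispose?(): Promise<void>;
}

// Hypothetical stub adapter: makes no model call, useful for wiring tests.
class StubAgentRuntime implements AgentRuntimeAdapter {
  async run(input: AgentRunInput): Promise<AgentRunResult> {
    if (input.kind === 'agent') {
      // Advisory nodes never carry a verdict.
      return { text: `Looked at: ${input.instruction}` };
    }
    // Fail closed when there is no screenshot to judge from.
    const pass = Boolean(input.screenshotBase64);
    return {
      text: pass ? 'Checked the screenshot.' : 'No screenshot available.',
      verdict: {
        pass,
        reason: pass ? 'Stub accepted the screenshot.' : 'Nothing to verify against.',
      },
    };
  }
}

// Hypothetical helper showing the fail-closed rule: a missing verdict
// counts as a failure for gating nodes.
function toGatingVerdict(result: AgentRunResult): Verdict {
  return result.verdict ?? { pass: false, reason: 'No verdict was reported.' };
}

const missing = toGatingVerdict({ text: 'The agent forgot to call report_verdict.' });
// missing.pass is false.
```

An adapter like this would be passed as `agentRuntime` in `midscene.config.ts`; the stub is deterministic, so it suits engine tests that should not depend on a model endpoint.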
diff --git a/packages/testing-framework/src/cli.ts b/packages/testing-framework/src/cli.ts
new file mode 100644
index 0000000000..918c805166
--- /dev/null
+++ b/packages/testing-framework/src/cli.ts
@@ -0,0 +1,89 @@
+/**
+ * `midscene-tf` — minimal CLI for the v2 testing framework (Phase 0).
+ *
+ * Usage:
+ *   midscene-tf run [--config <path>] [--root <dir>] [file...]
+ */
+import { loadConfig } from './runner/load-config';
+import { runAll } from './runner/run';
+import type { RunSummary } from './types';
+
+interface ParsedArgs {
+  command: string;
+  config?: string;
+  root?: string;
+  files: string[];
+}
+
+function parseArgs(argv: string[]): ParsedArgs {
+  const args: ParsedArgs = { command: argv[0] ?? 'run', files: [] };
+  for (let i = 1; i < argv.length; i++) {
+    const arg = argv[i];
+    if (arg === '--config' || arg === '-c') {
+      args.config = argv[++i];
+    } else if (arg === '--root' || arg === '-r') {
+      args.root = argv[++i];
+    } else if (arg.startsWith('-')) {
+      throw new Error(`[midscene] Unknown flag: ${arg}`);
+    } else {
+      args.files.push(arg);
+    }
+  }
+  return args;
+}
+
+function printSummary(summary: RunSummary): void {
+  console.log('');
+  console.log(`Midscene v2 — ${summary.total} case(s)`);
+  for (const c of summary.cases) {
+    const mark = c.status === 'passed' ? '✓' : '✗';
+    console.log(`  ${mark} ${c.name} (${c.durationMs}ms)`);
+    for (const step of c.steps) {
+      const stepMark =
+        step.status === 'passed'
+          ? '✓'
+          : step.status === 'failed'
+            ? '✗'
+            : step.status === 'warning'
+              ? '!'
+              : '·';
+      const detail = step.verdict
+        ? ` — ${step.verdict.reason}`
+        : step.error
+          ? ` — ${step.error}`
+          : '';
+      console.log(`     ${stepMark} [${step.node}]${detail}`);
+    }
+    for (const w of c.warnings) {
+      console.log(`     ! ${w}`);
+    }
+  }
+  console.log('');
+  console.log(
+    `Passed: ${summary.passed}  Failed: ${summary.failed}  (${summary.durationMs}ms)`,
+  );
+}
+
+export async function main(argv = process.argv.slice(2)): Promise<number> {
+  const args = parseArgs(argv);
+
+  if (args.command !== 'run') {
+    console.error(`[midscene] Unknown command "${args.command}". Try: run`);
+    return 2;
+  }
+
+  const { config } = await loadConfig(args.config ?? args.root);
+  const summary = await runAll(config, {
+    projectRoot: args.root,
+    files: args.files,
+  });
+  printSummary(summary);
+  return summary.failed > 0 ? 1 : 0;
+}
+
+main()
+  .then((code) => process.exit(code))
+  .catch((err) => {
+    console.error(err);
+    process.exit(1);
+  });
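
Note on `cli.ts`: as written, `parseArgs` reads `argv[++i]` without checking it, so `midscene-tf run --config` with nothing after the flag leaves `config` undefined and `main` falls back to `--root`. One possible hardening is sketched below. `takeValue` and `parseArgsStrict` are hypothetical names and not part of this patch; the rest mirrors the original function.

```typescript
// Hypothetical hardened variant of parseArgs from cli.ts.
interface ParsedArgs {
  command: string;
  config?: string;
  root?: string;
  files: string[];
}

// Read the value that follows a flag; reject a missing value or another flag.
function takeValue(argv: string[], index: number, flag: string): string {
  const value = argv[index];
  if (value === undefined || value.startsWith('-')) {
    throw new Error(`[midscene] Flag ${flag} expects a value.`);
  }
  return value;
}

function parseArgsStrict(argv: string[]): ParsedArgs {
  const args: ParsedArgs = { command: argv[0] ?? 'run', files: [] };
  for (let i = 1; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--config' || arg === '-c') {
      args.config = takeValue(argv, ++i, arg);
    } else if (arg === '--root' || arg === '-r') {
      args.root = takeValue(argv, ++i, arg);
    } else if (arg.startsWith('-')) {
      throw new Error(`[midscene] Unknown flag: ${arg}`);
    } else {
      args.files.push(arg);
    }
  }
  return args;
}

const parsed = parseArgsStrict(['run', '-c', 'midscene.config.ts', 'e2e/login.yaml']);
// parsed.config is 'midscene.config.ts' and parsed.files is ['e2e/login.yaml'].
```

Rejecting values that start with `-` is a trade-off: it catches `--config --root x`, at the cost of refusing paths that begin with a hyphen.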
diff --git a/packages/testing-framework/src/config/index.ts b/packages/testing-framework/src/config/index.ts
new file mode 100644
index 0000000000..8b33309714
--- /dev/null
+++ b/packages/testing-framework/src/config/index.ts
@@ -0,0 +1,22 @@
+import type { MidsceneConfig } from './types';
+
+/**
+ * Identity helper for `midscene.config.ts`, giving full type inference and a
+ * stable import surface (RFC §2).
+ */
+export function defineMidsceneConfig(config: MidsceneConfig): MidsceneConfig {
+  if (!config || typeof config !== 'object') {
+    throw new Error('[midscene] defineMidsceneConfig expects a config object.');
+  }
+  if (!config.uiAgent) {
+    throw new Error(
+      '[midscene] midscene.config.ts must define a `uiAgent` (object or factory function).',
+    );
+  }
+  if (!config.testDir) {
+    throw new Error('[midscene] midscene.config.ts must define a `testDir`.');
+  }
+  return config;
+}
+
+export * from './types';
diff --git a/packages/testing-framework/src/config/types.ts b/packages/testing-framework/src/config/types.ts
new file mode 100644
index 0000000000..21c4eb179d
--- /dev/null
+++ b/packages/testing-framework/src/config/types.ts
@@ -0,0 +1,84 @@
+/**
+ * `midscene.config.ts` schema (RFC §2). Environment / target lives here, never
+ * in the case YAML.
+ */
+import type { Agent } from '@midscene/core/agent';
+import type { AgentOpt } from '@midscene/core/agent';
+import type { AgentRuntimeAdapter } from '../agent-runtime/types';
+import type { RuntimeNode } from '../runtime';
+
+/** Platforms the framework can build a UI Agent for out of the box. */
+export type UIAgentType = 'web' | 'android' | 'ios' | 'computer';
+
+/** Shared UI Agent behavior parameters (aiActContext, generateReport, ...). */
+export type UIAgentOptions = AgentOpt;
+
+/** Configuration-style UI Agent: framework builds it from `type` + `options`. */
+export interface UIAgentConfig {
+  type: UIAgentType;
+  /** Platform connection parameters (url, deviceId, ...). */
+  options?: Record<string, unknown>;
+}
+
+/** Context passed to a programmatic UI Agent factory. */
+export interface UIAgentFactoryCtx {
+  uiAgentOptions?: UIAgentOptions;
+  env: NodeJS.ProcessEnv;
+}
+
+/** Programmatic UI Agent: the project fully controls construction. */
+export type UIAgentFactory = (ctx: UIAgentFactoryCtx) => Promise<{
+  agent: Agent;
+  /** Optional cleanup invoked after the case finishes (close browser, etc). */
+  cleanup?: () => Promise<void>;
+}>;
+
+/**
+ * The single `uiAgent` field (RFC §2.1): an object means config-style, a
+ * function means programmatic. One key, union type — no two ways to define a
+ * run target.
+ */
+export type UIAgent = UIAgentConfig | UIAgentFactory;
+
+export interface TestRunnerOptions {
+  maxConcurrency?: number;
+  bail?: number;
+  testTimeout?: number;
+  retry?: number;
+}
+
+export interface OutputOptions {
+  /** Path to write the aggregate run summary JSON. */
+  summary?: string;
+  /** Directory for Midscene HTML reports. */
+  reportDir?: string;
+}
+
+export interface MidsceneConfig {
+  /** How the UI Agent is created (RFC §2.1). */
+  uiAgent: UIAgent;
+
+  // —— case discovery ——
+  testDir: string;
+  /** Defaults to ['**\/*.yaml']. */
+  include?: string[];
+  exclude?: string[];
+
+  // —— execution policy (aligned with Rstest concepts) ——
+  testRunner?: TestRunnerOptions;
+
+  // —— output ——
+  output?: OutputOptions;
+
+  // —— shared UI Agent params ——
+  uiAgentOptions?: UIAgentOptions;
+
+  // —— extension points ——
+  /** Custom YAML nodes (RFC §3). */
+  runtime?: Record<string, RuntimeNode>;
+  /** Replacement for the default Pi-backed agent layer (RFC §6). */
+  agentRuntime?: AgentRuntimeAdapter;
+}
+
+/** Defaults applied when reading a config. */
+export const DEFAULT_INCLUDE = ['**/*.yaml'];
diff --git a/packages/testing-framework/src/context/assembler.ts b/packages/testing-framework/src/context/assembler.ts
new file mode 100644
index 0000000000..a1261cf055
--- /dev/null
+++ b/packages/testing-framework/src/context/assembler.ts
@@ -0,0 +1,101 @@
+/**
+ * Context assembly (RFC §7).
+ *
+ * When executing a `verify` / `soft` / `agent` node, the agent sees EXACTLY:
+ *   for each past step (in order):
+ *     - node type + instruction (text, or object input for custom nodes)
+ *     - that step's output (natural language; runtime nodes use conclusion)
+ *     - if verify/soft: its pass/fail + reason
+ *   + the current UI screenshot (handed separately as an image)
+ *   + the skills pre-loaded into the agent
+ *
+ * Explicitly excluded ("nothing else"): execution traces, historical
+ * screenshots, runtime `state`, intermediate skill-call results.
+ *
+ * Phase 0 does NOT truncate (predictability > compactness).
+ */
+import type { StepResult } from '../types';
+
+export interface AssembleContextInput {
+  /** The case name, for a small header. */
+  caseName: string;
+  /** All steps executed before the current node, in order. */
+  pastSteps: ReadonlyArray<StepResult>;
+  /** The current node's instruction. */
+  instruction: string;
+  /** The current node's kind, for framing. */
+  kind: 'verify' | 'soft' | 'agent';
+}
+
+export function assembleContext(input: AssembleContextInput): string {
+  const { caseName, pastSteps, instruction, kind } = input;
+  const lines: string[] = [];
+
+  lines.push(`# Test case: ${caseName}`);
+  lines.push('');
+  lines.push(
+    'You are running inside a UI test. Below is the full history of previous ' +
+      'steps and their outputs. You also receive the current UI screenshot as ' +
+      'an image. This is everything you can see — there is no other hidden state.',
+  );
+  lines.push('');
+
+  if (pastSteps.length === 0) {
+    lines.push('## Previous steps');
+    lines.push('(none — this is the first step)');
+  } else {
+    lines.push('## Previous steps');
+    for (const step of pastSteps) {
+      lines.push('');
+      lines.push(`### Step ${step.index + 1}: ${step.node}`);
+      lines.push(`- Intent: ${formatInput(step.input)}`);
+      if (step.output?.text) {
+        lines.push(`- Output: ${step.output.text}`);
+      }
+      if (step.output?.structured) {
+        lines.push(`- Output fields: ${safeJson(step.output.structured)}`);
+      }
+      if (step.verdict) {
+        lines.push(
+          `- Verdict: ${step.verdict.pass ? 'PASS' : 'FAIL'} — ${step.verdict.reason}`,
+        );
+      }
+      if (step.error) {
+        lines.push(`- Error: ${step.error}`);
+      }
+    }
+  }
+
+  lines.push('');
+  lines.push('## Current task');
+  if (kind === 'agent') {
+    lines.push(
+      'Freely explore and analyze based on the history above and the current ' +
+        'screenshot. Your output is advisory and does NOT decide pass/fail.',
+    );
+  } else {
+    lines.push(
+      'Make a judgment. You MUST finish by calling the `report_verdict` tool ' +
+        'with `pass`, `reason`, and optional `evidence`. If you cannot ' +
+        'confidently determine the result, report `pass: false`.',
+    );
+  }
+  lines.push('');
+  lines.push(instruction.trim());
+
+  return lines.join('\n');
+}
+
+function formatInput(input: unknown): string {
+  if (typeof input === 'string') return input.trim();
+  if (input === undefined) return '(no input)';
+  return safeJson(input);
+}
+
+function safeJson(value: unknown): string {
+  try {
+    return JSON.stringify(value);
+  } catch {
+    return String(value);
+  }
+}
diff --git a/packages/testing-framework/src/engine/output-store.ts b/packages/testing-framework/src/engine/output-store.ts
new file mode 100644
index 0000000000..007124c23b
--- /dev/null
+++ b/packages/testing-framework/src/engine/output-store.ts
@@ -0,0 +1,27 @@
+import type { OutputStore, StepOutput } from '../types';
+
+interface StoredOutput {
+  node: string;
+  index: number;
+  output: StepOutput;
+}
+
+/**
+ * Mutable backing store for step outputs. Exposes a read-only {@link OutputStore}
+ * view to runtime nodes (RFC §3).
+ */
+export class OutputStoreImpl implements OutputStore {
+  private readonly outputs: StoredOutput[] = [];
+
+  add(node: string, index: number, output: StepOutput): void {
+    this.outputs.push({ node, index, output });
+  }
+
+  all(): ReadonlyArray<StoredOutput> {
+    return this.outputs;
+  }
+
+  latest(): StepOutput | undefined {
+    return this.outputs[this.outputs.length - 1]?.output;
+  }
+}
diff --git a/packages/testing-framework/src/engine/run-case.ts b/packages/testing-framework/src/engine/run-case.ts
new file mode 100644
index 0000000000..16131ceae9
--- /dev/null
+++ b/packages/testing-framework/src/engine/run-case.ts
@@ -0,0 +1,119 @@
+import type { Agent } from '@midscene/core/agent';
+import type { AgentRuntimeAdapter } from '../agent-runtime/types';
+import type { RuntimeNode } from '../runtime';
+import type { CaseResult, StepResult } from '../types';
+import type { ParsedCase } from '../yaml/types';
+import { OutputStoreImpl } from './output-store';
+import { type RunNodeDeps, runNode } from './run-node';
+
+export interface RunCaseOptions {
+  parsed: ParsedCase;
+  file: string;
+  uiAgent: Agent;
+  agentRuntime: AgentRuntimeAdapter;
+  runtimeNodes: Record<string, RuntimeNode>;
+  projectRoot: string;
+  env: NodeJS.ProcessEnv;
+}
+
+/**
+ * Execute a single case (one parsed flow). Returns a structured result; never
+ * throws for node-level failures (those are recorded as failed steps).
+ */
+export async function runCase(options: RunCaseOptions): Promise<CaseResult> {
+  const {
+    parsed,
+    file,
+    uiAgent,
+    agentRuntime,
+    runtimeNodes,
+    projectRoot,
+    env,
+  } = options;
+  const caseName = parsed.name ?? file;
+
+  const outputs = new OutputStoreImpl();
+  const state: Record<string, unknown> = {};
+  const steps: StepResult[] = [];
+  const warnings: string[] = [];
+  const startedAt = Date.now();
+  let status: CaseResult['status'] = 'passed';
+
+  for (let index = 0; index < parsed.flow.length; index++) {
+    const step = parsed.flow[index];
+    const stepStart = Date.now();
+
+    const deps: RunNodeDeps = {
+      uiAgent,
+      agentRuntime,
+      runtimeNodes,
+      outputs,
+      state,
+      projectRoot,
+      caseName,
+      caseFile: file,
+      pastSteps: steps,
+      env,
+    };
+
+    let stepResult: StepResult;
+    try {
+      const outcome = await runNode(step.node, step.input, index, deps);
+      stepResult = {
+        index,
+        node: step.node,
+        input: step.input,
+        status: outcome.status,
+        output: outcome.output,
+        verdict: outcome.verdict,
+        error: outcome.error,
+        durationMs: Date.now() - stepStart,
+      };
+    } catch (err) {
+      // Hard failure: ui action threw, runtime node threw, unknown node, etc.
+      stepResult = {
+        index,
+        node: step.node,
+        input: step.input,
+        status: 'failed',
+        error: err instanceof Error ? err.message : String(err),
+        durationMs: Date.now() - stepStart,
+      };
+    }
+
+    steps.push(stepResult);
+    if (stepResult.output) {
+      outputs.add(step.node, index, stepResult.output);
+    }
+    if (stepResult.status === 'warning' && stepResult.error) {
+      warnings.push(stepResult.error);
+    }
+    if (stepResult.status === 'warning' && stepResult.verdict) {
+      warnings.push(
+        `soft check failed at step ${index + 1} (${step.node}): ${stepResult.verdict.reason}`,
+      );
+    }
+
+    if (stepResult.status === 'failed') {
+      // A gating failure stops the flow; later steps depend on prior ones.
+      status = 'failed';
+      break;
+    }
+  }
+
+  return {
+    name: caseName,
+    file,
+    status,
+    steps,
+    warnings,
+    durationMs: Date.now() - startedAt,
+    reportFile: getReportFile(uiAgent),
+  };
+}
+
+function getReportFile(agent: Agent): string | undefined {
+  const candidate = (agent as unknown as { reportFile?: string | null })
+    .reportFile;
+  return candidate ?? undefined;
+}
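The gating rule in `runCase` can be sketched in isolation. This is a simplified model, not the package's code: `runFlow` and `fakeExecute` are hypothetical names, and `execute` stands in for `runNode`. It assumes only what the loop above shows: a `failed` step (including a thrown error) stops the flow, while `warning` and `info` steps let it continue.

```typescript
// Simplified model of the runCase loop. `StepStatus` mirrors src/types.ts;
// `execute` is a hypothetical stand-in for runNode.
type StepStatus = 'passed' | 'failed' | 'warning' | 'info';

interface FlowOutcome {
  status: 'passed' | 'failed';
  executed: StepStatus[];
}

function runFlow(
  steps: string[],
  execute: (step: string) => StepStatus,
): FlowOutcome {
  const executed: StepStatus[] = [];
  for (const step of steps) {
    let status: StepStatus;
    try {
      status = execute(step);
    } catch {
      // A thrown error is recorded as a failed step rather than rethrown.
      status = 'failed';
    }
    executed.push(status);
    if (status === 'failed') {
      // Gating failure: the remaining steps are skipped.
      return { status: 'failed', executed };
    }
  }
  return { status: 'passed', executed };
}

// Hypothetical executor: the step name decides the outcome.
const fakeExecute = (step: string): StepStatus => {
  if (step === 'throw') throw new Error('ui action failed');
  return step as StepStatus;
};

const softOnly = runFlow(['info', 'warning', 'passed'], fakeExecute);
const gated = runFlow(['info', 'throw', 'passed'], fakeExecute);
```

In this model a `warning` never changes the case status, which matches how `soft` failures and `agent` errors are only collected into `warnings`.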
diff --git a/packages/testing-framework/src/engine/run-node.ts b/packages/testing-framework/src/engine/run-node.ts
new file mode 100644
index 0000000000..479d46177e
--- /dev/null
+++ b/packages/testing-framework/src/engine/run-node.ts
@@ -0,0 +1,207 @@
+import type { Agent } from '@midscene/core/agent';
+import { extractSkillReferences } from '../agent-runtime/skills';
+import type { AgentRuntimeAdapter } from '../agent-runtime/types';
+import { assembleContext } from '../context/assembler';
+import type { RuntimeNode, RuntimeNodeContext } from '../runtime';
+import type { StepOutput, StepResult, Verdict } from '../types';
+import { isBuiltinNode } from '../yaml/types';
+import type { OutputStoreImpl } from './output-store';
+
+export interface RunNodeDeps {
+  uiAgent: Agent;
+  agentRuntime: AgentRuntimeAdapter;
+  runtimeNodes: Record<string, RuntimeNode>;
+  outputs: OutputStoreImpl;
+  /** Shared engineering-facing state across runtime nodes. */
+  state: Record<string, unknown>;
+  projectRoot: string;
+  caseName: string;
+  caseFile: string;
+  /** Steps already executed (read-only context for the current node). */
+  pastSteps: ReadonlyArray<StepResult>;
+  env: NodeJS.ProcessEnv;
+}
+
+export interface RunNodeOutcome {
+  status: StepResult['status'];
+  output?: StepOutput;
+  verdict?: Verdict;
+  error?: string;
+}
+
+/**
+ * Execute a single flow step. verify/soft/agent outcomes come back as `status`;
+ * `ui`, runtime, and unknown nodes throw, which `runCase` records as failed.
+ */
+export async function runNode(
+  node: string,
+  input: unknown,
+  index: number,
+  deps: RunNodeDeps,
+): Promise<RunNodeOutcome> {
+  if (isBuiltinNode(node)) {
+    switch (node) {
+      case 'ui':
+        return runUiNode(input as string, deps);
+      case 'verify':
+        return runJudgmentNode('verify', input as string, index, deps);
+      case 'soft':
+        return runJudgmentNode('soft', input as string, index, deps);
+      case 'agent':
+        return runAgentNode(input as string, index, deps);
+    }
+  }
+  return runCustomNode(node, input, deps);
+}
+
+async function runUiNode(
+  instruction: string,
+  deps: RunNodeDeps,
+): Promise<RunNodeOutcome> {
+  // The UI Agent performs the natural-language action. Errors propagate up so
+  // the case fails (RFC §8).
+  const acted = await deps.uiAgent.aiAct(instruction);
+
+  let text = typeof acted === 'string' && acted.trim() ? acted.trim() : '';
+  if (!text) {
+    // Produce a context-facing conclusion grounded in the current screen,
+    // honoring any "record these values" request in the instruction.
+    text = await deps.uiAgent.aiAsk(
+      `In natural language, summarize the result of performing the following instruction on the current screen. If the instruction asked to record or name any values, include them explicitly.\n\nInstruction:\n${instruction}`,
+    );
+  }
+
+  return { status: 'info', output: { text } };
+}
+
+async function runJudgmentNode(
+  kind: 'verify' | 'soft',
+  instruction: string,
+  index: number,
+  deps: RunNodeDeps,
+): Promise<RunNodeOutcome> {
+  const { data, mediaType } = await captureScreenshot(deps.uiAgent);
+  const context = assembleContext({
+    caseName: deps.caseName,
+    pastSteps: deps.pastSteps,
+    instruction,
+    kind,
+  });
+
+  const result = await deps.agentRuntime.run({
+    kind,
+    instruction,
+    context,
+    screenshotBase64: data,
+    screenshotMediaType: mediaType,
+    referencedSkills: extractSkillReferences(instruction),
+    projectRoot: deps.projectRoot,
+  });
+
+  // Fail-closed: a missing/unparseable verdict is treated as failure (RFC §6).
+  const verdict: Verdict = result.verdict ?? {
+    pass: false,
+    reason:
+      'The agent did not report a verdict via report_verdict; treated as failure (fail-closed).',
+  };
+
+  const output: StepOutput = {
+    text: result.text || verdict.reason,
+  };
+
+  if (verdict.pass) {
+    return { status: 'passed', output, verdict };
+  }
+  // verify gates the case; soft only warns.
+  return {
+    status: kind === 'verify' ? 'failed' : 'warning',
+    output,
+    verdict,
+  };
+}
+
+async function runAgentNode(
+  instruction: string,
+  index: number,
+  deps: RunNodeDeps,
+): Promise<RunNodeOutcome> {
+  const { data, mediaType } = await captureScreenshot(deps.uiAgent);
+  const context = assembleContext({
+    caseName: deps.caseName,
+    pastSteps: deps.pastSteps,
+    instruction,
+    kind: 'agent',
+  });
+
+  // `agent` is advisory: its output never changes pass/fail. Even internal
+  // errors are downgraded to a warning (RFC §8).
+  try {
+    const result = await deps.agentRuntime.run({
+      kind: 'agent',
+      instruction,
+      context,
+      screenshotBase64: data,
+      screenshotMediaType: mediaType,
+      referencedSkills: extractSkillReferences(instruction),
+      projectRoot: deps.projectRoot,
+    });
+    return { status: 'info', output: { text: result.text } };
+  } catch (err) {
+    return {
+      status: 'warning',
+      error: `agent node error (advisory, non-gating): ${err instanceof Error ? err.message : String(err)}`,
+    };
+  }
+}
+
+async function runCustomNode(
+  node: string,
+  input: unknown,
+  deps: RunNodeDeps,
+): Promise<RunNodeOutcome> {
+  const runtimeNode = deps.runtimeNodes[node];
+  if (!runtimeNode) {
+    throw new Error(
+      `[midscene] Unknown node "${node}". It is not a built-in node and is not registered under \`runtime\` in midscene.config.ts.`,
+    );
+  }
+
+  const ctx: RuntimeNodeContext = {
+    input,
+    uiAgent: deps.uiAgent,
+    outputs: deps.outputs,
+    state: deps.state,
+    result: {
+      name: deps.caseName,
+      file: deps.caseFile,
+      steps: deps.pastSteps,
+    },
+    env: deps.env,
+  };
+
+  // A runtime node that throws fails the case (RFC §8).
+  const result = await runtimeNode(ctx);
+  return {
+    status: 'info',
+    output: { text: result.conclusion, structured: result.output },
+  };
+}
+
+async function captureScreenshot(
+  agent: Agent,
+): Promise<{ data?: string; mediaType: string }> {
+  try {
+    const raw = await agent.interface.screenshotBase64();
+    return splitDataUrl(raw);
+  } catch {
+    return { data: undefined, mediaType: 'image/png' };
+  }
+}
+
+function splitDataUrl(value: string): { data: string; mediaType: string } {
+  const match = /^data:(image\/[\w.+-]+);base64,(.*)$/s.exec(value);
+  if (match) {
+    return { mediaType: match[1], data: match[2] };
+  }
+  return { mediaType: 'image/png', data: value };
+}
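The fail-closed mapping used by `runJudgmentNode` can be sketched on its own. `resolveJudgment` and `JudgmentStatus` are hypothetical names used only here, and `Verdict` is a local mirror of the contract in `src/types.ts`. The sketch assumes the behaviour shown above: a missing verdict counts as a failure, a failed `verify` gates the case, and a failed `soft` only warns.

```typescript
// Local mirror of the Verdict contract from src/types.ts.
interface Verdict {
  pass: boolean;
  reason: string;
  evidence?: unknown;
}

type JudgmentStatus = 'passed' | 'failed' | 'warning';

// Hypothetical helper: maps a (possibly missing) verdict to a step status.
function resolveJudgment(
  kind: 'verify' | 'soft',
  reported: Verdict | undefined,
): { status: JudgmentStatus; verdict: Verdict } {
  // Fail-closed: no reported verdict means the check did not pass.
  const verdict: Verdict = reported ?? {
    pass: false,
    reason: 'No verdict was reported; treated as failure.',
  };
  if (verdict.pass) return { status: 'passed', verdict };
  // A failed `verify` gates the case; a failed `soft` only warns.
  return { status: kind === 'verify' ? 'failed' : 'warning', verdict };
}

const missing = resolveJudgment('verify', undefined);
const softFail = resolveJudgment('soft', {
  pass: false,
  reason: 'Banner is missing.',
});
const ok = resolveJudgment('verify', { pass: true, reason: 'Total matches.' });
```

Defaulting to `pass: false` means an agent that times out or forgets to call `report_verdict` cannot silently turn a case green.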
diff --git a/packages/testing-framework/src/index.ts b/packages/testing-framework/src/index.ts
new file mode 100644
index 0000000000..e6db1158b0
--- /dev/null
+++ b/packages/testing-framework/src/index.ts
@@ -0,0 +1,77 @@
+/**
+ * @midscene/testing-framework — AI-native v2 UI testing framework (Phase 0).
+ *
+ * Public surface implementing RFC 0001:
+ *  - `defineMidsceneConfig` / `defineRuntime` authoring helpers
+ *  - the node model, verdict contract, output contract, context-assembly
+ *    contract (as types)
+ *  - a lightweight runner (`runAll`) and CLI (`midscene-tf`)
+ *  - the default Pi-backed agent runtime with a custom model base URL
+ *    (decision C′, RFC §4.1)
+ */
+
+// —— authoring helpers ——
+export { defineMidsceneConfig } from './config';
+export { defineRuntime } from './runtime';
+
+// —— config types ——
+export type {
+  MidsceneConfig,
+  UIAgent,
+  UIAgentConfig,
+  UIAgentFactory,
+  UIAgentFactoryCtx,
+  UIAgentOptions,
+  UIAgentType,
+  TestRunnerOptions,
+  OutputOptions,
+} from './config/types';
+
+// —— runtime node contract ——
+export type {
+  RuntimeNode,
+  RuntimeNodeContext,
+  RuntimeNodeResult,
+} from './runtime';
+
+// —— core contracts ——
+export type {
+  Verdict,
+  StepOutput,
+  StepStatus,
+  StepResult,
+  CaseResult,
+  RunSummary,
+  OutputStore,
+  TestResultSoFar,
+  BuiltinNodeType,
+} from './types';
+
+// —— agent runtime (swappable) ——
+export type {
+  AgentRuntimeAdapter,
+  AgentRunInput,
+  AgentRunResult,
+} from './agent-runtime/types';
+export { PiAgentRuntime } from './agent-runtime/pi-runtime';
+export type { PiRuntimeOptions } from './agent-runtime/pi-runtime';
+export { extractSkillReferences } from './agent-runtime/skills';
+
+// —— YAML ——
+export { parseCaseYaml } from './yaml/parse';
+export type { ParsedCase, FlowStep } from './yaml/types';
+export { BUILTIN_NODES, isBuiltinNode } from './yaml/types';
+
+// —— context assembly ——
+export { assembleContext } from './context/assembler';
+export type { AssembleContextInput } from './context/assembler';
+
+// —— engine / runner ——
+export { runCase } from './engine/run-case';
+export type { RunCaseOptions } from './engine/run-case';
+export { runAll } from './runner/run';
+export type { RunAllOptions } from './runner/run';
+export { loadConfig, resolveConfigPath } from './runner/load-config';
+export { discoverCases } from './runner/glob';
+export { createUIAgent } from './ui-agent/factory';
+export type { ResolvedUIAgent } from './ui-agent/factory';
diff --git a/packages/testing-framework/src/runner/glob.ts b/packages/testing-framework/src/runner/glob.ts
new file mode 100644
index 0000000000..7063387c1e
--- /dev/null
+++ b/packages/testing-framework/src/runner/glob.ts
@@ -0,0 +1,79 @@
+import { readdirSync, statSync } from 'node:fs';
+import { join, relative, sep } from 'node:path';
+
+/**
+ * Minimal glob support for case discovery. Supports `**`, `*`, and `?` against
+ * POSIX-style relative paths. Kept dependency-free on purpose (Phase 0 only
+ * needs patterns like `**\/*.yaml` and `**\/*.draft.yaml`).
+ */
+export function globToRegExp(pattern: string): RegExp {
+  let re = '';
+  for (let i = 0; i < pattern.length; i++) {
+    const ch = pattern[i];
+    if (ch === '*') {
+      if (pattern[i + 1] === '*') {
+        // `**/` spans zero or more segments; a bare `**` matches anything.
+        const spansSegments = pattern[i + 2] === '/';
+        i += spansSegments ? 2 : 1;
+        re += spansSegments ? '(?:.*/)?' : '.*';
+      } else {
+        // `*` — match within a single segment
+        re += '[^/]*';
+      }
+    } else if (ch === '?') {
+      re += '[^/]';
+    } else if ('.+^${}()|[]\\'.includes(ch)) {
+      re += `\\${ch}`;
+    } else {
+      re += ch;
+    }
+  }
+  return new RegExp(`^${re}$`);
+}
+
+export function matchesAny(relPath: string, patterns: string[]): boolean {
+  return patterns.some((p) => globToRegExp(p).test(relPath));
+}
+
+/** Recursively list files under `dir` as POSIX-style paths relative to it. */
+export function listFiles(dir: string): string[] {
+  const out: string[] = [];
+  const walk = (current: string) => {
+    let entries: string[];
+    try {
+      entries = readdirSync(current);
+    } catch {
+      return;
+    }
+    for (const entry of entries) {
+      const full = join(current, entry);
+      let stat: ReturnType<typeof statSync>;
+      try {
+        stat = statSync(full);
+      } catch {
+        continue;
+      }
+      if (stat.isDirectory()) {
+        if (entry === 'node_modules' || entry.startsWith('.')) continue;
+        walk(full);
+      } else {
+        out.push(relative(dir, full).split(sep).join('/'));
+      }
+    }
+  };
+  walk(dir);
+  return out;
+}
+
+export function discoverCases(
+  testDir: string,
+  include: string[],
+  exclude: string[] = [],
+): string[] {
+  const files = listFiles(testDir);
+  return files
+    .filter((f) => matchesAny(f, include))
+    .filter((f) => exclude.length === 0 || !matchesAny(f, exclude))
+    .sort()
+    .map((f) => join(testDir, f));
+}
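Include/exclude discovery can be illustrated with a standalone sketch. `toRegExp` and `selectCases` are hypothetical names, not the package's functions, and the file list is made up. The sketch covers only `**/`, `*`, and `?`, the patterns Phase 0 is said to need; a bare `**` without a trailing slash is deliberately not handled here.

```typescript
// Standalone sketch: translate `**/`, `*`, and `?` into a RegExp. It follows
// the rules described above but is not the package's implementation.
function toRegExp(pattern: string): RegExp {
  const source = pattern
    .split(/(\*\*\/|\*|\?)/)
    .map((part) => {
      if (part === '**/') return '(?:.*/)?'; // zero or more directories
      if (part === '*') return '[^/]*'; // anything within one segment
      if (part === '?') return '[^/]'; // one character within a segment
      return part.replace(/[.+^${}()|[\]\\]/g, '\\$&'); // literal text
    })
    .join('');
  return new RegExp(`^${source}$`);
}

// Keep files that match any include pattern and no exclude pattern.
function selectCases(
  files: string[],
  include: string[],
  exclude: string[],
): string[] {
  const matches = (file: string, patterns: string[]) =>
    patterns.some((p) => toRegExp(p).test(file));
  return files
    .filter((f) => matches(f, include) && !matches(f, exclude))
    .sort();
}

// Hypothetical POSIX-style paths relative to the test directory.
const selected = selectCases(
  ['e2e/login.yaml', 'e2e/checkout.draft.yaml', 'smoke.yaml', 'e2e/notes.md'],
  ['**/*.yaml'],
  ['**/*.draft.yaml'],
);
```

Because `**/` is optional in the generated expression, `**/*.yaml` matches both a top-level `smoke.yaml` and a nested `e2e/login.yaml`.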
diff --git a/packages/testing-framework/src/runner/load-config.ts b/packages/testing-framework/src/runner/load-config.ts
new file mode 100644
index 0000000000..f4fb85af70
--- /dev/null
+++ b/packages/testing-framework/src/runner/load-config.ts
@@ -0,0 +1,48 @@
+import { existsSync } from 'node:fs';
+import { isAbsolute, resolve } from 'node:path';
+import { createJiti } from 'jiti';
+import type { MidsceneConfig } from '../config/types';
+
+const CONFIG_CANDIDATES = [
+  'midscene.config.ts',
+  'midscene.config.mts',
+  'midscene.config.js',
+  'midscene.config.mjs',
+];
+
+/** Resolve the config file path from an explicit path or a project root. */
+export function resolveConfigPath(cwdOrPath: string = process.cwd()): string {
+  const abs = isAbsolute(cwdOrPath) ? cwdOrPath : resolve(cwdOrPath);
+  // If it points directly at a file, use it.
+  if (existsSync(abs) && /\.(ts|mts|js|mjs)$/.test(abs)) {
+    return abs;
+  }
+  for (const candidate of CONFIG_CANDIDATES) {
+    const full = resolve(abs, candidate);
+    if (existsSync(full)) return full;
+  }
+  throw new Error(
+    `[midscene] Could not find a Midscene config file in ${abs}. Looked for: ${CONFIG_CANDIDATES.join(', ')}.`,
+  );
+}
+
+/**
+ * Load and return the config object from a `midscene.config.*` file. Uses jiti
+ * so TypeScript config works without a build step.
+ */
+export async function loadConfig(
+  cwdOrPath?: string,
+): Promise<{ config: MidsceneConfig; configPath: string }> {
+  const configPath = resolveConfigPath(cwdOrPath);
+  const jiti = createJiti(configPath, { interopDefault: true });
+  const loaded = (await jiti.import(configPath, {
+    default: true,
+  })) as MidsceneConfig;
+
+  if (!loaded || typeof loaded !== 'object') {
+    throw new Error(
+      `[midscene] ${configPath} must default-export a config object from defineMidsceneConfig().`,
+    );
+  }
+  return { config: loaded, configPath };
+}
diff --git a/packages/testing-framework/src/runner/run.ts b/packages/testing-framework/src/runner/run.ts
new file mode 100644
index 0000000000..97005a645a
--- /dev/null
+++ b/packages/testing-framework/src/runner/run.ts
@@ -0,0 +1,114 @@
+import { mkdirSync, readFileSync, writeFileSync } from 'node:fs';
+import { dirname, isAbsolute, resolve } from 'node:path';
+import { getDebug } from '@midscene/shared/logger';
+import { PiAgentRuntime } from '../agent-runtime/pi-runtime';
+import type { AgentRuntimeAdapter } from '../agent-runtime/types';
+import { DEFAULT_INCLUDE, type MidsceneConfig } from '../config/types';
+import { runCase } from '../engine/run-case';
+import type { CaseResult, RunSummary } from '../types';
+import { createUIAgent } from '../ui-agent/factory';
+import { parseCaseYaml } from '../yaml/parse';
+import { discoverCases } from './glob';
+
+const debug = getDebug('testing-framework:runner');
+
+export interface RunAllOptions {
+  /** Root used to resolve relative paths (testDir, output). Default: cwd. */
+  projectRoot?: string;
+  /** Restrict to specific case files (absolute or project-relative). */
+  files?: string[];
+  /** Override the process environment passed to nodes/factories. */
+  env?: NodeJS.ProcessEnv;
+}
+
+/**
+ * Run an entire suite from a resolved config. This is the lightweight Phase 0
+ * runner; Rstest wiring is out of scope (RFC scope note).
+ */
+export async function runAll(
+  config: MidsceneConfig,
+  options: RunAllOptions = {},
+): Promise<RunSummary> {
+  const projectRoot = options.projectRoot
+    ? resolve(options.projectRoot)
+    : process.cwd();
+  const env = options.env ?? process.env;
+
+  const testDir = resolvePath(projectRoot, config.testDir);
+  const include = config.include ?? DEFAULT_INCLUDE;
+  const exclude = config.exclude ?? [];
+
+  const files =
+    options.files && options.files.length > 0
+      ? options.files.map((f) => resolvePath(projectRoot, f))
+      : discoverCases(testDir, include, exclude);
+
+  debug('discovered cases', files);
+
+  const agentRuntime: AgentRuntimeAdapter =
+    config.agentRuntime ?? new PiAgentRuntime();
+  const runtimeNodes = config.runtime ?? {};
+
+  const startedAt = new Date();
+  const cases: CaseResult[] = [];
+  const bail = config.testRunner?.bail ?? 0;
+  let failures = 0;
+
+  for (const file of files) {
+    const source = readFileSync(file, 'utf-8');
+    const parsed = parseCaseYaml(source, file);
+
+    const { agent, cleanup } = await createUIAgent(
+      config.uiAgent,
+      config.uiAgentOptions,
+      env,
+    );
+
+    try {
+      const result = await runCase({
+        parsed,
+        file,
+        uiAgent: agent,
+        agentRuntime,
+        runtimeNodes,
+        projectRoot,
+        env,
+      });
+      cases.push(result);
+      if (result.status === 'failed') failures++;
+    } finally {
+      await cleanup?.();
+    }
+
+    if (bail > 0 && failures >= bail) {
+      debug('bail threshold reached', { bail, failures });
+      break;
+    }
+  }
+
+  await agentRuntime.dispose?.();
+
+  const finishedAt = new Date();
+  const summary: RunSummary = {
+    startedAt: startedAt.toISOString(),
+    finishedAt: finishedAt.toISOString(),
+    durationMs: finishedAt.getTime() - startedAt.getTime(),
+    total: cases.length,
+    passed: cases.filter((c) => c.status === 'passed').length,
+    failed: cases.filter((c) => c.status === 'failed').length,
+    cases,
+  };
+
+  if (config.output?.summary) {
+    const summaryPath = resolvePath(projectRoot, config.output.summary);
+    mkdirSync(dirname(summaryPath), { recursive: true });
+    writeFileSync(summaryPath, JSON.stringify(summary, null, 2), 'utf-8');
+    debug('wrote summary', summaryPath);
+  }
+
+  return summary;
+}
+
+function resolvePath(root: string, p: string): string {
+  return isAbsolute(p) ? p : resolve(root, p);
+}
diff --git a/packages/testing-framework/src/runtime.ts b/packages/testing-framework/src/runtime.ts
new file mode 100644
index 0000000000..259f6abf3b
--- /dev/null
+++ b/packages/testing-framework/src/runtime.ts
@@ -0,0 +1,47 @@
+/**
+ * `defineRuntime` — custom YAML nodes (RFC §3).
+ *
+ * A runtime node owns a whole step's execution. It has two channels:
+ *  - `conclusion` (+ optional `output`): context-facing, flows into later
+ *    verify/agent nodes.
+ *  - `state`: engineering-facing TypeScript state shared between runtime
+ *    nodes; the agent never sees it.
+ */
+import type { Agent, OutputStore, TestResultSoFar } from './types';
+
+export interface RuntimeNodeContext {
+  /** This node's YAML value (string or object). */
+  input: unknown;
+  /** The UI Agent — runtime nodes may also drive the page. */
+  uiAgent: Agent;
+  /** All past context-facing outputs (read-only). */
+  outputs: OutputStore;
+  /**
+   * Engineering-facing TS state shared across runtime nodes. NOT visible to
+   * the agent. Use `conclusion` to expose anything to later verify/agent.
+   */
+  state: Record<string, unknown>;
+  /** The case's accumulated result so far. */
+  result: TestResultSoFar;
+  /** Process environment. */
+  env: NodeJS.ProcessEnv;
+}
+
+export interface RuntimeNodeResult {
+  /** Context-facing output. Enters later verify/agent context. */
+  conclusion: string;
+  /** Optional structured output (also enters context). */
+  output?: Record<string, unknown>;
+}
+
+export type RuntimeNode = (
+  ctx: RuntimeNodeContext,
+) => Promise<RuntimeNodeResult>;
+
+/**
+ * Identity helper that gives a custom node full type inference. Mirrors the
+ * `defineRuntime` entry described in the design doc.
+ */
+export function defineRuntime(node: RuntimeNode): RuntimeNode {
+  return node;
+}
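A custom node that uses both channels might look like the following sketch. The node name `seedAccount`, its `plan` input, and the generated account id are hypothetical. The types are trimmed-down local mirrors of the contracts above so the snippet runs on its own; in a project they would come from the package instead.

```typescript
// Local, trimmed-down mirrors of the runtime node contracts.
interface RuntimeNodeContext {
  input: unknown;
  state: Record<string, unknown>;
  env: Record<string, string | undefined>;
}

interface RuntimeNodeResult {
  conclusion: string;
  output?: Record<string, unknown>;
}

type RuntimeNode = (ctx: RuntimeNodeContext) => Promise<RuntimeNodeResult>;

// Identity helper, as in the source: it only exists for type inference.
const defineRuntime = (node: RuntimeNode): RuntimeNode => node;

// Hypothetical node: `seedAccount` and its `plan` input are illustrative only.
const seedAccount = defineRuntime(async (ctx) => {
  const input = (ctx.input ?? {}) as { plan?: string };
  const plan = input.plan ?? 'free';
  const accountId = `acct-${plan}`;
  // Engineering-facing: later runtime nodes can read this; the agent cannot.
  ctx.state.accountId = accountId;
  // Context-facing: this is what later verify/agent nodes get to see.
  return {
    conclusion: `Created a test account on the ${plan} plan.`,
    output: { plan },
  };
});

const state: Record<string, unknown> = {};
const seeded = seedAccount({ input: { plan: 'pro' }, state, env: {} });
```

Keeping the account id in `state` and only the plan in `conclusion` follows the split described above: anything the agent should reason about goes into the conclusion, and everything else stays in TypeScript.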
diff --git a/packages/testing-framework/src/types.ts b/packages/testing-framework/src/types.ts
new file mode 100644
index 0000000000..ec224eb176
--- /dev/null
+++ b/packages/testing-framework/src/types.ts
@@ -0,0 +1,100 @@
+/**
+ * Core contracts for the v2 testing framework (Phase 0).
+ *
+ * These types formalize the decisions in RFC 0001. They intentionally model
+ * "what must be agreed before building": the node model, the verify verdict
+ * contract, the output contract, and the context-assembly contract.
+ */
+import type { Agent } from '@midscene/core/agent';
+
+/** Built-in node types plus the open-ended custom (runtime) node name. */
+export type BuiltinNodeType = 'ui' | 'verify' | 'soft' | 'agent';
+
+/**
+ * A verify/soft verdict. `verify` gates the case; `soft` only records a warning.
+ * See RFC §6.
+ */
+export interface Verdict {
+  pass: boolean;
+  /** Human-readable rationale. Always written into the report. */
+  reason: string;
+  /** Optional: screenshot refs, skill response fragments, etc. */
+  evidence?: unknown;
+}
+
+/**
+ * The context-facing output of a single step (RFC §5, §7).
+ *
+ * Output is plain natural language by design — there is no schema. `text`
+ * is the natural-language conclusion; `structured` is an optional bag of
+ * fields a runtime node chose to expose (it also flows into context).
+ */
+export interface StepOutput {
+  text: string;
+  structured?: Record<string, unknown>;
+}
+
+/** Status of a single executed step. */
+export type StepStatus = 'passed' | 'failed' | 'warning' | 'info';
+
+/** Result of executing one flow step. */
+export interface StepResult {
+  index: number;
+  /** Node type or custom node name. */
+  node: string;
+  /** The instruction given to the node (text, or object for custom nodes). */
+  input: unknown;
+  status: StepStatus;
+  /** Context-facing output of this step (RFC §7). */
+  output?: StepOutput;
+  /** Present for verify/soft nodes. */
+  verdict?: Verdict;
+  /** Error message when the step threw. */
+  error?: string;
+  /** Wall-clock duration in milliseconds. */
+  durationMs: number;
+}
+
+/** Result of executing a single case (one YAML file / flow). */
+export interface CaseResult {
+  name: string;
+  file: string;
+  status: 'passed' | 'failed';
+  steps: StepResult[];
+  /** Warnings collected from `soft` failures and `agent` errors. */
+  warnings: string[];
+  durationMs: number;
+  /** Path to the Midscene HTML report for the UI agent, if generated. */
+  reportFile?: string;
+}
+
+/** Aggregate run summary written to `output.summary`. */
+export interface RunSummary {
+  startedAt: string;
+  finishedAt: string;
+  durationMs: number;
+  total: number;
+  passed: number;
+  failed: number;
+  cases: CaseResult[];
+}
+
+/**
+ * The accumulated, read-only view of the case so far, handed to runtime nodes
+ * (RFC §3) and used to assemble agent context (RFC §7).
+ */
+export interface TestResultSoFar {
+  name: string;
+  file: string;
+  steps: ReadonlyArray<StepResult>;
+}
+
+/** Read-only store of every past step's context-facing output (RFC §3). */
+export interface OutputStore {
+  /** All outputs in flow order. */
+  all(): ReadonlyArray<{ node: string; index: number; output: StepOutput }>;
+  /** The most recent output, if any. */
+  latest(): StepOutput | undefined;
+}
+
+export type { Agent };
diff --git a/packages/testing-framework/src/ui-agent/factory.ts b/packages/testing-framework/src/ui-agent/factory.ts
new file mode 100644
index 0000000000..15020eca41
--- /dev/null
+++ b/packages/testing-framework/src/ui-agent/factory.ts
@@ -0,0 +1,129 @@
+/**
+ * UI Agent creation (RFC §2.1).
+ *
+ * `config.uiAgent` is a union: an object (config-style) or a factory function
+ * (programmatic). This module resolves both into a live Midscene UI Agent plus
+ * an optional cleanup hook.
+ */
+import type { Agent } from '@midscene/core/agent';
+import type { UIAgent, UIAgentConfig, UIAgentOptions } from '../config/types';
+
+export interface ResolvedUIAgent {
+  agent: Agent;
+  cleanup?: () => Promise<void>;
+}
+
+export async function createUIAgent(
+  uiAgent: UIAgent,
+  uiAgentOptions: UIAgentOptions | undefined,
+  env: NodeJS.ProcessEnv,
+): Promise<ResolvedUIAgent> {
+  if (typeof uiAgent === 'function') {
+    // Programmatic factory: the project fully controls construction.
+    const result = await uiAgent({ uiAgentOptions, env });
+    if (!result?.agent) {
+      throw new Error(
+        '[midscene] The uiAgent factory must resolve to `{ agent }`.',
+      );
+    }
+    return { agent: result.agent, cleanup: result.cleanup };
+  }
+
+  return createFromConfig(uiAgent, uiAgentOptions);
+}
+
+async function createFromConfig(
+  config: UIAgentConfig,
+  uiAgentOptions: UIAgentOptions | undefined,
+): Promise<ResolvedUIAgent> {
+  switch (config.type) {
+    case 'web':
+      return createWebAgent(config, uiAgentOptions);
+    case 'android':
+      return createAndroidAgent(config, uiAgentOptions);
+    case 'ios':
+    case 'computer':
+      throw new Error(
+        `[midscene] uiAgent.type "${config.type}" is not yet supported by the config-style factory. Provide a \`uiAgent\` factory function instead.`,
+      );
+    default:
+      throw new Error(
+        `[midscene] Unknown uiAgent.type "${(config as UIAgentConfig).type}".`,
+      );
+  }
+}
+
+async function createWebAgent(
+  config: UIAgentConfig,
+  uiAgentOptions: UIAgentOptions | undefined,
+): Promise<ResolvedUIAgent> {
+  const options = (config.options ?? {}) as Record<string, unknown>;
+  if (!options.url) {
+    throw new Error('[midscene] uiAgent.type "web" requires `options.url`.');
+  }
+
+  let mod: typeof import('@midscene/web/puppeteer-agent-launcher');
+  try {
+    mod = await import('@midscene/web/puppeteer-agent-launcher');
+  } catch (err) {
+    throw new Error(
+      `[midscene] Could not load @midscene/web for the web UI Agent. Install \`@midscene/web\` and \`puppeteer\`. Original error: ${(err as Error).message}`,
+    );
+  }
+
+  const { agent, freeFn } = await mod.puppeteerAgentForTarget(
+    options as unknown as Parameters<typeof mod.puppeteerAgentForTarget>[0],
+    uiAgentOptions as unknown as Parameters<
+      typeof mod.puppeteerAgentForTarget
+    >[1],
+  );
+
+  return {
+    agent: agent as unknown as Agent,
+    cleanup: async () => {
+      for (const free of freeFn) {
+        try {
+          await free.fn();
+        } catch {
+          // best-effort cleanup
+        }
+      }
+    },
+  };
+}
+
+async function createAndroidAgent(
+  config: UIAgentConfig,
+  uiAgentOptions: UIAgentOptions | undefined,
+): Promise<ResolvedUIAgent> {
+  const options = (config.options ?? {}) as Record<string, unknown>;
+  // `@midscene/android` is an optional peer; load it loosely so the framework
+  // does not hard-depend on it.
+  const spec = '@midscene/android';
+  let mod: {
+    agentFromAdbDevice: (
+      deviceId?: string,
+      opts?: Record<string, unknown>,
+    ) => Promise<Agent>;
+  };
+  try {
+    mod = (await import(spec)) as typeof mod;
+  } catch (err) {
+    throw new Error(
+      `[midscene] Could not load @midscene/android for the android UI Agent. Install \`@midscene/android\`. Original error: ${(err as Error).message}`,
+    );
+  }
+
+  const deviceId = options.deviceId as string | undefined;
+  const agent = await mod.agentFromAdbDevice(deviceId, {
+    ...(uiAgentOptions as object),
+    ...options,
+  });
+
+  return {
+    agent,
+    cleanup: async () => {
+      await agent.destroy?.();
+    },
+  };
+}
diff --git a/packages/testing-framework/src/yaml/parse.ts b/packages/testing-framework/src/yaml/parse.ts
new file mode 100644
index 0000000000..e53e1d66c3
--- /dev/null
+++ b/packages/testing-framework/src/yaml/parse.ts
@@ -0,0 +1,120 @@
+import yaml from 'js-yaml';
+import { type FlowStep, type ParsedCase, isBuiltinNode } from './types';
+
+/**
+ * Parse a v2 case YAML string (RFC §1).
+ *
+ * Rules enforced here:
+ *  - top-level has `flow` (ordered list) and optional `name`; v1 fields are
+ *    rejected: `web`, `android`, `ios`, `computer`, `tasks`, `target`.
+ *  - each step is either a single-key map (`node: value`) or a bare string
+ *    (a custom node with no input, e.g. `- notifySlack`).
+ *  - built-in nodes (ui/verify/soft/agent) must take a string value.
+ *  - custom nodes may take a string or an object.
+ */
+export function parseCaseYaml(source: string, file = '<inline>'): ParsedCase {
+  let doc: unknown;
+  try {
+    doc = yaml.load(source);
+  } catch (err) {
+    throw new Error(
+      `[midscene] Failed to parse YAML in ${file}: ${(err as Error).message}`,
+    );
+  }
+
+  if (doc === null || typeof doc !== 'object' || Array.isArray(doc)) {
+    throw new Error(
+      `[midscene] ${file}: a case file must be a mapping with a \`flow\` list.`,
+    );
+  }
+
+  const record = doc as Record<string, unknown>;
+
+  for (const legacy of [
+    'web',
+    'android',
+    'ios',
+    'computer',
+    'tasks',
+    'target',
+  ]) {
+    if (legacy in record) {
+      throw new Error(
+        `[midscene] ${file}: \`${legacy}\` is not allowed in a v2 case file. Environment and target belong in midscene.config.ts; the case only describes the flow.`,
+      );
+    }
+  }
+
+  const { name, flow } = record;
+
+  if (name !== undefined && typeof name !== 'string') {
+    throw new Error(`[midscene] ${file}: \`name\` must be a string.`);
+  }
+
+  if (!Array.isArray(flow)) {
+    throw new Error(`[midscene] ${file}: \`flow\` must be a list of steps.`);
+  }
+
+  const steps: FlowStep[] = flow.map((raw, i) => parseStep(raw, i, file));
+
+  if (steps.length === 0) {
+    throw new Error(
+      `[midscene] ${file}: \`flow\` must contain at least one step.`,
+    );
+  }
+
+  return { name, flow: steps };
+}
+
+function parseStep(raw: unknown, index: number, file: string): FlowStep {
+  const where = `${file}: flow[${index}]`;
+
+  // Bare string step: a custom node name with no input (e.g. `- notifySlack`).
+  if (typeof raw === 'string') {
+    const node = raw.trim();
+    if (!node) {
+      throw new Error(`[midscene] ${where}: empty step.`);
+    }
+    if (isBuiltinNode(node)) {
+      throw new Error(
+        `[midscene] ${where}: built-in node \`${node}\` requires a natural-language instruction.`,
+      );
+    }
+    return { node, input: undefined };
+  }
+
+  if (raw === null || typeof raw !== 'object' || Array.isArray(raw)) {
+    throw new Error(
+      `[midscene] ${where}: a step must be a single-key mapping (\`node: value\`) or a bare node name.`,
+    );
+  }
+
+  const keys = Object.keys(raw as Record<string, unknown>);
+  if (keys.length !== 1) {
+    throw new Error(
+      `[midscene] ${where}: a step must have exactly one key (the node), got: ${keys.join(', ') || '(none)'}.`,
+    );
+  }
+
+  const node = keys[0];
+  const input = (raw as Record<string, unknown>)[node];
+
+  if (isBuiltinNode(node)) {
+    if (typeof input !== 'string') {
+      throw new Error(
+        `[midscene] ${where}: built-in node \`${node}\` must take a natural-language string, not ${describeType(input)}.`,
+      );
+    }
+    if (!input.trim()) {
+      throw new Error(`[midscene] ${where}: \`${node}\` instruction is empty.`);
+    }
+  }
+
+  return { node, input };
+}
+
+function describeType(value: unknown): string {
+  if (value === null) return 'null';
+  if (Array.isArray(value)) return 'a list';
+  return `a ${typeof value}`;
+}
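Note on the step rules in `parse.ts`: the shape checks can be read in isolation from the YAML loading. The block below is a minimal sketch of those rules applied to an already-parsed value. It is not the code in this patch: `toStep`, `BUILTIN`, and `Step` are hypothetical names chosen for illustration, and the error messages are shortened.

```typescript
// Hypothetical standalone sketch; not the implementation in src/yaml/parse.ts.
const BUILTIN = ['ui', 'verify', 'soft', 'agent'];

type Step = { node: string; input: unknown };

// Classify one already-parsed YAML step using the rules from the doc comment.
function toStep(raw: unknown): Step {
  // Bare string: a custom node with no input, e.g. `- notifySlack`.
  if (typeof raw === 'string') {
    const name = raw.trim();
    if (!name) throw new Error('empty step');
    if (BUILTIN.indexOf(name) !== -1) {
      throw new Error('built-in node ' + name + ' requires an instruction');
    }
    return { node: name, input: undefined };
  }
  if (raw === null || typeof raw !== 'object' || Array.isArray(raw)) {
    throw new Error('a step must be a single-key mapping or a bare node name');
  }
  const record = raw as Record<string, unknown>;
  const keys = Object.keys(record);
  if (keys.length !== 1) {
    throw new Error('a step must have exactly one key');
  }
  const node = keys[0];
  const input = record[node];
  // Built-in nodes only accept a non-empty natural-language string.
  if (BUILTIN.indexOf(node) !== -1) {
    if (typeof input !== 'string' || !input.trim()) {
      throw new Error('built-in node ' + node + ' must take a non-empty string');
    }
  }
  return { node: node, input: input };
}
```

Custom nodes pass through with whatever input they were given, which is why the engine, not the parser, reports unknown node names.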
diff --git a/packages/testing-framework/src/yaml/types.ts b/packages/testing-framework/src/yaml/types.ts
new file mode 100644
index 0000000000..a0327695b4
--- /dev/null
+++ b/packages/testing-framework/src/yaml/types.ts
@@ -0,0 +1,23 @@
+/** Parsed representation of a v2 case YAML (RFC §1). */
+export interface FlowStep {
+  /** Node type (ui/verify/soft/agent) or a custom runtime node name. */
+  node: string;
+  /**
+   * The node's input. For built-in nodes this is always a string; for custom
+   * nodes it may be a string, an object, or undefined (bare-name step).
+   */
+  input: unknown;
+}
+
+export interface ParsedCase {
+  /** Optional human-readable name. */
+  name?: string;
+  flow: FlowStep[];
+}
+
+export const BUILTIN_NODES = ['ui', 'verify', 'soft', 'agent'] as const;
+export type BuiltinNode = (typeof BUILTIN_NODES)[number];
+
+export function isBuiltinNode(node: string): node is BuiltinNode {
+  return (BUILTIN_NODES as readonly string[]).includes(node);
+}
diff --git a/packages/testing-framework/tests/smoke/README.md b/packages/testing-framework/tests/smoke/README.md
new file mode 100644
index 0000000000..18e8dd4234
--- /dev/null
+++ b/packages/testing-framework/tests/smoke/README.md
@@ -0,0 +1,14 @@
+# Smoke tests
+
+These are standalone smoke scripts (not part of `vitest`). Build the package
+first (`npx nx build @midscene/testing-framework`), then run with `node`.
+
+| Script | What it checks | Needs network? | Needs a browser? |
+| ------ | -------------- | -------------- | ---------------- |
+| `pi-wiring.mjs` | Decision C′: Pi registers a custom base-URL provider, resolves the API key, selects the model, and activates the `report_verdict` tool. No model call. | no | no |
+| `browser-smoke.mjs` | Real headless Chrome: discover + parse the example cases, launch the web UI Agent, navigate, capture a screenshot, and drive the engine (runtime node + verify) with a **stubbed** agent runtime. | no | yes |
+| `model-smoke.mjs` | Full end-to-end: runs the real example cases with the real UI Agent (`ui`) and real Pi runtime (`verify`/`soft`/`agent`) on the same model endpoint. | yes (model endpoint) | yes |
+
+`pi-wiring.mjs` and `browser-smoke.mjs` make no model calls, so they run in CI-like sandboxes.
+`model-smoke.mjs` requires `MIDSCENE_MODEL_BASE_URL`, `MIDSCENE_MODEL_API_KEY`, `MIDSCENE_MODEL_NAME`,
+a vision-language (VL) `MIDSCENE_MODEL_FAMILY`, and a network path to that endpoint.
diff --git a/packages/testing-framework/tests/smoke/browser-smoke.mjs b/packages/testing-framework/tests/smoke/browser-smoke.mjs
new file mode 100644
index 0000000000..75b4c93b9d
--- /dev/null
+++ b/packages/testing-framework/tests/smoke/browser-smoke.mjs
@@ -0,0 +1,109 @@
+import { dirname, join } from 'node:path';
+// Real-browser smoke: launches the web UI Agent against the bundled demo page,
+// captures a screenshot, and drives the engine end-to-end. The MODEL is stubbed
+// (this sandbox cannot reach the model endpoint), so this exercises everything
+// EXCEPT live inference: config -> ui agent (chrome launch + navigate) ->
+// screenshot capture -> context assembly -> verify path -> summary.
+import { pathToFileURL } from 'node:url';
+import { fileURLToPath } from 'node:url';
+import {
+  createUIAgent,
+  defineRuntime,
+  discoverCases,
+  parseCaseYaml,
+  runCase,
+} from '../../dist/es/index.mjs';
+
+const here = dirname(fileURLToPath(import.meta.url));
+const repoRoot = join(here, '../../../..');
+const demoUrl = pathToFileURL(
+  join(repoRoot, 'example', 'site', 'index.html'),
+).href;
+
+// 1) discovery + parse against the real example cases
+const found = discoverCases(
+  join(repoRoot, 'example', 'e2e'),
+  ['**/*.yaml'],
+  ['**/*.draft.yaml'],
+);
+console.log(
+  'DISCOVERED',
+  found.map((f) => f.split('/').pop()),
+);
+for (const file of found) {
+  const fs = await import('node:fs');
+  parseCaseYaml(fs.readFileSync(file, 'utf-8'), file);
+}
+console.log('PARSE_OK');
+
+// 2) launch the real web UI agent (headless chrome) and navigate to the demo
+const { agent, cleanup } = await createUIAgent(
+  { type: 'web', options: { url: demoUrl } },
+  { generateReport: false },
+  process.env,
+);
+
+try {
+  const shot = await agent.interface.screenshotBase64();
+  if (!/^data:image\/(png|jpeg);base64,/.test(shot)) {
+    throw new Error('screenshot is not a data URL');
+  }
+  console.log(
+    'SCREENSHOT_OK',
+    `${shot.slice(0, 28)}... (${shot.length} chars)`,
+  );
+
+  // 3) drive the engine with a runtime node + verify, using a stubbed agent
+  //    runtime so no model call is needed. The stub asserts it received the
+  //    assembled context + the real screenshot.
+  const parsed = parseCaseYaml(`
+name: smoke
+flow:
+  - prepareCartFixture:
+      scenario: smoke
+  - verify: Confirm the demo shop page rendered
+`);
+
+  let sawScreenshot = false;
+  let sawConclusion = false;
+  const stubRuntime = {
+    run: async (input) => {
+      sawScreenshot = Boolean(input.screenshotBase64);
+      sawConclusion = input.context.includes('prepared smoke fixture');
+      return {
+        text: 'looks fine',
+        verdict: { pass: true, reason: 'rendered' },
+      };
+    },
+  };
+
+  const result = await runCase({
+    parsed,
+    file: 'smoke.yaml',
+    uiAgent: agent,
+    agentRuntime: stubRuntime,
+    runtimeNodes: {
+      prepareCartFixture: defineRuntime(async (ctx) => {
+        ctx.state.fixture = { scenario: ctx.input?.scenario };
+        return { conclusion: `prepared ${ctx.input?.scenario} fixture` };
+      }),
+    },
+    projectRoot: repoRoot,
+    env: process.env,
+  });
+
+  if (result.status !== 'passed') {
+    throw new Error(`expected passed, got ${result.status}`);
+  }
+  if (!sawScreenshot) throw new Error('verify did not receive a screenshot');
+  if (!sawConclusion)
+    throw new Error('runtime conclusion did not reach verify context');
+
+  console.log('ENGINE_OK', {
+    status: result.status,
+    steps: result.steps.map((s) => `${s.node}:${s.status}`),
+  });
+  console.log('BROWSER_SMOKE_OK');
+} finally {
+  await cleanup?.();
+}
diff --git a/packages/testing-framework/tests/smoke/model-smoke.mjs b/packages/testing-framework/tests/smoke/model-smoke.mjs
new file mode 100644
index 0000000000..b7ecf8aaa5
--- /dev/null
+++ b/packages/testing-framework/tests/smoke/model-smoke.mjs
@@ -0,0 +1,70 @@
+import { dirname, join } from 'node:path';
+// Full model-backed smoke. Run this in an environment that can reach your
+// MIDSCENE_MODEL_BASE_URL endpoint (this CI sandbox cannot, so it is not part
+// of the automated suite). It runs the real example cases against the bundled
+// demo page using the real UI Agent (ui nodes) and the real Pi runtime
+// (verify/soft/agent nodes) on the SAME model endpoint.
+//
+//   export MIDSCENE_MODEL_BASE_URL=...
+//   export MIDSCENE_MODEL_API_KEY=...
+//   export MIDSCENE_MODEL_NAME=...
+//   export MIDSCENE_MODEL_FAMILY=...   # a VL model is required for UI grounding
+//   node packages/testing-framework/tests/smoke/model-smoke.mjs
+import { fileURLToPath, pathToFileURL } from 'node:url';
+import { runAll } from '../../dist/es/index.mjs';
+
+const here = dirname(fileURLToPath(import.meta.url));
+const repoRoot = join(here, '../../../..');
+const exampleDir = join(repoRoot, 'example');
+const demoUrl =
+  process.env.DEMO_URL ??
+  pathToFileURL(join(exampleDir, 'site', 'index.html')).href;
+
+for (const v of [
+  'MIDSCENE_MODEL_BASE_URL',
+  'MIDSCENE_MODEL_API_KEY',
+  'MIDSCENE_MODEL_NAME',
+]) {
+  if (!process.env[v]) {
+    console.error(`Missing required env var: ${v}`);
+    process.exit(2);
+  }
+}
+
+const summary = await runAll(
+  {
+    uiAgent: { type: 'web', options: { url: demoUrl } },
+    testDir: join(exampleDir, 'e2e'),
+    include: ['**/*.yaml'],
+    exclude: ['**/*.draft.yaml'],
+    output: {
+      summary: join(exampleDir, 'midscene_run/output/summary.json'),
+      reportDir: join(exampleDir, 'midscene_run/report'),
+    },
+    uiAgentOptions: {
+      aiActContext: 'The user is browsing a demo shop as an anonymous visitor.',
+      generateReport: true,
+    },
+    runtime: {
+      prepareCartFixture: async (ctx) => {
+        ctx.state.cartFixture = { scenario: ctx.input?.scenario };
+        return {
+          conclusion: `Prepared a "${ctx.input?.scenario}" cart fixture.`,
+        };
+      },
+      notify: async (ctx) => {
+        const failed = ctx.result.steps.filter((s) => s.status === 'failed');
+        return {
+          conclusion:
+            failed.length === 0
+              ? 'All gating checks passed; no alert needed.'
+              : `Would alert: ${failed.length} step(s) failed.`,
+        };
+      },
+    },
+  },
+  { projectRoot: exampleDir },
+);
+
+console.log(JSON.stringify(summary, null, 2));
+process.exit(summary.failed > 0 ? 1 : 0);
diff --git a/packages/testing-framework/tests/smoke/pi-wiring.mjs b/packages/testing-framework/tests/smoke/pi-wiring.mjs
new file mode 100644
index 0000000000..aa01b5f572
--- /dev/null
+++ b/packages/testing-framework/tests/smoke/pi-wiring.mjs
@@ -0,0 +1,92 @@
+import { Type } from '@earendil-works/pi-ai';
+// Validates decision C′: Pi can be pointed at a custom OpenAI-compatible base
+// URL (MIDSCENE_MODEL_BASE_URL) so verify/agent share the UI Agent endpoint.
+// This constructs the provider + session WITHOUT making a network call.
+import {
+  AuthStorage,
+  DefaultResourceLoader,
+  ModelRegistry,
+  SessionManager,
+  createAgentSession,
+  defineTool,
+  getAgentDir,
+} from '@earendil-works/pi-coding-agent';
+
+const baseUrl =
+  process.env.MIDSCENE_MODEL_BASE_URL ?? 'https://example.test/v1';
+const apiKey = process.env.MIDSCENE_MODEL_API_KEY ?? 'sk-fake';
+const modelName = process.env.MIDSCENE_MODEL_NAME ?? 'fake-model';
+
+const authStorage = AuthStorage.inMemory();
+const registry = ModelRegistry.inMemory(authStorage);
+registry.registerProvider('midscene', {
+  baseUrl,
+  apiKey,
+  models: [
+    {
+      id: modelName,
+      name: modelName,
+      api: 'openai-completions',
+      reasoning: false,
+      input: ['text', 'image'],
+      cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+      contextWindow: 128000,
+      maxTokens: 8192,
+    },
+  ],
+});
+
+const model = registry.find('midscene', modelName);
+if (!model) throw new Error('model not found after registerProvider');
+if (model.baseUrl !== baseUrl) throw new Error('baseUrl override failed');
+if (!registry.hasConfiguredAuth(model)) throw new Error('auth not configured');
+
+const auth = await registry.getApiKeyAndHeaders(model);
+if (!auth.ok || auth.apiKey !== apiKey)
+  throw new Error('apiKey resolution failed');
+
+const loader = new DefaultResourceLoader({
+  cwd: process.cwd(),
+  agentDir: getAgentDir(),
+  noExtensions: true,
+  noThemes: true,
+  noPromptTemplates: true,
+});
+await loader.reload();
+
+const reportVerdict = defineTool({
+  name: 'report_verdict',
+  label: 'Report verdict',
+  description: 'submit verdict',
+  parameters: Type.Object({ pass: Type.Boolean(), reason: Type.String() }),
+  execute: async (_id, p) => ({
+    content: [{ type: 'text', text: 'ok' }],
+    details: p,
+    terminate: true,
+  }),
+});
+
+const { session } = await createAgentSession({
+  cwd: process.cwd(),
+  model,
+  modelRegistry: registry,
+  authStorage,
+  sessionManager: SessionManager.inMemory(),
+  resourceLoader: loader,
+  customTools: [reportVerdict],
+  excludeTools: ['edit', 'write'],
+});
+
+if (!session.model) throw new Error('session has no model');
+if (session.model.baseUrl !== baseUrl)
+  throw new Error('session model baseUrl mismatch');
+const toolNames = session.getActiveToolNames();
+if (!toolNames.includes('report_verdict'))
+  throw new Error(`report_verdict not active; got: ${toolNames.join(',')}`);
+
+session.dispose();
+console.log('PI_WIRING_OK', {
+  model: session.model.id,
+  baseUrl: model.baseUrl,
+  activeTools: toolNames,
+});
diff --git a/packages/testing-framework/tests/unit-test/config.test.ts b/packages/testing-framework/tests/unit-test/config.test.ts
new file mode 100644
index 0000000000..046907629d
--- /dev/null
+++ b/packages/testing-framework/tests/unit-test/config.test.ts
@@ -0,0 +1,42 @@
+import { describe, expect, it } from 'vitest';
+import { defineMidsceneConfig } from '../../src/config';
+import { defineRuntime } from '../../src/runtime';
+
+describe('defineMidsceneConfig', () => {
+  it('accepts a config-style uiAgent object', () => {
+    const config = defineMidsceneConfig({
+      uiAgent: { type: 'web', options: { url: 'https://x.test' } },
+      testDir: './e2e',
+    });
+    expect(config.uiAgent).toMatchObject({ type: 'web' });
+  });
+
+  it('accepts a programmatic uiAgent factory', () => {
+    const config = defineMidsceneConfig({
+      uiAgent: async () => ({ agent: {} as never }),
+      testDir: './e2e',
+    });
+    expect(typeof config.uiAgent).toBe('function');
+  });
+
+  it('throws without uiAgent', () => {
+    expect(() =>
+      // @ts-expect-error intentionally missing
+      defineMidsceneConfig({ testDir: './e2e' }),
+    ).toThrow(/uiAgent/);
+  });
+
+  it('throws without testDir', () => {
+    expect(() =>
+      // @ts-expect-error intentionally missing
+      defineMidsceneConfig({ uiAgent: { type: 'web' } }),
+    ).toThrow(/testDir/);
+  });
+});
+
+describe('defineRuntime', () => {
+  it('returns the node function unchanged', () => {
+    const node = defineRuntime(async () => ({ conclusion: 'done' }));
+    expect(typeof node).toBe('function');
+  });
+});
diff --git a/packages/testing-framework/tests/unit-test/context-and-skills.test.ts b/packages/testing-framework/tests/unit-test/context-and-skills.test.ts
new file mode 100644
index 0000000000..bc1b1a6f2f
--- /dev/null
+++ b/packages/testing-framework/tests/unit-test/context-and-skills.test.ts
@@ -0,0 +1,79 @@
+import { describe, expect, it } from 'vitest';
+import { extractSkillReferences } from '../../src/agent-runtime/skills';
+import { assembleContext } from '../../src/context/assembler';
+import { OutputStoreImpl } from '../../src/engine/output-store';
+import type { StepResult } from '../../src/types';
+
+describe('extractSkillReferences', () => {
+  it('extracts unique $name tokens', () => {
+    expect(
+      extractSkillReferences('Use $database and $logs, again $database'),
+    ).toEqual(['database', 'logs']);
+  });
+  it('returns empty for no references', () => {
+    expect(extractSkillReferences('just check the page')).toEqual([]);
+  });
+  it('supports hyphenated names', () => {
+    expect(extractSkillReferences('use $order-db')).toEqual(['order-db']);
+  });
+});
+
+describe('assembleContext', () => {
+  const pastSteps: StepResult[] = [
+    {
+      index: 0,
+      node: 'ui',
+      input: 'Create an order',
+      status: 'info',
+      output: { text: 'Created order #123', structured: { orderId: '123' } },
+      durationMs: 1,
+    },
+    {
+      index: 1,
+      node: 'verify',
+      input: 'Order exists',
+      status: 'passed',
+      output: { text: 'ok' },
+      verdict: { pass: true, reason: 'found in db' },
+      durationMs: 1,
+    },
+  ];
+
+  it('includes intents, outputs, and verdicts', () => {
+    const ctx = assembleContext({
+      caseName: 'Create Order',
+      pastSteps,
+      instruction: 'Use $database to verify orderId',
+      kind: 'verify',
+    });
+    expect(ctx).toContain('Create Order');
+    expect(ctx).toContain('Create an order');
+    expect(ctx).toContain('Created order #123');
+    expect(ctx).toContain('"orderId":"123"');
+    expect(ctx).toContain('PASS — found in db');
+    expect(ctx).toContain('report_verdict');
+    expect(ctx).toContain('Use $database to verify orderId');
+  });
+
+  it('frames agent nodes as advisory', () => {
+    const ctx = assembleContext({
+      caseName: 'c',
+      pastSteps: [],
+      instruction: 'look around',
+      kind: 'agent',
+    });
+    expect(ctx).toContain('advisory');
+    expect(ctx).toContain('first step');
+  });
+});
+
+describe('OutputStoreImpl', () => {
+  it('tracks outputs in order and latest', () => {
+    const store = new OutputStoreImpl();
+    expect(store.latest()).toBeUndefined();
+    store.add('ui', 0, { text: 'first' });
+    store.add('verify', 1, { text: 'second' });
+    expect(store.all()).toHaveLength(2);
+    expect(store.latest()?.text).toBe('second');
+  });
+});
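Note on `extractSkillReferences`: the three tests above pin down the observable behavior (unique `$name` tokens, in first-seen order, with hyphens allowed). One way that behavior could be produced is sketched below. This is an assumption about the approach, not the contents of `src/agent-runtime/skills`; the function name `extractSkillRefsSketch` and the exact character class are hypothetical.

```typescript
// Hypothetical sketch; the real src/agent-runtime/skills may differ.
function extractSkillRefsSketch(instruction: string): string[] {
  const seen: string[] = [];
  // `$` followed by a letter, then letters, digits, underscores, or hyphens.
  const pattern = /\$([A-Za-z][A-Za-z0-9_-]*)/g;
  let match = pattern.exec(instruction);
  while (match !== null) {
    const name = match[1];
    // Keep only the first occurrence so the result is unique and ordered.
    if (seen.indexOf(name) === -1) seen.push(name);
    match = pattern.exec(instruction);
  }
  return seen;
}
```

A pattern like this would also capture a trailing hyphen (for example `$name-`), so a real implementation may want a stricter rule.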
diff --git a/packages/testing-framework/tests/unit-test/engine.test.ts b/packages/testing-framework/tests/unit-test/engine.test.ts
new file mode 100644
index 0000000000..cb4c782d8f
--- /dev/null
+++ b/packages/testing-framework/tests/unit-test/engine.test.ts
@@ -0,0 +1,192 @@
+import { describe, expect, it, vi } from 'vitest';
+import type {
+  AgentRunInput,
+  AgentRunResult,
+  AgentRuntimeAdapter,
+} from '../../src/agent-runtime/types';
+import { runCase } from '../../src/engine/run-case';
+import { defineRuntime } from '../../src/runtime';
+import type { Agent } from '../../src/types';
+import { parseCaseYaml } from '../../src/yaml/parse';
+
+function fakeAgent(overrides: Partial<Record<string, unknown>> = {}): Agent {
+  const agent = {
+    aiAct: vi.fn(async () => undefined),
+    aiAsk: vi.fn(async () => 'did the thing'),
+    interface: {
+      screenshotBase64: vi.fn(async () => 'data:image/png;base64,AAAA'),
+    },
+    reportFile: '/tmp/report.html',
+    ...overrides,
+  };
+  return agent as unknown as Agent;
+}
+
+function fakeRuntime(
+  handler: (input: AgentRunInput) => AgentRunResult,
+): AgentRuntimeAdapter {
+  return { run: async (input) => handler(input) };
+}
+
+const base = {
+  projectRoot: '/proj',
+  env: {} as NodeJS.ProcessEnv,
+  runtimeNodes: {},
+};
+
+describe('runCase node semantics', () => {
+  it('ui produces a natural-language output', async () => {
+    const parsed = parseCaseYaml('flow:\n  - ui: do something');
+    const result = await runCase({
+      ...base,
+      parsed,
+      file: 'c.yaml',
+      uiAgent: fakeAgent(),
+      agentRuntime: fakeRuntime(() => ({ text: '' })),
+    });
+    expect(result.status).toBe('passed');
+    expect(result.steps[0].output?.text).toBe('did the thing');
+  });
+
+  it('verify pass keeps case green', async () => {
+    const parsed = parseCaseYaml('flow:\n  - verify: ok?');
+    const result = await runCase({
+      ...base,
+      parsed,
+      file: 'c.yaml',
+      uiAgent: fakeAgent(),
+      agentRuntime: fakeRuntime(() => ({
+        text: 'looks good',
+        verdict: { pass: true, reason: 'all good' },
+      })),
+    });
+    expect(result.status).toBe('passed');
+    expect(result.steps[0].verdict?.pass).toBe(true);
+  });
+
+  it('verify fail fails the case and stops the flow', async () => {
+    const parsed = parseCaseYaml('flow:\n  - verify: ok?\n  - ui: next');
+    const uiAgent = fakeAgent();
+    const result = await runCase({
+      ...base,
+      parsed,
+      file: 'c.yaml',
+      uiAgent,
+      agentRuntime: fakeRuntime(() => ({
+        text: 'nope',
+        verdict: { pass: false, reason: 'missing' },
+      })),
+    });
+    expect(result.status).toBe('failed');
+    expect(result.steps).toHaveLength(1); // stopped before ui
+  });
+
+  it('verify with NO verdict is fail-closed', async () => {
+    const parsed = parseCaseYaml('flow:\n  - verify: ok?');
+    const result = await runCase({
+      ...base,
+      parsed,
+      file: 'c.yaml',
+      uiAgent: fakeAgent(),
+      agentRuntime: fakeRuntime(() => ({ text: 'I am not sure' })),
+    });
+    expect(result.status).toBe('failed');
+    expect(result.steps[0].verdict?.pass).toBe(false);
+    expect(result.steps[0].verdict?.reason).toMatch(/fail-closed/);
+  });
+
+  it('soft fail only warns, does not gate', async () => {
+    const parsed = parseCaseYaml('flow:\n  - soft: nit?\n  - ui: keep going');
+    const result = await runCase({
+      ...base,
+      parsed,
+      file: 'c.yaml',
+      uiAgent: fakeAgent(),
+      agentRuntime: fakeRuntime((input) =>
+        input.kind === 'soft'
+          ? { text: 'minor', verdict: { pass: false, reason: 'tiny glitch' } }
+          : { text: '' },
+      ),
+    });
+    expect(result.status).toBe('passed');
+    expect(result.steps).toHaveLength(2);
+    expect(result.warnings.join(' ')).toMatch(/tiny glitch/);
+  });
+
+  it('agent is advisory and never gates, even on error', async () => {
+    const parsed = parseCaseYaml('flow:\n  - agent: explore');
+    const result = await runCase({
+      ...base,
+      parsed,
+      file: 'c.yaml',
+      uiAgent: fakeAgent(),
+      agentRuntime: {
+        run: async () => {
+          throw new Error('boom');
+        },
+      },
+    });
+    expect(result.status).toBe('passed');
+    expect(result.warnings.join(' ')).toMatch(/boom/);
+  });
+
+  it('ui action throwing fails the case', async () => {
+    const parsed = parseCaseYaml('flow:\n  - ui: do');
+    const result = await runCase({
+      ...base,
+      parsed,
+      file: 'c.yaml',
+      uiAgent: fakeAgent({
+        aiAct: vi.fn(async () => {
+          throw new Error('click failed');
+        }),
+      }),
+      agentRuntime: fakeRuntime(() => ({ text: '' })),
+    });
+    expect(result.status).toBe('failed');
+    expect(result.steps[0].error).toMatch(/click failed/);
+  });
+
+  it('custom runtime node exposes conclusion and state', async () => {
+    const parsed = parseCaseYaml(
+      'flow:\n  - prep:\n      scenario: paid\n  - verify: check',
+    );
+    const seen: string[] = [];
+    const result = await runCase({
+      ...base,
+      runtimeNodes: {
+        prep: defineRuntime(async (ctx) => {
+          ctx.state.fixtureId = 'fx-1';
+          return {
+            conclusion: `prepared ${(ctx.input as { scenario: string }).scenario}`,
+          };
+        }),
+      },
+      parsed,
+      file: 'c.yaml',
+      uiAgent: fakeAgent(),
+      agentRuntime: fakeRuntime((input) => {
+        seen.push(input.context);
+        return { text: 'ok', verdict: { pass: true, reason: 'fine' } };
+      }),
+    });
+    expect(result.status).toBe('passed');
+    expect(result.steps[0].output?.text).toBe('prepared paid');
+    // conclusion flows into later verify context; state never does
+    expect(seen[0]).toContain('prepared paid');
+    expect(seen[0]).not.toContain('fixtureId');
+  });
+
+  it('unknown node fails the case', async () => {
+    const parsed = parseCaseYaml('flow:\n  - mysteryNode: x');
+    const result = await runCase({
+      ...base,
+      parsed,
+      file: 'c.yaml',
+      uiAgent: fakeAgent(),
+      agentRuntime: fakeRuntime(() => ({ text: '' })),
+    });
+    expect(result.status).toBe('failed');
+    expect(result.steps[0].error).toMatch(/Unknown node/);
+  });
+});
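Note on node semantics: the engine tests above encode a small decision table (`verify` gates and is fail-closed, `soft` only warns, `agent` is advisory even on error). The sketch below restates that table as a pure function so it can be read at a glance. It is not the logic in `src/engine/run-case`; `judgeStep`, `Outcome`, and the fallback reason text are hypothetical, and treating a `soft` step with no verdict as a warning is an assumption the tests do not cover.

```typescript
// Hypothetical sketch of the gating rules; not the code in src/engine/run-case.
type Verdict = { pass: boolean; reason: string };
type JudgedKind = 'verify' | 'soft' | 'agent';
type Outcome = { gate: boolean; verdict?: Verdict; warning?: string };

function judgeStep(kind: JudgedKind, verdict?: Verdict, error?: string): Outcome {
  if (kind === 'agent') {
    // Advisory: never gates; an error is only surfaced as a warning.
    return error ? { gate: false, warning: error } : { gate: false };
  }
  // Fail closed: a missing verdict is treated as a failed verdict.
  let effective: Verdict;
  if (verdict !== undefined) {
    effective = verdict;
  } else {
    effective = {
      pass: false,
      reason: error ? error : 'no verdict reported (fail-closed)',
    };
  }
  if (effective.pass) return { gate: false, verdict: effective };
  if (kind === 'soft') {
    // Soft failures are recorded as warnings and the flow continues.
    return { gate: false, verdict: effective, warning: effective.reason };
  }
  // A failed verify stops the flow and fails the case.
  return { gate: true, verdict: effective };
}
```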
diff --git a/packages/testing-framework/tests/unit-test/glob.test.ts b/packages/testing-framework/tests/unit-test/glob.test.ts
new file mode 100644
index 0000000000..3d851aba30
--- /dev/null
+++ b/packages/testing-framework/tests/unit-test/glob.test.ts
@@ -0,0 +1,40 @@
+import { mkdirSync, mkdtempSync, writeFileSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
+import { describe, expect, it } from 'vitest';
+import { discoverCases, globToRegExp, matchesAny } from '../../src/runner/glob';
+
+describe('globToRegExp', () => {
+  it('matches **/*.yaml across segments', () => {
+    const re = globToRegExp('**/*.yaml');
+    expect(re.test('a.yaml')).toBe(true);
+    expect(re.test('e2e/a.yaml')).toBe(true);
+    expect(re.test('e2e/deep/a.yaml')).toBe(true);
+    expect(re.test('a.yml')).toBe(false);
+  });
+
+  it('* stays within a segment', () => {
+    const re = globToRegExp('*.yaml');
+    expect(re.test('a.yaml')).toBe(true);
+    expect(re.test('dir/a.yaml')).toBe(false);
+  });
+
+  it('matchesAny supports draft exclusion', () => {
+    expect(matchesAny('e2e/a.draft.yaml', ['**/*.draft.yaml'])).toBe(true);
+    expect(matchesAny('e2e/a.yaml', ['**/*.draft.yaml'])).toBe(false);
+  });
+});
+
+describe('discoverCases', () => {
+  it('includes yaml and excludes drafts', () => {
+    const dir = mkdtempSync(join(tmpdir(), 'mts-glob-'));
+    mkdirSync(join(dir, 'e2e'), { recursive: true });
+    writeFileSync(join(dir, 'e2e', 'a.yaml'), 'flow: []');
+    writeFileSync(join(dir, 'e2e', 'b.draft.yaml'), 'flow: []');
+    writeFileSync(join(dir, 'e2e', 'c.txt'), 'nope');
+
+    const found = discoverCases(dir, ['**/*.yaml'], ['**/*.draft.yaml']);
+    expect(found).toHaveLength(1);
+    expect(found[0]).toContain('a.yaml');
+  });
+});
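Note on `globToRegExp`: the tests above require `**/` to match zero or more directory segments and `*` to stay within one segment. A minimal translation that satisfies those cases could look like the sketch below. It is an illustration only, not the implementation in `src/runner/glob`; `globToRegExpSketch` is a hypothetical name, and a bare `**` without a trailing slash is simply treated as two single-segment wildcards here.

```typescript
// Hypothetical sketch; the real src/runner/glob may be implemented differently.
function globToRegExpSketch(glob: string): RegExp {
  let source = '';
  let i = 0;
  while (i < glob.length) {
    if (glob.slice(i, i + 3) === '**/') {
      // Zero or more whole directory segments.
      source += '(?:.*/)?';
      i += 3;
    } else if (glob.charAt(i) === '*') {
      // Any run of characters inside a single segment.
      source += '[^/]*';
      i += 1;
    } else {
      // Escape regex metacharacters in literal text such as the dot.
      source += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, '\\$&');
      i += 1;
    }
  }
  return new RegExp('^' + source + '$');
}
```

Anchoring with `^` and `$` is what keeps `*.yaml` from matching `dir/a.yaml`.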
diff --git a/packages/testing-framework/tests/unit-test/yaml-parse.test.ts b/packages/testing-framework/tests/unit-test/yaml-parse.test.ts
new file mode 100644
index 0000000000..8f4c6181b9
--- /dev/null
+++ b/packages/testing-framework/tests/unit-test/yaml-parse.test.ts
@@ -0,0 +1,88 @@
+import { describe, expect, it } from 'vitest';
+import { parseCaseYaml } from '../../src/yaml/parse';
+
+describe('parseCaseYaml', () => {
+  it('parses name + flow with built-in and custom nodes', () => {
+    const parsed = parseCaseYaml(`
+name: Create Order
+flow:
+  - prepareOrderFixture:
+      scenario: paid-order
+  - ui: Search for "running shoes"
+  - verify: The Add to cart button is visible
+  - soft: No obvious layout glitches
+  - agent: Inspect the page for anything off
+  - notifySlack
+`);
+    expect(parsed.name).toBe('Create Order');
+    expect(parsed.flow).toHaveLength(6);
+    expect(parsed.flow[0]).toEqual({
+      node: 'prepareOrderFixture',
+      input: { scenario: 'paid-order' },
+    });
+    expect(parsed.flow[1]).toEqual({
+      node: 'ui',
+      input: 'Search for "running shoes"',
+    });
+    expect(parsed.flow[5]).toEqual({ node: 'notifySlack', input: undefined });
+  });
+
+  it('allows multi-line natural-language instructions', () => {
+    const parsed = parseCaseYaml(`
+flow:
+  - ui: |
+      Create a test order.
+      Record orderId and pageState.
+`);
+    expect(parsed.flow[0].node).toBe('ui');
+    expect(parsed.flow[0].input).toContain('Record orderId');
+  });
+
+  it('rejects v1 environment fields', () => {
+    expect(() =>
+      parseCaseYaml(`
+web:
+  url: https://example.com
+flow:
+  - ui: do something
+`),
+    ).toThrow(/not allowed in a v2 case file/);
+  });
+
+  it('rejects a built-in node with object input', () => {
+    expect(() =>
+      parseCaseYaml(`
+flow:
+  - verify:
+      foo: bar
+`),
+    ).toThrow(/must take a natural-language string/);
+  });
+
+  it('rejects a built-in bare name without instruction', () => {
+    expect(() =>
+      parseCaseYaml(`
+flow:
+  - verify
+`),
+    ).toThrow(/requires a natural-language instruction/);
+  });
+
+  it('rejects a step with multiple keys', () => {
+    expect(() =>
+      parseCaseYaml(`
+flow:
+  - ui: do
+    verify: check
+`),
+    ).toThrow(/exactly one key/);
+  });
+
+  it('requires a flow list', () => {
+    expect(() => parseCaseYaml('name: x')).toThrow(/must be a list of steps/);
+  });
+
+  it('requires at least one step', () => {
+    expect(() => parseCaseYaml('flow: []')).toThrow(/at least one step/);
+  });
+});
diff --git a/packages/testing-framework/tsconfig.build.json b/packages/testing-framework/tsconfig.build.json
new file mode 100644
index 0000000000..51532825a3
--- /dev/null
+++ b/packages/testing-framework/tsconfig.build.json
@@ -0,0 +1,7 @@
+{
+  "extends": "./tsconfig.json",
+  "compilerOptions": {
+    "rootDir": "src"
+  },
+  "include": ["src"]
+}
diff --git a/packages/testing-framework/tsconfig.json b/packages/testing-framework/tsconfig.json
new file mode 100644
index 0000000000..8ac2495426
--- /dev/null
+++ b/packages/testing-framework/tsconfig.json
@@ -0,0 +1,24 @@
+{
+  "extends": "../shared/tsconfig.base.json",
+  "compilerOptions": {
+    "rootDir": ".",
+    "module": "ES2020",
+    "declarationDir": "dist/types",
+    "types": ["node"],
+    "paths": {
+      "@/*": ["./src/*"]
+    }
+  },
+  "include": ["src", "tests"],
+  "references": [
+    {
+      "path": "../core"
+    },
+    {
+      "path": "../shared"
+    },
+    {
+      "path": "../web-integration"
+    }
+  ]
+}
diff --git a/packages/testing-framework/vitest.config.ts b/packages/testing-framework/vitest.config.ts
new file mode 100644
index 0000000000..613747d241
--- /dev/null
+++ b/packages/testing-framework/vitest.config.ts
@@ -0,0 +1,22 @@
+import path from 'node:path';
+import dotenv from 'dotenv';
+import { defineConfig } from 'vitest/config';
+
+/**
+ * Read environment variables from the repo root .env (if present).
+ */
+dotenv.config({
+  path: path.join(__dirname, '../../.env'),
+});
+
+export default defineConfig({
+  resolve: {
+    alias: {
+      '@': path.resolve(__dirname, 'src'),
+    },
+  },
+  test: {
+    include: ['tests/unit-test/**/*.test.ts'],
+    testTimeout: 30 * 1000,
+  },
+});
diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml
index 0f5e0cd14b..974313c026 100644
--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@@ -1616,6 +1616,49 @@ importers:
         specifier: 3.0.5
         version: 3.0.5(@types/debug@4.1.12)(@types/node@18.19.62)(jsdom@29.0.2)(less@4.3.0)(lightningcss@1.30.1)(sass-embedded@1.86.3)(terser@5.46.1)
 
+  packages/testing-framework:
+    dependencies:
+      '@earendil-works/pi-ai':
+        specifier: ^0.78.0
+        version: 0.78.0(bufferutil@4.0.9)(utf-8-validate@6.0.5)(ws@8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5))(zod@3.25.76)
+      '@earendil-works/pi-coding-agent':
+        specifier: ^0.78.0
+        version: 0.78.0(bufferutil@4.0.9)(utf-8-validate@6.0.5)(ws@8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5))(zod@3.25.76)
+      '@midscene/core':
+        specifier: workspace:*
+        version: link:../core
+      '@midscene/shared':
+        specifier: workspace:*
+        version: link:../shared
+      jiti:
+        specifier: 2.7.0
+        version: 2.7.0
+      js-yaml:
+        specifier: 4.1.0
+        version: 4.1.0
+    devDependencies:
+      '@midscene/web':
+        specifier: workspace:*
+        version: link:../web-integration
+      '@rslib/core':
+        specifier: ^0.18.3
+        version: 0.18.3(@microsoft/api-extractor@7.52.10(@types/node@18.19.130))(typescript@5.8.3)
+      '@types/js-yaml':
+        specifier: 4.0.9
+        version: 4.0.9
+      '@types/node':
+        specifier: ^18.0.0
+        version: 18.19.130
+      dotenv:
+        specifier: ^16.4.5
+        version: 16.4.7
+      typescript:
+        specifier: ^5.8.3
+        version: 5.8.3
+      vitest:
+        specifier: 3.0.5
+        version: 3.0.5(@types/debug@4.1.12)(@types/node@18.19.130)(jsdom@29.0.2)(less@4.3.0)(lightningcss@1.30.1)(sass-embedded@1.86.3)(terser@5.46.1)
+
   packages/visualizer:
     dependencies:
       '@ant-design/icons':
@@ -1872,6 +1915,12 @@ importers:
         specifier: 3.0.5
         version: 3.0.5(@types/debug@4.1.12)(@types/node@18.19.118)(jsdom@29.0.2)(less@4.3.0)(lightningcss@1.30.1)(sass-embedded@1.86.3)(terser@5.46.1)
 
+  tmp-20332-jvOvjDBiByix:
+    devDependencies:
+      nx:
+        specifier: 22.7.5
+        version: 22.7.5
+
 packages:
 
   '@alloc/quick-lru@5.2.0':
@@ -1916,6 +1965,15 @@ packages:
     peerDependencies:
       react: '>=16.9.0'
 
+  '@anthropic-ai/sdk@0.91.1':
+    resolution: {integrity: sha512-LAmu761tSN9r66ixvmciswUj/ZC+1Q4iAfpedTfSVLeswRwnY3n2Nb6Tsk+cLPP28aLOPWeMgIuTuCcMC6W/iw==}
+    hasBin: true
+    peerDependencies:
+      zod: ^3.25.0 || ^4.0.0
+    peerDependenciesMeta:
+      zod:
+        optional: true
+
   '@appium/logger@1.6.1':
     resolution: {integrity: sha512-3TWpLR1qVQ0usLJ6R49iN4TV9Zs0nog1oL3hakCglwP0g4ZllwwEbp+2b1ovJfX6oOv1wXNREyokq2uxU5gB/Q==}
     engines: {node: ^14.17.0 || ^16.13.0 || >=18.0.0, npm: '>=8'}
@@ -2005,6 +2063,107 @@ packages:
     resolution: {integrity: sha512-Hb4o6h1Pf6yRUAX07DR4JVY7dmQw+RVQMW5/m55GoiAT/VRoKCWBtIUPPOnqDVhbx1Cjfil9b6EDrgJsUAujEQ==}
     engines: {node: '>= 10'}
 
+  '@aws-crypto/crc32@5.2.0':
+    resolution: {integrity: sha512-nLbCWqQNgUiwwtFsen1AdzAtvuLRsQS8rYgMuxCrdKf9kOssamGLuPwyTY9wyYblNr9+1XM8v6zoDTPPSIeANg==}
+    engines: {node: '>=16.0.0'}
+
+  '@aws-crypto/sha256-browser@5.2.0':
+    resolution: {integrity: sha512-AXfN/lGotSQwu6HNcEsIASo7kWXZ5HYWvfOmSNKDsEqC4OashTp8alTmaz+F7TC2L083SFv5RdB+qU3Vs1kZqw==}
+
+  '@aws-crypto/sha256-js@5.2.0':
+    resolution: {integrity: sha512-FFQQyu7edu4ufvIZ+OadFpHHOt+eSTBaYaki44c+akjg7qZg9oOQeLlk77F6tSYqjDAFClrHJk9tMf0HdVyOvA==}
+    engines: {node: '>=16.0.0'}
+
+  '@aws-crypto/supports-web-crypto@5.2.0':
+    resolution: {integrity: sha512-iAvUotm021kM33eCdNfwIN//F77/IADDSs58i+MDaOqFrVjZo9bAal0NK7HurRuWLLpF1iLX7gbWrjHjeo+YFg==}
+
+  '@aws-crypto/util@5.2.0':
+    resolution: {integrity: sha512-4RkU9EsI6ZpBve5fseQlGNUWKMa1RLPQ1dnjnQoe07ldfIzcsGb5hC5W0Dm7u423KWzawlrpbjXBrXCEv9zazQ==}
+
+  '@aws-sdk/client-bedrock-runtime@3.1048.0':
+    resolution: {integrity: sha512-u+NT61JZEkRFtpL0CAw1N1dwxnaLgwVXQl/zjJxTGgLyS/jTIdg2SdoEoCTHxgDyCnqa1HEi9QOoE9/pYRNpOQ==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/core@3.974.17':
+    resolution: {integrity: sha512-r8o4h2K7j6P9ngno+8ei0aK0U/4JwDb7A2fMMxGVoSqDN8AFlIzSDeZHME9LcVLR2codyhtr1WAAg+/nmkeeMA==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/credential-provider-env@3.972.43':
+    resolution: {integrity: sha512-g0XVQKzaA/4cq1vz1IvCQwYM+1Pkv01J9yHDpCTXekVuGZRDEz0wqBQ1AuYTq7FM6uik4uBGH8Tb5d9YvgeA7g==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/credential-provider-http@3.972.45':
+    resolution: {integrity: sha512-w9PuOoKCt6+xoESvY+zlV0u3PKQ0mVL259PcsVR6a3S/uYJJHnIi4r1NxdJHEcNldUVRIciltWnFMGBR4YEm3g==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/credential-provider-ini@3.972.48':
+    resolution: {integrity: sha512-+6BQ6Lrnc+EyAGElLRW6j+Sa+RirPHnIJsobvYO6nnyK+oGKmz1ne/ieclbLWyjyDKEU3/JVJWcWY3VLFPvGtQ==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/credential-provider-login@3.972.47':
+    resolution: {integrity: sha512-Iy2ebWVgrZBH05464uJiQYu6HSSiROnwVZptthEFXx2gWjo1ORCxEAFZB5Cr2MdfrSnZ+0QUPkZ1ZpCqpkUrLQ==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/credential-provider-node@3.972.50':
+    resolution: {integrity: sha512-b05Aelq5cqAvCCDQjCYacl0XmR8QhBNSqLbsdISkQmlQBa5oPS66zYPteWcSp5LswbpoIe552EUGjluKiadBig==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/credential-provider-process@3.972.43':
+    resolution: {integrity: sha512-GPokLNyvTfCmuaHk+v3GKVs4ZT3cMu5kgS2a+NPkOMt96cq6fSIK0g+mZHpGS6Cd4QGrPKesANEaLUKgOskTzg==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/credential-provider-sso@3.972.47':
+    resolution: {integrity: sha512-0AzvLrzlvJs0DzbeWGvNj+bX3Uzd7VNS6vDqCOdZzBlCGKGd78uxctJSW9iK/Rt/nxiJqpTvrYQlVJ4guVM2Dw==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/credential-provider-web-identity@3.972.47':
+    resolution: {integrity: sha512-eksfbUErOejUAGWBAcNqaP7IX21oUOEo73d9R56k9Ua4d57qS90NEYkWJsuSGzTXMFulCu17qXJI/qGmM7hvoA==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/eventstream-handler-node@3.972.19':
+    resolution: {integrity: sha512-MZhrsChY4jwEp7LQnNkcNSvF4KHjDC8es1pgu61h6L48fY7YgRqDfGRoT4ADd7lj4dB+gtOYITgmf7k4QQ2TKg==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/middleware-eventstream@3.972.15':
+    resolution: {integrity: sha512-4qYsO6temM6rEawcxHpMPWnRSIiLzsKhuizMlXCVujj54Q+HoGkVlcxk8S+5ekq/hOBdkyRnQjNsZaeRBz60hg==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/middleware-websocket@3.972.25':
+    resolution: {integrity: sha512-1u/r6SYArJr5qBHWQzwGw8cQu32V5Rcx68qb4v+ZhHXFn6dGDtCG5ImyULCLxhTktibLTh2qaRHOoHmkTKCyvA==}
+    engines: {node: '>= 14.0.0'}
+
+  '@aws-sdk/nested-clients@3.997.15':
+    resolution: {integrity: sha512-Fpri1/PXKMKveORZ7E00VLTlWS5DkfZkW70PUE+bOnpWpAeHAQLoiDHhkzN3kNWbbSsGg64+IZYiq/EZgME3Mg==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/signature-v4-multi-region@3.996.31':
+    resolution: {integrity: sha512-Kn2up9SlG1KC6wRtwf0d7waTGF6rvp9DxYqB54x6UCKdQ6kyaXCqHL4WGb5vUJga5kS8FxnjhY0LqM28aMvnNQ==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/token-providers@3.1048.0':
+    resolution: {integrity: sha512-k0y/GcuesuSfWyUM0WamrGyeZmltRYaPbHO82UDA6mZ/doB+FOHKutikPAtSXMn/hDz970cF+iRuuiYO9VEbAA==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/token-providers@3.1060.0':
+    resolution: {integrity: sha512-6NZaMKkFhpaNiwLpHi1sZaYjidL/lCJE6ME6NxwA8gv9vQna+Kr0j4OFwVoz6tANRWM3WbGz6jiPsGX/Vkjwow==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/types@3.973.10':
+    resolution: {integrity: sha512-992QrTO7G9qCvKD0fx1rMlqcL14plUcRAbwmqqYVsuF3GrqcvlAL9qxR+baMafarEZ+l7DUQ5lCMmt5mbMhF7g==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/util-locate-window@3.965.5':
+    resolution: {integrity: sha512-WhlJNNINQB+9qtLtZJcpQdgZw3SCDCpXdUJP7cToGwHbCWCnRckGlc6Bx/OhWwIYFNAn+FIydY8SZ0QmVu3xTQ==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws-sdk/xml-builder@3.972.27':
+    resolution: {integrity: sha512-hpsCXCOI436kxWpjtRuIHVvuPP81MOw8f18jzfZeg+UOiiOvlqWcmWChzEhJEu16cOC6+ku4ncBN+7rdt+DZ9g==}
+    engines: {node: '>=20.0.0'}
+
+  '@aws/lambda-invoke-store@0.2.4':
+    resolution: {integrity: sha512-iY8yvjE0y651BixKNPgmv1WrQc+GZ142sb0z4gYnChDDY2YqI4P/jsSopBWrKfAt7LOJAkOXt7rC/hms+WclQQ==}
+    engines: {node: '>=18.0.0'}
+
   '@babel/code-frame@7.26.2':
     resolution: {integrity: sha512-RJlIHRueQgwWitWgF8OdFYGZX328Ax5BCemNGlqHfplnRT9ESi8JkFlvaVYbS+UubVY6dpv87Fs2u5M29iNFVQ==}
     engines: {node: '>=6.9.0'}
@@ -2304,19 +2463,16 @@ packages:
   '@computer-use/libnut-darwin@2.7.1':
     resolution: {integrity: sha512-7B/aPcIYS4a4S7D3IYIHSpZ4B4m8Z3CjlYq0efTr+/JYmEu+LlO67ZhPvisLKifhagxf7goqEfnphg1F4jq5jw==}
     engines: {node: '>=10.15.3'}
-    cpu: [x64, arm64]
     os: [darwin, linux, win32]
 
   '@computer-use/libnut-linux@2.7.1':
     resolution: {integrity: sha512-QJD5URTFJ/2+JwBwRyajRF2BB+3eXpd4+t5btGeRVeiRQLKQ4lgorbMHySo6IrfAbSfnU1OVOrxAUygGxj0cFg==}
     engines: {node: '>=10.15.3'}
-    cpu: [x64, arm64]
     os: [darwin, linux, win32]
 
   '@computer-use/libnut-win32@2.7.1':
     resolution: {integrity: sha512-nDvH5kP1zoO2cBtFYWV0om9xtTu523cc1LIk8r/wizqPrIAm0wCizTU+odF3Fi42zcKJWT6J+Pguy8fKrZyIuA==}
     engines: {node: '>=10.15.3'}
-    cpu: [x64, arm64]
     os: [darwin, linux, win32]
 
   '@computer-use/libnut@4.2.0':
@@ -2406,6 +2562,24 @@ packages:
     resolution: {integrity: sha512-dBVuXR082gk3jsFp7Rd/JI4kytwGHecnCoTtXFb7DB6CNHp4rg5k1bhg0nWdLGLnOV71lmDzGQaLMy8iPLY0pw==}
     engines: {node: '>=10.0.0'}
 
+  '@earendil-works/pi-agent-core@0.78.0':
+    resolution: {integrity: sha512-xhWd59Qzd8yO88gYQw2S4dEQstJJEiUtxRP01//YzVJ61jCtUASMfcyAmYhgGYR4Onp7GmwEAbBBGOiV6Iwk9g==}
+    engines: {node: '>=22.19.0'}
+
+  '@earendil-works/pi-ai@0.78.0':
+    resolution: {integrity: sha512-q0hUrvT6ngT6cgBX0oIbzfQfmzztgdkZobP8OTL+sCOOBlnG6+1YRt8g7zO9CC/4NdeYEqa7uGqWdQhH0fjCLA==}
+    engines: {node: '>=22.19.0'}
+    hasBin: true
+
+  '@earendil-works/pi-coding-agent@0.78.0':
+    resolution: {integrity: sha512-gXt6pD3BoSG0yLwfLqb6844vz6qAO87PvNrv+YSDYKP3QliTjcwIld9v4ihmDcmBjO13QwKswubq/lYCvn4bkg==}
+    engines: {node: '>=22.19.0'}
+    hasBin: true
+
+  '@earendil-works/pi-tui@0.78.0':
+    resolution: {integrity: sha512-3a705FnsVVUhAyceShNB3kS2rpxcxLcx+hqB0u6MMMpHwQGbW+m++MqA6r7eOzq/8FLx5e3vDh38h/SVTk2qzw==}
+    engines: {node: '>=22.19.0'}
+
   '@electron/asar@4.2.0':
     resolution: {integrity: sha512-npW1NW5yy8EB9XY/vEw9sUdgmq0sJEhmSBb6bqyFOAw1CSkrhvAvO6QWlW8CdIMo8VN1lkdF345l/MeW0LrY0Q==}
     engines: {node: '>=22.12.0'}
@@ -2445,15 +2619,24 @@ packages:
   '@emnapi/core@1.10.0':
     resolution: {integrity: sha512-yq6OkJ4p82CAfPl0u9mQebQHKPJkY7WrIuk205cTYnYe+k2Z8YBh11FrbRG/H6ihirqcacOgl2BIO8oyMQLeXw==}
 
+  '@emnapi/core@1.4.5':
+    resolution: {integrity: sha512-XsLw1dEOpkSX/WucdqUhPWP7hDxSvZiY+fsUC14h+FtQ2Ifni4znbBt8punRX+Uj2JG/uDb8nEHVKvrVlvdZ5Q==}
+
   '@emnapi/core@1.7.1':
     resolution: {integrity: sha512-o1uhUASyo921r2XtHYOHy7gdkGLge8ghBEQHMWmyJFoXlpU58kIrhhN3w26lpQb6dspetweapMn2CSNwQ8I4wg==}
 
   '@emnapi/runtime@1.10.0':
     resolution: {integrity: sha512-ewvYlk86xUoGI0zQRNq/mC+16R1QeDlKQy21Ki3oSYXNgLb45GV1P6A0M+/s6nyCuNDqe5VpaY84BzXGwVbwFA==}
 
+  '@emnapi/runtime@1.4.5':
+    resolution: {integrity: sha512-++LApOtY0pEEz1zrd9vy1/zXVaVJJ/EbAF3u0fXIzPJEDtnITsBGbbK0EkM72amhl/R5b+5xx0Y/QhcVOpuulg==}
+
   '@emnapi/runtime@1.7.1':
     resolution: {integrity: sha512-PVtJr5CmLwYAU9PZDMITZoR5iAOShYREoR45EyyLrbntV50mdePTgUn4AmOw90Ifcj+x2kRjdzr1HP3RrNiHGA==}
 
+  '@emnapi/wasi-threads@1.0.4':
+    resolution: {integrity: sha512-PJR+bOmMOPH8AtcTGAyYNiuJ3/Fcoj2XN/gBEWzDIKh254XO+mM9XoXHk5GNEhodxeMznbg7BlRojVbKN+gC6g==}
+
   '@emnapi/wasi-threads@1.1.0':
     resolution: {integrity: sha512-WI0DdZ8xFSbgMjR1sFsKABJ/C5OnRrjT06JXbZKexJGrDuPTzZdDYfFlsgcCXCyf+suG5QU2e/y1Wo2V/OapLQ==}
 
@@ -2873,6 +3056,15 @@ packages:
     resolution: {integrity: sha512-PyUXQWB42s4jBli435TDiYuVsadwRHnMc27YaLouINktvTWsL3FcKrRMGawTayFk46X+n5bE23RjUTWQwrukWw==}
     engines: {node: '>= 0.10.0'}
 
+  '@google/genai@1.52.0':
+    resolution: {integrity: sha512-gwSvbpiN/17O9TbsqSsE/OzZcpv5Fo4RQjdngGgogtuB9RsyJ8ZHhX5KjHj1bp5N9snN2eK8LDGXSaWW2hof8Q==}
+    engines: {node: '>=20.0.0'}
+    peerDependencies:
+      '@modelcontextprotocol/sdk': ^1.25.2
+    peerDependenciesMeta:
+      '@modelcontextprotocol/sdk':
+        optional: true
+
   '@humanwhocodes/config-array@0.13.0':
     resolution: {integrity: sha512-DZLEEqFWQFiyK6h5YIeynKx7JlvCYWL0cImfSRXZ9l4Sg2efkFGTuFf6vzXjK1cq6IYkU+Eg/JizXw+TD2vRNw==}
     engines: {node: '>=10.10.0'}
@@ -3533,6 +3725,69 @@ packages:
     resolution: {integrity: sha512-0FOIepYR4ugPYaHwK7hDeHDkfPOBVvayt9QpvRbi2LT/h2b0GaE/gM9Gag7fsnyYyNaTZ2IGyOuVg07IYepvYQ==}
     engines: {node: '>=20.0.0'}
 
+  '@mariozechner/clipboard-darwin-arm64@0.3.9':
+    resolution: {integrity: sha512-BfgV7vCEWZwJwZJw03r6bP5+tf0iI/ANuQYCxi9RNn7FrWB3yzGuMKCrNLRl6V761vXRdL8+OqZ0wd4TqlsNOQ==}
+    engines: {node: '>= 10'}
+    cpu: [arm64]
+    os: [darwin]
+
+  '@mariozechner/clipboard-darwin-universal@0.3.9':
+    resolution: {integrity: sha512-BGGR4iA9Z2shAjI65eI5xtyb3LYNlDW9X3gxKxDbqtbnREohsrqznov6zpKoIrsRWpzlYVEdKphS7ksJ0/ndSQ==}
+    engines: {node: '>= 10'}
+    os: [darwin]
+
+  '@mariozechner/clipboard-darwin-x64@0.3.9':
+    resolution: {integrity: sha512-4kURmCbS6nt8uYhtmWpUcJWyPHfmAr5dTpXD1nO3pIfa+TSQ9DbrGOYCKH+aEFW47XhQ4Vp8ZTszie+wfFvDKg==}
+    engines: {node: '>= 10'}
+    cpu: [x64]
+    os: [darwin]
+
+  '@mariozechner/clipboard-linux-arm64-gnu@0.3.9':
+    resolution: {integrity: sha512-g59OkUGP2DDfCOIKypHeYgv2M55u/cKvXa5dSxFbEJ34XvIQMdcVmpKCkGUro3ZgefXiGVdwguvTMQGpHWzIXw==}
+    engines: {node: '>= 10'}
+    cpu: [arm64]
+    os: [linux]
+
+  '@mariozechner/clipboard-linux-arm64-musl@0.3.9':
+    resolution: {integrity: sha512-AGuJdgKsmJdm4Pych7kv3sqe591ERRaAHW3xjLooiFzn8J+PxUyof++7YZrB5Y5tpnTO+K18Og3taj2NpluCRQ==}
+    engines: {node: '>= 10'}
+    cpu: [arm64]
+    os: [linux]
+
+  '@mariozechner/clipboard-linux-riscv64-gnu@0.3.9':
+    resolution: {integrity: sha512-DXBEAiuMpk7dhS1a9NzNxVAFi1vaKoPu7rQNgY8LIDLGrK3lnIp3nT10DUum+PKVJoJppIP+NAA8IZe4DMNDPw==}
+    engines: {node: '>= 10'}
+    cpu: [riscv64]
+    os: [linux]
+
+  '@mariozechner/clipboard-linux-x64-gnu@0.3.9':
+    resolution: {integrity: sha512-WORrMLd6EpElEME7JRKfSaY34nW1P5LbdgK5YNCS1ncG2LqmITsSMEJ8nh2mpvxb3TxqbOOKgY7k9eMJYlW9Mw==}
+    engines: {node: '>= 10'}
+    cpu: [x64]
+    os: [linux]
+
+  '@mariozechner/clipboard-linux-x64-musl@0.3.9':
+    resolution: {integrity: sha512-/DHn+1DrfL6oRaPPWXaOKvonFFrni666fxd+zFqiQEfvBH0tsHVWjq9iqBk0oDp0qaPA72lIMy5BptxISBEhZQ==}
+    engines: {node: '>= 10'}
+    cpu: [x64]
+    os: [linux]
+
+  '@mariozechner/clipboard-win32-arm64-msvc@0.3.9':
+    resolution: {integrity: sha512-O5FHD3ErkMwMhNzAfu3ggy0ug4z7btZuoQgwwxlzPrwV2bxlD6WDpqBY4NCgICAgZdDKdp+loUEKVAVt8aYnhQ==}
+    engines: {node: '>= 10'}
+    cpu: [arm64]
+    os: [win32]
+
+  '@mariozechner/clipboard-win32-x64-msvc@0.3.9':
+    resolution: {integrity: sha512-ihQC3EufqEY81vhXBgVBtK4prL+wc62zJsSvxrgz7K1hsdt6OObz6v9p3Rn1OG3GJksTTKMJF0u/guMISHPhSA==}
+    engines: {node: '>= 10'}
+    cpu: [x64]
+    os: [win32]
+
+  '@mariozechner/clipboard@0.3.9':
+    resolution: {integrity: sha512-ABnA53mdfkGZwOFUdZNv2S0CWGO/EIuPj8Vv9xmBFmSYg/qFc7ihO6q5FcQjvoE67kZpWkEc4AhD6B/os04yuA==}
+    engines: {node: '>= 10'}
+
   '@mdn/browser-compat-data@7.1.7':
     resolution: {integrity: sha512-bpWZ7hidvjrwNWcMngZ8nTMTxn8WhnQntsGqEYgPr1vjy66kfwfDVizwXg6PvsgoANZ7nhuRBmvzjpCMk4ITDw==}
 
@@ -3558,6 +3813,9 @@ packages:
   '@microsoft/tsdoc@0.15.1':
     resolution: {integrity: sha512-4aErSrCR/On/e5G2hDP0wjooqDdauzEbIq8hIkIe5pXV0rtWJZvdCEKL0ykZxex+IxIwBp0eGeV48hQN07dXtw==}
 
+  '@mistralai/mistralai@2.2.1':
+    resolution: {integrity: sha512-uKU8CZmL2RzYKmplsU01hii4p3pe4HqJefpWNRWXm1Tcm0Sm4xXfwSLIy4k7ZCPlbETCGcp69E7hZs+WOJ5itQ==}
+
   '@modelcontextprotocol/inspector-cli@0.16.3':
     resolution: {integrity: sha512-6Hbh+QVRsEDel7hA9qiRklwWEoW4dQXHw4Ltr8JdsU1ziqem4/ERgGxthg/d+qrmbVwW9shOqPIaGoRFaZ264g==}
     hasBin: true
@@ -3616,6 +3874,9 @@ packages:
       '@emnapi/core': ^1.7.1
       '@emnapi/runtime': ^1.7.1
 
+  '@nodable/entities@2.1.1':
+    resolution: {integrity: sha512-Pig3HxDIoMgjdEH8OCf/dkcTmLFjJRjWuq8jSnklu284/TKOPibSRERmOykiwmyXTtv61mP+44f3GMx0tLAyjg==}
+
   '@nodelib/fs.scandir@2.1.5':
     resolution: {integrity: sha512-vq24Bq3ym5HEQm2NKCr3yXDwjc7vTsEThRDnkp2DK9p1uqLR+DHurm/NOTo0KG7HYHU7eppKZj3MyqYuMBf62g==}
     engines: {node: '>= 8'}
@@ -3633,51 +3894,101 @@ packages:
     cpu: [arm64]
     os: [darwin]
 
+  '@nx/nx-darwin-arm64@22.7.5':
+    resolution: {integrity: sha512-eoPtwx0qZqvRUD+VVOHm150AlSYwYoPxkDHBBGqKCn5nzPspb0lLWw8q83crM/L1M928YgK0WmGf3C++7eqsTA==}
+    cpu: [arm64]
+    os: [darwin]
+
   '@nx/nx-darwin-x64@22.1.3':
     resolution: {integrity: sha512-XmdccOBp1Lx9DXUzYDX65mkFqFvXaxUKm1d63bfA43vxIYUpR59SASB81KRQ/Q4dgvvU27C0EJuxSJbXsSkSYw==}
     cpu: [x64]
     os: [darwin]
 
+  '@nx/nx-darwin-x64@22.7.5':
+    resolution: {integrity: sha512-VLOn/ZoEn3HfjSj+yIHLCM56/el79r+9I28CkZNHaSXJQWZ3edSkcgcfYjVxCurpN2VEwDQHLBeFCH8M+lQ7wQ==}
+    cpu: [x64]
+    os: [darwin]
+
   '@nx/nx-freebsd-x64@22.1.3':
     resolution: {integrity: sha512-O+o4mqPwhKxfdsri4KxDbXbjwIwr04GfTSfA0TwgXs6hFf68qmc45FAmPGrPSvxIJg9+mUVDeFirdS8GcUE0jQ==}
     cpu: [x64]
     os: [freebsd]
 
+  '@nx/nx-freebsd-x64@22.7.5':
+    resolution: {integrity: sha512-LEVer/E2xfGvK9Go+imMQoEninOoq/38Z2bhV1SD3AThXrp1xaLFVkW5jQ6juebeVkAeztEoMLFlr576egS0vw==}
+    cpu: [x64]
+    os: [freebsd]
+
   '@nx/nx-linux-arm-gnueabihf@22.1.3':
     resolution: {integrity: sha512-ZIPDgzLq8qmvrZ3Bp+bWXam5uKwahjcChBNtORVtrHQfm4mxov2RMUMKTg2ZsVAWVP64zK+gmzG5LuoZjPMm4Q==}
     cpu: [arm]
     os: [linux]
 
+  '@nx/nx-linux-arm-gnueabihf@22.7.5':
+    resolution: {integrity: sha512-NP27EFGpmFJM6RL1Ey/AFJ7gA2xuqtIHaw6jjSNGvfrnZRUNaway30GrVaGGeODf0DsvAty/unqoBMPy6kDHbw==}
+    cpu: [arm]
+    os: [linux]
+
   '@nx/nx-linux-arm64-gnu@22.1.3':
     resolution: {integrity: sha512-wgpPaTpQKl+cCkSuE5zamTVrg14mRvT+bLAeN/yHSUgMztvGxwl3Ll+K9DgEcktBo1PLECTWNkVaW8IAsJm4Rg==}
     cpu: [arm64]
     os: [linux]
 
+  '@nx/nx-linux-arm64-gnu@22.7.5':
+    resolution: {integrity: sha512-QLnkJl3HkHsPfpLiNiAiMfpfAeFpic0U1diAxF8RqChOkCpQ7ulvyBVgE1UrQxvhd+gFQ3ed5RNDxtCRw8nTiw==}
+    cpu: [arm64]
+    os: [linux]
+
   '@nx/nx-linux-arm64-musl@22.1.3':
     resolution: {integrity: sha512-o9XmQehSPR2y0RD4evD+Ob3lNFuwsFOL5upVJqZ3rcE6GkJIFPg8SwEP5FaRIS5MwS04fxnek20NZ18BHjjV/g==}
     cpu: [arm64]
     os: [linux]
 
+  '@nx/nx-linux-arm64-musl@22.7.5':
+    resolution: {integrity: sha512-cEP6KmwBgnb38+jTTaibWCjwXcHmigqhTfy0tN1be7WZr6bHxbqNLsXqKRN70PSNA3HouZcxw1cdRL8tqbPBBA==}
+    cpu: [arm64]
+    os: [linux]
+
   '@nx/nx-linux-x64-gnu@22.1.3':
     resolution: {integrity: sha512-ekcinyDNTa2huVe02T2SFMR8oArohozRbMGO19zftbObXXI4dLdoAuLNb3vK9Pe4vYOpkhfxBVkZvcWMmx7JdA==}
     cpu: [x64]
     os: [linux]
 
+  '@nx/nx-linux-x64-gnu@22.7.5':
+    resolution: {integrity: sha512-tbaX1tZCSpGifDNBfDdEZAMxVF3Yg4bhFP/bm1needc0diqb+Zflc0u5tM5/6BWDMITQDwenJVsNiQ8ZdtJURA==}
+    cpu: [x64]
+    os: [linux]
+
   '@nx/nx-linux-x64-musl@22.1.3':
     resolution: {integrity: sha512-CqpRIJeIgELCqIgjtSsYnnLi6G0uqjbp/Pw9d7w4im4/NmJXqaE9gxpdHA1eowXLgAy9W1LkfzCPS8Q2IScPuQ==}
     cpu: [x64]
     os: [linux]
 
+  '@nx/nx-linux-x64-musl@22.7.5':
+    resolution: {integrity: sha512-H0M7csOZIgPT822LqjxSXzf4MXRND15vIkAQe3F3Jlr3Si8LC3tzbL52aVcRfgb8MF/xOB5U47mSwxWt1M2bPQ==}
+    cpu: [x64]
+    os: [linux]
+
   '@nx/nx-win32-arm64-msvc@22.1.3':
     resolution: {integrity: sha512-YbuWb8KQsAR9G0+7b4HA16GV962/VWtRcdS7WY2yaScmPT2W5rObl528Y2j4DuB0j/MVZj12qJKrYfUyjL+UJA==}
     cpu: [arm64]
     os: [win32]
 
+  '@nx/nx-win32-arm64-msvc@22.7.5':
+    resolution: {integrity: sha512-JTcZch9YAnDL1gbhqePz3DZ4x7iYemLn1yJzrjbbXAmXju2eiiJiZvJJHbV06+SP9HKXDT8RjTKuAWTdVxnHug==}
+    cpu: [arm64]
+    os: [win32]
+
   '@nx/nx-win32-x64-msvc@22.1.3':
     resolution: {integrity: sha512-G90Sp409ypeOUbmj6nmEbdy043KJUKaZ7pffxmM6i63yEe2F2WdmMgdi525vUEgmq+pfB9zQQOX1sDR/rPFvtg==}
     cpu: [x64]
     os: [win32]
 
+  '@nx/nx-win32-x64-msvc@22.7.5':
+    resolution: {integrity: sha512-ngcMyHdBJ9FSz2nHdbZ7gtJlFq0O2b05sPAsVMkZ18CKzdaA1qrBDJfsMO49hPCny505eiT766+CkKdaCDl5kA==}
+    cpu: [x64]
+    os: [win32]
+
   '@opentelemetry/api-logs@0.210.0':
     resolution: {integrity: sha512-CMtLxp+lYDriveZejpBND/2TmadrrhUfChyxzmkFtHaMDdSKfP59MAYyA0ICBvEBdm3iXwLcaj/8Ic/pnGw9Yg==}
     engines: {node: '>=8.0.0'}
@@ -3771,18 +4082,30 @@ packages:
   '@protobufjs/codegen@2.0.4':
     resolution: {integrity: sha512-YyFaikqM5sH0ziFZCN3xDC7zeGaB/d0IUb9CATugHWbd1FRFwWwt4ld4OYMPWu5a3Xe01mGAULCdqhMlPl29Jg==}
 
+  '@protobufjs/codegen@2.0.5':
+    resolution: {integrity: sha512-zgXFLzW3Ap33e6d0Wlj4MGIm6Ce8O89n/apUaGNB/jx+hw+ruWEp7EwGUshdLKVRCxZW12fp9r40E1mQrf/34g==}
+
   '@protobufjs/eventemitter@1.1.0':
     resolution: {integrity: sha512-j9ednRT81vYJ9OfVuXG6ERSTdEL1xVsNgqpkxMsbIabzSo3goCjDIveeGv5d03om39ML71RdmrGNjG5SReBP/Q==}
 
+  '@protobufjs/eventemitter@1.1.1':
+    resolution: {integrity: sha512-vW1GmwMZNnL+gMRaovlh9yZX74kc+TTU3FObkkurpMaRtBfLP3ldjS9KQWlwZgraRE0+dheEEoAxdzcJQ8eXZg==}
+
   '@protobufjs/fetch@1.1.0':
     resolution: {integrity: sha512-lljVXpqXebpsijW71PZaCYeIcE5on1w5DlQy5WH6GLbFryLUrBD4932W/E2BSpfRJWseIL4v/KPgBFxDOIdKpQ==}
 
+  '@protobufjs/fetch@1.1.1':
+    resolution: {integrity: sha512-GpptLrs57adMSuHi3VNj0mAF8dwh36LMaYF6XyJ6JMWlVsc+t42tm1HSEDmOs3A8fC9yyeisgLhsTVQokOZ0zw==}
+
   '@protobufjs/float@1.0.2':
     resolution: {integrity: sha512-Ddb+kVXlXst9d+R9PfTIxh1EdNkgoRe5tOX6t01f1lYWOvJnSPDBlG241QLzcyPdoNTsblLUdujGSE4RzrTZGQ==}
 
   '@protobufjs/inquire@1.1.0':
     resolution: {integrity: sha512-kdSefcPdruJiFMVSbn801t4vFK7KB/5gd2fYvrxhuJYg8ILrmn9SKSX2tZdV6V+ksulWqS7aXjBcRXl3wHoD9Q==}
 
+  '@protobufjs/inquire@1.1.2':
+    resolution: {integrity: sha512-pa0vFRuws4wkvaXKK1uXZMAwAX4/t8ANaJo45iw/oQHNQ9q5xUzwgFmVJGXiga2BeN+zpX7Vf9vmsiIa2J+MUw==}
+
   '@protobufjs/path@1.1.2':
     resolution: {integrity: sha512-6JOcJ5Tm08dOHAbdR3GrvP+yUUfkjG5ePsHYczMFLq3ZmMkAD98cDgcT2iA1lJ9NVwFd4tH/iSSoe44YWkltEA==}
 
@@ -3792,6 +4115,9 @@ packages:
   '@protobufjs/utf8@1.1.0':
     resolution: {integrity: sha512-Vvn3zZrhQZkkBE8LSuW3em98c0FwgO4nxzv6OdSxPKJIEKY2bGbHn+mhGIPerzI4twdxaP8/0+06HBpwf345Lw==}
 
+  '@protobufjs/utf8@1.1.1':
+    resolution: {integrity: sha512-oOAWABowe8EAbMyWKM0tYDKi8Yaox52D+HWZhAIJqQXbqe0xI/GV7FhLWqlEKreMkfDjshR5FKgi3mnle0h6Eg==}
+
   '@puppeteer/browsers@2.9.0':
     resolution: {integrity: sha512-8+xM+cFydYET4X/5/3yZMHs7sjS6c9I6H5I3xJdb6cinzxWUT/I2QVw4avxCQ8QDndwdHkG/FiSZIrCjAbaKvQ==}
     engines: {node: '>=18'}
@@ -4831,6 +5157,9 @@ packages:
   '@silvia-odwyer/photon-node@0.3.3':
     resolution: {integrity: sha512-30nDWTHQ7/d1xGnO41ol5tnBA1Bmo2N6h9HNPByBbIYU2xCYB9g4o4zB6vxAq15ixrBRTjb1Nnz1K0Jli3Hxnw==}
 
+  '@silvia-odwyer/photon-node@0.3.4':
+    resolution: {integrity: sha512-bnly4BKB3KDTFxrUIcgCLbaeVVS8lrAkri1pEzskpmxu9MdfGQTy8b8EgcD83ywD3RPMsIulY8xJH5Awa+t9fA==}
+
   '@silvia-odwyer/photon@0.3.3':
     resolution: {integrity: sha512-8BhUjEch4slwRe8uXnaA4vcA5uiiOTT90UMsxulOj2gN98X1p0q9Z4Ysk4DkD05uNgbR9XoSqtZ37w+33w4QKQ==}
 
@@ -4849,6 +5178,46 @@ packages:
     resolution: {integrity: sha512-tlqY9xq5ukxTUZBmoOp+m61cqwQD5pHJtFY3Mn8CA8ps6yghLH/Hw8UPdqg4OLmFW3IFlcXnQNmo/dh8HzXYIQ==}
     engines: {node: '>=18'}
 
+  '@smithy/core@3.24.6':
+    resolution: {integrity: sha512-wBXDRup6UU97VKyaiRo8AssnfStPtG0oAAfpq/bC0a1YYau8pM86YB4kM6ccoVi1mS8l/UHbn9oDM+7uozr/ug==}
+    engines: {node: '>=18.0.0'}
+
+  '@smithy/credential-provider-imds@4.3.7':
+    resolution: {integrity: sha512-xj8gq/bjFABAh6qWPSDCYcY3kzQIm4b561C+YnHH4zGq8rOgzQ3Shk+JGlpUxSd41UGiO6FkLdUCtNX1FAeHgg==}
+    engines: {node: '>=18.0.0'}
+
+  '@smithy/fetch-http-handler@5.4.6':
+    resolution: {integrity: sha512-FEwEYJ1jlBKdhe9TPzfghEi1bP55ZeEImlDkEa62bBBYzUcnB6RUCyuiS2mqKt6ZVjUbBgcNhzfIctH+Hevx9g==}
+    engines: {node: '>=18.0.0'}
+
+  '@smithy/is-array-buffer@2.2.0':
+    resolution: {integrity: sha512-GGP3O9QFD24uGeAXYUjwSTXARoqpZykHadOmA8G5vfJPK0/DC67qa//0qvqrJzL1xc8WQWX7/yc7fwudjPHPhA==}
+    engines: {node: '>=14.0.0'}
+
+  '@smithy/node-http-handler@4.7.3':
+    resolution: {integrity: sha512-/jPhevcTFPMVl6KNjbaI47iOg1zxC7IsnX4PQDGVZKMFceOXtB8IEYaB7a9VvkP/3oC60WzTeKocvSI7vLT0vA==}
+    engines: {node: '>=18.0.0'}
+
+  '@smithy/node-http-handler@4.7.6':
+    resolution: {integrity: sha512-3fya8i7GrJilQouk4cZJKdy5k8MWQBpjfXrRNaXDedH8r779tr0jcxyH3+yoTmsluc2+vF4S343yFbnvu8ExDQ==}
+    engines: {node: '>=18.0.0'}
+
+  '@smithy/signature-v4@5.4.6':
+    resolution: {integrity: sha512-Ojg4B6oIDlIr1R86xCDJt1zJWnYa0VINmqdjfe9qxWjdRivHalZ3iSlQgVqYbW0MdpFOC5XfHEWsnbmdnpIILQ==}
+    engines: {node: '>=18.0.0'}
+
+  '@smithy/types@4.14.3':
+    resolution: {integrity: sha512-YupL0ZWmFtJexUN2cHzkvvF/b9pKrtAIfT1o7/oY/Ppu8IYeZ+lDPM5vZdQJaSeA132dJCqojjGC9NhXeF71VQ==}
+    engines: {node: '>=18.0.0'}
+
+  '@smithy/util-buffer-from@2.2.0':
+    resolution: {integrity: sha512-IJdWBbTcMQ6DA0gdNhh/BwrLkDR+ADW5Kr1aZmd4k3DIF6ezMV4R2NIAmT08wQJ3yUK82thHWmC/TnK/wpMMIA==}
+    engines: {node: '>=14.0.0'}
+
+  '@smithy/util-utf8@2.3.0':
+    resolution: {integrity: sha512-R8Rdn8Hy72KKcebgLiv8jQcQkXoLMOGGv5uI1/k0l+snqkOzQ1R0ChUBCxWMlBsFMekWjq0wRudIweFs7sKT5A==}
+    engines: {node: '>=14.0.0'}
+
   '@socket.io/component-emitter@3.1.2':
     resolution: {integrity: sha512-9BCxFwvbGg/RsZK9tjXd8s4UcwR0MWeFQ1XEKIQVVvAGJyINdrqKMcTRyLoK8Rse1GjzLV9cwjWV1olXRWEXVA==}
 
@@ -5766,6 +6135,9 @@ packages:
   axios@1.13.2:
     resolution: {integrity: sha512-VPk9ebNqPcy5lRGuSlKx752IlDatOjT9paPlm8A7yOuW2Fbvp4X3JznJtT4f0GzGLLiWE9W8onz51SqLYwzGaA==}
 
+  axios@1.16.0:
+    resolution: {integrity: sha512-6hp5CwvTPlN2A31g5dxnwAX0orzM7pmCRDLnZSX772mv8WDqICwFjowHuPs04Mc8deIld1+ejhtaMn5vp6b+1w==}
+
   axios@1.8.3:
     resolution: {integrity: sha512-iP4DebzoNlP/YN2dpwCgb8zoCmhtkajzS48JvwmkSkXvPI3DHc7m+XYL5tGnSlJtR6nImXZmdCuN5aP8dh1d8A==}
 
@@ -5789,6 +6161,10 @@ packages:
   balanced-match@1.0.2:
     resolution: {integrity: sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw==}
 
+  balanced-match@4.0.3:
+    resolution: {integrity: sha512-1pHv8LX9CpKut1Zp4EXey7Z8OfH11ONNH6Dhi2WDUt31VVZFXZzKwXcysBgqSumFCmR+0dqjMK5v5JiFHzi0+g==}
+    engines: {node: 20 || >=22}
+
   balanced-match@4.0.4:
     resolution: {integrity: sha512-BLrgEcRTwX2o6gGxGOCNyMvGSp35YofuYzw9h1IMTRmKqttAZZVU67bdb9Pr2vUHA8+j3i2tJfjO6C6+4myGTA==}
     engines: {node: 18 || 20 || >=22}
@@ -5871,6 +6247,9 @@ packages:
     resolution: {integrity: sha512-QxD8cf2eVqJOOz63z6JIN9BzvVs/dlySa5HGSBH5xtR8dPteIRQnBxxKqkNTiT6jbDTF6jAfrd4oMcND9RGbQg==}
     engines: {node: '>=0.6'}
 
+  bignumber.js@9.3.1:
+    resolution: {integrity: sha512-Ko0uX15oIUS7wJ3Rb30Fs6SkVbLmPBAKdlm7q9+ak9bbIeFf0MwuBsQV6z7+X768/cHsfg+WlysDWJcmthjsjQ==}
+
   binary-extensions@2.3.0:
     resolution: {integrity: sha512-Ceh+7ox5qe7LJuLHoY0feh3pHuUDHAcRUeyL2VYghZwfpkNIy/+8Ocg0a3UuSoYzavmylwuLWQOf3hl0jjMMIw==}
     engines: {node: '>=8'}
@@ -5911,6 +6290,9 @@ packages:
     resolution: {integrity: sha512-d0II/GO9uf9lfUHH2BQsjxzRJZBdsjgsBiW4BvhWk/3qoKwQFjIDVN19PfX8F2D/r9PCMTtLWjYVCFrpeYUzsw==}
     deprecated: Package no longer supported. Contact Support at https://www.npmjs.com/support for more info.
 
+  bowser@2.14.1:
+    resolution: {integrity: sha512-tzPjzCxygAKWFOJP011oxFHs57HzIhOEracIgAePE4pqB3LikALKnSzUyU4MGs9/iCEUuHlAJTjTc5M+u7YEGg==}
+
   boxen@8.0.1:
     resolution: {integrity: sha512-F3PH5k5juxom4xktynS7MoFY+NUWH5LC4CnH11YB8NPew+HLpmBLCybSAEyb2F+4pRXhuhWqFesoQd6DAyc2hw==}
     engines: {node: '>=18'}
@@ -5934,8 +6316,8 @@ packages:
   brace-expansion@2.0.2:
     resolution: {integrity: sha512-Jt0vHyM+jmUBqojB7E1NIYadt0vI0Qxjxd2TErW94wDz+E2LAm5vKMXXwg6ZZBTHPuUlDgQHKXvjGBdfcF1ZDQ==}
 
-  brace-expansion@5.0.5:
-    resolution: {integrity: sha512-VZznLgtwhn+Mact9tfiwx64fA9erHH/MCXEUfB/0bX/6Fz6ny5EGTXYltMocqg4xFAQZtnO3DHWWXi8RiuN7cQ==}
+  brace-expansion@5.0.6:
+    resolution: {integrity: sha512-kLpxurY4Z4r9sgMsyG0Z9uzsBlgiU/EFKhj/h91/8yHu0edo7XuixOIH3VcJ8kkxs6/jPzoI6U9Vj3WqbMQ94g==}
     engines: {node: 18 || 20 || >=22}
 
   braces@3.0.3:
@@ -5994,6 +6376,9 @@ packages:
     resolution: {integrity: sha512-Db1SbgBS/fg/392AblrMJk97KggmvYhr4pB5ZIMTWtaivCPMWLkmb7m21cJvpvgK+J3nsU2CmmixNBZx4vFj/w==}
     engines: {node: '>=8.0.0'}
 
+  buffer-equal-constant-time@1.0.1:
+    resolution: {integrity: sha512-zRpUiDwd/xk6ADqPMATG8vc9VPrkck7T07OIx0gnjmJAnHnTVXNQG3vfvWNuiZIkwu9KrKdA1iJKfsfTVxE6NA==}
+
   buffer-equal@0.0.1:
     resolution: {integrity: sha512-RgSV6InVQ9ODPdLWJ5UAqBqJBOg370Nz6ZQtRzpt6nUjc8v0St97uJ4PYC6NztqIScrAXafKM3mZPMygSe1ggA==}
     engines: {node: '>=0.4.0'}
@@ -6806,6 +7191,10 @@ packages:
     resolution: {integrity: sha512-58lmxKSA4BNyLz+HHMUzlOEpg09FV+ev6ZMe3vJihgdxzgcwZ8VoEEPmALCZG9LmqfVoNMMKpttIYTVG6uDY7A==}
     engines: {node: '>=0.3.1'}
 
+  diff@8.0.4:
+    resolution: {integrity: sha512-DPi0FmjiSU5EvQV0++GFDOJ9ASQUVFh5kD+OzOnYdi7n3Wpm9hWWGfB/O2blfHcMVTL5WkQXSnRiK9makhrcnw==}
+    engines: {node: '>=0.3.1'}
+
   diffie-hellman@5.0.3:
     resolution: {integrity: sha512-kqag/Nl+f3GwyK25fhUMYj81BUOrZ9IuJsjIcDE5icNM9FJHAVm3VcUDxdLPoQtTuUylWm6ZIknYJwwaPxsUzg==}
 
@@ -6855,6 +7244,10 @@ packages:
     resolution: {integrity: sha512-zIHwmZPRshsCdpMDyVsqGmgyP0yT8GAgXUnkdAoJisxvf33k7yO6OuoKmcTGuXPWSsm8Oh88nZicRLA9Y0rUeA==}
     engines: {node: '>=12'}
 
+  dotenv-expand@12.0.3:
+    resolution: {integrity: sha512-uc47g4b+4k/M/SeaW1y4OApx+mtLWl92l5LMPP0GNXctZqELk+YGgOPIIC5elYmUH4OuoK3JLhuRUYegeySiFA==}
+    engines: {node: '>=12'}
+
   dotenv@16.4.5:
     resolution: {integrity: sha512-ZmdL2rui+eB2YwhsWzjInR8LldtZHGDoQ1ugH85ppHKwpUHL7j7rN0Ti9NCnGiQbhaZ11FpR+7ao1dNsmduNUg==}
     engines: {node: '>=12'}
@@ -6876,12 +7269,20 @@ packages:
   eastasianwidth@0.2.0:
     resolution: {integrity: sha512-I88TYZWc9XiYHRQ4/3c5rjjfgkjhLyW2luGIheGERbNQ6OY7yTybanSpDXZa8y7VUP9YmDcYa+eyq4ca7iLqWA==}
 
+  ecdsa-sig-formatter@1.0.11:
+    resolution: {integrity: sha512-nagl3RYrbNv6kQkeJIpt6NJZy8twLB/2vtz6yN9Z4vRKHN4/QZJIEbqohALSgwKdnksuY3k5Addp5lg8sVoVcQ==}
+
   edit-json-file@1.8.1:
     resolution: {integrity: sha512-x8L381+GwqxQejPipwrUZIyAg5gDQ9tLVwiETOspgXiaQztLsrOm7luBW5+Pe31aNezuzDY79YyzF+7viCRPXA==}
 
   ee-first@1.1.1:
     resolution: {integrity: sha512-WMwm9LhRUo+WUaRN+vRuETqG89IgZphVSNkdFgeb6sS/E4OrDIN7t48CAewSHXc6C8lefD8KKfr5vY61brQlow==}
 
+  ejs@5.0.1:
+    resolution: {integrity: sha512-COqBPFMxuPTPspXl2DkVYaDS3HtrD1GpzOGkNTJ1IYkifq/r9h8SVEFrjA3D9/VJGOEoMQcrlhpntcSUrM8k6A==}
+    engines: {node: '>=0.12.18'}
+    hasBin: true
+
   electron-to-chromium@1.5.182:
     resolution: {integrity: sha512-Lv65Btwv9W4J9pyODI6EWpdnhfvrve/us5h1WspW8B2Fb0366REPtY3hX7ounk1CkV/TBjWCEvCBBbYbmV0qCA==}
 
@@ -7263,6 +7664,13 @@ packages:
   fast-uri@3.1.0:
     resolution: {integrity: sha512-iPeeDKJSWf4IEOasVVrknXpaBV0IApz/gp7S2bb7Z4Lljbl2MGJRqInZiUrQwV16cpzw/D3S5j5Julj/gT52AA==}
 
+  fast-xml-builder@1.2.0:
+    resolution: {integrity: sha512-00aAWieqff+ZJhsXA4g1g7M8k+7AYoMUUHF+/zFb5U6Uv/P0Vl4QZo84/IcufzYalLuEj9928bXN9PbbFzMF0Q==}
+
+  fast-xml-parser@5.7.3:
+    resolution: {integrity: sha512-C0AaNuC+mscy6vrAQKAc/rMq+zAPHodfHGZu4sGVehvAQt/JLG1O5zEcYcXSY5zSqr4YVgxsB+pHXTq0i7eDlg==}
+    hasBin: true
+
   fastq@1.19.1:
     resolution: {integrity: sha512-GwLTyxkCXjXbxqIhTsMI2Nui8huMPtnxg7krajPJAjnEG/iiOS7i+zCtWGZR9G0NBKbXKh6X9m9UIsYX/N6vvQ==}
 
@@ -7426,6 +7834,15 @@ packages:
       debug:
         optional: true
 
+  follow-redirects@1.16.0:
+    resolution: {integrity: sha512-y5rN/uOsadFT/JfYwhxRS5R7Qce+g3zG97+JrtFZlC9klX/W5hD7iiLzScI4nZqUS7DNUdhPgw4xI8W2LuXlUw==}
+    engines: {node: '>=4.0'}
+    peerDependencies:
+      debug: '*'
+    peerDependenciesMeta:
+      debug:
+        optional: true
+
   for-each@0.3.5:
     resolution: {integrity: sha512-dKx12eRCVIzqCxFGplyFKJMPvLEWgmNtUrpTiJIR5u97zEhRG8ySrtboPHZXx7daLxQVrl643cTzbab2tkQjxg==}
     engines: {node: '>= 0.4'}
@@ -7542,6 +7959,14 @@ packages:
     resolution: {integrity: sha512-HmKyTFGomdAchz4umx8MwBnrnfFmdpwiTyGA4ZOF7rya2Lmgbc9qate4yweInL+0gUBVImhaz12SBGpW3SY4Yg==}
     engines: {node: '>=22.12.0'}
 
+  gaxios@7.1.4:
+    resolution: {integrity: sha512-bTIgTsM2bWn3XklZISBTQX7ZSddGW+IO3bMdGaemHZ3tbqExMENHLx6kKZ/KlejgrMtj8q7wBItt51yegqalrA==}
+    engines: {node: '>=18'}
+
+  gcp-metadata@8.1.2:
+    resolution: {integrity: sha512-zV/5HKTfCeKWnxG0Dmrw51hEWFGfcF2xiXqcA3+J90WDuP0SvoiSO5ORvcBsifmx/FoIjgQN3oNOGaQ5PhLFkg==}
+    engines: {node: '>=18'}
+
   generate-function@2.3.1:
     resolution: {integrity: sha512-eeB5GfMNeevm/GRYq20ShmsaGcmI81kIX2K9XQx5miC8KdHaC6Jm0qQ8ZNeGOi7wYB8OsdxKs+Y2oVuTFuVwKQ==}
 
@@ -7564,6 +7989,10 @@ packages:
     resolution: {integrity: sha512-QZjmEOC+IT1uk6Rx0sX22V6uHWVwbdbxf1faPqJ1QhLdGgsRGCZoyaQBm/piRdJy/D2um6hM1UP7ZEeQ4EkP+Q==}
     engines: {node: '>=18'}
 
+  get-east-asian-width@1.6.0:
+    resolution: {integrity: sha512-QRbvDIbx6YklUe6RxeTeleMR0yv3cYH6PsPZHcnVn7xv7zO1BHN8r0XETu8n6Ye3Q+ahtSarc3WgtNWmehIBfA==}
+    engines: {node: '>=18'}
+
   get-intrinsic@1.3.0:
     resolution: {integrity: sha512-9fSjSaos/fRIVIp+xSJlE6lfwhES7LNtKaCBIamHsjr2na1BiABJPo0mOjjz8GJDURarmCPGqaiVg5mfjb98CQ==}
     engines: {node: '>= 0.4'}
@@ -7707,6 +8136,14 @@ packages:
     resolution: {integrity: sha512-Y1zNGV+pzQdh7H39l9zgB4PJqjRNqydvdYCDG4HFXM4XuvSaQQlEc91IU1yALL8gUTDomgBAfz3XJdmUS+oo0w==}
     engines: {node: ^12.20.0 || ^14.13.1 || >=16.0.0}
 
+  google-auth-library@10.6.2:
+    resolution: {integrity: sha512-e27Z6EThmVNNvtYASwQxose/G57rkRuaRbQyxM2bvYLLX/GqWZ5chWq2EBoUchJbCc57eC9ArzO5wMsEmWftCw==}
+    engines: {node: '>=18'}
+
+  google-logging-utils@1.1.3:
+    resolution: {integrity: sha512-eAmLkjDjAFCVXg7A1unxHsLf961m6y17QFqXqAXGj/gVkKFrEICfStRfwUlGNfeCEjNRa32JEWOUTlYXPyyKvA==}
+    engines: {node: '>=14'}
+
   gopd@1.2.0:
     resolution: {integrity: sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg==}
     engines: {node: '>= 0.4'}
@@ -7850,6 +8287,10 @@ packages:
   hosted-git-info@2.8.9:
     resolution: {integrity: sha512-mxIDAb9Lsm6DoOJ7xH+5+X4y1LU/4Hi50L9C5sIswK3JzULS4bwk1FvjdBgvYR4bzT4tuUQiC15FE2f5HbLvYw==}
 
+  hosted-git-info@9.0.3:
+    resolution: {integrity: sha512-Hc+ghLoSt6QaYZUv0WBiIvmMDZuZZ7oaDvdH8MbfOO4lOsxdXLEvuC6ePoGs9H1X9oCLyq6+NVN0MKqD+ydxyg==}
+    engines: {node: ^20.17.0 || >=22.9.0}
+
   html-encoding-sniffer@3.0.0:
     resolution: {integrity: sha512-oWv4T4yJ52iKrufjnyZPkrN0CH3QnrUqdB6In1g5Fe1mia8GmF36gnfNySxoZtxD5+NmYw1EElVXiBk93UeskA==}
     engines: {node: '>=12'}
@@ -8411,6 +8852,10 @@ packages:
     resolution: {integrity: sha512-ekilCSN1jwRvIbgeg/57YFh8qQDNbwDb9xT/qu2DAHbFFZUicIl4ygVaAvzveMhMVr3LnpSKTNnwt8PoOfmKhQ==}
     hasBin: true
 
+  jiti@2.7.0:
+    resolution: {integrity: sha512-AC/7JofJvZGrrneWNaEnJeOLUx+JlGt7tNa0wZiRPT4MY1wmfKjt2+6O2p2uz2+skll8OZZmJMNqeke7kKbNgQ==}
+    hasBin: true
+
   jju@1.4.0:
     resolution: {integrity: sha512-8wb9Yw966OSxApiCt0K3yNJL8pnNeIv+OEq2YMidz4FKP6nonSRoOXc80iXY4JaN2FC11B9qsNmDsm+ZOfMROA==}
 
@@ -8434,10 +8879,6 @@ packages:
     resolution: {integrity: sha512-wpxZs9NoxZaJESJGIZTyDEaYpl0FKSA+FB9aJiyemKhMwkxQg63h4T1KJgUGHpTqPDNRcmmYLugrRjJlBtWvRA==}
     hasBin: true
 
-  js-yaml@4.1.1:
-    resolution: {integrity: sha512-qQKT4zQxXl8lLwBtHMWwaTcGfFOZviOJet3Oy/xmGk2gZH677CJM9EvtfdSkgWcATZhj/55JZ0rmy3myCT5lsA==}
-    hasBin: true
-
   jsbn@1.1.0:
     resolution: {integrity: sha512-4bYVV3aAMtDTTu4+xsDYa6sy9GyJ69/amsu9sYF2zqjiEoZA5xJi3BrfX3uY+/IekIu7MwdObdbDWpoZdBv3/A==}
 
@@ -8459,6 +8900,9 @@ packages:
     resolution: {integrity: sha512-r79EVB8jaNAZbq8hvanL8e8JGu2ZNr2bXdHC4ZdQhRImpSPpnWwm5DYVzQ5QxJmtGtKhNNuvqGgbNaFl604fEQ==}
     engines: {node: '>=6'}
 
+  json-bigint@1.0.0:
+    resolution: {integrity: sha512-SiPv/8VpZuWbvLSMtTDU8hEfrZWg/mH/nV/b4o0CYbSxu1UIQPLdwKOCIyLQX+VIPO5vrLX3i8qtqFyhdPSUSQ==}
+
   json-buffer@3.0.1:
     resolution: {integrity: sha512-4bV5BfR2mqfQTJm+V5tPPdf+ZpuhiIvTuAB5g8kcrXOZpTT/QwwVRWBywX1ozr6lEuPdbHxwaJlm9G6mI2sfSQ==}
 
@@ -8472,6 +8916,10 @@ packages:
   json-parse-even-better-errors@2.3.1:
     resolution: {integrity: sha512-xyFwyhro/JEof6Ghe2iz2NcXoj2sloNsWr/XsERDK/oiPCfaNhl5ONfp+jQdAZRQQ0IJWNzH9zIZF7li91kh2w==}
 
+  json-schema-to-ts@3.1.1:
+    resolution: {integrity: sha512-+DWg8jCJG2TEnpy7kOm/7/AxaYoaRbjVB4LFZLySZlWn8exGs3A4OLJR966cVvU26N7X9TWxl+Jsw7dzAqKT6g==}
+    engines: {node: '>=16'}
+
   json-schema-traverse@0.4.1:
     resolution: {integrity: sha512-xbbCH5dCYU5T8LcEhhuh7HJ88HXuW3qsI3Y0zOZFKfZEHcpWiHU/Jxzk629Brsab/mMiHQti9wMP+845RPe3Vg==}
 
@@ -8526,6 +8974,12 @@ packages:
     resolution: {integrity: sha512-Qush0uP+G8ZScpGMZvHUiRfI0YBWuB3gVBYlI0v0vvOJt5FLicco+IkP0a50LqTTQhmts/m6tP5SWE+USyIvcQ==}
     engines: {node: '>=12.20'}
 
+  jwa@2.0.1:
+    resolution: {integrity: sha512-hRF04fqJIP8Abbkq5NKGN0Bbr3JxlQ+qhZufXVr0DvujKy93ZCbXZMHDL4EOtodSbCWxOqR8MS1tXA5hwqCXDg==}
+
+  jws@4.0.1:
+    resolution: {integrity: sha512-EKI/M/yqPncGUUh44xz0PxSidXFr/+r0pA70+gIYhjv+et7yxM+s29Y+VGDkovRofQem0fs7Uvf4+YmAdyRduA==}
+
   keyv@4.5.4:
     resolution: {integrity: sha512-oxVHkHR/EJf2CNXnWxRLW6mg7JyCCUcG0DtEGmL2ctUo1PNTin1PUil+r/+4r5MpVgC/fn1kjsx7mjSujKqIpw==}
 
@@ -8803,10 +9257,6 @@ packages:
   lru-cache@10.4.3:
     resolution: {integrity: sha512-JNAzZcXrCt42VGLuYz0zfAzDfAvJWW6AfYlDBQyDV5DClI2m5sAmK+OIO7s59XfsRsWHp02jAJrRadPRGTt6SQ==}
 
-  lru-cache@11.0.2:
-    resolution: {integrity: sha512-123qHRfJBmo2jXDbo/a5YOQrJoHF/GNQTLzQ5+IdK5pWpceK17yRc6ozlWd25FxvGKQbIUs91fDFkXmDHTKcyA==}
-    engines: {node: 20 || >=22}
-
   lru-cache@11.3.5:
     resolution: {integrity: sha512-NxVFwLAnrd9i7KUBxC4DrUhmgjzOs+1Qm50D3oF1/oL+r1NpZ4gA7xvG0/zJ8evR7zIKn4vLf7qTNduWFtCrRw==}
     engines: {node: 20 || >=22}
@@ -8871,6 +9321,11 @@ packages:
   markdown-table@3.0.4:
     resolution: {integrity: sha512-wiYz4+JrLyb/DqW2hkFJxP7Vd7JuTDm77fvbM8VfEQdmSMqcImWeeRbHwZjBjIFki/VaMK2BhFi7oUUZeM5bqw==}
 
+  marked@15.0.12:
+    resolution: {integrity: sha512-8dD6FusOQSrpv9Z1rdNMdlSgQOIP880DHqnohobOmYLElGEqAL/JvxvuxZO16r4HtjTlfPRDC1hbvxC9dPN2nA==}
+    engines: {node: '>= 18'}
+    hasBin: true
+
   marky@1.3.0:
     resolution: {integrity: sha512-ocnPZQLNpvbedwTy9kNrQEsknEfgvcLMvOtz3sFeWApDq1MXH1TqkCIx58xlpESsfwQOnuBO9beyQuNGzVvuhQ==}
 
@@ -9441,8 +9896,20 @@ packages:
       '@swc/core':
         optional: true
 
-  object-assign@4.1.1:
-    resolution: {integrity: sha512-rJgTQnkUnH1sFw8yT6VSU3zD3sWmu6sZhIseY8VX+GRu3P6F7Fu+JNDoXfklElbLJSnc3FUQHVe4cU5hj+BcUg==}
+  nx@22.7.5:
+    resolution: {integrity: sha512-zoxsJabb33jl1QYnalDn0bicryrEBgSzdKp90d7VGGv/jDgzKrcLg/hw2ZxeYiOjWPIT/o8QNT9G9vTs4dv3AQ==}
+    hasBin: true
+    peerDependencies:
+      '@swc-node/register': ^1.11.1
+      '@swc/core': ^1.15.8
+    peerDependenciesMeta:
+      '@swc-node/register':
+        optional: true
+      '@swc/core':
+        optional: true
+
+  object-assign@4.1.1:
+    resolution: {integrity: sha512-rJgTQnkUnH1sFw8yT6VSU3zD3sWmu6sZhIseY8VX+GRu3P6F7Fu+JNDoXfklElbLJSnc3FUQHVe4cU5hj+BcUg==}
     engines: {node: '>=0.10.0'}
 
   object-inspect@1.13.4:
@@ -9509,6 +9976,18 @@ packages:
     resolution: {integrity: sha512-7x81NCL719oNbsq/3mh+hVrAWmFuEYUqrq/Iw3kUzH8ReypT9QQ0BLoJS7/G9k6N81XjW4qHWtjWwe/9eLy1EQ==}
     engines: {node: '>=12'}
 
+  openai@6.26.0:
+    resolution: {integrity: sha512-zd23dbWTjiJ6sSAX6s0HrCZi41JwTA1bQVs0wLQPZ2/5o2gxOJA5wh7yOAUgwYybfhDXyhwlpeQf7Mlgx8EOCA==}
+    hasBin: true
+    peerDependencies:
+      ws: ^8.18.0
+      zod: ^3.25 || ^4.0
+    peerDependenciesMeta:
+      ws:
+        optional: true
+      zod:
+        optional: true
+
   openai@6.3.0:
     resolution: {integrity: sha512-E6vOGtZvdcb4yXQ5jXvDlUG599OhIkb/GjBLZXS+qk0HF+PJReIldEc9hM8Ft81vn+N6dRdFRb7BZNK8bbvXrw==}
     hasBin: true
@@ -9723,6 +10202,9 @@ packages:
     resolution: {integrity: sha512-CiyeOxFT/JZyN5m0z9PfXw4SCBJ6Sygz1Dpl0wqjlhDEGGBP1GnsUVEL0p63hoG1fcj3fHynXi9NYO4nWOL+qQ==}
     engines: {node: '>= 0.8'}
 
+  partial-json@0.1.7:
+    resolution: {integrity: sha512-Njv/59hHaokb/hRUjce3Hdv12wd60MtM9Z5Olmn+nehe0QDAsRtRbJPvJ0Z91TusF0SuZRIvnM+S4l6EIP8leA==}
+
   path-browserify@1.0.1:
     resolution: {integrity: sha512-b7uo2UCUOYZcnF/3ID0lulOJi/bafxa1xPe7ZPsammBSpjSWQkjNxlt635YGS2MiR9GjvuXCtz2emr3jbsz98g==}
 
@@ -9738,6 +10220,10 @@ packages:
     resolution: {integrity: sha512-RjhtfwJOxzcFmNOi6ltcbcu4Iu+FL3zEj83dk4kAS+fVpTxXLO1b38RvJgT/0QwvV/L3aY9TAnyv0EOqW4GoMQ==}
     engines: {node: ^12.20.0 || ^14.13.1 || >=16.0.0}
 
+  path-expression-matcher@1.5.0:
+    resolution: {integrity: sha512-cbrerZV+6rvdQrrD+iGMcZFEiiSrbv9Tfdkvnusy6y0x0GKBXREFg/Y65GhIfm0tnLntThhzCnfKwp1WRjeCyQ==}
+    engines: {node: '>=14.0.0'}
+
   path-is-absolute@1.0.1:
     resolution: {integrity: sha512-AVbw3UJ2e9bq64vSaS9Am0fje1Pa8pbGqTTsmXfaIiMpnr5DlDhfJOuLj9Sf95ZPVDAUerDfEk88MPmPe7UCQg==}
     engines: {node: '>=0.10.0'}
@@ -9976,6 +10462,9 @@ packages:
     resolution: {integrity: sha512-NV8aTmpwrZv+Iys54sSFOBx3tuVaOBvvrft5PNppnxy9xpU/akHbaWIril22AB22zaPgrgwKdD0KsrM0ptUtpg==}
     engines: {node: '>=6'}
 
+  proper-lockfile@4.1.2:
+    resolution: {integrity: sha512-TjNPblN4BwAWMXU8s9AEz4JmQxnD1NNL7bNOY/AKUzyamc379FWASUhc/K1pL2noVb+XmZKLL68cjzLsiOAMaA==}
+
   property-information@6.5.0:
     resolution: {integrity: sha512-PgTgs/BlvHxOu8QuEN7wi5A0OmXaBcHpmCSTehcs6Uuu9IkDIEo13Hy7n898RHfrQ49vKCoGeWZSaAK01nwVig==}
 
@@ -9985,6 +10474,10 @@ packages:
   proto-list@1.2.4:
     resolution: {integrity: sha512-vtK/94akxsTMhe0/cbfpR+syPuszcuwhqVjJq26CuNDgFGj682oRBXOP5MJpv2r7JtE8MsiepGIqvvOTBwn2vA==}
 
+  protobufjs@7.6.2:
+    resolution: {integrity: sha512-N9EiLovGEQOJSPF26Ij7qUGvahfEnq0eeYZ02aigIedkmz1qZSwjnP9SBITHJuF/6MYbIW4HDN8zdYjsjqJKXQ==}
+    engines: {node: '>=12.0.0'}
+
   protobufjs@8.0.0:
     resolution: {integrity: sha512-jx6+sE9h/UryaCZhsJWbJtTEy47yXoGNYI4z8ZaRncM0zBKeRqjO2JEcOUYwrYGb1WLhXM1FfMzW3annvFv0rw==}
     engines: {node: '>=12.0.0'}
@@ -10000,6 +10493,10 @@ packages:
   proxy-from-env@1.1.0:
     resolution: {integrity: sha512-D+zkORCbA9f1tdWRK0RaCR3GPv50cMxcrz4X8k5LTSUD1Dkw47mKJEZQNunItRTkWwgtaUSo1RVFRIG9ZXiFYg==}
 
+  proxy-from-env@2.1.0:
+    resolution: {integrity: sha512-cJ+oHTW1VAEa8cJslgmUZrc+sjRKgAKl3Zyse6+PV38hZe/V6Z14TbCuXcan9F9ghlz4QrFr2c92TNF82UkYHA==}
+    engines: {node: '>=10'}
+
   prr@1.0.1:
     resolution: {integrity: sha512-yPw4Sng1gWghHQWj0B3ZggWUm4qVbPwPFcRG8KyxiU7J2OHFSoEHKS+EZ3fv5l1t9CyCiop6l/ZYeWbrgoQejw==}
 
@@ -10963,6 +11460,11 @@ packages:
     engines: {node: '>=10'}
     hasBin: true
 
+  semver@7.7.4:
+    resolution: {integrity: sha512-vFKC2IEtQnVhpT78h1Yp8wzwrf8CM+MzKMHGJZfBtzhZNycRFnXsHk6E5TxIkkMsgNS7mdX3AGB7x2QM2di4lA==}
+    engines: {node: '>=10'}
+    hasBin: true
+
   send@0.19.0:
     resolution: {integrity: sha512-dW41u5VfLXu8SJh5bwRmyYUbAoSB3c9uQh6L8h/KtsFREPWpbX1lrljJo186Jc4nmci/sGUZ9a0a0J2zgfq2hw==}
     engines: {node: '>= 0.8.0'}
@@ -11121,6 +11623,10 @@ packages:
     engines: {node: '>=6'}
     hasBin: true
 
+  smol-toml@1.6.1:
+    resolution: {integrity: sha512-dWUG8F5sIIARXih1DTaQAX4SsiTXhInKf1buxdY9DIg4ZYPZK5nGM1VRIYmEbDbsHt7USo99xSLFu5Q1IqTmsg==}
+    engines: {node: '>= 18'}
+
   snake-case@3.0.4:
     resolution: {integrity: sha512-LAOh4z89bGQvl9pFfNF8V146i7o7/CqFPbqzYgP+yYzDIDeS9HaNFtXABamRW+AQzEVODcvE79ljJ+8a9YSdMg==}
 
@@ -11363,6 +11869,9 @@ packages:
     resolution: {integrity: sha512-1tB5mhVo7U+ETBKNf92xT4hrQa3pm0MZ0PQvuDnWgAAGHDsfp4lPSpiS6psrSiet87wyGPh9ft6wmhOMQ0hDiw==}
     engines: {node: '>=14.16'}
 
+  strnum@2.3.0:
+    resolution: {integrity: sha512-ums3KNd42PGyx5xaoVTO1mjU1bH3NpY4vsrVlnv9PNGqQj8wd7rJ6nEypLrJ7z5vxK5RP0yMLo6J/Gsm62DI5Q==}
+
   strtok3@6.3.0:
     resolution: {integrity: sha512-fZtbhtvI9I48xDSywd/somNqgUHl2L2cstmXCCif0itOf96jeW18MBSyrLuNicYQVkvpOxkZtkzujiTJ9LW5Jw==}
     engines: {node: '>=10'}
@@ -11586,6 +12095,10 @@ packages:
     resolution: {integrity: sha512-voyz6MApa1rQGUxT3E+BK7/ROe8itEx7vD8/HEvt4xwXucvQ5G5oeEiHkmHZJuBO21RpOf+YYm9MOivj709jow==}
     engines: {node: '>=14.14'}
 
+  tmp@0.2.6:
+    resolution: {integrity: sha512-5sJPdPjfI5Kx+qbrDesxkglRBxW//g7hCsqspEjwkewGvBMGIKMOTKzLt1hFVJzyadba3lDUN20O9qhvbQUSTA==}
+    engines: {node: '>=14.14'}
+
   tn1150@0.1.0:
     resolution: {integrity: sha512-DbplOfQFkqG5IHcDyyrs/lkvSr3mPUVsFf/RbDppOshs22yTPnSJWEe6FkYd1txAwU/zcnR905ar2fi4kwF29w==}
     engines: {node: '>=0.12'}
@@ -11646,6 +12159,9 @@ packages:
   truncate-utf8-bytes@1.0.2:
     resolution: {integrity: sha512-95Pu1QXQvruGEhv62XCMO3Mm90GscOCClvrIUwCM0PYOXK3kaF3l3sIHxx71ThJfcbM2O5Au6SO3AWCSEfW4mQ==}
 
+  ts-algebra@2.0.0:
+    resolution: {integrity: sha512-FPAhNPFMrkwz76P7cdjdmiShwMynZYN6SgOujD1urY4oNm80Ou9oMdmbR45LotcKOXoy7wSmHkRFE6Mxbrhefw==}
+
   ts-checker-rspack-plugin@1.2.2:
     resolution: {integrity: sha512-I9TV5+vg9PfHgdWgmn2J2APfZ4YCszfo7hytKQXJ0bJsxR/MMRRsfyyc2cHCTDS7pyhosdug9WthVWiYdeGYtA==}
     peerDependencies:
@@ -11740,6 +12256,9 @@ packages:
     resolution: {integrity: sha512-OZs6gsjF4vMp32qrCbiVSkrFmXtG/AZhY3t0iAMrMBiAZyV9oALtXO8hsrHbMXF9x6L3grlFuwW2oAz7cav+Gw==}
     engines: {node: '>= 0.6'}
 
+  typebox@1.1.38:
+    resolution: {integrity: sha512-pZ0aQPmMmXoUvSbeuWf/Hzsc+avNw/Zd6VeE8CFgkVGWyuHPJvqeJJDeJqLve+K70LvjYIoleGcoJHPT17cWoA==}
+
   typed-array-buffer@1.0.3:
     resolution: {integrity: sha512-nAYYwfY3qnzX30IkA6AQZjVbtK6duGontcQm1WSG1MD94YLqK0515GNApXkoxKOWMusVssAHWLh9SeaoefYFGw==}
     engines: {node: '>= 0.4'}
@@ -11793,14 +12312,14 @@ packages:
     resolution: {integrity: sha512-hU/10obOIu62MGYjdskASR3CUAiYaFTtC9Pa6vHyf//mAipSvSQg6od2CnJswq7fvzNS3zJhxoRkgNVaHurWKw==}
     engines: {node: '>=18.17'}
 
-  undici@7.16.0:
-    resolution: {integrity: sha512-QEg3HPMll0o3t2ourKwOeUAZ159Kn9mx5pnzHRQO8+Wixmh88YdZRiIwat0iNzNNXn0yoEtXJqFpyW7eM8BV7g==}
-    engines: {node: '>=20.18.1'}
-
   undici@7.25.0:
     resolution: {integrity: sha512-xXnp4kTyor2Zq+J1FfPI6Eq3ew5h6Vl0F/8d9XU5zZQf1tX9s2Su1/3PiMmUANFULpmksxkClamIZcaUqryHsQ==}
     engines: {node: '>=20.18.1'}
 
+  undici@8.3.0:
+    resolution: {integrity: sha512-TkUDgb6tl7KOGZ+7e8E3d2FYgUQgF6z5YypqjWmixVQSQERFcVrVg0ySADm2LVLRh5ljAaHTCR5Fmz3Q34rB7Q==}
+    engines: {node: '>=22.19.0'}
+
   unhead@2.1.13:
     resolution: {integrity: sha512-jO9M1sI6b2h/1KpIu4Jeu+ptumLmUKboRRLxys5pYHFeT+lqTzfNHbYUX9bxVDhC1FBszAGuWcUVlmvIPsah8Q==}
 
@@ -12263,6 +12782,10 @@ packages:
     resolution: {integrity: sha512-EvGK8EJ3DhaHfbRlETOWAS5pO9MZITeauHKJyb8wyajUfQUenkIg2MvLDTZ4T/TgIcm3HU0TFBgWWboAZ30UHg==}
     engines: {node: '>=18'}
 
+  xml-naming@0.1.0:
+    resolution: {integrity: sha512-k8KO9hrMyNk6tUWqUfkTEZbezRRpONVOzUTnc97VnCvyj6Tf9lyUR9EDAIeiVLv56jsMcoXEwjW8Kv5yPY52lw==}
+    engines: {node: '>=16.0.0'}
+
   xml-parse-from-string@1.0.1:
     resolution: {integrity: sha512-ErcKwJTF54uRzzNMXq2X5sMIy88zJvfN2DmdoQvy7PAFJ+tPRU6ydWuOKNMyfmOjdyBQTFREi60s0Y0SyI0G0g==}
 
@@ -12318,6 +12841,11 @@ packages:
     engines: {node: '>= 14.6'}
     hasBin: true
 
+  yaml@2.9.0:
+    resolution: {integrity: sha512-2AvhNX3mb8zd6Zy7INTtSpl1F15HW6Wnqj0srWlkKLcpYl/gMIMJiyuGq2KeI2YFxUPjdlB+3Lc10seMLtL4cA==}
+    engines: {node: '>= 14.6'}
+    hasBin: true
+
   yargs-parser@18.1.3:
     resolution: {integrity: sha512-o50j0JeToy/4K6OZcaQmW6lyXXKhq7csREXcDwk2omFPJEwUNOVtJKvmDr9EI1fAJZUyZcRF7kxGBWmRXudrCQ==}
     engines: {node: '>=6'}
@@ -12391,6 +12919,11 @@ packages:
     peerDependencies:
       zod: ^3.24.1
 
+  zod-to-json-schema@3.25.2:
+    resolution: {integrity: sha512-O/PgfnpT1xKSDeQYSCfRI5Gy3hPf91mKVDuYLUHZJMiDFptvP41MSnWofm8dnCm0256ZNfZIM7DSzuSMAFnjHA==}
+    peerDependencies:
+      zod: ^3.25.28 || ^4
+
   zod@3.25.76:
     resolution: {integrity: sha512-gzUt/qt81nXsFGKIFcC3YnfEAx5NkunCfnDlvuBSSFS02bcXu4Lmea0AFIUwbLWxWPx3d9p8S5QoaujKcNQxcQ==}
 
@@ -12428,14 +12961,14 @@ snapshots:
   '@ant-design/cssinjs-utils@1.1.1(react-dom@18.3.1(react@18.3.1))(react@18.3.1)':
     dependencies:
       '@ant-design/cssinjs': 1.21.1(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
       react-dom: 18.3.1(react@18.3.1)
 
   '@ant-design/cssinjs@1.21.1(react-dom@18.3.1(react@18.3.1))(react@18.3.1)':
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       '@emotion/hash': 0.8.0
       '@emotion/unitless': 0.7.5
       classnames: 2.5.1
@@ -12463,13 +12996,19 @@ snapshots:
 
   '@ant-design/react-slick@1.1.2(react@18.3.1)':
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       json2mq: 0.2.0
       react: 18.3.1
       resize-observer-polyfill: 1.5.1
       throttle-debounce: 5.0.2
 
+  '@anthropic-ai/sdk@0.91.1(zod@3.25.76)':
+    dependencies:
+      json-schema-to-ts: 3.1.1
+    optionalDependencies:
+      zod: 3.25.76
+
   '@appium/logger@1.6.1':
     dependencies:
       console-control-strings: 1.1.0
@@ -12591,6 +13130,229 @@ snapshots:
       '@ast-grep/napi-win32-ia32-msvc': 0.37.0
       '@ast-grep/napi-win32-x64-msvc': 0.37.0
 
+  '@aws-crypto/crc32@5.2.0':
+    dependencies:
+      '@aws-crypto/util': 5.2.0
+      '@aws-sdk/types': 3.973.10
+      tslib: 2.8.1
+
+  '@aws-crypto/sha256-browser@5.2.0':
+    dependencies:
+      '@aws-crypto/sha256-js': 5.2.0
+      '@aws-crypto/supports-web-crypto': 5.2.0
+      '@aws-crypto/util': 5.2.0
+      '@aws-sdk/types': 3.973.10
+      '@aws-sdk/util-locate-window': 3.965.5
+      '@smithy/util-utf8': 2.3.0
+      tslib: 2.8.1
+
+  '@aws-crypto/sha256-js@5.2.0':
+    dependencies:
+      '@aws-crypto/util': 5.2.0
+      '@aws-sdk/types': 3.973.10
+      tslib: 2.8.1
+
+  '@aws-crypto/supports-web-crypto@5.2.0':
+    dependencies:
+      tslib: 2.8.1
+
+  '@aws-crypto/util@5.2.0':
+    dependencies:
+      '@aws-sdk/types': 3.973.10
+      '@smithy/util-utf8': 2.3.0
+      tslib: 2.8.1
+
+  '@aws-sdk/client-bedrock-runtime@3.1048.0':
+    dependencies:
+      '@aws-crypto/sha256-browser': 5.2.0
+      '@aws-crypto/sha256-js': 5.2.0
+      '@aws-sdk/core': 3.974.17
+      '@aws-sdk/credential-provider-node': 3.972.50
+      '@aws-sdk/eventstream-handler-node': 3.972.19
+      '@aws-sdk/middleware-eventstream': 3.972.15
+      '@aws-sdk/middleware-websocket': 3.972.25
+      '@aws-sdk/token-providers': 3.1048.0
+      '@aws-sdk/types': 3.973.10
+      '@smithy/core': 3.24.6
+      '@smithy/fetch-http-handler': 5.4.6
+      '@smithy/node-http-handler': 4.7.3
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/core@3.974.17':
+    dependencies:
+      '@aws-sdk/types': 3.973.10
+      '@aws-sdk/xml-builder': 3.972.27
+      '@aws/lambda-invoke-store': 0.2.4
+      '@smithy/core': 3.24.6
+      '@smithy/signature-v4': 5.4.6
+      '@smithy/types': 4.14.3
+      bowser: 2.14.1
+      tslib: 2.8.1
+
+  '@aws-sdk/credential-provider-env@3.972.43':
+    dependencies:
+      '@aws-sdk/core': 3.974.17
+      '@aws-sdk/types': 3.973.10
+      '@smithy/core': 3.24.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/credential-provider-http@3.972.45':
+    dependencies:
+      '@aws-sdk/core': 3.974.17
+      '@aws-sdk/types': 3.973.10
+      '@smithy/core': 3.24.6
+      '@smithy/fetch-http-handler': 5.4.6
+      '@smithy/node-http-handler': 4.7.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/credential-provider-ini@3.972.48':
+    dependencies:
+      '@aws-sdk/core': 3.974.17
+      '@aws-sdk/credential-provider-env': 3.972.43
+      '@aws-sdk/credential-provider-http': 3.972.45
+      '@aws-sdk/credential-provider-login': 3.972.47
+      '@aws-sdk/credential-provider-process': 3.972.43
+      '@aws-sdk/credential-provider-sso': 3.972.47
+      '@aws-sdk/credential-provider-web-identity': 3.972.47
+      '@aws-sdk/nested-clients': 3.997.15
+      '@aws-sdk/types': 3.973.10
+      '@smithy/core': 3.24.6
+      '@smithy/credential-provider-imds': 4.3.7
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/credential-provider-login@3.972.47':
+    dependencies:
+      '@aws-sdk/core': 3.974.17
+      '@aws-sdk/nested-clients': 3.997.15
+      '@aws-sdk/types': 3.973.10
+      '@smithy/core': 3.24.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/credential-provider-node@3.972.50':
+    dependencies:
+      '@aws-sdk/credential-provider-env': 3.972.43
+      '@aws-sdk/credential-provider-http': 3.972.45
+      '@aws-sdk/credential-provider-ini': 3.972.48
+      '@aws-sdk/credential-provider-process': 3.972.43
+      '@aws-sdk/credential-provider-sso': 3.972.47
+      '@aws-sdk/credential-provider-web-identity': 3.972.47
+      '@aws-sdk/types': 3.973.10
+      '@smithy/core': 3.24.6
+      '@smithy/credential-provider-imds': 4.3.7
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/credential-provider-process@3.972.43':
+    dependencies:
+      '@aws-sdk/core': 3.974.17
+      '@aws-sdk/types': 3.973.10
+      '@smithy/core': 3.24.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/credential-provider-sso@3.972.47':
+    dependencies:
+      '@aws-sdk/core': 3.974.17
+      '@aws-sdk/nested-clients': 3.997.15
+      '@aws-sdk/token-providers': 3.1060.0
+      '@aws-sdk/types': 3.973.10
+      '@smithy/core': 3.24.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/credential-provider-web-identity@3.972.47':
+    dependencies:
+      '@aws-sdk/core': 3.974.17
+      '@aws-sdk/nested-clients': 3.997.15
+      '@aws-sdk/types': 3.973.10
+      '@smithy/core': 3.24.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/eventstream-handler-node@3.972.19':
+    dependencies:
+      '@aws-sdk/types': 3.973.10
+      '@smithy/core': 3.24.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/middleware-eventstream@3.972.15':
+    dependencies:
+      '@aws-sdk/types': 3.973.10
+      '@smithy/core': 3.24.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/middleware-websocket@3.972.25':
+    dependencies:
+      '@aws-sdk/core': 3.974.17
+      '@aws-sdk/types': 3.973.10
+      '@smithy/core': 3.24.6
+      '@smithy/fetch-http-handler': 5.4.6
+      '@smithy/signature-v4': 5.4.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/nested-clients@3.997.15':
+    dependencies:
+      '@aws-crypto/sha256-browser': 5.2.0
+      '@aws-crypto/sha256-js': 5.2.0
+      '@aws-sdk/core': 3.974.17
+      '@aws-sdk/signature-v4-multi-region': 3.996.31
+      '@aws-sdk/types': 3.973.10
+      '@smithy/core': 3.24.6
+      '@smithy/fetch-http-handler': 5.4.6
+      '@smithy/node-http-handler': 4.7.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/signature-v4-multi-region@3.996.31':
+    dependencies:
+      '@aws-sdk/types': 3.973.10
+      '@smithy/signature-v4': 5.4.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/token-providers@3.1048.0':
+    dependencies:
+      '@aws-sdk/core': 3.974.17
+      '@aws-sdk/nested-clients': 3.997.15
+      '@aws-sdk/types': 3.973.10
+      '@smithy/core': 3.24.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/token-providers@3.1060.0':
+    dependencies:
+      '@aws-sdk/core': 3.974.17
+      '@aws-sdk/nested-clients': 3.997.15
+      '@aws-sdk/types': 3.973.10
+      '@smithy/core': 3.24.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/types@3.973.10':
+    dependencies:
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@aws-sdk/util-locate-window@3.965.5':
+    dependencies:
+      tslib: 2.8.1
+
+  '@aws-sdk/xml-builder@3.972.27':
+    dependencies:
+      '@smithy/types': 4.14.3
+      fast-xml-parser: 5.7.3
+      tslib: 2.8.1
+
+  '@aws/lambda-invoke-store@0.2.4': {}
+
   '@babel/code-frame@7.26.2':
     dependencies:
       '@babel/helper-validator-identifier': 7.27.1
@@ -13176,11 +13938,79 @@ snapshots:
 
   '@discoveryjs/json-ext@0.5.7': {}
 
+  '@earendil-works/pi-agent-core@0.78.0(bufferutil@4.0.9)(utf-8-validate@6.0.5)(ws@8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5))(zod@3.25.76)':
+    dependencies:
+      '@earendil-works/pi-ai': 0.78.0(bufferutil@4.0.9)(utf-8-validate@6.0.5)(ws@8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5))(zod@3.25.76)
+      ignore: 7.0.5
+      typebox: 1.1.38
+      yaml: 2.9.0
+    transitivePeerDependencies:
+      - '@modelcontextprotocol/sdk'
+      - bufferutil
+      - supports-color
+      - utf-8-validate
+      - ws
+      - zod
+
+  '@earendil-works/pi-ai@0.78.0(bufferutil@4.0.9)(utf-8-validate@6.0.5)(ws@8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5))(zod@3.25.76)':
+    dependencies:
+      '@anthropic-ai/sdk': 0.91.1(zod@3.25.76)
+      '@aws-sdk/client-bedrock-runtime': 3.1048.0
+      '@google/genai': 1.52.0(bufferutil@4.0.9)(utf-8-validate@6.0.5)
+      '@mistralai/mistralai': 2.2.1(bufferutil@4.0.9)(utf-8-validate@6.0.5)
+      '@smithy/node-http-handler': 4.7.3
+      http-proxy-agent: 7.0.2
+      https-proxy-agent: 7.0.6
+      openai: 6.26.0(ws@8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5))(zod@3.25.76)
+      partial-json: 0.1.7
+      typebox: 1.1.38
+    transitivePeerDependencies:
+      - '@modelcontextprotocol/sdk'
+      - bufferutil
+      - supports-color
+      - utf-8-validate
+      - ws
+      - zod
+
+  '@earendil-works/pi-coding-agent@0.78.0(bufferutil@4.0.9)(utf-8-validate@6.0.5)(ws@8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5))(zod@3.25.76)':
+    dependencies:
+      '@earendil-works/pi-agent-core': 0.78.0(bufferutil@4.0.9)(utf-8-validate@6.0.5)(ws@8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5))(zod@3.25.76)
+      '@earendil-works/pi-ai': 0.78.0(bufferutil@4.0.9)(utf-8-validate@6.0.5)(ws@8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5))(zod@3.25.76)
+      '@earendil-works/pi-tui': 0.78.0
+      '@silvia-odwyer/photon-node': 0.3.4
+      chalk: 5.6.2
+      cross-spawn: 7.0.6
+      diff: 8.0.4
+      glob: 13.0.6
+      highlight.js: 10.7.3
+      hosted-git-info: 9.0.3
+      ignore: 7.0.5
+      jiti: 2.7.0
+      minimatch: 10.2.5
+      proper-lockfile: 4.1.2
+      typebox: 1.1.38
+      undici: 8.3.0
+      yaml: 2.9.0
+    optionalDependencies:
+      '@mariozechner/clipboard': 0.3.9
+    transitivePeerDependencies:
+      - '@modelcontextprotocol/sdk'
+      - bufferutil
+      - supports-color
+      - utf-8-validate
+      - ws
+      - zod
+
+  '@earendil-works/pi-tui@0.78.0':
+    dependencies:
+      get-east-asian-width: 1.6.0
+      marked: 15.0.12
+
   '@electron/asar@4.2.0':
     dependencies:
       commander: 13.1.0
       glob: 13.0.6
-      minimatch: 10.0.3
+      minimatch: 10.2.5
 
   '@electron/get@2.0.3':
     dependencies:
@@ -13274,30 +14104,43 @@ snapshots:
     dependencies:
       '@emnapi/wasi-threads': 1.2.1
       tslib: 2.8.1
-    optional: true
+
+  '@emnapi/core@1.4.5':
+    dependencies:
+      '@emnapi/wasi-threads': 1.0.4
+      tslib: 2.8.1
 
   '@emnapi/core@1.7.1':
     dependencies:
       '@emnapi/wasi-threads': 1.1.0
       tslib: 2.8.1
+    optional: true
 
   '@emnapi/runtime@1.10.0':
     dependencies:
       tslib: 2.8.1
-    optional: true
+
+  '@emnapi/runtime@1.4.5':
+    dependencies:
+      tslib: 2.8.1
 
   '@emnapi/runtime@1.7.1':
     dependencies:
       tslib: 2.8.1
+    optional: true
+
+  '@emnapi/wasi-threads@1.0.4':
+    dependencies:
+      tslib: 2.8.1
 
   '@emnapi/wasi-threads@1.1.0':
     dependencies:
       tslib: 2.8.1
+    optional: true
 
   '@emnapi/wasi-threads@1.2.1':
     dependencies:
       tslib: 2.8.1
-    optional: true
 
   '@emotion/hash@0.8.0': {}
 
@@ -13556,6 +14399,17 @@ snapshots:
 
   '@fregante/relaxed-json@2.0.0': {}
 
+  '@google/genai@1.52.0(bufferutil@4.0.9)(utf-8-validate@6.0.5)':
+    dependencies:
+      google-auth-library: 10.6.2
+      p-retry: 4.6.2
+      protobufjs: 7.6.2
+      ws: 8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5)
+    transitivePeerDependencies:
+      - bufferutil
+      - supports-color
+      - utf-8-validate
+
   '@humanwhocodes/config-array@0.13.0':
     dependencies:
       '@humanwhocodes/object-schema': 2.0.3
@@ -13975,11 +14829,13 @@ snapshots:
     optionalDependencies:
       '@types/node': 18.19.62
 
-  '@isaacs/balanced-match@4.0.1': {}
+  '@isaacs/balanced-match@4.0.1':
+    optional: true
 
   '@isaacs/brace-expansion@5.0.1':
     dependencies:
       '@isaacs/balanced-match': 4.0.1
+    optional: true
 
   '@isaacs/cliui@8.0.2':
     dependencies:
@@ -14335,6 +15191,50 @@ snapshots:
       js-yaml: 4.1.0
       tinyglobby: 0.2.15
 
+  '@mariozechner/clipboard-darwin-arm64@0.3.9':
+    optional: true
+
+  '@mariozechner/clipboard-darwin-universal@0.3.9':
+    optional: true
+
+  '@mariozechner/clipboard-darwin-x64@0.3.9':
+    optional: true
+
+  '@mariozechner/clipboard-linux-arm64-gnu@0.3.9':
+    optional: true
+
+  '@mariozechner/clipboard-linux-arm64-musl@0.3.9':
+    optional: true
+
+  '@mariozechner/clipboard-linux-riscv64-gnu@0.3.9':
+    optional: true
+
+  '@mariozechner/clipboard-linux-x64-gnu@0.3.9':
+    optional: true
+
+  '@mariozechner/clipboard-linux-x64-musl@0.3.9':
+    optional: true
+
+  '@mariozechner/clipboard-win32-arm64-msvc@0.3.9':
+    optional: true
+
+  '@mariozechner/clipboard-win32-x64-msvc@0.3.9':
+    optional: true
+
+  '@mariozechner/clipboard@0.3.9':
+    optionalDependencies:
+      '@mariozechner/clipboard-darwin-arm64': 0.3.9
+      '@mariozechner/clipboard-darwin-universal': 0.3.9
+      '@mariozechner/clipboard-darwin-x64': 0.3.9
+      '@mariozechner/clipboard-linux-arm64-gnu': 0.3.9
+      '@mariozechner/clipboard-linux-arm64-musl': 0.3.9
+      '@mariozechner/clipboard-linux-riscv64-gnu': 0.3.9
+      '@mariozechner/clipboard-linux-x64-gnu': 0.3.9
+      '@mariozechner/clipboard-linux-x64-musl': 0.3.9
+      '@mariozechner/clipboard-win32-arm64-msvc': 0.3.9
+      '@mariozechner/clipboard-win32-x64-msvc': 0.3.9
+    optional: true
+
   '@mdn/browser-compat-data@7.1.7': {}
 
   '@mdx-js/mdx@3.1.1':
@@ -14468,6 +15368,15 @@ snapshots:
   '@microsoft/tsdoc@0.15.1':
     optional: true
 
+  '@mistralai/mistralai@2.2.1(bufferutil@4.0.9)(utf-8-validate@6.0.5)':
+    dependencies:
+      ws: 8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5)
+      zod: 3.25.76
+      zod-to-json-schema: 3.25.2(zod@3.25.76)
+    transitivePeerDependencies:
+      - bufferutil
+      - utf-8-validate
+
   '@modelcontextprotocol/inspector-cli@0.16.3':
     dependencies:
       '@modelcontextprotocol/sdk': 1.17.2
@@ -14514,7 +15423,7 @@ snapshots:
       '@modelcontextprotocol/sdk': 1.17.2
       cors: 2.8.5
       express: 5.1.0
-      ws: 8.18.3(bufferutil@4.0.9)(utf-8-validate@6.0.5)
+      ws: 8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5)
       zod: 3.25.76
     transitivePeerDependencies:
       - bufferutil
@@ -14628,8 +15537,8 @@ snapshots:
 
   '@napi-rs/wasm-runtime@0.2.4':
     dependencies:
-      '@emnapi/core': 1.7.1
-      '@emnapi/runtime': 1.7.1
+      '@emnapi/core': 1.10.0
+      '@emnapi/runtime': 1.10.0
       '@tybys/wasm-util': 0.9.0
 
   '@napi-rs/wasm-runtime@1.0.7':
@@ -14653,6 +15562,8 @@ snapshots:
       '@tybys/wasm-util': 0.10.1
     optional: true
 
+  '@nodable/entities@2.1.1': {}
+
   '@nodelib/fs.scandir@2.1.5':
     dependencies:
       '@nodelib/fs.stat': 2.0.5
@@ -14668,33 +15579,63 @@ snapshots:
   '@nx/nx-darwin-arm64@22.1.3':
     optional: true
 
+  '@nx/nx-darwin-arm64@22.7.5':
+    optional: true
+
   '@nx/nx-darwin-x64@22.1.3':
     optional: true
 
+  '@nx/nx-darwin-x64@22.7.5':
+    optional: true
+
   '@nx/nx-freebsd-x64@22.1.3':
     optional: true
 
+  '@nx/nx-freebsd-x64@22.7.5':
+    optional: true
+
   '@nx/nx-linux-arm-gnueabihf@22.1.3':
     optional: true
 
+  '@nx/nx-linux-arm-gnueabihf@22.7.5':
+    optional: true
+
   '@nx/nx-linux-arm64-gnu@22.1.3':
     optional: true
 
+  '@nx/nx-linux-arm64-gnu@22.7.5':
+    optional: true
+
   '@nx/nx-linux-arm64-musl@22.1.3':
     optional: true
 
+  '@nx/nx-linux-arm64-musl@22.7.5':
+    optional: true
+
   '@nx/nx-linux-x64-gnu@22.1.3':
     optional: true
 
+  '@nx/nx-linux-x64-gnu@22.7.5':
+    optional: true
+
   '@nx/nx-linux-x64-musl@22.1.3':
     optional: true
 
+  '@nx/nx-linux-x64-musl@22.7.5':
+    optional: true
+
   '@nx/nx-win32-arm64-msvc@22.1.3':
     optional: true
 
+  '@nx/nx-win32-arm64-msvc@22.7.5':
+    optional: true
+
   '@nx/nx-win32-x64-msvc@22.1.3':
     optional: true
 
+  '@nx/nx-win32-x64-msvc@22.7.5':
+    optional: true
+
   '@opentelemetry/api-logs@0.210.0':
     dependencies:
       '@opentelemetry/api': 1.9.0
@@ -14792,39 +15733,46 @@ snapshots:
 
   '@polka/url@1.0.0-next.28': {}
 
-  '@protobufjs/aspromise@1.1.2':
-    optional: true
+  '@protobufjs/aspromise@1.1.2': {}
 
-  '@protobufjs/base64@1.1.2':
-    optional: true
+  '@protobufjs/base64@1.1.2': {}
 
   '@protobufjs/codegen@2.0.4':
     optional: true
 
+  '@protobufjs/codegen@2.0.5': {}
+
   '@protobufjs/eventemitter@1.1.0':
     optional: true
 
+  '@protobufjs/eventemitter@1.1.1': {}
+
   '@protobufjs/fetch@1.1.0':
     dependencies:
       '@protobufjs/aspromise': 1.1.2
       '@protobufjs/inquire': 1.1.0
     optional: true
 
-  '@protobufjs/float@1.0.2':
-    optional: true
+  '@protobufjs/fetch@1.1.1':
+    dependencies:
+      '@protobufjs/aspromise': 1.1.2
+
+  '@protobufjs/float@1.0.2': {}
 
   '@protobufjs/inquire@1.1.0':
     optional: true
 
-  '@protobufjs/path@1.1.2':
-    optional: true
+  '@protobufjs/inquire@1.1.2': {}
 
-  '@protobufjs/pool@1.1.0':
-    optional: true
+  '@protobufjs/path@1.1.2': {}
+
+  '@protobufjs/pool@1.1.0': {}
 
   '@protobufjs/utf8@1.1.0':
     optional: true
 
+  '@protobufjs/utf8@1.1.1': {}
+
   '@puppeteer/browsers@2.9.0':
     dependencies:
       debug: 4.4.0
@@ -15222,7 +16170,7 @@ snapshots:
   '@rc-component/color-picker@2.0.1(react-dom@18.3.1(react@18.3.1))(react@18.3.1)':
     dependencies:
       '@ant-design/fast-color': 2.0.6
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
@@ -15241,7 +16189,7 @@ snapshots:
 
   '@rc-component/mutate-observer@1.1.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)':
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
@@ -15257,7 +16205,7 @@ snapshots:
 
   '@rc-component/qrcode@1.0.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)':
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
@@ -15265,7 +16213,7 @@ snapshots:
 
   '@rc-component/tour@1.15.1(react-dom@18.3.1(react@18.3.1))(react@18.3.1)':
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       '@rc-component/portal': 1.1.2(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       '@rc-component/trigger': 2.2.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       classnames: 2.5.1
@@ -15275,7 +16223,7 @@ snapshots:
 
   '@rc-component/trigger@2.2.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)':
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       '@rc-component/portal': 1.1.2(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       classnames: 2.5.1
       rc-motion: 2.9.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -16166,6 +17114,8 @@ snapshots:
 
   '@silvia-odwyer/photon-node@0.3.3': {}
 
+  '@silvia-odwyer/photon-node@0.3.4': {}
+
   '@silvia-odwyer/photon@0.3.3': {}
 
   '@sinclair/typebox@0.34.41': {}
@@ -16176,6 +17126,60 @@ snapshots:
 
   '@sindresorhus/merge-streams@4.0.0': {}
 
+  '@smithy/core@3.24.6':
+    dependencies:
+      '@aws-crypto/crc32': 5.2.0
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@smithy/credential-provider-imds@4.3.7':
+    dependencies:
+      '@smithy/core': 3.24.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@smithy/fetch-http-handler@5.4.6':
+    dependencies:
+      '@smithy/core': 3.24.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@smithy/is-array-buffer@2.2.0':
+    dependencies:
+      tslib: 2.8.1
+
+  '@smithy/node-http-handler@4.7.3':
+    dependencies:
+      '@smithy/core': 3.24.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@smithy/node-http-handler@4.7.6':
+    dependencies:
+      '@smithy/core': 3.24.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@smithy/signature-v4@5.4.6':
+    dependencies:
+      '@smithy/core': 3.24.6
+      '@smithy/types': 4.14.3
+      tslib: 2.8.1
+
+  '@smithy/types@4.14.3':
+    dependencies:
+      tslib: 2.8.1
+
+  '@smithy/util-buffer-from@2.2.0':
+    dependencies:
+      '@smithy/is-array-buffer': 2.2.0
+      tslib: 2.8.1
+
+  '@smithy/util-utf8@2.3.0':
+    dependencies:
+      '@smithy/util-buffer-from': 2.2.0
+      tslib: 2.8.1
+
   '@socket.io/component-emitter@3.1.2': {}
 
   '@svgr/babel-plugin-add-jsx-attribute@8.0.0(@babel/core@7.26.10)':
@@ -16273,7 +17277,7 @@ snapshots:
     dependencies:
       '@ampproject/remapping': 2.3.0
       enhanced-resolve: 5.20.1
-      jiti: 2.6.1
+      jiti: 2.7.0
       lightningcss: 1.30.1
       magic-string: 0.30.17
       source-map-js: 1.2.1
@@ -17327,6 +18331,14 @@ snapshots:
     transitivePeerDependencies:
       - debug
 
+  axios@1.16.0:
+    dependencies:
+      follow-redirects: 1.16.0
+      form-data: 4.0.5
+      proxy-from-env: 2.1.0
+    transitivePeerDependencies:
+      - debug
+
   axios@1.8.3:
     dependencies:
       follow-redirects: 1.15.9
@@ -17352,6 +18364,8 @@ snapshots:
 
   balanced-match@1.0.2: {}
 
+  balanced-match@4.0.3: {}
+
   balanced-match@4.0.4: {}
 
   bare-events@2.5.0:
@@ -17424,6 +18438,8 @@ snapshots:
 
   big-integer@1.6.52: {}
 
+  bignumber.js@9.3.1: {}
+
   binary-extensions@2.3.0: {}
 
   bindings@1.5.0:
@@ -17480,6 +18496,8 @@ snapshots:
   boolean@3.2.0:
     optional: true
 
+  bowser@2.14.1: {}
+
   boxen@8.0.1:
     dependencies:
       ansi-align: 3.0.1
@@ -17517,7 +18535,7 @@ snapshots:
     dependencies:
       balanced-match: 1.0.2
 
-  brace-expansion@5.0.5:
+  brace-expansion@5.0.6:
     dependencies:
       balanced-match: 4.0.4
 
@@ -17604,6 +18622,8 @@ snapshots:
 
   buffer-crc32@1.0.0: {}
 
+  buffer-equal-constant-time@1.0.1: {}
+
   buffer-equal@0.0.1: {}
 
   buffer-from@1.1.2: {}
@@ -17765,7 +18785,7 @@ snapshots:
       commander: 11.1.0
       edit-json-file: 1.8.1
       globby: 13.2.2
-      js-yaml: 4.1.1
+      js-yaml: 4.1.0
       semver: 7.5.2
       table: 6.9.0
       type-fest: 4.41.0
@@ -17792,7 +18812,7 @@ snapshots:
       parse5: 7.3.0
       parse5-htmlparser2-tree-adapter: 7.1.0
       parse5-parser-stream: 7.1.2
-      undici: 7.16.0
+      undici: 7.25.0
       whatwg-mimetype: 4.0.0
 
   chokidar@3.6.0:
@@ -18133,7 +19153,7 @@ snapshots:
     dependencies:
       '@types/node': 18.19.130
       cosmiconfig: 9.0.0(typescript@5.8.3)
-      jiti: 2.6.1
+      jiti: 2.7.0
       typescript: 5.8.3
 
   cosmiconfig@8.3.6(typescript@5.8.3):
@@ -18149,7 +19169,7 @@ snapshots:
     dependencies:
       env-paths: 2.2.1
       import-fresh: 3.3.1
-      js-yaml: 4.1.1
+      js-yaml: 4.1.0
       parse-json: 5.2.0
     optionalDependencies:
       typescript: 5.8.3
@@ -18464,6 +19484,8 @@ snapshots:
 
   diff@4.0.2: {}
 
+  diff@8.0.4: {}
+
   diffie-hellman@5.0.3:
     dependencies:
       bn.js: 4.12.0
@@ -18522,6 +19544,10 @@ snapshots:
     dependencies:
       dotenv: 16.4.7
 
+  dotenv-expand@12.0.3:
+    dependencies:
+      dotenv: 16.4.7
+
   dotenv@16.4.5: {}
 
   dotenv@16.4.7: {}
@@ -18543,6 +19569,10 @@ snapshots:
 
   eastasianwidth@0.2.0: {}
 
+  ecdsa-sig-formatter@1.0.11:
+    dependencies:
+      safe-buffer: 5.2.1
+
   edit-json-file@1.8.1:
     dependencies:
       find-value: 1.0.13
@@ -18553,6 +19583,8 @@ snapshots:
 
   ee-first@1.1.1: {}
 
+  ejs@5.0.1: {}
+
   electron-to-chromium@1.5.182: {}
 
   electron-to-chromium@1.5.260: {}
@@ -19181,6 +20213,18 @@ snapshots:
 
   fast-uri@3.1.0: {}
 
+  fast-xml-builder@1.2.0:
+    dependencies:
+      path-expression-matcher: 1.5.0
+      xml-naming: 0.1.0
+
+  fast-xml-parser@5.7.3:
+    dependencies:
+      '@nodable/entities': 2.1.1
+      fast-xml-builder: 1.2.0
+      path-expression-matcher: 1.5.0
+      strnum: 2.3.0
+
   fastq@1.19.1:
     dependencies:
       reusify: 1.1.0
@@ -19354,6 +20398,8 @@ snapshots:
 
   follow-redirects@1.15.9: {}
 
+  follow-redirects@1.16.0: {}
+
   for-each@0.3.5:
     dependencies:
       is-callable: 1.2.7
@@ -19490,6 +20536,22 @@ snapshots:
     transitivePeerDependencies:
       - supports-color
 
+  gaxios@7.1.4:
+    dependencies:
+      extend: 3.0.2
+      https-proxy-agent: 7.0.6
+      node-fetch: 3.3.2
+    transitivePeerDependencies:
+      - supports-color
+
+  gcp-metadata@8.1.2:
+    dependencies:
+      gaxios: 7.1.4
+      google-logging-utils: 1.1.3
+      json-bigint: 1.0.0
+    transitivePeerDependencies:
+      - supports-color
+
   generate-function@2.3.1:
     dependencies:
       is-property: 1.0.2
@@ -19508,6 +20570,8 @@ snapshots:
 
   get-east-asian-width@1.4.0: {}
 
+  get-east-asian-width@1.6.0: {}
+
   get-intrinsic@1.3.0:
     dependencies:
       call-bind-apply-helpers: 1.0.2
@@ -19604,7 +20668,7 @@ snapshots:
       foreground-child: 3.3.0
       jackspeak: 3.4.3
       minimatch: 9.0.5
-      minipass: 7.1.2
+      minipass: 7.1.3
       package-json-from-dist: 1.0.1
       path-scurry: 1.11.1
 
@@ -19707,6 +20771,19 @@ snapshots:
       merge2: 1.4.1
       slash: 4.0.0
 
+  google-auth-library@10.6.2:
+    dependencies:
+      base64-js: 1.5.1
+      ecdsa-sig-formatter: 1.0.11
+      gaxios: 7.1.4
+      gcp-metadata: 8.1.2
+      google-logging-utils: 1.1.3
+      jws: 4.0.1
+    transitivePeerDependencies:
+      - supports-color
+
+  google-logging-utils@1.1.3: {}
+
   gopd@1.2.0: {}
 
   got@11.8.6:
@@ -19941,6 +21018,10 @@ snapshots:
 
   hosted-git-info@2.8.9: {}
 
+  hosted-git-info@9.0.3:
+    dependencies:
+      lru-cache: 11.3.5
+
   html-encoding-sniffer@3.0.0:
     dependencies:
       whatwg-encoding: 2.0.0
@@ -20494,6 +21575,8 @@ snapshots:
 
   jiti@2.6.1: {}
 
+  jiti@2.7.0: {}
+
   jju@1.4.0: {}
 
   jose@5.9.6: {}
@@ -20513,10 +21596,6 @@ snapshots:
     dependencies:
       argparse: 2.0.1
 
-  js-yaml@4.1.1:
-    dependencies:
-      argparse: 2.0.1
-
   jsbn@1.1.0: {}
 
   jsdom@29.0.2:
@@ -20556,6 +21635,10 @@ snapshots:
       stream-combiner: 0.2.2
       unorm: 1.6.0
 
+  json-bigint@1.0.0:
+    dependencies:
+      bignumber.js: 9.3.1
+
   json-buffer@3.0.1: {}
 
   json-cycle@1.5.0: {}
@@ -20566,6 +21649,11 @@ snapshots:
 
   json-parse-even-better-errors@2.3.1: {}
 
+  json-schema-to-ts@3.1.1:
+    dependencies:
+      '@babel/runtime': 7.28.4
+      ts-algebra: 2.0.0
+
   json-schema-traverse@0.4.1: {}
 
   json-schema-traverse@1.0.0: {}
@@ -20613,6 +21701,17 @@ snapshots:
 
   junk@4.0.1: {}
 
+  jwa@2.0.1:
+    dependencies:
+      buffer-equal-constant-time: 1.0.1
+      ecdsa-sig-formatter: 1.0.11
+      safe-buffer: 5.2.1
+
+  jws@4.0.1:
+    dependencies:
+      jwa: 2.0.1
+      safe-buffer: 5.2.1
+
   keyv@4.5.4:
     dependencies:
       json-buffer: 3.0.1
@@ -20844,8 +21943,7 @@ snapshots:
       chalk: 4.1.2
       is-unicode-supported: 0.1.0
 
-  long@5.3.2:
-    optional: true
+  long@5.3.2: {}
 
   longest-streak@3.1.0: {}
 
@@ -20867,8 +21965,6 @@ snapshots:
 
   lru-cache@10.4.3: {}
 
-  lru-cache@11.0.2: {}
-
   lru-cache@11.3.5: {}
 
   lru-cache@4.1.5:
@@ -20929,6 +22025,8 @@ snapshots:
 
   markdown-table@3.0.4: {}
 
+  marked@15.0.12: {}
+
   marky@1.3.0: {}
 
   matcher@3.0.0:
@@ -21511,10 +22609,11 @@ snapshots:
   minimatch@10.0.3:
     dependencies:
       '@isaacs/brace-expansion': 5.0.1
+    optional: true
 
   minimatch@10.2.5:
     dependencies:
-      brace-expansion: 5.0.5
+      brace-expansion: 5.0.6
 
   minimatch@3.1.2:
     dependencies:
@@ -21763,6 +22862,132 @@ snapshots:
     transitivePeerDependencies:
       - debug
 
+  nx@22.7.5:
+    dependencies:
+      '@emnapi/core': 1.4.5
+      '@emnapi/runtime': 1.4.5
+      '@emnapi/wasi-threads': 1.0.4
+      '@jest/diff-sequences': 30.0.1
+      '@napi-rs/wasm-runtime': 0.2.4
+      '@tybys/wasm-util': 0.9.0
+      '@yarnpkg/lockfile': 1.1.0
+      '@zkochan/js-yaml': 0.0.7
+      ansi-colors: 4.1.3
+      ansi-regex: 5.0.1
+      ansi-styles: 4.3.0
+      argparse: 2.0.1
+      asynckit: 0.4.0
+      axios: 1.16.0
+      balanced-match: 4.0.3
+      base64-js: 1.5.1
+      bl: 4.1.0
+      brace-expansion: 5.0.6
+      buffer: 5.7.1
+      call-bind-apply-helpers: 1.0.2
+      chalk: 4.1.2
+      cli-cursor: 3.1.0
+      cli-spinners: 2.6.1
+      cliui: 8.0.1
+      clone: 1.0.4
+      color-convert: 2.0.1
+      color-name: 1.1.4
+      combined-stream: 1.0.8
+      defaults: 1.0.4
+      define-lazy-prop: 2.0.0
+      delayed-stream: 1.0.0
+      dotenv: 16.4.7
+      dotenv-expand: 12.0.3
+      dunder-proto: 1.0.1
+      ejs: 5.0.1
+      emoji-regex: 8.0.0
+      end-of-stream: 1.4.5
+      enquirer: 2.3.6
+      es-define-property: 1.0.1
+      es-errors: 1.3.0
+      es-object-atoms: 1.1.1
+      es-set-tostringtag: 2.1.0
+      escalade: 3.2.0
+      escape-string-regexp: 1.0.5
+      figures: 3.2.0
+      flat: 5.0.2
+      follow-redirects: 1.16.0
+      form-data: 4.0.5
+      fs-constants: 1.0.0
+      function-bind: 1.1.2
+      get-caller-file: 2.0.5
+      get-intrinsic: 1.3.0
+      get-proto: 1.0.1
+      gopd: 1.2.0
+      has-flag: 4.0.0
+      has-symbols: 1.1.0
+      has-tostringtag: 1.0.2
+      hasown: 2.0.2
+      ieee754: 1.2.1
+      ignore: 7.0.5
+      inherits: 2.0.4
+      is-docker: 2.2.1
+      is-fullwidth-code-point: 3.0.0
+      is-interactive: 1.0.0
+      is-unicode-supported: 0.1.0
+      is-wsl: 2.2.0
+      json5: 2.2.3
+      jsonc-parser: 3.2.0
+      lines-and-columns: 2.0.3
+      log-symbols: 4.1.0
+      math-intrinsics: 1.1.0
+      mime-db: 1.52.0
+      mime-types: 2.1.35
+      mimic-fn: 2.1.0
+      minimatch: 10.2.5
+      minimist: 1.2.8
+      npm-run-path: 4.0.1
+      once: 1.4.0
+      onetime: 5.1.2
+      open: 8.4.2
+      ora: 5.3.0
+      path-key: 3.1.1
+      picocolors: 1.1.1
+      proxy-from-env: 2.1.0
+      readable-stream: 3.6.2
+      require-directory: 2.1.1
+      resolve.exports: 2.0.3
+      restore-cursor: 3.1.0
+      safe-buffer: 5.2.1
+      semver: 7.7.4
+      signal-exit: 3.0.7
+      smol-toml: 1.6.1
+      string-width: 4.2.3
+      string_decoder: 1.3.0
+      strip-ansi: 6.0.1
+      strip-bom: 3.0.0
+      supports-color: 7.2.0
+      tar-stream: 2.2.0
+      tmp: 0.2.6
+      tree-kill: 1.2.2
+      tsconfig-paths: 4.2.0
+      tslib: 2.8.1
+      util-deprecate: 1.0.2
+      wcwidth: 1.0.1
+      wrap-ansi: 7.0.0
+      wrappy: 1.0.2
+      y18n: 5.0.8
+      yaml: 2.9.0
+      yargs: 17.7.2
+      yargs-parser: 21.1.1
+    optionalDependencies:
+      '@nx/nx-darwin-arm64': 22.7.5
+      '@nx/nx-darwin-x64': 22.7.5
+      '@nx/nx-freebsd-x64': 22.7.5
+      '@nx/nx-linux-arm-gnueabihf': 22.7.5
+      '@nx/nx-linux-arm64-gnu': 22.7.5
+      '@nx/nx-linux-arm64-musl': 22.7.5
+      '@nx/nx-linux-x64-gnu': 22.7.5
+      '@nx/nx-linux-x64-musl': 22.7.5
+      '@nx/nx-win32-arm64-msvc': 22.7.5
+      '@nx/nx-win32-x64-msvc': 22.7.5
+    transitivePeerDependencies:
+      - debug
+
   object-assign@4.1.1: {}
 
   object-inspect@1.13.4: {}
@@ -21839,9 +23064,14 @@ snapshots:
       is-docker: 2.2.1
       is-wsl: 2.2.0
 
+  openai@6.26.0(ws@8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5))(zod@3.25.76):
+    optionalDependencies:
+      ws: 8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5)
+      zod: 3.25.76
+
   openai@6.3.0(ws@8.20.0)(zod@3.25.76):
     optionalDependencies:
-      ws: 8.20.0
+      ws: 8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5)
       zod: 3.25.76
 
   opener@1.5.2: {}
@@ -21860,7 +23090,7 @@ snapshots:
       bl: 4.1.0
       chalk: 4.1.2
       cli-cursor: 3.1.0
-      cli-spinners: 2.6.1
+      cli-spinners: 2.9.2
       is-interactive: 1.0.0
       log-symbols: 4.1.0
       strip-ansi: 6.0.1
@@ -22075,6 +23305,8 @@ snapshots:
 
   parseurl@1.3.3: {}
 
+  partial-json@0.1.7: {}
+
   path-browserify@1.0.1: {}
 
   path-exists@3.0.0: {}
@@ -22083,6 +23315,8 @@ snapshots:
 
   path-exists@5.0.0: {}
 
+  path-expression-matcher@1.5.0: {}
+
   path-is-absolute@1.0.1: {}
 
   path-is-inside@1.0.2: {}
@@ -22098,12 +23332,12 @@ snapshots:
   path-scurry@1.11.1:
     dependencies:
       lru-cache: 10.4.3
-      minipass: 7.1.2
+      minipass: 7.1.3
 
   path-scurry@2.0.0:
     dependencies:
-      lru-cache: 11.0.2
-      minipass: 7.1.2
+      lru-cache: 11.3.5
+      minipass: 7.1.3
 
   path-scurry@2.0.2:
     dependencies:
@@ -22286,12 +23520,33 @@ snapshots:
     dependencies:
       make-error: 1.3.6
 
+  proper-lockfile@4.1.2:
+    dependencies:
+      graceful-fs: 4.2.11
+      retry: 0.12.0
+      signal-exit: 3.0.7
+
   property-information@6.5.0: {}
 
   property-information@7.0.0: {}
 
   proto-list@1.2.4: {}
 
+  protobufjs@7.6.2:
+    dependencies:
+      '@protobufjs/aspromise': 1.1.2
+      '@protobufjs/base64': 1.1.2
+      '@protobufjs/codegen': 2.0.5
+      '@protobufjs/eventemitter': 1.1.1
+      '@protobufjs/fetch': 1.1.1
+      '@protobufjs/float': 1.0.2
+      '@protobufjs/inquire': 1.1.2
+      '@protobufjs/path': 1.1.2
+      '@protobufjs/pool': 1.1.0
+      '@protobufjs/utf8': 1.1.1
+      '@types/node': 18.19.130
+      long: 5.3.2
+
   protobufjs@8.0.0:
     dependencies:
       '@protobufjs/aspromise': 1.1.2
@@ -22328,6 +23583,8 @@ snapshots:
 
   proxy-from-env@1.1.0: {}
 
+  proxy-from-env@2.1.0: {}
+
   prr@1.0.1:
     optional: true
 
@@ -22455,7 +23712,7 @@ snapshots:
 
   rc-cascader@3.28.2(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       array-tree-filter: 2.1.0
       classnames: 2.5.1
       rc-select: 14.15.2(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22466,7 +23723,7 @@ snapshots:
 
   rc-checkbox@3.3.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
@@ -22474,7 +23731,7 @@ snapshots:
 
   rc-collapse@3.8.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-motion: 2.9.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22483,7 +23740,7 @@ snapshots:
 
   rc-dialog@9.6.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       '@rc-component/portal': 1.1.2(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       classnames: 2.5.1
       rc-motion: 2.9.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22493,7 +23750,7 @@ snapshots:
 
   rc-drawer@7.2.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       '@rc-component/portal': 1.1.2(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       classnames: 2.5.1
       rc-motion: 2.9.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22503,7 +23760,7 @@ snapshots:
 
   rc-dropdown@4.2.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       '@rc-component/trigger': 2.2.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       classnames: 2.5.1
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22512,7 +23769,7 @@ snapshots:
 
   rc-field-form@2.4.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       '@rc-component/async-validator': 5.0.4
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
@@ -22520,7 +23777,7 @@ snapshots:
 
   rc-image@7.11.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       '@rc-component/portal': 1.1.2(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       classnames: 2.5.1
       rc-dialog: 9.6.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22531,7 +23788,7 @@ snapshots:
 
   rc-input-number@9.2.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       '@rc-component/mini-decimal': 1.1.0
       classnames: 2.5.1
       rc-input: 1.6.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22541,7 +23798,7 @@ snapshots:
 
   rc-input@1.6.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
@@ -22549,7 +23806,7 @@ snapshots:
 
   rc-mentions@2.16.1(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       '@rc-component/trigger': 2.2.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       classnames: 2.5.1
       rc-input: 1.6.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22561,7 +23818,7 @@ snapshots:
 
   rc-menu@9.15.1(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       '@rc-component/trigger': 2.2.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       classnames: 2.5.1
       rc-motion: 2.9.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22572,7 +23829,7 @@ snapshots:
 
   rc-motion@2.9.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
@@ -22580,7 +23837,7 @@ snapshots:
 
   rc-notification@5.6.2(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-motion: 2.9.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22598,7 +23855,7 @@ snapshots:
 
   rc-pagination@4.3.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
@@ -22606,7 +23863,7 @@ snapshots:
 
   rc-picker@4.6.15(date-fns@2.30.0)(dayjs@1.11.13)(moment@2.30.1)(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       '@rc-component/trigger': 2.2.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       classnames: 2.5.1
       rc-overflow: 1.3.2(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22621,7 +23878,7 @@ snapshots:
 
   rc-progress@4.0.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
@@ -22629,7 +23886,7 @@ snapshots:
 
   rc-rate@2.13.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
@@ -22637,7 +23894,7 @@ snapshots:
 
   rc-resize-observer@1.4.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
@@ -22646,7 +23903,7 @@ snapshots:
 
   rc-segmented@2.5.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-motion: 2.9.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22655,7 +23912,7 @@ snapshots:
 
   rc-select@14.15.2(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       '@rc-component/trigger': 2.2.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       classnames: 2.5.1
       rc-motion: 2.9.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22667,7 +23924,7 @@ snapshots:
 
   rc-slider@11.1.7(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
@@ -22675,7 +23932,7 @@ snapshots:
 
   rc-steps@6.0.1(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
@@ -22683,7 +23940,7 @@ snapshots:
 
   rc-switch@4.1.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
@@ -22691,7 +23948,7 @@ snapshots:
 
   rc-table@7.47.5(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       '@rc-component/context': 1.4.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       classnames: 2.5.1
       rc-resize-observer: 1.4.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22702,7 +23959,7 @@ snapshots:
 
   rc-tabs@15.3.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-dropdown: 4.2.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       rc-menu: 9.15.1(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22714,7 +23971,7 @@ snapshots:
 
   rc-textarea@1.8.2(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-input: 1.6.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       rc-resize-observer: 1.4.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22724,7 +23981,7 @@ snapshots:
 
   rc-tooltip@6.2.1(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       '@rc-component/trigger': 2.2.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       classnames: 2.5.1
       react: 18.3.1
@@ -22732,7 +23989,7 @@ snapshots:
 
   rc-tree-select@5.23.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-select: 14.15.2(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       rc-tree: 5.9.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22742,7 +23999,7 @@ snapshots:
 
   rc-tree@5.9.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-motion: 2.9.3(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -22752,7 +24009,7 @@ snapshots:
 
   rc-upload@4.8.1(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
     dependencies:
-      '@babel/runtime': 7.27.0
+      '@babel/runtime': 7.28.4
       classnames: 2.5.1
       rc-util: 5.43.0(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
       react: 18.3.1
@@ -23474,6 +24731,8 @@ snapshots:
 
   semver@7.7.3: {}
 
+  semver@7.7.4: {}
+
   send@0.19.0:
     dependencies:
       debug: 2.6.9
@@ -23733,6 +24992,8 @@ snapshots:
       wcwidth: 1.0.1
       yargs: 15.4.1
 
+  smol-toml@1.6.1: {}
+
   snake-case@3.0.4:
     dependencies:
       dot-case: 3.0.4
@@ -24018,6 +25279,8 @@ snapshots:
 
   strip-json-comments@5.0.3: {}
 
+  strnum@2.3.0: {}
+
   strtok3@6.3.0:
     dependencies:
       '@tokenizer/token': 0.3.0
@@ -24237,6 +25500,8 @@ snapshots:
 
   tmp@0.2.5: {}
 
+  tmp@0.2.6: {}
+
   tn1150@0.1.0:
     dependencies:
       unorm: 1.6.0
@@ -24286,6 +25551,8 @@ snapshots:
     dependencies:
       utf8-byte-length: 1.0.5
 
+  ts-algebra@2.0.0: {}
+
   ts-checker-rspack-plugin@1.2.2(@rspack/core@1.6.8)(typescript@5.8.3):
     dependencies:
       '@babel/code-frame': 7.27.1
@@ -24397,6 +25664,8 @@ snapshots:
       media-typer: 1.1.0
       mime-types: 3.0.1
 
+  typebox@1.1.38: {}
+
   typed-array-buffer@1.0.3:
     dependencies:
       call-bound: 1.0.4
@@ -24457,10 +25726,10 @@ snapshots:
 
   undici@6.22.0: {}
 
-  undici@7.16.0: {}
-
   undici@7.25.0: {}
 
+  undici@8.3.0: {}
+
   unhead@2.1.13:
     dependencies:
       hookable: 6.0.1
@@ -25245,8 +26514,10 @@ snapshots:
       bufferutil: 4.0.9
       utf-8-validate: 6.0.5
 
-  ws@8.20.0:
-    optional: true
+  ws@8.20.0(bufferutil@4.0.9)(utf-8-validate@6.0.5):
+    optionalDependencies:
+      bufferutil: 4.0.9
+      utf-8-validate: 6.0.5
 
   wsl-utils@0.1.0:
     dependencies:
@@ -25263,6 +26534,8 @@ snapshots:
 
   xml-name-validator@5.0.0: {}
 
+  xml-naming@0.1.0: {}
+
   xml-parse-from-string@1.0.1: {}
 
   xml2js@0.5.0:
@@ -25299,6 +26572,8 @@ snapshots:
 
   yaml@2.8.2: {}
 
+  yaml@2.9.0: {}
+
   yargs-parser@18.1.3:
     dependencies:
       camelcase: 5.3.1
@@ -25379,6 +26654,10 @@ snapshots:
     dependencies:
       zod: 3.25.76
 
+  zod-to-json-schema@3.25.2(zod@3.25.76):
+    dependencies:
+      zod: 3.25.76
+
   zod@3.25.76: {}
 
   zustand@4.5.2(@types/react@18.3.23)(immer@10.1.1)(react@18.3.1):
diff --git a/rfcs/0001-v2-testing-framework-phase0.md b/rfcs/0001-v2-testing-framework-phase0.md
index 024b626025..650c91d60a 100644
--- a/rfcs/0001-v2-testing-framework-phase0.md
+++ b/rfcs/0001-v2-testing-framework-phase0.md
@@ -6,6 +6,8 @@
 
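 > Goal of this draft: pin down "the interfaces that must be settled before implementation starts" as a reviewable draft. The **🔶 To discuss** items at the end of each section are the open decision points I have left.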
 
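+> **Implementation status (Phase 0)**: the contracts in this draft have landed as a new package, `@midscene/testing-framework` (`packages/testing-framework`), which includes `defineMidsceneConfig` / `defineRuntime`, v2 YAML parsing, the node engine (`ui`/`verify`/`soft`/`agent`/custom), context assembly, fail-closed verify verdicts, the default Pi agent runtime (C′ resolved), and a lightweight runner and CLI (`midscene-tf`). Copyable demo samples live in `example/` at the repository root. The only open item, C′, has been resolved (see §4.1).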
+
 ---
 
 ## 0. Terminology and layering recap (already agreed, taken as a premise)
@@ -294,9 +296,28 @@ await session.prompt(assembledContext, {
 | skills injection | `DefaultResourceLoader` + `skillsOverride` | ✅ |
 | Model selection / auth | `getModel(provider, model)`; `AuthStorage.setRuntimeApiKey` or env | ✅ |
 
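-🔶 **The only real integration item (C′)**: the Pi docs **show no base URL override**. Midscene uses `MIDSCENE_MODEL_BASE_URL` (a custom / OpenAI-compatible endpoint). For `verify`/`agent` and `ui` to use **the same model endpoint**, we must confirm whether Pi can take a custom base URL / compatible provider. This is the **only dependency on Pi that still needs to be settled**; everything else is in place.
+✅ **C′ resolved (no longer an open item)**: a check of the Pi SDK source (`@earendil-works/pi-coding-agent` 0.78) confirms that `ModelRegistry.registerProvider(name, config)` accepts `baseUrl` + `apiKey` + a set of `models` (which can specify `api: 'openai-completions'` and `input: ['text','image']`). The framework can therefore do: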
 
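-> So, to answer "what still needs confirming": the design is closed. The only thing left is this base URL capability; check the Pi source / latest docs, and raise it with the Pi team if it is not supported.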
+```ts
+const authStorage = AuthStorage.inMemory();
+const registry = ModelRegistry.inMemory(authStorage);
+registry.registerProvider('midscene', {
+  baseUrl: process.env.MIDSCENE_MODEL_BASE_URL,
+  apiKey: process.env.MIDSCENE_MODEL_API_KEY,
+  models: [{
+    id: process.env.MIDSCENE_MODEL_NAME, name: process.env.MIDSCENE_MODEL_NAME,
+    api: 'openai-completions', reasoning: false, input: ['text', 'image'],
+    cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+    contextWindow: 128_000, maxTokens: 8_192,
+  }],
+});
+const model = registry.find('midscene', process.env.MIDSCENE_MODEL_NAME);
+const { session } = await createAgentSession({ model, modelRegistry: registry, authStorage, ... });
+```
+
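+This way `verify`/`agent` (Pi) and `ui` (Midscene UI Agent) use **the same `MIDSCENE_MODEL_BASE_URL` endpoint**, with zero changes to Pi. The implementation is `PiAgentRuntime` in `@midscene/testing-framework` (`src/agent-runtime/pi-runtime.ts`), and `tests/smoke/pi-wiring.mjs` verifies it (provider registration / apiKey resolution / session model selection / `report_verdict` customTool activation), all passing.
+
+> Note: `MIDSCENE_MODEL_EXTRA_BODY_JSON` (e.g. `{"service_tier":"fast"}`) only takes effect for the Midscene UI Agent in `ui` nodes. Phase 0 does not pass it through to Pi nodes (it is a performance optimization, not a correctness issue, and can later be wired in via the stream `onPayload`).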
 
 ---
 
@@ -430,11 +451,13 @@ flow:
 | Node instruction form | built-in = text; custom = text or object |
 | Long-flow context | not truncated (Phase 0) |
 
-### Only pending integration item
+### Pending integration
 
 | # | Item | Status |
 |---|---|---|
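-| C′ | Whether Pi can take a custom model **base URL** (matching `MIDSCENE_MODEL_BASE_URL`) so that verify/agent and ui share one endpoint | Not found in the docs; **verify against the Pi source / latest docs, and coordinate with the Pi team if unsupported** (§4.1) |
+| C′ | Whether Pi can take a custom model **base URL** (matching `MIDSCENE_MODEL_BASE_URL`) so that verify/agent and ui share one endpoint | ✅ **Resolved**: implemented via `ModelRegistry.registerProvider({ baseUrl, apiKey, models })`, see §4.1 and `PiAgentRuntime` |
+
+(No pending integration items remain.)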
 
 ---
 

From 85bd63efbea4e2544784dea5b777f888f29f0fa7 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Wed, 3 Jun 2026 23:04:51 +0000
Subject: [PATCH 25/33] test(testing-framework): add CI-friendly mock-model
 smoke over example cases

Run the real runner end-to-end (discovery, YAML parsing, node engine, output
store, summary writing) against the real example/e2e cases, mocking only the
browser (fake UI Agent) and the model (mock agent runtime). No network or
Chrome required, so it runs in the standard `nx test` / `test:coverage` CI job.

Narrow the package tsconfig include to tests/unit-test so type-check:tests does
not try to type-check the standalone .mjs smoke scripts.
---
 .../tests/unit-test/runner-smoke.test.ts      | 110 ++++++++++++++++++
 packages/testing-framework/tsconfig.json      |   2 +-
 2 files changed, 111 insertions(+), 1 deletion(-)
 create mode 100644 packages/testing-framework/tests/unit-test/runner-smoke.test.ts

diff --git a/packages/testing-framework/tests/unit-test/runner-smoke.test.ts b/packages/testing-framework/tests/unit-test/runner-smoke.test.ts
new file mode 100644
index 0000000000..d704b41d7e
--- /dev/null
+++ b/packages/testing-framework/tests/unit-test/runner-smoke.test.ts
@@ -0,0 +1,110 @@
+import { mkdtempSync, readFileSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { dirname, join } from 'node:path';
+import { fileURLToPath } from 'node:url';
+import { describe, expect, it, vi } from 'vitest';
+import type {
+  AgentRunInput,
+  AgentRuntimeAdapter,
+} from '../../src/agent-runtime/types';
+import type { MidsceneConfig } from '../../src/config/types';
+import { runAll } from '../../src/runner/run';
+import { defineRuntime } from '../../src/runtime';
+import type { Agent, RunSummary } from '../../src/types';
+
+/**
+ * CI-friendly mock-model smoke. Runs the REAL runner end-to-end over the REAL
+ * example cases — discovery, YAML parsing, the node engine, output store, and
+ * summary writing — while mocking the two external boundaries the sandbox/CI
+ * cannot reach: the browser (a fake UI Agent) and the model (a mock agent
+ * runtime). No network, no Chrome, fully deterministic.
+ */
+
+const here = dirname(fileURLToPath(import.meta.url));
+const repoRoot = join(here, '../../../..');
+const exampleDir = join(repoRoot, 'example');
+
+function fakeUiAgent(): Agent {
+  return {
+    aiAct: vi.fn(async () => undefined),
+    aiAsk: vi.fn(async () => 'recorded the requested values'),
+    interface: {
+      screenshotBase64: vi.fn(async () => 'data:image/png;base64,AAAA'),
+    },
+    reportFile: undefined,
+  } as unknown as Agent;
+}
+
+describe('runner mock-model smoke (example cases)', () => {
+  it('runs the example suite green with a mocked browser + model', async () => {
+    const seenVerify: AgentRunInput[] = [];
+
+    const mockRuntime: AgentRuntimeAdapter = {
+      run: async (input) => {
+        if (input.kind === 'verify' || input.kind === 'soft') {
+          seenVerify.push(input);
+          return {
+            text: 'verified',
+            verdict: { pass: true, reason: 'mock pass' },
+          };
+        }
+        return { text: 'mock analysis' };
+      },
+    };
+
+    const summaryPath = join(
+      mkdtempSync(join(tmpdir(), 'mts-smoke-')),
+      'summary.json',
+    );
+
+    const config: MidsceneConfig = {
+      uiAgent: async () => ({ agent: fakeUiAgent() }),
+      testDir: join(exampleDir, 'e2e'),
+      include: ['**/*.yaml'],
+      exclude: ['**/*.draft.yaml'],
+      output: { summary: summaryPath },
+      agentRuntime: mockRuntime,
+      runtime: {
+        prepareCartFixture: defineRuntime(async (ctx) => {
+          const input = (ctx.input ?? {}) as { scenario?: string };
+          ctx.state.cartFixture = { scenario: input.scenario };
+          return { conclusion: `Prepared a "${input.scenario}" cart fixture.` };
+        }),
+        notify: defineRuntime(async (ctx) => {
+          const failed = ctx.result.steps.filter((s) => s.status === 'failed');
+          return {
+            conclusion: failed.length === 0 ? 'no alert needed' : 'would alert',
+          };
+        }),
+      },
+    };
+
+    const summary = await runAll(config, { projectRoot: exampleDir });
+
+    // both example cases discovered and green
+    expect(summary.total).toBe(2);
+    expect(summary.failed).toBe(0);
+    expect(summary.passed).toBe(2);
+
+    // the $catalog skill reference reached the verify boundary
+    const referenced = seenVerify.flatMap((i) => i.referencedSkills);
+    expect(referenced).toContain('catalog');
+
+    // verify always received the (mocked) current screenshot
+    expect(seenVerify.every((i) => Boolean(i.screenshotBase64))).toBe(true);
+
+    // a runtime conclusion is visible in later context; engineering state is not
+    const ctxWithFixture = seenVerify.find((i) =>
+      i.context.includes('cart fixture'),
+    );
+    expect(ctxWithFixture).toBeDefined();
+    expect(ctxWithFixture?.context).not.toContain('cartFixture');
+
+    // the summary file was written and round-trips
+    const written = JSON.parse(
+      readFileSync(summaryPath, 'utf-8'),
+    ) as RunSummary;
+    expect(written.total).toBe(2);
+    expect(written.cases.map((c) => c.status)).toEqual(['passed', 'passed']);
+  });
+});
diff --git a/packages/testing-framework/tsconfig.json b/packages/testing-framework/tsconfig.json
index 8ac2495426..5f88c00c0c 100644
--- a/packages/testing-framework/tsconfig.json
+++ b/packages/testing-framework/tsconfig.json
@@ -9,7 +9,7 @@
       "@/*": ["./src/*"]
     }
   },
-  "include": ["src", "tests"],
+  "include": ["src", "tests/unit-test"],
   "references": [
     {
       "path": "../core"

From d6450bcfa1a67a36203a49b8d992adfc9b0226b4 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Thu, 4 Jun 2026 14:51:23 -0700
Subject: [PATCH 26/33] docs(site): remove draft UI testing framework from
 sidebar nav

---
 apps/site/rspress.config.ts | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/apps/site/rspress.config.ts b/apps/site/rspress.config.ts
index 3c9bbf2ebb..4d9e1daa85 100644
--- a/apps/site/rspress.config.ts
+++ b/apps/site/rspress.config.ts
@@ -129,13 +129,6 @@ export default defineConfig(async () => {
             text: 'Showcases',
             link: '/showcases',
           },
-          {
-            sectionHeaderText: 'UI Testing Framework',
-          },
-          {
-            text: 'Overview',
-            link: '/ui-testing-framework',
-          },
           {
             sectionHeaderText: 'Web browser',
           },
@@ -316,13 +309,6 @@ export default defineConfig(async () => {
             text: '案例展示',
             link: '/zh/showcases',
           },
-          {
-            sectionHeaderText: 'UI Testing Framework',
-          },
-          {
-            text: '专题总览',
-            link: '/zh/ui-testing-framework',
-          },
           {
             sectionHeaderText: 'Web 浏览器',
           },

From d3521253f329a355cda9acc790bd73fce7314f06 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Thu, 4 Jun 2026 15:43:04 -0700
Subject: [PATCH 27/33] refactor(testing-framework): rename agent-runtime to
 general-agent

The `agent-runtime` directory sat at the intersection of two naming
collisions: "agent" (shared with the UI Agent in `ui-agent/`) and
"runtime" (shared with custom YAML nodes, `defineRuntime`). Reserve
"runtime" for custom nodes and rename the swappable general-purpose
agent layer to read as the counterpart of the UI Agent.

- src/agent-runtime/ -> src/general-agent/ (pi-runtime.ts -> pi-general-agent.ts)
- AgentRuntimeAdapter -> GeneralAgentAdapter
- AgentRunInput/AgentRunResult -> GeneralAgentInput/GeneralAgentResult
- PiAgentRuntime -> PiGeneralAgent, PiRuntimeOptions -> PiGeneralAgentOptions
- config field agentRuntime -> generalAgent

Also drop the unused `index` param threaded through runNode /
runJudgmentNode / runAgentNode in the engine.

Docs (package README, RFC 0001) updated to match.
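
For reviewers, a minimal sketch of what a replacement general agent might
look like under the new names. The interfaces below are local stand-ins
that mirror only the fields visible in this series (the real definitions
live in src/general-agent/types.ts and carry more fields), the shape of
`Verdict` is assumed from the test fixtures, and `cannedReply`,
`stubGeneralAgent` and `configFragment` are hypothetical names used for
illustration only.

```typescript
// Local stand-ins for the renamed types; assumed, not copied from the package.
interface Verdict {
  pass: boolean;
  reason: string;
}

interface GeneralAgentInput {
  kind: 'verify' | 'soft' | 'agent';
  instruction: string;
  context: string;
}

interface GeneralAgentResult {
  text: string;
  verdict?: Verdict;
}

interface GeneralAgentAdapter {
  run(input: GeneralAgentInput): Promise<GeneralAgentResult>;
  dispose?(): Promise<void>;
}

// Hypothetical pure helper: `verify` and `soft` get a verdict,
// `agent` is advisory and gets none.
function cannedReply(kind: GeneralAgentInput['kind']): GeneralAgentResult {
  if (kind === 'agent') {
    return { text: 'stub analysis' };
  }
  return { text: 'stub verdict', verdict: { pass: true, reason: 'stub pass' } };
}

// A stub adapter that never calls a model.
const stubGeneralAgent: GeneralAgentAdapter = {
  run: async (input) => cannedReply(input.kind),
};

// The adapter goes under the renamed config field (formerly `agentRuntime`).
const configFragment = { generalAgent: stubGeneralAgent };
```

Keeping the reply logic in a pure helper is only a convenience for the
sketch; a real adapter would call its agent SDK inside `run`.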
---
 packages/testing-framework/README.md          |  8 ++---
 .../testing-framework/src/config/types.ts     |  6 ++--
 .../testing-framework/src/engine/run-case.ts  | 10 +++---
 .../testing-framework/src/engine/run-node.ts  | 19 +++++------
 .../pi-general-agent.ts}                      | 32 ++++++++---------
 .../skills.ts                                 |  0
 .../{agent-runtime => general-agent}/types.ts | 22 +++++++-----
 packages/testing-framework/src/index.ts       | 18 +++++-----
 packages/testing-framework/src/runner/run.ts  | 12 +++----
 .../tests/smoke/browser-smoke.mjs             |  4 +--
 .../unit-test/context-and-skills.test.ts      |  2 +-
 .../tests/unit-test/engine.test.ts            | 34 +++++++++----------
 .../tests/unit-test/runner-smoke.test.ts      | 14 ++++----
 rfcs/0001-v2-testing-framework-phase0.md      | 10 +++---
 14 files changed, 97 insertions(+), 94 deletions(-)
 rename packages/testing-framework/src/{agent-runtime/pi-runtime.ts => general-agent/pi-general-agent.ts} (87%)
 rename packages/testing-framework/src/{agent-runtime => general-agent}/skills.ts (100%)
 rename packages/testing-framework/src/{agent-runtime => general-agent}/types.ts (62%)

diff --git a/packages/testing-framework/README.md b/packages/testing-framework/README.md
index 9f8a656f3f..052f07e1ce 100644
--- a/packages/testing-framework/README.md
+++ b/packages/testing-framework/README.md
@@ -67,8 +67,8 @@ const { config } = await loadConfig(process.cwd());
 const summary = await runAll(config);
 ```
 
-## Swapping the agent layer
+## Swapping the general agent
 
-The `verify`/`soft`/`agent` runtime is swappable. Provide your own
-`agentRuntime` (an `AgentRuntimeAdapter`) in `midscene.config.ts` to replace the
-default Pi-backed implementation.
+The general agent that backs `verify`/`soft`/`agent` is swappable. Provide your
+own `generalAgent` (a `GeneralAgentAdapter`) in `midscene.config.ts` to replace
+the default Pi-backed implementation.
diff --git a/packages/testing-framework/src/config/types.ts b/packages/testing-framework/src/config/types.ts
index 21c4eb179d..317966e05e 100644
--- a/packages/testing-framework/src/config/types.ts
+++ b/packages/testing-framework/src/config/types.ts
@@ -4,7 +4,7 @@
  */
 import type { Agent } from '@midscene/core/agent';
 import type { AgentOpt } from '@midscene/core/agent';
-import type { AgentRuntimeAdapter } from '../agent-runtime/types';
+import type { GeneralAgentAdapter } from '../general-agent/types';
 import type { RuntimeNode } from '../runtime';
 
 /** Platforms the framework can build a UI Agent for out of the box. */
@@ -76,8 +76,8 @@ export interface MidsceneConfig {
   // —— extension points ——
   /** Custom YAML nodes (RFC §3). */
   runtime?: Record<string, RuntimeNode>;
-  /** Replacement for the default Pi-backed agent layer (RFC §6). */
-  agentRuntime?: AgentRuntimeAdapter;
+  /** Replacement for the default Pi-backed general agent layer (RFC §6). */
+  generalAgent?: GeneralAgentAdapter;
 }
 
 /** Defaults applied when reading a config. */
diff --git a/packages/testing-framework/src/engine/run-case.ts b/packages/testing-framework/src/engine/run-case.ts
index 16131ceae9..a4fc67f2ff 100644
--- a/packages/testing-framework/src/engine/run-case.ts
+++ b/packages/testing-framework/src/engine/run-case.ts
@@ -1,5 +1,5 @@
 import type { Agent } from '@midscene/core/agent';
-import type { AgentRuntimeAdapter } from '../agent-runtime/types';
+import type { GeneralAgentAdapter } from '../general-agent/types';
 import type { RuntimeNode } from '../runtime';
 import type { CaseResult, StepResult } from '../types';
 import type { ParsedCase } from '../yaml/types';
@@ -10,7 +10,7 @@ export interface RunCaseOptions {
   parsed: ParsedCase;
   file: string;
   uiAgent: Agent;
-  agentRuntime: AgentRuntimeAdapter;
+  generalAgent: GeneralAgentAdapter;
   runtimeNodes: Record<string, RuntimeNode>;
   projectRoot: string;
   env: NodeJS.ProcessEnv;
@@ -25,7 +25,7 @@ export async function runCase(options: RunCaseOptions): Promise<CaseResult> {
     parsed,
     file,
     uiAgent,
-    agentRuntime,
+    generalAgent,
     runtimeNodes,
     projectRoot,
     env,
@@ -45,7 +45,7 @@ export async function runCase(options: RunCaseOptions): Promise<CaseResult> {
 
     const deps: RunNodeDeps = {
       uiAgent,
-      agentRuntime,
+      generalAgent,
       runtimeNodes,
       outputs,
       state,
@@ -58,7 +58,7 @@ export async function runCase(options: RunCaseOptions): Promise<CaseResult> {
 
     let stepResult: StepResult;
     try {
-      const outcome = await runNode(step.node, step.input, index, deps);
+      const outcome = await runNode(step.node, step.input, deps);
       stepResult = {
         index,
         node: step.node,
diff --git a/packages/testing-framework/src/engine/run-node.ts b/packages/testing-framework/src/engine/run-node.ts
index 479d46177e..1dfe2cf4d2 100644
--- a/packages/testing-framework/src/engine/run-node.ts
+++ b/packages/testing-framework/src/engine/run-node.ts
@@ -1,7 +1,7 @@
 import type { Agent } from '@midscene/core/agent';
-import { extractSkillReferences } from '../agent-runtime/skills';
-import type { AgentRuntimeAdapter } from '../agent-runtime/types';
 import { assembleContext } from '../context/assembler';
+import { extractSkillReferences } from '../general-agent/skills';
+import type { GeneralAgentAdapter } from '../general-agent/types';
 import type { RuntimeNode, RuntimeNodeContext } from '../runtime';
 import type { StepOutput, StepResult, Verdict } from '../types';
 import { isBuiltinNode } from '../yaml/types';
@@ -9,7 +9,7 @@ import type { OutputStoreImpl } from './output-store';
 
 export interface RunNodeDeps {
   uiAgent: Agent;
-  agentRuntime: AgentRuntimeAdapter;
+  generalAgent: GeneralAgentAdapter;
   runtimeNodes: Record<string, RuntimeNode>;
   outputs: OutputStoreImpl;
   /** Shared engineering-facing state across runtime nodes. */
@@ -36,7 +36,6 @@ export interface RunNodeOutcome {
 export async function runNode(
   node: string,
   input: unknown,
-  index: number,
   deps: RunNodeDeps,
 ): Promise<RunNodeOutcome> {
   if (isBuiltinNode(node)) {
@@ -44,11 +43,11 @@ export async function runNode(
       case 'ui':
         return runUiNode(input as string, deps);
       case 'verify':
-        return runJudgmentNode('verify', input as string, index, deps);
+        return runJudgmentNode('verify', input as string, deps);
       case 'soft':
-        return runJudgmentNode('soft', input as string, index, deps);
+        return runJudgmentNode('soft', input as string, deps);
       case 'agent':
-        return runAgentNode(input as string, index, deps);
+        return runAgentNode(input as string, deps);
     }
   }
   return runCustomNode(node, input, deps);
@@ -77,7 +76,6 @@ async function runUiNode(
 async function runJudgmentNode(
   kind: 'verify' | 'soft',
   instruction: string,
-  index: number,
   deps: RunNodeDeps,
 ): Promise<RunNodeOutcome> {
   const { data, mediaType } = await captureScreenshot(deps.uiAgent);
@@ -88,7 +86,7 @@ async function runJudgmentNode(
     kind,
   });
 
-  const result = await deps.agentRuntime.run({
+  const result = await deps.generalAgent.run({
     kind,
     instruction,
     context,
@@ -122,7 +120,6 @@ async function runJudgmentNode(
 
 async function runAgentNode(
   instruction: string,
-  index: number,
   deps: RunNodeDeps,
 ): Promise<RunNodeOutcome> {
   const { data, mediaType } = await captureScreenshot(deps.uiAgent);
@@ -136,7 +133,7 @@ async function runAgentNode(
   // `agent` is advisory: its output never changes pass/fail. Even internal
   // errors are downgraded to a warning (RFC §8).
   try {
-    const result = await deps.agentRuntime.run({
+    const result = await deps.generalAgent.run({
       kind: 'agent',
       instruction,
       context,
diff --git a/packages/testing-framework/src/agent-runtime/pi-runtime.ts b/packages/testing-framework/src/general-agent/pi-general-agent.ts
similarity index 87%
rename from packages/testing-framework/src/agent-runtime/pi-runtime.ts
rename to packages/testing-framework/src/general-agent/pi-general-agent.ts
index 76be8ebc8a..ca13c77025 100644
--- a/packages/testing-framework/src/agent-runtime/pi-runtime.ts
+++ b/packages/testing-framework/src/general-agent/pi-general-agent.ts
@@ -1,8 +1,8 @@
 /**
- * Default agent runtime, backed by Pi (`@earendil-works/pi-coding-agent`).
+ * Default general agent, backed by Pi (`@earendil-works/pi-coding-agent`).
  *
- * This is the Phase 0 implementation of the swappable agent layer used by
- * `verify` / `soft` / `agent` nodes.
+ * This is the Phase 0 implementation of the swappable general agent layer used
+ * by `verify` / `soft` / `agent` nodes.
  *
  * Decision C′ (RFC §4.1 / §10) — RESOLVED here. Pi exposes
  * `ModelRegistry.registerProvider({ baseUrl, apiKey, models })`, which lets us
@@ -26,16 +26,16 @@ import {
 import { getDebug } from '@midscene/shared/logger';
 import type { Verdict } from '../types';
 import type {
-  AgentRunInput,
-  AgentRunResult,
-  AgentRuntimeAdapter,
+  GeneralAgentAdapter,
+  GeneralAgentInput,
+  GeneralAgentResult,
 } from './types';
 
 const debug = getDebug('testing-framework:pi');
 
 const PROVIDER_NAME = 'midscene';
 
-export interface PiRuntimeOptions {
+export interface PiGeneralAgentOptions {
   /** Endpoint base URL. Defaults to MIDSCENE_MODEL_BASE_URL. */
   baseUrl?: string;
   /** API key. Defaults to MIDSCENE_MODEL_API_KEY. */
@@ -55,15 +55,15 @@ interface PreparedModel {
 }
 
 /**
- * Pi-backed implementation of {@link AgentRuntimeAdapter}.
+ * Pi-backed implementation of {@link GeneralAgentAdapter}.
  */
-export class PiAgentRuntime implements AgentRuntimeAdapter {
+export class PiGeneralAgent implements GeneralAgentAdapter {
   private prepared?: PreparedModel;
   private readonly loaderCache = new Map<string, DefaultResourceLoader>();
 
-  constructor(private readonly options: PiRuntimeOptions = {}) {}
+  constructor(private readonly options: PiGeneralAgentOptions = {}) {}
 
-  async run(input: AgentRunInput): Promise<AgentRunResult> {
+  async run(input: GeneralAgentInput): Promise<GeneralAgentResult> {
     const prepared = this.prepareModel();
     const loader = await this.getResourceLoader(input.projectRoot);
 
@@ -146,7 +146,7 @@ export class PiAgentRuntime implements AgentRuntimeAdapter {
     }
   }
 
-  private buildPrompt(input: AgentRunInput): string {
+  private buildPrompt(input: GeneralAgentInput): string {
     const parts = [input.context];
     if (input.referencedSkills.length > 0) {
       parts.push('');
@@ -168,18 +168,18 @@ export class PiAgentRuntime implements AgentRuntimeAdapter {
 
     if (!baseUrl) {
       throw new Error(
-        '[midscene] Pi agent runtime requires MIDSCENE_MODEL_BASE_URL ' +
-          '(or PiRuntimeOptions.baseUrl) so verify/agent share the UI Agent endpoint.',
+        '[midscene] Pi general agent requires MIDSCENE_MODEL_BASE_URL ' +
+          '(or PiGeneralAgentOptions.baseUrl) so verify/agent share the UI Agent endpoint.',
       );
     }
     if (!apiKey) {
       throw new Error(
-        '[midscene] Pi agent runtime requires MIDSCENE_MODEL_API_KEY (or PiRuntimeOptions.apiKey).',
+        '[midscene] Pi general agent requires MIDSCENE_MODEL_API_KEY (or PiGeneralAgentOptions.apiKey).',
       );
     }
     if (!modelName) {
       throw new Error(
-        '[midscene] Pi agent runtime requires MIDSCENE_MODEL_NAME (or PiRuntimeOptions.modelName).',
+        '[midscene] Pi general agent requires MIDSCENE_MODEL_NAME (or PiGeneralAgentOptions.modelName).',
       );
     }
 
diff --git a/packages/testing-framework/src/agent-runtime/skills.ts b/packages/testing-framework/src/general-agent/skills.ts
similarity index 100%
rename from packages/testing-framework/src/agent-runtime/skills.ts
rename to packages/testing-framework/src/general-agent/skills.ts
diff --git a/packages/testing-framework/src/agent-runtime/types.ts b/packages/testing-framework/src/general-agent/types.ts
similarity index 62%
rename from packages/testing-framework/src/agent-runtime/types.ts
rename to packages/testing-framework/src/general-agent/types.ts
index b335f590ce..a327cad79c 100644
--- a/packages/testing-framework/src/agent-runtime/types.ts
+++ b/packages/testing-framework/src/general-agent/types.ts
@@ -1,15 +1,21 @@
 /**
- * AgentRuntimeAdapter — the swappable general-purpose agent layer (RFC §6,
- * design doc "swappable agent framework"). The default implementation wraps
- * Pi; teams can replace it with another agent SDK via `agentRuntime` in
- * `midscene.config.ts`.
+ * GeneralAgentAdapter — the swappable general-purpose agent layer (RFC §6,
+ * design doc "swappable agent framework"). It is the counterpart to the UI
+ * Agent (`ui-agent/`): the UI Agent *acts on* the page (`ui` nodes), while the
+ * general agent *reasons about* it and gates (`verify` / `soft` / `agent`
+ * nodes). The default implementation wraps Pi; teams can replace it via the
+ * `generalAgent` field in `midscene.config.ts`.
+ *
+ * Naming note: "runtime" is reserved for custom YAML nodes (`defineRuntime`,
+ * RFC §3) — this layer deliberately avoids that word to keep the two extension
+ * points distinct.
  *
  * Phase 0 keeps this interface deliberately minimal: a single `run` entry that
  * the engine calls for `verify` / `soft` / `agent` nodes.
  */
 import type { Verdict } from '../types';
 
-export interface AgentRunInput {
+export interface GeneralAgentInput {
   /**
    * Node kind. `verify` and `soft` both must produce a verdict; `agent` is
    * advisory and never produces one.
@@ -32,7 +38,7 @@ export interface AgentRunInput {
   projectRoot: string;
 }
 
-export interface AgentRunResult {
+export interface GeneralAgentResult {
   /** The agent's final natural-language message. */
   text: string;
   /**
@@ -42,8 +48,8 @@ export interface AgentRunResult {
   verdict?: Verdict;
 }
 
-export interface AgentRuntimeAdapter {
-  run(input: AgentRunInput): Promise<AgentRunResult>;
+export interface GeneralAgentAdapter {
+  run(input: GeneralAgentInput): Promise<GeneralAgentResult>;
   /** Release any underlying resources. */
   dispose?(): Promise<void>;
 }
diff --git a/packages/testing-framework/src/index.ts b/packages/testing-framework/src/index.ts
index e6db1158b0..0acad2d162 100644
--- a/packages/testing-framework/src/index.ts
+++ b/packages/testing-framework/src/index.ts
@@ -6,7 +6,7 @@
  *  - the node model, verdict contract, output contract, context-assembly
  *    contract (as types)
  *  - a lightweight runner (`runAll`) and CLI (`midscene-tf`)
- *  - the default Pi-backed agent runtime with a custom model base URL
+ *  - the default Pi-backed general agent with a custom model base URL
  *    (decision C′, RFC §4.1)
  */
 
@@ -47,15 +47,15 @@ export type {
   BuiltinNodeType,
 } from './types';
 
-// —— agent runtime (swappable) ——
+// —— general agent (swappable) ——
 export type {
-  AgentRuntimeAdapter,
-  AgentRunInput,
-  AgentRunResult,
-} from './agent-runtime/types';
-export { PiAgentRuntime } from './agent-runtime/pi-runtime';
-export type { PiRuntimeOptions } from './agent-runtime/pi-runtime';
-export { extractSkillReferences } from './agent-runtime/skills';
+  GeneralAgentAdapter,
+  GeneralAgentInput,
+  GeneralAgentResult,
+} from './general-agent/types';
+export { PiGeneralAgent } from './general-agent/pi-general-agent';
+export type { PiGeneralAgentOptions } from './general-agent/pi-general-agent';
+export { extractSkillReferences } from './general-agent/skills';
 
 // —— YAML ——
 export { parseCaseYaml } from './yaml/parse';
diff --git a/packages/testing-framework/src/runner/run.ts b/packages/testing-framework/src/runner/run.ts
index 97005a645a..32235427d2 100644
--- a/packages/testing-framework/src/runner/run.ts
+++ b/packages/testing-framework/src/runner/run.ts
@@ -1,10 +1,10 @@
 import { mkdirSync, readFileSync, writeFileSync } from 'node:fs';
 import { dirname, isAbsolute, resolve } from 'node:path';
 import { getDebug } from '@midscene/shared/logger';
-import { PiAgentRuntime } from '../agent-runtime/pi-runtime';
-import type { AgentRuntimeAdapter } from '../agent-runtime/types';
 import { DEFAULT_INCLUDE, type MidsceneConfig } from '../config/types';
 import { runCase } from '../engine/run-case';
+import { PiGeneralAgent } from '../general-agent/pi-general-agent';
+import type { GeneralAgentAdapter } from '../general-agent/types';
 import type { CaseResult, RunSummary } from '../types';
 import { createUIAgent } from '../ui-agent/factory';
 import { parseCaseYaml } from '../yaml/parse';
@@ -45,8 +45,8 @@ export async function runAll(
 
   debug('discovered cases', files);
 
-  const agentRuntime: AgentRuntimeAdapter =
-    config.agentRuntime ?? new PiAgentRuntime();
+  const generalAgent: GeneralAgentAdapter =
+    config.generalAgent ?? new PiGeneralAgent();
   const runtimeNodes = config.runtime ?? {};
 
   const startedAt = new Date();
@@ -69,7 +69,7 @@ export async function runAll(
         parsed,
         file,
         uiAgent: agent,
-        agentRuntime,
+        generalAgent,
         runtimeNodes,
         projectRoot,
         env,
@@ -86,7 +86,7 @@ export async function runAll(
     }
   }
 
-  await agentRuntime.dispose?.();
+  await generalAgent.dispose?.();
 
   const finishedAt = new Date();
   const summary: RunSummary = {
diff --git a/packages/testing-framework/tests/smoke/browser-smoke.mjs b/packages/testing-framework/tests/smoke/browser-smoke.mjs
index 75b4c93b9d..8d23940d4a 100644
--- a/packages/testing-framework/tests/smoke/browser-smoke.mjs
+++ b/packages/testing-framework/tests/smoke/browser-smoke.mjs
@@ -66,7 +66,7 @@ flow:
 
   let sawScreenshot = false;
   let sawConclusion = false;
-  const stubRuntime = {
+  const stubGeneralAgent = {
     run: async (input) => {
       sawScreenshot = Boolean(input.screenshotBase64);
       sawConclusion = input.context.includes('smoke');
@@ -81,7 +81,7 @@ flow:
     parsed,
     file: 'smoke.yaml',
     uiAgent: agent,
-    agentRuntime: stubRuntime,
+    generalAgent: stubGeneralAgent,
     runtimeNodes: {
       prepareCartFixture: defineRuntime(async (ctx) => {
         ctx.state.fixture = { scenario: ctx.input?.scenario };
diff --git a/packages/testing-framework/tests/unit-test/context-and-skills.test.ts b/packages/testing-framework/tests/unit-test/context-and-skills.test.ts
index bc1b1a6f2f..4724a3397f 100644
--- a/packages/testing-framework/tests/unit-test/context-and-skills.test.ts
+++ b/packages/testing-framework/tests/unit-test/context-and-skills.test.ts
@@ -1,7 +1,7 @@
 import { describe, expect, it } from 'vitest';
-import { extractSkillReferences } from '../../src/agent-runtime/skills';
 import { assembleContext } from '../../src/context/assembler';
 import { OutputStoreImpl } from '../../src/engine/output-store';
+import { extractSkillReferences } from '../../src/general-agent/skills';
 import type { StepResult } from '../../src/types';
 
 describe('extractSkillReferences', () => {
diff --git a/packages/testing-framework/tests/unit-test/engine.test.ts b/packages/testing-framework/tests/unit-test/engine.test.ts
index cb4c782d8f..f3f365f27c 100644
--- a/packages/testing-framework/tests/unit-test/engine.test.ts
+++ b/packages/testing-framework/tests/unit-test/engine.test.ts
@@ -1,10 +1,10 @@
 import { describe, expect, it, vi } from 'vitest';
-import type {
-  AgentRunInput,
-  AgentRunResult,
-  AgentRuntimeAdapter,
-} from '../../src/agent-runtime/types';
 import { runCase } from '../../src/engine/run-case';
+import type {
+  GeneralAgentAdapter,
+  GeneralAgentInput,
+  GeneralAgentResult,
+} from '../../src/general-agent/types';
 import { defineRuntime } from '../../src/runtime';
 import type { Agent } from '../../src/types';
 import { parseCaseYaml } from '../../src/yaml/parse';
@@ -22,9 +22,9 @@ function fakeAgent(overrides: Partial<Record<string, unknown>> = {}): Agent {
   return agent as unknown as Agent;
 }
 
-function fakeRuntime(
-  handler: (input: AgentRunInput) => AgentRunResult,
-): AgentRuntimeAdapter {
+function fakeGeneralAgent(
+  handler: (input: GeneralAgentInput) => GeneralAgentResult,
+): GeneralAgentAdapter {
   return { run: async (input) => handler(input) };
 }
 
@@ -42,7 +42,7 @@ describe('runCase node semantics', () => {
       parsed,
       file: 'c.yaml',
       uiAgent: fakeAgent(),
-      agentRuntime: fakeRuntime(() => ({ text: '' })),
+      generalAgent: fakeGeneralAgent(() => ({ text: '' })),
     });
     expect(result.status).toBe('passed');
     expect(result.steps[0].output?.text).toBe('did the thing');
@@ -55,7 +55,7 @@ describe('runCase node semantics', () => {
       parsed,
       file: 'c.yaml',
       uiAgent: fakeAgent(),
-      agentRuntime: fakeRuntime(() => ({
+      generalAgent: fakeGeneralAgent(() => ({
         text: 'looks good',
         verdict: { pass: true, reason: 'all good' },
       })),
@@ -72,7 +72,7 @@ describe('runCase node semantics', () => {
       parsed,
       file: 'c.yaml',
       uiAgent,
-      agentRuntime: fakeRuntime(() => ({
+      generalAgent: fakeGeneralAgent(() => ({
         text: 'nope',
         verdict: { pass: false, reason: 'missing' },
       })),
@@ -88,7 +88,7 @@ describe('runCase node semantics', () => {
       parsed,
       file: 'c.yaml',
       uiAgent: fakeAgent(),
-      agentRuntime: fakeRuntime(() => ({ text: 'I am not sure' })),
+      generalAgent: fakeGeneralAgent(() => ({ text: 'I am not sure' })),
     });
     expect(result.status).toBe('failed');
     expect(result.steps[0].verdict?.pass).toBe(false);
@@ -102,7 +102,7 @@ describe('runCase node semantics', () => {
       parsed,
       file: 'c.yaml',
       uiAgent: fakeAgent(),
-      agentRuntime: fakeRuntime((input) =>
+      generalAgent: fakeGeneralAgent((input) =>
         input.kind === 'soft'
           ? { text: 'minor', verdict: { pass: false, reason: 'tiny glitch' } }
           : { text: '' },
@@ -120,7 +120,7 @@ describe('runCase node semantics', () => {
       parsed,
       file: 'c.yaml',
       uiAgent: fakeAgent(),
-      agentRuntime: {
+      generalAgent: {
         run: async () => {
           throw new Error('boom');
         },
@@ -141,7 +141,7 @@ describe('runCase node semantics', () => {
           throw new Error('click failed');
         }),
       }),
-      agentRuntime: fakeRuntime(() => ({ text: '' })),
+      generalAgent: fakeGeneralAgent(() => ({ text: '' })),
     });
     expect(result.status).toBe('failed');
     expect(result.steps[0].error).toMatch(/click failed/);
@@ -165,7 +165,7 @@ describe('runCase node semantics', () => {
       parsed,
       file: 'c.yaml',
       uiAgent: fakeAgent(),
-      agentRuntime: fakeRuntime((input) => {
+      generalAgent: fakeGeneralAgent((input) => {
         seen.push(input.context);
         return { text: 'ok', verdict: { pass: true, reason: 'fine' } };
       }),
@@ -184,7 +184,7 @@ describe('runCase node semantics', () => {
       parsed,
       file: 'c.yaml',
       uiAgent: fakeAgent(),
-      agentRuntime: fakeRuntime(() => ({ text: '' })),
+      generalAgent: fakeGeneralAgent(() => ({ text: '' })),
     });
     expect(result.status).toBe('failed');
     expect(result.steps[0].error).toMatch(/Unknown node/);
diff --git a/packages/testing-framework/tests/unit-test/runner-smoke.test.ts b/packages/testing-framework/tests/unit-test/runner-smoke.test.ts
index d704b41d7e..d6118daadd 100644
--- a/packages/testing-framework/tests/unit-test/runner-smoke.test.ts
+++ b/packages/testing-framework/tests/unit-test/runner-smoke.test.ts
@@ -3,11 +3,11 @@ import { tmpdir } from 'node:os';
 import { dirname, join } from 'node:path';
 import { fileURLToPath } from 'node:url';
 import { describe, expect, it, vi } from 'vitest';
-import type {
-  AgentRunInput,
-  AgentRuntimeAdapter,
-} from '../../src/agent-runtime/types';
 import type { MidsceneConfig } from '../../src/config/types';
+import type {
+  GeneralAgentAdapter,
+  GeneralAgentInput,
+} from '../../src/general-agent/types';
 import { runAll } from '../../src/runner/run';
 import { defineRuntime } from '../../src/runtime';
 import type { Agent, RunSummary } from '../../src/types';
@@ -37,9 +37,9 @@ function fakeUiAgent(): Agent {
 
 describe('runner mock-model smoke (example cases)', () => {
   it('runs the example suite green with a mocked browser + model', async () => {
-    const seenVerify: AgentRunInput[] = [];
+    const seenVerify: GeneralAgentInput[] = [];
 
-    const mockRuntime: AgentRuntimeAdapter = {
+    const mockGeneralAgent: GeneralAgentAdapter = {
       run: async (input) => {
         if (input.kind === 'verify' || input.kind === 'soft') {
           seenVerify.push(input);
@@ -63,7 +63,7 @@ describe('runner mock-model smoke (example cases)', () => {
       include: ['**/*.yaml'],
       exclude: ['**/*.draft.yaml'],
       output: { summary: summaryPath },
-      agentRuntime: mockRuntime,
+      generalAgent: mockGeneralAgent,
       runtime: {
         prepareCartFixture: defineRuntime(async (ctx) => {
           const input = (ctx.input ?? {}) as { scenario?: string };
diff --git a/rfcs/0001-v2-testing-framework-phase0.md b/rfcs/0001-v2-testing-framework-phase0.md
index 650c91d60a..910b80291d 100644
--- a/rfcs/0001-v2-testing-framework-phase0.md
+++ b/rfcs/0001-v2-testing-framework-phase0.md
@@ -6,7 +6,7 @@
 
 > 本稿目标：把"动手前必须先定的接口"钉死成可评审的草案。每节末尾的 **🔶 待讨论** 是我留的开放决策点。
 
-> **实现状态（Phase 0）**：本稿契约已落地为新包 `@midscene/testing-framework`（`packages/testing-framework`），含 `defineMidsceneConfig` / `defineRuntime`、v2 YAML 解析、节点引擎（`ui`/`verify`/`soft`/`agent`/自定义）、上下文装配、verify fail-closed 判定、默认 Pi agent 运行时（已解决 C′），以及一个轻量 runner 与 CLI（`midscene-tf`）。可 copy 演示的样例在仓库根 `example/`。唯一开放项 C′ 已落实（见 §4.1）。
+> **实现状态（Phase 0）**：本稿契约已落地为新包 `@midscene/testing-framework`（`packages/testing-framework`），含 `defineMidsceneConfig` / `defineRuntime`、v2 YAML 解析、节点引擎（`ui`/`verify`/`soft`/`agent`/自定义）、上下文装配、verify fail-closed 判定、默认 Pi 通用 agent（已解决 C′），以及一个轻量 runner 与 CLI（`midscene-tf`）。可 copy 演示的样例在仓库根 `example/`。唯一开放项 C′ 已落实（见 §4.1）。
 
 ---
 
@@ -107,7 +107,7 @@ export default defineMidsceneConfig({
 
   // —— 扩展点 ——
   runtime?: Record<string, RuntimeNode>;         // 自定义 YAML 节点（§3）
-  agentRuntime?: AgentRuntimeAdapter;            // Pi 的替换点（§6）
+  generalAgent?: GeneralAgentAdapter;            // Pi 的替换点（§6）
 });
 ```
 
@@ -315,7 +315,7 @@ const model = registry.find('midscene', process.env.MIDSCENE_MODEL_NAME);
 const { session } = await createAgentSession({ model, modelRegistry: registry, authStorage, ... });
 ```
 
-这样 `verify`/`agent`（Pi）与 `ui`（Midscene UI Agent）走**同一个 `MIDSCENE_MODEL_BASE_URL` 端点**，零 Pi 改动。实现见 `@midscene/testing-framework` 的 `PiAgentRuntime`（`src/agent-runtime/pi-runtime.ts`），并有 `tests/smoke/pi-wiring.mjs` 验证（provider 注册 / apiKey 解析 / session 选模型 / `report_verdict` customTool 激活）均通过。
+这样 `verify`/`agent`（Pi）与 `ui`（Midscene UI Agent）走**同一个 `MIDSCENE_MODEL_BASE_URL` 端点**，零 Pi 改动。实现见 `@midscene/testing-framework` 的 `PiGeneralAgent`（`src/general-agent/pi-general-agent.ts`），并有 `tests/smoke/pi-wiring.mjs` 验证（provider 注册 / apiKey 解析 / session 选模型 / `report_verdict` customTool 激活）均通过。
 
 > 注：`MIDSCENE_MODEL_EXTRA_BODY_JSON`（如 `{"service_tier":"fast"}`）只对 `ui` 节点的 Midscene UI Agent 生效；Phase 0 未把它透传给 Pi 节点（属性能优化、非正确性，后续可经 stream `onPayload` 接入）。
 
@@ -455,7 +455,7 @@ flow:
 
 | # | 事项 | 状态 |
 |---|---|---|
-| C′ | Pi 能否指定自定义模型 **base URL**（对齐 `MIDSCENE_MODEL_BASE_URL`），让 verify/agent 与 ui 同端点 | ✅ **已落实**：经 `ModelRegistry.registerProvider({ baseUrl, apiKey, models })` 实现，见 §4.1 与 `PiAgentRuntime` |
+| C′ | Pi 能否指定自定义模型 **base URL**（对齐 `MIDSCENE_MODEL_BASE_URL`），让 verify/agent 与 ui 同端点 | ✅ **已落实**：经 `ModelRegistry.registerProvider({ baseUrl, apiKey, models })` 实现，见 §4.1 与 `PiGeneralAgent` |
 
 （无剩余待对接项。）
 
@@ -463,7 +463,7 @@ flow:
 
 ## 附：Phase 0 之后（不在本稿讨论范围，仅备忘）
 
-- Pi `AgentRuntimeAdapter` 的最小接口（让 Codex Agent SDK 等可替换）。
+- Pi `GeneralAgentAdapter` 的最小接口（让 Codex Agent SDK 等可替换）。
 - Rstest 接线：用例 → 虚拟测试模块 → 生命周期/fixture 映射。
 - 报告：复用 core `ReportGenerator`，把 verify verdict / agent 诊断如何呈现。
 - v1→v2 转译器（可选、外挂）。

From b6e42c96fd7d07c4cca1e6d65c8ee598a7842e60 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Thu, 4 Jun 2026 15:45:34 -0700
Subject: [PATCH 28/33] feat(testing-framework): warn instead of throw on
 missing uiAgent

A config without `uiAgent` is recoverable for flows that only use custom
runtime nodes, so `defineMidsceneConfig` now logs a warning (via the
`getDebug` console channel) rather than throwing during config load.

The UI Agent factory gains a clear guard so a case that actually needs
the UI Agent still fails with an actionable message instead of a cryptic
"cannot read 'type' of undefined" crash.
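
The split between a load-time warning and a use-time failure can be
sketched as below. This is a minimal illustration under assumptions, not
the framework's code: `SketchConfig`, `loadConfig`, and `requireUiAgent`
are hypothetical names, and warnings are collected in an array instead of
going through the `getDebug` channel.

```typescript
// Hypothetical stand-ins for illustration; not the framework's real types.
interface SketchConfig {
  uiAgent?: { type: string };
  testDir: string;
}

// Collected in memory here; the real change logs through a debug channel.
const warnings: string[] = [];

// Load time: a missing uiAgent is only reported, so configs whose flows use
// custom runtime nodes exclusively still load.
function loadConfig(config: SketchConfig): SketchConfig {
  if (!config.uiAgent) {
    warnings.push('config does not define a `uiAgent`');
  }
  return config;
}

// Use time: the first step that needs the UI Agent fails with an actionable
// message instead of a property access on undefined.
function requireUiAgent(config: SketchConfig): { type: string } {
  if (!config.uiAgent) {
    throw new Error(
      'This case needs a UI Agent, but `uiAgent` is not configured.',
    );
  }
  return config.uiAgent;
}

const loaded = loadConfig({ testDir: './e2e' });
```

Deferring the hard failure to the point of use keeps runtime-only suites
working while still giving UI-driven cases a readable error.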
---
 .../testing-framework/src/config/index.ts     |  9 +++++++--
 .../testing-framework/src/ui-agent/factory.ts | 10 +++++++++-
 .../tests/unit-test/config.test.ts            | 19 ++++++++++++++-----
 3 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/packages/testing-framework/src/config/index.ts b/packages/testing-framework/src/config/index.ts
index 8b33309714..2c8a6656ca 100644
--- a/packages/testing-framework/src/config/index.ts
+++ b/packages/testing-framework/src/config/index.ts
@@ -1,5 +1,8 @@
+import { getDebug } from '@midscene/shared/logger';
 import type { MidsceneConfig } from './types';
 
+const warn = getDebug('testing-framework:config', { console: true });
+
 /**
  * Identity helper for `midscene.config.ts`, giving full type inference and a
  * stable import surface (RFC §2).
@@ -9,8 +12,10 @@ export function defineMidsceneConfig(config: MidsceneConfig): MidsceneConfig {
     throw new Error('[midscene] defineMidsceneConfig expects a config object.');
   }
   if (!config.uiAgent) {
-    throw new Error(
-      '[midscene] midscene.config.ts must define a `uiAgent` (object or factory function).',
+    // A missing uiAgent is recoverable for some flows (e.g. cases that only use
+    // custom runtime nodes), so warn instead of failing the whole config load.
+    warn(
+      'midscene.config.ts does not define a `uiAgent` (object or factory function); ui/verify/soft/agent nodes will have no UI Agent to run against.',
     );
   }
   if (!config.testDir) {
diff --git a/packages/testing-framework/src/ui-agent/factory.ts b/packages/testing-framework/src/ui-agent/factory.ts
index 15020eca41..575408915a 100644
--- a/packages/testing-framework/src/ui-agent/factory.ts
+++ b/packages/testing-framework/src/ui-agent/factory.ts
@@ -14,10 +14,18 @@ export interface ResolvedUIAgent {
 }
 
 export async function createUIAgent(
-  uiAgent: UIAgent,
+  uiAgent: UIAgent | undefined,
   uiAgentOptions: UIAgentOptions | undefined,
   env: NodeJS.ProcessEnv,
 ): Promise<ResolvedUIAgent> {
+  if (!uiAgent) {
+    // `defineMidsceneConfig` only warns about a missing `uiAgent` (some flows
+    // use custom runtime nodes only). Once a case actually needs the UI Agent,
+    // fail with a clear, actionable message rather than a cryptic crash.
+    throw new Error(
+      '[midscene] This case needs a UI Agent, but `uiAgent` is not configured in midscene.config.ts. Add a `uiAgent` object or factory function.',
+    );
+  }
   if (typeof uiAgent === 'function') {
     // Programmatic factory: the project fully controls construction.
     const result = await uiAgent({ uiAgentOptions, env });
diff --git a/packages/testing-framework/tests/unit-test/config.test.ts b/packages/testing-framework/tests/unit-test/config.test.ts
index 046907629d..d9ff8fec0e 100644
--- a/packages/testing-framework/tests/unit-test/config.test.ts
+++ b/packages/testing-framework/tests/unit-test/config.test.ts
@@ -1,8 +1,12 @@
-import { describe, expect, it } from 'vitest';
+import { afterEach, describe, expect, it, vi } from 'vitest';
 import { defineMidsceneConfig } from '../../src/config';
 import { defineRuntime } from '../../src/runtime';
 
 describe('defineMidsceneConfig', () => {
+  afterEach(() => {
+    vi.restoreAllMocks();
+  });
+
   it('accepts a config-style uiAgent object', () => {
     const config = defineMidsceneConfig({
       uiAgent: { type: 'web', options: { url: 'https://x.test' } },
@@ -19,11 +23,16 @@ describe('defineMidsceneConfig', () => {
     expect(typeof config.uiAgent).toBe('function');
   });
 
-  it('throws without uiAgent', () => {
-    expect(() =>
+  it('warns but does not throw without uiAgent', () => {
+    const warnSpy = vi.spyOn(console, 'warn').mockImplementation(() => {});
+    const config =
       // @ts-expect-error intentionally missing
-      defineMidsceneConfig({ testDir: './e2e' }),
-    ).toThrow(/uiAgent/);
+      defineMidsceneConfig({ testDir: './e2e' });
+    expect(config.uiAgent).toBeUndefined();
+    expect(warnSpy).toHaveBeenCalledWith(
+      '[Midscene]',
+      expect.stringMatching(/uiAgent/),
+    );
   });
 
   it('throws without testDir', () => {

From 020bc0f3d39fc278c1c7620826517ba84a0c704b Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Thu, 4 Jun 2026 15:54:13 -0700
Subject: [PATCH 29/33] refactor(testing-framework): split RuntimeNode into
 (input, context)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A runtime node's own YAML value (`input`) was crammed into
`RuntimeNodeContext` alongside the ambient execution context (uiAgent,
outputs, state, result, env). Pull it out as a dedicated first
positional argument so the handler signature reads
`(input, context) => ...` — "what this node was invoked with" vs "what's
around it".

- RuntimeNode: `(ctx)` -> `(input, context)`
- RuntimeNodeContext: drop the `input` field
- update engine call site, unit/smoke tests, example config, RFC §3
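
A handler written against the new signature might look like the sketch
below. The types are simplified, hypothetical stand-ins (`SketchContext`,
`SketchResult`, `SketchRuntimeNode`); the real `RuntimeNodeContext` carries
more fields than shown here.

```typescript
// Simplified, hypothetical shapes; the real context has more fields.
interface SketchContext {
  state: Record<string, unknown>;
}

interface SketchResult {
  conclusion: string;
}

type SketchRuntimeNode = (
  input: unknown,
  context: SketchContext,
) => Promise<SketchResult>;

// `input` is the node's own YAML value; `context` is everything around it.
const prepareFixture: SketchRuntimeNode = async (rawInput, context) => {
  // The YAML value is untyped, so narrow it before use.
  const input = (rawInput ?? {}) as { scenario?: string };
  const scenario = input.scenario ?? 'default';
  // Engineering-facing state, shared with later runtime nodes.
  context.state.fixture = { scenario };
  // Context-facing conclusion, visible to later verify/agent nodes.
  return { conclusion: `prepared ${scenario} fixture` };
};
```

Handlers that ignore their YAML value can name the first parameter
`_input`, as the updated `notify` nodes in this patch do.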
---
 example/midscene.config.ts                             |  6 +++---
 packages/testing-framework/src/engine/run-node.ts      |  3 +--
 packages/testing-framework/src/runtime.ts              | 10 ++++++----
 .../testing-framework/tests/smoke/browser-smoke.mjs    |  6 +++---
 packages/testing-framework/tests/smoke/model-smoke.mjs |  8 ++++----
 .../testing-framework/tests/unit-test/engine.test.ts   |  4 ++--
 .../tests/unit-test/runner-smoke.test.ts               |  6 +++---
 rfcs/0001-v2-testing-framework-phase0.md               |  7 +++++--
 8 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/example/midscene.config.ts b/example/midscene.config.ts
index f610e01a75..2393ad1f7e 100644
--- a/example/midscene.config.ts
+++ b/example/midscene.config.ts
@@ -48,8 +48,8 @@ export default defineMidsceneConfig({
   runtime: {
     // A fixture-prep node: writes engineering state (not visible to the agent)
     // and a natural-language conclusion (visible to later verify/agent).
-    prepareCartFixture: defineRuntime(async (ctx) => {
-      const input = (ctx.input ?? {}) as { scenario?: string };
+    prepareCartFixture: defineRuntime(async (rawInput, ctx) => {
+      const input = (rawInput ?? {}) as { scenario?: string };
       const scenario = input.scenario ?? 'default';
       ctx.state.cartFixture = { id: `cart-${Date.now()}`, scenario };
 
@@ -60,7 +60,7 @@ export default defineMidsceneConfig({
     }),
 
     // A side-effect node that reads the accumulated case result.
-    notify: defineRuntime(async (ctx) => {
+    notify: defineRuntime(async (_input, ctx) => {
       const failed = ctx.result.steps.filter((s) => s.status === 'failed');
       return {
         conclusion:
diff --git a/packages/testing-framework/src/engine/run-node.ts b/packages/testing-framework/src/engine/run-node.ts
index 1dfe2cf4d2..659e1004d2 100644
--- a/packages/testing-framework/src/engine/run-node.ts
+++ b/packages/testing-framework/src/engine/run-node.ts
@@ -164,7 +164,6 @@ async function runCustomNode(
   }
 
   const ctx: RuntimeNodeContext = {
-    input,
     uiAgent: deps.uiAgent,
     outputs: deps.outputs,
     state: deps.state,
@@ -177,7 +176,7 @@ async function runCustomNode(
   };
 
   // A runtime node that throws fails the case (RFC §8).
-  const result = await runtimeNode(ctx);
+  const result = await runtimeNode(input, ctx);
   return {
     status: 'info',
     output: { text: result.conclusion, structured: result.output },
diff --git a/packages/testing-framework/src/runtime.ts b/packages/testing-framework/src/runtime.ts
index 259f6abf3b..b09176d2f0 100644
--- a/packages/testing-framework/src/runtime.ts
+++ b/packages/testing-framework/src/runtime.ts
@@ -1,7 +1,9 @@
 /**
  * `defineRuntime` — custom YAML nodes (RFC §3).
  *
- * A runtime node owns a whole step's execution. It has two channels:
+ * A runtime node owns a whole step's execution. Its handler receives two
+ * arguments: `input` (this node's own YAML value) and `context` (the ambient
+ * execution context). It has two output channels:
  *  - `conclusion` (+ optional `output`): context-facing, flows into later
  *    verify/agent nodes.
  *  - `state`: engineering-facing TypeScript state shared between runtime
@@ -10,8 +12,6 @@
 import type { Agent, OutputStore, TestResultSoFar } from './types';
 
 export interface RuntimeNodeContext {
-  /** This node's YAML value (string or object). */
-  input: unknown;
   /** The UI Agent — runtime nodes may also drive the page. */
   uiAgent: Agent;
   /** All past context-facing outputs (read-only). */
@@ -35,7 +35,9 @@ export interface RuntimeNodeResult {
 }
 
 export type RuntimeNode = (
-  ctx: RuntimeNodeContext,
+  /** This node's YAML value (string or object). */
+  input: unknown,
+  context: RuntimeNodeContext,
 ) => Promise<RuntimeNodeResult>;
 
 /**
diff --git a/packages/testing-framework/tests/smoke/browser-smoke.mjs b/packages/testing-framework/tests/smoke/browser-smoke.mjs
index 8d23940d4a..830551165b 100644
--- a/packages/testing-framework/tests/smoke/browser-smoke.mjs
+++ b/packages/testing-framework/tests/smoke/browser-smoke.mjs
@@ -83,9 +83,9 @@ flow:
     uiAgent: agent,
     generalAgent: stubGeneralAgent,
     runtimeNodes: {
-      prepareCartFixture: defineRuntime(async (ctx) => {
-        ctx.state.fixture = { scenario: ctx.input?.scenario };
-        return { conclusion: `prepared ${ctx.input?.scenario} fixture` };
+      prepareCartFixture: defineRuntime(async (input, ctx) => {
+        ctx.state.fixture = { scenario: input?.scenario };
+        return { conclusion: `prepared ${input?.scenario} fixture` };
       }),
     },
     projectRoot: repoRoot,
diff --git a/packages/testing-framework/tests/smoke/model-smoke.mjs b/packages/testing-framework/tests/smoke/model-smoke.mjs
index b7ecf8aaa5..1e10798417 100644
--- a/packages/testing-framework/tests/smoke/model-smoke.mjs
+++ b/packages/testing-framework/tests/smoke/model-smoke.mjs
@@ -46,13 +46,13 @@ const summary = await runAll(
       generateReport: true,
     },
     runtime: {
-      prepareCartFixture: async (ctx) => {
-        ctx.state.cartFixture = { scenario: ctx.input?.scenario };
+      prepareCartFixture: async (input, ctx) => {
+        ctx.state.cartFixture = { scenario: input?.scenario };
         return {
-          conclusion: `Prepared a "${ctx.input?.scenario}" cart fixture.`,
+          conclusion: `Prepared a "${input?.scenario}" cart fixture.`,
         };
       },
-      notify: async (ctx) => {
+      notify: async (_input, ctx) => {
         const failed = ctx.result.steps.filter((s) => s.status === 'failed');
         return {
           conclusion:
diff --git a/packages/testing-framework/tests/unit-test/engine.test.ts b/packages/testing-framework/tests/unit-test/engine.test.ts
index f3f365f27c..d2371937e3 100644
--- a/packages/testing-framework/tests/unit-test/engine.test.ts
+++ b/packages/testing-framework/tests/unit-test/engine.test.ts
@@ -155,10 +155,10 @@ describe('runCase node semantics', () => {
     const result = await runCase({
       ...base,
       runtimeNodes: {
-        prep: defineRuntime(async (ctx) => {
+        prep: defineRuntime(async (input, ctx) => {
           ctx.state.fixtureId = 'fx-1';
           return {
-            conclusion: `prepared ${(ctx.input as { scenario: string }).scenario}`,
+            conclusion: `prepared ${(input as { scenario: string }).scenario}`,
           };
         }),
       },
diff --git a/packages/testing-framework/tests/unit-test/runner-smoke.test.ts b/packages/testing-framework/tests/unit-test/runner-smoke.test.ts
index d6118daadd..54cbe32286 100644
--- a/packages/testing-framework/tests/unit-test/runner-smoke.test.ts
+++ b/packages/testing-framework/tests/unit-test/runner-smoke.test.ts
@@ -65,12 +65,12 @@ describe('runner mock-model smoke (example cases)', () => {
       output: { summary: summaryPath },
       generalAgent: mockGeneralAgent,
       runtime: {
-        prepareCartFixture: defineRuntime(async (ctx) => {
-          const input = (ctx.input ?? {}) as { scenario?: string };
+        prepareCartFixture: defineRuntime(async (rawInput, ctx) => {
+          const input = (rawInput ?? {}) as { scenario?: string };
           ctx.state.cartFixture = { scenario: input.scenario };
           return { conclusion: `Prepared a "${input.scenario}" cart fixture.` };
         }),
-        notify: defineRuntime(async (ctx) => {
+        notify: defineRuntime(async (_input, ctx) => {
           const failed = ctx.result.steps.filter((s) => s.status === 'failed');
           return {
             conclusion: failed.length === 0 ? 'no alert needed' : 'would alert',
diff --git a/rfcs/0001-v2-testing-framework-phase0.md b/rfcs/0001-v2-testing-framework-phase0.md
index 910b80291d..f6330b5a1e 100644
--- a/rfcs/0001-v2-testing-framework-phase0.md
+++ b/rfcs/0001-v2-testing-framework-phase0.md
@@ -215,10 +215,12 @@ flow:
 ## 3. `defineRuntime` —— 自定义节点（更底层扩展）
 
 ```ts
-type RuntimeNode = (ctx: RuntimeNodeContext) => Promise<RuntimeNodeResult>;
+type RuntimeNode = (
+  input: unknown,                 // 该节点的 YAML 值（字符串或 object）
+  context: RuntimeNodeContext,
+) => Promise<RuntimeNodeResult>;
 
 interface RuntimeNodeContext {
-  input: unknown;                 // 该节点的 YAML 值（字符串或 object）
   uiAgent: Agent;                 // UI Agent，runtime 也可驱动页面
   outputs: OutputStore;           // 所有过往"面向上下文的输出"（只读）
   state: Record<string, unknown>; // ★ TS 侧状态，agent 看不到（见 §7）
@@ -235,6 +237,7 @@ function defineRuntime(node: RuntimeNode): RuntimeNode;
 ```
 
 要点：
+- 节点入参拆成两个：`input`（这个节点自己的 YAML 值）+ `context`（环境上下文）。
 - `conclusion`（和可选 `output`）= **面向上下文信道**，进后续 `verify` / `agent`。
 - `state` = **面向工程信道**，runtime 节点之间传结构化数据，**不进 agent 上下文**。
 - runtime 抛错 → 该 case 失败。

From a2cf59fb17e9b8b8917b57a785560399da925448 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Fri, 5 Jun 2026 00:13:21 -0700
Subject: [PATCH 30/33] feat(core): expose canonical per-platform connection
 option types

Add `WebConnectionOpt` / `AndroidConnectionOpt` / `IOSConnectionOpt` /
`ComputerConnectionOpt` / `HarmonyConnectionOpt`: the pure "how to reach
the target" shapes, derived from the `MidsceneYamlScript*Env` types with
the YAML run config and agent-behavior options stripped out.
`HarmonyConnectionOpt` is defined in `yaml.ts` but not added to the
`index.ts` re-exports here. The types stay in sync with the env types
automatically (derived via `Omit`) and give consumers a connection-only
contract without the YAML/agent-opt baggage the env type names imply.

Also export `MidsceneYamlScriptComputerEnv`, which was missing from the
public surface while its web/android/ios siblings were already exported.
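
The derivation technique can be sketched in isolation as follows. All type
names here (`SketchRunConfig`, `SketchAgentOpt`, `SketchWebEnv`,
`SketchWebConnection`) are hypothetical, trimmed-down stand-ins, not the
actual core types; only the use of `Omit` with `keyof` mirrors the patch.

```typescript
// Hypothetical, trimmed-down stand-ins for the env-side types.
interface SketchRunConfig {
  output?: string;
}

interface SketchAgentOpt {
  generateReport?: boolean;
}

interface SketchWebEnv extends SketchRunConfig, SketchAgentOpt {
  url: string;
  userAgent?: string;
}

// Connection-only view: every field of the env type except the run-config
// and agent-behavior keys. A connection field added to SketchWebEnv later
// shows up here without further edits.
type SketchWebConnection = Omit<
  SketchWebEnv,
  keyof SketchRunConfig | keyof SketchAgentOpt
>;

const target: SketchWebConnection = {
  url: 'https://example.test',
  userAgent: 'sketch-agent',
};

// Uncommenting the next line is a compile error, because `output` was
// stripped from the connection type:
// const bad: SketchWebConnection = { url: 'https://example.test', output: 'x' };
```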
---
 packages/core/src/index.ts |  5 +++++
 packages/core/src/yaml.ts  | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/packages/core/src/index.ts b/packages/core/src/index.ts
index ff12df9d4f..2cc04c7703 100644
--- a/packages/core/src/index.ts
+++ b/packages/core/src/index.ts
@@ -52,7 +52,12 @@ export type {
   MidsceneYamlScriptWebEnv,
   MidsceneYamlScriptAndroidEnv,
   MidsceneYamlScriptIOSEnv,
+  MidsceneYamlScriptComputerEnv,
   MidsceneYamlScriptEnv,
+  WebConnectionOpt,
+  AndroidConnectionOpt,
+  IOSConnectionOpt,
+  ComputerConnectionOpt,
   LocateOption,
   DetailedLocateParam,
 } from './yaml';
diff --git a/packages/core/src/yaml.ts b/packages/core/src/yaml.ts
index b12550457b..8168eb1041 100644
--- a/packages/core/src/yaml.ts
+++ b/packages/core/src/yaml.ts
@@ -227,6 +227,42 @@ export type MidsceneYamlScriptEnv =
   | MidsceneYamlScriptHarmonyEnv
   | MidsceneYamlScriptComputerEnv;
 
+/**
+ * Canonical per-platform connection / launch target options.
+ *
+ * These are the pure "how to reach the target" types: the same fields the
+ * `MidsceneYamlScript*Env` types carry, but with the YAML run config
+ * (`MidsceneYamlScriptConfig`: output, unstableLogContent) and the agent
+ * behavior options (`MidsceneYamlScriptAgentOpt`: generateReport, cache, ...)
+ * stripped out. Use these when you only need to describe the connection target
+ * and want agent behavior expressed separately (e.g. via `AgentOpt`). They are
+ * derived from the env types, so they stay in sync automatically.
+ */
+export type WebConnectionOpt = Omit<
+  MidsceneYamlScriptWebEnv,
+  keyof MidsceneYamlScriptConfig | keyof MidsceneYamlScriptAgentOpt
+>;
+
+export type AndroidConnectionOpt = Omit<
+  MidsceneYamlScriptAndroidEnv,
+  keyof MidsceneYamlScriptConfig
+>;
+
+export type IOSConnectionOpt = Omit<
+  MidsceneYamlScriptIOSEnv,
+  keyof MidsceneYamlScriptConfig
+>;
+
+export type HarmonyConnectionOpt = Omit<
+  MidsceneYamlScriptHarmonyEnv,
+  keyof MidsceneYamlScriptConfig
+>;
+
+export type ComputerConnectionOpt = Omit<
+  MidsceneYamlScriptComputerEnv,
+  keyof MidsceneYamlScriptConfig
+>;
+
 export interface MidsceneYamlFlowItemAIAction {
   // defined as aiAction for backward compatibility
   aiAction?: string;

From ec331b356471dd8349d0ce668037f0efb975e38c Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Fri, 5 Jun 2026 00:13:36 -0700
Subject: [PATCH 31/33] refactor(testing-framework): type uiAgent options via
 core connection types
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

`UIAgentConfig.options` was a hand-rolled `Record<string, unknown>`,
disconnected from the real agent launcher inputs and forcing
`as unknown as Parameters<...>` casts in the factory.

Make `UIAgentConfig` a discriminated union keyed by `type`, with each
variant's `options` typed against the canonical connection types from
`@midscene/core` (`WebConnectionOpt` / `AndroidConnectionOpt` / ...).
`UIAgentType` is derived from the union; web's `url` is required.

Also drop the now-unnecessary `agent as unknown as Agent` cast in the web
factory: `PuppeteerAgent` is `@midscene/core`'s `Agent<PuppeteerWebPage>`,
so once options are typed it is directly assignable to `Agent` — the cast
was only needed because the prior `Record` casts poisoned the return type.

Updates config test and RFC §2/§2.1.

Validation: nx build core, testing-framework build + tsc + 33 unit tests,
pnpm lint.
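
How the discriminated union removes the casts can be sketched as below.
`SketchWebTarget`, `SketchAndroidTarget`, `SketchUIAgentConfig`, and
`describeTarget` are hypothetical names used only for illustration; the
real variants are typed against the `@midscene/core` connection types.

```typescript
// Hypothetical connection shapes standing in for the core types.
interface SketchWebTarget {
  url: string;
}

interface SketchAndroidTarget {
  deviceId?: string;
}

type SketchUIAgentConfig =
  | { type: 'web'; options: SketchWebTarget }
  | { type: 'android'; options?: SketchAndroidTarget };

// Derived from the union, so adding a variant extends this automatically.
type SketchUIAgentType = SketchUIAgentConfig['type'];

const supportedTypes: SketchUIAgentType[] = ['web', 'android'];

function describeTarget(config: SketchUIAgentConfig): string {
  switch (config.type) {
    case 'web':
      // Narrowed: `options` is required here and `url` is a string.
      return `web:${config.options.url}`;
    case 'android':
      // Narrowed: `options` may be absent, so fall back explicitly.
      return `android:${config.options?.deviceId ?? 'default-device'}`;
  }
}
```

Because each `case` sees the variant's own `options` type, the per-platform
helpers can accept typed options directly instead of re-casting a `Record`.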
---
 .../testing-framework/src/config/types.ts     | 32 +++++++++++++------
 .../testing-framework/src/ui-agent/factory.ts | 29 ++++++++---------
 .../tests/unit-test/config.test.ts            |  6 ++--
 rfcs/0001-v2-testing-framework-phase0.md      | 11 +++++--
 4 files changed, 49 insertions(+), 29 deletions(-)

diff --git a/packages/testing-framework/src/config/types.ts b/packages/testing-framework/src/config/types.ts
index 317966e05e..d86db8ba37 100644
--- a/packages/testing-framework/src/config/types.ts
+++ b/packages/testing-framework/src/config/types.ts
@@ -2,23 +2,37 @@
  * `midscene.config.ts` schema (RFC §2). Environment / target lives here, never
  * in the case YAML.
  */
+import type {
+  AndroidConnectionOpt,
+  ComputerConnectionOpt,
+  IOSConnectionOpt,
+  WebConnectionOpt,
+} from '@midscene/core';
 import type { Agent } from '@midscene/core/agent';
 import type { AgentOpt } from '@midscene/core/agent';
 import type { GeneralAgentAdapter } from '../general-agent/types';
 import type { RuntimeNode } from '../runtime';
 
-/** Platforms the framework can build a UI Agent for out of the box. */
-export type UIAgentType = 'web' | 'android' | 'ios' | 'computer';
-
 /** Shared UI Agent behavior parameters (aiActContext, generateReport, ...). */
 export type UIAgentOptions = AgentOpt;
 
-/** Configuration-style UI Agent: framework builds it from `type` + `options`. */
-export interface UIAgentConfig {
-  type: UIAgentType;
-  /** Platform connection parameters (url, deviceId, ...). */
-  options?: Record<string, unknown>;
-}
+/**
+ * Configuration-style UI Agent: the framework builds the agent from `type` +
+ * `options`. `options` is the platform connection target, typed against the
+ * canonical per-platform connection types from `@midscene/core`
+ * (`WebConnectionOpt` / `AndroidConnectionOpt` / ...). Those are the pure
+ * "how to reach the target" shapes the agent launchers consume — agent
+ * behavior is expressed separately via `uiAgentOptions`. Keeping `options`
+ * bound to core means it can never drift from the launcher inputs.
+ */
+export type UIAgentConfig =
+  | { type: 'web'; options: WebConnectionOpt }
+  | { type: 'android'; options?: AndroidConnectionOpt }
+  | { type: 'ios'; options?: IOSConnectionOpt }
+  | { type: 'computer'; options?: ComputerConnectionOpt };
+
+/** Platforms the framework can build a UI Agent for out of the box. */
+export type UIAgentType = UIAgentConfig['type'];
 
 /** Context passed to a programmatic UI Agent factory. */
 export interface UIAgentFactoryCtx {
diff --git a/packages/testing-framework/src/ui-agent/factory.ts b/packages/testing-framework/src/ui-agent/factory.ts
index 575408915a..5629026ef7 100644
--- a/packages/testing-framework/src/ui-agent/factory.ts
+++ b/packages/testing-framework/src/ui-agent/factory.ts
@@ -5,6 +5,7 @@
  * (programmatic). This module resolves both into a live Midscene UI Agent plus
  * an optional cleanup hook.
  */
+import type { AndroidConnectionOpt, WebConnectionOpt } from '@midscene/core';
 import type { Agent } from '@midscene/core/agent';
 import type { UIAgent, UIAgentConfig, UIAgentOptions } from '../config/types';
 
@@ -46,9 +47,9 @@ async function createFromConfig(
 ): Promise<ResolvedUIAgent> {
   switch (config.type) {
     case 'web':
-      return createWebAgent(config, uiAgentOptions);
+      return createWebAgent(config.options, uiAgentOptions);
     case 'android':
-      return createAndroidAgent(config, uiAgentOptions);
+      return createAndroidAgent(config.options, uiAgentOptions);
     case 'ios':
     case 'computer':
       throw new Error(
@@ -62,11 +63,10 @@ async function createFromConfig(
 }
 
 async function createWebAgent(
-  config: UIAgentConfig,
+  options: WebConnectionOpt,
   uiAgentOptions: UIAgentOptions | undefined,
 ): Promise<ResolvedUIAgent> {
-  const options = (config.options ?? {}) as Record<string, unknown>;
-  if (!options.url) {
+  if (!options?.url) {
     throw new Error('[midscene] uiAgent.type "web" requires `options.url`.');
   }
 
@@ -80,14 +80,12 @@ async function createWebAgent(
   }
 
   const { agent, freeFn } = await mod.puppeteerAgentForTarget(
-    options as unknown as Parameters<typeof mod.puppeteerAgentForTarget>[0],
-    uiAgentOptions as unknown as Parameters<
-      typeof mod.puppeteerAgentForTarget
-    >[1],
+    options,
+    uiAgentOptions,
   );
 
   return {
-    agent: agent as unknown as Agent,
+    agent,
     cleanup: async () => {
       for (const free of freeFn) {
         try {
@@ -101,10 +99,10 @@ async function createWebAgent(
 }
 
 async function createAndroidAgent(
-  config: UIAgentConfig,
+  options: AndroidConnectionOpt | undefined,
   uiAgentOptions: UIAgentOptions | undefined,
 ): Promise<ResolvedUIAgent> {
-  const options = (config.options ?? {}) as Record<string, unknown>;
+  const env = options ?? {};
   // `@midscene/android` is an optional peer; load it loosely so the framework
   // does not hard-depend on it.
   const spec = '@midscene/android';
@@ -122,10 +120,9 @@ async function createAndroidAgent(
     );
   }
 
-  const deviceId = options.deviceId as string | undefined;
-  const agent = await mod.agentFromAdbDevice(deviceId, {
-    ...(uiAgentOptions as object),
-    ...options,
+  const agent = await mod.agentFromAdbDevice(env.deviceId, {
+    ...uiAgentOptions,
+    ...env,
   });
 
   return {
diff --git a/packages/testing-framework/tests/unit-test/config.test.ts b/packages/testing-framework/tests/unit-test/config.test.ts
index d9ff8fec0e..ae771f5696 100644
--- a/packages/testing-framework/tests/unit-test/config.test.ts
+++ b/packages/testing-framework/tests/unit-test/config.test.ts
@@ -37,8 +37,10 @@ describe('defineMidsceneConfig', () => {
 
   it('throws without testDir', () => {
     expect(() =>
-      // @ts-expect-error intentionally missing
-      defineMidsceneConfig({ uiAgent: { type: 'web' } }),
+      // @ts-expect-error intentionally missing testDir
+      defineMidsceneConfig({
+        uiAgent: { type: 'web', options: { url: 'https://x.test' } },
+      }),
     ).toThrow(/testDir/);
   });
 });
diff --git a/rfcs/0001-v2-testing-framework-phase0.md b/rfcs/0001-v2-testing-framework-phase0.md
index f6330b5a1e..cf1ccbcc8b 100644
--- a/rfcs/0001-v2-testing-framework-phase0.md
+++ b/rfcs/0001-v2-testing-framework-phase0.md
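@@ -79,8 +79,15 @@ import { defineMidsceneConfig } from '@midscene/testing-framework';
 
 export default defineMidsceneConfig({
   // Run target: a single uiAgent field that holds both the declarative and the programmatic form (see §2.1)
+  // Declarative form, discriminated by `type`: `options` reuses the per-platform connection types from @midscene/core
+  // (WebConnectionOpt / AndroidConnectionOpt / IOSConnectionOpt /
+  // ComputerConnectionOpt), i.e. the env types with agent behavior and YAML config stripped out:
+  // a pure connection contract shared with the agent launcher parameters, no longer a hand-written Record.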
   uiAgent:
-    | { type: 'web' | 'android' | 'ios' | 'computer'; options: Record<string, unknown> }
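+    | { type: 'web'; options: WebConnectionOpt }
+    | { type: 'android'; options?: AndroidConnectionOpt }
+    | { type: 'ios'; options?: IOSConnectionOpt }
+    | { type: 'computer'; options?: ComputerConnectionOpt }
     | ((ctx: UIAgentFactoryCtx) => Promise<{ agent: Agent }>);
 
   // Test discovery
@@ -120,7 +127,7 @@ export default defineMidsceneConfig({
 - If the value is an **object** → declarative: the framework creates the UI Agent from `type + options`.
 - If the value is a **function** → programmatic: the project fully controls construction.
 
-Both forms share a single key, and at the type level it is simply a union, which removes the smell of "two definitions of the run target" at the root. `options` (platform connection parameters such as url / deviceId) and `uiAgentOptions` (agent behavior such as aiActContext / generateReport) are two different kinds of thing, and both are kept.
+Both forms share a single key, and at the type level it is simply a union, which removes the smell of "two definitions of the run target" at the root. `options` (platform connection parameters such as url / deviceId) and `uiAgentOptions` (agent behavior such as aiActContext / generateReport) are two different kinds of thing, and both are kept. `options` is not a hand-written `Record`; once discriminated by `type`, it resolves to the per-platform connection types exposed by `@midscene/core` (`WebConnectionOpt` / `AndroidConnectionOpt` / …). These are derived from the corresponding env types with agent behavior and YAML config stripped out, leaving a pure connection contract (which is why `url` is required for web). Changes to the core types are picked up here immediately, and unrelated fields such as agent-opt / output are no longer mixed into the connection parameters.
 
 **Declarative example:**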
 

From 410c259aa465e1ba938d650842fa9c51477c40e9 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Fri, 5 Jun 2026 00:30:15 -0700
Subject: [PATCH 32/33] refactor(core): make connection options first-class,
 env composed from them
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The previous commit derived the `*ConnectionOpt` types from the
`MidsceneYamlScript*Env` types via `Omit<...>`, which framed the YAML env
as the source of truth and the connection options as a byproduct.

Invert that: the `*ConnectionOpt` types are now first-class interfaces in
a dedicated `connection-options.ts` module (web fields + JSDoc moved
there; native ones extend `Omit<*DeviceOpt, 'customActions'>`). The
`MidsceneYamlScript*Env` types are composed FROM them via `extends`
(env = connection + YAML run config + agent behavior for web).

Layering now reads: AndroidDeviceOpt (driver) ⊂ AndroidConnectionOpt
(connection) ⊂ MidsceneYamlScriptAndroidEnv (YAML flavor). Env shapes are
structurally identical — no behavior change, no impact on CLI /
ScriptPlayer / web-integration.
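
As a rough illustration of that layering (a sketch only: the `*Sketch`
names and the reduced field sets are hypothetical stand-ins, not the real
`@midscene/core` declarations), composition via `extends` might look like:

```typescript
// Hypothetical, trimmed-down stand-ins for the real interfaces.
interface AndroidDeviceOptSketch {
  androidAdbPath?: string;
  customActions?: unknown[]; // driver-only field, excluded from the connection type
}

// Connection = driver options (minus customActions) + which device + what to launch.
interface AndroidConnectionOptSketch
  extends Omit<AndroidDeviceOptSketch, 'customActions'> {
  deviceId?: string;
  launch?: string;
}

// YAML run config, kept separate from the connection contract.
interface YamlRunConfigSketch {
  output?: string;
}

// Env is composed FROM the connection type rather than the other way round.
interface AndroidEnvSketch
  extends AndroidConnectionOptSketch,
    YamlRunConfigSketch {}

// Accepts the connection contract only; an env value also satisfies it.
function describeConnection(opt: AndroidConnectionOptSketch): string {
  return `${opt.deviceId ?? 'first device'} -> ${opt.launch ?? 'current screen'}`;
}

const env: AndroidEnvSketch = {
  deviceId: 'emulator-5554',
  launch: 'com.example.app',
  output: './out.json',
};

console.log(describeConnection(env));
```

Because the env type extends the connection type, code that only needs the
connection can accept either value without a cast.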

Validation: nx build core, nx test core (only pre-existing auto-glm prompt
snapshot failures, unrelated), testing-framework tsc + 33 unit tests,
pnpm lint.
---
 packages/core/src/connection-options.ts  | 108 +++++++++++++++++
 packages/core/src/index.ts               |  10 +-
 packages/core/src/yaml.ts                | 145 ++++-------------------
 rfcs/0001-v2-testing-framework-phase0.md |   7 +-
 4 files changed, 139 insertions(+), 131 deletions(-)
 create mode 100644 packages/core/src/connection-options.ts

diff --git a/packages/core/src/connection-options.ts b/packages/core/src/connection-options.ts
new file mode 100644
index 0000000000..a872755b13
--- /dev/null
+++ b/packages/core/src/connection-options.ts
@@ -0,0 +1,108 @@
+/**
+ * Canonical per-platform connection / launch target options.
+ *
+ * These are the first-class "how to reach the target" types. They describe the
+ * connection only — agent behavior (`AgentOpt`) and YAML run config
+ * (`MidsceneYamlScriptConfig`) are expressed separately. The
+ * `MidsceneYamlScript*Env` types in `./yaml` are composed FROM these (env =
+ * connection + run config + agent behavior), so the connection options are the
+ * source of truth, not a byproduct of the YAML schema.
+ */
+import type {
+  AndroidDeviceOpt,
+  HarmonyDeviceOpt,
+  IOSDeviceOpt,
+} from './device';
+
+/** How to reach / launch a web target. */
+export interface WebConnectionOpt {
+  // for web only
+  serve?: string;
+  url: string;
+
+  // puppeteer only
+  userAgent?: string;
+  acceptInsecureCerts?: boolean;
+  viewportWidth?: number;
+  viewportHeight?: number;
+  deviceScaleFactor?: number;
+  waitForNetworkIdle?: {
+    timeout?: number;
+    continueOnNetworkIdleError?: boolean; // whether to continue if waiting for network idle fails; defaults to true
+  };
+  cookie?: string;
+  forceSameTabNavigation?: boolean; // whether to keep navigation in the current tab instead of following newly opened tabs; defaults to true in YAML scripts
+
+  /**
+   * Custom Chrome launch arguments (Puppeteer only, not supported in bridge mode).
+   *
+   * Allows passing custom command-line arguments to Chrome/Chromium when launching the browser.
+   * This is useful for testing scenarios that require specific browser configurations.
+   *
+   * ⚠️ Security Warning: Some arguments (e.g., --no-sandbox, --disable-web-security) may
+   * reduce browser security. Use only in controlled testing environments.
+   *
+   * @example
+   * ```yaml
+   * web:
+   *   url: https://example.com
+   *   chromeArgs:
+   *     - '--disable-features=ThirdPartyCookiePhaseout'
+   *     - '--disable-features=SameSiteByDefaultCookies'
+   *     - '--window-size=1920,1080'
+   * ```
+   */
+  chromeArgs?: string[];
+
+  // bridge mode config
+  bridgeMode?: false | 'newTabWithUrl' | 'currentTab';
+  closeNewTabsAfterDisconnect?: boolean;
+
+  /**
+   * CDP (Chrome DevTools Protocol) endpoint URL.
+   * When specified, connects to an existing Chrome browser via CDP instead of launching a new one.
+   *
+   * @example
+   * ```yaml
+   * web:
+   *   url: https://example.com
+   *   cdpEndpoint: ws://localhost:9222/devtools/browser/xxxx
+   * ```
+   */
+  cdpEndpoint?: string;
+}
+
+/** How to reach / launch an Android target (device driver options + which device + what to launch). */
+export interface AndroidConnectionOpt
+  extends Omit<AndroidDeviceOpt, 'customActions'> {
+  // The Android device ID to connect to, optional, will use the first device if not specified
+  deviceId?: string;
+
+  // The URL or app package to launch, optional, will use the current screen if not specified
+  launch?: string;
+}
+
+/** How to reach / launch an iOS target. */
+export interface IOSConnectionOpt extends Omit<IOSDeviceOpt, 'customActions'> {
+  // The URL or app bundle ID to launch, optional, will use the current screen if not specified
+  launch?: string;
+}
+
+/** How to reach / launch a HarmonyOS target. */
+export interface HarmonyConnectionOpt
+  extends Omit<HarmonyDeviceOpt, 'customActions'> {
+  // The HarmonyOS device ID to connect to, optional, will use the first device if not specified
+  deviceId?: string;
+
+  // The app package to launch, optional, will use the current screen if not specified
+  launch?: string;
+
+  // Custom mapping of app names to bundle names, user-provided mappings take precedence over defaults
+  appNameMapping?: Record<string, string>;
+}
+
+/** How to reach a computer target. */
+export interface ComputerConnectionOpt {
+  // The display ID to use, optional, will use the primary display if not specified
+  displayId?: string;
+}
diff --git a/packages/core/src/index.ts b/packages/core/src/index.ts
index 2cc04c7703..295cb70a2a 100644
--- a/packages/core/src/index.ts
+++ b/packages/core/src/index.ts
@@ -54,13 +54,17 @@ export type {
   MidsceneYamlScriptIOSEnv,
   MidsceneYamlScriptComputerEnv,
   MidsceneYamlScriptEnv,
+  LocateOption,
+  DetailedLocateParam,
+} from './yaml';
+
+export type {
   WebConnectionOpt,
   AndroidConnectionOpt,
   IOSConnectionOpt,
+  HarmonyConnectionOpt,
   ComputerConnectionOpt,
-  LocateOption,
-  DetailedLocateParam,
-} from './yaml';
+} from './connection-options';
 
 export { Agent, type AgentOpt, type AiActOptions, createAgent } from './agent';
 
diff --git a/packages/core/src/yaml.ts b/packages/core/src/yaml.ts
index 8168eb1041..525f03dca1 100644
--- a/packages/core/src/yaml.ts
+++ b/packages/core/src/yaml.ts
@@ -1,9 +1,11 @@
 import type { TMultimodalPrompt, TUserPrompt } from './common';
 import type {
-  AndroidDeviceOpt,
-  HarmonyDeviceOpt,
-  IOSDeviceOpt,
-} from './device';
+  AndroidConnectionOpt,
+  ComputerConnectionOpt,
+  HarmonyConnectionOpt,
+  IOSConnectionOpt,
+  WebConnectionOpt,
+} from './connection-options';
 import type { AgentOpt, LocateResultElement, Rect } from './types';
 import type { UIContext } from './types';
 
@@ -125,100 +127,29 @@ export interface MidsceneYamlScriptEnvGeneralInterface {
   param?: Record<string, any>;
 }
 
+// The YAML-script env types are the connection options plus the YAML run
+// config (and, for web, agent behavior). Connection options are the source of
+// truth — see `./connection-options`.
 export interface MidsceneYamlScriptWebEnv
-  extends MidsceneYamlScriptConfig,
-    MidsceneYamlScriptAgentOpt {
-  // for web only
-  serve?: string;
-  url: string;
-
-  // puppeteer only
-  userAgent?: string;
-  acceptInsecureCerts?: boolean;
-  viewportWidth?: number;
-  viewportHeight?: number;
-  deviceScaleFactor?: number;
-  waitForNetworkIdle?: {
-    timeout?: number;
-    continueOnNetworkIdleError?: boolean; // should continue if failed to wait for network idle, true for default
-  };
-  cookie?: string;
-  forceSameTabNavigation?: boolean; // if track the newly opened tab, true for default in yaml script
-
-  /**
-   * Custom Chrome launch arguments (Puppeteer only, not supported in bridge mode).
-   *
-   * Allows passing custom command-line arguments to Chrome/Chromium when launching the browser.
-   * This is useful for testing scenarios that require specific browser configurations.
-   *
-   * ⚠️ Security Warning: Some arguments (e.g., --no-sandbox, --disable-web-security) may
-   * reduce browser security. Use only in controlled testing environments.
-   *
-   * @example
-   * ```yaml
-   * web:
-   *   url: https://example.com
-   *   chromeArgs:
-   *     - '--disable-features=ThirdPartyCookiePhaseout'
-   *     - '--disable-features=SameSiteByDefaultCookies'
-   *     - '--window-size=1920,1080'
-   * ```
-   */
-  chromeArgs?: string[];
-
-  // bridge mode config
-  bridgeMode?: false | 'newTabWithUrl' | 'currentTab';
-  closeNewTabsAfterDisconnect?: boolean;
-
-  /**
-   * CDP (Chrome DevTools Protocol) endpoint URL.
-   * When specified, connects to an existing Chrome browser via CDP instead of launching a new one.
-   *
-   * @example
-   * ```yaml
-   * web:
-   *   url: https://example.com
-   *   cdpEndpoint: ws://localhost:9222/devtools/browser/xxxx
-   * ```
-   */
-  cdpEndpoint?: string;
-}
+  extends WebConnectionOpt,
+    MidsceneYamlScriptConfig,
+    MidsceneYamlScriptAgentOpt {}
 
 export interface MidsceneYamlScriptAndroidEnv
-  extends MidsceneYamlScriptConfig,
-    Omit<AndroidDeviceOpt, 'customActions'> {
-  // The Android device ID to connect to, optional, will use the first device if not specified
-  deviceId?: string;
-
-  // The URL or app package to launch, optional, will use the current screen if not specified
-  launch?: string;
-}
+  extends AndroidConnectionOpt,
+    MidsceneYamlScriptConfig {}
 
 export interface MidsceneYamlScriptIOSEnv
-  extends MidsceneYamlScriptConfig,
-    Omit<IOSDeviceOpt, 'customActions'> {
-  // The URL or app bundle ID to launch, optional, will use the current screen if not specified
-  launch?: string;
-}
+  extends IOSConnectionOpt,
+    MidsceneYamlScriptConfig {}
 
 export interface MidsceneYamlScriptHarmonyEnv
-  extends MidsceneYamlScriptConfig,
-    Omit<HarmonyDeviceOpt, 'customActions'> {
-  // The HarmonyOS device ID to connect to, optional, will use the first device if not specified
-  deviceId?: string;
-
-  // The app package to launch, optional, will use the current screen if not specified
-  launch?: string;
-
-  // Custom mapping of app names to bundle names, user-provided mappings take precedence over defaults
-  appNameMapping?: Record<string, string>;
-}
+  extends HarmonyConnectionOpt,
+    MidsceneYamlScriptConfig {}
 
 export interface MidsceneYamlScriptComputerEnv
-  extends MidsceneYamlScriptConfig {
-  // The display ID to use, optional, will use the primary display if not specified
-  displayId?: string;
-}
+  extends ComputerConnectionOpt,
+    MidsceneYamlScriptConfig {}
 
 export type MidsceneYamlScriptEnv =
   | MidsceneYamlScriptWebEnv
@@ -227,42 +158,6 @@ export type MidsceneYamlScriptEnv =
   | MidsceneYamlScriptHarmonyEnv
   | MidsceneYamlScriptComputerEnv;
 
-/**
- * Canonical per-platform connection / launch target options.
- *
- * These are the pure "how to reach the target" types: the same fields the
- * `MidsceneYamlScript*Env` types carry, but with the YAML run config
- * (`MidsceneYamlScriptConfig`: output, unstableLogContent) and the agent
- * behavior options (`MidsceneYamlScriptAgentOpt`: generateReport, cache, ...)
- * stripped out. Use these when you only need to describe the connection target
- * and want agent behavior expressed separately (e.g. via `AgentOpt`). They are
- * derived from the env types, so they stay in sync automatically.
- */
-export type WebConnectionOpt = Omit<
-  MidsceneYamlScriptWebEnv,
-  keyof MidsceneYamlScriptConfig | keyof MidsceneYamlScriptAgentOpt
->;
-
-export type AndroidConnectionOpt = Omit<
-  MidsceneYamlScriptAndroidEnv,
-  keyof MidsceneYamlScriptConfig
->;
-
-export type IOSConnectionOpt = Omit<
-  MidsceneYamlScriptIOSEnv,
-  keyof MidsceneYamlScriptConfig
->;
-
-export type HarmonyConnectionOpt = Omit<
-  MidsceneYamlScriptHarmonyEnv,
-  keyof MidsceneYamlScriptConfig
->;
-
-export type ComputerConnectionOpt = Omit<
-  MidsceneYamlScriptComputerEnv,
-  keyof MidsceneYamlScriptConfig
->;
-
 export interface MidsceneYamlFlowItemAIAction {
   // defined as aiAction for backward compatibility
   aiAction?: string;
diff --git a/rfcs/0001-v2-testing-framework-phase0.md b/rfcs/0001-v2-testing-framework-phase0.md
index cf1ccbcc8b..57615803e9 100644
--- a/rfcs/0001-v2-testing-framework-phase0.md
+++ b/rfcs/0001-v2-testing-framework-phase0.md
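@@ -81,8 +81,9 @@ export default defineMidsceneConfig({
   // Run target: a single uiAgent field that holds both the declarative and the programmatic form (see §2.1)
   // Declarative form, discriminated by `type`: `options` reuses the per-platform connection types from @midscene/core
   // (WebConnectionOpt / AndroidConnectionOpt / IOSConnectionOpt /
-  // ComputerConnectionOpt), i.e. the env types with agent behavior and YAML config stripped out:
-  // a pure connection contract shared with the agent launcher parameters, no longer a hand-written Record.
+  // ComputerConnectionOpt). These are first-class, pure connection contracts; the env types are in turn
+  // composed from them (env = connection + YAML config + agent behavior). They share their source with
+  // the agent launcher parameters and are no longer a hand-written Record.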
   uiAgent:
     | { type: 'web'; options: WebConnectionOpt }
     | { type: 'android'; options?: AndroidConnectionOpt }
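@@ -127,7 +128,7 @@ export default defineMidsceneConfig({
 - If the value is an **object** → declarative: the framework creates the UI Agent from `type + options`.
 - If the value is a **function** → programmatic: the project fully controls construction.
 
-Both forms share a single key, and at the type level it is simply a union, which removes the smell of "two definitions of the run target" at the root. `options` (platform connection parameters such as url / deviceId) and `uiAgentOptions` (agent behavior such as aiActContext / generateReport) are two different kinds of thing, and both are kept. `options` is not a hand-written `Record`; once discriminated by `type`, it resolves to the per-platform connection types exposed by `@midscene/core` (`WebConnectionOpt` / `AndroidConnectionOpt` / …). These are derived from the corresponding env types with agent behavior and YAML config stripped out, leaving a pure connection contract (which is why `url` is required for web). Changes to the core types are picked up here immediately, and unrelated fields such as agent-opt / output are no longer mixed into the connection parameters.
+Both forms share a single key, and at the type level it is simply a union, which removes the smell of "two definitions of the run target" at the root. `options` (platform connection parameters such as url / deviceId) and `uiAgentOptions` (agent behavior such as aiActContext / generateReport) are two different kinds of thing, and both are kept. `options` is not a hand-written `Record`; once discriminated by `type`, it resolves to the per-platform connection types exposed by `@midscene/core` (`WebConnectionOpt` / `AndroidConnectionOpt` / …). These connection types are **first-class** (defined in core's `connection-options.ts`), and the `MidsceneYamlScript*Env` types are in turn composed from them (env = connection + YAML config + agent behavior), so the connection parameters stay pure (`url` is required for web, and no unrelated fields such as agent-opt / output are mixed in). Changes to the core types are picked up here immediately.
 
 **Declarative example:**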
 

From 8566887dd7ffd859606c17bcb2642d659d854e01 Mon Sep 17 00:00:00 2001
From: ottomao <ottomao@gmail.com>
Date: Fri, 5 Jun 2026 00:36:13 -0700
Subject: [PATCH 33/33] chore(testing-framework): vendor example, drop RFC,
 mark package private
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Release prep for @midscene/testing-framework. The package is internal for
now (not exposed to users), so guard against accidental publishing and
tidy the layout:

- Move the top-level `example/` demo into `packages/testing-framework/example/`
  (it is a standalone copy-out demo, not a workspace member; `packages/*`
  globs are non-recursive so it stays standalone). Fix the relative link in
  its README and the package README pointer.
- Mark the package `"private": true` and drop `publishConfig`. The release
  workflow publishes via `pnpm -r publish`, which skips private packages —
  verified the package is no longer picked up for publish.
- Delete the Phase 0 design RFC now that the work has landed.
---
 packages/testing-framework/README.md          |   2 +-
 .../testing-framework/example}/.gitignore     |   0
 .../testing-framework/example}/README.md      |   2 +-
 .../example}/e2e/add-to-cart.yaml             |   0
 .../example}/e2e/product-detail.yaml          |   0
 .../example}/midscene.config.ts               |   0
 .../testing-framework/example}/package.json   |   0
 .../example}/site/index.html                  |   0
 .../example}/skills/catalog/SKILL.md          |   0
 packages/testing-framework/package.json       |   5 +-
 rfcs/0001-v2-testing-framework-phase0.md      | 480 ------------------
 11 files changed, 3 insertions(+), 486 deletions(-)
 rename {example => packages/testing-framework/example}/.gitignore (100%)
 rename {example => packages/testing-framework/example}/README.md (95%)
 rename {example => packages/testing-framework/example}/e2e/add-to-cart.yaml (100%)
 rename {example => packages/testing-framework/example}/e2e/product-detail.yaml (100%)
 rename {example => packages/testing-framework/example}/midscene.config.ts (100%)
 rename {example => packages/testing-framework/example}/package.json (100%)
 rename {example => packages/testing-framework/example}/site/index.html (100%)
 rename {example => packages/testing-framework/example}/skills/catalog/SKILL.md (100%)
 delete mode 100644 rfcs/0001-v2-testing-framework-phase0.md

diff --git a/packages/testing-framework/README.md b/packages/testing-framework/README.md
index 052f07e1ce..00ba21717a 100644
--- a/packages/testing-framework/README.md
+++ b/packages/testing-framework/README.md
@@ -56,7 +56,7 @@ midscene-tf run            # run all discovered cases
 midscene-tf run e2e/x.yaml # run a specific case
 ```
 
-See a runnable demo in the repository's `example/` directory.
+See a runnable demo in this package's [`example/`](./example) directory.
 
 ## Programmatic API
 
diff --git a/example/.gitignore b/packages/testing-framework/example/.gitignore
similarity index 100%
rename from example/.gitignore
rename to packages/testing-framework/example/.gitignore
diff --git a/example/README.md b/packages/testing-framework/example/README.md
similarity index 95%
rename from example/README.md
rename to packages/testing-framework/example/README.md
index 823d07c2eb..637e075344 100644
--- a/example/README.md
+++ b/packages/testing-framework/example/README.md
@@ -1,6 +1,6 @@
 # Midscene v2 Testing Framework — Example
 
-A self-contained demo of [`@midscene/testing-framework`](../packages/testing-framework)
+A self-contained demo of [`@midscene/testing-framework`](..)
 (the AI-native v2 UI testing framework, Phase 0). Copy this folder out, install,
 set your model env vars, and run.
 
diff --git a/example/e2e/add-to-cart.yaml b/packages/testing-framework/example/e2e/add-to-cart.yaml
similarity index 100%
rename from example/e2e/add-to-cart.yaml
rename to packages/testing-framework/example/e2e/add-to-cart.yaml
diff --git a/example/e2e/product-detail.yaml b/packages/testing-framework/example/e2e/product-detail.yaml
similarity index 100%
rename from example/e2e/product-detail.yaml
rename to packages/testing-framework/example/e2e/product-detail.yaml
diff --git a/example/midscene.config.ts b/packages/testing-framework/example/midscene.config.ts
similarity index 100%
rename from example/midscene.config.ts
rename to packages/testing-framework/example/midscene.config.ts
diff --git a/example/package.json b/packages/testing-framework/example/package.json
similarity index 100%
rename from example/package.json
rename to packages/testing-framework/example/package.json
diff --git a/example/site/index.html b/packages/testing-framework/example/site/index.html
similarity index 100%
rename from example/site/index.html
rename to packages/testing-framework/example/site/index.html
diff --git a/example/skills/catalog/SKILL.md b/packages/testing-framework/example/skills/catalog/SKILL.md
similarity index 100%
rename from example/skills/catalog/SKILL.md
rename to packages/testing-framework/example/skills/catalog/SKILL.md
diff --git a/packages/testing-framework/package.json b/packages/testing-framework/package.json
index 9789689aa4..e3a10f8e68 100644
--- a/packages/testing-framework/package.json
+++ b/packages/testing-framework/package.json
@@ -1,6 +1,7 @@
 {
   "name": "@midscene/testing-framework",
   "version": "1.8.9",
+  "private": true,
   "repository": {
     "type": "git",
     "url": "https://github.com/web-infra-dev/midscene.git",
@@ -64,9 +65,5 @@
   "engines": {
     "node": ">=18.19.0"
   },
-  "publishConfig": {
-    "access": "public",
-    "registry": "https://registry.npmjs.org"
-  },
   "license": "MIT"
 }
diff --git a/rfcs/0001-v2-testing-framework-phase0.md b/rfcs/0001-v2-testing-framework-phase0.md
deleted file mode 100644
index 57615803e9..0000000000
--- a/rfcs/0001-v2-testing-framework-phase0.md
+++ /dev/null
@@ -1,480 +0,0 @@
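-# RFC 0001 · v2 Testing Framework: Phase 0 Design Draft
-
-Status: **Draft / open for discussion**
-Scope: covers Phase 0 only, namely the node model, `midscene.config.ts`, `defineRuntime` / `$name` skills, the verify verdict contract, the output contract and its guardrails, and context assembly.
-Out of scope: Pi internals, Rstest wiring details, and v1→v2 migration tooling.
-
-> Goal of this draft: pin down the interfaces that must be settled before implementation into a reviewable proposal. The **🔶 To discuss** note at the end of each section marks an open decision point I have left.
-
-> **Implementation status (Phase 0)**: the contracts in this draft have landed as a new package, `@midscene/testing-framework` (`packages/testing-framework`), which includes `defineMidsceneConfig` / `defineRuntime`, v2 YAML parsing, the node engine (`ui`/`verify`/`soft`/`agent`/custom), context assembly, fail-closed verify verdicts, the default Pi general agent (C′ resolved), and a lightweight runner and CLI (`midscene-tf`). A copy-out demo lives in `example/` at the repository root. The only open item, C′, has been resolved (see §4.1).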
-
----
-
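-## 0. Terminology and layering recap (agreed, taken as given)
-
-- **New engine, `ScriptPlayer` untouched**: v2 ships as a new package, `@midscene/testing-framework`, and reuses `Agent` / Device / `ReportGenerator` from `@midscene/core` as a library.
-- **Two kinds of agent**: `ui` nodes → the Midscene **UI Agent** (`agent.aiAct` and so on); `verify` / `agent` nodes → a **replaceable general agent layer** (currently Pi).
-- **Context contract**: when any `verify` / `agent` node runs, the visible context = `all previous steps (with intent) + each step's output + the current UI`, and **nothing else**.
-- **Verdict semantics**: `verify` = a deterministic gate (gates CI); `agent` = exploratory, non-deterministic, and **takes no part in pass/fail**.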
-
----
-
-## 1. Case files (v2 YAML schema)
-
-### 1.1 Top level
-
-```yaml
-name: Create Order          # optional, human-readable name
-flow:                       # ordered list of steps
-  - <step>
-  - <step>
-```
-
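-The v1 top-level environment fields such as `web:` / `android:` / `tasks:` are gone: **environment/target configuration moves entirely into `midscene.config.ts`**. A case file only describes what the user wants to accomplish.
-
-### 1.2 Steps
-
-Each step is a single-key map: the key is a node type or a custom node name, and the value is that node's input.
-
-**Built-in nodes (`ui` / `verify` / `agent`) take natural language as their only input**, and their output is also described in natural language: **no schema is introduced**. YAML exists to keep things simple, so it stays plain text: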
-
-```yaml
-flow:
-  - ui: Search for "running shoes"
-  - ui: |
-      Create a test order.
-      Name this step's output createOrder, and record the order number orderId and the page state pageState.
-  - verify: The product detail page shows a visible Add to cart button
-  - agent: Freely inspect this page for anything that looks off
-```
-
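-**Custom (runtime) nodes may take an object as input** (the instruction need not be text):
-
-```yaml
-flow:
-  - prepareOrderFixture:
-      scenario: paid-order
-```
-
-> Rule: a built-in node's value is a string; a custom node's value may be a string or an object, and the whole value is passed to the runtime as `input` (see §3).
-
-### 1.3 Built-in node types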
-
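-| Node | Executor | Semantics | Can gate |
-|---|---|---|---|
-| `ui` | UI Agent | Natural-language UI action | A failed action throws → the case fails |
-| `verify` | Pi Agent | Assertion with a verdict; must return pass/fail | **Yes** |
-| `soft` | Pi Agent | Soft assertion: same as `verify`, but a failure is only recorded as a warning (§6.1) | **No** |
-| `agent` | Pi Agent | Free exploration; produces diagnostics/suggestions | **No** (advisory) |
-| `<custom name>` | runtime (TS) | Project extension node (see §3) | Throwing → the case fails |
-
-**Decided: `verify` / `agent` are read-only with respect to the UI.** They only observe the current screenshot and call skills; they **do not drive the page** (no clicking, no typing). Only `ui` and runtime nodes drive the page. Rationale: gating stays controllable, and an agent cannot navigate the app elsewhere midway and break later steps. Letting the agent drive the UI on its own for deeper inspection is a later extension and is not part of Phase 0.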
-
----
-
-## 2. `midscene.config.ts`
-
-```ts
-import { defineMidsceneConfig } from '@midscene/testing-framework';
-
-export default defineMidsceneConfig({
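-  // Run target: a single uiAgent field that holds both the declarative and the programmatic form (see §2.1)
-  // Declarative form, discriminated by `type`: `options` reuses the per-platform connection types from @midscene/core
-  // (WebConnectionOpt / AndroidConnectionOpt / IOSConnectionOpt /
-  // ComputerConnectionOpt). These are first-class, pure connection contracts; the env types are in turn
-  // composed from them (env = connection + YAML config + agent behavior). They share their source with
-  // the agent launcher parameters and are no longer a hand-written Record.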
-  uiAgent:
-    | { type: 'web'; options: WebConnectionOpt }
-    | { type: 'android'; options?: AndroidConnectionOpt }
-    | { type: 'ios'; options?: IOSConnectionOpt }
-    | { type: 'computer'; options?: ComputerConnectionOpt }
-    | ((ctx: UIAgentFactoryCtx) => Promise<{ agent: Agent }>);
-
-  // Test discovery
-  testDir: string;
-  include?: string[];                            // default ['**/*.yaml']
-  exclude?: string[];
-
-  // Execution strategy (aligned with Rstest concepts)
-  testRunner?: {
-    maxConcurrency?: number;
-    bail?: number;
-    testTimeout?: number;
-    retry?: number;
-  };
-
-  // Output
-  output?: {
-    summary?: string;
-    reportDir?: string;
-  };
-
-  // Shared UI Agent options
-  uiAgentOptions?: UIAgentOptions;               // aiActContext, generateReport, ...
-
-  // Extension points
-  runtime?: Record<string, RuntimeNode>;         // custom YAML nodes (§3)
-  generalAgent?: GeneralAgentAdapter;            // replacement point for Pi (§6)
-});
-```
-
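-**There is no `skills` field.** `$name` skills are not registered in the config: Pi discovers and loads them itself, and the framework is only responsible for recognizing `$name` and handing it to Pi (see §4).
-
-### 2.1 Run target: the single `uiAgent` field (decided)
-
-**Decision: a single `uiAgent` key holds both the declarative and the programmatic form** (option b from the previous round), and the key is renamed from `target` to `uiAgent`, consistent with `uiAgentOptions` and `RuntimeNodeContext.uiAgent`, so it is obvious at a glance that this field defines how the UI Agent is obtained.
-
-- If the value is an **object** → declarative: the framework creates the UI Agent from `type + options`.
-- If the value is a **function** → programmatic: the project fully controls construction.
-
-Both forms share a single key, and at the type level it is simply a union, which removes the smell of "two definitions of the run target" at the root. `options` (platform connection parameters such as url / deviceId) and `uiAgentOptions` (agent behavior such as aiActContext / generateReport) are two different kinds of thing, and both are kept. `options` is not a hand-written `Record`; once discriminated by `type`, it resolves to the per-platform connection types exposed by `@midscene/core` (`WebConnectionOpt` / `AndroidConnectionOpt` / …). These connection types are **first-class** (defined in core's `connection-options.ts`), and the `MidsceneYamlScript*Env` types are in turn composed from them (env = connection + YAML config + agent behavior), so the connection parameters stay pure (`url` is required for web, and no unrelated fields such as agent-opt / output are mixed in). Changes to the core types are picked up here immediately.
-
-**Declarative example:**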
-
-```ts
-import { defineMidsceneConfig } from '@midscene/testing-framework';
-
-export default defineMidsceneConfig({
-  uiAgent: {
-    type: 'web',
-    options: { url: 'https://shop.example.com' },   // platform connection parameters
-  },
-
-  testDir: './e2e',
-  include: ['**/*.yaml'],
-
-  testRunner: { maxConcurrency: 2, testTimeout: 120_000 },
-  output: {
-    summary: './midscene_run/output/summary.json',
-    reportDir: './midscene_run/report',
-  },
-
-  uiAgentOptions: {                                  // agent behavior options
-    aiActContext: 'The user is already signed in as a smoke-test account.',
-    generateReport: true,
-  },
-});
-```
-
-**Programmatic example (the same `uiAgent` key, set to a factory function):**
-
-```ts
-import { agentFromAdbDevice } from '@midscene/android';
-import { defineMidsceneConfig } from '@midscene/testing-framework';
-
-export default defineMidsceneConfig({
-  uiAgent: async ({ uiAgentOptions, env }) => ({
-    agent: await agentFromAdbDevice(env.ANDROID_DEVICE_ID, {
-      ...uiAgentOptions,
-      androidAdbPath: env.ANDROID_ADB_PATH,
-      autoDismissKeyboard: false,
-    }),
-  }),
-
-  testDir: './e2e',
-  uiAgentOptions: {
-    aiActContext: 'The user is already signed in as a smoke-test account.',
-    generateReport: true,
-  },
-});
-```
-
-### 2.2 A complete matching case YAML
-
-The case file contains **no environment/target configuration**; all of that lives in `midscene.config.ts`. `e2e/create-order.yaml` is a pure flow:
-
-```yaml
-name: Create Order
-
-flow:
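-  - prepareOrderFixture:            # custom node, input is an object
-      scenario: paid-order
-
-  - ui: |                          # UI Agent, plain natural language
-      Sign in with the test account and create a test order.
-      Name this step's output createOrder, and record the order number orderId and whether creation succeeded.
-
-  - verify: |                      # Pi; $name skills are loaded by Pi on demand; a verdict is mandatory
-      Use $database to verify that the orderId in the output named createOrder really exists and that its status is paid.
-
-  - verify: |
-      Use $logs to check whether any related ERROR appeared during the test.
-
-  - verify: The order detail page shows that payment succeeded    # judged purely from the UI screenshot
-
-  - agent: Based on all verification results above and the current screenshot, analyze the risks of this test run and suggest follow-ups  # advisory, does not gate
-
-  - notifySlack                    # custom node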
-```
-
-Directory layout:
-
-```text
-.
-  midscene.config.ts
-  e2e/
-    create-order.yaml
-    checkout.yaml
-```
-
----
-
-## 3. `defineRuntime`: custom nodes (lower-level extension)
-
-```ts
-type RuntimeNode = (
-  input: unknown,                 // the node's YAML value (string or object)
-  context: RuntimeNodeContext,
-) => Promise<RuntimeNodeResult>;
-
-interface RuntimeNodeContext {
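-  uiAgent: Agent;                 // UI Agent; runtime nodes can also drive the page
-  outputs: OutputStore;           // all previous context-facing outputs (read-only)
-  state: Record<string, unknown>; // ★ TS-side state, invisible to the agent (see §7)
-  result: TestResultSoFar;        // results accumulated so far for the current case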
-  env: NodeJS.ProcessEnv;
-}
-
-interface RuntimeNodeResult {
-  conclusion: string;                       // ★ context-facing output, goes into the Pi context
-  output?: Record<string, unknown>;         // optional structured output (also goes into the context)
-}
-
-function defineRuntime(node: RuntimeNode): RuntimeNode;
-```
-
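-Key points:
-- A node takes two parameters: `input` (the node's own YAML value) and `context` (the environment context).
-- `conclusion` (and the optional `output`) = the **context-facing channel**, which feeds later `verify` / `agent` nodes.
-- `state` = the **engineering-facing channel**, which passes structured data between runtime nodes and **never enters the agent context**.
-- A runtime node that throws → the case fails.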
-
----
-
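-## 4. `$name` skills: reusing Pi's own Skills mechanism
-
-**Pi's (earendil-works/pi) actual capabilities have been checked.** Conclusion: there is no need to reinvent the wheel; reuse Pi's built-in Skills directly.
-
-Pi's Skills follow the Anthropic Agent-Skills model: each skill is a directory containing a `SKILL.md` (YAML frontmatter with `name` + `description`, followed by markdown instructions). Disclosure is **progressive**: at startup only each skill's `name`/`description` goes into the system prompt, and the **full instructions are loaded on demand**; the model decides for itself which skill to load when a task matches. Sources include directory scanning, `skills/` in `package.json`, the `skills` array in settings, and the CLI `--skill`.
-
-Pi's embeddable SDK (`@earendil-works/pi-coding-agent`) provides every hook we need: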
-
-```ts
-import { createAgentSession, SessionManager, DefaultResourceLoader }
-  from '@earendil-works/pi-coding-agent';
-
-// 1) Provide the project's skills to Pi (so their descriptions enter the context)
-const loader = new DefaultResourceLoader({
-  skillsOverride: (cur) => ({ skills: [...cur.skills, ...projectSkills], diagnostics: cur.diagnostics }),
-});
-
-// 2) Create a session
-const { session } = await createAgentSession({
-  sessionManager: SessionManager.inMemory(),
-  resourceLoader: loader,
-});
-
-// 3) Run a verify/agent node: the current screenshot is passed in directly as an image
-await session.prompt(assembledContext, {
-  images: [{ type: 'image', source: { type: 'base64', mediaType: 'image/png', data: screenshotBase64 } }],
-});
-```
-
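-**This is the framework's entire responsibility:**
-1. Hand the project's available skills to Pi via `resourceLoader` / `skillsOverride` (descriptions enter the context).
-2. Assemble the context (§7) plus the current screenshot (through the `images` option of `prompt`) and feed it to Pi.
-3. Treat `$name` tokens such as `$database` and `$logs` in a node's natural language as **hints** that let Pi load the matching skill on demand.
-
-After that, which skill to load, how to call it, and how many times are entirely up to Pi; the framework does not intervene.
-
-**How `$name` is activated (matching your assessment)**: the Pi SDK has **no** programmatic entry point for force-activating a skill by name; skills are loaded on demand through model reasoning. So `$name` lands as **the approach you proposed: guide Pi in the prompt to load the skill itself**. The cost is one extra model decision and slightly more latency, but it needs no special interface and fits the agentic model best. The explicit `$name` token also happens to be a strong loading signal, more reliable than matching on the description alone.
-
-Optional enhancement (not required for Phase 0): the framework can statically extract the set of `$name` references and use it to ① check that each referenced skill exists (failing immediately if not, to avoid silently running with nothing loaded); ② pin the referenced skills' descriptions to the top of the prompt for emphasis. **Activation itself is still on-demand loading by Pi.**
-
-Lifecycle (already in the user docs): **a skill's result belongs to the current execution only** and does not flow into later context automatically; to keep it, the current node must write it into its own output.
-
-### 4.1 Pi wiring: confirmed items vs. the single gap
-
-Verified against the Pi SDK documentation: everything below **already exists** and is enough to support Phase 0: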
-
-| 需求 | Pi SDK | 状态 |
-|---|---|---|
-| 单节点跑完整 agent loop（多轮工具调用直到结束） | `session.prompt()` 跑完整 loop，turn 结束才 resolve | ✅ |
-| 读 agent 最终结果 | `subscribe` 的 `turn_end` 事件，带 `message` + `toolResults` | ✅ |
-| 注入当前截图 | `prompt(text, { images: [{ base64 png }] })` | ✅ |
-| 自定义 tool（verify 的 verdict 工具） | `customTools: [defineTool(...)]` 或 extension `pi.registerTool` | ✅ |
-| skills 注入 | `DefaultResourceLoader` + `skillsOverride` | ✅ |
-| 选模型 / 鉴权 | `getModel(provider, model)`；`AuthStorage.setRuntimeApiKey` 或 env | ✅ |
-
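-✅ **C′ is resolved (no longer an open item)**: a check of the Pi SDK source (`@earendil-works/pi-coding-agent` 0.78) confirms that `ModelRegistry.registerProvider(name, config)` accepts `baseUrl` + `apiKey` + a list of `models` (each may specify `api: 'openai-completions'` and `input: ['text','image']`). The framework can therefore do the following: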
-
-```ts
-const authStorage = AuthStorage.inMemory();
-const registry = ModelRegistry.inMemory(authStorage);
-registry.registerProvider('midscene', {
-  baseUrl: process.env.MIDSCENE_MODEL_BASE_URL,
-  apiKey: process.env.MIDSCENE_MODEL_API_KEY,
-  models: [{
-    id: process.env.MIDSCENE_MODEL_NAME, name: process.env.MIDSCENE_MODEL_NAME,
-    api: 'openai-completions', reasoning: false, input: ['text', 'image'],
-    cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
-    contextWindow: 128_000, maxTokens: 8_192,
-  }],
-});
-const model = registry.find('midscene', process.env.MIDSCENE_MODEL_NAME);
-const { session } = await createAgentSession({ model, modelRegistry: registry, authStorage, ... });
-```
-
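-This way `verify`/`agent` (Pi) and `ui` (the Midscene UI Agent) use **the same `MIDSCENE_MODEL_BASE_URL` endpoint** with zero changes to Pi. See `PiGeneralAgent` in `@midscene/testing-framework` (`src/general-agent/pi-general-agent.ts`) for the implementation; `tests/smoke/pi-wiring.mjs` verifies it, and all four checks pass (provider registration / apiKey resolution / session model selection / `report_verdict` customTool activation).
-
-> Note: `MIDSCENE_MODEL_EXTRA_BODY_JSON` (e.g. `{"service_tier":"fast"}`) only takes effect for the Midscene UI Agent in `ui` nodes. Phase 0 does not pass it through to Pi nodes (it is a performance optimization, not a correctness concern, and can be wired in later through the stream `onPayload` hook).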
-
----
-
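-## 5. output: plain natural language, no schema
-
-**Decision: output has no schema.** YAML exists to keep things simple; a schema would confuse authors and defeat that purpose. Each step's output is a piece of natural language, and both its name and the fields to record are stated in natural language: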
-
-```yaml
-- ui: |
-    Create a test order.
-    Name this step's output createOrder, and record the order number orderId and the page state pageState.
-```
-
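-Later nodes refer to it the same way in natural language, e.g. "the orderId in the output named createOrder". Naming exists for **unambiguous reference**, not for validation.
-
-**Known trade-off (explicitly accepted)**: output is LLM-generated natural language, so a missing field does not cause a hard failure. Phase 0 adds **no engine-level guardrail** against this risk of silently dropped fields.
-
-**Fallback for later iterations**: when deterministic validation is really needed, add a dedicated **validation code node** (a TS check in the form of a runtime node that reads values from `outputs`, asserts in code, and fails if the assertion does not hold) **instead of** putting a schema into YAML. Deterministic evidence stays on the TS side, and the YAML side stays pure natural language. This is scheduled for after Phase 0.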
-
----
-
-## 6. verify verdict contract
-
-`verify` runs the Pi Agent (free-form reasoning), but it must end in a structured verdict.
-
-**Proposal: a verify node must finish with a structured verdict.**
-
-```ts
-interface Verdict {
-  pass: boolean;
-  reason: string;            // human-readable basis for the verdict
-  evidence?: unknown;        // optional: screenshot references, skill result excerpts, etc.
-}
-```
-
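-**How it lands (confirmed against the Pi SDK)**: Pi has no native forced JSON output, but it does have `customTools`. So the engine registers a `report_verdict` tool for each `verify` run, and the prompt requires the agent to call it when finishing; the verdict is read from the `toolResults` of `turn_end`: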
-
-```ts
-const reportVerdict = defineTool({
-  name: 'report_verdict',
-  description: 'Call this when the judgment is complete to submit the verdict of this verify run',
-  parameters: Type.Object({
-    pass: Type.Boolean(),
-    reason: Type.String(),
-    evidence: Type.Optional(Type.Unknown()),
-  }),
-  execute: async (_id, v) => v,        // the engine reads it back from toolResults
-});
-```
-
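-The failure model is **fail-closed**:
-- `pass === false` → the case fails;
-- the agent never calls `report_verdict`, or the verdict cannot be parsed → the case **also fails** (anything uncertain is treated as a failure);
-- `reason` is always written to the report.
-
-An `agent` node collects no verdict, and its output never changes a case's pass/fail.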
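-
-The fail-closed rule can be sketched as a small pure function. This is an illustration under stated assumptions, not the engine's code: the `ToolResultLike` shape and the name `resolveVerdict` are hypothetical (the real `toolResults` type in the Pi SDK may differ), and taking the last `report_verdict` call when there are several is also an assumption.
-
-```typescript
-interface Verdict {
-  pass: boolean;
-  reason: string;
-  evidence?: unknown;
-}
-
-// Assumed shape of one tool result; the real Pi SDK type may differ.
-interface ToolResultLike {
-  toolName: string;
-  result: unknown;
-}
-
-// Fail-closed: anything other than a well-formed verdict counts as a failure.
-function resolveVerdict(toolResults: ToolResultLike[]): Verdict {
-  // Keep the last report_verdict call (assumption: later calls supersede earlier ones).
-  let raw: unknown;
-  let found = false;
-  for (const entry of toolResults) {
-    if (entry.toolName === 'report_verdict') {
-      raw = entry.result;
-      found = true;
-    }
-  }
-  if (!found) {
-    return { pass: false, reason: 'agent did not call report_verdict' };
-  }
-  if (typeof raw !== 'object' || raw === null) {
-    return { pass: false, reason: 'report_verdict result could not be parsed' };
-  }
-  const fields = raw as Record<string, unknown>;
-  const pass = fields.pass;
-  const reason = fields.reason;
-  if (typeof pass !== 'boolean' || typeof reason !== 'string') {
-    return { pass: false, reason: 'report_verdict result could not be parsed' };
-  }
-  return { pass, reason, evidence: fields.evidence };
-}
-```
-
-Every branch returns a `reason`, so the report always has something to show even when the agent never produced a verdict.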
-
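-### 6.1 `soft`: transitional soft assertions (decided: do it)
-
-This gives a tier for checks you want to see but do not yet want to gate on: `soft` is used exactly like `verify` and likewise produces a `Verdict` in the report. **The only difference is that a failure does not turn the case red or interrupt later steps** (it is recorded only as a warning).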
-
-```yaml
-flow:
-  - verify: The order detail page shows that payment succeeded        # failure → case goes red
-  - soft: The page has no obvious layout misalignment      # failure → warning only, does not gate
-```
-
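-Why a **separate node** rather than a `soft: true` flag on `verify`: §1.2 settled that built-in nodes take only natural-language input, with no object or flag. Adding a `soft` node type preserves pure natural-language input and makes soft vs. hard obvious at a glance.
-
-Naming: **`soft`** (decided). It appears right next to `verify` in a flow, so `- soft: ...` reads naturally as "soft (verify)": short and clear enough.
-
-Failure model: `soft` with `pass:false` → record a warning and **do not change** the case's pass/fail. Everything else matches `verify`, except that a missing verdict is also treated as a warning.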
-
----
-
-## 7. Context assembly (formalizing the documented contract)
-
-When a `verify` / `agent` node runs, the context the engine injects into Pi is **exactly**:
-
-```
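-For each past step (in order):
-  - node type + instruction (natural-language text, or the object input of a custom node)
-  - the step's output (natural language; for a runtime node, its conclusion)
-  - if it is a verify: its pass/fail and reason
-+ the current UI screenshot (this one only)
-+ the skills preloaded into Pi for this node (see §4)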
-```
-
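-**Explicitly excluded** ("nothing else"): execution traces, historical screenshots, `context.state`, and intermediate results of past skill calls.
-
-**Decided: Phase 0 does not truncate.** The context of a long flow grows linearly with all past outputs, but we choose **predictability over compactness** (once we truncate, the "you can reason about it" selling point breaks). A truncation/compression strategy is easy to add later, so we skip it for now.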
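-
-A minimal sketch of the text half of this assembly, assuming a hypothetical `PastStep` record and a made-up rendering format (neither is specified in this draft). The screenshot and skills are not part of the string; they travel through `prompt`'s `images` option and the resource loader.
-
-```typescript
-// Hypothetical record of one completed step; field names are assumptions.
-interface PastStep {
-  kind: string; // 'ui', 'verify', 'agent', or a custom node type
-  instruction: string | Record<string, unknown>; // object input only for custom nodes
-  output: string; // natural language; a runtime node's conclusion
-  verdict?: { pass: boolean; reason: string };
-}
-
-// Render every past step, in order, with no truncation (Phase 0).
-function assembleContext(steps: PastStep[]): string {
-  return steps
-    .map((step, index) => {
-      const instruction =
-        typeof step.instruction === 'string'
-          ? step.instruction
-          : JSON.stringify(step.instruction);
-      const lines = [
-        `Step ${index + 1} [${step.kind}]`,
-        `Instruction: ${instruction}`,
-        `Output: ${step.output}`,
-      ];
-      // Only verify steps contribute a pass/fail line and reason.
-      if (step.kind === 'verify' && step.verdict) {
-        const status = step.verdict.pass ? 'pass' : 'fail';
-        lines.push(`Verdict: ${status} (${step.verdict.reason})`);
-      }
-      return lines.join('\n');
-    })
-    .join('\n\n');
-}
-```
-
-Traces, historical screenshots, and `context.state` have no field in `PastStep`, so they cannot leak into the assembled text.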
-
----
-
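-## 8. Failure model summary
-
-| Situation | Result |
-|---|---|
-| a `ui` action fails and throws | case fails |
-| `verify` with `pass:false` | case fails |
-| `verify` produces no verdict or an unparseable one | case fails (fail-closed) |
-| `soft` with `pass:false` or no verdict | warning recorded; case pass/fail **unchanged** |
-| error inside `agent` | recorded as a diagnostic; case pass/fail **unchanged** |
-| runtime node throws | case fails |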
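-
-The table can be read as one aggregation rule. The sketch below is illustrative only: `NodeResult`, `CaseOutcome`, and `summarizeCase` are hypothetical names, and the function only tallies results. It does not model the engine stopping a flow after a hard failure.
-
-```typescript
-// Hypothetical per-node results; a missing `verdict` means none was produced or parsed.
-type NodeResult =
-  | { kind: 'ui' | 'runtime'; threw: boolean }
-  | { kind: 'verify' | 'soft'; verdict?: { pass: boolean; reason: string } }
-  | { kind: 'agent'; errored: boolean };
-
-interface CaseOutcome {
-  passed: boolean;
-  warnings: number;
-  diagnostics: number;
-}
-
-function summarizeCase(results: NodeResult[]): CaseOutcome {
-  const outcome: CaseOutcome = { passed: true, warnings: 0, diagnostics: 0 };
-  for (const result of results) {
-    switch (result.kind) {
-      case 'ui':
-      case 'runtime':
-        if (result.threw) outcome.passed = false;
-        break;
-      case 'verify':
-        // Fail-closed: a missing verdict fails the case just like pass:false.
-        if (!result.verdict || !result.verdict.pass) outcome.passed = false;
-        break;
-      case 'soft':
-        // Same check as verify, but it only records a warning.
-        if (!result.verdict || !result.verdict.pass) outcome.warnings += 1;
-        break;
-      case 'agent':
-        if (result.errored) outcome.diagnostics += 1;
-        break;
-    }
-  }
-  return outcome;
-}
-```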
-
----
-
-## 9. End-to-end example
-
-For a complete `midscene.config.ts` plus matching case YAML, see **§2.1 / §2.2** (covering the full chain of `uiAgent`, custom nodes, `ui`, `verify`, `agent`, and `$name` skills). For `soft` usage, see **§6.1**.
-
----
-
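-## 10. Decision status summary
-
-### Decided (settled this round)
-
-| Decision | Conclusion |
-|---|---|
-| `ui`/`verify`/`agent` input | Plain natural language, no schema |
-| `verify` / `agent` and the UI | Read-only; they do not drive the page (driving is left to a later extension) |
-| output | Plain natural language, no schema; deterministic validation comes later as a TS validation node |
-| config `skills` field | Dropped; Pi discovers/loads skills itself |
-| Framework's responsibility for skills | Only recognize `$name` and call Pi's API to preload (descriptions only; activation stays on demand, see §4); the rest is left to Pi |
-| `RuntimeNodeContext` field name | `agent` → `uiAgent` |
-| verify verdict contract | Do it: `report_verdict` customTool + `turn_end.toolResults`, fail-closed (§6) |
-| Soft assertion (F) | Do it: separate node **`soft`**, failure only records a warning (§6.1) |
-| Run target (B) | Single field `uiAgent`, a union covering a config-style object / a programmatic factory (§2.1) |
-| Skill mechanism (C) | Reuse Pi's built-in Skills; the framework supplies them via `resourceLoader`, and `$name` in the prompt guides on-demand loading (§4) |
-| Pi wiring (loop / screenshot / tool / model) | SDK support confirmed (§4.1) |
-| Node instruction form | Built-in = text; custom = text or object |
-| Long-flow context | No truncation (Phase 0) |
-
-### Pending integration
-
-| # | Item | Status |
-|---|---|---|
-| C′ | Whether Pi can take a custom model **base URL** (matching `MIDSCENE_MODEL_BASE_URL`) so that verify/agent share an endpoint with ui | ✅ **Resolved**: implemented via `ModelRegistry.registerProvider(name, { baseUrl, apiKey, models })`; see §4.1 and `PiGeneralAgent` |
-
-(No pending items remain.)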
-
----
-
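-## Appendix: after Phase 0 (out of scope for this draft, notes only)
-
-- The minimal interface of a Pi `GeneralAgentAdapter` (so that the Codex Agent SDK and others can be swapped in).
-- Rstest wiring: case → virtual test module → lifecycle/fixture mapping.
-- Reports: reuse the core `ReportGenerator`; decide how to present verify verdicts / agent diagnostics.
-- v1→v2 transpiler (optional, external add-on).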