feat(testing): orchestrate core real-machine verification

Jacobinwwey · Jacobinwwey · commit 0c7fe640edfc · 2026-05-18T16:39:44.000+08:00
diff --git a/docs/en/TEST_REPORT.md b/docs/en/TEST_REPORT.md
@@ -33,6 +33,8 @@
 
 ### Phase 2 Verification Refresh (2026-05-18)
 
+- [x] `npm run verify:core-real-machine:clean`
+  - PASS (`output/verification/core-real-machine/2026-05-18T08-31-26-579Z/`; automated summary `6/6`; transient changes to the tracked `src-tauri/bin/server-x86_64-pc-windows-msvc.exe` were auto-restored)
 - [x] `node node_modules/jest/bin/jest.js src/agent_workspace.frontend.test.ts --runInBand --no-cache`
   - PASS
 - [x] `node node_modules/jest/bin/jest.js src/learning/runtimeCapability.test.ts src/knowledge.api.contract.test.ts --runInBand --no-cache`
@@ -80,6 +82,10 @@
 7. This refresh still does **not** prove release-grade Phase-2 closure:
    - it closes visibility and browser/runtime proof for the new ANN governance summaries,
    - it does **not** close workload/threshold calibration for those budgets.
+8. The current core real-machine test slice is now exposed through a single repeatable entry point:
+   - `npm run verify:core-real-machine` runs the automated foundation/browser/Tauri slice and writes JSON + Markdown reports under `output/verification/core-real-machine/`,
+   - `npm run verify:core-real-machine:clean` runs the same slice and auto-restores any transient changes that run introduced to tracked `src-tauri/bin/server-*` binaries,
+   - manual interactive real-machine commands remain intentionally separate: `npm run tauri:dev:mini:gpu` for desktop and `npm run tauri:android:dev` for Android.
 
 ### What These Passes Prove
 
diff --git a/docs/en/task.md b/docs/en/task.md
@@ -37,6 +37,10 @@
 
 ### Core Real-Machine Test Commands
 
+- `npm run verify:core-real-machine`
+  - Unified orchestration entrypoint for the current core real-machine test slice. Runs the automated foundation/browser/Tauri checks sequentially and writes JSON + Markdown reports under `output/verification/core-real-machine/`.
+- `npm run verify:core-real-machine:clean`
+  - Same orchestration path, but also restores any tracked `src-tauri/bin/server-*` binaries that the current verification run modified, so the worktree stays clean.
 - `npm run verify:foundation:sqlite-runtime:matrix`
   - Highest-value host/runtime proof for the embedded sqlite graph backend across `smoke` / `medium` / `heavy` workloads.
 - `npm run verify:foundation:ann-runtime:matrix`
@@ -49,3 +53,11 @@
   - Primary desktop real-machine interactive command when you want to manually drive the app in the mini GPU-enabled shell.
 - `npm run tauri:android:dev`
   - Primary Android real-device interactive command when you want to push the current app to a connected device.
+
+### Real-Machine Test Cautions
+
+- `verify:foundation:*` and `verify:core-real-machine*` are engineering-grade verification commands, not just lightweight smoke wrappers. Let them prepare `dist` and the host sidecar instead of manually skipping the prerequisite build path.
+- If `build:sidecar`, `ensure-sidecar-ready`, or runtime verification dirties tracked `src-tauri/bin/server-*` files, treat that as transient verification churn. Unless the current task is explicitly about sidecar build, supply, signing, or validation, restore those binaries to `HEAD` after the run. Prefer `npm run verify:core-real-machine:clean` when you want the verification flow to auto-restore the sidecar changes introduced by that run.
+- `verify:agent-workspace:browser` uses an isolated Playwright-managed browser session. Do not run it concurrently with other Playwright-driven browser jobs. It is intended to verify NoteConnection, not to take control of an already-open user Chrome window.
+- `npm run tauri:dev:mini:gpu` and `npm run tauri:android:dev` are manual interactive real-machine commands. Keep them outside automated CI, drive them manually, and close them yourself after collecting evidence.
+- The orchestration report is only trustworthy when the command exits `0` and the generated report under `output/verification/core-real-machine/` shows all automated steps as `PASS`.
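One way to honor "restore only the sidecar dirtiness introduced by this run" is to snapshot `git status --porcelain` before and after the run, then restore only `src-tauri/bin/server-*` paths that were clean beforehand. The sketch below assumes that approach; the helper names are hypothetical, and it is not necessarily how `--restore-sidecar-binaries` is implemented.

```javascript
// Hypothetical sketch: restore only sidecar binaries dirtied by this run.
const { execFileSync } = require("node:child_process");

// Prefix taken from the docs' `src-tauri/bin/server-*` pattern.
const SIDECAR_PREFIX = "src-tauri/bin/server-";

// Parse `git status --porcelain` (v1, "XY path") into tracked paths with changes.
// Untracked entries ("??") are skipped; for renames ("R  old -> new") keep the new path.
function dirtyTrackedPaths(porcelain) {
  const paths = new Set();
  for (const line of porcelain.split(/\r?\n/)) {
    if (line.length < 4 || line.startsWith("??")) continue;
    let file = line.slice(3);
    const arrow = file.indexOf(" -> ");
    if (arrow !== -1) file = file.slice(arrow + 4);
    paths.add(file);
  }
  return paths;
}

// Paths that are sidecar binaries, dirty after the run, and were clean before it.
// Pre-existing local edits to those files are left untouched.
function pathsToRestore(beforePorcelain, afterPorcelain) {
  const before = dirtyTrackedPaths(beforePorcelain);
  return [...dirtyTrackedPaths(afterPorcelain)]
    .filter((p) => p.startsWith(SIDECAR_PREFIX) && !before.has(p))
    .sort();
}

function gitStatus(cwd) {
  return execFileSync("git", ["status", "--porcelain"], { cwd, encoding: "utf8" });
}

// Reset the given paths in both index and worktree to HEAD (requires git >= 2.23).
function restoreFromHead(cwd, paths) {
  if (paths.length === 0) return;
  execFileSync("git", ["restore", "--source=HEAD", "--staged", "--worktree", "--", ...paths], { cwd });
}

// Usage:
// const before = gitStatus(repoRoot);
// ... run the verification steps ...
// restoreFromHead(repoRoot, pathsToRestore(before, gitStatus(repoRoot)));
```

Diffing the two snapshots, rather than restoring every dirty `server-*` file, is what keeps this from discarding sidecar work that was already in progress before verification started.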
diff --git a/docs/zh/TEST_REPORT.md b/docs/zh/TEST_REPORT.md
@@ -33,6 +33,8 @@
 
 ### Phase 2 Verification Refresh (2026-05-18)
 
+- [x] `npm run verify:core-real-machine:clean`
+  - PASS (`output/verification/core-real-machine/2026-05-18T08-31-26-579Z/`; automated summary `6/6`; transient changes to the tracked `src-tauri/bin/server-x86_64-pc-windows-msvc.exe` introduced by this run were auto-restored)
 - [x] `node node_modules/jest/bin/jest.js src/agent_workspace.frontend.test.ts --runInBand --no-cache`
   - PASS
 - [x] `node node_modules/jest/bin/jest.js src/learning/runtimeCapability.test.ts src/knowledge.api.contract.test.ts --runInBand --no-cache`
@@ -80,6 +82,10 @@
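 7. This refresh still does **not** amount to release-grade Phase-2 closure:
    - it closes visibility and browser/runtime proof for the new ANN governance summaries,
    - it does **not** close workload/threshold calibration for those budgets.
+8. The current "core feature-update real-machine test" slice is now exposed through a single repeatable entry point:
+   - `npm run verify:core-real-machine` runs the automated foundation/browser/Tauri slice and writes JSON + Markdown reports to `output/verification/core-real-machine/`,
+   - `npm run verify:core-real-machine:clean` runs the same automated slice and then auto-restores transient changes to tracked `src-tauri/bin/server-*` binaries introduced by that run,
+   - manual interactive real-machine commands remain separate: `npm run tauri:dev:mini:gpu` for desktop and `npm run tauri:android:dev` for Android devices.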
 
 ### What These Passes Actually Prove
 
diff --git a/docs/zh/task.md b/docs/zh/task.md
@@ -33,6 +33,10 @@
 
 ### Core Real-Machine Test Commands
 
+- `npm run verify:core-real-machine`
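+  - Unified orchestration entry point for the current "core feature-update real-machine test" slice. Runs the automated foundation/browser/Tauri verifications sequentially and writes JSON + Markdown reports to `output/verification/core-real-machine/`.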
+- `npm run verify:core-real-machine:clean`
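+  - Same orchestration as above, but additionally rolls back changes to tracked `src-tauri/bin/server-*` binaries newly introduced by this verification run, keeping the worktree clean.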
 - `npm run verify:foundation:sqlite-runtime:matrix`
   - The highest-value host/runtime proof for the current embedded sqlite graph backend, covering the `smoke` / `medium` / `heavy` workload tiers.
 - `npm run verify:foundation:ann-runtime:matrix`
@@ -46,6 +50,14 @@
 - `npm run tauri:android:dev`
   - Preferred command when you want to push the current app to a connected Android device for interactive testing.
 
+### Real-Machine Test Cautions
+
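+- `verify:foundation:*` and `verify:core-real-machine*` are engineering-grade verification commands, not just lightweight smoke tests. Let them prepare `dist` and the host sidecar themselves; do not manually skip the prerequisite build path.
+- If `build:sidecar`, `ensure-sidecar-ready`, or runtime verification dirties tracked `src-tauri/bin/server-*` files, treat that as "transient sidecar binary drift introduced by verification." Unless the current task is explicitly about sidecar build / supply / signing / validation, restore those binary paths to `HEAD` once testing is done. If you want the unified verification command to clean up the sidecar changes newly introduced by the run automatically, prefer `npm run verify:core-real-machine:clean`.
+- `verify:agent-workspace:browser` uses an isolated Playwright-managed browser session; do not run it concurrently with other Playwright browser jobs. Its purpose is to verify NoteConnection, not to take over a user Chrome window you already have open.
+- `npm run tauri:dev:mini:gpu` and `npm run tauri:android:dev` are both manual interactive real-machine commands and should not be placed in automated CI; drive them manually and close them yourself after collecting evidence.
+- Treat a unified verification run as trustworthy only when the command exits with `0` and the report generated under `output/verification/core-real-machine/` shows every automated step as `PASS`.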
+
 ---
 
 # Task: Refining Path Mode Visualization
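The trust criterion in the cautions (exit code `0` *and* every automated step reported as `PASS`) can be checked mechanically. A minimal sketch, assuming a hypothetical JSON report with a `steps` array of `{ name, status }` entries; the actual report schema is not shown in this diff.

```javascript
// Hypothetical trust check for a core real-machine verification run.
const fs = require("node:fs");

// Load a JSON report (assumed shape: { steps: [{ name, status }] }).
function loadReport(file) {
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

// Trustworthy only if the process exited 0 AND every recorded step is PASS.
function isTrustworthyRun(exitCode, report) {
  if (exitCode !== 0) return false;
  const steps = report && Array.isArray(report.steps) ? report.steps : [];
  // An empty step list proves nothing, so it is not treated as a pass.
  return steps.length > 0 && steps.every((s) => s.status === "PASS");
}
```

Requiring both signals guards against two different failure modes: a zero exit code with a stale or partial report, and an all-`PASS` report from a run that crashed afterwards.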
diff --git a/package.json b/package.json
@@ -97,6 +97,8 @@
     "verify:foundation:sqlite-runtime": "npm run build && node scripts/ensure-sidecar-ready.js && node scripts/verify-foundation-sqlite-runtime.js",
     "verify:foundation:sqlite-runtime:heavy": "npm run build && node scripts/ensure-sidecar-ready.js && node scripts/verify-foundation-sqlite-runtime.js --heavy",
     "verify:foundation:sqlite-runtime:matrix": "npm run build && node scripts/ensure-sidecar-ready.js && node scripts/verify-foundation-sqlite-runtime.js --matrix",
+    "verify:core-real-machine": "node scripts/verify-core-real-machine-tests.js",
+    "verify:core-real-machine:clean": "node scripts/verify-core-real-machine-tests.js --restore-sidecar-binaries",
     "verify:agent-workspace:browser": "node scripts/verify-agent-workspace-browser.js",
     "verify:agent-workspace:tauri": "node scripts/verify-agent-workspace-tauri.js",
     "verify:agent-workspace:tauri:window-evidence": "node scripts/verify-agent-workspace-tauri-window-evidence.js",
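The two new npm scripts differ only in the `--restore-sidecar-binaries` flag. A sketch of how the script might read that flag with Node's built-in `util.parseArgs` (available since Node 18.3); the function name and the strict rejection of unknown options are assumptions, not taken from the script itself.

```javascript
// Hypothetical option parsing for scripts/verify-core-real-machine-tests.js.
const { parseArgs } = require("node:util");

function parseOptions(argv) {
  const { values } = parseArgs({
    args: argv,
    options: { "restore-sidecar-binaries": { type: "boolean" } },
    strict: true, // unknown flags or positionals throw instead of being ignored
  });
  return { restoreSidecarBinaries: Boolean(values["restore-sidecar-binaries"]) };
}

// const options = parseOptions(process.argv.slice(2));
```

With this parsing, `npm run verify:core-real-machine -- --restore-sidecar-binaries` would behave like `verify:core-real-machine:clean`, because npm forwards the arguments after `--` to the script.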
diff --git a/scripts/verify-core-real-machine-tests.js b/scripts/verify-core-real-machine-tests.js