Address PR review feedback on agent-device-evidence
- Add non-interactive overrides for the parent bring-up (deterministic
device pick, forced session reset) so the skill honors its autonomous
contract regardless of parent prompts.
- Key run directories by source kind (pr-N / issue-N) and rename PR_NUM
to SOURCE_NUM/SOURCE_KIND to match the documented output layout and
remove PR-vs-issue artifact ambiguity.
- Broaden exit code 7 from NO_BUILD to BRING_UP_FAILED so Metro, gate,
and simulator-boot failures don't masquerade as install failures.
- Rename "Non-goals" to "Out of scope (do not do these)" with an
imperative lead-in for clearer prohibition framing.
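
A minimal sketch of the run-directory keying described in the second bullet. Only the `pr-N` / `issue-N` layout and the `SOURCE_KIND` / `SOURCE_NUM` names come from this commit; the helper name `run_dir_name` and its validation rules are hypothetical, and the skill's actual setup step may differ.

```sh
# Hypothetical helper: build the run-directory key from the source kind and number.
# Assumption: only "pr" and "issue" are valid kinds, and the number is numeric.
run_dir_name() {
    kind=$1
    num=$2
    case "$kind" in
        pr|issue) ;;
        *) echo "unknown source kind: $kind" >&2; return 1 ;;
    esac
    case "$num" in
        ''|*[!0-9]*) echo "source number must be numeric: $num" >&2; return 1 ;;
    esac
    # Produces pr-N or issue-N, so PR and issue artifacts never share a directory.
    printf '%s-%s\n' "$kind" "$num"
}
```

A caller would presumably invoke it as `run_dir_name "$SOURCE_KIND" "$SOURCE_NUM"` and append the result to whatever base path the skill uses.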
**In scope:** `iOS: Native` (iOS Simulator), `Android: Native` (Android Emulator), HybridApp dev build only. Inputs may come from PRs or issues - the skill does not gate on code changes.
-
**Out of scope:** `Android: mWeb Chrome`, `iOS: mWeb Safari`, `iOS: mWeb Chrome`, `Windows: Chrome`, `MacOS: Chrome / Safari`. Decline with `EXIT 4` and point to a browser-driver skill (`playwright-app-testing`). Standalone (non-HybridApp) builds. Decline with `EXIT 7 NO_BUILD` per the parent skill's gate.
+
**Out of scope:** `Android: mWeb Chrome`, `iOS: mWeb Safari`, `iOS: mWeb Chrome`, `Windows: Chrome`, `MacOS: Chrome / Safari`. Decline with `EXIT 4` and point to a browser-driver skill (`playwright-app-testing`). Standalone (non-HybridApp) builds. Decline with `EXIT 7 BRING_UP_FAILED` per the parent skill's gate.

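The scope rules above can be sketched as a small platform gate. The exit codes 4 and 7 and the platform labels come from this document; the variable names, the helper `platform_exit_code`, and the choice to treat every unlisted label as unsupported are assumptions made for illustration.

```sh
# Exit codes named in this document; the variable names are illustrative.
EXIT_PLATFORM_UNSUPPORTED=4
EXIT_BRING_UP_FAILED=7

# Hypothetical helper: print the exit code to decline with (0 means in scope).
# Assumption: any label other than the two native platforms is unsupported.
platform_exit_code() {
    case "$1" in
        "iOS: Native"|"Android: Native") echo 0 ;;
        *) echo "$EXIT_PLATFORM_UNSUPPORTED" ;;
    esac
}
```

Bring-up failures are not decided here: per the text, they surface from the parent skill and map to exit code 7.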
## Inputs
@@ -120,8 +120,11 @@ Two phases per flow. Lifecycle delegated to the parent skill's bring-up. Phase 1
### Shared setup (run once per platform, before the first flow)

1. **Run the [agent-device bring-up](../agent-device/SKILL.md#bring-up)** for the target platform. The parent skill resolves bundle ID, starts Metro, picks/confirms the device, manages session, and opens the app for sanity verification. Capture the resolved `$APP_ID` (bundle ID) and `$DEVICE_NAME` for re-opens in Phases 1 and 2.
-
- If the bring-up's HybridApp gate fails or the dev build is not installed, **exit `7 NO_BUILD`** with the parent skill's install instructions.
+
- If bring-up fails for any reason (HybridApp gate, missing dev build, Metro start, simulator boot, etc.), **exit `7 BRING_UP_FAILED`** and surface the parent skill's error verbatim.
- Selector discipline (id > role+label, no coordinate fallback unless 0 a11y nodes) follows the parent skill's [`flows/README.md`](../agent-device/flows/README.md).
+
- **Non-interactive overrides for the parent bring-up** (this skill never prompts):
+
- Device pick (parent step 5, "If multiple are booted, ask the user which"): pick the **first booted device** in `agent-device devices --json` order, deterministically. Log the choice in the manifest under `device_selected`.
+
- Session reuse vs reset (parent step 6, line 73): **always `reset`** for sessions not created in the current invocation - run `agent-device close --shutdown --session <name>` without prompting. Phase 1 and Phase 2 both rely on cold starts, so reuse of stale sessions is never desired here.

2. **Close the bring-up session** so each phase starts cold:
```bash
@@ -130,8 +133,8 @@ Two phases per flow. Lifecycle delegated to the parent skill's bring-up. Phase 1

3. **Set up run directory** - persistent, append-only:
| Phase 1 step uninterpretable by LLM | Mark flow `phase1_failed`, log the step that failed, continue to next flow |
| Phase 1 a11y empty (0 nodes) on a screen | Use coordinate fallback; log `warnings: ["a11y_fallback:<screen>"]` |
| Phase 1 `$TEST_FLOW.ad` empty after warm-up | Mark flow `phase1_failed`, continue |
| Phase 2 `replay` fails on a step | Mark flow `phase2_failed`, continue. |
| `record stop` produces 0-byte file | Retry Phase 2 once for that flow; if still empty, mark `phase2_failed` |
| Android flow exceeds 3-min cap | Mark `phase2_failed`, continue (per-flow MP4s should rarely hit this; if they do, the Tests section is too coarse-grained) |

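The "retry Phase 2 once on a 0-byte recording" row could be sketched as follows. The helper `record_with_retry` and its arguments are hypothetical; only the retry-once rule and the `phase2_failed` status come from the table, and the real skill may structure this differently.

```sh
# Hypothetical helper: run a recording command, retrying once if the output
# file is missing or 0 bytes. Prints "ok" or "phase2_failed".
# Usage: record_with_retry <output-file> <command> [args...]
record_with_retry() {
    out_file=$1
    shift
    attempt=1
    while [ "$attempt" -le 2 ]; do
        # Ignore the command's own status; only the recorded file size matters here.
        "$@" || true
        if [ -s "$out_file" ]; then
            echo "ok"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "phase2_failed"
    return 1
}
```

The caller would then mark the flow with the printed status and continue to the next flow, as the table describes.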
-
## Non-goals
+
## Out of scope (do not do these)
-
- Mobile web (`iOS: mWeb Safari`, `Android: mWeb Chrome`) and Desktop (`MacOS: Chrome / Safari`) - belong in `playwright-app-testing` or a future browser-driver skill.
-
- Standalone (non-HybridApp) builds - parent skill is HybridApp-only and this specialization inherits the gate. Production mobile evidence runs against HybridApp.
-
- Device lifecycle (Metro, simulator boot, bundle ID resolution, session reuse, app install verification) - fully delegated to the parent skill's [Bring-up](../agent-device/SKILL.md#bring-up). This skill does not call `agent-device metro prepare`, `xcrun simctl`, or `is-hybrid-app.sh` directly.
-
- Editing the PR body or posting PR comments - the skill only writes local files.
-
- Interactive prompts of any kind. CI is the eventual host; the skill must run end-to-end without human input.
+
The skill must not attempt any of the following. If a request implies one of these, refuse or delegate.
+
+
- **Mobile web and Desktop platforms** (`iOS: mWeb Safari`, `Android: mWeb Chrome`, `MacOS: Chrome / Safari`) - belong in `playwright-app-testing` or a future browser-driver skill. Exit `4 PLATFORM_UNSUPPORTED`.
+
- **Standalone (non-HybridApp) builds** - parent skill is HybridApp-only and this specialization inherits the gate. Production mobile evidence runs against HybridApp.
+
- **Device lifecycle** (Metro, simulator boot, bundle ID resolution, session reuse, app install verification) - fully delegated to the parent skill's [Bring-up](../agent-device/SKILL.md#bring-up). Do not call `agent-device metro prepare`, `xcrun simctl`, or `is-hybrid-app.sh` directly.
+
- **Editing the PR body or posting PR comments** - the skill only writes local files. The user handles upload.
+
- **Interactive prompts of any kind** - CI is the eventual host; the skill must run end-to-end without human input.
- Test data cleanup. Accounts/expenses/workspaces created during runs accumulate; rely on periodic test-account reset.