Commit e14c073

Address PR review feedback on agent-device-evidence
- Add non-interactive overrides for the parent bring-up (deterministic device pick, forced session reset) so the skill honors its autonomous contract regardless of parent prompts.
- Key run directories by source kind (pr-N / issue-N) and rename PR_NUM to SOURCE_NUM/SOURCE_KIND to match the documented output layout and remove PR-vs-issue artifact ambiguity.
- Broaden exit code 7 from NO_BUILD to BRING_UP_FAILED so Metro, gate, and simulator-boot failures don't masquerade as install failures.
- Rename "Non-goals" to "Out of scope (do not do these)" with an imperative lead-in for clearer prohibition framing.
1 parent 4611ad6 commit e14c073

1 file changed

Lines changed: 17 additions & 12 deletions

.claude/skills/agent-device-evidence/SKILL.md

@@ -18,7 +18,7 @@ HybridApp-only (the parent skill's pre-flight enforces this). Standalone (non-Hy
 
 **In scope:** `iOS: Native` (iOS Simulator), `Android: Native` (Android Emulator), HybridApp dev build only. Inputs may come from PRs or issues - the skill does not gate on code changes.
 
-**Out of scope:** `Android: mWeb Chrome`, `iOS: mWeb Safari`, `iOS: mWeb Chrome`, `Windows: Chrome`, `MacOS: Chrome / Safari`. Decline with `EXIT 4` and point to a browser-driver skill (`playwright-app-testing`). Standalone (non-HybridApp) builds. Decline with `EXIT 7 NO_BUILD` per the parent skill's gate.
+**Out of scope:** `Android: mWeb Chrome`, `iOS: mWeb Safari`, `iOS: mWeb Chrome`, `Windows: Chrome`, `MacOS: Chrome / Safari`. Decline with `EXIT 4` and point to a browser-driver skill (`playwright-app-testing`). Standalone (non-HybridApp) builds. Decline with `EXIT 7 BRING_UP_FAILED` per the parent skill's gate.
 
 ## Inputs
 
@@ -120,8 +120,11 @@ Two phases per flow. Lifecycle delegated to the parent skill's bring-up. Phase 1
 ### Shared setup (run once per platform, before the first flow)
 
 1. **Run the [agent-device bring-up](../agent-device/SKILL.md#bring-up)** for the target platform. The parent skill resolves bundle ID, starts Metro, picks/confirms the device, manages session, and opens the app for sanity verification. Capture the resolved `$APP_ID` (bundle ID) and `$DEVICE_NAME` for re-opens in Phases 1 and 2.
-   - If the bring-up's HybridApp gate fails or the dev build is not installed, **exit `7 NO_BUILD`** with the parent skill's install instructions.
+   - If bring-up fails for any reason (HybridApp gate, missing dev build, Metro start, simulator boot, etc.), **exit `7 BRING_UP_FAILED`** and surface the parent skill's error verbatim.
    - Selector discipline (id > role+label, no coordinate fallback unless 0 a11y nodes) follows the parent skill's [`flows/README.md`](../agent-device/flows/README.md).
+   - **Non-interactive overrides for the parent bring-up** (this skill never prompts):
+     - Device pick (parent step 5, "If multiple are booted, ask the user which"): pick the **first booted device** in `agent-device devices --json` order, deterministically. Log the choice in the manifest under `device_selected`.
+     - Session reuse vs reset (parent step 6, line 73): **always `reset`** for sessions not created in the current invocation - run `agent-device close --shutdown --session <name>` without prompting. Phase 1 and Phase 2 both rely on cold starts, so reuse of stale sessions is never desired here.
 
 2. **Close the bring-up session** so each phase starts cold:
    ```bash
@@ -130,8 +133,8 @@ Two phases per flow. Lifecycle delegated to the parent skill's bring-up. Phase 1
 
 3. **Set up run directory** - persistent, append-only:
    ```bash
-   PR_NUM=<num>; RUN_TS=$(date -u +%Y%m%dT%H%M%SZ)
-   RUN_DIR="$HOME/.cache/agent-device-evidence/$PR_NUM/$RUN_TS"
+   SOURCE_KIND=<pr|issue>; SOURCE_NUM=<num>; RUN_TS=$(date -u +%Y%m%dT%H%M%SZ)
+   RUN_DIR="$HOME/.cache/agent-device-evidence/$SOURCE_KIND-$SOURCE_NUM/$RUN_TS"
    mkdir -p "$RUN_DIR/ios" "$RUN_DIR/android"
    ```
 
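The run-directory change above leaves `SOURCE_KIND` and `SOURCE_NUM` as placeholders. As a rough illustration only, they could be derived from the source URL along the following lines. The helper names `parse_source_url` and `run_dir_for` are hypothetical and not part of the skill, and the sketch assumes GitHub's usual `/pull/<N>` and `/issues/<N>` URL shapes.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not taken from SKILL.md.

# Sets SOURCE_KIND (pr|issue) and SOURCE_NUM from a GitHub PR or issue URL.
# Returns 8 (BAD_INPUT in the skill's exit-code table) when the URL does not match.
parse_source_url() {
    local url="$1"
    # Assumed URL shapes: .../pull/<N> for PRs and .../issues/<N> for issues.
    local re='^https://github\.com/[^/]+/[^/]+/(pull|issues)/([0-9]+)'
    if [[ "$url" =~ $re ]]; then
        case "${BASH_REMATCH[1]}" in
            pull)   SOURCE_KIND="pr" ;;
            issues) SOURCE_KIND="issue" ;;
        esac
        SOURCE_NUM="${BASH_REMATCH[2]}"
        return 0
    fi
    return 8
}

# Prints the run directory using the layout from the diff:
# $HOME/.cache/agent-device-evidence/<kind>-<num>/<timestamp>
run_dir_for() {
    local kind="$1" num="$2" ts="$3"
    printf '%s/.cache/agent-device-evidence/%s-%s/%s\n' "$HOME" "$kind" "$num" "$ts"
}
```

A caller might run `parse_source_url "$URL" || exit 8` and then `RUN_DIR=$(run_dir_for "$SOURCE_KIND" "$SOURCE_NUM" "$RUN_TS")`. Keying on kind plus number keeps PR 123 and issue 123 in separate directories, which is the ambiguity the commit message describes.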
@@ -288,7 +291,7 @@ After all platforms, the skill prints the run directory and lists per-flow paths
 | `4` | `PLATFORM_UNSUPPORTED` - mWeb / Desktop / Windows requested or only out-of-scope platforms checked on the source. |
 | `5` | `PHASE1_TOTAL_FAILURE` - every flow failed Phase 1. |
 | `6` | `PHASE2_TOTAL_FAILURE` - every flow failed Phase 2 despite Phase 1 success. |
-| `7` | `NO_BUILD` - `agent-device open` failed because the dev build is not installed. |
+| `7` | `BRING_UP_FAILED` - parent skill bring-up failed (missing dev build, HybridApp gate, Metro start, simulator boot, etc.). Parent error is surfaced verbatim. |
 | `8` | `BAD_INPUT` - source URL is missing, malformed, or not a recognised PR/issue URL. |
 
 ## Cost guards
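The exit-code rows in the hunk above could be mirrored in a small lookup, with any bring-up failure collapsed into code 7. This is a sketch under stated assumptions: `exit_code_name` and `run_bring_up` are hypothetical names, only the codes visible in this diff (3 through 8) are mapped, and the `EXIT 7 ...` log line format is assumed from the wording of the "Out of scope" paragraph.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not taken from SKILL.md.

# Maps the exit codes visible in this diff to their symbolic names.
# Codes not shown in the diff fall through to UNKNOWN.
exit_code_name() {
    case "$1" in
        3) echo "NO_FLOWS" ;;
        4) echo "PLATFORM_UNSUPPORTED" ;;
        5) echo "PHASE1_TOTAL_FAILURE" ;;
        6) echo "PHASE2_TOTAL_FAILURE" ;;
        7) echo "BRING_UP_FAILED" ;;
        8) echo "BAD_INPUT" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Runs the given bring-up command. Its stdout/stderr are not captured, so the
# parent's error text reaches the caller unchanged; any non-zero status
# becomes 7 regardless of the underlying cause.
run_bring_up() {
    if "$@"; then
        return 0
    fi
    echo "EXIT 7 $(exit_code_name 7)" >&2
    return 7
}
```

Because the wrapper does not inspect why the command failed, Metro, gate, and simulator-boot failures all report the same code, which matches the broadened meaning of `7` in this commit.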
@@ -309,19 +312,21 @@ Hitting any cap marks the flow `phase1_failed` / `phase2_failed` and proceeds to
 | Steps section missing or empty (PR `### Tests` / issue `## Action Performed:`) | Exit `3 NO_FLOWS` |
 | Only out-of-scope platforms checked on issue (e.g. `MacOS: Chrome / Safari` only) | Exit `4 PLATFORM_UNSUPPORTED` |
 | mWeb / Desktop / Windows explicitly requested via `--platforms` | Exit `4 PLATFORM_UNSUPPORTED` |
-| Bring-up fails (HybridApp gate, missing dev build, Metro start, etc.) | Surface parent skill's error verbatim; exit `7 NO_BUILD` |
+| Bring-up fails (HybridApp gate, missing dev build, Metro start, etc.) | Surface parent skill's error verbatim; exit `7 BRING_UP_FAILED` |
 | Phase 1 step uninterpretable by LLM | Mark flow `phase1_failed`, log the step that failed, continue to next flow |
 | Phase 1 a11y empty (0 nodes) on a screen | Use coordinate fallback; log `warnings: ["a11y_fallback:<screen>"]` |
 | Phase 1 `$TEST_FLOW.ad` empty after warm-up | Mark flow `phase1_failed`, continue |
 | Phase 2 `replay` fails on a step | Mark flow `phase2_failed`, continue. |
 | `record stop` produces 0-byte file | Retry Phase 2 once for that flow; if still empty, mark `phase2_failed` |
 | Android flow exceeds 3-min cap | Mark `phase2_failed`, continue (per-flow MP4s should rarely hit this; if they do, the Tests section is too coarse-grained) |
 
-## Non-goals
+## Out of scope (do not do these)
 
-- Mobile web (`iOS: mWeb Safari`, `Android: mWeb Chrome`) and Desktop (`MacOS: Chrome / Safari`) - belong in `playwright-app-testing` or a future browser-driver skill.
-- Standalone (non-HybridApp) builds - parent skill is HybridApp-only and this specialization inherits the gate. Production mobile evidence runs against HybridApp.
-- Device lifecycle (Metro, simulator boot, bundle ID resolution, session reuse, app install verification) - fully delegated to the parent skill's [Bring-up](../agent-device/SKILL.md#bring-up). This skill does not call `agent-device metro prepare`, `xcrun simctl`, or `is-hybrid-app.sh` directly.
-- Editing the PR body or posting PR comments - the skill only writes local files.
-- Interactive prompts of any kind. CI is the eventual host; the skill must run end-to-end without human input.
+The skill must not attempt any of the following. If a request implies one of these, refuse or delegate.
+
+- **Mobile web and Desktop platforms** (`iOS: mWeb Safari`, `Android: mWeb Chrome`, `MacOS: Chrome / Safari`) - belong in `playwright-app-testing` or a future browser-driver skill. Exit `4 PLATFORM_UNSUPPORTED`.
+- **Standalone (non-HybridApp) builds** - parent skill is HybridApp-only and this specialization inherits the gate. Production mobile evidence runs against HybridApp.
+- **Device lifecycle** (Metro, simulator boot, bundle ID resolution, session reuse, app install verification) - fully delegated to the parent skill's [Bring-up](../agent-device/SKILL.md#bring-up). Do not call `agent-device metro prepare`, `xcrun simctl`, or `is-hybrid-app.sh` directly.
+- **Editing the PR body or posting PR comments** - the skill only writes local files. The user handles upload.
+- **Interactive prompts of any kind** - CI is the eventual host; the skill must run end-to-end without human input.
 - Test data cleanup. Accounts/expenses/workspaces created during runs accumulate; rely on periodic test-account reset.
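The failure-mode row for a 0-byte recording ("retry Phase 2 once for that flow; if still empty, mark `phase2_failed`") could be sketched as below. This is an illustration, not the skill's implementation: `record_with_one_retry` is a hypothetical helper, and the command it wraps stands in for whatever actually runs Phase 2 for one flow.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper, not taken from SKILL.md.

# Usage: record_with_one_retry <output-file> <command> [args...]
# Runs the command, then checks that the output file exists and is non-empty.
# An empty file triggers exactly one more attempt. A failing command is not
# retried, since the table handles replay failures separately.
record_with_one_retry() {
    local out="$1"; shift
    local attempt
    for attempt in 1 2; do
        "$@" || return 1
        if [[ -s "$out" ]]; then
            return 0
        fi
    done
    # Still empty after the retry: the caller marks the flow phase2_failed.
    return 1
}
```

A caller might write `record_with_one_retry "$RUN_DIR/ios/flow.mp4" run_phase2_for_flow || status=phase2_failed`, where both the file name and `run_phase2_for_flow` are placeholders.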
