AGENTS.md
15 additions & 13 deletions
@@ -153,7 +153,9 @@ Command-only flags (like `find --first`) that don't flow to the platform layer o
  - Do not duplicate `makeSessionStore`, `makeSession`, or device constants when a shared helper already exists.

  ## Testing Matrix
- - Docs/skills only: no tests required.
+ - Docs/skills only: no tests required unless a more specific rule below applies.
+ - CLI help/guidance changes in `src/utils/command-schema.ts`: run `pnpm exec vitest run src/utils/__tests__/args.test.ts`.
+ - SkillGym prompt/assertion changes: run the touched `--case` checks; for broad validation, run cases in batches of 20 or fewer because full-suite runs can hang.
  - Non-TS, no behavior impact: no tests unless requested.
  - Keep tests behavioral; do not assert shapes or cases TypeScript already proves.
  - Any TS change: `pnpm typecheck` or `pnpm check:quick`.
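The "batches of 20 or fewer" rule in the added Testing Matrix line can be sketched as a small chunking helper. This is only an illustration: `batchCases` and the case ids are hypothetical names, not part of the repository, and how each batch is handed to SkillGym is left to the `--case` usage the diff already describes.

```typescript
// Hypothetical helper (not from the repository): split SkillGym case ids
// into batches of at most `size` so no single run covers the full suite.
function batchCases(caseIds: string[], size: number = 20): string[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches: string[][] = [];
  for (let i = 0; i < caseIds.length; i += size) {
    // slice() clamps at the array end, so the last batch may be shorter.
    batches.push(caseIds.slice(i, i + size));
  }
  return batches;
}

// Example: 45 placeholder ids produce batches of 20, 20, and 5.
const exampleIds: string[] = [];
for (let i = 0; i < 45; i++) {
  exampleIds.push("case-" + i);
}
const exampleBatches = batchCases(exampleIds);
console.log(exampleBatches.map((batch) => batch.length));
```

Each batch would then be validated separately, which keeps a hang confined to one small run.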
@@ -182,18 +184,18 @@ Command-only flags (like `find --first`) that don't flow to the platform layer o
  - Changing `tsconfig.lib.json`/build tooling without running `pnpm check:tooling`; declaration generation is stricter than `tsc --noEmit`.
- top-level `SKILL.md` should stay a thin router, not a full manual.
- - keep detailed workflows/troubleshooting in a `references/` folder instead of growing the router.
- - isolate true platform/infra exceptions (for example macOS-only or remote-tenancy-only guidance) in dedicated files.
- - do not delete high-value operational guidance during refactors; move or condense it unless the behavior is obsolete.
- - Optimize skills for cheap, less capable models:
-   - keep routing explicit, shallow, and easy to follow in one pass.
-   - prefer short task-first steps, concrete commands, and low-ambiguity wording over dense prose.
-   - avoid long reference chains or “figure it out” guidance when a direct next action can be stated.
+ - Versioned CLI help is the agent-facing source of truth. Put workflow guidance in `src/utils/command-schema.ts` help topics and assert important copy in `src/utils/__tests__/args.test.ts`.
+ - Skills are thin routers. Keep `skills/**/SKILL.md` focused on when to use the skill, version gating, which `agent-device help <topic>` page to read, and a short default loop. Do not duplicate full CLI manuals in skills.
+ - For behavior/CLI surface changes, update `README.md`, relevant `website/docs/**`, and router skills only when their short routing guidance or version assumptions change.
+ - For command-planning guidance changes, update `test/skillgym/suites/agent-device-smoke-suite.ts` when the change should alter what an agent plans.
+ - Keep SkillGym cases behavioral and command-planning oriented. Prefer prompts that assert the user-visible contract and expected command family over brittle exact output, but forbid known bad patterns.
+ - Build before SkillGym when local CLI help is needed: `pnpm build`, then `pnpm exec skillgym run ... --case <id>`.
+ - Run SkillGym broad validation in batches of 20 cases or fewer using repeated `--case` runs; do not rely on one full-suite invocation for large runs.
+ - Preserve current high-value workflow guidance:
+   - iOS Expo Go dogfood: prefer `agent-device open "Expo Go" <url> --platform ios` when the shell is known, then `snapshot -i` to confirm the project UI rather than the runner splash.
+   - `keyboard dismiss` is best-effort on iOS; prefer a visible app dismiss control, or `back --system` only when system navigation is acceptable.
+   - Empty replacement is not a supported clear-field command; do not document or test `fill <target> ""` as clearing. Prefer visible clear/reset controls or report the tool gap.
+   - Mutating commands against one session must run serially. Parallelize only read-only commands or commands on separate sessions/devices.
  - In final summaries, state whether docs/skills were updated; if not, explain why.
- Keep refs current, prefer selectors/refs over coordinates, use `fill` to replace text, and use `back` for app-owned navigation. Let `help workflow` provide the exact command shapes.
+ Keep refs current, prefer selectors/refs over coordinates, use `fill` to replace text, and use `back` for app-owned navigation. Serialize mutating commands against one session; only parallelize read-only work or separate sessions. Let `help workflow` provide the exact command shapes, including Expo Go, keyboard, and clear-field limits.
skills/dogfood/SKILL.md
1 addition & 1 deletion
@@ -22,4 +22,4 @@ agent-device help dogfood

  Loop: open named session -> snapshot -i + screenshot -> explore flows -> capture evidence per issue -> close.

- Target app is required; infer platform or ask. Default output is `./dogfood-output/`. Findings must come from runtime behavior, not source reads. Re-snapshot after mutations. Use logs, network, trace, perf, overlay screenshots, or react-devtools only when they add evidence.
+ Target app is required; infer platform or ask. Default output is `./dogfood-output/`. Findings must come from runtime behavior, not source reads. Re-snapshot after mutations. Keep mutating commands serial within one session; use `help dogfood` and `help workflow` for Expo Go, keyboard, and clear-field limits. Use logs, network, trace, perf, overlay screenshots, or react-devtools only when they add evidence.
'Read-only visible/state question: use snapshot/get/is/find; use snapshot -i only when refs are needed.',
  'Truncated text/input preview: expand first with snapshot -s @e12, not get text.',
  'RN warning/error overlays can block taps: snapshot -i, dismiss/close, then diff snapshot -i.',
- 'Expo Go/dev clients: use the provided URL when given; if only a target name is given, open that target and do not search project files for a URL.',
+ 'Expo Go/dev clients: use the provided URL when given; on iOS prefer open "Expo Go" <url> when the host shell is known.',
  'Install flows: install/install-from-source first, then open the installed id with --relaunch.',
  'Text: fill \'id="field-email"\' "qa@example.com" replaces; type appends after press.',
+ 'Clearing text: do not use fill <target> ""; use a visible clear/reset control or report that clearing is unsupported.',
+ 'Run mutating commands serially against one session; parallelize only read-only commands or separate sessions.',
  'Clipboard limits: iOS Allow Paste cannot be automated through XCUITest; prefill with clipboard write. Android non-ASCII should use fill/type, not raw adb input.',
'Raw coordinates are fallback-only: use snapshot -i -c --json rects when iOS refs no-op or child refs are missing.',
@@ -283,11 +285,17 @@ Text entry:
  agent-device fill 'id="field-email"' "qa@example.com"
  agent-device press 'id="product-note"'
  agent-device type "Handle with care" --delay-ms 80
- Debounced field with no result selector: agent-device wait 1000. Keyboard read-only: keyboard status/get. Blocked control: keyboard dismiss.
+ Empty replacement is not a supported clear-field command: do not plan fill <target> "" or fill <target> ''. Prefer a visible clear/reset control; if the app exposes none, report the tool gap instead of inventing a clear command.
+ Debounced field with no result selector: agent-device wait 1000. Keyboard read-only: keyboard status/get. Blocked control: try keyboard dismiss when supported.
+ iOS keyboard dismiss is best-effort and can return UNSUPPORTED_OPERATION when no native dismiss gesture/control is available. Prefer a visible app dismiss control, or use back --system only when system navigation is an acceptable side effect.
  Search-as-you-type fields on iOS can drop characters when driven too fast; use --delay-ms on fill/type before trying clipboard paste.
  iOS Allow Paste prompt cannot be exercised under XCUITest. To test paste-driven app behavior, prefill first with agent-device clipboard write "some text"; test the system prompt manually.
  Android non-ASCII can fail on some system images. Try fill/type normally; agent-device uses safer fallbacks. If the shell reports unsupported non-ASCII input, configure a trusted ADB keyboard IME outside the command plan and restore the previous IME afterward.

+ Session ordering:
+ Stateful commands against one --session must run serially. Do not run open/press/fill/type/scroll/back/alert/replay/batch/close commands in parallel against the same session.
+ It is fine to parallelize independent read-only collection or commands that use different sessions/devices.
+
  Read-only and waits:
  Read-only visible/state question: use snapshot/get/is/find.
  agent-device snapshot
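The "Session ordering" rule added in this hunk (stateful commands serial within one session, independent sessions free to run in parallel) could be enforced by a caller with a per-session promise chain. The sketch below is an assumption about how a script driving the CLI might do this; `SessionQueue` and the session names are hypothetical and do not come from the repository.

```typescript
type Task<T> = () => Promise<T>;

// Hypothetical wrapper (not from the repository): tasks queued under the same
// session key run one after another; different keys do not wait on each other.
class SessionQueue {
  private tails = new Map<string, Promise<unknown>>();

  run<T>(session: string, task: Task<T>): Promise<T> {
    const previous = this.tails.get(session) ?? Promise.resolve();
    // Start the task only once the previous one has settled, whether it
    // succeeded or failed, so one failure does not stall the session.
    const next = previous.then(() => task(), () => task());
    // Store a non-rejecting tail so later tasks can chain on it safely.
    this.tails.set(session, next.catch(() => undefined));
    return next;
  }
}

// Usage sketch: the async bodies stand in for spawning mutating commands.
const sessionQueue = new SessionQueue();
sessionQueue.run("checkout", async () => "fill finished");
sessionQueue.run("checkout", async () => "press finished"); // waits for the fill
sessionQueue.run("settings", async () => "independent session"); // does not wait
```

Read-only collection would bypass the queue, matching the rule that only mutating commands need ordering. The map is never pruned here, which is fine for a short script but would need cleanup in a long-lived process.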
@@ -334,9 +342,11 @@ React Native dev loop:
  agent-device find "Home"
  Do not use agent-device reload. Use open --relaunch for native startup reset.
  Warning/error overlays can obscure UI and intercept taps. If snapshot -i shows one, dismiss/close its visible control (for example Dismiss or Close) if it is not the task target, then diff snapshot -i or snapshot -i before tapping the real UI.
- Expo Go is a host shell. Use a provided project URL instead of inventing a bundle id; if no URL is provided but a target/app name is provided, open that target and do not inspect project files to find one. iOS simulators can open a URL directly; use host + URL when targeting a specific host shell:
- agent-device open exp://127.0.0.1:8081 --platform ios
+ Expo Go is a host shell. Use a provided project URL instead of inventing a bundle id; if no URL is provided but a target/app name is provided, open that target and do not inspect project files to find one. On iOS, prefer host + URL when the host shell is known because direct URL open can report success while leaving the runner/shell focused; verify with snapshot -i after opening:
  agent-device open "Expo Go" exp://127.0.0.1:8081 --platform ios
+ agent-device snapshot -i --platform ios
+ Direct iOS URL open remains valid when no host shell is known, but verify that the app UI loaded:
+ agent-device open exp://127.0.0.1:8081 --platform ios
  Android uses the URL target directly; do not write open <app> <url> there:
  agent-device open exp://127.0.0.1:8081 --platform android
  If apps lookup misses the project but shows Expo Go/dev-client and a project URL is available, open the URL/host shell; if no URL is available, ask instead of inventing an app id.
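The platform-specific `open` shapes in this hunk can be summarized as a small decision function. This is a sketch only: `expoOpenArgs` is a hypothetical helper, and it encodes nothing beyond the three command shapes quoted in the help text above (host + URL on iOS when the shell is known, URL only otherwise, and URL only on Android).

```typescript
type Platform = "ios" | "android";

// Hypothetical helper (not from the repository): build the argument list for
// opening an Expo project URL, following the command shapes in the help text.
function expoOpenArgs(platform: Platform, url: string, hostShell?: string): string[] {
  if (platform === "ios" && hostShell) {
    // iOS with a known host shell: open <host> <url>.
    return ["open", hostShell, url, "--platform", "ios"];
  }
  // iOS without a known shell, and Android always: open the URL directly.
  return ["open", url, "--platform", platform];
}

console.log(expoOpenArgs("ios", "exp://127.0.0.1:8081", "Expo Go").join(" "));
console.log(expoOpenArgs("android", "exp://127.0.0.1:8081", "Expo Go").join(" "));
```

On iOS the help text still asks for a follow-up `snapshot -i` after either shape, since a reported success does not prove the project UI loaded.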
@@ -536,11 +546,12 @@ Loop:
  4. Map top-level navigation, then exercise primary flows and edge states.
  5. For each issue, capture evidence and write the finding immediately, then continue.
  6. Close the session and reconcile the report summary.
+ Keep stateful commands serial within the same session. Parallel runs can pollute text fields, focus, alerts, and navigation state.

  Coverage:
  Navigation, forms, empty/error/loading states, offline or retry behavior, permissions, settings, accessibility labels, orientation/keyboard, and obvious performance stalls.
  React Native warning/error overlays can be real findings or test blockers. Capture them, dismiss if unrelated, re-snapshot, and report them.
- Expo Go/dev-client shells: use the provided exp:// or dev-client URL and record whether the shell, project load, or app UI is being tested.
+ Expo Go/dev-client shells: use the provided exp:// or dev-client URL and record whether the shell, project load, or app UI is being tested. On iOS dogfood, prefer agent-device open "Expo Go" <url> when Expo Go is the known shell, then snapshot -i to confirm the project UI rather than the runner splash.
- - remote config, macOS menu bar surfaces, replay update, and batch schema/recording
+ - remote config, macOS menu bar surfaces, replay update, same-session mutation ordering, and batch schema/recording

  `assertAgentDeviceEvidence` is intentionally soft when a runner does not expose skill-detection telemetry. When telemetry exists, the suite asserts that `agent-device` was loaded; when it is absent, the cases still judge command-planning output instead of failing on missing runner metadata.