Commit b605c83

docs: update agent-device guidance
1 parent d1a7641 commit b605c83

10 files changed

Lines changed: 171 additions & 48 deletions

AGENTS.md

Lines changed: 15 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -153,7 +153,9 @@ Command-only flags (like `find --first`) that don't flow to the platform layer o
153153
- Do not duplicate `makeSessionStore`, `makeSession`, or device constants when a shared helper already exists.
154154

155155
## Testing Matrix
156-
- Docs/skills only: no tests required.
156+
- Docs/skills only: no tests required unless a more specific rule below applies.
157+
- CLI help/guidance changes in `src/utils/command-schema.ts`: run `pnpm exec vitest run src/utils/__tests__/args.test.ts`.
158+
- SkillGym prompt/assertion changes: run the touched `--case` checks; for broad validation, run cases in batches of 20 or fewer because full-suite runs can hang.
157159
- Non-TS, no behavior impact: no tests unless requested.
158160
- Keep tests behavioral; do not assert shapes or cases TypeScript already proves.
159161
- Any TS change: `pnpm typecheck` or `pnpm check:quick`.
@@ -182,18 +184,18 @@ Command-only flags (like `find --first`) that don't flow to the platform layer o
182184
- Changing `tsconfig.lib.json`/build tooling without running `pnpm check:tooling`; declaration generation is stricter than `tsc --noEmit`.
183185

184186
## Docs & Skills
185-
- For behavior/CLI surface changes, evaluate docs/skills updates.
186-
- Update `README.md` and relevant `website/docs/**` pages for command behavior/flags/aliases/workflows.
187-
- Update relevant `skills/**/SKILL.md` when usage examples/workflow recommendations change.
188-
- Keep skill docs task-first:
189-
- top-level `SKILL.md` should stay a thin router, not a full manual.
190-
- keep detailed workflows/troubleshooting in a `references/` folder instead of growing the router.
191-
- isolate true platform/infra exceptions (for example macOS-only or remote-tenancy-only guidance) in dedicated files.
192-
- do not delete high-value operational guidance during refactors; move or condense it unless the behavior is obsolete.
193-
- Optimize skills for cheap, less capable models:
194-
- keep routing explicit, shallow, and easy to follow in one pass.
195-
- prefer short task-first steps, concrete commands, and low-ambiguity wording over dense prose.
196-
- avoid long reference chains or “figure it out” guidance when a direct next action can be stated.
187+
- Versioned CLI help is the agent-facing source of truth. Put workflow guidance in `src/utils/command-schema.ts` help topics and assert important copy in `src/utils/__tests__/args.test.ts`.
188+
- Skills are thin routers. Keep `skills/**/SKILL.md` focused on when to use the skill, version gating, which `agent-device help <topic>` page to read, and a short default loop. Do not duplicate full CLI manuals in skills.
189+
- For behavior/CLI surface changes, update `README.md`, relevant `website/docs/**`, and router skills only when their short routing guidance or version assumptions change.
190+
- For command-planning guidance changes, update `test/skillgym/suites/agent-device-smoke-suite.ts` when the change should alter what an agent plans.
191+
- Keep SkillGym cases behavioral and command-planning oriented. Prefer prompts that assert the user-visible contract and expected command family over brittle exact output, but forbid known bad patterns.
192+
- Build before SkillGym when local CLI help is needed: `pnpm build`, then `pnpm exec skillgym run ... --case <id>`.
193+
- Run SkillGym broad validation in batches of 20 cases or fewer using repeated `--case` runs; do not rely on one full-suite invocation for large runs.
194+
- Preserve current high-value workflow guidance:
195+
- iOS Expo Go dogfood: prefer `agent-device open "Expo Go" <url> --platform ios` when the shell is known, then `snapshot -i` to confirm the project UI rather than the runner splash.
196+
- `keyboard dismiss` is best-effort on iOS; prefer a visible app dismiss control, or `back --system` only when system navigation is acceptable.
197+
- Empty replacement is not a supported clear-field command; do not document or test `fill <target> ""` as clearing. Prefer visible clear/reset controls or report the tool gap.
198+
- Mutating commands against one session must run serially. Parallelize only read-only commands or commands on separate sessions/devices.
197199
- In final summaries, state whether docs/skills were updated; if not, explain why.
198200

199201
## When Blocked
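
The batching rule in the Testing Matrix and Docs & Skills hunks above (broad SkillGym validation in batches of 20 cases or fewer) can be sketched as a small chunking helper. This is a minimal illustration, not code from the repository: `batchCases` and the placeholder case ids are hypothetical, and how each batch is passed to the `skillgym` CLI is left out on purpose because the commit only shows the `--case <id>` form.

```typescript
// Hypothetical helper (not part of the repo): split SkillGym case ids into
// batches of at most `size`, matching the "20 or fewer" guidance.
function batchCases(caseIds: readonly string[], size = 20): string[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: string[][] = [];
  for (let i = 0; i < caseIds.length; i += size) {
    // slice() clamps at the end of the array, so the last batch may be short.
    batches.push(caseIds.slice(i, i + size));
  }
  return batches;
}

// Placeholder ids for illustration only; real ids come from the suite file.
const exampleIds = Array.from({ length: 45 }, (_, i) => `case-${i + 1}`);
const exampleBatches = batchCases(exampleIds);
console.log(exampleBatches.map((batch) => batch.length)); // batch sizes: 20, 20, 5
```

Each batch would then drive its own run, so a hang affects at most one batch rather than the whole suite.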

skills/agent-device/SKILL.md

Lines changed: 1 addition & 1 deletion
@@ -31,4 +31,4 @@ agent-device help dogfood
3131

3232
Default loop: `open -> snapshot/-i -> get/is/find or press/fill/scroll/wait -> verify -> close`.
3333

34-
Keep refs current, prefer selectors/refs over coordinates, use `fill` to replace text, and use `back` for app-owned navigation. Let `help workflow` provide the exact command shapes.
34+
Keep refs current, prefer selectors/refs over coordinates, use `fill` to replace text, and use `back` for app-owned navigation. Serialize mutating commands against one session; only parallelize read-only work or separate sessions. Let `help workflow` provide the exact command shapes, including Expo Go, keyboard, and clear-field limits.

skills/dogfood/SKILL.md

Lines changed: 1 addition & 1 deletion
@@ -22,4 +22,4 @@ agent-device help dogfood
2222

2323
Loop: open named session -> snapshot -i + screenshot -> explore flows -> capture evidence per issue -> close.
2424

25-
Target app is required; infer platform or ask. Default output is `./dogfood-output/`. Findings must come from runtime behavior, not source reads. Re-snapshot after mutations. Use logs, network, trace, perf, overlay screenshots, or react-devtools only when they add evidence.
25+
Target app is required; infer platform or ask. Default output is `./dogfood-output/`. Findings must come from runtime behavior, not source reads. Re-snapshot after mutations. Keep mutating commands serial within one session; use `help dogfood` and `help workflow` for Expo Go, keyboard, and clear-field limits. Use logs, network, trace, perf, overlay screenshots, or react-devtools only when they add evidence.

src/utils/__tests__/args.test.ts

Lines changed: 16 additions & 1 deletion
@@ -796,9 +796,11 @@ test('usage includes agent workflows, config, environment, and examples footers'
796796
assert.match(usageText, /Truncated text\/input preview: expand first with snapshot -s @e12/);
797797
assert.match(usageText, /RN warning\/error overlays can block taps: snapshot -i/);
798798
assert.match(usageText, /Expo Go\/dev clients: use the provided URL when given/);
799-
assert.match(usageText, /if only a target name is given, open that target/);
799+
assert.match(usageText, /on iOS prefer open "Expo Go" <url>/);
800800
assert.match(usageText, /Install flows: install\/install-from-source first/);
801801
assert.match(usageText, /fill 'id="field-email"' "qa@example\.com" replaces/);
802+
assert.match(usageText, /do not use fill <target> ""/);
803+
assert.match(usageText, /Run mutating commands serially against one session/);
802804
assert.match(usageText, /After mutation: diff snapshot -i/);
803805
assert.match(usageText, /app-owned back uses back/);
804806
assert.match(usageText, /logs clear --restart\/mark\/path/);
@@ -856,13 +858,24 @@ test('usageForCommand resolves workflow help topic', () => {
856858
assert.match(help, /report that gap instead of typing\/searching\/navigating/);
857859
assert.match(help, /If snapshot -i shows one, dismiss\/close its visible control/);
858860
assert.match(help, /iOS Allow Paste prompt cannot be exercised under XCUITest/);
861+
assert.match(help, /Empty replacement is not a supported clear-field command/);
862+
assert.match(help, /do not plan fill <target> ""/);
863+
assert.match(help, /iOS keyboard dismiss is best-effort/);
864+
assert.match(help, /UNSUPPORTED_OPERATION/);
865+
assert.match(help, /Stateful commands against one --session must run serially/);
866+
assert.match(
867+
help,
868+
/Do not run open\/press\/fill\/type\/scroll\/back\/alert\/replay\/batch\/close commands in parallel/,
869+
);
859870
assert.match(help, /agent-device clipboard write "some text"/);
860871
assert.match(help, /trusted ADB keyboard IME/);
861872
assert.match(help, /if no URL is provided but a target\/app name is provided, open that target/);
862873
assert.match(help, /do not split clear\/restart/);
863874
assert.match(help, /do not write network log headers/);
864875
assert.match(help, /agent-device open exp:\/\/127\.0\.0\.1:8081 --platform ios/);
865876
assert.match(help, /agent-device open "Expo Go" exp:\/\/127\.0\.0\.1:8081 --platform ios/);
877+
assert.match(help, /direct URL open can report success while leaving the runner\/shell focused/);
878+
assert.match(help, /verify with snapshot -i after opening/);
866879
assert.match(help, /agent-device open exp:\/\/127\.0\.0\.1:8081 --platform android/);
867880
assert.match(help, /apps lookup misses the project but shows Expo Go\/dev-client/);
868881
assert.match(help, /metro prepare --kind expo/);
@@ -909,6 +922,8 @@ test('usageForCommand resolves dogfood help topic', () => {
909922
assert.match(help, /Static\/on-load issues can use one screenshot/);
910923
assert.match(help, /React Native warning\/error overlays can be real findings/);
911924
assert.match(help, /Expo Go\/dev-client shells/);
925+
assert.match(help, /Keep stateful commands serial within the same session/);
926+
assert.match(help, /prefer agent-device open "Expo Go" <url>/);
912927
assert.match(help, /dogfood-output\/report\.md/);
913928
assert.match(help, /ID, severity, category, title, affected flow\/screen/);
914929
assert.match(help, /Never delete screenshots, videos, traces, or report artifacts/);
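
The assertions in this hunk escape `/` and `.` inside their regex literals (for example `exp:\/\/127\.0\.0\.1:8081`). The sketch below shows why that matters when asserting help copy: an unescaped `.` matches any character, so the check becomes looser than intended. It is an illustration only; `escapeRegExp` is a hypothetical helper that does not appear in the commit, and the sample strings are made up.

```typescript
// Hypothetical helper (not in the repo): build a RegExp that matches `literal`
// exactly by escaping every regex metacharacter.
function escapeRegExp(literal: string): RegExp {
  return new RegExp(literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
}

// Made-up sample lines standing in for real help text.
const goodLine = 'agent-device open exp://127.0.0.1:8081 --platform ios';
const badLine = 'agent-device open exp://127x0y0z1:8081 --platform ios';

// Unescaped dots accept the corrupted address, so the assertion is too loose.
const loose = /exp:\/\/127.0.0.1:8081/;
console.log(loose.test(goodLine), loose.test(badLine)); // true true

// Escaped dots only accept the literal address.
const strict = /exp:\/\/127\.0\.0\.1:8081/;
console.log(strict.test(goodLine), strict.test(badLine)); // true false

// The helper produces the same strict behavior from a plain string.
const fromLiteral = escapeRegExp('exp://127.0.0.1:8081');
console.log(fromLiteral.test(goodLine), fromLiteral.test(badLine)); // true false
```

A pattern built this way can be handed to `assert.match` in the same position as the regex literals used in the test file.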

src/utils/command-schema.ts

Lines changed: 16 additions & 5 deletions
@@ -180,9 +180,11 @@ const AGENT_QUICKSTART_LINES = [
180180
'Read-only visible/state question: use snapshot/get/is/find; use snapshot -i only when refs are needed.',
181181
'Truncated text/input preview: expand first with snapshot -s @e12, not get text.',
182182
'RN warning/error overlays can block taps: snapshot -i, dismiss/close, then diff snapshot -i.',
183-
'Expo Go/dev clients: use the provided URL when given; if only a target name is given, open that target and do not search project files for a URL.',
183+
'Expo Go/dev clients: use the provided URL when given; on iOS prefer open "Expo Go" <url> when the host shell is known.',
184184
'Install flows: install/install-from-source first, then open the installed id with --relaunch.',
185185
'Text: fill \'id="field-email"\' "qa@example.com" replaces; type appends after press.',
186+
'Clearing text: do not use fill <target> ""; use a visible clear/reset control or report that clearing is unsupported.',
187+
'Run mutating commands serially against one session; parallelize only read-only commands or separate sessions.',
186188
'Clipboard limits: iOS Allow Paste cannot be automated through XCUITest; prefill with clipboard write. Android non-ASCII should use fill/type, not raw adb input.',
187189
'After mutation: diff snapshot -i. Off-screen hints: scroll, then snapshot -i.',
188190
'Raw coordinates are fallback-only: use snapshot -i -c --json rects when iOS refs no-op or child refs are missing.',
@@ -283,11 +285,17 @@ Text entry:
283285
agent-device fill 'id="field-email"' "qa@example.com"
284286
agent-device press 'id="product-note"'
285287
agent-device type "Handle with care" --delay-ms 80
286-
Debounced field with no result selector: agent-device wait 1000. Keyboard read-only: keyboard status/get. Blocked control: keyboard dismiss.
288+
Empty replacement is not a supported clear-field command: do not plan fill <target> "" or fill <target> ''. Prefer a visible clear/reset control; if the app exposes none, report the tool gap instead of inventing a clear command.
289+
Debounced field with no result selector: agent-device wait 1000. Keyboard read-only: keyboard status/get. Blocked control: try keyboard dismiss when supported.
290+
iOS keyboard dismiss is best-effort and can return UNSUPPORTED_OPERATION when no native dismiss gesture/control is available. Prefer a visible app dismiss control, or use back --system only when system navigation is an acceptable side effect.
287291
Search-as-you-type fields on iOS can drop characters when driven too fast; use --delay-ms on fill/type before trying clipboard paste.
288292
iOS Allow Paste prompt cannot be exercised under XCUITest. To test paste-driven app behavior, prefill first with agent-device clipboard write "some text"; test the system prompt manually.
289293
Android non-ASCII can fail on some system images. Try fill/type normally; agent-device uses safer fallbacks. If the shell reports unsupported non-ASCII input, configure a trusted ADB keyboard IME outside the command plan and restore the previous IME afterward.
290294
295+
Session ordering:
296+
Stateful commands against one --session must run serially. Do not run open/press/fill/type/scroll/back/alert/replay/batch/close commands in parallel against the same session.
297+
It is fine to parallelize independent read-only collection or commands that use different sessions/devices.
298+
291299
Read-only and waits:
292300
Read-only visible/state question: use snapshot/get/is/find.
293301
agent-device snapshot
@@ -334,9 +342,11 @@ React Native dev loop:
334342
agent-device find "Home"
335343
Do not use agent-device reload. Use open --relaunch for native startup reset.
336344
Warning/error overlays can obscure UI and intercept taps. If snapshot -i shows one, dismiss/close its visible control (for example Dismiss or Close) if it is not the task target, then diff snapshot -i or snapshot -i before tapping the real UI.
337-
Expo Go is a host shell. Use a provided project URL instead of inventing a bundle id; if no URL is provided but a target/app name is provided, open that target and do not inspect project files to find one. iOS simulators can open a URL directly; use host + URL when targeting a specific host shell:
338-
agent-device open exp://127.0.0.1:8081 --platform ios
345+
Expo Go is a host shell. Use a provided project URL instead of inventing a bundle id; if no URL is provided but a target/app name is provided, open that target and do not inspect project files to find one. On iOS, prefer host + URL when the host shell is known because direct URL open can report success while leaving the runner/shell focused; verify with snapshot -i after opening:
339346
agent-device open "Expo Go" exp://127.0.0.1:8081 --platform ios
347+
agent-device snapshot -i --platform ios
348+
Direct iOS URL open remains valid when no host shell is known, but verify that the app UI loaded:
349+
agent-device open exp://127.0.0.1:8081 --platform ios
340350
Android uses the URL target directly; do not write open <app> <url> there:
341351
agent-device open exp://127.0.0.1:8081 --platform android
342352
If apps lookup misses the project but shows Expo Go/dev-client and a project URL is available, open the URL/host shell; if no URL is available, ask instead of inventing an app id.
@@ -536,11 +546,12 @@ Loop:
536546
4. Map top-level navigation, then exercise primary flows and edge states.
537547
5. For each issue, capture evidence and write the finding immediately, then continue.
538548
6. Close the session and reconcile the report summary.
549+
Keep stateful commands serial within the same session. Parallel runs can pollute text fields, focus, alerts, and navigation state.
539550
540551
Coverage:
541552
Navigation, forms, empty/error/loading states, offline or retry behavior, permissions, settings, accessibility labels, orientation/keyboard, and obvious performance stalls.
542553
React Native warning/error overlays can be real findings or test blockers. Capture them, dismiss if unrelated, re-snapshot, and report them.
543-
Expo Go/dev-client shells: use the provided exp:// or dev-client URL and record whether the shell, project load, or app UI is being tested.
554+
Expo Go/dev-client shells: use the provided exp:// or dev-client URL and record whether the shell, project load, or app UI is being tested. On iOS dogfood, prefer agent-device open "Expo Go" <url> when Expo Go is the known shell, then snapshot -i to confirm the project UI rather than the runner splash.
544555
Categories: visual, functional, UX, content, performance, diagnostics, permissions, accessibility.
545556
Severity: critical blocks a core flow/data/crashes; high breaks a major feature; medium has friction or workaround; low is polish.
546557
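
The "Session ordering" help text added in this file can be read as a simple pairwise rule: two commands may overlap only if they target different sessions or are both read-only. The sketch below encodes that reading. It is an interpretation under stated assumptions, not the project's implementation: `PlannedCommand`, `isStateful`, and `canRunInParallel` are hypothetical names, the stateful set contains only the commands the help text lists, and treating a read-only command as unsafe next to a mutation in the same session is a conservative assumption.

```typescript
// Command names copied from the help text; other commands may also be
// stateful, so treat this set as illustrative rather than exhaustive.
const STATEFUL_COMMANDS = new Set([
  'open', 'press', 'fill', 'type', 'scroll',
  'back', 'alert', 'replay', 'batch', 'close',
]);

// Hypothetical shape for a planned command.
interface PlannedCommand {
  session: string; // value that would be passed to --session
  name: string;    // first word of the agent-device command, e.g. "fill"
}

function isStateful(command: PlannedCommand): boolean {
  return STATEFUL_COMMANDS.has(command.name);
}

// Assumed rule: overlap is fine across sessions, or when both commands are
// read-only; anything involving a mutation in one session stays serial.
function canRunInParallel(a: PlannedCommand, b: PlannedCommand): boolean {
  if (a.session !== b.session) return true;
  return !isStateful(a) && !isStateful(b);
}

const fillEmail: PlannedCommand = { session: 'qa', name: 'fill' };
const pressSave: PlannedCommand = { session: 'qa', name: 'press' };
const snapshot: PlannedCommand = { session: 'qa', name: 'snapshot' };
const getText: PlannedCommand = { session: 'qa', name: 'get' };
const fillOther: PlannedCommand = { session: 'qa-2', name: 'fill' };

console.log(canRunInParallel(fillEmail, pressSave)); // false: same session, both mutate
console.log(canRunInParallel(snapshot, getText));    // true: both read-only
console.log(canRunInParallel(fillEmail, fillOther)); // true: separate sessions
```

A planner could run this check before scheduling work and fall back to serial execution whenever it returns `false`.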

test/skillgym/README.md

Lines changed: 5 additions & 5 deletions
@@ -16,7 +16,7 @@ The included suite focuses on the first two layers so it stays stable and CI-saf
1616

1717
- `../../examples/test-app/`: minimal Expo SDK 55 fixture app for broad UI coverage
1818
- `skillgym.config.ts`: starter config that runs Codex and Claude Haiku against this repo
19-
- `suites/agent-device-smoke-suite.ts`: 66-case suite for skill routing, fixture-aware planning, and skill-guidance regressions
19+
- `suites/agent-device-smoke-suite.ts`: planning suite for skill routing, fixture-aware flows, and skill-guidance regressions
2020

2121
## Current coverage
2222

@@ -28,19 +28,19 @@ Fixture smoke cases cover concrete app surfaces:
2828
- banners, alerts, toggles, and quick actions on Home
2929
- search debounce, filters, long-list scroll, favorites, and cart updates in Catalog
3030
- detail navigation, quantity edits, note append, and save-to-cart on Product
31-
- form validation, success submit, keyboard dismiss, and reset on Checkout form
31+
- form validation, success submit, iOS keyboard-dismiss fallback, and reset on Checkout form
3232
- diagnostics load/error/retry plus reset alert handling in Settings
3333
- accessibility audit via screenshot + snapshot
3434

3535
Skill-guidance regression cases cover distinct command-planning habits:
3636

3737
- read-only inspection versus mutation
3838
- fresh `@ref` targeting, durable selectors, raw-rect fallbacks, and off-screen scroll recovery
39-
- text replacement, append semantics, keyboard status, and keyboard dismiss
40-
- install/open setup, app discovery, session scoping, and app-owned navigation fallbacks
39+
- text replacement, append semantics, supported field clearing, keyboard status, and keyboard fallback
40+
- install/open setup, Expo Go host-shell launch, app discovery, session scoping, and app-owned navigation fallbacks
4141
- Metro reload, logs, network dump, alert fallback, and screenshot evidence
4242
- performance metrics, React DevTools profiling, gestures, settings, and trace capture
43-
- remote config, macOS menu bar surfaces, replay update, and batch schema/recording
43+
- remote config, macOS menu bar surfaces, replay update, same-session mutation ordering, and batch schema/recording
4444

4545
`assertAgentDeviceEvidence` is intentionally soft when a runner does not expose skill-detection telemetry. When telemetry exists, the suite asserts that `agent-device` was loaded; when it is absent, the cases still judge command-planning output instead of failing on missing runner metadata.
4646
