Commit 6e8557f

chore: optimize agent-device skill onboarding
1 parent 5506548 commit 6e8557f

3 files changed

Lines changed: 79 additions & 233 deletions

skills/agent-device/SKILL.md

Lines changed: 66 additions & 231 deletions
@@ -5,288 +5,123 @@ description: Automates interactions for iOS simulators/devices and Android emula
55

66
# Mobile Automation with agent-device
77

8-
For agent-driven exploration: use refs. For deterministic replay scripts: use selectors.
8+
For exploration, use snapshot refs. For deterministic replay, use selectors.
99

10-
## Quick start
10+
## Start Here (Read This First)
11+
12+
Use this skill as a router, not a full manual.
13+
14+
1. Pick one mode:
15+
- Normal interaction flow
16+
- Debug/crash flow
17+
- Replay maintenance flow
18+
2. Run one canonical flow below.
19+
3. Open references only if blocked.
20+
21+
## Decision Map
22+
23+
- No target context yet: `devices` -> pick target -> `open`.
24+
- Normal UI task: `open` -> `snapshot -i` -> `press/fill` -> `diff snapshot -i` -> `close`
25+
- Debug/crash: `open <app>` -> `logs clear --restart` -> reproduce -> `logs path` -> targeted `grep`
26+
- Replay drift: `replay -u <path>` -> verify updated selectors
27+
28+
## Canonical Flows
29+
30+
### 1) Normal Interaction Flow
1131

1232
```bash
1333
agent-device open Settings --platform ios
1434
agent-device snapshot -i
1535
agent-device press @e3
16-
agent-device wait text "Camera"
17-
agent-device alert wait 10000
1836
agent-device diff snapshot -i
1937
agent-device fill @e5 "test"
2038
agent-device close
2139
```
2240

23-
If not installed, run:
41+
### 2) Debug/Crash Flow
2442

2543
```bash
26-
npx -y agent-device
44+
agent-device open MyApp --platform ios
45+
agent-device logs clear --restart
46+
agent-device logs path
2747
```
2848

29-
## Core workflow
49+
Logging is off by default. Enable only for debugging windows.
50+
`logs clear --restart` requires an active app session (`open <app>` first).
3051

31-
1. Open app or deep link: `open [app|url] [url]` (`open` handles target selection + boot/activation in the normal flow)
32-
2. Snapshot: `snapshot` to get refs from accessibility tree
33-
3. Interact using refs (`press @ref`, `fill @ref "text"`; `click` is an alias of `press`)
34-
4. Re-snapshot after navigation/UI changes
35-
5. Close session when done
36-
37-
## Commands
38-
39-
### Navigation
52+
### 3) Replay Maintenance Flow
4053

4154
```bash
42-
agent-device boot # Ensure target is booted/ready without opening app
43-
agent-device boot --platform ios # Boot iOS target
44-
agent-device boot --platform android # Boot Android emulator/device target
45-
agent-device open [app|url] [url] # Boot device/simulator; optionally launch app or deep link URL
46-
agent-device open [app] --relaunch # Terminate app process first, then launch (fresh runtime)
47-
agent-device open [app] --activity com.example/.MainActivity # Android: open specific activity (app targets only)
48-
agent-device open "myapp://home" --platform android # Android deep link
49-
agent-device open "https://example.com" --platform ios # iOS deep link (opens in browser)
50-
agent-device open MyApp "myapp://screen/to" --platform ios # iOS deep link in app context
51-
agent-device close [app] # Close app or just end session
52-
agent-device reinstall <app> <path> # Uninstall + install app in one command
53-
agent-device session list # List active sessions
55+
agent-device replay -u ./session.ad
5456
```
5557

56-
`boot` requires either an active session or an explicit selector (`--platform`, `--device`, `--udid`, or `--serial`).
57-
`boot` is a fallback, not a regular step: use it when starting a new session only if `open` cannot find/connect to an available target.
58+
## Command Skeleton (Minimal)
5859

59-
### Snapshot (page analysis)
60+
### Session and navigation
6061

6162
```bash
62-
agent-device snapshot # Full XCTest accessibility tree snapshot
63-
agent-device snapshot -i # Interactive elements only (recommended)
64-
agent-device snapshot -c # Compact output
65-
agent-device snapshot -d 3 # Limit depth
66-
agent-device snapshot -s "Camera" # Scope to label/identifier
67-
agent-device snapshot --raw # Raw node output
68-
agent-device diff snapshot # Structural diff against previous session baseline
63+
agent-device devices
64+
agent-device open [app|url] [url]
65+
agent-device open [app] --relaunch
66+
agent-device close [app]
67+
agent-device session list
6968
```
7069

71-
XCTest is the iOS snapshot engine: fast, complete, and no Accessibility permission required.
72-
73-
Snapshot diff notes:
74-
- First `diff snapshot` call initializes baseline for the current session.
75-
- Subsequent `diff snapshot` calls compare current UI to prior baseline and then update baseline.
76-
- Use this for compact change tracking between adjacent UI states.
70+
Use `boot` only as fallback when `open` cannot find/connect to a ready target.
7771

78-
### Find (semantic)
72+
### Snapshot and targeting
7973

8074
```bash
75+
agent-device snapshot -i
76+
agent-device diff snapshot -i
8177
agent-device find "Sign In" click
82-
agent-device find text "Sign In" click
83-
agent-device find label "Email" fill "user@example.com"
84-
agent-device find value "Search" type "query"
85-
agent-device find role button click
86-
agent-device find id "com.example:id/login" click
87-
agent-device find "Settings" wait 10000
88-
agent-device find "Settings" exists
78+
agent-device press @e1
79+
agent-device fill @e2 "text"
80+
agent-device is visible 'id="anchor"'
8981
```
9082

91-
### Settings helpers
83+
`press` is the canonical tap command; `click` is an alias.
9284

93-
```bash
94-
agent-device settings wifi on
95-
agent-device settings wifi off
96-
agent-device settings airplane on
97-
agent-device settings airplane off
98-
agent-device settings location on
99-
agent-device settings location off
100-
agent-device settings faceid match
101-
agent-device settings faceid nonmatch
102-
agent-device settings faceid enroll
103-
agent-device settings faceid unenroll
104-
```
105-
106-
Note: iOS wifi/airplane toggles status bar indicators, not actual network state.
107-
Airplane off clears status bar overrides.
108-
iOS settings helpers are simulator-only.
109-
Use `match`/`nonmatch` as the canonical command values.
110-
Think of them as validate/invalidate outcomes when describing intent.
111-
112-
### Logs (token-efficient debugging)
113-
114-
Use the detailed logs workflow reference:
115-
`skills/agent-device/references/logs-and-debug.md`
116-
117-
Recommended minimum:
118-
119-
```bash
120-
agent-device logs doctor
121-
agent-device logs clear --restart
122-
agent-device logs path
123-
```
124-
125-
Logging is off by default for normal flows. Turn it on only for debugging windows.
126-
127-
### App state
85+
### Utilities
12886

12987
```bash
13088
agent-device appstate
131-
```
132-
133-
- Android: `appstate` reports live foreground package/activity.
134-
- iOS: `appstate` is session-scoped and reports the app tracked by the active session on the target device.
135-
- For iOS `appstate`, ensure a matching session exists (for example `open --session <name> --platform ios --device "<name>" <app>`).
136-
137-
### Interactions (use @refs from snapshot)
138-
139-
```bash
140-
agent-device press @e1 # Canonical tap command (`click` is an alias)
141-
agent-device focus @e2
142-
agent-device fill @e2 "text" # Clear then type (Android: verifies value and retries once on mismatch)
143-
agent-device type "text" # Type into focused field without clearing
144-
agent-device press 300 500 # Tap by coordinates
145-
agent-device press 300 500 --count 12 --interval-ms 45
146-
agent-device press 300 500 --count 6 --hold-ms 120 --interval-ms 30 --jitter-px 2
147-
agent-device press @e1 --count 5 # Repeat taps on the same target
148-
agent-device press @e1 --count 5 --double-tap # Use double-tap gesture per iteration
149-
agent-device swipe 540 1500 540 500 120
150-
agent-device swipe 540 1500 540 500 120 --count 8 --pause-ms 30 --pattern ping-pong
151-
agent-device longpress 300 500 800 # Long press on iOS and Android
152-
agent-device scroll down 0.5
153-
agent-device pinch 2.0 # Zoom in 2x (iOS simulator only)
154-
agent-device pinch 0.5 200 400 # Zoom out at coordinates (iOS simulator only)
155-
agent-device back
156-
agent-device home
157-
agent-device app-switcher
158-
agent-device wait 1000
159-
agent-device wait text "Settings"
160-
agent-device is visible 'id="settings_anchor"' # selector assertions for deterministic checks
161-
agent-device is text 'id="header_title"' "Settings"
162-
agent-device alert get
163-
```
164-
165-
### Get information
166-
167-
```bash
16889
agent-device get text @e1
169-
agent-device get attrs @e1
17090
agent-device screenshot out.png
91+
agent-device trace start
92+
agent-device trace stop ./trace.log
17193
```
17294

173-
### Deterministic replay and updating
95+
### Batch (when sequence is already known)
17496

17597
```bash
176-
agent-device open App --relaunch # Fresh app process restart in the current session
177-
agent-device open App --save-script # Save session script (.ad) on close (default path)
178-
agent-device open App --save-script ./workflows/app-flow.ad # Save to custom file path
179-
agent-device replay ./session.ad # Run deterministic replay from .ad script
180-
agent-device replay -u ./session.ad # Update selector drift and rewrite .ad script in place
98+
agent-device batch --steps-file /tmp/batch-steps.json --json
18199
```
182100

183-
`replay` reads `.ad` recordings.
184-
`--relaunch` controls launch semantics; `--save-script` controls recording. Combine only when both are needed.
185-
`--save-script` path is a file path; parent directories are created automatically.
186-
For ambiguous bare values, use `--save-script=workflow.ad` or `./workflow.ad`.
187-
188-
### Fast batching (JSON steps)
101+
## Guardrails (High Value Only)
189102

190-
Use `batch` when an agent already has a known short sequence and wants fewer orchestration round trips.
191-
192-
```bash
193-
agent-device batch \
194-
--session sim \
195-
--platform ios \
196-
--udid 00008150-001849640CF8401C \
197-
--steps-file /tmp/batch-steps.json \
198-
--json
199-
```
200-
201-
Inline JSON works for small payloads:
202-
203-
```bash
204-
agent-device batch --steps '[{"command":"open","positionals":["settings"]},{"command":"wait","positionals":["100"]}]'
205-
```
103+
- Re-snapshot after UI mutations (navigation/modal/list changes).
104+
- Prefer `snapshot -i`; scope/depth only when needed.
105+
- Use refs for discovery, selectors for replay/assertions.
106+
- Use `fill` for clear-then-type semantics; use `type` for focused append typing.
107+
- iOS `appstate` is session-scoped; Android `appstate` is live foreground state.
108+
- iOS settings helpers are simulator-only; use faceid `match|nonmatch|enroll|unenroll`.
109+
- If using `--save-script`, prefer explicit path syntax (`--save-script=flow.ad` or `./flow.ad`).
206110

207-
Step format:
111+
## Common Mistakes
208112

209-
```json
210-
[
211-
{ "command": "open", "positionals": ["settings"], "flags": {} },
212-
{ "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} },
213-
{ "command": "click", "positionals": ["label=\"Privacy & Security\""], "flags": {} },
214-
{ "command": "get", "positionals": ["text", "label=\"Tracking\""], "flags": {} }
215-
]
216-
```
217-
218-
Batch best practices:
219-
220-
- Batch one screen-local flow at a time.
221-
- Add sync guards (`wait`, `is exists`) after mutating steps (`open`, `click`, `fill`, `swipe`).
222-
- Treat prior refs/snapshot assumptions as stale after UI mutations.
223-
- Prefer `--steps-file` over inline JSON.
224-
- Keep batches moderate (about 5-20 steps).
225-
- Use failure context (`step`, `partialResults`) to replan from the failed step.
226-
227-
Stale accessibility tree note:
228-
229-
- Rapid mutations can outrun accessibility tree updates.
230-
- Mitigate with explicit waits and phase splitting (navigate, verify/extract, cleanup).
231-
232-
### Trace logs (XCTest)
233-
234-
```bash
235-
agent-device trace start # Start trace capture
236-
agent-device trace start ./trace.log # Start trace capture to path
237-
agent-device trace stop # Stop trace capture
238-
agent-device trace stop ./trace.log # Stop and move trace log
239-
```
240-
241-
### Devices and apps
242-
243-
```bash
244-
agent-device devices
245-
agent-device apps --platform ios # iOS simulator + iOS device, includes default/system apps
246-
agent-device apps --platform ios --all # explicit include-all (same as default)
247-
agent-device apps --platform ios --user-installed
248-
agent-device apps --platform android # includes default/system apps
249-
agent-device apps --platform android --all # explicit include-all (same as default)
250-
agent-device apps --platform android --user-installed
251-
```
113+
- Mixing debug flow into normal runs (keep logs off unless debugging).
114+
- Continuing to use stale refs after screen transitions.
115+
- Using URL opens with Android `--activity` (unsupported combination).
116+
- Treating `boot` as default first step instead of fallback.
252117

253-
## Best practices
254-
255-
- `press` is the canonical tap command; `click` is an alias with the same behavior.
256-
- `press` (and `click`) accepts `x y`, `@ref`, and selector targets.
257-
- `press`/`click` support gesture series controls: `--count`, `--interval-ms`, `--hold-ms`, `--jitter-px`, `--double-tap`.
258-
- `--double-tap` cannot be combined with `--hold-ms` or `--jitter-px`.
259-
- `swipe` supports coordinate + timing controls and repeat patterns: `swipe x1 y1 x2 y2 [durationMs] --count --pause-ms --pattern`.
260-
- `swipe` timing is platform-safe: Android uses requested duration; iOS uses normalized safe timing to avoid longpress side effects.
261-
- `longpress` is coordinate-based and supported on iOS and Android.
262-
- Pinch (`pinch <scale> [x y]`) is iOS simulator-only; scale > 1 zooms in, < 1 zooms out.
263-
- Snapshot refs are the core mechanism for interactive agent flows.
264-
- Use selectors for deterministic replay artifacts and assertions (e.g. in e2e test workflows).
265-
- Prefer `snapshot -i` to reduce output size.
266-
- Prefer scoped snapshots (`-s "<label>"` or `-s @ref`) for screen-local tasks.
267-
- Add `-d <depth>` when only upper tree levels matter; avoid full-tree snapshots by default.
268-
- Use `diff snapshot` after mutations to detect structural changes with less output than full re-read.
269-
- Refresh refs immediately after navigation/modal/list mutations before issuing next ref-targeted action.
270-
- Use `--raw` only for debugging parser/tree edge-cases; avoid it for normal agent loops due to size.
271-
- On iOS, snapshots use XCTest and do not require Accessibility permission.
272-
- If XCTest returns 0 nodes (foreground app changed), treat it as an explicit failure and retry the flow/app state.
273-
- `open <app|url> [url]` can be used within an existing session to switch apps or open deep links.
274-
- `open <app>` updates session app bundle context; `open <app> <url>` opens a deep link on iOS.
275-
- Use `open <app> --relaunch` during React Native/Fast Refresh debugging when you need a fresh app process without ending the session.
276-
- Use `--session <name>` for parallel sessions; avoid device contention.
277-
- Use `--activity <component>` on Android to launch a specific activity (e.g. TV apps with LEANBACK); do not combine with URL opens.
278-
- On iOS devices, `http(s)://` URLs fall back to Safari automatically; custom scheme URLs require an active app in the session.
279-
- iOS physical-device runner requires Xcode signing/provisioning; optional overrides: `AGENT_DEVICE_IOS_TEAM_ID`, `AGENT_DEVICE_IOS_SIGNING_IDENTITY`, `AGENT_DEVICE_IOS_PROVISIONING_PROFILE`.
280-
- Default daemon request timeout is `45000`ms. For slow physical-device setup/build, increase `AGENT_DEVICE_DAEMON_TIMEOUT_MS` (for example `120000`).
281-
- For daemon startup troubleshooting, follow stale metadata hints for `~/.agent-device/daemon.json` / `~/.agent-device/daemon.lock`.
282-
- Use `fill` when you want clear-then-type semantics.
283-
- Use `type` when you want to append/enter text without clearing.
284-
- On Android, prefer `fill` for important fields; it verifies entered text and retries once when IME reorders characters.
285-
- If using deterministic replay scripts, use `replay -u` during maintenance runs to update selector drift in replay scripts. Use plain `replay` in CI.
118+
If the CLI is not installed in the environment, use:
119+
`npx -y agent-device`
286120

287121
## References
288122

289123
- [references/snapshot-refs.md](references/snapshot-refs.md)
124+
- [references/logs-and-debug.md](references/logs-and-debug.md)
290125
- [references/session-management.md](references/session-management.md)
291126
- [references/permissions.md](references/permissions.md)
292127
- [references/video-recording.md](references/video-recording.md)
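
Reviewer sketch (not part of the commit): the trimmed Batch section keeps only `batch --steps-file /tmp/batch-steps.json --json` and drops the step-format example. Below is a minimal sketch of producing such a steps file, reusing the JSON shape from the removed "Step format" block. The helper name `write_batch_steps` and the `STEPS_FILE` and `RUN_BATCH` environment variables are hypothetical and exist only for illustration; the `batch` invocation is the one shown in the diff.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not an agent-device command): writes a steps file
# in the shape shown by the removed "Step format" section.
write_batch_steps() {
  local steps_file="$1"
  # Quoted heredoc delimiter: the JSON is written verbatim, no shell expansion.
  cat > "$steps_file" <<'JSON'
[
  { "command": "open", "positionals": ["settings"], "flags": {} },
  { "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} },
  { "command": "click", "positionals": ["label=\"Privacy & Security\""], "flags": {} },
  { "command": "get", "positionals": ["text", "label=\"Tracking\""], "flags": {} }
]
JSON
}

# STEPS_FILE and RUN_BATCH are assumptions of this sketch, not CLI settings.
steps_file="${STEPS_FILE:-/tmp/batch-steps.json}"
write_batch_steps "$steps_file"

# Only call the CLI when explicitly requested, so the sketch is safe to run
# on a machine without agent-device or an attached device.
if [ "${RUN_BATCH:-0}" = "1" ]; then
  agent-device batch --steps-file "$steps_file" --json
fi
```

Running it with `RUN_BATCH=1` would execute the batch against the current target. The removed best-practice notes still apply: keep one screen-local flow per batch and add a sync guard such as `wait` after mutating steps.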

skills/agent-device/references/logs-and-debug.md

Lines changed: 8 additions & 1 deletion
@@ -5,7 +5,7 @@ Logging is off by default in normal flows. Enable it on demand for debugging win
55
## Quick Flow
66

77
```bash
8-
agent-device open MyApp --platform ios
8+
agent-device open MyApp --platform ios # or --platform android
99
agent-device logs clear --restart # Preferred: stop stream, clear logs, and start streaming again
1010
agent-device logs path # Print path, e.g. ~/.agent-device/sessions/default/app.log
1111
agent-device logs doctor # Check tool/runtime readiness for current session/device
@@ -14,6 +14,8 @@ agent-device logs mark "before tap" # Insert a timeline marker into app.log
1414
agent-device logs stop # Stop streaming (optional; close also stops)
1515
```
1616

17+
Precondition: `logs clear --restart` requires an active app session (`open <app>` first).
18+
1719
## Command Notes
1820

1921
- `logs path`: returns log file path and metadata (`active`, `state`, `backend`, size, timestamps).
@@ -83,3 +85,8 @@ adb -s <serial> logcat -d | grep -n -E "FATAL EXCEPTION|Process: <package>|Abort
8385
- `FATAL EXCEPTION` with Java stack: uncaught Java/Kotlin exception.
8486
- `signal 6 (SIGABRT)` or `signal 11 (SIGSEGV)` with tombstone refs: native crash path (NDK/JNI/runtime).
8587
- `Low memory killer` / `Killing <pid>` entries: OS memory-pressure/process reclaim.
88+
89+
## Stop Conditions
90+
91+
- If no crash signature appears in app log, switch to platform-native crash sources (`.ips` on iOS, logcat/tombstone flow on Android).
92+
- If signatures are present and root cause class is identified (abort, native fault, memory pressure), stop collecting broad logs and focus on reproducing the specific path.
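
Reviewer sketch (not part of the commit): the stop conditions above depend on recognizing a crash signature class in the app log. A minimal sketch of that triage follows, using only the signature strings listed in this reference. The function name `classify_crash`, its output labels, and the order in which the classes are checked are assumptions made for illustration, not agent-device behavior.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not an agent-device command): prints one label for
# the first crash signature class found in the given log file.
classify_crash() {
  local log_file="$1"
  if grep -q -E 'signal 6 \(SIGABRT\)|signal 11 \(SIGSEGV\)' "$log_file"; then
    echo "native-crash"      # native crash path (NDK/JNI/runtime)
  elif grep -q 'FATAL EXCEPTION' "$log_file"; then
    echo "java-exception"    # uncaught Java/Kotlin exception
  elif grep -q -E 'Low memory killer|Killing [0-9]+' "$log_file"; then
    echo "memory-pressure"   # OS memory-pressure/process reclaim
  else
    echo "no-signature"      # switch to platform-native crash sources
  fi
}
```

A call such as `classify_crash ~/.agent-device/sessions/default/app.log` uses the example path quoted for `logs path`; the real path should be taken from that command's output. A `no-signature` result corresponds to the first stop condition, and any other label to the second.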
