@@ -5,288 +5,123 @@ description: Automates interactions for iOS simulators/devices and Android emula
55
66# Mobile Automation with agent-device
77
8- For agent-driven exploration: use refs. For deterministic replay scripts: use selectors.
8+ For exploration, use snapshot refs. For deterministic replay, use selectors.
99
10- ## Quick start
10+ ## Start Here (Read This First)
11+
12+ Use this skill as a router, not a full manual.
13+
14+ 1. Pick one mode:
15+ - Normal interaction flow
16+ - Debug/crash flow
17+ - Replay maintenance flow
18+ 2. Run one canonical flow below.
19+ 3. Open references only if blocked.
20+
21+ ## Decision Map
22+
23+ - No target context yet: `devices` -> pick target -> `open`
24+ - Normal UI task: `open` -> `snapshot -i` -> `press/fill` -> `diff snapshot -i` -> `close`
25+ - Debug/crash: `open <app>` -> `logs clear --restart` -> reproduce -> `logs path` -> targeted `grep`
26+ - Replay drift: `replay -u <path>` -> verify updated selectors
27+
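The decision map above can be read as a small lookup from mode to command sequence. The sketch below only prints each sequence and runs nothing, because refs and the reproduce step depend on live UI state. `print_flow` and the mode names (`target`, `normal`, `debug`, `replay`) are hypothetical helpers for illustration, not part of the CLI.

```bash
# print_flow MODE: print the command sequence for one decision-map mode.
# Hypothetical helper; placeholders such as <app>, <ref>, and <path> are left for the caller to fill in.
print_flow() {
  case "$1" in
    target)
      printf '%s\n' \
        'agent-device devices' \
        '# pick a target from the list' \
        'agent-device open <app>'
      ;;
    normal)
      printf '%s\n' \
        'agent-device open <app>' \
        'agent-device snapshot -i' \
        'agent-device press <ref>   # or: fill <ref> "text"' \
        'agent-device diff snapshot -i' \
        'agent-device close'
      ;;
    debug)
      printf '%s\n' \
        'agent-device open <app>' \
        'agent-device logs clear --restart' \
        '# reproduce the issue' \
        'agent-device logs path' \
        '# grep the reported log file for a targeted pattern'
      ;;
    replay)
      printf '%s\n' \
        'agent-device replay -u <path>' \
        '# verify the updated selectors'
      ;;
    *)
      echo "usage: print_flow target|normal|debug|replay" >&2
      return 2
      ;;
  esac
}
```

For example, `print_flow debug` lists the five debug steps in order, and an unknown mode returns a non-zero status.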
28+ ## Canonical Flows
29+
30+ ### 1) Normal Interaction Flow
1131
1232``` bash
1333agent-device open Settings --platform ios
1434agent-device snapshot -i
1535agent-device press @e3
16- agent-device wait text " Camera"
17- agent-device alert wait 10000
1836agent-device diff snapshot -i
2038agent-device fill @e5 "test"
2038agent-device close
2139```
2240
23- If not installed, run:
41+ ### 2) Debug/Crash Flow
2442
2543``` bash
26- npx -y agent-device
44+ agent-device open MyApp --platform ios
45+ agent-device logs clear --restart
46+ agent-device logs path
2747```
2848
29- ## Core workflow
49+ Logging is off by default. Enable it only for debugging windows.
50+ `logs clear --restart` requires an active app session (run `open <app>` first).
3051
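The last step of the debug flow (a targeted `grep` on the log file) might look like the sketch below. It assumes that `agent-device logs path` prints only the log file path on stdout, which this document does not confirm. `AD` and `grep_app_log` are hypothetical names introduced for illustration.

```bash
# AD is a hypothetical override hook; it defaults to the real CLI.
AD="${AD:-agent-device}"

# grep_app_log PATTERN: show the last 20 case-insensitive matches in the session log.
# Assumption: `logs path` prints only the log file path on stdout.
grep_app_log() {
  log_file="$($AD logs path)" || return 1
  if [ ! -f "$log_file" ]; then
    echo "log file not found: $log_file" >&2
    return 1
  fi
  # -n keeps line numbers so matches can be located in the full log later.
  grep -n -i -e "$1" "$log_file" | tail -n 20
}
```

Run it after reproducing the issue, for example `grep_app_log 'fatal'`. Capping the output with `tail` keeps the result small, in line with enabling logs only for short debugging windows.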
31- 1 . Open app or deep link: ` open [app|url] [url] ` (` open ` handles target selection + boot/activation in the normal flow)
32- 2 . Snapshot: ` snapshot ` to get refs from accessibility tree
33- 3 . Interact using refs (` press @ref ` , ` fill @ref "text" ` ; ` click ` is an alias of ` press ` )
34- 4 . Re-snapshot after navigation/UI changes
35- 5 . Close session when done
36-
37- ## Commands
38-
39- ### Navigation
52+ ### 3) Replay Maintenance Flow
4053
4154``` bash
42- agent-device boot # Ensure target is booted/ready without opening app
43- agent-device boot --platform ios # Boot iOS target
44- agent-device boot --platform android # Boot Android emulator/device target
45- agent-device open [app| url] [url] # Boot device/simulator; optionally launch app or deep link URL
46- agent-device open [app] --relaunch # Terminate app process first, then launch (fresh runtime)
47- agent-device open [app] --activity com.example/.MainActivity # Android: open specific activity (app targets only)
48- agent-device open " myapp://home" --platform android # Android deep link
49- agent-device open " https://example.com" --platform ios # iOS deep link (opens in browser)
50- agent-device open MyApp " myapp://screen/to" --platform ios # iOS deep link in app context
51- agent-device close [app] # Close app or just end session
52- agent-device reinstall < app> < path> # Uninstall + install app in one command
53- agent-device session list # List active sessions
55+ agent-device replay -u ./session.ad
5456```
5557
56- ` boot ` requires either an active session or an explicit selector (` --platform ` , ` --device ` , ` --udid ` , or ` --serial ` ).
57- ` boot ` is a fallback, not a regular step: use it when starting a new session only if ` open ` cannot find/connect to an available target.
58+ ## Command Skeleton (Minimal)
5859
59- ### Snapshot (page analysis)
60+ ### Session and navigation
6061
6162``` bash
62- agent-device snapshot # Full XCTest accessibility tree snapshot
63- agent-device snapshot -i # Interactive elements only (recommended)
64- agent-device snapshot -c # Compact output
65- agent-device snapshot -d 3 # Limit depth
66- agent-device snapshot -s " Camera" # Scope to label/identifier
67- agent-device snapshot --raw # Raw node output
68- agent-device diff snapshot # Structural diff against previous session baseline
63+ agent-device devices
64+ agent-device open [app|url] [url]
65+ agent-device open [app] --relaunch
66+ agent-device close [app]
67+ agent-device session list
6968```
7069
71- XCTest is the iOS snapshot engine: fast, complete, and no Accessibility permission required.
72-
73- Snapshot diff notes:
74- - First ` diff snapshot ` call initializes baseline for the current session.
75- - Subsequent ` diff snapshot ` calls compare current UI to prior baseline and then update baseline.
76- - Use this for compact change tracking between adjacent UI states.
70+ Use `boot` only as a fallback when `open` cannot find or connect to a ready target.
7771
78- ### Find (semantic)
72+ ### Snapshot and targeting
7973
8074``` bash
75+ agent-device snapshot -i
76+ agent-device diff snapshot -i
8177agent-device find "Sign In" click
82- agent-device find text " Sign In" click
83- agent-device find label " Email" fill " user@example.com"
84- agent-device find value " Search" type " query"
85- agent-device find role button click
86- agent-device find id " com.example:id/login" click
87- agent-device find " Settings" wait 10000
88- agent-device find " Settings" exists
78+ agent-device press @e1
79+ agent-device fill @e2 "text"
80+ agent-device is visible 'id="anchor"'
8981```
9082
91- ### Settings helpers
83+ `press` is the canonical tap command; `click` is an alias.
9284
93- ``` bash
94- agent-device settings wifi on
95- agent-device settings wifi off
96- agent-device settings airplane on
97- agent-device settings airplane off
98- agent-device settings location on
99- agent-device settings location off
100- agent-device settings faceid match
101- agent-device settings faceid nonmatch
102- agent-device settings faceid enroll
103- agent-device settings faceid unenroll
104- ```
105-
106- Note: iOS wifi/airplane toggles status bar indicators, not actual network state.
107- Airplane off clears status bar overrides.
108- iOS settings helpers are simulator-only.
109- Use ` match ` /` nonmatch ` as the canonical command values.
110- Think of them as validate/invalidate outcomes when describing intent.
111-
112- ### Logs (token-efficient debugging)
113-
114- Use the detailed logs workflow reference:
115- ` skills/agent-device/references/logs-and-debug.md `
116-
117- Recommended minimum:
118-
119- ``` bash
120- agent-device logs doctor
121- agent-device logs clear --restart
122- agent-device logs path
123- ```
124-
125- Logging is off by default for normal flows. Turn it on only for debugging windows.
126-
127- ### App state
85+ ### Utilities
12886
12987``` bash
13088agent-device appstate
131- ```
132-
133- - Android: ` appstate ` reports live foreground package/activity.
134- - iOS: ` appstate ` is session-scoped and reports the app tracked by the active session on the target device.
135- - For iOS ` appstate ` , ensure a matching session exists (for example ` open --session <name> --platform ios --device "<name>" <app> ` ).
136-
137- ### Interactions (use @refs from snapshot)
138-
139- ``` bash
140- agent-device press @e1 # Canonical tap command (`click` is an alias)
141- agent-device focus @e2
142- agent-device fill @e2 " text" # Clear then type (Android: verifies value and retries once on mismatch)
143- agent-device type " text" # Type into focused field without clearing
144- agent-device press 300 500 # Tap by coordinates
145- agent-device press 300 500 --count 12 --interval-ms 45
146- agent-device press 300 500 --count 6 --hold-ms 120 --interval-ms 30 --jitter-px 2
147- agent-device press @e1 --count 5 # Repeat taps on the same target
148- agent-device press @e1 --count 5 --double-tap # Use double-tap gesture per iteration
149- agent-device swipe 540 1500 540 500 120
150- agent-device swipe 540 1500 540 500 120 --count 8 --pause-ms 30 --pattern ping-pong
151- agent-device longpress 300 500 800 # Long press on iOS and Android
152- agent-device scroll down 0.5
153- agent-device pinch 2.0 # Zoom in 2x (iOS simulator only)
154- agent-device pinch 0.5 200 400 # Zoom out at coordinates (iOS simulator only)
155- agent-device back
156- agent-device home
157- agent-device app-switcher
158- agent-device wait 1000
159- agent-device wait text " Settings"
160- agent-device is visible ' id="settings_anchor"' # selector assertions for deterministic checks
161- agent-device is text ' id="header_title"' " Settings"
162- agent-device alert get
163- ```
164-
165- ### Get information
166-
167- ``` bash
16889agent-device get text @e1
169- agent-device get attrs @e1
17090agent-device screenshot out.png
91+ agent-device trace start
92+ agent-device trace stop ./trace.log
17193```
17294
173- ### Deterministic replay and updating
95+ ### Batch (when sequence is already known)
17496
17597``` bash
176- agent-device open App --relaunch # Fresh app process restart in the current session
177- agent-device open App --save-script # Save session script (.ad) on close (default path)
178- agent-device open App --save-script ./workflows/app-flow.ad # Save to custom file path
179- agent-device replay ./session.ad # Run deterministic replay from .ad script
180- agent-device replay -u ./session.ad # Update selector drift and rewrite .ad script in place
98+ agent-device batch --steps-file /tmp/batch-steps.json --json
18199```
182100
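A minimal sketch of preparing a steps file and passing it to `batch`. The step shape (`command`, `positionals`, `flags`) follows the step format shown in the previous version of this file. `AD`, `STEPS_FILE`, `write_steps`, and `run_batch` are hypothetical names introduced for illustration.

```bash
# AD and STEPS_FILE are hypothetical override hooks with sensible defaults.
AD="${AD:-agent-device}"
STEPS_FILE="${STEPS_FILE:-/tmp/batch-steps.json}"

# write_steps: write one screen-local flow to the steps file.
# The quoted 'EOF' delimiter keeps the JSON literal (no shell expansion).
write_steps() {
  cat > "$STEPS_FILE" <<'EOF'
[
  { "command": "open", "positionals": ["settings"], "flags": {} },
  { "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} },
  { "command": "click", "positionals": ["label=\"Privacy & Security\""], "flags": {} }
]
EOF
}

# run_batch: write the steps file, then hand it to the CLI with JSON output.
run_batch() {
  write_steps && $AD batch --steps-file "$STEPS_FILE" --json
}
```

The `wait` step after `open` acts as a sync guard, so the `click` does not run against a stale accessibility tree.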
183- ` replay ` reads ` .ad ` recordings.
184- ` --relaunch ` controls launch semantics; ` --save-script ` controls recording. Combine only when both are needed.
185- ` --save-script ` path is a file path; parent directories are created automatically.
186- For ambiguous bare values, use ` --save-script=workflow.ad ` or ` ./workflow.ad ` .
187-
188- ### Fast batching (JSON steps)
101+ ## Guardrails (High Value Only)
189102
190- Use ` batch ` when an agent already has a known short sequence and wants fewer orchestration round trips.
191-
192- ``` bash
193- agent-device batch \
194- --session sim \
195- --platform ios \
196- --udid 00008150-001849640CF8401C \
197- --steps-file /tmp/batch-steps.json \
198- --json
199- ```
200-
201- Inline JSON works for small payloads:
202-
203- ``` bash
204- agent-device batch --steps ' [{"command":"open","positionals":["settings"]},{"command":"wait","positionals":["100"]}]'
205- ```
103+ - Re-snapshot after UI mutations (navigation/modal/list changes).
104+ - Prefer `snapshot -i`; scope/depth only when needed.
105+ - Use refs for discovery, selectors for replay/assertions.
106+ - Use `fill` for clear-then-type semantics; use `type` for focused append typing.
107+ - iOS `appstate` is session-scoped; Android `appstate` is live foreground state.
108+ - iOS settings helpers are simulator-only; use faceid `match|nonmatch|enroll|unenroll`.
109+ - If using `--save-script`, prefer explicit path syntax (`--save-script=flow.ad` or `./flow.ad`).
206110
207- Step format:
111+ ## Common Mistakes
208112
209- ``` json
210- [
211- { "command" : " open" , "positionals" : [" settings" ], "flags" : {} },
212- { "command" : " wait" , "positionals" : [" label=\" Privacy & Security\" " , " 3000" ], "flags" : {} },
213- { "command" : " click" , "positionals" : [" label=\" Privacy & Security\" " ], "flags" : {} },
214- { "command" : " get" , "positionals" : [" text" , " label=\" Tracking\" " ], "flags" : {} }
215- ]
216- ```
217-
218- Batch best practices:
219-
220- - Batch one screen-local flow at a time.
221- - Add sync guards (` wait ` , ` is exists ` ) after mutating steps (` open ` , ` click ` , ` fill ` , ` swipe ` ).
222- - Treat prior refs/snapshot assumptions as stale after UI mutations.
223- - Prefer ` --steps-file ` over inline JSON.
224- - Keep batches moderate (about 5-20 steps).
225- - Use failure context (` step ` , ` partialResults ` ) to replan from the failed step.
226-
227- Stale accessibility tree note:
228-
229- - Rapid mutations can outrun accessibility tree updates.
230- - Mitigate with explicit waits and phase splitting (navigate, verify/extract, cleanup).
231-
232- ### Trace logs (XCTest)
233-
234- ``` bash
235- agent-device trace start # Start trace capture
236- agent-device trace start ./trace.log # Start trace capture to path
237- agent-device trace stop # Stop trace capture
238- agent-device trace stop ./trace.log # Stop and move trace log
239- ```
240-
241- ### Devices and apps
242-
243- ``` bash
244- agent-device devices
245- agent-device apps --platform ios # iOS simulator + iOS device, includes default/system apps
246- agent-device apps --platform ios --all # explicit include-all (same as default)
247- agent-device apps --platform ios --user-installed
248- agent-device apps --platform android # includes default/system apps
249- agent-device apps --platform android --all # explicit include-all (same as default)
250- agent-device apps --platform android --user-installed
251- ```
113+ - Mixing debug flow into normal runs (keep logs off unless debugging).
114+ - Continuing to use stale refs after screen transitions.
115+ - Combining URL opens with Android `--activity` (unsupported combination).
116+ - Treating `boot` as the default first step instead of a fallback.
252117
253- ## Best practices
254-
255- - ` press ` is the canonical tap command; ` click ` is an alias with the same behavior.
256- - ` press ` (and ` click ` ) accepts ` x y ` , ` @ref ` , and selector targets.
257- - ` press ` /` click ` support gesture series controls: ` --count ` , ` --interval-ms ` , ` --hold-ms ` , ` --jitter-px ` , ` --double-tap ` .
258- - ` --double-tap ` cannot be combined with ` --hold-ms ` or ` --jitter-px ` .
259- - ` swipe ` supports coordinate + timing controls and repeat patterns: ` swipe x1 y1 x2 y2 [durationMs] --count --pause-ms --pattern ` .
260- - ` swipe ` timing is platform-safe: Android uses requested duration; iOS uses normalized safe timing to avoid longpress side effects.
261- - ` longpress ` is coordinate-based and supported on iOS and Android.
262- - Pinch (` pinch <scale> [x y] ` ) is iOS simulator-only; scale > 1 zooms in, < 1 zooms out.
263- - Snapshot refs are the core mechanism for interactive agent flows.
264- - Use selectors for deterministic replay artifacts and assertions (e.g. in e2e test workflows).
265- - Prefer ` snapshot -i ` to reduce output size.
266- - Prefer scoped snapshots (` -s "<label>" ` or ` -s @ref ` ) for screen-local tasks.
267- - Add ` -d <depth> ` when only upper tree levels matter; avoid full-tree snapshots by default.
268- - Use ` diff snapshot ` after mutations to detect structural changes with less output than full re-read.
269- - Refresh refs immediately after navigation/modal/list mutations before issuing next ref-targeted action.
270- - Use ` --raw ` only for debugging parser/tree edge-cases; avoid it for normal agent loops due to size.
271- - On iOS, snapshots use XCTest and do not require Accessibility permission.
272- - If XCTest returns 0 nodes (foreground app changed), treat it as an explicit failure and retry the flow/app state.
273- - ` open <app|url> [url] ` can be used within an existing session to switch apps or open deep links.
274- - ` open <app> ` updates session app bundle context; ` open <app> <url> ` opens a deep link on iOS.
275- - Use ` open <app> --relaunch ` during React Native/Fast Refresh debugging when you need a fresh app process without ending the session.
276- - Use ` --session <name> ` for parallel sessions; avoid device contention.
277- - Use ` --activity <component> ` on Android to launch a specific activity (e.g. TV apps with LEANBACK); do not combine with URL opens.
278- - On iOS devices, ` http(s):// ` URLs fall back to Safari automatically; custom scheme URLs require an active app in the session.
279- - iOS physical-device runner requires Xcode signing/provisioning; optional overrides: ` AGENT_DEVICE_IOS_TEAM_ID ` , ` AGENT_DEVICE_IOS_SIGNING_IDENTITY ` , ` AGENT_DEVICE_IOS_PROVISIONING_PROFILE ` .
280- - Default daemon request timeout is ` 45000 ` ms. For slow physical-device setup/build, increase ` AGENT_DEVICE_DAEMON_TIMEOUT_MS ` (for example ` 120000 ` ).
281- - For daemon startup troubleshooting, follow stale metadata hints for ` ~/.agent-device/daemon.json ` / ` ~/.agent-device/daemon.lock ` .
282- - Use ` fill ` when you want clear-then-type semantics.
283- - Use ` type ` when you want to append/enter text without clearing.
284- - On Android, prefer ` fill ` for important fields; it verifies entered text and retries once when IME reorders characters.
285- - If using deterministic replay scripts, use ` replay -u ` during maintenance runs to update selector drift in replay scripts. Use plain ` replay ` in CI.
118+ If the CLI is not installed in the environment, run:
119+ `npx -y agent-device`
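The install fallback above can be sketched as a small resolver that prefers an installed binary and otherwise falls back to `npx -y agent-device`. `resolve_agent_device` is a hypothetical helper name, not part of the CLI.

```bash
# resolve_agent_device: print the command prefix to use for the CLI.
# Prefers an installed binary on PATH; otherwise falls back to npx.
resolve_agent_device() {
  if command -v agent-device >/dev/null 2>&1; then
    echo "agent-device"
  else
    echo "npx -y agent-device"
  fi
}
```

A caller could store the result once, for example `AD="$(resolve_agent_device)"`, and then run `$AD devices`. Leave `$AD` unquoted there so that the npx form splits into separate words.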
286120
287121## References
288122
289123- [ references/snapshot-refs.md] ( references/snapshot-refs.md )
124+ - [ references/logs-and-debug.md] ( references/logs-and-debug.md )
290125- [ references/session-management.md] ( references/session-management.md )
291126- [ references/permissions.md] ( references/permissions.md )
292127- [ references/video-recording.md] ( references/video-recording.md )