Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
297 changes: 66 additions & 231 deletions skills/agent-device/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,288 +5,123 @@ description: Automates interactions for iOS simulators/devices and Android emula

# Mobile Automation with agent-device

For agent-driven exploration: use refs. For deterministic replay scripts: use selectors.
For exploration, use snapshot refs. For deterministic replay, use selectors.

## Quick start
## Start Here (Read This First)

Use this skill as a router, not a full manual.

1. Pick one mode:
- Normal interaction flow
- Debug/crash flow
- Replay maintenance flow
2. Run one canonical flow below.
3. Open references only if blocked.

## Decision Map

- No target context yet: `devices` -> pick target -> `open`.
- Normal UI task: `open` -> `snapshot -i` -> `press/fill` -> `diff snapshot -i` -> `close`
- Debug/crash: `open <app>` -> `logs clear --restart` -> reproduce -> `logs path` -> targeted `grep`
- Replay drift: `replay -u <path>` -> verify updated selectors

## Canonical Flows

### 1) Normal Interaction Flow

```bash
agent-device open Settings --platform ios
agent-device snapshot -i
agent-device press @e3
agent-device wait text "Camera"
agent-device alert wait 10000
agent-device diff snapshot -i
agent-device fill @e5 "test"
agent-device close
```

If not installed, run:
### 2) Debug/Crash Flow

```bash
npx -y agent-device
agent-device open MyApp --platform ios
agent-device logs clear --restart
agent-device logs path
```

## Core workflow
Logging is off by default. Enable only for debugging windows.
`logs clear --restart` requires an active app session (`open <app>` first).

1. Open app or deep link: `open [app|url] [url]` (`open` handles target selection + boot/activation in the normal flow)
2. Snapshot: `snapshot` to get refs from accessibility tree
3. Interact using refs (`press @ref`, `fill @ref "text"`; `click` is an alias of `press`)
4. Re-snapshot after navigation/UI changes
5. Close session when done

## Commands

### Navigation
### 3) Replay Maintenance Flow

```bash
agent-device boot # Ensure target is booted/ready without opening app
agent-device boot --platform ios # Boot iOS target
agent-device boot --platform android # Boot Android emulator/device target
agent-device open [app|url] [url] # Boot device/simulator; optionally launch app or deep link URL
agent-device open [app] --relaunch # Terminate app process first, then launch (fresh runtime)
agent-device open [app] --activity com.example/.MainActivity # Android: open specific activity (app targets only)
agent-device open "myapp://home" --platform android # Android deep link
agent-device open "https://example.com" --platform ios # iOS deep link (opens in browser)
agent-device open MyApp "myapp://screen/to" --platform ios # iOS deep link in app context
agent-device close [app] # Close app or just end session
agent-device reinstall <app> <path> # Uninstall + install app in one command
agent-device session list # List active sessions
agent-device replay -u ./session.ad
```

`boot` requires either an active session or an explicit selector (`--platform`, `--device`, `--udid`, or `--serial`).
`boot` is a fallback, not a regular step: use it when starting a new session only if `open` cannot find/connect to an available target.
## Command Skeleton (Minimal)

### Snapshot (page analysis)
### Session and navigation

```bash
agent-device snapshot # Full XCTest accessibility tree snapshot
agent-device snapshot -i # Interactive elements only (recommended)
agent-device snapshot -c # Compact output
agent-device snapshot -d 3 # Limit depth
agent-device snapshot -s "Camera" # Scope to label/identifier
agent-device snapshot --raw # Raw node output
agent-device diff snapshot # Structural diff against previous session baseline
agent-device devices
agent-device open [app|url] [url]
agent-device open [app] --relaunch
agent-device close [app]
agent-device session list
```

XCTest is the iOS snapshot engine: fast, complete, and no Accessibility permission required.

Snapshot diff notes:
- First `diff snapshot` call initializes baseline for the current session.
- Subsequent `diff snapshot` calls compare current UI to prior baseline and then update baseline.
- Use this for compact change tracking between adjacent UI states.
Use `boot` only as fallback when `open` cannot find/connect to a ready target.

### Find (semantic)
### Snapshot and targeting

```bash
agent-device snapshot -i
agent-device diff snapshot -i
agent-device find "Sign In" click
agent-device find text "Sign In" click
agent-device find label "Email" fill "user@example.com"
agent-device find value "Search" type "query"
agent-device find role button click
agent-device find id "com.example:id/login" click
agent-device find "Settings" wait 10000
agent-device find "Settings" exists
agent-device press @e1
agent-device fill @e2 "text"
agent-device is visible 'id="anchor"'
```

### Settings helpers
`press` is canonical tap command; `click` is an alias.

```bash
agent-device settings wifi on
agent-device settings wifi off
agent-device settings airplane on
agent-device settings airplane off
agent-device settings location on
agent-device settings location off
agent-device settings faceid match
agent-device settings faceid nonmatch
agent-device settings faceid enroll
agent-device settings faceid unenroll
```

Note: iOS wifi/airplane toggles status bar indicators, not actual network state.
Airplane off clears status bar overrides.
iOS settings helpers are simulator-only.
Use `match`/`nonmatch` as the canonical command values.
Think of them as validate/invalidate outcomes when describing intent.

### Logs (token-efficient debugging)

Use the detailed logs workflow reference:
`skills/agent-device/references/logs-and-debug.md`

Recommended minimum:

```bash
agent-device logs doctor
agent-device logs clear --restart
agent-device logs path
```

Logging is off by default for normal flows. Turn it on only for debugging windows.

### App state
### Utilities

```bash
agent-device appstate
```

- Android: `appstate` reports live foreground package/activity.
- iOS: `appstate` is session-scoped and reports the app tracked by the active session on the target device.
- For iOS `appstate`, ensure a matching session exists (for example `open --session <name> --platform ios --device "<name>" <app>`).

### Interactions (use @refs from snapshot)

```bash
agent-device press @e1 # Canonical tap command (`click` is an alias)
agent-device focus @e2
agent-device fill @e2 "text" # Clear then type (Android: verifies value and retries once on mismatch)
agent-device type "text" # Type into focused field without clearing
agent-device press 300 500 # Tap by coordinates
agent-device press 300 500 --count 12 --interval-ms 45
agent-device press 300 500 --count 6 --hold-ms 120 --interval-ms 30 --jitter-px 2
agent-device press @e1 --count 5 # Repeat taps on the same target
agent-device press @e1 --count 5 --double-tap # Use double-tap gesture per iteration
agent-device swipe 540 1500 540 500 120
agent-device swipe 540 1500 540 500 120 --count 8 --pause-ms 30 --pattern ping-pong
agent-device longpress 300 500 800 # Long press on iOS and Android
agent-device scroll down 0.5
agent-device pinch 2.0 # Zoom in 2x (iOS simulator only)
agent-device pinch 0.5 200 400 # Zoom out at coordinates (iOS simulator only)
agent-device back
agent-device home
agent-device app-switcher
agent-device wait 1000
agent-device wait text "Settings"
agent-device is visible 'id="settings_anchor"' # selector assertions for deterministic checks
agent-device is text 'id="header_title"' "Settings"
agent-device alert get
```

### Get information

```bash
agent-device get text @e1
agent-device get attrs @e1
agent-device screenshot out.png
agent-device trace start
agent-device trace stop ./trace.log
```

### Deterministic replay and updating
### Batch (when sequence is already known)

```bash
agent-device open App --relaunch # Fresh app process restart in the current session
agent-device open App --save-script # Save session script (.ad) on close (default path)
agent-device open App --save-script ./workflows/app-flow.ad # Save to custom file path
agent-device replay ./session.ad # Run deterministic replay from .ad script
agent-device replay -u ./session.ad # Update selector drift and rewrite .ad script in place
agent-device batch --steps-file /tmp/batch-steps.json --json
```

`replay` reads `.ad` recordings.
`--relaunch` controls launch semantics; `--save-script` controls recording. Combine only when both are needed.
`--save-script` path is a file path; parent directories are created automatically.
For ambiguous bare values, use `--save-script=workflow.ad` or `./workflow.ad`.

### Fast batching (JSON steps)
## Guardrails (High Value Only)

Use `batch` when an agent already has a known short sequence and wants fewer orchestration round trips.

```bash
agent-device batch \
--session sim \
--platform ios \
--udid 00008150-001849640CF8401C \
--steps-file /tmp/batch-steps.json \
--json
```

Inline JSON works for small payloads:

```bash
agent-device batch --steps '[{"command":"open","positionals":["settings"]},{"command":"wait","positionals":["100"]}]'
```
- Re-snapshot after UI mutations (navigation/modal/list changes).
- Prefer `snapshot -i`; scope/depth only when needed.
- Use refs for discovery, selectors for replay/assertions.
- Use `fill` for clear-then-type semantics; use `type` for focused append typing.
- iOS `appstate` is session-scoped; Android `appstate` is live foreground state.
- iOS settings helpers are simulator-only; use faceid `match|nonmatch|enroll|unenroll`.
- If using `--save-script`, prefer explicit path syntax (`--save-script=flow.ad` or `./flow.ad`).

Step format:
## Common Mistakes

```json
[
{ "command": "open", "positionals": ["settings"], "flags": {} },
{ "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} },
{ "command": "click", "positionals": ["label=\"Privacy & Security\""], "flags": {} },
{ "command": "get", "positionals": ["text", "label=\"Tracking\""], "flags": {} }
]
```

Batch best practices:

- Batch one screen-local flow at a time.
- Add sync guards (`wait`, `is exists`) after mutating steps (`open`, `click`, `fill`, `swipe`).
- Treat prior refs/snapshot assumptions as stale after UI mutations.
- Prefer `--steps-file` over inline JSON.
- Keep batches moderate (about 5-20 steps).
- Use failure context (`step`, `partialResults`) to replan from the failed step.

Stale accessibility tree note:

- Rapid mutations can outrun accessibility tree updates.
- Mitigate with explicit waits and phase splitting (navigate, verify/extract, cleanup).

### Trace logs (XCTest)

```bash
agent-device trace start # Start trace capture
agent-device trace start ./trace.log # Start trace capture to path
agent-device trace stop # Stop trace capture
agent-device trace stop ./trace.log # Stop and move trace log
```

### Devices and apps

```bash
agent-device devices
agent-device apps --platform ios # iOS simulator + iOS device, includes default/system apps
agent-device apps --platform ios --all # explicit include-all (same as default)
agent-device apps --platform ios --user-installed
agent-device apps --platform android # includes default/system apps
agent-device apps --platform android --all # explicit include-all (same as default)
agent-device apps --platform android --user-installed
```
- Mixing debug flow into normal runs (keep logs off unless debugging).
- Continuing to use stale refs after screen transitions.
- Using URL opens with Android `--activity` (unsupported combination).
- Treating `boot` as default first step instead of fallback.

## Best practices

- `press` is the canonical tap command; `click` is an alias with the same behavior.
- `press` (and `click`) accepts `x y`, `@ref`, and selector targets.
- `press`/`click` support gesture series controls: `--count`, `--interval-ms`, `--hold-ms`, `--jitter-px`, `--double-tap`.
- `--double-tap` cannot be combined with `--hold-ms` or `--jitter-px`.
- `swipe` supports coordinate + timing controls and repeat patterns: `swipe x1 y1 x2 y2 [durationMs] --count --pause-ms --pattern`.
- `swipe` timing is platform-safe: Android uses requested duration; iOS uses normalized safe timing to avoid longpress side effects.
- `longpress` is coordinate-based and supported on iOS and Android.
- Pinch (`pinch <scale> [x y]`) is iOS simulator-only; scale > 1 zooms in, < 1 zooms out.
- Snapshot refs are the core mechanism for interactive agent flows.
- Use selectors for deterministic replay artifacts and assertions (e.g. in e2e test workflows).
- Prefer `snapshot -i` to reduce output size.
- Prefer scoped snapshots (`-s "<label>"` or `-s @ref`) for screen-local tasks.
- Add `-d <depth>` when only upper tree levels matter; avoid full-tree snapshots by default.
- Use `diff snapshot` after mutations to detect structural changes with less output than full re-read.
- Refresh refs immediately after navigation/modal/list mutations before issuing next ref-targeted action.
- Use `--raw` only for debugging parser/tree edge-cases; avoid it for normal agent loops due to size.
- On iOS, snapshots use XCTest and do not require Accessibility permission.
- If XCTest returns 0 nodes (foreground app changed), treat it as an explicit failure and retry the flow/app state.
- `open <app|url> [url]` can be used within an existing session to switch apps or open deep links.
- `open <app>` updates session app bundle context; `open <app> <url>` opens a deep link on iOS.
- Use `open <app> --relaunch` during React Native/Fast Refresh debugging when you need a fresh app process without ending the session.
- Use `--session <name>` for parallel sessions; avoid device contention.
- Use `--activity <component>` on Android to launch a specific activity (e.g. TV apps with LEANBACK); do not combine with URL opens.
- On iOS devices, `http(s)://` URLs fall back to Safari automatically; custom scheme URLs require an active app in the session.
- iOS physical-device runner requires Xcode signing/provisioning; optional overrides: `AGENT_DEVICE_IOS_TEAM_ID`, `AGENT_DEVICE_IOS_SIGNING_IDENTITY`, `AGENT_DEVICE_IOS_PROVISIONING_PROFILE`.
- Default daemon request timeout is `45000`ms. For slow physical-device setup/build, increase `AGENT_DEVICE_DAEMON_TIMEOUT_MS` (for example `120000`).
- For daemon startup troubleshooting, follow stale metadata hints for `~/.agent-device/daemon.json` / `~/.agent-device/daemon.lock`.
- Use `fill` when you want clear-then-type semantics.
- Use `type` when you want to append/enter text without clearing.
- On Android, prefer `fill` for important fields; it verifies entered text and retries once when IME reorders characters.
- If using deterministic replay scripts, use `replay -u` during maintenance runs to update selector drift in replay scripts. Use plain `replay` in CI.
If the CLI is not installed in environment, use:
`npx -y agent-device`

## References

- [references/snapshot-refs.md](references/snapshot-refs.md)
- [references/logs-and-debug.md](references/logs-and-debug.md)
- [references/session-management.md](references/session-management.md)
- [references/permissions.md](references/permissions.md)
- [references/video-recording.md](references/video-recording.md)
Expand Down
9 changes: 8 additions & 1 deletion skills/agent-device/references/logs-and-debug.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ Logging is off by default in normal flows. Enable it on demand for debugging win
## Quick Flow

```bash
agent-device open MyApp --platform ios
agent-device open MyApp --platform ios # or --platform android
agent-device logs clear --restart # Preferred: stop stream, clear logs, and start streaming again
agent-device logs path # Print path, e.g. ~/.agent-device/sessions/default/app.log
agent-device logs doctor # Check tool/runtime readiness for current session/device
Expand All @@ -14,6 +14,8 @@ agent-device logs mark "before tap" # Insert a timeline marker into app.log
agent-device logs stop # Stop streaming (optional; close also stops)
```

Precondition: `logs clear --restart` requires an active app session (`open <app>` first).

## Command Notes

- `logs path`: returns log file path and metadata (`active`, `state`, `backend`, size, timestamps).
Expand Down Expand Up @@ -83,3 +85,8 @@ adb -s <serial> logcat -d | grep -n -E "FATAL EXCEPTION|Process: <package>|Abort
- `FATAL EXCEPTION` with Java stack: uncaught Java/Kotlin exception.
- `signal 6 (SIGABRT)` or `signal 11 (SIGSEGV)` with tombstone refs: native crash path (NDK/JNI/runtime).
- `Low memory killer` / `Killing <pid>` entries: OS memory-pressure/process reclaim.

## Stop Conditions

- If no crash signature appears in app log, switch to platform-native crash sources (`.ips` on iOS, logcat/tombstone flow on Android).
- If signatures are present and root cause class is identified (abort, native fault, memory pressure), stop collecting broad logs and focus on reproducing the specific path.
Loading