Merged
34 changes: 21 additions & 13 deletions README.md
@@ -13,8 +13,8 @@ CLI to control iOS and Android devices for AI agents influenced by Vercel’s [a
The project is in early development and considered experimental. Pull requests are welcome!

## Features
- Platforms: iOS (simulator + limited device support) and Android (emulator + device).
- Core commands: `open`, `back`, `home`, `app-switcher`, `press`, `long-press`, `swipe`, `focus`, `type`, `fill`, `scroll`, `scrollintoview`, `pinch`, `wait`, `alert`, `screenshot`, `close`, `reinstall`.
- Platforms: iOS (simulator + physical device core automation) and Android (emulator + device).
- Core commands: `open`, `back`, `home`, `app-switcher`, `press`, `long-press`, `focus`, `type`, `fill`, `scroll`, `scrollintoview`, `wait`, `alert`, `screenshot`, `close`, `reinstall`.
- Inspection commands: `snapshot` (accessibility tree).
- Device tooling: `adb` (Android), `simctl`/`devicectl` (iOS via Xcode).
- Minimal dependencies; TypeScript executed directly on Node 22+ (no build step).
@@ -99,9 +99,10 @@ agent-device swipe 540 1500 540 500 120 --count 8 --pause-ms 30 --pattern ping-p
| `ax` | Fast | Medium | Accessibility permission for the terminal app, not recommended |

Notes:
- Default backend is `xctest` on iOS.
- Default backend is `xctest` on iOS simulators and iOS devices.
- Scope snapshots with `-s "<label>"` or `-s @ref`.
- If XCTest returns 0 nodes (e.g., foreground app changed), agent-device falls back to AX when available.
- If XCTest returns 0 nodes (e.g., foreground app changed), agent-device fails explicitly.
- `ax` backend is simulator-only.

Flags:
- `--version, -V` print version and exit
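The snapshot backend rules in this hunk (XCTest as the default on iOS simulators and devices, AX limited to simulators, and an empty XCTest result failing explicitly instead of falling back to AX) can be sketched as below. This is a minimal illustration under assumptions: `resolveSnapshotBackend`, `requireNodes`, and the `SnapshotNode` shape are hypothetical names, not agent-device's actual internals.

```typescript
// Hypothetical sketch: names and types are illustrative, not agent-device's API.
type SnapshotBackend = "xctest" | "ax";
type IosTargetKind = "simulator" | "device";

interface SnapshotNode {
  ref: string;
  label?: string;
}

function resolveSnapshotBackend(
  requested: SnapshotBackend | undefined,
  kind: IosTargetKind,
): SnapshotBackend {
  // XCTest is the default on both iOS simulators and iOS devices.
  const backend = requested ?? "xctest";
  // AX uses macOS Accessibility APIs and is simulator-only.
  if (backend === "ax" && kind === "device") {
    throw new Error("--backend ax is simulator-only; use --backend xctest on iOS devices");
  }
  return backend;
}

function requireNodes(backend: SnapshotBackend, nodes: SnapshotNode[]): SnapshotNode[] {
  // An empty snapshot (e.g. the foreground app changed) is an explicit failure;
  // nothing here silently retries through the AX backend.
  if (nodes.length === 0) {
    throw new Error(`${backend} snapshot returned 0 nodes; check app state and retry`);
  }
  return nodes;
}
```

Failing loudly keeps the caller in charge of recovery (re-open the app, re-snapshot) rather than mixing results from two backends with different fidelity.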
@@ -150,13 +151,13 @@ Navigation helpers:
- `boot --platform ios|android` ensures the target is ready without launching an app.
- Use `boot` mainly when starting a new session and `open` fails because no booted simulator/emulator is available.
- `open [app|url]` already boots/activates the selected target when needed.
- `reinstall <app> <path>` uninstalls and installs the app binary in one command (Android + iOS simulator in v1).
- `reinstall <app> <path>` uninstalls and installs the app binary in one command (Android + iOS simulator).
- `reinstall` accepts package/bundle id style app names and supports `~` in paths.

Deep links:
- `open <url>` supports deep links with `scheme://...`.
- Android opens deep links via `VIEW` intent.
- iOS deep link open is simulator-only in v1.
- iOS deep link open is simulator-only.
- `--activity` cannot be combined with URL opens.

```bash
@@ -207,22 +208,22 @@ Android fill reliability:
- If value does not match, agent-device clears the field and retries once with slower typing.
- This reduces IME-related character swaps on long strings (e.g. emails and IDs).
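The verify-then-retry-once behaviour of Android `fill` can be sketched as follows. This assumes a hypothetical `TextFieldDriver` interface standing in for the real device calls, and the per-character delays (0 ms, then 50 ms) are illustrative values, not documented ones.

```typescript
// Hypothetical driver abstraction; agent-device's real internals may differ.
interface TextFieldDriver {
  clear(): void;
  type(text: string, delayMsPerChar: number): void;
  read(): string;
}

interface FillResult {
  value: string;
  attempts: number;
  verified: boolean;
}

function fillWithVerification(
  driver: TextFieldDriver,
  text: string,
  fastDelayMs = 0, // assumed default typing speed
  slowDelayMs = 50, // assumed slower speed for the single retry
): FillResult {
  // Attempt 1: clear-then-type at normal speed, then read the field back.
  driver.clear();
  driver.type(text, fastDelayMs);
  let value = driver.read();
  if (value === text) return { value, attempts: 1, verified: true };

  // Mismatch (e.g. IME reordered characters): clear and retry once, slower.
  driver.clear();
  driver.type(text, slowDelayMs);
  value = driver.read();
  return { value, attempts: 2, verified: value === text };
}
```

Reading the value back is what distinguishes `fill` from plain `type`: the retry only happens when the observed text differs from the requested text.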

Settings helpers (simulators):
Settings helpers:
- `settings wifi on|off`
- `settings airplane on|off`
- `settings location on|off` (iOS uses per-app permission for the current session app)
Note: iOS wifi/airplane toggles status bar indicators, not actual network state. Airplane off clears status bar overrides.
Note: iOS supports these only on simulators. iOS wifi/airplane toggles status bar indicators, not actual network state. Airplane off clears status bar overrides.

App state:
- `appstate` shows the foreground app/activity (Android). On iOS it uses the current session app when available, otherwise it falls back to a snapshot-based guess (AX first, XCTest if AX can’t identify).
- `appstate` shows the foreground app/activity (Android). On iOS it uses the current session app when available, otherwise it resolves via XCTest snapshot.
- `apps --metadata` returns app list with minimal metadata.

## Debug

- `agent-device trace start`
- `agent-device trace stop ./trace.log`
- The trace log includes snapshot logs and XCTest runner logs for the session.
- Built-in retries cover transient runner connection failures, AX snapshot hiccups, and Android UI dumps.
- Built-in retries cover transient runner connection failures and Android UI dumps.
- For snapshot issues (missing elements), compare with `--raw` flag for unaltered output and scope with `-s "<label>"`.
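The README does not say how many retries are made or which errors count as transient. The sketch below assumes three attempts and treats Node socket error codes (`ECONNREFUSED`, `ECONNRESET`, `EPIPE`) as transient runner connection failures; `withRetry` is a hypothetical helper, kept synchronous and without backoff for brevity.

```typescript
// Hypothetical helper; attempt count and transient error codes are assumptions.
const TRANSIENT_CODES = new Set(["ECONNREFUSED", "ECONNRESET", "EPIPE"]);

function isTransient(err: unknown): boolean {
  const code = (err as { code?: unknown } | null)?.code;
  return typeof code === "string" && TRANSIENT_CODES.has(code);
}

function withRetry<T>(op: () => T, maxAttempts = 3): T {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return op();
    } catch (err) {
      lastError = err;
      // Non-transient errors (e.g. a missing element) surface immediately.
      if (!isTransient(err)) throw err;
    }
  }
  // All attempts hit transient failures: rethrow the last one.
  throw lastError;
}
```

Restricting retries to connection-level errors keeps genuine snapshot problems (missing elements) visible, which matches the advice to debug those with `--raw` and `-s` instead.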

Boot diagnostics:
@@ -238,9 +239,10 @@ Boot diagnostics:
- Built-in aliases include `Settings` for both platforms.

## iOS notes
- Input commands (`press`, `type`, `scroll`, etc.) are supported only on simulators in v1 and use the XCTest runner.
- `alert` and `scrollintoview` use the XCTest runner and are simulator-only in v1.
- Real device support (including snapshots) is on the roadmap for iOS.
- Core runner commands (`snapshot`, `wait`, `click`, `fill`, `get`, `is`, `find`, `press`, `long-press`, `focus`, `type`, `scroll`, `scrollintoview`, `back`, `home`, `app-switcher`) support iOS simulators and iOS devices.
- Simulator-only commands: `alert`, `pinch`, `record`, `reinstall`, `apps`, `settings`.
- iOS deep link open (`open <url>`) is simulator-only.
- iOS device runs require valid signing/provisioning (Automatic Signing recommended). Optional overrides: `AGENT_DEVICE_IOS_TEAM_ID`, `AGENT_DEVICE_IOS_SIGNING_IDENTITY`, `AGENT_DEVICE_IOS_PROVISIONING_PROFILE`.

## Testing

@@ -266,6 +268,12 @@ Environment selectors:
- `ANDROID_DEVICE=Pixel_9_Pro_XL` or `ANDROID_SERIAL=emulator-5554`
- `IOS_DEVICE="iPhone 17 Pro"` or `IOS_UDID=<udid>`
- `AGENT_DEVICE_IOS_BOOT_TIMEOUT_MS=<ms>` to adjust iOS simulator boot timeout (default: `120000`, minimum: `5000`).
- `AGENT_DEVICE_DAEMON_TIMEOUT_MS=<ms>` to increase daemon request timeout for slow first-run iOS device setup (for example `180000`).
- `AGENT_DEVICE_IOS_TEAM_ID=<team-id>` optional Team ID override for iOS device runner signing.
- `AGENT_DEVICE_IOS_SIGNING_IDENTITY=<identity>` optional signing identity override.
- `AGENT_DEVICE_IOS_PROVISIONING_PROFILE=<profile>` optional provisioning profile specifier for iOS device runner signing.
- `AGENT_DEVICE_IOS_RUNNER_DERIVED_PATH=<path>` optional override for iOS runner derived data root. By default, agent-device separates caches by target kind (`.../derived/simulator` and `.../derived/device`). If you set this override, use separate paths per kind to avoid simulator/device artifact collisions.
- `AGENT_DEVICE_IOS_CLEAN_DERIVED=1` rebuild iOS runner artifacts from scratch. When `AGENT_DEVICE_IOS_RUNNER_DERIVED_PATH` is set, cleanup is blocked by default; set `AGENT_DEVICE_IOS_ALLOW_OVERRIDE_DERIVED_CLEAN=1` only for trusted custom paths.
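A sketch of how the boot timeout and derived-data settings above might be resolved from the environment. The default (`120000`) and minimum (`5000`) come from the README; clamping values below the minimum (rather than rejecting them), ignoring unparsable values, and the `defaultRoot` parameter are assumptions, and none of these function names are agent-device's real API.

```typescript
import * as path from "node:path";

type Env = Record<string, string | undefined>;
type TargetKind = "simulator" | "device";

const DEFAULT_BOOT_TIMEOUT_MS = 120_000;
const MIN_BOOT_TIMEOUT_MS = 5_000;

function iosBootTimeoutMs(env: Env): number {
  const raw = env.AGENT_DEVICE_IOS_BOOT_TIMEOUT_MS;
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  // Assumption: unset or unparsable values use the default.
  if (!Number.isFinite(parsed)) return DEFAULT_BOOT_TIMEOUT_MS;
  // Assumption: values below the documented minimum are raised to it.
  return Math.max(parsed, MIN_BOOT_TIMEOUT_MS);
}

function runnerDerivedPath(env: Env, defaultRoot: string, kind: TargetKind): string {
  const override = env.AGENT_DEVICE_IOS_RUNNER_DERIVED_PATH;
  // With an override, the user is responsible for separate simulator/device paths.
  if (override) return override;
  // Default: caches split by target kind (.../derived/simulator, .../derived/device).
  return path.join(defaultRoot, "derived", kind);
}

function mayCleanDerived(env: Env): boolean {
  if (env.AGENT_DEVICE_IOS_CLEAN_DERIVED !== "1") return false;
  // Cleaning a custom path is blocked unless explicitly allowed for trusted paths.
  if (env.AGENT_DEVICE_IOS_RUNNER_DERIVED_PATH) {
    return env.AGENT_DEVICE_IOS_ALLOW_OVERRIDE_DERIVED_CLEAN === "1";
  }
  return true;
}
```

The opt-in guard on `mayCleanDerived` reflects the README's intent: deleting a user-supplied directory is riskier than deleting agent-device's own cache.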

Test screenshots are written to:
- `test/screenshots/android-settings.png`
2 changes: 1 addition & 1 deletion ios-runner/README.md
@@ -8,4 +8,4 @@ This folder is reserved for the lightweight XCUITest runner used to provide elem
- Support simulator prebuilds where compatible.

## Status
Planned for v1 automation layer. See `docs/ios-automation.md` and `docs/ios-runner-protocol.md`.
Planned for the automation layer. See `docs/ios-automation.md` and `docs/ios-runner-protocol.md`.
25 changes: 14 additions & 11 deletions skills/agent-device/SKILL.md
@@ -1,6 +1,6 @@
---
name: agent-device
description: Automates mobile and simulator interactions for iOS and Android devices. Use when navigating apps, taking snapshots/screenshots, tapping, typing, scrolling, pinching, or extracting UI info on mobile devices or simulators.
description: Automates interactions for iOS simulators/devices and Android emulators/devices. Use when navigating apps, taking snapshots/screenshots, tapping, typing, scrolling, or extracting UI info on mobile targets.
---

# Mobile Automation with agent-device
@@ -39,13 +39,13 @@ npx -y agent-device

```bash
agent-device boot # Ensure target is booted/ready without opening app
agent-device boot --platform ios # Boot iOS simulator
agent-device boot --platform ios # Boot iOS simulator/device target
agent-device boot --platform android # Boot Android emulator/device target
agent-device open [app|url] # Boot device/simulator; optionally launch app or deep link URL
agent-device open [app] --relaunch # Terminate app process first, then launch (fresh runtime)
agent-device open [app] --activity com.example/.MainActivity # Android: open specific activity (app targets only)
agent-device open "myapp://home" --platform android # Android deep link
agent-device open "https://example.com" --platform ios # iOS simulator deep link
agent-device open "https://example.com" --platform ios # iOS simulator deep link (device unsupported)
agent-device close [app] # Close app or just end session
agent-device reinstall <app> <path> # Uninstall + install app in one command
agent-device session list # List active sessions
@@ -64,10 +64,10 @@ agent-device snapshot -d 3 # Limit depth
agent-device snapshot -s "Camera" # Scope to label/identifier
agent-device snapshot --raw # Raw node output
agent-device snapshot --backend xctest # default: XCTest snapshot (fast, complete, no permissions)
agent-device snapshot --backend ax # macOS Accessibility tree (fast, needs permissions, less fidelity, optional)
agent-device snapshot --backend ax # macOS Accessibility tree (manual diagnostics only; no automatic fallback)
```

XCTest is the default: fast and complete and does not require permissions. Use it in most cases and only fall back to AX when something breaks.
XCTest is the default: fast and complete and does not require permissions. Use AX only for manual diagnostics, and prefer XCTest for normal automation flows. agent-device does not automatically fall back to AX.

### Find (semantic)

@@ -82,7 +82,7 @@ agent-device find "Settings" wait 10000
agent-device find "Settings" exists
```

### Settings helpers (simulators)
### Settings helpers

```bash
agent-device settings wifi on
@@ -95,6 +95,7 @@ agent-device settings location off

Note: iOS wifi/airplane toggles status bar indicators, not actual network state.
Airplane off clears status bar overrides.
iOS settings helpers are simulator-only.

### App state

@@ -118,8 +119,8 @@ agent-device swipe 540 1500 540 500 120
agent-device swipe 540 1500 540 500 120 --count 8 --pause-ms 30 --pattern ping-pong
agent-device long-press 300 500 800 # Long press (where supported)
agent-device scroll down 0.5
agent-device pinch 2.0 # Zoom in 2x (iOS simulator)
agent-device pinch 0.5 200 400 # Zoom out at coordinates (iOS simulator)
agent-device pinch 2.0 # Zoom in 2x (iOS simulator only)
agent-device pinch 0.5 200 400 # Zoom out at coordinates (iOS simulator only)
agent-device back
agent-device home
agent-device app-switcher
@@ -174,19 +175,21 @@ agent-device apps --platform android --user-installed
- `press` supports gesture series controls: `--count`, `--interval-ms`, `--hold-ms`, `--jitter-px`.
- `swipe` supports coordinate + timing controls and repeat patterns: `swipe x1 y1 x2 y2 [durationMs] --count --pause-ms --pattern`.
- `swipe` timing is platform-safe: Android uses requested duration; iOS uses normalized safe timing to avoid long-press side effects.
- Pinch (`pinch <scale> [x y]`) is currently supported on iOS simulators only.
- Pinch (`pinch <scale> [x y]`) is iOS simulator-only; scale > 1 zooms in, < 1 zooms out.
- Snapshot refs are the core mechanism for interactive agent flows.
- Use selectors for deterministic replay artifacts and assertions (e.g. in e2e test workflows).
- Prefer `snapshot -i` to reduce output size.
- On iOS, `xctest` is the default and does not require Accessibility permission.
- If XCTest returns 0 nodes (foreground app changed), agent-device falls back to AX when available.
- If XCTest returns 0 nodes (foreground app changed), treat it as an explicit failure and retry the flow/app state.
- `open <app|url>` can be used within an existing session to switch apps or open deep links.
- `open <app>` updates session app bundle context; URL opens do not set an app bundle id.
- Use `open <app> --relaunch` during React Native/Fast Refresh debugging when you need a fresh app process without ending the session.
- If AX returns the Simulator window or empty tree, restart Simulator or use `--backend xctest`.
- Use `--session <name>` for parallel sessions; avoid device contention.
- Use `--activity <component>` on Android to launch a specific activity (e.g. TV apps with LEANBACK); do not combine with URL opens.
- iOS deep-link opens are simulator-only in v1.
- iOS deep-link opens are simulator-only.
- iOS physical-device runner requires Xcode signing/provisioning; optional overrides: `AGENT_DEVICE_IOS_TEAM_ID`, `AGENT_DEVICE_IOS_SIGNING_IDENTITY`, `AGENT_DEVICE_IOS_PROVISIONING_PROFILE`.
- For long first-run physical-device setup/build, increase daemon timeout: `AGENT_DEVICE_DAEMON_TIMEOUT_MS=180000` (or higher).
- Use `fill` when you want clear-then-type semantics.
- Use `type` when you want to append/enter text without clearing.
- On Android, prefer `fill` for important fields; it verifies entered text and retries once when IME reorders characters.
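The repeat controls on `swipe` (`--count`, `--pause-ms`, `--pattern ping-pong`) can be read as a plan of individual strokes. The sketch below assumes `ping-pong` reverses direction on every other repetition and that no pause follows the last stroke; `"one-way"` is a hypothetical name for the non-alternating case, and iOS's normalized timing is not modeled (the requested duration is passed through, as on Android).

```typescript
// Hypothetical planner; pattern semantics are assumptions, not documented behavior.
type SwipePattern = "one-way" | "ping-pong";

interface Point { x: number; y: number }

interface SwipeStep {
  from: Point;
  to: Point;
  durationMs: number;
  pauseAfterMs: number;
}

function planSwipes(
  from: Point,
  to: Point,
  durationMs: number,
  count: number,
  pauseMs: number,
  pattern: SwipePattern,
): SwipeStep[] {
  if (!Number.isInteger(count) || count < 1) throw new Error("--count must be a positive integer");
  const steps: SwipeStep[] = [];
  for (let i = 0; i < count; i++) {
    // ping-pong: odd-numbered repetitions swipe back toward the start point.
    const reverse = pattern === "ping-pong" && i % 2 === 1;
    steps.push({
      from: reverse ? to : from,
      to: reverse ? from : to,
      durationMs,
      // No pause is needed after the final stroke.
      pauseAfterMs: i < count - 1 ? pauseMs : 0,
    });
  }
  return steps;
}
```

For `swipe 540 1500 540 500 120 --count 8 --pause-ms 30 --pattern ping-pong`, this yields eight 120 ms strokes alternating up and down with 30 ms gaps.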
@@ -2,7 +2,7 @@

## iOS AX snapshot

AX snapshot is an alternative to XCTest for when it fails (which shouldn't happen usually); it uses macOS Accessibility APIs and requires permission:
AX snapshot is available for manual diagnostics when needed; it is not used as an automatic fallback. It uses macOS Accessibility APIs and requires permission:

System Settings > Privacy & Security > Accessibility

@@ -13,6 +13,20 @@ agent-device snapshot --backend xctest --platform ios
```

Hybrid/AX is fast; XCTest is equally fast but does not require permissions.
AX backend is simulator-only.

## iOS physical device runner

For iOS physical devices, XCTest runner setup requires valid signing/provisioning.
Use Automatic Signing in Xcode, or provide optional overrides:

- `AGENT_DEVICE_IOS_TEAM_ID`
- `AGENT_DEVICE_IOS_SIGNING_IDENTITY`
- `AGENT_DEVICE_IOS_PROVISIONING_PROFILE`

If first-run setup/build takes long, increase:

- `AGENT_DEVICE_DAEMON_TIMEOUT_MS` (for example `180000`)
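One plausible way to apply the optional signing overrides is to translate them into Xcode build settings, which `xcodebuild` accepts as `NAME=value` arguments. The mapping below is an assumption about how agent-device might wire this up: `DEVELOPMENT_TEAM`, `CODE_SIGN_IDENTITY`, `CODE_SIGN_STYLE`, and `PROVISIONING_PROFILE_SPECIFIER` are standard Xcode build settings, but `signingBuildSettings` is a hypothetical helper.

```typescript
// Hypothetical mapping from agent-device env vars to Xcode build settings.
type Env = Record<string, string | undefined>;

function signingBuildSettings(env: Env): string[] {
  const settings: string[] = [];
  const team = env.AGENT_DEVICE_IOS_TEAM_ID;
  const identity = env.AGENT_DEVICE_IOS_SIGNING_IDENTITY;
  const profile = env.AGENT_DEVICE_IOS_PROVISIONING_PROFILE;

  if (team) settings.push(`DEVELOPMENT_TEAM=${team}`);
  if (identity) settings.push(`CODE_SIGN_IDENTITY=${identity}`);
  if (profile) {
    // Naming a specific provisioning profile implies manual signing.
    settings.push("CODE_SIGN_STYLE=Manual", `PROVISIONING_PROFILE_SPECIFIER=${profile}`);
  }
  // No overrides: rely on Automatic Signing as configured in Xcode.
  return settings;
}
```

Passing each entry as a separate argument (for example via `child_process.execFile`) avoids shell splitting when an identity or profile name contains spaces.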

## Simulator troubleshooting

@@ -14,6 +14,7 @@ Sessions isolate device context. A device can only be held by one session at a t
- Name sessions semantically.
- Close sessions when done.
- Use separate sessions for parallel work.
- In iOS sessions, use `open <app>` for simulator/device. `open <url>` is simulator-only.
- For dev loops where runtime state can persist (for example React Native Fast Refresh), use `open <app> --relaunch` to restart the app process in the same session.
- For deterministic replay scripts, prefer selector-based actions and assertions.
- Use `replay -u` to update selector drift during maintenance.
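The `open` rules spread across these docs (URL targets use `scheme://...`, `--activity` cannot be combined with URL opens, and iOS deep links are simulator-only while `open <app>` works on simulators and devices) fit in one validation step. `validateOpen` and its types are hypothetical; the scheme regex follows the RFC 3986 scheme grammar.

```typescript
// Hypothetical validation sketch; not agent-device's actual argument parser.
type OpenTargetDevice = { platform: "ios" | "android"; kind: "simulator" | "emulator" | "device" };

interface OpenArgs {
  target: string;
  activity?: string;
}

function isUrl(target: string): boolean {
  // Deep links look like scheme://..., e.g. myapp://home or https://example.com.
  return /^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//.test(target);
}

function validateOpen(args: OpenArgs, device: OpenTargetDevice): "app" | "url" {
  const url = isUrl(args.target);
  if (url && args.activity) {
    throw new Error("--activity cannot be combined with URL opens");
  }
  if (url && device.platform === "ios" && device.kind !== "simulator") {
    throw new Error("iOS deep link open is simulator-only");
  }
  // App targets are accepted on iOS simulators, iOS devices, and Android.
  return url ? "url" : "app";
}
```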
@@ -55,6 +55,8 @@ agent-device snapshot -i -s @e3
- Ref not found: re-snapshot.
- AX returns Simulator window: restart Simulator and re-run.
- AX empty: verify Accessibility permission or use `--backend xctest` (XCTest is more complete).
- AX backend is simulator-only; use `--backend xctest` on iOS devices.
- agent-device does not automatically fall back to AX when XCTest fails.

## Replay note

@@ -20,6 +20,8 @@ agent-device close
agent-device record stop
```

`record` is iOS simulator-only.

## Android Emulator/Device

Use `agent-device record` commands (wrapper around adb):
@@ -32,10 +32,17 @@ test('iOS simulator-only commands reject iOS devices and Android', () => {
}
});

test('iOS simulator + Android commands reject iOS devices', () => {
test('simulator-only iOS commands with Android support reject iOS devices', () => {
for (const cmd of ['apps', 'reinstall', 'record', 'settings', 'swipe']) {
assert.equal(isCommandSupportedOnDevice(cmd, iosSimulator), true, `${cmd} on iOS sim`);
assert.equal(isCommandSupportedOnDevice(cmd, iosDevice), false, `${cmd} on iOS device`);
assert.equal(isCommandSupportedOnDevice(cmd, androidDevice), true, `${cmd} on Android`);
}
});

test('core commands support iOS simulator, iOS device, and Android', () => {
for (const cmd of [
'app-switcher',
'apps',
'back',
'boot',
'click',
@@ -47,19 +54,16 @@ test('iOS simulator + Android commands reject iOS devices', () => {
'home',
'long-press',
'open',
'reinstall',
'press',
'record',
'screenshot',
'scroll',
'swipe',
'settings',
'scrollintoview',
'snapshot',
'type',
'wait',
]) {
assert.equal(isCommandSupportedOnDevice(cmd, iosSimulator), true, `${cmd} on iOS sim`);
assert.equal(isCommandSupportedOnDevice(cmd, iosDevice), false, `${cmd} on iOS device`);
assert.equal(isCommandSupportedOnDevice(cmd, iosDevice), true, `${cmd} on iOS device`);
assert.equal(isCommandSupportedOnDevice(cmd, androidDevice), true, `${cmd} on Android`);
}
});
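For readers following the capability split, here is a hypothetical lookup table that would satisfy the assertions visible in this diff. It is not the project's `isCommandSupportedOnDevice`; it is deliberately named `supportsCommand`. Treating `alert` and `pinch` as the iOS-simulator-only commands with no Android support is an assumption taken from the README notes, and the core list is partial because part of the test's list is collapsed here (the extra entries come from the README's core runner command list).

```typescript
// Hypothetical capability table; not agent-device's actual capabilities module.
type TargetInfo = { platform: "ios" | "android"; kind: "simulator" | "emulator" | "device" };

// Assumption: iOS-simulator-only commands that Android also rejects.
const IOS_SIMULATOR_ONLY = new Set(["alert", "pinch"]);

// Supported on iOS simulators and Android, rejected on iOS devices (second test above).
const IOS_SIMULATOR_AND_ANDROID = new Set(["apps", "reinstall", "record", "settings", "swipe"]);

// Supported on every target kind (partial list).
const CORE_COMMANDS = new Set([
  "app-switcher", "back", "boot", "click", "fill", "find", "focus", "get", "home", "is",
  "long-press", "open", "press", "screenshot", "scroll", "scrollintoview", "snapshot", "type", "wait",
]);

function supportsCommand(cmd: string, target: TargetInfo): boolean {
  const isIosSimulator = target.platform === "ios" && target.kind === "simulator";
  if (IOS_SIMULATOR_ONLY.has(cmd)) return isIosSimulator;
  if (IOS_SIMULATOR_AND_ANDROID.has(cmd)) return isIosSimulator || target.platform === "android";
  return CORE_COMMANDS.has(cmd);
}
```

Keeping the three tiers as separate sets mirrors the three test cases, so moving a command between tiers (as this PR does for `apps`, `reinstall`, `record`, `settings`, and `swipe`) is a one-line change.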