Merged
47 changes: 35 additions & 12 deletions skills/agent-device/SKILL.md
@@ -5,35 +5,58 @@ description: Automates interactions for Apple-platform apps (iOS, tvOS, macOS) a

# agent-device

Use this skill as a router.
Use this skill as a router with mandatory defaults. Read this file first. For normal device tasks, always load `references/bootstrap-install.md` and `references/exploration.md` before acting. Use bootstrap to confirm or establish deterministic setup. Use exploration for UI inspection, interaction, and verification once the app session is open.

## Default operating rules

- Start conservative. Prefer read-only inspection before mutating the UI.
- Use plain `snapshot` when the task is to verify what text or structure is currently visible on screen.
- Use `snapshot -i` only when you need interactive refs such as `@e3` for a requested action or targeted query.
- Avoid speculative mutations. You may take the smallest reversible UI action needed to unblock inspection or complete the requested task, such as dismissing a popup, closing an alert, or clearing an unintended surface.
- Do not browse the web or use external sources unless the user explicitly asks.
- Re-snapshot after meaningful UI changes instead of reusing stale refs.
- Prefer `@ref` or selector targeting over raw coordinates.
- Ensure the correct target is pinned and an app session is open before interacting.
- Keep the loop short: `open` -> inspect/act -> verify if needed -> `close`.

## Default flow

1. Load [references/bootstrap-install.md](references/bootstrap-install.md) and [references/exploration.md](references/exploration.md) before acting on a normal device task.
2. Use bootstrap first to confirm or establish the correct target, app install, and open app session.
3. Once the app session is open and stable, use exploration for inspection, interaction, and verification.
4. Start with plain `snapshot` if the goal is to read or verify what is visible.
5. Escalate to `snapshot -i` only if you need refs for interactive exploration or a requested action.
6. Use `get`, `is`, or `find` before mutating the UI when a read-only command can answer the question.
7. End by capturing proof if needed, then `close`.
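A minimal sketch of the default read-only flow as a shell function, assuming `agent-device` exits non-zero when a step fails. `inspect_visible`, `ad`, and the `AD_BIN` override (handy for pointing at a stub during dry runs) are hypothetical names, not part of the CLI. The point it illustrates is that `close` still runs when the snapshot step fails.

```bash
# AD_BIN is a hypothetical override (e.g. a stub for dry runs);
# by default the real agent-device CLI is called.
ad() { "${AD_BIN:-agent-device}" "$@"; }

# Default read-only flow: open -> plain snapshot -> close.
# Assumes agent-device exits non-zero when a step fails.
inspect_visible() {
  local app="$1" platform="$2" status=0
  ad open "$app" --platform "$platform" || return 1
  ad snapshot || status=$?
  ad close   # close the session even when the snapshot step failed
  return "$status"
}
```

Usage: `inspect_visible Settings ios`.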

## QA modes

- Open-ended bug hunt with reporting: use [../dogfood/SKILL.md](../dogfood/SKILL.md).
- Pass/fail QA from acceptance criteria: stay in this skill, start with [references/bootstrap-install.md](references/bootstrap-install.md), then use the QA loop in [references/exploration.md](references/exploration.md).

## Mental model
## Required references

- First choose the correct target and open the app or session you want to work on.
- Then inspect the current UI with `snapshot -i` and pick targets from the actual UI state.
- Act with `press`, `fill`, `get`, `is`, `wait`, or `find`.
- Re-snapshot after meaningful UI changes instead of reusing stale refs.
- End by capturing proof if needed, then `close`.
- For every normal device task, after reading this file, load [references/bootstrap-install.md](references/bootstrap-install.md) first, then [references/exploration.md](references/exploration.md), before acting.
- Use bootstrap to confirm or establish deterministic setup, especially in sandbox or cloud environments.
- Use exploration once the app session is open and stable.
- Load additional references only when their scope is needed.

## Decision rules

- Use plain `snapshot` when you need to verify whether text is visible.
- Use `snapshot -i` mainly for interactive exploration and choosing refs.
- Use `get`, `is`, or `find` when they can answer the question without changing UI state.
- Use `fill` to replace text.
- Use `type` to append text.
- If there is no simulator, no app install, or no open app session yet, switch to `bootstrap-install.md` instead of improvising setup steps.
- Use the smallest unblock action first when transient UI blocks inspection, but do not navigate, search, or enter new text just to make the UI reveal data unless the user asked for that interaction.
- Do not use external lookups to compensate for missing on-screen data unless the user asked for them.
- If the needed information is not exposed on screen, say that plainly instead of compensating with extra navigation, text entry, or web search.
- Prefer `@ref` or selector targeting over raw coordinates.
- Keep the default loop short: `open` -> explore/act -> optional debug or verify -> `close`.

## Choose a reference
## Additional references

- Pick target device, install, open, or manage sessions: [references/bootstrap-install.md](references/bootstrap-install.md)
- Need to discover UI, pick refs, wait, query, or interact: [references/exploration.md](references/exploration.md)
- Need logs, network, alerts, permissions, or failure triage: [references/debugging.md](references/debugging.md)
- Need screenshots, diff, recording, replay maintenance, or perf data: [references/verification.md](references/verification.md)
- Need desktop surfaces, menu bar behavior, or macOS-specific interaction rules: [references/macos-desktop.md](references/macos-desktop.md)
- Need to connect to a remote `agent-device` daemon over HTTP or use tenant leases: [references/remote-tenancy.md](references/remote-tenancy.md)
- Need remote HTTP transport, `--remote-config` launches, or tenant leases on a remote macOS host: [references/remote-tenancy.md](references/remote-tenancy.md)
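The routing above could be expressed as a lookup, as a rough sketch. The need keywords (`logs`, `lease`, and so on) and the `pick_reference` name are illustrative assumptions; the file paths are the ones listed in this section.

```bash
# pick_reference <need>: print the reference file for a task need.
# The keywords are hypothetical labels chosen for this sketch.
pick_reference() {
  case "$1" in
    target|install|open|session)          echo "references/bootstrap-install.md" ;;
    inspect|refs|wait|query|interact)     echo "references/exploration.md" ;;
    logs|network|alerts|permissions)      echo "references/debugging.md" ;;
    screenshot|diff|recording|replay|perf) echo "references/verification.md" ;;
    macos-desktop|menu-bar)               echo "references/macos-desktop.md" ;;
    remote|remote-config|lease)           echo "references/remote-tenancy.md" ;;
    *) echo "no reference for: $1" >&2; return 1 ;;
  esac
}
```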
70 changes: 55 additions & 15 deletions skills/agent-device/references/bootstrap-install.md
@@ -2,37 +2,79 @@

## When to open this file

Open this file when you still need to choose the right target, start the right session, install or relaunch the app, or pin automation to one device before interacting.
Open this file when you still need to choose the right target, start the right session, install or relaunch the app, or pin automation to one device before interacting. This is the deterministic setup layer for sandbox, cloud, or other environments where install paths, device state, or app readiness may be uncertain.

## Main commands to reach for first
## Open-first path

- `devices`
- `apps`
- `ensure-simulator`
- `open`
- `install` or `reinstall`
- `close`
- `session list`

## Install path

- `install` or `reinstall`

## Most common mistake to avoid

Do not start acting before you have pinned the correct target and opened an `app` session. In mixed-device environments, always pass `--device`, `--udid`, or `--serial`.

## Canonical loop
## Deterministic setup rule

If there is no simulator, no app install, no open app session, or any uncertainty about where the app should come from, stay in this file and use deterministic setup commands or bootstrap scripts first. Do not improvise install paths or app-launch flows while exploring.

After setup is confirmed or completed, move to `exploration.md` before doing UI inspection or interaction.

## Open-first rule

- If the user asks to test an app and does not provide an install artifact or explicit install instruction, try `open <app>` first.
- If `open <app>` fails, run `agent-device apps` and retry with a discovered app name before considering install steps.
- Do not install or reinstall on the first attempt unless the user explicitly asks for installation or provides a concrete artifact path or URL.
- When installation is required from a known location, prefer a checked-in shell script or other deterministic bootstrap command over ad hoc path guessing.

- If `open <app>` fails, or you are not sure which app name is available on the target, run `agent-device apps` first and choose from the discovered app list instead of guessing.
- Use `apps --platform <platform>` together with `--device`, `--udid`, or `--serial` when target selection matters.
- Once you have the correct app name, retry `open` with that exact discovered value.
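A sketch of the open-first rule, assuming a failed `open` returns a non-zero exit status. Target flags such as `--platform ios --device "iPhone 17 Pro"` are forwarded to both `open` and `apps`, so pass only target-selection flags; options like `--relaunch` belong to `open` alone. `open_first`, `ad`, and `AD_BIN` are hypothetical helpers. The function deliberately stops after listing apps instead of falling through to `install`.

```bash
ad() { "${AD_BIN:-agent-device}" "$@"; }   # AD_BIN: hypothetical stub override

# open_first <app> [target flags...]
# Assumes a failed `open` exits non-zero.
open_first() {
  local app="$1"
  shift
  if ad open "$app" "$@"; then
    return 0
  fi
  echo "open '$app' failed; apps on this target:" >&2
  ad apps "$@"
  return 1   # retry open with an exact discovered name; no install here
}
```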

## Common starting points

These are examples, not required exact sequences. Use the smallest setup flow that matches the task.

### Boot a simulator and open an app

```bash
agent-device ensure-simulator --platform ios --device "iPhone 17 Pro" --boot
agent-device open MyApp --platform ios --device "iPhone 17 Pro" --relaunch
agent-device snapshot -i
agent-device close
```

### Install an app artifact

```bash
agent-device install com.example.app ./build/app.apk --platform android --serial emulator-5554
```

```bash
agent-device install com.example.app ./build/MyApp.app --platform ios --device "iPhone 17 Pro"
```

## Install guidance

- Use `install <app> <path>` when the app may already be installed and you do not need a fresh-state reset.
- Use `reinstall <app> <path>` when you explicitly need uninstall plus install as one deterministic step.
- Keep install and open as separate phases. Do not turn them into one default command flow.
- Supported binary formats:
  - Android: `.apk` and `.aab`
  - iOS: `.app` and `.ipa`
- For iOS `.ipa` files, `<app>` is used as the bundle id or bundle name hint when the archive contains multiple app bundles.
- After install or reinstall, use `open <app>` with the exact discovered or known package or bundle identifier, not the artifact path.
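The guidance above can be sketched as a small wrapper that checks the artifact extension against the supported formats and picks `install` or `reinstall` from an explicit mode. `install_app`, `artifact_ok`, the `keep`/`fresh` mode words, `ad`, and `AD_BIN` are hypothetical; the extension check is a plain suffix match and does not inspect the archive.

```bash
ad() { "${AD_BIN:-agent-device}" "$@"; }   # AD_BIN: hypothetical stub override

# artifact_ok <platform> <artifact>: suffix check against supported formats.
artifact_ok() {
  local platform="$1" artifact="$2"
  case "$platform:$artifact" in
    android:*.apk|android:*.aab) return 0 ;;
    ios:*.app|ios:*.ipa) return 0 ;;
    *) echo "unsupported artifact for $platform: $artifact" >&2; return 1 ;;
  esac
}

# install_app <keep|fresh> <app> <artifact> <platform> [target flags...]
# keep -> install; fresh -> reinstall (uninstall plus install).
install_app() {
  local mode="$1" app="$2" artifact="$3" platform="$4"
  shift 4
  artifact_ok "$platform" "$artifact" || return 1
  if [ "$mode" = fresh ]; then
    ad reinstall "$app" "$artifact" --platform "$platform" "$@"
  else
    ad install "$app" "$artifact" --platform "$platform" "$@"
  fi
}
```

Usage: `install_app keep com.example.app ./build/app.apk android --serial emulator-5554`. Opening stays a separate step.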

## Choose the right starting point

- iOS local QA: prefer simulators unless the task explicitly requires physical hardware.
- iOS in mixed simulator and device environments: run `ensure-simulator` first, then keep using `--device` or `--udid`.
- TV targets: use `--target tv` together with `--platform` when the task is for tvOS or Android TV rather than phone or tablet surfaces.
- Android binary flow: use `install` or `reinstall` for `.apk` or `.aab`, then open by installed package name.
- Android React Native plus Metro flow: `reinstall <app> <apk>` first, then `open <package> --remote-config <path> --relaunch`.
- macOS desktop app flow: use `open <app> --platform macos`. Only load [macos-desktop.md](macos-desktop.md) if a desktop surface or macOS-specific behavior matters.

TV example:
@@ -95,8 +137,6 @@ export AGENT_DEVICE_PLATFORM=ios
```bash
export AGENT_DEVICE_SESSION_LOCK=strip

agent-device open MyApp --relaunch
agent-device snapshot -i
agent-device close
```

- `AGENT_DEVICE_SESSION` plus `AGENT_DEVICE_PLATFORM` provides the default binding.
@@ -111,10 +151,7 @@ Android emulator variant:
```bash
export AGENT_DEVICE_SESSION=qa-android
export AGENT_DEVICE_PLATFORM=android

agent-device reinstall MyApp /path/to/app-debug.apk --serial emulator-5554
agent-device --session-lock reject open com.example.myapp --relaunch
agent-device snapshot -i
agent-device close --shutdown
```
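If you prefer not to export the binding for the whole shell, a subshell can scope `AGENT_DEVICE_SESSION` and `AGENT_DEVICE_PLATFORM` to a single command. This is a sketch: `with_session` and `AD_BIN` are hypothetical, and it does not set `AGENT_DEVICE_SESSION_LOCK`.

```bash
# with_session <session> <platform> <agent-device args...>
# The subshell keeps the binding from leaking into the caller's shell.
with_session() {
  (
    AGENT_DEVICE_SESSION="$1"
    AGENT_DEVICE_PLATFORM="$2"
    export AGENT_DEVICE_SESSION AGENT_DEVICE_PLATFORM
    shift 2
    "${AD_BIN:-agent-device}" "$@"   # AD_BIN: hypothetical stub override
  )
}
```

Usage: `with_session qa-android android snapshot -i`.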

## Scoped discovery
@@ -151,11 +188,14 @@ agent-device replay -u ./session.ad --session auth
- Once the correct target and session are pinned, move to [exploration.md](exploration.md).
- If opening, startup, permissions, or logs become the blocker, switch to [debugging.md](debugging.md).

## Install and open examples
## Install examples

```bash
agent-device reinstall MyApp /path/to/app-debug.apk --platform android --serial emulator-5554
agent-device open com.example.myapp --remote-config ./agent-device.remote.json --relaunch
```

```bash
agent-device install com.example.app ./build/MyApp.ipa --platform ios --device "iPhone 17 Pro"
```

Do not use `open <apk|aab> --relaunch` on Android.
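A sketch of the Android React Native plus Metro flow as two guarded phases, mirroring the `reinstall` and `open --remote-config --relaunch` example above. `rn_android_relaunch`, `ad`, and `AD_BIN` are hypothetical, and the sketch assumes `reinstall` exits non-zero on failure. The guard only rejects a package argument that looks like an `.apk` or `.aab` path, so `open` is always given a package name.

```bash
ad() { "${AD_BIN:-agent-device}" "$@"; }   # AD_BIN: hypothetical stub override

# rn_android_relaunch <app> <apk> <package> <remote-config> <serial>
rn_android_relaunch() {
  local app="$1" apk="$2" package="$3" config="$4" serial="$5"
  case "$package" in
    *.apk|*.aab) echo "open by installed package name, not artifact: $package" >&2; return 1 ;;
  esac
  # Phase 1: install; phase 2: relaunch by package name.
  ad reinstall "$app" "$apk" --platform android --serial "$serial" || return 1
  ad open "$package" --remote-config "$config" --relaunch
}
```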
56 changes: 49 additions & 7 deletions skills/agent-device/references/exploration.md
@@ -4,22 +4,48 @@

Open this file when the app or screen is already running and you need to discover the UI, choose targets, read state, wait for conditions, or perform normal interactions.

## Main commands to reach for first
## Read-only first

- If the question is what text, labels, or structure is visible on screen, start with plain `snapshot`.
- Escalate to `snapshot -i` only when you need refs such as `@e3` for interactive exploration or a requested action.
- If you intend to `press`, `fill`, or otherwise interact, start with `snapshot -i` and fall back to plain `snapshot` only if interactive refs are unavailable.
- Prefer `get`, `is`, or `find` before mutating the UI when a read-only command can answer the question.
- You may take the smallest reversible UI action needed to unblock inspection, such as dismissing a popup, closing an alert, or backing out of an unintended surface.
- Do not type or fill text just to make hidden information easier to access unless the user asked for that interaction.
- Do not use external sources to infer missing UI state unless the user explicitly asked.
- If the answer is not visible or exposed in the UI, report that gap instead of compensating with search, navigation, or text entry.

## Decision shortcut

- User asks what is visible on screen: `snapshot`
- User asks for exact text from a known target: `get text`
- User asks you to tap, type, or choose an element: `snapshot -i`, then act
- UI does not expose the answer: say so plainly; do not browse or force the app into a new state unless asked
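The shortcut above reduces to a lookup from question type to the first command to reach for. This is an illustrative sketch; `first_command` and its keywords are made up for the example.

```bash
# first_command <question-type>: print the first command to reach for.
first_command() {
  case "$1" in
    visible)     echo "snapshot" ;;
    exact-text)  echo "get text" ;;
    interact)    echo "snapshot -i" ;;
    not-exposed) echo "none: report the gap" ;;
    *) echo "unknown question type: $1" >&2; return 1 ;;
  esac
}
```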

## Read-only commands

- `snapshot`
- `get`
- `is`
- `find`

## Interaction commands

- `snapshot -i`
- `press`
- `fill`
- `get`
- `is`
- `type`
- `wait`
- `find`

## Most common mistake to avoid

Do not treat `@ref` values as durable after navigation or dynamic updates. Re-snapshot after the UI changes, and switch to selectors when the flow must stay stable.
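One way to avoid stale refs is to pair every mutating action with a fresh interactive snapshot. A minimal sketch, assuming `press` accepts a ref such as `@e5` (as in the examples in this skill) and exits non-zero on failure; `press_then_refresh`, `ad`, and `AD_BIN` are hypothetical.

```bash
ad() { "${AD_BIN:-agent-device}" "$@"; }   # AD_BIN: hypothetical stub override

# press_then_refresh <@ref>
# Assumes press exits non-zero on failure.
press_then_refresh() {
  ad press "$1" || return 1
  ad snapshot -i   # refs from before the press are treated as stale
}
```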

## Canonical loop
## Common example loops

These are examples, not required exact sequences. Adapt them to the app, state, and task at hand.

### Interactive exploration loop

```bash
agent-device open Settings --platform ios
@@ -30,11 +56,21 @@ agent-device get text 'label="Privacy & Security"'
agent-device close
```

### Screen verification loop

```bash
agent-device open MyApp --platform ios
# perform the necessary actions to reach the state you need to verify
agent-device snapshot
# verify whether the expected element or text is present
agent-device close
```
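The verification step can be scripted as a plain `snapshot` followed by a fixed-string search. This sketch assumes `snapshot` prints the visible tree, including labels, to stdout; its exact output format is not specified here. `expect_visible`, `ad`, and `AD_BIN` are hypothetical. A miss returns 1 so the caller can report the gap instead of navigating further; a failed snapshot returns 2.

```bash
ad() { "${AD_BIN:-agent-device}" "$@"; }   # AD_BIN: hypothetical stub override

# expect_visible <text>
# Assumes snapshot prints visible labels to stdout.
expect_visible() {
  local expected="$1" out
  out=$(ad snapshot) || return 2
  if printf '%s\n' "$out" | grep -qF -- "$expected"; then
    return 0
  fi
  echo "not visible: $expected" >&2
  return 1
}
```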

## Snapshot choices

- Use plain `snapshot` when you only need to verify whether visible text or structure is on screen.
- Use `snapshot -i` when you need refs such as `@e3` for interactive exploration.
- Treat large text-surface lines in `snapshot -i` as discovery output. If a node shows preview/truncation metadata, use `get text @ref` to expand the actual text after you choose the surface.
- Use `snapshot -i` when you need refs such as `@e3` for interactive exploration or for an intended interaction.
- Treat large text-surface lines in `snapshot -i` as discovery output. If a node shows preview or truncation metadata, use `get text @ref` only after you have already decided that `snapshot -i` is needed for that surface.
- Use `snapshot -i -s "Camera"` or `snapshot -i -s @e3` when you want a smaller, scoped result.

Example:
@@ -74,6 +110,7 @@ agent-device is visible 'id="camera_settings_anchor"'

- Use `fill` to replace text in an editable field.
- Use `type` to append text to the current insertion point.
- Do not use `fill` or `type` just to make the app reveal information that is not currently visible unless the user asked for that interaction.

## Query and sync rules

@@ -109,6 +146,11 @@ Anti-hallucination rules:
- Discover them first with `devices`, `open`, `snapshot -i`, `find`, or `session list`.
- If refs drift after navigation, re-snapshot or switch to selectors instead of guessing.

Avoid this escalation path for visible-text questions:

- Do not jump from `snapshot -i` to `get text @ref`, then to web search, then to typing into a search box just to force the app to reveal the answer.
- Start with `snapshot`. If the text is not visible or exposed, report that directly.

Canonical QA loop:

```bash
```
3 changes: 1 addition & 2 deletions skills/agent-device/references/macos-desktop.md
@@ -21,8 +21,7 @@ Do not treat every macOS surface the same. Use the normal `app` surface when you

```bash
agent-device open TextEdit --platform macos
agent-device snapshot -i
agent-device fill @e3 "desktop smoke test"
agent-device snapshot
agent-device close
```

17 changes: 15 additions & 2 deletions skills/agent-device/references/remote-tenancy.md
@@ -2,10 +2,11 @@

## When to open this file

Open this file only for remote daemon HTTP flows that require explicit daemon URL setup, authentication, lease allocation, or tenant-scoped command admission.
Open this file for remote daemon HTTP flows, including `--remote-config` launches, that let an agent running in a Linux sandbox talk to another `agent-device` instance on a remote macOS host in order to control devices that are not available locally. This file covers daemon URL setup, authentication, lease allocation, and tenant-scoped command admission.

## Main commands to reach for first

- `agent-device open <app> --remote-config <path> --relaunch`
- `AGENT_DEVICE_DAEMON_BASE_URL=...`
- `AGENT_DEVICE_DAEMON_AUTH_TOKEN=...`
- `curl ... agent_device.lease.allocate`
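Before issuing remote commands, a preflight check can fail fast when the daemon URL or token is missing. A sketch only: it assumes the daemon expects an `http://` or `https://` base URL and that your deployment requires a token; `require_remote_env` is a hypothetical name.

```bash
# require_remote_env: succeed only if both daemon variables look usable.
# Assumption: this deployment always requires an auth token.
require_remote_env() {
  case "${AGENT_DEVICE_DAEMON_BASE_URL:-}" in
    http://*|https://*) ;;
    *) echo "AGENT_DEVICE_DAEMON_BASE_URL must be an http(s) URL" >&2; return 1 ;;
  esac
  if [ -z "${AGENT_DEVICE_DAEMON_AUTH_TOKEN:-}" ]; then
    echo "AGENT_DEVICE_DAEMON_AUTH_TOKEN is not set" >&2
    return 1
  fi
}
```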
@@ -17,7 +18,19 @@ Open this file only for remote daemon HTTP flows that require explicit daemon UR

Do not run a tenant-isolated command without matching `tenant`, `run`, and `lease` scope. Admission checks require all three to line up.
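The admission rule can be mirrored client-side as a guard before sending a command. This is a hypothetical sketch: `scope_matches` and its six positional arguments are illustrative, and the daemon's own admission check remains authoritative.

```bash
# scope_matches <lease_tenant> <lease_run> <lease_id> <cmd_tenant> <cmd_run> <cmd_lease>
# Succeeds only when all three lease ids are non-empty and equal the command's ids.
scope_matches() {
  [ -n "$1" ] && [ -n "$2" ] && [ -n "$3" ] &&
    [ "$1" = "$4" ] && [ "$2" = "$5" ] && [ "$3" = "$6" ]
}
```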

## Canonical loop
## Preferred remote launch path

Use this when the agent needs the simplest remote control flow: a Linux sandbox agent talks over HTTP to `agent-device` on a remote macOS host and launches the target app through a checked-in `--remote-config` profile.

```bash
agent-device open com.example.myapp --remote-config ./agent-device.remote.json --relaunch
```

- This is the preferred remote launch path for sandbox or cloud agents.
- For Android React Native relaunch flows, install or reinstall the APK first, then relaunch by installed package name.
- Do not use `open <apk|aab> --relaunch`; remote runtime hints are applied through the installed app sandbox.

## Lease flow example

```bash
export AGENT_DEVICE_DAEMON_BASE_URL=http://mac-host.example:4310
```
5 changes: 2 additions & 3 deletions skills/agent-device/references/verification.md
@@ -20,9 +20,8 @@ Do not use verification tools as the first exploration step. First get the app i

```bash
agent-device open Settings --platform ios
agent-device snapshot -i
agent-device press @e5
agent-device diff snapshot -i
# after using exploration to reach the state you want to verify
agent-device snapshot
agent-device screenshot /tmp/settings-proof.png
agent-device close
```
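The proof step can be wrapped so the screenshot path is derived from a label and the session is closed afterwards. A sketch, assuming each command exits non-zero on failure; `capture_proof`, `ad`, `AD_BIN`, and the `/tmp/<label>-proof.png` naming are hypothetical.

```bash
ad() { "${AD_BIN:-agent-device}" "$@"; }   # AD_BIN: hypothetical stub override

# capture_proof <label>: snapshot, screenshot to /tmp/<label>-proof.png, close.
# Assumes each command exits non-zero on failure.
capture_proof() {
  local shot="/tmp/$1-proof.png"
  ad snapshot || return 1
  ad screenshot "$shot" || return 1
  ad close
}
```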