|
1 | 1 | <a href="https://www.callstack.com/open-source?utm_campaign=generic&utm_source=github&utm_medium=referral&utm_content=agent-device" align="center"> |
2 | 2 | <picture> |
3 | | - <img alt="agent-device" src="website/docs/public/agent-device-banner.jpg"> |
| 3 | + <img alt="agent-device: device automation CLI for AI agents" src="website/docs/public/agent-device-banner.jpg"> |
4 | 4 | </picture> |
5 | 5 | </a> |
6 | 6 |
|
7 | 7 | --- |
8 | 8 |
|
9 | 9 | # agent-device |
10 | 10 |
|
11 | | -`agent-device` is a CLI for UI automation and app observability on iOS, tvOS, macOS, Android, and AndroidTV. It is built for agent-driven workflows: inspect the UI, interact deterministically, collect logs/network/perf evidence when behavior breaks, and keep the whole flow session-aware and replayable. |
| 11 | +[npm package](https://www.npmjs.com/package/agent-device) |
| 12 | +[CI workflow](https://github.com/callstackincubator/agent-device/actions/workflows/ci.yml) |
| 13 | +[License](LICENSE) |
12 | 14 |
|
13 | | -If you know Vercel's [agent-browser](https://github.com/vercel-labs/agent-browser), this project applies the same broad idea to mobile apps and devices. |
| 15 | +Device automation CLI for AI agents. Mobile, TV, and desktop apps. |
14 | 16 |
|
15 | | -[](./website/docs/public/agent-device-contacts.mp4) |
| 17 | +`agent-device` lets coding agents run real apps, inspect UI state, interact with visible elements, and collect debugging evidence through one CLI. |
16 | 18 |
|
17 | | -## Project Goals |
| 19 | +It is built around token-efficient accessibility snapshots, not pixel-first screenshots. Agents read compact UI trees, locate elements through refs like `@e3`, perform touch and text actions, and capture screenshots, video, logs, network, perf, and React profiles only when evidence is needed. |
18 | 20 |
|
19 | | -- Give agents a practical way to understand mobile UI state through structured snapshots. |
20 | | -- Keep automation flows token-efficient enough for real agent loops. |
21 | | -- Make common interactions reliable enough for repeated automation runs. |
22 | | -- Make debugging evidence easy to collect through logs, network inspection, and performance snapshots. |
23 | | -- Keep automation grounded in sessions, selectors, and replayable flows instead of one-off scripts. |
| 21 | +Built for two agentic workflows: |
24 | 22 |
|
25 | | -## Core Ideas |
| 23 | +- **Quality Assurance**: dogfood flows, validate PR builds, check accessibility coverage, capture evidence, and turn stable explorations into `.ad` e2e tests. |
| 24 | +- **Development**: build from specs, reproduce crashes and support issues, inspect logs/network/perf data, and iterate until the UI matches the spec. |
26 | 25 |
|
27 | | -- Sessions: open a target once, interact within that session, then close it cleanly. |
28 | | -- Snapshots: inspect the current accessibility tree in a compact form and get current-screen refs for exploration. |
29 | | -- Refs vs selectors: use refs for discovery, use selectors for durable replay and assertions. |
30 | | -- Observability: collect session logs, inspect recent HTTP traffic with `network dump`, and sample CPU/memory with `perf`. |
31 | | -- Tests: run deterministic `.ad` scripts as a light e2e test suite. |
32 | | -- Replay scripts: save `.ad` flows with `--save-script`, replay one script with `replay`, or run a folder/glob as a serial suite with `test`. |
33 | | - `test` supports metadata-aware retries up to 3 additional attempts, per-test timeouts, flaky pass reporting, and runner-managed artifacts under `.agent-device/test-artifacts` by default. Each attempt writes `replay.ad` and `result.txt`; failed attempts also keep copied logs and artifacts when available. |
34 | | -- Human docs vs agent skills: docs explain the system for people; skills provide compact operating guidance for agents. |
| 26 | +If you know Vercel's [agent-browser](https://github.com/vercel-labs/agent-browser), this is the same idea for apps and devices. |
35 | 27 |
|
36 | | -## Complementary Tooling |
| 28 | + |
37 | 29 |
|
38 | | -Use `agent-device` for on-device UI automation, screenshots/recordings, app logs, network inspection, and performance snapshots. |
| 30 | +## Quick Start |
39 | 31 |
|
40 | | -When the task needs the React Native component tree, props, state, hooks, or render profiling, use the bundled passthrough: |
| 32 | +Install the CLI. |
41 | 33 |
|
42 | 34 | ```bash |
43 | | -agent-device react-devtools status |
44 | | -agent-device react-devtools get tree --depth 3 |
45 | | -agent-device react-devtools profile start |
46 | | -agent-device react-devtools profile stop |
47 | | -agent-device react-devtools profile slow --limit 5 |
| 35 | +npm install -g agent-device |
48 | 36 | ``` |
49 | 37 |
|
50 | | -`react-devtools` dynamically runs pinned `agent-react-devtools@0.4.0` commands 1:1, so `agent-device` covers both the device/app runtime layer and React component internals without making React DevTools part of the daemon. |
51 | | - |
52 | | -When an Android session is connected through a remote bridge profile, `react-devtools` automatically opens a lease-scoped companion tunnel for the local DevTools daemon on port 8097 and cleans it up when the command exits. |
53 | | - |
54 | | -Remote Android React DevTools assumes the React Native-bundled DevTools behavior in React Native 0.83+. Older browser/Chromium DevTools workflows are not assumed to exist inside remote sandboxes. Expo projects should be verified against the SDK's bundled React Native version before relying on this path; this release does not claim a separately verified Expo SDK version. |
| 38 | +Prerequisites: Node.js 22+, Xcode for iOS/tvOS/macOS targets, Android SDK + ADB for Android, and macOS Accessibility permission for desktop automation. See [Installation](https://incubator.callstack.com/agent-device/docs/installation). |
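A rough preflight sketch for the prerequisites listed above. It is not part of `agent-device`; the helper names `node_major` and `check_tool` are made up for illustration, and treating `xcrun` on `PATH` as a proxy for a working Xcode install is an assumption. It does not check the macOS Accessibility permission.

```bash
# node_major VERSION: print the major number from a string like "v22.3.0".
node_major() {
  v="${1#v}"                 # strip the leading "v"
  printf '%s\n' "${v%%.*}"   # keep everything before the first dot
}

# check_tool NAME NOTE: report whether NAME is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    printf 'ok       %s\n' "$1"
  else
    printf 'missing  %s (%s)\n' "$1" "$2"
  fi
}

# Node.js 22+ is required by the CLI.
if command -v node >/dev/null 2>&1; then
  major="$(node_major "$(node --version)")"
  if [ "$major" -ge 22 ]; then
    echo "ok       node $major"
  else
    echo "too old  node $major (need 22+)"
  fi
else
  echo "missing  node (need 22+)"
fi

# Only the toolchains for the platforms you target need to be present.
check_tool xcrun "Xcode, for iOS/tvOS/macOS targets"
check_tool adb "Android SDK platform-tools, for Android targets"
```

A missing `xcrun` or `adb` only matters if you plan to target that platform.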
55 | 39 |
|
56 | | -## Command Flow |
57 | | - |
58 | | -The canonical loop is: |
| 40 | +Try the loop. |
59 | 41 |
|
60 | 42 | ```bash |
| 43 | +# Find the app. |
61 | 44 | agent-device apps --platform ios |
| 45 | + |
| 46 | +# Start a session. |
62 | 47 | agent-device open SampleApp --platform ios |
| 48 | + |
| 49 | +# Inspect the current screen. -i returns interactive elements only. |
63 | 50 | agent-device snapshot -i |
64 | | -agent-device press @e3 |
65 | | -agent-device diff snapshot -i |
66 | | -agent-device fill @e5 "test" |
67 | | -agent-device press @e5 |
68 | | -agent-device type " more" --delay-ms 80 |
| 51 | +# @e1 [heading] "Settings" |
| 52 | +# @e2 [button] "Sign In" |
| 53 | +# @e3 [text-field] "Email" |
| 54 | + |
| 55 | +# Act, capture a screenshot, and close. |
| 56 | +agent-device fill @e3 "test" |
| 57 | +agent-device screenshot ./artifacts/settings.png |
69 | 58 | agent-device close |
70 | 59 | ``` |
71 | 60 |
|
72 | | -In practice, most work follows the same pattern: |
| 61 | +Snapshots assign refs like `@e1`, `@e2`, and `@e3` to current-screen elements. Refs from the default snapshot are immediately actionable; for hidden content, scroll and re-snapshot. |
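As a sketch of how a script might turn a visible label into a ref, the helper below scans snapshot text for a matching quoted label. It assumes the output has the `@eN [role] "label"` shape shown in the Quick Start comments; the real output format may differ, and `find_ref` is a hypothetical helper, not an `agent-device` command.

```bash
# Sample text in the shape shown in the Quick Start comments
# (illustrative, not captured from a real device).
sample_snapshot='@e1 [heading] "Settings"
@e2 [button] "Sign In"
@e3 [text-field] "Email"'

# find_ref LABEL: read snapshot text on stdin and print the ref of the
# first line whose quoted label equals LABEL exactly.
find_ref() {
  awk -v label="$1" '
    $1 ~ /^@e[0-9]+$/ {
      start = index($0, "\"")        # position of the opening quote
      if (start == 0) next           # no quoted label on this line
      text = substr($0, start + 1)
      sub(/"[^"]*$/, "", text)       # drop the closing quote and anything after it
      if (text == label) { print $1; exit }
    }
  '
}

printf '%s\n' "$sample_snapshot" | find_ref "Sign In"   # prints @e2
```

In a live session the same helper could read from `agent-device snapshot -i` instead of the sample text, provided the real output matches the assumed shape. Because refs describe the current screen, look the ref up again after scrolling and re-snapshotting rather than reusing an old one.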
| 62 | + |
| 63 | +## Where To Run agent-device |
| 64 | + |
| 65 | +| Path | Best for | Start with | |
| 66 | +| --- | --- | --- | |
| 67 | +| Local | Exploration, debugging, and development loops on simulators, emulators, physical devices, macOS apps, and Linux desktop targets. | Follow the Quick Start. | |
| 68 | +| CI/CD | Automated PR and merge validation with replay scripts and captured artifacts. | Start with the [EAS workflow template](https://github.com/callstackincubator/eas-agent-device/blob/main/.eas/workflows/agent-qa-mobile.yml). GitHub Actions template coming soon. | |
| 69 | +| Cloud | Linux runners, managed devices, and remote execution. | Use [Agent Device Cloud](https://agent-device.dev/cloud) or [contact Callstack](mailto:hello@callstack.com) for team-scale QA. | |
73 | 70 |
|
74 | | -1. Discover the exact app id with `apps` if the package or bundle name is uncertain. |
75 | | -2. `open` a target app or URL. |
76 | | -3. `snapshot -i` to inspect the current screen. |
77 | | -4. `press`, `fill`, `scroll`, `get`, or `wait` using refs or selectors. On iOS and Android, default snapshot text follows the same visible-first contract: refs shown in default output are actionable now, while hidden content is surfaced as scroll/list discovery hints instead of tappable off-screen refs. If the target only appears in a hidden-content hint, use `scroll <direction>` and re-snapshot. |
78 | | - Use `rotate <orientation>` when a flow needs a deterministic portrait or landscape state on mobile targets. |
79 | | -5. `diff snapshot` or re-snapshot after UI changes. |
80 | | -6. `close` when the session is finished. |
| 71 | +## Capabilities |
81 | 72 |
|
82 | | -In non-JSON mode, core mutating commands print a short success acknowledgment so agents and humans can distinguish successful actions from dropped or silent no-ops. |
| 73 | +- **Platforms**: iOS, Android, tvOS, Android TV, macOS, and Linux. Real devices, simulators, and emulators are supported. |
| 74 | +- **Capture**: screenshots, video, logs, network traffic, performance data, accessibility snapshots, and React render profiles. |
| 75 | +- **Produce**: replayable `.ad` scripts (recorded replay files that run locally or in CI), e2e test runs, snapshot and screenshot diffs, and debugging artifacts. |
| 76 | +- **React Native and Expo**: component tree inspection, props/state/hooks, and render profiling. |
| 77 | +- **License**: MIT. Free to use. |
83 | 78 |
|
84 | | -## Where To Go Next |
| 79 | +## How It Works |
85 | 80 |
|
86 | | -For people: |
| 81 | +`agent-device` runs session-aware commands through platform backends: XCTest for iOS and tvOS, ADB plus the Android snapshot helper for Android, a local helper for macOS desktop automation, and AT-SPI for Linux desktop targets. See [Introduction](https://incubator.callstack.com/agent-device/docs/introduction) and [Commands](https://incubator.callstack.com/agent-device/docs/commands) for platform details. |
87 | 82 |
|
88 | | -- [Website](https://agent-device.dev/) |
89 | | -- [Docs](https://incubator.callstack.com/agent-device/docs/introduction) |
90 | | -- [Skillgym starter](test/skillgym/README.md) |
| 83 | +## Used By |
91 | 84 |
|
92 | | -Local benchmark starter: |
| 85 | +Used by teams and developers at Callstack, Expensify, Shopify, Kindred, Total Wine & More, LegendList, HerLyfe, App & Flow, and more. |
93 | 86 |
|
94 | | -- `pnpm test:skillgym` |
| 87 | +## Documentation |
95 | 88 |
|
96 | | -For agents: |
| 89 | +- [Installation](https://incubator.callstack.com/agent-device/docs/installation) |
| 90 | +- [Commands](https://incubator.callstack.com/agent-device/docs/commands) |
| 91 | +- [Replay & E2E](https://incubator.callstack.com/agent-device/docs/replay-e2e) |
| 92 | +- [Known limitations](https://incubator.callstack.com/agent-device/docs/known-limitations) |
| 93 | + |
| 94 | +Agent integration: |
97 | 95 |
|
98 | 96 | - [agent-device skill](skills/agent-device/SKILL.md) |
99 | 97 | - [react-devtools skill](skills/react-devtools/SKILL.md) |
100 | 98 | - [dogfood skill](skills/dogfood/SKILL.md) |
101 | 99 | - [agent-device skill on ClawHub](https://clawhub.ai/okwasniewski/agent-device) |
102 | 100 |
|
103 | | -## Install |
104 | | - |
105 | | -```bash |
106 | | -npm install -g agent-device |
107 | | -``` |
108 | | - |
109 | | -`agent-device` now performs a lightweight background upgrade check for interactive CLI runs and, when a newer package is available, suggests a global reinstall command. Updating the package also refreshes the bundled `skills/` shipped with the CLI. |
110 | | - |
111 | | -Set `AGENT_DEVICE_NO_UPDATE_NOTIFIER=1` to disable the notice. |
112 | | - |
113 | | -On macOS, `agent-device` includes a local `agent-device-macos-helper` source package that is built on demand for desktop permission checks, alert handling, and helper-backed desktop snapshot surfaces. Release distribution should use a signed/notarized helper build; source checkouts fall back to a local Swift build. Local helper overrides through `AGENT_DEVICE_MACOS_HELPER_BIN` must use an absolute executable path. |
114 | | - |
115 | 101 | ## Contributing |
116 | 102 |
|
117 | 103 | See [CONTRIBUTING.md](CONTRIBUTING.md). |
118 | 104 |
|
119 | 105 | ## Made at Callstack |
120 | 106 |
|
121 | | -agent-device is an open source project and will always remain free to use. Callstack is a group of React and React Native geeks. Contact us at hello@callstack.com if you need any help with these technologies or just want to say hi. |
| 107 | +agent-device is open source and MIT licensed. Try the [EAS workflow template](https://github.com/callstackincubator/eas-agent-device/blob/main/.eas/workflows/agent-qa-mobile.yml), use [Agent Device Cloud](https://agent-device.dev/cloud), or contact us at hello@callstack.com. |