16 changes: 10 additions & 6 deletions README.md
@@ -15,7 +15,7 @@ The project is in early development and considered experimental. Pull requests a
## Features
- Platforms: iOS (simulator + physical device core automation) and Android (emulator + device).
- Core commands: `open`, `back`, `home`, `app-switcher`, `press`, `long-press`, `focus`, `type`, `fill`, `scroll`, `scrollintoview`, `wait`, `alert`, `screenshot`, `close`, `reinstall`.
-- Inspection commands: `snapshot` (accessibility tree), `appstate`, `apps`, `devices`.
+- Inspection commands: `snapshot` (accessibility tree), `diff snapshot` (snapshot diffs), `appstate`, `apps`, `devices`.
- Device tooling: `adb` (Android), `simctl`/`devicectl` (iOS via Xcode).
- Minimal dependencies; TypeScript executed directly on Node 22+ (no build step).

@@ -34,13 +34,14 @@ npx agent-device open SampleApp
## Quick Start

Use refs for agent-driven exploration and normal automation flows.
-Use `press` as the canonical tap command; `click` is an equivalent alias.
+Use `press` as the canonical tap command; `click` is an equivalent alias; `dblclick` is an alias for `click --double-tap`.

```bash
agent-device open Contacts --platform ios # creates session on iOS Simulator
agent-device snapshot
agent-device press @e5
agent-device fill @e6 "John"
+agent-device diff snapshot
agent-device fill @e7 "Doe"
agent-device press @e3
agent-device close
@@ -105,6 +106,7 @@ agent-device open SampleApp
agent-device snapshot
agent-device press @e7
agent-device fill @e8 "hello"
+agent-device diff snapshot
agent-device close SampleApp
```

@@ -122,6 +124,7 @@ Coordinates:
- X increases to the right, Y increases downward.
- `press` is the canonical tap command.
- `click` is an equivalent alias and accepts the same targets (`x y`, `@ref`, selector) and flags.
+- `dblclick` is shorthand for `click --double-tap`.

Gesture series examples:

@@ -135,8 +138,8 @@ agent-device swipe 540 1500 540 500 120 --count 8 --pause-ms 30 --pattern ping-p
## Command Index
- `boot`, `open`, `close`, `reinstall`, `home`, `back`, `app-switcher`
- `batch`
-- `snapshot`, `find`, `get`
-- `press` (alias: `click`), `focus`, `type`, `fill`, `long-press`, `swipe`, `scroll`, `scrollintoview`, `pinch`, `is`
+- `snapshot`, `diff`, `find`, `get`
+- `press` (aliases: `click`, `dblclick`), `focus`, `type`, `fill`, `long-press`, `swipe`, `scroll`, `scrollintoview`, `pinch`, `is`
- `alert`, `wait`, `screenshot`
- `trace start`, `trace stop`
- `settings wifi|airplane|location on|off`
@@ -149,6 +152,7 @@ Notes:
- iOS snapshots use XCTest on simulators and physical devices.
- Scope snapshots with `-s "<label>"` or `-s @ref`.
- If XCTest returns 0 nodes (e.g., foreground app changed), agent-device fails explicitly.
+- `diff snapshot` compares the current snapshot against the previous snapshot in the same session and then updates the baseline.

Flags:
- `--version, -V` print version and exit
@@ -162,7 +166,7 @@ Flags:
- `--interval-ms <ms>` delay between `press` iterations
- `--hold-ms <ms>` hold duration per `press` iteration
- `--jitter-px <n>` deterministic coordinate jitter for `press`
-- `--double-tap` use a double-tap gesture per `press`/`click` iteration (cannot be combined with `--hold-ms` or `--jitter-px`)
+- `--double-tap` use a double-tap gesture per `press`/`click`/`dblclick` iteration (cannot be combined with `--hold-ms` or `--jitter-px`)
- `--pause-ms <ms>` delay between `swipe` iterations
- `--pattern one-way|ping-pong` repeat pattern for `swipe`
- `--debug` (alias: `--verbose`) for debug diagnostics + daemon/runner logs
@@ -235,7 +239,7 @@ Replay update:
- `replay <path>` runs deterministic replay from `.ad` scripts.
- `replay -u <path>` attempts selector updates on failures and atomically rewrites the same file.
- Refs are the default/core mechanism for interactive agent flows.
-- Update targets: `click`, `fill`, `get`, `is`, `wait`.
+- Update targets: `click`, `dblclick`, `fill`, `get`, `is`, `wait`.
- Selector matching is a replay-update internal: replay parses `.ad` lines into actions, tries them, snapshots on failure, resolves a better selector, then rewrites that failing line.

Update examples:
Expand Down
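
The `diff snapshot` note in the README hunk above (compare against the previous snapshot in the same session, then update the baseline) can be sketched roughly as follows. This is an illustration only: the PR's actual diff algorithm and output format are not visible in this diff, so `SnapshotBaseline`, `diffLines`, the line-based comparison, and treating a missing baseline as an empty snapshot are all assumptions.

```typescript
// Hypothetical result shape; the real response data for `diff snapshot` is not shown in this PR.
type SnapshotDiff = { added: string[]; removed: string[]; unchanged: number };

// Line-based comparison (an assumption): each current line is paired with at most
// one identical line from the previous snapshot.
function diffLines(previous: string[], current: string[]): SnapshotDiff {
  const unmatched = previous.slice(); // previous lines not yet paired with a current line
  const added: string[] = [];
  let unchanged = 0;
  for (const line of current) {
    const index = unmatched.indexOf(line);
    if (index >= 0) {
      unmatched.splice(index, 1);
      unchanged += 1;
    } else {
      added.push(line);
    }
  }
  return { added, removed: unmatched, unchanged };
}

// Holds the per-session baseline; every diff call replaces it with the current snapshot.
class SnapshotBaseline {
  private previous: string[] = []; // assumption: no baseline behaves like an empty snapshot

  diff(current: string[]): SnapshotDiff {
    const result = diffLines(this.previous, current);
    this.previous = current.slice(); // "then updates the baseline"
    return result;
  }
}

const session = new SnapshotBaseline();
session.diff(['Button "Save"', 'Text "Name"']);
console.log(session.diff(['Button "Save"', 'Text "John"']));
```

Because the baseline moves on every call, running the same diff twice in a row without UI changes reports no additions or removals the second time.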
11 changes: 6 additions & 5 deletions skills/agent-device/SKILL.md
@@ -29,7 +29,7 @@ npx -y agent-device

1. Open app or deep link: `open [app|url] [url]` (`open` handles target selection + boot/activation in the normal flow)
2. Snapshot: `snapshot` to get refs from accessibility tree
-3. Interact using refs (`press @ref`, `fill @ref "text"`; `click` is an alias of `press`)
+3. Interact using refs (`press @ref`, `fill @ref "text"`; `click` is an alias of `press`, `dblclick` is an alias of `click --double-tap`)
4. Re-snapshot after navigation/UI changes
5. Close session when done

@@ -115,7 +115,8 @@ agent-device appstate
### Interactions (use @refs from snapshot)

```bash
-agent-device press @e1 # Canonical tap command (`click` is an alias)
+agent-device press @e1 # Canonical tap command (`click` is an alias, `dblclick` is a double-tap alias)
+agent-device dblclick @e1 # Equivalent to: click @e1 --double-tap
agent-device focus @e2
agent-device fill @e2 "text" # Clear then type (Android: verifies value and retries once on mismatch)
agent-device type "text" # Type into focused field without clearing
@@ -230,9 +231,9 @@ agent-device apps --platform android --user-installed

## Best practices

-- `press` is the canonical tap command; `click` is an alias with the same behavior.
-- `press` (and `click`) accepts `x y`, `@ref`, and selector targets.
-- `press`/`click` support gesture series controls: `--count`, `--interval-ms`, `--hold-ms`, `--jitter-px`, `--double-tap`.
+- `press` is the canonical tap command; `click` is an alias with the same behavior; `dblclick` is shorthand for `click --double-tap`.
+- `press`, `click`, and `dblclick` accept `x y`, `@ref`, and selector targets.
+- `press`/`click`/`dblclick` support gesture series controls: `--count`, `--interval-ms`, `--hold-ms`, `--jitter-px`, `--double-tap`.
- `--double-tap` cannot be combined with `--hold-ms` or `--jitter-px`.
- `swipe` supports coordinate + timing controls and repeat patterns: `swipe x1 y1 x2 y2 [durationMs] --count --pause-ms --pattern`.
- `swipe` timing is platform-safe: Android uses requested duration; iOS uses normalized safe timing to avoid long-press side effects.
Expand Down
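
The `--double-tap` restriction in the list above amounts to a small flag-conflict check. A minimal sketch, assuming the camelCase flag keys (`doubleTap`, `holdMs`, `jitterPx`) listed in `src/utils/command-schema.ts`; `findDoubleTapConflicts` and `PressFlags` are hypothetical names, not the project's actual validation code:

```typescript
// Assumed subset of the press/click/dblclick series flags.
type PressFlags = {
  count?: number;
  intervalMs?: number;
  holdMs?: number;
  jitterPx?: number;
  doubleTap?: boolean;
};

// Returns the CLI flags that conflict with --double-tap (empty when the combination is fine).
function findDoubleTapConflicts(flags: PressFlags): string[] {
  if (!flags.doubleTap) return [];
  const conflicts: string[] = [];
  if (flags.holdMs !== undefined) conflicts.push('--hold-ms');
  if (flags.jitterPx !== undefined) conflicts.push('--jitter-px');
  return conflicts;
}

console.log(findDoubleTapConflicts({ doubleTap: true, holdMs: 50, count: 3 }));
```

Since `dblclick` is normalized to `press` with `doubleTap` set, the same check would apply to `dblclick --hold-ms ...` once normalization has run.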
9 changes: 7 additions & 2 deletions src/cli.ts
@@ -1,6 +1,6 @@
 import { parseArgs, toDaemonFlags, usage, usageForCommand } from './utils/args.ts';
 import { asAppError, AppError, normalizeError } from './utils/errors.ts';
-import { formatSnapshotText, printHumanError, printJson } from './utils/output.ts';
+import { formatSnapshotDiffText, formatSnapshotText, printHumanError, printJson } from './utils/output.ts';

Check failure on line 3 in src/cli.ts (GitHub Actions / Typecheck): '"./utils/output.ts"' has no exported member named 'formatSnapshotDiffText'. Did you mean 'formatSnapshotText'?

 import { readVersion } from './utils/version.ts';
 import { pathToFileURL } from 'node:url';
 import { sendToDaemon } from './daemon-client.ts';
@@ -189,6 +189,11 @@
       if (logTailStopper) logTailStopper();
       return;
     }
+    if (command === 'diff' && positionals[0]?.toLowerCase() === 'snapshot') {
+      process.stdout.write(formatSnapshotDiffText((response.data ?? {}) as Record<string, unknown>));
+      if (logTailStopper) logTailStopper();
+      return;
+    }
     if (command === 'get') {
       const sub = positionals[0];
       if (sub === 'text') {
@@ -235,7 +240,7 @@
       if (logTailStopper) logTailStopper();
       return;
     }
-    if (command === 'click' || command === 'press') {
+    if (command === 'click' || command === 'press' || command === 'dblclick') {
       const ref = (response.data as any)?.ref ?? '';
       const x = (response.data as any)?.x;
       const y = (response.data as any)?.y;
Expand Down
11 changes: 9 additions & 2 deletions src/daemon.ts
@@ -203,8 +203,15 @@ function finalizeDaemonResponse(response: DaemonResponse): DaemonResponse {
 }

 function normalizeAliasedCommands(req: DaemonRequest): DaemonRequest {
-  if (req.command !== 'click') return req;
-  return { ...req, command: 'press' };
+  if (req.command === 'dblclick') {
+    return {
+      ...req,
+      command: 'press',
+      flags: { ...(req.flags ?? {}), doubleTap: true },
+    };
+  }
+  if (req.command === 'click') return { ...req, command: 'press' };
+  return req;
 }

 function writeInfo(port: number): void {
Expand Down
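
The alias handling in this hunk can be checked in isolation. A minimal sketch, assuming a simplified `DaemonRequest` shape (the real type is not shown in this diff, so the fields below are assumptions); the function body mirrors the added lines:

```typescript
// Assumed, simplified request shape; the project's real DaemonRequest likely has more fields.
type DaemonRequest = {
  command: string;
  positionals?: string[];
  flags?: Record<string, unknown>;
};

// Mirrors the hunk above: `dblclick` becomes `press` with doubleTap forced on,
// `click` becomes `press`, and every other command passes through untouched.
function normalizeAliasedCommands(req: DaemonRequest): DaemonRequest {
  if (req.command === 'dblclick') {
    return {
      ...req,
      command: 'press',
      flags: { ...(req.flags ?? {}), doubleTap: true },
    };
  }
  if (req.command === 'click') return { ...req, command: 'press' };
  return req;
}

const normalized = normalizeAliasedCommands({
  command: 'dblclick',
  positionals: ['@e5'],
  flags: { count: 2 },
});
console.log(normalized.command, normalized.flags);
```

Because `doubleTap: true` is spread last, an explicit `doubleTap: false` on a `dblclick` request is overridden, while other series flags such as `count` are preserved.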
32 changes: 32 additions & 0 deletions src/daemon/handlers/__tests__/session.test.ts
@@ -1083,6 +1083,38 @@ test('replay parses press series flags and passes them to invoke', async () => {
   assert.equal(invoked[0]?.flags?.doubleTap, true);
 });

+test('replay parses dblclick alias and passes click-series flags to invoke', async () => {
+  const sessionStore = makeSessionStore();
+  const replayRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'agent-device-replay-dblclick-series-'));
+  const replayPath = path.join(replayRoot, 'dblclick-series.ad');
+  fs.writeFileSync(replayPath, 'dblclick @e5 --count 2\n');
+
+  const invoked: DaemonRequest[] = [];
+  const response = await handleSessionCommands({
+    req: {
+      token: 't',
+      session: 'default',
+      command: 'replay',
+      positionals: [replayPath],
+      flags: {},
+    },
+    sessionName: 'default',
+    logPath: path.join(os.tmpdir(), 'daemon.log'),
+    sessionStore,
+    invoke: async (req) => {
+      invoked.push(req);
+      return { ok: true, data: {} };
+    },
+  });
+
+  assert.ok(response);
+  assert.equal(response?.ok, true);
+  assert.equal(invoked.length, 1);
+  assert.equal(invoked[0]?.command, 'dblclick');
+  assert.deepEqual(invoked[0]?.positionals, ['@e5']);
+  assert.equal(invoked[0]?.flags?.count, 2);
+});
+
 test('replay inherits parent device selectors for each invoked step', async () => {
   const sessionStore = makeSessionStore();
   const replayRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'agent-device-replay-parent-selectors-'));
Expand Down
4 changes: 2 additions & 2 deletions src/daemon/script-utils.ts
@@ -14,8 +14,8 @@ const SWIPE_NUMERIC_FLAG_MAP = new Map<string, 'count' | 'pauseMs'>([
   ['--pause-ms', 'pauseMs'],
 ]);

-export function isClickLikeCommand(command: string): command is 'click' | 'press' {
-  return command === 'click' || command === 'press';
+export function isClickLikeCommand(command: string): command is 'click' | 'press' | 'dblclick' {
+  return command === 'click' || command === 'press' || command === 'dblclick';
 }

 export function formatScriptArg(value: string): string {
Expand Down
17 changes: 17 additions & 0 deletions src/utils/__tests__/args.test.ts
@@ -110,6 +110,13 @@ test('parseArgs recognizes click series flags', () => {
   assert.equal(parsed.flags.intervalMs, 10);
 });

+test('parseArgs treats dblclick alias as parser-level command without implicit defaults', () => {
+  const parsed = parseArgs(['dblclick', '@e5'], { strictFlags: true });
+  assert.equal(parsed.command, 'dblclick');
+  assert.deepEqual(parsed.positionals, ['@e5']);
+  assert.equal(parsed.flags.doubleTap, undefined);
+});
+
 test('parseArgs recognizes double-tap flag for repeated press', () => {
   const parsed = parseArgs(['press', '201', '545', '--count', '5', '--double-tap'], { strictFlags: true });
   assert.equal(parsed.command, 'press');
@@ -149,6 +156,7 @@ test('parseArgs rejects invalid swipe pattern', () => {

 test('usage includes --relaunch flag', () => {
   assert.match(usage(), /--relaunch/);
+  assert.match(usage(), /dblclick <x y\|@ref\|selector>/);
   assert.match(usage(), /--save-script \[path\]/);
   assert.match(usage(), /pinch <scale> \[x\] \[y\]/);
   assert.doesNotMatch(usage(), /--metadata/);
@@ -202,6 +210,14 @@ test('snapshot command accepts command-specific flags', () => {
   assert.equal(parsed.flags.snapshotScope, 'Login');
 });

+test('diff snapshot command accepts snapshot flags', () => {
+  const parsed = parseArgs(['diff', 'snapshot', '--depth', '2', '--raw'], { strictFlags: true });
+  assert.equal(parsed.command, 'diff');
+  assert.deepEqual(parsed.positionals, ['snapshot']);
+  assert.equal(parsed.flags.snapshotDepth, 2);
+  assert.equal(parsed.flags.snapshotRaw, true);
+});
+
 test('unknown short flags are rejected', () => {
   assert.throws(
     () => parseArgs(['press', '10', '20', '-x'], { strictFlags: true }),
@@ -266,6 +282,7 @@ test('invalid range errors are deterministic', () => {
 test('usage includes swipe and press series options', () => {
   const help = usage();
   assert.match(help, /swipe <x1> <y1> <x2> <y2>/);
+  assert.match(help, /diff snapshot/);
   assert.match(help, /--pattern one-way\|ping-pong/);
   assert.match(help, /--interval-ms/);
   assert.match(help, /settings <wifi\|airplane\|location\|faceid>/);
Expand Down
29 changes: 27 additions & 2 deletions src/utils/command-schema.ts
@@ -71,12 +71,23 @@ const SNAPSHOT_FLAGS = [
   'snapshotRaw',
 ] as const satisfies readonly FlagKey[];

+const DIFF_SNAPSHOT_FLAGS = [...SNAPSHOT_FLAGS] as const satisfies readonly FlagKey[];
+
 const SELECTOR_SNAPSHOT_FLAGS = [
   'snapshotDepth',
   'snapshotScope',
   'snapshotRaw',
 ] as const satisfies readonly FlagKey[];

+const CLICK_LIKE_FLAGS = [
+  'count',
+  'intervalMs',
+  'holdMs',
+  'jitterPx',
+  'doubleTap',
+  ...SELECTOR_SNAPSHOT_FLAGS,
+] as const satisfies readonly FlagKey[];
+
 const FIND_SNAPSHOT_FLAGS = ['snapshotDepth', 'snapshotRaw'] as const satisfies readonly FlagKey[];

 export const FLAG_DEFINITIONS: readonly FlagDefinition[] = [
@@ -370,6 +381,12 @@ export const COMMAND_SCHEMAS: Record<string, CommandSchema> = {
     positionalArgs: [],
     allowedFlags: [...SNAPSHOT_FLAGS],
   },
+  diff: {
+    usageOverride: 'diff snapshot',
+    description: 'Compare current snapshot against previous session snapshot',
+    positionalArgs: ['kind'],
+    allowedFlags: [...DIFF_SNAPSHOT_FLAGS],
+  },
   devices: {
     description: 'List available devices',
     positionalArgs: [],
@@ -421,7 +438,15 @@ export const COMMAND_SCHEMAS: Record<string, CommandSchema> = {
     description: 'Tap/click by coordinates, snapshot ref, or selector',
     positionalArgs: ['target'],
     allowsExtraPositionals: true,
-    allowedFlags: ['count', 'intervalMs', 'holdMs', 'jitterPx', 'doubleTap', ...SELECTOR_SNAPSHOT_FLAGS],
+    allowedFlags: [...CLICK_LIKE_FLAGS],
   },
+  dblclick: {
+    usageOverride: 'dblclick <x y|@ref|selector>',
+    description: 'Alias for click --double-tap',
+    positionalArgs: ['target'],
+    allowsExtraPositionals: true,
+    allowedFlags: [...CLICK_LIKE_FLAGS],
+    skipCapabilityCheck: true,
+  },
   get: {
     usageOverride: 'get text|attrs <@ref|selector>',
@@ -447,7 +472,7 @@ export const COMMAND_SCHEMAS: Record<string, CommandSchema> = {
     description: 'Tap/press by coordinates, snapshot ref, or selector (supports repeated series)',
     positionalArgs: ['targetOrX', 'y?'],
     allowsExtraPositionals: true,
-    allowedFlags: ['count', 'intervalMs', 'holdMs', 'jitterPx', 'doubleTap', ...SELECTOR_SNAPSHOT_FLAGS],
+    allowedFlags: [...CLICK_LIKE_FLAGS],
   },
   'long-press': {
     description: 'Long press (where supported)',
Expand Down