Merged
2 changes: 1 addition & 1 deletion README.md
@@ -52,7 +52,7 @@ In practice, most work follows the same pattern:
1. Discover the exact app id with `apps` if the package or bundle name is uncertain.
2. `open` a target app or URL.
3. `snapshot -i` to inspect the current screen.
-4. `press`, `fill`, `scroll`, `get`, or `wait` using refs or selectors. On iOS and Android, default snapshot text follows the same visible-first contract: refs shown in default output are actionable now, while hidden content is surfaced as scroll/list discovery hints instead of tappable off-screen refs.
+4. `press`, `fill`, `scroll`, `get`, or `wait` using refs or selectors. On iOS and Android, default snapshot text follows the same visible-first contract: refs shown in default output are actionable now, while hidden content is surfaced as scroll/list discovery hints instead of tappable off-screen refs. If the target only appears in a hidden-content hint, use `scroll <direction>` and re-snapshot.
Use `rotate <orientation>` when a flow needs a deterministic portrait or landscape state on mobile targets.
5. `diff snapshot` or re-snapshot after UI changes.
6. `close` when the session is finished.
2 changes: 1 addition & 1 deletion skills/agent-device/SKILL.md
@@ -18,7 +18,7 @@ Use this skill as a router with mandatory defaults. Read this file first. For no
- In React Native dev or debug builds, check early for visible warning or error overlays, tooltips, and toasts that can steal focus or intercept taps. If they are not part of the requested behavior, dismiss them and continue. If you saw them, report them in the final summary.
- Do not browse the web or use external sources unless the user explicitly asks.
- Re-snapshot after meaningful UI changes instead of reusing stale refs.
-- Treat refs in default snapshot output as actionable-now, not durable identities. If a target is off-screen, use `scrollintoview` or scroll and re-snapshot.
+- Treat refs in default snapshot output as actionable-now, not durable identities. If a target appears only in an off-screen summary, use `scroll <direction>` and re-snapshot until the target is visible.
- Prefer `@ref` or selector targeting over raw coordinates.
- Ensure the correct target is pinned and an app session is open before interacting.
- Keep the loop short: `open` -> inspect/act -> verify if needed -> `close`.
24 changes: 19 additions & 5 deletions skills/agent-device/references/exploration.md
@@ -38,7 +38,7 @@ Open this file when the app or screen is already running and you need to discove
- `press`
- `fill`
- `type`
-- `scrollintoview`
+- `scroll`
- `wait`
- `keyboard dismiss` when the keyboard obscures the next target

@@ -115,10 +115,10 @@ App: com.apple.Preferences
## Refs vs selectors

- Use refs for discovery, debugging, and short local loops.
-- Use `scrollintoview @ref` when the target is already known from the current snapshot and you want the command to re-snapshot after each swipe until the element reaches the viewport safe band.
-- If `scrollintoview @ref` succeeds, prefer the returned `currentRef` for the next action.
+- When a target appears only in a visible-first off-screen summary, such as `[off-screen below] ... "Battery"`, use `scroll down` and then `snapshot -i`. For `[off-screen above]`, use `scroll up` and then `snapshot -i`.
+- For more than two repeated scroll checks, create a short shell loop instead of issuing each command by hand. Stop when the label appears or the snapshot stops changing.
- Visible-first off-screen summaries are intentionally compact. If you need the full off-screen tree instead of a short summary, retry with `snapshot --raw`.
-- Cap long searches with `--max-scrolls <n>` when the list may be unbounded or the target may not exist.
+- Cap long searches in the loop when the list may be unbounded or the target may not exist.
- Use selectors for deterministic scripts, assertions, and replay-friendly actions.
- Prefer selector or `@ref` targeting over raw coordinates.
- For tap interactions, `press` is canonical and `click` is an equivalent alias.
@@ -132,11 +132,25 @@ agent-device press 'id="camera_row" || label="Camera" role=button'
agent-device is visible 'id="camera_settings_anchor"'
```

Example loop:

```bash
previous=''
for _ in 1 2 3 4 5 6; do
current="$(agent-device snapshot -i)"
printf '%s\n' "$current"
printf '%s\n' "$current" | grep -q 'Battery' && break
[ "$current" = "$previous" ] && break
previous="$current"
agent-device scroll down 0.5 >/dev/null
done
```
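
For callers who drive the same pattern from TypeScript rather than the shell, a bounded scroll-and-resnapshot search might look like the sketch below. `scrollUntilVisible`, `ScrollSearchDeps`, and the `snapshot` and `scroll` callbacks are hypothetical names, not part of agent-device; wire them to however you take a snapshot and scroll. The sketch also assumes off-screen hints appear on lines containing `[off-screen`, as in the summary example above, and skips those lines when matching so that a hint does not count as a visible hit.

```typescript
type ScrollDirection = 'up' | 'down';

// Hypothetical callbacks supplied by the caller; not an agent-device API.
interface ScrollSearchDeps {
  snapshot: () => Promise<string>;
  scroll: (direction: ScrollDirection) => Promise<void>;
}

type ScrollSearchResult =
  | { found: true; attempts: number; snapshot: string }
  | { found: false; attempts: number; reason: 'max-scrolls' | 'no-change' };

// Assumption: off-screen summaries are lines that contain "[off-screen".
function visibleLines(snapshot: string): string {
  return snapshot
    .split('\n')
    .filter((line) => !line.includes('[off-screen'))
    .join('\n');
}

async function scrollUntilVisible(
  label: string,
  direction: ScrollDirection,
  deps: ScrollSearchDeps,
  maxScrolls = 6,
): Promise<ScrollSearchResult> {
  let previous: string | undefined;
  // `attempts` counts the scrolls performed so far.
  for (let attempts = 0; attempts <= maxScrolls; attempts++) {
    const current = await deps.snapshot();
    if (visibleLines(current).includes(label)) {
      return { found: true, attempts, snapshot: current };
    }
    // An unchanged snapshot means the list has stopped moving.
    if (current === previous) {
      return { found: false, attempts, reason: 'no-change' };
    }
    if (attempts === maxScrolls) break;
    previous = current;
    await deps.scroll(direction);
  }
  return { found: false, attempts: maxScrolls, reason: 'max-scrolls' };
}
```

The two stop conditions mirror the guidance above: stop when the label appears or the snapshot stops changing, and cap the search when the list may be unbounded or the target may not exist.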

## Interaction fallbacks

When `press @ref` fails:

-1. If the error says the ref is off-screen, run `scrollintoview @ref` and reuse the returned `currentRef` or take one fresh snapshot.
+1. If the error says the ref is off-screen, use the off-screen summary direction to run `scroll <direction>`, then take a fresh `snapshot -i`.
2. Re-snapshot if the UI may have changed.
3. Retry `press @ref` or a selector-based `press`.
4. If `screenshot --overlay-refs --json` returned a reliable `overlayRefs[].center`, use `agent-device press <x> <y>`.
11 changes: 0 additions & 11 deletions src/cli/commands/generic.ts
@@ -176,17 +176,6 @@ export const genericClientCommandHandlers = {
pixels: flags.pixels,
}),
),
[CLIENT_COMMANDS.scrollIntoView]: createGenericClientCommandHandler(
CLIENT_COMMANDS.scrollIntoView,
({ client, positionals, flags }) =>
client.interactions.scrollIntoView({
...buildSelectionOptions(flags),
...(positionals[0]?.startsWith('@')
? { ref: positionals[0], label: positionals.slice(1).join(' ') || undefined }
: { text: positionals.join(' ') }),
maxScrolls: flags.maxScrolls,
}),
),
[CLIENT_COMMANDS.pinch]: createGenericClientCommandHandler(
CLIENT_COMMANDS.pinch,
({ client, positionals, flags }) =>
12 changes: 0 additions & 12 deletions src/cli/commands/output.ts
@@ -130,9 +130,6 @@ export function writeCommandCliOutput(
const successText = readCommandMessage(data);
if (successText) {
process.stdout.write(`${successText}\n`);
for (const extraLine of readCommandSuccessLines(command, data)) {
process.stdout.write(`${extraLine}\n`);
}
}
return 0;
}
@@ -216,12 +213,3 @@ function writeNetworkCliOutput(data: Record<string, unknown>): void {
}
}
}

function readCommandSuccessLines(command: string, data: Record<string, unknown>): string[] {
if (command !== CLIENT_COMMANDS.scrollIntoView) {
return [];
}
const ref = typeof data.ref === 'string' ? data.ref : '';
const currentRef = typeof data.currentRef === 'string' ? data.currentRef : '';
return currentRef && currentRef !== ref ? [`Current ref: @${currentRef}`] : [];
}
1 change: 0 additions & 1 deletion src/client-command-registry.ts
@@ -28,7 +28,6 @@ export const CLIENT_COMMANDS = {
replay: 'replay',
rotate: 'rotate',
scroll: 'scroll',
scrollIntoView: 'scrollintoview',
screenshot: 'screenshot',
settings: 'settings',
snapshot: 'snapshot',
1 change: 0 additions & 1 deletion src/client-normalizers.ts
@@ -283,7 +283,6 @@ export function buildFlags(options: InternalRequestOptions): CommandFlags {
clickButton: options.clickButton,
pauseMs: options.pauseMs,
pattern: options.pattern,
maxScrolls: options.maxScrolls,
headless: options.headless,
restart: options.restart,
replayUpdate: options.replayUpdate,
18 changes: 0 additions & 18 deletions src/client-types.ts
@@ -527,22 +527,6 @@ export type ScrollOptions = ClientCommandBaseOptions & {
pixels?: number;
};

export type ScrollIntoViewOptions = ClientCommandBaseOptions &
(
| {
text: string;
ref?: never;
label?: never;
}
| {
ref: string;
label?: string;
text?: never;
}
) & {
maxScrolls?: number;
};

export type PinchOptions = ClientCommandBaseOptions & {
scale: number;
x?: number;
@@ -704,7 +688,6 @@ type CommandExecutionOptions = {
clickButton?: 'primary' | 'secondary' | 'middle';
pauseMs?: number;
pattern?: 'one-way' | 'ping-pong';
maxScrolls?: number;
headless?: boolean;
restart?: boolean;
replayUpdate?: boolean;
@@ -802,7 +785,6 @@ export type AgentDeviceClient = {
type: (options: TypeTextOptions) => Promise<CommandRequestResult>;
fill: (options: FillOptions) => Promise<CommandRequestResult>;
scroll: (options: ScrollOptions) => Promise<CommandRequestResult>;
scrollIntoView: (options: ScrollIntoViewOptions) => Promise<CommandRequestResult>;
pinch: (options: PinchOptions) => Promise<CommandRequestResult>;
get: (options: GetOptions) => Promise<CommandRequestResult>;
is: (options: IsOptions) => Promise<CommandRequestResult>;
16 changes: 0 additions & 16 deletions src/client.ts
@@ -351,12 +351,6 @@ export function createAgentDeviceClient(
[options.direction, ...optionalNumber(options.amount)],
options,
),
scrollIntoView: async (options) =>
await executeCommandRequest(
CLIENT_COMMANDS.scrollIntoView,
scrollIntoViewPositionals(options),
options,
),
pinch: async (options) =>
await executeCommandRequest(
CLIENT_COMMANDS.pinch,
@@ -458,15 +452,6 @@ function elementPositionals(options: ElementTarget): string[] {
return [options.selector];
}

function scrollIntoViewPositionals(options: {
text?: string;
ref?: string;
label?: string;
}): string[] {
if (options.ref !== undefined) return [options.ref, ...optionalString(options.label)];
return [options.text ?? ''];
}

function stringifyPayload(payload: AppPushOptions['payload']): string {
return typeof payload === 'string' ? payload : JSON.stringify(payload);
}
@@ -615,7 +600,6 @@ export type {
ReplayTestOptions,
RotateCommandOptions,
RotateCommandResult,
ScrollIntoViewOptions,
ScrollOptions,
SessionCloseResult,
SettingsUpdateOptions,
3 changes: 0 additions & 3 deletions src/core/__tests__/capabilities.test.ts
@@ -161,7 +161,6 @@ test('core commands support iOS simulator, iOS device, and Android', () => {
'rotate',
'screenshot',
'scroll',
'scrollintoview',
'snapshot',
'trigger-app-event',
'type',
@@ -200,7 +199,6 @@ test('macOS supports the Apple runner interaction core but excludes mobile-only
'settings',
'screenshot',
'scroll',
'scrollintoview',
'snapshot',
'swipe',
'trigger-app-event',
@@ -310,7 +308,6 @@ test('Linux supports desktop interaction commands and blocks mobile/unsupported
'record',
'reinstall',
'rotate',
'scrollintoview',
'settings',
'trigger-app-event',
],
5 changes: 0 additions & 5 deletions src/core/capabilities.ts
@@ -203,11 +203,6 @@ const COMMAND_CAPABILITY_MATRIX: Record<string, CommandCapability> = {
android: { emulator: true, device: true, unknown: true },
linux: LINUX_DEVICE,
},
scrollintoview: {
apple: { simulator: true, device: true },
android: { emulator: true, device: true, unknown: true },
linux: LINUX_NONE,
},
swipe: {
apple: { simulator: true, device: true },
android: { emulator: true, device: true, unknown: true },
16 changes: 0 additions & 16 deletions src/core/dispatch.ts
@@ -73,7 +73,6 @@ type DispatchContext = {
backMode?: 'in-app' | 'system';
pauseMs?: number;
pattern?: 'one-way' | 'ping-pong';
maxScrolls?: number;
surface?: SessionSurface;
};

@@ -166,21 +165,6 @@ export async function dispatchCommand(
}
case 'scroll':
return handleScrollCommand(interactor, positionals, context);
case 'scrollintoview': {
const text = positionals.join(' ').trim();
if (!text) throw new AppError('INVALID_ARGS', 'scrollintoview requires text');
const result = await interactor.scrollIntoView(text, {
maxScrolls: context?.maxScrolls,
});
if (typeof result?.attempts === 'number') {
return {
text,
attempts: result.attempts,
...successText(`Scrolled into view: ${text}`),
};
}
return { text, ...successText(`Scrolled into view: ${text}`) };
}
case 'pinch':
return handlePinchCommand(device, positionals, context, runnerCtx);
case 'trigger-app-event': {
9 changes: 0 additions & 9 deletions src/core/interactors.ts
@@ -17,7 +17,6 @@ import {
rotateAndroid,
swipeAndroid,
scrollAndroid,
scrollIntoViewAndroid,
screenshotAndroid,
setAndroidSetting,
typeAndroid,
@@ -104,10 +103,6 @@ export type Interactor = {
direction: ScrollDirection,
options?: { amount?: number; pixels?: number },
): Promise<Record<string, unknown> | void>;
scrollIntoView(
text: string,
options?: { maxScrolls?: number },
): Promise<{ attempts?: number } | void>;
screenshot(outPath: string, options?: ScreenshotOptions): Promise<void>;
back(mode?: BackMode): Promise<void>;
home(): Promise<void>;
@@ -141,7 +136,6 @@ export function getInteractor(device: DeviceInfo, runnerContext: RunnerContext):
type: (text, delayMs) => typeAndroid(device, text, delayMs),
fill: (x, y, text, delayMs) => fillAndroid(device, x, y, text, delayMs),
scroll: (direction, options) => scrollAndroid(device, direction, options),
scrollIntoView: (text, options) => scrollIntoViewAndroid(device, text, options),
screenshot: (outPath) => screenshotAndroid(device, outPath),
back: (_mode) => backAndroid(device),
home: () => homeAndroid(device),
@@ -165,9 +159,6 @@ export function getInteractor(device: DeviceInfo, runnerContext: RunnerContext):
type: (text, delayMs) => typeLinux(text, delayMs),
fill: (x, y, text, delayMs) => fillLinux(x, y, text, delayMs),
scroll: (direction, options) => scrollLinux(direction, options),
scrollIntoView: () => {
throw new AppError('UNSUPPORTED_OPERATION', 'scrollIntoView not yet supported on Linux');
},
screenshot: (outPath) => screenshotLinux(outPath),
back: () => backLinux(),
home: () => homeLinux(),
84 changes: 0 additions & 84 deletions src/daemon/__tests__/scroll-planner.test.ts

This file was deleted.
