diff --git a/README.md b/README.md
index 5dfcd8f44..bd242b8d7 100644
--- a/README.md
+++ b/README.md
@@ -52,7 +52,7 @@ In practice, most work follows the same pattern:
 1. Discover the exact app id with `apps` if the package or bundle name is uncertain.
 2. `open` a target app or URL.
 3. `snapshot -i` to inspect the current screen.
-4. `press`, `fill`, `scroll`, `get`, or `wait` using refs or selectors. On iOS and Android, default snapshot text follows the same visible-first contract: refs shown in default output are actionable now, while hidden content is surfaced as scroll/list discovery hints instead of tappable off-screen refs.
+4. `press`, `fill`, `scroll`, `get`, or `wait` using refs or selectors. On iOS and Android, default snapshot text follows the same visible-first contract: refs shown in default output are actionable now, while hidden content is surfaced as scroll/list discovery hints instead of tappable off-screen refs. If the target only appears in a hidden-content hint, use `scroll ` and re-snapshot. Use `rotate ` when a flow needs a deterministic portrait or landscape state on mobile targets.
 5. `diff snapshot` or re-snapshot after UI changes.
 6. `close` when the session is finished.

diff --git a/skills/agent-device/SKILL.md b/skills/agent-device/SKILL.md
index ad63629c6..ca3d83f46 100644
--- a/skills/agent-device/SKILL.md
+++ b/skills/agent-device/SKILL.md
@@ -18,7 +18,7 @@ Use this skill as a router with mandatory defaults. Read this file first. For no
 - In React Native dev or debug builds, check early for visible warning or error overlays, tooltips, and toasts that can steal focus or intercept taps. If they are not part of the requested behavior, dismiss them and continue. If you saw them, report them in the final summary.
 - Do not browse the web or use external sources unless the user explicitly asks.
 - Re-snapshot after meaningful UI changes instead of reusing stale refs.
-- Treat refs in default snapshot output as actionable-now, not durable identities. If a target is off-screen, use `scrollintoview` or scroll and re-snapshot.
+- Treat refs in default snapshot output as actionable-now, not durable identities. If a target appears only in an off-screen summary, use `scroll ` and re-snapshot until the target is visible.
 - Prefer `@ref` or selector targeting over raw coordinates.
 - Ensure the correct target is pinned and an app session is open before interacting.
 - Keep the loop short: `open` -> inspect/act -> verify if needed -> `close`.
diff --git a/skills/agent-device/references/exploration.md b/skills/agent-device/references/exploration.md
index 2ff799f70..0290dc58a 100644
--- a/skills/agent-device/references/exploration.md
+++ b/skills/agent-device/references/exploration.md
@@ -38,7 +38,7 @@ Open this file when the app or screen is already running and you need to discove
 - `press`
 - `fill`
 - `type`
-- `scrollintoview`
+- `scroll`
 - `wait`
 - `keyboard dismiss` when the keyboard obscures the next target

@@ -115,10 +115,10 @@ App: com.apple.Preferences
 ## Refs vs selectors

 - Use refs for discovery, debugging, and short local loops.
-- Use `scrollintoview @ref` when the target is already known from the current snapshot and you want the command to re-snapshot after each swipe until the element reaches the viewport safe band.
-- If `scrollintoview @ref` succeeds, prefer the returned `currentRef` for the next action.
+- When a target appears only in a visible-first off-screen summary, such as `[off-screen below] ... "Battery"`, use `scroll down` and then `snapshot -i`. For `[off-screen above]`, use `scroll up` and then `snapshot -i`.
+- For more than two repeated scroll checks, write a short shell loop instead of issuing each command by hand. Stop when the label appears outside the off-screen summary or the snapshot stops changing.
 - Visible-first off-screen summaries are intentionally compact. If you need the full off-screen tree instead of a short summary, retry with `snapshot --raw`.
-- Cap long searches with `--max-scrolls ` when the list may be unbounded or the target may not exist.
+- Cap the number of loop iterations when the list may be unbounded or the target may not exist.
 - Use selectors for deterministic scripts, assertions, and replay-friendly actions.
 - Prefer selector or `@ref` targeting over raw coordinates.
 - For tap interactions, `press` is canonical and `click` is an equivalent alias.
@@ -132,11 +132,25 @@ agent-device press 'id="camera_row" || label="Camera" role=button'
 agent-device is visible 'id="camera_settings_anchor"'
 ```

+Example loop:
+
+```bash
+previous=''
+for _ in 1 2 3 4 5 6; do
+  current="$(agent-device snapshot -i)"
+  printf '%s\n' "$current"
+  printf '%s\n' "$current" | grep -v 'off-screen' | grep -q 'Battery' && break
+  [ "$current" = "$previous" ] && break
+  previous="$current"
+  agent-device scroll down 0.5 >/dev/null
+done
+```
+
 ## Interaction fallbacks

 When `press @ref` fails:

-1. If the error says the ref is off-screen, run `scrollintoview @ref` and reuse the returned `currentRef` or take one fresh snapshot.
+1. If the error says the ref is off-screen, use the off-screen summary direction to run `scroll `, then take a fresh `snapshot -i`.
 2. Re-snapshot if the UI may have changed.
 3. Retry `press @ref` or a selector-based `press`.
 4. If `screenshot --overlay-refs --json` returned a reliable `overlayRefs[].center`, use `agent-device press `.
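
With `scrollintoview` removed, callers that drive the device from a TypeScript harness need their own version of the scroll-and-re-snapshot loop that exploration.md describes for the shell. The following is a minimal sketch under stated assumptions, not agent-device code: it assumes an off-screen hint is a single snapshot line containing `[off-screen below]` or `[off-screen above]` together with the quoted label (the only format shown in this diff), and `nextScrollStep`, `scrollUntilVisible`, and the `Runner` type are hypothetical names. The runner is injected so the sketch does not assume any client API.

```typescript
// Hypothetical sketch, not part of agent-device. Assumes an off-screen hint is
// a single snapshot line containing `[off-screen below]` or `[off-screen above]`
// together with the quoted label, as in `[off-screen below] ... "Battery"`.
type ScrollStep = 'visible' | 'down' | 'up' | 'missing';

function nextScrollStep(snapshotText: string, label: string): ScrollStep {
  let hint: ScrollStep = 'missing';
  for (const line of snapshotText.split('\n')) {
    if (!line.includes(label)) continue;
    if (line.includes('[off-screen below]')) {
      hint = 'down';
    } else if (line.includes('[off-screen above]')) {
      hint = 'up';
    } else {
      // The label is on a regular (non-hint) line, so its ref is actionable now.
      return 'visible';
    }
  }
  return hint;
}

// Injected command runners (hypothetical); a harness might back them with
// `agent-device snapshot -i` and `agent-device scroll down` / `scroll up`.
type Runner = {
  snapshot: () => Promise<string>;
  scroll: (direction: 'up' | 'down') => Promise<void>;
};

async function scrollUntilVisible(
  run: Runner,
  label: string,
  maxScrolls = 6,
): Promise<{ found: boolean; scrolls: number }> {
  let previous = '';
  let scrolls = 0;
  while (true) {
    const current = await run.snapshot();
    const step = nextScrollStep(current, label);
    if (step === 'visible') return { found: true, scrolls };
    // Give up when the screen stopped changing or the scroll cap is reached.
    if (current === previous || scrolls >= maxScrolls) return { found: false, scrolls };
    previous = current;
    // Follow the hint direction; with no hint, scroll down like the shell loop.
    await run.scroll(step === 'up' ? 'up' : 'down');
    scrolls += 1;
  }
}
```

Ignoring hint lines is what keeps the loop from stopping early on `[off-screen below] ... "Battery"`. Like `grep`, the label check is a plain substring match, so a label such as `Battery Health` would also satisfy a search for `Battery`.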
diff --git a/src/cli/commands/generic.ts b/src/cli/commands/generic.ts index a475e6113..319207780 100644 --- a/src/cli/commands/generic.ts +++ b/src/cli/commands/generic.ts @@ -176,17 +176,6 @@ export const genericClientCommandHandlers = { pixels: flags.pixels, }), ), - [CLIENT_COMMANDS.scrollIntoView]: createGenericClientCommandHandler( - CLIENT_COMMANDS.scrollIntoView, - ({ client, positionals, flags }) => - client.interactions.scrollIntoView({ - ...buildSelectionOptions(flags), - ...(positionals[0]?.startsWith('@') - ? { ref: positionals[0], label: positionals.slice(1).join(' ') || undefined } - : { text: positionals.join(' ') }), - maxScrolls: flags.maxScrolls, - }), - ), [CLIENT_COMMANDS.pinch]: createGenericClientCommandHandler( CLIENT_COMMANDS.pinch, ({ client, positionals, flags }) => diff --git a/src/cli/commands/output.ts b/src/cli/commands/output.ts index 29e45390a..848fcfa63 100644 --- a/src/cli/commands/output.ts +++ b/src/cli/commands/output.ts @@ -130,9 +130,6 @@ export function writeCommandCliOutput( const successText = readCommandMessage(data); if (successText) { process.stdout.write(`${successText}\n`); - for (const extraLine of readCommandSuccessLines(command, data)) { - process.stdout.write(`${extraLine}\n`); - } } return 0; } @@ -216,12 +213,3 @@ function writeNetworkCliOutput(data: Record): void { } } } - -function readCommandSuccessLines(command: string, data: Record): string[] { - if (command !== CLIENT_COMMANDS.scrollIntoView) { - return []; - } - const ref = typeof data.ref === 'string' ? data.ref : ''; - const currentRef = typeof data.currentRef === 'string' ? data.currentRef : ''; - return currentRef && currentRef !== ref ? 
[`Current ref: @${currentRef}`] : []; -} diff --git a/src/client-command-registry.ts b/src/client-command-registry.ts index c9d791bb5..b33590b5f 100644 --- a/src/client-command-registry.ts +++ b/src/client-command-registry.ts @@ -28,7 +28,6 @@ export const CLIENT_COMMANDS = { replay: 'replay', rotate: 'rotate', scroll: 'scroll', - scrollIntoView: 'scrollintoview', screenshot: 'screenshot', settings: 'settings', snapshot: 'snapshot', diff --git a/src/client-normalizers.ts b/src/client-normalizers.ts index 3a6366640..af4768266 100644 --- a/src/client-normalizers.ts +++ b/src/client-normalizers.ts @@ -283,7 +283,6 @@ export function buildFlags(options: InternalRequestOptions): CommandFlags { clickButton: options.clickButton, pauseMs: options.pauseMs, pattern: options.pattern, - maxScrolls: options.maxScrolls, headless: options.headless, restart: options.restart, replayUpdate: options.replayUpdate, diff --git a/src/client-types.ts b/src/client-types.ts index 79d91efc4..e75a41855 100644 --- a/src/client-types.ts +++ b/src/client-types.ts @@ -527,22 +527,6 @@ export type ScrollOptions = ClientCommandBaseOptions & { pixels?: number; }; -export type ScrollIntoViewOptions = ClientCommandBaseOptions & - ( - | { - text: string; - ref?: never; - label?: never; - } - | { - ref: string; - label?: string; - text?: never; - } - ) & { - maxScrolls?: number; - }; - export type PinchOptions = ClientCommandBaseOptions & { scale: number; x?: number; @@ -704,7 +688,6 @@ type CommandExecutionOptions = { clickButton?: 'primary' | 'secondary' | 'middle'; pauseMs?: number; pattern?: 'one-way' | 'ping-pong'; - maxScrolls?: number; headless?: boolean; restart?: boolean; replayUpdate?: boolean; @@ -802,7 +785,6 @@ export type AgentDeviceClient = { type: (options: TypeTextOptions) => Promise; fill: (options: FillOptions) => Promise; scroll: (options: ScrollOptions) => Promise; - scrollIntoView: (options: ScrollIntoViewOptions) => Promise; pinch: (options: PinchOptions) => Promise; get: 
(options: GetOptions) => Promise; is: (options: IsOptions) => Promise; diff --git a/src/client.ts b/src/client.ts index 85750a934..d9135c97c 100644 --- a/src/client.ts +++ b/src/client.ts @@ -351,12 +351,6 @@ export function createAgentDeviceClient( [options.direction, ...optionalNumber(options.amount)], options, ), - scrollIntoView: async (options) => - await executeCommandRequest( - CLIENT_COMMANDS.scrollIntoView, - scrollIntoViewPositionals(options), - options, - ), pinch: async (options) => await executeCommandRequest( CLIENT_COMMANDS.pinch, @@ -458,15 +452,6 @@ function elementPositionals(options: ElementTarget): string[] { return [options.selector]; } -function scrollIntoViewPositionals(options: { - text?: string; - ref?: string; - label?: string; -}): string[] { - if (options.ref !== undefined) return [options.ref, ...optionalString(options.label)]; - return [options.text ?? '']; -} - function stringifyPayload(payload: AppPushOptions['payload']): string { return typeof payload === 'string' ? 
payload : JSON.stringify(payload); } @@ -615,7 +600,6 @@ export type { ReplayTestOptions, RotateCommandOptions, RotateCommandResult, - ScrollIntoViewOptions, ScrollOptions, SessionCloseResult, SettingsUpdateOptions, diff --git a/src/core/__tests__/capabilities.test.ts b/src/core/__tests__/capabilities.test.ts index 5d73f28ea..14d3d32d6 100644 --- a/src/core/__tests__/capabilities.test.ts +++ b/src/core/__tests__/capabilities.test.ts @@ -161,7 +161,6 @@ test('core commands support iOS simulator, iOS device, and Android', () => { 'rotate', 'screenshot', 'scroll', - 'scrollintoview', 'snapshot', 'trigger-app-event', 'type', @@ -200,7 +199,6 @@ test('macOS supports the Apple runner interaction core but excludes mobile-only 'settings', 'screenshot', 'scroll', - 'scrollintoview', 'snapshot', 'swipe', 'trigger-app-event', @@ -310,7 +308,6 @@ test('Linux supports desktop interaction commands and blocks mobile/unsupported 'record', 'reinstall', 'rotate', - 'scrollintoview', 'settings', 'trigger-app-event', ], diff --git a/src/core/capabilities.ts b/src/core/capabilities.ts index e5de56139..18e74d485 100644 --- a/src/core/capabilities.ts +++ b/src/core/capabilities.ts @@ -203,11 +203,6 @@ const COMMAND_CAPABILITY_MATRIX: Record = { android: { emulator: true, device: true, unknown: true }, linux: LINUX_DEVICE, }, - scrollintoview: { - apple: { simulator: true, device: true }, - android: { emulator: true, device: true, unknown: true }, - linux: LINUX_NONE, - }, swipe: { apple: { simulator: true, device: true }, android: { emulator: true, device: true, unknown: true }, diff --git a/src/core/dispatch.ts b/src/core/dispatch.ts index c6420bb5b..79c9b67c1 100644 --- a/src/core/dispatch.ts +++ b/src/core/dispatch.ts @@ -73,7 +73,6 @@ type DispatchContext = { backMode?: 'in-app' | 'system'; pauseMs?: number; pattern?: 'one-way' | 'ping-pong'; - maxScrolls?: number; surface?: SessionSurface; }; @@ -166,21 +165,6 @@ export async function dispatchCommand( } case 'scroll': return 
handleScrollCommand(interactor, positionals, context); - case 'scrollintoview': { - const text = positionals.join(' ').trim(); - if (!text) throw new AppError('INVALID_ARGS', 'scrollintoview requires text'); - const result = await interactor.scrollIntoView(text, { - maxScrolls: context?.maxScrolls, - }); - if (typeof result?.attempts === 'number') { - return { - text, - attempts: result.attempts, - ...successText(`Scrolled into view: ${text}`), - }; - } - return { text, ...successText(`Scrolled into view: ${text}`) }; - } case 'pinch': return handlePinchCommand(device, positionals, context, runnerCtx); case 'trigger-app-event': { diff --git a/src/core/interactors.ts b/src/core/interactors.ts index 3ea0946f1..5d240dafb 100644 --- a/src/core/interactors.ts +++ b/src/core/interactors.ts @@ -17,7 +17,6 @@ import { rotateAndroid, swipeAndroid, scrollAndroid, - scrollIntoViewAndroid, screenshotAndroid, setAndroidSetting, typeAndroid, @@ -104,10 +103,6 @@ export type Interactor = { direction: ScrollDirection, options?: { amount?: number; pixels?: number }, ): Promise | void>; - scrollIntoView( - text: string, - options?: { maxScrolls?: number }, - ): Promise<{ attempts?: number } | void>; screenshot(outPath: string, options?: ScreenshotOptions): Promise; back(mode?: BackMode): Promise; home(): Promise; @@ -141,7 +136,6 @@ export function getInteractor(device: DeviceInfo, runnerContext: RunnerContext): type: (text, delayMs) => typeAndroid(device, text, delayMs), fill: (x, y, text, delayMs) => fillAndroid(device, x, y, text, delayMs), scroll: (direction, options) => scrollAndroid(device, direction, options), - scrollIntoView: (text, options) => scrollIntoViewAndroid(device, text, options), screenshot: (outPath) => screenshotAndroid(device, outPath), back: (_mode) => backAndroid(device), home: () => homeAndroid(device), @@ -165,9 +159,6 @@ export function getInteractor(device: DeviceInfo, runnerContext: RunnerContext): type: (text, delayMs) => typeLinux(text, delayMs), fill: 
(x, y, text, delayMs) => fillLinux(x, y, text, delayMs), scroll: (direction, options) => scrollLinux(direction, options), - scrollIntoView: () => { - throw new AppError('UNSUPPORTED_OPERATION', 'scrollIntoView not yet supported on Linux'); - }, screenshot: (outPath) => screenshotLinux(outPath), back: () => backLinux(), home: () => homeLinux(), diff --git a/src/daemon/__tests__/scroll-planner.test.ts b/src/daemon/__tests__/scroll-planner.test.ts deleted file mode 100644 index c6ed9a65f..000000000 --- a/src/daemon/__tests__/scroll-planner.test.ts +++ /dev/null @@ -1,84 +0,0 @@ -import { test } from 'vitest'; -import assert from 'node:assert/strict'; -import { type RawSnapshotNode } from '../../utils/snapshot.ts'; -import { - buildScrollIntoViewPlan, - distanceFromSafeViewportBand, - isRectVisibleInViewport, - isRectWithinSafeViewportBand, - resolveViewportRect, -} from '../scroll-planner.ts'; - -function makeNode(index: number, type: string, rect?: RawSnapshotNode['rect']): RawSnapshotNode { - return { index, type, rect }; -} - -test('resolveViewportRect picks containing application/window viewport', () => { - const targetRect = { x: 20, y: 1700, width: 120, height: 40 }; - const nodes: RawSnapshotNode[] = [ - makeNode(0, 'Application', { x: 0, y: 0, width: 390, height: 844 }), - makeNode(1, 'Window', { x: 0, y: 0, width: 390, height: 844 }), - makeNode(2, 'Cell', targetRect), - ]; - const viewport = resolveViewportRect(nodes, targetRect); - assert.deepEqual(viewport, { x: 0, y: 0, width: 390, height: 844 }); -}); - -test('resolveViewportRect returns null when no valid viewport can be inferred', () => { - const targetRect = { x: 20, y: 100, width: 120, height: 40 }; - const nodes: RawSnapshotNode[] = [makeNode(0, 'Cell', undefined)]; - const viewport = resolveViewportRect(nodes, targetRect); - assert.equal(viewport, null); -}); - -test('buildScrollIntoViewPlan computes downward content scroll when target is below safe band', () => { - const targetRect = { x: 20, y: 
2100, width: 120, height: 40 }; - const viewportRect = { x: 0, y: 0, width: 390, height: 844 }; - const plan = buildScrollIntoViewPlan(targetRect, viewportRect); - assert.ok(plan); - assert.equal(plan?.direction, 'down'); - assert.equal(plan?.x, 80); - assert.equal(plan?.startY, 726); - assert.equal(plan?.endY, 118); -}); - -test('buildScrollIntoViewPlan returns null when already in safe viewport band', () => { - const targetRect = { x: 20, y: 320, width: 120, height: 40 }; - const viewportRect = { x: 0, y: 0, width: 390, height: 844 }; - const plan = buildScrollIntoViewPlan(targetRect, viewportRect); - assert.equal(plan, null); - assert.equal(isRectWithinSafeViewportBand(targetRect, viewportRect), true); - assert.equal(distanceFromSafeViewportBand(targetRect, viewportRect), 0); -}); - -test('buildScrollIntoViewPlan keeps swipe lane inside viewport when target center is out of bounds', () => { - const targetRect = { x: 1000, y: 2100, width: 120, height: 40 }; - const viewportRect = { x: 0, y: 0, width: 390, height: 844 }; - const plan = buildScrollIntoViewPlan(targetRect, viewportRect); - assert.ok(plan); - assert.equal(plan?.x, 351); -}); - -test('distanceFromSafeViewportBand reports pixels outside the safe band', () => { - const viewportRect = { x: 0, y: 0, width: 390, height: 844 }; - assert.equal( - distanceFromSafeViewportBand({ x: 20, y: 2100, width: 120, height: 40 }, viewportRect) > 0, - true, - ); - assert.equal( - distanceFromSafeViewportBand({ x: 20, y: -200, width: 120, height: 40 }, viewportRect) > 0, - true, - ); -}); - -test('isRectVisibleInViewport treats zero-height lines inside the viewport as visible', () => { - const viewportRect = { x: 0, y: 0, width: 390, height: 844 }; - assert.equal( - isRectVisibleInViewport({ x: 20, y: 378, width: 120, height: 0 }, viewportRect), - true, - ); - assert.equal( - isRectVisibleInViewport({ x: 20, y: 1200, width: 120, height: 0 }, viewportRect), - false, - ); -}); diff --git a/src/daemon/context.ts 
b/src/daemon/context.ts index c4a063f94..8e9a68b07 100644 --- a/src/daemon/context.ts +++ b/src/daemon/context.ts @@ -27,7 +27,6 @@ export type DaemonCommandContext = { backMode?: 'in-app' | 'system'; pauseMs?: number; pattern?: 'one-way' | 'ping-pong'; - maxScrolls?: number; surface?: SessionSurface; }; @@ -63,6 +62,5 @@ export function contextFromFlags( backMode: flags?.backMode, pauseMs: flags?.pauseMs, pattern: flags?.pattern, - maxScrolls: flags?.maxScrolls, }; } diff --git a/src/daemon/handlers/__tests__/interaction.test.ts b/src/daemon/handlers/__tests__/interaction.test.ts index 7e55a7de0..1fde8e567 100644 --- a/src/daemon/handlers/__tests__/interaction.test.ts +++ b/src/daemon/handlers/__tests__/interaction.test.ts @@ -11,7 +11,6 @@ import { makeAndroidSession as makeBaseAndroidSession, makeMacOsSession as makeBaseMacOsSession, } from '../../../__tests__/test-utils/session-factories.ts'; -import { makeSnapshotState } from '../../../__tests__/test-utils/snapshot-builders.ts'; vi.mock('../../../core/dispatch.ts', async (importOriginal) => { const actual = await importOriginal(); @@ -87,21 +86,6 @@ function makeAndroidSession(name: string): SessionState { return makeBaseAndroidSession(name, { appBundleId: 'com.android.settings' }); } -function makeScrollSnapshot(nodes: Parameters[0]) { - return makeSnapshotState(nodes, { backend: 'xctest' }); -} - -function makeScrollSession( - sessionStore: SessionStore, - sessionName: string, - nodes: Parameters[0], -): SessionState { - const session = makeSession(sessionName); - session.snapshot = makeScrollSnapshot(nodes); - sessionStore.set(sessionName, session); - return session; -} - function makeMacOsDesktopSession(name: string): SessionState { return makeBaseMacOsSession(name, { surface: 'desktop' }); } @@ -1303,7 +1287,7 @@ test('press @ref fails fast when the target is off-screen', async () => { if (response && !response.ok) { expect(response.error.code).toBe('COMMAND_FAILED'); 
expect(response.error.message).toMatch(/off-screen/i); - expect(response.error.hint).toMatch(/scrollintoview @e2/i); + expect(response.error.hint).toMatch(/scroll.*fresh snapshot/i); expect(response.error.details?.reason).toBe('offscreen_ref'); } }); @@ -1431,7 +1415,7 @@ test('fill @ref fails fast when the target is off-screen', async () => { if (response && !response.ok) { expect(response.error.code).toBe('COMMAND_FAILED'); expect(response.error.message).toMatch(/off-screen/i); - expect(response.error.hint).toMatch(/scrollintoview @e2/i); + expect(response.error.hint).toMatch(/scroll.*fresh snapshot/i); expect(response.error.details?.reason).toBe('offscreen_ref'); } }); @@ -1540,388 +1524,6 @@ test('press coordinates does not treat extra trailing args as selector', async ( expect(sessionStore.get(sessionName)?.actions.length).toBe(1); }); -test('scrollintoview @ref dispatches geometry-based swipe series with verification snapshots', async () => { - const sessionStore = makeSessionStore(); - const sessionName = 'default'; - makeScrollSession(sessionStore, sessionName, [ - { - index: 0, - type: 'Application', - rect: { x: 0, y: 0, width: 390, height: 844 }, - }, - { - index: 1, - type: 'XCUIElementTypeStaticText', - label: 'Far item', - rect: { x: 20, y: 2600, width: 120, height: 40 }, - }, - ]); - - let snapshotCallCount = 0; - mockCaptureSnapshotForSession.mockImplementation(async (activeSession) => { - snapshotCallCount += 1; - activeSession.snapshot = makeScrollSnapshot([ - { index: 0, type: 'Application', rect: { x: 0, y: 0, width: 390, height: 844 } }, - ...(snapshotCallCount === 1 - ? [ - { - index: 1, - type: 'XCUIElementTypeStaticText', - label: 'Inserted item', - rect: { x: 20, y: 900, width: 120, height: 40 }, - }, - ] - : []), - { - index: snapshotCallCount === 1 ? 2 : 1, - type: 'XCUIElementTypeStaticText', - label: 'Far item', - rect: - snapshotCallCount === 1 - ? 
{ x: 20, y: 1300, width: 120, height: 40 } - : { x: 20, y: 320, width: 120, height: 40 }, - }, - ]); - return activeSession.snapshot; - }); - - const response = await handleInteractionCommands({ - req: { - token: 't', - session: sessionName, - command: 'scrollintoview', - positionals: ['@e2'], - flags: {}, - }, - sessionName, - sessionStore, - contextFromFlags, - }); - - expect(response).toBeTruthy(); - expect(response?.ok).toBe(true); - expect(mockDispatch).toHaveBeenCalledTimes(2); - expect(mockDispatch.mock.calls[0]?.[1]).toBe('swipe'); - expect(mockDispatch.mock.calls[0]?.[2]?.length).toBe(5); - const context = mockDispatch.mock.calls[0]?.[4] as Record | undefined; - expect(context?.pattern).toBe('one-way'); - expect(context?.pauseMs).toBe(0); - expect(context?.count).toBe(1); - expect(mockDispatch.mock.calls[1]?.[1]).toBe('swipe'); - expect(snapshotCallCount).toBe(2); - - const stored = sessionStore.get(sessionName); - expect(stored).toBeTruthy(); - expect(stored?.actions.length).toBe(1); - expect(stored?.actions[0]?.command).toBe('scrollintoview'); - const result = (stored?.actions[0]?.result ?? 
{}) as Record; - expect(result.ref).toBe('e2'); - expect(result.attempts).toBe(2); -}); - -test('scrollintoview @ref returns the refreshed visible ref after scrolling', async () => { - const sessionStore = makeSessionStore(); - const sessionName = 'default'; - makeScrollSession(sessionStore, sessionName, [ - { - index: 0, - type: 'Application', - rect: { x: 0, y: 0, width: 390, height: 844 }, - }, - { - index: 1, - type: 'XCUIElementTypeStaticText', - label: 'Far item', - rect: { x: 20, y: 2600, width: 120, height: 40 }, - }, - ]); - - mockCaptureSnapshotForSession.mockImplementation(async (activeSession) => { - activeSession.snapshot = makeScrollSnapshot([ - { index: 0, type: 'Application', rect: { x: 0, y: 0, width: 390, height: 844 } }, - { - index: 1, - type: 'XCUIElementTypeStaticText', - label: 'Sticky helper', - rect: { x: 20, y: 160, width: 140, height: 40 }, - }, - { - index: 2, - type: 'XCUIElementTypeStaticText', - label: 'Far item', - rect: { x: 20, y: 320, width: 120, height: 40 }, - }, - ]); - return activeSession.snapshot; - }); - - const response = await handleInteractionCommands({ - req: { - token: 't', - session: sessionName, - command: 'scrollintoview', - positionals: ['@e2'], - flags: {}, - }, - sessionName, - sessionStore, - contextFromFlags, - }); - - expect(response).toBeTruthy(); - expect(response?.ok).toBe(true); - if (response?.ok) { - expect(response.data?.ref).toBe('e2'); - expect(response.data?.currentRef).toBe('e3'); - } - - const stored = sessionStore.get(sessionName); - const result = (stored?.actions[0]?.result ?? 
{}) as Record; - expect(result.ref).toBe('e2'); - expect(result.currentRef).toBe('e3'); -}); - -test('scrollintoview @ref returns immediately when target is already in viewport safe band', async () => { - const sessionStore = makeSessionStore(); - const sessionName = 'default'; - makeScrollSession(sessionStore, sessionName, [ - { - index: 0, - type: 'Application', - rect: { x: 0, y: 0, width: 390, height: 844 }, - }, - { - index: 1, - type: 'XCUIElementTypeStaticText', - label: 'Visible item', - rect: { x: 20, y: 320, width: 120, height: 40 }, - }, - ]); - - const response = await handleInteractionCommands({ - req: { - token: 't', - session: sessionName, - command: 'scrollintoview', - positionals: ['@e2'], - flags: {}, - }, - sessionName, - sessionStore, - contextFromFlags, - }); - - expect(response).toBeTruthy(); - expect(response?.ok).toBe(true); - expect(mockDispatch).not.toHaveBeenCalled(); - if (response?.ok) { - expect(response.data?.attempts).toBe(0); - expect(response.data?.alreadyVisible).toBe(true); - } -}); - -test('scrollintoview @ref missing from snapshot reports structured not-found details', async () => { - const sessionStore = makeSessionStore(); - const sessionName = 'default'; - makeScrollSession(sessionStore, sessionName, [ - { - index: 0, - type: 'Application', - rect: { x: 0, y: 0, width: 390, height: 844 }, - }, - ]); - - const response = await handleInteractionCommands({ - req: { - token: 't', - session: sessionName, - command: 'scrollintoview', - positionals: ['@e2'], - flags: {}, - }, - sessionName, - sessionStore, - contextFromFlags, - }); - - expect(response).toBeTruthy(); - expect(response?.ok).toBe(false); - if (response && !response.ok) { - expect(response.error.message).toMatch(/not found/i); - expect(response.error.details?.reason).toBe('not_found'); - expect(response.error.details?.attempts).toBe(0); - } -}); - -test('scrollintoview @ref tolerates a single overshoot and recovers on the next swipe', async () => { - const sessionStore 
= makeSessionStore(); - const sessionName = 'default'; - makeScrollSession(sessionStore, sessionName, [ - { - index: 0, - type: 'Application', - rect: { x: 0, y: 0, width: 390, height: 844 }, - }, - { - index: 1, - type: 'XCUIElementTypeStaticText', - label: 'Edge item', - rect: { x: 20, y: 700, width: 120, height: 40 }, - }, - ]); - - let snapshotCallCount = 0; - mockCaptureSnapshotForSession.mockImplementation(async (activeSession) => { - snapshotCallCount += 1; - activeSession.snapshot = makeScrollSnapshot([ - { index: 0, type: 'Application', rect: { x: 0, y: 0, width: 390, height: 844 } }, - { - index: 1, - type: 'XCUIElementTypeStaticText', - label: 'Edge item', - rect: - snapshotCallCount === 1 - ? { x: 20, y: 0, width: 120, height: 40 } - : { x: 20, y: 320, width: 120, height: 40 }, - }, - ]); - return activeSession.snapshot; - }); - - const response = await handleInteractionCommands({ - req: { - token: 't', - session: sessionName, - command: 'scrollintoview', - positionals: ['@e2'], - flags: {}, - }, - sessionName, - sessionStore, - contextFromFlags, - }); - - expect(response).toBeTruthy(); - expect(response?.ok).toBe(true); - expect(snapshotCallCount).toBe(2); - if (response?.ok) { - expect(response.data?.attempts).toBe(2); - } -}); - -test('scrollintoview @ref stops when post-scroll snapshots make no progress', async () => { - const sessionStore = makeSessionStore(); - const sessionName = 'default'; - makeScrollSession(sessionStore, sessionName, [ - { - index: 0, - type: 'Application', - rect: { x: 0, y: 0, width: 390, height: 844 }, - }, - { - index: 1, - type: 'XCUIElementTypeStaticText', - label: 'Far item', - rect: { x: 20, y: 2600, width: 120, height: 40 }, - }, - ]); - - let snapshotCallCount = 0; - mockCaptureSnapshotForSession.mockImplementation(async (activeSession) => { - snapshotCallCount += 1; - activeSession.snapshot = makeScrollSnapshot([ - { index: 0, type: 'Application', rect: { x: 0, y: 0, width: 390, height: 844 } }, - { - index: 1, - 
type: 'XCUIElementTypeStaticText', - label: 'Far item', - rect: { x: 20, y: 2600, width: 120, height: 40 }, - }, - ]); - return activeSession.snapshot; - }); - - const response = await handleInteractionCommands({ - req: { - token: 't', - session: sessionName, - command: 'scrollintoview', - positionals: ['@e2'], - flags: {}, - }, - sessionName, - sessionStore, - contextFromFlags, - }); - - expect(response).toBeTruthy(); - expect(response?.ok).toBe(false); - expect(snapshotCallCount).toBe(2); - if (response && !response.ok) { - expect(response.error.message).toMatch(/made no progress/i); - expect(response.error.details?.reason).toBe('not_found'); - expect(response.error.details?.attempts).toBe(2); - } -}); - -test('scrollintoview @ref respects --max-scrolls before failing not found', async () => { - const sessionStore = makeSessionStore(); - const sessionName = 'default'; - makeScrollSession(sessionStore, sessionName, [ - { - index: 0, - type: 'Application', - rect: { x: 0, y: 0, width: 390, height: 844 }, - }, - { - index: 1, - type: 'XCUIElementTypeStaticText', - label: 'Far item', - rect: { x: 20, y: 2600, width: 120, height: 40 }, - }, - ]); - - let snapshotCallCount = 0; - mockCaptureSnapshotForSession.mockImplementation(async (activeSession) => { - snapshotCallCount += 1; - activeSession.snapshot = makeScrollSnapshot([ - { index: 0, type: 'Application', rect: { x: 0, y: 0, width: 390, height: 844 } }, - { - index: 1, - type: 'XCUIElementTypeStaticText', - label: 'Far item', - rect: - snapshotCallCount === 1 - ? 
{ x: 20, y: 1900, width: 120, height: 40 } - : { x: 20, y: 1200, width: 120, height: 40 }, - }, - ]); - return activeSession.snapshot; - }); - - const response = await handleInteractionCommands({ - req: { - token: 't', - session: sessionName, - command: 'scrollintoview', - positionals: ['@e2'], - flags: { maxScrolls: 2 }, - }, - sessionName, - sessionStore, - contextFromFlags, - }); - - expect(response).toBeTruthy(); - expect(response?.ok).toBe(false); - expect(snapshotCallCount).toBe(2); - if (response && !response.ok) { - expect(response.error.message).toMatch(/--max-scrolls=2/); - expect(response.error.details?.reason).toBe('not_found'); - expect(response.error.details?.attempts).toBe(2); - } -}); - test('is visible captures one snapshot before evaluating selector predicate', async () => { const sessionStore = makeSessionStore(); const sessionName = 'default'; diff --git a/src/daemon/handlers/interaction-flags.ts b/src/daemon/handlers/interaction-flags.ts index 80612ec4e..891bc052c 100644 --- a/src/daemon/handlers/interaction-flags.ts +++ b/src/daemon/handlers/interaction-flags.ts @@ -9,7 +9,7 @@ const REF_UNSUPPORTED_FLAG_MAP: ReadonlyArray<[keyof CommandFlags, string]> = [ ]; export function refSnapshotFlagGuardResponse( - command: 'press' | 'fill' | 'get' | 'scrollintoview', + command: 'press' | 'fill' | 'get', flags: CommandFlags | undefined, ): DaemonResponse | null { const unsupported = unsupportedRefSnapshotFlags(flags); diff --git a/src/daemon/handlers/interaction-scroll.ts b/src/daemon/handlers/interaction-scroll.ts deleted file mode 100644 index 6432087de..000000000 --- a/src/daemon/handlers/interaction-scroll.ts +++ /dev/null @@ -1,344 +0,0 @@ -import { isCommandSupportedOnDevice } from '../../core/capabilities.ts'; -import { dispatchCommand } from '../../core/dispatch.ts'; -import { DEFAULT_SCROLL_INTO_VIEW_MAX_SCROLLS } from '../../utils/scroll-into-view.ts'; -import type { SnapshotNode } from '../../utils/snapshot.ts'; -import { successText } from 
'../../utils/success-text.ts'; -import { - buildSelectorChainForNode, - parseSelectorChain, - resolveSelectorChain, -} from '../selectors.ts'; -import { findNodeByLabel, resolveRefLabel } from '../snapshot-processing.ts'; -import { - buildScrollIntoViewPlan, - distanceFromSafeViewportBand, - resolveViewportRect, -} from '../scroll-planner.ts'; -import type { DaemonResponse, SessionState } from '../types.ts'; -import type { InteractionHandlerParams } from './interaction-common.ts'; -import { refSnapshotFlagGuardResponse } from './interaction-flags.ts'; -import { captureSnapshotForSession } from './interaction-snapshot.ts'; -import { resolveRefTarget } from './interaction-targeting.ts'; -import { errorResponse, type DaemonFailureResponse } from './response.ts'; - -type ScrollRefState = { - ref: string; - currentRef: string; - node: SnapshotNode & { rect: NonNullable }; - snapshotNodes: SnapshotNode[]; - viewportRect: NonNullable>; -}; - -type ScrollNotFoundDetails = { - message?: string; - ref?: string; - stalled?: boolean; - maxScrolls?: number; -}; - -export async function handleScrollIntoViewCommand( - params: InteractionHandlerParams, -): Promise { - const { req, sessionName, sessionStore, contextFromFlags } = params; - const session = sessionStore.get(sessionName); - if (!session) { - return errorResponse('SESSION_NOT_FOUND', 'No active session. Run open first.'); - } - if (!isCommandSupportedOnDevice('scrollintoview', session.device)) { - return errorResponse('UNSUPPORTED_OPERATION', 'scrollintoview is not supported on this device'); - } - const targetInput = req.positionals?.[0] ?? ''; - if (!targetInput.startsWith('@')) { - return null; - } - const invalidRefFlagsResponse = refSnapshotFlagGuardResponse('scrollintoview', req.flags); - if (invalidRefFlagsResponse) return invalidRefFlagsResponse; - const fallbackLabel = - req.positionals && req.positionals.length > 1 ? 
req.positionals.slice(1).join(' ').trim() : ''; - const initialState = resolveInitialScrollRefState(session, targetInput, fallbackLabel); - if (!initialState.ok) return initialState; - - const { ref } = initialState.state; - let { currentRef, node, snapshotNodes, viewportRect } = initialState.state; - const refLabel = resolveRefLabel(node, snapshotNodes); - const selectorChain = buildSelectorChainForNode(node, session.device.platform, { - action: 'get', - }); - const trackingLabel = fallbackLabel || refLabel || node.label || ''; - - if (!buildScrollIntoViewPlan(node.rect, viewportRect)) { - const result = buildScrollIntoViewSuccessData({ - ref, - currentRef, - attempts: 0, - alreadyVisible: true, - }); - sessionStore.recordAction(session, { - command: req.command, - positionals: req.positionals ?? [], - flags: req.flags ?? {}, - result: { - refLabel, - selectorChain, - ...result, - }, - }); - return { ok: true, data: result }; - } - const maxScrolls = req.flags?.maxScrolls ?? DEFAULT_SCROLL_INTO_VIEW_MAX_SCROLLS; - let attempts = 0; - let stalledCount = 0; - let lastDirection: 'up' | 'down' | undefined; - let lastDistance = distanceFromSafeViewportBand(node.rect, viewportRect); - let data: Record | void = undefined; - - while (attempts < maxScrolls) { - const plan = buildScrollIntoViewPlan(node.rect, viewportRect); - if (!plan) break; - lastDirection = plan.direction; - data = await dispatchCommand( - session.device, - 'swipe', - [String(plan.x), String(plan.startY), String(plan.x), String(plan.endY), '16'], - req.flags?.out, - { - ...contextFromFlags(req.flags, session.appBundleId, session.trace?.outPath), - count: 1, - pauseMs: 0, - pattern: 'one-way', - }, - ); - attempts += 1; - - await captureSnapshotForSession(session, req.flags, sessionStore, contextFromFlags, { - interactiveOnly: true, - }); - const refreshedState = resolveRefreshedScrollRefState({ - session, - targetInput, - fallbackLabel: trackingLabel, - attempts, - ref, - selectorChain, - platform: 
session.device.platform, - }); - if (!refreshedState.ok) return refreshedState; - ({ currentRef, node, snapshotNodes, viewportRect } = refreshedState.state); - - const distance = distanceFromSafeViewportBand(node.rect, viewportRect); - if (distance === 0) break; - // If the target didn't get closer after a scroll, count it as a stall. Two - // consecutive stalls (no progress in either attempt) indicate the element is - // likely unreachable — e.g. inside a non-scrollable container or already at - // the scroll boundary. - if (distance >= lastDistance) { - stalledCount += 1; - if (stalledCount >= 2) { - return notFoundScrollResponse(targetInput, attempts, { - message: `scrollintoview made no progress toward ${targetInput} after ${attempts} scroll${attempts === 1 ? '' : 's'}`, - ref, - stalled: true, - }); - } - } else { - stalledCount = 0; - } - lastDistance = distance; - } - - if (distanceFromSafeViewportBand(node.rect, viewportRect) > 0) { - return notFoundScrollResponse(targetInput, attempts, { - message: `scrollintoview reached --max-scrolls=${maxScrolls} before ${targetInput} entered view`, - ref, - maxScrolls, - }); - } - - const result = buildScrollIntoViewSuccessData({ - data, - ref, - currentRef, - attempts, - direction: lastDirection, - }); - sessionStore.recordAction(session, { - command: req.command, - positionals: req.positionals ?? [], - flags: req.flags ?? 
{}, - result: { - refLabel, - selectorChain, - ...result, - }, - }); - return { ok: true, data: result }; -} - -function resolveInitialScrollRefState( - session: SessionState, - targetInput: string, - fallbackLabel: string, -): { ok: true; state: ScrollRefState } | DaemonFailureResponse { - const resolvedRefTarget = resolveRefTarget({ - session, - refInput: targetInput, - fallbackLabel, - requireRect: true, - invalidRefMessage: 'scrollintoview requires a ref like @e2', - notFoundMessage: `Ref ${targetInput} not found or has no bounds`, - }); - if (!resolvedRefTarget.ok) { - if (resolvedRefTarget.error.code !== 'COMMAND_FAILED') { - return resolvedRefTarget; - } - return notFoundScrollResponse(targetInput, 0, { - message: resolvedRefTarget.error.message, - }); - } - return finalizeScrollRefState(targetInput, 0, resolvedRefTarget.target); -} - -function resolveRefreshedScrollRefState(params: { - session: SessionState; - targetInput: string; - fallbackLabel: string; - attempts: number; - ref: string; - selectorChain: string[]; - platform: SessionState['device']['platform']; -}): { ok: true; state: ScrollRefState } | DaemonFailureResponse { - const { session, targetInput, fallbackLabel, attempts, ref, selectorChain, platform } = params; - if (session.snapshot) { - const trackedNode = resolveTrackedScrollNode( - session.snapshot.nodes, - selectorChain, - fallbackLabel, - platform, - ); - if (trackedNode) { - return finalizeScrollRefState( - targetInput, - attempts, - { - ref, - node: trackedNode, - snapshotNodes: session.snapshot.nodes, - }, - { currentRef: trackedNode.ref }, - ); - } - } - - const resolvedRefTarget = resolveRefTarget({ - session, - refInput: targetInput, - fallbackLabel, - requireRect: true, - invalidRefMessage: 'scrollintoview requires a ref like @e2', - notFoundMessage: `Ref ${targetInput} not found or has no bounds`, - }); - if (!resolvedRefTarget.ok) { - if (resolvedRefTarget.error.code !== 'COMMAND_FAILED') { - return resolvedRefTarget; - } - 
return notFoundScrollResponse(targetInput, attempts, { - message: `scrollintoview lost track of ${targetInput} after ${attempts} scroll${attempts === 1 ? '' : 's'}`, - ref, - }); - } - return finalizeScrollRefState(targetInput, attempts, resolvedRefTarget.target, { - ref, - currentRef: resolvedRefTarget.target.node.ref, - missingBoundsMessage: `scrollintoview lost bounds for ${targetInput} after ${attempts} scroll${attempts === 1 ? '' : 's'}`, - }); -} - -function finalizeScrollRefState( - targetInput: string, - attempts: number, - resolvedTarget: { ref: string; node: SnapshotNode; snapshotNodes: SnapshotNode[] }, - options: { ref?: string; currentRef?: string; missingBoundsMessage?: string } = {}, -): { ok: true; state: ScrollRefState } | DaemonFailureResponse { - const { ref, currentRef, missingBoundsMessage } = options; - const node = resolvedTarget.node; - if (!node.rect) { - return notFoundScrollResponse(targetInput, attempts, { - message: missingBoundsMessage ?? `Ref ${targetInput} not found or has no bounds`, - ref: ref ?? resolvedTarget.ref, - }); - } - const viewportRect = resolveViewportRect(resolvedTarget.snapshotNodes, node.rect); - if (!viewportRect) { - return errorResponse( - 'COMMAND_FAILED', - `scrollintoview could not infer viewport for ${targetInput}`, - ); - } - return { - ok: true, - state: { - ref: ref ?? resolvedTarget.ref, - currentRef: currentRef ?? 
resolvedTarget.node.ref, - node: node as ScrollRefState['node'], - snapshotNodes: resolvedTarget.snapshotNodes, - viewportRect, - }, - }; -} - -function resolveTrackedScrollNode( - nodes: SnapshotNode[], - selectorChain: string[], - fallbackLabel: string, - platform: SessionState['device']['platform'], -): SnapshotNode | null { - for (const selectorExpression of selectorChain) { - const resolved = resolveSelectorChain(nodes, parseSelectorChain(selectorExpression), { - platform, - requireRect: true, - requireUnique: true, - disambiguateAmbiguous: true, - }); - if (resolved?.node.rect) { - return resolved.node; - } - } - return fallbackLabel ? findNodeByLabel(nodes, fallbackLabel) : null; -} - -function notFoundScrollResponse( - targetInput: string, - attempts: number, - details: ScrollNotFoundDetails = {}, -): DaemonFailureResponse { - const { message, ...rest } = details; - return errorResponse( - 'COMMAND_FAILED', - typeof message === 'string' ? message : `scrollintoview could not find ${targetInput}`, - { - reason: 'not_found', - attempts, - ...rest, - }, - ); -} - -function buildScrollIntoViewSuccessData(params: { - data?: Record | void; - ref: string; - currentRef: string; - attempts: number; - alreadyVisible?: boolean; - direction?: 'up' | 'down'; -}): Record { - const { data, ref, currentRef, attempts, alreadyVisible, direction } = params; - return { - ...(data ?? {}), - ref, - currentRef, - attempts, - ...(alreadyVisible ? { alreadyVisible } : {}), - ...(direction ? 
{ direction } : {}), - ...successText(`Scrolled into view: @${ref}`), - }; -} diff --git a/src/daemon/handlers/interaction-targeting.ts b/src/daemon/handlers/interaction-targeting.ts index d2b529efe..28a2de738 100644 --- a/src/daemon/handlers/interaction-targeting.ts +++ b/src/daemon/handlers/interaction-targeting.ts @@ -189,7 +189,7 @@ export async function resolveRefTargetWithRectRefresh(params: { error: { code: 'COMMAND_FAILED', message: `Ref ${refInput} is off-screen and not safe to ${commandLabel}`, - hint: `Run scrollintoview ${refInput}, then retry ${commandLabel} with the returned currentRef or a fresh snapshot.`, + hint: `Use scroll with the direction from the off-screen summary, take a fresh snapshot, then retry ${commandLabel} with the new ref or a selector.`, details: { reason: 'offscreen_ref', ref, diff --git a/src/daemon/handlers/interaction.ts b/src/daemon/handlers/interaction.ts index 2b450e694..972cf5765 100644 --- a/src/daemon/handlers/interaction.ts +++ b/src/daemon/handlers/interaction.ts @@ -3,7 +3,6 @@ import type { InteractionHandlerParams } from './interaction-common.ts'; import { handleTouchInteractionCommands } from './interaction-touch.ts'; import { handleGetCommand } from './interaction-get.ts'; import { handleIsCommand } from './interaction-is.ts'; -import { handleScrollIntoViewCommand } from './interaction-scroll.ts'; import { captureSnapshotForSession } from './interaction-snapshot.ts'; import { resolveRefTarget } from './interaction-targeting.ts'; import { refSnapshotFlagGuardResponse } from './interaction-flags.ts'; @@ -28,8 +27,6 @@ export async function handleInteractionCommands( return await handleGetCommand(params); case 'is': return await handleIsCommand(params); - case 'scrollintoview': - return await handleScrollIntoViewCommand(params); default: return null; } diff --git a/src/daemon/scroll-planner.ts b/src/daemon/scroll-planner.ts deleted file mode 100644 index 4f9249ea6..000000000 --- a/src/daemon/scroll-planner.ts +++ 
/dev/null @@ -1,74 +0,0 @@ -import type { Rect } from '../utils/snapshot.ts'; -import { - distanceFromSafeViewportBand, - isRectVisibleInViewport, - isRectWithinSafeViewportBand, - resolveViewportRect, -} from '../utils/rect-visibility.ts'; - -type ScrollIntoViewPlan = { - x: number; - startY: number; - endY: number; - direction: 'up' | 'down'; -}; - -export { resolveViewportRect, isRectVisibleInViewport, isRectWithinSafeViewportBand }; -export { distanceFromSafeViewportBand }; - -export function buildScrollIntoViewPlan( - targetRect: Rect, - viewportRect: Rect, -): ScrollIntoViewPlan | null { - const viewportHeight = Math.max(1, viewportRect.height); - const viewportWidth = Math.max(1, viewportRect.width); - const viewportTop = viewportRect.y; - const viewportBottom = viewportRect.y + viewportHeight; - const viewportLeft = viewportRect.x; - const viewportRight = viewportRect.x + viewportWidth; - // The "safe band" is the middle 50% of the viewport — elements inside this band - // are considered comfortably visible. The 25% margins on each side account for - // toolbars, nav bars, and partially clipped content that may overlap the edges. - const safeTop = viewportTop + viewportHeight * 0.25; - const safeBottom = viewportBottom - viewportHeight * 0.25; - // Keep the swipe lane at least 8 px or 10% of viewport width from the edge to - // avoid triggering system edge gestures (iOS swipe-back, Android nav drawer). - const lanePaddingPx = Math.max(8, viewportWidth * 0.1); - const targetCenterY = targetRect.y + targetRect.height / 2; - const targetCenterX = targetRect.x + targetRect.width / 2; - - if (targetCenterY >= safeTop && targetCenterY <= safeBottom) { - return null; - } - - const x = Math.round( - clamp(targetCenterX, viewportLeft + lanePaddingPx, viewportRight - lanePaddingPx), - ); - // Drag from 86% to 14% of viewport height (~72% travel) to produce a reliable - // scroll gesture. 
Starting/ending too close to the edges risks triggering - // notification shade or home-indicator areas on modern devices. - const dragUpStartY = Math.round(viewportTop + viewportHeight * 0.86); - const dragUpEndY = Math.round(viewportTop + viewportHeight * 0.14); - const dragDownStartY = dragUpEndY; - const dragDownEndY = dragUpStartY; - - if (targetCenterY > safeBottom) { - return { - x, - startY: dragUpStartY, - endY: dragUpEndY, - direction: 'down', - }; - } - - return { - x, - startY: dragDownStartY, - endY: dragDownEndY, - direction: 'up', - }; -} - -function clamp(value: number, min: number, max: number): number { - return Math.min(max, Math.max(min, value)); -} diff --git a/src/index.ts b/src/index.ts index 89720f565..60258b5f4 100644 --- a/src/index.ts +++ b/src/index.ts @@ -74,7 +74,6 @@ export type { ReplayTestOptions, RotateCommandOptions, RotateCommandResult, - ScrollIntoViewOptions, ScrollOptions, SessionCloseResult, SettingsUpdateOptions, diff --git a/src/platforms/SNAPSHOT_CONTRACT.md b/src/platforms/SNAPSHOT_CONTRACT.md index dcab0b469..ef33c47be 100644 --- a/src/platforms/SNAPSHOT_CONTRACT.md +++ b/src/platforms/SNAPSHOT_CONTRACT.md @@ -91,14 +91,13 @@ Linux supports: `back`, `click`, `close`, `diff`, `fill`, `find`, `focus`, Not supported (blocked at capability level): `alert`, `app-switcher`, `apps`, `boot`, `install`, `keyboard`, `logs`, `network`, `perf`, `pinch`, -`push`, `record`, `reinstall`, `rotate`, `scrollintoview`, `settings`, +`push`, `record`, `reinstall`, `rotate`, `settings`, `trigger-app-event`. ### Known limitations - Input synthesis uses `xdotool` (X11) or `ydotool` (Wayland) — availability depends on the desktop environment. - On Wayland without `ydotool`, falls back to `xdotool` with a diagnostic warning (may not work). -- `scrollIntoView` is not yet implemented. - Clipboard requires `xclip`/`xsel` (X11) or `wl-copy`/`wl-paste` (Wayland). - Settings operations are not supported. 
diff --git a/src/platforms/android/index.ts b/src/platforms/android/index.ts index d52e98cd4..b2443fd2a 100644 --- a/src/platforms/android/index.ts +++ b/src/platforms/android/index.ts @@ -29,7 +29,6 @@ export { fillAndroid, readAndroidTextAtPoint, scrollAndroid, - scrollIntoViewAndroid, getAndroidScreenSize, } from './input-actions.ts'; diff --git a/src/platforms/android/input-actions.ts b/src/platforms/android/input-actions.ts index 3f70bdf1d..1a906647b 100644 --- a/src/platforms/android/input-actions.ts +++ b/src/platforms/android/input-actions.ts @@ -3,8 +3,7 @@ import { AppError } from '../../utils/errors.ts'; import type { DeviceInfo } from '../../utils/device.ts'; import type { DeviceRotation } from '../../core/device-rotation.ts'; import { buildScrollGesturePlan, type ScrollDirection } from '../../core/scroll-gesture.ts'; -import { DEFAULT_ANDROID_SCROLL_INTO_VIEW_MAX_SCROLLS } from '../../utils/scroll-into-view.ts'; -import { findBounds, parseBounds, readNodeAttributes } from './ui-hierarchy.ts'; +import { parseBounds, readNodeAttributes } from './ui-hierarchy.ts'; import { dumpUiHierarchy } from './snapshot.ts'; import { adbArgs, isClipboardShellUnsupported, sleep } from './adb.ts'; @@ -249,49 +248,6 @@ export async function scrollAndroid( return plan; } -export async function scrollIntoViewAndroid( - device: DeviceInfo, - text: string, - options?: { maxScrolls?: number }, -): Promise<{ attempts: number }> { - const maxScrolls = options?.maxScrolls ?? DEFAULT_ANDROID_SCROLL_INTO_VIEW_MAX_SCROLLS; - let previousXml = ''; - - try { - previousXml = await dumpUiHierarchy(device); - } catch (err) { - const message = err instanceof Error ? 
err.message : String(err); - throw new AppError('UNSUPPORTED_OPERATION', `uiautomator dump failed: ${message}`); - } - if (findBounds(previousXml, text)) return { attempts: 0 }; - - for (let attempts = 1; attempts <= maxScrolls; attempts += 1) { - await scrollAndroid(device, 'down', { amount: 0.5 }); - - let xml = ''; - try { - xml = await dumpUiHierarchy(device); - } catch (err) { - const message = err instanceof Error ? err.message : String(err); - throw new AppError('UNSUPPORTED_OPERATION', `uiautomator dump failed: ${message}`); - } - if (findBounds(xml, text)) return { attempts }; - if (xml === previousXml) { - throw new AppError('COMMAND_FAILED', `scrollintoview could not find text: ${text}`, { - reason: 'not_found', - attempts, - stalled: true, - }); - } - previousXml = xml; - } - - throw new AppError('COMMAND_FAILED', `scrollintoview could not find text: ${text}`, { - reason: 'not_found', - attempts: maxScrolls, - }); -} - function resolveAndroidUserRotation(orientation: DeviceRotation): string { switch (orientation) { case 'portrait': diff --git a/src/platforms/ios/interactions.ts b/src/platforms/ios/interactions.ts index d8f78a56d..43984b720 100644 --- a/src/platforms/ios/interactions.ts +++ b/src/platforms/ios/interactions.ts @@ -1,10 +1,7 @@ import { AppError } from '../../utils/errors.ts'; import type { DeviceInfo } from '../../utils/device.ts'; import { buildScrollGesturePlan, type ScrollDirection } from '../../core/scroll-gesture.ts'; -import type { RunnerCommand } from './runner-client.ts'; import { runIosRunnerCommand } from './runner-client.ts'; -import { createRequestCanceledError, isRequestCanceled } from '../../daemon/request-cancel.ts'; -import { DEFAULT_SCROLL_INTO_VIEW_MAX_SCROLLS } from '../../utils/scroll-into-view.ts'; import type { BackMode, Interactor, RunnerContext } from '../../core/interactors.ts'; export type AppleBackRunnerCommand = 'backInApp' | 'backSystem'; @@ -29,18 +26,9 @@ type NormalizedScrollOptions = { 
preferProvidedPixels?: boolean; }; -type RunnerCommandExecutor = (command: RunnerCommand) => Promise>; type IosRunnerOverrides = Pick< Interactor, - | 'tap' - | 'doubleTap' - | 'swipe' - | 'longPress' - | 'focus' - | 'type' - | 'fill' - | 'scroll' - | 'scrollIntoView' + 'tap' | 'doubleTap' | 'swipe' | 'longPress' | 'focus' | 'type' | 'fill' | 'scroll' >; export function resolveAppleBackRunnerCommand(mode?: BackMode): AppleBackRunnerCommand { @@ -48,47 +36,6 @@ export function resolveAppleBackRunnerCommand(mode?: BackMode): AppleBackRunnerC return 'backInApp'; } -export async function scrollIntoViewIosRunnerText( - runCommand: RunnerCommandExecutor, - throwIfCanceled: () => void, - text: string, - options?: { maxScrolls?: number }, -): Promise<{ attempts?: number }> { - const maxScrolls = options?.maxScrolls ?? DEFAULT_SCROLL_INTO_VIEW_MAX_SCROLLS; - const initial = await runCommand({ command: 'findText', text }); - if (initial?.found) return { attempts: 0 }; - - let previousSnapshot = snapshotProgressFingerprint( - await runCommand({ command: 'snapshot', interactiveOnly: true, compact: true }), - ); - - for (let attempts = 1; attempts <= maxScrolls; attempts += 1) { - throwIfCanceled(); - await runCommand({ command: 'swipe', direction: 'up' }); - // Small settle keeps gesture chain stable without long visible pauses. 
- await new Promise((resolve) => setTimeout(resolve, 80)); - const found = await runCommand({ command: 'findText', text }); - if (found?.found) return { attempts }; - - const snapshot = snapshotProgressFingerprint( - await runCommand({ command: 'snapshot', interactiveOnly: true, compact: true }), - ); - if (snapshot === previousSnapshot) { - throw new AppError('COMMAND_FAILED', `scrollintoview could not find text: ${text}`, { - reason: 'not_found', - attempts, - stalled: true, - }); - } - previousSnapshot = snapshot; - } - - throw new AppError('COMMAND_FAILED', `scrollintoview could not find text: ${text}`, { - reason: 'not_found', - attempts: maxScrolls, - }); -} - export function iosRunnerOverrides( device: DeviceInfo, ctx: RunnerContext, @@ -102,11 +49,6 @@ export function iosRunnerOverrides( traceLogPath: ctx.traceLogPath, requestId: ctx.requestId, }; - const throwIfCanceled = () => { - if (!isRequestCanceled(ctx.requestId)) return; - throw createRequestCanceledError(); - }; - return { runnerOpts, overrides: { @@ -189,24 +131,10 @@ export function iosRunnerOverrides( options, ); }, - scrollIntoView: async (text, options) => { - return await scrollIntoViewIosRunnerText( - (command) => - runIosRunnerCommand(device, { ...command, appBundleId: ctx.appBundleId }, runnerOpts), - throwIfCanceled, - text, - options, - ); - }, }, }; } -function snapshotProgressFingerprint(snapshot: Record): string { - const nodes = snapshot.nodes; - return JSON.stringify(Array.isArray(nodes) ? 
nodes : snapshot); -} - function invertScrollDirection(direction: ScrollDirection): ScrollDirection { switch (direction) { case 'up': diff --git a/src/utils/__tests__/args.test.ts b/src/utils/__tests__/args.test.ts index 89a920279..faf2471ea 100644 --- a/src/utils/__tests__/args.test.ts +++ b/src/utils/__tests__/args.test.ts @@ -108,16 +108,6 @@ test('parseArgs recognizes command-specific flag combinations', async () => { assert.equal(parsed.flags.artifactsDir, '.agent-device/test-artifacts'); }, }, - { - label: 'scrollintoview --max-scrolls', - argv: ['scrollintoview', '@e2', '--max-scrolls', '5'], - strictFlags: true, - assertParsed: (parsed) => { - assert.equal(parsed.command, 'scrollintoview'); - assert.deepEqual(parsed.positionals, ['@e2']); - assert.equal(parsed.flags.maxScrolls, 5); - }, - }, ]; for (const scenario of scenarios) { diff --git a/src/utils/__tests__/interactors.test.ts b/src/utils/__tests__/interactors.test.ts index ca3796eab..97103b855 100644 --- a/src/utils/__tests__/interactors.test.ts +++ b/src/utils/__tests__/interactors.test.ts @@ -2,7 +2,6 @@ import { beforeEach, test, vi } from 'vitest'; import assert from 'node:assert/strict'; import type { RunnerCommand } from '../../platforms/ios/runner-client.ts'; import type { DeviceInfo } from '../device.ts'; -import { AppError } from '../errors.ts'; vi.mock('../../platforms/ios/runner-client.ts', async (importOriginal) => { const actual = await importOriginal(); @@ -10,10 +9,7 @@ vi.mock('../../platforms/ios/runner-client.ts', async (importOriginal) => { }); import { getInteractor } from '../../core/interactors.ts'; -import { - resolveAppleBackRunnerCommand, - scrollIntoViewIosRunnerText, -} from '../../platforms/ios/interactions.ts'; +import { resolveAppleBackRunnerCommand } from '../../platforms/ios/interactions.ts'; import { runIosRunnerCommand } from '../../platforms/ios/runner-client.ts'; const iosSimulator: DeviceInfo = { @@ -40,46 +36,6 @@ test('resolveAppleBackRunnerCommand maps explicit 
back modes to runner commands' assert.equal(resolveAppleBackRunnerCommand('system'), 'backSystem'); }); -test('ios scrollIntoView uses snapshot progress checks between swipes', async () => { - vi.spyOn(globalThis, 'setTimeout').mockImplementation(((cb: () => void, _ms: number) => { - cb(); - return 0 as unknown as ReturnType; - }) as typeof setTimeout); - - const commands: string[] = []; - let findTextCalls = 0; - let snapshotCalls = 0; - mockRunIosRunnerCommand.mockImplementation(async (_device, command) => { - commands.push(command.command); - if (command.command === 'findText') { - findTextCalls += 1; - return { found: findTextCalls > 2 }; - } - if (command.command === 'snapshot') { - snapshotCalls += 1; - return { - nodes: [{ type: 'XCUIElementTypeStaticText', label: `frame-${snapshotCalls}` }], - }; - } - if (command.command === 'swipe') return {}; - throw new Error(`Unexpected runner command: ${command.command}`); - }); - const interactor = getInteractor(iosSimulator, { appBundleId: 'com.example.app' }); - const result = await interactor.scrollIntoView('Target'); - - assert.deepEqual(result, { attempts: 2 }); - assert.deepEqual(commands, [ - 'findText', - 'snapshot', - 'swipe', - 'findText', - 'snapshot', - 'swipe', - 'findText', - ]); - assert.equal(snapshotCalls, 2); -}); - test('ios scroll reports planned pixels without recomputing from runner coordinates', async () => { mockRunIosRunnerCommand.mockImplementation(async (_device, command) => { if (command.command === 'interactionFrame') { @@ -131,41 +87,3 @@ test('ios fill clears the focused field after tapping the target coordinates', a }, ]); }); - -test('scrollIntoViewIosRunnerText stops when post-swipe snapshots stall', async () => { - vi.spyOn(globalThis, 'setTimeout').mockImplementation(((cb: () => void, _ms: number) => { - cb(); - return 0 as unknown as ReturnType; - }) as typeof setTimeout); - - let snapshotCalls = 0; - const runCommand = async (command: RunnerCommand): Promise> => { - switch 
(command.command) { - case 'findText': - return { found: false }; - case 'snapshot': - snapshotCalls += 1; - return { - nodes: [{ type: 'XCUIElementTypeStaticText', label: 'Still here' }], - }; - case 'swipe': - return {}; - default: - throw new Error(`Unexpected command: ${command.command}`); - } - }; - - await assert.rejects( - () => scrollIntoViewIosRunnerText(runCommand, () => {}, 'Missing item', { maxScrolls: 4 }), - (error: unknown) => { - assert.ok(error instanceof AppError); - assert.equal(error.code, 'COMMAND_FAILED'); - assert.equal(error.details?.reason, 'not_found'); - assert.equal(error.details?.attempts, 1); - assert.equal(error.details?.stalled, true); - return true; - }, - ); - - assert.equal(snapshotCalls, 2); -}); diff --git a/src/utils/command-schema.ts b/src/utils/command-schema.ts index e0cf1efa5..944d90a6e 100644 --- a/src/utils/command-schema.ts +++ b/src/utils/command-schema.ts @@ -72,7 +72,6 @@ export type CliFlags = { backMode?: 'in-app' | 'system'; pauseMs?: number; pattern?: 'one-way' | 'ping-pong'; - maxScrolls?: number; activity?: string; header?: string[]; saveScript?: boolean | string; @@ -639,15 +638,6 @@ const FLAG_DEFINITIONS: readonly FlagDefinition[] = [ usageLabel: '--pattern one-way|ping-pong', usageDescription: 'Swipe repeat pattern', }, - { - key: 'maxScrolls', - names: ['--max-scrolls'], - type: 'int', - min: 1, - max: 200, - usageLabel: '--max-scrolls ', - usageDescription: 'scrollintoview: cap the number of scroll gestures before failing', - }, { key: 'verbose', names: ['--debug', '--verbose', '-v'], @@ -1229,14 +1219,6 @@ const COMMAND_SCHEMAS: Record = { positionalArgs: ['direction', 'amount?'], allowedFlags: ['pixels'], }, - scrollintoview: { - usageOverride: 'scrollintoview ', - helpDescription: 'Scroll until text appears or a snapshot ref is brought into view', - summary: 'Scroll until text or ref is visible', - positionalArgs: ['target'], - allowsExtraPositionals: true, - allowedFlags: ['maxScrolls'], - }, pinch: { 
helpDescription: 'Pinch/zoom gesture (Apple simulator or macOS app session)', positionalArgs: ['scale', 'x?', 'y?'], diff --git a/src/utils/scroll-into-view.ts b/src/utils/scroll-into-view.ts deleted file mode 100644 index f0a7221be..000000000 --- a/src/utils/scroll-into-view.ts +++ /dev/null @@ -1,5 +0,0 @@ -export const DEFAULT_SCROLL_INTO_VIEW_MAX_SCROLLS = 48; - -// Android text scans re-dump the full UI hierarchy every attempt, so keep the -// default tighter than Apple/ref-backed paths. -export const DEFAULT_ANDROID_SCROLL_INTO_VIEW_MAX_SCROLLS = 8; diff --git a/website/docs/docs/client-api.md b/website/docs/docs/client-api.md index 87c013cf4..c90871e69 100644 --- a/website/docs/docs/client-api.md +++ b/website/docs/docs/client-api.md @@ -115,7 +115,7 @@ Additional CLI-backed methods are exposed on their domain groups with typed opti - `client.apps.push()` - `client.apps.triggerEvent()` - `client.capture.diff()` -- `client.interactions.click()`, `press()`, `longPress()`, `swipe()`, `focus()`, `type()`, `fill()`, `scroll()`, `scrollIntoView()`, `pinch()`, `get()`, `is()`, `find()` +- `client.interactions.click()`, `press()`, `longPress()`, `swipe()`, `focus()`, `type()`, `fill()`, `scroll()`, `pinch()`, `get()`, `is()`, `find()` - `client.replay.run()` and `client.replay.test()` - `client.batch.run()` - `client.observability.perf()`, `logs()`, and `network()` diff --git a/website/docs/docs/commands.md b/website/docs/docs/commands.md index 70a89d152..1502e78c2 100644 --- a/website/docs/docs/commands.md +++ b/website/docs/docs/commands.md @@ -241,9 +241,6 @@ agent-device swipe 540 1500 540 500 120 --count 8 --pause-ms 30 --pattern ping-p agent-device longpress 300 500 800 agent-device scroll down 0.5 agent-device scroll down --pixels 320 -agent-device scrollintoview "Sign in" -agent-device scrollintoview "Sign in" --max-scrolls 6 -agent-device scrollintoview @e42 agent-device pinch 2.0 # zoom in 2x (Apple simulator or macOS app session) agent-device pinch 0.5 
200 400 # zoom out at coordinates (Apple simulator or macOS app session) ``` @@ -259,12 +256,21 @@ Some Android images cannot enter non-ASCII text over shell input; in that case u `swipe` accepts an optional `durationMs` argument (default `250ms`, range `16..10000`). On iOS, swipe duration is clamped to a safe range (`16..60ms`) to avoid longpress side effects. `scroll` accepts either a relative amount (`0.5` means roughly half of the viewport on that axis) or `--pixels ` for a fixed-distance gesture. Large distances are clamped to the usable drag band so the gesture stays reliable across Android, iOS, and macOS. -`scrollintoview` accepts plain text or a snapshot ref (`@eN`). -Use `--max-scrolls ` to cap the number of scroll gestures explicitly. -When omitted, Apple text/ref paths default to `48` scrolls; Android text mode defaults to `8` because each attempt re-dumps the full UI hierarchy. -Ref mode re-snapshots after each swipe, returns a refreshed `currentRef` when it can track the target, and stops early when the target enters the safe viewport band or scrolling stops making progress. Default snapshot text output is visible-first, so off-screen interactive content is summarized instead of shown as tappable refs. -`press @ref` and `fill @ref` fail fast when the target is off-screen; use `scrollintoview @ref` first, then retry with the returned `currentRef` or a fresh snapshot. +When a target only appears in an off-screen summary, use `scroll ` and then take a fresh `snapshot -i`. For repeated checks, a small shell loop is enough: + +```bash +previous='' +for _ in 1 2 3 4 5 6; do + current="$(agent-device snapshot -i)" + printf '%s\n' "$current" + printf '%s\n' "$current" | grep -q 'Sign in' && break + [ "$current" = "$previous" ] && break + previous="$current" + agent-device scroll down 0.5 >/dev/null +done +``` + `longpress` is supported on iOS and Android. `pinch` is supported on Apple simulators and macOS app sessions. 
diff --git a/website/docs/docs/introduction.md b/website/docs/docs/introduction.md
index a97ce6d17..565cedad5 100644
--- a/website/docs/docs/introduction.md
+++ b/website/docs/docs/introduction.md
@@ -21,7 +21,7 @@ For exploratory QA and bug-hunting workflows, see `skills/dogfood/SKILL.md` in t
 
 ## Platform support highlights
 
-- iOS core runner commands: `snapshot`, `snapshot --diff`, `diff snapshot`, `wait`, `click`, `fill`, `get`, `is`, `find`, `press`, `long-press`, `focus`, `type`, `scroll`, `scrollintoview`, `back`, `home`, `rotate`, `app-switcher`, `open` (app), `close`, `screenshot`, `apps`, `appstate`, `install`, `install-from-source`, `reinstall`, `trigger-app-event`.
+- iOS core runner commands: `snapshot`, `snapshot --diff`, `diff snapshot`, `wait`, `click`, `fill`, `get`, `is`, `find`, `press`, `long-press`, `focus`, `type`, `scroll`, `back`, `home`, `rotate`, `app-switcher`, `open` (app), `close`, `screenshot`, `apps`, `appstate`, `install`, `install-from-source`, `reinstall`, `trigger-app-event`.
 - iOS `appstate` is session-scoped on the selected target device.
 - iOS/tvOS simulator-only: `settings`, `push`, `clipboard`.
 - Apple simulators and macOS desktop app sessions: `alert`, `pinch`.
diff --git a/website/docs/docs/quick-start.md b/website/docs/docs/quick-start.md
index ab9e36c23..34418565c 100644
--- a/website/docs/docs/quick-start.md
+++ b/website/docs/docs/quick-start.md
@@ -23,7 +23,7 @@ agent-device snapshot -i
 # 4. Interact using refs
 agent-device click @e2
 
-# 5. Re-snapshot before next interactions, or use scrollintoview first if the target is off-screen
+# 5. Re-snapshot before next interactions; if a target only appears in an off-screen summary, scroll and re-snapshot first
 agent-device snapshot -i
 
 # 6. Optional: see structural changes since last baseline
diff --git a/website/docs/docs/snapshots.md b/website/docs/docs/snapshots.md
index 34ba5e112..76584f9c1 100644
--- a/website/docs/docs/snapshots.md
+++ b/website/docs/docs/snapshots.md
@@ -32,6 +32,7 @@ It does not automatically switch to AX.
 - iOS and Android share the same mobile snapshot contract: visible-first output, actionable-now refs, and hidden list content communicated via discovery hints.
 - Default to `snapshot -i` for agent loops.
 - Default human-readable snapshot output is visible-first. Off-screen interactive content is collapsed into compact discovery summaries such as `[off-screen below] 3 interactive items: "Privacy", "Battery", "About"`.
+- If a target only appears in an off-screen summary, use `scroll ` and re-snapshot until the target becomes visible.
 - When container ownership is known, hidden content is shown inline under the visible scroll/list container, for example `[content above scroll-area hidden]` or `[content below list hidden]`.
 - Those summaries intentionally show only a few labels for token efficiency. Use `snapshot --raw` when you need the full off-screen tree instead of the compact summary.
 - Add `-s "