Commit 7819952

fix: handle iOS keyboard Done dismiss controls
1 parent d1a7641 commit 7819952

8 files changed

Lines changed: 45 additions & 7 deletions

AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -103,6 +103,7 @@ Command-only flags (like `find --first`) that don't flow to the platform layer o
 ## Hard Rules
 - Use `runCmd`/`runCmdSync` from `src/utils/exec.ts` for process execution.
 - Use daemon session flow for interactions (`open` before interactions, `close` after).
+- Use `keyboard dismiss` for iOS keyboard dismissal; it may tap safe native controls such as `Done` but must not fall back to system back navigation.
 - Do not remove shared snapshot/session model behavior without full migration.
 - Command/device support must come from `src/core/capabilities.ts`.
 - Apple-family target changes must keep `src/utils/device.ts`, `src/core/capabilities.ts`, `src/core/dispatch-resolve.ts`, `src/platforms/ios/devices.ts`, and `src/platforms/ios/runner-xctestrun.ts` in sync.

ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Interaction.swift

Lines changed: 26 additions & 2 deletions
@@ -285,20 +285,44 @@ extension RunnerTests {
   }
 
   private func tapKeyboardDismissControl(app: XCUIApplication) -> Bool {
-    for label in ["Hide keyboard", "Dismiss keyboard"] {
+    let keyboardFrame = app.keyboards.firstMatch.frame
+    for label in ["Hide keyboard", "Dismiss keyboard", "Done"] {
       let candidates = [
         app.keyboards.buttons[label],
         app.keyboards.keys[label],
-        app.toolbars.buttons[label],
+        app.keyboards.toolbars.buttons[label],
       ]
       if let hittable = candidates.first(where: { $0.exists && $0.isHittable }) {
         hittable.tap()
         return true
       }
+
+      let toolbarButtonPredicate = NSPredicate(
+        format: "label == %@ OR identifier == %@",
+        label,
+        label
+      )
+      let toolbarButtons = app.toolbars.buttons
+        .matching(toolbarButtonPredicate)
+        .allElementsBoundByIndex
+      if let hittable = toolbarButtons.first(where: {
+        $0.exists && $0.isHittable && isKeyboardAccessoryControl($0, keyboardFrame: keyboardFrame)
+      }) {
+        hittable.tap()
+        return true
+      }
     }
     return false
   }
 
+  private func isKeyboardAccessoryControl(_ element: XCUIElement, keyboardFrame: CGRect) -> Bool {
+    let frame = element.frame
+    guard !frame.isEmpty && !keyboardFrame.isEmpty else {
+      return false
+    }
+    return frame.intersects(keyboardFrame) || abs(frame.maxY - keyboardFrame.minY) <= 80
+  }
+
   private func moveCaretToEnd(element: XCUIElement) {
     let frame = element.frame
     guard !frame.isEmpty else {
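
Reviewer note: the new `isKeyboardAccessoryControl` check is plain rectangle geometry, so its behavior can be explored outside XCTest. The following is a minimal TypeScript sketch of the same rule, not code from this commit. The `Rect` type, the `isEmpty` and `intersects` helpers, and the sample frames are all hypothetical; in particular, `intersects` uses a strict overlap test as an assumed stand-in for `CGRect.intersects`.

```typescript
// Hypothetical rectangle type for illustration; the runner itself uses CGRect.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Same 80-point tolerance as the literal in the Swift diff.
const ACCESSORY_GAP_TOLERANCE = 80;

const isEmpty = (r: Rect): boolean => r.width <= 0 || r.height <= 0;

// Strict overlap test; an assumption standing in for CGRect.intersects.
const intersects = (a: Rect, b: Rect): boolean =>
  a.x < b.x + b.width &&
  b.x < a.x + a.width &&
  a.y < b.y + b.height &&
  b.y < a.y + a.height;

// A control counts as a keyboard accessory if it overlaps the keyboard
// or its bottom edge sits within the tolerance of the keyboard's top edge.
function isKeyboardAccessoryControl(frame: Rect, keyboardFrame: Rect): boolean {
  if (isEmpty(frame) || isEmpty(keyboardFrame)) return false;
  const maxY = frame.y + frame.height;
  return (
    intersects(frame, keyboardFrame) ||
    Math.abs(maxY - keyboardFrame.y) <= ACCESSORY_GAP_TOLERANCE
  );
}

// Made-up sample frames: a Done button resting on the keyboard's top edge,
// and a Done button in a navigation bar near the top of the screen.
const keyboard: Rect = { x: 0, y: 500, width: 390, height: 344 };
const doneAboveKeyboard: Rect = { x: 320, y: 456, width: 60, height: 44 };
const navBarDone: Rect = { x: 320, y: 50, width: 60, height: 44 };

console.log(isKeyboardAccessoryControl(doneAboveKeyboard, keyboard)); // true
console.log(isKeyboardAccessoryControl(navBarDone, keyboard)); // false
```

The second clause compares vertical positions only, so a toolbar button anywhere along the strip just above the keyboard qualifies, while a `Done` button elsewhere on screen does not.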

skills/agent-device/SKILL.md

Lines changed: 1 addition & 1 deletion
@@ -31,4 +31,4 @@ agent-device help dogfood
 
 Default loop: `open -> snapshot/-i -> get/is/find or press/fill/scroll/wait -> verify -> close`.
 
-Keep refs current, prefer selectors/refs over coordinates, use `fill` to replace text, and use `back` for app-owned navigation. Let `help workflow` provide the exact command shapes.
+Keep refs current, prefer selectors/refs over coordinates, use `fill` to replace text, and use `back` for app-owned navigation. On iOS, use `keyboard dismiss` before manually pressing visible keyboard controls such as `Done`; fall back only if the command reports unsupported. Let `help workflow` provide the exact command shapes.

src/utils/command-schema.ts

Lines changed: 1 addition & 1 deletion
@@ -283,7 +283,7 @@ Text entry:
 agent-device fill 'id="field-email"' "qa@example.com"
 agent-device press 'id="product-note"'
 agent-device type "Handle with care" --delay-ms 80
-Debounced field with no result selector: agent-device wait 1000. Keyboard read-only: keyboard status/get. Blocked control: keyboard dismiss.
+Debounced field with no result selector: agent-device wait 1000. Keyboard read-only: keyboard status/get. Blocked control: keyboard dismiss. On iOS, prefer keyboard dismiss before manually pressing visible Done; the runner can use safe native keyboard controls and still reports unsupported layouts explicitly.
 Search-as-you-type fields on iOS can drop characters when driven too fast; use --delay-ms on fill/type before trying clipboard paste.
 iOS Allow Paste prompt cannot be exercised under XCUITest. To test paste-driven app behavior, prefill first with agent-device clipboard write "some text"; test the system prompt manually.
 Android non-ASCII can fail on some system images. Try fill/type normally; agent-device uses safer fallbacks. If the shell reports unsupported non-ASCII input, configure a trusted ADB keyboard IME outside the command plan and restore the previous IME afterward.

test/skillgym/README.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ The included suite focuses on the first two layers so it stays stable and CI-saf
 
 - `../../examples/test-app/`: minimal Expo SDK 55 fixture app for broad UI coverage
 - `skillgym.config.ts`: starter config that runs Codex and Claude Haiku against this repo
-- `suites/agent-device-smoke-suite.ts`: 66-case suite for skill routing, fixture-aware planning, and skill-guidance regressions
+- `suites/agent-device-smoke-suite.ts`: 68-case suite for skill routing, fixture-aware planning, and skill-guidance regressions
 
 ## Current coverage
 
test/skillgym/suites/agent-device-smoke-suite.ts

Lines changed: 13 additions & 0 deletions
@@ -329,6 +329,19 @@ const FIXTURE_SMOKE_CASES: TestCase[] = [
     outputs: [/field-name/i, /keyboard dismiss/i],
     forbiddenOutputs: [commandPattern('back')],
   }),
+  makeCase({
+    id: 'form-keyboard-dismiss-ios-done-control',
+    contract: [
+      'Platform: iOS',
+      'App name: Agent Device Tester',
+      'Current screen: Checkout form tab',
+      'testID=field-name',
+      'The focused field shows an iOS keyboard toolbar with a visible Done control',
+    ],
+    task: 'Plan the commands to focus the Full name field and dismiss the iOS keyboard without manually pressing Done.',
+    outputs: [/field-name/i, /keyboard dismiss/i],
+    forbiddenOutputs: [commandPattern('back'), /press\s+.*Done/i, /click\s+.*Done/i],
+  }),
   makeCase({
     id: 'form-reset',
     contract: [
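
Reviewer note: to see what the new case accepts and rejects, the sketch below applies its `outputs` and `forbiddenOutputs` patterns to two made-up plans. This is an illustration under assumptions, not the suite's evaluation logic: `commandPattern` is re-declared here as a hypothetical stand-in (its real definition is not part of this diff), `passes` is an invented helper, and the `label="Done"` selector in the second plan is likewise hypothetical.

```typescript
// Hypothetical stand-in; the real commandPattern helper is not shown in the diff.
const commandPattern = (name: string): RegExp => new RegExp(`\\b${name}\\b`, "i");

// Patterns copied from the new test case.
const outputs: RegExp[] = [/field-name/i, /keyboard dismiss/i];
const forbiddenOutputs: RegExp[] = [
  commandPattern("back"),
  /press\s+.*Done/i,
  /click\s+.*Done/i,
];

// Invented helper: a plan passes when every required pattern matches
// and no forbidden pattern matches.
function passes(plan: string): boolean {
  const hasAllRequired = outputs.every((pattern) => pattern.test(plan));
  const hasForbidden = forbiddenOutputs.some((pattern) => pattern.test(plan));
  return hasAllRequired && !hasForbidden;
}

// Focuses the field, then relies on the dedicated command.
const goodPlan = [
  `agent-device press 'id="field-name"'`,
  `agent-device keyboard dismiss`,
].join("\n");

// Focuses the field, then presses the visible Done control by hand.
const badPlan = [
  `agent-device press 'id="field-name"'`,
  `agent-device press 'label="Done"'`,
].join("\n");

console.log(passes(goodPlan)); // true
console.log(passes(badPlan)); // false
```

Because `.` does not match a newline, `/press\s+.*Done/i` only fires when `press` and `Done` appear on the same line, which is why the `press` on the first line of `goodPlan` does not trip it.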

website/docs/docs/commands.md

Lines changed: 1 addition & 1 deletion
@@ -534,7 +534,7 @@ agent-device keyboard dismiss
 ```
 
 - `keyboard status` (or `keyboard get`) returns keyboard visibility and best-effort input type classification on Android.
-- `keyboard dismiss` attempts a non-navigation keyboard dismissal on Android and a native dismiss gesture/control on iOS, then confirms the keyboard is hidden.
+- `keyboard dismiss` attempts a non-navigation keyboard dismissal on Android and a native dismiss gesture/control on iOS, including common safe controls such as a keyboard toolbar `Done` button, then confirms the keyboard is hidden.
 - If the keyboard remains visible after the platform-native dismiss path, the command returns an explicit `UNSUPPORTED_OPERATION` error instead of falling back to back navigation.
 - Works with active sessions and explicit selectors (`--platform`, `--device`, `--udid`, `--serial`).
 - `keyboard status|get` is supported on Android emulator/device.
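
Reviewer note: the documented contract (attempt the native path, confirm the keyboard is hidden, otherwise report `UNSUPPORTED_OPERATION` and never navigate back) can be sketched as below. This is a simplified model under assumptions, not the project's implementation: the `KeyboardDriver` interface, `dismissKeyboard`, `fakeDriver`, and the error message text are all hypothetical names introduced for illustration. Only the `UNSUPPORTED_OPERATION` code comes from the docs.

```typescript
// Hypothetical result shape; only the UNSUPPORTED_OPERATION code is from the docs.
type DismissResult =
  | { ok: true }
  | { ok: false; code: "UNSUPPORTED_OPERATION"; message: string };

// Hypothetical driver abstraction over the platform layer.
interface KeyboardDriver {
  tryNativeDismiss(): void;
  isKeyboardVisible(): boolean;
}

function dismissKeyboard(driver: KeyboardDriver): DismissResult {
  // Step 1: platform-native dismissal (gesture or a safe control such as Done).
  driver.tryNativeDismiss();
  // Step 2: confirm the keyboard is actually hidden.
  if (!driver.isKeyboardVisible()) return { ok: true };
  // Step 3: report explicitly; there is deliberately no back-navigation fallback.
  return {
    ok: false,
    code: "UNSUPPORTED_OPERATION",
    message: "Keyboard still visible after native dismiss",
  };
}

// In-memory fake used only to exercise the sketch.
function fakeDriver(dismissWorks: boolean): KeyboardDriver {
  let visible = true;
  return {
    tryNativeDismiss() {
      if (dismissWorks) visible = false;
    },
    isKeyboardVisible: () => visible,
  };
}

const hidden = dismissKeyboard(fakeDriver(true));
const stuck = dismissKeyboard(fakeDriver(false));
console.log(hidden.ok ? "hidden" : hidden.code); // hidden
console.log(stuck.ok ? "hidden" : stuck.code); // UNSUPPORTED_OPERATION
```

Returning a typed error rather than trying another dismissal strategy keeps the failure visible to the caller, which matches the guidance elsewhere in this commit to fall back manually only when the command reports unsupported.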

website/docs/docs/introduction.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ For agent-oriented operating guidance, start with `agent-device help` or `agent-
 - Physical-device recording defaults to 15 FPS and supports `--fps` caps.
 - `record start --quality <5-10>` scales recording resolution from 50% through native resolution; omitting it keeps native/current resolution.
 - Android supports the same core interaction set, plus `rotate`, `push` notification simulation, `clipboard read/write`, and `keyboard status|get|dismiss`.
-- iOS supports `keyboard dismiss` through the XCTest runner when the on-screen keyboard is visible.
+- iOS supports `keyboard dismiss` through the XCTest runner when the on-screen keyboard is visible, including common native controls such as keyboard toolbar `Done`.
 - App-event triggers are available on iOS and Android through app-defined deep-link hooks (`trigger-app-event`), using active session context or explicit device selectors.
 
 ## Architecture (high level)
