Commit 961d22c

fix: handle iOS keyboard Done dismiss controls
1 parent 6c368bf commit 961d22c

7 files changed

Lines changed: 48 additions & 9 deletions

AGENTS.md

Lines changed: 5 additions & 3 deletions
@@ -103,6 +103,7 @@ Command-only flags (like `find --first`) that don't flow to the platform layer o
 ## Hard Rules
 - Use `runCmd`/`runCmdSync` from `src/utils/exec.ts` for process execution.
 - Use daemon session flow for interactions (`open` before interactions, `close` after).
+- Use `keyboard dismiss` for iOS keyboard dismissal; it may tap safe native controls such as `Done` but must not fall back to system back navigation.
 - Do not remove shared snapshot/session model behavior without full migration.
 - Command/device support must come from `src/core/capabilities.ts`.
 - Apple-family target changes must keep `src/utils/device.ts`, `src/core/capabilities.ts`, `src/core/dispatch-resolve.ts`, `src/platforms/ios/devices.ts`, and `src/platforms/ios/runner-xctestrun.ts` in sync.
@@ -186,14 +187,15 @@ Command-only flags (like `find --first`) that don't flow to the platform layer o
 ## Docs & Skills
 - Versioned CLI help is the agent-facing source of truth. Put workflow guidance in `src/utils/command-schema.ts` help topics and assert important copy in `src/utils/__tests__/args.test.ts`.
 - Skills are thin routers. Keep `skills/**/SKILL.md` focused on when to use the skill, version gating, which `agent-device help <topic>` page to read, and a short default loop. Do not duplicate full CLI manuals in skills.
-- For behavior/CLI surface changes, update `README.md`, relevant `website/docs/**`, and router skills only when their short routing guidance or version assumptions change.
-- For command-planning guidance changes, update `test/skillgym/suites/agent-device-smoke-suite.ts` when the change should alter what an agent plans.
+- For behavior/CLI surface changes, update the versioned help instructions in `src/utils/command-schema.ts` and assert important help copy in `src/utils/__tests__/args.test.ts`. Also update `README.md` and relevant `website/docs/**` when user-facing docs need it.
+- For behavior/CLI surface changes and command-planning guidance changes, write or update a SkillGym case in `test/skillgym/suites/agent-device-smoke-suite.ts` that captures the expected agent command plan.
+- Do not update `skills/**/SKILL.md` for command behavior or workflow guidance unless the user explicitly asks; skills must route to versioned CLI help instead of carrying behavior details.
 - Keep SkillGym cases behavioral and command-planning oriented. Prefer prompts that assert the user-visible contract and expected command family over brittle exact output, but forbid known bad patterns.
 - Build before SkillGym when local CLI help is needed: `pnpm build`, then `pnpm exec skillgym run ... --case <id>`.
 - Run SkillGym broad validation in batches of 20 cases or fewer using repeated `--case` runs; do not rely on one full-suite invocation for large runs.
 - Preserve current high-value workflow guidance:
 - iOS Expo Go dogfood: prefer `agent-device open "Expo Go" <url> --platform ios` when the shell is known, then `snapshot -i` to confirm the project UI rather than the runner splash.
-- `keyboard dismiss` is best-effort on iOS; prefer a visible app dismiss control, or `back --system` only when system navigation is acceptable.
+- `keyboard dismiss` is the preferred iOS keyboard-dismissal path before manually pressing visible keyboard controls such as `Done`; it remains best-effort and can report unsupported layouts explicitly.
 - Empty replacement is not a supported clear-field command; do not document or test `fill <target> ""` as clearing. Prefer visible clear/reset controls or report the tool gap.
 - Mutating commands against one session must run serially. Parallelize only read-only commands or commands on separate sessions/devices.
 - In final summaries, state whether docs/skills were updated; if not, explain why.

ios-runner/AgentDeviceRunner/AgentDeviceRunnerUITests/RunnerTests+Interaction.swift

Lines changed: 26 additions & 2 deletions
@@ -285,20 +285,44 @@ extension RunnerTests {
   }
 
   private func tapKeyboardDismissControl(app: XCUIApplication) -> Bool {
-    for label in ["Hide keyboard", "Dismiss keyboard"] {
+    let keyboardFrame = app.keyboards.firstMatch.frame
+    for label in ["Hide keyboard", "Dismiss keyboard", "Done"] {
       let candidates = [
         app.keyboards.buttons[label],
         app.keyboards.keys[label],
-        app.toolbars.buttons[label],
+        app.keyboards.toolbars.buttons[label],
       ]
       if let hittable = candidates.first(where: { $0.exists && $0.isHittable }) {
         hittable.tap()
         return true
       }
+
+      let toolbarButtonPredicate = NSPredicate(
+        format: "label == %@ OR identifier == %@",
+        label,
+        label
+      )
+      let toolbarButtons = app.toolbars.buttons
+        .matching(toolbarButtonPredicate)
+        .allElementsBoundByIndex
+      if let hittable = toolbarButtons.first(where: {
+        $0.exists && $0.isHittable && isKeyboardAccessoryControl($0, keyboardFrame: keyboardFrame)
+      }) {
+        hittable.tap()
+        return true
+      }
     }
     return false
   }
 
+  private func isKeyboardAccessoryControl(_ element: XCUIElement, keyboardFrame: CGRect) -> Bool {
+    let frame = element.frame
+    guard !frame.isEmpty && !keyboardFrame.isEmpty else {
+      return false
+    }
+    return frame.intersects(keyboardFrame) || abs(frame.maxY - keyboardFrame.minY) <= 80
+  }
+
   private func moveCaretToEnd(element: XCUIElement) {
     let frame = element.frame
     guard !frame.isEmpty else {

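Reviewer note: the new `isKeyboardAccessoryControl` helper is pure geometry, so its rule can be checked outside XCUITest. The sketch below restates it in TypeScript under stated assumptions: the `Rect` type, the `isEmpty` and `intersects` helpers, and the sample frames are hypothetical stand-ins, and `intersects` here uses strict overlap, which may differ from `CGRect.intersects` for rectangles that only share an edge. Only the 80-point threshold and the overall shape of the check come from the diff.

```typescript
// Hypothetical stand-in for CGRect: origin at top-left, y grows downward.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Assumed equivalent of CGRect.isEmpty for well-formed rectangles.
const isEmpty = (r: Rect): boolean => r.width <= 0 || r.height <= 0;

// Strict-overlap test (assumption: edge-only contact does not count).
const intersects = (a: Rect, b: Rect): boolean =>
  a.x < b.x + b.width &&
  b.x < a.x + a.width &&
  a.y < b.y + b.height &&
  b.y < a.y + a.height;

// Same threshold as the Swift helper in this commit.
const ACCESSORY_GAP_POINTS = 80;

function isKeyboardAccessoryControl(frame: Rect, keyboardFrame: Rect): boolean {
  if (isEmpty(frame) || isEmpty(keyboardFrame)) {
    return false;
  }
  const controlBottom = frame.y + frame.height; // maxY of the control
  const keyboardTop = keyboardFrame.y; // minY of the keyboard
  return (
    intersects(frame, keyboardFrame) ||
    Math.abs(controlBottom - keyboardTop) <= ACCESSORY_GAP_POINTS
  );
}

// Made-up frames for illustration only.
const keyboard: Rect = { x: 0, y: 500, width: 390, height: 344 };
const accessoryDone: Rect = { x: 320, y: 456, width: 60, height: 44 }; // sits directly above the keyboard
const navBarDone: Rect = { x: 320, y: 50, width: 60, height: 44 }; // far from the keyboard
const noKeyboard: Rect = { x: 0, y: 0, width: 0, height: 0 };

const accessoryAccepted = isKeyboardAccessoryControl(accessoryDone, keyboard);
const navBarAccepted = isKeyboardAccessoryControl(navBarDone, keyboard);
const acceptedWithoutKeyboard = isKeyboardAccessoryControl(accessoryDone, noKeyboard);
```

The proximity clause is what lets an input accessory toolbar that sits flush above the keyboard pass, while a `Done` button in a navigation bar at the top of the screen is rejected and therefore never tapped as a dismiss control.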
src/utils/__tests__/args.test.ts

Lines changed: 1 addition & 1 deletion
@@ -860,7 +860,7 @@ test('usageForCommand resolves workflow help topic', () => {
   assert.match(help, /iOS Allow Paste prompt cannot be exercised under XCUITest/);
   assert.match(help, /Empty replacement is not a supported clear-field command/);
   assert.match(help, /do not plan fill <target> ""/);
-  assert.match(help, /iOS keyboard dismiss is best-effort/);
+  assert.match(help, /prefer keyboard dismiss before manually pressing visible Done/);
   assert.match(help, /UNSUPPORTED_OPERATION/);
   assert.match(help, /Stateful commands against one --session must run serially/);
   assert.match(

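Reviewer note: the assertion swap above pins the new help sentence and drops the old one. A dependency-free sketch of the same idea follows. The `missingCopy` helper and the variable names are hypothetical; the help excerpt and the patterns are copied from this commit. The real test uses `assert.match` against the full help topic rather than an excerpt.

```typescript
// Excerpt of the help copy added in src/utils/command-schema.ts by this commit.
const helpExcerpt =
  "On iOS, prefer keyboard dismiss before manually pressing visible Done; " +
  "the runner can use safe native keyboard controls and still reports unsupported layouts explicitly. " +
  "If it returns UNSUPPORTED_OPERATION, prefer a visible app dismiss control, " +
  "or use back --system only when system navigation is an acceptable side effect.";

// Copy the test now requires.
const requiredCopy: RegExp[] = [
  /prefer keyboard dismiss before manually pressing visible Done/,
  /UNSUPPORTED_OPERATION/,
];

// Copy the test no longer asserts, because the sentence was rewritten.
const retiredCopy = /iOS keyboard dismiss is best-effort/;

// Hypothetical helper: returns the patterns that the help text fails to match.
function missingCopy(help: string, patterns: RegExp[]): RegExp[] {
  return patterns.filter((pattern) => !pattern.test(help));
}

const missing = missingCopy(helpExcerpt, requiredCopy);
const stillHasRetiredCopy = retiredCopy.test(helpExcerpt);
```

Returning the list of missing patterns, instead of failing on the first one, makes it easier to see every piece of help copy that drifted in one run.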
src/utils/command-schema.ts

Lines changed: 1 addition & 1 deletion
@@ -287,7 +287,7 @@ Text entry:
 agent-device type "Handle with care" --delay-ms 80
 Empty replacement is not a supported clear-field command: do not plan fill <target> "" or fill <target> ''. Prefer a visible clear/reset control; if the app exposes none, report the tool gap instead of inventing a clear command.
 Debounced field with no result selector: agent-device wait 1000. Keyboard read-only: keyboard status/get. Blocked control: try keyboard dismiss when supported.
-iOS keyboard dismiss is best-effort and can return UNSUPPORTED_OPERATION when no native dismiss gesture/control is available. Prefer a visible app dismiss control, or use back --system only when system navigation is an acceptable side effect.
+On iOS, prefer keyboard dismiss before manually pressing visible Done; the runner can use safe native keyboard controls and still reports unsupported layouts explicitly. If it returns UNSUPPORTED_OPERATION, prefer a visible app dismiss control, or use back --system only when system navigation is an acceptable side effect.
 Search-as-you-type fields on iOS can drop characters when driven too fast; use --delay-ms on fill/type before trying clipboard paste.
 iOS Allow Paste prompt cannot be exercised under XCUITest. To test paste-driven app behavior, prefill first with agent-device clipboard write "some text"; test the system prompt manually.
 Android non-ASCII can fail on some system images. Try fill/type normally; agent-device uses safer fallbacks. If the shell reports unsupported non-ASCII input, configure a trusted ADB keyboard IME outside the command plan and restore the previous IME afterward.

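Reviewer note: the rewritten help line describes a fallback order after `keyboard dismiss`. The sketch below models that order as a small decision function. Everything here is hypothetical (`DismissResult`, `FallbackContext`, `NextStep`, `nextStepAfterDismiss`) except the `UNSUPPORTED_OPERATION` code and the `back --system` command, which appear in the help text. It is not how `agent-device` represents results internally.

```typescript
// Hypothetical result shape for a `keyboard dismiss` attempt.
type DismissResult = { ok: true } | { ok: false; code: string };

// Hypothetical description of what the caller knows about the screen.
interface FallbackContext {
  visibleDismissControl?: string; // label of an app-provided dismiss control, if any
  systemNavigationAcceptable: boolean; // whether a system back side effect is tolerable
}

type NextStep =
  | { kind: "done" }
  | { kind: "use-visible-control"; control: string }
  | { kind: "system-back"; command: "back --system" }
  | { kind: "report"; reason: string };

function nextStepAfterDismiss(result: DismissResult, ctx: FallbackContext): NextStep {
  if (result.ok) {
    return { kind: "done" };
  }
  if (result.code !== "UNSUPPORTED_OPERATION") {
    // Other failures are surfaced rather than worked around.
    return { kind: "report", reason: `keyboard dismiss failed: ${result.code}` };
  }
  // Preferred fallback: a visible app dismiss control.
  if (ctx.visibleDismissControl !== undefined) {
    return { kind: "use-visible-control", control: ctx.visibleDismissControl };
  }
  // Last resort: system back, only when its side effect is acceptable.
  if (ctx.systemNavigationAcceptable) {
    return { kind: "system-back", command: "back --system" };
  }
  return { kind: "report", reason: "no safe keyboard dismissal path available" };
}

const unsupported: DismissResult = { ok: false, code: "UNSUPPORTED_OPERATION" };
const withControl = nextStepAfterDismiss(unsupported, {
  visibleDismissControl: "Close",
  systemNavigationAcceptable: false,
});
const withBackOnly = nextStepAfterDismiss(unsupported, { systemNavigationAcceptable: true });
const noOptions = nextStepAfterDismiss(unsupported, { systemNavigationAcceptable: false });
const succeeded = nextStepAfterDismiss({ ok: true }, { systemNavigationAcceptable: true });
```

Keeping system back behind an explicit flag mirrors the Hard Rule in `AGENTS.md` that keyboard dismissal must not silently fall back to system back navigation.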
test/skillgym/suites/agent-device-smoke-suite.ts

Lines changed: 13 additions & 0 deletions
@@ -350,6 +350,19 @@ const FIXTURE_SMOKE_CASES: TestCase[] = [
     outputs: [/field-name/i, /Done/i, commandAlternativesPattern(['press', 'click'])],
     forbiddenOutputs: [commandPattern('keyboard dismiss'), commandPattern('back')],
   }),
+  makeCase({
+    id: 'form-keyboard-dismiss-ios-done-control',
+    contract: [
+      'Platform: iOS',
+      'App name: Agent Device Tester',
+      'Current screen: Checkout form tab',
+      'testID=field-name',
+      'The focused field shows an iOS keyboard toolbar with a visible Done control',
+    ],
+    task: 'Plan the commands to focus the Full name field and dismiss the iOS keyboard without manually pressing Done.',
+    outputs: [/field-name/i, /keyboard dismiss/i],
+    forbiddenOutputs: [commandPattern('back'), /press\s+.*Done/i, /click\s+.*Done/i],
+  }),
   makeCase({
     id: 'form-reset',
     contract: [

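Reviewer note: the new case passes only when a plan mentions the field and `keyboard dismiss`, and fails if it presses `Done` or navigates back. The sketch below shows one way such `outputs` and `forbiddenOutputs` lists could be evaluated. `evaluatePlan`, `commandPatternSketch`, and both sample plans are hypothetical; the real `commandPattern` helper and the SkillGym runner are not shown in this diff and may behave differently, and the target syntax in the sample plans is a placeholder.

```typescript
interface PlanCheck {
  outputs: RegExp[]; // every pattern must match the plan
  forbiddenOutputs: RegExp[]; // no pattern may match the plan
}

const escapeRegExp = (text: string): string => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical stand-in for the suite's commandPattern helper.
function commandPatternSketch(command: string): RegExp {
  return new RegExp(`\\bagent-device\\s+${escapeRegExp(command)}\\b`, "i");
}

function evaluatePlan(plan: string, check: PlanCheck): { missing: RegExp[]; forbidden: RegExp[] } {
  return {
    missing: check.outputs.filter((pattern) => !pattern.test(plan)),
    forbidden: check.forbiddenOutputs.filter((pattern) => pattern.test(plan)),
  };
}

// Patterns taken from the 'form-keyboard-dismiss-ios-done-control' case.
const doneControlCheck: PlanCheck = {
  outputs: [/field-name/i, /keyboard dismiss/i],
  forbiddenOutputs: [commandPatternSketch("back"), /press\s+.*Done/i, /click\s+.*Done/i],
};

// Sample plans; the target syntax after `press` is a placeholder.
const goodPlan = ["agent-device press field-name", "agent-device keyboard dismiss"].join("\n");
const badPlan = ["agent-device press field-name", "agent-device press Done"].join("\n");

const goodResult = evaluatePlan(goodPlan, doneControlCheck);
const badResult = evaluatePlan(badPlan, doneControlCheck);
```

Note how this case inverts the one just above it in the suite, which requires pressing `Done` and forbids `keyboard dismiss`: the two contracts differ in whether the task allows the manual press.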
website/docs/docs/commands.md

Lines changed: 1 addition & 1 deletion
@@ -534,7 +534,7 @@ agent-device keyboard dismiss
 ```
 
 - `keyboard status` (or `keyboard get`) returns keyboard visibility and best-effort input type classification on Android.
-- `keyboard dismiss` attempts a non-navigation keyboard dismissal on Android and a native dismiss gesture/control on iOS, then confirms the keyboard is hidden.
+- `keyboard dismiss` attempts a non-navigation keyboard dismissal on Android and a native dismiss gesture/control on iOS, including common safe controls such as a keyboard toolbar `Done` button, then confirms the keyboard is hidden.
 - If the keyboard remains visible after the platform-native dismiss path, the command returns an explicit `UNSUPPORTED_OPERATION` error instead of falling back to back navigation.
 - On iOS, `keyboard dismiss` is best-effort and can fail when the active app exposes no native dismiss gesture/control. Prefer a visible app dismiss control, or use `back --system` only when system navigation is an acceptable side effect.
 - Works with active sessions and explicit selectors (`--platform`, `--device`, `--udid`, `--serial`).

website/docs/docs/introduction.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ For agent-oriented operating guidance, start with `agent-device help` or `agent-
 - Physical-device recording defaults to 15 FPS and supports `--fps` caps.
 - `record start --quality <5-10>` scales recording resolution from 50% through native resolution; omitting it keeps native/current resolution.
 - Android supports the same core interaction set, plus `rotate`, `push` notification simulation, `clipboard read/write`, and `keyboard status|get|dismiss`.
-- iOS `keyboard dismiss` is best-effort through the XCTest runner and can fail when the app exposes no native dismiss gesture/control.
+- iOS `keyboard dismiss` is best-effort through the XCTest runner, including common native controls such as keyboard toolbar `Done`, and can fail when the app exposes no native dismiss gesture/control.
 - App-event triggers are available on iOS and Android through app-defined deep-link hooks (`trigger-app-event`), using active session context or explicit device selectors.
 
 ## Architecture (high level)
