Commit 857954e

feat(ui-automation): Add rs/1 runtime automation parity (#416)
1 parent 8c91fa4 commit 857954e

209 files changed

Lines changed: 21930 additions & 6195 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -117,3 +117,4 @@ DerivedData
 /.pr-learning
 /repros
 /.xcodebuildmcp
+/out.nosync

AGENTS.md

Lines changed: 7 additions & 0 deletions
@@ -70,6 +70,7 @@ ESM TypeScript project (`type: module`). Key layers:
 
 ## Tool Development
 - Tool manifests in `manifests/tools/*.yaml` define `id`, `module`, `names.mcp` (snake_case), optional `names.cli` (kebab-case), predicates, and annotations
+- MCP `readOnlyHint` describes whether a tool mutates host/project state such as files, build artifacts, configuration, or external services. Simulator HID/UI actions that only tap, type, press, or gesture inside the simulator may remain `readOnlyHint: true`; do not flip them to `false` merely because app UI state changes.
 - Workflow manifests in `manifests/workflows/*.yaml` group tools and define exposure rules
 - Tool modules export a Zod `schema`, a pure `*Logic` function, and a `handler` built with `createTypedTool` or `createSessionAwareTool`
 - Resource modules export a `handler` (and a pure `*Logic` function); `uri`, `name`, `description`, and `mimeType` are declared in `manifests/resources/*.yaml`
@@ -95,6 +96,12 @@ When reading issues:
 - Use shared lock and atomic-write helpers for mutable shared files.
 - Prefer one-record-per-file registries over shared aggregate files.
 - Cleanup must verify ownership before deleting shared artifacts.
+- Multi-process safety means concurrent processes must not corrupt or delete each other's state.
+It does not mean ephemeral runtime handles should become portable between invocation surfaces.
+- Keep runtime/session-scoped handles isolated unless the product explicitly defines a cross-process
+contract. For example, UI automation `elementRef` values from runtime snapshots are handles for
+the runtime/session that produced them, not durable IDs to share between separate MCP and CLI
+invocations.
 - User-facing artifact/log paths in final text or structured output must use `displayPath()` from `src/utils/build-preflight.ts`, so paths are cwd-relative when possible or `~/...` instead of absolute home paths. Keep stored files at their real absolute paths; only normalize response/display values.
 
 ## Style
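The new AGENTS.md guidance says runtime `elementRef` values are handles for the runtime/session that produced them, not durable IDs. As a hypothetical sketch (the `RuntimeRefRegistry` class, the `ElementFrame` shape, and the `<session>#<n>:e<k>` ref format are illustrative assumptions, not XcodeBuildMCP's implementation), an in-process registry can mint refs per snapshot and refuse to resolve refs that came from another session or from an older snapshot:

```typescript
// Hypothetical sketch: element refs resolve only within the session and
// snapshot that minted them. All names here are illustrative.
interface ElementFrame {
  x: number;
  y: number;
  width: number;
  height: number;
}

type ElementRef = string;

class RuntimeRefRegistry {
  private generation = 0;
  // sessionId -> refs from that session's latest snapshot only
  private readonly bySession = new Map<string, Map<ElementRef, ElementFrame>>();

  // Mint fresh refs for a new snapshot, discarding the session's older refs.
  recordSnapshot(sessionId: string, frames: ElementFrame[]): ElementRef[] {
    this.generation += 1;
    const snapshotId = `${sessionId}#${this.generation}`;
    const refs = new Map<ElementRef, ElementFrame>();
    frames.forEach((frame, index) => {
      refs.set(`${snapshotId}:e${index + 1}`, frame);
    });
    this.bySession.set(sessionId, refs);
    return Array.from(refs.keys());
  }

  // Returns undefined for refs minted by another session or an older snapshot.
  resolve(sessionId: string, ref: ElementRef): ElementFrame | undefined {
    return this.bySession.get(sessionId)?.get(ref);
  }
}
```

Because the registry in this sketch lives only in process memory, a ref captured by one MCP or CLI invocation cannot be replayed by another; that matches the stated guidance, but the mechanism itself is an assumption.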

CHANGELOG.md

Lines changed: 63 additions & 2 deletions
@@ -5,6 +5,69 @@
 ### Added
 
 - Added `nextSteps` hint lines to MCP `structuredContent` and CLI `--output json` envelopes so agents can consume follow-up actions without scraping text. CLI JSON renders shell command lines; MCP structured content renders MCP tool-call hints. Structured result schemas that include `nextSteps` now use schema version 2; existing version 1 schema files remain available for current validators.
+- Added `snapshot_ui sinceScreenHash` / CLI `--since-screen-hash` so callers can skip full runtime snapshot output when the screen hash is unchanged.
+- Added `batch` for executing multiple AXe UI automation steps in one simulator session.
+- Added `wait_for_ui` for polling runtime UI snapshots until UI predicates such as existence, enabled state, focus, text, or settled layout are satisfied. `textContains` can also wait on visible text without a selector when the match is unique.
+- Added structured element-ref `batch` tap steps, preserved same-screen refs after successful `tap` and `batch` actions, and improved UI automation guidance and next steps for one-observation interactions.
+- Added a `replaceExisting` option to `type_text` so agents can replace an existing text-field value instead of accidentally appending to it.
+- Added `drag` for element-ref based drag gestures, enabling agents to expand foreground sheets and drag real scroll/list regions without raw coordinate guesses.
+
+### Changed
+
+- Successful mutating UI automation calls now always attempt to refresh the runtime snapshot after the action instead of preserving or patching cached switch state.
+- Runtime snapshot guidance no longer advertises synthetic sheet swipe targets for foreground sheets. Agents should use real sheet grabber expansion and real descendant scroll/list targets with `drag` instead of inferred app/window-root sheet swipes.
+
+### Fixed
+
+- Fixed simulator launch failures before simulator-name resolution so they are not reported as macOS launch failures.
+- Fixed CLI JSON output so simulator-name resolution failures return the structured error envelope instead of plain stderr.
+- Fixed accessibility hierarchy tips so UI automation guidance prefers runtime element refs over raw coordinate guessing.
+- Fixed `swipe` distance handling so distance is a normalized stroke fraction used for endpoint calculation, and improved sheet/list scroll guidance so real descendant scroll containers are preferred over application/window root fallbacks.
+- Fixed compact runtime snapshots so top-level app and window refs are not advertised as swipe targets just because a generic descendant overflows their frame.
+- Fixed `wait_for_ui` focus waits so elements that do not expose focus state return a typed recoverable error instead of timing out.
+- Fixed invalid `touch` calls so structured output no longer reports a fake touch event when neither `down` nor `up` was requested.
+- Fixed compact runtime snapshots so standalone `other` elements, such as keyboard suggestions, are not advertised as swipe targets unless they behave like scrollable containers.
+- Fixed runtime snapshots so off-screen elements, and clipped elements whose activation point is offscreen, are not advertised as actionable targets.
+- Fixed full-screen swipe gestures so app-level scroll refs avoid unsafe screen edges such as the status bar and notch area.
+- Clarified runtime snapshot tips so agents know element refs are snapshot-specific and must come from the latest `snapshot_ui` or `wait_for_ui` output, and only show swipe guidance when the snapshot includes a scroll ref.
+- Made `wait_for_ui` `textContains` matching case-insensitive so assertions survive platform text normalization such as keyboard auto-capitalization, treat duplicate exact text matches as successful presence assertions, narrow broad selectors by text before reporting ambiguity, reject `text` on non-`textContains` predicates instead of silently ignoring it, and keep recoverable-error candidates compact in structured output.
+- Fixed `tap` on SwiftUI switch element refs by using a touch down/up activation instead of AXe's coordinate tap path.
+- Fixed selector fallback for AXe duplicate-match diagnostics that include parenthesized match counts.
+- Fixed semantic taps and text-field focusing so element refs with duplicate AXe selectors use their resolved snapshot coordinates immediately.
+- Fixed bottom-clipped UI automation targets so taps, touches, and long presses use a visible activation point instead of the hidden center of the accessibility frame.
+- Fixed app-level horizontal swipes so full-screen refs use a content-area y-coordinate instead of missing horizontal carousels by swiping near the hero area.
+- Fixed CLI commands with `simulatorId`-only contracts so `simulatorName` session defaults are resolved to a simulator ID without adding conflicting simulator arguments to tools that already accept `simulatorName`, and fixed simulator lifecycle tools so name-only defaults resolve before simctl operations.
+- Fixed `snapshot_ui` and `wait_for_ui` next steps so they use the resolved simulator ID instead of leaking `SIMULATOR_UUID` placeholders.
+- Fixed the Weather example app so saved-location rows are not reused as search-result rows after editing locations.
+- Fixed the Weather example app's current-location button so it selects the current saved location instead of appearing as a no-op UI automation target.
+- Fixed `type_text` so AXe-unsupported international/accented characters fail before focusing the field, with a clear recoverable error instead of a generic typing failure.
+- Fixed `snapshot_ui` next-step guidance so the suggested tap ref prefers useful tappable controls over text fields, sheet grabbers, close buttons, and clear-search buttons.
+- Fixed compact runtime snapshot JSON so target ordering matches compact text output and prioritizes useful content targets before low-value sheet chrome.
+- Fixed `wait_for_ui` success output so compact text and JSON include the matched elements that satisfied the wait predicate.
+- Fixed `wait_for_ui textContains` so duplicate elements with the same matching visible text satisfy presence-style assertions instead of reporting ambiguity.
+- Fixed CLI `--style minimal` so final text output suppresses generated next steps for daemon-routed tools as intended.
+- Fixed `snapshot_ui` next-step guidance so snapshots with no tappable targets no longer suggest tapping the first non-actionable element.
+- Fixed next-step rendering for tools shared across workflows so follow-up commands prefer the workflow that produced the result instead of drifting to another workflow alias.
+- Fixed `snapshot_ui` next-step guidance so calculator-style utility and operator buttons no longer outrank more useful digit/content controls.
+- Fixed `snapshot_ui` compact text, JSON, and next-step guidance so already-selected segmented controls no longer outrank unselected choices.
+- Fixed compact runtime snapshots and next-step guidance so sheet grabbers remain visible as low-priority targets, allowing agents to expand or dismiss sheets without outranking useful content controls.
+- Fixed compact wait-match rows so static assertion matches render with `none` instead of exposing low-level long-press/touch actions as if they were primary agent actions.
+- Fixed compact runtime snapshot ordering and next-step guidance so destructive controls such as Remove/Delete are demoted behind safer content and navigation targets.
+- Clarified simulator keyboard shortcut failures when Simulator.app is running without a visible device window.
+- Fixed hardware button automation so successful button presses wait briefly for system UI transitions before returning, reducing stale immediate follow-up snapshots.
+- Fixed runtime snapshots so modal sheet hosts remain swipeable after the currently visible sheet content fits inside the viewport.
+- Fixed `wait_for_ui` validation so unknown JSON fields are rejected instead of silently broadening waits.
+- Fixed CLI numeric array flags so comma-separated values such as `--key-codes 23,18,14` are parsed as numbers instead of failing validation.
+- Fixed runtime snapshots so unlabeled internal custom-action nodes, such as SpringBoard icon subviews, are no longer advertised as likely tap targets.
+- Fixed AXe bundling so downloaded artifacts must report the pinned AXe version, and dirty local AXe builds require an explicit opt-in.
+- Fixed runtime snapshot tips so compact output names all target-ref action tools, including `long_press` and `touch`.
+- Clarified key press and key sequence tool descriptions so agents know key codes are AXe/macOS virtual key codes and should prefer `type_text` for text entry.
+- Clarified `wait_for_ui` timeout recovery hints so agents know selector fields match exact values and should use `textContains` for partial visible text.
+- Fixed UI action success next steps so agents are prompted to refresh runtime snapshots before reusing element refs after actions such as swipes.
+- Fixed `snapshot_ui` next-step guidance so state-changing controls such as segmented units and switches remain available in targets without being promoted as generic tap or batch suggestions.
+- Fixed `snapshot_ui` tap next-step priority so content-rich cards are suggested before navigation controls like Settings.
+- Fixed successful UI action results so they include a fresh runtime snapshot and actionable next steps, reducing follow-up refresh calls after taps, typing, swipes, and batches.
+- Fixed same-simulator UI automation transactions so runtime snapshot resolution, actions, invalidation, and refreshes cannot interleave within one MCP or daemon process.
 
 ## [2.5.2]
 
@@ -641,5 +704,3 @@ Please note that the UI automation features are an early preview and currently i
 ## [v1.0.1] - 2025-04-02
 - Initial release of XcodeBuildMCP
 - Basic support for building iOS and macOS applications
-
-
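The changelog describes `wait_for_ui` as polling runtime snapshots until a predicate holds, with case-insensitive `textContains` matching and duplicate matches with the same text counting as presence. A minimal sketch of that loop follows, assuming a hypothetical `takeSnapshot` callback and a simplified element shape; `waitForText`, `UiElement`, and the 250 ms default interval are illustrative, not the tool's real API.

```typescript
// Hypothetical sketch of a wait_for_ui-style poll loop; the element shape,
// function names, and default interval are assumptions.
interface UiElement {
  label: string;
}

// Case-insensitive substring match, so text normalized by the platform
// (for example keyboard auto-capitalization) still satisfies the wait.
function matchesTextContains(element: UiElement, text: string): boolean {
  return element.label.toLocaleLowerCase().includes(text.toLocaleLowerCase());
}

interface WaitResult {
  satisfied: boolean;
  matches: UiElement[];
  attempts: number;
}

// Poll fresh snapshots until at least one element matches or the timeout
// elapses. Several elements with the same matching text count as "present".
async function waitForText(
  takeSnapshot: () => Promise<UiElement[]>,
  text: string,
  timeoutMs: number,
  intervalMs = 250,
): Promise<WaitResult> {
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  for (;;) {
    attempts += 1;
    const snapshot = await takeSnapshot();
    const matches = snapshot.filter((element) => matchesTextContains(element, text));
    if (matches.length > 0) {
      return { satisfied: true, matches, attempts };
    }
    if (Date.now() >= deadline) {
      return { satisfied: false, matches: [], attempts };
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The changelog also mentions typed recoverable errors, recovery hints, and selector narrowing; this sketch leaves those out and simply reports `satisfied: false` on timeout.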

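The changelog also says `swipe` distance is now a normalized stroke fraction used to compute endpoints, and that full-screen swipes avoid unsafe edges such as the status bar and notch. One plausible geometry is sketched below under stated assumptions: the stroke is centered on the frame after subtracting safe insets, and `swipeEndpoints`, `Insets`, and `DIRECTION_VECTORS` are hypothetical names. The actual inset values, and the content-area y-coordinate used for app-level horizontal swipes, are not shown in this diff.

```typescript
// Hypothetical sketch: derive swipe endpoints from a normalized distance.
interface Point {
  x: number;
  y: number;
}

interface Rect extends Point {
  width: number;
  height: number;
}

interface Insets {
  top: number;
  bottom: number;
  left: number;
  right: number;
}

type Direction = "up" | "down" | "left" | "right";

// Unit vector for the direction the finger travels (screen y grows downward).
const DIRECTION_VECTORS: Record<Direction, Point> = {
  up: { x: 0, y: -1 },
  down: { x: 0, y: 1 },
  left: { x: -1, y: 0 },
  right: { x: 1, y: 0 },
};

const NO_INSETS: Insets = { top: 0, bottom: 0, left: 0, right: 0 };

// distance is the fraction (0, 1] of the safe frame's extent covered by the
// stroke; the stroke is centered on the safe frame so it never enters insets.
function swipeEndpoints(
  frame: Rect,
  direction: Direction,
  distance: number,
  insets: Insets = NO_INSETS,
): { start: Point; end: Point } {
  if (!(distance > 0 && distance <= 1)) {
    throw new RangeError("distance must be in (0, 1]");
  }
  const safe: Rect = {
    x: frame.x + insets.left,
    y: frame.y + insets.top,
    width: Math.max(0, frame.width - insets.left - insets.right),
    height: Math.max(0, frame.height - insets.top - insets.bottom),
  };
  const centerX = safe.x + safe.width / 2;
  const centerY = safe.y + safe.height / 2;
  const vector = DIRECTION_VECTORS[direction];
  const halfX = (vector.x * safe.width * distance) / 2;
  const halfY = (vector.y * safe.height * distance) / 2;
  return {
    start: { x: centerX - halfX, y: centerY - halfY },
    end: { x: centerX + halfX, y: centerY + halfY },
  };
}
```

For example, an upward swipe at `distance: 0.5` on a 400×800 frame with a 60-point top inset runs from y = 615 to y = 245, staying below the inset.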
CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ When reading issues:
 -
 ## Tools
 - GitHub CLI for issues/PRs
+- MCP `readOnlyHint` describes whether a tool mutates host/project state such as files, build artifacts, configuration, or external services. Simulator HID/UI actions that only tap, type, press, or gesture inside the simulator may remain `readOnlyHint: true`; do not flip them to `false` merely because app UI state changes.
 - CLI design note: do not rely on CLI session-default writes. CLI is intentionally deterministic for CI/scripting and should use explicit command arguments as the primary input surface.
 - When working on skill sources in `skills/`, use the `skill-creator` skill workflow.
 - After modifying any skill source, run `npx skill-check <skill-directory>` and address all errors/warnings before handoff.
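The `readOnlyHint` rule added here (and in AGENTS.md) reduces to one decision: only host/project mutations clear the hint. A hypothetical sketch of that decision follows; `readOnlyHint` is a standard MCP tool annotation, but the `ToolEffect` categories and `readOnlyHintFor` are invented for illustration and are not part of XcodeBuildMCP's manifest schema.

```typescript
// Hypothetical effect categories for illustration only; they are not part of
// XcodeBuildMCP's tool manifest format.
type ToolEffect =
  | "files"
  | "build-artifacts"
  | "configuration"
  | "external-service"
  | "simulator-hid"; // tap, type, press, or gesture inside the simulator

// Effects that change host/project state and therefore clear readOnlyHint.
const HOST_MUTATING_EFFECTS: ReadonlySet<ToolEffect> = new Set<ToolEffect>([
  "files",
  "build-artifacts",
  "configuration",
  "external-service",
]);

// readOnlyHint stays true unless some effect mutates host/project state.
// Simulator HID actions do not count, even though they change app UI state.
function readOnlyHintFor(effects: readonly ToolEffect[]): boolean {
  return !effects.some((effect) => HOST_MUTATING_EFFECTS.has(effect));
}
```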

example_projects/.xcodebuildmcp/config.yaml

Lines changed: 0 additions & 2 deletions
@@ -4,13 +4,11 @@ sessionDefaultsProfiles:
 workspacePath: ./iOS_Calculator/CalculatorApp.xcworkspace
 scheme: CalculatorApp
 simulatorName: iPhone 17 Pro
-simulatorId: B38FE93D-578B-454B-BE9A-C6FA0CE5F096
 simulatorPlatform: iOS Simulator
 ios-test:
 projectPath: ./iOS/MCPTest.xcodeproj
 scheme: MCPTest
 simulatorName: iPhone 17 Pro
-simulatorId: B38FE93D-578B-454B-BE9A-C6FA0CE5F096
 simulatorPlatform: iOS Simulator
 macos-test:
 projectPath: ./macOS/MCPTest.xcodeproj
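These profiles now keep only `simulatorName`, and the changelog notes that CLI commands with `simulatorId`-only contracts resolve name defaults to a simulator ID. A minimal sketch of such a lookup, assuming the JSON printed by `xcrun simctl list devices --json` (devices grouped by runtime identifier, each with `udid`, `name`, `state`, and `isAvailable`); `resolveSimulatorId` is a hypothetical name, and the booted-first tie-break is an assumption rather than XcodeBuildMCP's documented behavior.

```typescript
// Hypothetical sketch: resolve a simulatorName session default to a UDID
// from `xcrun simctl list devices --json` output.
interface SimctlDevice {
  udid: string;
  name: string;
  state: string; // for example "Booted" or "Shutdown"
  isAvailable?: boolean;
}

interface SimctlDeviceList {
  devices: Record<string, SimctlDevice[]>; // keyed by runtime identifier
}

function resolveSimulatorId(simctlJson: string, simulatorName: string): string {
  const list = JSON.parse(simctlJson) as SimctlDeviceList;
  const candidates: SimctlDevice[] = [];
  for (const runtime of Object.keys(list.devices)) {
    for (const device of list.devices[runtime]) {
      if (device.name === simulatorName && device.isAvailable !== false) {
        candidates.push(device);
      }
    }
  }
  if (candidates.length === 0) {
    throw new Error(`No available simulator named "${simulatorName}"`);
  }
  // Assumption: prefer an already-booted device when names collide.
  const booted = candidates.find((device) => device.state === "Booted");
  return (booted ?? candidates[0]).udid;
}
```

Dropping the hard-coded `simulatorId` from shared config avoids pinning every contributor to one machine's UDID; resolving by name at call time keeps the profile portable.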

example_projects/Weather/.xcodebuildmcp/config.yaml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ sentryDisabled: false
 sessionDefaults:
 projectPath: Weather.xcodeproj
 scheme: Weather
-simulatorName: iPhone 17 Pro
+simulatorName: iPhone 17 Pro Max
 setupPreferences:
 platforms:
 - iOS
- iOS

example_projects/Weather/README.md

Lines changed: 3 additions & 11 deletions
@@ -2,22 +2,14 @@
 
 Atmos Weather is a native SwiftUI weather app prototype for iOS.
 
-## Launch with mock weather data
+## Launch
 
-Build and run the app with XcodeBuildMCP first:
+Build and run the app with XcodeBuildMCP:
 
 ```bash
 ../../build/cli.js simulator build-and-run
 ```
 
-Then relaunch the installed app with the mock API argument:
-
-```bash
-../../build/cli.js simulator launch-app \
-  --bundle-id com.sentry.weather.Weather \
-  --args=--mock-weather-api
-```
-
 ## JSON fixtures
 
 Fixture JSON files live in:
@@ -98,4 +90,4 @@ Run the app test suite through XcodeBuildMCP:
 ../../build/cli.js simulator test
 ```
 
-UI tests inject `--mock-weather-api` themselves so they do not depend on the production API endpoint.
+The app uses bundled deterministic weather data so UI tests do not depend on the production API endpoint.

example_projects/Weather/Weather/Services/MockWeatherAPIClient.swift

Lines changed: 4 additions & 2 deletions
@@ -29,8 +29,10 @@ struct MockWeatherAPIClient: WeatherAPIClient, Sendable {
 guard !trimmed.isEmpty else { return [] }
 
 let needle = trimmed.localizedLowercase
-return fixtures.searchPool.filter { location in
-location.name.localizedLowercase.contains(needle)
+var seenLocationIDs = Set<WeatherLocation.ID>()
+return (fixtures.locations + fixtures.searchPool).filter { location in
+guard seenLocationIDs.insert(location.id).inserted else { return false }
+return location.name.localizedLowercase.contains(needle)
 || location.subtitle.localizedLowercase.contains(needle)
 || (location.country?.localizedLowercase.contains(needle) ?? false)
 }
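The Swift change above de-duplicates by `id` while filtering, relying on `Set.insert(_:).inserted`. For comparison, the same order-preserving de-dup in TypeScript (the repository's main language) is sketched below; `WeatherLocation` is a simplified stand-in for the Swift type and `searchLocations` is a hypothetical name. As in the Swift version, an id is marked as seen before the text match is checked.

```typescript
// Simplified stand-in for the Swift WeatherLocation type (assumption).
interface WeatherLocation {
  id: string;
  name: string;
  subtitle: string;
  country?: string;
}

// Search saved locations first, then the search pool, keeping only the
// first occurrence of each id so a saved row is never duplicated.
function searchLocations(
  saved: WeatherLocation[],
  pool: WeatherLocation[],
  query: string,
): WeatherLocation[] {
  const needle = query.trim().toLocaleLowerCase();
  if (needle === "") return [];
  const seen = new Set<string>();
  return saved.concat(pool).filter((location) => {
    // JS Set.add returns the set, not an "inserted" flag, so check first.
    if (seen.has(location.id)) return false;
    seen.add(location.id);
    return (
      location.name.toLocaleLowerCase().includes(needle) ||
      location.subtitle.toLocaleLowerCase().includes(needle) ||
      (location.country?.toLocaleLowerCase().includes(needle) ?? false)
    );
  });
}
```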

example_projects/Weather/Weather/Views/Overlays/LocationPickerView.swift

Lines changed: 8 additions & 1 deletion
@@ -103,7 +103,7 @@ struct LocationPickerView: View {
 }
 
 private var currentLocationButton: some View {
-Button(action: {}) {
+Button(action: selectCurrentLocation) {
 HStack(spacing: 12) {
 Image(systemName: "location.fill")
 .font(.system(size: 14))
@@ -145,6 +145,7 @@ struct LocationPickerView: View {
 onSelect: { select(location) },
 onRemove: { remove(location) }
 )
+.id("saved-\(location.id)-\(isEditing)")
 }
 } else if isLoading {
 ForEach(0..<3, id: \.self) { _ in SearchSkeletonRow() }
@@ -160,6 +161,7 @@ struct LocationPickerView: View {
 onPreview: { preview(location) },
 onAdd: { add(location) }
 )
+.id("search-\(location.id)-\(isSaved(location))-\(justAddedID == location.id)")
 }
 }
 }
@@ -229,6 +231,11 @@ struct LocationPickerView: View {
 justAddedID = location.id
 }
 
+private func selectCurrentLocation() {
+guard let currentLocation = savedLocations.first else { return }
+select(currentLocation)
+}
+
 private func clearAddedIndicator() async {
 guard let id = justAddedID else { return }
 try? await Task.sleep(for: .milliseconds(1_400))

example_projects/Weather/Weather/Views/Overlays/LocationRows.swift

Lines changed: 1 addition & 0 deletions
@@ -96,6 +96,7 @@ struct SearchLocationRow: View {
 .frame(maxWidth: .infinity, alignment: .leading)
 }
 .buttonStyle(.plain)
+.accessibilityValue(saved || added ? "saved" : "not saved")
 
 VStack(alignment: .trailing, spacing: 3) {
 Text(WeatherUnitFormatter.temperatureString(location.temperatureC, units: units))
