
Commit 35f54a8

fix: resolve Maestro taps from regular snapshots (#709)
* fix: resolve Maestro taps from regular snapshots
* fix: tighten Maestro snapshot compatibility
* test: align Maestro snapshot freshness fixtures
1 parent 2014cb6 commit 35f54a8

28 files changed

Lines changed: 1014 additions & 463 deletions

AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -183,6 +183,7 @@ Command-only flags (like `find --first`) that do not flow to the platform layer
 ## React Native Verification
 - After changing runtime code exercised through `bin/agent-device.mjs` or the daemon, run `pnpm build` and `pnpm clean:daemon` before manual device verification so snapshots use current `dist` output.
 - For Android RN/Expo/dev-client apps connected to any local Metro port, `adb reverse tcp:<port> tcp:<port>` is harmless and should be run before opening the app or URL on the emulator/device.
+- In sandboxed agent environments, run manual `agent-device` CLI verification that starts the daemon outside the sandbox with escalation. The daemon binds localhost, and sandboxed runs can fail before any product code executes with `listen EPERM: operation not permitted 127.0.0.1` or repeated `Failed to start daemon`/metadata cleanup messages. Do not spend time debugging those as agent-device regressions; rerun the same command with escalation. Unit tests, typecheck, lint, and build can stay sandboxed unless they need platform devices or network/listener access.
 
 ## Manual Device Session Hygiene
 - Treat every manually opened `agent-device` session as a resource that must be closed, including exploratory sessions and failed verification attempts.

docs/adr/0004-ios-snapshot-backend-strategy.md

Lines changed: 29 additions & 15 deletions
@@ -7,10 +7,13 @@ Accepted
 ## Context
 
 Agent Device exposes iOS UI state through snapshots produced by the long-lived XCTest runner. The
-runner has two different snapshot needs:
+runner has three different snapshot needs:
 
-- rich diagnostics and selector disambiguation, where a recursive XCTest snapshot is useful because
-it preserves hierarchy, static text, wrappers, scroll containers, and ancestry;
+- agent-facing regular context, where the important contract is the effective user-visible UI,
+fixed controls such as tab bars, and scroll-hidden hints for content outside visible scroll
+containers;
+- rich diagnostics and selector disambiguation, where a raw recursive XCTest snapshot is useful
+because it preserves hierarchy, static text, wrappers, scroll containers, and ancestry;
 - agent-facing compact interactive context, where the important contract is fast, bounded discovery
 of visible controls and stable refs for the next action.
 
1619

@@ -31,13 +34,22 @@ predictable.
 Keep XCTest as the default iOS automation runner and split iOS snapshot capture into explicit
 strategies:
 
-- **Full tree strategy**: use recursive XCTest snapshots for normal/full snapshots, raw snapshots,
-diagnostics, and cases that need hierarchy. If XCTest reports a real AX serialization failure,
-preserve that error instead of pretending the UI is empty.
+- **Regular visible strategy**: use recursive XCTest snapshots, but emit only the effective
+user-visible tree plus visible ancestors and scroll-hidden hints. A node inside a scroll
+container is user-visible only when it intersects both the app viewport and the nearest visible
+scroll container. Offscreen descendants should be visited to set `hiddenContentAbove` /
+`hiddenContentBelow`, not emitted as normal visible nodes. This strategy must not use an
+arbitrary node-count cutoff: fixed controls that are later in traversal order, such as bottom tab
+bars after long lists, are part of the visible UI contract.
+- **Raw diagnostic strategy**: use recursive XCTest snapshots for raw snapshots, diagnostics, and
+cases that need hierarchy. Raw output is allowed to be noisy and large; if the transport cannot
+carry the response, fail explicitly instead of silently truncating the tree at a hard node count.
+If XCTest reports a real AX serialization failure, preserve that error instead of pretending the
+UI is empty.
 - **Compact interactive strategy**: for `snapshot -i -c`, use a bounded flat XCTest query strategy
 that avoids recursive root snapshots and app/window property reads. It should prefer fast,
 one-screen actionability over hierarchy fidelity and should return a sparse root quickly when
-XCTest cannot enumerate controls.
+XCTest cannot enumerate controls. Its bound is time-based, not a hidden fixed node budget.
 - **Future simulator AX-service strategy**: treat Bluesky-class failures as evidence that XCTest is
 not a complete semantic snapshot backend. A robust semantic fix should add a host-side simulator
 accessibility backend, similar in role to `idb` accessibility commands or Argent's `ax-service`,
@@ -62,18 +74,20 @@ avoid those app/window reads.
 
 ## Consequences
 
-Compact interactive snapshots are allowed to be less complete than full snapshots, but they must be
-bounded and honest. They should never block for the full daemon snapshot timeout because one app has
-a pathological AX tree.
+Compact interactive snapshots are allowed to be less complete than regular or raw snapshots, but
+they must be bounded and honest. They should never block for the full daemon snapshot timeout
+because one app has a pathological AX tree.
 
-Full snapshots remain the right tool when hierarchy matters. They may still fail loudly on
-XCTest-broken trees; that failure is useful because retrying the same recursive capture is unlikely
-to reveal a different tree.
+Regular snapshots remain the right tool for agents and Maestro compatibility because they describe
+what a user can currently perceive and interact with. Raw snapshots remain the right tool when
+hierarchy matters. Both may still fail loudly on XCTest-broken trees; that failure is useful because
+retrying the same recursive capture is unlikely to reveal a different tree.
 
 A future AX-service backend is the correct place to regain Bluesky-class semantic coverage. It
 should be added as a platform backend with its own lifecycle, protocol, normalization, timing
 metrics, and fallback rules, not as another special case inside the XCTest runner.
 
 When adding new iOS snapshot behavior, maintainers should first decide which strategy owns it. If a
-change tries to make compact snapshots rich by reintroducing recursive snapshots, or tries to make
-full snapshots fast by hiding XCTest failures, it is probably crossing strategy boundaries.
+change tries to make compact snapshots rich by reintroducing recursive snapshots, tries to make
+regular snapshots fast by dropping visible controls behind a node budget, or tries to make raw
+snapshots safe by silently truncating, it is probably crossing strategy boundaries.
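
Reviewer note: the visibility rule in the ADR's "Regular visible strategy" can be sketched roughly as follows. This is a minimal illustration, not the code in this commit. The `AxNode`, `VisibleNode`, `ScrollContext`, and `project` names are hypothetical, and only `hiddenContentAbove` / `hiddenContentBelow` come from the ADR text. The sketch assumes axis-aligned frames in one shared coordinate space and only handles vertical scroll hints.

```typescript
// Hypothetical shapes for illustration only; not the project's real types.
interface Rect { x: number; y: number; width: number; height: number }

interface AxNode {
  label: string;
  rect: Rect;
  scrollable?: boolean;
  children?: AxNode[];
}

interface VisibleNode {
  label: string;
  hiddenContentAbove?: boolean;
  hiddenContentBelow?: boolean;
  children: VisibleNode[];
}

// Nearest visible scroll container plus the output node that receives hints.
interface ScrollContext { rect: Rect; out: VisibleNode }

// True when the two rectangles overlap with non-zero area.
function intersects(a: Rect, b: Rect): boolean {
  return a.x < b.x + b.width && b.x < a.x + a.width &&
    a.y < b.y + b.height && b.y < a.y + a.height;
}

// Walks the whole tree (no node-count cutoff) and returns only the
// user-visible projection, or null when nothing in the subtree is visible.
function project(node: AxNode, viewport: Rect, scroll?: ScrollContext): VisibleNode | null {
  const visible = intersects(node.rect, viewport) &&
    (!scroll || intersects(node.rect, scroll.rect));

  if (!visible && scroll) {
    // Offscreen content is still visited, but it only contributes a hint.
    if (node.rect.y + node.rect.height <= scroll.rect.y) {
      scroll.out.hiddenContentAbove = true;
    } else if (node.rect.y >= scroll.rect.y + scroll.rect.height) {
      scroll.out.hiddenContentBelow = true;
    }
  }

  const out: VisibleNode = { label: node.label, children: [] };
  const childScroll = visible && node.scrollable ? { rect: node.rect, out } : scroll;
  for (const child of node.children ?? []) {
    const projected = project(child, viewport, childScroll);
    if (projected) out.children.push(projected);
  }
  // Keep visible nodes and the ancestors of visible nodes.
  return visible || out.children.length > 0 ? out : null;
}

// Usage: a list with rows above and below the fold, followed by a tab bar
// that comes later in traversal order.
const viewport: Rect = { x: 0, y: 0, width: 100, height: 200 };
const rows: AxNode[] = [-50, 0, 50, 100, 150, 200].map((y, i) => ({
  label: `row ${i}`,
  rect: { x: 0, y, width: 100, height: 50 },
}));
const screen: AxNode = {
  label: "root",
  rect: viewport,
  children: [
    { label: "list", rect: { x: 0, y: 0, width: 100, height: 150 }, scrollable: true, children: rows },
    { label: "tab bar", rect: { x: 0, y: 150, width: 100, height: 50 } },
  ],
};
const snapshot = project(screen, viewport);
console.log(JSON.stringify(snapshot, null, 2));
```

In this example the list keeps only the three rows that overlap its frame and carries both hidden-content hints. The tab bar is still emitted even though it follows every list row in traversal order, which is the case a fixed node budget would drop.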
