Skip to content

Commit 57cd3f3

Browse files
authored
perf: recover iOS runner responses by status (#661)
* perf: recover ios runner responses by status * docs: plan ios runner protocol optimizations * fix: clarify ios runner recovery errors * fix: retry read-only in-flight runner commands * refactor: split ios runner status recovery handling
1 parent daaf71a commit 57cd3f3

6 files changed

Lines changed: 532 additions & 17 deletions

File tree

Lines changed: 144 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,144 @@
1+
# iOS runner protocol optimization plan
2+
3+
Issue #656 is now split into protocol infrastructure plus follow-up optimizations. The lifecycle
4+
protocol makes commands identifiable, but the performance wins come from changing when the daemon
5+
uses `uptime`, retries, invalidates sessions, and asks the runner for lifecycle status.
6+
7+
## Work slices
8+
9+
### 1. Status-before-invalidate recovery
10+
11+
Status: in progress on `codex/ios-runner-status-recovery`.
12+
13+
Goal: when a command has been sent and the HTTP response is lost, ask the runner for
14+
`status(statusCommandId)` before invalidating the session or surfacing an ambiguous transport
15+
failure.
16+
17+
Acceptance criteria:
18+
19+
- Post-send retryable transport failures issue one bounded `status` probe with the original
20+
`commandId` before session invalidation.
21+
- `completed` with retained small response JSON returns the recovered command result without
22+
invalidating or resending the command.
23+
- `failed` returns the runner failure code/message/hint instead of a generic transport failure.
24+
- `notAccepted`, status timeout, or status transport failure preserves the existing invalidation
25+
behavior.
26+
- Read-only commands whose response was not retained keep the existing retry behavior.
27+
- Status recovery probes are short-budget and do not consume the full command timeout.
28+
29+
iOS simulator validation:
30+
31+
- Unit: `pnpm exec vitest run src/platforms/ios/__tests__/runner-command-retry.test.ts`.
32+
- Unit bundle: `pnpm exec vitest run src/platforms/ios/__tests__/runner-client.test.ts src/platforms/ios/__tests__/runner-session.test.ts src/platforms/ios/__tests__/runner-command-retry.test.ts src/platforms/ios/__tests__/runner-provider.test.ts`.
33+
- Build: `pnpm build:xcuitest`.
34+
- Manual sim smoke after build:
35+
- `pnpm build`
36+
- `pnpm clean:daemon`
37+
- run a simple iOS simulator session against Settings with `open`, `snapshot -i`, one selector
38+
interaction, and `close`.
39+
- confirm there is no visible behavior change and diagnostics show no unexpected session
40+
invalidation.
41+
42+
### 2. Adaptive `uptime` preflight policy
43+
44+
Goal: stop paying eager `uptime` before low-risk mutating commands when the runner has recently
45+
completed a command, relying on status-before-invalidate recovery for the rare ambiguous transport
46+
failure.
47+
48+
Acceptance criteria:
49+
50+
- Existing first-command/startup readiness behavior is preserved.
51+
- Existing failed-preflight stale-session recovery is preserved.
52+
- Repeated hot interactions skip `uptime` when the runner has a recent successful response.
53+
- Commands that still need conservative readiness checks remain preflighted until measured.
54+
- A transport failure after skipping preflight runs status recovery before invalidation.
55+
- Diagnostics expose whether a command used, skipped, or recovered from a readiness preflight.
56+
57+
iOS simulator validation:
58+
59+
- Start a fresh simulator session and run one interaction: verify the first mutating command still
60+
preflights.
61+
- Run a hot loop of repeated selector interactions against the same visible control: verify only
62+
the first command pays `uptime`, subsequent commands emit `ios_runner_readiness_preflight_skipped`,
63+
and the UI still responds correctly.
64+
- Compare median command latency for a hot interaction loop before and after the change. A useful
65+
threshold is at least one fewer runner request per hot command and no increase in failure rate.
66+
67+
### 3. Status-visible transport path
68+
69+
Goal: make `accepted` and `started` states practically observable while a command is still running.
70+
The Swift journal already records these states, but the runner currently serializes connection
71+
handling, so a concurrent status request can be blocked behind the command it is querying.
72+
73+
Acceptance criteria:
74+
75+
- `status` can be answered while another runner command is waiting on main-thread XCTest work.
76+
- The status path remains journal-only and does not touch app activation, XCTest dispatch, or
77+
command retry logic.
78+
- Long-running command status can report `accepted` or `started` before the command reaches a
79+
terminal state.
80+
- Existing command execution remains serial where mutation ordering matters.
81+
82+
iOS simulator validation:
83+
84+
- Run a deliberately long runner command in one request.
85+
- While it is in flight, query `status(statusCommandId)` from another request.
86+
- Verify status returns before the long command completes and reports `accepted` or `started`.
87+
- Verify normal command ordering is unchanged for back-to-back mutating commands.
88+
89+
### 4. Session invalidation reduction
90+
91+
Goal: avoid tearing down otherwise healthy runner sessions when lifecycle status proves the command
92+
completed or failed cleanly.
93+
94+
Acceptance criteria:
95+
96+
- Completed/failed lifecycle status suppresses invalidation for ambiguous post-send transport
97+
errors when the runner remains reachable.
98+
- Unknown status states still invalidate to preserve current safety.
99+
- Diagnostics record why invalidation was skipped or retained.
100+
- No command is replayed after an observed mutating `accepted`, `started`, `completed`, or `failed`
101+
state.
102+
103+
iOS simulator validation:
104+
105+
- Inject or simulate a lost response after a command completes.
106+
- Verify status recovery prevents runner restart.
107+
- Run the next command in the same session and verify it succeeds without re-launching xcodebuild.
108+
109+
### 5. Response retention tuning
110+
111+
Goal: retain enough small command results for useful recovery without making the runner retain large
112+
snapshots or binary-like payloads.
113+
114+
Acceptance criteria:
115+
116+
- Small scalar responses can be recovered from `lifecycleResponseJson`.
117+
- Snapshot node trees and screenshots are not serialized or retained in the journal.
118+
- The journal memory cap remains bounded by entry count and response JSON size.
119+
- Retention policy is documented in tests or runner fixtures so future commands do not accidentally
120+
store large payloads.
121+
122+
iOS simulator validation:
123+
124+
- Run small-result commands and verify status can recover retained JSON.
125+
- Run snapshot-heavy commands and verify status reports terminal state without retained response JSON.
126+
- Confirm the runner remains responsive after repeated snapshots.
127+
128+
## Suggested ordering
129+
130+
1. Land status-before-invalidate recovery first. It is the safety net needed before reducing
131+
defensive preflights.
132+
2. Add diagnostics/metrics for preflight use, skipped preflights, status recovery, and invalidation
133+
reason. This can happen alongside slice 1 or 2.
134+
3. Reduce `uptime` for hot interaction loops with a conservative command allowlist.
135+
4. Make the status transport path observable during long-running commands.
136+
5. Broaden the preflight policy only after simulator measurements show stable behavior.
137+
138+
## Side-by-side work
139+
140+
- Status recovery and diagnostics can be developed together or separately.
141+
- Transport status visibility can proceed independently once the protocol is on `main`.
142+
- Adaptive `uptime` should wait for status recovery, because it relies on the same recovery path for
143+
ambiguous post-send failures.
144+
- Response retention tuning can proceed independently as long as it preserves the current caps.

src/compat/maestro/__tests__/runtime-assertions.test.ts

Lines changed: 2 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -123,9 +123,7 @@ test('invokeMaestroAssertVisible does not dismiss React Native overlays during n
123123
});
124124

125125
assert.equal(response.ok, false);
126-
assert.deepEqual(calls, [
127-
['wait', ['Ready', '60000']],
128-
]);
126+
assert.deepEqual(calls, [['wait', ['Ready', '60000']]]);
129127
});
130128

131129
test('invokeMaestroAssertVisible uses snapshot resolution for short iOS assertions', async () => {
@@ -267,9 +265,7 @@ test('invokeMaestroAssertVisible fails fast when a RedBox has no dismiss target'
267265
if (!response.ok) {
268266
assert.match(response.error.message, /React Native overlay is covering app content/);
269267
}
270-
assert.deepEqual(calls, [
271-
['snapshot', []],
272-
]);
268+
assert.deepEqual(calls, [['snapshot', []]]);
273269
});
274270

275271
test('invokeMaestroAssertNotVisible passes after a slow hidden sample exhausts the timeout', async () => {

src/compat/maestro/runtime-assertions.ts

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -151,9 +151,7 @@ function handleFailedVisibleSample(
151151
args: MaestroVisibilityAssertionArgs,
152152
sample: Exclude<MaestroVisibilitySample, { visible: true }>,
153153
startedAt: number,
154-
):
155-
| { kind: 'continue' }
156-
| { kind: 'return'; response: DaemonResponse } {
154+
): { kind: 'continue' } | { kind: 'return'; response: DaemonResponse } {
157155
if (isReactNativeOverlayBlockingAssertion(sample.response)) {
158156
return { kind: 'return', response: sample.response };
159157
}

src/compat/maestro/runtime-interactions.ts

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -481,7 +481,6 @@ async function clickMaestroSnapshotTarget(
481481
};
482482
}
483483

484-
485484
async function invokeMaestroFuzzyTapOn(
486485
params: MaestroTapOnParams,
487486
query: string,

src/platforms/ios/__tests__/runner-command-retry.test.ts

Lines changed: 165 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -135,9 +135,9 @@ test('mutating commands do not restart or replay after command send failure', as
135135
const session = makeRunnerSession({ port: 8100, ready: true });
136136

137137
mockEnsureRunnerSession.mockResolvedValueOnce(session);
138-
mockExecuteRunnerCommandWithSession.mockRejectedValueOnce(
139-
new AppError('COMMAND_FAILED', 'fetch failed'),
140-
);
138+
mockExecuteRunnerCommandWithSession
139+
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
140+
.mockResolvedValueOnce({ lifecycleState: 'notAccepted' });
141141

142142
await assert.rejects(() =>
143143
runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
@@ -150,7 +150,165 @@ test('mutating commands do not restart or replay after command send failure', as
150150
'transport_error_after_command_send',
151151
]);
152152
assert.equal(mockStopRunnerSession.mock.calls.length, 0);
153-
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 1);
153+
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
154+
});
155+
156+
test('mutating commands recover cached responses before invalidating after command send failure', async () => {
157+
const session = makeRunnerSession({ port: 8100, ready: true });
158+
159+
mockEnsureRunnerSession.mockResolvedValueOnce(session);
160+
mockExecuteRunnerCommandWithSession
161+
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
162+
.mockResolvedValueOnce({
163+
lifecycleState: 'completed',
164+
lifecycleResponseJson: JSON.stringify({ ok: true, data: { message: 'tapped' } }),
165+
});
166+
167+
const result = await runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 });
168+
169+
assert.deepEqual(result, { message: 'tapped' });
170+
assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
171+
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
172+
const sentCommand = mockExecuteRunnerCommandWithSession.mock.calls[0]?.[2];
173+
const statusCommand = mockExecuteRunnerCommandWithSession.mock.calls[1]?.[2];
174+
assert.equal(statusCommand.command, 'status');
175+
assert.equal(statusCommand.statusCommandId, sentCommand.commandId);
176+
});
177+
178+
test('mutating commands keep invalidating when status cannot find the command', async () => {
179+
const session = makeRunnerSession({ port: 8100, ready: true });
180+
181+
mockEnsureRunnerSession.mockResolvedValueOnce(session);
182+
mockExecuteRunnerCommandWithSession
183+
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
184+
.mockResolvedValueOnce({
185+
lifecycleState: 'notAccepted',
186+
});
187+
188+
await assert.rejects(() =>
189+
runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
190+
);
191+
192+
assert.deepEqual(mockInvalidateRunnerSession.mock.calls, [
193+
[session, 'transport_error_after_command_send'],
194+
]);
195+
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
196+
});
197+
198+
test('read-only commands retry when completed status has no retained response', async () => {
199+
const session = makeRunnerSession({ port: 8100, ready: true });
200+
201+
mockEnsureRunnerSession.mockResolvedValue(session);
202+
mockExecuteRunnerCommandWithSession
203+
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
204+
.mockResolvedValueOnce({ lifecycleState: 'completed' })
205+
.mockResolvedValueOnce({ nodes: [], truncated: false });
206+
207+
const result = await runIosRunnerCommand(IOS_SIMULATOR, { command: 'snapshot' });
208+
209+
assert.deepEqual(result, { nodes: [], truncated: false });
210+
assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
211+
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 3);
212+
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls[1]?.[2].command, 'status');
213+
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls[2]?.[2].command, 'snapshot');
214+
});
215+
216+
test('read-only commands retry when status shows in-flight work', async () => {
217+
const session = makeRunnerSession({ port: 8100, ready: true });
218+
219+
mockEnsureRunnerSession.mockResolvedValue(session);
220+
mockExecuteRunnerCommandWithSession
221+
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
222+
.mockResolvedValueOnce({ lifecycleState: 'started' })
223+
.mockResolvedValueOnce({ nodes: [], truncated: false });
224+
225+
const result = await runIosRunnerCommand(IOS_SIMULATOR, { command: 'snapshot' });
226+
227+
assert.deepEqual(result, { nodes: [], truncated: false });
228+
assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
229+
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 3);
230+
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls[1]?.[2].command, 'status');
231+
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls[2]?.[2].command, 'snapshot');
232+
});
233+
234+
test('mutating commands report recovery guidance when completed status has no retained response', async () => {
235+
const session = makeRunnerSession({ port: 8100, ready: true });
236+
237+
mockEnsureRunnerSession.mockResolvedValueOnce(session);
238+
mockExecuteRunnerCommandWithSession
239+
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
240+
.mockResolvedValueOnce({ lifecycleState: 'completed' });
241+
242+
await assert.rejects(
243+
() => runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
244+
(error: unknown) => {
245+
assert.ok(error instanceof AppError);
246+
assert.match(error.message, /"tap" completed after the transport response was lost/);
247+
assert.equal(error.details?.recovery, 'completed_without_retained_response');
248+
assert.match(String(error.details?.hint), /will not replay/);
249+
assert.match(String(error.details?.hint), /snapshot -i/);
250+
assert.equal(error.details?.transportError, 'fetch failed');
251+
return true;
252+
},
253+
);
254+
255+
assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
256+
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
257+
});
258+
259+
test('mutating commands preserve runner failure details from status recovery', async () => {
260+
const session = makeRunnerSession({ port: 8100, ready: true });
261+
262+
mockEnsureRunnerSession.mockResolvedValueOnce(session);
263+
mockExecuteRunnerCommandWithSession
264+
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
265+
.mockResolvedValueOnce({
266+
lifecycleState: 'failed',
267+
lifecycleErrorCode: 'AMBIGUOUS_MATCH',
268+
lifecycleErrorMessage: 'Found 2 matching buttons',
269+
lifecycleErrorHint: 'Use a more specific selector.',
270+
});
271+
272+
await assert.rejects(
273+
() => runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
274+
(error: unknown) => {
275+
assert.ok(error instanceof AppError);
276+
assert.equal(error.code, 'AMBIGUOUS_MATCH');
277+
assert.equal(error.message, 'Found 2 matching buttons');
278+
assert.equal(error.details?.recovery, 'runner_reported_failure');
279+
assert.equal(error.details?.hint, 'Use a more specific selector.');
280+
assert.equal(error.details?.transportError, 'fetch failed');
281+
return true;
282+
},
283+
);
284+
285+
assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
286+
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
287+
});
288+
289+
test('mutating commands report wait-and-inspect guidance when status shows in-flight work', async () => {
290+
const session = makeRunnerSession({ port: 8100, ready: true });
291+
292+
mockEnsureRunnerSession.mockResolvedValueOnce(session);
293+
mockExecuteRunnerCommandWithSession
294+
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
295+
.mockResolvedValueOnce({ lifecycleState: 'started' });
296+
297+
await assert.rejects(
298+
() => runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
299+
(error: unknown) => {
300+
assert.ok(error instanceof AppError);
301+
assert.match(error.message, /"tap" is still started/);
302+
assert.equal(error.details?.recovery, 'command_still_in_flight');
303+
assert.match(String(error.details?.hint), /may still finish/);
304+
assert.match(String(error.details?.hint), /snapshot -i/);
305+
assert.equal(error.details?.transportError, 'fetch failed');
306+
return true;
307+
},
308+
);
309+
310+
assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
311+
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
154312
});
155313

156314
test('mutating commands invalidate the retry session without replaying again', async () => {
@@ -160,7 +318,8 @@ test('mutating commands invalidate the retry session without replaying again', a
160318
mockEnsureRunnerSession.mockResolvedValueOnce(staleSession).mockResolvedValueOnce(freshSession);
161319
mockExecuteRunnerCommandWithSession
162320
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'Runner did not accept connection'))
163-
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'));
321+
.mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
322+
.mockResolvedValueOnce({ lifecycleState: 'notAccepted' });
164323

165324
await assert.rejects(() =>
166325
runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
@@ -171,7 +330,7 @@ test('mutating commands invalidate the retry session without replaying again', a
171330
[staleSession, 'runner_connect_failed_before_command_send'],
172331
[freshSession, 'transport_error_after_retry_command_send'],
173332
]);
174-
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
333+
assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 3);
175334
});
176335

177336
function makeRunnerSession(overrides: Partial<RunnerSession> = {}): RunnerSession {

0 commit comments

Comments
 (0)