perf: recover iOS runner responses by status (#661)

thymikee · web-flow · commit 57cd3f3a07c9 · 2026-06-01T21:26:15.000-05:00
* perf: recover ios runner responses by status

* docs: plan ios runner protocol optimizations

* fix: clarify ios runner recovery errors

* fix: retry read-only in-flight runner commands

* refactor: split ios runner status recovery handling
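
The recovery behavior exercised by the tests in `runner-command-retry.test.ts` can be summarized as a small decision table. The sketch below is illustrative only: `decideRecovery`, `StatusProbeResult`, and `RecoveryDecision` are hypothetical names and are not taken from `runner-client.ts`. Only the `lifecycle*` field names and the `recovery` strings come from the tests in this change.

```typescript
// Hypothetical sketch of the status-before-invalidate decision.
// Type and function names are assumptions made for illustration.
type LifecycleState = 'notAccepted' | 'accepted' | 'started' | 'completed' | 'failed';

interface StatusProbeResult {
  lifecycleState: LifecycleState;
  lifecycleResponseJson?: string;
  lifecycleErrorCode?: string;
  lifecycleErrorMessage?: string;
  lifecycleErrorHint?: string;
}

type RecoveryDecision =
  | { kind: 'return'; data: unknown }
  | { kind: 'retry' }
  | {
      kind: 'fail';
      recovery: string;
      code?: string | undefined;
      message?: string | undefined;
      hint?: string | undefined;
    }
  | { kind: 'invalidate' };

function decideRecovery(
  status: StatusProbeResult | undefined,
  readOnly: boolean,
): RecoveryDecision {
  // A status timeout or status transport failure yields no result:
  // keep the existing invalidation behavior.
  if (status === undefined) return { kind: 'invalidate' };

  switch (status.lifecycleState) {
    case 'completed':
      if (status.lifecycleResponseJson !== undefined) {
        // The retained response wraps the command result in `data`.
        const parsed = JSON.parse(status.lifecycleResponseJson) as { data?: unknown };
        return { kind: 'return', data: parsed.data };
      }
      // Nothing retained: read-only commands may be resent, mutating ones may not.
      return readOnly
        ? { kind: 'retry' }
        : { kind: 'fail', recovery: 'completed_without_retained_response' };
    case 'failed':
      // Surface the runner's own failure instead of a generic transport error.
      return {
        kind: 'fail',
        recovery: 'runner_reported_failure',
        code: status.lifecycleErrorCode,
        message: status.lifecycleErrorMessage,
        hint: status.lifecycleErrorHint,
      };
    case 'accepted':
    case 'started':
      // In-flight work: never replay a mutating command.
      return readOnly ? { kind: 'retry' } : { kind: 'fail', recovery: 'command_still_in_flight' };
    default:
      // `notAccepted` and any unknown state keep the existing invalidation.
      return { kind: 'invalidate' };
  }
}
```

In this sketch only the `invalidate` outcome tears down the session, which matches the tests asserting that `mockInvalidateRunnerSession` is not called for the `completed`, `failed`, and `started` cases.
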
diff --git a/docs/ios-runner-protocol-optimizations.md b/docs/ios-runner-protocol-optimizations.md
@@ -0,0 +1,144 @@
+# iOS runner protocol optimization plan
+
+Issue #656 is now split into protocol infrastructure plus follow-up optimizations. The lifecycle
+protocol makes commands identifiable, but the performance wins come from changing when the daemon
+uses `uptime`, retries, invalidates sessions, and asks the runner for lifecycle status.
+
+## Work slices
+
+### 1. Status-before-invalidate recovery
+
+Status: in progress on `codex/ios-runner-status-recovery`.
+
+Goal: when a command has been sent and the HTTP response is lost, ask the runner for
+`status(statusCommandId)` before invalidating the session or surfacing an ambiguous transport
+failure.
+
+Acceptance criteria:
+
+- Post-send retryable transport failures issue one bounded `status` probe with the original
+  `commandId` before session invalidation.
+- `completed` with retained small response JSON returns the recovered command result without
+  invalidating or resending the command.
+- `failed` returns the runner failure code/message/hint instead of a generic transport failure.
+- `notAccepted`, status timeout, or status transport failure preserves the existing invalidation
+  behavior.
+- Read-only commands whose response was not retained keep the existing retry behavior.
+- Status recovery probes are short-budget and do not consume the full command timeout.
+
+iOS simulator validation:
+
+- Unit: `pnpm exec vitest run src/platforms/ios/__tests__/runner-command-retry.test.ts`.
+- Unit bundle: `pnpm exec vitest run src/platforms/ios/__tests__/runner-client.test.ts src/platforms/ios/__tests__/runner-session.test.ts src/platforms/ios/__tests__/runner-command-retry.test.ts src/platforms/ios/__tests__/runner-provider.test.ts`.
+- Build: `pnpm build:xcuitest`.
+- Manual sim smoke after build:
+  - `pnpm build`
+  - `pnpm clean:daemon`
+  - run a simple iOS simulator session against Settings with `open`, `snapshot -i`, one selector
+    interaction, and `close`.
+  - confirm there is no visible behavior change and diagnostics show no unexpected session
+    invalidation.
+
+### 2. Adaptive `uptime` preflight policy
+
+Goal: stop paying for an eager `uptime` preflight before low-risk mutating commands when the
+runner has recently completed a command, relying on status-before-invalidate recovery for the rare
+ambiguous transport failure.
+
+Acceptance criteria:
+
+- Existing first-command/startup readiness behavior is preserved.
+- Existing failed-preflight stale-session recovery is preserved.
+- Repeated hot interactions skip `uptime` when the runner has a recent successful response.
+- Commands that still need conservative readiness checks remain preflighted until measured.
+- A transport failure after skipping preflight runs status recovery before invalidation.
+- Diagnostics expose whether a command used, skipped, or recovered from a readiness preflight.
+
+iOS simulator validation:
+
+- Start a fresh simulator session and run one interaction: verify the first mutating command still
+  preflights.
+- Run a hot loop of repeated selector interactions against the same visible control: verify only
+  the first command pays `uptime`, subsequent commands emit `ios_runner_readiness_preflight_skipped`,
+  and the UI still responds correctly.
+- Compare median command latency for a hot interaction loop before and after the change. A useful
+  threshold is at least one fewer runner request per hot command and no increase in failure rate.
+
+### 3. Status-visible transport path
+
+Goal: make `accepted` and `started` states practically observable while a command is still running.
+The Swift journal already records these states, but the runner currently serializes connection
+handling, so a concurrent status request can be blocked behind the command it is querying.
+
+Acceptance criteria:
+
+- `status` can be answered while another runner command is waiting on main-thread XCTest work.
+- The status path remains journal-only and does not touch app activation, XCTest dispatch, or
+  command retry logic.
+- Long-running command status can report `accepted` or `started` before the command reaches a
+  terminal state.
+- Existing command execution remains serial where mutation ordering matters.
+
+iOS simulator validation:
+
+- Run a deliberately long runner command in one request.
+- While it is in flight, query `status(statusCommandId)` from another request.
+- Verify status returns before the long command completes and reports `accepted` or `started`.
+- Verify normal command ordering is unchanged for back-to-back mutating commands.
+
+### 4. Session invalidation reduction
+
+Goal: avoid tearing down otherwise healthy runner sessions when lifecycle status proves the command
+completed or failed cleanly.
+
+Acceptance criteria:
+
+- Completed/failed lifecycle status suppresses invalidation for ambiguous post-send transport
+  errors when the runner remains reachable.
+- Unknown status states still invalidate to preserve current safety.
+- Diagnostics record why invalidation was skipped or retained.
+- No command is replayed after an observed mutating `accepted`, `started`, `completed`, or `failed`
+  state.
+
+iOS simulator validation:
+
+- Inject or simulate a lost response after a command completes.
+- Verify status recovery prevents runner restart.
+- Run the next command in the same session and verify it succeeds without re-launching xcodebuild.
+
+### 5. Response retention tuning
+
+Goal: retain enough small command results for useful recovery without making the runner retain large
+snapshots or binary-like payloads.
+
+Acceptance criteria:
+
+- Small scalar responses can be recovered from `lifecycleResponseJson`.
+- Snapshot node trees and screenshots are not serialized or retained in the journal.
+- The journal memory cap remains bounded by entry count and response JSON size.
+- Retention policy is documented in tests or runner fixtures so future commands do not accidentally
+  store large payloads.
+
+iOS simulator validation:
+
+- Run small-result commands and verify status can recover retained JSON.
+- Run snapshot-heavy commands and verify status reports terminal state without retained response JSON.
+- Confirm the runner remains responsive after repeated snapshots.
+
+## Suggested ordering
+
+1. Land status-before-invalidate recovery first. It is the safety net needed before reducing
+   defensive preflights.
+2. Add diagnostics/metrics for preflight use, skipped preflights, status recovery, and invalidation
+   reason. This can happen alongside slice 1 or 2.
+3. Reduce `uptime` preflights for hot interaction loops with a conservative command allowlist.
+4. Make the status transport path observable during long-running commands.
+5. Broaden the preflight policy only after simulator measurements show stable behavior.
+
+## Side-by-side work
+
+- Status recovery and diagnostics can be developed together or separately.
+- Transport status visibility can proceed independently once the protocol is on `main`.
+- Adaptive `uptime` should wait for status recovery, because it relies on the same recovery path for
+  ambiguous post-send failures.
+- Response retention tuning can proceed independently as long as it preserves the current caps.
diff --git a/src/compat/maestro/__tests__/runtime-assertions.test.ts b/src/compat/maestro/__tests__/runtime-assertions.test.ts
@@ -123,9 +123,7 @@ test('invokeMaestroAssertVisible does not dismiss React Native overlays during n
   });
 
   assert.equal(response.ok, false);
-  assert.deepEqual(calls, [
-    ['wait', ['Ready', '60000']],
-  ]);
+  assert.deepEqual(calls, [['wait', ['Ready', '60000']]]);
 });
 
 test('invokeMaestroAssertVisible uses snapshot resolution for short iOS assertions', async () => {
@@ -267,9 +265,7 @@ test('invokeMaestroAssertVisible fails fast when a RedBox has no dismiss target'
   if (!response.ok) {
     assert.match(response.error.message, /React Native overlay is covering app content/);
   }
-  assert.deepEqual(calls, [
-    ['snapshot', []],
-  ]);
+  assert.deepEqual(calls, [['snapshot', []]]);
 });
 
 test('invokeMaestroAssertNotVisible passes after a slow hidden sample exhausts the timeout', async () => {
diff --git a/src/compat/maestro/runtime-assertions.ts b/src/compat/maestro/runtime-assertions.ts
@@ -151,9 +151,7 @@ function handleFailedVisibleSample(
   args: MaestroVisibilityAssertionArgs,
   sample: Exclude<MaestroVisibilitySample, { visible: true }>,
   startedAt: number,
-):
-  | { kind: 'continue' }
-  | { kind: 'return'; response: DaemonResponse } {
+): { kind: 'continue' } | { kind: 'return'; response: DaemonResponse } {
   if (isReactNativeOverlayBlockingAssertion(sample.response)) {
     return { kind: 'return', response: sample.response };
   }
diff --git a/src/compat/maestro/runtime-interactions.ts b/src/compat/maestro/runtime-interactions.ts
@@ -481,7 +481,6 @@ async function clickMaestroSnapshotTarget(
   };
 }
 
-
 async function invokeMaestroFuzzyTapOn(
   params: MaestroTapOnParams,
   query: string,
diff --git a/src/platforms/ios/__tests__/runner-command-retry.test.ts b/src/platforms/ios/__tests__/runner-command-retry.test.ts
@@ -135,9 +135,9 @@ test('mutating commands do not restart or replay after command send failure', as
   const session = makeRunnerSession({ port: 8100, ready: true });
 
   mockEnsureRunnerSession.mockResolvedValueOnce(session);
-  mockExecuteRunnerCommandWithSession.mockRejectedValueOnce(
-    new AppError('COMMAND_FAILED', 'fetch failed'),
-  );
+  mockExecuteRunnerCommandWithSession
+    .mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
+    .mockResolvedValueOnce({ lifecycleState: 'notAccepted' });
 
   await assert.rejects(() =>
     runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
@@ -150,7 +150,165 @@ test('mutating commands do not restart or replay after command send failure', as
     'transport_error_after_command_send',
   ]);
   assert.equal(mockStopRunnerSession.mock.calls.length, 0);
-  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 1);
+  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
+});
+
+test('mutating commands recover cached responses before invalidating after command send failure', async () => {
+  const session = makeRunnerSession({ port: 8100, ready: true });
+
+  mockEnsureRunnerSession.mockResolvedValueOnce(session);
+  mockExecuteRunnerCommandWithSession
+    .mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
+    .mockResolvedValueOnce({
+      lifecycleState: 'completed',
+      lifecycleResponseJson: JSON.stringify({ ok: true, data: { message: 'tapped' } }),
+    });
+
+  const result = await runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 });
+
+  assert.deepEqual(result, { message: 'tapped' });
+  assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
+  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
+  const sentCommand = mockExecuteRunnerCommandWithSession.mock.calls[0]?.[2];
+  const statusCommand = mockExecuteRunnerCommandWithSession.mock.calls[1]?.[2];
+  assert.equal(statusCommand.command, 'status');
+  assert.equal(statusCommand.statusCommandId, sentCommand.commandId);
+});
+
+test('mutating commands keep invalidating when status cannot find the command', async () => {
+  const session = makeRunnerSession({ port: 8100, ready: true });
+
+  mockEnsureRunnerSession.mockResolvedValueOnce(session);
+  mockExecuteRunnerCommandWithSession
+    .mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
+    .mockResolvedValueOnce({
+      lifecycleState: 'notAccepted',
+    });
+
+  await assert.rejects(() =>
+    runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
+  );
+
+  assert.deepEqual(mockInvalidateRunnerSession.mock.calls, [
+    [session, 'transport_error_after_command_send'],
+  ]);
+  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
+});
+
+test('read-only commands retry when completed status has no retained response', async () => {
+  const session = makeRunnerSession({ port: 8100, ready: true });
+
+  mockEnsureRunnerSession.mockResolvedValue(session);
+  mockExecuteRunnerCommandWithSession
+    .mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
+    .mockResolvedValueOnce({ lifecycleState: 'completed' })
+    .mockResolvedValueOnce({ nodes: [], truncated: false });
+
+  const result = await runIosRunnerCommand(IOS_SIMULATOR, { command: 'snapshot' });
+
+  assert.deepEqual(result, { nodes: [], truncated: false });
+  assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
+  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 3);
+  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls[1]?.[2].command, 'status');
+  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls[2]?.[2].command, 'snapshot');
+});
+
+test('read-only commands retry when status shows in-flight work', async () => {
+  const session = makeRunnerSession({ port: 8100, ready: true });
+
+  mockEnsureRunnerSession.mockResolvedValue(session);
+  mockExecuteRunnerCommandWithSession
+    .mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
+    .mockResolvedValueOnce({ lifecycleState: 'started' })
+    .mockResolvedValueOnce({ nodes: [], truncated: false });
+
+  const result = await runIosRunnerCommand(IOS_SIMULATOR, { command: 'snapshot' });
+
+  assert.deepEqual(result, { nodes: [], truncated: false });
+  assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
+  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 3);
+  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls[1]?.[2].command, 'status');
+  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls[2]?.[2].command, 'snapshot');
+});
+
+test('mutating commands report recovery guidance when completed status has no retained response', async () => {
+  const session = makeRunnerSession({ port: 8100, ready: true });
+
+  mockEnsureRunnerSession.mockResolvedValueOnce(session);
+  mockExecuteRunnerCommandWithSession
+    .mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
+    .mockResolvedValueOnce({ lifecycleState: 'completed' });
+
+  await assert.rejects(
+    () => runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
+    (error: unknown) => {
+      assert.ok(error instanceof AppError);
+      assert.match(error.message, /"tap" completed after the transport response was lost/);
+      assert.equal(error.details?.recovery, 'completed_without_retained_response');
+      assert.match(String(error.details?.hint), /will not replay/);
+      assert.match(String(error.details?.hint), /snapshot -i/);
+      assert.equal(error.details?.transportError, 'fetch failed');
+      return true;
+    },
+  );
+
+  assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
+  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
+});
+
+test('mutating commands preserve runner failure details from status recovery', async () => {
+  const session = makeRunnerSession({ port: 8100, ready: true });
+
+  mockEnsureRunnerSession.mockResolvedValueOnce(session);
+  mockExecuteRunnerCommandWithSession
+    .mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
+    .mockResolvedValueOnce({
+      lifecycleState: 'failed',
+      lifecycleErrorCode: 'AMBIGUOUS_MATCH',
+      lifecycleErrorMessage: 'Found 2 matching buttons',
+      lifecycleErrorHint: 'Use a more specific selector.',
+    });
+
+  await assert.rejects(
+    () => runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
+    (error: unknown) => {
+      assert.ok(error instanceof AppError);
+      assert.equal(error.code, 'AMBIGUOUS_MATCH');
+      assert.equal(error.message, 'Found 2 matching buttons');
+      assert.equal(error.details?.recovery, 'runner_reported_failure');
+      assert.equal(error.details?.hint, 'Use a more specific selector.');
+      assert.equal(error.details?.transportError, 'fetch failed');
+      return true;
+    },
+  );
+
+  assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
+  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
+});
+
+test('mutating commands report wait-and-inspect guidance when status shows in-flight work', async () => {
+  const session = makeRunnerSession({ port: 8100, ready: true });
+
+  mockEnsureRunnerSession.mockResolvedValueOnce(session);
+  mockExecuteRunnerCommandWithSession
+    .mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
+    .mockResolvedValueOnce({ lifecycleState: 'started' });
+
+  await assert.rejects(
+    () => runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
+    (error: unknown) => {
+      assert.ok(error instanceof AppError);
+      assert.match(error.message, /"tap" is still started/);
+      assert.equal(error.details?.recovery, 'command_still_in_flight');
+      assert.match(String(error.details?.hint), /may still finish/);
+      assert.match(String(error.details?.hint), /snapshot -i/);
+      assert.equal(error.details?.transportError, 'fetch failed');
+      return true;
+    },
+  );
+
+  assert.equal(mockInvalidateRunnerSession.mock.calls.length, 0);
+  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
 });
 
 test('mutating commands invalidate the retry session without replaying again', async () => {
@@ -160,7 +318,8 @@ test('mutating commands invalidate the retry session without replaying again', a
   mockEnsureRunnerSession.mockResolvedValueOnce(staleSession).mockResolvedValueOnce(freshSession);
   mockExecuteRunnerCommandWithSession
     .mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'Runner did not accept connection'))
-    .mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'));
+    .mockRejectedValueOnce(new AppError('COMMAND_FAILED', 'fetch failed'))
+    .mockResolvedValueOnce({ lifecycleState: 'notAccepted' });
 
   await assert.rejects(() =>
     runIosRunnerCommand(IOS_SIMULATOR, { command: 'tap', x: 120, y: 240 }),
@@ -171,7 +330,7 @@ test('mutating commands invalidate the retry session without replaying again', a
     [staleSession, 'runner_connect_failed_before_command_send'],
     [freshSession, 'transport_error_after_retry_command_send'],
   ]);
-  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 2);
+  assert.equal(mockExecuteRunnerCommandWithSession.mock.calls.length, 3);
 });
 
 function makeRunnerSession(overrides: Partial<RunnerSession> = {}): RunnerSession {
diff --git a/src/platforms/ios/runner-client.ts b/src/platforms/ios/runner-client.ts
