|
| 1 | +# iOS runner protocol optimization plan |
| 2 | + |
| 3 | +Issue #656 is now split into protocol infrastructure plus follow-up optimizations. The lifecycle |
| 4 | +protocol makes commands identifiable, but the performance wins come from changing when the daemon |
| 5 | +uses `uptime`, retries, invalidates sessions, and asks the runner for lifecycle status. |
| 6 | + |
| 7 | +## Work slices |
| 8 | + |
| 9 | +### 1. Status-before-invalidate recovery |
| 10 | + |
| 11 | +Status: in progress on `codex/ios-runner-status-recovery`. |
| 12 | + |
| 13 | +Goal: when a command has been sent and the HTTP response is lost, ask the runner for |
| 14 | +`status(statusCommandId)` before invalidating the session or surfacing an ambiguous transport |
| 15 | +failure. |
| 16 | + |
| 17 | +Acceptance criteria: |
| 18 | + |
| 19 | +- Post-send retryable transport failures issue one bounded `status` probe with the original |
| 20 | + `commandId` before session invalidation. |
| 21 | +- `completed` with retained small response JSON returns the recovered command result without |
| 22 | + invalidating or resending the command. |
| 23 | +- `failed` returns the runner failure code/message/hint instead of a generic transport failure. |
| 24 | +- `notAccepted`, status timeout, or status transport failure preserves the existing invalidation |
| 25 | + behavior. |
| 26 | +- Read-only commands whose response was not retained keep the existing retry behavior. |
| 27 | +- Status recovery probes are short-budget and do not consume the full command timeout. |
| 28 | + |
| 29 | +iOS simulator validation: |
| 30 | + |
| 31 | +- Unit: `pnpm exec vitest run src/platforms/ios/__tests__/runner-command-retry.test.ts`. |
| 32 | +- Unit bundle: `pnpm exec vitest run src/platforms/ios/__tests__/runner-client.test.ts src/platforms/ios/__tests__/runner-session.test.ts src/platforms/ios/__tests__/runner-command-retry.test.ts src/platforms/ios/__tests__/runner-provider.test.ts`. |
| 33 | +- Build: `pnpm build:xcuitest`. |
| 34 | +- Manual sim smoke after build: |
| 35 | + - `pnpm build` |
| 36 | + - `pnpm clean:daemon` |
| 37 | + - run a simple iOS simulator session against Settings with `open`, `snapshot -i`, one selector |
| 38 | + interaction, and `close`. |
| 39 | + - confirm there is no visible behavior change and diagnostics show no unexpected session |
| 40 | + invalidation. |
| 41 | + |
| 42 | +### 2. Adaptive `uptime` preflight policy |
| 43 | + |
| 44 | +Goal: stop paying eager `uptime` before low-risk mutating commands when the runner has recently |
| 45 | +completed a command, relying on status-before-invalidate recovery for the rare ambiguous transport |
| 46 | +failure. |
| 47 | + |
| 48 | +Acceptance criteria: |
| 49 | + |
| 50 | +- Existing first-command/startup readiness behavior is preserved. |
| 51 | +- Existing failed-preflight stale-session recovery is preserved. |
| 52 | +- Repeated hot interactions skip `uptime` when the runner has a recent successful response. |
| 53 | +- Commands that still need conservative readiness checks remain preflighted until measured. |
| 54 | +- A transport failure after skipping preflight runs status recovery before invalidation. |
| 55 | +- Diagnostics expose whether a command used, skipped, or recovered from a readiness preflight. |
| 56 | + |
| 57 | +iOS simulator validation: |
| 58 | + |
| 59 | +- Start a fresh simulator session and run one interaction: verify the first mutating command still |
| 60 | + preflights. |
| 61 | +- Run a hot loop of repeated selector interactions against the same visible control: verify only |
| 62 | + the first command pays `uptime`, subsequent commands emit `ios_runner_readiness_preflight_skipped`, |
| 63 | + and the UI still responds correctly. |
| 64 | +- Compare median command latency for a hot interaction loop before and after the change. A useful |
| 65 | + threshold is at least one fewer runner request per hot command and no increase in failure rate. |
| 66 | + |
| 67 | +### 3. Status-visible transport path |
| 68 | + |
| 69 | +Goal: make `accepted` and `started` states practically observable while a command is still running. |
| 70 | +The Swift journal already records these states, but the runner currently serializes connection |
| 71 | +handling, so a concurrent status request can be blocked behind the command it is querying. |
| 72 | + |
| 73 | +Acceptance criteria: |
| 74 | + |
| 75 | +- `status` can be answered while another runner command is waiting on main-thread XCTest work. |
| 76 | +- The status path remains journal-only and does not touch app activation, XCTest dispatch, or |
| 77 | + command retry logic. |
| 78 | +- Long-running command status can report `accepted` or `started` before the command reaches a |
| 79 | + terminal state. |
| 80 | +- Existing command execution remains serial where mutation ordering matters. |
| 81 | + |
| 82 | +iOS simulator validation: |
| 83 | + |
| 84 | +- Run a deliberately long runner command in one request. |
| 85 | +- While it is in flight, query `status(statusCommandId)` from another request. |
| 86 | +- Verify status returns before the long command completes and reports `accepted` or `started`. |
| 87 | +- Verify normal command ordering is unchanged for back-to-back mutating commands. |
| 88 | + |
| 89 | +### 4. Session invalidation reduction |
| 90 | + |
| 91 | +Goal: avoid tearing down otherwise healthy runner sessions when lifecycle status proves the command |
| 92 | +completed or failed cleanly. |
| 93 | + |
| 94 | +Acceptance criteria: |
| 95 | + |
| 96 | +- Completed/failed lifecycle status suppresses invalidation for ambiguous post-send transport |
| 97 | + errors when the runner remains reachable. |
| 98 | +- Unknown status states still invalidate to preserve current safety. |
| 99 | +- Diagnostics record why invalidation was skipped or retained. |
| 100 | +- No command is replayed after an observed mutating `accepted`, `started`, `completed`, or `failed` |
| 101 | + state. |
| 102 | + |
| 103 | +iOS simulator validation: |
| 104 | + |
| 105 | +- Inject or simulate a lost response after a command completes. |
| 106 | +- Verify status recovery prevents runner restart. |
| 107 | +- Run the next command in the same session and verify it succeeds without re-launching xcodebuild. |
| 108 | + |
| 109 | +### 5. Response retention tuning |
| 110 | + |
| 111 | +Goal: retain enough small command results for useful recovery without making the runner retain large |
| 112 | +snapshots or binary-like payloads. |
| 113 | + |
| 114 | +Acceptance criteria: |
| 115 | + |
| 116 | +- Small scalar responses can be recovered from `lifecycleResponseJson`. |
| 117 | +- Snapshot node trees and screenshots are not serialized or retained in the journal. |
| 118 | +- The journal memory cap remains bounded by entry count and response JSON size. |
| 119 | +- Retention policy is documented in tests or runner fixtures so future commands do not accidentally |
| 120 | + store large payloads. |
| 121 | + |
| 122 | +iOS simulator validation: |
| 123 | + |
| 124 | +- Run small-result commands and verify status can recover retained JSON. |
| 125 | +- Run snapshot-heavy commands and verify status reports terminal state without retained response JSON. |
| 126 | +- Confirm the runner remains responsive after repeated snapshots. |
| 127 | + |
| 128 | +## Suggested ordering |
| 129 | + |
| 130 | +1. Land status-before-invalidate recovery first. It is the safety net needed before reducing |
| 131 | + defensive preflights. |
| 132 | +2. Add diagnostics/metrics for preflight use, skipped preflights, status recovery, and invalidation |
| 133 | + reason. This can happen alongside slice 1 or 2. |
| 134 | +3. Reduce `uptime` for hot interaction loops with a conservative command allowlist. |
| 135 | +4. Make the status transport path observable during long-running commands. |
| 136 | +5. Broaden the preflight policy only after simulator measurements show stable behavior. |
| 137 | + |
| 138 | +## Side-by-side work |
| 139 | + |
| 140 | +- Status recovery and diagnostics can be developed together or separately. |
| 141 | +- Transport status visibility can proceed independently once the protocol is on `main`. |
| 142 | +- Adaptive `uptime` should wait for status recovery, because it relies on the same recovery path for |
| 143 | + ambiguous post-send failures. |
| 144 | +- Response retention tuning can proceed independently as long as it preserves the current caps. |
0 commit comments