ci: automate iOS runner request-count gate for the Apple runner unwind (Phase 3 step c prep) #966

Merged
thymikee merged 1 commit into main from phase3-runner-request-count-gate
Jun 30, 2026
@thymikee

Member

Why

Phase 3 step (c) (unwinding macOS out of platforms/ios into an apple/ family — see plans/apple-platform-consolidation.md / plans/phase3-platform-plugin-progress.md / ADR-0009) relocates the shared Apple XCTest runner and must prove the iOS runner request count is unchanged before/after. Today that check is manual: a human runs commands with --debug, reads the per-request ndjson, and hand-counts the runner phases. This PR makes it an automated, committed assertion.

There was no existing standalone harness for this — the request count only existed in-process (the --cost graft's runnerRoundTrips). This PR closes that gap; it does not duplicate anything.

What it counts (and where the daemon emits it)

The daemon already defines the canonical phase set — this PR makes it the single source of truth shared by the in-process cost graft and the new external counter:

  • ios_runner_command_send — emitted in src/platforms/ios/runner-session.ts:629 (sendRunnerCommandAfterPreflight, via withDiagnosticTimer): the command round-trip itself.
  • ios_runner_readiness_preflight — emitted in src/platforms/ios/runner-session.ts:663 (runRunnerReadinessPreflight): the pre-command uptime probe (a real network round-trip).

The ..._skipped / ..._recovered markers do not hit the runner and are excluded (matching the comment + RUNNER_ROUND_TRIP_PHASES previously at src/daemon/request-router.ts, now moved to the counter module and imported back). Each --debug request appends one JSON object per line to <state-dir>/daemon.log (src/utils/diagnostics.ts emitDiagnostic).
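
As a rough illustration of the counting rule described above, here is a minimal sketch. It is not the PR's actual code: the two phase names come from the description, but the event shape (a `phase` string field), the function name `countRunnerRequestsSketch`, and the non-runner marker names in the usage example are assumptions made for illustration.

```typescript
// Phase names are taken from the PR description. Everything else here
// (event shape, function name) is a hypothetical sketch.
const RUNNER_ROUND_TRIP_PHASES = [
  "ios_runner_command_send",
  "ios_runner_readiness_preflight",
] as const;

type RunnerPhase = (typeof RUNNER_ROUND_TRIP_PHASES)[number];

interface DiagnosticEvent {
  phase?: string; // assumed field name for the diagnostic phase
}

interface RunnerRequestCount {
  perPhase: Record<RunnerPhase, number>;
  total: number;
}

function isRunnerPhase(value: string | undefined): value is RunnerPhase {
  return (RUNNER_ROUND_TRIP_PHASES as readonly string[]).includes(value ?? "");
}

// Counts only events whose phase is a real runner round-trip.
function countRunnerRequestsSketch(events: DiagnosticEvent[]): RunnerRequestCount {
  const perPhase: Record<RunnerPhase, number> = {
    ios_runner_command_send: 0,
    ios_runner_readiness_preflight: 0,
  };
  let total = 0;
  for (const event of events) {
    const phase = event.phase;
    if (isRunnerPhase(phase)) {
      perPhase[phase] += 1;
      total += 1;
    }
  }
  return { perPhase, total };
}

// Usage: 1 preflight + 2 command sends + markers that never hit the runner.
const sample = countRunnerRequestsSketch([
  { phase: "ios_runner_readiness_preflight" },
  { phase: "ios_runner_command_send" },
  { phase: "ios_runner_command_send" },
  { phase: "example_skipped_marker" }, // hypothetical name, not a runner round-trip
  { phase: "unrelated_phase" }, // hypothetical name
]);
console.log(sample.total, sample.perPhase);
```

Keeping the phase list as a single `as const` tuple is what would let both the in-process cost path and an external counter import the same set, so the two cannot drift apart.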

What this PR adds

1. Pure counter: src/daemon/runner-request-count.ts

  • RUNNER_ROUND_TRIP_PHASES (single source of truth; request-router.ts now imports it instead of defining a local copy — byte-identical behavior).
  • parseDiagnosticNdjson (tolerant: skips plain daemon-log lines, blank/malformed lines, strips the [agent-device][diag] stderr prefix), countRunnerRequests, and pure baseline parse/build/compare helpers. No I/O, no hardware.
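
The tolerant parsing behavior listed above could look roughly like the following sketch. The `[agent-device][diag]` prefix is taken from the description; the function name `parseDiagnosticNdjsonSketch`, the rule of treating any line that does not start with `{` as plain log noise, and the sample log lines are assumptions, not the module's real implementation.

```typescript
// Prefix quoted in the PR description; everything else is a hypothetical sketch.
const DIAG_PREFIX = "[agent-device][diag]";

type JsonObject = Record<string, unknown>;

// Returns every line that parses as a JSON object and silently skips the rest.
function parseDiagnosticNdjsonSketch(text: string): JsonObject[] {
  const events: JsonObject[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    let line = rawLine.trim();
    if (line.startsWith(DIAG_PREFIX)) {
      line = line.slice(DIAG_PREFIX.length).trim();
    }
    // Blank lines and plain daemon-log lines are not JSON objects.
    if (!line.startsWith("{")) continue;
    try {
      const parsed: unknown = JSON.parse(line);
      if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
        events.push(parsed as JsonObject);
      }
    } catch {
      // Malformed JSON: skip the line rather than fail the whole parse.
    }
  }
  return events;
}

// Usage with a mixed, noisy log (contents invented for illustration).
const log = [
  "daemon started",
  "",
  '{"phase":"ios_runner_command_send"}',
  '[agent-device][diag] {"phase":"ios_runner_readiness_preflight"}',
  '{"phase": broken',
].join("\n");
console.log(parseDiagnosticNdjsonSketch(log).length);
```

Because the function takes a string and returns plain objects, it needs no file system or simulator, which matches the "no I/O, no hardware" property claimed for the module.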

2. Unit tests: src/daemon/__tests__/runner-request-count.test.ts (13 tests)

  • Feed synthetic/representative ndjson fixtures, assert counts; cover the same semantics as request-router-cost.test.ts (1 preflight + 2 command_send + skipped + unrelated → 3), baseline validation, and drift diffs. Run in the normal unit suite (vitest --project unit).

3. Assertion harness: scripts/runner-request-count/run.ts + expected-counts.json

  • Reuses the existing smoke-ios scenario (test/integration/replays/ios/simulator/01-settings.ad) — no new app flow.
  • Uses an isolated --state-dir, runs prepare ios-runner without --debug (so prepare diagnostics don't pollute the count), truncates daemon.log, runs the scenario with --debug --retries 0 (single deterministic attempt), counts runner round-trips from daemon.log, and asserts the result against the committed baseline.
  • --update / --save regenerates the baseline. npm/pnpm script: validate:runner-count.

4. CI wiring: .github/workflows/ios.yml

  • New Assert iOS runner request count step in the smoke-ios job, reusing the booted simulator (pnpm clean:daemon first to release the smoke daemon's UDID lease, then the harness runs its own isolated daemon).

Expected-count baseline (storage + regeneration)

Committed at scripts/runner-request-count/expected-counts.json, keyed by scenario, with per-phase counts + total. Regenerate with node --experimental-strip-types scripts/runner-request-count/run.ts --udid <UDID> --update (or pnpm validate:runner-count --udid <UDID> --update) on a host with a booted simulator + prepared runner.
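
To make "keyed by scenario, with per-phase counts + total" concrete, here is a hedged sketch of one possible baseline shape and a drift comparison. The field names (`perPhase`, `total`), the scenario key, the sample counts, and the `diffCountsSketch` helper are all hypothetical; the real expected-counts.json layout may differ.

```typescript
// Hypothetical shapes for illustration only.
interface ScenarioCounts {
  perPhase: Record<string, number>;
  total: number;
}

type BaselineByScenario = Record<string, ScenarioCounts>;

// Lists every per-phase and total mismatch between expected and observed counts.
function diffCountsSketch(expected: ScenarioCounts, observed: ScenarioCounts): string[] {
  const drifts: string[] = [];
  const phases = Array.from(
    new Set([...Object.keys(expected.perPhase), ...Object.keys(observed.perPhase)]),
  ).sort();
  for (const phase of phases) {
    const want = expected.perPhase[phase] ?? 0;
    const got = observed.perPhase[phase] ?? 0;
    if (want !== got) {
      drifts.push(`${phase}: expected ${want}, observed ${got}`);
    }
  }
  if (expected.total !== observed.total) {
    drifts.push(`total: expected ${expected.total}, observed ${observed.total}`);
  }
  return drifts;
}

// Usage: counts below are invented, not real measurements.
const baseline: BaselineByScenario = {
  "example-scenario": {
    perPhase: { ios_runner_command_send: 2, ios_runner_readiness_preflight: 1 },
    total: 3,
  },
};
const observed: ScenarioCounts = {
  perPhase: { ios_runner_command_send: 3, ios_runner_readiness_preflight: 1 },
  total: 4,
};
console.log(diffCountsSketch(baseline["example-scenario"], observed));
```

Reporting per-phase differences alongside the total makes a drift easier to attribute, for example distinguishing an extra preflight probe from an extra command round-trip.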

It ships unarmed (established: false) because real counts can only be captured on a simulator (not available locally). While unarmed, the harness records observed counts (printed to the step log + written to test/artifacts/runner-request-count/expected-counts.observed.json, which the existing Upload iOS artifacts step uploads) and does not fail. To arm the gate: after this lands, read the observed counts from the first smoke-ios run and commit them as the baseline (set established: true) — or run --update in CI. Once armed, any count drift fails the step loudly.

Flakiness mitigation

  • Deterministic single attempt (--retries 0) — retries would double the count.
  • Isolated daemon (--state-dir) + clean:daemon to avoid runner-lease contention with the smoke daemon.
  • Infra hiccups are inconclusive, not failures: a failed scenario run, or a passing run that captured zero round-trips (a capture/wiring problem, not a real removal), exits 0 with a loud warning. Only a real count drift vs. an armed baseline fails. --strict flips inconclusive to a hard failure for local debugging.
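
The exit policy described in this list can be summarized as a small decision function. This is a sketch under assumptions: the outcome names, the `RunSummary` fields, and the order in which the checks run (inconclusive before unarmed) are guesses for illustration; only the behaviors themselves (inconclusive exits 0 unless --strict, unarmed records without failing, armed drift fails) come from the description.

```typescript
// Hypothetical outcome names and input shape.
type Outcome = "pass" | "recorded" | "inconclusive" | "drift";

interface RunSummary {
  scenarioPassed: boolean; // did the replay scenario itself succeed?
  observedTotal: number; // runner round-trips captured from daemon.log
  driftCount: number; // number of mismatches against the baseline
}

function classifyRunSketch(run: RunSummary, established: boolean): Outcome {
  // A failed run or an empty capture says nothing about real runner behavior.
  if (!run.scenarioPassed || run.observedTotal === 0) return "inconclusive";
  // Unarmed baseline: record observed counts, never fail.
  if (!established) return "recorded";
  return run.driftCount > 0 ? "drift" : "pass";
}

function exitCodeForSketch(outcome: Outcome, strict: boolean): number {
  if (outcome === "drift") return 1;
  if (outcome === "inconclusive") return strict ? 1 : 0;
  return 0;
}

// Usage: a passing run that captured nothing is inconclusive, not a failure.
const outcome = classifyRunSketch(
  { scenarioPassed: true, observedTotal: 0, driftCount: 0 },
  true,
);
console.log(outcome, exitCodeForSketch(outcome, false), exitCodeForSketch(outcome, true));
```

Separating classification from the exit code keeps the --strict flag a pure presentation choice: the same outcome is computed either way, and only the mapping to a process exit status changes.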

Local vs. CI

Validated locally (no simulator): full unit suite (2884 tests, incl. the 13 new), request-router-cost.test.ts (const move sanity), typecheck, oxlint --deny-warnings, oxfmt --check, rslib build, fallow audit (clean on changed files), workflow YAML parse, and the harness end-to-end on the no-udid / --help / --strict / baseline-parse paths.

Validated only in CI (needs a booted sim + test runner): the actual simulator scenario run, daemon.log ndjson capture, and the real counts. That leg runs in the smoke-ios job.

Scope

Observability/CI tooling only — no runner/leaf behavior changes; the --debug ndjson contract is consumed, not modified. The one src/ runtime touch is moving RUNNER_ROUND_TRIP_PHASES into the counter module and importing it back into request-router.ts (behaviorless).

@github-actions

github-actions Bot commented Jun 30, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-30 15:26 UTC

@github-actions

github-actions Bot commented Jun 30, 2026

Size Report

| Metric | Base | Current | Diff |
| --- | --- | --- | --- |
| JS raw | 1.4 MB | 1.4 MB | 0 B |
| JS gzip | 450.1 kB | 450.1 kB | 0 B |
| npm tarball | 549.0 kB | 549.0 kB | +20 B |
| npm unpacked | 1.9 MB | 1.9 MB | +100 B |

Startup median (7 runs, lower is better):

| Scenario | Base | Current | Diff |
| --- | --- | --- | --- |
| CLI --version | 27.3 ms | 27.4 ms | +0.1 ms |
| CLI --help | 47.5 ms | 47.3 ms | -0.1 ms |

Top changed chunks: no changes in the largest emitted chunks.

@thymikee

Member Author

Reviewed the runner request-count gate against the Phase 3 Apple runner unwind plan and the existing diagnostics/cost path. The phase list is now shared between the in-process cost graft and the external NDJSON counter, skipped/recovered markers remain excluded, the parser is tolerant of daemon log noise, and the workflow writes observed counts under test/artifacts so the existing artifact upload captures the unarmed baseline evidence. The gate is intentionally unarmed in this PR, so the human follow-up is to commit the first observed baseline and set established=true before relying on it as a hard drift gate. No code blockers; checks are green. Marking ready-for-human.

@thymikee added the ready-for-human label (Valid work that needs human implementation, judgment, or maintainer merge) on Jun 30, 2026
@thymikee force-pushed the phase3-runner-request-count-gate branch from 24c2f8e to e8de55e on June 30, 2026 13:53
…d (Phase 3 step c prep)

Replaces the manual "run with --debug, hand-count the runner phases" check with
an automated, committed assertion so the Phase 3 step (c) runner relocation (and
future runner refactors) can prove byte-identical runner request behavior.

- src/daemon/runner-request-count.ts: pure, unit-testable counter. Parses the
  daemon --debug diagnostics ndjson and counts the iOS-runner round-trip phases,
  plus baseline parse/compare logic. Owns RUNNER_ROUND_TRIP_PHASES as the single
  source of truth, now imported by request-router.ts (was a local const) so the
  in-process cost graft and the external counter never drift.
- src/daemon/__tests__/runner-request-count.test.ts: 13 unit tests over synthetic
  ndjson fixtures (tolerant parse, counting, baseline parse/compare). Run in the
  normal unit suite; no hardware.
- scripts/runner-request-count/: assertion harness (run.ts) + committed baseline
  (expected-counts.json). Drives the existing smoke-ios replay scenario with
  --debug in an isolated --state-dir, counts runner round-trips from daemon.log,
  and asserts against the baseline. --update regenerates the baseline. Infra
  hiccups are inconclusive (don't fail); only a real count drift fails.
- .github/workflows/ios.yml: new "Assert iOS runner request count" step in the
  smoke-ios job, reusing the booted simulator.
- package.json: `validate:runner-count` script. .fallowrc.json: harness entry.

The baseline ships unarmed (established=false); the harness records observed
counts (printed + uploaded as a test/artifacts artifact) without failing, so the
maintainer arms it once from a real CI run.
@thymikee force-pushed the phase3-runner-request-count-gate branch from e8de55e to 9cab517 on June 30, 2026 15:07
@thymikee merged commit edc8dd0 into main on Jun 30, 2026
22 checks passed
@thymikee deleted the phase3-runner-request-count-gate branch on June 30, 2026 15:26