ci: automate iOS runner request-count gate for the Apple runner unwind (Phase 3 step c prep) by thymikee · Pull Request #966 · callstack/agent-device

thymikee · 2026-06-30T13:38:53Z

Why

Phase 3 step (c) (unwinding macOS out of platforms/ios into an apple/ family — see plans/apple-platform-consolidation.md / plans/phase3-platform-plugin-progress.md / ADR-0009) relocates the shared Apple XCTest runner and must prove the iOS runner request count is unchanged before/after. Today that check is manual: a human runs commands with --debug, reads the per-request ndjson, and hand-counts the runner phases. This PR makes it an automated, committed assertion.

There was no existing standalone harness for this — the request count only existed in-process (the --cost graft's runnerRoundTrips). This PR closes that gap; it does not duplicate anything.

What it counts (and where the daemon emits it)

The daemon already defines the canonical phase set — this PR makes it the single source of truth shared by the in-process cost graft and the new external counter:

ios_runner_command_send — emitted in src/platforms/ios/runner-session.ts:629 (sendRunnerCommandAfterPreflight, via withDiagnosticTimer): the command round-trip itself.
ios_runner_readiness_preflight — emitted in src/platforms/ios/runner-session.ts:663 (runRunnerReadinessPreflight): the pre-command uptime probe (a real network round-trip).

The ..._skipped / ..._recovered markers do not hit the runner and are excluded (matching the comment + RUNNER_ROUND_TRIP_PHASES previously at src/daemon/request-router.ts, now moved to the counter module and imported back). Each --debug request appends one JSON object per line to <state-dir>/daemon.log (src/utils/diagnostics.ts emitDiagnostic).
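To make the counting rule above concrete, here is a minimal sketch that tallies only the two phases that reach the runner. The record shape (a `phase` string field) and the helper name `countRoundTrips` are assumptions for illustration; the PR does not show the actual diagnostic record layout or the real `countRunnerRequests` signature.

```typescript
// Hypothetical record shape: the real diagnostic layout is not shown in the PR.
interface DiagnosticEvent {
  phase: string;
}

// The two phases the PR names as real runner round-trips.
const ROUND_TRIP_PHASES: ReadonlySet<string> = new Set([
  "ios_runner_command_send",
  "ios_runner_readiness_preflight",
]);

interface RoundTripCounts {
  perPhase: Record<string, number>;
  total: number;
}

// Count events per round-trip phase; every other phase is ignored.
function countRoundTrips(events: readonly DiagnosticEvent[]): RoundTripCounts {
  const perPhase: Record<string, number> = {};
  for (const phase of ROUND_TRIP_PHASES) perPhase[phase] = 0;
  let total = 0;
  for (const event of events) {
    if (!ROUND_TRIP_PHASES.has(event.phase)) continue;
    perPhase[event.phase] = (perPhase[event.phase] ?? 0) + 1;
    total += 1;
  }
  return { perPhase, total };
}

// Fixture mirroring the "1 preflight + 2 command_send + skipped + unrelated" case.
// "placeholder_marker_skipped" and "unrelated_phase" are made-up names.
const sampleCounts = countRoundTrips([
  { phase: "ios_runner_readiness_preflight" },
  { phase: "ios_runner_command_send" },
  { phase: "ios_runner_command_send" },
  { phase: "placeholder_marker_skipped" },
  { phase: "unrelated_phase" },
]);
```

Because the phase set is a single exported constant, the in-process cost path and an external counter can both filter against it without drifting apart.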

What this PR adds

1. Pure counter — src/daemon/runner-request-count.ts

RUNNER_ROUND_TRIP_PHASES (single source of truth; request-router.ts now imports it instead of defining a local copy — byte-identical behavior).
parseDiagnosticNdjson (tolerant: skips plain daemon-log lines, blank/malformed lines, strips the [agent-device][diag] stderr prefix), countRunnerRequests, and pure baseline parse/build/compare helpers. No I/O, no hardware.
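The tolerant parsing described above could look roughly like the sketch below. It is not the PR's `parseDiagnosticNdjson`; the function name `parseLenientNdjson` and the exact skipping rules (anything that does not start with `{` after prefix stripping is treated as a plain log line) are assumptions.

```typescript
// Stderr prefix the PR says is stripped before parsing.
const DIAG_PREFIX = "[agent-device][diag]";

// Parse ndjson text leniently: skip blank lines, plain log lines, and malformed JSON.
function parseLenientNdjson(text: string): Record<string, unknown>[] {
  const records: Record<string, unknown>[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    let line = rawLine.trim();
    if (line.startsWith(DIAG_PREFIX)) {
      line = line.slice(DIAG_PREFIX.length).trim();
    }
    // Blank lines and plain daemon-log lines do not start with an object.
    if (!line.startsWith("{")) continue;
    try {
      const parsed: unknown = JSON.parse(line);
      if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
        records.push(parsed as Record<string, unknown>);
      }
    } catch {
      // Malformed JSON line: ignore it and keep going.
    }
  }
  return records;
}
```

Keeping the parser free of I/O means it can be fed string fixtures directly in unit tests, which matches the "no I/O, no hardware" goal stated above.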

2. Unit tests — src/daemon/__tests__/runner-request-count.test.ts (13 tests)

Feed synthetic/representative ndjson fixtures, assert counts; cover the same semantics as request-router-cost.test.ts (1 preflight + 2 command_send + skipped + unrelated → 3), baseline validation, and drift diffs. Run in the normal unit suite (vitest --project unit).

3. Assertion harness — scripts/runner-request-count/run.ts + expected-counts.json

Reuses the existing smoke-ios scenario (test/integration/replays/ios/simulator/01-settings.ad) — no new app flow.
Uses an isolated --state-dir; prepares ios-runner without --debug (so prepare diagnostics don't pollute the count), truncates daemon.log, runs the scenario with --debug --retries 0 (single deterministic attempt), counts runner round-trips from daemon.log, and asserts against the committed baseline.
--update / --save regenerates the baseline. npm/pnpm script: validate:runner-count.

4. CI wiring — .github/workflows/ios.yml

New Assert iOS runner request count step in the smoke-ios job, reusing the booted simulator (pnpm clean:daemon first to release the smoke daemon's UDID lease, then the harness runs its own isolated daemon).

Expected-count baseline (storage + regeneration)

Committed at scripts/runner-request-count/expected-counts.json, keyed by scenario, with per-phase counts + total. Regenerate with node --experimental-strip-types scripts/runner-request-count/run.ts --udid <UDID> --update (or pnpm validate:runner-count --udid <UDID> --update) on a host with a booted simulator + prepared runner.

It ships unarmed (established: false) because real counts can only be captured on a simulator (not available locally). While unarmed, the harness records observed counts (printed to the step log + written to test/artifacts/runner-request-count/expected-counts.observed.json, which the existing Upload iOS artifacts step uploads) and does not fail. To arm the gate: after this lands, read the observed counts from the first smoke-ios run and commit them as the baseline (set established: true) — or run --update in CI. Once armed, any count drift fails the step loudly.
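The armed/unarmed behavior described above can be sketched as a pure comparison. Only the `established` flag and the "per-phase counts + total" structure come from the PR text; the field names `perPhase` and `total`, the type names, and `checkAgainstBaseline` are hypothetical.

```typescript
// Hypothetical baseline entry for one scenario; only `established` is named in the PR.
interface BaselineEntry {
  established: boolean;
  perPhase: Record<string, number>;
  total: number;
}

type GateResult =
  | { status: "unarmed" }
  | { status: "match" }
  | { status: "drift"; diffs: string[] };

// Compare observed per-phase counts with the baseline.
// An unarmed baseline never produces a drift result.
function checkAgainstBaseline(
  entry: BaselineEntry,
  observed: Record<string, number>,
): GateResult {
  if (!entry.established) return { status: "unarmed" };
  const phases = new Set([...Object.keys(entry.perPhase), ...Object.keys(observed)]);
  const diffs: string[] = [];
  for (const phase of [...phases].sort()) {
    const expected = entry.perPhase[phase] ?? 0;
    const actual = observed[phase] ?? 0;
    if (expected !== actual) {
      diffs.push(`${phase}: expected ${expected}, observed ${actual}`);
    }
  }
  return diffs.length === 0 ? { status: "match" } : { status: "drift", diffs };
}
```

Arming the gate then amounts to flipping `established` to true with real counts filled in; the comparison code itself does not change.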

Flakiness mitigation

Deterministic single attempt (--retries 0) — retries would double the count.
Isolated daemon (--state-dir) + clean:daemon to avoid runner-lease contention with the smoke daemon.
Infra hiccups are inconclusive, not failures: a failed scenario run, or a passing run that captured zero round-trips (a capture/wiring problem, not a real removal), exits 0 with a loud warning. Only a real count drift vs. an armed baseline fails. --strict flips inconclusive to a hard failure for local debugging.
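The pass/inconclusive/fail rules above can be read as a small decision function. This is a sketch of the stated policy, not the harness code; `RunSummary`, `classifyRun`, and `exitCodeFor` are hypothetical names.

```typescript
// Hypothetical summary of one harness run; field names are illustrative.
interface RunSummary {
  scenarioPassed: boolean;
  roundTrips: number;
  baselineArmed: boolean;
  driftDetected: boolean;
}

type Verdict = "pass" | "inconclusive" | "fail";

// A failed scenario or a run that captured zero round-trips is inconclusive;
// only drift against an armed baseline is a real failure.
function classifyRun(run: RunSummary): Verdict {
  if (!run.scenarioPassed || run.roundTrips === 0) return "inconclusive";
  if (run.baselineArmed && run.driftDetected) return "fail";
  return "pass";
}

// Map the verdict to a process exit code; strict mode promotes
// inconclusive runs to hard failures.
function exitCodeFor(verdict: Verdict, strict: boolean): 0 | 1 {
  if (verdict === "fail") return 1;
  if (verdict === "inconclusive") return strict ? 1 : 0;
  return 0;
}
```

Checking for an inconclusive run before checking drift matters: a run that captured nothing would otherwise look like every round-trip had been removed.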

Local vs. CI

Validated locally (no simulator): full unit suite (2884 tests, incl. the 13 new), request-router-cost.test.ts (const move sanity), typecheck, oxlint --deny-warnings, oxfmt --check, rslib build, fallow audit (clean on changed files), workflow YAML parse, and the harness end-to-end on the no-udid / --help / --strict / baseline-parse paths.

Validated only in CI (needs a booted sim + test runner): the actual simulator scenario run, daemon.log ndjson capture, and the real counts. That leg runs in the smoke-ios job.

Scope

Observability/CI tooling only — no runner/leaf behavior changes; the --debug ndjson contract is consumed, not modified. The one src/ runtime touch is moving RUNNER_ROUND_TRIP_PHASES into the counter module and importing it back into request-router.ts (behaviorless).

github-actions · 2026-06-30T13:39:24Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-30 15:26 UTC

github-actions · 2026-06-30T13:40:19Z

Size Report

Metric	Base	Current	Diff
JS raw	1.4 MB	1.4 MB	0 B
JS gzip	450.1 kB	450.1 kB	0 B
npm tarball	549.0 kB	549.0 kB	+20 B
npm unpacked	1.9 MB	1.9 MB	+100 B

Startup median (7 runs, lower is better):

Scenario	Base	Current	Diff
CLI --version	27.3 ms	27.4 ms	+0.1 ms
CLI --help	47.5 ms	47.3 ms	-0.1 ms

Top changed chunks: no changes in the largest emitted chunks.

thymikee · 2026-06-30T13:53:28Z

Reviewed the runner request-count gate against the Phase 3 Apple runner unwind plan and the existing diagnostics/cost path. The phase list is now shared between the in-process cost graft and the external NDJSON counter, skipped/recovered markers remain excluded, the parser is tolerant of daemon log noise, and the workflow writes observed counts under test/artifacts so the existing artifact upload captures the unarmed baseline evidence. The gate is intentionally unarmed in this PR, so the human follow-up is to commit the first observed baseline and set established=true before relying on it as a hard drift gate. No code blockers; checks are green. Marking ready-for-human.

…d (Phase 3 step c prep)

Replaces the manual "run with --debug, hand-count the runner phases" check with an automated, committed assertion so the Phase 3 step (c) runner relocation (and future runner refactors) can prove byte-identical runner request behavior.

- src/daemon/runner-request-count.ts: pure, unit-testable counter. Parses the daemon --debug diagnostics ndjson and counts the iOS-runner round-trip phases, plus baseline parse/compare logic. Owns RUNNER_ROUND_TRIP_PHASES as the single source of truth, now imported by request-router.ts (was a local const) so the in-process cost graft and the external counter never drift.
- src/daemon/__tests__/runner-request-count.test.ts: 13 unit tests over synthetic ndjson fixtures (tolerant parse, counting, baseline parse/compare). Run in the normal unit suite; no hardware.
- scripts/runner-request-count/: assertion harness (run.ts) + committed baseline (expected-counts.json). Drives the existing smoke-ios replay scenario with --debug in an isolated --state-dir, counts runner round-trips from daemon.log, and asserts against the baseline. --update regenerates the baseline. Infra hiccups are inconclusive (don't fail); only a real count drift fails.
- .github/workflows/ios.yml: new "Assert iOS runner request count" step in the smoke-ios job, reusing the booted simulator.
- package.json: `validate:runner-count` script. .fallowrc.json: harness entry.

The baseline ships unarmed (established=false); the harness records observed counts (printed + uploaded as a test/artifacts artifact) without failing, so the maintainer arms it once from a real CI run.

thymikee added the ready-for-human label (Valid work that needs human implementation, judgment, or maintainer merge) Jun 30, 2026

thymikee force-pushed the phase3-runner-request-count-gate branch from 24c2f8e to e8de55e, June 30, 2026 13:53

thymikee force-pushed the phase3-runner-request-count-gate branch from e8de55e to 9cab517, June 30, 2026 15:07

thymikee merged commit edc8dd0 into main Jun 30, 2026
22 checks passed

thymikee deleted the phase3-runner-request-count-gate branch June 30, 2026 15:26

thymikee mentioned this pull request Jun 30, 2026

refactor: consolidate Apple platform internals #968

Merged
