feat(ios): source frameSequence frames from WDA MJPEG stream #2720
quanru wants to merge 2 commits
On iOS, WDA's takeScreenshot is too slow to sample a short time window densely, so frameSequence missed transient UI (toasts) that Android catches via its scrcpy stream. iOS already runs WDA's native MJPEG server; this consumes it as a fast frame source.
- core(device): add optional AbstractInterface.captureFrameSequence, a fast multi-frame capture for devices with a continuous frame stream.
- core(agent): captureUIContextSequence prefers captureFrameSequence for the earlier frames (the representative last frame still uses a normal full-quality screenshot), and falls back to sequential screenshotBase64 capture when the stream is unavailable or yields nothing.
- ios: MjpegFrameSource consumes the WDA MJPEG stream (lazy-started, auto-reconnecting, stopped on destroy) and keeps the latest decoded JPEG; IOSDevice.captureFrameSequence samples it at the requested cadence.
Tests: extractJpegFrames parsing (incl. split-chunk / boundary cases) and a streamed MjpegFrameSource; agent fast-path selection and fallback.
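The fast-path selection and fallback described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FrameDevice`, `AbortLike`, and `captureSequence` are hypothetical names, and `AbortLike` is a structural stand-in for `AbortSignal` so the snippet stays self-contained.

```typescript
// Hypothetical shapes for illustration; the real interfaces in core may differ.
interface Frame { base64: string; capturedAt: number }
interface AbortLike { readonly aborted: boolean }

interface FrameDevice {
  screenshotBase64(): Promise<string>;
  // Optional fast path, present only on devices with a continuous frame stream.
  captureFrameSequence?(opt: {
    count: number;
    intervalMs: number;
    abortSignal?: AbortLike;
  }): Promise<Frame[]>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

function throwIfAborted(signal?: AbortLike): void {
  if (signal?.aborted) throw new Error("frame capture aborted");
}

async function captureSequence(
  device: FrameDevice,
  count: number,
  intervalMs: number,
  abortSignal?: AbortLike,
): Promise<Frame[]> {
  throwIfAborted(abortSignal);
  let earlier: Frame[] = [];

  // Fast path: ask the stream for every frame except the representative last one.
  if (device.captureFrameSequence && count > 1) {
    try {
      earlier = await device.captureFrameSequence({ count: count - 1, intervalMs, abortSignal });
    } catch {
      throwIfAborted(abortSignal); // an abort is not a reason to fall back
      earlier = [];
    }
  }

  // Fallback: the stream was unavailable or yielded nothing.
  if (earlier.length === 0) {
    for (let i = 0; i < count - 1; i++) {
      throwIfAborted(abortSignal);
      earlier.push({ base64: await device.screenshotBase64(), capturedAt: Date.now() });
      await sleep(intervalMs);
    }
  }

  // The representative last frame always comes from a full-quality screenshot.
  throwIfAborted(abortSignal);
  const last: Frame = { base64: await device.screenshotBase64(), capturedAt: Date.now() };
  return [...earlier, last];
}
```

A device without `captureFrameSequence` takes the sequential loop directly, which is why leaving the method undefined is enough to opt a platform out of the fast path.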
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3bf66f1289
  abortSignal?: AbortSignal;
}): Promise<Array<{ base64: string; capturedAt: number }>> {
  const { count, intervalMs, abortSignal } = opt;
  const source = await this.ensureMjpegFrameSource();
Honor cancellation before starting the MJPEG stream
When frameSequence is invoked with an already-aborted signal, this starts and waits for the WDA MJPEG source before the first throwIfAborted() in the loop. On iOS that can turn a cancelled aiAssert into a 3s ensureStarted() wait when the stream is unavailable, or start a long-lived stream before throwing when it is available. Check the signal before ensureMjpegFrameSource() (and ideally while waiting for the first frame) so the fast path preserves the existing prompt cancellation behavior.
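One possible shape for this fix is sketched below. It is an assumption-laden illustration rather than a patch against the PR: `captureFrameSequenceSketch`, `FrameSource.latestFrame`, and `AbortLike` (a structural stand-in for `AbortSignal`) are hypothetical, and `ensureSource` stands in for `ensureMjpegFrameSource()`.

```typescript
// Hypothetical types; the PR's MjpegFrameSource API may look different.
interface AbortLike { readonly aborted: boolean; readonly reason?: unknown }
interface Frame { base64: string; capturedAt: number }
interface FrameSource { latestFrame(): Frame | undefined }

function throwIfAborted(signal?: AbortLike): void {
  if (signal?.aborted) {
    throw signal.reason ?? new Error("frame capture aborted");
  }
}

async function captureFrameSequenceSketch(
  ensureSource: () => Promise<FrameSource>,
  opt: { count: number; intervalMs: number; abortSignal?: AbortLike },
): Promise<Frame[]> {
  const { count, intervalMs, abortSignal } = opt;

  // Check before touching the stream, so a cancelled call never starts it.
  throwIfAborted(abortSignal);
  const source = await ensureSource();
  // The signal may have fired while the stream was starting.
  throwIfAborted(abortSignal);

  const frames: Frame[] = [];
  for (let i = 0; i < count; i++) {
    throwIfAborted(abortSignal);
    const frame = source.latestFrame();
    if (frame) frames.push(frame);
    if (i < count - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return frames;
}
```

This covers only the "before starting" half of the suggestion; interrupting the wait for the first frame would additionally require racing that wait against the signal.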
if (this.latest) {
  return;
Reject stale MJPEG frames after stream loss
After one frame has ever been decoded, ensureStarted() returns immediately even if the MJPEG connection has since ended and the reconnect loop is only retrying. If the port forward or WDA MJPEG server drops after a previous capture, later frameSequence calls can include an old screen as an “earliest” frame instead of throwing and falling back to fresh screenshots. Clear or age-check latest on disconnect/retry so unavailable streams degrade instead of feeding stale UI to the model.
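Both remedies mentioned here (clearing on disconnect and age-checking) can be combined in a small cache, sketched below. `LatestFrameCache`, its method names, and the `maxAgeMs` threshold are hypothetical and not taken from the PR; timestamps are passed in explicitly to keep the example deterministic.

```typescript
interface Frame { base64: string; capturedAt: number }

// Hypothetical holder for the most recent decoded frame.
class LatestFrameCache {
  private latest: Frame | undefined;
  private readonly maxAgeMs: number;

  constructor(maxAgeMs: number) {
    this.maxAgeMs = maxAgeMs;
  }

  // Called for every decoded JPEG from the stream.
  onFrame(base64: string, now: number): void {
    this.latest = { base64, capturedAt: now };
  }

  // Called when the connection ends or a reconnect attempt begins.
  onDisconnect(): void {
    this.latest = undefined;
  }

  // Returns the latest frame only if it is recent enough to trust.
  getFresh(now: number): Frame | undefined {
    if (!this.latest) return undefined;
    return now - this.latest.capturedAt <= this.maxAgeMs ? this.latest : undefined;
  }
}

const cache = new LatestFrameCache(500);
cache.onFrame("frame-a", 1000);
const fresh = cache.getFresh(1200);  // within 500 ms: returned
const stale = cache.getFresh(1600);  // 600 ms old: rejected
cache.onFrame("frame-b", 2000);
cache.onDisconnect();
const afterDrop = cache.getFresh(2001); // cleared on disconnect: rejected
```

With a check like this, `ensureStarted()` could treat "no fresh frame" the same as "never started", so a dropped stream would throw and let the agent fall back to screenshots.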
Mirror Android scrcpy (disabled by default): the MJPEG frame source is only used for frameSequence when explicitly enabled, so default behavior is unchanged (frameSequence falls back to sequential screenshotBase64).
- core(device): add IOSDeviceOpt.wdaMjpegFrameSource.enabled (default off).
- ios: expose the captureFrameSequence capability only when enabled, so the agent transparently falls back when it is off.
- ios(agent-tools): plumb wdaMjpegFrameSource through the WDA init args.
Multi-device concurrency: the stream URL is built from the existing per-device wdaMjpegPort option, so each device streams from its own port (set a distinct wdaMjpegPort per device, like wdaPort). Added tests for the opt-in gating and per-device stream URL.
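The opt-in gating and per-device URL could look roughly like the sketch below. Only `wdaMjpegPort` and `wdaMjpegFrameSource.enabled` are named in this commit; `wdaHost`, the default port value, the URL format, and every function name here are assumptions made for illustration.

```typescript
// Illustrative option shape; wdaHost and the default port are assumptions.
interface IOSDeviceOptSketch {
  wdaHost?: string;
  wdaMjpegPort?: number;
  wdaMjpegFrameSource?: { enabled?: boolean };
}

interface Frame { base64: string; capturedAt: number }
type CaptureFn = (opt: { count: number; intervalMs: number }) => Promise<Frame[]>;

const ASSUMED_DEFAULT_MJPEG_PORT = 9100; // assumed default, not confirmed by this PR

function isMjpegFrameSourceEnabled(opt: IOSDeviceOptSketch): boolean {
  // Off unless explicitly enabled, so default behavior stays unchanged.
  return opt.wdaMjpegFrameSource?.enabled === true;
}

function buildMjpegStreamUrl(opt: IOSDeviceOptSketch): string {
  const host = opt.wdaHost ?? "localhost";
  const port = opt.wdaMjpegPort ?? ASSUMED_DEFAULT_MJPEG_PORT;
  return `http://${host}:${port}`;
}

// Expose the capability only when enabled; undefined makes the agent fall back.
function exposeCaptureFrameSequence(
  opt: IOSDeviceOptSketch,
  impl: CaptureFn,
): CaptureFn | undefined {
  return isMjpegFrameSourceEnabled(opt) ? impl : undefined;
}

// Two devices, each with its own MJPEG port.
const deviceA: IOSDeviceOptSketch = { wdaMjpegPort: 9100, wdaMjpegFrameSource: { enabled: true } };
const deviceB: IOSDeviceOptSketch = { wdaMjpegPort: 9101 };
const urlA = buildMjpegStreamUrl(deviceA);
const urlB = buildMjpegStreamUrl(deviceB);
```

Returning `undefined` for the capability, rather than a method that throws, keeps the disabled case on exactly the same code path as a device that never had a stream.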
Follow-up (7d8101d): the WDA MJPEG frame source is now opt-in, disabled by default, mirroring Android scrcpy.
Added tests for the opt-in gating and the per-device stream URL.
Background
frameSequence works well on Android because its screenshotBase64() pulls the latest frame from a continuously running scrcpy video stream (near-instant). On iOS, screenshotBase64() goes through WDA's takeScreenshot(), a full per-call capture that is too slow to sample a short time window densely, so the effective frame spacing becomes captureLatency + intervalMs and short-lived UI (toasts) slips through the gaps.
Key point: iOS already runs WDA's native MJPEG server (mjpegStreamUrl, with mjpegServerFramerate / mjpegServerScreenshotQuality configured), the iOS analog of scrcpy. It just wasn't wired into frame capture. This PR consumes it as a fast frame source.
Targets
feat/frame-sequence-transient-ui (the frameSequence feature branch).
What this PR does
- AbstractInterface.captureFrameSequence({count, intervalMs, abortSignal}): a fast multi-frame capture implemented by devices that maintain a continuous frame stream. Returns data-URL frames in temporal order; may return fewer than requested.
- captureUIContextSequence prefers captureFrameSequence for the earlier frames and captures the representative (last) frame with a normal full-quality screenshot. If the fast source throws or yields nothing, it falls back to the existing sequential screenshotBase64() loop. Abort is honored throughout.
- MjpegFrameSource consumes the WDA MJPEG stream (lazy-started, auto-reconnecting with backoff, stopped on destroy()) and keeps the latest decoded JPEG. IOSDevice.captureFrameSequence samples it at the requested cadence: pulling is near-instant, so frames land at intervalMs instead of intervalMs + slowScreenshot.
- screenshotBase64() (normal locate/asserts) is unchanged: it still uses full-quality takeScreenshot(). The MJPEG stream is only used for the multi-frame sampling.
Why this design
Tests
- packages/ios/tests/unit-test/mjpeg-frame-source.test.ts: extractJpegFrames (single/multiple frames, boundary/header bytes, incomplete frame, split-across-chunks, trailing-FF), plus a streamed MjpegFrameSource decoding the latest frame from a mocked fetch stream. 7/7 pass.
- packages/core/tests/unit-test/agent-frame-sequence.test.ts: fast-path is used when the device exposes captureFrameSequence (earlier frames from stream + one representative screenshot), and falls back to sequential capture when it fails.
Validation
- npx tsc --noEmit for @midscene/core and @midscene/ios: clean
- npx nx build core and npx nx build ios: pass
- device.test.ts (54) + new MJPEG tests: pass
Note: end-to-end behavior was verified on Android/web in the base PR; the iOS path here is covered by unit tests (no physical iOS device in CI).
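For readers unfamiliar with the parsing that the extractJpegFrames tests exercise, here is a minimal sketch of one way to cut JPEG frames out of an MJPEG byte stream by scanning for the SOI (FF D8) and EOI (FF D9) markers. The signature and the `rest` return value are assumptions; the PR's actual extractJpegFrames may differ. The sketch also ignores embedded thumbnails, which can contain their own SOI/EOI pairs.

```typescript
// Find the first two-byte marker FF <second> at or after `from`; -1 if absent.
function indexOfMarker(buf: Uint8Array, second: number, from: number): number {
  for (let i = from; i + 1 < buf.length; i++) {
    if (buf[i] === 0xff && buf[i + 1] === second) return i;
  }
  return -1;
}

// Hypothetical signature: returns complete frames plus the unconsumed tail,
// which the caller prepends to the next chunk.
function extractJpegFramesSketch(buf: Uint8Array): { frames: Uint8Array[]; rest: Uint8Array } {
  const frames: Uint8Array[] = [];
  let pos = 0;
  for (;;) {
    const start = indexOfMarker(buf, 0xd8, pos);
    if (start < 0) {
      // No frame start left. Keep a trailing FF, since it may be the first
      // half of a marker split across chunks; drop boundary/header bytes.
      const endsWithFF = buf.length > 0 && buf[buf.length - 1] === 0xff;
      const keepFrom = endsWithFF ? buf.length - 1 : buf.length;
      return { frames, rest: buf.slice(Math.max(keepFrom, pos)) };
    }
    const end = indexOfMarker(buf, 0xd9, start + 2);
    if (end < 0) {
      // Incomplete frame: keep everything from SOI onward for the next chunk.
      return { frames, rest: buf.slice(start) };
    }
    frames.push(buf.slice(start, end + 2));
    pos = end + 2;
  }
}

// Usage: a frame split across two chunks, preceded by boundary bytes.
const chunk1 = new Uint8Array([0x2d, 0x2d, 0xff, 0xd8, 1, 2, 0xff, 0xd9, 0xff, 0xd8, 3]);
const first = extractJpegFramesSketch(chunk1);
const chunk2 = new Uint8Array(first.rest.length + 2);
chunk2.set(first.rest);
chunk2.set([0xff, 0xd9], first.rest.length);
const second = extractJpegFramesSketch(chunk2);
```

Carrying `rest` forward is what makes the split-across-chunks and trailing-FF cases work: the parser never has to guess whether a lone FF at the end of a chunk belongs to a marker.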