Skip to content

Commit b4c8591

Browse files
committed
test(ios): additional instrumentation and mitigation for e2e orchestration flakes
1 parent 56261f8 commit b4c8591

4 files changed

Lines changed: 334 additions & 23 deletions

File tree

.github/workflows/tests_e2e_ios.yml

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -313,7 +313,7 @@ jobs:
313313
- name: Pre-fetch Javascript bundle
314314
if: contains(matrix.buildmode, 'debug')
315315
run: |
316-
nohup yarn tests:packager:jet-ci &
316+
nohup yarn tests:packager:jet-ci > metro.log 2>&1 &
317317
printf 'Waiting for packager to come online'
318318
until curl --output /dev/null --silent --head --fail http://localhost:8081/status; do
319319
printf '.'
@@ -388,6 +388,15 @@ jobs:
388388
name: simulator-${{ matrix.buildmode }}-${{ matrix.iteration }}_log
389389
path: simulator.log
390390

391+
- name: Upload Metro log
392+
# https://github.com/actions/upload-artifact/releases
393+
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
394+
continue-on-error: true
395+
if: always() && contains(matrix.buildmode, 'debug')
396+
with:
397+
name: metro-${{ matrix.buildmode }}-${{ matrix.iteration }}_log
398+
path: metro.log
399+
391400
- name: Upload Screen Recording
392401
# https://github.com/actions/upload-artifact/releases
393402
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1

okf-bundle/ci-workflows/ios.md

Lines changed: 63 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,7 @@ Artifacts are uploaded on every run (`if: always()`), even when tests fail.
4949
| Artifact | Source | Use when |
5050
|----------|--------|----------|
5151
| `simulator-<buildmode>-<iteration>_log` | `xcrun simctl spawn booted log stream``simulator.log` | In-simulator system/app logs during Detox |
52+
| `metro-<buildmode>-<iteration>_log` | Metro stdout/stderr from `yarn tests:packager:jet-ci``metro.log` (debug only) | Metro hung, slow bundle, or unresponsive `/status` during app launch |
5253
| `simulator-<buildmode>-<iteration>_video` | `xcrun simctl io booted recordVideo``simulator.mp4` | Visual confirmation of UI state |
5354
| `screenrecording-<buildmode>-<iteration>` | `screencapture` of the Mac desktop | Includes Simulator.app window |
5455
| `screenrecording-setup-<buildmode>-<iteration>.mov` | Guidepup setup recording | Very early environment setup |
@@ -57,7 +58,7 @@ Artifacts are uploaded on every run (`if: always()`), even when tests fail.
5758
**When to use which log**
5859

5960
- **Boot / migration / “simulator won’t start”** — read the **Pre-Boot Simulator** step log in GitHub Actions first. Look for `[boot-status]` lines and `bootstatus -d` migration output. That captures first-boot migration even though `simulator.log` starts only after pre-boot succeeds.
60-
- **Detox / app / test failures** — download `simulator-*_log` and use [E2E test app orchestration](#e2e-test-app-orchestration-detox--jet) grep patterns (`[rnfb-lifecycle]`, `waitForActive`, SpringBoard foreground). Jet WS drops (1006/1001) appear in the **Detox step log** (`[jet-ws]`, `[rnfb-e2e]`).
61+
- **Detox / app / test failures** — download `simulator-*_log` and use [E2E test app orchestration](#e2e-test-app-orchestration-detox--jet) grep patterns (`[rnfb-lifecycle]`, `waitForActive`, SpringBoard foreground). Jet WS drops (1006/1001) appear in the **Detox step log** (`[jet-ws]`, `[rnfb-e2e]`). For debug Metro issues, also download `metro-*_log`.
6162
- **UI regressions**`simulator-*_video` or `screenrecording-*`.
6263

6364
**Downloading artifacts**
@@ -86,7 +87,7 @@ Device type is defined in `tests/.detoxrc.js` (`devices.simulator.device.type`).
8687

8788
### E2E test app orchestration (Detox + Jet)
8889

89-
After pre-boot succeeds, failures often move **inside** the test app process (`com.invertase.testing`, binary `testing`). Simulator boot and app install are fine; Detox `launchApp` stalls while the app stays alive. Three overlapping issues show up in CI logs.
90+
After pre-boot succeeds, failures often move **inside** the test app process (`com.invertase.testing`, binary `testing`). Simulator boot and app install are fine; Detox `launchApp` stalls while the app stays alive. Several overlapping issues show up in CI logs.
9091

9192
#### 1. Early `ready` race
9293

@@ -110,6 +111,8 @@ ws-server The app has dispatched "ready" action too early.
110111
| Change | Location |
111112
|--------|----------|
112113
| Wait for Jet (port 8090) before `launchApp` | `tests/e2e/firebase.test.js` |
114+
| Wait for Metro (`/status`) before `launchApp` | `tests/e2e/firebase.test.js` (debug Detox configs only) |
115+
| Bounded `launchApp` timeout (default 180s) | `tests/e2e/firebase.test.js` |
113116
| `detoxEnableSynchronization: 'NO'` at launch | `tests/e2e/firebase.test.js` |
114117
| `detoxURLBlacklistRegex: '.*'` | `tests/e2e/firebase.test.js` (existing) |
115118

@@ -119,14 +122,39 @@ ws-server The app has dispatched "ready" action too early.
119122

120123
**Instrumentation**`tests/ios/testing/AppDelegate.mm` logs `[rnfb-lifecycle]` at launch, on UIApplication/UIScene lifecycle notifications, and at **+30s / +60s** one-shot probes if the app never becomes active. Confirms whether the stall is Detox-side or the app never reaching `UIApplicationStateActive` / `foregroundActive`.
121124

122-
**Mitigations in this repo (summary)**
125+
**Distinguish from issue 4** — If `[rnfb-lifecycle]` probes show `UIApplication.state=active` / `foregroundActive` but Detox still hangs on `waitForActive`, the cause is likely Metro/JS bundle load failure (issue 4), not scene activation.
126+
127+
#### 4. Metro unresponsive at launch → `waitForActive` hang (active app)
128+
129+
**Symptom** — Same Detox-side pattern as issue 3 (`waitForActive` without `waitForActiveDone`, multi-minute hang in `device.launchApp()`), but **`[rnfb-lifecycle]` shows the app is already active**. Often seen on debug CI only; release on the same run passes (embedded bundle, no live Metro).
130+
131+
**Cause chain** (observed on run [27727525262](https://github.com/invertase/react-native-firebase/actions/runs/27727525262)):
132+
133+
1. Debug app requests `http://localhost:8081/status` from inside the simulator; TCP connects but Metro returns no bytes within ~10s (`NSURLErrorDomain Code=-1001`).
134+
2. RN logs `No script URL provided. Make sure the packager is running...` and shows RedBox; DetoxSync adds an RN-load idling resource that never clears.
135+
3. Detox WS login/`isReady` succeed (early-ready patch works), then `waitForActive` is received but **`waitForActiveDone` never arrives** even though UIKit is active.
136+
137+
Metro can look healthy on the **host** during pre-fetch minutes earlier while being unresponsive when the app actually launches.
138+
139+
**Instrumentation**`AppDelegate.mm` observes `RCTJavaScriptDidFailToLoadNotification` and logs:
140+
141+
- `event=RCTJavaScriptDidFailToLoad` with NSError domain/code/description
142+
- `event=packager-probe``[RCTBundleURLProvider isPackagerRunning:]`, `packagerServerHostPort`, resolved `bundleURL`
143+
- `event=packager-status-fetch` — direct `http://localhost:8081/status` fetch from inside the app with explicit timeout and HTTP status/body or error
144+
145+
**Mitigations in this repo**
123146

124147
| Change | Location |
125148
|--------|----------|
126-
| Buffer early `ready`, replay after app `login` | `.yarn/patches/detox-npm-20.51.0-*.patch` |
127-
| Jet wait + Detox launch args | `tests/e2e/firebase.test.js` |
128-
| Lifecycle logging for post-mortem | `tests/ios/testing/AppDelegate.mm` |
129-
| Pre-boot + `bootstatus` before install | `boot-simulator.sh` (orthogonal; fixes boot/migration only) |
149+
| Wait for Metro `/status` before `launchApp` | `tests/e2e/firebase.test.js` (debug Detox configs only) |
150+
| One in-process `launchApp` retry on Metro/bundle load timeout | `tests/e2e/firebase.test.js` (`RNFB_LAUNCH_APP_MAX_ATTEMPTS`, debug only) |
151+
| Full Jet e2e retry on launch failure (after in-process retries exhausted) | `tests/e2e/firebase.test.js` (debug Metro failures only) |
152+
| Bounded `launchApp` timeout (default 180s, env `RNFB_LAUNCH_APP_TIMEOUT_MS`) | `tests/e2e/firebase.test.js` |
153+
| Metro log artifact (`metro.log`) | `.github/workflows/tests_e2e_ios.yml` |
154+
| JS load failure + packager probe logging | `tests/ios/testing/AppDelegate.mm` |
155+
| Bundle URL pinned to `127.0.0.1` (bypasses RN `localhost` fallback) | `tests/ios/testing/AppDelegate.mm` |
156+
157+
**Known intermittent pattern (community)** — RN iOS debug builds commonly hit `No script URL provided` when Metro is down, slow, or reachable from the host but not from the simulator process. Reported causes include: Metro event loop blocked under load (TCP connect succeeds, HTTP hangs — matches our `-1001` on `/status`), macOS network proxy intercepting `localhost:8081`, port 8081 contention, and `RCTBundleURLProvider` returning a nil bundle URL despite `isPackagerRunning=YES` ([react-native#49173](https://github.com/facebook/react-native/issues/49173)). On **iOS 26+ simulators**, hostname resolution for `localhost` vs `127.0.0.1` can be unreliable; hardcoding `127.0.0.1` in bundle URL resolution has been reported to fix intermittent Metro disconnects ([react-native#56447](https://github.com/facebook/react-native/issues/56447)). Our test app now pins the debug bundle URL to `127.0.0.1:8081` and the e2e harness retries `launchApp` once on Metro/bundle-load timeout before failing the Jet attempt (**debug CI only** — release uses an embedded bundle and does not start Metro). These are typically **environment/timing** flakes rather than app logic bugs; release builds avoid them by using embedded bundles.
130158

131159
#### Diagnosing from `simulator.log`
132160

@@ -141,6 +169,11 @@ rg 'testing\[' simulator.log | rg -i 'waitForActive|waitForActiveDone|com\.wix\.
141169
# App lifecycle confirmation (AppDelegate instrumentation)
142170
rg '\[rnfb-lifecycle\]' simulator.log
143171

172+
# Metro / JS bundle load failure (issue 4)
173+
rg '\[rnfb-lifecycle\].*RCTJavaScriptDidFailToLoad|packager-probe|packager-status-fetch' simulator.log
174+
rg 'No script URL provided|com\.facebook\.react\.log' simulator.log
175+
rg 'testing\[' simulator.log | rg '8081/status|index\.bundle'
176+
144177
# SpringBoard launch intent vs app scene state
145178
rg 'com\.invertase\.testing' simulator.log | rg -i 'foreground-interactive|visibility.*Foreground|running-active'
146179
rg 'testing\[' simulator.log | rg -i 'Deactivation reason|activationState|UISceneActivationState'
@@ -157,14 +190,33 @@ rg 'testing\[' simulator.log | rg -i 'FIRApp|RNFB|RCTBridge|configure'
157190
| Pattern | Meaning |
158191
|---------|---------|
159192
| `ready action too early` in Detox step only | Early-ready race (patch should fix; check patch applied in `yarn install`) |
160-
| `waitForActive` without `waitForActiveDone` | Scene/active hang; check `[rnfb-lifecycle]` probes still `unattached` / not `active` |
161-
| `probe+30s` / `probe+60s` with `UIApplication.state=inactive` or scene `unattached` | App never became foreground-active; compare with SpringBoard `foreground-interactive` lines |
193+
| `waitForActive` without `waitForActiveDone` | Scene/active hang (issue 3) **or** Metro/JS load failure (issue 4) — check `[rnfb-lifecycle]` probes |
194+
| `probe+30s` / `probe+60s` with `UIApplication.state=inactive` or scene `unattached` | App never became foreground-active (issue 3); compare with SpringBoard `foreground-interactive` lines |
195+
| `probe+30s` / `probe+60s` with `active` / `foregroundActive` but no `waitForActiveDone` | Metro/JS bundle failure despite active UIKit (issue 4); check `RCTJavaScriptDidFailToLoad` and `packager-status-fetch` |
196+
| `packager-status-fetch ... error domain=NSURLErrorDomain code=-1001` | Metro TCP reachable but HTTP timed out from simulator |
197+
| `No script URL provided` in `com.facebook.react.log` | RN never got bundle URL; only `/status` attempts, no `index.bundle` |
162198
| `handshake successful` 30–60s after `simctl launch` | Main-thread startup delay; not a boot failure |
163199
| Only `com.apple.datamigrator` activity, no `testing[` | Pre-boot / migration issue — use Actions `[boot-status]` log, not Detox orchestration |
164200

165-
**Example healthy sequence** (abbreviated): `didFinishLaunching+after``UIApplicationDidBecomeActiveNotification` / scene `foregroundActive` → Detox `loginSuccess``isReady``waitForActiveDone`. A gap between SpringBoard foreground request and `[rnfb-lifecycle]` `active` is the smoking gun for issue 3.
201+
**Example healthy sequence** (abbreviated): `didFinishLaunching+after``UIApplicationDidBecomeActiveNotification` / scene `foregroundActive` → Detox `loginSuccess``isReady``waitForActiveDone`. A gap between SpringBoard foreground request and `[rnfb-lifecycle]` `active` is the smoking gun for issue 3. For issue 4, lifecycle probes show `active` but you will see `RCTJavaScriptDidFailToLoad` and/or `No script URL provided` before Detox stalls on `waitForActive`.
202+
203+
For Metro-side post-mortem, also download `metro-*_log` and check whether Metro logged bundle requests or stalled around the launch timestamp:
204+
205+
```bash
206+
rg -i 'BUNDLE|index\.bundle|8081|error|ECONN|transform' metro.log
207+
```
208+
209+
**Mitigations in this repo (summary)**
210+
211+
| Change | Location |
212+
|--------|----------|
213+
| Buffer early `ready`, replay after app `login` | `.yarn/patches/detox-npm-20.51.0-*.patch` |
214+
| Jet + Metro wait, bounded `launchApp` with retry, Detox launch args | `tests/e2e/firebase.test.js` |
215+
| Lifecycle + JS load / packager probe logging; bundle URL `127.0.0.1` | `tests/ios/testing/AppDelegate.mm` |
216+
| Pre-boot + `bootstatus` before install | `boot-simulator.sh` (orthogonal; fixes boot/migration only) |
217+
| Metro stdout/stderr artifact | `.github/workflows/tests_e2e_ios.yml` |
166218

167-
#### 4. Jet WebSocket disconnect (1006 / 1001)
219+
#### 5. Jet WebSocket disconnect (1006 / 1001)
168220

169221
**Symptom** (Detox Test Debug step, often debug build only):
170222

tests/e2e/firebase.test.js

Lines changed: 126 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,17 @@ const path = require('path');
2222
const { pullAndroidCoverage, pullIosCoverage } = require('../scripts/pull-native-coverage');
2323

2424
const JET_REMOTE_PORT = parseInt(process.env.JET_REMOTE_PORT || '8090', 10);
25+
const METRO_PORT = parseInt(process.env.JET_METRO_PORT || process.env.RCT_METRO_PORT || '8081', 10);
26+
const LAUNCH_APP_TIMEOUT_MS = parseInt(process.env.RNFB_LAUNCH_APP_TIMEOUT_MS || '180000', 10);
27+
const LAUNCH_APP_MAX_ATTEMPTS = parseInt(process.env.RNFB_LAUNCH_APP_MAX_ATTEMPTS || '2', 10);
2528
const JET_RETRYABLE_WS_RE = /\[jet-ws\] RETRYABLE_DISCONNECT code=(1006|1001)\b/;
29+
const RETRYABLE_LAUNCH_RE =
30+
/launchApp timed out|RCTJavaScriptDidFailToLoad|packager-probe|Metro not responding/i;
31+
32+
function usesLiveMetro() {
33+
const config = process.env.DETOX_CONFIGURATION || '';
34+
return /debug/i.test(config);
35+
}
2636

2737
function waitForTcpPort(port, host = '127.0.0.1', timeoutMs = 120000) {
2838
const start = Date.now();
@@ -53,6 +63,101 @@ function isRetryableJetDisconnect(output) {
5363
return JET_RETRYABLE_WS_RE.test(output);
5464
}
5565

66+
function isRetryableLaunchFailure(err) {
67+
const message = err?.message || '';
68+
if (!usesLiveMetro()) {
69+
return /launchApp timed out/i.test(message);
70+
}
71+
return RETRYABLE_LAUNCH_RE.test(message);
72+
}
73+
74+
function isRetryableE2eFailure(err) {
75+
const jetOutput = err?.jetOutput || '';
76+
return isRetryableJetDisconnect(jetOutput) || isRetryableLaunchFailure(err);
77+
}
78+
79+
async function waitForMetro(port = METRO_PORT, timeoutMs = 120000) {
80+
const start = Date.now();
81+
82+
while (Date.now() - start < timeoutMs) {
83+
try {
84+
const response = await fetch(`http://127.0.0.1:${port}/status`);
85+
const body = await response.text();
86+
if (body.includes('packager-status:running')) {
87+
console.log(`[rnfb-e2e] Metro OK on 127.0.0.1:${port}`);
88+
return;
89+
}
90+
console.warn(
91+
`[rnfb-e2e] Metro 127.0.0.1:${port}/status returned unexpected body: ${body.slice(0, 120)}`,
92+
);
93+
} catch (_) {
94+
// Metro not ready yet; keep polling.
95+
}
96+
await new Promise(resolve => setTimeout(resolve, 500));
97+
}
98+
99+
throw new Error(
100+
`Metro not responding with packager-status:running on 127.0.0.1:${port} after ${timeoutMs}ms`,
101+
);
102+
}
103+
104+
async function launchAppWithTimeout(launchArgs, timeoutMs = LAUNCH_APP_TIMEOUT_MS) {
105+
console.log(`[rnfb-e2e] launchApp starting (timeout=${timeoutMs}ms)`);
106+
let timer;
107+
108+
try {
109+
await Promise.race([
110+
device.launchApp({
111+
newInstance: true,
112+
delete: true,
113+
launchArgs,
114+
}),
115+
new Promise((_, reject) => {
116+
timer = setTimeout(() => {
117+
const err = new Error(
118+
`[rnfb-e2e] launchApp timed out after ${timeoutMs}ms — check simulator.log for ` +
119+
`[rnfb-lifecycle] packager-probe / RCTJavaScriptDidFailToLoad and Detox waitForActive`,
120+
);
121+
err.retryableLaunchFailure = true;
122+
reject(err);
123+
}, timeoutMs);
124+
}),
125+
]);
126+
console.log('[rnfb-e2e] launchApp complete');
127+
} finally {
128+
clearTimeout(timer);
129+
}
130+
}
131+
132+
async function launchAppWithRetry(launchArgs) {
133+
for (let launchAttempt = 1; launchAttempt <= LAUNCH_APP_MAX_ATTEMPTS; launchAttempt++) {
134+
try {
135+
if (launchAttempt > 1) {
136+
console.warn(
137+
`[rnfb-e2e] Retrying launchApp after Metro/bundle load failure (attempt ${launchAttempt}/${LAUNCH_APP_MAX_ATTEMPTS})`,
138+
);
139+
try {
140+
await device.terminateApp();
141+
} catch (_) {
142+
// No-op
143+
}
144+
if (usesLiveMetro()) {
145+
await waitForMetro(METRO_PORT);
146+
}
147+
await new Promise(resolve => setTimeout(resolve, 2000));
148+
}
149+
150+
await launchAppWithTimeout(launchArgs);
151+
return;
152+
} catch (err) {
153+
const retryable = launchAttempt < LAUNCH_APP_MAX_ATTEMPTS && isRetryableLaunchFailure(err);
154+
if (!retryable) {
155+
throw err;
156+
}
157+
}
158+
}
159+
}
160+
56161
function runJetE2eAttempt(attempt) {
57162
const platform = detox.device.getPlatform();
58163
const testsDir = path.resolve(__dirname, '..');
@@ -98,15 +203,19 @@ function runJetE2eAttempt(attempt) {
98203
try {
99204
console.log(`[rnfb-e2e] Jet attempt ${attempt}: waiting for port ${JET_REMOTE_PORT}`);
100205
await waitForTcpPort(JET_REMOTE_PORT);
206+
if (usesLiveMetro()) {
207+
console.log(`[rnfb-e2e] Jet attempt ${attempt}: waiting for Metro on port ${METRO_PORT}`);
208+
await waitForMetro(METRO_PORT);
209+
} else {
210+
console.log(
211+
`[rnfb-e2e] Jet attempt ${attempt}: skipping Metro wait (DETOX_CONFIGURATION=${process.env.DETOX_CONFIGURATION || 'unknown'})`,
212+
);
213+
}
101214
console.log(`[rnfb-e2e] Jet attempt ${attempt}: launching app`);
102-
await device.launchApp({
103-
newInstance: true,
104-
delete: true,
105-
launchArgs: {
106-
detoxURLBlacklistRegex: `.*`,
107-
// Avoid sync/idling blocking the main queue while Detox WS login is pending.
108-
detoxEnableSynchronization: 'NO',
109-
},
215+
await launchAppWithRetry({
216+
detoxURLBlacklistRegex: `.*`,
217+
// Avoid sync/idling blocking the main queue while Detox WS login is pending.
218+
detoxEnableSynchronization: 'NO',
110219
});
111220
} catch (err) {
112221
jetProcess.kill();
@@ -124,10 +233,16 @@ describe('Jet Tests', function () {
124233
const deviceId = detox.device.id;
125234
const testsDir = path.resolve(__dirname, '..');
126235

236+
let lastFailure;
237+
127238
for (let attempt = 1; attempt <= 2; attempt++) {
128239
try {
129240
if (attempt > 1) {
130-
console.warn('[rnfb-e2e] Retrying after transient Jet WS disconnect (1006/1001)');
241+
if (isRetryableJetDisconnect(lastFailure?.jetOutput || '')) {
242+
console.warn('[rnfb-e2e] Retrying after transient Jet WS disconnect (1006/1001)');
243+
} else if (isRetryableLaunchFailure(lastFailure)) {
244+
console.warn('[rnfb-e2e] Retrying after Metro/bundle load launch failure');
245+
}
131246
try {
132247
await device.terminateApp();
133248
} catch (_) {
@@ -138,8 +253,9 @@ describe('Jet Tests', function () {
138253
await runJetE2eAttempt(attempt);
139254
break;
140255
} catch (err) {
256+
lastFailure = err;
141257
const jetOutput = err.jetOutput || '';
142-
const retryable = attempt === 1 && isRetryableJetDisconnect(jetOutput);
258+
const retryable = attempt === 1 && isRetryableE2eFailure(err);
143259
console.warn(
144260
`[rnfb-e2e] Jet attempt ${attempt} failed (retryable=${retryable}, exit=${err.jetExitCode ?? 'n/a'})`,
145261
);

0 commit comments

Comments
 (0)