Skip to content

Commit 4172954

Browse files
committed
test(ios): additional instrumentation and mitigation for e2e orchestration flakes
1 parent 56261f8 commit 4172954

4 files changed

Lines changed: 317 additions & 23 deletions

File tree

.github/workflows/tests_e2e_ios.yml

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -313,7 +313,7 @@ jobs:
313313
- name: Pre-fetch Javascript bundle
314314
if: contains(matrix.buildmode, 'debug')
315315
run: |
316-
nohup yarn tests:packager:jet-ci &
316+
nohup yarn tests:packager:jet-ci > metro.log 2>&1 &
317317
printf 'Waiting for packager to come online'
318318
until curl --output /dev/null --silent --head --fail http://localhost:8081/status; do
319319
printf '.'
@@ -388,6 +388,15 @@ jobs:
388388
name: simulator-${{ matrix.buildmode }}-${{ matrix.iteration }}_log
389389
path: simulator.log
390390

391+
- name: Upload Metro log
392+
# https://github.com/actions/upload-artifact/releases
393+
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
394+
continue-on-error: true
395+
if: always() && contains(matrix.buildmode, 'debug')
396+
with:
397+
name: metro-${{ matrix.buildmode }}-${{ matrix.iteration }}_log
398+
path: metro.log
399+
391400
- name: Upload Screen Recording
392401
# https://github.com/actions/upload-artifact/releases
393402
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1

okf-bundle/ci-workflows/ios.md

Lines changed: 63 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -49,6 +49,7 @@ Artifacts are uploaded on every run (`if: always()`), even when tests fail.
4949
| Artifact | Source | Use when |
5050
|----------|--------|----------|
5151
| `simulator-<buildmode>-<iteration>_log` | `xcrun simctl spawn booted log stream``simulator.log` | In-simulator system/app logs during Detox |
52+
| `metro-<buildmode>-<iteration>_log` | Metro stdout/stderr from `yarn tests:packager:jet-ci``metro.log` (debug only) | Metro hung, slow bundle, or unresponsive `/status` during app launch |
5253
| `simulator-<buildmode>-<iteration>_video` | `xcrun simctl io booted recordVideo``simulator.mp4` | Visual confirmation of UI state |
5354
| `screenrecording-<buildmode>-<iteration>` | `screencapture` of the Mac desktop | Includes Simulator.app window |
5455
| `screenrecording-setup-<buildmode>-<iteration>.mov` | Guidepup setup recording | Very early environment setup |
@@ -57,7 +58,7 @@ Artifacts are uploaded on every run (`if: always()`), even when tests fail.
5758
**When to use which log**
5859

5960
- **Boot / migration / “simulator won’t start”** — read the **Pre-Boot Simulator** step log in GitHub Actions first. Look for `[boot-status]` lines and `bootstatus -d` migration output. That captures first-boot migration even though `simulator.log` starts only after pre-boot succeeds.
60-
- **Detox / app / test failures** — download `simulator-*_log` and use [E2E test app orchestration](#e2e-test-app-orchestration-detox--jet) grep patterns (`[rnfb-lifecycle]`, `waitForActive`, SpringBoard foreground). Jet WS drops (1006/1001) appear in the **Detox step log** (`[jet-ws]`, `[rnfb-e2e]`).
61+
- **Detox / app / test failures** — download `simulator-*_log` and use [E2E test app orchestration](#e2e-test-app-orchestration-detox--jet) grep patterns (`[rnfb-lifecycle]`, `waitForActive`, SpringBoard foreground). Jet WS drops (1006/1001) appear in the **Detox step log** (`[jet-ws]`, `[rnfb-e2e]`). For debug Metro issues, also download `metro-*_log`.
6162
- **UI regressions**`simulator-*_video` or `screenrecording-*`.
6263

6364
**Downloading artifacts**
@@ -86,7 +87,7 @@ Device type is defined in `tests/.detoxrc.js` (`devices.simulator.device.type`).
8687

8788
### E2E test app orchestration (Detox + Jet)
8889

89-
After pre-boot succeeds, failures often move **inside** the test app process (`com.invertase.testing`, binary `testing`). Simulator boot and app install are fine; Detox `launchApp` stalls while the app stays alive. Three overlapping issues show up in CI logs.
90+
After pre-boot succeeds, failures often move **inside** the test app process (`com.invertase.testing`, binary `testing`). Simulator boot and app install are fine; Detox `launchApp` stalls while the app stays alive. Several overlapping issues show up in CI logs.
9091

9192
#### 1. Early `ready` race
9293

@@ -110,6 +111,8 @@ ws-server The app has dispatched "ready" action too early.
110111
| Change | Location |
111112
|--------|----------|
112113
| Wait for Jet (port 8090) before `launchApp` | `tests/e2e/firebase.test.js` |
114+
| Wait for Metro (`/status`) before `launchApp` | `tests/e2e/firebase.test.js` |
115+
| Bounded `launchApp` timeout (default 180s) | `tests/e2e/firebase.test.js` |
113116
| `detoxEnableSynchronization: 'NO'` at launch | `tests/e2e/firebase.test.js` |
114117
| `detoxURLBlacklistRegex: '.*'` | `tests/e2e/firebase.test.js` (existing) |
115118

@@ -119,14 +122,39 @@ ws-server The app has dispatched "ready" action too early.
119122

120123
**Instrumentation**`tests/ios/testing/AppDelegate.mm` logs `[rnfb-lifecycle]` at launch, on UIApplication/UIScene lifecycle notifications, and at **+30s / +60s** one-shot probes if the app never becomes active. Confirms whether the stall is Detox-side or the app never reaching `UIApplicationStateActive` / `foregroundActive`.
121124

122-
**Mitigations in this repo (summary)**
125+
**Distinguish from issue 4** — If `[rnfb-lifecycle]` probes show `UIApplication.state=active` / `foregroundActive` but Detox still hangs on `waitForActive`, the cause is likely Metro/JS bundle load failure (issue 4), not scene activation.
126+
127+
#### 4. Metro unresponsive at launch → `waitForActive` hang (active app)
128+
129+
**Symptom** — Same Detox-side pattern as issue 3 (`waitForActive` without `waitForActiveDone`, multi-minute hang in `device.launchApp()`), but **`[rnfb-lifecycle]` shows the app is already active**. Often seen on debug CI only; release on the same run passes (embedded bundle, no live Metro).
130+
131+
**Cause chain** (observed on run [27727525262](https://github.com/invertase/react-native-firebase/actions/runs/27727525262)):
132+
133+
1. Debug app requests `http://localhost:8081/status` from inside the simulator; TCP connects but Metro returns no bytes within ~10s (`NSURLErrorDomain Code=-1001`).
134+
2. RN logs `No script URL provided. Make sure the packager is running...` and shows RedBox; DetoxSync adds an RN-load idling resource that never clears.
135+
3. Detox WS login/`isReady` succeed (early-ready patch works), then `waitForActive` is received but **`waitForActiveDone` never arrives** even though UIKit is active.
136+
137+
Metro can look healthy on the **host** during pre-fetch minutes earlier while being unresponsive when the app actually launches.
138+
139+
**Instrumentation**`AppDelegate.mm` observes `RCTJavaScriptDidFailToLoadNotification` and logs:
140+
141+
- `event=RCTJavaScriptDidFailToLoad` with NSError domain/code/description
142+
- `event=packager-probe``[RCTBundleURLProvider isPackagerRunning:]`, `packagerServerHostPort`, resolved `bundleURL`
143+
- `event=packager-status-fetch` — direct `http://localhost:8081/status` fetch from inside the app with explicit timeout and HTTP status/body or error
144+
145+
**Mitigations in this repo**
123146

124147
| Change | Location |
125148
|--------|----------|
126-
| Buffer early `ready`, replay after app `login` | `.yarn/patches/detox-npm-20.51.0-*.patch` |
127-
| Jet wait + Detox launch args | `tests/e2e/firebase.test.js` |
128-
| Lifecycle logging for post-mortem | `tests/ios/testing/AppDelegate.mm` |
129-
| Pre-boot + `bootstatus` before install | `boot-simulator.sh` (orthogonal; fixes boot/migration only) |
149+
| Wait for Metro `/status` before `launchApp` | `tests/e2e/firebase.test.js` |
150+
| One in-process `launchApp` retry on Metro/bundle load timeout | `tests/e2e/firebase.test.js` (`RNFB_LAUNCH_APP_MAX_ATTEMPTS`) |
151+
| Full Jet e2e retry on launch failure (after in-process retries exhausted) | `tests/e2e/firebase.test.js` |
152+
| Bounded `launchApp` timeout (default 180s, env `RNFB_LAUNCH_APP_TIMEOUT_MS`) | `tests/e2e/firebase.test.js` |
153+
| Metro log artifact (`metro.log`) | `.github/workflows/tests_e2e_ios.yml` |
154+
| JS load failure + packager probe logging | `tests/ios/testing/AppDelegate.mm` |
155+
| Bundle URL pinned to `127.0.0.1` (bypasses RN `localhost` fallback) | `tests/ios/testing/AppDelegate.mm` |
156+
157+
**Known intermittent pattern (community)** — RN iOS debug builds commonly hit `No script URL provided` when Metro is down, slow, or reachable from the host but not from the simulator process. Reported causes include: Metro event loop blocked under load (TCP connect succeeds, HTTP hangs — matches our `-1001` on `/status`), macOS network proxy intercepting `localhost:8081`, port 8081 contention, and `RCTBundleURLProvider` returning a nil bundle URL despite `isPackagerRunning=YES` ([react-native#49173](https://github.com/facebook/react-native/issues/49173)). On **iOS 26+ simulators**, hostname resolution for `localhost` vs `127.0.0.1` can be unreliable; hardcoding `127.0.0.1` in bundle URL resolution has been reported to fix intermittent Metro disconnects ([react-native#56447](https://github.com/facebook/react-native/issues/56447)). Our test app now pins the debug bundle URL to `127.0.0.1:8081` and the e2e harness retries `launchApp` once on Metro/bundle-load timeout before failing the Jet attempt. These are typically **environment/timing** flakes rather than app logic bugs; release builds avoid them by using embedded bundles.
130158

131159
#### Diagnosing from `simulator.log`
132160

@@ -141,6 +169,11 @@ rg 'testing\[' simulator.log | rg -i 'waitForActive|waitForActiveDone|com\.wix\.
141169
# App lifecycle confirmation (AppDelegate instrumentation)
142170
rg '\[rnfb-lifecycle\]' simulator.log
143171

172+
# Metro / JS bundle load failure (issue 4)
173+
rg '\[rnfb-lifecycle\].*RCTJavaScriptDidFailToLoad|packager-probe|packager-status-fetch' simulator.log
174+
rg 'No script URL provided|com\.facebook\.react\.log' simulator.log
175+
rg 'testing\[' simulator.log | rg '8081/status|index\.bundle'
176+
144177
# SpringBoard launch intent vs app scene state
145178
rg 'com\.invertase\.testing' simulator.log | rg -i 'foreground-interactive|visibility.*Foreground|running-active'
146179
rg 'testing\[' simulator.log | rg -i 'Deactivation reason|activationState|UISceneActivationState'
@@ -157,14 +190,33 @@ rg 'testing\[' simulator.log | rg -i 'FIRApp|RNFB|RCTBridge|configure'
157190
| Pattern | Meaning |
158191
|---------|---------|
159192
| `ready action too early` in Detox step only | Early-ready race (patch should fix; check patch applied in `yarn install`) |
160-
| `waitForActive` without `waitForActiveDone` | Scene/active hang; check `[rnfb-lifecycle]` probes still `unattached` / not `active` |
161-
| `probe+30s` / `probe+60s` with `UIApplication.state=inactive` or scene `unattached` | App never became foreground-active; compare with SpringBoard `foreground-interactive` lines |
193+
| `waitForActive` without `waitForActiveDone` | Scene/active hang (issue 3) **or** Metro/JS load failure (issue 4) — check `[rnfb-lifecycle]` probes |
194+
| `probe+30s` / `probe+60s` with `UIApplication.state=inactive` or scene `unattached` | App never became foreground-active (issue 3); compare with SpringBoard `foreground-interactive` lines |
195+
| `probe+30s` / `probe+60s` with `active` / `foregroundActive` but no `waitForActiveDone` | Metro/JS bundle failure despite active UIKit (issue 4); check `RCTJavaScriptDidFailToLoad` and `packager-status-fetch` |
196+
| `packager-status-fetch ... error domain=NSURLErrorDomain code=-1001` | Metro TCP reachable but HTTP timed out from simulator |
197+
| `No script URL provided` in `com.facebook.react.log` | RN never got bundle URL; only `/status` attempts, no `index.bundle` |
162198
| `handshake successful` 30–60s after `simctl launch` | Main-thread startup delay; not a boot failure |
163199
| Only `com.apple.datamigrator` activity, no `testing[` | Pre-boot / migration issue — use Actions `[boot-status]` log, not Detox orchestration |
164200

165-
**Example healthy sequence** (abbreviated): `didFinishLaunching+after``UIApplicationDidBecomeActiveNotification` / scene `foregroundActive` → Detox `loginSuccess``isReady``waitForActiveDone`. A gap between SpringBoard foreground request and `[rnfb-lifecycle]` `active` is the smoking gun for issue 3.
201+
**Example healthy sequence** (abbreviated): `didFinishLaunching+after``UIApplicationDidBecomeActiveNotification` / scene `foregroundActive` → Detox `loginSuccess``isReady``waitForActiveDone`. A gap between SpringBoard foreground request and `[rnfb-lifecycle]` `active` is the smoking gun for issue 3. For issue 4, lifecycle probes show `active` but you will see `RCTJavaScriptDidFailToLoad` and/or `No script URL provided` before Detox stalls on `waitForActive`.
202+
203+
For Metro-side post-mortem, also download `metro-*_log` and check whether Metro logged bundle requests or stalled around the launch timestamp:
204+
205+
```bash
206+
rg -i 'BUNDLE|index\.bundle|8081|error|ECONN|transform' metro.log
207+
```
208+
209+
**Mitigations in this repo (summary)**
210+
211+
| Change | Location |
212+
|--------|----------|
213+
| Buffer early `ready`, replay after app `login` | `.yarn/patches/detox-npm-20.51.0-*.patch` |
214+
| Jet + Metro wait, bounded `launchApp` with retry, Detox launch args | `tests/e2e/firebase.test.js` |
215+
| Lifecycle + JS load / packager probe logging; bundle URL `127.0.0.1` | `tests/ios/testing/AppDelegate.mm` |
216+
| Pre-boot + `bootstatus` before install | `boot-simulator.sh` (orthogonal; fixes boot/migration only) |
217+
| Metro stdout/stderr artifact | `.github/workflows/tests_e2e_ios.yml` |
166218

167-
#### 4. Jet WebSocket disconnect (1006 / 1001)
219+
#### 5. Jet WebSocket disconnect (1006 / 1001)
168220

169221
**Symptom** (Detox Test Debug step, often debug build only):
170222

tests/e2e/firebase.test.js

Lines changed: 109 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,12 @@ const path = require('path');
2222
const { pullAndroidCoverage, pullIosCoverage } = require('../scripts/pull-native-coverage');
2323

2424
const JET_REMOTE_PORT = parseInt(process.env.JET_REMOTE_PORT || '8090', 10);
25+
const METRO_PORT = parseInt(process.env.JET_METRO_PORT || process.env.RCT_METRO_PORT || '8081', 10);
26+
const LAUNCH_APP_TIMEOUT_MS = parseInt(process.env.RNFB_LAUNCH_APP_TIMEOUT_MS || '180000', 10);
27+
const LAUNCH_APP_MAX_ATTEMPTS = parseInt(process.env.RNFB_LAUNCH_APP_MAX_ATTEMPTS || '2', 10);
2528
const JET_RETRYABLE_WS_RE = /\[jet-ws\] RETRYABLE_DISCONNECT code=(1006|1001)\b/;
29+
const RETRYABLE_LAUNCH_RE =
30+
/launchApp timed out|RCTJavaScriptDidFailToLoad|packager-probe|Metro not responding/i;
2631

2732
function waitForTcpPort(port, host = '127.0.0.1', timeoutMs = 120000) {
2833
const start = Date.now();
@@ -53,6 +58,95 @@ function isRetryableJetDisconnect(output) {
5358
return JET_RETRYABLE_WS_RE.test(output);
5459
}
5560

61+
function isRetryableLaunchFailure(err) {
62+
return RETRYABLE_LAUNCH_RE.test(err?.message || '');
63+
}
64+
65+
function isRetryableE2eFailure(err) {
66+
const jetOutput = err?.jetOutput || '';
67+
return isRetryableJetDisconnect(jetOutput) || isRetryableLaunchFailure(err);
68+
}
69+
70+
async function waitForMetro(port = METRO_PORT, timeoutMs = 120000) {
71+
const start = Date.now();
72+
73+
while (Date.now() - start < timeoutMs) {
74+
try {
75+
const response = await fetch(`http://127.0.0.1:${port}/status`);
76+
const body = await response.text();
77+
if (body.includes('packager-status:running')) {
78+
console.log(`[rnfb-e2e] Metro OK on 127.0.0.1:${port}`);
79+
return;
80+
}
81+
console.warn(
82+
`[rnfb-e2e] Metro 127.0.0.1:${port}/status returned unexpected body: ${body.slice(0, 120)}`,
83+
);
84+
} catch (_) {
85+
// Metro not ready yet; keep polling.
86+
}
87+
await new Promise(resolve => setTimeout(resolve, 500));
88+
}
89+
90+
throw new Error(
91+
`Metro not responding with packager-status:running on 127.0.0.1:${port} after ${timeoutMs}ms`,
92+
);
93+
}
94+
95+
async function launchAppWithTimeout(launchArgs, timeoutMs = LAUNCH_APP_TIMEOUT_MS) {
96+
console.log(`[rnfb-e2e] launchApp starting (timeout=${timeoutMs}ms)`);
97+
let timer;
98+
99+
try {
100+
await Promise.race([
101+
device.launchApp({
102+
newInstance: true,
103+
delete: true,
104+
launchArgs,
105+
}),
106+
new Promise((_, reject) => {
107+
timer = setTimeout(() => {
108+
const err = new Error(
109+
`[rnfb-e2e] launchApp timed out after ${timeoutMs}ms — check simulator.log for ` +
110+
`[rnfb-lifecycle] packager-probe / RCTJavaScriptDidFailToLoad and Detox waitForActive`,
111+
);
112+
err.retryableLaunchFailure = true;
113+
reject(err);
114+
}, timeoutMs);
115+
}),
116+
]);
117+
console.log('[rnfb-e2e] launchApp complete');
118+
} finally {
119+
clearTimeout(timer);
120+
}
121+
}
122+
123+
async function launchAppWithRetry(launchArgs) {
124+
for (let launchAttempt = 1; launchAttempt <= LAUNCH_APP_MAX_ATTEMPTS; launchAttempt++) {
125+
try {
126+
if (launchAttempt > 1) {
127+
console.warn(
128+
`[rnfb-e2e] Retrying launchApp after Metro/bundle load failure (attempt ${launchAttempt}/${LAUNCH_APP_MAX_ATTEMPTS})`,
129+
);
130+
try {
131+
await device.terminateApp();
132+
} catch (_) {
133+
// No-op
134+
}
135+
await waitForMetro(METRO_PORT);
136+
await new Promise(resolve => setTimeout(resolve, 2000));
137+
}
138+
139+
await launchAppWithTimeout(launchArgs);
140+
return;
141+
} catch (err) {
142+
const retryable = launchAttempt < LAUNCH_APP_MAX_ATTEMPTS && isRetryableLaunchFailure(err);
143+
if (!retryable) {
144+
throw err;
145+
}
146+
}
147+
}
148+
}
149+
56150
function runJetE2eAttempt(attempt) {
57151
const platform = detox.device.getPlatform();
58152
const testsDir = path.resolve(__dirname, '..');
@@ -98,15 +192,13 @@ function runJetE2eAttempt(attempt) {
98192
try {
99193
console.log(`[rnfb-e2e] Jet attempt ${attempt}: waiting for port ${JET_REMOTE_PORT}`);
100194
await waitForTcpPort(JET_REMOTE_PORT);
195+
console.log(`[rnfb-e2e] Jet attempt ${attempt}: waiting for Metro on port ${METRO_PORT}`);
196+
await waitForMetro(METRO_PORT);
101197
console.log(`[rnfb-e2e] Jet attempt ${attempt}: launching app`);
102-
await device.launchApp({
103-
newInstance: true,
104-
delete: true,
105-
launchArgs: {
106-
detoxURLBlacklistRegex: `.*`,
107-
// Avoid sync/idling blocking the main queue while Detox WS login is pending.
108-
detoxEnableSynchronization: 'NO',
109-
},
198+
await launchAppWithRetry({
199+
detoxURLBlacklistRegex: `.*`,
200+
// Avoid sync/idling blocking the main queue while Detox WS login is pending.
201+
detoxEnableSynchronization: 'NO',
110202
});
111203
} catch (err) {
112204
jetProcess.kill();
@@ -124,10 +216,16 @@ describe('Jet Tests', function () {
124216
const deviceId = detox.device.id;
125217
const testsDir = path.resolve(__dirname, '..');
126218

219+
let lastFailure;
220+
127221
for (let attempt = 1; attempt <= 2; attempt++) {
128222
try {
129223
if (attempt > 1) {
130-
console.warn('[rnfb-e2e] Retrying after transient Jet WS disconnect (1006/1001)');
224+
if (isRetryableJetDisconnect(lastFailure?.jetOutput || '')) {
225+
console.warn('[rnfb-e2e] Retrying after transient Jet WS disconnect (1006/1001)');
226+
} else if (isRetryableLaunchFailure(lastFailure)) {
227+
console.warn('[rnfb-e2e] Retrying after Metro/bundle load launch failure');
228+
}
131229
try {
132230
await device.terminateApp();
133231
} catch (_) {
@@ -138,8 +236,9 @@ describe('Jet Tests', function () {
138236
await runJetE2eAttempt(attempt);
139237
break;
140238
} catch (err) {
239+
lastFailure = err;
141240
const jetOutput = err.jetOutput || '';
142-
const retryable = attempt === 1 && isRetryableJetDisconnect(jetOutput);
241+
const retryable = attempt === 1 && isRetryableE2eFailure(err);
143242
console.warn(
144243
`[rnfb-e2e] Jet attempt ${attempt} failed (retryable=${retryable}, exit=${err.jetExitCode ?? 'n/a'})`,
145244
);

0 commit comments

Comments
 (0)