Skip to content

Commit 326b35e

Browse files
committed
test: mitigate detox e2e flake in device lockfile under load
this is a known Detox issue, there are various ways to mitigate it, but reaching in and just directly patching it to wait for 20s vs 10s before failing seems adequate given the low frequency of this one
1 parent e24d7e6 commit 326b35e

6 files changed

Lines changed: 195 additions & 40 deletions

File tree

.yarn/patches/detox-npm-20.51.0-3e13b6e309.patch

Lines changed: 48 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -102,6 +102,31 @@ index d6db5d5d7a4173eae88d97bf06eaeaf8b38c3b94..a85ebdf5a30607459be68463b3238b6f
102102
}
103103

104104
override fun registerIdleTransitionCallback(callback: IdlingResource.ResourceCallback?) {
105+
diff --git a/src/devices/common/drivers/android/exec/ADB.js b/src/devices/common/drivers/android/exec/ADB.js
106+
index 3954219065ca5ac7c3ea81cb801b37ae5b5ff2d1..a10727b16240a91ad382d08284b6c8cfe5dbf8e2 100644
107+
--- a/src/devices/common/drivers/android/exec/ADB.js
108+
+++ b/src/devices/common/drivers/android/exec/ADB.js
109+
@@ -380,7 +380,19 @@ class ADB {
110+
}
111+
112+
async reverseRemove(deviceId, port) {
113+
- return this.adbCmd(deviceId, `reverse --remove tcp:${port}`);
114+
+ try {
115+
+ return await this.adbCmd(deviceId, `reverse --remove tcp:${port}`);
116+
+ } catch (error) {
117+
+ const details = `${error.stderr || ''} ${error.message || ''}`;
118+
+ if (/listener .* not found/i.test(details)) {
119+
+ log.warn(
120+
+ { deviceId, port },
121+
+ 'Ignoring missing adb reverse listener during Detox teardown',
122+
+ );
123+
+ return;
124+
+ }
125+
+ throw error;
126+
+ }
127+
}
128+
129+
async emuKill(deviceId) {
105130
diff --git a/src/server/handlers/AnonymousConnectionHandler.js b/src/server/handlers/AnonymousConnectionHandler.js
106131
index 10743c87d17de135317e1bac3e65159b9872412f..abbb302307fbd5eabbbff875983f284a4bb9e738 100644
107132
--- a/src/server/handlers/AnonymousConnectionHandler.js
@@ -146,28 +171,28 @@ index 10743c87d17de135317e1bac3e65159b9872412f..abbb302307fbd5eabbbff875983f284a
146171
}
147172
}
148173

149-
diff --git a/src/devices/common/drivers/android/exec/ADB.js b/src/devices/common/drivers/android/exec/ADB.js
150-
index 1111111111111111111111111111111111111111..2222222222222222222222222222222222222222 100644
151-
--- a/src/devices/common/drivers/android/exec/ADB.js
152-
+++ b/src/devices/common/drivers/android/exec/ADB.js
153-
@@ -379,7 +379,19 @@ class ADB {
154-
}
174+
diff --git a/src/utils/ExclusiveLockfile.js b/src/utils/ExclusiveLockfile.js
175+
index da29ae99c8b90877f15715cdeb692ed8a2b47569..f53f49bca03b3acca4c4d8b05d087a8c1b601815 100644
176+
--- a/src/utils/ExclusiveLockfile.js
177+
+++ b/src/utils/ExclusiveLockfile.js
178+
@@ -6,6 +6,9 @@ const DetoxRuntimeError = require('../errors/DetoxRuntimeError');
155179

156-
async reverseRemove(deviceId, port) {
157-
- return this.adbCmd(deviceId, `reverse --remove tcp:${port}`);
158-
+ try {
159-
+ return await this.adbCmd(deviceId, `reverse --remove tcp:${port}`);
160-
+ } catch (error) {
161-
+ const details = `${error.stderr || ''} ${error.message || ''}`;
162-
+ if (/listener .* not found/i.test(details)) {
163-
+ log.warn(
164-
+ { deviceId, port },
165-
+ 'Ignoring missing adb reverse listener during Detox teardown',
166-
+ );
167-
+ return;
168-
+ }
169-
+ throw error;
170-
+ }
171-
}
180+
const retry = require('./retry');
172181

173-
async emuKill(deviceId) {
182+
+// proper-lockfile default stale is 10s; CI runners under load can miss lock heartbeats → ECOMPROMISED.
183+
+const DEVICE_REGISTRY_LOCK_STALE_MS = 20000;
184+
+
185+
const DEFAULT_OPTIONS = {
186+
retry: { retries: 10000, interval: 5 },
187+
read: { encoding: 'utf8' },
188+
@@ -107,7 +110,9 @@ class ExclusiveLockfile {
189+
this._ensureFileExists();
190+
191+
await retry(this._options.retry, () => {
192+
- const operationResult = plockfile.lockSync(this._lockFilePath);
193+
+ const operationResult = plockfile.lockSync(this._lockFilePath, {
194+
+ stale: DEVICE_REGISTRY_LOCK_STALE_MS,
195+
+ });
196+
197+
this._isLocked = true;
198+
this._invalidate();

okf-bundle/ci-workflows/android.md

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,13 @@ Observed on [run 27803881448](https://github.com/invertase/react-native-firebase
3232

3333
When the Jet/app WebSocket drops (1006), Detox can treat the session as dead and tear down instrumentation **while Jet is still in its 15s reconnect grace**. That triggers `reverseRemove` before the orchestration retry's `terminateApp()`. If the listener was never established or was already removed, adb exits non-zero.
3434

35-
**Mitigation (patched):** `.yarn/patches/detox-npm-20.51.0-*.patch` makes `ADB.reverseRemove` ignore `listener 'tcp:…' not found` during teardown.
35+
**Mitigation (patched):** `.yarn/patches/detox-npm-20.51.0-*.patch`:
36+
37+
- `ADB.reverseRemove` — ignore `listener 'tcp:…' not found` during teardown (below)
38+
- Android idling resources — force idle (network/timers/Fabric)
39+
- `ExclusiveLockfile.js` — 2× lock stale for `ECOMPROMISED` flake ([detox-patches.md](detox-patches.md))
40+
41+
Full inventory: [detox-patches.md](detox-patches.md).
3642

3743
**Orchestration retry:** `firebase.test.js` still retries on `RETRYABLE_DISCONNECT`; attempt 2 can pass all tests even when attempt 1's teardown logged adb noise.
3844

Lines changed: 103 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,103 @@
1+
# Detox yarn patches
2+
3+
E2E runs on **Detox 20.51.0** (`tests/package.json`), applied via Yarn Berry patch:
4+
5+
`.yarn/patches/detox-npm-20.51.0-3e13b6e309.patch`
6+
7+
Patches are maintained in-repo (including by agents). Prefer **editing the patch file directly** or the headless workflow below — `yarn patch-commit` can prompt for overwrite confirmation, which fails in non-interactive shells.
8+
9+
## Inventory
10+
11+
| Change | File(s) | Platforms | Problem |
12+
|--------|---------|-----------|---------|
13+
| Disable network idling | `NetworkIdlingResource.kt` | Android | OkHttp never idle in RN Firebase tests → Detox sync timeout |
14+
| Disable timers idling | `TimersIdlingResource.kt` | Android | RN timer queue never drains → infinite idle wait |
15+
| Disable Fabric UI idling | `FabricUIManagerIdlingResources.kt` | Android | Stuck mount items on API 36+ / edge-to-edge → false busy |
16+
| Buffer early `ready` | `AnonymousConnectionHandler.js` | iOS | App sends `ready` before Detox login → `launchApp` stuck |
17+
| Ignore missing adb reverse on teardown | `ADB.js` | Android | Jet WS 1006 triggers mid-run `reverse --remove` → adb exit 1 |
18+
| **2× device-registry lock stale** | `ExclusiveLockfile.js` | iOS, macOS, Android | `proper-lockfile` `ECOMPROMISED` before tests start |
19+
20+
Related non-Detox patches: `jet`, `mocha-remote-client`, `mocha-remote-server` (coverage over WebSocket) — see [coverage design](../testing/coverage-design.md).
21+
22+
## Device registry lock (`ECOMPROMISED`)
23+
24+
### Symptom
25+
26+
Detox Test step fails **before any test output**, often ~30–60s after `yarn tests:ios:test:release` starts:
27+
28+
```
29+
Error: Unable to update lock within the stale threshold
30+
at .../proper-lockfile/lib/lockfile.js:109
31+
code: 'ECOMPROMISED'
32+
```
33+
34+
Build, pre-boot, and app install usually succeed. Codecov and coverage steps are skipped because Jest never starts.
35+
36+
### Cause
37+
38+
Detox serializes access to `~/Library/Detox/device.registry.json` (the **device allocation ledger**: which simulators/emulators are busy, session IDs, PIDs). It uses `proper-lockfile`, which refreshes the lock file mtime on a timer (default **stale = 10s**). If the runner is under load — simulator logging, Firestore emulator, `yeetd`, Xcode build cache — a heartbeat can miss that window and Detox throws `ECOMPROMISED`.
39+
40+
This is a known Detox / CI flake ([Detox #4210](https://github.com/wix/Detox/issues/4210)). Community reports ~1-in-25 on busy macOS CI.
41+
42+
### Mitigation (this repo)
43+
44+
**Patch** `ExclusiveLockfile.js` to pass `{ stale: 20000 }` (2× default) into `plockfile.lockSync` when locking the device registry.
45+
46+
We **do not** run `detox reset-lock-file` in CI. That command wipes `device.registry.json` — useful for local recovery, but it destroys Detox's ledger state and can mask concurrent-session bugs. Extending the stale window preserves the ledger while tolerating slow heartbeats.
47+
48+
Other prerequisites already in place:
49+
50+
- `maxWorkers: 1` in `tests/e2e/jest.config.js` (multi-worker lock contention is a common trigger)
51+
- Pre-boot simulator before Detox (orthogonal; fixes boot/migration, not lock heartbeats)
52+
53+
### Diagnosing
54+
55+
| Pattern | Meaning |
56+
|---------|---------|
57+
| Failure in first minute, no Jest/Detox test lines | Startup lock failure (this issue) |
58+
| `proper-lockfile` + `ECOMPROMISED` in Detox step log | Confirms lock heartbeat, not test logic |
59+
| Release fails, debug passes (or vice versa) | Timing/load flake; same root cause |
60+
| Orphan `node` in job cleanup | Stuck Detox/Jest from lock abort |
61+
62+
```bash
63+
rg 'ECOMPROMISED|proper-lockfile|Unable to update lock' detox-step.log
64+
```
65+
66+
### When to bump further
67+
68+
If `ECOMPROMISED` persists after 20s, consider 30s in the patch (MetaMask used 20s on Bitrise). Re-evaluate when migrating off Detox to Appium.
69+
70+
## Updating a Detox patch (headless)
71+
72+
**Do not** `rsync` the whole `node_modules/detox` tree into a patch folder — that pulls in frameworks and multi‑MB artifacts.
73+
74+
1. Edit the patched file under `tests/node_modules/detox/...` (after `yarn install`), **or** append a correct unified-diff hunk to `.yarn/patches/detox-npm-20.51.0-3e13b6e309.patch`.
75+
2. Regenerate the patch file without prompts:
76+
77+
```bash
78+
PATCH_DIR=$(yarn patch detox@npm:20.51.0 2>&1 | sed -n 's/.*folder: //p')
79+
SRC=tests/node_modules/detox
80+
81+
# Copy ONLY the files this patch touches (see inventory table above)
82+
/bin/cp -f "$SRC/src/utils/ExclusiveLockfile.js" "$PATCH_DIR/src/utils/ExclusiveLockfile.js"
83+
/bin/cp -f "$SRC/src/server/handlers/AnonymousConnectionHandler.js" "$PATCH_DIR/src/server/handlers/AnonymousConnectionHandler.js"
84+
/bin/cp -f "$SRC/src/devices/common/drivers/android/exec/ADB.js" "$PATCH_DIR/src/devices/common/drivers/android/exec/ADB.js"
85+
/bin/cp -f "$SRC/android/detox/src/full/java/com/wix/detox/reactnative/idlingresources/network/NetworkIdlingResource.kt" \
86+
"$PATCH_DIR/android/detox/src/full/java/com/wix/detox/reactnative/idlingresources/network/NetworkIdlingResource.kt"
87+
/bin/cp -f "$SRC/android/detox/src/full/java/com/wix/detox/reactnative/idlingresources/timers/TimersIdlingResource.kt" \
88+
"$PATCH_DIR/android/detox/src/full/java/com/wix/detox/reactnative/idlingresources/timers/TimersIdlingResource.kt"
89+
/bin/cp -f "$SRC/android/detox/src/full/java/com/wix/detox/reactnative/idlingresources/uimodule/fabric/FabricUIManagerIdlingResources.kt" \
90+
"$PATCH_DIR/android/detox/src/full/java/com/wix/detox/reactnative/idlingresources/uimodule/fabric/FabricUIManagerIdlingResources.kt"
91+
92+
yarn patch-commit -s "$PATCH_DIR" # non-interactive when using /bin/cp -f, not plain cp
93+
```
94+
95+
3. `yarn install` from repo root and confirm the change in `tests/node_modules/detox/...`.
96+
4. Update this doc and platform pages (`ios.md`, `android.md`) if behaviour or file list changes.
97+
98+
**Detox version bump:** change `tests/package.json`, run `yarn`, re-apply all hunks (or redo the headless flow from a fresh `yarn patch`), run iOS + Android E2E.
99+
100+
## Platform docs
101+
102+
- [iOS](ios.md) — simulator boot, early-ready, Jet WS, lock flake sentinels
103+
- [Android](android.md) — idling resources, adb reverse teardown

okf-bundle/ci-workflows/index.md

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,9 +5,13 @@ Knowledge for GitHub Actions workflows in this repository: how jobs are structur
55
## Platforms
66

77
* [iOS](ios.md) — simulator boot reliability, logging, and troubleshooting
8-
* [Android](android.md)TBD
8+
* [Android](android.md)idling resources, adb teardown, native coverage pull
99
* [Other](other.md) — macOS Detox (non-iOS), Windows, and shared workflow concerns — TBD
1010

11+
## Shared E2E dependencies
12+
13+
* [Detox yarn patches](detox-patches.md) — inventory, `ECOMPROMISED` lock flake, headless patch update workflow
14+
1115
## Related
1216

1317
* [Testing / coverage design](../testing/coverage-design.md) — e2e coverage collection that runs inside the iOS workflow

0 commit comments

Comments
 (0)