
Commit 847f969

mverzilli and claude authored
chore: fix kv-store browser test hangs (#22721)
`yarn test:browser` in `yarn-project/kv-store` runs ~11 vitest browser-mode test files in a single vitest process, all sharing one chromium browser. In CI3's `ISOLATE` environment (`docker run --cpus=2 --memory=8g --tmpfs /tmp:exec,size=1g`), the run would intermittently hang at a test-file boundary, sit silent for >80s, then get killed by the outer CI timeout. Roughly 25% hit rate on CI; zero hit rate on developer machines.

The bug only manifests under CPU pressure. On unconstrained hardware (developer boxes with many cores), every chromium thread always gets scheduled in time and the bug never surfaces.

Root caused to a CDP teardown deadlock between vitest and chromium at test-file transitions:

1. Vitest opens 10 CDP TCP connections to chromium's `headless_shell --type=utility --utility-sub-type=network.mojom.NetworkService` (the chromium "network service" process) at startup.
2. Between test files, vitest tears down 6 of those 10 connections in a batch, sending FIN on each socket.
3. The kernel transitions chromium's side of those 6 sockets to `CLOSE_WAIT` (peer closed, application hasn't called `close()` yet).
4. Chromium's network service event loop, starved of CPU under `--cpus=2`, never gets scheduled to drain those closed FDs and call `close()`.
5. Vitest's teardown path is awaiting completion of the close handshake on those 6 connections (likely a `Promise.all` over `CDPSession` detach), and never gets the response.
6. Both processes end up with zero on-CPU threads: `vitest` parked on `futex_wait_queue`, chromium parked on `ep_poll` -> deadlock.

The hang is in upstream code (@vitest/browser-playwright ↔ playwright ↔ chromium CDP). The 6:4 close/open ratio reproduces exactly across runs.

The fix is to run each test file in a separate vitest invocation, instead of one vitest process iterating over all files.
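The `CLOSE_WAIT` pile-up in step 3 was visible directly in `/proc/net/tcp`, where column 4 holds the TCP state as hex (`01`=ESTABLISHED, `08`=CLOSE_WAIT, `0A`=LISTEN). A minimal sketch of that diagnostic, run here against fabricated sample rows rather than the live file:

```shell
#!/usr/bin/env bash
# Count TCP socket states the way the root-cause probe did. The hex in
# column 4 of /proc/net/tcp is the kernel's TCP state; a growing pile of
# 08 (CLOSE_WAIT) means the peer sent FIN but the app never close()d.
decode_states() {
  awk 'NR>1 { st[$4]++ }
  END {
    names["01"]="ESTABLISHED"; names["06"]="TIME_WAIT"
    names["08"]="CLOSE_WAIT";  names["0A"]="LISTEN"
    for (s in st) printf "%s %d\n", ((s in names) ? names[s] : s), st[s]
  }' "$@" | sort
}

# Fabricated rows shaped like /proc/net/tcp (real usage:
# decode_states /proc/net/tcp).
decode_states <<'EOF'
  sl  local_address rem_address   st tx_queue
   0: 0100007F:922C 00000000:0000 0A 00000000
   1: 0100007F:922C 0100007F:D2A4 01 00000000
   2: 0100007F:D2A4 0100007F:922C 08 00000000
   3: 0100007F:D2A6 0100007F:922C 08 00000000
EOF
# -> CLOSE_WAIT 2
#    ESTABLISHED 1
#    LISTEN 1
```

During the hang, the two `CLOSE_WAIT` rows here would instead be the six half-closed CDP sockets on chromium's side.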
- `yarn-project/kv-store/scripts/run-browser-tests.sh`: new shell loop that finds each `*.test.ts` under `src/indexeddb/` and `src/sqlite-opfs/` and runs `yarn vitest run "$file"` per file, sequentially.
- `yarn-project/kv-store/package.local.json`: `test:browser` overridden to `bash scripts/run-browser-tests.sh`.

This avoids the bug entirely because the cross-file teardown path is never exercised: each vitest process only has to tear down at its own end-of-process, where chromium gets killed outright by the OS rather than asked to close gracefully via CDP.

Cost: ~5–10s of vite/yarn startup per file. With 11 files, that's ~60–100s of extra wall time per `yarn test:browser` run. Not great, but if we can stabilize the suite and keep it running, we can iteratively look for better ways (e.g. reducing the number of files to reduce overhead where sensible).

## Verification

A reliable repro for future regressions is saved at `yarn-project/kv-store/scripts/repro-browser-hang.sh`. It runs the previous (single-process) `test:browser` shape under `docker_isolate` constraints; without this fix it reproduces consistently in ~3 minutes.

Closes F-589

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 85e1539 commit 847f969

10 files changed

Lines changed: 344 additions & 13 deletions

.test_patterns.yml

Lines changed: 6 additions & 1 deletion
@@ -146,14 +146,19 @@ tests:
       - *palla

   # yarn-project tests
+  # Attempt to catch all kv-store browser test failures (consider them quarantined for now)
+  - regex: "yarn-project/kv-store"
+    error_regex: "vitest"
+    owners:
+      - *martin
   - regex: "yarn-project/kv-store"
     error_regex: "Could not import your test module"
     owners:
       - *grego
   - regex: "yarn-project/kv-store"
     error_regex: "timeout: sending signal TERM to command"
     owners:
-      - *alex
+      - *martin
   - regex: "yarn-project/kv-store"
     error_regex: "Failed to fetch dynamically imported module"
     owners:
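The entries above are first-match triage rules: the first `error_regex` that matches a failing job's output picks the owner, so the broad new `vitest` pattern quarantines anything the more specific patterns below it would miss. A toy matcher illustrating that ordering (not CI3's actual implementation; owner names are just the aliases from the YAML):

```shell
#!/usr/bin/env bash
# Toy first-match triage: patterns are tried in the same order as the YAML
# entries, and the first hit wins.
owner_for() {
  case "$1" in
    *vitest*) echo martin ;;                            # broad quarantine rule
    *"Could not import your test module"*) echo grego ;;
    *) echo unowned ;;
  esac
}

owner_for "vitest worker hung at file boundary"      # -> martin
owner_for "Could not import your test module"        # -> grego
```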

yarn-project/bootstrap.sh

Lines changed: 3 additions & 3 deletions
@@ -170,7 +170,7 @@ function test_cmds {

   # Exclusions:
   # end-to-end: e2e tests handled separately with end-to-end/bootstrap.sh.
-  # kv-store: Uses mocha so will need different treatment.
+  # kv-store: per-file fan-out handled by kv-store/bootstrap.sh test_cmds.
   for test in !(end-to-end|kv-store|aztec)/src/**/*.test.ts; do
     # Skip benchmarks here.
     [[ "$test" =~ \.bench\.test\.ts$ ]] && continue
@@ -218,8 +218,8 @@ function test_cmds {
     echo "${prefix}${cmd_env} yarn-project/scripts/run_test.sh $test"
   done

-  # Uses mocha for browser tests, so we have to treat it differently.
-  echo "$hash:ISOLATE=1 cd yarn-project/kv-store && yarn test"
+  # kv-store: per-file fan-out (mocha for node tests, vitest for browser tests).
+  kv-store/bootstrap.sh test_cmds

   # Aztec CLI tests
   aztec/bootstrap.sh test_cmds

yarn-project/kv-store/bootstrap.sh

Lines changed: 28 additions & 0 deletions
#!/usr/bin/env bash
source $(git rev-parse --show-toplevel)/ci3/source_bootstrap

hash=$(../bootstrap.sh hash)

function test_cmds {
  # Node tests (mocha): files outside the browser-test and bench dirs.
  # Mirrors .mocharc.json's spec.
  for test in src/**/!(indexeddb|sqlite-opfs|bench)/*.test.ts; do
    echo "$hash yarn-project/kv-store/scripts/run_test.sh $test"
  done

  # Browser tests (vitest + chromium). Each file runs in its own ISOLATE
  # container — running multiple files in a single vitest invocation
  # triggers a CDP teardown deadlock on the 2-CPU CI executor. See
  # scripts/run-browser-tests.sh for the root-cause analysis.
  for test in src/indexeddb/*.test.ts src/sqlite-opfs/*.test.ts; do
    echo "$hash:ISOLATE=1 yarn-project/kv-store/scripts/run_test.sh $test"
  done
}

case "$cmd" in
  "")
    ;;
  *)
    default_cmd_handler "$@"
    ;;
esac
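`test_cmds` does not run anything itself; it prints one `<hash>[:ENV=val] <command>` line per test file for CI to execute as an independent job, which is how each browser file lands in its own ISOLATE container. A sketch of that line shape (the hash value here is a placeholder; the real script derives it from `../bootstrap.sh hash`):

```shell
#!/usr/bin/env bash
# Emit one CI job line per browser test file, in the "<hash>:ISOLATE=1 <cmd>"
# shape consumed by CI3. "cafef00d" is a placeholder content hash.
hash=cafef00d
emit_browser_cmds() {
  for test in "$@"; do
    echo "$hash:ISOLATE=1 yarn-project/kv-store/scripts/run_test.sh $test"
  done
}

emit_browser_cmds src/indexeddb/store.test.ts src/sqlite-opfs/store.test.ts
# -> cafef00d:ISOLATE=1 yarn-project/kv-store/scripts/run_test.sh src/indexeddb/store.test.ts
# -> cafef00d:ISOLATE=1 yarn-project/kv-store/scripts/run_test.sh src/sqlite-opfs/store.test.ts
```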

yarn-project/kv-store/package.json

Lines changed: 2 additions & 2 deletions
@@ -16,9 +16,9 @@
     "build:dev": "../scripts/tsc.sh --watch",
     "clean": "rm -rf ./dest .tsbuildinfo",
     "test:node": "NODE_NO_WARNINGS=1 mocha --config ./.mocharc.json",
-    "test:browser": "vitest run --config ./vitest.config.ts",
+    "test:browser": "bash scripts/run-browser-tests.sh",
     "bench:browser": "VITE_BENCH=1 vitest run --config ./vitest.config.ts src/bench",
-    "test": "yarn test:node",
+    "test": "yarn test:node && yarn test:browser",
     "test:jest": "NODE_NO_WARNINGS=1 node --experimental-vm-modules ../node_modules/.bin/jest --passWithNoTests --maxWorkers=${JEST_MAX_WORKERS:-8}"
   },
   "inherits": [
yarn-project/kv-store/package.local.json

Lines changed: 2 additions & 1 deletion

@@ -1,5 +1,6 @@
 {
   "scripts": {
-    "test": "yarn test:node"
+    "test": "yarn test:node && yarn test:browser",
+    "test:browser": "bash scripts/run-browser-tests.sh"
   }
 }
yarn-project/kv-store/scripts/probe-test-browser.sh (new file)

Lines changed: 207 additions & 0 deletions

#!/usr/bin/env bash
# Deep diagnostic wrapper around `yarn test:browser` for investigating
# browser-test hangs and silent failures.
#
# Wraps a single vitest run with a hybrid 1s-steady / 0.1s-burst sampler
# that captures /proc state (wchan, stack, syscall, fd), cgroup CPU/memory
# counters, /proc/net socket state counts, and a CPU-availability
# microbench. When vitest goes silent for >3s the sampler enters dense
# burst mode and emits a stacks_snapshot every 1s of burst. Outer 90s
# timeout converts a hang into a clean exit with the full diagnostic dump
# emitted to stderr by the EXIT trap.
#
# Single-tenant invocation:
#   bash yarn-project/kv-store/scripts/probe-test-browser.sh
#
# Parallel invocation (avoids tmp-path collisions across slots):
#   DIAG_DIR=/tmp/diag.$$ bash yarn-project/kv-store/scripts/probe-test-browser.sh
#
# Was used to root-cause the kv-store CDP teardown deadlock that
# scripts/run-browser-tests.sh now works around; kept for future regression
# debugging. scripts/repro-browser-hang.sh runs this in a parallel grind
# under docker_isolate constraints.

cd "$(dirname "$0")/.."

DIAG_DIR=${DIAG_DIR:-/tmp/diag}
PROBE_LOG=$DIAG_DIR/probe.log
STACKS_LOG=$DIAG_DIR/stacks.log
VITEST_LOG=$DIAG_DIR/vitest.log
mkdir -p "$DIAG_DIR"
: > "$PROBE_LOG"
: > "$STACKS_LOG"
: > "$VITEST_LOG"

echo "probe-test-browser v2: starting at $(date +%T) (pid $$)" >&2

pids_of_interest() {
  pgrep -f 'chrom|node|vitest|esbuild|yarn' 2>/dev/null
}

# Force fixed CPU demand and measure what fraction we actually got.
# ratio = (user+sys)/real. ~1.0 = full CPU; <0.5 = heavily preempted by host.
# Disambiguates "processes not asking for CPU" (scenario A) from "host not
# scheduling us" (scenario B) when cgroup counters are flat.
microbench() {
  local TIMEFORMAT='microbench: real=%R user=%U sys=%S'
  { time awk 'BEGIN{x=0; for(i=0;i<100000;i++) x+=i*i; print x > "/dev/null"}' \
      </dev/null 2>/dev/null; } 2>&1
}

snapshot() {
  local tag="$1"
  echo "=== $(date +%T.%N | cut -c1-12) $tag ==="
  echo "-- /tmp --"
  df -h /tmp 2>/dev/null | tail -1
  du -sh /tmp 2>/dev/null
  echo "-- cgroup mem --"
  if [ -r /sys/fs/cgroup/memory.current ]; then
    awk '{printf "current %d bytes (%.0f MB)\n", $1, $1/1048576}' /sys/fs/cgroup/memory.current
    cat /sys/fs/cgroup/memory.events 2>/dev/null
  fi
  echo "-- cgroup cpu --"
  cat /sys/fs/cgroup/cpu.stat 2>/dev/null
  echo "-- top by rss --"
  ps -eo pid,ppid,stat,rss,pcpu,wchan:20,comm --sort=-rss --no-headers 2>/dev/null | head -15
  echo "-- procs of interest --"
  for pid in $(pids_of_interest); do
    [ -d "/proc/$pid" ] || continue
    local state threads rss_kb wchan cmd
    state=$(awk '{print $3}' "/proc/$pid/stat" 2>/dev/null)
    wchan=$(cat "/proc/$pid/wchan" 2>/dev/null)
    rss_kb=$(awk '/^VmRSS:/{print $2}' "/proc/$pid/status" 2>/dev/null)
    threads=$(awk '/^Threads:/{print $2}' "/proc/$pid/status" 2>/dev/null)
    cmd=$({ tr '\0' ' ' < "/proc/$pid/cmdline"; } 2>/dev/null | cut -c1-120)
    echo "pid=$pid state=$state threads=$threads rss=${rss_kb}kB wchan=$wchan cmd=$cmd"
  done
  # TCP state hex (col 4): 01=ESTABLISHED 04=FIN_WAIT1 05=FIN_WAIT2 06=TIME_WAIT
  # 07=CLOSE 08=CLOSE_WAIT 09=LAST_ACK 0A=LISTEN 0B=CLOSING
  echo "-- tcp4 sockets (state counts) --"
  awk 'NR>1{print $4}' /proc/net/tcp 2>/dev/null | sort | uniq -c
  echo "-- tcp6 sockets (state counts) --"
  awk 'NR>1{print $4}' /proc/net/tcp6 2>/dev/null | sort | uniq -c
  # Unix socket state (col 6): 01=SS_UNCONNECTED 02=SS_CONNECTING 03=SS_CONNECTED
  # 04=SS_DISCONNECTING — CDP/vitest IPC is typically over SS_CONNECTED unix sockets.
  echo "-- unix sockets (state counts) --"
  awk 'NR>1{print $6}' /proc/net/unix 2>/dev/null | sort | uniq -c
}

stacks_snapshot() {
  local tag="$1"
  echo "=== $(date +%T.%N | cut -c1-12) $tag ==="
  echo "-- lsof tcp/unix (-n -P, head -40) --"
  lsof -n -P -i TCP 2>/dev/null | head -25
  lsof -n -P -U 2>/dev/null | head -15
  for pid in $(pids_of_interest); do
    [ -d "/proc/$pid" ] || continue
    echo "--- pid=$pid ---"
    { tr '\0' ' ' < "/proc/$pid/cmdline"; } 2>/dev/null | cut -c1-140
    echo ""
    # /proc/$pid/syscall: syscall_nr arg0..arg5 sp pc (readable by process owner)
    echo "syscall:"
    cat "/proc/$pid/syscall" 2>/dev/null | head -1
    # /proc/$pid/stack: may require CAP_SYS_ADMIN; print only if non-empty
    local stack_content
    stack_content=$(cat "/proc/$pid/stack" 2>/dev/null | head -15)
    if [ -n "$stack_content" ]; then
      echo "kernel stack:"
      echo "$stack_content"
    fi
    # Per-thread wchan — tells us if threads are stuck on different things
    echo "thread wchans (sorted | uniq -c):"
    for t in /proc/$pid/task/*/wchan; do
      cat "$t" 2>/dev/null
      echo ""
    done | sort | uniq -c | sort -rn | head -10
    echo "socket/pipe fds:"
    for fd in /proc/$pid/fd/*; do
      local target
      target=$(readlink "$fd" 2>/dev/null) || continue
      case "$target" in
        socket:*|pipe:*|anon_inode:*) echo "  $(basename "$fd") -> $target" ;;
      esac
    done
  done
}

probe_loop() {
  local last_vitest_size=0
  local silent_since
  silent_since=$(date +%s)
  local burst_count=0
  local burst_cap=200  # ~20s dense sampling at 0.1s
  local steady_count=0 # increments once per steady iteration
  while true; do
    local now_s cur_size silent_for
    now_s=$(date +%s)
    cur_size=$(stat -c %s "$VITEST_LOG" 2>/dev/null || echo 0)
    if [ "$cur_size" != "$last_vitest_size" ]; then
      last_vitest_size=$cur_size
      silent_since=$now_s
      burst_count=0
    fi
    silent_for=$((now_s - silent_since))

    if [ "$silent_for" -ge 3 ] && [ "$burst_count" -lt "$burst_cap" ]; then
      {
        snapshot "burst(${silent_for}s silent)"
        # During a hang, microbench every burst snapshot — it's the core
        # signal for "is the host giving us CPU?" Overhead is irrelevant
        # since tests aren't running.
        microbench
      } >> "$PROBE_LOG" 2>&1
      # Stacks every 10th burst tick (~every 1s of burst) to keep log size sane
      if [ $((burst_count % 10)) -eq 0 ]; then
        stacks_snapshot "burst(${silent_for}s silent)" >> "$STACKS_LOG" 2>&1
      fi
      burst_count=$((burst_count + 1))
      sleep 0.1
    else
      {
        snapshot "steady"
        # Microbench every 5th steady iteration to bound CPU overhead
        # while tests are actually running (~0.4% vs tests' CPU budget).
        if [ $((steady_count % 5)) -eq 0 ]; then
          microbench
        fi
      } >> "$PROBE_LOG" 2>&1
      # Periodic in-flight stacks baseline — gives us a "healthy waiting"
      # reference to diff against burst-triggered dumps during a hang.
      if [ $((steady_count % 10)) -eq 0 ]; then
        stacks_snapshot "steady(tick=${steady_count})" >> "$STACKS_LOG" 2>&1
      fi
      steady_count=$((steady_count + 1))
      sleep 1
    fi
  done
}

probe_loop &
PROBE_PID=$!

cleanup() {
  local rc=$?
  kill "$PROBE_PID" 2>/dev/null
  wait "$PROBE_PID" 2>/dev/null
  # One last stacks snapshot at the moment of failure
  stacks_snapshot "cleanup" >> "$STACKS_LOG" 2>&1
  {
    echo ""
    echo "=== VITEST LOG (tail -300) ==="
    tail -300 "$VITEST_LOG"
    echo ""
    echo "=== FULL PROBE LOG ($(wc -l < "$PROBE_LOG") lines) ==="
    cat "$PROBE_LOG"
    echo ""
    echo "=== FULL STACKS LOG ($(wc -l < "$STACKS_LOG") lines, test exit=$rc) ==="
    cat "$STACKS_LOG"
    echo "=== END DIAGNOSTICS ==="
  } >&2
  exit "$rc"
}
trap cleanup EXIT

# Tee vitest output to a file so the probe loop can detect silence.
# pipefail ensures we surface timeout's exit code, not tee's.
set -o pipefail
timeout -v 90s yarn test:browser 2>&1 | tee "$VITEST_LOG"
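The probe's core trigger is its silence detector: "progress" simply means the vitest log grew since the last poll, and 3s without growth flips the sampler into burst mode. A standalone sketch of that mechanism, with a temp file standing in for the live vitest log:

```shell
#!/usr/bin/env bash
# Silence detector sketch: compare the log's current size against the last
# observed size; growth means the test process is still producing output.
log=$(mktemp)
last=0
check() {
  local cur
  cur=$(stat -c %s "$log" 2>/dev/null || echo 0)
  if [ "$cur" != "$last" ]; then
    last=$cur
    echo progress
  else
    echo silent
  fi
}

check                       # -> silent   (size 0, unchanged)
echo "test output" >> "$log"
check                       # -> progress (log grew)
check                       # -> silent   (no new output since)
rm -f "$log"
```

In the real script the `silent` branch accumulates into `silent_for`, and crossing the 3s threshold starts the 0.1s burst sampling.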
yarn-project/kv-store/scripts/repro-browser-hang.sh (new file)

Lines changed: 42 additions & 0 deletions

#!/usr/bin/env bash
# Reproduces the kv-store browser-test hang under CI3 ISOLATE constraints.
#
# Failure mode: CDP CLOSE_WAIT deadlock between vitest and chromium's network
# service at test-file transitions. Vitest closes 6 of 10 CDP TCP connections
# on file-boundary teardown; chromium's network-service event loop, starved of
# CPU under --cpus=2, never drains the closed sockets. Vitest's teardown hangs
# waiting for the close-handshake. Both processes end up with zero on-CPU
# threads. Outer `timeout -v 90s` in probe-test-browser.sh fires SIGTERM after
# 90s of silence.
#
# Crucially, this does NOT reproduce on unconstrained hardware — chromium has
# enough cores to drain sockets immediately. Must run inside docker_isolate
# (--cpus=2 --memory=8g --tmpfs /tmp:exec,size=1g) to surface the hang.
#
# Usage:
#   bash yarn-project/kv-store/scripts/repro-browser-hang.sh [JOBS]
#
# JOBS: parallel container count (default 8). Higher = faster repro,
# higher RAM/CPU ceiling on host. 8 reliably catches a hang in ~3min on
# a multi-core box.
#
# On hang: parallel halts (rc=124) and the failing job's full probe diagnostic
# (vitest tail + probe.log + stacks.log) is dumped to this script's stderr by
# dump_fail. Capture stderr to a file if you want to keep it.
#
# Example: capture both streams for later analysis
#   bash yarn-project/kv-store/scripts/repro-browser-hang.sh 8 > /tmp/repro.log 2>&1

set -uo pipefail

cd "$(git rev-parse --show-toplevel)"

JOBS=${1:-8}

echo "=== repro-browser-hang start: $(date -Is), jobs=$JOBS ==="
while true; do
  echo './ci3/dump_fail "CPUS=2 MEM=8g TMPFS_SIZE=1g ./ci3/docker_isolate \"cd yarn-project/kv-store && bash scripts/probe-test-browser.sh\"" >/dev/null'
done | parallel -j"$JOBS" --halt now,fail=1
rc=$?
echo "=== repro-browser-hang end (parallel rc=$rc): $(date -Is) ==="
exit $rc
yarn-project/kv-store/scripts/run-browser-tests.sh (new file)

Lines changed: 33 additions & 0 deletions

#!/usr/bin/env bash
# Local-dev entrypoint for `yarn test:browser`: runs every browser test file
# in its own vitest process, sequentially. Delegates per-file dispatch to
# scripts/run_test.sh (which CI also uses for per-file fan-out).
#
# Why not a single `vitest run` over all files: vitest+chromium have a CDP
# teardown deadlock at test-file transitions under CPU-constrained
# environments (CI3 ISOLATE: --cpus=2). Vitest closes a cohort of CDP TCP
# connections when switching files; chromium's network service can't drain
# them fast enough under contention; vitest's teardown blocks indefinitely
# on the close-handshake. By running each file in a separate vitest process
# the close-handshake only happens at process exit, avoiding the cross-file
# teardown path entirely. See scripts/repro-browser-hang.sh for the repro.
set -euo pipefail

cd "$(dirname "$0")/.."

files=$(find src/indexeddb src/sqlite-opfs -name '*.test.ts' 2>/dev/null | sort)

if [ -z "$files" ]; then
  echo "No test files found in src/indexeddb or src/sqlite-opfs"
  exit 0
fi

count=$(echo "$files" | wc -l)
echo "Running $count browser test files (one vitest process per file)"

i=0
for f in $files; do
  i=$((i + 1))
  echo "==> [$i/$count] $f"
  bash scripts/run_test.sh "$f"
done
yarn-project/kv-store/scripts/run_test.sh (new file)

Lines changed: 20 additions & 0 deletions

#!/usr/bin/env bash
# Runs a single kv-store test file. Dispatches to vitest+chromium for
# browser tests (under src/indexeddb or src/sqlite-opfs) and to mocha for
# everything else. Emitted by yarn-project/kv-store/bootstrap.sh test_cmds
# for CI per-file fan-out and runnable directly for local reproduction:
#
#   yarn-project/kv-store/scripts/run_test.sh src/lmdb-v2/store.test.ts
source $(git rev-parse --show-toplevel)/ci3/source

test=${1:?"Usage: $0 <test-file relative to kv-store/>"}
cd ..

case "$test" in
  src/indexeddb/*|src/sqlite-opfs/*)
    exec yarn vitest run --config ./vitest.config.ts "$test"
    ;;
  *)
    NODE_NO_WARNINGS=1 exec yarn mocha --config ./.mocharc.json "$test"
    ;;
esac
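The dispatch above is a pure path-prefix rule, which makes it easy to sanity-check in isolation. A toy version that echoes the chosen runner instead of exec'ing yarn:

```shell
#!/usr/bin/env bash
# Toy version of run_test.sh's dispatch: the path prefix picks the runner.
runner_for() {
  case "$1" in
    src/indexeddb/*|src/sqlite-opfs/*) echo vitest ;;  # browser tests
    *) echo mocha ;;                                   # node tests
  esac
}

runner_for src/indexeddb/store.test.ts    # -> vitest
runner_for src/sqlite-opfs/store.test.ts  # -> vitest
runner_for src/lmdb-v2/store.test.ts      # -> mocha
```

Because the glob only matches relative paths, callers (CI fan-out and run-browser-tests.sh) always pass paths of the `src/...` form.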
