Commit 54cbf4a

ci(audience): replace fixed timeout with log-driven watchdog (SDK-317)
- Drops the `timeout 1320` wrapper. Replaces it with a watchdog loop inside the docker bash that polls artifacts/playmode.log every 5 s and signals Unity 30 s after "Test run completed" first appears.
- 40 min hard cap as a fallback for the case where Unity never logs "Test run completed" (player hang, etc.); 15 s SIGKILL grace after SIGTERM if the editor refuses to exit.
- Bumps cell timeout-minutes from 30 to 45 to cover the inner 40 min cap plus post-Unity steps (license return, Player.log copy, artifact upload, dorny/test-reporter).
- Why: a fixed timeout that fits Unity 2021.3 (~5-7 min cells) cuts Unity 6 off mid-run; a fixed timeout sized for Unity 6 makes 2021.3 cells wait up to 30+ min on a shutdown hang they would not have hit. The previous 22-min cap killed Unity 6 cells before tests could finish writing playmode-results.xml. The watchdog adapts to whatever the actual test runtime is, then catches the Unity 6 Linux shutdown hang ("Application is shutting down..." that never completes) without waiting on it.
- Also captures `tail -F` of the log to job stdout while Unity is alive, so the live build / test progress streams to the GitHub Actions log as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 9106a74 commit 54cbf4a

1 file changed

Lines changed: 81 additions & 38 deletions

.github/workflows/test-audience-sample-app.yml

@@ -412,11 +412,13 @@ jobs:
  || github.event_name == 'workflow_dispatch'
  name: ${{ matrix.target }} / ${{ matrix.backend }} / Unity ${{ matrix.unity }}
  runs-on: ubuntu-latest-8-cores
- # Tests now actually execute under xvfb instead of returning instantly
- # as inconclusive, so cells take ~5-10 min. The 30 min cap leaves
- # headroom for cold caches and the first image pull without leaving
- # a stuck job sitting on the runner for the default 6 hours.
- timeout-minutes: 30
+ # Cells settle to ~5-7 min for Unity 2021.3 (both backends) and
+ # ~22-25 min for Unity 6000.4 (Mono). Unity 6 IL2CPP on Mesa
+ # software OpenGL is the slow path. The 45 min cap covers the inner
+ # 40 min watchdog cap plus post-Unity steps (license return,
+ # Player.log copy, artifact upload, dorny/test-reporter) without
+ # leaving a stuck job sitting on the runner for the default 6 hours.
+ timeout-minutes: 45
  strategy:
  fail-fast: false
  matrix:
@@ -507,40 +509,81 @@ jobs:
  # the player. GLX + render are required for UIElements; the
  # image already bundles mesa-llvmpipe for software OpenGL, so
  # no GPU is needed. -noreset keeps the X server up across
- # Unity client reconnects (the editor opens / closes / reopens
- # connections during scene load).
+ # Unity client reconnects.
  #
- # `timeout` wraps the whole xvfb-run + unity-editor pipeline so
- # the editor cannot stall the cell past 22 minutes. Unity 6 on
- # Linux has a known shutdown hang: tests complete, the runner
- # writes playmode-results.xml, then the editor begins
- # `Application is shutting down...` and never fully exits
- # (likely a leftover thread or a player process xvfb-run is
- # still tied to). Without the wrapper, the cell sits idle until
- # the GitHub 30-min cap and the upload step never gets to run.
- # 22 min covers Unity 6 IL2CPP build (~5 min) + 39-test suite
- # execution (~10-15 min) with a 2-min buffer. SIGTERM first via
- # --signal=TERM so Unity can flush; --kill-after=10 forces
- # SIGKILL 10 s later if it is still alive. The 8 min slack to
- # the cell timeout lets the post-steps (license return, Player
- # log capture, artifact upload) all run.
- set +e
- timeout --signal=TERM --kill-after=10 1320 \
- xvfb-run -a --server-args="-screen 0 1280x720x24 -ac +extension GLX +render -noreset" -- \
- unity-editor \
- -batchmode \
- -projectPath /github/workspace/examples/audience \
- -runTests \
- -testPlatform StandaloneLinux64 \
- -testResults /github/workspace/artifacts/playmode-results.xml \
- -logFile - 2>&1 | tee /github/workspace/artifacts/playmode.log
- test_rc=${PIPESTATUS[0]}
- set -uo pipefail
- # exit 124 = timeout fired, 137 = SIGKILL via --kill-after.
- # Either way, log it explicitly so the artifact reader can tell
- # a real test failure from a shutdown-hang kill.
- if [ "$test_rc" = "124" ] || [ "$test_rc" = "137" ]; then
- echo "::warning::Unity exceeded the 22-min budget and was killed (rc=$test_rc). Tests likely completed; the editor failed to shut down."
+ # Why the watchdog instead of a simple `timeout` wrapper:
+ # - Unity 6 on Linux has a known shutdown hang. After
+ # "Test run completed", the editor begins
+ # `Application is shutting down...` and never fully exits.
+ # Without intervention the cell sits idle until the cell
+ # timeout fires before the post-Unity steps can run.
+ # - Tests on Unity 6 + Mesa software OpenGL take ~22 min for
+ # Mono and longer for IL2CPP, vs ~2 min on Unity 2021.3.
+ # A fixed timeout that fits 2021.3 cuts Unity 6 off mid-run;
+ # a fixed timeout sized for Unity 6 makes 2021.3 cells wait
+ # up to 30+ min on a shutdown hang they would not have hit.
+ # - The watchdog adapts: it scans the log, sees
+ # "Test run completed" the moment Unity finishes the suite,
+ # gives the editor 30 s to flush playmode-results.xml, then
+ # sends SIGTERM. SIGKILL follows 15 s later if the editor
+ # refuses to exit. Cells finish as soon as their tests do,
+ # regardless of how slow the underlying Unity version is or
+ # what shutdown bug it happens to hit.
+ # - 40 min hard cap as a fallback for the case where
+ # "Test run completed" never appears (player hang, etc.).
+ log=/github/workspace/artifacts/playmode.log
+
+ xvfb-run -a --server-args="-screen 0 1280x720x24 -ac +extension GLX +render -noreset" -- \
+ unity-editor \
+ -batchmode \
+ -projectPath /github/workspace/examples/audience \
+ -runTests \
+ -testPlatform StandaloneLinux64 \
+ -testResults /github/workspace/artifacts/playmode-results.xml \
+ -logFile "$log" &
+ unity_pid=$!
+
+ # Stream the log to job stdout for live visibility while the
+ # editor is alive. tail --pid exits when unity_pid does.
+ tail --pid=$unity_pid -F "$log" 2>/dev/null &
+
+ deadline=$((SECONDS + 2400)) # 40 min hard cap
+ flush_deadline=0
+ kill_reason=""
+ while kill -0 $unity_pid 2>/dev/null; do
+ if [ "$SECONDS" -ge "$deadline" ]; then
+ kill_reason="hard-cap-40m"
+ break
+ fi
+ if [ "$flush_deadline" -eq 0 ] && grep -q "Test run completed" "$log" 2>/dev/null; then
+ flush_deadline=$((SECONDS + 30))
+ echo "[watchdog] saw \"Test run completed\" at ${SECONDS}s; SIGTERM after 30s flush window"
+ fi
+ if [ "$flush_deadline" -gt 0 ] && [ "$SECONDS" -ge "$flush_deadline" ]; then
+ kill_reason="flush-window-elapsed"
+ break
+ fi
+ sleep 5
+ done
+
+ if [ -n "$kill_reason" ]; then
+ echo "[watchdog] sending SIGTERM to Unity (reason: $kill_reason)"
+ kill -TERM $unity_pid 2>/dev/null || true
+ # 15 s grace, then SIGKILL if still alive.
+ for _ in 1 2 3; do
+ kill -0 $unity_pid 2>/dev/null || break
+ sleep 5
+ done
+ if kill -0 $unity_pid 2>/dev/null; then
+ echo "[watchdog] SIGTERM not honored, sending SIGKILL"
+ kill -KILL $unity_pid 2>/dev/null || true
+ fi
+ fi
+
+ wait $unity_pid 2>/dev/null
+ test_rc=$?
+ if [ "$kill_reason" = "hard-cap-40m" ]; then
+ echo "::warning::Unity hit the 40 min hard cap without logging \"Test run completed\". The player may have hung mid-suite. Inspect Player.log to see how far it got."
  fi

  # Capture the standalone test player log. PlayMode tests on
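
The watchdog flow in the diff above can be tried outside CI with a stand-in process. The following is a minimal sketch, not the workflow's code: a background subshell plays the part of the editor, all durations are shortened, and the names `fake_editor_pid`, `marker`, `poll`, `flush_window`, `hard_cap`, and `now` are illustrative assumptions. It uses `date +%s` in place of bash's `SECONDS` so that it also runs under plain `sh`.

```shell
# Sketch only: a stand-in process replaces Unity, and every duration is shortened.
log=$(mktemp)
marker="Test run completed"
poll=1          # the workflow polls every 5 s
flush_window=2  # the workflow allows 30 s
hard_cap=20     # the workflow allows 2400 s (40 min)

# Stand-in editor: logs the marker after 1 s, then simulates the shutdown hang.
# `exec` makes the background PID the sleeping process itself, so signals reach it.
( echo "running tests" >> "$log"; sleep 1; echo "$marker" >> "$log"; exec sleep 300 ) &
fake_editor_pid=$!

now() { date +%s; }
deadline=$(( $(now) + hard_cap ))
flush_deadline=0
kill_reason=""

while kill -0 "$fake_editor_pid" 2>/dev/null; do
  if [ "$(now)" -ge "$deadline" ]; then
    kill_reason="hard-cap"
    break
  fi
  # Arm the flush window the first time the marker shows up in the log.
  if [ "$flush_deadline" -eq 0 ] && grep -q "$marker" "$log" 2>/dev/null; then
    flush_deadline=$(( $(now) + flush_window ))
  fi
  if [ "$flush_deadline" -gt 0 ] && [ "$(now)" -ge "$flush_deadline" ]; then
    kill_reason="flush-window-elapsed"
    break
  fi
  sleep "$poll"
done

if [ -n "$kill_reason" ]; then
  kill -TERM "$fake_editor_pid" 2>/dev/null || true
  # Up to 3 s of grace, then SIGKILL if the process is still alive.
  for i in 1 2 3; do
    kill -0 "$fake_editor_pid" 2>/dev/null || break
    sleep 1
  done
  if kill -0 "$fake_editor_pid" 2>/dev/null; then
    kill -KILL "$fake_editor_pid" 2>/dev/null || true
  fi
fi

# Collect the exit status without tripping `set -e` if the caller enabled it.
rc=0
wait "$fake_editor_pid" 2>/dev/null || rc=$?
rm -f "$log"
echo "reason=$kill_reason rc=$rc"
```

With the stand-in as written, the loop should leave through the flush-window branch and `wait` should report 143 (128 + SIGTERM), which is how a watchdog kill can be told apart from a process that exited on its own.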
