
Commit c25f5f5

DEBUG #4054 clang CI: capture stacks on homing timeout + abort
Three debug-only additions to diagnose the rip-and-test-clang failure
(homing timeout after 60s + Fatal glibc pthread mutex assertion on
shutdown) which my local docker cannot reproduce:

- launch.sh: PYTHONFAULTHANDLER=1, ulimit -c unlimited,
  LIBC_FATAL_STDERR_=1, MALLOC_CHECK_=3 so SIGABRT/SIGSEGV in any
  Python child prints a Python+native stack to stderr (visible via
  linuxcnc.err).
- qtvcp.py: faulthandler.enable() + register on SIGUSR1 so the smoke
  driver can dump qtvcp's interpreter stack without killing it.
- drive.py: on homing timeout, dump per-joint state, halui machine pin,
  locate qtvcp processes and send SIGUSR1; sleep briefly so the stack
  dump lands in the log before we tear down.

Will be reverted once the clang-only failure mode is understood.
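The launch.sh bullet relies on `PYTHONFAULTHANDLER=1`. The Python sketch below is not part of this commit. It assumes Linux and uses only the standard library, and the helper names `run_child` and `_no_core_files` are made up for illustration.

It runs two throwaway child interpreters with that variable set:

- A SIGABRT prints a faulthandler traceback before the child dies.
- A SIGUSR1 kills the child with no traceback, because the environment variable only installs handlers for the fatal-signal set.

```python
import os
import resource
import signal
import subprocess
import sys


def _no_core_files() -> None:
    # Runs in the child between fork and exec: set RLIMIT_CORE to 0 so this
    # demo leaves no core files behind (launch.sh does the opposite on purpose).
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))


def run_child(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter with PYTHONFAULTHANDLER=1 set."""
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        timeout=30,
        preexec_fn=_no_core_files,
    )


# SIGABRT is in faulthandler's default fatal set: the traceback is written to
# stderr, then the signal is re-raised, so the child still dies of SIGABRT.
aborted = run_child("import os; os.abort()")

# SIGUSR1 is not in that set: its default action terminates the child
# without faulthandler writing anything.
usr1 = run_child("import os, signal; os.kill(os.getpid(), signal.SIGUSR1)")

print("abort:", aborted.returncode, "traceback:", "Fatal Python error" in aborted.stderr)
print("usr1: ", usr1.returncode, "traceback:", "most recent call first" in usr1.stderr)
```

This split explains why qtvcp.py still calls `faulthandler.register()` explicitly for SIGUSR1.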
1 parent d1d28b8 commit c25f5f5

3 files changed

Lines changed: 81 additions & 0 deletions


src/emc/usr_intf/qtvcp/qtvcp.py

Lines changed: 11 additions & 0 deletions
@@ -8,6 +8,17 @@
 import signal
 import subprocess

+# DEBUG #4054 clang CI: register Python faulthandler on SIGUSR1 so the
+# ui-smoke driver can dump the qtvcp interpreter stack on home timeout
+# without killing the process. Default fatal signal set (SIGSEGV/ABRT/
+# BUS/FPE/ILL) is already enabled via PYTHONFAULTHANDLER=1 in launch.sh.
+try:
+    import faulthandler
+    faulthandler.enable()
+    faulthandler.register(signal.SIGUSR1, chain=False)
+except Exception:
+    pass
+

 if '--force_pyqt=6' in sys.argv:
     os.environ["QT_API"] = "pyqt6"
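The qtvcp.py hunk depends on `faulthandler.register()` installing a non-fatal handler. Below is a minimal standalone sketch. It assumes a POSIX platform, and the names `busy_work` and `dump_on_sigusr1` are hypothetical.

It does three things:

- delivers SIGUSR1 to the current process with `signal.raise_signal()`;
- captures the dump in a temporary file instead of stderr;
- shows that execution continues afterwards.

```python
import faulthandler
import signal
import tempfile


def busy_work(depth: int) -> None:
    """Recurse a few levels so the dump shows recognisable frames."""
    if depth == 0:
        # raise_signal() delivers SIGUSR1 to this thread synchronously, so the
        # faulthandler handler has written its dump before the call returns.
        signal.raise_signal(signal.SIGUSR1)
        return
    busy_work(depth - 1)


def dump_on_sigusr1() -> str:
    """Register a SIGUSR1 dump handler, trigger it once, return the dump text."""
    with tempfile.TemporaryFile(mode="w+") as log:
        # Same signal and chain setting as the qtvcp.py hunk, plus file=log so
        # the output can be read back; without file= it goes to sys.stderr.
        faulthandler.register(signal.SIGUSR1, file=log, all_threads=True, chain=False)
        try:
            busy_work(3)
        finally:
            faulthandler.unregister(signal.SIGUSR1)
        log.seek(0)
        return log.read()


dump = dump_on_sigusr1()
print("process still running; dump follows:")
print(dump)
```

The handler is registered with `chain=False`, and nothing else handles SIGUSR1, so the process keeps running after the dump. In qtvcp itself the traceback would go to stderr, because `register()` is called there without `file=`.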

tests/ui-smoke/_lib/drive.py

Lines changed: 36 additions & 0 deletions
@@ -13,6 +13,7 @@

 import argparse
 import linuxcnc
+import os
 import sys
 import time

@@ -130,6 +131,41 @@ def home_all(cmd, stat, timeout):
                      f"{timeout}s; homed={list(stat.homed[:njoints])} "
                      f"task_state={stat.task_state} task_mode={stat.task_mode} "
                      f"exec_state={stat.exec_state} njoints={njoints}\n")
+    # DEBUG #4054 clang CI: dump per-joint state + halcmd snapshot + the
+    # tail of ~/qtdragon.log to stderr before returning.
+    try:
+        for i in range(njoints):
+            j = stat.joint[i]
+            sys.stderr.write(
+                f"DEBUG joint[{i}]: homed={j['homed']} homing={j['homing']} "
+                f"enabled={j['enabled']} inpos={j['inpos']} fault={j['fault']} "
+                f"min_hard_limit={j['min_hard_limit']} max_hard_limit={j['max_hard_limit']} "
+                f"min_soft_limit={j['min_soft_limit']} max_soft_limit={j['max_soft_limit']}\n")
+        sys.stderr.write(
+            f"DEBUG axis_mask={stat.axis_mask} kinematics_type={stat.kinematics_type} "
+            f"motion_mode={stat.motion_mode} interp_state={stat.interp_state} "
+            f"estop={stat.estop} enabled={stat.enabled} homed_all={stat.homed}\n")
+    except Exception as e:
+        sys.stderr.write(f"DEBUG joint dump failed: {e}\n")
+    import subprocess
+    for args in (["halcmd", "show", "pin", "halui.machine"],
+                 ["halcmd", "show", "pin", "joint.0"],
+                 ["halcmd", "show", "param", "joint.0"],
+                 ["halcmd", "show", "sig"]):
+        try:
+            out = subprocess.check_output(
+                args, stderr=subprocess.STDOUT, timeout=5).decode()
+            sys.stderr.write(f"DEBUG {' '.join(args)}:\n{out}\n")
+        except Exception as e:
+            sys.stderr.write(f"DEBUG {' '.join(args)} failed: {e}\n")
+    qlog = os.path.expanduser("~/qtdragon.log")
+    if os.path.exists(qlog):
+        try:
+            with open(qlog) as f:
+                tail = f.readlines()[-50:]
+            sys.stderr.write(f"DEBUG qtdragon.log tail:\n{''.join(tail)}\n")
+        except Exception as e:
+            sys.stderr.write(f"DEBUG read {qlog} failed: {e}\n")
     return False


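The commit message describes drive.py locating qtvcp processes and sending them SIGUSR1 after the timeout. One way to do that on Linux is to scan `/proc/<pid>/cmdline`. `find_pids` and `signal_matching` below are hypothetical helpers sketched for illustration; they are not LinuxCNC APIs.

```python
import os
import signal
from typing import List


def find_pids(name: str) -> List[int]:
    """Return PIDs whose argv contains an element with basename == name.

    Reads /proc/<pid>/cmdline (NUL-separated argv). The calling process is
    skipped, and processes that vanish or are unreadable are ignored.
    """
    me = os.getpid()
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == me:
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited, or its cmdline is not readable
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if any(os.path.basename(a) == name for a in argv):
            pids.append(int(entry))
    return pids


def signal_matching(name: str, sig: int = signal.SIGUSR1) -> List[int]:
    """Send `sig` to every process found by find_pids(); return those signalled."""
    sent = []
    for pid in find_pids(name):
        try:
            os.kill(pid, sig)
            sent.append(pid)
        except OSError:
            pass  # exited in the meantime, or owned by another user
    return sent
```

In drive.py this might look like `signal_matching("qtvcp")` followed by a short `time.sleep()`, so the dump reaches the log before teardown. Two caveats apply under the launch.sh gdb wrapper:

- gdb's own argv also contains the qtvcp path, so a real driver would probably want to skip that process.
- gdb stops the inferior on SIGUSR1 by default. In `-batch` mode that would likely move on to the backtrace commands and end the session, unless the wrapper adds `handle SIGUSR1 nostop noprint pass`.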
tests/ui-smoke/_lib/launch.sh

Lines changed: 34 additions & 0 deletions
@@ -56,6 +56,40 @@ export GST_PLUGIN_FEATURE_RANK="pulsesink:NONE,alsasink:NONE,osssink:NONE,oss4si
 export PULSE_SERVER=/dev/null
 export SDL_AUDIODRIVER=dummy

+# DEBUG #4054 clang CI: enable Python faulthandler so SIGSEGV/SIGFPE/SIGABRT/
+# SIGBUS/SIGILL in Python children of this script print a traceback to stderr.
+# PYTHONFAULTHANDLER does not cover SIGUSR1; qtvcp.py registers that handler
+# itself via faulthandler.register() so drive.py can signal it on timeout.
+export PYTHONFAULTHANDLER=1
+# DEBUG #4054 clang CI: enable cores + glibc abort verbosity.
+ulimit -c unlimited
+export LIBC_FATAL_STDERR_=1
+
+# DEBUG #4054 clang CI: wrap qtvcp under gdb --batch so a SIGSEGV inside
+# the Qt event loop (which faulthandler now confirms is happening) gets a
+# C-level backtrace. Only takes effect if gdb is available; gdb is on the
+# GitHub Actions ubuntu-24.04 runner image by default. The wrapper is a
+# temp dir prepended to PATH that shadows qtvcp; the inner gdb invocation
+# uses the absolute path captured BEFORE shadowing so it does not recurse.
+REAL_QTVCP_PATH="$(command -v qtvcp || true)"
+if [ -n "$REAL_QTVCP_PATH" ] && command -v gdb >/dev/null 2>&1; then
+    GDB_WRAP_DIR="$(mktemp -d -t qtvcp-gdb.XXXXXX)"
+    cat >"$GDB_WRAP_DIR/qtvcp" <<WRAP
+#!/bin/bash
+exec gdb -batch -nx \\
+    -ex 'set pagination off' \\
+    -ex 'handle SIG33 nostop noprint pass' \\
+    -ex 'handle SIGCHLD nostop noprint pass' \\
+    -ex 'handle SIGPIPE nostop noprint pass' \\
+    -ex run \\
+    -ex 'echo \n=== signal caught, dumping all-thread backtrace ===\n' \\
+    -ex 'thread apply all bt' \\
+    --args /usr/bin/python3 "$REAL_QTVCP_PATH" "\$@"
+WRAP
+    chmod +x "$GDB_WRAP_DIR/qtvcp"
+    export PATH="$GDB_WRAP_DIR:$PATH"
+fi
+
 # Export the per-invocation values so the inner bash -c receives them
 # as proper env vars (avoids embedding paths into the inner script
 # via quoting, which breaks on apostrophes / spaces).
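The gdb wrapper works by capturing the real `qtvcp` path before prepending a directory that shadows it on PATH. The same pattern can be sketched in Python with a throwaway command. In this sketch:

- `make_wrapper`, `demo` and `demo-tool` are hypothetical names;
- `/bin/sh` is assumed to exist, and the temp directory must allow execution;
- cleanup of the temporary directories is omitted for brevity.

```python
import os
import shlex
import shutil
import subprocess
import tempfile


def make_wrapper(name: str, real_path: str, banner: str) -> str:
    """Write an executable `name` into a new temp dir and return that dir.

    The wrapper prints `banner`, then execs `real_path` by absolute path, so
    it never looks `name` up on PATH again (which would find itself).
    """
    wrap_dir = tempfile.mkdtemp(prefix=f"{name}-wrap.")
    script = os.path.join(wrap_dir, name)
    with open(script, "w") as f:
        f.write("#!/bin/sh\n")
        f.write(f"echo {shlex.quote(banner)}\n")
        f.write(f'exec {shlex.quote(real_path)} "$@"\n')
    os.chmod(script, 0o755)
    return wrap_dir


def demo() -> str:
    """Create a stand-in 'real' tool, shadow it on PATH, and run it by name."""
    real_dir = tempfile.mkdtemp(prefix="real-tool.")
    real_tool = os.path.join(real_dir, "demo-tool")
    with open(real_tool, "w") as f:
        f.write('#!/bin/sh\necho "real $1"\n')
    os.chmod(real_tool, 0o755)
    base_path = real_dir + os.pathsep + os.environ.get("PATH", "")

    # Resolve the real binary BEFORE shadowing it, like REAL_QTVCP_PATH.
    real_path = shutil.which("demo-tool", path=base_path)
    if real_path is None:
        raise RuntimeError("demo-tool not found on PATH")
    wrap_dir = make_wrapper("demo-tool", real_path, "wrapped")

    # Prepend the wrapper dir, as launch.sh does with GDB_WRAP_DIR.
    env = dict(os.environ, PATH=wrap_dir + os.pathsep + base_path)
    result = subprocess.run(["/bin/sh", "-c", "demo-tool x"], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout


print(demo(), end="")
```

Suppose the wrapper instead ran `exec demo-tool` by name. The PATH lookup would find the wrapper itself again and loop, which is why launch.sh resolves `REAL_QTVCP_PATH` before exporting the new PATH.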
