Commit e0e62db
fix(bench): drop stress-ng benchmarks
The local 'CodSpeed Benchmarks' job kept hitting its job timeout on the
stress-ng rows. stress-ng is the only forking workload in the suite and
deadlocks under 'callgrind --trace-children=yes' on a lost pause() wakeup.
Root cause is an application-level race in stress-ng, not our code and not
a Valgrind signal-delivery bug. Its termination wait

    while (stress_continue(args))
        (void)shim_pause();

can miss the SIGALRM that clears the continue flag if the signal lands
between the flag check and pause(), so pause() blocks on a signal that
already arrived. stress-ng's alarm(1) re-alarm mitigation fails here because
alarm() is per-process (a forked worker that already lost its SIGALRM has no
self-armed alarm) and Valgrind serializes all guest threads onto one
scheduler lock, widening the check->pause() window from nanoseconds to
milliseconds. It reproduces on stock upstream Valgrind 3.26.0/3.25.1 too; the
only thing special about 'local' is that it is slower and runs extra configs,
so it trips the race far more often.
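The lost wakeup described above is the classic race that POSIX sigsuspend() exists to close. Below is a minimal sketch of the race-free form of a "loop until a signal clears a flag" wait. It is not stress-ng's code and not a proposed upstream patch. The names keep_running, on_alarm and wait_for_alarm are hypothetical. SIGALRM stays blocked while the flag is checked, and sigsuspend() atomically restores the old mask and sleeps. As a result, the signal can only be delivered while the process is actually waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical flag, cleared from the SIGALRM handler; sig_atomic_t keeps
 * the store async-signal-safe. */
static volatile sig_atomic_t keep_running = 1;

static void on_alarm(int sig)
{
    (void)sig;
    keep_running = 0;
}

/*
 * Hypothetical race-free replacement for: while (keep_running) pause();
 * SIGALRM is blocked across the flag check, so it cannot land between the
 * check and the sleep. sigsuspend() swaps in the caller's original mask and
 * sleeps in one atomic step. Assumes the caller does not itself block
 * SIGALRM. Returns 0 on success, -1 if signal setup fails.
 */
static int wait_for_alarm(unsigned int seconds)
{
    struct sigaction sa;
    sigset_t block, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    /* Block SIGALRM before arming the alarm and checking the flag. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    keep_running = 1;
    alarm(seconds);

    while (keep_running)
        sigsuspend(&old); /* returns -1 with EINTR once the handler has run */

    /* Restore the caller's signal mask. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return 0;
}
```

A call such as wait_for_alarm(1) returns after about one second, however the SIGALRM is timed against the flag check. Note that alarm() timers are per process and are not inherited across fork(). Each forked worker using this pattern would therefore need to arm its own alarm.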
In practice only the full-* configs (--trace-children + cache-sim, the
slowest) hang; take_strings/echo/python3 are unaffected. stress-ng adds
little value as a callgrind throughput benchmark, so remove it rather than
work around an upstream bug. The 'timeout --kill-after=10s 120s' wrapper from
the previous commit stays as a backstop. (--fair-sched=yes was tried and
regressed the hang onto take_strings full-with-inline, so it was reverted.)

1 parent 1a12a56
1 file changed
Lines changed: 0 additions & 2 deletions
Diff: original lines 25-26 deleted; surrounding context (original lines 22-24 and 27-29, new lines 22-24 and 25-27) unchanged. Line contents were not captured.