Summary
In mutmut/__main__.py::timeout_checker.inner_timeout_checker (lines 1155–1172 on v3.5.0, similar shape on main after #411 and #493), the est lookup uses the outer loop's mutant_name rather than the mutant associated with the PID being checked. When any mutant has estimated_time_of_tests = 0 (typical for "no tests" / status 33 mutants), the watchdog computes a boundary of (0 + timeout_constant) * timeout_multiplier = 15s (defaults) for any currently-running PID on the watchdog ticks that iterate to those zero-est outer mutants — and kills them with SIGXCPU even though their actual mutant's estimate is much higher.
This produces uniform 15s wall-clock timeouts for any mutant whose test selection takes longer than 15s wall, regardless of whether the mutation is a real infinite loop. The mutants are misclassified as timeout instead of survived / killed.
Affected code
    # mutmut/__main__.py (line numbers from 3.5.0; same shape on main)
    def timeout_checker(mutants):
        def inner_timeout_checker():
            while True:
                sleep(1)
                now = datetime.now()
                for m, mutant_name, result in mutants:  # <-- outer loop captures mutant_name
                    with START_TIMES_BY_PID_LOCK:
                        start_times_by_pid = dict(m.start_time_by_pid)
                    for pid, start_time in start_times_by_pid.items():
                        run_time = now - start_time
                        # wrong mutant: mutant_name below is the outer-loop variable, not the mutant running as pid
                        if run_time.total_seconds() > (m.estimated_time_of_tests_by_mutant[mutant_name] + 1) * 15:
                            try:
                                os.kill(pid, signal.SIGXCPU)
                            except ProcessLookupError:
                                pass
        return inner_timeout_checker
The inner loop is over start_times_by_pid.items() — i.e. currently-running PIDs. The PID-to-mutant mapping is held in m.key_by_pid[pid] (see SourceFileMutationData.register_pid, register_result). But the est lookup uses the outer-loop variable mutant_name, which iterates over every mutant in the run — including the est=0 ones.
Effect: any time the outer loop's iteration lands on an est=0 mutant (every watchdog tick where mutants is non-trivial), every currently-running PID older than 15s gets SIGXCPU'd.
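The effect can be reproduced outside mutmut with plain dictionaries. The sketch below is a simplified model, not mutmut's code: the mutant names, the PID, and the estimates are made up for illustration, and `boundary_seconds` is a hypothetical helper that mirrors the `(est + 1) * 15` expression from the excerpt above.

```python
# Simplified model of the watchdog check; all names and values are illustrative.
TIMEOUT_CONSTANT = 1      # the "+ 1" in the excerpt
TIMEOUT_MULTIPLIER = 15   # the "* 15" in the excerpt


def boundary_seconds(est):
    """Kill threshold in seconds for a mutant with the given test-time estimate."""
    return (est + TIMEOUT_CONSTANT) * TIMEOUT_MULTIPLIER


# Hypothetical estimates: one mutant without tests (est 0) and one slow mutant.
est_by_mutant = {"no_tests_mutant": 0.0, "slow_mutant": 20.0}
key_by_pid = {4242: "slow_mutant"}   # PID 4242 is running slow_mutant
elapsed_by_pid = {4242: 16.0}        # it has been running for 16 seconds


def buggy_kills():
    """Threshold taken from the outer-loop mutant name, as in the excerpt."""
    killed = set()
    for outer_name in est_by_mutant:
        for pid, elapsed in elapsed_by_pid.items():
            if elapsed > boundary_seconds(est_by_mutant[outer_name]):
                killed.add(pid)
    return killed


def per_pid_kills():
    """Threshold taken from the mutant that the PID is actually running."""
    killed = set()
    for pid, elapsed in elapsed_by_pid.items():
        if elapsed > boundary_seconds(est_by_mutant[key_by_pid[pid]]):
            killed.add(pid)
    return killed


print(buggy_kills())    # {4242}: 16s exceeds the est=0 boundary of 15s
print(per_pid_kills())  # set(): 16s is far below the slow mutant's 315s boundary
```

In this model a single est=0 entry is enough to get the slow mutant's PID killed, which matches the behaviour described above.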
Repro signature
In our project (single-file plugin, 4037 mutants):
| Status | Count | Wall-clock duration |
|---|---|---|
| killed (exit 1) | 2173 | median 60ms |
| survived (exit 0) | 714 | median 9.6s |
| timeout (exit -24, SIGXCPU) | 1005 | uniform 15.2s ± 0.5s |
| no-tests (exit 33) | 145 | n/a |
The uniform 15.2s wall-clock is the smoking gun: not a function of per-mutant estimate, but a constant. The 145 no-tests mutants are sufficient to poison the watchdog.
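One way to check another run for the same signature is to compare the timeout durations with the zero-estimate boundary. The sketch below assumes the per-mutant wall-clock durations have already been collected by some means. `matches_zero_est_boundary` is a hypothetical helper, the sample durations are invented, and the 1-second tolerance is an arbitrary choice loosely based on the watchdog's 1-second polling interval.

```python
def matches_zero_est_boundary(timeout_durations, constant=1, multiplier=15, tolerance=1.0):
    """Return True if every timeout fired within `tolerance` seconds of the est=0 boundary."""
    if not timeout_durations:
        return False
    boundary = (0 + constant) * multiplier  # 15s with the defaults
    return all(abs(d - boundary) <= tolerance for d in timeout_durations)


# Invented sample values, shaped like the "uniform 15.2s" row in the table.
suspicious = [15.1, 15.3, 15.2, 15.6, 14.9]
# Invented values for timeouts that scale with each mutant's own estimate.
scaled = [31.0, 92.4, 47.8]

print(matches_zero_est_boundary(suspicious))  # True
print(matches_zero_est_boundary(scaled))      # False
```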
Decisive evidence: re-running a "timeout"-classified mutant in isolation:
    $ mutmut run --max-children=1 plugins.portfolio_risk.plugin.x__parse_factor_defs__mutmut_1
    ...
    🙁 plugins.portfolio_risk.plugin.x__parse_factor_defs__mutmut_1
It survives. Same mutant. Same tests. Same machine. The difference: in isolation, the outer loop has only the one mutant — no est=0 sibling to poison the watchdog. In the full run, the mutant's PID sat there for 15s while the watchdog's outer loop visited an est=0 mutant and SIGXCPU'd it.
The mutation in question is > → >= (a boundary check), which cannot infinite-loop.
Minimal reproducer
A project with:
- At least one mutant that triggers status 33 ("no tests"), i.e. a function whose mutants have no covering tests in tests_by_mangled_function_name. This gives an est=0 entry.
- At least one mutant whose mapped test selection takes >15s wall-clock to run.
Run mutmut run --max-children=1. The slow mutant will be classified timeout even though it has no infinite loop. Re-run it alone via mutmut run <one_mutant_name> and it will classify correctly.
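A layout along these lines should satisfy both conditions. It is an unverified sketch: the function names and the 16-second delay are placeholders, the two halves would normally live in separate source and test files, and whether mutmut assigns status 33 to the untested function depends on how the project's tests are collected.

```python
# Sketch of a reproducer layout; names and the delay are placeholders.
import time

SLOW_TEST_SECONDS = 16  # anything above the 15s zero-estimate boundary


def untested(x):
    """No test calls this, so its mutants should have no covering tests."""
    return x + 1


def is_positive(x):
    """Covered by the slow test below; mutating `>` to `>=` cannot loop forever."""
    return x > 0


def test_is_positive(delay=SLOW_TEST_SECONDS):
    """Deliberately slow test so the covered mutants run for longer than 15s."""
    time.sleep(delay)
    assert is_positive(1)
    assert not is_positive(0)
```

The `delay` parameter has a default value, so pytest does not treat it as a fixture; it is only there so the test body can be exercised quickly by hand.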
Suggested fix
Look up the mutant per-PID using m.key_by_pid[pid]:
    def timeout_checker(mutants):
        def inner_timeout_checker():
            while True:
                sleep(1)
                now = datetime.now()
                for m, _outer_mutant_name, result in mutants:
                    with START_TIMES_BY_PID_LOCK:
                        start_times_by_pid = dict(m.start_time_by_pid)
                        key_by_pid = dict(m.key_by_pid)
                    for pid, start_time in start_times_by_pid.items():
                        actual_mutant_name = key_by_pid.get(pid)
                        if actual_mutant_name is None:
                            continue  # race: pid registered then de-registered between fork and watchdog
                        est = m.estimated_time_of_tests_by_mutant.get(actual_mutant_name, 0)
                        run_time = now - start_time
                        if run_time.total_seconds() > (est + 1) * 15:
                            try:
                                os.kill(pid, signal.SIGXCPU)
                            except ProcessLookupError:
                                pass
        return inner_timeout_checker
Notes on the fix:
- The outer loop is now only useful as a way to discover all `SourceFileMutationData` instances `m`. The `mutant_name` from that iteration is irrelevant for the boundary calculation.
- `m.key_by_pid` should be snapshotted under the same lock as `m.start_time_by_pid` for consistency between the two maps; otherwise the watchdog could see a PID in `start_time_by_pid` whose `key_by_pid` entry has already been deleted by `register_result`. The `if actual_mutant_name is None: continue` guard handles the residual race window.
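To keep the per-PID lookup under test, the decision could be factored into a pure function that works on snapshots. This is a sketch of that idea rather than a patch against mutmut: `pids_over_budget` and its parameter names are hypothetical, and the defaults of 1 and 15 are taken from the excerpt above.

```python
from datetime import datetime, timedelta


def pids_over_budget(now, start_time_by_pid, key_by_pid, est_by_mutant,
                     constant=1, multiplier=15):
    """Return the PIDs whose run time exceeds the boundary of their own mutant.

    The mappings are expected to be snapshots taken under one lock.
    """
    over = []
    for pid, start_time in start_time_by_pid.items():
        mutant_name = key_by_pid.get(pid)
        if mutant_name is None:
            continue  # no mutant recorded for this PID, skip it
        est = est_by_mutant.get(mutant_name, 0)
        if (now - start_time).total_seconds() > (est + constant) * multiplier:
            over.append(pid)
    return over


# Illustrative data: three PIDs that have all been running for 16 seconds.
now = datetime(2025, 1, 1, 12, 0, 16)
started = now - timedelta(seconds=16)
print(pids_over_budget(
    now,
    start_time_by_pid={101: started, 102: started, 103: started},
    key_by_pid={101: "slow_mutant", 102: "no_tests_mutant"},  # 103 has no entry
    est_by_mutant={"slow_mutant": 20.0, "no_tests_mutant": 0.0},
))  # [102]
```

Only the PID mapped to the zero-estimate mutant is reported; the slow mutant's PID is judged against its own 315-second boundary, and the unmapped PID is skipped.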
Versions tested
- mutmut == 3.5.0 (pinned)
- Python 3.12.5, macOS (Darwin 25.4.0)
- --max-children=1
Related
- [...] `dictionary changed size during iteration` race in the same `inner_timeout_checker`; did not address this closure-variable defect.
- `inner_timout_checker`: iterating over `m.start_time_by_pid.items()` can crash with "dictionary changed size during iteration" and leave `mutmut run` hanging (#490), which reports another race in `inner_timeout_checker` (unrelated to this bug).
- [...] `15` and `1` constants configurable as `timeout_multiplier` / `timeout_constant`; the closure-variable defect remains because the lookup site was not touched.