Bug
When the cluster status command is killed by a signal (negative return code) and 10 or fewer such kills have been recorded, job_status() falls through the except block without assigning ret, then crashes trying to access it.
Traceback
  File ".../snakemake_executor_plugin_cluster_generic/__init__.py", line 236, in job_status
    ret = ret.strip().split("\n")
          ^^^
UnboundLocalError: cannot access local variable 'ret' where it is not associated with a value
Root cause
In __init__.py, the job_status closure (around lines 196–236):
def job_status(job_info, valid_returns=...):
    try:
        ret = subprocess.check_output(...)  # line ~201
    except subprocess.CalledProcessError as e:
        if e.returncode < 0:  # killed by signal
            self.status_cmd_kills.append(-e.returncode)
            if len(self.status_cmd_kills) > 10:
                # logs info and clears list
                pass  # falls through
            # <-- no return or assignment when count <= 10
        else:
            raise WorkflowError(...)
    ret = ret.strip().split("\n")  # ret is unbound!
When e.returncode < 0 and len(self.status_cmd_kills) <= 10, the handler does nothing — no return, no assignment to ret. Execution continues past the except block to ret.strip() where ret is unbound.
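The fall-through can be reproduced outside Snakemake. The following is a minimal sketch that only mirrors the control flow described above; it is not the plugin's code. The names check_status_buggy and killed_by_sigkill are hypothetical, and the signal kill is simulated by raising CalledProcessError with a negative return code rather than by running a real status command.

```python
import subprocess


def check_status_buggy(run_status_cmd, kills):
    """Mirror the control flow of job_status (illustrative, not the plugin's code)."""
    try:
        ret = run_status_cmd()
    except subprocess.CalledProcessError as e:
        if e.returncode < 0:  # killed by signal
            kills.append(-e.returncode)
            if len(kills) > 10:
                kills.clear()
            # no return and no assignment to ret on this path
        else:
            raise
    return ret.strip().split("\n")  # ret is unbound if the handler ran


def killed_by_sigkill():
    """Hypothetical stand-in for a status command killed by signal 9."""
    raise subprocess.CalledProcessError(returncode=-9, cmd="status-cmd")


try:
    check_status_buggy(killed_by_sigkill, [])
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print(error_name)  # prints "UnboundLocalError"
```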
Suggested fix
Return "running" when a signal-killed status check is ignored (count ≤ 10):
if e.returncode < 0:
    self.status_cmd_kills.append(-e.returncode)
    if len(self.status_cmd_kills) > 10:
        self.logger.info(...)
        self.status_cmd_kills.clear()
    return "running"  # <-- treat interrupted check as still-active
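To show the effect of the proposed early return, here is a standalone sketch under the same assumptions as the issue text. check_status_fixed, killed_by_signal, and max_kills are hypothetical names for illustration; the real method also logs before clearing the list, which is omitted here.

```python
import subprocess


def check_status_fixed(run_status_cmd, kills, max_kills=10):
    """Sketch of the proposed fix: report "running" when the check was signal-killed."""
    try:
        ret = run_status_cmd()
    except subprocess.CalledProcessError as e:
        if e.returncode < 0:  # killed by signal
            kills.append(-e.returncode)
            if len(kills) > max_kills:
                kills.clear()  # the real code logs before clearing
            return "running"  # treat the interrupted check as still-active
        raise
    return ret.strip()


def killed_by_signal():
    """Hypothetical stand-in for a status command killed by signal 15."""
    raise subprocess.CalledProcessError(returncode=-15, cmd="status-cmd")


kills = []
results = [check_status_fixed(killed_by_signal, kills) for _ in range(3)]
ok = check_status_fixed(lambda: "success\n", kills)

print(results, kills, ok)
```

With the early return in place, an interrupted check is reported as "running" and the job is polled again on the next status cycle, so one killed status command no longer aborts the whole session.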
Environment
- snakemake: 8.30.0
- Executor: cluster-generic
- Cluster: PBS (Gadi NCI, Australia)
- Status command:
qxtat check --snakemake
The status command was intermittently killed by a signal during normal workflow execution, causing the entire snakemake session to crash even though the underlying PBS jobs were still running or had completed successfully.