UnboundLocalError in job_status when status command is killed by signal #25

@johnyaku

Bug

When the cluster status command is killed by a signal (negative return code) and this has happened 10 or fewer times, job_status() falls through the except block without assigning ret, then crashes when it tries to access ret.

Traceback

File ".../snakemake_executor_plugin_cluster_generic/__init__.py", line 236, in job_status
    ret = ret.strip().split("\n")
          ^^^
UnboundLocalError: cannot access local variable 'ret' where it is not associated with a value

Root cause

In __init__.py, the job_status closure (around lines 196–236):

def job_status(job_info, valid_returns=...):
    try:
        ret = subprocess.check_output(...)       # line ~201
    except subprocess.CalledProcessError as e:
        if e.returncode < 0:                     # killed by signal
            self.status_cmd_kills.append(-e.returncode)
            if len(self.status_cmd_kills) > 10:
                # logs info and clears list
                pass                             # falls through
            # <-- no return or assignment when count <= 10
        else:
            raise WorkflowError(...)

    ret = ret.strip().split("\n")                # ret is unbound!

When e.returncode < 0 and len(self.status_cmd_kills) <= 10, the handler neither returns nor assigns ret. Execution continues past the except block to ret.strip(), where ret is unbound.
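
The control flow can be reproduced in isolation. This is a minimal sketch, not the plugin's code: `run_killed` and `job_status_sketch` are hypothetical names, and the signal kill is simulated by raising `CalledProcessError` with a negative return code instead of running a real status command.

```python
import subprocess


def run_killed(cmd):
    # Hypothetical stand-in for a status command killed by SIGKILL:
    # subprocess reports a signal kill as a negative return code.
    raise subprocess.CalledProcessError(returncode=-9, cmd=cmd)


def job_status_sketch(run=run_killed):
    try:
        ret = run(["status-cmd"])  # placeholder command, never executed
    except subprocess.CalledProcessError as e:
        if e.returncode < 0:
            pass  # handler neither returns nor assigns ret
        else:
            raise
    # Reached even when the except branch ran, so ret may be unbound.
    return ret.strip().split("\n")


try:
    job_status_sketch()
except UnboundLocalError as err:
    print(type(err).__name__)  # prints UnboundLocalError
```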

Suggested fix

Return "running" when a signal-killed status check is ignored (count ≤ 10):

if e.returncode < 0:
    self.status_cmd_kills.append(-e.returncode)
    if len(self.status_cmd_kills) > 10:
        self.logger.info(...)
        self.status_cmd_kills.clear()
    return "running"  # <-- treat interrupted check as still-active
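
A self-contained sketch of the suggested behaviour, for checking the idea outside snakemake. Everything here apart from `status_cmd_kills` and the `"running"` return value is an assumption made for illustration: the `StatusChecker` class, the `run` callable, and returning the stripped output on success are not taken from the plugin.

```python
import subprocess


class StatusChecker:
    """Hypothetical stand-in for the executor; only the kill bookkeeping is modelled."""

    def __init__(self):
        self.status_cmd_kills = []

    def job_status(self, run):
        try:
            ret = run()
        except subprocess.CalledProcessError as e:
            if e.returncode < 0:
                self.status_cmd_kills.append(-e.returncode)
                if len(self.status_cmd_kills) > 10:
                    # The plugin logs here; the sketch only clears the list.
                    self.status_cmd_kills.clear()
                # Treat an interrupted check as still active.
                return "running"
            raise
        return ret.strip()  # assumption: the output is the status string


def killed():
    # Simulates a status command terminated by SIGTERM (signal 15).
    raise subprocess.CalledProcessError(returncode=-15, cmd=["status-cmd"])


checker = StatusChecker()
print(checker.job_status(killed))               # prints running
print(checker.job_status(lambda: "success\n"))  # prints success
```

With this shape, a signal-killed check no longer reaches the code that reads ret, so the next polling round can query the status again.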

Environment

  • snakemake: 8.30.0
  • Executor: cluster-generic
  • Cluster: PBS (Gadi NCI, Australia)
  • Status command: qxtat check --snakemake

The status command was intermittently killed by a signal during normal workflow execution, causing the entire snakemake session to crash even though the underlying PBS jobs were still running or had completed successfully.
