
Silent failure when grouped job runtime exceeds partition time limit #456

@sebastianbeyer

Description

Software Versions

  • snakemake 9.19.0
  • snakemake-executor-plugin-slurm 2.6.1
  • SLURM 23.02

When a rule has a group: directive, the SLURM executor sums the runtime resource across all jobs in the group. If the summed runtime exceeds the partition's time limit, sbatch rejects the submission, but no error is reported to the user. The jobs are silently marked as failed with the generic message "(command exited with non-zero exit code)", and no SLURM job ID is ever shown.

This makes the failure very difficult to diagnose: there is no SLURM log, no job in sacct, and no indication that sbatch was rejected.

Minimal example

  rule seed:
      output:
          "results/{exp}/step_00.txt",
      localrule: True
      shell:
          "echo 'start' > {output}"

  def _prev_step(wildcards):
      n = int(wildcards.n)
      return f"results/{wildcards.exp}/step_{n - 1:02d}.txt"

  rule step:
      input:
          prev=_prev_step,
      output:
          "results/{exp}/step_{n}.txt",
      group:
          "chain_{exp}"
      resources:
          runtime=10,
      wildcard_constraints:
          n=r"0[1-5]",
          exp=r"[a-z_]+",
      shell:
          "echo 'step {wildcards.n}' && cp {input.prev} {output}"

Run with a partition that has a 30-minute limit:

  snakemake --executor slurm --jobs 5 \
    --group-components chain_exp_a=5 \
    --default-resources slurm_account=<account> slurm_partition=debug \
    -- results/exp_a/step_05.txt

The group job requests -t 50 (5 * 10 min), which exceeds the 30-minute partition limit. sbatch rejects it, but the user only sees:

  Error in group chain_exp_a
      Error in rule step:
          message: None
          (command exited with non-zero exit code)
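The arithmetic behind the rejection can be checked before submitting. Below is a minimal sketch, assuming (as described above) that the executor requests the sum of the member jobs' runtime values, in minutes, and that the partition limit is available as a SLURM time string, for example the TIMELIMIT value reported by sinfo for the partition. The helper names slurm_time_to_minutes and group_exceeds_limit are hypothetical and are not part of Snakemake or snakemake-executor-plugin-slurm.

```python
def slurm_time_to_minutes(limit: str) -> float:
    """Convert a SLURM time string to minutes (hypothetical helper).

    Accepts the forms documented for sbatch --time: "M", "M:S", "H:M:S",
    "D-H", "D-H:M" and "D-H:M:S". "infinite"/"UNLIMITED" (assumed spellings
    for partitions without a limit) map to float("inf").
    """
    limit = limit.strip()
    if limit.lower() in ("infinite", "unlimited"):
        return float("inf")

    days, hours, minutes, seconds = 0, 0, 0, 0
    if "-" in limit:
        # With a day prefix, the remaining fields are hours[:minutes[:seconds]].
        day_part, rest = limit.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"unrecognised SLURM time: {limit!r}")
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        fields = [int(f) for f in limit.split(":")]
        if len(fields) == 1:
            (minutes,) = fields
        elif len(fields) == 2:
            minutes, seconds = fields
        elif len(fields) == 3:
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unrecognised SLURM time: {limit!r}")

    return days * 1440 + hours * 60 + minutes + seconds / 60


def group_exceeds_limit(job_runtimes_min: list[int], partition_limit: str) -> bool:
    """Return True if the summed runtime of a job group exceeds the limit.

    Assumes, as described in this issue, that the group's time request is
    the sum of the member jobs' runtime resources (in minutes).
    """
    return sum(job_runtimes_min) > slurm_time_to_minutes(partition_limit)


if __name__ == "__main__":
    # Five 10-minute jobs against a 30-minute partition: 50 > 30.
    print(group_exceeds_limit([10] * 5, "30:00"))  # True
```

For the example above, five jobs with runtime=10 against a 30-minute partition give 50 > 30, which is exactly the case sbatch rejects.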

Expected behavior

The sbatch rejection error (e.g., sbatch: error: Batch job submission failed: Requested time limit is invalid) should be surfaced to the user.
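Surfacing the error essentially means keeping sbatch's stderr when it exits non-zero and putting it into the reported failure. The following is a minimal sketch of that general pattern, not the plugin's actual submission code; submit and SubmissionError are hypothetical names.

```python
import subprocess


class SubmissionError(RuntimeError):
    """Raised when the submission command (e.g. sbatch) exits non-zero."""


def submit(cmd: list[str]) -> str:
    """Run a job-submission command and return its stdout (hypothetical helper).

    If the command fails, the raised error includes its stderr, so the
    scheduler's own explanation reaches the user instead of a bare
    non-zero exit code.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise SubmissionError(
            f"{cmd[0]} exited with code {proc.returncode}: {proc.stderr.strip()}"
        )
    return proc.stdout.strip()
```

On a cluster this might be called as submit(["sbatch", "--parsable", "job.sh"]), where job.sh is a placeholder script name and --parsable makes sbatch print only the job ID on success. In the failing case from the example, the raised message would then carry whatever sbatch wrote to stderr, such as the "Requested time limit is invalid" text.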
