Software Versions
- snakemake 9.19.0
- snakemake-executor-plugin-slurm 2.6.1
- SLURM 23.02
When a rule has a group: directive, the SLURM executor sums the runtime resource across all jobs in the group. If the summed runtime exceeds the partition's time limit, sbatch rejects the submission, but the rejection is not reported to the user. The jobs are simply marked as failed with the generic message (command exited with non-zero exit code), and no SLURM job ID is ever shown.
This makes the failure very difficult to diagnose: there is no SLURM log, no job in sacct, and no indication that sbatch was rejected.
Minimal example
rule seed:
    output:
        "results/{exp}/step_00.txt",
    localrule: True
    shell:
        "echo 'start' > {output}"


def _prev_step(wildcards):
    n = int(wildcards.n)
    return f"results/{wildcards.exp}/step_{n - 1:02d}.txt"


rule step:
    input:
        prev=_prev_step,
    output:
        "results/{exp}/step_{n}.txt",
    group:
        "chain_{exp}"
    resources:
        runtime=10,
    wildcard_constraints:
        n=r"0[1-5]",
        exp=r"[a-z_]+",
    shell:
        "echo 'step {wildcards.n}' && cp {input.prev} {output}"
Run with a partition that has a 30-minute limit:
snakemake --executor slurm --jobs 5 \
    --group-components chain_exp_a=5 \
    --default-resources slurm_account=<account> slurm_partition=debug \
    -- results/exp_a/step_05.txt
The group job requests -t 50 (5 * 10 min), which exceeds the 30-minute partition limit. sbatch rejects it, but the user only sees:
Error in group chain_exp_a
Error in rule step:
message: None
(command exited with non-zero exit code)
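The arithmetic behind the rejected request can be sketched in a few lines of Python. This only illustrates the sum described above and is not the executor's actual code; `group_runtime_minutes` and `PARTITION_LIMIT_MIN` are hypothetical names, and the 30-minute limit is the one assumed in this example.

```python
# Illustrative only: mirrors the numbers in this report, not the plugin's code.
PARTITION_LIMIT_MIN = 30  # assumed time limit of the "debug" partition, in minutes


def group_runtime_minutes(runtimes):
    """Sum per-job runtime (minutes) for jobs that run one after another in a group."""
    return sum(runtimes)


# Five chained `step` jobs, each declaring runtime=10.
requested = group_runtime_minutes([10] * 5)
exceeds_limit = requested > PARTITION_LIMIT_MIN

print(f"group requests -t {requested}; partition limit is {PARTITION_LIMIT_MIN} min")
print("exceeds limit:", exceeds_limit)
```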
Expected behavior
The sbatch rejection error (e.g., sbatch: error: Batch job submission failed: Requested time limit is invalid) should be surfaced to the user.
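One way this could look, as a minimal sketch rather than the plugin's actual implementation: run the submission command with stderr captured and include that text in the raised error. The helper `submit_or_raise` and the `SubmissionError` class are hypothetical names; only `subprocess.run` from the Python standard library is relied on, and a child Python process stands in for the rejected sbatch call so the sketch runs without SLURM.

```python
import subprocess
import sys


class SubmissionError(RuntimeError):
    """Hypothetical error type carrying the submission command's stderr."""


def submit_or_raise(cmd):
    """Run a submission command; on failure, raise with its stderr included."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise SubmissionError(
            f"submission failed (exit code {proc.returncode}): {proc.stderr.strip()}"
        )
    return proc.stdout.strip()


# Stand-in for a rejected sbatch call (assumption for illustration only):
# a child process that writes the message to stderr and exits non-zero.
fake_sbatch = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('sbatch: error: Batch job submission failed: "
    "Requested time limit is invalid\\n'); sys.exit(1)",
]

try:
    submit_or_raise(fake_sbatch)
    surfaced = ""
except SubmissionError as err:
    surfaced = str(err)

print(surfaced)
```

With something along these lines, the message from sbatch would appear next to the failed group instead of only the generic non-zero exit code notice.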