
fix(slurm): request zero GPUs when export falls back to GPU partition#915

Open
AdamRajfer wants to merge 1 commit into `main` from `arajfer/fix-export-sbatch-gpu-fallback`


PR #901 introduced `export_partition = cpu_partition or cfg.execution.partition`.
When `cpu_partition` is null (as in existing configs), export falls back to
`execution.partition`, typically a GPU partition such as `batch`, and the
sbatch header contains no `--gpus-per-node`. Schedulers that require a GPU
specification on non-CPU partitions then reject the submission:

  sbatch: error: Cannot find GPU specification, you may not submit a job
  not requesting GPUs in a non-CPU partition, partition: batch

The existing `--gpus 0` on the inner `srun` applies only to the job step, not
to the allocation. Declare `--gpus-per-node=0` at the sbatch level when
falling back so the scheduler accepts an allocation with no GPUs.
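A minimal Python sketch of the header logic, assuming the export job script is assembled as a list of `#SBATCH` lines. The helper names `build_export_sbatch_header` and `build_export_script`, the `--job-name` line, and the example command are hypothetical and not taken from the PR's diff. Only the fallback expression and the two GPU flags (`--gpus-per-node=0` on the allocation, `--gpus 0` on the step) come from the description.

```python
from typing import List, Optional


def build_export_sbatch_header(
    cpu_partition: Optional[str],
    execution_partition: str,
    job_name: str = "export",
) -> List[str]:
    """Return #SBATCH lines for the export job (hypothetical helper).

    Mirrors the fallback from PR #901: use cpu_partition when it is set,
    otherwise reuse the execution partition (typically a GPU partition).
    """
    # `or` falls back when cpu_partition is None, and also when it is "".
    export_partition = cpu_partition or execution_partition
    lines = [
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={export_partition}",
    ]
    if not cpu_partition:
        # Falling back to a likely-GPU partition: request zero GPUs at the
        # allocation level so sites that demand a GPU specification on
        # non-CPU partitions accept the job.
        lines.append("#SBATCH --gpus-per-node=0")
    return lines


def build_export_script(
    cpu_partition: Optional[str],
    execution_partition: str,
    command: str,
) -> str:
    """Assemble a full batch script (hypothetical helper)."""
    header = build_export_sbatch_header(cpu_partition, execution_partition)
    # `--gpus 0` on srun constrains only this job step, not the allocation,
    # which is why the header above still needs its own GPU directive.
    step = f"srun --gpus 0 {command}"
    return "\n".join(["#!/bin/bash", *header, step]) + "\n"
```

For example, `print(build_export_script(None, "batch", "python export.py"))` prints:

    #!/bin/bash
    #SBATCH --job-name=export
    #SBATCH --partition=batch
    #SBATCH --gpus-per-node=0
    srun --gpus 0 python export.py

When `cpu_partition` is set (say `"cpu"`), the header uses that partition and omits the GPU directive, so configs that already name a CPU partition keep their current sbatch header.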

Signed-off-by: Adam Rajfer <arajfer@nvidia.com>