Software Versions
$ snakemake --version
9.13.2
$ mamba list | grep "snakemake-executor-plugin-slurm"
snakemake-executor-plugin-slurm 1.9.2 pyhdfd78af_0 bioconda
snakemake-executor-plugin-slurm-jobstep 0.3.0 pyhdfd78af_0 bioconda
$ sinfo --version
slurm 24.11.5
Describe the bug
When a Multi-Instance GPU (MIG) device is requested as the GPU to allocate, the executor plugin raises a WorkflowError because, given the naming convention of these devices, the device name does not match the plugin's regex pattern. MIG names contain a dot (e.g. gpu:nvidia_h100_nvl_1g.12gb:1), which is not among the characters the pattern accepts. Switching the regex to gres_re = re.compile(r"^[a-zA-Z0-9_]+(:[a-zA-Z0-9_\.]+)?:\d+$") solves the issue. I don't know, however, whether this fix would have any unintended consequences.
Note: While I am not using the latest version of the plugin (2.0.2), the regex has not changed since #173 (9 months ago).
https://github.com/snakemake/snakemake-executor-plugin-slurm/blame/fb09d7fe7fe965daf1eae99afa2c67ad3836befc/snakemake_executor_plugin_slurm/utils.py#L179
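The behavior can be reproduced in isolation with a short sketch. The "current" pattern below is inferred from the error message and the proposed change (the same pattern without the escaped dot); only the proposed pattern is quoted verbatim from this report:

```python
import re

# Inferred current pattern: no '.' allowed in the optional <type> field
current_re = re.compile(r"^[a-zA-Z0-9_]+(:[a-zA-Z0-9_]+)?:\d+$")
# Proposed pattern from this report: additionally allows '.' in <type>
proposed_re = re.compile(r"^[a-zA-Z0-9_]+(:[a-zA-Z0-9_\.]+)?:\d+$")

mig = "gpu:nvidia_h100_nvl_1g.12gb:1"  # MIG profile name contains a dot

print(bool(current_re.match(mig)))   # False: the dot is rejected
print(bool(proposed_re.match(mig)))  # True: MIG name accepted

# The documented forms '<name>:<number>' and '<name>:<type>:<number>'
# still match under the proposed pattern:
print(bool(proposed_re.match("gpu:1")))        # True
print(bool(proposed_re.match("gpu:tesla:2")))  # True
```

Since the dot is only added inside the optional middle group, plain `gpu:1` and typed `gpu:tesla:2` requests continue to validate exactly as before.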
Logs
(snakemake) [dgarcia@gemini01 paper]$ snakemake -s ./workflow/rules/gres.smk -p --verbose
Using workflow specific profile profiles/default for setting default command line arguments.
host: gemini01
Building DAG of jobs...
results/gres_bug_done: True 0
shared_storage_local_copies: True
remote_exec: False
Submitting maximum 100 job(s) over 1.0 second(s).
SLURM run ID: d5ded64e-cd8d-4968-ad89-68c2a9dc1985
Using shell: /usr/bin/bash
Provided remote nodes: 30
Job stats:
job count
-------- -------
gres_bug 1
total 1
Resources before job selection: {'_cores': 9223372036854775807, '_nodes': 30, '_job_count': 9223372036854775807}
Ready jobs: 1
Select jobs to execute...
Selected jobs: 1
Resources after job selection: {'_cores': 9223372036854775807, '_nodes': 30, '_job_count': 100}
Execute 1 jobs...
[Wed Dec 10 11:11:00 2025]
rule gres_bug:
output: results/gres_bug_done
jobid: 0
reason: Missing output files: results/gres_bug_done
resources: tmpdir=<TBD>, slurm_partition=gpu, gres=gpu:nvidia_h100_nvl_1g.12gb:1
Shell command:
echo "Testing gres resource allocation"
touch results/gres_bug_done
No SLURM account given, trying to guess.
Unable to guess SLURM account. Trying to proceed without.
unlocking
removing lock
removing lock
removed all locks
Full Traceback (most recent call last):
File "/home/dgarcia/miniforge3/envs/snakemake/lib/python3.13/site-packages/snakemake/cli.py", line 2187, in args_to_api
dag_api.execute_workflow(
~~~~~~~~~~~~~~~~~~~~~~~~^
executor=args.executor,
^^^^^^^^^^^^^^^^^^^^^^^
...<46 lines>...
scheduler_settings=scheduler_settings,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/dgarcia/miniforge3/envs/snakemake/lib/python3.13/site-packages/snakemake/api.py", line 634, in execute_workflow
workflow.execute(
~~~~~~~~~~~~~~~~^
executor_plugin=executor_plugin,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<4 lines>...
updated_files=updated_files,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/dgarcia/miniforge3/envs/snakemake/lib/python3.13/site-packages/snakemake/workflow.py", line 1442, in execute
raise e
File "/home/dgarcia/miniforge3/envs/snakemake/lib/python3.13/site-packages/snakemake/workflow.py", line 1438, in execute
success = self.scheduler.schedule()
File "/home/dgarcia/miniforge3/envs/snakemake/lib/python3.13/site-packages/snakemake/scheduling/job_scheduler.py", line 389, in schedule
raise e
File "/home/dgarcia/miniforge3/envs/snakemake/lib/python3.13/site-packages/snakemake/scheduling/job_scheduler.py", line 367, in schedule
self.run(runjobs)
~~~~~~~~^^^^^^^^^
File "/home/dgarcia/miniforge3/envs/snakemake/lib/python3.13/site-packages/snakemake/scheduling/job_scheduler.py", line 496, in run
executor.run_jobs(jobs)
~~~~~~~~~~~~~~~~~^^^^^^
File "/home/dgarcia/miniforge3/envs/snakemake/lib/python3.13/site-packages/snakemake_interface_executor_plugins/executors/base.py", line 73, in run_jobs
self.run_job(job)
~~~~~~~~~~~~^^^^^
File "/home/dgarcia/miniforge3/envs/snakemake/lib/python3.13/site-packages/snakemake_executor_plugin_slurm/__init__.py", line 328, in run_job
call += set_gres_string(job)
~~~~~~~~~~~~~~~^^^^^
File "/home/dgarcia/miniforge3/envs/snakemake/lib/python3.13/site-packages/snakemake_executor_plugin_slurm/utils.py", line 96, in set_gres_string
raise WorkflowError(
...<3 lines>...
)
snakemake_interface_common.exceptions.WorkflowError: Invalid GRES format: gpu:nvidia_h100_nvl_1g.12gb:1. Expected format: '<name>:<number>' or '<name>:<type>:<number>' (e.g., 'gpu:1' or 'gpu:tesla:2')
WorkflowError:
Invalid GRES format: gpu:nvidia_h100_nvl_1g.12gb:1. Expected format: '<name>:<number>' or '<name>:<type>:<number>' (e.g., 'gpu:1' or 'gpu:tesla:2')
Minimal example
rule gres_bug:
    output:
        touch("results/gres_bug_done")
    resources:
        slurm_partition="gpu",
        gres="gpu:nvidia_h100_nvl_1g.12gb:1",
    shell:
        """
        echo "Testing gres resource allocation"
        touch {output}
        """
Additional context
https://docs.nvidia.com/datacenter/tesla/mig-user-guide/mig-device-names.html