Skip to content

fix: code refactoring#451

Merged
cmeesters merged 10 commits intosnakemake:mainfrom
fgvieira:small_tweak
Apr 16, 2026
Merged

fix: code refactoring#451
cmeesters merged 10 commits intosnakemake:mainfrom
fgvieira:small_tweak

Conversation

@fgvieira
Copy link
Copy Markdown
Contributor

@fgvieira fgvieira commented Apr 10, 2026

  • allow for both GRES and GPU to be specified (for example, gpu with some network specs)
  • assume default GPU of 1
  • GRES can be just a string since, according to docs, slurm will assume 1:

--gres=
Specifies a comma-delimited list of generic consumable resources requested per node. The format for each entry in the list is "name[[:type]:count]". The name is the type of consumable resource (e.g. gpu). The type is an optional classification for the resource (e.g. a100). The count is the number of those resources with a default value of 1. [...]. Examples of use include "--gres=gpu:2", "--gres=gpu:kepler:2", and "--gres=help".

Summary by CodeRabbit

  • Bug Fixes

    • Broadened validation for generic resource specs to accept optional subtypes and optional numeric/postfix formats.
    • GPU allocation now consistently appears in job submissions and permits a model identifier without requiring an explicit GPU count (defaults to 1 when omitted).
    • Removed strict mutual-exclusion between generic-resource and GPU settings so combined configurations are allowed.
  • Tests

    • Updated tests to accept expanded GRES/GPU formats and auto-filled GPU counts.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Relaxed and reorganized GRES/GPU string construction in set_gres_string(): GRES regex now accepts an optional subtype and optional numeric/count postfix, mutual-exclusion checks between resources.gres and resources.gpu removed, early returns replaced by a single accumulator, and GPU model now auto-fills count to 1 when missing.

Changes

Cohort / File(s) Summary
GRES and GPU Configuration Logic
snakemake_executor_plugin_slurm/utils.py
Refactored set_gres_string() control flow: loosened GRES regex to allow optional :<subtype> and optional :<number>[TGM], removed exclusivity between resources.gres and resources.gpu, replaced early returns with a single accumulator, and changed GPU handling to always derive gpu/gpu_model from job.resources, auto-filling GPU count to 1 when gpu_model is set without a count and emitting --gpus=... alongside --gres=....
Tests Updated for New Behavior
tests/tests.py
Converted previously error-expected tests into valid-case assertions: renamed tests and now assert exact returned strings for gres="gpu", gres="gpu:tesla", gpu_model="tesla" (auto-filled count), and the case where both gres and gpu are set now returns combined --gres=... --gpus=... instead of raising exceptions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through regex vines tonight,

nudged rules loose till they felt light,
stitched gres and gpus into a line,
filled a missing count with a tiny sign,
now scheduler whispers "all is fine".

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'fix: code refactoring' is vague and generic, using non-descriptive terms that do not convey meaningful information about the specific changes made to the codebase. Consider a more specific title that captures the main change, such as 'Allow --gres and --gpus options to be specified together' or 'Refactor set_gres_string() to support combined GPU and GRES configurations'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
snakemake_executor_plugin_slurm/utils.py (2)

383-406: ⚠️ Potential issue | 🔴 Critical

Critical: Non-GPU jobs will incorrectly request --gpus=1.

The default of 1 is applied unconditionally, causing ALL jobs (including non-GPU jobs) to receive --gpus=1. This conflicts with partitions.py:parse_gpu_requirements() which defaults gpu_required to 0 for non-GPU jobs.

The --gpus flag should only be added when GPU resources are explicitly requested.

🐛 Proposed fix to only add GPU flags when requested
-    # Default to 1 GPU
-    gpu_string = str(job.resources.get("gpu", 1))
-    gpu_model = job.resources.get("gpu_model")
-
-    if gpu_model:
+    gpu_value = job.resources.get("gpu")
+    gpu_model = job.resources.get("gpu_model")
+
+    # Only add GPU flags if GPU resources were explicitly requested
+    if gpu_value is not None or gpu_model is not None:
+        # Default to 1 GPU when gpu_model is set but count is not
+        gpu_string = str(gpu_value if gpu_value is not None else 1)
+        if gpu_model:
             # validate GPU model format
             if not gpu_model_re.match(gpu_model):
                 if not string_check.match(gpu_model):
                     raise WorkflowError(
                         "GPU model format should not be a nested string (start "
                         "and end with ticks or quotation marks). "
                         "Expected format: '<name>' (e.g., 'tesla')"
                     )
                 else:
                     raise WorkflowError(
                         f"Invalid GPU model format: {gpu_model}."
                         " Expected format: '<name>' (e.g., 'tesla')"
                     )
-        gres += f" --gpus={gpu_model}:{gpu_string}"
-    else:
-        # we assume here, that the validator ensures that the 'gpu_string'
-        # is an integer
-        gres += f" --gpus={gpu_string}"
+            gres += f" --gpus={gpu_model}:{gpu_string}"
+        else:
+            # we assume here, that the validator ensures that the 'gpu_string'
+            # is an integer
+            gres += f" --gpus={gpu_string}"

     return gres
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_executor_plugin_slurm/utils.py` around lines 383 - 406, The code
unconditionally sets gpu_string = str(job.resources.get("gpu", 1)) which causes
every job to get --gpus=1; change the logic so the --gpus flag (gres += ...) is
only appended when GPU resources are explicitly requested. Specifically: stop
defaulting gpu_string to "1" — instead check if "gpu" is present in
job.resources (and >0) or gpu_model is provided; only then set gpu_string =
str(job.resources.get("gpu")) and perform gpu_model validation with
gpu_model_re/string_check and append gres += f" --gpus={gpu_model}:{gpu_string}"
(or gres += f" --gpus={gpu_string}" when no model). Use the existing identifiers
(job.resources, gpu_string, gpu_model, gpu_model_re, string_check, gres) to
locate and update the code paths.

360-381: ⚠️ Potential issue | 🔴 Critical

set_gres_string() is missing mutual-exclusion validation for gres and gpu resources.

The function currently allows both gres and gpu to be processed simultaneously, but test_both_gres_and_gpu_set expects it to raise WorkflowError with message "GRES and GPU are set. Please only set one". Additionally, partitions.py:parse_gpu_requirements() enforces this mutual exclusion at lines 302-305. Add validation at the start of set_gres_string() to check if both resources are provided and raise an error accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_executor_plugin_slurm/utils.py` around lines 360 - 381, In
set_gres_string() add an early mutual-exclusion check: if both
job.resources.gres and job.resources.gpu are present, raise WorkflowError("GRES
and GPU are set. Please only set one"); place this check before any gres
validation/processing (i.e., before using gres_re.match or accessing
job.resources.gres) so the function mirrors the behavior enforced by
partitions.py:parse_gpu_requirements() and satisfies test_both_gres_and_gpu_set.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@snakemake_executor_plugin_slurm/utils.py`:
- Around line 383-406: The code unconditionally sets gpu_string =
str(job.resources.get("gpu", 1)) which causes every job to get --gpus=1; change
the logic so the --gpus flag (gres += ...) is only appended when GPU resources
are explicitly requested. Specifically: stop defaulting gpu_string to "1" —
instead check if "gpu" is present in job.resources (and >0) or gpu_model is
provided; only then set gpu_string = str(job.resources.get("gpu")) and perform
gpu_model validation with gpu_model_re/string_check and append gres += f"
--gpus={gpu_model}:{gpu_string}" (or gres += f" --gpus={gpu_string}" when no
model). Use the existing identifiers (job.resources, gpu_string, gpu_model,
gpu_model_re, string_check, gres) to locate and update the code paths.
- Around line 360-381: In set_gres_string() add an early mutual-exclusion check:
if both job.resources.gres and job.resources.gpu are present, raise
WorkflowError("GRES and GPU are set. Please only set one"); place this check
before any gres validation/processing (i.e., before using gres_re.match or
accessing job.resources.gres) so the function mirrors the behavior enforced by
partitions.py:parse_gpu_requirements() and satisfies test_both_gres_and_gpu_set.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: baa63543-6714-4b70-8fd8-a54f3ccb5dd0

📥 Commits

Reviewing files that changed from the base of the PR and between 7fa975f and a71c6c6.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/utils.py

@fgvieira
Copy link
Copy Markdown
Contributor Author

@cmeesters do these changes make sense?

@cmeesters
Copy link
Copy Markdown
Member

@fgvieira I don't know. I am currently in Wales and will need some time next week, before I can have a look.

@cmeesters
Copy link
Copy Markdown
Member

@fgvieira ok, I agree, this is much cleaner. Plus: Yes, --gres might be a necessity in addition to --gpus on some clusters. On the other hand, it can result in conflicts.

We should:

  • go through the failing tests, which still assume that --gres and --gpus are mutually exclusive.
  • and look for a better error message upon failed job submissions. I fear the latter is hard and beyond the scope of this PR. Perhaps worth a talk to SchedMD.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
snakemake_executor_plugin_slurm/utils.py (2)

344-379: ⚠️ Potential issue | 🟡 Minor

Fix the GRES help text to match the new regex.

Line 346 now accepts bare <name> and <name>:<type>, but Lines 370-379 still say count is mandatory. The current message will point users to the wrong syntax when validation fails.

Suggested fix
-    # generic resources (GRES) arguments can be of type "string",
-    # "string:int" or "string:string:int" with optional postfix 'T' or 'G' or 'M'
+    # generic resources (GRES) arguments can be of type "string",
+    # "string:string", "string:int", or "string:string:int"
+    # with optional postfix 'T' or 'G' or 'M'
@@
-                    "Expected format: "
-                    "'<name>:<number>' or '<name>:<type>:<number>' with an optional "
-                    "'T' 'M' or 'G' postfix "
-                    "(e.g., 'gpu:1' or 'gpu:tesla:2')"
+                    "Expected format: "
+                    "'<name>', '<name>:<type>', '<name>:<number>', or "
+                    "'<name>:<type>:<number>' with an optional "
+                    "'T' 'M' or 'G' postfix "
+                    "(e.g., 'gpu', 'gpu:tesla', 'gpu:1', or 'gpu:tesla:2')"
@@
-                    f"Invalid GRES format: {job.resources.gres}. Expected format: "
-                    "'<name>:<number>' or '<name>:<type>:<number>' with an optional "
-                    "'T' 'M' or 'G' postfix "
-                    "(e.g., 'gpu:1' or 'gpu:tesla:2') "
+                    f"Invalid GRES format: {job.resources.gres}. Expected format: "
+                    "'<name>', '<name>:<type>', '<name>:<number>', or "
+                    "'<name>:<type>:<number>' with an optional "
+                    "'T' 'M' or 'G' postfix "
+                    "(e.g., 'gpu', 'gpu:tesla', 'gpu:1', or 'gpu:tesla:2') "
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_executor_plugin_slurm/utils.py` around lines 344 - 379, The GRES
error messages are out of sync with the gres_re which now permits bare "<name>"
and "<name>:<type>" (number is optional); update the WorkflowError strings
raised in the validation block that references job.resources.gres so they
reflect the new allowed formats (e.g., allow "<name>", "<name>:<type>",
"<name>:<type>:<number>" and optional 'T'/'G'/'M' postfix) and remove text that
says a count is mandatory; ensure both the nested-string check and the
invalid-format branch use consistent, accurate wording matching gres_re and
string_check.

360-406: ⚠️ Potential issue | 🟠 Major

Remove or update the upstream mutual-exclusion guard in partitions.py.

The new code in utils.py appends both --gres and --gpus flags when both resources are set, but snakemake_executor_plugin_slurm/partitions.py:302-305 still rejects jobs with both gpu and gres specified, preventing this code path from ever executing.

Either the guard must be removed to allow both resources simultaneously, or the utils.py logic must be corrected to handle only mutually exclusive cases (one branch executing, not both).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@snakemake_executor_plugin_slurm/utils.py` around lines 360 - 406, The
partitions.py mutual-exclusion guard that blocks jobs having both
job.resources.gres and job.resources.gpu must be reconciled with the new
utils.py behavior that appends both --gres and --gpus; either remove/update that
guard so jobs with both resources are allowed or change utils.py to make gres
and gpu mutually exclusive (only one branch runs). Locate the guard in
partitions.py (the check rejecting jobs when both gpu and gres are present) and
either delete or modify it to permit both, or alternatively update the utils.py
logic around variables gres, gpu_string, and the gpu_model branch so it selects
one flag (prefer gres OR gpu) and does not append both; ensure consistency
between the validator in partitions and the formatter in utils.py
(job.resources.gres, job.resources.gpu, and gpu_model) so only supported
combinations are accepted and formatted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@snakemake_executor_plugin_slurm/utils.py`:
- Around line 384-406: The GPU flag is only emitted when job.resources.gpu is
truthy, so a provided gpu_model alone is dropped instead of defaulting to 1 GPU;
change the logic in the gpu handling block (symbols: gpu_string, gpu_model,
gpu_model_re, string_check, gres) to treat gpu_model as authoritative: if
gpu_model exists but gpu_string is falsy, set gpu_string to "1" (or equivalent
validated integer) before validation and building gres, then continue to
validate gpu_model with gpu_model_re/string_check and append either "
--gpus={gpu_model}:{gpu_string}" or " --gpus={gpu_string}" as appropriate.

---

Outside diff comments:
In `@snakemake_executor_plugin_slurm/utils.py`:
- Around line 344-379: The GRES error messages are out of sync with the gres_re
which now permits bare "<name>" and "<name>:<type>" (number is optional); update
the WorkflowError strings raised in the validation block that references
job.resources.gres so they reflect the new allowed formats (e.g., allow
"<name>", "<name>:<type>", "<name>:<type>:<number>" and optional 'T'/'G'/'M'
postfix) and remove text that says a count is mandatory; ensure both the
nested-string check and the invalid-format branch use consistent, accurate
wording matching gres_re and string_check.
- Around line 360-406: The partitions.py mutual-exclusion guard that blocks jobs
having both job.resources.gres and job.resources.gpu must be reconciled with the
new utils.py behavior that appends both --gres and --gpus; either remove/update
that guard so jobs with both resources are allowed or change utils.py to make
gres and gpu mutually exclusive (only one branch runs). Locate the guard in
partitions.py (the check rejecting jobs when both gpu and gres are present) and
either delete or modify it to permit both, or alternatively update the utils.py
logic around variables gres, gpu_string, and the gpu_model branch so it selects
one flag (prefer gres OR gpu) and does not append both; ensure consistency
between the validator in partitions and the formatter in utils.py
(job.resources.gres, job.resources.gpu, and gpu_model) so only supported
combinations are accepted and formatted.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a41cbe8b-741f-414a-a8a1-25321082a62e

📥 Commits

Reviewing files that changed from the base of the PR and between 5d37e32 and 9738233.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/utils.py

Comment thread snakemake_executor_plugin_slurm/utils.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/tests.py (1)

282-294: ⚠️ Potential issue | 🔴 Critical

Test input gres="gpu:tesla:" is rejected by regex validation and will cause WorkflowError before the assertion.

The regex ^[a-zA-Z0-9_]+(:[a-zA-Z0-9_\.]+)?(:\d+[TGM]?)?$ requires that if a colon is present, it must be followed by digits (or optionally T/G/M). The trailing colon in gpu:tesla: violates this pattern and triggers the WorkflowError on line 375 of set_gres_string().

Additionally, even if validation passed, the function appends the raw GRES value (line 381: --gres={job.resources.gres}), so the output would be --gres=gpu:tesla: not --gres=tesla:1. The expected output also violates SLURM's documented grammar, which requires a resource name (like gpu) before the type.

Fix: Use valid GRES format gpu:tesla (no trailing colon) and expect output --gres=gpu:tesla, or if a count default is intended, implement explicit normalization in the function and document it.

Suggested test adjustment
-    def test_valid_gres_format_gpu_model(self, mock_job):
-        """Test with invalid GRES format (missing count)."""
-        job = mock_job(gres="gpu:tesla:")
+    def test_valid_gres_format_gpu_model(self, mock_job):
+        """Test with valid GRES format with GPU type."""
+        job = mock_job(gres="gpu:tesla")
...
-        assert set_gres_string(job) == " --gres=tesla:1"
+        assert set_gres_string(job) == " --gres=gpu:tesla"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/tests.py` around lines 282 - 294, The test uses an invalid GRES string
with a trailing colon which fails the regex validation in set_gres_string and
triggers WorkflowError; update the test to use a valid GRES value (e.g., job =
mock_job(gres="gpu:tesla")) and change the expected assertion to match the
function's behavior (expect " --gres=gpu:tesla"), or alternatively implement
explicit normalization inside set_gres_string to convert "gpu:tesla" to include
a default count before changing the test; reference set_gres_string and
job.resources.gres to locate where validation/appending occurs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@tests/tests.py`:
- Around line 282-294: The test uses an invalid GRES string with a trailing
colon which fails the regex validation in set_gres_string and triggers
WorkflowError; update the test to use a valid GRES value (e.g., job =
mock_job(gres="gpu:tesla")) and change the expected assertion to match the
function's behavior (expect " --gres=gpu:tesla"), or alternatively implement
explicit normalization inside set_gres_string to convert "gpu:tesla" to include
a default count before changing the test; reference set_gres_string and
job.resources.gres to locate where validation/appending occurs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9e1b36a4-b2b1-43b4-8c42-dd281cd7d12d

📥 Commits

Reviewing files that changed from the base of the PR and between 9738233 and a6678a6.

📒 Files selected for processing (2)
  • snakemake_executor_plugin_slurm/utils.py
  • tests/tests.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • snakemake_executor_plugin_slurm/utils.py

@fgvieira fgvieira requested a review from cmeesters April 16, 2026 12:24
@cmeesters cmeesters merged commit 8489b10 into snakemake:main Apr 16, 2026
5 checks passed
@fgvieira fgvieira deleted the small_tweak branch April 17, 2026 06:32
cmeesters pushed a commit that referenced this pull request Apr 17, 2026
🤖 I have created a release *beep* *boop*
---


##
[2.6.1](v2.6.0...v2.6.1)
(2026-04-16)


### Bug Fixes

* code refactoring
([#451](#451))
([8489b10](8489b10))
* handle integer slurm_account values from YAML parsing
([#448](#448))
([263f3a8](263f3a8))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants