Add ability to specify GPUs by UUID prefix #923

Merged
inimaz merged 3 commits into mlco2:master from cianc:fix/ISSUE-873-add-gpu-uuid-support
Sep 14, 2025

Conversation

cianc commented Sep 2, 2025

Contributor

Description

Add the ability to pass UUID prefixes as a way of specifying what GPUs to track.

This is to address #873
Prior to this change you could only pass an index into the list of GPUs on the system.
Now you can pass a UUID prefix, including the 'MIG-' prefix per
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-environment-variables
if desired.
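
As a rough illustration of the idea (not the code merged in this PR), resolving a mixed list of indices and UUID prefixes against the UUIDs reported for each device might look like the sketch below. The function name `resolve_gpu_ids`, the example UUIDs, and the choice to select every device matching a prefix are all assumptions made for illustration.

```python
from typing import List, Sequence, Union


def resolve_gpu_ids(
    requested: Sequence[Union[int, str]], device_uuids: Sequence[str]
) -> List[int]:
    """Map each requested id (integer index or UUID prefix) to device indices.

    Hypothetical helper: device_uuids[i] is assumed to be the UUID of GPU i.
    """
    selected: List[int] = []
    for item in requested:
        text = str(item).strip()
        if text.isdecimal():
            # Plain integer: treat it as an index into the device list.
            index = int(text)
            if index >= len(device_uuids):
                raise ValueError(f"GPU index {index} is out of range")
            matches = [index]
        else:
            # Anything else: treat it as a UUID prefix, e.g. "GPU-8f" or "MIG-3c".
            matches = [
                i for i, uuid in enumerate(device_uuids) if uuid.startswith(text)
            ]
            if not matches:
                raise ValueError(f"No GPU UUID starts with {text!r}")
        for i in matches:
            if i not in selected:  # keep order, drop duplicates
                selected.append(i)
    return selected


# Made-up UUIDs, for illustration only.
example_uuids = ["GPU-8f6d2a1b-aaaa", "MIG-3c4e5f60-bbbb", "GPU-1a2b3c4d-cccc"]
chosen = resolve_gpu_ids([0, "MIG-3c4e"], example_uuids)
```

Raising an error on an unmatched prefix is one possible design; a tracker could instead log a warning and fall back to tracking all GPUs.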

This change likely requires a documentation change to
https://mlco2.github.io/codecarbon/parameters.html. I am planning to do that in a follow up
change, but can add it here if you'd like.

Note that I have not been able to test this on a real life reproduction. The reporter of
#873 was not able to provide one.

Related Issue

Please link to the issue this PR resolves: #873

Motivation and Context

Per the above issue, we currently fail on parsing passed GPU ids when they are UUID prefixes.
This is especially a problem when the CUDA_VISIBLE_DEVICES variable is automatically set
in some cases (I think this is what is happening in the linked issue on huggingface).
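
To make the failure mode concrete: a parser that calls `int()` on every comma-separated entry of CUDA_VISIBLE_DEVICES raises `ValueError` as soon as it meets a UUID-style entry. The sketch below is a hypothetical parser, not codecarbon's; it keeps integer entries as `int` and leaves everything else as a string so it can be matched against device UUIDs afterwards. The name `parse_visible_devices` and the sample values are assumptions.

```python
import os
from typing import List, Optional, Union


def parse_visible_devices(value: Optional[str] = None) -> List[Union[int, str]]:
    """Split a CUDA_VISIBLE_DEVICES-style string into indices and UUID prefixes.

    Hypothetical helper: if value is None, the environment variable is read.
    """
    if value is None:
        value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    entries: List[Union[int, str]] = []
    for raw in value.split(","):
        entry = raw.strip()
        if not entry:
            continue  # ignore empty fields such as a trailing comma
        # Integer entries become ints; UUID-style entries stay as strings.
        entries.append(int(entry) if entry.isdecimal() else entry)
    return entries


# Made-up value mixing a MIG-style UUID prefix with a plain index.
parsed = parse_visible_devices("MIG-3c4e5f60, 2")
```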

How Has This Been Tested?

Added new unit tests, but as noted above I have not been able to reproduce the original
huggingface issue to verify.

Screenshots (if appropriate):

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

Go over all the following points, and put an x in all the boxes that apply.

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING.md document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

cianc commented Sep 2, 2025

Contributor Author

Hey, I've left this in draft because I'm a little unsure on how to proceed given the lack of real-world reproduction to verify against. I also have a smaller question on whether I should add the documentation change to this change or do it in a followup.

cianc marked this pull request as ready for review September 7, 2025 16:06

cianc commented Sep 7, 2025

Contributor Author

Taking out of draft for visibility and hopefully feedback: I still have some questions from #923 (comment)

cianc commented Sep 9, 2025

Contributor Author

@benoit-cty since you were kind enough to review my previous PR, could you advise me on how to proceed?

@benoit-cty

Contributor

Hello @cianc , thanks for your contribution.

We are a project maintained only by volunteers, so you should expect some delay in the review.

inimaz left a comment

Collaborator

Looks good to me, thanks @cianc! This will indeed be helpful with all the MIG-like ids. As you mention, could you update the docs as well?

…s to track..

This is to address mlco2#873
Prior to this change you could only pass an index into the number of GPUs on the system.
Now you can pass a UUID prefix, including the 'MIG-' prefix per
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-environment-variables
if desired.

Note that I have not been able to test this on real life repo. The reporter of
mlco2#873 was not able to provide a repro.
cianc force-pushed the fix/ISSUE-873-add-gpu-uuid-support branch from 8f7f5f7 to 87f41ab September 12, 2025 09:45

cianc commented Sep 12, 2025

Contributor Author

Docs updated.

inimaz commented Sep 14, 2025

Collaborator

Nice, thanks! Unfortunately the .html files are the generated part of the docs. To change them, edit the .rst files and then run uv run --only-group doc task docs to regenerate the docs automatically.

See https://github.com/mlco2/codecarbon/blob/master/CONTRIBUTING.md#build-documentation-%EF%B8%8F for more info.

cianc commented Sep 14, 2025

Contributor Author

Whoops, that was embarrassing! Fixed.

inimaz merged commit ef08f3e into mlco2:master Sep 14, 2025
8 checks passed
cianc deleted the fix/ISSUE-873-add-gpu-uuid-support branch September 14, 2025 21:19

Development

Successfully merging this pull request may close these issues.

Issue with MIG gpus on huggingface ZeroGPU:

3 participants