Add NVIDIA DCGM packages and repository support #7063
Merged
25 commits
b73a201 Add NVIDIA DCGM packages and repository support (surajssd)
b005a0e Add error handling and logging to NVIDIA repository setup (surajssd)
e537f7a Fix package name encoding issues in apt_get_download (surajssd)
9b3adb9 Add NVIDIA DCGM package installation support (surajssd)
5eecdcd refactor: Consolidate package testing logic using case statement (surajssd)
8f20dad Fix typo in nvidia repo setup and add missing benchmark capture (surajssd)
0f48cde Remove CUDA 12 specific datacenter-gpu-manager packages (surajssd)
12d4e4a Fix trailing whitespace and implement managed GPU experience feature (surajssd)
dcdc249 Add e2e tests for NVIDIA DCGM Exporter on Ubuntu 22.04 and 24.04 (surajssd)
4424199 feat: Rename dcgm-exporter package for Azure Linux (surajssd)
f3fd888 test: Add OS-specific logic for GPU manager exporter packages (surajssd)
fc7b6fc Move epoch stripping outside OS-specific conditional (surajssd)
a9ac311 test: Improve DCGM exporter validation (surajssd)
98f0ee0 Add dcgm-exporter component config (surajssd)
4025aee fix: Escape dots in version for JSON path queries (surajssd)
8f2ae5d Add Azure Linux 3.0 version check for NVIDIA DCGM pkg installation (surajssd)
41e3448 feat: Consolidate GPU device plugin and DCGM (surajssd)
afd0c76 test: Standardize test function naming with underscores (surajssd)
9e49dbf Refactor function names to use camelCase convention (surajssd)
d47fcf7 tests: Consolidate NVIDIA DCGM Exporter tests into device plugin tests (surajssd)
2429388 Rename GPU test file to reflect broader scope (surajssd)
82006eb refactor: Narrow Nvidia package updates to DCGM pkg (surajssd)
edb3ab7 Add .claude to .gitignore (surajssd)
a5f33a6 Rename function from managed_gpu_package_list to managedGPUPackageList (surajssd)
831c621 Add CustomData (surajssd)
I guess we are assuming that once it is released in the Ubuntu repo, it will be available in AzureLinux?
I am not sure I understand. What do you mean?
My question was more about the fact that the changes in this file were only for Ubuntu. I was wondering whether we need to set up a Renovate datasource for the AzureLinux repos.
My understanding is that for AzureLinux, Renovate uses the registry URL provided as part of the components.json file to fetch new updates.
AgentBaker/.github/renovate.json, lines 625 to 628 in b805330
I don't think so, but I might be wrong here. @Devinwong could confirm.
Yes, for RPM_registry, it will capture the URL provided in components.json.
But I do have a suggestion. Since this PR adds multiple new rules and components, it would be great to test them, if you haven't yet, to confirm that Renovate can create PRs for them automatically and that they don't break other rules (Renovate will complain with warnings or errors).
[...] 4.4.1-1 to 4.4.0 and see if it really works. You will need to onboard your fork to https://developer.mend.io/ so that Renovate can detect your fork.
I often break Renovate when I introduce new rules, because Renovate relies heavily on renovate.json being correct, and the JSON can only be debugged at runtime.
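One cheap guard against the failure mode described above is a syntax check run locally before pushing. The following is only a minimal sketch, not part of this PR: `check_json` is a hypothetical helper name, and it assumes the config is strict JSON (no comments or trailing commas). It catches syntax errors only; it says nothing about whether Renovate accepts the rules themselves.

```python
import json


def check_json(text):
    """Return None if text parses as strict JSON, else 'line:col: message'.

    Hypothetical helper for illustration; it does not validate Renovate's
    schema, only that the document is well-formed JSON.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno are 1-based positions of the first syntax error.
        return f"{err.lineno}:{err.colno}: {err.msg}"
    return None


# A well-formed document passes; a trailing comma is reported with its position.
print(check_json('{"packageRules": []}'))
print(check_json('{"packageRules": [],}'))
```

To use it on the real file, read `.github/renovate.json` as text and pass the contents to `check_json`; a non-None result points at the first offending line and column.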
@Devinwong the Renovate config for the APT packages is working fine, but the RPM config has proven problematic. I will sync with you offline to move this forward.