fix: apply filterdevices configuration during device registration#117
fix: apply filterdevices configuration during device registration#117chendave wants to merge 1 commit intoProject-HAMi:mainfrom
Conversation
The filterdevice configuration defined in the configmap: volcano-vgpu-node-config could be read correctly but not applied during device registration. The feature was introduced by commit e7d4347 which add the support to filter GPU. The issue was introduced during the v0.19.0 sync (commit 621c67b) when the old vgpu framework was removed. Signed-off-by: Dave Chen <dave.jungler@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: chendave The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Welcome @chendave! It looks like this is your first PR to Project-HAMi/volcano-vgpu-device-plugin 🎉 |
|
cc @DSFans2014 |
| devices := make(DeviceMap) | ||
|
|
||
| err := b.VisitDevices(func(i int, gpu device.Device) error { | ||
| // Get UUID for filtering |
There was a problem hiding this comment.
@chendave thanks for your contribution.
And I have a question that why do we need to add filtering here?
There was a problem hiding this comment.
Hi, this is where the change is needed, I think it's where the device was detected by NVML and I will remove the change from "pkg/plugin/register.go".
There was a problem hiding this comment.
why not add filter during the registration phase? My concern is that if too many modifications are made in other places, it is very likely that some of them will be missed during the next upgrade.
There was a problem hiding this comment.
okay, I will revisit this later, might not able to push the update today, must have a way to detect the real physical index if go to the register phase.
| continue | ||
| } | ||
| klog.V(3).Infoln("device ", idx, "id=", dev.ID, "health=", dev.Health) | ||
| } |
There was a problem hiding this comment.
the code is problematic, I was intending to filter the device again (unnecessary), and the code was meant to move inside the loop below.
and the check here is unnecessary because, it will filter the device that is not meant to be filtered,
e.g. I have 3 GPU, and I want to filter the device with index = 0, the code in buildGPUDeviceMap already filter the device with index = 0, but the code here will loop the remaining devices, and filter the GPU with index = 1, since the idx here will still start from 0. This is wrong! I will remove those piece of changes.
| devices := make(DeviceMap) | ||
|
|
||
| err := b.VisitDevices(func(i int, gpu device.Device) error { | ||
| // Get UUID for filtering |
There was a problem hiding this comment.
Hi, this is where the change is needed, I think it's where the device was detected by NVML and I will remove the change from "pkg/plugin/register.go".
|
I will push an update by the end of the day. |
no bandwidth left today, should reach an agreement on where the fix should go first, will find time to have more investigation in the coming days. |
|
hi @chendave After discuss with @archlitchi , we felt it would be better to filter at |
The filterdevice configuration defined in the configmap: volcano-vgpu-node-config could be read correctly but not applied during device registration.
The feature was introduced by commit e7d4347 which add the support to filter GPU.
The issue was introduced during the v0.19.0 sync (commit 621c67b) when the old vgpu framework was removed.
This PR is tested by myself in my local environment.