Add DRA DaemonSet reconciler#1283
Conversation
|
Skipping CI for Draft Pull Request. |
✅ Deploy Preview for kubernetes-sigs-kmm ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: TomerNewman The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/cc @ybettan |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1283 +/- ##
==========================================
- Coverage 79.09% 73.55% -5.54%
==========================================
Files 51 67 +16
Lines 5109 4871 -238
==========================================
- Hits 4041 3583 -458
- Misses 882 1123 +241
+ Partials 186 165 -21 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
/uncc @mresvanis |
591cd45 to
38c5484
Compare
| // remove the older version module loader daemonsets | ||
| for _, ds := range dsList.Items { | ||
| if ds.GetLabels()[constants.DaemonSetRole] != constants.ModuleLoaderRoleLabelValue { | ||
| role := ds.GetLabels()[constants.DaemonSetRole] |
There was a problem hiding this comment.
what about adding Role label to the device plugin daemonset? This way you can add the Role label to opts := []client.ListOption{....}
and get the needed DS without afterwards going over the label.
There was a problem hiding this comment.
its not backward compatible, old DS will not be found after an upgrade
There was a problem hiding this comment.
why does it matter? The code will run CreateOrPatch with the same name and namespace, and the DS will be updated accordingly, and will be re-found in the next loop. Is there something catch that i am missing?
There was a problem hiding this comment.
The DS is created with GenerateName, not a fixed name. So CreateOrPatch would create a new DS instead of updating the old one. The old DS would remain orphaned — never found by the query, never garbage collected.
There was a problem hiding this comment.
updating - we will add the label now but quarry like it was implemented before, when 3.0 release will come, we will quarry with this new label.
| type draReconcilerHelperAPI interface { | ||
| getModuleDRADaemonSets(ctx context.Context, name, namespace string) ([]appsv1.DaemonSet, error) | ||
| handleDRA(ctx context.Context, mod *kmmv1beta1.Module, existingDRADS []appsv1.DaemonSet) error | ||
| handleModuleDeletion(ctx context.Context, existingDRADS []appsv1.DaemonSet) error |
There was a problem hiding this comment.
This function is used to delete DRA DS both when the Module is deleted, and when the Module is updated and its DRA field is set to nil. WDYT about renaming it to: deleteDRADaemonset, or something like that?
There was a problem hiding this comment.
I named it in puepose in order to be the same as we do it in DevicePlugin, but since you are right I renamed it both in DRA and in DevicePlugin controllers.
|
we should also add a code to the DevicePluginReconciler, that deletes the DevicePlugin DS in case device plugin is not set in Module (the same way as you do in DRA reconciler). This should cover the case where user switches between DRA and DevicePlugin |
|
what about DeviceResource? Where will it be reconciled? |
|
|
8c6827f to
86d6c40
Compare
39f64db to
a8c3014
Compare
DeviceClass do you mean? |
yes :) |
794e6ea to
939c546
Compare
Implement a DRAReconciler that manages DRA driver DaemonSets from Module spec.dra, mirroring the DevicePluginReconciler pattern. The reconciler creates DaemonSets with mandatory kubelet-plugins and kubelet-plugins-registry host-path volumes, DRA-specific container names, and a DaemonSetRole label to distinguish DRA DaemonSets. Status patching reports targeted nodes and available pods in status.dra, and clearing status.dra when spec.dra is removed. Also fixes DevicePluginReconciler to exclude DRA-labeled DaemonSets from its device-plugin list, preventing cross-reconciler interference. Additionally addresses review feedback: - Rename handleModuleDeletion to deleteDRADaemonSets and deleteDevicePluginDaemonSets for clarity - Add DevicePluginRoleLabelValue label to device plugin DaemonSets; the exclusion-based query is kept for backward compatibility with a TODO(3.0) to switch to direct role-label querying - Add cleanup logic to DevicePluginReconciler to delete existing DaemonSets and clear status when spec.devicePlugin is removed
939c546 to
4f51daa
Compare
| } | ||
| } | ||
|
|
||
| func (drh *draReconcilerHelper) getModuleDRADaemonSets(ctx context.Context, name, namespace string) ([]appsv1.DaemonSet, error) { |
There was a problem hiding this comment.
i don't think we should see more then 1 DRA daemoset here. lets return the pointer to the DS, instead of slice
There was a problem hiding this comment.
The same can implemented also for DevicePlugin
There was a problem hiding this comment.
Currently, DRA only creates a single DaemonSet per Module — you're right about that. However, the next phase of DRA work will add ordered upgrade support, mirroring DevicePlugin's existing mechanism. During ordered upgrades, two DRA DaemonSets will coexist (old version kept until DesiredNumberScheduled == 0, then garbage collected) — same as DevicePlugin does today via getExistingDSFromVersion + garbageCollect.
Keeping the slice now avoids refactoring it back when ordered upgrades are implemented. For the same reason, DevicePlugin should also stay as a slice — it already uses multiple DaemonSets during version transitions.
|
/lgtm |
Add DRA DaemonSet reconciler
Summary
Implements a
DRAReconcilerthat manages DRA driver DaemonSets fromModule.spec.dra, mirroring theDevicePluginReconcilerpattern. Also improves theDevicePluginReconcilerwith role labeling, cleanup onspec.devicePluginremoval, and clearer naming.Changes
internal/constants/constants.go: AddDRARoleLabelValue = "dra"andDevicePluginRoleLabelValue = "device-plugin"constantsinternal/controllers/dra_reconciler.go: New DRA reconciler with helper and DaemonSet creator — creates DaemonSets with mandatory kubelet-plugins host-path volumes, DRA-specific container names,DaemonSetRolelabel, and status patchinginternal/controllers/dra_reconciler_test.go: Comprehensive unit tests covering all reconcile paths, helper methods, and DaemonSet spec generationinternal/controllers/device_plugin_reconciler.go:getModuleDevicePluginDaemonSetsto exclude DRA-labeled DaemonSetsDevicePluginRoleLabelValuelabel to device plugin DaemonSets (exclusion-based query kept for backward compat, withTODO(3.0)to switch)spec.devicePluginis removed (delete DaemonSets + clear status)handleModuleDeletion→deleteDevicePluginDaemonSetsfor clarityinternal/controllers/device_plugin_reconciler_test.go: Tests for DRA exclusion,spec.devicePluginnil cleanup,clearDevicePluginStatus, and updated label expectationsTesting
Reconcile(good flow, error flows, deletion, spec removal),handleDRA(create/update),deleteDRADaemonSets,moduleUpdateDRAStatus,clearDRAStatus,setDRAAsDesired(with/without init container, volumes, labels),getModuleDRADaemonSets,clearDevicePluginStatus, and device plugin nil cleanup pathsAcceptance Criteria
DRAReconcilerwatchesModuleand ownsDaemonSetresourcesspec.dra == nilGenerateName: mod.Name + "-dra-"kubelet-pluginsandkubelet-plugins-registryhost paths; user volumes appended afterdra-init(init) anddra(main)ModuleNameLabel,DaemonSetRole: DRARoleLabelValuenodeSelectorincludesGetKernelModuleReadyNodeLabel(ns, name)spec.drais removed or Module is deletedstatus.drais patched mirroringmoduleUpdateDevicePluginStatus