Skip to content

Add DRA DaemonSet reconciler#1283

Merged
k8s-ci-robot merged 1 commit into
kubernetes-sigs:mainfrom
TomerNewman:MGMT-24470-dra-daemonset-reconciler
Jun 11, 2026
Merged

Add DRA DaemonSet reconciler#1283
k8s-ci-robot merged 1 commit into
kubernetes-sigs:mainfrom
TomerNewman:MGMT-24470-dra-daemonset-reconciler

Conversation

@TomerNewman

@TomerNewman TomerNewman commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Add DRA DaemonSet reconciler

Summary

Implements a DRAReconciler that manages DRA driver DaemonSets from Module.spec.dra, mirroring the DevicePluginReconciler pattern. Also improves the DevicePluginReconciler with role labeling, cleanup on spec.devicePlugin removal, and clearer naming.

Changes

  • internal/constants/constants.go: Add DRARoleLabelValue = "dra" and DevicePluginRoleLabelValue = "device-plugin" constants
  • internal/controllers/dra_reconciler.go: New DRA reconciler with helper and DaemonSet creator — creates DaemonSets with mandatory kubelet-plugins host-path volumes, DRA-specific container names, DaemonSetRole label, and status patching
  • internal/controllers/dra_reconciler_test.go: Comprehensive unit tests covering all reconcile paths, helper methods, and DaemonSet spec generation
  • internal/controllers/device_plugin_reconciler.go:
    • Fix getModuleDevicePluginDaemonSets to exclude DRA-labeled DaemonSets
    • Add DevicePluginRoleLabelValue label to device plugin DaemonSets (exclusion-based query kept for backward compat, with TODO(3.0) to switch)
    • Add cleanup logic when spec.devicePlugin is removed (delete DaemonSets + clear status)
    • Rename handleModuleDeletiondeleteDevicePluginDaemonSets for clarity
  • internal/controllers/device_plugin_reconciler_test.go: Tests for DRA exclusion, spec.devicePlugin nil cleanup, clearDevicePluginStatus, and updated label expectations

Testing

  • Unit tests: Full coverage of Reconcile (good flow, error flows, deletion, spec removal), handleDRA (create/update), deleteDRADaemonSets, moduleUpdateDRAStatus, clearDRAStatus, setDRAAsDesired (with/without init container, volumes, labels), getModuleDRADaemonSets, clearDevicePluginStatus, and device plugin nil cleanup paths
  • Integration tests: N/A
  • Coverage: 100% on all new behavioral functions

Acceptance Criteria

  • AC-1: DRAReconciler watches Module and owns DaemonSet resources
  • AC-2: Reconcile is a no-op when spec.dra == nil
  • AC-3: DaemonSet is created with GenerateName: mod.Name + "-dra-"
  • AC-4: DaemonSet auto-mounts kubelet-plugins and kubelet-plugins-registry host paths; user volumes appended after
  • AC-5: Container names are dra-init (init) and dra (main)
  • AC-6: DaemonSet pod labels include ModuleNameLabel, DaemonSetRole: DRARoleLabelValue
  • AC-7: DaemonSet nodeSelector includes GetKernelModuleReadyNodeLabel(ns, name)
  • AC-8: DaemonSet is updated when Module spec changes and deleted when spec.dra is removed or Module is deleted
  • AC-9: status.dra is patched mirroring moduleUpdateDevicePluginStatus
  • AC-10: Unit tests verify DaemonSet spec generation, early exit, create/update/delete paths, and status patching

@k8s-ci-robot

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 9, 2026
@netlify

netlify Bot commented Jun 9, 2026

Copy link
Copy Markdown

Deploy Preview for kubernetes-sigs-kmm ready!

Name Link
🔨 Latest commit 4f51daa
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kmm/deploys/6a29415321b56e00089cc849
😎 Deploy Preview https://deploy-preview-1283--kubernetes-sigs-kmm.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: TomerNewman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 9, 2026
@TomerNewman

Copy link
Copy Markdown
Collaborator Author

/cc @ybettan
/uncc @mresvanis
/assign @ybettan @yevgeny-shnaidman

@k8s-ci-robot k8s-ci-robot requested review from ybettan and removed request for mresvanis June 9, 2026 10:53
@codecov-commenter

codecov-commenter commented Jun 9, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 90.11628% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.55%. Comparing base (fa23a9b) to head (4f51daa).
⚠️ Report is 378 commits behind head on main.

Files with missing lines Patch % Lines
internal/controllers/dra_reconciler.go 90.19% 14 Missing and 1 partial ⚠️
internal/controllers/device_plugin_reconciler.go 89.47% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1283      +/-   ##
==========================================
- Coverage   79.09%   73.55%   -5.54%     
==========================================
  Files          51       67      +16     
  Lines        5109     4871     -238     
==========================================
- Hits         4041     3583     -458     
- Misses        882     1123     +241     
+ Partials      186      165      -21     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@TomerNewman TomerNewman marked this pull request as ready for review June 9, 2026 11:19
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 9, 2026
@k8s-ci-robot k8s-ci-robot requested a review from mresvanis June 9, 2026 11:19
@TomerNewman

Copy link
Copy Markdown
Collaborator Author

/uncc @mresvanis

@k8s-ci-robot k8s-ci-robot removed the request for review from mresvanis June 9, 2026 11:20
@TomerNewman TomerNewman force-pushed the MGMT-24470-dra-daemonset-reconciler branch from 591cd45 to 38c5484 Compare June 9, 2026 12:00
// remove the older version module loader daemonsets
for _, ds := range dsList.Items {
if ds.GetLabels()[constants.DaemonSetRole] != constants.ModuleLoaderRoleLabelValue {
role := ds.GetLabels()[constants.DaemonSetRole]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about adding Role label to the device plugin daemonset? This way you can add the Role label to opts := []client.ListOption{....}
and get the needed DS without afterwards going over the label.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not backward compatible, old DS will not be found after an upgrade

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does it matter? The code will run CreateOrPatch with the same name and namespace, and the DS will be updated accordingly, and will be re-found in the next loop. Is there something catch that i am missing?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DS is created with GenerateName, not a fixed name. So CreateOrPatch would create a new DS instead of updating the old one. The old DS would remain orphaned — never found by the query, never garbage collected.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updating - we will add the label now but quarry like it was implemented before, when 3.0 release will come, we will quarry with this new label.

Comment thread internal/controllers/dra_reconciler.go Outdated
type draReconcilerHelperAPI interface {
getModuleDRADaemonSets(ctx context.Context, name, namespace string) ([]appsv1.DaemonSet, error)
handleDRA(ctx context.Context, mod *kmmv1beta1.Module, existingDRADS []appsv1.DaemonSet) error
handleModuleDeletion(ctx context.Context, existingDRADS []appsv1.DaemonSet) error

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used to delete DRA DS both when the Module is deleted, and when the Module is updated and its DRA field is set to nil. WDYT about renaming it to: deleteDRADaemonset, or something like that?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I named it in puepose in order to be the same as we do it in DevicePlugin, but since you are right I renamed it both in DRA and in DevicePlugin controllers.

@yevgeny-shnaidman

Copy link
Copy Markdown
Contributor

we should also add a code to the DevicePluginReconciler, that deletes the DevicePlugin DS in case device plugin is not set in Module (the same way as you do in DRA reconciler). This should cover the case where user switches between DRA and DevicePlugin

@yevgeny-shnaidman

Copy link
Copy Markdown
Contributor

what about DeviceResource? Where will it be reconciled?

@linux-foundation-easycla

linux-foundation-easycla Bot commented Jun 9, 2026

Copy link
Copy Markdown

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: TomerNewman / name: TomerNewman (4f51daa)

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 9, 2026
@TomerNewman TomerNewman force-pushed the MGMT-24470-dra-daemonset-reconciler branch from 8c6827f to 86d6c40 Compare June 9, 2026 14:05
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jun 9, 2026
@TomerNewman TomerNewman force-pushed the MGMT-24470-dra-daemonset-reconciler branch 2 times, most recently from 39f64db to a8c3014 Compare June 9, 2026 14:15
@TomerNewman

Copy link
Copy Markdown
Collaborator Author

what about DeviceResource? Where will it be reconciled?

DeviceClass do you mean?

@yevgeny-shnaidman

Copy link
Copy Markdown
Contributor

what about DeviceResource? Where will it be reconciled?

DeviceClass do you mean?

yes :)

@TomerNewman TomerNewman force-pushed the MGMT-24470-dra-daemonset-reconciler branch 2 times, most recently from 794e6ea to 939c546 Compare June 10, 2026 10:46
@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 10, 2026
Implement a DRAReconciler that manages DRA driver DaemonSets from
Module spec.dra, mirroring the DevicePluginReconciler pattern. The
reconciler creates DaemonSets with mandatory kubelet-plugins and
kubelet-plugins-registry host-path volumes, DRA-specific container
names, and a DaemonSetRole label to distinguish DRA DaemonSets.

Status patching reports targeted nodes and available pods in
status.dra, and clearing status.dra when spec.dra is removed.

Also fixes DevicePluginReconciler to exclude DRA-labeled DaemonSets
from its device-plugin list, preventing cross-reconciler interference.

Additionally addresses review feedback:
- Rename handleModuleDeletion to deleteDRADaemonSets and
  deleteDevicePluginDaemonSets for clarity
- Add DevicePluginRoleLabelValue label to device plugin DaemonSets;
  the exclusion-based query is kept for backward compatibility with
  a TODO(3.0) to switch to direct role-label querying
- Add cleanup logic to DevicePluginReconciler to delete existing
  DaemonSets and clear status when spec.devicePlugin is removed
@TomerNewman TomerNewman force-pushed the MGMT-24470-dra-daemonset-reconciler branch from 939c546 to 4f51daa Compare June 10, 2026 10:49
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jun 10, 2026
}
}

func (drh *draReconcilerHelper) getModuleDRADaemonSets(ctx context.Context, name, namespace string) ([]appsv1.DaemonSet, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we should see more then 1 DRA daemoset here. lets return the pointer to the DS, instead of slice

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same can implemented also for DevicePlugin

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, DRA only creates a single DaemonSet per Module — you're right about that. However, the next phase of DRA work will add ordered upgrade support, mirroring DevicePlugin's existing mechanism. During ordered upgrades, two DRA DaemonSets will coexist (old version kept until DesiredNumberScheduled == 0, then garbage collected) — same as DevicePlugin does today via getExistingDSFromVersion + garbageCollect.

Keeping the slice now avoids refactoring it back when ordered upgrades are implemented. For the same reason, DevicePlugin should also stay as a slice — it already uses multiple DaemonSets during version transitions.

@yevgeny-shnaidman

Copy link
Copy Markdown
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 11, 2026
@k8s-ci-robot k8s-ci-robot merged commit aa2e9ba into kubernetes-sigs:main Jun 11, 2026
23 checks passed
@TomerNewman TomerNewman deleted the MGMT-24470-dra-daemonset-reconciler branch June 11, 2026 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants