Add DRA support to ordered-upgrade state machine #1291

Merged
kubernetes-prow[bot] merged 1 commit into kubernetes-sigs:main from TomerNewman:MGMT-24474-dra-version-label-action-table on Jun 24, 2026
Conversation

@TomerNewman

@TomerNewman TomerNewman commented Jun 22, 2026

Collaborator

Summary

  • Extend the NodeLabelModuleVersionReconciler to handle DRA modules through the same ordered-upgrade label action table used by device plugin modules, adding a new version-dra node label prefix
  • Parameterize getLabelAndAction with a role-specific label function so the existing state machine works for both device plugin and DRA without duplication
  • Make the DRA reconciler version-aware: include version-dra in DaemonSet node selectors, create per-version DaemonSets (to avoid immutable selector errors on upgrade), and garbage collect old-version DaemonSets
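The parameterization in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `labelNameFunc`, `roleVersionState`, the prefix strings, and the `prefix.namespace.name` key layout are all assumptions made for the example.

```go
package main

// labelNameFunc builds the node label key that tracks one role's version
// for a given module. Passing it in lets one state machine serve both roles.
type labelNameFunc func(namespace, name string) string

// Assumed prefixes and key layout; the real constants may differ.
const (
	moduleVersionPrefix       = "kmm.node.kubernetes.io/version-module"
	devicePluginVersionPrefix = "beta.kmm.node.kubernetes.io/version-device-plugin"
	draVersionPrefix          = "beta.kmm.node.kubernetes.io/version-dra"
)

// labelName joins a prefix with the module's namespace and name.
func labelName(prefix, namespace, name string) string {
	return prefix + "." + namespace + "." + name
}

func devicePluginVersionLabel(namespace, name string) string {
	return labelName(devicePluginVersionPrefix, namespace, name)
}

func draVersionLabel(namespace, name string) string {
	return labelName(draVersionPrefix, namespace, name)
}

// roleVersionState returns the role's label key, the version currently
// recorded on the node for that role, and whether it equals the module
// version label. Two absent labels compare as equal.
func roleVersionState(nodeLabels map[string]string, namespace, name string,
	roleLabel labelNameFunc) (key, version string, inSync bool) {
	key = roleLabel(namespace, name)
	version = nodeLabels[key]
	target := nodeLabels[labelName(moduleVersionPrefix, namespace, name)]
	return key, version, version == target
}
```

A caller would pass `draVersionLabel` for a DRA module and `devicePluginVersionLabel` otherwise; the comparison logic itself does not need to know which role it is handling.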

Changes

  • internal/constants/constants.go — new DRAVersionLabelPrefix constant
  • internal/utils/kmmlabels.go — new GetDRAVersionLabelName, IsDRAVersionLabel; refactored GetNodesVersionLabels to use IsVersionLabel
  • internal/controllers/node_label_module_version_reconciler.go — added isDRA/draVersionLabel fields, setModulesDRAStatus (with graceful fallback when Module CR is deleted), getDRAPods via shared getDaemonSetPods helper, resolveLabel for role dispatch
  • internal/controllers/dra_reconciler.go — version label in DaemonSet labels + node selector, getExistingDRADSFromVersion, handleDRA version-aware creation, garbageCollectDRADaemonSets
  • Tests updated across all changed files with full DRA path coverage
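The garbage collection of old-version DaemonSets could look roughly like the sketch below. It uses a plain struct as a stand-in for a Kubernetes DaemonSet so that it stays self-contained. The `daemonSet` type, `staleDaemonSets`, and the rule that a DaemonSet is only collected once it no longer targets any node are assumptions for illustration, not a description of `garbageCollectDRADaemonSets`.

```go
package main

// daemonSet is a minimal stand-in for the fields of a Kubernetes DaemonSet
// that matter for this sketch (hypothetical type).
type daemonSet struct {
	Name                   string
	Labels                 map[string]string
	DesiredNumberScheduled int32 // nodes the DaemonSet still wants to run on
}

// staleDaemonSets returns the names of DaemonSets that carry the version
// label with a value other than currentVersion and that no longer target
// any node. DaemonSets without the version label are left alone.
func staleDaemonSets(all []daemonSet, versionLabelKey, currentVersion string) []string {
	var stale []string
	for _, ds := range all {
		version, versioned := ds.Labels[versionLabelKey]
		if !versioned || version == currentVersion {
			continue
		}
		// Assumed safety rule: keep an old-version DaemonSet while nodes
		// still select it, so pods are not removed mid-upgrade.
		if ds.DesiredNumberScheduled == 0 {
			stale = append(stale, ds.Name)
		}
	}
	return stale
}
```

Creating one DaemonSet per version, rather than patching a single one, sidesteps the fact that a DaemonSet's label selector cannot be changed after creation; the old object is simply deleted once nothing schedules onto it.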

Test plan

  • Unit tests: 385 controller specs, 44 utils specs — all passing
  • go vet and go build ./... clean
  • E2E on Minikube: deployed v1 DRA module (kmm_ci_a), upgraded to v2 (kmm_ci_b) via version-module label change, verified full label transition sequence, DRA DaemonSet lifecycle (new DS created, old GC'd), and kernel module swap via lsmod — zero controller errors

@netlify

netlify Bot commented Jun 22, 2026


Deploy Preview for kubernetes-sigs-kmm ready!

Name Link
🔨 Latest commit f23fc7d
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kmm/deploys/6a3bb05f2022a300084b2da3
😎 Deploy Preview https://deploy-preview-1291--kubernetes-sigs-kmm.netlify.app

@kubernetes-prow


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: TomerNewman


@kubernetes-prow kubernetes-prow Bot added labels on Jun 22, 2026: approved (Indicates a PR has been approved by an approver from all required OWNERS files.), cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.), size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.)
@codecov-commenter

codecov-commenter commented Jun 22, 2026


Codecov Report

❌ Patch coverage is 93.42105% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.74%. Comparing base (fa23a9b) to head (f23fc7d).
⚠️ Report is 387 commits behind head on main.

Files with missing lines                                    Patch %   Lines
internal/controllers/dra_reconciler.go                      94.44%    1 Missing and 1 partial ⚠️
internal/utils/kmmlabels.go                                 66.66%    2 Missing ⚠️
...ontrollers/node_label_module_version_reconciler.go       96.77%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1291      +/-   ##
==========================================
- Coverage   79.09%   73.74%   -5.36%     
==========================================
  Files          51       67      +16     
  Lines        5109     5054      -55     
==========================================
- Hits         4041     3727     -314     
- Misses        882     1155     +273     
+ Partials      186      172      -14     


@TomerNewman

Collaborator Author

/retest

@TomerNewman

Collaborator Author

/assign @ybettan @yevgeny-shnaidman

fieldSelector := client.MatchingFields{"spec.nodeName": nodeName}
labelSelector := client.HasLabels{constants.ModuleNameLabel}
err := nlmvha.client.List(ctx, &kmmPodsList, labelSelector, fieldSelector)
return nlmvha.getDaemonSetPods(ctx, nodeName, client.HasLabels{constants.ModuleNameLabel})

Contributor


I think we will have a role label in DevicePlugin too.

@yevgeny-shnaidman

Contributor

Suggestion:
From the node_label_module_version reconciler's point of view, it does not matter what is running: DRA or DevicePlugin.
Since they are mutually exclusive, I suggest we unify the labels:

  1. Use one label for both: beta.kmm.node.kubernetes.io/version-dra-device-plugin (or something else).
  2. In the reconciler, only the Pod lookup needs to change (it should now check for DevicePlugin or DRA pods).
  3. The state machine should use the new label, and the labelActionKey field devicePlugin should be renamed.
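The unified-label suggestion could be sketched as below. Only the label name comes from the comment above; the `pod` struct, the `roleLabelKey` key, and the role values `device-plugin` and `dra` are hypothetical placeholders for whatever the pods actually carry.

```go
package main

// pod is a minimal stand-in for the fields of a Kubernetes Pod used here
// (hypothetical type).
type pod struct {
	Name     string
	NodeName string
	Labels   map[string]string
}

const (
	// Label prefix proposed in the review comment, shared by both roles.
	unifiedVersionPrefix = "beta.kmm.node.kubernetes.io/version-dra-device-plugin"
	// Hypothetical label key carrying the pod's role.
	roleLabelKey = "example.com/role"
)

// versionedRolePods returns the pods on nodeName that belong to either the
// device plugin or the DRA role. Because the two roles are mutually
// exclusive per module, a single version label can track whichever is found.
func versionedRolePods(pods []pod, nodeName string) []pod {
	var out []pod
	for _, p := range pods {
		if p.NodeName != nodeName {
			continue
		}
		switch p.Labels[roleLabelKey] {
		case "device-plugin", "dra": // hypothetical role values
			out = append(out, p)
		}
	}
	return out
}
```

With this shape the state machine would only ever read `unifiedVersionPrefix`, and the role distinction would stay confined to the pod lookup.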

@TomerNewman TomerNewman force-pushed the MGMT-24474-dra-version-label-action-table branch from c47d3f3 to 7e4fda5 Compare June 24, 2026 08:31
Extend the NodeLabelModuleVersionReconciler to handle DRA modules
through the same ordered-upgrade label action table used by device
plugin modules. The reconciler now detects whether a module uses DRA
(via spec.dra) and routes label resolution to the version-dra label
instead of version-device-plugin, reusing the existing state machine
without modification.
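The role detection described in this commit message might be sketched as follows. The struct shapes, `versionLabelPrefixFor`, and the prefix strings are assumptions for the example; the only detail taken from the text is that a non-empty spec.dra selects the version-dra label instead of version-device-plugin.

```go
package main

// Hypothetical, heavily reduced stand-ins for the Module spec types.
type draSpec struct{}
type devicePluginSpec struct{}

type moduleSpec struct {
	DRA          *draSpec          // set when the module uses DRA
	DevicePlugin *devicePluginSpec // set when the module uses a device plugin
}

// Assumed label prefixes; the real constants may differ.
const (
	devicePluginVersionPrefix = "beta.kmm.node.kubernetes.io/version-device-plugin"
	draVersionPrefix          = "beta.kmm.node.kubernetes.io/version-dra"
)

// versionLabelPrefixFor picks the version label prefix for the module's role.
// DRA is checked first; the second result is false when the module has
// neither role, in which case there is no role label to manage.
func versionLabelPrefixFor(spec moduleSpec) (string, bool) {
	switch {
	case spec.DRA != nil:
		return draVersionPrefix, true
	case spec.DevicePlugin != nil:
		return devicePluginVersionPrefix, true
	default:
		return "", false
	}
}
```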
@TomerNewman TomerNewman force-pushed the MGMT-24474-dra-version-label-action-table branch from 7e4fda5 to f23fc7d Compare June 24, 2026 10:24
@yevgeny-shnaidman

Contributor

/lgtm

@kubernetes-prow kubernetes-prow Bot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 24, 2026
@kubernetes-prow kubernetes-prow Bot merged commit 8302e3f into kubernetes-sigs:main Jun 24, 2026
31 of 32 checks passed

Labels

  • approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • lgtm ("Looks good to me", indicates that a PR is ready to be merged.)
  • size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.)


4 participants