Restart driver pods in place when driver config is unchanged

A patch chart upgrade can change only cosmetic pod-template metadata
(e.g. the helm.sh/chart label) without changing the driver itself. The
upgrade controller keys on the DaemonSet's controller revision hash, so
such a change still cordons the node, evicts running GPU workloads, and
drains the node -- for no driver benefit.

Register a RestartOnlyPredicate on the upgrade state manager (from the
UpgradeReconciler) that compares DRIVER_CONFIG_DIGEST -- a hash of the
install-relevant driver config, already stamped on the driver pod
template -- between the running pod and the desired DaemonSet. When the
digests match, the driver pod is rolled in place without cordon,
eviction, or drain; the driver fast-path keeps the kernel modules loaded
across the restart, so running GPU workloads are not disrupted. A missing
or differing digest falls back to the full upgrade flow.
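
The routing decision described above can be sketched roughly as follows.
This is a minimal illustration, not the code in this change: EnvVar and
Container are simplified stand-ins for the Kubernetes core/v1 types, and
digestFrom and restartOnly are hypothetical names. Only the
DRIVER_CONFIG_DIGEST variable name comes from this commit.

```go
package main

// digestEnvName is the env var named in this commit message.
const digestEnvName = "DRIVER_CONFIG_DIGEST"

// EnvVar and Container are simplified stand-ins (assumption) for the
// corresponding Kubernetes core/v1 types.
type EnvVar struct{ Name, Value string }

type Container struct {
	Name string
	Env  []EnvVar
}

// digestFrom returns the first non-empty digest found in the given
// containers. The boolean is false when no container carries one.
func digestFrom(containers []Container) (string, bool) {
	for _, c := range containers {
		for _, e := range c.Env {
			if e.Name == digestEnvName && e.Value != "" {
				return e.Value, true
			}
		}
	}
	return "", false
}

// restartOnly reports whether the running pod can be rolled in place.
// It is true only when both sides carry a digest and the digests match;
// a missing or differing digest selects the full upgrade flow.
func restartOnly(running, desired []Container) bool {
	r, okRunning := digestFrom(running)
	d, okDesired := digestFrom(desired)
	return okRunning && okDesired && r == d
}
```

In this sketch a missing digest on either side is treated the same as a
mismatch, so pods created before the digest was stamped still take the
full cordon, evict, and drain path.
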
The digest env name and a reader for it live in internal/config beside
the digest definition; the restart-only routing decision is a method on
the upgrade controller, registered in SetupWithManager. This change
depends on the RestartOnlyPredicate hook in k8s-operator-libs; the
vendored dependency bump follows once that hook is released.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>