
make kubelet plugin dir configurable #464

Merged
guptaNswati merged 5 commits into kubernetes-sigs:main from
guptaNswati:kubelet-dir-flags
Aug 20, 2025


@guptaNswati
Contributor

@guptaNswati guptaNswati commented Aug 13, 2025

Backport kubernetes-sigs/dra-example-driver#98 to make the kubelet directory paths configurable via Helm.

fixes: #339

Follow-ups:

Changes tested in local kind cluster.

$ git diff
helm upgrade -i --create-namespace --namespace nvidia-dra-driver-gpu nvidia-dra-driver-gpu ${PROJECT_DIR}/deployments/helm/nvidia-dra-driver-gpu \
     ${NVIDIA_DRIVER_ROOT:+--set nvidiaDriverRoot=${NVIDIA_DRIVER_ROOT}} \
     ${MASK_NVIDIA_DRIVER_PARAMS:+--set maskNvidiaDriverParams=${MASK_NVIDIA_DRIVER_PARAMS}} \
+    --set kubeletPlugin.kubeletRegistrarDirectoryPath=/var/lib/kubelet2/plugins_registry \
+    --set kubeletPlugin.kubeletPluginsDirectoryPath=/var/lib/kubelet2/plugins \

@@ -44,6 +44,7 @@ nodes:
     nodeRegistration:
       kubeletExtraArgs:
         v: "1"
+        root-dir: /var/lib/kubelet2
 - role: worker
   labels:
     node-role.x-k8s.io/worker: ""
@@ -53,6 +54,7 @@ nodes:
     nodeRegistration:
       kubeletExtraArgs:
         v: "1"
+        root-dir: /var/lib/kubelet2

kubectl get po -n nvidia-dra-driver-gpu 
NAME                                                READY   STATUS    RESTARTS   AGE
nvidia-dra-driver-gpu-controller-58b49ff77f-zf4f5   1/1     Running   0          6m10s
nvidia-dra-driver-gpu-kubelet-plugin-v67jv          2/2     Running   0          6m10s

nonblockinggrpcserver.go:90] "GRPC server started" logger="dra" endpoint="/var/lib/kubelet2/plugins/compute-domain.nvidia.com/dra.sock"
I0815 19:30:49.872525       1 nonblockinggrpcserver.go:90] "GRPC server started" logger="registrar" endpoint="/var/lib/kubelet2/plugins_registry/compute-domain.nvidia.com-reg.sock"

I0815 19:30:50.068397       1 nonblockinggrpcserver.go:90] "GRPC server started" logger="dra" endpoint="/var/lib/kubelet2/plugins/gpu.nvidia.com/dra.sock"
I0815 19:30:50.068501       1 nonblockinggrpcserver.go:90] "GRPC server started" logger="registrar" endpoint="/var/lib/kubelet2/plugins_registry/gpu.nvidia.com-reg.sock"

kubectl logs pod -n gpu-test2 
Defaulted container "ctr0" out of: ctr0, ctr1
GPU 0: NVIDIA GB200 (UUID: GPU-dcf75458-101f-9395-1d4c-7c78d15bd3c2)
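
For readers following along, the socket endpoints in the logs above appear to follow a simple layout under the two configured directories. The sketch below only illustrates that layout; `socketPaths` is a hypothetical helper written for this note, not a function from the driver or from `k8s.io/dynamic-resource-allocation`, and the naming scheme is inferred from the log lines rather than from the library source.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// socketPaths is a hypothetical helper (an assumption for illustration only).
// It mirrors the layout observed in the "GRPC server started" log lines:
//
//	<pluginsDir>/<driverName>/dra.sock
//	<registrarDir>/<driverName>-reg.sock
func socketPaths(registrarDir, pluginsDir, driverName string) (draSock, regSock string) {
	draSock = filepath.Join(pluginsDir, driverName, "dra.sock")
	regSock = filepath.Join(registrarDir, driverName+"-reg.sock")
	return draSock, regSock
}

func main() {
	// Directory values taken from the helm --set flags used in the test above.
	registrarDir := "/var/lib/kubelet2/plugins_registry"
	pluginsDir := "/var/lib/kubelet2/plugins"

	for _, name := range []string{"compute-domain.nvidia.com", "gpu.nvidia.com"} {
		dra, reg := socketPaths(registrarDir, pluginsDir, name)
		fmt.Println(dra)
		fmt.Println(reg)
	}
}
```

With the `kubelet2` values used in this test, the helper reproduces the four endpoints shown in the logs, which is a quick way to sanity-check a custom kubelet root directory before deploying.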

@copy-pr-bot

copy-pr-bot Bot commented Aug 13, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@guptaNswati
Contributor Author

guptaNswati commented Aug 14, 2025

The driver is still trying to register at the old path. Need to debug more...

+++ b/demo/clusters/kind/install-dra-driver-gpu.sh
@@ -27,6 +27,8 @@ kubectl label node -l node-role.x-k8s.io/worker --overwrite nvidia.com/gpu.prese
 helm upgrade -i --create-namespace --namespace nvidia-dra-driver-gpu nvidia-dra-driver-gpu ${PROJECT_DIR}/deployments/helm/nvidia-dra-driver-gpu \
     ${NVIDIA_DRIVER_ROOT:+--set nvidiaDriverRoot=${NVIDIA_DRIVER_ROOT}} \
     ${MASK_NVIDIA_DRIVER_PARAMS:+--set maskNvidiaDriverParams=${MASK_NVIDIA_DRIVER_PARAMS}} \
+    --set kubeletPlugin.kubeletRegistrarDirectoryPath=/var/lib/kubelet2/plugins_registry \
+    --set kubeletPlugin.kubeletPluginsDirectoryPath=/var/lib/kubelet2/plugins \

--- a/demo/clusters/kind/scripts/kind-cluster-config.yaml
+++ b/demo/clusters/kind/scripts/kind-cluster-config.yaml
@@ -44,6 +44,7 @@ nodes:
     nodeRegistration:
       kubeletExtraArgs:
         v: "1"
+        root-dir: /var/lib/kubelet2
 - role: worker
   labels:
     node-role.x-k8s.io/worker: ""
@@ -53,6 +54,7 @@ nodes:
     nodeRegistration:
       kubeletExtraArgs:
         v: "1"
+        root-dir: /var/lib/kubelet2

$ ps aux | grep kubelet | grep -v grep | grep root-dir
root     3872652  5.3  0.0 8109376 36864 ?       Ssl  21:47   0:15 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=172.18.0.2 --node-labels=node-role.x-k8s.io/control-plane= --pod-infra-container-image=registry.k8s.io/pause:3.10 --provider-id=kind://docker/k8s-dra-driver-gpu-cluster/k8s-dra-driver-gpu-cluster-control-plane --root-dir=/var/lib/kubelet2 --v=1 --runtime-cgroups=/system.slice/containerd.service

root@k8s-dra-driver-gpu-cluster-control-plane:/# cat /var/lib/kubelet/kubeadm-flags.env 
KUBELET_KUBEADM_ARGS="--container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=172.18.0.2 --node-labels=node-role.x-k8s.io/control-plane= --pod-infra-container-image=registry.k8s.io/pause:3.10 --provider-id=kind://docker/k8s-dra-driver-gpu-cluster/k8s-dra-driver-gpu-cluster-control-plane --root-dir=/var/lib/kubelet2 --v=1"

$ kubectl get po nvidia-dra-driver-gpu-kubelet-plugin-gtznm  -n nvidia-dra-driver-gpu  -o yaml | grep /var/lib
    - mountPath: /var/lib/kubelet2/plugins_registry
    - mountPath: /var/lib/kubelet2/plugins
    - mountPath: /var/lib/kubelet2/plugins_registry
    - mountPath: /var/lib/kubelet2/plugins
      path: /var/lib/kubelet2/plugins_registry
      path: /var/lib/kubelet2/plugins
    - mountPath: /var/lib/kubelet2/plugins_registry
    - mountPath: /var/lib/kubelet2/plugins
    - mountPath: /var/lib/kubelet2/plugins_registry
    - mountPath: /var/lib/kubelet2/plugins

Error: error creating driver: start registrar: start gRPC server: listen on "/var/lib/kubelet/plugins_registry/compute-domain.nvidia.com-reg.sock": listen unix /var/lib/kubelet/plugins_registry/compute-domain.nvidia.com-reg.sock: bind: no such file or directory

It's coming from ./vendor/k8s.io/dynamic-resource-allocation/kubeletplugin/draplugin.go, which has a hard-coded path.

@klueska klueska self-requested a review August 15, 2025 18:35
@nojnhuh
Contributor

nojnhuh commented Aug 15, 2025

I see a few things missing that are causing that error. These changes should get things further along: guptaNswati/k8s-dra-driver@kubelet-dir-flags...nojnhuh:k8s-dra-driver:pr/464

@guptaNswati
Contributor Author

guptaNswati commented Aug 15, 2025

Oh, I missed the envs, but why do we need the kubelet path for compute-domain-daemon? And at least this should not cause the socket registration error.

@nojnhuh
Contributor

nojnhuh commented Aug 15, 2025

Sorry, I got turned around among the compute-domain binaries. I see you already have the flags on the kubelet-plugin binary, which is where I meant to add that.

@guptaNswati
Contributor Author

Adding envs worked. Thank you @nojnhuh :)

Comment thread cmd/compute-domain-kubelet-plugin/computedomain.go Outdated
Comment thread deployments/helm/nvidia-dra-driver-gpu/values.yaml
Comment thread cmd/compute-domain-kubelet-plugin/main.go
Signed-off-by: Swati Gupta <swatig@nvidia.com>
Signed-off-by: Swati Gupta <swatig@nvidia.com>
Signed-off-by: Swati Gupta <swatig@nvidia.com>
Comment thread cmd/compute-domain-kubelet-plugin/computedomain.go Outdated
Comment thread cmd/compute-domain-kubelet-plugin/computedomain.go Outdated
Comment thread cmd/compute-domain-kubelet-plugin/driver.go Outdated
Comment thread cmd/compute-domain-kubelet-plugin/driver.go Outdated
Comment thread cmd/gpu-kubelet-plugin/device_state.go Outdated
Comment thread cmd/gpu-kubelet-plugin/driver.go Outdated
Comment thread cmd/gpu-kubelet-plugin/sharing.go Outdated
Comment thread cmd/gpu-kubelet-plugin/sharing.go
@klueska klueska added the feature issue/PR that proposes a new feature or functionality label Aug 19, 2025
Comment thread cmd/compute-domain-kubelet-plugin/computedomain.go Outdated
Comment thread cmd/compute-domain-kubelet-plugin/driver.go Outdated
Comment thread cmd/gpu-kubelet-plugin/driver.go Outdated
Comment thread cmd/gpu-kubelet-plugin/sharing.go Outdated
Comment thread cmd/gpu-kubelet-plugin/sharing.go Outdated
Contributor

@klueska klueska left a comment


Just a few small comments, but looking good overall.

Comment thread cmd/compute-domain-kubelet-plugin/computedomain.go Outdated
Comment thread cmd/compute-domain-kubelet-plugin/computedomain.go Outdated
Signed-off-by: Swati Gupta <swatig@nvidia.com>
Signed-off-by: Swati Gupta <swatig@nvidia.com>
},
&cli.StringFlag{
Name: "kubelet-registrar-directory-path",
Usage: "Absolute path to the directory where kubelet stores plugin registrations.",
Contributor

@jgehrcke jgehrcke Aug 20, 2025


Oh. I noted that in another PR: this is not where the kubelet stores plugin registrations. This is the directory where the kubelet looks for Unix domain sockets to discover plugins. A plugin registration is then performed through such a socket.

I'd rather have no help text than a misleading one. Feel free to act on this opinion as you wish (I am OK with a merge).

Contributor Author


Yes. See, this is how upstream describes it: https://github.com/kubernetes/dynamic-resource-allocation/blob/master/kubeletplugin/draplugin.go#L226

go doc --all k8s.io/dynamic-resource-allocation/kubeletplugin.RegistrarDirectoryPath
package kubeletplugin // import "k8s.io/dynamic-resource-allocation/kubeletplugin"

func RegistrarDirectoryPath(path string) Option
    RegistrarDirectoryPath sets the path to the directory where the kubelet
    expects to find registration sockets of plugins. Typically this is
    /var/lib/kubelet/plugins_registry with /var/lib/kubelet being the kubelet's
    data directory.

    This is also the default. Some Kubernetes clusters may use a different data
    directory. This path must be the same inside and outside of the driver's
    container. The directory must exist.

I am okay either way. I would prefer not to deviate from upstream, as a different description may cause more confusion.

Contributor


Let's leave it as is for now. Feel free to merge.

@guptaNswati guptaNswati merged commit c0046b4 into kubernetes-sigs:main Aug 20, 2025
7 checks passed

Labels

feature issue/PR that proposes a new feature or functionality

Development

Successfully merging this pull request may close these issues.

Expose kubelet plugin socket path(s) as configuration parameter(s)

4 participants