67 changes: 67 additions & 0 deletions README.md
@@ -234,6 +234,73 @@ vGPU_device_memory_limit_in_bytes{ctrname="cuda-container",deviceuuid="GPU-xxxx"
vGPU_device_memory_usage_in_bytes{ctrname="cuda-container",deviceuuid="GPU-xxxx",podname="hami-device",podnamespace="default",vdeviceid="0",zone="vGPU"} 2.109867008e+09
```

# Troubleshooting

## Node `volcano.sh/vgpu-memory` is `0` in `kubectl describe node`

The device-plugin advertises `volcano.sh/vgpu-memory` as one device per
MiB (controlled by `gpuMemoryFactor`, default `1`). On nodes with large
GPUs (e.g. two 46 GiB cards ≈ 92 K devices), the kubelet ↔ device-plugin
`ListAndWatch` gRPC message can exceed the kubelet's default 4 MiB
receive limit. The kubelet then drops the advertisement, and the node's
`Allocatable` for `volcano.sh/vgpu-memory` is reported as `0`, while the
other resources (`vgpu-number`, `vgpu-cores`) remain correct.
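The arithmetic can be sketched as follows. The per-device byte cost is an assumed ballpark (a UUID-sized ID string plus protobuf framing), not a measured figure; the point is the order of magnitude relative to the 4 MiB window:

```go
package main

import "fmt"

// advertisedDevices is the count of vgpu-memory devices the plugin
// announces: total GPU memory in MiB divided by gpuMemoryFactor.
func advertisedDevices(totalMiB, gpuMemoryFactor int) int {
	return totalMiB / gpuMemoryFactor
}

func main() {
	const bytesPerDevice = 50 // assumption: ID string + proto overhead
	totalMiB := 2 * 46 * 1024 // two 46 GiB cards

	n := advertisedDevices(totalMiB, 1)
	fmt.Printf("factor=1:    %d devices, ~%.1f MiB message\n",
		n, float64(n*bytesPerDevice)/(1<<20)) // exceeds the 4 MiB window
	n = advertisedDevices(totalMiB, 1024)
	fmt.Printf("factor=1024: %d devices, ~%.1f KiB message\n",
		n, float64(n*bytesPerDevice)/(1<<10)) // comfortably under it
}
```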

Symptoms:

```text
$ kubectl get node <gpu-node> -o jsonpath='{.status.allocatable}'
"volcano.sh/vgpu-cores": "200"
"volcano.sh/vgpu-memory": "0" # <- broken
"volcano.sh/vgpu-number": "20"
```

```text
# from the Volcano scheduler
queue resource quota insufficient: insufficient volcano.sh/vgpu-memory
```

### Fix: increase `gpuMemoryFactor`

Edit the device-plugin ConfigMap (`volcano-vgpu-device-config`) and
raise `gpuMemoryFactor` so each advertised device represents a larger
chunk of memory. For a 46 GiB card, `gpuMemoryFactor: 1024` reduces the
advertised count to ~45 devices/card (well within the 4 MiB gRPC
window).

```yaml
data:
  device-config.yaml: |-
    nvidia:
      ...
      gpuMemoryFactor: 1024 # 1 device == 1024 MiB
```

After the ConfigMap change, restart the device-plugin DaemonSet so the
new factor takes effect:

```bash
kubectl rollout restart -n kube-system ds/volcano-device-plugin
```

> **Important — pod yaml unit changes when `gpuMemoryFactor` is not 1.**
>
> The `volcano.sh/vgpu-memory` resource limit is interpreted as an
> integer count of advertised devices, so its meaning is
> `vgpu-memory × gpuMemoryFactor` MiB. With the default `gpuMemoryFactor: 1`,
> `vgpu-memory: 4000` requests 4000 MiB. With `gpuMemoryFactor: 1024`,
> the closest equivalent is `vgpu-memory: 4`, which requests
> 4 × 1024 = 4096 MiB.
>
> The Volcano queue's `capability.volcano.sh/vgpu-memory` (if set) must
> use the same unit. Update both the deployment limits and the queue
> capability when changing `gpuMemoryFactor`.
>
> The hard memory enforcement (CUDA + Vulkan via HAMi-core) is
> unaffected: the device-plugin's `Allocate` always emits
> `CUDA_DEVICE_MEMORY_LIMIT_<i>` in MiB by multiplying the requested
> count by `gpuMemoryFactor`.
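The conversion described above can be sketched in Go; the helper names here are ours for illustration, not part of the plugin API:

```go
package main

import "fmt"

// vgpuMemoryCount returns the value to put in the pod's
// volcano.sh/vgpu-memory limit for a desired partition size in MiB,
// given the cluster's gpuMemoryFactor.
func vgpuMemoryCount(desiredMiB, gpuMemoryFactor int) int {
	return desiredMiB / gpuMemoryFactor
}

// enforcedLimitMiB mirrors what Allocate exports as
// CUDA_DEVICE_MEMORY_LIMIT_<i>: the requested count times the factor.
func enforcedLimitMiB(count, gpuMemoryFactor int) int {
	return count * gpuMemoryFactor
}

func main() {
	fmt.Println(vgpuMemoryCount(4000, 1))    // factor=1:    vgpu-memory: 4000
	fmt.Println(vgpuMemoryCount(4096, 1024)) // factor=1024: vgpu-memory: 4
	fmt.Println(enforcedLimitMiB(4, 1024))   // -> 4096 MiB enforced
}
```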

# Issues and Contributing
[Checkout the Contributing document!](CONTRIBUTING.md)

117 changes: 117 additions & 0 deletions doc/vulkan-vgpu.md
@@ -0,0 +1,117 @@
# Vulkan vGPU Support

This device-plugin enforces memory partitioning for **Vulkan workloads** the same way it does for CUDA workloads. It is used together with the Volcano scheduler.

## How it works

1. **libvgpu (HAMi-core) Vulkan layer**: hooks `vkAllocateMemory` to enforce `CUDA_DEVICE_MEMORY_LIMIT_0`.
2. **device-plugin Allocate**: if `/usr/local/vgpu/vulkan/implicit_layer.d/hami.json` exists on the host, it is bind-mounted into the container as `/etc/vulkan/implicit_layer.d/hami.json`.
3. **HAMi mutating webhook (installed separately)**: checks for the pod annotation `hami.io/vulkan: "true"`, then injects the `HAMI_VULKAN_ENABLE=1` env and adds `graphics` to `NVIDIA_DRIVER_CAPABILITIES`.
4. **`enable_environment` guard**: the layer is loaded only when the manifest's `enable_environment: HAMI_VULKAN_ENABLE=1` matches. Pods without the annotation are unaffected.
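A Vulkan implicit-layer manifest with an `enable_environment` guard looks roughly like this. The layer name, library path, and version fields below are illustrative; the real values ship in the image's hami.json:

```json
{
  "file_format_version": "1.0.0",
  "layer": {
    "name": "VK_LAYER_HAMI_vgpu",
    "type": "GLOBAL",
    "library_path": "/usr/local/vgpu/libvgpu.so",
    "api_version": "1.3.0",
    "implementation_version": "1",
    "description": "HAMi-core vGPU memory partitioning layer",
    "enable_environment": {
      "HAMI_VULKAN_ENABLE": "1"
    },
    "disable_environment": {
      "HAMI_VULKAN_DISABLE": "1"
    }
  }
}
```

The `enable_environment` block is what makes the layer opt-in: the Vulkan loader activates an implicit layer carrying this guard only when the named variable is set to the given value in the process environment.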

## Installation (one-time)

### 1. Update the device-plugin

```bash
kubectl apply -f volcano-vgpu-device-plugin.yml
# or, for CDI mode:
# kubectl apply -f volcano-vgpu-device-plugin-cdi.yml
```

The device-plugin's postStart hook automatically copies the hami.json bundled in the image to `/usr/local/vgpu/vulkan/implicit_layer.d/` on the host.

### 2. Install the HAMi mutating webhook separately

```bash
helm repo add hami https://project-hami.github.io/HAMi
helm install hami-webhook hami/hami \
  --namespace kube-system \
  --set devicePlugin.enabled=false \
  --set scheduler.kubeScheduler.enabled=false \
  --set scheduler.extender.enabled=false \
  --set admissionWebhook.enabled=true
```

Only the webhook is enabled; the Volcano scheduler and the device-plugin stay as they are.

### 3. (Optional) Fallback manifest DaemonSet

For environments where the device-plugin cannot place the manifest on the host automatically at init:

```bash
kubectl apply -f volcano-vgpu-vulkan-manifest.yml
```

## Usage

Add the annotation `hami.io/vulkan: "true"` and an `nvidia.com/gpumem` resource limit to the pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    hami.io/vulkan: "true"
spec:
  schedulerName: volcano
  containers:
    - name: vulkan-app
      image: <image that uses Vulkan>
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 4000
```

Full example: `examples/vulkan-pod.yaml`

## Verification

Inside the container:

```bash
# 1. Check the injected env vars
env | grep -E '(HAMI_VULKAN|DRIVER_CAPABILITIES)'
# expected: HAMI_VULKAN_ENABLE=1, NVIDIA_DRIVER_CAPABILITIES=...,graphics

# 2. Check that the manifest file is mounted
ls /etc/vulkan/implicit_layer.d/hami.json

# 3. Check CUDA_DEVICE_MEMORY_LIMIT
env | grep CUDA_DEVICE_MEMORY_LIMIT
# expected: CUDA_DEVICE_MEMORY_LIMIT_0=4000m

# 4. Check the memory limit with a Vulkan tool (when running a Vulkan app)
# e.g. 'GPU Memory: <limit> MB' in the Isaac Sim Kit boot log
```

## Disabling

Without the annotation `hami.io/vulkan: "true"`, the webhook is a no-op. That is:
- the `HAMI_VULKAN_ENABLE` env is not injected
- the manifest's `enable_environment` guard does not match
- the Vulkan layer is not loaded
- regular CUDA pods behave exactly as before

## Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| Vulkan app ignores the memory limit | the webhook did not process the annotation | check with `kubectl get pod ... -o yaml` whether `HAMI_VULKAN_ENABLE` is in the env |
| `manifest file not found` | hami.json not placed on the host | check the DaemonSet pod logs, or `ls /usr/local/vgpu/vulkan/implicit_layer.d/` |
| `vk_icdNegotiateLoaderICDInterfaceVersion -3` | missing NVIDIA Vulkan ICD dependencies | include libGLX_nvidia, libEGL, and the X11 libraries in the container image |
| node reports `volcano.sh/vgpu-memory: 0`, scheduler reports `queue resource quota insufficient` | on large-GPU (40+ GiB) nodes the vgpu-memory device count exceeds the kubelet's 4 MiB gRPC limit | increase `gpuMemoryFactor` in the ConfigMap (e.g. to 1024); see the Troubleshooting section in the README |

## `vgpu-memory` unit change when using `gpuMemoryFactor`

If the operator raises `gpuMemoryFactor` in the ConfigMap above 1, the unit of a pod's `volcano.sh/vgpu-memory` limit changes to _chunks_. Vulkan partition enforcement itself is unaffected (the device-plugin's Allocate automatically sets the `CUDA_DEVICE_MEMORY_LIMIT_<i>` env to `chunks * factor` MiB), but users must be aware of the unit when writing pod yaml.

| `gpuMemoryFactor` | meaning of `vgpu-memory: 4000` in yaml | meaning of `vgpu-memory: 4` in yaml |
|---|---|---|
| 1 (default) | 4000 MiB | 4 MiB |
| 1024 | 4000 chunks × 1024 MiB ≈ 4 TiB (usually unschedulable) | 4 GiB |

In short, with `gpuMemoryFactor=1024`, request a 4 GiB partition as `vgpu-memory: 4`.

The Volcano Queue's `capability.volcano.sh/vgpu-memory` must use the same unit.
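For example, a Queue capped at 32 GiB of vGPU memory under `gpuMemoryFactor: 1024` would express the capability in chunks; the queue name and weight here are illustrative:

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: vulkan-queue   # illustrative name
spec:
  weight: 1
  capability:
    volcano.sh/vgpu-memory: "32"   # 32 chunks × 1024 MiB = 32 GiB
```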
3 changes: 2 additions & 1 deletion docker/Dockerfile.ubuntu20.04
@@ -31,7 +31,7 @@ RUN go install github.com/NVIDIA/mig-parted/cmd/nvidia-mig-parted@latest
FROM nvidia/cuda:12.2.0-devel-ubuntu20.04 AS nvidia_builder
ARG TARGETARCH
RUN apt-get update
RUN apt-get -y install wget openssl libssl-dev
RUN apt-get -y install wget openssl libssl-dev libvulkan-dev
RUN case "${TARGETARCH}" in \
"amd64") wget https://cmake.org/files/v3.19/cmake-3.19.8-Linux-x86_64.tar.gz ;; \
"arm64") wget https://cmake.org/files/v3.19/cmake-3.19.8-Linux-aarch64.tar.gz ;; \
@@ -55,5 +55,6 @@ COPY --from=builder /go/src/volcano.sh/devices/volcano-vgpu-monitor /usr/bin/vol
COPY --from=builder /go/bin/nvidia-mig-parted /usr/bin/nvidia-mig-parted
COPY --from=builder /go/src/volcano.sh/devices/lib/nvidia/ld.so.preload /k8s-vgpu/lib/nvidia/
COPY --from=nvidia_builder /libvgpu/build/libvgpu.so /k8s-vgpu/lib/nvidia/
COPY --from=nvidia_builder /libvgpu/etc/vulkan/implicit_layer.d/hami.json /k8s-vgpu/lib/nvidia/vulkan/implicit_layer.d/hami.json

ENTRYPOINT ["volcano-vgpu-device-plugin"]
34 changes: 34 additions & 0 deletions examples/vulkan-pod.yaml
@@ -0,0 +1,34 @@
# Example pod enabling HAMi Vulkan vGPU partitioning (Volcano scheduler).
#
# How it works:
# 1. annotation `hami.io/vulkan: "true"` → the HAMi mutating webhook injects
#    the HAMI_VULKAN_ENABLE=1 env and adds the graphics cap to
#    NVIDIA_DRIVER_CAPABILITIES
# 2. the device-plugin's Allocate automatically mounts the host's hami.json
#    into the container
# 3. libvgpu's (HAMi-core) vkAllocateMemory hook enforces the
#    nvidia.com/gpumem limit
#
# Installation prerequisites:
# - the vulkan-v1 image of volcano-vgpu-device-plugin is deployed
# - the HAMi mutating webhook is installed via a separate helm install
#   (see doc/vulkan-vgpu.md)
apiVersion: v1
kind: Pod
metadata:
  name: vulkan-vgpu-demo
  annotations:
    hami.io/vulkan: "true"
spec:
  schedulerName: volcano
  containers:
    - name: vulkan-app
      image: nvidia/cuda:12.2.0-runtime-ubuntu22.04
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1
          # nvidia.com/gpumem unit depends on the device-plugin ConfigMap's
          # gpuMemoryFactor (default 1). With factor=1, "4000" == 4000 MiB.
          # With factor=1024 (recommended for 40+ GiB cards to stay under
          # kubelet's 4 MiB gRPC ListAndWatch window), the closest
          # equivalent is "4" (4096 MiB). See README troubleshooting
          # section for details.
          nvidia.com/gpumem: 4000 # MiB at gpuMemoryFactor=1
          nvidia.com/gpucores: 50 # %
6 changes: 6 additions & 0 deletions pkg/plugin/server.go
@@ -459,6 +459,12 @@ func (plugin *nvidiaDevicePlugin) Allocate(ctx context.Context, reqs *pluginapi.
ReadOnly: true},
)
}
// Mount Vulkan implicit layer manifest so the HAMi Vulkan layer
// activates for pods that set HAMI_VULKAN_ENABLE=1 (done by the
// HAMi mutating webhook when the pod carries hami.io/vulkan="true").
// Skipped if the host file is absent to avoid blocking pod startup
// on nodes where the postStart copy has not yet completed.
response.Mounts = append(response.Mounts, buildVulkanManifestMount(hostHookPath)...)
}
responses.ContainerResponses = append(responses.ContainerResponses, response)
}
56 changes: 56 additions & 0 deletions pkg/plugin/server_vulkan_test.go
@@ -0,0 +1,56 @@
// Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package plugin

import (
	"os"
	"path/filepath"
	"testing"
)

func TestBuildVulkanManifestMount_Present(t *testing.T) {
	tmp := t.TempDir()

	manifestDir := filepath.Join(tmp, "vulkan", "implicit_layer.d")
	if err := os.MkdirAll(manifestDir, 0755); err != nil {
		t.Fatal(err)
	}
	manifestPath := filepath.Join(manifestDir, "hami.json")
	if err := os.WriteFile(manifestPath, []byte("{}"), 0644); err != nil {
		t.Fatal(err)
	}

	mounts := buildVulkanManifestMount(tmp)
	if len(mounts) != 1 {
		t.Fatalf("expected 1 mount, got %d", len(mounts))
	}
	if mounts[0].ContainerPath != "/etc/vulkan/implicit_layer.d/hami.json" {
		t.Errorf("unexpected ContainerPath: %s", mounts[0].ContainerPath)
	}
	if mounts[0].HostPath != manifestPath {
		t.Errorf("unexpected HostPath: %s (want %s)", mounts[0].HostPath, manifestPath)
	}
	if !mounts[0].ReadOnly {
		t.Error("expected ReadOnly=true")
	}
}

func TestBuildVulkanManifestMount_Absent(t *testing.T) {
	tmp := t.TempDir()
	mounts := buildVulkanManifestMount(tmp)
	if len(mounts) != 0 {
		t.Errorf("expected 0 mounts when manifest absent, got %d", len(mounts))
	}
}
48 changes: 48 additions & 0 deletions pkg/plugin/vulkan.go
@@ -0,0 +1,48 @@
// Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package plugin

import (
	"os"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// buildVulkanManifestMount returns a kubelet device-plugin Mount that exposes
// the HAMi Vulkan implicit layer manifest to the container. The manifest file
// is placed on the host by the device-plugin postStart lifecycle hook (which
// recursively copies /k8s-vgpu/lib/nvidia/. to HOOK_PATH). When the host file
// is absent the mount is skipped so we do not block pod startup on nodes that
// have not yet been populated.
//
// hostHookPath corresponds to the HOOK_PATH env (typically /usr/local/vgpu in
// this fork). The manifest path mirrors the directory layout shipped by the
// Dockerfile: <HOOK_PATH>/vulkan/implicit_layer.d/hami.json.
//
// Pods that opt into Vulkan partitioning by setting hami.io/vulkan="true"
// receive the layer activation env (HAMI_VULKAN_ENABLE=1) from the HAMi
// mutating webhook; the manifest's enable_environment guard then triggers the
// Vulkan layer load.
func buildVulkanManifestMount(hostHookPath string) []*pluginapi.Mount {
	vulkanManifestHost := hostHookPath + "/vulkan/implicit_layer.d/hami.json"
	if _, err := os.Stat(vulkanManifestHost); err != nil {
		return nil
	}
	return []*pluginapi.Mount{{
		ContainerPath: "/etc/vulkan/implicit_layer.d/hami.json",
		HostPath:      vulkanManifestHost,
		ReadOnly:      true,
	}}
}
11 changes: 10 additions & 1 deletion volcano-vgpu-device-plugin-cdi.yml
@@ -33,6 +33,15 @@ data:
deviceSplitCount: 10
deviceMemoryScaling: 1
deviceCoreScaling: 1
# gpuMemoryFactor: MiB-per-device granularity at which the plugin advertises
# vgpu-memory to kubelet. Default 1 == 1 MiB per device. On nodes
# with large GPUs (e.g. 40+ GiB), the resulting device count can
# exceed kubelet's 4 MiB ListAndWatch gRPC window, causing
# node Allocatable for volcano.sh/vgpu-memory to be reported as 0.
# Increase to a larger chunk (e.g. 1024 -> 1 device == 1 GiB) to
# stay under the gRPC window. NOTE: pod yaml volcano.sh/vgpu-memory
# then represents the count of these chunks (e.g. 4 == 4 GiB at
# gpuMemoryFactor=1024). See README troubleshooting section.
gpuMemoryFactor: 1
knownMigGeometries:
- models: [ "A30" ]
@@ -208,7 +217,7 @@ spec:
lifecycle:
postStart:
exec:
command: ["/bin/sh", "-c", "cp -f /k8s-vgpu/lib/nvidia/* /usr/local/vgpu/"]
command: ["/bin/sh", "-c", "cp -rf /k8s-vgpu/lib/nvidia/. /usr/local/vgpu/"]
name: volcano-device-plugin
env:
- name: NODE_NAME