feat: Vulkan vGPU support — auto-mount HAMi Vulkan implicit layer manifest into containers#118

Open
100milliongold wants to merge 7 commits into Project-HAMi:main from xiilab:pr/vulkan-upstream

Conversation

@100milliongold

Summary

Adds Vulkan vGPU support to volcano-vgpu-device-plugin. When a pod opts into Vulkan partitioning, the device-plugin auto-mounts the HAMi Vulkan implicit layer manifest into the container so the Vulkan loader picks up the layer that enforces per-pod memory limits on vkAllocateMemory.

This is the Volcano-side counterpart to Project-HAMi/HAMi#1803 and Project-HAMi/HAMi-core#182. The actual memory enforcement happens in the HAMi-core Vulkan layer; this PR just makes sure the manifest reaches the container.

Why

Vulkan workloads (Isaac Sim, ray tracing, GPU-accelerated rendering) currently bypass HAMi's per-container memory limit because allocations go through vkAllocateMemory / the NVIDIA Vulkan ICD, not the CUDA driver path. We hit this in production with Isaac Sim: Kit allocated several GB through Vulkan, ignored the requested partition, and OOM'd the host.

The fix needs three coordinated layers — HAMi-core's Vulkan layer (PR linked above), the admission webhook env injection (HAMi PR linked above), and this PR which makes the device-plugin actually deliver the implicit-layer manifest into the container.

What changed

| File / Area | Change |
| --- | --- |
| `libvgpu` submodule | Points at Project-HAMi/HAMi-core@vulkan-layer (companion PR #182). The submodule now ships `etc/vulkan/implicit_layer.d/hami.json` next to `libvgpu.so`. |
| `docker/Dockerfile.ubuntu20.04` | Installs `libvulkan-dev` in the `nvidia_builder` stage so the HAMi-core Vulkan layer can compile against the Vulkan headers; `COPY`s the `hami.json` manifest into the runtime image at `/k8s-vgpu/lib/nvidia/vulkan/implicit_layer.d/hami.json`. |
| `volcano-vgpu-device-plugin{,-cdi}.yml` | postStart hook changes `cp -f` → `cp -rf` so the new `vulkan/implicit_layer.d/` subdirectory is copied to the host (not just the top-level `.so` files). Backwards compatible: `cp -rf` handles plain files identically. |
| `pkg/plugin/vulkan.go` | New `buildVulkanManifestMount(hostHookPath)` helper. Returns a single `Mount{}` for `/etc/vulkan/implicit_layer.d/hami.json` when the host file exists; returns `nil` otherwise. Idempotent and side-effect-free on nodes without the manifest. |
| `pkg/plugin/server.go` | Appends the helper's mount to the `Allocate` response. |
| `pkg/plugin/server_vulkan_test.go` | TDD coverage: Present (manifest exists → mount appended), Absent (manifest missing → no-op). |
| `examples/vulkan-pod.yaml` | Minimal usage sample. |
| `doc/vulkan-vgpu.md` | User-facing guide (architecture, install, verification, troubleshooting). |
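The `buildVulkanManifestMount` helper described above can be sketched roughly as follows. This is a minimal reconstruction, not the PR's actual code: the local `Mount` struct stands in for the device-plugin API's `Mount` message, and the path layout follows the table above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Mount mirrors the shape of the device-plugin API's Mount message.
type Mount struct {
	HostPath      string
	ContainerPath string
	ReadOnly      bool
}

// buildVulkanManifestMount returns a read-only mount for the HAMi Vulkan
// implicit-layer manifest if it exists under hostHookPath, or nil otherwise.
// Returning nil keeps the Allocate response unchanged on nodes that do not
// ship the manifest.
func buildVulkanManifestMount(hostHookPath string) *Mount {
	host := filepath.Join(hostHookPath, "vulkan", "implicit_layer.d", "hami.json")
	if _, err := os.Stat(host); err != nil {
		return nil // manifest absent: no-op
	}
	return &Mount{
		HostPath:      host,
		ContainerPath: "/etc/vulkan/implicit_layer.d/hami.json",
		ReadOnly:      true,
	}
}

func main() {
	// Absent case: an empty temp dir has no manifest.
	dir, _ := os.MkdirTemp("", "vgpu")
	defer os.RemoveAll(dir)
	fmt.Println(buildVulkanManifestMount(dir) == nil) // true

	// Present case: create the manifest, then the mount is returned.
	sub := filepath.Join(dir, "vulkan", "implicit_layer.d")
	os.MkdirAll(sub, 0o755)
	os.WriteFile(filepath.Join(sub, "hami.json"), []byte("{}"), 0o644)
	m := buildVulkanManifestMount(dir)
	fmt.Println(m != nil && m.ContainerPath == "/etc/vulkan/implicit_layer.d/hami.json") // true
}
```

Because the helper is a pure function of the host filesystem state, the Present/Absent unit tests in `server_vulkan_test.go` can exercise it the same way from every allocation path.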

How it works

  1. The HAMi-core Vulkan layer (companion PR) hooks vkAllocateMemory and enforces the per-pod budget. The implicit-layer manifest is gated by enable_environment: HAMI_VULKAN_ENABLE=1.
  2. The HAMi admission webhook (companion PR) injects HAMI_VULKAN_ENABLE=1 and merges graphics into NVIDIA_DRIVER_CAPABILITIES for pods that carry the hami.io/vulkan: "true" annotation.
  3. This PR mounts the manifest into the container so the Vulkan loader can find it.

Pods without the annotation get neither the env nor the layer activation — the manifest's enable_environment guard means the layer doesn't load even if the file is present.
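For reference, a Vulkan implicit-layer manifest gated by `enable_environment` looks roughly like this. The field values below are illustrative only, not the actual contents of HAMi's `hami.json` (layer name, library path, and versions are assumptions); the point is that the loader activates the layer only when the gating variable is set:

```json
{
  "file_format_version": "1.0.0",
  "layer": {
    "name": "VK_LAYER_HAMI_vgpu_example",
    "type": "GLOBAL",
    "library_path": "/usr/local/vgpu/libvgpu_vk.so",
    "api_version": "1.3.0",
    "implementation_version": "1",
    "description": "Illustrative HAMi vGPU memory-limit layer manifest",
    "enable_environment": { "HAMI_VULKAN_ENABLE": "1" },
    "disable_environment": { "HAMI_VULKAN_DISABLE": "1" }
  }
}
```

With this shape, mounting the file into `/etc/vulkan/implicit_layer.d/` is harmless for pods that never receive `HAMI_VULKAN_ENABLE=1` from the webhook.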

Compatibility / Breaking changes

  • CUDA-only workloads: zero behavior change. The Vulkan layer is gated by both the manifest's enable_environment and the per-pod budget — neither activates without the pod opting in.
  • Existing deployments: the cp -f → cp -rf change in the postStart hook is backwards compatible; nodes that already have the previous lib directory get the new vulkan/ subdirectory added on the next pod restart.
  • Nodes without the Vulkan manifest: buildVulkanManifestMount returns nil so the Allocate response is unchanged. No pod-startup blocker.

Test plan

  • go test ./pkg/plugin/... — Vulkan unit tests (Present / Absent) pass.
  • make build succeeds with the updated submodule.
  • make image produces a runtime image containing /k8s-vgpu/lib/nvidia/vulkan/implicit_layer.d/hami.json.
  • Deploy + verify: pod with hami.io/vulkan: "true" has /etc/vulkan/implicit_layer.d/hami.json mounted; pod without the annotation does not (or has it mounted but the layer doesn't load, which has the same outcome).
  • E2E: Isaac Sim with nvidia.com/gpumem limit + the annotation; Kit boot log reports the configured partition size and the workload is held to it.

Notes for reviewers

  • The submodule update at this branch points at Project-HAMi/HAMi-core@vulkan-layer (PR #182). This PR cannot be merged until that one lands; happy to coordinate.
  • The pkg/plugin/vulkan.go helper is intentionally tiny so MIG / non-MIG / CDI paths can call it identically. (The HAMi PR #1803 follows the same pattern with its own helper.)
  • Happy to split into smaller pieces (submodule bump + Dockerfile / postStart hook / mount logic / docs) if that's easier to review.

Signed-off-by: Jea-Eok-Kim <je.kim@xiilab.com>
@hami-robot
Contributor

hami-robot Bot commented Apr 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: 100milliongold
Once this PR has been reviewed and has the lgtm label, please assign archlitchi for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hami-robot
Contributor

hami-robot Bot commented Apr 27, 2026

Welcome @100milliongold! It looks like this is your first PR to Project-HAMi/volcano-vgpu-device-plugin 🎉

@hami-robot hami-robot Bot added the size/L label Apr 27, 2026
The device-plugin advertises vgpu-memory at one device per MiB, which
on nodes with 40+ GiB GPUs (e.g. RTX 6000 Ada × 2 == ~92 K devices)
exceeds the kubelet's 4 MiB gRPC ListAndWatch receive limit. The
kubelet then drops the advertisement, and the node's Allocatable for
volcano.sh/vgpu-memory is reported as 0, while vgpu-number /
vgpu-cores are correct. The Volcano scheduler's capacity plugin
treats this as 'queue resource quota insufficient'.

The fix already exists in the codebase (gpuMemoryFactor in the
device-config ConfigMap, default 1), but operators hitting the 0
issue have nothing in the README or docs that points at it. This
commit adds:

- README Troubleshooting section: symptoms, root cause, fix steps, a
  unit-change warning, and a note that Allocate emits
  CUDA_DEVICE_MEMORY_LIMIT_<i> in MiB regardless, so hard memory
  enforcement (CUDA + Vulkan via HAMi-core) is unaffected.
- ConfigMap inline comment in both plugin yamls describing when to
  raise gpuMemoryFactor.
- doc/vulkan-vgpu.md: Vulkan side note explaining the unit mapping
  (vgpu-memory: 4 == 4 GiB at gpuMemoryFactor=1024).
- examples/vulkan-pod.yaml: clarify that the limit unit depends on
  gpuMemoryFactor.

No code change. Default behavior (gpuMemoryFactor=1) is preserved.

Signed-off-by: Jea-Eok-Kim <je.kim@xiilab.com>
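The unit arithmetic behind `gpuMemoryFactor` can be sketched as follows. This is an illustration of the scaling described in the commit above, not the plugin's actual code; the 48 GiB per-GPU figure is an assumption standing in for a large workstation GPU:

```go
package main

import "fmt"

// advertisedDevices returns how many vgpu-memory "devices" the plugin
// would advertise for a node, given total GPU memory in MiB and the
// gpuMemoryFactor from the device-config ConfigMap (MiB per device).
func advertisedDevices(totalMiB, gpuMemoryFactor int) int {
	return totalMiB / gpuMemoryFactor
}

func main() {
	total := 2 * 48 * 1024 // two 48 GiB GPUs, expressed in MiB (illustrative)
	fmt.Println(advertisedDevices(total, 1))    // 98304 devices: a huge ListAndWatch payload
	fmt.Println(advertisedDevices(total, 1024)) // 96 devices: one device per GiB
}
```

At gpuMemoryFactor=1024 a request of volcano.sh/vgpu-memory: 4 therefore means 4 GiB, which is the unit mapping the doc/vulkan-vgpu.md note spells out.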
@100milliongold
Author

Follow-up: bump libvgpu submodule again to pick up Step D cherry-picks

After Step D verification on ws-node074, two more libvgpu commits were cherry-picked into HAMi-core vulkan-layer (originally Step C Tasks 1+2; safely re-introduced under the libvgpu_vk.so split architecture):

  • a07d717 fix(vulkan): cache first next-gipa/gdpa + EnumerateDevice* via dispatch table
  • bdd5bbe fix(vulkan): GIPA/GDPA fallback to cached next when instance/device unknown

Without these, vkCreateDevice fails with VK_ERROR_LAYER_NOT_PRESENT because NVIDIA's vkEnumerateDeviceExtensionProperties returns that error when the loader queries it with our layer name. Path 4 of the Step D 4-path verification (vkAllocateMemory over-budget → OOM) needs vkCreateDevice to succeed first, so these hooks are required for production opt-in to work end-to-end.

What this PR does

  • Bump libvgpu submodule from previous Step D end (65930f4) to bdd5bbe.
  • Rebuild and tag the image as volcano-vgpu-device-plugin:vulkan-v3. After deploy, the DaemonSet's postStart cp -rf /k8s-vgpu/lib/nvidia/. /usr/local/vgpu/ will install the corrected libvgpu_vk.so on every GPU node.

Verification

On ws-node074 (isaac-launchable-0), the 4-path verification passed against an out-of-band bdd5bbe build (installed by manual scp to the host before the submodule bump); this PR just makes that build the one permanently shipped in the image.

Submodule SHA: bdd5bbe (HAMi-core vulkan-layer).

