
Commit 9a8181d

ollama: WSL2 CDI fallback + AddDevice so the dev VM uses the NVIDIA GPU
Operator-flagged 2026-05-11: ollama in podman-MiOS-DEV loaded models on CPU
(`load_tensors: CPU model buffer size = 16676 MiB`) despite nvidia-smi inside
the dev VM working fine (`NVIDIA GeForce RTX 4090 ... CUDA Version: 13.1`).

Root causes:

1. mios-cdi-detect.service detected WSL2 + NVIDIA but bailed because
   nvidia-ctk isn't installed on podman-machine-os 6.0 (the dev VM base)
   and can't be cleanly layered. /etc/cdi/ stayed empty, so podman had no
   nvidia.com/gpu=all spec to attach.
2. usr/share/containers/systemd/ollama.container had no AddDevice= for the
   GPU. Even if a CDI spec had existed, the container started without the
   device class, so /dev/dxg + /usr/lib/wsl never made it inside.

Fixes:

mios-cdi-detect -- when HAS_NVIDIA && (VIRT=wsl || /dev/dxg) but nvidia-ctk
is missing, hand-roll /run/cdi/wsl2-nvidia.yaml with:

* deviceNodes: /dev/dxg
* mounts: /usr/lib/wsl rbind ro nosuid nodev (rbind is critical --
  /usr/lib/wsl/lib is an overlay sub-mount on the WSL2 host that doesn't
  propagate through a plain bind, so the container would otherwise see an
  empty /usr/lib/wsl/lib/)
* env: LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/nvidia/lib:...

ollama.container -- add AddDevice=nvidia.com/gpu=all so podman attaches
whichever CDI spec mios-cdi-detect emitted (nvidia.yaml on bare metal via
nvidia-ctk, or wsl2-nvidia.yaml on the dev VM). AMD/Intel hosts can
override via .container.d/ drop-ins.

Verified live on the operator's RTX 4090 install:

    ollama runner.go: inference compute library=CUDA name=CUDA0
    description="NVIDIA GeForce RTX 4090" compute=8.9 driver=13.1
    total="24.0 GiB" available="18.7 GiB"
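Before the fix, the failure mode above is diagnosable from inside the dev VM without root. A minimal POSIX-sh sketch (the paths are the ones named in this commit message; nothing else is assumed) that checks whether any CDI spec exists for AddDevice= to resolve, and whether the WSL2 GPU surface is visible:

```shell
#!/bin/sh
# Diagnose "ollama on CPU": is there a CDI spec for the device class
# to resolve, and is the WSL2 GPU surface present? Paths come from
# the commit message above.
found=0
for d in /etc/cdi /run/cdi; do          # podman's CDI spec directories
  for f in "$d"/*.yaml "$d"/*.json; do
    [ -e "$f" ] && { echo "CDI spec: $f"; found=1; }
  done
done
if [ "$found" -eq 0 ]; then
  echo "no CDI specs -- nvidia.com/gpu=all cannot resolve"
fi
# WSL2 GPU surface: paravirtual device node + host driver libraries
[ -e /dev/dxg ] && echo "/dev/dxg present"
[ -d /usr/lib/wsl/lib ] && echo "/usr/lib/wsl/lib present (libcuda.so lives here)"
true
```

On the broken dev VM this prints the "no CDI specs" line while both WSL2 checks pass, which is exactly the mismatch the commit addresses.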
1 parent 95471ae commit 9a8181d

2 files changed

Lines changed: 43 additions & 0 deletions


usr/libexec/mios/mios-cdi-detect

Lines changed: 32 additions & 0 deletions
@@ -76,6 +76,38 @@ if [[ "$HAS_NVIDIA" == 1 ]] && command -v nvidia-ctk >/dev/null 2>&1; then
       _log "nvidia: nvidia-ctk cdi generate failed (non-fatal)"
     fi
   fi
+elif [[ "$HAS_NVIDIA" == 1 && ( "$VIRT" == "wsl" || -e /dev/dxg ) ]]; then
+  # WSL2-specific fallback when nvidia-ctk isn't on the dev VM
+  # (podman-machine-os 6.0 doesn't ship it; the dev-VM overlay can't
+  # always install it cleanly). Hand-roll the CDI YAML using the
+  # standard WSL2 GPU surface: /dev/dxg device node + rbind of
+  # /usr/lib/wsl (which has libcuda.so + libcudadebugger.so + the
+  # nvidia-smi binary). rbind, NOT bind: /usr/lib/wsl/lib is an
+  # overlay sub-mount that doesn't propagate through a plain bind.
+  # Operator-flagged 2026-05-11: ollama loaded models on CPU
+  # (`load_tensors: CPU model buffer size = 16676 MiB`) because no
+  # CDI spec was generated and the Quadlet had no AddDevice for the
+  # GPU. After this YAML lands + the Quadlet adds AddDevice=
+  # nvidia.com/gpu=all, ollama detects the GPU via libcuda.so.1.1
+  # and `inference compute ... library=CUDA ... description=NVIDIA
+  # GeForce RTX 4090` shows in the journal.
+  out=/run/cdi/wsl2-nvidia.yaml
+  cat > "$out" <<'WSLCDI'
+cdiVersion: "0.6.0"
+kind: nvidia.com/gpu
+devices:
+  - name: all
+    containerEdits:
+      deviceNodes:
+        - path: /dev/dxg
+      mounts:
+        - hostPath: /usr/lib/wsl
+          containerPath: /usr/lib/wsl
+          options: ["ro", "nosuid", "nodev", "rbind"]
+      env:
+        - LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+WSLCDI
+  _log "nvidia: wrote $out (WSL2 hand-rolled CDI; nvidia-ctk unavailable on this distro)"
 elif [[ "$HAS_NVIDIA" == 1 ]]; then
   _log "nvidia: device present but nvidia-ctk missing -- install nvidia-container-toolkit"
 fi
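The hand-rolled spec can be sanity-checked offline, before podman ever reads it. A sketch that regenerates the same YAML into a temp file (the mktemp path is illustrative) and greps for the fields the device class depends on:

```shell
#!/bin/sh
# Regenerate the spec from the same heredoc mios-cdi-detect uses and
# verify the load-bearing fields are present.
out=$(mktemp /tmp/wsl2-nvidia.XXXXXX)
cat > "$out" <<'WSLCDI'
cdiVersion: "0.6.0"
kind: nvidia.com/gpu
devices:
  - name: all
    containerEdits:
      deviceNodes:
        - path: /dev/dxg
      mounts:
        - hostPath: /usr/lib/wsl
          containerPath: /usr/lib/wsl
          options: ["ro", "nosuid", "nodev", "rbind"]
      env:
        - LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
WSLCDI

# The device class podman resolves is "<kind>=<device name>", i.e.
# nvidia.com/gpu=all. rbind is the option that makes /usr/lib/wsl/lib
# (an overlay sub-mount on the WSL2 host) arrive non-empty.
grep -q '^kind: nvidia.com/gpu$' "$out" \
  && grep -q 'name: all' "$out" \
  && grep -q 'rbind' "$out" \
  && echo "spec OK: $out"
```

On a host with podman available, `podman run --device nvidia.com/gpu=all ...` would then attach these edits; the grep check is just the offline half.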

usr/share/containers/systemd/ollama.container

Lines changed: 11 additions & 0 deletions
@@ -23,6 +23,17 @@ ContainerName=mios-ollama
 Network=${MIOS_QUADLET_NETWORK:-mios.network}
 Network=ai-net.network
 AutoUpdate=registry
+# GPU passthrough. nvidia.com/gpu=all resolves to the CDI spec written
+# by mios-cdi-detect.service (Before=ollama.service). On bare metal
+# that's nvidia-ctk's generated /run/cdi/nvidia.yaml; on WSL2 it's the
+# hand-rolled /run/cdi/wsl2-nvidia.yaml that maps /dev/dxg + rbinds
+# /usr/lib/wsl. Operator-flagged 2026-05-11 (`load_tensors: CPU model
+# buffer size = 16676 MiB` -- ollama was CPU-only because no CDI spec
+# was attached). With this device line, ollama's runner.go reports
+# `inference compute library=CUDA name=CUDA0 description=NVIDIA
+# GeForce RTX 4090`. AMD/Intel hosts can swap the device class via
+# /etc/containers/systemd/ollama.container.d/ drop-in overrides.
+AddDevice=nvidia.com/gpu=all
 # Numeric UID/GID -- the upstream ollama/ollama image has no
 # `mios-ollama` user in its /etc/passwd, so a name-based User= lookup
 # fails with "unable to find user mios-ollama: no matching entries in
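The drop-in override mentioned in the comment could look like the sketch below. The `amd.com/gpu=all` class is hypothetical (it assumes a ROCm CDI spec exists on that host), the file name is illustrative, and a temp dir stands in for /etc/containers/systemd/ollama.container.d/ so the sketch is side-effect free:

```shell
#!/bin/sh
# Sketch: AMD-host drop-in that swaps the GPU device class.
# Assumptions: amd.com/gpu=all is hypothetical and needs a ROCm CDI
# spec on the host; clearing a list key with an empty assignment
# follows systemd drop-in semantics -- verify against your
# podman/Quadlet version. Real path:
#   /etc/containers/systemd/ollama.container.d/10-gpu.conf
dropin_dir=$(mktemp -d)
cat > "$dropin_dir/10-gpu.conf" <<'EOF'
[Container]
# Reset the NVIDIA device from the base unit, then add the AMD class.
AddDevice=
AddDevice=amd.com/gpu=all
EOF
echo "wrote $dropin_dir/10-gpu.conf"
```

After placing the real file, `systemctl daemon-reload && systemctl restart ollama` picks up the override without touching the base unit shipped in /usr.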
