
Commit 9a8181d

ollama: WSL2 CDI fallback + AddDevice so the dev VM uses the NVIDIA GPU
Operator-flagged 2026-05-11: ollama in podman-MiOS-DEV loaded models on CPU
(`load_tensors: CPU model buffer size = 16676 MiB`) despite nvidia-smi inside
the dev VM working fine (`NVIDIA GeForce RTX 4090 ... CUDA Version: 13.1`).

Root causes:

1. mios-cdi-detect.service detected WSL2 + NVIDIA but bailed because
   nvidia-ctk isn't installed on podman-machine-os 6.0 (the dev VM base)
   and can't be cleanly layered. /etc/cdi/ stayed empty, so podman had no
   nvidia.com/gpu=all spec to attach.
2. usr/share/containers/systemd/ollama.container had no AddDevice= for the
   GPU. Even if a CDI spec had existed, the container started without the
   device class, so /dev/dxg + /usr/lib/wsl never made it inside.

Fixes:

mios-cdi-detect -- when HAS_NVIDIA && (VIRT=wsl || /dev/dxg) but nvidia-ctk
is missing, hand-roll /run/cdi/wsl2-nvidia.yaml with:

* deviceNodes: /dev/dxg
* mounts: /usr/lib/wsl rbind ro nosuid nodev (rbind is critical --
  /usr/lib/wsl/lib is an overlay sub-mount on the WSL2 host that doesn't
  propagate through a plain bind, so the container would otherwise see an
  empty /usr/lib/wsl/lib/)
* env: LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/nvidia/lib:...

ollama.container -- add AddDevice=nvidia.com/gpu=all so podman attaches
whichever CDI spec mios-cdi-detect emitted (nvidia.yaml on bare metal via
nvidia-ctk, or wsl2-nvidia.yaml on the dev VM). AMD/Intel hosts can
override via .container.d/ drop-ins.

Verified live on the operator's RTX 4090 install:

    ollama runner.go: inference compute library=CUDA name=CUDA0
    description="NVIDIA GeForce RTX 4090" compute=8.9 driver=13.1
    total="24.0 GiB" available="18.7 GiB"
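Before the fix, the failure mode above is diagnosable from inside the dev VM without root. A minimal POSIX-sh sketch (the paths are the ones named in this commit message; nothing else is assumed) that checks whether any CDI spec exists for AddDevice= to resolve, and whether the WSL2 GPU surface is visible:

```shell
#!/bin/sh
# Diagnose "ollama on CPU": is there a CDI spec for the device class
# to resolve, and is the WSL2 GPU surface present? Paths come from
# the commit message above.
found=0
for d in /etc/cdi /run/cdi; do          # podman's CDI spec directories
  for f in "$d"/*.yaml "$d"/*.json; do
    [ -e "$f" ] && { echo "CDI spec: $f"; found=1; }
  done
done
if [ "$found" -eq 0 ]; then
  echo "no CDI specs -- nvidia.com/gpu=all cannot resolve"
fi
# WSL2 GPU surface: paravirtual device node + host driver libraries
[ -e /dev/dxg ] && echo "/dev/dxg present"
[ -d /usr/lib/wsl/lib ] && echo "/usr/lib/wsl/lib present (libcuda.so lives here)"
true
```

On the broken dev VM this prints the "no CDI specs" line while both WSL2 checks pass, which is exactly the mismatch the commit addresses.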
1 parent 95471ae commit 9a8181d

2 files changed

Lines changed: 43 additions & 0 deletions


usr/libexec/mios/mios-cdi-detect

Lines changed: 32 additions & 0 deletions
@@ -76,6 +76,38 @@ if [[ "$HAS_NVIDIA" == 1 ]] && command -v nvidia-ctk >/dev/null 2>&1; then
       _log "nvidia: nvidia-ctk cdi generate failed (non-fatal)"
     fi
   fi
+elif [[ "$HAS_NVIDIA" == 1 && ( "$VIRT" == "wsl" || -e /dev/dxg ) ]]; then
+  # WSL2-specific fallback when nvidia-ctk isn't on the dev VM
+  # (podman-machine-os 6.0 doesn't ship it; the dev-VM overlay can't
+  # always install it cleanly). Hand-roll the CDI YAML using the
+  # standard WSL2 GPU surface: /dev/dxg device node + rbind of
+  # /usr/lib/wsl (which has libcuda.so + libcudadebugger.so + the
+  # nvidia-smi binary). rbind, NOT bind: /usr/lib/wsl/lib is an
+  # overlay sub-mount that doesn't propagate through a plain bind.
+  # Operator-flagged 2026-05-11: ollama loaded models on CPU
+  # (`load_tensors: CPU model buffer size = 16676 MiB`) because no
+  # CDI spec was generated and the Quadlet had no AddDevice for the
+  # GPU. After this YAML lands + the Quadlet adds AddDevice=
+  # nvidia.com/gpu=all, ollama detects the GPU via libcuda.so.1.1
+  # and `inference compute ... library=CUDA ... description=NVIDIA
+  # GeForce RTX 4090` shows in the journal.
+  out=/run/cdi/wsl2-nvidia.yaml
+  cat > "$out" <<'WSLCDI'
+cdiVersion: "0.6.0"
+kind: nvidia.com/gpu
+devices:
+  - name: all
+    containerEdits:
+      deviceNodes:
+        - path: /dev/dxg
+      mounts:
+        - hostPath: /usr/lib/wsl
+          containerPath: /usr/lib/wsl
+          options: ["ro", "nosuid", "nodev", "rbind"]
+      env:
+        - LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+WSLCDI
+  _log "nvidia: wrote $out (WSL2 hand-rolled CDI; nvidia-ctk unavailable on this distro)"
 elif [[ "$HAS_NVIDIA" == 1 ]]; then
   _log "nvidia: device present but nvidia-ctk missing -- install nvidia-container-toolkit"
 fi
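The hand-rolled spec can be sanity-checked offline, before podman ever reads it. A sketch that regenerates the same YAML into a temp file (the mktemp path is illustrative) and greps for the fields the device class depends on:

```shell
#!/bin/sh
# Regenerate the spec from the same heredoc mios-cdi-detect uses and
# verify the load-bearing fields are present.
out=$(mktemp /tmp/wsl2-nvidia.XXXXXX)
cat > "$out" <<'WSLCDI'
cdiVersion: "0.6.0"
kind: nvidia.com/gpu
devices:
  - name: all
    containerEdits:
      deviceNodes:
        - path: /dev/dxg
      mounts:
        - hostPath: /usr/lib/wsl
          containerPath: /usr/lib/wsl
          options: ["ro", "nosuid", "nodev", "rbind"]
      env:
        - LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
WSLCDI

# The device class podman resolves is "<kind>=<device name>", i.e.
# nvidia.com/gpu=all. rbind is the option that makes /usr/lib/wsl/lib
# (an overlay sub-mount on the WSL2 host) arrive non-empty.
grep -q '^kind: nvidia.com/gpu$' "$out" \
  && grep -q 'name: all' "$out" \
  && grep -q 'rbind' "$out" \
  && echo "spec OK: $out"
```

On a host with podman available, `podman run --device nvidia.com/gpu=all ...` would then attach these edits; the grep check is just the offline half.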

usr/share/containers/systemd/ollama.container

Lines changed: 11 additions & 0 deletions
@@ -23,6 +23,17 @@ ContainerName=mios-ollama
 Network=${MIOS_QUADLET_NETWORK:-mios.network}
 Network=ai-net.network
 AutoUpdate=registry
+# GPU passthrough. nvidia.com/gpu=all resolves to the CDI spec written
+# by mios-cdi-detect.service (Before=ollama.service). On bare metal
+# that's nvidia-ctk's generated /run/cdi/nvidia.yaml; on WSL2 it's the
+# hand-rolled /run/cdi/wsl2-nvidia.yaml that maps /dev/dxg + rbinds
+# /usr/lib/wsl. Operator-flagged 2026-05-11 (`load_tensors: CPU model
+# buffer size = 16676 MiB` -- ollama was CPU-only because no CDI spec
+# was attached). With this device line, ollama's runner.go reports
+# `inference compute library=CUDA name=CUDA0 description=NVIDIA
+# GeForce RTX 4090`. AMD/Intel hosts can swap the device class via
+# /etc/containers/systemd/ollama.container.d/ drop-in overrides.
+AddDevice=nvidia.com/gpu=all
 # Numeric UID/GID -- the upstream ollama/ollama image has no
 # `mios-ollama` user in its /etc/passwd, so a name-based User= lookup
 # fails with "unable to find user mios-ollama: no matching entries in
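The drop-in override mentioned in the comment could look like the sketch below. The `amd.com/gpu=all` class is hypothetical (it assumes a ROCm CDI spec exists on that host), the file name is illustrative, and a temp dir stands in for /etc/containers/systemd/ollama.container.d/ so the sketch is side-effect free:

```shell
#!/bin/sh
# Sketch: AMD-host drop-in that swaps the GPU device class.
# Assumptions: amd.com/gpu=all is hypothetical and needs a ROCm CDI
# spec on the host; clearing a list key with an empty assignment
# follows systemd drop-in semantics -- verify against your
# podman/Quadlet version. Real path:
#   /etc/containers/systemd/ollama.container.d/10-gpu.conf
dropin_dir=$(mktemp -d)
cat > "$dropin_dir/10-gpu.conf" <<'EOF'
[Container]
# Reset the NVIDIA device from the base unit, then add the AMD class.
AddDevice=
AddDevice=amd.com/gpu=all
EOF
echo "wrote $dropin_dir/10-gpu.conf"
```

After placing the real file, `systemctl daemon-reload && systemctl restart ollama` picks up the override without touching the base unit shipped in /usr.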
