Commit df20712

Add vfio-gpu profile for kubevirt DRA
Signed-off-by: svarnam <svarnam@nvidia.com>
1 parent 5484eae commit df20712

24 files changed

Lines changed: 1461 additions & 22 deletions

README.md

Lines changed: 65 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -25,7 +25,7 @@ The procedure below has been tested and verified on both Linux and Mac.
2525
* [GNU Make 3.81+](https://www.gnu.org/software/make/)
2626
* [GNU Tar 1.34+](https://www.gnu.org/software/tar/)
2727
* [docker v20.10+ (including buildx)](https://docs.docker.com/engine/install/) or [Podman v4.9+](https://podman.io/docs/installation)
28-
* [kind v0.17.0+](https://kind.sigs.k8s.io/docs/user/quick-start/)
28+
* [kind v0.32.0+](https://kind.sigs.k8s.io/docs/user/quick-start/) (required for Kubernetes 1.36 node images / containerd config v4; `kind load` fails on older kind)
2929
* [helm v3.7.0+](https://helm.sh/docs/intro/install/)
3030
* [kubectl v1.18+](https://kubernetes.io/docs/reference/kubectl/)
3131

@@ -483,6 +483,70 @@ intended to be a recommendation for all DRA drivers. Other drivers will likely
483483
be simpler by implementing their logic more directly than through an
484484
abstraction like the example driver's profiles.
485485

486+
### Available profiles
487+
488+
The default profile is `gpu`, which is what the quickstart above installs; all
489+
existing `demo/gpu-test*.yaml` fixtures continue to work unchanged. An
490+
additional profile, `vfio-gpu`, targets devices in vfio mode for virtualized workloads such as KubeVirt and Kata.
491+
492+
| Profile | Driver name | Devices advertise | Discovery | Demo fixtures |
493+
|------------|-------------------------|-------------------------------------------------------------------------|----------------------------------------------------|----------------------------|
494+
| `gpu` | `gpu.example.com` | model/index/uuid | Mock (count via `--num-devices`) | `demo/gpu-test{1..5}.yaml` |
495+
| `vfio-gpu` | `vfio-gpu.example.com` | `resource.kubernetes.io/pciBusID`, vendor/device/class, IOMMU group | Real, scans `/sys/bus/pci/drivers/vfio-pci` (vendor/device/class read from `/sys/bus/pci/devices/<BDF>`) | `demo/clusters/kind/vfio-gpu-test.yaml` |
496+
497+
The `vfio-gpu` profile relies on the upstream kubeletplugin framework's
498+
[KEP-5304][kep-5304] support to write a device metadata file at
499+
`/var/run/kubernetes.io/dra-device-attributes/<claim>/<request>/metadata.json`
500+
inside any consuming pod (enabled via the `kubeletPlugin.enableDeviceMetadata`
501+
Helm value / `--enable-device-metadata` CLI flag).
502+
503+
The profile additionally injects, via the per-claim CDI spec built at
504+
`NodePrepareResources` time, the VFIO character devices the launcher
505+
needs to actually open the device: `/dev/vfio/<iommu_group>` for the
506+
allocated BDF and the userspace `/dev/vfio/vfio` entry point.
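The mapping from an allocated BDF to the two device nodes can be sketched in shell. This is an illustration only, not the driver's Go implementation: `SYSFS_ROOT` and `vfio_device_nodes` are hypothetical names, and the sketch assumes the standard sysfs layout in which `/sys/bus/pci/devices/<BDF>/iommu_group` is a symlink whose target ends in the group number.

```bash
#!/usr/bin/env bash
# Sketch only. SYSFS_ROOT and vfio_device_nodes are hypothetical names used for
# illustration; SYSFS_ROOT exists so the function can run against a fake tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the VFIO character devices a consumer needs for one PCI address (BDF).
vfio_device_nodes() {
  local bdf="$1"
  local link="${SYSFS_ROOT}/bus/pci/devices/${bdf}/iommu_group"
  # iommu_group is a symlink to .../iommu_groups/<N>; without it there is no
  # group device to hand out.
  if [ ! -L "${link}" ]; then
    echo "no IOMMU group for ${bdf}" >&2
    return 1
  fi
  local group
  group="$(basename "$(readlink "${link}")")"
  echo "/dev/vfio/${group}"   # group device for the allocated BDF
  echo "/dev/vfio/vfio"       # shared VFIO entry point
}

# Example (address is a placeholder): vfio_device_nodes 0000:65:00.0
```

Both paths would then be listed as device nodes in the per-claim CDI spec.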
507+
508+
The profile discovers devices by walking `/sys/bus/pci/drivers/vfio-pci/`,
509+
so every advertised device is by construction already bound to `vfio-pci`.
510+
No vendor/device filter or CEL selector is needed: the kernel has already
511+
partitioned the bus for us.
512+
513+
Binding devices to `vfio-pci` is the operator's job (kernel cmdline
514+
`vfio-pci.ids=`, `driverctl set-override <BDF> vfio-pci`, a custom systemd
515+
unit, ...). Hosts that haven't bound anything yet will advertise an empty
516+
pool rather than fail the driver startup.
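The discovery walk described above can be approximated in shell to check what a host would advertise. This is a minimal sketch, not the driver's code: `SYSFS_ROOT` and `list_vfio_bdfs` are hypothetical names, and the address pattern assumes the usual `domain:bus:device.function` form of sysfs PCI entries.

```bash
#!/usr/bin/env bash
# Sketch only. SYSFS_ROOT and list_vfio_bdfs are hypothetical names used for
# illustration; SYSFS_ROOT exists so the function can run against a fake tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print one PCI address (BDF) per line for every device bound to vfio-pci.
list_vfio_bdfs() {
  local dir="${SYSFS_ROOT}/bus/pci/drivers/vfio-pci"
  local re='^[0-9a-fA-F]{4,}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'
  local entry name
  # A missing driver directory means nothing is bound: empty pool, not an error.
  [ -d "${dir}" ] || return 0
  for entry in "${dir}"/*; do
    name="$(basename "${entry}")"
    # Bound devices appear as entries named like 0000:65:00.0; control files
    # such as bind, unbind, and new_id are skipped by the pattern.
    if [[ "${name}" =~ ${re} ]]; then
      echo "${name}"
    fi
  done
}

list_vfio_bdfs
```

An empty result on a host with no bindings matches the behavior described above: the pool is empty, but nothing fails.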
517+
518+
### Installing a non-default profile
519+
520+
```bash
521+
# vfio-gpu profile (real PCI passthrough for KubeVirt VMIs)
522+
helm upgrade -i \
523+
--create-namespace \
524+
--namespace dra-example-driver-vfio \
525+
--set deviceProfile=vfio-gpu \
526+
--set kubeletPlugin.enableDeviceMetadata=true \
527+
--set driverName=vfio-gpu.example.com \
528+
dra-example-driver-vfio \
529+
deployments/helm/dra-example-driver
530+
```
531+
532+
Each profile is a separate driver in the cluster, so both can be
533+
installed side-by-side without conflict.
534+
535+
### vfio-gpu kind demo
536+
537+
The default [`demo/clusters/kind/create-cluster.sh`](demo/clusters/kind/create-cluster.sh)
538+
quickstart targets the mock `gpu` profile. For vfio-gpu, prepare a Linux host with
539+
devices bound to `vfio-pci`, then create the cluster with vfio mounts enabled:
540+
541+
```bash
542+
VFIO_GPU=true ./demo/clusters/kind/create-cluster.sh
543+
```
544+
545+
Install the driver with `deviceProfile=vfio-gpu` as above, then apply
546+
[`demo/clusters/kind/vfio-gpu-test.yaml`](demo/clusters/kind/vfio-gpu-test.yaml)
547+
(KubeVirt must be installed separately). See
548+
[`demo/clusters/kind/README.md`](demo/clusters/kind/README.md) for the full walkthrough.
549+
486550
## Anatomy of a DRA resource driver
487551

488552
TBD

api/example.com/resource/gpu/v1alpha1/api.go

Lines changed: 20 additions & 0 deletions
@@ -31,8 +31,15 @@ const GpuConfigKind = "GpuConfig"
3131
type GpuConfig struct {
3232
metav1.TypeMeta `json:",inline"`
3333
Sharing *GpuSharing `json:"sharing,omitempty"`
34+
Mode VfioMode `json:"mode,omitempty"`
3435
}
3536

37+
type VfioMode string
38+
39+
const (
40+
VfioModePassthrough VfioMode = "Passthrough"
41+
)
42+
3643
// DefaultGpuConfig provides the default GPU configuration.
3744
func DefaultGpuConfig() *GpuConfig {
3845
return &GpuConfig{
@@ -49,11 +56,24 @@ func DefaultGpuConfig() *GpuConfig {
4956
}
5057
}
5158

59+
func DefaultVfioGpuConfig() *GpuConfig {
60+
return &GpuConfig{
61+
TypeMeta: metav1.TypeMeta{
62+
APIVersion: GroupName + "/" + Version,
63+
Kind: GpuConfigKind,
64+
},
65+
Mode: VfioModePassthrough,
66+
}
67+
}
68+
5269
// Normalize updates a GpuConfig config with implied default values based on other settings.
5370
func (c *GpuConfig) Normalize() error {
5471
if c == nil {
5572
return fmt.Errorf("config is 'nil'")
5673
}
74+
if c.Mode == VfioModePassthrough {
75+
return nil
76+
}
5777
if c.Sharing == nil {
5878
c.Sharing = &GpuSharing{
5979
Strategy: TimeSlicingStrategy,

api/example.com/resource/gpu/v1alpha1/validate.go

Lines changed: 13 additions & 0 deletions
@@ -67,8 +67,21 @@ func (s *GpuSharing) Validate() error {
6767

6868
// Validate ensures that GpuConfig has a valid set of values.
6969
func (c *GpuConfig) Validate() error {
70+
if c == nil {
71+
return fmt.Errorf("GPU config is nil")
72+
}
73+
if c.Mode != "" {
74+
return c.ValidateVfioMode()
75+
}
7076
if c.Sharing == nil {
7177
return fmt.Errorf("no sharing strategy set")
7278
}
7379
return c.Sharing.Validate()
7480
}
81+
82+
func (c *GpuConfig) ValidateVfioMode() error {
83+
if c.Mode == VfioModePassthrough {
84+
return nil
85+
}
86+
return fmt.Errorf("unknown GPU mode: %v", c.Mode)
87+
}

api/example.com/resource/gpu/v1alpha1/validate_test.go

Lines changed: 8 additions & 0 deletions
@@ -92,6 +92,14 @@ func TestGpuConfigValidate(t *testing.T) {
9292
gpuConfig: DefaultGpuConfig(),
9393
expected: nil,
9494
},
95+
"passthrough GpuConfig": {
96+
gpuConfig: DefaultVfioGpuConfig(),
97+
expected: nil,
98+
},
99+
"invalid GPU mode": {
100+
gpuConfig: &GpuConfig{Mode: "invalid"},
101+
expected: errors.New("unknown GPU mode: invalid"),
102+
},
95103
"invalid TimeSlicingConfig ignored with strategy is SpacePartitioning": {
96104
gpuConfig: &GpuConfig{
97105
Sharing: &GpuSharing{

cmd/dra-example-kubeletplugin/driver.go

Lines changed: 1 addition & 1 deletion
@@ -117,7 +117,7 @@ func (d *driver) prepareResourceClaim(ctx context.Context, claim *resourceapi.Re
117117
ShareID: preparedDevice.ShareID,
118118
}
119119

120-
if allocDev, ok := d.state.allocatable[preparedPB.GetDeviceName()]; ok && len(allocDev.Attributes) > 0 {
120+
if allocDev, ok := d.state.allocatable[preparedDevice.GetDeviceName()]; ok && len(allocDev.Attributes) > 0 {
121121
attrs := make(map[string]resourceapi.DeviceAttribute, len(allocDev.Attributes))
122122
for k, v := range allocDev.Attributes {
123123
attrs[string(k)] = v

cmd/dra-example-kubeletplugin/main.go

Lines changed: 4 additions & 0 deletions
@@ -34,6 +34,7 @@ import (
3434
"sigs.k8s.io/dra-example-driver/internal/profiles"
3535
"sigs.k8s.io/dra-example-driver/internal/profiles/cpu"
3636
"sigs.k8s.io/dra-example-driver/internal/profiles/gpu"
37+
vfiogpu "sigs.k8s.io/dra-example-driver/internal/profiles/vfio-gpu"
3738
"sigs.k8s.io/dra-example-driver/pkg/flags"
3839
)
3940

@@ -77,6 +78,9 @@ var validProfiles = map[string]func(flags Flags) profiles.Profile{
7778
cpu.ProfileName: func(flags Flags) profiles.Profile {
7879
return cpu.NewProfile(flags.nodeName, flags.driverName, flags.cpuNUMANodes, flags.cpusPerNUMANode)
7980
},
81+
vfiogpu.ProfileName: func(flags Flags) profiles.Profile {
82+
return vfiogpu.NewProfile(flags.nodeName, flags.driverName)
83+
},
8084
}
8185

8286
var validProfileNames = func() []string {

cmd/dra-example-webhook/main.go

Lines changed: 4 additions & 2 deletions
@@ -37,6 +37,7 @@ import (
3737
"sigs.k8s.io/dra-example-driver/internal/profiles"
3838
"sigs.k8s.io/dra-example-driver/internal/profiles/cpu"
3939
"sigs.k8s.io/dra-example-driver/internal/profiles/gpu"
40+
vfiogpu "sigs.k8s.io/dra-example-driver/internal/profiles/vfio-gpu"
4041
"sigs.k8s.io/dra-example-driver/pkg/flags"
4142
)
4243

@@ -53,8 +54,9 @@ type Flags struct {
5354
type validator func(runtime.Object) error
5455

5556
var validProfiles = map[string]profiles.ConfigHandler{
56-
gpu.ProfileName: gpu.Profile{},
57-
cpu.ProfileName: cpu.Profile{},
57+
gpu.ProfileName: gpu.Profile{},
58+
cpu.ProfileName: cpu.Profile{},
59+
vfiogpu.ProfileName: vfiogpu.Profile{},
5860
}
5961

6062
func main() {

demo/build-driver.sh

Lines changed: 5 additions & 3 deletions
@@ -32,9 +32,11 @@ check_demo_config || exit 1
3232
${SCRIPTS_DIR}/build-driver-image.sh
3333

3434
# If a cluster is already running, load the image onto its nodes
35-
EXISTING_CLUSTER="$(${KIND} get clusters | grep -w "${KIND_CLUSTER_NAME}" || true)"
36-
if [ "${EXISTING_CLUSTER}" != "" ]; then
37-
${SCRIPTS_DIR}/load-driver-image-into-kind.sh
35+
if command -v kind >/dev/null; then
36+
EXISTING_CLUSTER="$(${KIND} get clusters | grep -w "${KIND_CLUSTER_NAME}" || true)"
37+
if [ "${EXISTING_CLUSTER}" != "" ]; then
38+
${SCRIPTS_DIR}/load-driver-image-into-kind.sh
39+
fi
3840
fi
3941

4042
set +x

demo/clusters/README.md

Lines changed: 7 additions & 0 deletions
@@ -11,3 +11,10 @@ common layout:
1111
- `delete-cluster.sh` — delete that cluster
1212

1313
Platforms may add other scripts or notes next to these entrypoints as needed.
14+
15+
## Available platforms
16+
17+
| Path | Purpose |
18+
|---|---|
19+
| [`kind/`](kind/) | kind cluster for the default `gpu` (mock devices) DRA profile and, with `VFIO_GPU=true`, the `vfio-gpu` profile (host vfio-pci bindings + PCI sysfs mounts). See [`kind/README.md`](kind/README.md). |
20+
| [`gke/`](gke/) | GKE cluster scripts |

demo/clusters/kind/README.md

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
1+
# kind cluster
2+
3+
Scripts to create and delete a kind cluster for the DRA demo.
4+
5+
| File | Purpose |
6+
| --- | --- |
7+
| `create-cluster.sh` | Create the cluster (default `gpu` profile demo) |
8+
| `delete-cluster.sh` | Delete the cluster |
9+
| `kind-cluster-config-vfio.yaml` | kind config used when `VFIO_GPU=true` (PCI sysfs + `/dev/vfio` mounts) |
10+
| `vfio-gpu-test.yaml` | ResourceClaimTemplate for the `vfio-gpu` profile |
11+
12+
Shared vfio helpers live in [`demo/scripts/vfio-kind.sh`](../../scripts/vfio-kind.sh).
13+
14+
## Default (mock GPU profile)
15+
16+
```bash
17+
./demo/build-driver.sh
18+
./demo/clusters/kind/create-cluster.sh
19+
helm upgrade -i \
20+
--create-namespace \
21+
--namespace dra-example-driver \
22+
dra-example-driver \
23+
deployments/helm/dra-example-driver
24+
```
25+
26+
The default flow uses the CDI-enabled kind node image built by `demo/scripts/build-kind-image.sh` and
27+
`demo/scripts/kind-cluster-config.yaml`.
28+
29+
## vfio-gpu profile (`VFIO_GPU=true`)
30+
31+
**Off by default.** Set `VFIO_GPU=true` when creating the cluster to bind-mount host
32+
PCI sysfs and `/dev/vfio` into kind nodes. This is required for the `vfio-gpu` driver
33+
profile but does **not** install the driver — you still set `deviceProfile=vfio-gpu` in
34+
Helm separately.
35+
36+
**Linux host only.** The host must already have devices bound to `vfio-pci` before
37+
cluster creation. The script verifies bindings and exits if none are found.
38+
39+
### Host setup
40+
41+
Synthetic devices for testing come from [kubevirt's kind-1.35-vfio-gpu provider](https://github.com/kubevirt/kubevirt/tree/main/kubevirtci/cluster-up/cluster/kind-1.35-vfio-gpu).
42+
43+
That provider has not merged yet; see https://github.com/kubevirt/kubevirtci/pull/1726.
44+
45+
```bash
46+
sudo bash setup-host-vfio-pci.sh
47+
ls /sys/bus/pci/drivers/vfio-pci/ # expect BDF entries
48+
```
49+
50+
Real hardware bound to vfio-pci works equally well.
51+
52+
### Cluster + driver
53+
54+
```bash
55+
# 1. Cluster (vfio mounts into nodes)
56+
VFIO_GPU=true ./demo/clusters/kind/create-cluster.sh
57+
58+
# 2. Driver image (build from this repo; published images may predate vfio-gpu)
59+
./demo/build-driver.sh
60+
61+
# 3. Driver install (vfio-gpu profile)
62+
helm upgrade --install \
63+
--create-namespace \
64+
--namespace dra-example-driver-vfio \
65+
--set deviceProfile=vfio-gpu \
66+
--set kubeletPlugin.enableDeviceMetadata=true \
67+
--set driverName=vfio-gpu.example.com \
68+
dra-example-driver-vfio \
69+
deployments/helm/dra-example-driver
70+
```
71+
Verify the driver:
72+
73+
```bash
74+
kubectl -n dra-example-driver-vfio get ds -o jsonpath='{range .items[*].spec.template.spec.containers[?(@.name=="plugin")].env[?(@.name=="DEVICE_PROFILE")]}{.value}{"\n"}{end}'
75+
kubectl get resourceslices -o custom-columns='NAME:.metadata.name,DRIVER:.spec.driver,NODE:.spec.nodeName'
76+
```
77+
78+
Tear down:
79+
80+
```bash
81+
./demo/clusters/kind/delete-cluster.sh
82+
```
83+
84+
### Environment variables
85+
86+
| Variable | Default | Purpose |
87+
| --- | --- | --- |
88+
| `VFIO_GPU` | `false` | Enable vfio-gpu cluster mode (`true` / `1` / `yes` / `on`) |
89+
| `VFIO_KIND_NODE_IMAGE` | pinned `kindest/node:v1.35.0` | kind node image when `VFIO_GPU=true` |
90+
| `VFIO_KIND_CLUSTER_CONFIG_PATH` | `kind-cluster-config-vfio.yaml` in this directory | kind config when `VFIO_GPU=true` |
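A script could interpret the `VFIO_GPU` values listed in the table roughly as follows. This is a sketch under assumptions: `is_truthy` is a hypothetical helper, not necessarily what `create-cluster.sh` or `vfio-kind.sh` does, and it treats only the four lowercase spellings from the table as enabled.

```bash
#!/usr/bin/env bash
# Sketch only. is_truthy is a hypothetical helper; the real scripts may parse
# VFIO_GPU differently (for example, case-insensitively).
is_truthy() {
  case "${1:-}" in
    true|1|yes|on) return 0 ;;   # the spellings listed in the table
    *) return 1 ;;               # anything else, including unset, is off
  esac
}

if is_truthy "${VFIO_GPU:-false}"; then
  echo "vfio-gpu cluster mode enabled"
else
  echo "default (mock gpu) cluster mode"
fi
```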
91+
92+
Other variables (`KIND_CLUSTER_NAME`, `CONTAINER_TOOL`, …) come from [`demo/scripts/common.sh`](../../scripts/common.sh).
93+
94+
### Two knobs (cluster vs driver)
95+
96+
| Setting | Layer | What it does |
97+
| --- | --- | --- |
98+
| `VFIO_GPU=true` | Cluster (`create-cluster.sh`) | vfio-pci preflight, vfio kind config, post-create node setup |
99+
| `deviceProfile=vfio-gpu` | Driver (Helm) | Driver discovers vfio-bound PCI devices and prepares `/dev/vfio` CDI |
100+
101+
Both are required for the end-to-end vfio-gpu demo.
