- Same requirement as when you use GPUs on Docker. For details, please refer to [the doc by NVIDIA](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#pre-requisites).
- The NVIDIA Container Toolkit
  - containerd relies on the NVIDIA Container Toolkit to make GPUs usable inside a container. You can install the NVIDIA Container Toolkit by following the [official installation instructions](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
## Options for `nerdctl run --gpus`

You can also pass detailed configuration to `--gpus` option as a list of key-value pairs. The following options are available:

- `count`: number of GPUs to use. `all` exposes all available GPUs.
- `device`: IDs of GPUs to use. UUID or numbers of GPUs can be specified.

The following example exposes a specific GPU to the container.

```
nerdctl run -it --rm --gpus 'device=GPU-3a23c669-1f69-c64e-cf85-44e9b07e7a2a' nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi
```
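
The `count` option can be used instead of `device` when you do not need a specific GPU. A minimal sketch, assuming a host with NVIDIA GPUs and the NVIDIA Container Toolkit installed (the image tag mirrors the example above):

```
nerdctl run -it --rm --gpus 'count=1' nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi
```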
Note that although `capabilities` options may still be provided, they are ignored when processing the GPU request since nerdctl v2.3.
## Fields for `nerdctl compose`
`nerdctl compose` also supports GPUs following [compose-spec](https://github.com/compose-spec/compose-spec/blob/master/deploy.md#devices).

You can use GPUs on compose when you specify the `driver` as `nvidia` or one or more of the following `capabilities` in `services.demo.deploy.resources.reservations.devices`.

- `gpu`
- `nvidia`

Available fields are the same as `nerdctl run --gpus`.

```
services:
  demo:
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: all
```
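
Assuming the compose file above is saved as `docker-compose.yaml` in the current directory (a standard compose filename, used here as an illustration), the service can then be started in the usual way:

```
nerdctl compose up -d
```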
## Troubleshooting
### `nerdctl run --gpus` fails due to an unresolvable CDI device

If the required CDI specifications for NVIDIA devices are not available on the system, the `nerdctl run` command will fail with an error similar to: `CDI device injection failed: unresolvable CDI devices nvidia.com/gpu=all` (the exact error message will depend on the device(s) requested).

This should be the same error message that is reported when the `--device` flag is used to request a CDI device:

```
nerdctl run --device=nvidia.com/gpu=all
```

Ensure that the NVIDIA Container Toolkit (>= v1.18.0 is recommended) is installed and the requested CDI devices are present in the output of `nvidia-ctk cdi list`.
See the NVIDIA Container Toolkit [CDI documentation](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html) for more information.
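
If the devices are missing, NVIDIA's CDI documentation describes generating the specification with `nvidia-ctk`. A sketch, assuming a host with the NVIDIA driver installed (`/etc/cdi/nvidia.yaml` is one of the default CDI spec locations per NVIDIA's docs):

```
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk cdi list
```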
### `nerdctl run --gpus` fails when using the Nvidia gpu-operator

If the Nvidia driver is installed by the [gpu-operator](https://github.com/NVIDIA/gpu-operator), `nerdctl run` will fail with the error message `FATA[0000] exec: "nvidia-container-cli": executable file not found in $PATH`.