Commit b3f1510

Merge pull request #1350 from NVIDIA/dholt/container-toolkit-rhel
Support Red Hat NVIDIA Container Toolkit path
2 parents 1e77de7 + 617e72b commit b3f1510

7 files changed

Lines changed: 163 additions & 120 deletions

docs/airgap/mirror-rpm-repos.md

Lines changed: 10 additions & 10 deletions
Original file line number / Diff line number / Diff line change
@@ -25,17 +25,16 @@ If you do not already have mirrors of the distribution repositories available, p
2525
The following additional RPM repositories are commonly used for GPU-enabled systems deployed by DeepOps:
2626

2727
- [Fedora Extra Packages for Enterprise Linux (EPEL)](https://fedoraproject.org/wiki/EPEL)
28-
- NVIDIA CUDA repository: [repo file for EL7](https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo), [repo file for EL8](https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel8.repo)
29-
- NVIDIA container repositories: [repo file for EL7](https://raw.githubusercontent.com/NVIDIA/nvidia-docker/gh-pages/centos7/nvidia-docker.repo), [repo file for EL8](https://raw.githubusercontent.com/NVIDIA/nvidia-docker/gh-pages/centos8/nvidia-docker.repo)
28+
- NVIDIA CUDA repository: [repo file for EL8](https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo), [repo file for EL9](https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo)
29+
- NVIDIA Container Toolkit repository: [repo file for RPM-based distributions](https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo)
3030
- Docker CE repository: [repo file](https://download.docker.com/linux/centos/docker-ce.repo)
3131

3232
These repo files provide the following repository IDs, which will be needed by `reposync` below:
3333

3434
- epel
35-
- cuda-rhel7-x86_64 or cuda-rhel8-x86_64
35+
- cuda-rhel8-x86_64 or cuda-rhel9-x86_64
3636
- libnvidia-container
37-
- nvidia-container-runtime
38-
- nvidia-docker
37+
- nvidia-container-toolkit
3938
- docker-ce-stable
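
The repository IDs listed above are the INI section names inside each `.repo` file, so they can be read programmatically rather than by eye. The following is a minimal sketch using Python's standard `configparser`; the helper name `repo_ids` and the sample repository content (including the `example.invalid` URLs) are hypothetical and are not taken from any of the real repo files.

```python
import configparser
from typing import List


def repo_ids(repo_text: str) -> List[str]:
    """Return the repository IDs (INI section names) defined in a .repo file."""
    # interpolation=None keeps '%' characters in URLs from being interpreted.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(repo_text)
    return parser.sections()


# Hypothetical sample content, shaped like a typical yum .repo file.
SAMPLE_REPO = """\
[example-stable]
name=Example stable packages
baseurl=https://example.invalid/stable/rpm/$basearch
enabled=1
gpgcheck=1

[example-experimental]
name=Example experimental packages
baseurl=https://example.invalid/experimental/rpm/$basearch
enabled=0
"""

if __name__ == "__main__":
    for repo_id in repo_ids(SAMPLE_REPO):
        print(repo_id)
```

Running the same helper over the text of each file in `/etc/yum.repos.d` would give the candidate IDs to pass to `reposync`, including any that are disabled by default.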
4039

4140
To discover a complete list of repositories needed for your particular workload,
@@ -49,10 +48,11 @@ On a RHEL or CentOS machine with Internet access, install the `yum-utils` and `c
4948
sudo yum install yum-utils createrepo
5049
```
5150

52-
Then install the EPEL repository:
51+
Then install the EPEL repository if your workload requires EPEL packages.
52+
For example, on EL9:
5353

5454
```bash
55-
sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
55+
sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
5656
```
5757

5858
Then, for each of the other repo files, install the file into the `/etc/yum.repos.d` directory.
@@ -61,8 +61,8 @@ For example, if using the list of repositories from the previous section:
6161
```bash
6262
cd /etc/yum.repos.d
6363
sudo wget https://download.docker.com/linux/centos/docker-ce.repo
64-
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
65-
sudo wget https://raw.githubusercontent.com/NVIDIA/nvidia-docker/gh-pages/centos7/nvidia-docker.repo
64+
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
65+
sudo wget https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
6666
```
6767

6868
For each of the repositories you wish to mirror, run the `reposync` command to download the contents of the repository.
@@ -80,7 +80,7 @@ At this point, you should have one subdirectory for each of the repositories you
8080

8181
```bash
8282
ls /var/repos/
83-
docker-ce-stable nvidia-docker
83+
docker-ce-stable libnvidia-container nvidia-container-toolkit
8484
```
8585

8686
For each of these directories, run the `createrepo` command to generate repository metadata:

playbooks/container/nvidia-docker.yml

Lines changed: 12 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -16,19 +16,27 @@
1616
state: absent
1717
when: docker_install | default('yes')
1818

19-
- name: install NVIDIA Container Toolkit on Ubuntu 24.04 and newer
19+
- name: install NVIDIA Container Toolkit on current OS releases
2020
include_role:
2121
name: nvidia_container_toolkit
2222
when:
23-
- ansible_local['gpus']['count'] and ansible_distribution == "Ubuntu"
24-
- ansible_distribution_version is version('24.04', '>=')
23+
- ansible_local['gpus']['count']
24+
- >
25+
(ansible_distribution == "Ubuntu" and ansible_distribution_version is version('24.04', '>='))
26+
or
27+
(ansible_os_family == "RedHat" and ansible_distribution_major_version is version('8', '>='))
2528
- docker_install | default('yes')
2629

2730
- name: install nvidia-docker
2831
include_role:
2932
name: nvidia.nvidia_docker
3033
when:
3134
- ansible_local['gpus']['count'] and (ansible_distribution == "Ubuntu" or ansible_os_family == "RedHat")
32-
- not (ansible_distribution == "Ubuntu" and ansible_distribution_version is version('24.04', '>='))
35+
- >
36+
not (
37+
(ansible_distribution == "Ubuntu" and ansible_distribution_version is version('24.04', '>='))
38+
or
39+
(ansible_os_family == "RedHat" and ansible_distribution_major_version is version('8', '>='))
40+
)
3341
- docker_install | default('yes')
3442
environment: "{{ proxy_env if proxy_env is defined else {} }}"
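
The two `when` blocks in this playbook are meant to be mutually exclusive: a GPU host gets either the `nvidia_container_toolkit` role or the legacy `nvidia.nvidia_docker` role, never both. The sketch below restates that selection logic in Python so the cases are easier to check. It is an illustration only: the function names are hypothetical, the `docker_install` condition is left out, and the version comparison is a simplified numeric tuple comparison rather than Ansible's own `version` test.

```python
from typing import Optional, Tuple


def _version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '24.04' into (24, 4)."""
    return tuple(int(part) for part in version.split("."))


def uses_container_toolkit(distribution: str, os_family: str,
                           version: str, major_version: str) -> bool:
    """Mirror the 'current OS releases' condition from the playbook."""
    ubuntu_new = distribution == "Ubuntu" and _version_tuple(version) >= (24, 4)
    redhat_new = os_family == "RedHat" and int(major_version) >= 8
    return ubuntu_new or redhat_new


def select_role(has_gpus: bool, distribution: str, os_family: str,
                version: str, major_version: str) -> Optional[str]:
    """Return the role the playbook would include, or None if neither applies."""
    if not has_gpus:
        return None
    if uses_container_toolkit(distribution, os_family, version, major_version):
        return "nvidia_container_toolkit"
    if distribution == "Ubuntu" or os_family == "RedHat":
        return "nvidia.nvidia_docker"
    return None


if __name__ == "__main__":
    print(select_role(True, "Ubuntu", "Debian", "22.04", "22"))
    print(select_role(True, "Rocky", "RedHat", "9.3", "9"))
```

Under these assumptions, Ubuntu releases older than 24.04 and Red Hat family releases older than 8 still fall through to `nvidia.nvidia_docker`, which matches the negated condition in the second task.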

roles/nvidia_container_toolkit/defaults/main.yml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,10 @@ nvidia_container_toolkit_repo_gpg_url: "{{ nvidia_container_toolkit_repo_base_ur
44
nvidia_container_toolkit_keyring_ascii_path: "/usr/share/keyrings/nvidia-container-toolkit-keyring.asc"
55
nvidia_container_toolkit_keyring_path: "/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg"
66
nvidia_container_toolkit_apt_source_path: "/etc/apt/sources.list.d/nvidia-container-toolkit.list"
7+
nvidia_container_toolkit_rpm_repo_url: "{{ nvidia_container_toolkit_repo_base_url }}/stable/rpm/nvidia-container-toolkit.repo"
8+
nvidia_container_toolkit_yum_repo_path: "/etc/yum.repos.d/nvidia-container-toolkit.repo"
9+
nvidia_container_toolkit_rpm_prerequisites:
10+
- ca-certificates
711
nvidia_container_toolkit_package: "nvidia-container-toolkit"
812
nvidia_container_toolkit_configure_docker: true
913
nvidia_container_toolkit_set_as_default_runtime: true
Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
---
2+
- name: Docker | ensure Docker configuration directory exists
3+
ansible.builtin.file:
4+
path: /etc/docker
5+
state: directory
6+
owner: root
7+
group: root
8+
mode: "0755"
9+
10+
- name: Docker | check NVIDIA runtime configuration
11+
ansible.builtin.command:
12+
cmd: >-
13+
python3 -c 'import json, pathlib, sys;
14+
p = pathlib.Path("/etc/docker/daemon.json");
15+
required_default = {{ nvidia_container_toolkit_set_as_default_runtime | bool | ternary("True", "False") }};
16+
data = {};
17+
text = p.read_text().strip() if p.exists() else "";
18+
data = json.loads(text) if text else {};
19+
runtime = data.get("runtimes", {}).get("nvidia", {});
20+
ok = runtime.get("path") in ("nvidia-container-runtime", "/usr/bin/nvidia-container-runtime");
21+
ok = ok and (not required_default or data.get("default-runtime") == "nvidia");
22+
sys.exit(0 if ok else 1)'
23+
register: nvidia_container_toolkit_docker_runtime
24+
failed_when: false
25+
changed_when: false
26+
27+
- name: Docker | configure NVIDIA runtime
28+
ansible.builtin.command:
29+
cmd: "nvidia-ctk runtime configure --runtime=docker{{ ' --set-as-default' if nvidia_container_toolkit_set_as_default_runtime | bool else '' }}"
30+
when: nvidia_container_toolkit_docker_runtime.rc != 0
31+
changed_when: true
32+
notify: restart docker
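
The inline `python3 -c` check above is compact but hard to read. The sketch below expresses the same test as a standalone function for reference. It is not part of the commit: the function name is hypothetical, and it takes the contents of `daemon.json` as a string instead of reading `/etc/docker/daemon.json`, so it can be exercised without touching the filesystem.

```python
import json


def nvidia_runtime_configured(daemon_json_text: str, require_default: bool) -> bool:
    """Return True if the Docker daemon config already has the NVIDIA runtime.

    daemon_json_text is the raw content of daemon.json ('' if the file is
    missing or empty). require_default corresponds to
    nvidia_container_toolkit_set_as_default_runtime in the role.
    """
    text = daemon_json_text.strip()
    data = json.loads(text) if text else {}
    runtime = data.get("runtimes", {}).get("nvidia", {})
    ok = runtime.get("path") in (
        "nvidia-container-runtime",
        "/usr/bin/nvidia-container-runtime",
    )
    # When a default runtime is required, it must be set to "nvidia".
    return ok and (not require_default or data.get("default-runtime") == "nvidia")


if __name__ == "__main__":
    configured = json.dumps({
        "default-runtime": "nvidia",
        "runtimes": {"nvidia": {"path": "nvidia-container-runtime"}},
    })
    print(nvidia_runtime_configured(configured, require_default=True))
    print(nvidia_runtime_configured("", require_default=True))
```

Invalid JSON raises an exception here. In the task, an uncaught exception also produces a non-zero exit status, so the `nvidia-ctk runtime configure` step would run in that case as well.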
Lines changed: 10 additions & 106 deletions
Original file line numberDiff line numberDiff line change
@@ -1,114 +1,18 @@
11
---
2-
- name: Ubuntu | verify supported distribution
2+
- name: Verify supported distribution
33
ansible.builtin.assert:
44
that:
5-
- ansible_distribution == "Ubuntu"
6-
fail_msg: "The nvidia_container_toolkit role currently supports Ubuntu only."
7-
8-
- name: Ubuntu | set package architecture
9-
ansible.builtin.set_fact:
10-
nvidia_container_toolkit_deb_arch: "{{ _nvidia_container_toolkit_arch_map.get(ansible_architecture, ansible_architecture) }}"
11-
vars:
12-
_nvidia_container_toolkit_arch_map:
13-
aarch64: arm64
14-
arm64: arm64
15-
x86_64: amd64
16-
17-
- name: Ubuntu | install repository prerequisites
18-
ansible.builtin.apt:
19-
name:
20-
- ca-certificates
21-
- gnupg
22-
state: present
23-
update_cache: true
24-
when: nvidia_container_toolkit_repo_base_url | length > 0
25-
26-
- name: Ubuntu | ensure keyring directory exists
27-
ansible.builtin.file:
28-
path: "{{ nvidia_container_toolkit_keyring_path | dirname }}"
29-
state: directory
30-
owner: root
31-
group: root
32-
mode: "0755"
33-
34-
- name: Ubuntu | check NVIDIA Container Toolkit keyring
35-
ansible.builtin.stat:
36-
path: "{{ nvidia_container_toolkit_keyring_path }}"
37-
register: nvidia_container_toolkit_keyring
38-
39-
- name: Ubuntu | download NVIDIA Container Toolkit GPG key
40-
ansible.builtin.get_url:
41-
url: "{{ nvidia_container_toolkit_repo_gpg_url }}"
42-
dest: "{{ nvidia_container_toolkit_keyring_ascii_path }}"
43-
owner: root
44-
group: root
45-
mode: "0644"
46-
register: nvidia_container_toolkit_key
47-
environment: "{{ proxy_env if proxy_env is defined else {} }}"
48-
49-
- name: Ubuntu | install NVIDIA Container Toolkit GPG keyring
50-
ansible.builtin.command:
51-
cmd: "gpg --dearmor --yes -o {{ nvidia_container_toolkit_keyring_path }} {{ nvidia_container_toolkit_keyring_ascii_path }}"
52-
when: nvidia_container_toolkit_key.changed or not nvidia_container_toolkit_keyring.stat.exists
53-
register: nvidia_container_toolkit_keyring_install
54-
changed_when: true
55-
56-
- name: Ubuntu | set NVIDIA Container Toolkit GPG keyring permissions
57-
ansible.builtin.file:
58-
path: "{{ nvidia_container_toolkit_keyring_path }}"
59-
owner: root
60-
group: root
61-
mode: "0644"
62-
63-
- name: Ubuntu | configure NVIDIA Container Toolkit APT repository
64-
ansible.builtin.copy:
65-
content: |
66-
deb [signed-by={{ nvidia_container_toolkit_keyring_path }}] {{ nvidia_container_toolkit_repo_base_url }}/stable/deb/{{ nvidia_container_toolkit_deb_arch }} /
67-
dest: "{{ nvidia_container_toolkit_apt_source_path }}"
68-
owner: root
69-
group: root
70-
mode: "0644"
71-
register: nvidia_container_toolkit_apt_source
5+
- ansible_distribution == "Ubuntu" or ansible_os_family == "RedHat"
6+
fail_msg: "The nvidia_container_toolkit role supports Ubuntu and Red Hat family hosts only."
727

738
- name: Ubuntu | install NVIDIA Container Toolkit
74-
ansible.builtin.apt:
75-
name: "{{ nvidia_container_toolkit_package }}"
76-
state: present
77-
update_cache: "{{ nvidia_container_toolkit_apt_source.changed or nvidia_container_toolkit_keyring_install.changed | default(false) }}"
78-
environment: "{{ proxy_env if proxy_env is defined else {} }}"
79-
80-
- name: Docker | ensure Docker configuration directory exists
81-
ansible.builtin.file:
82-
path: /etc/docker
83-
state: directory
84-
owner: root
85-
group: root
86-
mode: "0755"
87-
when: nvidia_container_toolkit_configure_docker | bool
9+
ansible.builtin.include_tasks: ubuntu.yml
10+
when: ansible_distribution == "Ubuntu"
8811

89-
- name: Docker | check NVIDIA runtime configuration
90-
ansible.builtin.command:
91-
cmd: >-
92-
python3 -c 'import json, pathlib, sys;
93-
p = pathlib.Path("/etc/docker/daemon.json");
94-
required_default = {{ nvidia_container_toolkit_set_as_default_runtime | bool | ternary("True", "False") }};
95-
data = {};
96-
text = p.read_text().strip() if p.exists() else "";
97-
data = json.loads(text) if text else {};
98-
runtime = data.get("runtimes", {}).get("nvidia", {});
99-
ok = runtime.get("path") in ("nvidia-container-runtime", "/usr/bin/nvidia-container-runtime");
100-
ok = ok and (not required_default or data.get("default-runtime") == "nvidia");
101-
sys.exit(0 if ok else 1)'
102-
register: nvidia_container_toolkit_docker_runtime
103-
failed_when: false
104-
changed_when: false
105-
when: nvidia_container_toolkit_configure_docker | bool
12+
- name: Red Hat | install NVIDIA Container Toolkit
13+
ansible.builtin.include_tasks: redhat.yml
14+
when: ansible_os_family == "RedHat"
10615

10716
- name: Docker | configure NVIDIA runtime
108-
ansible.builtin.command:
109-
cmd: "nvidia-ctk runtime configure --runtime=docker{{ ' --set-as-default' if nvidia_container_toolkit_set_as_default_runtime | bool else '' }}"
110-
when:
111-
- nvidia_container_toolkit_configure_docker | bool
112-
- nvidia_container_toolkit_docker_runtime.rc != 0
113-
changed_when: true
114-
notify: restart docker
17+
ansible.builtin.include_tasks: docker.yml
18+
when: nvidia_container_toolkit_configure_docker | bool
Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
---
2+
- name: Red Hat | install repository prerequisites
3+
ansible.builtin.dnf:
4+
name: "{{ nvidia_container_toolkit_rpm_prerequisites }}"
5+
state: present
6+
when: nvidia_container_toolkit_repo_base_url | length > 0
7+
8+
- name: Red Hat | configure NVIDIA Container Toolkit Yum repository
9+
ansible.builtin.get_url:
10+
url: "{{ nvidia_container_toolkit_rpm_repo_url }}"
11+
dest: "{{ nvidia_container_toolkit_yum_repo_path }}"
12+
owner: root
13+
group: root
14+
mode: "0644"
15+
register: nvidia_container_toolkit_yum_repo
16+
environment: "{{ proxy_env if proxy_env is defined else {} }}"
17+
18+
- name: Red Hat | install NVIDIA Container Toolkit
19+
ansible.builtin.dnf:
20+
name: "{{ nvidia_container_toolkit_package }}"
21+
state: present
22+
update_cache: "{{ nvidia_container_toolkit_yum_repo.changed }}"
23+
environment: "{{ proxy_env if proxy_env is defined else {} }}"
Lines changed: 72 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,72 @@
1+
---
2+
- name: Ubuntu | set package architecture
3+
ansible.builtin.set_fact:
4+
nvidia_container_toolkit_deb_arch: "{{ _nvidia_container_toolkit_arch_map.get(ansible_architecture, ansible_architecture) }}"
5+
vars:
6+
_nvidia_container_toolkit_arch_map:
7+
aarch64: arm64
8+
arm64: arm64
9+
x86_64: amd64
10+
11+
- name: Ubuntu | install repository prerequisites
12+
ansible.builtin.apt:
13+
name:
14+
- ca-certificates
15+
- gnupg
16+
state: present
17+
update_cache: true
18+
when: nvidia_container_toolkit_repo_base_url | length > 0
19+
20+
- name: Ubuntu | ensure keyring directory exists
21+
ansible.builtin.file:
22+
path: "{{ nvidia_container_toolkit_keyring_path | dirname }}"
23+
state: directory
24+
owner: root
25+
group: root
26+
mode: "0755"
27+
28+
- name: Ubuntu | check NVIDIA Container Toolkit keyring
29+
ansible.builtin.stat:
30+
path: "{{ nvidia_container_toolkit_keyring_path }}"
31+
register: nvidia_container_toolkit_keyring
32+
33+
- name: Ubuntu | download NVIDIA Container Toolkit GPG key
34+
ansible.builtin.get_url:
35+
url: "{{ nvidia_container_toolkit_repo_gpg_url }}"
36+
dest: "{{ nvidia_container_toolkit_keyring_ascii_path }}"
37+
owner: root
38+
group: root
39+
mode: "0644"
40+
register: nvidia_container_toolkit_key
41+
environment: "{{ proxy_env if proxy_env is defined else {} }}"
42+
43+
- name: Ubuntu | install NVIDIA Container Toolkit GPG keyring
44+
ansible.builtin.command:
45+
cmd: "gpg --dearmor --yes -o {{ nvidia_container_toolkit_keyring_path }} {{ nvidia_container_toolkit_keyring_ascii_path }}"
46+
when: nvidia_container_toolkit_key.changed or not nvidia_container_toolkit_keyring.stat.exists
47+
register: nvidia_container_toolkit_keyring_install
48+
changed_when: true
49+
50+
- name: Ubuntu | set NVIDIA Container Toolkit GPG keyring permissions
51+
ansible.builtin.file:
52+
path: "{{ nvidia_container_toolkit_keyring_path }}"
53+
owner: root
54+
group: root
55+
mode: "0644"
56+
57+
- name: Ubuntu | configure NVIDIA Container Toolkit APT repository
58+
ansible.builtin.copy:
59+
content: |
60+
deb [signed-by={{ nvidia_container_toolkit_keyring_path }}] {{ nvidia_container_toolkit_repo_base_url }}/stable/deb/{{ nvidia_container_toolkit_deb_arch }} /
61+
dest: "{{ nvidia_container_toolkit_apt_source_path }}"
62+
owner: root
63+
group: root
64+
mode: "0644"
65+
register: nvidia_container_toolkit_apt_source
66+
67+
- name: Ubuntu | install NVIDIA Container Toolkit
68+
ansible.builtin.apt:
69+
name: "{{ nvidia_container_toolkit_package }}"
70+
state: present
71+
update_cache: "{{ nvidia_container_toolkit_apt_source.changed or nvidia_container_toolkit_keyring_install.changed | default(false) }}"
72+
environment: "{{ proxy_env if proxy_env is defined else {} }}"
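
The Ubuntu tasks map the kernel architecture reported by Ansible to a Debian package architecture and then embed it in the APT source line. The sketch below reproduces that mapping and the line template in Python as a quick way to see what ends up in the `.list` file. The function names are hypothetical, and the base URL used in the example is an assumption taken from the repo file URL in the documentation change, since the role's default value is truncated in this diff.

```python
# Same mapping as _nvidia_container_toolkit_arch_map in the task file.
ARCH_MAP = {
    "aarch64": "arm64",
    "arm64": "arm64",
    "x86_64": "amd64",
}


def deb_arch(machine: str) -> str:
    """Map a machine architecture to its Debian name, falling back to the input."""
    return ARCH_MAP.get(machine, machine)


def apt_source_line(keyring_path: str, repo_base_url: str, machine: str) -> str:
    """Build the single 'deb' line written to the APT source file."""
    return (
        f"deb [signed-by={keyring_path}] "
        f"{repo_base_url}/stable/deb/{deb_arch(machine)} /"
    )


if __name__ == "__main__":
    print(apt_source_line(
        "/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg",
        "https://nvidia.github.io/libnvidia-container",  # assumed base URL
        "aarch64",
    ))
```

The fallback in `deb_arch` mirrors the `.get(ansible_architecture, ansible_architecture)` call, so an unmapped architecture is passed through unchanged rather than causing a failure at this step.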
