-|`nginx_docker_cache_image`|`"rpardini/docker-registry-proxy:0.6.1"`| Container image used to deploy the proxy |
-|`nginx_docker_cache_registry_string`|`"quay.io k8s.gcr.io gcr.io nvcr.io"`| Space-separated list of registries to proxy|
+|`nginx_docker_cache_image`|`"rpardini/docker-registry-proxy:0.6.5"`| Container image used to deploy the proxy |
+|`nginx_docker_cache_registry_string`|`"registry.k8s.io quay.io k8s.gcr.io gcr.io nvcr.io"`| Space-separated list of registries to proxy; `k8s.gcr.io` is retained for older clusters while current Kubernetes images use `registry.k8s.io`|
 |`nginx_docker_cache_manifests`|`"false"`| Flag to determine whether to cache image manifests |
 |`nginx_docker_cache_manifest_default_time`| "1h" | If manifests are cached, time to cache them |
 |`nginx_docker_cache_hostgroup`|`"cache"`| Ansible inventory host group where proxy is deployed |
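To illustrate how the space-separated `nginx_docker_cache_registry_string` value is read, the sketch below checks whether an image's registry host appears in the list. `is_proxied` and `REGISTRY_STRING` are hypothetical names, not part of DeepOps or docker-registry-proxy. The sketch assumes every image reference starts with an explicit registry host, and it does not model how the proxy treats Docker Hub. The `ghcr.io/example/app:1.0` image is a made-up placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of DeepOps): check whether an image's registry
# host is listed in a space-separated registry string like the new default.
REGISTRY_STRING="registry.k8s.io quay.io k8s.gcr.io gcr.io nvcr.io"

is_proxied() {
  local image="$1" host reg
  # Assumes the reference begins with an explicit registry host,
  # so the host is everything before the first '/'.
  host="${image%%/*}"
  # Intentionally unquoted: split the string on spaces into registry names.
  for reg in $REGISTRY_STRING; do
    # Exact match only, so "k8s.gcr.io" does not match the "gcr.io" entry.
    if [[ "$reg" == "$host" ]]; then
      return 0
    fi
  done
  return 1
}

# Usage: both the current and the legacy Kubernetes registries match.
for img in registry.k8s.io/pause:3.9 k8s.gcr.io/pause:3.2 ghcr.io/example/app:1.0; do
  if is_proxied "$img"; then
    echo "proxied: $img"
  else
    echo "not proxied: $img"
  fi
done
```

Under the old default (`"quay.io k8s.gcr.io gcr.io nvcr.io"`), the `registry.k8s.io/pause:3.9` reference would not have matched. This is why the new default lists both hosts.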
docs/k8s-cluster/kubernetes-usage.md
Lines changed: 5 additions & 5 deletions
@@ -10,7 +10,7 @@ Kubernetes Usage Guide

 ## Introduction

-Most of the following examples can be configured and executed through the Kubernetes Dashboard. For a basic run-through on how to leverage the Kubernetes Dashboard, please see the [official documentation](https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/). The following examples `kubectl` on the master node instead.
+Most of the following examples can be configured and executed through the Kubernetes Dashboard. For a basic run-through on how to leverage the Kubernetes Dashboard, please see the [official documentation](https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/). The following examples use `kubectl` on the master node instead.

 ## Simple Commands

@@ -63,12 +63,12 @@ kubectl get pods --all-namespaces
 4. Delete the job (and the corresponding pod).

 ```bash
-kubectl delete job cuda-job
+kubectl delete job pytorch-job
 ```

 ## Using NGC Containers with Kubernetes and Launching Jobs

-[NVIDIA GPU Cloud (NGC)](https://docs.nvidia.com/ngc/ngc-introduction) manages a catalog of fully integrated and optimized DL framework containers that take full advantage of NVIDIA GPUs in both single and multi-GPU configurations. They include NVIDIA CUDA® Toolkit, DIGITS workflow, and the following DL frameworks: NVCaffe, Caffe2, Microsoft Cognitive Toolkit (CNTK), MXNet, PyTorch, TensorFlow, Theano, and Torch. These framework containers are delivered ready-to-run, including all necessary dependencies such as the CUDA runtime and NVIDIA libraries.
+[NVIDIA GPU Cloud (NGC)](https://docs.nvidia.com/ngc/ngc-introduction) manages a catalog of optimized GPU containers for CUDA, PyTorch, TensorFlow, Triton Inference Server, RAPIDS, and other NVIDIA software. Use the NGC catalog and the NVIDIA framework container release notes to choose the current image for your workload.

 To access the NGC container registry via Kubernetes, add a secret which will be employed when Kubernetes asks NGC to pull container images from it.

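To show what the NGC pull secret contains, the minimal sketch below prints a `kubernetes.io/dockerconfigjson` Secret manifest for `nvcr.io`. The `.dockerconfigjson` value is a base64-encoded Docker config JSON whose `auth` field is itself `base64("username:password")`. NGC API-key logins use the literal username `$oauthtoken`. The function name `ngc_secret_manifest`, the default secret name `ngc-secret`, and the `NGC_API_KEY` variable are hypothetical. The sketch also assumes the key contains no characters that need JSON escaping.

```bash
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): print a Secret manifest that lets
# Kubernetes authenticate to nvcr.io when pulling NGC images.
# Assumes the API key contains no characters that need JSON escaping.

ngc_secret_manifest() {
  local api_key="$1" name="${2:-ngc-secret}"
  local user='$oauthtoken'   # literal username used for NGC API-key logins
  local auth config
  # "auth" holds base64("username:password"), matching Docker's config.json format.
  auth=$(printf '%s:%s' "$user" "$api_key" | base64 | tr -d '\n')
  config=$(printf '{"auths":{"nvcr.io":{"username":"%s","password":"%s","auth":"%s"}}}' \
    "$user" "$api_key" "$auth")
  # The whole Docker config JSON is base64-encoded again as the Secret's data.
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${name}
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: $(printf '%s' "$config" | base64 | tr -d '\n')
EOF
}

# Usage (NGC_API_KEY is a placeholder you export yourself):
# ngc_secret_manifest "$NGC_API_KEY" | kubectl apply -f -
```

In practice, `kubectl create secret docker-registry ngc-secret --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password="$NGC_API_KEY"` builds the same kind of object. Pods then reference the secret by name under `imagePullSecrets`.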
@@ -105,9 +105,9 @@ To access the NGC container registry via Kubernetes, add a secret which will be
docs/slurm-cluster/slurm-perf-cluster.md
Lines changed: 3 additions & 3 deletions
@@ -254,7 +254,7 @@ If errors are noticed when running `sinfo -R`, it's also helpful to search the l
 sudo journalctl -e | grep slurm
 ```

-To re-run the test manually, from the slurm login node...
+To re-run the test manually, run the following from the slurm login node. Replace `registry.example.com/hpc/nccl-tests:latest` with your site's current NCCL tests image or a `.sqsh` image built by `playbooks/slurm-cluster/slurm-validation.yml`.
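This sketch applies only if the manual re-run starts the container through the pyxis `srun --container-image` option. That is an assumption, because this hunk does not show the test command itself. Pyxis/enroot accept `[USER@][REGISTRY#]IMAGE[:TAG]` or a path to a squashfs file, so a registry host is separated from the image path with `#` rather than `/`. The hypothetical helper `pyxis_image` converts a reference such as the placeholder `registry.example.com/hpc/nccl-tests:latest` into that form. It passes `.sqsh` paths through unchanged. The path `/shared/images/nccl-tests.sqsh` is made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn an image reference into the form expected by
# pyxis/enroot (--container-image=[USER@][REGISTRY#]IMAGE[:TAG]|PATH).

pyxis_image() {
  local ref="$1" first
  # A local squashfs image (e.g. one built by slurm-validation.yml) is used as a path.
  if [[ "$ref" == *.sqsh ]]; then
    printf '%s\n' "$ref"
    return
  fi
  first="${ref%%/*}"
  # Treat the first component as a registry host if it looks like one
  # (contains a dot or a port, or is localhost); enroot separates it with '#'.
  if [[ "$ref" == */* && ( "$first" == *.* || "$first" == *:* || "$first" == localhost ) ]]; then
    printf '%s#%s\n' "$first" "${ref#*/}"
  else
    # No explicit registry host: leave unchanged (enroot uses its default registry).
    printf '%s\n' "$ref"
  fi
}

# Usage with the placeholder image from the docs:
pyxis_image registry.example.com/hpc/nccl-tests:latest   # prints registry.example.com#hpc/nccl-tests:latest
pyxis_image /shared/images/nccl-tests.sqsh                # prints /shared/images/nccl-tests.sqsh
```

The result would then be passed as `srun --container-image=...`. The remaining `srun` and NCCL test arguments depend on the site's validation setup and are not reproduced here.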