Commit c1c0f3b

[Docs] Improve Kubernetes documentation
Minor updates, incl. the description of `Default image`, and `privileged` for NCCL tests
1 parent 2b4c012 commit c1c0f3b

5 files changed

Lines changed: 19 additions & 17 deletions

docs/docs/concepts/dev-environments.md

Lines changed: 4 additions & 4 deletions
@@ -133,7 +133,7 @@ The `gpu` property lets you specify vendor, model, memory, and count — e.g., `

 If vendor is omitted, `dstack` infers it from the model or defaults to `nvidia`.

-??? info "Google Cloud TPU"
+<!-- ??? info "Google Cloud TPU"
 To use TPUs, specify its architecture via the `gpu` property.

 ```yaml
@@ -146,7 +146,7 @@ If vendor is omitted, `dstack` infers it from the model or defaults to `nvidia`.
 gpu: v2-8
 ```

-Currently, only 8 TPU cores can be specified, supporting single TPU device workloads. Multi-TPU support is coming soon.
+Currently, only 8 TPU cores can be specified, supporting single TPU device workloads. Multi-TPU support is coming soon. -->

 ??? info "Shared memory"
 If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure
@@ -159,8 +159,8 @@ If vendor is omitted, `dstack` infers it from the model or defaults to `nvidia`.

 #### Default image

-If you don't specify `image`, `dstack` uses its base Docker image pre-configured with
-`uv`, `python`, `pip`, essential CUDA drivers, and NCCL tests (under `/opt/nccl-tests/build`).
+If you don't specify `image`, `dstack` uses its [base :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/docker/base){:target="_blank"} Docker image pre-configured with
+`uv`, `python`, `pip`, essential CUDA drivers, `mpirun`, and NCCL tests (under `/opt/nccl-tests/build`).

 Set the `python` property to pre-install a specific version of Python.

docs/docs/concepts/services.md

Lines changed: 2 additions & 2 deletions
@@ -433,8 +433,8 @@ If vendor is omitted, `dstack` infers it from the model or defaults to `nvidia`.

 #### Default image

-If you don't specify `image`, `dstack` uses its base Docker image pre-configured with
-`uv`, `python`, `pip`, essential CUDA drivers, and NCCL tests (under `/opt/nccl-tests/build`).
+If you don't specify `image`, `dstack` uses its [base :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/docker/base){:target="_blank"} Docker image pre-configured with
+`uv`, `python`, `pip`, essential CUDA drivers, `mpirun`, and NCCL tests (under `/opt/nccl-tests/build`).

 Set the `python` property to pre-install a specific version of Python.

docs/docs/concepts/tasks.md

Lines changed: 2 additions & 4 deletions
@@ -229,8 +229,6 @@ If vendor is omitted, `dstack` infers it from the model or defaults to `nvidia`.
 <!-- ??? info "Google Cloud TPU"
 To use TPUs, specify its architecture via the `gpu` property.

-<!-- TODO: Add a TRL TPU example -->
-
 ```yaml
 type: task
 name: train
@@ -259,8 +257,8 @@ If vendor is omitted, `dstack` infers it from the model or defaults to `nvidia`.

 #### Default image

-If you don't specify `image`, `dstack` uses its base Docker image pre-configured with
-`uv`, `python`, `pip`, essential CUDA drivers, and NCCL tests (under `/opt/nccl-tests/build`).
+If you don't specify `image`, `dstack` uses its [base :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/docker/base){:target="_blank"} Docker image pre-configured with
+`uv`, `python`, `pip`, essential CUDA drivers, `mpirun`, and NCCL tests (under `/opt/nccl-tests/build`).

 Set the `python` property to pre-install a specific version of Python.

examples/clusters/nccl-tests/.dstack.yml

Lines changed: 3 additions & 0 deletions
@@ -21,6 +21,9 @@ commands:
 sleep infinity
 fi

+# Uncomment if the `kubernetes` backend requires it for `/dev/infiniband` access
+#privileged: true
+
 resources:
 gpu: nvidia:1..8
 shm_size: 16GB

examples/clusters/nccl-tests/README.md

Lines changed: 8 additions & 7 deletions
@@ -33,21 +33,22 @@ commands:
 sleep infinity
 fi

+# Uncomment if the `kubernetes` backend requires it for `/dev/infiniband` access
+#privileged: true
+
 resources:
 gpu: nvidia:1..8
 shm_size: 16GB
 ```

 </div>

-<!-- TODO: Need to stop using our EFA image - either make our default image cluster-friendly, or recommend using NGC or other images -->
+!!! info "Default image"
+If you don't specify `image`, `dstack` uses its [base :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/tree/master/docker/base){:target="_blank"} Docker image pre-configured with
+`uv`, `python`, `pip`, essential CUDA drivers, `mpirun`, and NCCL tests (under `/opt/nccl-tests/build`).

-!!! info "Docker image"
-The `dstackai/efa` image used in the example comes with MPI and NCCL tests pre-installed. While it is optimized for
-[AWS EFA :material-arrow-top-right-thin:{ .external }](https://aws.amazon.com/hpc/efa/){:target="_blank"}, it can also
-be used with regular TCP/IP network adapters and InfiniBand.
-
-See the [source code :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/docker/efa) for the image.
+!!! info "Privileged"
+In some cases, the backend (e.g., `kubernetes`) may require `privileged: true` to access the high-speed interconnect (e.g., InfiniBand).

 ### Apply a configuration

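For readers skimming this commit, the effect of the new `privileged` hint can be sketched as a trimmed-down task configuration with the line uncommented. This is an illustration only, not the contents of `examples/clusters/nccl-tests/.dstack.yml`: the `name`, the `nodes` count, and the single command are assumptions chosen for brevity. The `resources` values are copied from the diff above.

```yaml
# Illustrative sketch; `name`, `nodes`, and the command are assumptions.
type: task
name: nccl-tests-privileged
nodes: 2

# No `image` is set, so the default base image is used
commands:
  # Lists the NCCL test binaries that the default image ships
  - ls /opt/nccl-tests/build

# Enabled on the assumption that the `kubernetes` backend requires it
# for `/dev/infiniband` access; leave it commented out otherwise
privileged: true

resources:
  gpu: nvidia:1..8
  shm_size: 16GB
```

Whether `privileged: true` is actually needed depends on how the cluster exposes the interconnect, so it is worth trying the configuration without it first.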