
Commit 40d072c

Update examples (#3007)
* remove `dstack init` prerequisite, as repos are optional since 0.19.25
* replace repo paths with `files`

Co-authored-by: peterschmidt85 <andrey.cheptsov@gmail.com>
1 parent 0eaf494 commit 40d072c

File tree

26 files changed: +336, -411 lines changed

docs/blog/posts/amd-on-tensorwave.md

Lines changed: 15 additions & 16 deletions
@@ -1,20 +1,20 @@
 ---
 title: Using SSH fleets with TensorWave's private AMD cloud
 date: 2025-03-11
-description: "This tutorial walks you through how dstack can be used with TensorWave's private AMD cloud using SSH fleets."
+description: "This tutorial walks you through how dstack can be used with TensorWave's private AMD cloud using SSH fleets."
 slug: amd-on-tensorwave
 image: https://dstack.ai/static-assets/static-assets/images/dstack-tensorwave-v2.png
 categories:
   - Case studies
 ---

-# Using SSH fleets with TensorWave's private AMD cloud
+# Using SSH fleets with TensorWave's private AMD cloud

 Since last month, when we introduced support for private clouds and data centers, it has become easier to use `dstack`
 to orchestrate AI containers with any AI cloud vendor, whether they provide on-demand compute or reserved clusters.

 In this tutorial, we’ll walk you through how `dstack` can be used with
-[TensorWave :material-arrow-top-right-thin:{ .external }](https://tensorwave.com/){:target="_blank"} using
+[TensorWave :material-arrow-top-right-thin:{ .external }](https://tensorwave.com/){:target="_blank"} using
 [SSH fleets](../../docs/concepts/fleets.md#ssh).

 <img src="https://dstack.ai/static-assets/static-assets/images/dstack-tensorwave-v2.png" width="630"/>
@@ -32,13 +32,12 @@ TensorWave dashboard.
 ## Creating a fleet

 ??? info "Prerequisites"
-    Once `dstack` is [installed](https://dstack.ai/docs/installation), create a project repo folder and run `dstack init`.
+    Once `dstack` is [installed](https://dstack.ai/docs/installation), create a project folder.

     <div class="termy">

     ```shell
     $ mkdir tensorwave-demo && cd tensorwave-demo
-    $ dstack init
     ```

     </div>
@@ -79,9 +78,9 @@ $ dstack apply -f fleet.dstack.yml
 Provisioning...
 ---> 100%
- FLEET                INSTANCE  RESOURCES         STATUS    CREATED
- my-tensorwave-fleet  0         8xMI300X (192GB)  0/8 busy  3 mins ago
-                      1         8xMI300X (192GB)  0/8 busy  3 mins ago
+ FLEET                INSTANCE  RESOURCES         STATUS    CREATED
+ my-tensorwave-fleet  0         8xMI300X (192GB)  0/8 busy  3 mins ago
+                      1         8xMI300X (192GB)  0/8 busy  3 mins ago

 ```
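The status table above is produced by applying a fleet configuration that this hunk does not show. As a rough sketch, assuming dstack's documented SSH fleet schema (the user, key path, and host addresses below are placeholders, not values from the commit):

```yaml
type: fleet
# Must match the FLEET column in the output above
name: my-tensorwave-fleet

# SSH connection details for the reserved hosts (placeholder values)
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/tensorwave_id_rsa
  hosts:
    - 192.168.100.1
    - 192.168.100.2
```

Applying it with `dstack apply -f fleet.dstack.yml` provisions the hosts and prints the status table.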

@@ -98,7 +97,7 @@ Once the fleet is created, you can use `dstack` to run workloads.

 A dev environment lets you access an instance through your desktop IDE.

-<div editor-title=".dstack.yml">
+<div editor-title=".dstack.yml">

 ```yaml
 type: dev-environment
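# Illustrative continuation (not part of the commit): the hunk truncates the
# dev environment config after its opening line. Assuming dstack's documented
# dev-environment schema, a minimal complete file could continue like this:
name: vscode
# Open the provisioned instance in the desktop IDE via the printed link
ide: vscode
resources:
  gpu: MI300X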
@@ -137,9 +136,9 @@ Open the link to access the dev environment using your desktop IDE.
 A task allows you to schedule a job or run a web app. Tasks can be distributed and support port forwarding.

-Below is a distributed training task configuration:
+Below is a distributed training task configuration:

-<div editor-title="train.dstack.yml">
+<div editor-title="train.dstack.yml">

 ```yaml
 type: task
@@ -175,7 +174,7 @@ Provisioning `train-distrib`...

 </div>

-`dstack` automatically runs the container on each node while passing
+`dstack` automatically runs the container on each node while passing
 [system environment variables](../../docs/concepts/tasks.md#system-environment-variables)
 which you can use with `torchrun`, `accelerate`, or other distributed frameworks.

@@ -185,7 +184,7 @@ A service allows you to deploy a model or any web app as a scalable and secure e

 Create the following configuration file inside the repo:

-<div editor-title="deepseek.dstack.yml">
+<div editor-title="deepseek.dstack.yml">

 ```yaml
 type: service
@@ -196,7 +195,7 @@ env:
   - MODEL_ID=deepseek-ai/DeepSeek-R1
   - HSA_NO_SCRATCH_RECLAIM=1
 commands:
-  - python3 -m sglang.launch_server --model-path $MODEL_ID --port 8000 --tp 8 --trust-remote-code
+  - python3 -m sglang.launch_server --model-path $MODEL_ID --port 8000 --tp 8 --trust-remote-code
 port: 8000
 model: deepseek-ai/DeepSeek-R1
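# Illustrative sketch (not part of the commit): assembling the fragments
# visible in this hunk into one plausible `deepseek.dstack.yml`. The `image`
# and `resources` fields fall outside the hunk, so the values below are
# assumptions.
type: service
name: deepseek-r1-sglang
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1
  - HSA_NO_SCRATCH_RECLAIM=1
commands:
  - python3 -m sglang.launch_server --model-path $MODEL_ID --port 8000 --tp 8 --trust-remote-code
port: 8000
model: deepseek-ai/DeepSeek-R1
resources:
  gpu: MI300X:8   # assumed: --tp 8 implies eight GPUs per replica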

@@ -221,7 +220,7 @@ Submit the run `deepseek-r1-sglang`? [y/n]: y
 Provisioning `deepseek-r1-sglang`...
 ---> 100%

-Service is published at:
+Service is published at:
   http://localhost:3000/proxy/services/main/deepseek-r1-sglang/
 Model deepseek-ai/DeepSeek-R1 is published at:
   http://localhost:3000/proxy/models/main/
@@ -236,6 +235,6 @@ Want to see how it works? Check out the video below:
 <iframe width="750" height="520" src="https://www.youtube.com/embed/b1vAgm5fCfE?si=qw2gYHkMjERohdad&rel=0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

 !!! info "What's next?"
-    1. See [SSH fleets](../../docs/concepts/fleets.md#ssh)
+    1. See [SSH fleets](../../docs/concepts/fleets.md#ssh)
     2. Read about [dev environments](../../docs/concepts/dev-environments.md), [tasks](../../docs/concepts/tasks.md), and [services](../../docs/concepts/services.md)
     3. Join [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd)

examples/accelerators/amd/README.md

Lines changed: 36 additions & 33 deletions
@@ -1,22 +1,22 @@
 # AMD

 `dstack` supports running dev environments, tasks, and services on AMD GPUs.
-You can do that by setting up an [SSH fleet](https://dstack.ai/docs/concepts/fleets#ssh)
+You can do that by setting up an [SSH fleet](https://dstack.ai/docs/concepts/fleets#ssh)
 with on-prem AMD GPUs or configuring a backend that offers AMD GPUs such as the `runpod` backend.

 ## Deployment

-Most serving frameworks including vLLM and TGI have AMD support. Here's an example of a [service](https://dstack.ai/docs/services) that deploys
+Most serving frameworks including vLLM and TGI have AMD support. Here's an example of a [service](https://dstack.ai/docs/services) that deploys
 Llama 3.1 70B in FP16 using [TGI :material-arrow-top-right-thin:{ .external }](https://huggingface.co/docs/text-generation-inference/en/installation_amd){:target="_blank"} and [vLLM :material-arrow-top-right-thin:{ .external }](https://docs.vllm.ai/en/latest/getting_started/amd-installation.html){:target="_blank"}.

 === "TGI"
-
-    <div editor-title="examples/inference/tgi/amd/.dstack.yml">
-
+
+    <div editor-title="examples/inference/tgi/amd/.dstack.yml">
+
     ```yaml
     type: service
     name: amd-service-tgi
-
+
     # Using the official TGI's ROCm Docker image
     image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
@@ -30,26 +30,26 @@ Llama 3.1 70B in FP16 using [TGI :material-arrow-top-right-thin:{ .external }](h
     port: 8000
     # Register the model
     model: meta-llama/Meta-Llama-3.1-70B-Instruct
-
+
     # Uncomment to leverage spot instances
     #spot_policy: auto
-
+
     resources:
       gpu: MI300X
       disk: 150GB
     ```
-
+
     </div>

 === "vLLM"

-    <div editor-title="examples/inference/vllm/amd/.dstack.yml">
-
+    <div editor-title="examples/inference/vllm/amd/.dstack.yml">
+
     ```yaml
     type: service
     name: llama31-service-vllm-amd
-
+
     # Using RunPod's ROCm Docker image
     image: runpod/pytorch:2.4.0-py3.10-rocm6.1.0-ubuntu22.04
     # Required environment variables
@@ -84,20 +84,20 @@ Llama 3.1 70B in FP16 using [TGI :material-arrow-top-right-thin:{ .external }](h
     port: 8000
     # Register the model
     model: meta-llama/Meta-Llama-3.1-70B-Instruct
-
+
     # Uncomment to leverage spot instances
     #spot_policy: auto
-
+
     resources:
       gpu: MI300X
       disk: 200GB
     ```
     </div>

 Note, maximum size of vLLM’s `KV cache` is 126192, consequently we must set `MAX_MODEL_LEN` to 126192. Adding `/opt/conda/envs/py_3.10/bin` to PATH ensures we use the Python 3.10 environment necessary for the pre-built binaries compiled specifically for this version.
-
-> To speed up the `vLLM-ROCm` installation, we use a pre-built binary from S3.
-> You can find the task to build and upload the binary in
+
+> To speed up the `vLLM-ROCm` installation, we use a pre-built binary from S3.
+> You can find the task to build and upload the binary in
 > [`examples/inference/vllm/amd/` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/vllm/amd/){:target="_blank"}.

 !!! info "Docker image"
@@ -110,22 +110,25 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by

 === "TRL"

-    Below is an example of LoRA fine-tuning Llama 3.1 8B using [TRL :material-arrow-top-right-thin:{ .external }](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/single-gpu-fine-tuning-and-inference.html){:target="_blank"}
+    Below is an example of LoRA fine-tuning Llama 3.1 8B using [TRL :material-arrow-top-right-thin:{ .external }](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/single-gpu-fine-tuning-and-inference.html){:target="_blank"}
     and the [`mlabonne/guanaco-llama2-1k` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k){:target="_blank"}
     dataset.
-
+
     <div editor-title="examples/single-node-training/trl/amd/.dstack.yml">
-
+
     ```yaml
     type: task
     name: trl-amd-llama31-train
-
+
     # Using RunPod's ROCm Docker image
     image: runpod/pytorch:2.1.2-py3.10-rocm6.1-ubuntu22.04

     # Required environment variables
     env:
       - HF_TOKEN
+    # Mount files
+    files:
+      - train.py
     # Commands of the task
     commands:
       - export PATH=/opt/conda/envs/py_3.10/bin:$PATH
@@ -140,25 +143,25 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
       - pip install peft
       - pip install transformers datasets huggingface-hub scipy
       - cd ..
-      - python examples/single-node-training/trl/amd/train.py
-
+      - python train.py
+
     # Uncomment to leverage spot instances
     #spot_policy: auto
-
+
     resources:
       gpu: MI300X
       disk: 150GB
     ```
-
+
     </div>

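The `train.py` change above is the heart of this commit: instead of invoking the script by its repo-relative path (which required a cloned repo and, previously, `dstack init`), the script is mounted next to the run via `files`. A minimal standalone sketch of the pattern (the task name is illustrative):

```yaml
type: task
name: files-demo
# `files` mounts local paths into the run, so no repo checkout
# (or `dstack init`) is needed; repos are optional since 0.19.25
files:
  - train.py
commands:
  - python train.py
```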
 === "Axolotl"
-    Below is an example of fine-tuning Llama 3.1 8B using [Axolotl :material-arrow-top-right-thin:{ .external }](https://rocm.blogs.amd.com/artificial-intelligence/axolotl/README.html){:target="_blank"}
+    Below is an example of fine-tuning Llama 3.1 8B using [Axolotl :material-arrow-top-right-thin:{ .external }](https://rocm.blogs.amd.com/artificial-intelligence/axolotl/README.html){:target="_blank"}
     and the [tatsu-lab/alpaca :material-arrow-top-right-thin:{ .external }](https://huggingface.co/datasets/tatsu-lab/alpaca){:target="_blank"}
     dataset.
-
+
     <div editor-title="examples/single-node-training/axolotl/amd/.dstack.yml">
-
+
     ```yaml
     type: task
     # The name is optional, if not specified, generated randomly
@@ -198,9 +201,9 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
       - make
       - pip install .
       - cd ..
-      - accelerate launch -m axolotl.cli.train -- axolotl/examples/llama-3/fft-8b.yaml
-        --wandb-project "$WANDB_PROJECT"
-        --wandb-name "$WANDB_NAME"
+      - accelerate launch -m axolotl.cli.train -- axolotl/examples/llama-3/fft-8b.yaml
+        --wandb-project "$WANDB_PROJECT"
+        --wandb-name "$WANDB_NAME"
         --hub-model-id "$HUB_MODEL_ID"

     resources:
@@ -211,7 +214,7 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by

     Note, to support ROCm, we need to checkout to commit `d4f6c65`. This commit eliminates the need to manually modify the Axolotl source code to make xformers compatible with ROCm, as described in the [xformers workaround :material-arrow-top-right-thin:{ .external }](https://docs.axolotl.ai/docs/amd_hpc.html#apply-xformers-workaround). This installation approach is also followed for building Axolotl ROCm docker image. [(See Dockerfile) :material-arrow-top-right-thin:{ .external }](https://github.com/ROCm/rocm-blogs/blob/release/blogs/artificial-intelligence/axolotl/src/Dockerfile.rocm){:target="_blank"}.

-    > To speed up installation of `flash-attention` and `xformers `, we use pre-built binaries uploaded to S3.
+    > To speed up installation of `flash-attention` and `xformers `, we use pre-built binaries uploaded to S3.
     > You can find the tasks that build and upload the binaries
     > in [`examples/single-node-training/axolotl/amd/` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/single-node-training/axolotl/amd/){:target="_blank"}.

@@ -235,7 +238,7 @@ $ dstack apply -f examples/inference/vllm/amd/.dstack.yml

 ## Source code

-The source-code of this example can be found in
+The source-code of this example can be found in
 [`examples/inference/tgi/amd` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/tgi/amd){:target="_blank"},
 [`examples/inference/vllm/amd` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/vllm/amd){:target="_blank"},
 [`examples/single-node-training/axolotl/amd` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/single-node-training/axolotl/amd){:target="_blank"} and
