
Commit 7364f8f

[Docs] Renamed Fine-tuning to Single-node training for more clarity and consistency

1 parent 874099c

40 files changed: 85 additions & 552 deletions

docs/blog/posts/intel-gaudi.md (2 additions & 2 deletions)

````diff
@@ -98,7 +98,7 @@ model using [Optimum for Intel Gaudi :material-arrow-top-right-thin:{ .external
 and [DeepSpeed :material-arrow-top-right-thin:{ .external }](https://docs.habana.ai/en/latest/PyTorch/DeepSpeed/DeepSpeed_User_Guide/DeepSpeed_User_Guide.html#deepspeed-user-guide){:target="_blank"} with
 the [`lvwerra/stack-exchange-paired` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/datasets/lvwerra/stack-exchange-paired){:target="_blank"} dataset:
 
-<div editor-title="examples/fine-tuning/trl/intel/.dstack.yml">
+<div editor-title="examples/single-node-training/trl/intel/.dstack.yml">
 
 ```yaml
 type: task
@@ -152,7 +152,7 @@ Submit the task using the [`dstack apply`](../../docs/reference/cli/dstack/apply
 <div class="termy">
 
 ```shell
-$ dstack apply -f examples/fine-tuning/trl/intel/.dstack.yml -R
+$ dstack apply -f examples/single-node-training/trl/intel/.dstack.yml -R
 ```
 
 </div>
````
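The renamed `.dstack.yml` is truncated in this diff right after `type: task`. For orientation, here is a minimal sketch of the shape such a dstack task takes; everything below the first line is an assumption, not the actual file contents:

```yaml
type: task
# Hypothetical values throughout -- the diff above truncates the real
# file at `type: task`.
name: trl-train-gaudi
env:
  - HF_TOKEN
commands:
  # Assumed entry point, inferred from the surrounding prose
  # (Optimum for Intel Gaudi + DeepSpeed on stack-exchange-paired)
  - pip install -r requirements.txt
  - python examples/single-node-training/trl/intel/train.py
resources:
  gpu: gaudi2:8  # assumption: eight Gaudi 2 accelerators
```

The commit itself only touches the `examples/fine-tuning/` → `examples/single-node-training/` paths; the rest of the file is unchanged.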

docs/blog/posts/tpu-on-gcp.md (3 additions & 3 deletions)

````diff
@@ -158,7 +158,7 @@ Below is an example of fine-tuning Llama 3.1 8B using [Optimum TPU :material-arr
 and the [Abirate/english_quotes :material-arrow-top-right-thin:{ .external }](https://huggingface.co/datasets/Abirate/english_quotes){:target="_blank"}
 dataset.
 
-<div editor-title="examples/fine-tuning/optimum-tpu/llama31/train.dstack.yml">
+<div editor-title="examples/single-node-training/optimum-tpu/llama31/train.dstack.yml">
 
 ```yaml
 type: task
@@ -171,8 +171,8 @@ env:
 commands:
   - git clone -b add_llama_31_support https://github.com/dstackai/optimum-tpu.git
   - mkdir -p optimum-tpu/examples/custom/
-  - cp examples/fine-tuning/optimum-tpu/llama31/train.py optimum-tpu/examples/custom/train.py
-  - cp examples/fine-tuning/optimum-tpu/llama31/config.yaml optimum-tpu/examples/custom/config.yaml
+  - cp examples/single-node-training/optimum-tpu/llama31/train.py optimum-tpu/examples/custom/train.py
+  - cp examples/single-node-training/optimum-tpu/llama31/config.yaml optimum-tpu/examples/custom/config.yaml
   - cd optimum-tpu
   - pip install -e . -f https://storage.googleapis.com/libtpu-releases/index.html
   - pip install datasets evaluate
````
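Since most of the `commands` block is visible as diff context, the renamed `train.dstack.yml` can be pieced together roughly as below; the `name`, `env`, final training command, and `resources` entries are assumptions, as the diff truncates the file around them:

```yaml
type: task
# Assembled from the context lines in the diff above; name, env, the
# final command, and resources are assumptions.
name: optimum-tpu-llama31
env:
  - HF_TOKEN  # assumed: gated Llama 3.1 weights need a Hugging Face token
commands:
  - git clone -b add_llama_31_support https://github.com/dstackai/optimum-tpu.git
  - mkdir -p optimum-tpu/examples/custom/
  - cp examples/single-node-training/optimum-tpu/llama31/train.py optimum-tpu/examples/custom/train.py
  - cp examples/single-node-training/optimum-tpu/llama31/config.yaml optimum-tpu/examples/custom/config.yaml
  - cd optimum-tpu
  - pip install -e . -f https://storage.googleapis.com/libtpu-releases/index.html
  - pip install datasets evaluate
  - python examples/custom/train.py  # hypothetical final step, not shown in the diff
resources:
  gpu: v5litepod-8  # hypothetical TPU spec; the real resources block is truncated
```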

docs/docs/concepts/tasks.md (3 additions & 3 deletions)

````diff
@@ -10,7 +10,7 @@ The filename must end with `.dstack.yml` (e.g. `.dstack.yml` or `dev.dstack.yml`
 
 [//]: # (TODO: Make tabs - single machine & distributed tasks & web app)
 
-<div editor-title="examples/fine-tuning/axolotl/train.dstack.yml">
+<div editor-title="examples/single-node-training/axolotl/train.dstack.yml">
 
 ```yaml
 type: task
@@ -26,7 +26,7 @@ env:
   - WANDB_API_KEY
 # Commands of the task
 commands:
-  - accelerate launch -m axolotl.cli.train examples/fine-tuning/axolotl/config.yaml
+  - accelerate launch -m axolotl.cli.train examples/single-node-training/axolotl/config.yaml
 
 resources:
   gpu:
@@ -461,4 +461,4 @@ it does not block other runs with lower priority from scheduling.
 !!! info "What's next?"
     1. Read about [dev environments](dev-environments.md), [services](services.md), and [repos](repos.md)
     2. Learn how to manage [fleets](fleets.md)
-    3. Check the [Axolotl](/examples/fine-tuning/axolotl) example
+    3. Check the [Axolotl](/examples/single-node-training/axolotl) example
````
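Joining the fragments from the first two hunks, the renamed Axolotl task plausibly reads as follows; the `name`, `image`, `HF_TOKEN` entry, and the GPU constraint are assumptions, since the diff truncates those parts:

```yaml
type: task
# Assembled from the context lines above; name, image, HF_TOKEN, and
# the GPU constraint are assumptions.
name: axolotl-train
image: axolotlai/axolotl:main-latest  # hypothetical image tag
env:
  - HF_TOKEN  # assumed to sit alongside WANDB_API_KEY
  - WANDB_API_KEY
# Commands of the task
commands:
  - accelerate launch -m axolotl.cli.train examples/single-node-training/axolotl/config.yaml

resources:
  gpu:
    memory: 24GB..  # hypothetical range; the actual constraint is not shown
```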

docs/examples.md (34 additions & 58 deletions)

````diff
@@ -12,10 +12,10 @@ hide:
 }
 </style>
 
-## Fine-tuning
+## Single-node training
 
 <div class="tx-landing__highlights_grid">
-<a href="/examples/fine-tuning/axolotl"
+<a href="/examples/single-node-training/axolotl"
 class="feature-cell">
 <h3>
 Axolotl
@@ -26,7 +26,7 @@ hide:
 </p>
 </a>
 
-<a href="/examples/fine-tuning/trl"
+<a href="/examples/single-node-training/trl"
 class="feature-cell">
 <h3>
 TRL
@@ -38,85 +38,86 @@ hide:
 </a>
 </div>
 
-## Clusters
+## Distributed training
 
 <div class="tx-landing__highlights_grid">
-<a href="/examples/clusters/nccl-tests"
+<a href="/examples/distributed-training/ray-ragen"
 class="feature-cell sky">
 <h3>
-NCCL tests
+Ray+RAGEN
 </h3>
 
 <p>
-Run multi-node NCCL tests with MPI
+Fine-tune an agent on multiple nodes
+with RAGEN, verl, and Ray.
 </p>
 </a>
-<a href="/examples/clusters/rccl-tests"
+<a href="/examples/distributed-training/trl"
 class="feature-cell sky">
 <h3>
-RCCL tests
+TRL
 </h3>
 
 <p>
-Run multi-node RCCL tests with MPI
+Fine-tune LLM on multiple nodes
+with TRL, Accelerate, and Deepspeed.
 </p>
 </a>
-<a href="/examples/clusters/a3mega"
+<a href="/examples/distributed-training/axolotl"
 class="feature-cell sky">
 <h3>
-A3 Mega
+Axolotl
 </h3>
 
 <p>
-Set up GCP A3 Mega clusters with optimized networking
+Fine-tune LLM on multiple nodes
+with Axolotl.
 </p>
 </a>
-<a href="/examples/clusters/a3high"
+</div>
+
+
+## Clusters
+
+<div class="tx-landing__highlights_grid">
+<a href="/examples/clusters/nccl-tests"
 class="feature-cell sky">
 <h3>
-A3 High
+NCCL tests
 </h3>
 
 <p>
-Set up GCP A3 High clusters with optimized networking
+Run multi-node NCCL tests with MPI
 </p>
 </a>
-</div>
-
-## Distributed training
-
-<div class="tx-landing__highlights_grid">
-<a href="/examples/distributed-training/ray-ragen"
+<a href="/examples/clusters/rccl-tests"
 class="feature-cell sky">
 <h3>
-Ray+RAGEN
+RCCL tests
 </h3>
 
 <p>
-Fine-tune an agent on multiple nodes
-with RAGEN, verl, and Ray.
+Run multi-node RCCL tests with MPI
 </p>
 </a>
-<a href="/examples/distributed-training/trl"
+<a href="/examples/clusters/a3mega"
 class="feature-cell sky">
 <h3>
-TRL
+A3 Mega
 </h3>
 
 <p>
-Fine-tune LLM on multiple nodes
-with TRL, Accelerate, and Deepspeed.
+Set up GCP A3 Mega clusters with optimized networking
 </p>
 </a>
-<a href="/examples/distributed-training/axolotl"
+<a href="/examples/clusters/a3high"
 class="feature-cell sky">
 <h3>
-Axolotl
+A3 High
 </h3>
 
 <p>
-Fine-tune LLM on multiple nodes
-with Axolotl.
+Set up GCP A3 High clusters with optimized networking
 </p>
 </a>
 </div>
@@ -219,31 +220,6 @@ hide:
 </a>
 </div>
 
-## LLMs
-
-<div class="tx-landing__highlights_grid">
-<a href="/examples/llms/deepseek"
-class="feature-cell sky">
-<h3>
-Deepseek
-</h3>
-
-<p>
-Deploy and train Deepseek models
-</p>
-</a>
-<a href="/examples/llms/llama"
-class="feature-cell sky">
-<h3>
-Llama
-</h3>
-
-<p>
-Deploy Llama 4 models
-</p>
-</a>
-</div>
-
 ## Misc
 
 <div class="tx-landing__highlights_grid">
````
File renamed without changes.
File renamed without changes.

docs/overrides/main.html (1 addition & 2 deletions)

````diff
@@ -117,12 +117,11 @@
 
 <div class="tx-footer__section">
 <div class="tx-footer__section-title">Examples</div>
-<a href="/examples#fine-tuning" class="tx-footer__section-link">Fine-tuning</a>
+<a href="/examples#fine-tuning" class="tx-footer__section-link">Single-node training</a>
 <a href="/examples#clusters" class="tx-footer__section-link">Clusters</a>
 <a href="/examples#distributed-training" class="tx-footer__section-link">Distributed training</a>
 <a href="/examples#inference" class="tx-footer__section-link">Inference</a>
 <a href="/examples#accelerators" class="tx-footer__section-link">Accelerators</a>
-<a href="/examples#llms" class="tx-footer__section-link">LLMs</a>
 <!-- <a href="/examples#misc" class="tx-footer__section-link">Misc</a> -->
 </div>
 
````

examples/accelerators/amd/README.md (6 additions & 6 deletions)

````diff
@@ -114,7 +114,7 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
 and the [`mlabonne/guanaco-llama2-1k` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k){:target="_blank"}
 dataset.
 
-<div editor-title="examples/fine-tuning/trl/amd/.dstack.yml">
+<div editor-title="examples/single-node-training/trl/amd/.dstack.yml">
 
 ```yaml
 type: task
@@ -140,7 +140,7 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
   - pip install peft
   - pip install transformers datasets huggingface-hub scipy
   - cd ..
-  - python examples/fine-tuning/trl/amd/train.py
+  - python examples/single-node-training/trl/amd/train.py
 
 # Uncomment to leverage spot instances
 #spot_policy: auto
@@ -157,7 +157,7 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
 and the [tatsu-lab/alpaca :material-arrow-top-right-thin:{ .external }](https://huggingface.co/datasets/tatsu-lab/alpaca){:target="_blank"}
 dataset.
 
-<div editor-title="examples/fine-tuning/axolotl/amd/.dstack.yml">
+<div editor-title="examples/single-node-training/axolotl/amd/.dstack.yml">
 
 ```yaml
 type: task
@@ -213,7 +213,7 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
 
 > To speed up installation of `flash-attention` and `xformers `, we use pre-built binaries uploaded to S3.
 > You can find the tasks that build and upload the binaries
-> in [`examples/fine-tuning/axolotl/amd/` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/axolotl/amd/){:target="_blank"}.
+> in [`examples/single-node-training/axolotl/amd/` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/single-node-training/axolotl/amd/){:target="_blank"}.
 
 ## Running a configuration
 
@@ -238,8 +238,8 @@ $ dstack apply -f examples/inference/vllm/amd/.dstack.yml
 The source-code of this example can be found in
 [`examples/inference/tgi/amd` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/tgi/amd){:target="_blank"},
 [`examples/inference/vllm/amd` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/vllm/amd){:target="_blank"},
-[`examples/fine-tuning/axolotl/amd` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/axolotl/amd){:target="_blank"} and
-[`examples/fine-tuning/trl/amd` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/trl/amd){:target="_blank"}
+[`examples/single-node-training/axolotl/amd` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/single-node-training/axolotl/amd){:target="_blank"} and
+[`examples/single-node-training/trl/amd` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/single-node-training/trl/amd){:target="_blank"}
 
 ## What's next?
 
````
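From the first two hunks, the renamed TRL-on-AMD task assembles into roughly the sketch below; the clone/install steps before `pip install peft`, plus the `name`, `image`, and GPU spec, are assumptions, as the diff truncates those parts:

```yaml
type: task
# Assembled from the visible hunks; the first three commands, name,
# image, and the GPU spec are assumptions.
name: trl-train-amd
image: rocm/pytorch:latest  # hypothetical ROCm image
commands:
  - git clone https://github.com/huggingface/trl  # assumed earlier step
  - cd trl                                        # assumed earlier step
  - pip install .                                 # assumed earlier step
  - pip install peft
  - pip install transformers datasets huggingface-hub scipy
  - cd ..
  - python examples/single-node-training/trl/amd/train.py

# Uncomment to leverage spot instances
#spot_policy: auto

resources:
  gpu: MI300X  # assumption: an AMD accelerator, per the README's subject
```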
examples/accelerators/intel/README.md (1 addition & 1 deletion)

````diff
@@ -102,7 +102,7 @@ using [Optimum for Intel Gaudi :material-arrow-top-right-thin:{ .external }](htt
 and [DeepSpeed :material-arrow-top-right-thin:{ .external }](https://docs.habana.ai/en/latest/PyTorch/DeepSpeed/DeepSpeed_User_Guide/DeepSpeed_User_Guide.html#deepspeed-user-guide){:target="_blank"} with
 the [`lvwerra/stack-exchange-paired` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/datasets/lvwerra/stack-exchange-paired){:target="_blank"} dataset.
 
-<div editor-title="examples/fine-tuning/trl/intel/.dstack.yml">
+<div editor-title="examples/single-node-training/trl/intel/.dstack.yml">
 
 ```yaml
 type: task
````

examples/accelerators/tpu/README.md (5 additions & 5 deletions)

````diff
@@ -127,7 +127,7 @@ Below is an example of fine-tuning Llama 3.1 8B using [Optimum TPU :material-arr
 and the [`Abirate/english_quotes` :material-arrow-top-right-thin:{ .external }](https://huggingface.co/datasets/Abirate/english_quotes){:target="_blank"}
 dataset.
 
-<div editor-title="examples/fine-tuning/optimum-tpu/llama31/.dstack.yml">
+<div editor-title="examples/single-node-training/optimum-tpu/llama31/.dstack.yml">
 
 ```yaml
 type: task
@@ -139,8 +139,8 @@ env:
 commands:
   - git clone -b add_llama_31_support https://github.com/dstackai/optimum-tpu.git
   - mkdir -p optimum-tpu/examples/custom/
-  - cp examples/fine-tuning/optimum-tpu/llama31/train.py optimum-tpu/examples/custom/train.py
-  - cp examples/fine-tuning/optimum-tpu/llama31/config.yaml optimum-tpu/examples/custom/config.yaml
+  - cp examples/single-node-training/optimum-tpu/llama31/train.py optimum-tpu/examples/custom/train.py
+  - cp examples/single-node-training/optimum-tpu/llama31/config.yaml optimum-tpu/examples/custom/config.yaml
   - cd optimum-tpu
   - pip install -e . -f https://storage.googleapis.com/libtpu-releases/index.html
   - pip install datasets evaluate
@@ -155,7 +155,7 @@ resources:
 </div>
 
 [//]: # (### Fine-Tuning with TRL)
-[//]: # (Use the example `examples/fine-tuning/optimum-tpu/gemma/train.dstack.yml` to Finetune `Gemma-2B` model using `trl` with `dstack` and `optimum-tpu`. )
+[//]: # (Use the example `examples/single-node-training/optimum-tpu/gemma/train.dstack.yml` to Finetune `Gemma-2B` model using `trl` with `dstack` and `optimum-tpu`. )
 
 ### Memory requirements
 
@@ -181,7 +181,7 @@ Note, `v5litepod` is optimized for fine-tuning transformer-based models. Each co
 The source-code of this example can be found in
 [`examples/inference/tgi/tpu` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/tgi/tpu){:target="_blank"},
 [`examples/inference/vllm/tpu` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/inference/vllm/tpu){:target="_blank"},
-and [`examples/fine-tuning/optimum-tpu` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/trl){:target="_blank"}.
+and [`examples/single-node-training/optimum-tpu` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/single-node-training/trl){:target="_blank"}.
 
 ## What's next?
 
````
