Commit 119f7b8

[Examples] Minor improvements regarding TRL and Axolotl
1 parent 6cfa0b6 · commit 119f7b8

14 files changed · 154 additions & 193 deletions


docs/examples.md

Lines changed: 17 additions & 17 deletions
@@ -15,43 +15,32 @@ hide:
 ## Single-node training
 
 <div class="tx-landing__highlights_grid">
-    <a href="/examples/single-node-training/axolotl"
+    <a href="/examples/single-node-training/trl"
        class="feature-cell">
         <h3>
-            Axolotl
+            TRL
         </h3>
 
         <p>
-            Fine-tune Llama 4 on a custom dataset using Axolotl.
+            Fine-tune Llama 3.1 8B on a custom dataset using TRL.
         </p>
     </a>
 
-    <a href="/examples/single-node-training/trl"
+    <a href="/examples/single-node-training/axolotl"
        class="feature-cell">
         <h3>
-            TRL
+            Axolotl
         </h3>
 
         <p>
-            Fine-tune Llama 3.1 8B on a custom dataset using TRL.
+            Fine-tune Llama 4 on a custom dataset using Axolotl.
         </p>
     </a>
 </div>
 
 ## Distributed training
 
 <div class="tx-landing__highlights_grid">
-    <a href="/examples/distributed-training/ray-ragen"
-       class="feature-cell sky">
-        <h3>
-            Ray+RAGEN
-        </h3>
-
-        <p>
-            Fine-tune an agent on multiple nodes
-            with RAGEN, verl, and Ray.
-        </p>
-    </a>
     <a href="/examples/distributed-training/trl"
        class="feature-cell sky">
         <h3>
@@ -74,6 +63,17 @@ hide:
             with Axolotl.
         </p>
     </a>
+    <a href="/examples/distributed-training/ray-ragen"
+       class="feature-cell sky">
+        <h3>
+            Ray+RAGEN
+        </h3>
+
+        <p>
+            Fine-tune an agent on multiple nodes
+            with RAGEN, verl, and Ray.
+        </p>
+    </a>
 </div>
 
 

docs/overrides/main.html

Lines changed: 1 addition & 1 deletion
@@ -118,8 +118,8 @@
     <div class="tx-footer__section">
         <div class="tx-footer__section-title">Examples</div>
         <a href="/examples#fine-tuning" class="tx-footer__section-link">Single-node training</a>
-        <a href="/examples#clusters" class="tx-footer__section-link">Clusters</a>
         <a href="/examples#distributed-training" class="tx-footer__section-link">Distributed training</a>
+        <a href="/examples#clusters" class="tx-footer__section-link">Clusters</a>
         <a href="/examples#inference" class="tx-footer__section-link">Inference</a>
         <a href="/examples#accelerators" class="tx-footer__section-link">Accelerators</a>
         <!-- <a href="/examples#misc" class="tx-footer__section-link">Misc</a> -->

examples/distributed-training/axolotl/.dstack.yml

Lines changed: 4 additions & 7 deletions
@@ -1,5 +1,4 @@
 type: task
-# The name is optional, if not specified, generated randomly
 name: axolotl-multi-node-qlora-llama3-70b
 
 # Size of the cluster
@@ -10,13 +9,12 @@ image: nvcr.io/nvidia/pytorch:25.01-py3
 # Required environment variables
 env:
   - HF_TOKEN
-  - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
-  - NCCL_DEBUG=INFO
-  - CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
-  - WANDB_NAME=axolotl-dist-llama-qlora-train
   - WANDB_PROJECT
   - HUB_MODEL_ID
+  - NCCL_DEBUG=INFO
+  - CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+  - ACCELERATE_LOG_LEVEL=info
 # Commands of the task
 commands:
   # Replacing the default Torch and FlashAttention in the NCG container with Axolotl-compatible versions.
@@ -35,15 +33,14 @@ commands:
       -m axolotl.cli.train qlora-fsdp-70b.yaml \
       --hub-model-id $HUB_MODEL_ID \
       --output-dir /checkpoints/qlora-llama3-70b \
-      --wandb-project $WANDB_PROJECT \
+      --wandb-project $DSTACK_RUN_NAME \
      --wandb-name $WANDB_NAME \
      --main_process_ip=$DSTACK_MASTER_NODE_IP \
      --main_process_port=8008 \
      --machine_rank=$DSTACK_NODE_RANK \
      --num_processes=$DSTACK_GPUS_NUM \
      --num_machines=$DSTACK_NODES_NUM
 
-
 resources:
   gpu: 80GB:8
   shm_size: 128GB

examples/distributed-training/axolotl/README.md

Lines changed: 17 additions & 18 deletions
@@ -1,6 +1,6 @@
 # Axolotl
 
-This example walks you through how to run distributed fine-tune using [Axolotl](https://github.com/axolotl-ai-cloud/axolotl) with `dstack`.
+This example walks you through how to run distributed fine-tune using [Axolotl :material-arrow-top-right-thin:{ .external }](https://github.com/axolotl-ai-cloud/axolotl){:target="_blank"} with `dstack`.
 
 ??? info "Prerequisites"
     Once `dstack` is [installed](https://dstack.ai/docs/installation), go ahead clone the repo, and run `dstack init`.
@@ -14,38 +14,35 @@ This example walks you through how to run distributed fine-tune using [Axolotl](
     ```
     </div>
 
-## Create fleet
+## Create a fleet
 
 Before submitting distributed training runs, make sure to create a fleet with a `placement` set to `cluster`.
 
 > For more detials on how to use clusters with `dstack`, check the [Clusters](https://dstack.ai/docs/guides/clusters) guide.
 
-## Run Distributed Training
+## Define a configuration
+
 Once the fleet is created, define a distributed task configuration. Here's an example of distributed `QLORA` task using `FSDP`.
 
 
 <div editor-title="examples/distributed-training/axolotl/.dstack.yml">
 
 ```yaml
 type: task
-# The name is optional, if not specified, generated randomly
 name: axolotl-multi-node-qlora-llama3-70b
 
-# Size of the cluster
 nodes: 2
 
-# The axolotlai/axolotl:main-latest image does not include InfiniBand or RDMA libraries, so we need to use the NGC container.
 image: nvcr.io/nvidia/pytorch:25.01-py3
-# Required environment variables
+
 env:
   - HF_TOKEN
-  - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
-  - NCCL_DEBUG=INFO
-  - CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
-  - WANDB_NAME=axolotl-dist-llama-qlora-train
   - WANDB_PROJECT
   - HUB_MODEL_ID
-# Commands of the task
+  - CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+  - NCCL_DEBUG=INFO
+  - ACCELERATE_LOG_LEVEL=info
+
 commands:
   # Replacing the default Torch and FlashAttention in the NCG container with Axolotl-compatible versions.
   # The preinstalled versions are incompatible with Axolotl.
@@ -64,7 +61,7 @@ commands:
       --hub-model-id $HUB_MODEL_ID \
       --output-dir /checkpoints/qlora-llama3-70b \
       --wandb-project $WANDB_PROJECT \
-      --wandb-name $WANDB_NAME \
+      --wandb-name $DSTACK_RUN_NAME \
      --main_process_ip=$DSTACK_MASTER_NODE_IP \
      --main_process_port=8008 \
      --machine_rank=$DSTACK_NODE_RANK \
@@ -80,10 +77,11 @@ volumes:
 ```
 </div>
 
-!!! Note
-    We are using the NGC container because it includes the necessary libraries and packages for RDMA and InfiniBand support.
+!!! info "Docker image"
+    We are using `nvcr.io/nvidia/pytorch:25.01-py3` from NGC because it includes the necessary libraries and packages for RDMA and InfiniBand support.
+
+### Apply the configuration
 
-### Applying the configuration
 To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md) command.
 
 <div class="termy">
@@ -112,5 +110,6 @@ The source-code of this example can be found in
 [`examples/distributed-training/axolotl` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/distributed-training/axolotl).
 
 !!! info "What's next?"
-    1. Check [dev environments](https://dstack.ai/docs/dev-environments), [tasks](https://dstack.ai/docs/tasks),
-       [services](https://dstack.ai/docs/services), [clusters](https://dstack.ai/docs/guides/clusters) and [protips](https://dstack.ai/docs/protips).
+    1. Read the [clusters](https://dstack.ai/docs/guides/clusters) guide
+    2. Check [dev environments](https://dstack.ai/docs/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks),
+       [services](https://dstack.ai/docs/concepts/services), and [fleets](https://dstack.ai/docs/concepts/fleets)

examples/single-node-training/pytorch-distributed/train.dstack.yml renamed to examples/distributed-training/pytorch-distributed/train.dstack.yml

File renamed without changes.
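
Both the Axolotl README above and the TRL README below instruct the reader to create a fleet with `placement` set to `cluster` before submitting distributed runs, but the fleet file itself is not part of this commit. Below is a minimal sketch of such a fleet configuration, assuming the standard `dstack` fleet schema; the file path, fleet name, and resource spec are illustrative only.

```yaml
# Hypothetical path: examples/distributed-training/fleet.dstack.yml
type: fleet
# Illustrative name; any name works
name: distrib-training-fleet

# Two interconnected nodes, matching the `nodes: 2` used by the tasks in this commit
nodes: 2
# Provision the nodes as a cluster so multi-node tasks can run on them
placement: cluster

resources:
  gpu: 80GB:8
```

With a fleet like this applied first, the distributed tasks in this commit can then be submitted to it with `dstack apply`.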

examples/distributed-training/trl/README.md

Lines changed: 13 additions & 19 deletions
@@ -1,6 +1,6 @@
 # TRL
 
-This example walks you through how to run distributed fine-tune using [TRL](https://github.com/huggingface/trl), [Accelerate](https://github.com/huggingface/accelerate) and [Deepspeed](https://github.com/deepspeedai/DeepSpeed) with `dstack`.
+This example walks you through how to run distributed fine-tune using [TRL :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/trl){:target="_blank"}, [Accelerate :material-arrow-top-right-thin:{ .external }](https://github.com/huggingface/accelerate){:target="_blank"} and [Deepspeed :material-arrow-top-right-thin:{ .external }](https://github.com/deepspeedai/DeepSpeed){:target="_blank"}.
 
 ??? info "Prerequisites"
     Once `dstack` is [installed](https://dstack.ai/docs/installation), go ahead clone the repo, and run `dstack init`.
@@ -20,32 +20,28 @@ Before submitting distributed training runs, make sure to create a fleet with a
 
 > For more detials on how to use clusters with `dstack`, check the [Clusters](https://dstack.ai/docs/guides/clusters) guide.
 
-## Run Distributed Training
-Once the fleet is created, define a distributed task configuration. Here's an example of distributed Supervised Fine-Tuning (SFT) task using `FSDP` and `Deepseed ZeRO-3`.
+## Define a configurtation
 
+Once the fleet is created, define a distributed task configuration. Here's an example of such a task.
 
 === "FSDP"
 
     <div editor-title="examples/distributed-training/trl/fsdp.dstack.yml">
    ```yaml
    type: task
-    # The name is optional, if not specified, generated randomly
    name: trl-train-fsdp-distrib
 
-    # Size of the cluster
    nodes: 2
 
    image: nvcr.io/nvidia/pytorch:25.01-py3
 
-    # Required environment variables
    env:
      - HF_TOKEN
      - ACCELERATE_LOG_LEVEL=info
      - WANDB_API_KEY
      - MODEL_ID=meta-llama/Llama-3.1-8B
      - HUB_MODEL_ID
 
-    # Commands of the task
    commands:
      - pip install transformers bitsandbytes peft wandb
      - git clone https://github.com/huggingface/trl
@@ -90,23 +86,19 @@ Once the fleet is created, define a distributed task configuration. Here's an ex
    <div editor-title="examples/distributed-training/trl/deepspeed.dstack.yml">
    ```yaml
    type: task
-    # The name is optional, if not specified, generated randomly
    name: trl-train-deepspeed-distrib
 
-    # Size of the cluster
    nodes: 2
 
    image: nvcr.io/nvidia/pytorch:25.01-py3
 
-    # Required environment variables
    env:
      - HF_TOKEN
-      - ACCELERATE_LOG_LEVEL=info
      - WANDB_API_KEY
-      - MODEL_ID=meta-llama/Llama-3.1-8B
      - HUB_MODEL_ID
+      - MODEL_ID=meta-llama/Llama-3.1-8B
+      - ACCELERATE_LOG_LEVEL=info
 
-    # Commands of the task
    commands:
      - pip install transformers bitsandbytes peft wandb deepspeed
      - git clone https://github.com/huggingface/trl
@@ -146,11 +138,11 @@ Once the fleet is created, define a distributed task configuration. Here's an ex
    ```
    </div>
 
+!!! info "Docker image"
+    We are using `nvcr.io/nvidia/pytorch:25.01-py3` from NGC because it includes the necessary libraries and packages for RDMA and InfiniBand support.
 
-!!! Note
-    We are using the NGC container because it includes the necessary libraries and packages for RDMA and InfiniBand support.
+### Apply the configuration
 
-### Applying the configuration
 To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/reference/cli/dstack/apply.md) command.
 
 <div class="termy">
@@ -175,8 +167,10 @@ Provisioning...
 ## Source code
 
 The source-code of this example can be found in
-[`examples/distributed-training/trl` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/distributed-training/trl).
+[`examples/distributed-training/trl` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/distributed-training/trl){:target="_blank"}.
 
 !!! info "What's next?"
-    1. Check [dev environments](https://dstack.ai/docs/dev-environments), [tasks](https://dstack.ai/docs/tasks),
-       [services](https://dstack.ai/docs/services),[clusters](https://dstack.ai/docs/guides/clusters) and [protips](https://dstack.ai/docs/protips).
+    1. Read the [clusters](https://dstack.ai/docs/guides/clusters) guide
+    2. Check [dev environments](https://dstack.ai/docs/concepts/dev-environments), [tasks](https://dstack.ai/docs/concepts/tasks),
+       [services](https://dstack.ai/docs/concepts/services), and [fleets](https://dstack.ai/docs/concepts/fleets)
+

examples/distributed-training/trl/deepspeed.dstack.yml

Lines changed: 2 additions & 3 deletions
@@ -1,5 +1,4 @@
 type: task
-# The name is optional, if not specified, generated randomly
 name: trl-train-deepspeed-distrib
 
 # Size of the cluster
@@ -10,10 +9,10 @@ image: nvcr.io/nvidia/pytorch:25.01-py3
 # Required environment variables
 env:
   - HF_TOKEN
-  - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
-  - MODEL_ID=meta-llama/Llama-3.1-8B
   - HUB_MODEL_ID
+  - MODEL_ID=meta-llama/Llama-3.1-8B
+  - ACCELERATE_LOG_LEVEL=info
 # Commands of the task
 commands:
   - pip install transformers bitsandbytes peft wandb deepspeed

examples/distributed-training/trl/fsdp.dstack.yml

Lines changed: 2 additions & 3 deletions
@@ -1,5 +1,4 @@
 type: task
-# The name is optional, if not specified, generated randomly
 name: trl-train-fsdp-distrib
 
 # Size of the cluster
@@ -10,10 +9,10 @@ image: nvcr.io/nvidia/pytorch:25.01-py3
 # Required environment variables
 env:
   - HF_TOKEN
-  - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
-  - MODEL_ID=meta-llama/Llama-3.1-8B
   - HUB_MODEL_ID
+  - MODEL_ID=meta-llama/Llama-3.1-8B
+  - ACCELERATE_LOG_LEVEL=info
 # Commands of the task
 commands:
   - pip install transformers bitsandbytes peft wandb

examples/single-node-training/axolotl/.dstack.yml

Lines changed: 5 additions & 2 deletions
@@ -10,12 +10,15 @@ env:
   - HF_TOKEN
   - WANDB_API_KEY
   - WANDB_PROJECT
-  - WANDB_NAME=axolotl-nvidia-llama-scout-train
   - HUB_MODEL_ID
 # Commands of the task
 commands:
   - wget https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/examples/llama-4/scout-qlora-fsdp1.yaml
-  - axolotl train scout-qlora-fsdp1.yaml --wandb-project $WANDB_PROJECT --wandb-name $WANDB_NAME --hub-model-id $HUB_MODEL_ID
+  - |
+    axolotl train scout-qlora-fsdp1.yaml \
+      --wandb-project $WANDB_PROJECT \
+      --wandb-name $DSTACK_RUN_NAME \
+      --hub-model-id $HUB_MODEL_ID
 
 resources:
   # Two GPU (required by FSDP)
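
The recurring change in these task files is dropping the hard-coded `WANDB_NAME` variables and passing `$DSTACK_RUN_NAME` instead, which `dstack` injects into every run, just as it provides the `DSTACK_MASTER_NODE_IP`, `DSTACK_NODE_RANK`, `DSTACK_GPUS_NUM`, and `DSTACK_NODES_NUM` variables used above. Here is a minimal sketch of the same pattern; the task name, training script, and its flags are hypothetical and not part of this commit.

```yaml
type: task
# Hypothetical task; only the $DSTACK_RUN_NAME usage mirrors this commit
name: wandb-run-name-demo

env:
  - WANDB_API_KEY
  - WANDB_PROJECT

commands:
  # DSTACK_RUN_NAME is set by dstack at runtime, so the W&B run name
  # tracks the dstack run without a per-config WANDB_NAME value
  - python train.py --wandb-project $WANDB_PROJECT --wandb-name $DSTACK_RUN_NAME

resources:
  gpu: 24GB
```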
