Commit 99a88d3

Bihan and Bihan Rana authored

Update Axolotl Examples (#2502)

* Update Axolotl Examples

  Updated Axolotl Nvidia Example with Llama 4 Scout
  Update AMD axolotl example for dependency error

* Resolve Review Comments

Co-authored-by: Bihan Rana <bihan@Bihans-MacBook-Pro.local>
1 parent fb57f55 commit 99a88d3

File tree: 5 files changed, +62 −36 lines changed


docs/examples.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -71,7 +71,7 @@ hide:
   </h3>

   <p>
-    Fine-tune Llama 3 on a custom dataset using Axolotl.
+    Fine-tune Llama 4 on a custom dataset using Axolotl.
   </p>
 </a>
```

examples/accelerators/amd/README.md

Lines changed: 18 additions & 6 deletions
````diff
@@ -161,13 +161,18 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by

 ```yaml
 type: task
+# The name is optional, if not specified, generated randomly
 name: axolotl-amd-llama31-train
-
+
 # Using RunPod's ROCm Docker image
 image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04
 # Required environment variables
 env:
   - HF_TOKEN
+  - WANDB_API_KEY
+  - WANDB_PROJECT
+  - WANDB_NAME=axolotl-amd-llama31-train
+  - HUB_MODEL_ID
 # Commands of the task
 commands:
   - export PATH=/opt/conda/envs/py_3.10/bin:$PATH
@@ -177,6 +182,9 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
   - cd axolotl
   - git checkout d4f6c65
   - pip install -e .
+  # Latest pynvml is not compatible with axolotl commit d4f6c65, so we need to fall back to version 11.5.3
+  - pip uninstall pynvml -y
+  - pip install pynvml==11.5.3
   - cd ..
   - wget https://dstack-binaries.s3.amazonaws.com/flash_attn-2.0.4-cp310-cp310-linux_x86_64.whl
   - pip install flash_attn-2.0.4-cp310-cp310-linux_x86_64.whl
@@ -190,18 +198,18 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
   - make
   - pip install .
   - cd ..
-  - accelerate launch -m axolotl.cli.train axolotl/examples/llama-3/fft-8b.yaml
-
-# Uncomment to leverage spot instances
-#spot_policy: auto
+  - accelerate launch -m axolotl.cli.train -- axolotl/examples/llama-3/fft-8b.yaml
+    --wandb-project "$WANDB_PROJECT"
+    --wandb-name "$WANDB_NAME"
+    --hub-model-id "$HUB_MODEL_ID"

 resources:
   gpu: MI300X
   disk: 150GB
 ```
 </div>

-Note, to support ROCm, we need to checkout to commit `d4f6c65`. You can find the installation instruction in [rocm-blogs :material-arrow-top-right-thin:{ .external }](https://github.com/ROCm/rocm-blogs/blob/release/blogs/artificial-intelligence/axolotl/src/Dockerfile.rocm){:target="_blank"}.
+Note, to support ROCm, we need to checkout to commit `d4f6c65`. This commit eliminates the need to manually modify the Axolotl source code to make xformers compatible with ROCm, as described in the [xformers workaround :material-arrow-top-right-thin:{ .external }](https://docs.axolotl.ai/docs/amd_hpc.html#apply-xformers-workaround). This installation approach is also followed for building Axolotl ROCm docker image. [(See Dockerfile) :material-arrow-top-right-thin:{ .external }](https://github.com/ROCm/rocm-blogs/blob/release/blogs/artificial-intelligence/axolotl/src/Dockerfile.rocm){:target="_blank"}.

 > To speed up installation of `flash-attention` and `xformers`, we use pre-built binaries uploaded to S3.
 > You can find the tasks that build and upload the binaries
@@ -216,6 +224,10 @@ cloud resources and run the configuration.

 ```shell
 $ HF_TOKEN=...
+$ WANDB_API_KEY=...
+$ WANDB_PROJECT=...
+$ WANDB_NAME=axolotl-amd-llama31-train
+$ HUB_MODEL_ID=...
 $ dstack apply -f examples/deployment/vllm/amd/.dstack.yml
 ```

````
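The task above passes secrets and W&B settings through environment variables, so a missing export only surfaces at runtime. A small convenience sketch (not part of dstack; the variable list is copied from the task's `env` block) can catch that before `dstack apply`:

```python
# Convenience sketch, not part of dstack: verify the task's required
# environment variables are exported before running `dstack apply`.
import os

REQUIRED = ["HF_TOKEN", "WANDB_API_KEY", "WANDB_PROJECT", "WANDB_NAME", "HUB_MODEL_ID"]

def missing_env(required=REQUIRED):
    """Return the names that are unset or empty in the current environment."""
    return [name for name in required if not os.environ.get(name)]

if missing_env():
    print("Set these before `dstack apply`:", ", ".join(missing_env()))
```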

examples/fine-tuning/axolotl/.dstack.yml

Lines changed: 10 additions & 8 deletions

```diff
@@ -1,23 +1,25 @@
 type: task
 # The name is optional, if not specified, generated randomly
-name: axolotl-train
+name: axolotl-nvidia-llama-scout-train

 # Using the official Axolotl's Docker image
-image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
+image: axolotlai/axolotl:main-latest

 # Required environment variables
 env:
   - HF_TOKEN
   - WANDB_API_KEY
+  - WANDB_PROJECT
+  - WANDB_NAME=axolotl-nvidia-llama-scout-train
+  - HUB_MODEL_ID
 # Commands of the task
 commands:
-  - accelerate launch -m axolotl.cli.train examples/fine-tuning/axolotl/config.yaml
+  - wget https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/examples/llama-4/scout-qlora-fsdp1.yaml
+  - axolotl train scout-qlora-fsdp1.yaml --wandb-project $WANDB_PROJECT --wandb-name $WANDB_NAME --hub-model-id $HUB_MODEL_ID

 resources:
-  gpu:
-    # 24GB or more vRAM
-    memory: 24GB..
-    # Two or more GPU (required by FSDP)
-    count: 2..
+  # Two GPU (required by FSDP)
+  gpu: H100:2
   # Shared memory size for inter-process communication
   shm_size: 24GB
+  disk: 500GB..
```

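The `resources` block above uses dstack's shorthand: `H100:2` means two H100 GPUs, and a trailing `..` (as in `500GB..` or the old `24GB..` / `2..`) denotes an open-ended range, i.e. that amount or more. A rough illustration of how the two notations decompose (a sketch only, not dstack's actual parser):

```python
# Illustrative sketch of the resource shorthand used above; this is NOT
# dstack's real parser, just a decomposition of the two notations.
def parse_gpu(spec: str):
    """'H100:2' -> ('H100', 2); a bare name means a count of 1."""
    name, _, count = spec.partition(":")
    return name, int(count) if count else 1

def parse_range(spec: str):
    """'500GB..' -> ('500GB', None): a lower bound with no upper bound."""
    if spec.endswith(".."):
        return spec[:-2], None
    return spec, spec

print(parse_gpu("H100:2"))     # ('H100', 2)
print(parse_range("500GB.."))  # ('500GB', None)
```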
examples/fine-tuning/axolotl/README.md

Lines changed: 22 additions & 18 deletions
````diff
@@ -1,7 +1,7 @@
 # Axolotl

 This example shows how use [Axolotl :material-arrow-top-right-thin:{ .external }](https://github.com/OpenAccess-AI-Collective/axolotl){:target="_blank"}
-with `dstack` to fine-tune Llama3 8B using FSDP and QLoRA.
+with `dstack` to fine-tune 4-bit Quantized [Llama-4-Scout-17B-16E :material-arrow-top-right-thin:{ .external }](https://huggingface.co/axolotl-quants/Llama-4-Scout-17B-16E-Linearized-bnb-nf4-bf16){:target="_blank"} using FSDP and QLoRA.

 ??? info "Prerequisites"
     Once `dstack` is [installed](https://dstack.ai/docs/installation), go ahead clone the repo, and run `dstack init`.
@@ -18,44 +18,45 @@ with `dstack` to fine-tune Llama3 8B using FSDP and QLoRA.

 ## Training configuration recipe

-Axolotl reads the model, LoRA, and dataset arguments, as well as trainer configuration from a YAML file. This file can
-be found at [`examples/fine-tuning/axolotl/config.yaml` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/axolotl/config.yaml){:target="_blank"}.
-You can modify it as needed.
+Axolotl reads the model, QLoRA, and dataset arguments, as well as trainer configuration from a [`scout-qlora-fsdp1.yaml` :material-arrow-top-right-thin:{ .external }](https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/llama-4/scout-qlora-fsdp1.yaml){:target="_blank"} file. The configuration uses 4-bit axolotl quantized version of `meta-llama/Llama-4-Scout-17B-16E`, requiring only ~43GB VRAM/GPU with 4K context length.

-> Before you proceed with training, make sure to update the `hub_model_id` in [`examples/fine-tuning/axolotl/config.yaml` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/alignment-handbook/config.yaml){:target="_blank"}
-> with your HuggingFace username.

 ## Single-node training

 The easiest way to run a training script with `dstack` is by creating a task configuration file.
-This file can be found at [`examples/fine-tuning/axolotl/train.dstack.yml` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/axolotl/train.dstack.yml){:target="_blank"}.
+This file can be found at [`examples/fine-tuning/axolotl/.dstack.yml` :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/blob/master/examples/fine-tuning/axolotl/.dstack.yaml){:target="_blank"}.

 <div editor-title="examples/fine-tuning/axolotl/.dstack.yml">

 ```yaml
 type: task
-name: axolotl-train
+# The name is optional, if not specified, generated randomly
+name: axolotl-nvidia-llama-scout-train

 # Using the official Axolotl's Docker image
-image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
+image: axolotlai/axolotl:main-latest

 # Required environment variables
 env:
   - HF_TOKEN
   - WANDB_API_KEY
+  - WANDB_PROJECT
+  - WANDB_NAME=axolotl-nvidia-llama-scout-train
+  - HUB_MODEL_ID
 # Commands of the task
 commands:
-  - accelerate launch -m axolotl.cli.train examples/fine-tuning/axolotl/config.yaml
-
-# Uncomment to leverage spot instances
-#spot_policy: auto
+  - wget https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/examples/llama-4/scout-qlora-fsdp1.yaml
+  - axolotl train scout-qlora-fsdp1.yaml
+    --wandb-project $WANDB_PROJECT
+    --wandb-name $WANDB_NAME
+    --hub-model-id $HUB_MODEL_ID

 resources:
-  gpu:
-    # 24GB or more vRAM
-    memory: 24GB..
-    # Two or more GPU
-    count: 2..
+  # Two GPU (required by FSDP)
+  gpu: H100:2
+  # Shared memory size for inter-process communication
+  shm_size: 24GB
+  disk: 500GB..
 ```

 </div>
@@ -75,6 +76,9 @@ cloud resources and run the configuration.
 ```shell
 $ HF_TOKEN=...
 $ WANDB_API_KEY=...
+$ WANDB_PROJECT=...
+$ WANDB_NAME=axolotl-nvidia-llama-scout-train
+$ HUB_MODEL_ID=...
 $ dstack apply -f examples/fine-tuning/axolotl/.dstack.yml
 ```
````
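In the task above, the indented `--wandb-project ...` lines are YAML plain-scalar continuations: YAML folds them into the single string that the shell executes. For illustration (a sketch, not dstack code), the multi-line entry is equivalent to:

```python
# Sketch: YAML folds the indented continuation lines into one plain scalar,
# so the task executes a single shell command equivalent to this string.
parts = [
    "axolotl train scout-qlora-fsdp1.yaml",
    "--wandb-project $WANDB_PROJECT",
    "--wandb-name $WANDB_NAME",
    "--hub-model-id $HUB_MODEL_ID",
]
command = " ".join(parts)
print(command)
```

The variables are expanded by the shell at runtime, not by YAML, which is why they appear unquoted here.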

examples/fine-tuning/axolotl/amd/.dstack.yml

Lines changed: 11 additions & 3 deletions
```diff
@@ -1,12 +1,14 @@
 type: task
 # The name is optional, if not specified, generated randomly
 name: axolotl-amd-llama31-train
-
 image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04
-
 # Required environment variables
 env:
   - HF_TOKEN
+  - WANDB_API_KEY
+  - WANDB_PROJECT
+  - WANDB_NAME=axolotl-amd-llama31-train
+  - HUB_MODEL_ID
 # Commands of the task
 commands:
   - export PATH=/opt/conda/envs/py_3.10/bin:$PATH
@@ -16,6 +18,9 @@ commands:
   - cd axolotl
   - git checkout d4f6c65
   - pip install -e .
+  # Latest pynvml is not compatible with axolotl commit d4f6c65, so we need to fall back to version 11.5.3
+  - pip uninstall pynvml -y
+  - pip install pynvml==11.5.3
   - cd ..
   - wget https://dstack-binaries.s3.amazonaws.com/flash_attn-2.0.4-cp310-cp310-linux_x86_64.whl
   - pip install flash_attn-2.0.4-cp310-cp310-linux_x86_64.whl
@@ -29,7 +34,10 @@ commands:
   - make
   - pip install .
   - cd ..
-  - accelerate launch -m axolotl.cli.train axolotl/examples/llama-3/fft-8b.yaml
+  - accelerate launch -m axolotl.cli.train -- axolotl/examples/llama-3/fft-8b.yaml
+    --wandb-project "$WANDB_PROJECT"
+    --wandb-name "$WANDB_NAME"
+    --hub-model-id "$HUB_MODEL_ID"

 resources:
   gpu: MI300X
```

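The pynvml downgrade in this task pins an exact version (`pynvml==11.5.3`). A small helper (an assumption of convenience, not part of Axolotl or dstack) can confirm inside the container that the pin took effect:

```python
# Hypothetical helper: confirm an exact version pin (e.g. pynvml==11.5.3)
# is satisfied in the current environment.
from importlib.metadata import PackageNotFoundError, version

def pin_satisfied(package: str, pinned: str) -> bool:
    """True only if `package` is installed at exactly `pinned`."""
    try:
        return version(package) == pinned
    except PackageNotFoundError:
        return False

# Mirrors the pin from the task's commands:
print(pin_satisfied("pynvml", "11.5.3"))
```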