Commit 1fc9679

[VitisAI] Update AMD NPU LLM recipes with Windows + CUDA support (microsoft#189)

Authored-by: poganeshretroyuy
Co-authored-by: Yu Yan <yu.yan@amd.com>
1 parent: b1616be

78 files changed: 883 additions, 458 deletions

Qwen-Qwen1.5-7B-Chat/VitisAI/Qwen1.5-7B-Chat_quark_vitisai_llm.json
Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@
 "quant_scheme": "w_uint4_per_group_asym",
 "quant_algo": "awq",
 "dataset": "pileval_for_awq_benchmark",
-"data_type": "float32",
+"data_type": "bfloat16",
 "num_calib_data": 128,
 "model_export": [ "hf_format" ],
 "exclude_layers": [ ],

Qwen-Qwen1.5-7B-Chat/VitisAI/README.md
Lines changed: 24 additions & 10 deletions

@@ -13,11 +13,16 @@ This folder contains sample Olive configuration to optimize Qwen models for AMD
 
 For LLMs - follow the below commands to generate the optimized model for VitisAI Execution Provider.
 
-**Note:** We’ve tested it on Linux with ROCm and on Linux with CUDA. It is also supported on Windows with CPU, though quantization may be slower. Support for Windows with CUDA/ROCm is planned for a future release.
+**Platform Support:**
+- **Linux with ROCm** - Supported
+- **Linux with CUDA** - Supported
+- **Windows with CUDA** - Supported
+- **Windows with CPU** - Supported (quantization will be slower)
+- **Windows with ROCm** - Planned for future release
 
 For more details about quark, see the [Quark Documentation](https://quark.docs.amd.com/latest/)
 
-#### Create a Python 3.10 conda environment and run the below commands
+#### **Create a Python 3.10 conda environment and run the below commands**
 ```bash
 conda create -n olive python=3.10
 conda activate olive
@@ -29,24 +34,33 @@ pip install -e .
 pip install -r requirements.txt
 ```
 
-#### Install VitisAI LLM dependencies
+#### **Install VitisAI LLM dependencies**
 
 ```bash
-cd examples/qwen2_5/vitisai
+cd olive-recipes/Qwen-Qwen1.5-7B-Chat/VitisAI
 pip install --force-reinstall -r requirements_vitisai_llm.txt
-
-# Note: If you're running model generation on a Windows system, please uncomment the following line in requirements_vitisai_llm.txt:
-# --extra-index-url=https://pypi.amd.com/simple
-# model-generate==1.5.1
 ```
-Make sure to install the correct version of PyTorch before running quantization. If using AMD GPUs, update PyTorch to use ROCm-compatible PyTorch build. For example see the below commands
 
+**Note:** The requirements file automatically installs the correct `model-generate` version for your platform (1.5.0 for Linux, 1.5.1 for Windows).
+
+#### **Install PyTorch**
+
+Make sure to install the correct version of PyTorch before running quantization:
+
+**For AMD GPUs (ROCm):**
 ```bash
 pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1
 
 python -c "import torch; print(torch.cuda.is_available())" # Must return `True`
 ```
-#### Generate optimized LLM model for VitisAI NPU
+
+**For NVIDIA GPUs (CUDA):**
+```bash
+pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
+
+python -c "import torch; print(torch.cuda.is_available())" # Must return `True`
+```
+#### **Generate optimized LLM model for VitisAI NPU**
 Follow the above setup instructions, then run the below command to generate the optimized LLM model for VitisAI EP
 
 ```bash
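A detail behind the README's `torch.cuda.is_available()` check: it returns `True` on ROCm builds as well, because PyTorch exposes the HIP backend through the `torch.cuda` namespace. A small helper (hypothetical, not part of the recipe) that reports which accelerator stack the installed wheel actually targets:

```python
def torch_build_info() -> str:
    """Report which accelerator stack the installed PyTorch wheel targets."""
    try:
        import torch
    except ImportError:
        return "not-installed"
    # ROCm wheels set torch.version.hip; CUDA wheels set torch.version.cuda
    if getattr(torch.version, "hip", None):
        return "ROCm"
    if torch.version.cuda:
        return "CUDA"
    return "CPU-only"

print(torch_build_info())
```

Running this after the install step confirms the wheel matches your GPU vendor before starting a long quantization run.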

Qwen-Qwen1.5-7B-Chat/VitisAI/requirements_vitisai_llm.txt
Lines changed: 10 additions & 7 deletions

@@ -1,23 +1,26 @@
-
 # AMD model generation
-# Model generation on a Linux system
 --extra-index-url=https://pypi.amd.com/simple
 accelerate
 
 # Quark
 amd-quark==0.9
 datasets
 evaluate
-model-generate==1.5.0
+
+# Platform-specific model-generate versions:
+# Linux: use model-generate==1.5.0 (default)
+# Windows: MUST use model-generate==1.5.1
+model-generate==1.5.0; sys_platform != 'win32'
+model-generate==1.5.1; sys_platform == 'win32'
+
 nltk
 numpy
+
+# Pin onnx version
+onnx==1.18.0
 onnxruntime==1.21.1
 onnxruntime-genai==0.7.1
 optimum
 sentencepiece
 tabulate
 transformers==4.50.0
-
-# Uncomment the below line if running model generation on a Windows system
-# --extra-index-url=https://pypi.amd.com/simple
-# model-generate==1.5.1
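The `; sys_platform == 'win32'` suffixes are standard PEP 508 environment markers: pip evaluates them against the installing interpreter's `sys.platform` at install time, which is what removes the old manual-uncommenting step. A minimal sketch of the selection logic the markers encode (the helper function is illustrative, not part of the recipe):

```python
import sys

def select_model_generate_version(platform: str) -> str:
    """Mirror the PEP 508 markers in requirements_vitisai_llm.txt."""
    # pip's sys_platform marker is 'win32' on Windows (both 32- and 64-bit),
    # 'linux' on Linux, 'darwin' on macOS.
    return "1.5.1" if platform == "win32" else "1.5.0"

print(select_model_generate_version(sys.platform))
```

Because both lines live in the file with mutually exclusive markers, exactly one `model-generate` pin is active on any platform.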

Qwen-Qwen2-7B-Instruct/VitisAI/Qwen2-7B-Instruct_quark_vitisai_llm.json
Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@
 "quant_scheme": "w_uint4_per_group_asym",
 "quant_algo": "awq",
 "dataset": "pileval_for_awq_benchmark",
-"data_type": "float32",
+"data_type": "bfloat16",
 "num_calib_data": 128,
 "model_export": [ "hf_format" ],
 "exclude_layers": [ ],

Qwen-Qwen2-7B-Instruct/VitisAI/README.md
Lines changed: 24 additions & 10 deletions

@@ -13,11 +13,16 @@ This folder contains sample Olive configuration to optimize Qwen models for AMD
 
 For LLMs - follow the below commands to generate the optimized model for VitisAI Execution Provider.
 
-**Note:** We’ve tested it on Linux with ROCm and on Linux with CUDA. It is also supported on Windows with CPU, though quantization may be slower. Support for Windows with CUDA/ROCm is planned for a future release.
+**Platform Support:**
+- **Linux with ROCm** - Supported
+- **Linux with CUDA** - Supported
+- **Windows with CUDA** - Supported
+- **Windows with CPU** - Supported (quantization will be slower)
+- **Windows with ROCm** - Planned for future release
 
 For more details about quark, see the [Quark Documentation](https://quark.docs.amd.com/latest/)
 
-#### Create a Python 3.10 conda environment and run the below commands
+#### **Create a Python 3.10 conda environment and run the below commands**
 ```bash
 conda create -n olive python=3.10
 conda activate olive
@@ -29,24 +34,33 @@ pip install -e .
 pip install -r requirements.txt
 ```
 
-#### Install VitisAI LLM dependencies
+#### **Install VitisAI LLM dependencies**
 
 ```bash
-cd examples/qwen2_5/vitisai
+cd olive-recipes/Qwen-Qwen2-7B-Instruct/VitisAI
 pip install --force-reinstall -r requirements_vitisai_llm.txt
-
-# Note: If you're running model generation on a Windows system, please uncomment the following line in requirements_vitisai_llm.txt:
-# --extra-index-url=https://pypi.amd.com/simple
-# model-generate==1.5.1
 ```
-Make sure to install the correct version of PyTorch before running quantization. If using AMD GPUs, update PyTorch to use ROCm-compatible PyTorch build. For example see the below commands
 
+**Note:** The requirements file automatically installs the correct `model-generate` version for your platform (1.5.0 for Linux, 1.5.1 for Windows).
+
+#### **Install PyTorch**
+
+Make sure to install the correct version of PyTorch before running quantization:
+
+**For AMD GPUs (ROCm):**
 ```bash
 pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1
 
 python -c "import torch; print(torch.cuda.is_available())" # Must return `True`
 ```
-#### Generate optimized LLM model for VitisAI NPU
+
+**For NVIDIA GPUs (CUDA):**
+```bash
+pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
+
+python -c "import torch; print(torch.cuda.is_available())" # Must return `True`
+```
+#### **Generate optimized LLM model for VitisAI NPU**
 Follow the above setup instructions, then run the below command to generate the optimized LLM model for VitisAI EP
 
 ```bash

Qwen-Qwen2-7B-Instruct/VitisAI/requirements_vitisai_llm.txt
Lines changed: 10 additions & 7 deletions

@@ -1,23 +1,26 @@
-
 # AMD model generation
-# Model generation on a Linux system
 --extra-index-url=https://pypi.amd.com/simple
 accelerate
 
 # Quark
 amd-quark==0.9
 datasets
 evaluate
-model-generate==1.5.0
+
+# Platform-specific model-generate versions:
+# Linux: use model-generate==1.5.0 (default)
+# Windows: MUST use model-generate==1.5.1
+model-generate==1.5.0; sys_platform != 'win32'
+model-generate==1.5.1; sys_platform == 'win32'
+
 nltk
 numpy
+
+# Pin onnx version
+onnx==1.18.0
 onnxruntime==1.21.1
 onnxruntime-genai==0.7.1
 optimum
 sentencepiece
 tabulate
 transformers==4.50.0
-
-# Uncomment the below line if running model generation on a Windows system
-# --extra-index-url=https://pypi.amd.com/simple
-# model-generate==1.5.1

Qwen-Qwen2.5-0.5B-Instruct/VitisAI/Qwen2.5-0.5B-Instruct_quark_vitisai_llm.json
Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@
 "quant_scheme": "w_uint4_per_group_asym",
 "quant_algo": "awq",
 "dataset": "pileval_for_awq_benchmark",
-"data_type": "float32",
+"data_type": "bfloat16",
 "num_calib_data": 128,
 "model_export": [ "hf_format" ],
 "exclude_layers": [ ],

Qwen-Qwen2.5-0.5B-Instruct/VitisAI/README.md
Lines changed: 24 additions & 10 deletions

@@ -13,11 +13,16 @@ This folder contains sample Olive configuration to optimize Qwen models for AMD
 
 For LLMs - follow the below commands to generate the optimized model for VitisAI Execution Provider.
 
-**Note:** We’ve tested it on Linux with ROCm and on Linux with CUDA. It is also supported on Windows with CPU, though quantization may be slower. Support for Windows with CUDA/ROCm is planned for a future release.
+**Platform Support:**
+- **Linux with ROCm** - Supported
+- **Linux with CUDA** - Supported
+- **Windows with CUDA** - Supported
+- **Windows with CPU** - Supported (quantization will be slower)
+- **Windows with ROCm** - Planned for future release
 
 For more details about quark, see the [Quark Documentation](https://quark.docs.amd.com/latest/)
 
-#### Create a Python 3.10 conda environment and run the below commands
+#### **Create a Python 3.10 conda environment and run the below commands**
 ```bash
 conda create -n olive python=3.10
 conda activate olive
@@ -29,24 +34,33 @@ pip install -e .
 pip install -r requirements.txt
 ```
 
-#### Install VitisAI LLM dependencies
+#### **Install VitisAI LLM dependencies**
 
 ```bash
-cd examples/qwen2_5/vitisai
+cd olive-recipes/Qwen-Qwen2.5-0.5B-Instruct/VitisAI
 pip install --force-reinstall -r requirements_vitisai_llm.txt
-
-# Note: If you're running model generation on a Windows system, please uncomment the following line in requirements_vitisai_llm.txt:
-# --extra-index-url=https://pypi.amd.com/simple
-# model-generate==1.5.1
 ```
-Make sure to install the correct version of PyTorch before running quantization. If using AMD GPUs, update PyTorch to use ROCm-compatible PyTorch build. For example see the below commands
 
+**Note:** The requirements file automatically installs the correct `model-generate` version for your platform (1.5.0 for Linux, 1.5.1 for Windows).
+
+#### **Install PyTorch**
+
+Make sure to install the correct version of PyTorch before running quantization:
+
+**For AMD GPUs (ROCm):**
 ```bash
 pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1
 
 python -c "import torch; print(torch.cuda.is_available())" # Must return `True`
 ```
-#### Generate optimized LLM model for VitisAI NPU
+
+**For NVIDIA GPUs (CUDA):**
+```bash
+pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
+
+python -c "import torch; print(torch.cuda.is_available())" # Must return `True`
+```
+#### **Generate optimized LLM model for VitisAI NPU**
 Follow the above setup instructions, then run the below command to generate the optimized LLM model for VitisAI EP
 
 ```bash

Qwen-Qwen2.5-0.5B-Instruct/VitisAI/requirements_vitisai_llm.txt
Lines changed: 10 additions & 7 deletions

@@ -1,23 +1,26 @@
-
 # AMD model generation
-# Model generation on a Linux system
 --extra-index-url=https://pypi.amd.com/simple
 accelerate
 
 # Quark
 amd-quark==0.9
 datasets
 evaluate
-model-generate==1.5.0
+
+# Platform-specific model-generate versions:
+# Linux: use model-generate==1.5.0 (default)
+# Windows: MUST use model-generate==1.5.1
+model-generate==1.5.0; sys_platform != 'win32'
+model-generate==1.5.1; sys_platform == 'win32'
+
 nltk
 numpy
+
+# Pin onnx version
+onnx==1.18.0
 onnxruntime==1.21.1
 onnxruntime-genai==0.7.1
 optimum
 sentencepiece
 tabulate
 transformers==4.50.0
-
-# Uncomment the below line if running model generation on a Windows system
-# --extra-index-url=https://pypi.amd.com/simple
-# model-generate==1.5.1

Qwen-Qwen2.5-1.5B-Instruct/VitisAI/Qwen2.5-1.5B-Instruct_quark_vitisai_llm.json
Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@
 "quant_scheme": "w_uint4_per_group_asym",
 "quant_algo": "awq",
 "dataset": "pileval_for_awq_benchmark",
-"data_type": "float32",
+"data_type": "bfloat16",
 "num_calib_data": 128,
 "model_export": [ "hf_format" ],
 "exclude_layers": [ ],
