Commit 046d14f

Update for llama3.1-8b-instruct
1 parent c04bcc9 commit 046d14f

28 files changed

Lines changed: 5819 additions & 1 deletion

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
# Llama3.1-8B Model Optimization

This directory demonstrates the optimization of the [Llama3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) model using various AIMET quantization techniques.

## Overview

This workflow uses a Llama3.1-8B script to perform quantization, following the [Qualcomm-distributed Jupyter notebook](https://qpm.qualcomm.com/#/main/tools/details/Tutorial_for_Llama3_1_Compute) for Llama3.1-8B (v1.0.1.260219), which is available for download via QPM.

After quantization, the QAIRT GenAIBuilder API applies additional model transformations, performs conversion, and compiles the model for execution on the HTP backend.

Finally, the prepared QAIRT DLC is encapsulated in an ONNX protobuf and exported to a directory compatible with onnxruntime-genai.

## Requirements

This workflow has been tested using the following host configuration:
* Python 3.10
* QAIRT 2.45.40

Further, this workflow has been tested on the following target configurations:
* HTP backend on SC8380XP
* HTP backend on SC8480XP

## Preparation Instructions

1. Install olive[qairt]

```bash
pip install "olive[qairt]"
```

2. (Optional) Use qairt-vm to install a non-default version of QAIRT and set QAIRT_SDK_ROOT

```bash
# List available QAIRT SDK versions
qairt-vm fetch --list

# Download non-default version of QAIRT SDK
qairt-vm fetch -v <version>

# Set QAIRT_SDK_ROOT to download location of QAIRT SDK
# By default, /opt/qcom/aistack/qairt/<version>
# Note: No further QAIRT SDK installation steps are required when using qairt-dev
export QAIRT_SDK_ROOT=/path/to/qairt/sdk
```

3. Install model-specific requirements

```bash
pip install -r requirements.txt
```

4. Run Olive recipe

```bash
olive run --config htp_sc8480xp.json
```
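
Before launching the recipe, it can help to confirm that the host matches the tested configuration. The following is a minimal sketch, not part of this workflow: `check_host` and `TESTED_PYTHON` are hypothetical names, and the checks only reflect what this README states (Python 3.10 was the tested host version, and `QAIRT_SDK_ROOT` is optional). Other Python versions may well work; the function only reports a warning.

```python
import os
import sys
from pathlib import Path

# Host Python version listed under Requirements (tested, not necessarily required).
TESTED_PYTHON = (3, 10)


def check_host(environ=None, version_info=None):
    """Return a list of warnings; an empty list means nothing looks off.

    Hypothetical helper: it only inspects the Python version and the
    optional QAIRT_SDK_ROOT variable described in this README.
    """
    environ = os.environ if environ is None else environ
    version_info = sys.version_info if version_info is None else version_info

    warnings = []
    major_minor = tuple(version_info[:2])
    if major_minor != TESTED_PYTHON:
        warnings.append(
            f"Python {major_minor[0]}.{major_minor[1]} differs from the tested "
            f"{TESTED_PYTHON[0]}.{TESTED_PYTHON[1]}"
        )

    # QAIRT_SDK_ROOT is optional; only validate it when it is set.
    sdk_root = environ.get("QAIRT_SDK_ROOT")
    if sdk_root is not None and not Path(sdk_root).is_dir():
        warnings.append(f"QAIRT_SDK_ROOT is set but is not a directory: {sdk_root}")

    return warnings
```

Calling `check_host()` with no arguments inspects the current interpreter and environment; passing explicit values makes the check easy to exercise without changing the shell.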

## Execution Instructions

The output of the above Olive recipe is a directory compatible with the following versions of onnxruntime-genai and onnxruntime-qnn.

```bash
pip install "onnxruntime-genai>=0.13"
pip install "onnxruntime-qnn>=2.1.0"
```

Please see the following script in the onnxruntime-genai repository for [an example of how to run this model directory](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/model-qa.py).
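
For orientation, a token-by-token generation loop with the onnxruntime-genai Python bindings might look roughly like the sketch below. It is a simplified illustration under assumptions, not the linked script: `missing_model_files` and `generate` are hypothetical helper names, the prompt is passed through without any chat template, any backend-specific setup for the HTP target is not shown, and the exact API surface may differ between onnxruntime-genai releases.

```python
from pathlib import Path


def missing_model_files(model_dir, required=("genai_config.json",)):
    """Return the required file names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in required if not (model_dir / name).is_file()]


def generate(model_dir, prompt, max_length=256):
    """Generate text for a raw prompt (sketch; no chat template applied)."""
    # Imported lazily so the helper above works without the package installed.
    import onnxruntime_genai as og

    missing = missing_model_files(model_dir)
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")

    model = og.Model(str(model_dir))
    tokenizer = og.Tokenizer(model)
    stream = tokenizer.create_stream()

    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))

    pieces = []
    while not generator.is_done():
        generator.generate_next_token()
        # Decode incrementally so partial output could also be streamed.
        pieces.append(stream.decode(generator.get_next_tokens()[0]))
    return "".join(pieces)
```

A call such as `generate("/path/to/output/model", "What is quantization?")` would then return the generated text, where the path is a placeholder for the directory the recipe produced.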
Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
{
    "module_list": [
        {
            "module_name": "QuantizedRmsNorm",
            "exceptions": {
                "param_exceptions": {
                    "asymmetric": true,
                    "bitwidth": 16
                },
                "input_exceptions": null,
                "output_exceptions": null
            }
        }
    ],
    "name_list": [
        {
            "module_name": "\\w*model_embed_tokens_Gather",
            "exceptions": {
                "param_exceptions": {
                    "bitwidth": 16,
                    "asymmetric": true
                },
                "input_exceptions": null,
                "output_exceptions": null
            }
        },
        {
            "module_name": "\\w*lm_head_(MatMul|conv_Conv|conv2d_Conv|Conv)",
            "exceptions": {
                "param_exceptions": {
                    "bitwidth": 4
                },
                "input_exceptions": null,
                "output_exceptions": null
            }
        },
        {
            "module_name": "\\w*self_attn_Concat_1",
            "exceptions": {
                "param_exceptions": null,
                "input_exceptions": null,
                "output_exceptions": [
                    {
                        "output_index": 0,
                        "bitwidth": 8,
                        "asymmetric": false
                    }
                ]
            }
        },
        {
            "module_name": "\\w*v_proj_(MatMul|conv_Conv|conv2d_Conv|Conv)(\\.base_layer)?",
            "exceptions": {
                "param_exceptions": null,
                "input_exceptions": null,
                "output_exceptions": [
                    {
                        "output_index": 0,
                        "bitwidth": 8,
                        "asymmetric": false
                    }
                ]
            }
        }
    ]
}
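
The `module_name` values under `name_list` are regular expressions, with backslashes doubled for JSON. To see which operator names a pattern would select, the sketch below decodes an excerpt of the patterns and tests them against a few names. The operator names used here are made up for illustration, `matching_patterns` is a hypothetical helper, and the use of `re.fullmatch` is an assumption: the consuming tool may apply the patterns differently.

```python
import json
import re

# Excerpt of the name_list patterns from the config above (exceptions omitted).
# A raw string keeps the JSON escapes intact, so "\\w" decodes to the regex \w.
CONFIG = json.loads(r'''
{
  "name_list": [
    {"module_name": "\\w*model_embed_tokens_Gather"},
    {"module_name": "\\w*lm_head_(MatMul|conv_Conv|conv2d_Conv|Conv)"},
    {"module_name": "\\w*self_attn_Concat_1"},
    {"module_name": "\\w*v_proj_(MatMul|conv_Conv|conv2d_Conv|Conv)(\\.base_layer)?"}
  ]
}
''')


def matching_patterns(op_name, config=CONFIG):
    """Return every name_list pattern that matches op_name in full.

    Assumption: whole-name matching via re.fullmatch.
    """
    return [
        entry["module_name"]
        for entry in config["name_list"]
        if re.fullmatch(entry["module_name"], op_name)
    ]


# Hypothetical operator names, chosen only to exercise the patterns.
for name in (
    "lm_head_MatMul",
    "model_layers_0_self_attn_v_proj_conv_Conv.base_layer",
    "model_layers_0_self_attn_q_proj_MatMul",
):
    print(name, "->", matching_patterns(name))
```

Under this whole-name assumption, a name such as `model_layers_0_self_attn_q_proj_MatMul` matches none of the patterns and would keep the default quantization settings.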
