Commit c04bcc9

[QAIRT] Add Phi-4 reasoning recipe for OGA->Genie workflow (microsoft#310)
1 parent 9322357 commit c04bcc9

27 files changed

Lines changed: 5364 additions & 0 deletions
Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
# Phi-4 Reasoning Model Optimization

This directory demonstrates how to optimize the [Microsoft Phi-4 Reasoning](https://huggingface.co/microsoft/Phi-4-reasoning) model using AIMET quantization techniques.

## Overview

This workflow uses a Phi-4 Reasoning script to perform quantization. The script is based on the [Qualcomm-distributed Jupyter notebook](https://qpm.qualcomm.com/#/main/tools/details/Tutorial_for_Phi4_Reasoning_14B_Compute) for Phi-4-reasoning, which is available for download via QPM.

After quantization, the QAIRT GenAIBuilder API is used to apply additional model transformations, convert the model, and compile it for execution on the HTP backend.

Finally, the prepared QAIRT DLC is encapsulated in an ONNX protobuf and exported to a directory compatible with onnxruntime-genai.

## Requirements

This workflow has been tested with the following host configuration:
* Python 3.10
* QAIRT 2.45.40

It has also been tested on the following target configuration:
* HTP backend on SC8480XP

## Preparation Instructions

1. Install olive[qairt]

```bash
# Quote the requirement so the shell does not treat the brackets as a glob pattern
pip install "olive[qairt]"
```

2. (Optional) Use qairt-vm to install a non-default version of QAIRT and set QAIRT_SDK_ROOT

```bash
# List available QAIRT SDK versions
qairt-vm fetch --list

# Download a non-default version of the QAIRT SDK
qairt-vm fetch -v <version>

# Set QAIRT_SDK_ROOT to the download location of the QAIRT SDK
# By default, /opt/qcom/aistack/qairt/<version>
# Note: No further QAIRT SDK installation steps are required when using qairt-dev
export QAIRT_SDK_ROOT=/path/to/qairt/sdk
```

3. Install model-specific requirements

```bash
pip install -r requirements.txt
```

4. Run Olive recipe

```bash
olive run --config htp_sc8480xp.json
```

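Before running the recipe in step 4, it can help to confirm that the host matches the tested configuration. The following is a minimal sketch using only the Python standard library. The function name `preflight` and the warning messages are illustrative and not part of Olive or QAIRT. It treats `QAIRT_SDK_ROOT` as optional, as step 2 does.

```python
import os
import sys
from pathlib import Path


def preflight(env=None, version=None):
    """Return a list of warnings; an empty list means nothing suspicious was found.

    `env` and `version` default to the real environment and interpreter and are
    only parameters so the check can be exercised in isolation.
    """
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else version
    warnings = []

    # The README reports testing on Python 3.10 only. Other versions are
    # untested, not necessarily broken, so this is a warning and not an error.
    if version != (3, 10):
        warnings.append(f"Python {version[0]}.{version[1]} is untested (tested: 3.10)")

    # QAIRT_SDK_ROOT is optional (step 2), so validate it only when it is set.
    sdk_root = env.get("QAIRT_SDK_ROOT")
    if sdk_root and not Path(sdk_root).is_dir():
        warnings.append(f"QAIRT_SDK_ROOT is not a directory: {sdk_root}")

    return warnings


if __name__ == "__main__":
    for line in preflight():
        print("warning:", line)
```

The sketch does not inspect the QAIRT SDK version itself, because the layout of the SDK directory is not described here.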
## Execution Instructions

The Olive recipe above produces a directory compatible with the following versions of onnxruntime-genai and onnxruntime-qnn:

```bash
# Quote the specifiers; unquoted, the shell treats ">" as an output redirection
pip install "onnxruntime-genai>=0.13"
pip install "onnxruntime-qnn>=2.1.0"
```

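To confirm that an existing environment already satisfies these minimums, a check along the following lines may be useful. It is a sketch that uses only `importlib.metadata` from the standard library. The helper names are illustrative, and the version parsing is deliberately simplified: it compares leading numeric components only and ignores pre-release or local version suffixes.

```python
from importlib import metadata

# Minimum versions taken from the pip commands above.
MINIMUMS = {"onnxruntime-genai": "0.13", "onnxruntime-qnn": "2.1.0"}


def parse(version):
    """Turn '2.1.0' into (2, 1, 0), stopping at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def pad(parts, width):
    """Right-pad a version tuple with zeros so '0.13' compares like '0.13.0'."""
    return parts + (0,) * (width - len(parts))


def meets_minimum(installed, minimum):
    """Compare versions numerically, so that 0.9 is correctly older than 0.13."""
    a, b = parse(installed), parse(minimum)
    width = max(len(a), len(b))
    return pad(a, width) >= pad(b, width)


def check_installed(minimums=MINIMUMS):
    """Map each package to True/False, or None when it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
        else:
            report[name] = meets_minimum(installed, minimum)
    return report


if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(name, {True: "ok", False: "too old", None: "not installed"}[ok])
```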
For an example of how to run this model directory, see [model-qa.py](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/model-qa.py) in the onnxruntime-genai repository.
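
For orientation, a token-by-token generation loop with the onnxruntime-genai Python API might look roughly like the sketch below. It is not taken from this commit. The function name `generate`, its parameters, and the default `max_length` are illustrative, and the sketch has not been verified against a QNN-backed model directory. It also passes the prompt through unchanged, so any chat template the model expects is left to the caller.

```python
def generate(model_dir, prompt, max_length=512):
    """Generate a completion for `prompt` from the model directory at `model_dir`."""
    # Imported lazily so this module can be loaded without the package installed.
    import onnxruntime_genai as og

    model = og.Model(model_dir)
    tokenizer = og.Tokenizer(model)
    stream = tokenizer.create_stream()

    params = og.GeneratorParams(model)
    # max_length bounds prompt plus generated tokens; 512 is an arbitrary choice.
    params.set_search_options(max_length=max_length)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))

    pieces = []
    while not generator.is_done():
        generator.generate_next_token()
        # Decode incrementally so multi-token characters are assembled correctly.
        pieces.append(stream.decode(generator.get_next_tokens()[0]))
    return "".join(pieces)
```

The linked model-qa.py script remains the reference for supported options and prompt formatting.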
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
{
    "module_list":
    [
        {
            "module_name": "QuantizedRmsNorm",
            "exceptions": {
                "param_exceptions": {
                    "asymmetric": true,
                    "bitwidth": 16
                },
                "input_exceptions": null,
                "output_exceptions": null
            }
        }
    ],
    "name_list": [
        {
            "module_name": "\\w*model_embed_tokens_Gather",
            "exceptions": {
                "param_exceptions": {
                    "bitwidth": 16,
                    "asymmetric": true
                },
                "input_exceptions": null,
                "output_exceptions": null
            }
        },
        {
            "module_name": "\\w*lm_head_(MatMul|conv_Conv|conv2d_Conv|Conv)",
            "exceptions": {
                "param_exceptions": {
                    "bitwidth": 4
                },
                "input_exceptions": null,
                "output_exceptions": null
            }
        },
        {
            "module_name": "\\w*self_attn_Concat_1",
            "exceptions": {
                "param_exceptions": null,
                "input_exceptions": null,
                "output_exceptions": [
                    {
                        "output_index": 0,
                        "bitwidth": 8,
                        "asymmetric": false
                    }
                ]
            }
        },
        {
            "module_name": "\\w*norm_(Mul_1|Mul_1.module)",
            "exceptions": {
                "param_exceptions": null,
                "input_exceptions": [
                    {
                        "input_index": 0,
                        "bitwidth": 16,
                        "asymmetric": true
                    }
                ],
                "output_exceptions": null
            }
        },
        {
            "module_name": "\\w*norm_(Pow|Pow.module|ReduceMean|Add|Sqrt|Div|Mul)",
            "exceptions": {
                "param_exceptions": null,
                "input_exceptions": null,
                "output_exceptions": [
                    {
                        "output_index": 0,
                        "enabled": false
                    }
                ]
            }
        },
        {
            "module_name": "\\w*v_proj_(MatMul|conv_Conv|conv2d_Conv|Conv)(\\.base_layer)?",
            "exceptions": {
                "param_exceptions": null,
                "input_exceptions": null,
                "output_exceptions": [
                    {
                        "output_index": 0,
                        "bitwidth": 8,
                        "asymmetric": false
                    }
                ]
            }
        }
    ]
}
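
The `module_name` values under `name_list` are regular expressions, so it can be useful to check which rule a given layer name would hit. The sketch below loads an excerpt of the config and tests names with Python's `re.fullmatch`. Two things are assumptions here: the layer names are hypothetical examples, and whole-string matching is a guess, since this commit does not state whether the consuming tool matches the full name or only a prefix.

```python
import json
import re

# Excerpt of the name_list above, kept as JSON text so the backslash escaping
# is identical to the config file.
CONFIG = json.loads(r'''
{
  "name_list": [
    {"module_name": "\\w*lm_head_(MatMul|conv_Conv|conv2d_Conv|Conv)"},
    {"module_name": "\\w*norm_(Mul_1|Mul_1.module)"},
    {"module_name": "\\w*norm_(Pow|Pow.module|ReduceMean|Add|Sqrt|Div|Mul)"},
    {"module_name": "\\w*v_proj_(MatMul|conv_Conv|conv2d_Conv|Conv)(\\.base_layer)?"}
  ]
}
''')


def matching_patterns(name, config=CONFIG):
    """Return every module_name pattern that matches the whole of `name`."""
    return [
        rule["module_name"]
        for rule in config["name_list"]
        if re.fullmatch(rule["module_name"], name)
    ]


if __name__ == "__main__":
    # Hypothetical layer names, chosen only to exercise the patterns.
    for name in (
        "lm_head_MatMul",
        "model_layers_0_input_layernorm_Mul_1",
        "model_layers_0_self_attn_v_proj_Conv.base_layer",
        "model_layers_0_mlp_down_proj_MatMul",
    ):
        print(name, "->", matching_patterns(name))
```

Under whole-string matching, a name ending in `norm_Mul_1` hits only the `Mul_1` rule. Under prefix matching it would also hit the rule whose alternatives end in `Mul`, so the matching mode affects which quantizer overrides apply. The dot in `Mul_1.module` and `Pow.module` is also unescaped, so it matches any character and not only a literal period.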
