## Describe your changes
Introduce the OpenVINO Weight Compression pass. This pass performs weight
compression on Hugging Face models to produce OpenVINO or ONNX models, and
on ONNX models to produce ONNX models, using Intel® NNCF's
`compress_weights()` functionality.
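Under the hood this wraps NNCF's `compress_weights()`. A minimal sketch of that underlying call on an OpenVINO IR model (paths and settings here are illustrative, not the pass's defaults):

```python
import nncf
import openvino as ov

# Read an OpenVINO IR model (path is a placeholder).
model = ov.Core().read_model("model.xml")

# 4-bit symmetric weight compression; `ratio=0.8` compresses 80% of
# eligible weights to INT4 and leaves the rest in 8-bit precision.
compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    ratio=0.8,
)

# Save the compressed model (output path is a placeholder).
ov.save_model(compressed, "model_int4.xml")
```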
## Checklist before requesting a review
- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [x] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
## (Optional) Issue link
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
### Option 2: Install OpenVINO Runtime and OpenVINO Development Tools from PyPI

```diff
-pip install openvino==2025.1.0
-pip install nncf==2.16.0
+pip install "openvino>=2025.3.0"
+pip install "nncf>=2.18.0"
 pip install onnxruntime-openvino
```

### Install Optimum Intel® for Generative AI Workloads

```diff
-pip install optimum[openvino]
+pip install "optimum[openvino]<=1.24.0"
```

More detailed instructions are available at [Optimum Intel® Installation Instructions](https://huggingface.co/docs/optimum/main/en/intel/installation)
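After installing, a quick sanity check that the expected versions are picked up (a minimal sketch; exact version strings will vary):

```python
import openvino as ov
import nncf

# Print the runtime versions to confirm the pins above took effect.
print("OpenVINO:", ov.get_version())
print("NNCF:", nncf.__version__)
```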
## Weight Compression

The `OpenVINOWeightCompression` pass runs [Weight Compression](https://docs.openvino.ai/2025/openvino-workflow/model-optimization-guide/weight-compression.html) using Intel® NNCF to compress a Hugging Face model into an OpenVINO or ONNX model, or an ONNX model into another ONNX model.

Please refer to [OpenVINOWeightCompression](https://microsoft.github.io/Olive/reference/pass.html#openvinoweightcompression) for more details about the `OpenVINOWeightCompression` pass and its config parameters.

### Example Weight Compression Configuration

```json
{
    "type": "OpenVINOWeightCompression",
    "data_config": "compress_data_config",
    "transform_fn": "custom_transform_func",
    "extra_args": { "tokenizer": true },
    "compress_config": {
        "mode": "INT4_SYM",
        "ratio": 0.8
    }
}
```
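For intuition, `data_config` and `transform_fn` feed NNCF's data-aware compression path. A minimal sketch of the underlying pattern (illustrative only: `raw_items`, the input name, and the loaded `model` are assumptions, not Olive internals):

```python
import nncf

def custom_transform_func(item):
    # Map one raw calibration item to the model's input dict.
    # (Hypothetical input name; real inputs depend on the model.)
    return {"input_ids": item["input_ids"]}

# `raw_items` is any iterable of calibration samples (assumed to exist);
# NNCF applies the transform lazily while iterating.
calibration_data = nncf.Dataset(raw_items, custom_transform_func)

# Data-aware 4-bit symmetric compression; `ratio=0.8` compresses 80% of
# eligible weights to INT4 and keeps the rest in 8-bit precision.
compressed = nncf.compress_weights(
    model,  # an openvino.Model loaded elsewhere (assumed)
    mode=nncf.CompressWeightsMode.INT4_SYM,
    ratio=0.8,
    dataset=calibration_data,
)
```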
## Model Encapsulation

`OpenVINOEncapsulation` pass is used to generate an ONNX model that encapsulates an OpenVINO IR model. It supports `OpenVINOModelHandler` for now.
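The encapsulated model can then be consumed through ONNX Runtime's OpenVINO execution provider; a minimal loading sketch (the file name is a placeholder, and the `onnxruntime-openvino` package is assumed to be installed):

```python
import onnxruntime as ort

# Load the encapsulating ONNX model with the OpenVINO execution provider.
session = ort.InferenceSession(
    "model_encapsulated.onnx",
    providers=["OpenVINOExecutionProvider"],
)
print([inp.name for inp in session.get_inputs()])
```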
`docs/source/reference/options.md` (3 additions, 2 deletions)
```diff
@@ -320,8 +320,9 @@ Please also find the detailed options from following table for each pass:
 |[LoftQ](pass.rst#loftq)| Run LoftQ fine-tuning on a Hugging Face PyTorch model. |
 |[OpenVINOConversion](pass.rst#openvinoconversion)| Converts PyTorch, ONNX or TensorFlow Model to OpenVINO Model. |
 |[OpenVINOIoUpdate](pass.rst#openvinoioupdate)| Converts dynamic OpenVINO Model to static OpenVINO Model and updates IO names. |
-|[OpenVINOQuantization](pass.rst#openvinoquantization)| Post-training quantization for OpenVINO models and ONNX models |
-|[OpenVINOQuantizationWithAccuracy](pass.rst#openvinoquantizationwithaccuracy)| Post-training quantization with accuracy for OpenVINO models and ONNX models |
+|[OpenVINOQuantization](pass.rst#openvinoquantization)| Post-training quantization for OpenVINO models and ONNX models using Intel® NNCF |
+|[OpenVINOQuantizationWithAccuracy](pass.rst#openvinoquantizationwithaccuracy)| Post-training quantization with accuracy for OpenVINO models and ONNX models using Intel® NNCF |
+|[OpenVINOWeightCompression](pass.rst#openvinoweightcompression)| Weight Compression to compress a Hugging Face model to an OpenVINO or ONNX model, as well as an ONNX model to an ONNX model, using Intel® NNCF |
 |[OpenVINOEncapsulation](pass.rst#openvinoencapsulation)| Generates an ONNX model that encapsulates an OpenVINO IR model. |
 |[OpenVINOOptimumConversion](pass.rst#openvinooptimumconversion)| Run [optimum-cli export openvino](https://huggingface.co/docs/optimum/main/en/intel/openvino/export) command using Optimum Intel® to convert Huggingface Model to OpenVINO Model and optionally perform weight compression or quantization. |
 |[QNNConversion](pass.rst#qnnconversion)| Convert ONNX, TensorFlow, or PyTorch model to QNN C++ model. Quantize the model if --input_list is provided as extra_args. Uses qnn-[framework]-converter tool from the QNN SDK. |
```
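To tie the table back to usage, a workflow containing the new pass could be launched from Python roughly as below (a hedged sketch assuming Olive's `olive.workflows.run` entry point; the model path and field values are placeholders, not a tested configuration):

```python
from olive.workflows import run as olive_run

# Sketch of a workflow config using the new pass; values are placeholders.
config = {
    "input_model": {"type": "HfModel", "model_path": "<hf-model-id>"},
    "passes": {
        "compress": {
            "type": "OpenVINOWeightCompression",
            "compress_config": {"mode": "INT4_SYM", "ratio": 0.8},
        },
    },
    "output_dir": "models/compressed",
}

olive_run(config)
```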