Commit e3ff856
Introduce Intel® OpenVINO Weight Compression Pass (microsoft#2180)
## Describe your changes

Introduce the OpenVINO Weight Compression Pass. This pass performs weight compression on Hugging Face models to produce OpenVINO or ONNX models, and on ONNX models to produce ONNX models, using Intel® NNCF `compress_weights()` functionality.

## Checklist before requesting a review

- [x] Add unit tests for this change.
- [x] Make sure all tests can pass.
- [x] Update documents if necessary.
- [x] Lint and apply fixes to your code by running `lintrunner -a`
- [x] Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

## (Optional) Issue link

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 3e28211 commit e3ff856

8 files changed

Lines changed: 1138 additions & 11 deletions

File tree

docs/source/features/ihv-integration/openvino.md

Lines changed: 25 additions & 4 deletions
````diff
@@ -12,7 +12,7 @@ For Generative AI models, install Optimum Intel® from [Optimum Intel® Installa
 
 ## Prerequisites
 
-Note: OpenVINO version in Olive: 2025.1.0
+Note: OpenVINO version in Olive >= 2025.3.0
 
 ### Option 1: install Olive with OpenVINO extras
@@ -23,15 +23,15 @@ pip install olive-ai[openvino]
 ### Option 2: Install OpenVINO Runtime and OpenVINO Development Tools from Pypi
 
 ```bash
-pip install openvino==2025.1.0
-pip install nncf==2.16.0
+pip install "openvino>=2025.3.0"
+pip install "nncf>=2.18.0"
 pip install onnxruntime-openvino
 ```
 
 ### Install Optimum Intel® for Generative AI Workloads
 
 ```bash
-pip install optimum[openvino]
+pip install "optimum[openvino]<=1.24.0"
 ```
 
 More detailed instructions are available at [Optimum Intel® Installation Instructions](https://huggingface.co/docs/optimum/main/en/intel/installation)
@@ -96,6 +96,27 @@ Please refer to [OpenVINOQuantizationWithAccuracy](https://microsoft.github.io/O
 }
 ```
 
+## Weight Compression
+
+The `OpenVINOWeightCompression` pass runs [Weight Compression](https://docs.openvino.ai/2025/openvino-workflow/model-optimization-guide/weight-compression.html) using Intel® NNCF to compress Hugging Face models to OpenVINO or ONNX models, and ONNX models to ONNX models.
+
+Please refer to [OpenVINOWeightCompression](https://microsoft.github.io/Olive/reference/pass.html#openvinoweightcompression) for more details about the `OpenVINOWeightCompression` pass and its config parameters.
+
+### Example Weight Compression Configuration
+
+```json
+{
+    "type": "OpenVINOWeightCompression",
+    "data_config": "compress_data_config",
+    "transform_fn": "custom_transform_func",
+    "extra_args": { "tokenizer": true },
+    "compress_config": {
+        "mode": "INT4_SYM",
+        "ratio": 0.8
+    }
+}
+```
+
 ## Model Encapsulation
 
 `OpenVINOEncapsulation` pass is used to generate an ONNX model that encapsulates an OpenVINO IR model. It supports `OpenVINOModelHandler` for now.
````

Note: the version specifiers in the `pip install` lines are quoted, since an unquoted `>=` is interpreted as shell redirection and unquoted `[...]` can be expanded as a glob pattern in some shells.
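To make the `INT4_SYM` mode in the example configuration concrete, here is a minimal NumPy sketch of symmetric 4-bit weight compression: each group of weights shares one scale, and values are rounded to integers in [-8, 7]. This is an illustration only, not NNCF's implementation; the group size here is an arbitrary choice, and details such as mixed-precision selection (NNCF documents `ratio: 0.8` as compressing roughly 80% of the weights to the 4-bit mode while keeping the rest at 8-bit) are simplified away.

```python
import numpy as np

def int4_sym_compress(w, group_size=32):
    """Simplified sketch of symmetric INT4 weight compression:
    one scale per group of `group_size` weights, integers in [-8, 7]."""
    rows, cols = w.shape
    assert cols % group_size == 0, "illustrative sketch assumes divisible shapes"
    g = w.reshape(rows, cols // group_size, group_size)
    # Symmetric scheme: the scale maps the largest magnitude in each group to 7.
    scale = np.abs(g).max(axis=-1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero groups
    q = np.clip(np.round(g / scale), -8, 7).astype(np.int8)
    return q, scale

def decompress(q, scale, shape):
    # Dequantize back to float for emulated inference.
    return (q * scale).reshape(shape).astype(np.float32)

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 64)).astype(np.float32)
q, scale = int4_sym_compress(w)
w_hat = decompress(q, scale, w.shape)
# int4 payload is ~8x smaller than float32, plus a small per-group scale overhead
```

The reconstruction error is bounded by half a quantization step per weight, which is the trade-off the `mode` and `ratio` options tune in the real pass.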

docs/source/reference/options.md

Lines changed: 3 additions & 2 deletions
```diff
@@ -320,8 +320,9 @@ Please also find the detailed options from following table for each pass:
 | [LoftQ](pass.rst#loftq) | Run LoftQ fine-tuning on a Hugging Face PyTorch model. |
 | [OpenVINOConversion](pass.rst#openvinoconversion) | Converts PyTorch, ONNX or TensorFlow Model to OpenVINO Model. |
 | [OpenVINOIoUpdate](pass.rst#openvinoioupdate) | Converts dynamic OpenVINO Model to static OpenVINO Model and updates IO names. |
-| [OpenVINOQuantization](pass.rst#openvinoquantization) | Post-training quantization for OpenVINO models and ONNX models |
-| [OpenVINOQuantizationWithAccuracy](pass.rst#openvinoquantizationwithaccuracy) | Post-training quantization with accuracy for OpenVINO models and ONNX models |
+| [OpenVINOQuantization](pass.rst#openvinoquantization) | Post-training quantization for OpenVINO and ONNX models using Intel® NNCF. |
+| [OpenVINOQuantizationWithAccuracy](pass.rst#openvinoquantizationwithaccuracy) | Post-training quantization with accuracy control for OpenVINO and ONNX models using Intel® NNCF. |
+| [OpenVINOWeightCompression](pass.rst#openvinoweightcompression) | Weight compression of Hugging Face models to OpenVINO or ONNX models, and of ONNX models to ONNX models, using Intel® NNCF. |
 | [OpenVINOEncapsulation](pass.rst#openvinoencapsulation) | Generates an ONNX model that encapsulates an OpenVINO IR model. |
 | [OpenVINOOptimumConversion](pass.rst#openvinooptimumconversion) | Run [optimum-cli export openvino](https://huggingface.co/docs/optimum/main/en/intel/openvino/export) command using Optimum Intel® to convert Huggingface Model to OpenVINO Model and optionally perform weight compression or quantization. |
 | [QNNConversion](pass.rst#qnnconversion) | Convert ONNX, TensorFlow, or PyTorch model to QNN C++ model. Quantize the model if --input_list is provided as extra_args. Uses qnn-[framework]-converter tool from the QNN SDK. |
```

docs/source/reference/pass.rst

Lines changed: 6 additions & 0 deletions
```diff
@@ -333,6 +333,12 @@ OpenVINOQuantizationWithAccuracy
 --------------------------------
 .. autoconfigclass:: olive.passes.OpenVINOQuantizationWithAccuracy
 
+.. _openvino_weight_compression:
+
+OpenVINOWeightCompression
+-------------------------
+.. autoconfigclass:: olive.passes.OpenVINOWeightCompression
+
 .. _openvino_encapsulation:
 
 OpenVINOEncapsulation
```

olive/olive_config.json

Lines changed: 11 additions & 2 deletions
```diff
@@ -403,6 +403,15 @@
     "supported_quantization_encodings": [ ],
     "extra_dependencies": [ "openvino" ]
   },
+  "OpenVINOWeightCompression": {
+    "module_path": "olive.passes.openvino.compression.OpenVINOWeightCompression",
+    "supported_providers": [ "*" ],
+    "supported_accelerators": [ "*" ],
+    "supported_precisions": [ "*" ],
+    "supported_algorithms": [ ],
+    "supported_quantization_encodings": [ ],
+    "extra_dependencies": [ "openvino" ]
+  },
   "OptimumConversion": {
     "module_path": "olive.passes.onnx.optimum_conversion.OptimumConversion",
     "supported_providers": [ "*" ],
@@ -618,8 +627,8 @@
     "lora": [ "accelerate>=0.30.0", "peft", "scipy" ],
     "nvmo": [ "nvidia-modelopt[onnx]" ],
     "openvino": [
-      "openvino>=2025.1.0",
-      "nncf>=2.16.0",
+      "openvino>=2025.3.0",
+      "nncf>=2.18.0",
       "numpy<2.0",
       "optimum[openvino]<=1.24",
       "onnxruntime-openvino"
```

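The registration above makes the pass available to Olive workflow configs. As a rough sketch of how it would be referenced, assuming the usual Olive workflow-config shape (the model name, pass name, and output directory below are illustrative placeholders, and `compress_config` mirrors the documented example rather than an exhaustive parameter list):

```json
{
  "input_model": { "type": "HfModel", "model_path": "microsoft/Phi-3-mini-4k-instruct" },
  "passes": {
    "compress": {
      "type": "OpenVINOWeightCompression",
      "compress_config": { "mode": "INT4_SYM", "ratio": 0.8 }
    }
  },
  "output_dir": "models/compressed"
}
```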