Commit 2dfa873
Add support for MXFP8 PTQ (#736)
## What does this PR do?
**Type of change:** new feature
**Overview:** Add support for MXFP8 PTQ, enabling MXFP8 hardware
acceleration during inference on Blackwell GPUs.
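For context, here is a minimal sketch of block scaling as the OCP Microscaling (MX) format describes it, to the best of my understanding: each block of 32 values shares one power-of-two scale (stored as an 8-bit E8M0 exponent), and the scaled values are stored as FP8. The element type (E4M3) and all names below are assumptions for illustration. This is not the `MXFP8QTensor` implementation from this PR, and it skips the final rounding to the FP8 grid.

```python
import math

# Assumptions for illustration (not taken from the modelopt source):
BLOCK_SIZE = 32    # matches "group_size": 32 in the exported config
E4M3_MAX = 448.0   # largest finite FP8 E4M3 magnitude
E4M3_EMAX = 8      # 448 = 1.75 * 2**8, so the top binade has exponent 8
E8M0_BIAS = 127    # E8M0 stores the exponent with a bias of 127


def mx_block_scale_exponent(block):
    """Return the shared power-of-two exponent for one block of floats."""
    if len(block) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values, got {len(block)}")
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return -E8M0_BIAS  # all-zero block: any scale works, use the smallest
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) = e - 1.
    _, e = math.frexp(amax)
    exponent = (e - 1) - E4M3_EMAX
    return max(-E8M0_BIAS, min(E8M0_BIAS, exponent))


def quantize_block(block):
    """Return (biased E8M0 exponent, scaled values clamped to the E4M3 range).

    The scaled values are NOT rounded to the FP8 grid here; a real kernel
    would cast them to an FP8 dtype after scaling.
    """
    exponent = mx_block_scale_exponent(block)
    scale = 2.0 ** exponent
    scaled = [max(-E4M3_MAX, min(E4M3_MAX, v / scale)) for v in block]
    return exponent + E8M0_BIAS, scaled
```

With this choice of exponent, the largest magnitude in a block lands in the top E4M3 binade after scaling, and anything above 448 saturates.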
## Usage
```bash
export MODEL_PATH=/my_home/hf_models/nvidia/OpenMath2-Llama3.1-8B
export OUTPUT_PATH=/my_home/hf_models/nvidia/OpenMath2-Llama3.1-8B-MXFP8
mkdir -p $OUTPUT_PATH
python examples/llm_ptq/hf_ptq.py \
    --export_fmt hf \
    --dataset cnn_dailymail \
    --pyt_ckpt_path $MODEL_PATH \
    --export_path $OUTPUT_PATH \
    --qformat mxfp8
```
The `hf_quant_config.json` of the output checkpoint:
```json
{
    "producer": {
        "name": "modelopt",
        "version": "0.41.0.dev50+g7a796a875"
    },
    "quantization": {
        "quant_algo": "MXFP8",
        "kv_cache_quant_algo": "FP8",
        "group_size": 32,
        "exclude_modules": [
            "lm_head"
        ]
    }
}
```
And `config.json` (only the `quantization_config`):
```json
...
"quantization_config": {
    "ignore": [
        "lm_head"
    ],
    "quant_algo": "MXFP8",
    "kv_cache_scheme": {
        "dynamic": false,
        "num_bits": 8,
        "type": "float"
    },
    "producer": {
        "name": "modelopt",
        "version": "0.41.0.dev50+g7a796a875"
    },
    "quant_method": "modelopt"
}
```
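A quick sanity check of an exported checkpoint could compare the two files shown above. The sketch below is only an illustration: the function names are hypothetical, and the rule that `exclude_modules` should match `ignore` is an assumption drawn from this one example, not a documented guarantee.

```python
import json
from pathlib import Path


def check_mxfp8_export(hf_quant_config: dict, model_config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    quant = hf_quant_config.get("quantization", {})
    qcfg = model_config.get("quantization_config", {})

    # Both files should report the same quantization algorithm.
    if quant.get("quant_algo") != "MXFP8":
        problems.append("hf_quant_config.json: quant_algo is not MXFP8")
    if qcfg.get("quant_algo") != "MXFP8":
        problems.append("config.json: quant_algo is not MXFP8")

    # Assumption: the excluded modules are mirrored in the "ignore" list.
    if sorted(quant.get("exclude_modules", [])) != sorted(qcfg.get("ignore", [])):
        problems.append("exclude_modules and ignore lists differ")
    return problems


def check_export_dir(export_path: str) -> list[str]:
    """Load both JSON files from an export directory and run the checks."""
    root = Path(export_path)
    hf_quant_config = json.loads((root / "hf_quant_config.json").read_text())
    model_config = json.loads((root / "config.json").read_text())
    return check_mxfp8_export(hf_quant_config, model_config)
```

Calling `check_export_dir` on the directory used as `$OUTPUT_PATH` in the usage example would return an empty list if the files look like the ones shown above.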
## Testing
Used `hf_ptq.py` to quantize the model `nvidia/OpenMath2-Llama3.1-8B`
([available on
Hugging Face](https://huggingface.co/nvidia/OpenMath2-Llama3.1-8B)); see
the example command above.
Checked that the generated MXFP8 checkpoint can be loaded with vLLM
(this required changes in vLLM that are not merged to main).
Added tests for `MXFP8QTensor` in
`tests/gpu/torch/quantization/test_qtensor_cuda.py`.
Added "mxfp8" to `tests/examples/llm_ptq/test_llm_ptq.py`.
#### Support for Nemotron Models
Verify that Nemotron Nano V3 BF16 can be converted to MXFP8 using
`hf_ptq.py`:
https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
## Before your PR is "*Ready for review*"
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes/No
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Yes/No
## Summary by CodeRabbit
* **New Features**
  * Added MXFP8 quantization format support with new scaling mechanisms
    and quantization utilities.
  * Updated configuration options, example scripts, and utilities to
    recognize and process MXFP8 quantization workflows.
  * Extended quantization export pipelines to handle MXFP8 quantized
    models.
* **Tests**
  * Expanded test coverage for MXFP8 quantization across various tensor
    shapes, data types, and device configurations.
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

1 parent d1dac55 commit 2dfa873
File tree

10 files changed, +717 -18 lines changed

- examples/llm_ptq
  - scripts
- modelopt/torch
  - export
  - quantization
    - nn/modules
    - qtensor
- tests
  - examples/llm_ptq
  - gpu/torch/quantization