Commit b353110
Added support to rotate in fp32 (optional) (#885)
## What does this PR do?
**Type of change:** New Feature
**Overview:**
This PR adds optional support for performing the RHT rotation in float32
when the quantization configuration enables it. The `rotate` argument in
the quantization configuration now accepts either a bool (for backward
compatibility) or a dict (which adds the option for float32 rotation).
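As a rough illustration of the bool-or-dict behavior described above, the following sketch normalizes both forms into one settings object. `RotateSettings` and `normalize_rotate` are hypothetical names used only for this example and are not the ModelOpt implementation; the defaults for missing dict keys are also assumptions. Only the keys `enable` and `rotate_fp32` come from this PR.

```python
from typing import NamedTuple, Union


class RotateSettings(NamedTuple):
    """Hypothetical container for the two rotation flags (illustration only)."""

    enable: bool
    rotate_fp32: bool


def normalize_rotate(rotate: Union[bool, dict]) -> RotateSettings:
    """Accept the legacy bool form or the new dict form of `rotate`."""
    if isinstance(rotate, bool):
        # Legacy form: a plain bool toggles rotation; fp32 rotation stays off.
        return RotateSettings(enable=rotate, rotate_fp32=False)
    if isinstance(rotate, dict):
        unknown = set(rotate) - {"enable", "rotate_fp32"}
        if unknown:
            raise ValueError(f"Unknown rotate options: {sorted(unknown)}")
        # Assumption: a missing key defaults to False.
        return RotateSettings(
            enable=bool(rotate.get("enable", False)),
            rotate_fp32=bool(rotate.get("rotate_fp32", False)),
        )
    raise TypeError(f"rotate must be a bool or dict, got {type(rotate).__name__}")


if __name__ == "__main__":
    print(normalize_rotate(True))
    print(normalize_rotate({"enable": True, "rotate_fp32": True}))
```

Treating the bool as shorthand for a dict with fp32 rotation disabled is one way to keep existing configurations working unchanged.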
## Usage
```
python hf_ptq.py --pyt_ckpt_path meta-llama/Llama-3.2-3B-Instruct --qformat nvfp4 --export_fmt hf --dataset cnn_dailymail --export_path test --trust_remote_code --inference_pipeline_parallel 1 --batch_size 1 --calib_size 4 --kv_cache_qformat nvfp4_rotate
```
Updated `NVFP4_KV_ROTATE_CFG` locally with `"rotate": {"enable": True,
"rotate_fp32": True}`
```
...
model.layers.27.self_attn.k_bmm_quantizer TensorQuantizer((2, 1) bit fake block_sizes={-1: 16, 'type': 'dynamic', 'scale_bits': (4, 3)}, amax=8.3750 rotated (fp32) calibrator=MaxCalibrator quant)
...
```
Updated `NVFP4_KV_ROTATE_CFG` locally with `"rotate": {"enable": True,
"rotate_fp32": False}`
```
model.layers.27.self_attn.k_bmm_quantizer TensorQuantizer((2, 1) bit fake block_sizes={-1: 16, 'type': 'dynamic', 'scale_bits': (4, 3)}, amax=8.3750 rotated calibrator=MaxCalibrator quant)
```
## Testing
Updated unit test in `tests/gpu/torch/quantization/test_hadamard.py`
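
To illustrate why rotation precision is worth parameterizing in a test, the sketch below compares a normalized Hadamard rotation computed with half-precision arithmetic against the same rotation computed in float32, both measured against a float64 reference. This is a pure-Python illustration and does not use ModelOpt or PyTorch: `round_to` and `hadamard_rotate` are hypothetical helpers, precision is emulated by rounding through `struct`, and IEEE fp16 stands in for whatever low-precision dtype the model actually uses (an assumption).

```python
import math
import random
import struct


def round_to(x: float, fmt: str) -> float:
    """Round a Python float to the precision of a struct format ('<e', '<f', '<d')."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def hadamard_rotate(values, fmt: str):
    """Normalized fast Walsh-Hadamard transform, rounding every operation to `fmt`."""
    out = [round_to(v, fmt) for v in values]
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i] = round_to(a + b, fmt)      # butterfly sum
                out[i + h] = round_to(a - b, fmt)  # butterfly difference
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform orthonormal
    return [round_to(v * scale, fmt) for v in out]


def max_abs_error(result, reference) -> float:
    return max(abs(r - t) for r, t in zip(result, reference))


if __name__ == "__main__":
    random.seed(0)
    # Inputs are stored in half precision, as activations often are.
    x = [round_to(random.uniform(-1.0, 1.0), "<e") for _ in range(16)]
    reference = hadamard_rotate(x, "<d")
    print("fp16 rotation error:", max_abs_error(hadamard_rotate(x, "<e"), reference))
    print("fp32 rotation error:", max_abs_error(hadamard_rotate(x, "<f"), reference))
```

Rounding after every butterfly step is a pessimistic model of low-precision arithmetic, but it shows the general effect: upcasting before the rotation keeps the accumulated rounding error far smaller.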
## Before your PR is "*Ready for review*"
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No (updated existing test)
- **Did you add or update any necessary documentation?**: Yes
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Yes
## Additional Information
## Summary by CodeRabbit
## Release Notes
* **New Features**
  * Added the option to rotate inputs before quantization using RHT
    (Random Hadamard Transform).
  * Introduced granular rotation configuration options that enable FP32
    casting for improved numerical stability during the transform.
* **Tests**
  * Expanded test coverage for rotation functionality with parameterized
    FP32 casting scenarios.
---------
Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>

1 parent eace1ae commit b353110
File tree: 5 files changed (+49, -12)
- modelopt/torch/quantization
  - nn
    - modules
- tests/gpu/torch/quantization