Commit 232736b
GPTQ official (#853)
## What does this PR do?
Implements the official version of GPTQ with a decoder-level sequential
calibration flow.
Ref:
[https://github.com/IST-DASLab/FP-Quant/tree/master](https://github.com/IST-DASLab/FP-Quant/tree/master)
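For readers unfamiliar with sequential calibration, the idea can be sketched roughly as follows: each layer is calibrated on the activations produced by the already-quantized layers before it, rather than on activations from the full-precision model. This is a toy NumPy illustration only; `sequential_calibrate`, `round_to_grid`, and the ReLU "decoder layers" are hypothetical names and simplifications, not code from this PR.

```python
import numpy as np


def sequential_calibrate(layers, calib_inputs, quantize_layer):
    """Calibrate toy layers one at a time.

    `layers` is a list of weight matrices (out_features, in_features).
    Each layer is quantized using the inputs it would actually see, i.e.
    the outputs of the previously quantized layers.
    """
    x = calib_inputs
    quantized = []
    for w in layers:
        w_q = quantize_layer(w, x)        # calibrate on this layer's real inputs
        quantized.append(w_q)
        x = np.maximum(x @ w_q.T, 0.0)    # propagate through the quantized layer
    return quantized


def round_to_grid(w, x, step=0.1):
    """Placeholder quantizer (assumption): ignores `x` and rounds to a uniform grid."""
    return np.round(w / step) * step


rng = np.random.default_rng(0)
toy_layers = [rng.normal(size=(6, 8)), rng.normal(size=(4, 6))]
toy_inputs = rng.normal(size=(16, 8))
toy_quantized = sequential_calibrate(toy_layers, toy_inputs, round_to_grid)
```

A real quantizer such as GPTQ would use `x` to build its Hessian; the placeholder above only shows where the per-layer inputs come from.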
**Type of change:** New feature
**Overview:**
1. Deletes the `gptq_lite` configuration. `gptq()` is a more generic
implementation and resolves to the `gptq_lite` behavior when
`use_sequential` is set to `False`.
2. Introduces a `GPTQHelper` class that handles patching/unpatching for
Hessian collection, blockwise weight updates, Hessian initialization,
Hessian inverse computation, etc.
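As a rough illustration of the steps listed for `GPTQHelper` (Hessian collection, Hessian inverse computation, blockwise weight update), here is a minimal NumPy sketch of the GPTQ update. It is not the code from this PR: `gptq_quantize`, `quantize_rtn`, `block_size`, and `percdamp` are illustrative names, and a symmetric 4-bit integer grid with per-row scales stands in for NVFP4.

```python
import numpy as np


def quantize_rtn(w, scale):
    """Round-to-nearest on a symmetric 4-bit integer grid (illustrative quantizer)."""
    return np.clip(np.round(w / scale), -8, 7) * scale


def gptq_quantize(weight, x, block_size=4, percdamp=0.01):
    """GPTQ-style quantization of `weight` (out_features, in_features)
    using calibration inputs `x` (n_tokens, in_features)."""
    w = weight.astype(np.float64).copy()
    n_in = w.shape[1]
    # Assumption: per-row scales are fixed from the original weights.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0

    # Hessian of the layer-wise squared error, plus damping for stability.
    h = 2.0 / x.shape[0] * (x.T @ x)
    h += percdamp * np.mean(np.diag(h)) * np.eye(n_in)

    # Upper Cholesky factor U of the inverse Hessian: H^-1 = U^T U.
    u = np.linalg.cholesky(np.linalg.inv(h)).T

    q = np.zeros_like(w)
    for start in range(0, n_in, block_size):
        end = min(start + block_size, n_in)
        err_block = np.zeros((w.shape[0], end - start))
        for j in range(start, end):
            q[:, j] = quantize_rtn(w[:, j:j + 1], scale)[:, 0]
            err = (w[:, j] - q[:, j]) / u[j, j]
            # Compensate the not-yet-quantized columns inside this block.
            w[:, j + 1:end] -= np.outer(err, u[j, j + 1:end])
            err_block[:, j - start] = err
        # Lazily push the accumulated block error to all later columns.
        w[:, end:] -= err_block @ u[start:end, end:]
    return q


rng = np.random.default_rng(0)
demo_weight = rng.normal(size=(8, 16))
demo_inputs = rng.normal(size=(256, 16))
demo_q = gptq_quantize(demo_weight, demo_inputs)
```

Processing columns in blocks keeps most updates local and applies the error to the remaining columns once per block, which is the main reason the update is described as blockwise.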
## Usage
```bash
python hf_ptq.py \
  --pyt_ckpt_path Qwen/Qwen3-8B \
  --qformat nvfp4_gptq \
  --kv_cache_qformat none \
  --dataset cnn_dailymail \
  --batch_size 32 \
  --calib_seq 512 \
  --calib_size 512 \
  --export_path exported_model
```
## Testing
Measure perplexity and activation MSE on the following models
## Before your PR is "*Ready for review*"
- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes/No
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No
## Summary by CodeRabbit
* **New Features**
  * Added a new NVFP4 GPTQ quantization option for CLI/configuration.
* **Improvements**
  * Replaced GPTQ-lite with a full GPTQ calibration pipeline for more accurate, Hessian-aware blockwise updates and token-based sampling.
  * Added sequential layer-by-layer calibration support and automatic promotion of NVFP4 static quantizers.
  * Improved logging and timing for calibration runs.
* **Tests**
  * Expanded GPTQ tests, including export/roundtrip validation for quantized weights.
---------
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Signed-off-by: realAsma <akuriparambi@nvidia.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: realAsma <86726418+realAsma@users.noreply.github.com>

1 parent a1ca3f7 commit 232736b
8 files changed
Lines changed: 443 additions & 410 deletions
File tree
- modelopt/torch/quantization
- utils
- tests/gpu/torch/quantization