Commit dc7ad66
authored
GPTQ vector (#1223)
### What does this PR do?
Type of change: ? <!-- Use one of the following: Bug fix, new feature,
new example, new tests, documentation. -->
<!-- Details about the change. -->
### Usage
```python
# Add a code snippet demonstrating how to use this
```
### Testing
<!-- Mention how have you tested your change if applicable. -->
### Before your PR is "*Ready for review*"
Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)
and your commits are signed (`git commit -s -S`).
Make sure you read and follow the [Security Best
Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors)
(e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(...,
weights_only=False)`, `pickle`, etc.).
- Is this change backward compatible?: ✅ / ❌ / N/A <!--- If ❌, explain
why. -->
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: ✅ / ❌ / N/A
<!--- Mandatory -->
- Did you write any new necessary tests?: ✅ / ❌ / N/A <!--- Mandatory
for new features or examples. -->
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
✅ / ❌ / N/A <!--- Only for new features, API changes, critical bug fixes
or backward incompatible changes. -->
### Additional Information
<!-- E.g. related issue. -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added backend-specific GPTQ helper registration to allow
backend-tailored GPTQ behavior.
* **Bug Fixes**
* Prevented KV-cache state from leaking across repeated per-layer
forwards during calibration.
* **Tests**
* Added GPU-focused tests validating GPTQ combined with vector
quantization, including accuracy and end-to-end comparisons.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>1 parent e4b054b commit dc7ad66
3 files changed
Lines changed: 39 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
283 | 283 | | |
284 | 284 | | |
285 | 285 | | |
| 286 | + | |
286 | 287 | | |
287 | 288 | | |
288 | 289 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
49 | 49 | | |
50 | 50 | | |
51 | 51 | | |
52 | | - | |
| 52 | + | |
53 | 53 | | |
54 | 54 | | |
55 | 55 | | |
| |||
1589 | 1589 | | |
1590 | 1590 | | |
1591 | 1591 | | |
| 1592 | + | |
| 1593 | + | |
| 1594 | + | |
| 1595 | + | |
| 1596 | + | |
| 1597 | + | |
| 1598 | + | |
| 1599 | + | |
| 1600 | + | |
| 1601 | + | |
| 1602 | + | |
| 1603 | + | |
| 1604 | + | |
| 1605 | + | |
| 1606 | + | |
1592 | 1607 | | |
1593 | 1608 | | |
1594 | 1609 | | |
| |||
1648 | 1663 | | |
1649 | 1664 | | |
1650 | 1665 | | |
1651 | | - | |
| 1666 | + | |
| 1667 | + | |
| 1668 | + | |
| 1669 | + | |
| 1670 | + | |
| 1671 | + | |
| 1672 | + | |
| 1673 | + | |
| 1674 | + | |
1652 | 1675 | | |
1653 | 1676 | | |
1654 | 1677 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
143 | 143 | | |
144 | 144 | | |
145 | 145 | | |
146 | | - | |
147 | 146 | | |
148 | | - | |
149 | 147 | | |
150 | 148 | | |
151 | 149 | | |
| |||
231 | 229 | | |
232 | 230 | | |
233 | 231 | | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
0 commit comments