Commit beac6e9
Sequential calibrate refactor (#982)
### What does this PR do?
Type of change: New feature
The current sequential calibration support has O(N^2) complexity for
collecting updated activations for a decoder layer. To solve this, we
adopted a modular, plugin-based approach that uses hooks to capture
the updated activations by running forward on the previous decoder layer
with its cached activations. This leads to an issue with nested
modules: logic in the parent module might need to be replicated
in the lower-level modules to ensure equivalence. For example, in the
Nemotron model, the parent module NemotronHModel contains logic to create
and select the appropriate mask based on the decoder layer type (Mamba vs.
attention).
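To make the complexity difference concrete, here is a small, self-contained sketch of the cost model (the function names and the exact counting are illustrative, not the actual ModelOpt implementation): without cached inputs, calibrating layer i requires re-running all preceding layers, while the cached-input scheme re-runs at most one previous layer per step.

```python
def naive_layer_forwards(n_layers: int) -> int:
    # Layer i needs i preceding forwards to regenerate its inputs,
    # plus its own capture forward: sum grows quadratically in N.
    return sum(i + 1 for i in range(n_layers))

def cached_layer_forwards(n_layers: int) -> int:
    # With cached inputs, only layer i-1 is re-run (to produce updated
    # activations) plus layer i's capture forward: at most 2 per layer.
    return sum(min(i, 1) + 1 for i in range(n_layers))

print(naive_layer_forwards(32))   # 528 = 32 * 33 / 2
print(cached_layer_forwards(32))  # 63
```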
This PR implements a more generic solution for sequential calibration
by collecting activations through model forward, thereby ensuring
that all the parent-module logic is preserved. We use a "state"
attribute on the modules to indicate whether to perform recomputation,
capture inputs, or skip the layer while running module forward. This
avoids redundant computation when collecting updated activations.
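As a rough illustration of the "state" mechanism (the class, attribute, and method names here are hypothetical stand-ins, not the actual ModelOpt API), a state-aware forward might dispatch like this:

```python
from enum import Enum

class LayerState(Enum):
    SKIP = "skip"        # already calibrated: bypass recomputation entirely
    RUN = "run"          # re-run on cached inputs to emit updated activations
    CAPTURE = "capture"  # record inputs for the layer being calibrated

class ToyDecoderLayer:
    """Stand-in for a decoder layer whose forward has been patched."""

    def __init__(self):
        self.state = LayerState.RUN
        self.captured_inputs = None

    def compute(self, x):
        return x + 1  # placeholder for the real layer math

    def forward(self, x):
        if self.state is LayerState.SKIP:
            # Output does not matter: the next running layer replays its
            # own cached inputs, so skipping avoids redundant computation.
            return x
        if self.state is LayerState.RUN and self.captured_inputs is not None:
            # Replay the inputs cached during this layer's capture pass so
            # the now-updated layer produces fresh activations.
            x = self.captured_inputs
        if self.state is LayerState.CAPTURE:
            self.captured_inputs = x  # cache for the next calibration step
        return self.compute(x)
```

Because the state check lives inside the patched forward, the normal model.forward() call drives everything, and no parent-module logic has to be duplicated.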
The overall flow is as follows:
1. The user registers a get_decoder_layers() function that returns the
list of layers to be calibrated sequentially.
2. LayerActivationCollector goes through the list of layers and patches
each module's forward with a "state-aware" forward.
3. When model.forward() is called, all the parent logic is recomputed as
expected (embeddings, residual connections, attention-mask generation,
etc.).
4. Suppose we are currently calibrating layer N and want its updated
activations: we set layer N to capture and layer N-1 to run (that layer
was processed previously, so updated activations must be regenerated).
All earlier layers are set to skip. When model.forward() is called, the
computations for the skipped decoder layers are bypassed, layer N-1
re-runs on its cached inputs to generate new activations, and layer N's
inputs are captured and cached (using the same logic as before) so they
can later produce updated activations for layer N+1.
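The per-layer state assignment in the steps above can be sketched as a simple schedule (function name and string states are illustrative; the PR does not describe what happens to layers after N, so treating them as running normally is an assumption made here so that model.forward() completes):

```python
def state_schedule(n_layers: int, current: int) -> list[str]:
    """Assign a calibration state to each layer before one model.forward()."""
    states = []
    for i in range(n_layers):
        if i < current - 1:
            states.append("skip")     # already processed: recomputation avoided
        elif i == current - 1:
            states.append("run")      # replay cached inputs -> updated activations
        elif i == current:
            states.append("capture")  # cache inputs for this calibration step
        else:
            states.append("run")      # assumed: later layers execute normally
    return states

print(state_schedule(5, 3))  # ['skip', 'skip', 'run', 'capture', 'run']
```

Each calibration step thus triggers one full model.forward() in which at most two decoder layers (N-1 and N) do meaningful work.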
### Usage
```python
# Sequential calibration config
NVFP4_SEQUENTIAL_CFG = {
    "quant_cfg": {
        "*weight_quantizer": _nvfp4_quantizer,
        "*input_quantizer": _nvfp4_quantizer,
        **_default_disabled_quantizer_cfg,
    },
    "algorithm": {"method": "max", "use_sequential": True},
}
```
### Testing
### Before your PR is "*Ready for review*"
Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)
and your commits are signed (`git commit -s -S`).
Make sure you read and follow the [Security Best
Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors)
(e.g. avoiding hardcoded `trust_remote_code=True`, using
`torch.load(..., weights_only=True)`, avoiding `pickle`, etc.).
- Is this change backward compatible?: ✅
- If you copied code from any other source, did you follow IP policy in
[CONTRIBUTING.md](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md#-copying-code-from-other-sources)?:
✅
- Did you write any new necessary tests?: ✅
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
✅ / ❌ / N/A
## Summary by CodeRabbit
* **New Features**
  * Public sequential, per-layer calibration API and an activation-collection utility.
  * Broader model discovery support, including Nemotron-H and homogeneous HuggingFace variants.
* **Improvements**
  * Clearer validation/error messages and deterministic patching/unpatching with guaranteed cleanup and resource handling.
  * Consolidated discovery/registration flow for decoder-layer handling and improved per-layer logging/progress.
* **Tests**
  * Extensive new unit tests covering discovery, per-layer capture/replay, inter-layer behavior, and edge cases.
---------
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Signed-off-by: realAsma <akuriparambi@nvidia.com>
Co-authored-by: realAsma <akuriparambi@nvidia.com>
Parent: 7b34de6
File tree (11 files changed, +1520, -286):
- modelopt/torch/quantization/plugins
- modelopt/torch/quantization/utils
- modelopt/torch/utils
- tests/gpu/torch/quantization
- tests/unit/torch/quantization/plugins