Commit 6ffe4a5

sungsooha and claude authored
Add nvfp4_local_hessian to QUANT_CFG_CHOICES (#1065)
### What does this PR do?

Type of change: New feature

Wire up `NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG` (from PR #788) to the `hf_ptq.py` CLI so it can be used via `--qformat nvfp4_local_hessian`. This is a one-line addition to the `QUANT_CFG_CHOICES` dict.

### Usage

```bash
python examples/llm_ptq/hf_ptq.py \
    --model Qwen/Qwen3-8B \
    --qformat nvfp4_local_hessian \
    --kv_cache_qformat fp8 \
    --export_fmt hf
```

### Testing

Tested via the modelopt-quantization CI pipeline (quant_flow) on GB200 (`oci-hsg` launcher) with Qwen3-8B. The PTQ stage completed successfully.

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
- Did you write any new necessary tests?: N/A (wiring existing config to existing CLI)
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: N/A

### Additional Information

- `NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG` was added in PR #788 but not exposed via the CLI.
- Also used in modelopt-quantization CI (`quant_flow`) for automated NVFP4 scale-setting sweeps.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Added a new quantization configuration option (`nvfp4_local_hessian`), expanding the available quantization choices for users. This provides an additional quantization mode to select from in the CLI while preserving existing behavior and compatibility.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 1dc890d commit 6ffe4a5

1 file changed: +1 -0 lines changed

examples/llm_ptq/hf_ptq.py

Lines changed: 1 addition & 0 deletions
@@ -107,6 +107,7 @@ def _set_kv_cache_constant_amax(quant_cfg: dict) -> None:
     "nvfp4_omlp_only": mtq.NVFP4_OMLP_ONLY_CFG,
     "nvfp4_svdquant": mtq.NVFP4_SVDQUANT_DEFAULT_CFG,
     "mxfp8": mtq.MXFP8_DEFAULT_CFG,
+    "nvfp4_local_hessian": mtq.NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG,
 }
 
 KV_QUANT_CFG_CHOICES = {
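
The change relies on a common pattern: a dict maps each `--qformat` string to a config object, so exposing a new format only requires a new key. The following is a minimal, self-contained sketch of that pattern, not the actual `hf_ptq.py` code. The placeholder config dicts and the `resolve_quant_cfg` helper are hypothetical stand-ins for the real `mtq.*_CFG` objects and the script's own argument handling.

```python
import argparse

# Hypothetical placeholders standing in for the real `mtq.*_CFG` objects.
MXFP8_PLACEHOLDER_CFG = {"name": "mxfp8"}
NVFP4_LOCAL_HESSIAN_PLACEHOLDER_CFG = {"name": "nvfp4_local_hessian"}

# Registry mapping CLI format names to configs; adding a format is one new entry.
QUANT_CFG_CHOICES = {
    "mxfp8": MXFP8_PLACEHOLDER_CFG,
    "nvfp4_local_hessian": NVFP4_LOCAL_HESSIAN_PLACEHOLDER_CFG,
}


def resolve_quant_cfg(argv):
    """Parse `--qformat` from argv and return the matching config (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--qformat", required=True)
    args = parser.parse_args(argv)
    if args.qformat not in QUANT_CFG_CHOICES:
        supported = ", ".join(sorted(QUANT_CFG_CHOICES))
        raise ValueError(f"Unsupported qformat {args.qformat!r}; choose one of: {supported}")
    return QUANT_CFG_CHOICES[args.qformat]
```

Under these assumptions, `resolve_quant_cfg(["--qformat", "nvfp4_local_hessian"])` returns the placeholder config, while an unregistered name raises `ValueError`. Before this commit, `nvfp4_local_hessian` was such an unregistered name.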