Add nvfp4_local_hessian to QUANT_CFG_CHOICES (#1065)

sungsooha · claude · web-flow · commit 6ffe4a52b3cc · 2026-03-18T11:38:26.000-07:00
### What does this PR do?

Type of change: New feature

Wire up `NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG` (from PR #788) to the `hf_ptq.py` CLI so it can be used via `--qformat nvfp4_local_hessian`. This is a one-line addition to the `QUANT_CFG_CHOICES` dict.

### Usage

```bash
python examples/llm_ptq/hf_ptq.py \
    --model Qwen/Qwen3-8B \
    --qformat nvfp4_local_hessian \
    --kv_cache_qformat fp8 \
    --export_fmt hf
```

### Testing

Tested via the modelopt-quantization CI pipeline (`quant_flow`) on GB200 (`oci-hsg` launcher) with Qwen3-8B. The PTQ stage completed successfully.

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
- Did you write any new necessary tests?: N/A (wiring existing config to existing CLI)
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: N/A

### Additional Information

- `NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG` was added in PR #788 but not exposed via the CLI.
- Also used in modelopt-quantization CI (`quant_flow`) for automated NVFP4 scale-setting sweeps.

## Summary by CodeRabbit

* **New Features**
  * Added a new quantization configuration option, `nvfp4_local_hessian`, to the available `--qformat` choices. Existing choices, behavior, and compatibility are unchanged.

Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/examples/llm_ptq/hf_ptq.py b/examples/llm_ptq/hf_ptq.py
@@ -107,6 +107,7 @@ def _set_kv_cache_constant_amax(quant_cfg: dict) -> None:
     "nvfp4_omlp_only": mtq.NVFP4_OMLP_ONLY_CFG,
     "nvfp4_svdquant": mtq.NVFP4_SVDQUANT_DEFAULT_CFG,
     "mxfp8": mtq.MXFP8_DEFAULT_CFG,
+    "nvfp4_local_hessian": mtq.NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG,
 }
 
 KV_QUANT_CFG_CHOICES = {
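
To illustrate why a single dict entry is enough to expose the format, here is a minimal sketch of the name-to-config lookup pattern. It does not reproduce the actual `hf_ptq.py` code: the dict values are placeholder strings standing in for the `mtq.*_CFG` objects so the snippet runs without modelopt installed, and `resolve_qformat` and `parse_args` are hypothetical helpers introduced only for this example.

```python
import argparse

# Placeholder values: in hf_ptq.py these are modelopt config objects such as
# mtq.NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG. Strings are used here so the sketch
# runs without modelopt installed.
QUANT_CFG_CHOICES = {
    "nvfp4_svdquant": "NVFP4_SVDQUANT_DEFAULT_CFG",
    "mxfp8": "MXFP8_DEFAULT_CFG",
    "nvfp4_local_hessian": "NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG",  # new entry
}


def resolve_qformat(qformat: str) -> str:
    """Hypothetical helper: map a --qformat value to its config entry."""
    try:
        return QUANT_CFG_CHOICES[qformat]
    except KeyError:
        valid = ", ".join(sorted(QUANT_CFG_CHOICES))
        raise ValueError(
            f"Unsupported qformat {qformat!r}; expected one of: {valid}"
        ) from None


def parse_args(argv):
    """Hypothetical parser exposing only the --qformat flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--qformat", required=True)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--qformat", "nvfp4_local_hessian"])
    print(resolve_qformat(args.qformat))
```

With this pattern, any key added to the dict becomes a valid `--qformat` value with no further parser changes, which is consistent with the PR being a one-line diff.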
