# Add nvfp4_local_hessian to QUANT_CFG_CHOICES #1065
**Codecov Report** ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main    #1065   +/-  ##
=======================================
  Coverage   70.29%   70.30%
=======================================
  Files         227      227
  Lines       25860    25854       -6
=======================================
- Hits        18179    18176       -3
+ Misses       7681     7678       -3
```
Wire up NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG (from PR NVIDIA#788) to the hf_ptq.py CLI so it can be used via --qformat nvfp4_local_hessian. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com>
### What does this PR do?

**Type of change:** New feature

Wire up `NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG` (from PR #788) to the `hf_ptq.py` CLI so it can be used via `--qformat nvfp4_local_hessian`. One-line addition to the `QUANT_CFG_CHOICES` dict.

### Usage

```bash
python examples/llm_ptq/hf_ptq.py \
  --model Qwen/Qwen3-8B \
  --qformat nvfp4_local_hessian \
  --kv_cache_qformat fp8 \
  --export_fmt hf
```

### Testing

Tested via the modelopt-quantization CI pipeline (quant_flow) on GB200 (`oci-hsg` launcher) with Qwen3-8B. The PTQ stage completed successfully.

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`?: N/A
- Did you write any new necessary tests?: N/A (wiring an existing config to an existing CLI)
- Did you update the [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: N/A

### Additional Information

- `NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG` was added in PR #788 ("add local hessian calibration") but not exposed via the CLI.
- Also used in the modelopt-quantization CI (`quant_flow`) for automated NVFP4 scale-setting sweeps.

### Summary by CodeRabbit

- **New Features**
  - Added a new KV-cache quantization configuration option, expanding the available quantization choices for users. This provides an additional quantization mode to select from in configuration UIs and CLIs while preserving existing behavior and compatibility.

Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
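For readers unfamiliar with the `hf_ptq.py` pattern, the change amounts to one new key in a string-to-config dictionary that the CLI uses to resolve `--qformat`. The sketch below is illustrative only: the config dictionaries are placeholders standing in for the real configs exported by modelopt (such as `NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG`), and `resolve_qformat` is a hypothetical helper, not the actual script's function.

```python
# Placeholder configs standing in for the real modelopt config objects.
NVFP4_DEFAULT_CFG = {"algorithm": "max"}                               # placeholder
NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG = {"algorithm": "local_hessian"}  # placeholder

# CLI-name -> quantization-config mapping, as used by hf_ptq.py.
QUANT_CFG_CHOICES = {
    "nvfp4": NVFP4_DEFAULT_CFG,
    # The one-line addition this PR makes: expose the local-Hessian
    # config under --qformat nvfp4_local_hessian.
    "nvfp4_local_hessian": NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG,
}


def resolve_qformat(qformat: str) -> dict:
    """Hypothetical helper: look up the config for a --qformat value."""
    try:
        return QUANT_CFG_CHOICES[qformat]
    except KeyError:
        raise SystemExit(
            f"Unknown --qformat {qformat!r}; choices: {sorted(QUANT_CFG_CHOICES)}"
        )
```

Because the key simply joins an existing dictionary, argparse-style `choices=QUANT_CFG_CHOICES.keys()` validation and downstream config handling pick it up with no further changes.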