Split #1501 so this PR keeps only the pack=True calibration dataloader
change. The HybridModel pruning, fused-TE-spec import/export, and
related fixes are now in #1518 (also targeting main).
This commit reverts to main the files that belong to #1518:
- modelopt/torch/export/plugins/{mcore_deepseek,mcore_gptoss,mcore_llama,mcore_qwen,megatron_importer,unified_export_megatron}.py
- modelopt/torch/nas/plugins/megatron.py
- modelopt/torch/prune/plugins/mcore_minitron.py
- modelopt/torch/utils/logging.py
- modelopt/torch/utils/plugins/{megatron_generate,megatron_mmlu}.py
- tools/launcher/examples/Qwen/Qwen3-8B/megatron_lm_ptq.yaml
CHANGELOG: drop the Bug Fixes entry and the 0.44 date correction
(those go with #1518). Keep the pack=True New Features entry.
History is preserved: earlier commits with the dropped changes remain in
this branch's log; this commit only rolls the working state back to
"pack only".
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
CHANGELOG.rst: 1 addition, 5 deletions
@@ -27,11 +27,7 @@ Changelog
 - Add NVFP4 W4A16 weight-only quantization (``w4a16_nvfp4``): FP4 weights with group_size=16, BF16 activations, no calibration forward pass required. Use ``mtq.W4A16_NVFP4_CFG`` or ``--qformat w4a16_nvfp4`` in ``hf_ptq.py``. vLLM deployment support is in progress.
 - Add ``pack: bool`` option to ``modelopt.torch.utils.dataset_utils.get_dataset_dataloader``. When ``True``, raw samples from each source are concatenated into a per-source token stream (separated by ``tokenizer.eos_token_id``) and sliced into uniform ``max_sample_length`` chunks, preserving the requested per-source ratio in ``num_samples``. Eliminates padding-token noise from calibration and keeps long-document context intact. Default ``False`` for backward compatibility; recommended for pruning and amax-based PTQ.
 
-**Bug Fixes**
-
-- Fix Megatron-Core HF importer to load fused ``TELayerNormColumnParallelLinear.layer_norm_weight`` from HF for GPT-family models (Qwen3 etc.) under ``--export-default-te-spec``. Importer now prefers per-context keys ``fused_input_layernorm`` / ``fused_pre_mlp_layernorm`` (fallback ``fused_norm`` for Nemotron-H backward compatibility); ``mcore_qwen.py`` provides the new rules. Without this fix, post-prune MMLU sat at chance.
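The ``pack=True`` entry above describes per-source packing in prose. What follows is a minimal sketch of that idea in plain Python, not modelopt's implementation. The helper names ``pack_source`` and ``pack_dataset`` are hypothetical. The sketch assumes samples are already tokenized into lists of token ids and that ``num_samples`` is a dict keyed by source name; the real argument types may differ. It drops the trailing partial chunk so every chunk is exactly ``max_sample_length`` tokens long, and it raises if a source is too short. How the real dataloader handles short sources is not described here.

```python
from typing import Dict, List


def pack_source(samples: List[List[int]], eos_token_id: int,
                max_sample_length: int, num_chunks: int) -> List[List[int]]:
    """Hypothetical helper: concatenate one source's tokenized samples into a
    single stream (EOS between samples) and cut it into fixed-length chunks."""
    stream: List[int] = []
    for i, ids in enumerate(samples):
        if i > 0:
            stream.append(eos_token_id)  # separator between documents
        stream.extend(ids)
    # Keep only full chunks so no chunk needs padding.
    n_full = len(stream) // max_sample_length
    if n_full < num_chunks:
        raise ValueError(f"need {num_chunks} chunks but stream only yields {n_full}")
    return [stream[k * max_sample_length:(k + 1) * max_sample_length]
            for k in range(num_chunks)]


def pack_dataset(sources: Dict[str, List[List[int]]], num_samples: Dict[str, int],
                 eos_token_id: int, max_sample_length: int) -> List[List[int]]:
    """Hypothetical helper: pack each source independently, so the requested
    per-source chunk counts (and therefore the mixing ratio) are preserved."""
    batch: List[List[int]] = []
    for name, samples in sources.items():
        batch.extend(pack_source(samples, eos_token_id,
                                 max_sample_length, num_samples[name]))
    return batch
```

For example, with ``eos_token_id=0`` and ``max_sample_length=3``, source ``a`` with samples ``[1, 2, 3]`` and ``[4, 5]`` becomes the stream ``[1, 2, 3, 0, 4, 5]`` and yields two chunks. Source ``b`` with ``[7, 8, 9, 10, 11]`` yields one full chunk, and its leftover tokens are discarded rather than padded. Packing per source, instead of across one mixed stream, is what keeps the ratio in ``num_samples`` exact.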
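The NVFP4 W4A16 entry notes that FP4 weights use group_size=16 and need no calibration forward pass. That is possible because each group's scale can be derived from the weights alone. Below is a minimal fake-quantization sketch of that grouping, assuming the FP4 E2M1 value grid and plain Python floats for the per-group scales. It is a simplification: NVFP4 proper stores each 16-element block scale in FP8 (E4M3) together with a per-tensor FP32 scale, and modelopt's actual kernels are not shown. The function names are hypothetical.

```python
import math
from typing import List

# Non-negative magnitudes representable by FP4 E2M1.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0


def round_to_e2m1(x: float) -> float:
    """Round a pre-scaled value to the nearest E2M1 value, keeping its sign.
    Ties go to the smaller magnitude here; hardware rounding rules may differ."""
    mag = min(abs(x), E2M1_MAX)
    nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
    return math.copysign(nearest, x)


def fake_quant_fp4_grouped(weights: List[float], group_size: int = 16) -> List[float]:
    """Hypothetical sketch: quantize-dequantize weights in groups of group_size,
    with one scale per group chosen so the group's amax maps to E2M1_MAX."""
    out: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        amax = max(abs(w) for w in group)
        scale = amax / E2M1_MAX if amax > 0 else 1.0  # all-zero group: any scale works
        out.extend(round_to_e2m1(w / scale) * scale for w in group)
    return out
```

Because ``scale`` depends only on each group's own maximum magnitude, the whole computation reads the weight tensor and nothing else. This matches the changelog's point that no calibration data is needed for the weight-only format.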