Add ModelOptHFTrainer and simplify KDTrainer distillation API
Introduce ModelOptHFTrainer wrapping HF Trainer with modelopt features
(quantization, LR config, trainable/frozen param globs, save_dtype
config rewrite, Liger fused CE, manual GC, etc.) and simplify the
KDTrainer distillation API on top of it.
Also includes follow-up fixes applied during review:
- Causal shift fix and forward-restore safety in KDTrainer
- DeepSpeed ZeRO-3 support in KDTrainer; Liger hidden-states dtype fix
- save_dtype defaults to "bfloat16"; config.json rewrite skipped when
save_dtype is None
- Narrowed exceptions, moved defaults to configs, fixed recipe.quantize
reference in transformers_trainer.py
Signed-off-by: realAsma <akuriparambi@nvidia.com>
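The "causal shift fix" above refers to the standard alignment issue in causal-LM distillation: logits at position t predict token t+1, so the hard-label cross-entropy must shift logits and labels by one, while student and teacher logits are already position-aligned for the soft KD term. A minimal sketch of this general technique (not KDTrainer's exact code; `distillation_loss`, `alpha`, and `T` are illustrative names):

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    # Hard-label CE with the standard causal shift: drop the last logit,
    # drop the first label, so position t predicts token t+1.
    shift_logits = student_logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    ce = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1)
    )
    # Soft-label KL between student and teacher; both predict the next
    # token at every position, so no shift is needed between them.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        log_target=True,
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1 - alpha) * kd


# Toy shapes: batch=2, seq=5, vocab=11.
s = torch.randn(2, 5, 11)
t = torch.randn(2, 5, 11)
y = torch.randint(0, 11, (2, 5))
loss = distillation_loss(s, t, y)
```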
CHANGELOG.rst (3 additions, 0 deletions):
@@ -6,6 +6,9 @@ Changelog
 
 **New Features**
 
+- Add model-agnostic `Liger kernel <https://github.com/linkedin/Liger-Kernel>`_ fused loss support in ``ModelOptHFTrainer`` for any HuggingFace causal LM, with distributed param gathering for FSDP2, DeepSpeed ZeRO-3, and DDP. Extends HuggingFace's built-in Liger integration, which is limited to `a fixed set of model architectures <https://github.com/linkedin/Liger-Kernel/blob/main/src/liger_kernel/transformers/monkey_patch.py>`_, FSDP only, and CrossEntropy loss. ModelOpt additionally supports Liger fused KD loss (JSD) for knowledge distillation.
+- Add ``ModelOptTrainerArguments`` to ``ModelOptHFTrainer`` with ``--trainable_params``, ``--frozen_params``, ``--lr_config``, ``--save_dtype``, and ``--manual_gc`` flags. Add per-parameter learning rate support via YAML config.
+- Simplify ``KDTrainer`` for HuggingFace knowledge distillation: remove the ``mtd.convert()`` class swap in favor of explicit teacher forwarding with logit-level distillation support.
 - Support full Transformer Engine spec for Minitron pruning (``mcore_minitron``), so the custom ModelOpt spec is no longer needed. This does not affect the usage of the pruning workflow, but makes pruning slightly faster and may produce a slightly different pruned model due to different kernels and numerics.
 - Add Puzzletron, a new algorithm for heterogeneous pruning of LLM and VLM models. See `examples/puzzletron/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/puzzletron>`_ for more details.
 - Add an iterator interface using ``CalibrationDataReader`` in the ONNX quantization workflow.
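The per-parameter learning rate feature maps fnmatch patterns to optimizer kwargs, with the first matching pattern winning per parameter. A hedged sketch of how such a config could be turned into optimizer param groups (`build_param_groups` and `default_lr` are illustrative names, not ModelOpt's API; the dict mirrors what a YAML `lr_config` might deserialize to):

```python
from fnmatch import fnmatch

import torch


def build_param_groups(model, lr_config, default_lr=1e-4):
    """Group trainable parameters by the first matching glob pattern."""
    groups = {pattern: [] for pattern in lr_config}
    default = []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        for pattern in lr_config:  # dicts preserve insertion order
            if fnmatch(name, pattern):
                groups[pattern].append(param)
                break  # first match wins
        else:
            default.append(param)
    param_groups = [
        {"params": params, **lr_config[pattern]}
        for pattern, params in groups.items()
        if params
    ]
    if default:
        param_groups.append({"params": default, "lr": default_lr})
    return param_groups


model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Linear(4, 2))
# "0.*" claims both params of the first layer, so "0.bias" never reaches
# the later "*.bias" pattern (first match wins).
lr_config = {"0.*": {"lr": 1e-5, "weight_decay": 0.0}, "*.bias": {"lr": 1e-3}}
optimizer = torch.optim.AdamW(build_param_groups(model, lr_config))
```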
|`--trainable_params`|`list[str]`|`None`| Glob patterns (fnmatch) for parameters that should be trainable; all other parameters are frozen. Mutually exclusive with `--frozen_params`. |
|`--frozen_params`|`list[str]`|`None`| Glob patterns (fnmatch) for parameters that should be frozen. Mutually exclusive with `--trainable_params`. |
|`--lr_config`|`str`|`None`| Path to a YAML file mapping fnmatch patterns to optimizer kwargs (e.g. `lr`, `weight_decay`). The first matching pattern wins per parameter. See `examples/llm_qat/configs/train/lr_config_example.yaml`. |
|`--save_dtype`|`str`|`"bfloat16"`| Dtype string written into the saved model's `config.json` (e.g. `bfloat16`, `float16`). Set to `None` to preserve the original dtype. |
|`--manual_gc`|`bool`|`False`| Run `gc.collect()` before each training/prediction step to work around GPU memory leaks during QAT/distillation. |
|`--liger_ce_label_smoothing`|`float`|`0.0`| Label smoothing for the Liger fused CE loss. Only used when `--use_liger_kernel` is enabled. |
|`--lora`|`bool`|`False`| Whether to add a LoRA (Low-Rank Adaptation) adapter before training. When using real quantization, the LoRA adapter must be set, as quantized weights are frozen during training. |
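To illustrate the `--trainable_params` / `--frozen_params` semantics described in the table, here is a hedged sketch of glob-based parameter freezing (`set_trainable_params` is an illustrative helper, not the actual ModelOpt implementation):

```python
from fnmatch import fnmatch

import torch


def set_trainable_params(model, trainable_params=None, frozen_params=None):
    """Freeze/unfreeze parameters by fnmatch glob; the two options are
    mutually exclusive, matching the documented CLI behavior."""
    if trainable_params and frozen_params:
        raise ValueError("trainable_params and frozen_params are mutually exclusive")
    for name, param in model.named_parameters():
        if trainable_params is not None:
            # Only parameters matching some pattern stay trainable.
            param.requires_grad = any(fnmatch(name, p) for p in trainable_params)
        elif frozen_params is not None:
            # Parameters matching some pattern are frozen.
            if any(fnmatch(name, p) for p in frozen_params):
                param.requires_grad = False


model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Linear(8, 2))
set_trainable_params(model, trainable_params=["1.*"])  # train only the head
```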