CHANGELOG.rst (1 addition & 1 deletion)
@@ -7,7 +7,7 @@ NVIDIA Model Optimizer Changelog
**New Features**
- Support the full Transformer Engine spec for Minitron pruning (``mcore_minitron``), so a custom ModelOpt spec is no longer needed. This does not change how the pruning workflow is used, but it makes pruning slightly faster and may produce a slightly different pruned model because of different kernels and numerics.
Removed:

- Add skip-softmax tile skipping to the Triton flash attention kernel (``modelopt.torch.kernels.triton_fa``). KV tiles with negligible attention scores are skipped entirely during prefill, saving V loads and computation on long sequences with strong attention locality. Integrates with ``mtsa.sparsify()`` via the ``triton_skip_softmax`` method.
Added:

- Add skip-softmax tile skipping to the Triton flash attention kernel (``modelopt.torch.kernels.triton_fa``). Integrates with the ``mtsa.sparsify()`` API.
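The skip-softmax tile skipping described in this entry can be illustrated with a plain-Python sketch of single-query attention computed tile by tile with an online softmax. This is not the Triton kernel in ``modelopt.torch.kernels.triton_fa``. The function names, ``tile_size``, ``threshold``, and the skip rule ``exp(tile_max - running_max) < threshold`` are all assumptions chosen for clarity, and ModelOpt's actual criterion and parameters may differ. The sketch shows the key point: a skipped tile still has its scores computed, but its V rows are never read.

```python
import math


def attention_row_exact(q, keys, values):
    """Reference softmax attention for one query row (no skipping)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z
            for j in range(len(values[0]))]


def attention_row_skip_softmax(q, keys, values, tile_size=4, threshold=1e-3):
    """Tiled online-softmax attention with ASSUMED skip rule (hypothetical).

    A KV tile is skipped, so its V rows are never loaded, when even its largest
    score is negligible relative to the running max:
    exp(tile_max - running_max) < threshold.
    Returns (output_row, number_of_skipped_tiles).
    """
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * len(values[0])
    skipped = 0
    for start in range(0, len(keys), tile_size):
        # Q·K scores for this tile are always computed; they drive the skip test.
        k_tile = keys[start:start + tile_size]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_tile]
        tile_max = max(scores)
        if running_max > -math.inf and math.exp(tile_max - running_max) < threshold:
            skipped += 1  # skip V load and the weighted-sum update for this tile
            continue
        new_max = max(running_max, tile_max)
        # Rescale previous partial sums to the new max (exp(-inf) == 0.0 on tile 0).
        correction = math.exp(running_max - new_max)
        denom *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, values[start:start + tile_size]):
            w = math.exp(s - new_max)
            denom += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc], skipped
```

With a query that attends strongly to the first tile, later tiles fall below the threshold and are skipped, and the output stays close to exact attention; setting ``threshold=0.0`` disables skipping and reproduces the exact result up to floating-point error.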