Commit 282675b

realAsma authored and danielkorzekwa committed
[Minor] Force 'fuse_wgrad_accumulation' to false for TE GroupedLinear (#814)
## What does this PR do? **Type of change:** ? Minor **Overview:** ? ## Usage <!-- You can potentially add a usage example below. --> ```python # Add a code snippet demonstrating how to use this ``` ## Testing <!-- Mention how have you tested your change if applicable. --> ## Before your PR is "*Ready for review*" <!-- If you haven't finished some of the above items you can still open `Draft` PR. --> - **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed. - **Is this change backward compatible?**: Yes/No <!--- If No, explain why. --> - **Did you write any new necessary tests?**: Yes/No - **Did you add or update any necessary documentation?**: Yes/No - **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No <!--- Only for new features, API changes, critical bug fixes or bw breaking changes. --> ## Additional Information <!-- E.g. related issue. --> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Automatically disables fuse_wgrad_accumulation when using ModelOpt quantization with Transformer Engine-based quantization paths. A warning is now displayed to notify users when this adjustment occurs. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: realAsma <akuriparambi@nvidia.com>
1 parent c17f6c2 commit 282675b

1 file changed

Lines changed: 10 additions & 0 deletions

File tree

modelopt/torch/quantization/plugins/transformer_engine.py

```diff
@@ -120,6 +120,13 @@ def _functionals_to_replace(self, value):
         self._functionals_to_replace = value
 
     def _setup(self):
+        if getattr(self, "fuse_wgrad_accumulation", False):
+            warnings.warn(
+                "fuse_wgrad_accumulation is not supported with ModelOpt quantization. "
+                "Setting fuse_wgrad_accumulation to False."
+            )
+            self.fuse_wgrad_accumulation = False
+
         # GroupedMLP stores the weights as weight0, weight1, etc. To run setup in order to
         # initialize the quantizer states, self.weight is used to extract shape, dtype etc. Assigning
         # self.weight0 to self.weight to run the quantizer states initialization.
@@ -131,6 +138,9 @@ def _setup(self):
         # Remove self.weight after setup.
         delattr(self, "weight")
 
+    # TODO: GroupedLinear supports weights split by `num_gemms`, to support quantization
+    # with static parameters beyond per-tensor, we need to support a unique quantizer for each gemm.
+
     def modelopt_post_restore(self, prefix: str = ""):
         # GroupedMLP stores the weights as weight0, weight1, etc. To run post_restore in order to
         # initialize the quantizer states, self.weight is used to extract shape, dtype etc. Assigning
```
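The guard added in this commit follows a common pattern: detect an unsupported flag during setup, warn, and force it off. A minimal, self-contained sketch of that behavior is below; `_FakeGroupedLinear` is a hypothetical stand-in for the real TE `GroupedLinear` module, not ModelOpt source code.

```python
import warnings


class _FakeGroupedLinear:
    """Illustrative stand-in for TE GroupedLinear (name and attributes are assumptions)."""

    def __init__(self, fuse_wgrad_accumulation: bool):
        self.fuse_wgrad_accumulation = fuse_wgrad_accumulation

    def _setup(self):
        # Same guard shape as the commit: warn once, then disable the flag.
        if getattr(self, "fuse_wgrad_accumulation", False):
            warnings.warn(
                "fuse_wgrad_accumulation is not supported with ModelOpt quantization. "
                "Setting fuse_wgrad_accumulation to False."
            )
            self.fuse_wgrad_accumulation = False


module = _FakeGroupedLinear(fuse_wgrad_accumulation=True)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    module._setup()
```

After `_setup()` runs, `module.fuse_wgrad_accumulation` is `False` and exactly one `UserWarning` has been recorded; calling `_setup()` again is a no-op since the flag is already off.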
