
Commit c4b662f

[Bug fix] Fake quantized model save after HF accelerate hooks are added (#906)
## What does this PR do?

**Type of change:** Bug fix

**Overview:** Fix `AttributeError: Can't get local object 'add_hook_to_module.<locals>.new_forward'` when saving a quantized model a second time after restoring it with `device_map="auto"`.

When a model is loaded with `device_map="auto"`, accelerate's `add_hook_to_module` patches every submodule (including `TensorQuantizer` instances) and injects three instance attributes: `_hf_hook`, `_old_forward`, and `forward` (a `functools.partial` wrapping a local function). These are not picklable and were leaking into the modelopt state dict collected by `get_modelopt_state()`, causing `torch.save` to fail.

This PR adds the three accelerate-injected attributes to `TensorQuantizer._skip_properties_for_save_restore` so they are excluded from the serialized state, matching the existing pattern used for HuggingFace and DeepSpeed attributes.

## Usage

```python
mto.enable_huggingface_checkpointing()

# Quantize and save
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop=forward_loop)
model.save_pretrained(save_dir)

# Restore and save again (this previously failed)
model2 = AutoModelForCausalLM.from_pretrained(save_dir, device_map="auto")
model2.save_pretrained(save_dir_round2)  # now works
```

## Testing

- Added unit test `test_tensor_quantizer_modelopt_state_with_accelerate_hook` in `tests/unit/torch/quantization/plugins/test_accelerate.py` that verifies accelerate hook attributes are excluded from modelopt state and the state dict remains picklable.

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes; it only adds entries to a skip set, so existing saved checkpoints are unaffected.
- **Did you write any new necessary tests?**: Yes
- **Did you add or update any necessary documentation?**: No (internal fix, no API change)
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: No

## Additional Information

The root cause is in accelerate's `add_hook_to_module`, which defines `new_forward` as a local function and binds it via `functools.partial` onto `module.forward`. Since local functions cannot be pickled, any `TensorQuantizer` that has been hooked by accelerate becomes unserializable unless these attributes are excluded.

## Summary by CodeRabbit

* **Bug Fixes**
  * Enhanced compatibility with the accelerate library by excluding framework-specific hooks and attributes from model state serialization, preventing issues during save/restore operations.
* **Tests**
  * Added a test to validate that accelerate-related attributes are properly excluded from model state and that the state remains picklable.
* **Public API**
  * `TensorQuantizer` is now publicly exported.

Signed-off-by: realAsma <akuriparambi@nvidia.com>
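The root cause described above can be reproduced with the standard library alone. The sketch below mimics how accelerate binds a local `new_forward` onto an instance; `Module` and `add_hook` are hypothetical stand-ins for illustration, not accelerate's actual code:

```python
import functools
import pickle


class Module:
    """Minimal stand-in for a hooked submodule (hypothetical)."""


def add_hook(module):
    # Mimic accelerate's add_hook_to_module: define a local wrapper
    # and bind it onto the instance via functools.partial.
    def new_forward(mod, *args, **kwargs):
        return args

    module.forward = functools.partial(new_forward, module)


m = Module()
add_hook(m)

# The instance __dict__ now holds a partial over a local function,
# which pickle cannot serialize by reference.
try:
    pickle.dumps(m.__dict__)
except (AttributeError, pickle.PicklingError) as exc:
    print(f"pickle failed: {exc}")
```

The exact exception type varies across Python versions (older CPython raises `AttributeError`, newer ones `PicklingError`), but the serialization always fails, which is why the fix excludes these attributes rather than trying to pickle them.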
1 parent eb99488 commit c4b662f

File tree

2 files changed: +34 −1 lines changed

modelopt/torch/quantization/nn/modules/tensor_quantizer.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -156,6 +156,10 @@ class TensorQuantizer(nn.Module):
         "_padding",
         # Extra flags added by huggingface
         "_is_hf_initialized",
+        # Extra flags added by accelerate
+        "_hf_hook",
+        "_old_forward",
+        "forward",
         # Extra flags added by deepspeed
         "ds_external_parameters",
         "all_parameters",
```
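The skip-set pattern this diff extends can be sketched in isolation. The filter below is illustrative only, not the actual `get_modelopt_state` implementation; the attribute names match the source, while `collect_serializable_state` and `Dummy` are hypothetical:

```python
import pickle

# Attributes injected by outside frameworks that must not be serialized
# (same names as TensorQuantizer._skip_properties_for_save_restore adds).
_skip_properties_for_save_restore = {
    "_hf_hook",      # accelerate hook object
    "_old_forward",  # original forward saved by accelerate
    "forward",       # functools.partial over a local function
}


def collect_serializable_state(module):
    """Return only the instance attributes that are safe to pickle."""
    return {
        k: v
        for k, v in vars(module).items()
        if k not in _skip_properties_for_save_restore
    }


class Dummy:
    pass


d = Dummy()
d.amax = 1.0
d._hf_hook = object()  # pretend accelerate injected this unpicklable attr
state = collect_serializable_state(d)
print(sorted(state))   # ['amax']
pickle.dumps(state)    # succeeds: the hook attribute was skipped
```

Filtering at state-collection time keeps the live module fully hooked (accelerate still dispatches through its wrapper) while the checkpoint only contains attributes that pickle can handle.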

tests/unit/torch/quantization/plugins/test_accelerate.py

Lines changed: 30 additions & 1 deletion
```diff
@@ -13,12 +13,14 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+import pickle
+
 import pytest
 import torch
 import torch.nn as nn
 
 import modelopt.torch.quantization as mtq
-from modelopt.torch.quantization.nn import QuantLinearConvBase
+from modelopt.torch.quantization.nn import QuantLinearConvBase, TensorQuantizer
 
 try:
     from accelerate.hooks import ModelHook, add_hook_to_module
@@ -51,3 +53,30 @@ def test_linear_with_accelerate_monkey_patched_forward():
 
     assert module_test.input_quantizer.amax is not None
     assert module_test.weight_quantizer.amax is not None
+
+
+def test_tensor_quantizer_modelopt_state_with_accelerate_hook():
+    """Verify accelerate hook attributes are excluded from modelopt state.
+
+    When accelerate's add_hook_to_module patches a TensorQuantizer, it adds
+    _hf_hook, _old_forward, and an instance-level forward (a functools.partial
+    wrapping a local function). These must be excluded from the modelopt state
+    dict, otherwise torch.save / pickle will fail with:
+    AttributeError: Can't get local object 'add_hook_to_module.<locals>.new_forward'
+    """
+    tq = TensorQuantizer()
+    add_hook_to_module(tq, ModelHook())
+
+    # The hook should have injected these instance attributes
+    assert hasattr(tq, "_hf_hook")
+    assert hasattr(tq, "_old_forward")
+    assert "forward" in tq.__dict__
+
+    # None of the accelerate attributes should appear in the modelopt state
+    state = tq.get_modelopt_state()
+    accelerate_attrs = {"_hf_hook", "_old_forward", "forward"}
+    leaked = accelerate_attrs & state.keys()
+    assert not leaked, f"Accelerate attributes leaked into modelopt state: {leaked}"
+
+    # The state dict must be picklable (torch.save uses pickle internally)
+    pickle.dumps(state)
```
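The test's `"forward" in tq.__dict__` check relies on Python's attribute lookup order: an instance attribute shadows the class method of the same name. A stdlib-only sketch of this mechanism (the `Quantizer` class and `add_hook` helper are hypothetical, mimicking what accelerate does to a hooked module):

```python
import functools


class Quantizer:
    def forward(self, x):
        return x * 2


def add_hook(module):
    # Mimic accelerate: stash the bound method, then shadow it with an
    # instance-level partial over a local wrapper function.
    module._old_forward = module.forward

    def new_forward(mod, x):
        return mod._old_forward(x)

    module.forward = functools.partial(new_forward, module)


q = Quantizer()
assert "forward" not in q.__dict__  # only the class attribute exists

add_hook(q)
assert "forward" in q.__dict__      # instance attribute now shadows it
print(q.forward(3))                 # still computes 6, via the wrapper
```

`hasattr` alone would not distinguish the class method from the injected instance attribute, which is why the test inspects `tq.__dict__` directly.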
