
Commit c4b662f

[Bug fix] Fake quantized model save after HF accelerate hooks are added (#906)
## What does this PR do?

**Type of change:** Bug fix

**Overview:** Fix `AttributeError: Can't get local object 'add_hook_to_module.<locals>.new_forward'` when saving a quantized model a second time after restoring it with `device_map="auto"`.

When a model is loaded with `device_map="auto"`, accelerate's `add_hook_to_module` patches every submodule (including `TensorQuantizer` instances) and injects three instance attributes: `_hf_hook`, `_old_forward`, and `forward` (a `functools.partial` wrapping a local function). These are not picklable and were leaking into the modelopt state dict collected by `get_modelopt_state()`, causing `torch.save` to fail.

This PR adds the three accelerate-injected attributes to `TensorQuantizer._skip_properties_for_save_restore` so they are excluded from the serialized state, matching the existing pattern used for HuggingFace and DeepSpeed attributes.

## Usage

```python
mto.enable_huggingface_checkpointing()

# Quantize and save
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop=forward_loop)
model.save_pretrained(save_dir)

# Restore and save again (this previously failed)
model2 = AutoModelForCausalLM.from_pretrained(save_dir, device_map="auto")
model2.save_pretrained(save_dir_round2)  # now works
```

## Testing

- Added unit test `test_tensor_quantizer_modelopt_state_with_accelerate_hook` in `tests/unit/torch/quantization/plugins/test_accelerate.py` that verifies accelerate hook attributes are excluded from modelopt state and the state dict remains picklable.

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes; it only adds entries to a skip set, so existing saved checkpoints are unaffected.
- **Did you write any new necessary tests?**: Yes
- **Did you add or update any necessary documentation?**: No (internal fix, no API change)
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: No

## Additional Information

The root cause is in accelerate's `add_hook_to_module`, which defines `new_forward` as a local function and binds it via `functools.partial` onto `module.forward`. Since local functions cannot be pickled, any `TensorQuantizer` that has been hooked by accelerate becomes unserializable unless these attributes are excluded.

## Summary by CodeRabbit

* **Bug Fixes**
  * Enhanced compatibility with the accelerate library by excluding framework-specific hooks and attributes from model state serialization, preventing issues during save/restore operations.
* **Tests**
  * Added a test to validate that accelerate-related attributes are properly excluded from model state and that the state remains picklable.
* **Public API**
  * `TensorQuantizer` is now publicly exported.

Signed-off-by: realAsma <akuriparambi@nvidia.com>
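The root cause described above can be reproduced with the standard library alone. The sketch below mimics how accelerate binds a local `new_forward` onto an instance; `Module` and `add_hook` are hypothetical stand-ins for illustration, not accelerate's actual code:

```python
import functools
import pickle


class Module:
    """Minimal stand-in for a hooked submodule (hypothetical)."""


def add_hook(module):
    # Mimic accelerate's add_hook_to_module: define a local wrapper
    # and bind it onto the instance via functools.partial.
    def new_forward(mod, *args, **kwargs):
        return args

    module.forward = functools.partial(new_forward, module)


m = Module()
add_hook(m)

# The instance __dict__ now holds a partial over a local function,
# which pickle cannot serialize by reference.
try:
    pickle.dumps(m.__dict__)
except (AttributeError, pickle.PicklingError) as exc:
    print(f"pickle failed: {exc}")
```

The exact exception type varies across Python versions (older CPython raises `AttributeError`, newer ones `PicklingError`), but the serialization always fails, which is why the fix excludes these attributes rather than trying to pickle them.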
1 parent eb99488 commit c4b662f

File tree

2 files changed: +34 −1 lines changed

modelopt/torch/quantization/nn/modules/tensor_quantizer.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -156,6 +156,10 @@ class TensorQuantizer(nn.Module):
         "_padding",
         # Extra flags added by huggingface
         "_is_hf_initialized",
+        # Extra flags added by accelerate
+        "_hf_hook",
+        "_old_forward",
+        "forward",
         # Extra flags added by deepspeed
         "ds_external_parameters",
         "all_parameters",
```
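The skip-set pattern this diff extends can be sketched in isolation. The filter below is illustrative only, not the actual `get_modelopt_state` implementation; the attribute names match the source, while `collect_serializable_state` and `Dummy` are hypothetical:

```python
import pickle

# Attributes injected by outside frameworks that must not be serialized
# (same names as TensorQuantizer._skip_properties_for_save_restore adds).
_skip_properties_for_save_restore = {
    "_hf_hook",      # accelerate hook object
    "_old_forward",  # original forward saved by accelerate
    "forward",       # functools.partial over a local function
}


def collect_serializable_state(module):
    """Return only the instance attributes that are safe to pickle."""
    return {
        k: v
        for k, v in vars(module).items()
        if k not in _skip_properties_for_save_restore
    }


class Dummy:
    pass


d = Dummy()
d.amax = 1.0
d._hf_hook = object()  # pretend accelerate injected this unpicklable attr
state = collect_serializable_state(d)
print(sorted(state))   # ['amax']
pickle.dumps(state)    # succeeds: the hook attribute was skipped
```

Filtering at state-collection time keeps the live module fully hooked (accelerate still dispatches through its wrapper) while the checkpoint only contains attributes that pickle can handle.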

tests/unit/torch/quantization/plugins/test_accelerate.py

Lines changed: 30 additions & 1 deletion
```diff
@@ -13,12 +13,14 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+import pickle
+
 import pytest
 import torch
 import torch.nn as nn
 
 import modelopt.torch.quantization as mtq
-from modelopt.torch.quantization.nn import QuantLinearConvBase
+from modelopt.torch.quantization.nn import QuantLinearConvBase, TensorQuantizer
 
 try:
     from accelerate.hooks import ModelHook, add_hook_to_module
@@ -51,3 +53,30 @@ def test_linear_with_accelerate_monkey_patched_forward():
 
     assert module_test.input_quantizer.amax is not None
     assert module_test.weight_quantizer.amax is not None
+
+
+def test_tensor_quantizer_modelopt_state_with_accelerate_hook():
+    """Verify accelerate hook attributes are excluded from modelopt state.
+
+    When accelerate's add_hook_to_module patches a TensorQuantizer, it adds
+    _hf_hook, _old_forward, and an instance-level forward (a functools.partial
+    wrapping a local function). These must be excluded from the modelopt state
+    dict, otherwise torch.save / pickle will fail with:
+    AttributeError: Can't get local object 'add_hook_to_module.<locals>.new_forward'
+    """
+    tq = TensorQuantizer()
+    add_hook_to_module(tq, ModelHook())
+
+    # The hook should have injected these instance attributes
+    assert hasattr(tq, "_hf_hook")
+    assert hasattr(tq, "_old_forward")
+    assert "forward" in tq.__dict__
+
+    # None of the accelerate attributes should appear in the modelopt state
+    state = tq.get_modelopt_state()
+    accelerate_attrs = {"_hf_hook", "_old_forward", "forward"}
+    leaked = accelerate_attrs & state.keys()
+    assert not leaked, f"Accelerate attributes leaked into modelopt state: {leaked}"
+
+    # The state dict must be picklable (torch.save uses pickle internally)
+    pickle.dumps(state)
```
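The test's `"forward" in tq.__dict__` check relies on Python's attribute lookup order: an instance attribute shadows the class method of the same name. A stdlib-only sketch of this mechanism (the `Quantizer` class and `add_hook` helper are hypothetical, mimicking what accelerate does to a hooked module):

```python
import functools


class Quantizer:
    def forward(self, x):
        return x * 2


def add_hook(module):
    # Mimic accelerate: stash the bound method, then shadow it with an
    # instance-level partial over a local wrapper function.
    module._old_forward = module.forward

    def new_forward(mod, x):
        return mod._old_forward(x)

    module.forward = functools.partial(new_forward, module)


q = Quantizer()
assert "forward" not in q.__dict__  # only the class attribute exists

add_hook(q)
assert "forward" in q.__dict__      # instance attribute now shadows it
print(q.forward(3))                 # still computes 6, via the wrapper
```

`hasattr` alone would not distinguish the class method from the injected instance attribute, which is why the test inspects `tq.__dict__` directly.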
