fix(awq): Fix submodules ending up on different devices after apply_scale and apply_clip in low-memory mode, and add a _remove_accelerate_hooks function to clean up accelerate hooks by imleoo · Pull Request #184 · Tencent/AngelSlim

imleoo · 2025-12-24T10:31:18Z

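Quantizing the qwen3-32b model on a low-memory device fails with RuntimeError: Expected all tensors to be on the same device... The cause is that AngelSlim's AWQ module explicitly moves each layer to the main GPU (cuda:0) for processing, while the model (loaded with accelerate or device_map="auto") carries hooks that try to enforce the original device placement (for example, layer 19 on cuda:1).
When awq.py moves a layer to cuda:0, the accelerate hook is unaware of the manual move and tries to move the inputs back to where it believes the layer lives (cuda:1), which causes a mismatch between the input tensors and the weight tensors.
I fixed this by removing the accelerate hooks from each layer before moving it to the target device for AWQ processing. This makes the layer behave like a standard PyTorch module that respects explicit device placement.
The changes applied to angelslim/compressor/quant/modules/awq/awq.py are as follows:
Added a helper function, _remove_accelerate_hooks, that recursively removes accelerate hooks from a module.
In the main AWQ loop, called _remove_accelerate_hooks(layers[i]) before layers[i].to(dev).
This should resolve the device mismatch error and allow the AWQ process to run to completion.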
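The failure and the fix described above can be reproduced without a GPU using a toy model. The sketch below is only an illustration: ToyTensor, ToyLayer, add_toy_hook, remove_toy_hook and run_layer are hypothetical stand-ins, not AngelSlim or accelerate code. The only detail borrowed from the PR is the pair of attribute names, _hf_hook and _old_forward, that its helper checks for.

```python
class ToyTensor:
    """Stand-in for a tensor: a value tagged with a device name."""

    def __init__(self, value, device):
        self.value = value
        self.device = device

    def to(self, device):
        return ToyTensor(self.value, device)


class ToyLayer:
    """Stand-in for a module that holds one weight on one device."""

    def __init__(self, device):
        self.weight = ToyTensor(2.0, device)

    def to(self, device):
        # Moves the weight only; an installed hook is left untouched.
        self.weight = self.weight.to(device)
        return self

    def forward(self, x):
        if x.device != self.weight.device:
            raise RuntimeError(
                "Expected all tensors to be on the same device, "
                f"got {x.device} and {self.weight.device}"
            )
        return ToyTensor(x.value * self.weight.value, x.device)


def add_toy_hook(layer, execution_device):
    """Wrap forward so inputs are first sent to the device recorded at load time."""
    layer._old_forward = layer.forward
    layer._hf_hook = execution_device

    def hooked_forward(x):
        return layer._old_forward(x.to(layer._hf_hook))

    layer.forward = hooked_forward


def remove_toy_hook(layer):
    """Restore the original forward and drop the hook attributes."""
    if hasattr(layer, "_old_forward"):
        layer.forward = layer._old_forward
        del layer._old_forward
    if hasattr(layer, "_hf_hook"):
        del layer._hf_hook


def run_layer(layer, x, dev, strip_hooks):
    """Process one layer on `dev`, optionally stripping hooks before the move."""
    if strip_hooks:
        remove_toy_hook(layer)
    layer.to(dev)
    return layer.forward(x.to(dev))


layer = ToyLayer("cuda:1")
add_toy_hook(layer, "cuda:1")
x = ToyTensor(3.0, "cpu")

try:
    run_layer(layer, x, "cuda:0", strip_hooks=False)
except RuntimeError as err:
    print("hook left in place:", err)

out = run_layer(layer, x, "cuda:0", strip_hooks=True)
print("hook removed first:", out.value, out.device)
```

With the hook left in place, the wrapped forward sends the input back to cuda:1 while the weight already sits on cuda:0, so the device check fails. Stripping the hook before the move lets the explicit placement win.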
File path: angelslim/compressor/quant/modules/awq/awq.py

Ensure all submodules are on the same device after apply_scale and apply_clip; add the _remove_accelerate_hooks function to clean up accelerate hooks

yghstill · 2025-12-29T08:02:08Z

@imleoo Did you add any extra accelerate invocation logic to awq?

imleoo · 2025-12-29T14:46:42Z

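The main addition is this function, plus the calls to it that clean up the hooks; no other calling logic was added. With it, quantization of the model has now run to completion on 2*4090 48G.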

def _remove_accelerate_hooks(module):
    for submodule in module.modules():
        if hasattr(submodule, "_hf_hook"):
            try:
                from accelerate.hooks import remove_hook_from_module
                remove_hook_from_module(submodule)
            except ImportError:
                # Should not happen if _hf_hook is present
                delattr(submodule, "_hf_hook")
                if hasattr(submodule, "_old_forward"):
                    submodule.forward = submodule._old_forward
                    delattr(submodule, "_old_forward")
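One way to confirm that the clean-up reached every submodule is to list what still carries a hook afterwards. The helper below is a hypothetical sketch, not part of the PR: hooked_submodule_names relies only on a named_modules() method and on the _hf_hook attribute that the function above tests for, and ToyModule is a stand-in so the example runs without torch installed.

```python
def hooked_submodule_names(module):
    """Return the names of submodules that still carry an `_hf_hook` attribute."""
    return [
        name for name, sub in module.named_modules() if hasattr(sub, "_hf_hook")
    ]


class ToyModule:
    """Minimal stand-in exposing named_modules(), for illustration only."""

    def __init__(self, **kids):
        self._kids = kids

    def named_modules(self, prefix=""):
        # Yield this module first, then every descendant with a dotted name.
        yield prefix, self
        for name, kid in self._kids.items():
            kid_prefix = f"{prefix}.{name}" if prefix else name
            yield from kid.named_modules(kid_prefix)


q_proj = ToyModule()
block = ToyModule(attn=ToyModule(q_proj=q_proj), mlp=ToyModule())
q_proj._hf_hook = object()  # pretend a hook is still attached here

print(hooked_submodule_names(block))

del q_proj._hf_hook
print(hooked_submodule_names(block))
```

An empty list after calling _remove_accelerate_hooks(layers[i]) would indicate that no submodule of that layer is still wrapped.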

yghstill · 2026-01-04T08:14:16Z

@imleoo Could you also submit the config that enables accelerate, or the changes to how the model is launched and loaded?

imleoo · 2026-01-04T16:27:10Z

The config file I used,
config.yaml:

global:
  save_path: ./output_xxxx_qwen3_32b_int4_awq

model:
  name: Qwen
  model_path: /data2/leoobai/merged-xxxxx-qwen3-32b
  trust_remote_code: true
  low_cpu_mem_usage: true
  use_cache: false
  torch_dtype: auto
  device_map: auto # key change 1: change to cpu

compression:
  name: PTQ
  quantization:
    name: int4_awq
    bits: 4
    low_memory: true # key change 2: add low_memory
    quant_method:
      weight: "per-group"
      group_size: 128
      zero_point: true
      mse_range: false
    ignore_layers:
      - "lm_head"

dataset:
  name: TextDataset
  data_path: ./sharegpt_gpt4-qwen3_a22B_output.jsonl
  max_seq_length: 4096
  num_samples: 128
  batch_size: 1

fix(awq): Fix device inconsistency in low-memory mode and remove accelerate hooks

151af0f

Ensure all submodules are on the same device after apply_scale and apply_clip; add the _remove_accelerate_hooks function to clean up accelerate hooks

yghstill approved these changes Dec 28, 2025

yghstill merged commit 5721a0d into Tencent:main Jan 5, 2026
4 of 5 checks passed

dawnranger pushed a commit to dawnranger/AngelSlim that referenced this pull request Mar 11, 2026

fix(awq): apply_scale and apply_clip device and add _remove_accelerat…

3ff6c21

…e_hooks (Tencent#184)
