feat(spinquant): refactor SpinQuant with CPU offload, parallel fuse, meta device support, and vLLM export by gavingavin99 · Pull Request #269 · Tencent/AngelSlim

gavingavin99 · 2026-03-23T03:17:40Z

feat(spinquant): refactor SpinQuant with CPU offload, parallel fuse, meta device support, and vLLM export

- Move rotation fuse computation to CPU to avoid GPU OOM on large models
- Add multi-threaded parallel fuse for R1/R2/R4 with ThreadPoolExecutor
- Support meta device weights via accelerate hook-based materialization
- Add block-diagonal Hadamard (had_dim > 0) for R4 rotation
- Auto-generate transform_config for vLLM deployment when R4 is enabled
- Save transform_config in quantization_config during model export
- Fix fuse_ln_linear to compute on CPU and write back to original device
- Add .detach() in pack_weight_to_int8 to prevent gradient errors
- Add comprehensive SpinQuant documentation (spinquant.md)
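
For readers unfamiliar with the block-diagonal Hadamard and parallel fuse items above, here is a minimal sketch of the idea. It is not the code from this PR. It uses plain Python lists instead of torch tensors so that it runs without dependencies, and the names `hadamard`, `rotate_block_diagonal`, and `fuse_all` are hypothetical. It assumes that `had_dim` is a power of two that divides the weight's input dimension, and that the rotation is applied by right-multiplying the weight.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def hadamard(n):
    """Normalized n x n Hadamard matrix via the Sylvester construction.

    n must be a power of two. The result is symmetric and orthogonal.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1.0]]
    while len(h) < n:
        # [[H, H], [H, -H]] doubles the size at each step.
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[x * scale for x in row] for row in h]


def rotate_block_diagonal(weight, had_dim):
    """Compute weight @ blockdiag(H, ..., H), where H is had_dim x had_dim.

    Only had_dim-sized slices of each row are mixed, so the full-size
    rotation matrix is never built.
    """
    cols = len(weight[0])
    if cols % had_dim:
        raise ValueError("had_dim must divide the input dimension")
    h = hadamard(had_dim)
    rotated = []
    for row in weight:
        new_row = []
        for start in range(0, cols, had_dim):
            block = row[start:start + had_dim]
            new_row.extend(
                sum(block[k] * h[k][j] for k in range(had_dim))
                for j in range(had_dim)
            )
        rotated.append(new_row)
    return rotated


def fuse_all(weights, had_dim, max_workers=4):
    """Rotate every named weight in a thread pool (hypothetical helper).

    Each layer is independent, so the jobs can be submitted in any order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(rotate_block_diagonal, w, had_dim)
            for name, w in weights.items()
        }
        return {name: fut.result() for name, fut in futures.items()}
```

Calling `fuse_all({"layer0": [[1.0, 2.0, 3.0, 4.0]]}, had_dim=2)` rotates each pair of columns independently. Because the normalized Sylvester matrix is symmetric and orthogonal, applying the same rotation twice recovers the original weight. In pure Python the thread pool gives no real speedup. With tensor libraries the heavy matrix products typically release the GIL, which is presumably why a thread-based fuse helps in practice.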


gavingavin99 force-pushed the dev_rotation0320 branch from 0000af1 to bdf4fe8 Compare March 24, 2026 04:13

gavinlee added 3 commits March 24, 2026 12:19

revert file (1e9006c)
revert file (9e1e030)
remove useless comments (2a156a1)

yghstill approved these changes Mar 24, 2026


gavingavin99 merged commit 99efb57 into Tencent:main Mar 24, 2026
5 checks passed


gavingavin99 merged 4 commits into Tencent:main from gavingavin99:dev_rotation0320


2 participants
