
feat(spinquant): refactor SpinQuant with CPU offload, parallel fuse, meta device support, and vLLM export#269

Merged
gavingavin99 merged 4 commits into Tencent:main from gavingavin99:dev_rotation0320 on Mar 24, 2026

@gavingavin99
Collaborator

  • Move rotation fuse computation to CPU to avoid GPU OOM on large models
  • Add multi-threaded parallel fuse for R1/R2/R4 with ThreadPoolExecutor
  • Support meta device weights via accelerate hook-based materialization
  • Add block-diagonal Hadamard (had_dim > 0) for R4 rotation
  • Auto-generate transform_config for vLLM deployment when R4 is enabled
  • Save transform_config in quantization_config during model export
  • Fix fuse_ln_linear to compute on CPU and write the result back to the original device
  • Add .detach() in pack_weight_to_int8 to prevent gradient errors
  • Add comprehensive SpinQuant documentation (spinquant.md)
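A block-diagonal Hadamard rotation (the R4 option above) applies one small normalized Hadamard matrix independently to each consecutive `had_dim`-sized chunk of a vector, instead of one dense Hadamard over the full hidden size. The sketch below is illustrative only: `hadamard` and `block_diag_hadamard` are hypothetical helpers, not this PR's code, and it assumes `had_dim` is the block size (a power of 2 that divides the vector length), with `had_dim <= 0` falling back to a single full-size Hadamard.

```python
import math
from typing import List


def hadamard(n: int) -> List[List[float]]:
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError(f"n must be a power of 2, got {n}")
    h = [[1.0]]
    while len(h) < n:
        # [[H, H], [H, -H]] doubles the size at each step
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def block_diag_hadamard(x: List[float], had_dim: int) -> List[float]:
    """Rotate x by a block-diagonal, orthonormal Hadamard matrix.

    Hypothetical helper: had_dim > 0 is treated as the block size;
    had_dim <= 0 falls back to one Hadamard over the whole vector.
    """
    n = len(x)
    block = had_dim if had_dim > 0 else n
    if n % block:
        raise ValueError(f"block size {block} must divide the vector length {n}")
    h = hadamard(block)
    scale = 1.0 / math.sqrt(block)  # normalize so each block is orthonormal
    out: List[float] = []
    for start in range(0, n, block):
        chunk = x[start:start + block]
        # multiply this chunk by the normalized Hadamard block
        out.extend(scale * sum(hij * xj for hij, xj in zip(row, chunk)) for row in h)
    return out
```

Because every block is orthonormal, the block-diagonal matrix is orthonormal too: `block_diag_hadamard([1.0, 2.0, 3.0, 4.0], had_dim=2)` preserves the vector's squared norm (30), and applying it a second time recovers the input, since a normalized Sylvester Hadamard matrix is symmetric and its own inverse.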

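The multi-threaded R1/R2/R4 fuse can be pictured as independent per-weight jobs: each linear weight W absorbs an input-side orthogonal rotation R as W' = W·R, so that (x·R)·W'ᵀ = x·Wᵀ, and no job depends on another. The sketch below is a minimal illustration under that assumption. `matmul`, `fuse_rotation`, and `parallel_fuse` are hypothetical names, and plain Python lists stand in for tensors. With pure-Python arithmetic the GIL limits real speedup; the pattern pays off when the per-job work runs in native kernels that release the GIL, as large tensor ops typically do.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def fuse_rotation(weight: Matrix, rotation: Matrix) -> Matrix:
    """Fold an input-side orthogonal rotation into a weight: W' = W @ R (hypothetical helper)."""
    return matmul(weight, rotation)


def parallel_fuse(weights: Dict[str, Matrix], rotation: Matrix, max_workers: int = 4) -> Dict[str, Matrix]:
    """Fuse the same rotation into many weights concurrently, keyed by weight name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit one independent job per weight
        futures = {name: pool.submit(fuse_rotation, w, rotation) for name, w in weights.items()}
        # .result() re-raises any exception from a worker thread
        return {name: fut.result() for name, fut in futures.items()}
```

For example, with the column-swapping permutation `R = [[0.0, 1.0], [1.0, 0.0]]`, `parallel_fuse({"a": [[1.0, 2.0], [3.0, 4.0]]}, R)` returns `{"a": [[2.0, 1.0], [4.0, 3.0]]}`, identical to fusing each weight sequentially.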
gavingavin99 merged commit 99efb57 into Tencent:main on Mar 24, 2026
5 checks passed