
feat: add SmoothQuant calibration pipeline for HY3 #322

Merged
gavingavin99 merged 4 commits into Tencent:main from gavingavin99:dev_smooth0528 on Jun 2, 2026


@gavingavin99
Collaborator

  • Add SmoothAttn/SmoothDownProj hooks and AlphaSearcher in vllm_calibrate_utils for smooth stats collection and alpha search
  • Add tools/smooth with vLLM-side smooth calibration runner and weight conversion utility
  • Add HY3 smooth config, calibrate/convert/e2e scripts, and docs
  • Extend vllm_patch (envs, fused_moe, install) for smooth support
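The page does not show the hook or conversion code itself. As a rough illustration of what smooth-stats collection and weight conversion typically involve, here is a minimal sketch of the standard SmoothQuant recipe: a per-input-channel activation absmax (what a calibration hook would accumulate), the smoothing factor s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), and folding s into the weight so the layer output is unchanged. All function names are hypothetical, plain Python lists stand in for tensors, and the sketch assumes a Linear weight laid out as out_features × in_features. It is not the PR's implementation.

```python
# Hypothetical sketch of the standard SmoothQuant smoothing step.
# Names are illustrative only; they are not taken from vllm_calibrate_utils.
from typing import List

Matrix = List[List[float]]  # row-major: rows x columns


def channel_absmax(batches: List[Matrix]) -> List[float]:
    """Per-input-channel max |x| over all calibration tokens (what a stats hook would accumulate)."""
    stats: List[float] = []
    for x in batches:  # each x has shape tokens x in_features
        for row in x:
            if not stats:
                stats = [0.0] * len(row)
            stats = [max(m, abs(v)) for m, v in zip(stats, row)]
    return stats


def smooth_scales(act_absmax: List[float], weight: Matrix, alpha: float, eps: float = 1e-5) -> List[float]:
    """s_j = max|X_j|**alpha / max|W_j|**(1 - alpha); weight is out_features x in_features."""
    w_absmax = [max(abs(row[j]) for row in weight) for j in range(len(act_absmax))]
    return [max(a, eps) ** alpha / max(w, eps) ** (1.0 - alpha) for a, w in zip(act_absmax, w_absmax)]


def fold_into_weight(weight: Matrix, scales: List[float]) -> Matrix:
    """Multiply input column j of W by s_j, so (X / s) @ (W * s).T == X @ W.T."""
    return [[w * s for w, s in zip(row, scales)] for row in weight]


def linear(x: Matrix, weight: Matrix) -> Matrix:
    """y = x @ weight.T, matching the out_features x in_features layout above."""
    return [[sum(a * b for a, b in zip(xrow, wrow)) for wrow in weight] for xrow in x]


# Toy calibration batch with an outlier input channel (channel 1).
x = [[1.0, 40.0], [-2.0, 10.0]]
w = [[0.5, 0.1], [-0.3, 0.2]]

s = smooth_scales(channel_absmax([x]), w, alpha=0.5)
# Dividing X by s is usually folded into the op that produces X (e.g. a norm weight).
x_smoothed = [[v / sj for v, sj in zip(row, s)] for row in x]
w_smoothed = fold_into_weight(w, s)
```

With alpha = 0.5 the smoothed activation and weight columns end up with the same absmax, sqrt(max|X_j| * max|W_j|), which splits quantization difficulty evenly between them. A searcher such as the PR's AlphaSearcher presumably picks alpha per layer instead of fixing it.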

Move SmoothQuant code into a new self-contained package with a shared
core/ layer. The vLLM online path and the HF offline conversion path
now consume the same algorithm primitives (formulas, QDQ, RoPE/GQA
handling, alpha searcher, smooth-stats I/O) instead of carrying ~3000
lines of duplicated implementation across vllm_calibrate_utils and
tools/smooth/convert_smooth_weights.py. The patched fused_moe.py now
imports from a standalone smooth_moe_inject module, which install.sh
deploys next to vllm_calibrate_utils.
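The shared core/ primitives are only listed by name here. A minimal sketch of how QDQ and an alpha search can fit together, under stated assumptions: symmetric int8 fake quantization (per-tensor for activations, per-output-channel for weights), mean-squared error against the unquantized layer output as the objective, and a 0.05-step grid over [0, 1]. None of these choices, nor the names `qdq`, `qdq_error`, or `search_alpha`, come from the PR; they are illustrative only.

```python
# Hypothetical sketch of QDQ + alpha grid search; not the PR's AlphaSearcher.
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def qdq(values: Sequence[float], bits: int = 8) -> List[float]:
    """Symmetric fake quantization: quantize to signed ints with one scale, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    absmax = max((abs(v) for v in values), default=0.0)
    if absmax == 0.0:
        return [0.0 for _ in values]
    step = absmax / qmax
    return [max(-qmax, min(qmax, round(v / step))) * step for v in values]


def linear(x: Matrix, weight: Matrix) -> Matrix:
    """y = x @ weight.T with weight laid out out_features x in_features."""
    return [[sum(a * b for a, b in zip(xrow, wrow)) for wrow in weight] for xrow in x]


def qdq_error(x: Matrix, weight: Matrix, alpha: float, eps: float = 1e-5) -> float:
    """MSE between the float output and an assumed W8A8 fake-quant output after smoothing."""
    n_in = len(x[0])
    a_max = [max(abs(row[j]) for row in x) for j in range(n_in)]
    w_max = [max(abs(row[j]) for row in weight) for j in range(n_in)]
    s = [max(a, eps) ** alpha / max(w, eps) ** (1.0 - alpha) for a, w in zip(a_max, w_max)]
    xs = [[v / sj for v, sj in zip(row, s)] for row in x]
    ws = [[v * sj for v, sj in zip(row, s)] for row in weight]
    # Activations: one per-tensor scale; weights: one scale per output channel (row).
    flat = qdq([v for row in xs for v in row])
    xq = [flat[i:i + n_in] for i in range(0, len(flat), n_in)]
    wq = [qdq(row) for row in ws]
    ref, out = linear(x, weight), linear(xq, wq)
    diffs = [(a - b) ** 2 for ra, rb in zip(ref, out) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def search_alpha(x: Matrix, weight: Matrix, grid: Sequence[float]) -> Tuple[float, float]:
    """Grid-search alpha for one layer; returns (best_alpha, best_mse)."""
    best_alpha, best_err = grid[0], math.inf
    for alpha in grid:
        err = qdq_error(x, weight, alpha)
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err


grid = [i / 20 for i in range(21)]  # 0.00, 0.05, ..., 1.00 (hypothetical grid)
x = [[0.3, 60.0, -0.2], [-0.5, 45.0, 0.4], [0.1, -70.0, 0.6]]  # channel 1 is an outlier
w = [[0.2, 0.01, -0.3], [-0.1, 0.02, 0.25]]
alpha, err = search_alpha(x, w, grid)
```

A grid search fits here because the per-layer objective is cheap and, due to rounding, not smooth in alpha. A real searcher would run on activations captured during calibration rather than on a toy matrix.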
gavingavin99 merged commit e29a2ec into Tencent:main on Jun 2, 2026
5 checks passed
