
[BugFix] A4W4 FMoE run_config weight shuffle #3110

Open
zihaomu wants to merge 1 commit into ROCm:main from zihaomu:pr/aiter-a4w4-fmoe-run-config-shuffle

Conversation

@zihaomu (Member) commented May 11, 2026

Motivation

Fix FMoE run_config to preshuffle FP4 weights when marking them as shuffled, so that the weight and scale layouts stay consistent during the tuning comparison.

Technical Details

Test Plan

Test Result

Submission Checklist

@zihaomu zihaomu requested review from a team and Copilot May 11, 2026 01:49
@github-actions (Contributor)

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:triton-300x | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| ci:sglang | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| ci:atom | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| ci:atom_full | ATOM accuracy suite for PR and main models from ATOM models_accuracy.json |
| ci:vllm | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| ci:all | All standard extended tests (excludes ci:atom_full) |

Only add ci:atom_full for FlyDSL or Triton upgrades.
Add labels via the sidebar or gh pr edit 3110 --add-label <label>

Copilot AI (Contributor) left a comment

Pull request overview

Fixes the FMoE tuning run_config path to preshuffle FP4 weights when they are treated as preshuffled (is_shuffled=True), keeping the FP4 weight and scale layouts consistent during tuning comparisons.

Changes:

  • Preshuffle w1_qt_fmoe / w2_qt_fmoe with shuffle_weight(..., (16, 16)) in the FP4 fallback branch so is_shuffled=True accurately reflects the tensor layout.
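The failure mode described above, a tensor flagged as shuffled while still stored in its plain layout, can be illustrated with a small NumPy sketch. Everything here is hypothetical: `toy_shuffle`, `toy_unshuffle`, and `toy_kernel` are stand-ins invented for illustration, and the 2x2 tiling is not the layout that aiter's `shuffle_weight(..., (16, 16))` produces. The sketch only shows why a consumer that trusts an `is_shuffled` flag gives wrong results unless the weight really was preshuffled.

```python
import numpy as np


def toy_shuffle(w, tile=(2, 2)):
    """Hypothetical tile preshuffle: store each (tn, tk) tile contiguously."""
    n, k = w.shape
    tn, tk = tile
    assert n % tn == 0 and k % tk == 0
    # (N/tn, tn, K/tk, tk) -> (N/tn, K/tk, tn, tk), then back to an (N, K) buffer.
    return w.reshape(n // tn, tn, k // tk, tk).transpose(0, 2, 1, 3).reshape(n, k).copy()


def toy_unshuffle(w, tile=(2, 2)):
    """Inverse of toy_shuffle: recover the plain row-major (N, K) matrix."""
    n, k = w.shape
    tn, tk = tile
    return w.reshape(n // tn, k // tk, tn, tk).transpose(0, 2, 1, 3).reshape(n, k).copy()


def toy_kernel(x, w, is_shuffled, tile=(2, 2)):
    """Toy consumer that trusts the flag when interpreting w's memory layout."""
    w_logical = toy_unshuffle(w, tile) if is_shuffled else w
    return x @ w_logical.T


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
w = rng.standard_normal((4, 4))
ref = x @ w.T  # reference result computed from the plain layout

# Flag says "shuffled", but the weight was never shuffled: layout mismatch.
bad = toy_kernel(x, w, is_shuffled=True)
# Flag says "shuffled" and the weight really is preshuffled: consistent.
good = toy_kernel(x, toy_shuffle(w), is_shuffled=True)

mismatch_detected = not np.allclose(bad, ref)
fixed_matches = np.allclose(good, ref)
print("flag without shuffle differs from reference:", mismatch_detected)
print("flag with shuffle matches reference:", fixed_matches)
```

In this toy setup the second call mirrors the shape of the fix: shuffle first, then pass the flag, so the flag and the actual layout agree.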


Comment on lines 2950 to +2952
    else:
        w1_qt_fmoe = shuffle_weight(w1_qt_fmoe, (16, 16))
        w2_qt_fmoe = shuffle_weight(w2_qt_fmoe, (16, 16))
@zihaomu changed the title from "Fix A4W4 FMoE run_config weight shuffle" to "[BugFix] A4W4 FMoE run_config weight shuffle" May 11, 2026

2 participants