[MoE] Align Swiglu MXFP4 fused quant paths#3123
Open
XiaobingSuper wants to merge 3 commits into ROCm:main from
Remove the GPT-OSS Swiglu layout env switch in favor of GateMode, align the CSV test filter with runtime dtype selection, and restore FlyDSL Swiglu _fp4 fused quant accuracy by matching the non-fused bf16 stage1 semantics. Co-authored-by: Cursor <cursoragent@cursor.com>
Pull request overview
This PR updates the Swiglu MXFP4 MoE codepaths to remove the legacy GPT-OSS layout environment switch, align runtime q_dtype_a selection with GateMode, and restore FlyDSL fused-quant numerical behavior to match the non-fused bf16 materialization/clamp semantics.
Changes:
- Switch Swiglu MXFP4 `q_dtype_a` selection to be driven by `GateMode.SEPARATED` vs non-separated modes, and thread `gate_mode` through the 2-stage config path.
- Update CSV-driven MoE 2-stage tests to skip cases whose `q_dtype_a` no longer matches the runtime Swiglu MXFP4 selection logic (now including `gate_mode`).
- Adjust FlyDSL fused quant kernels to apply the Swiglu alpha/clamp path and bf16 round-trip prior to MXFP4 quantization to match the non-fused semantics.
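The effect of a bf16 round-trip before FP4 quantization can be illustrated with a small, self-contained sketch. This is not the FlyDSL kernel code: the helper names `to_bf16` and `quantize_e2m1` are hypothetical, the shared per-block scale of MXFP4 is omitted, and the tie-breaking rule (ties go to the smaller magnitude) is an assumption made only for this example.

```python
import struct

# Positive magnitudes representable in FP4 E2M1, the element format of MXFP4.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add the rounding bias, then drop the low 16 mantissa bits.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def quantize_e2m1(x: float) -> float:
    """Snap x to the nearest E2M1 value; no block scale is applied (assumption).

    Ties resolve to the smaller magnitude because min() keeps the first match.
    """
    sign = -1.0 if x < 0 else 1.0
    magnitude = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return sign * magnitude


x = 2.501                                # hypothetical f32 stage1 output
direct = quantize_e2m1(x)                # quantize straight from f32
via_bf16 = quantize_e2m1(to_bf16(x))     # materialize to bf16 first
print(direct, via_bf16)                  # 3.0 2.0
```

Here the bf16 round-trip moves 2.501 onto 2.5, which is exactly the midpoint between two FP4 values, so the two paths land on different codes. A fused path that skips the bf16 step can therefore disagree with a non-fused path that stores bf16 before quantizing, which is the kind of mismatch the change above is described as removing.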
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `op_tests/test_moe_2stage.py` | Updates CSV-case filtering to match runtime Swiglu MXFP4 `q_dtype_a` selection, now factoring in `gateMode`. |
| `aiter/ops/flydsl/kernels/silu_and_mul_fq.py` | Aligns fused activation/clamp behavior for Swiglu and adds bf16 round-trip to match non-fused quant semantics. |
| `aiter/ops/flydsl/kernels/mixed_moe_gemm_2stage.py` | Adds bf16 materialization before MXFP4 quantization in the fused stage1 store path for Swiglu FP4. |
| `aiter/fused_moe.py` | Removes the GPT-OSS Swiglu MXFP4 layout env switch and keys runtime dtype selection/config dispatch off `gate_mode`. |
Comments suppressed due to low confidence (1)
aiter/fused_moe.py:827
`get_2stage_cfgs()` now accepts `gate_mode`, but the tuned-config lookup keys (`_INDEX_COLS`/`keys`) do not incorporate it. If SEPARATED vs INTERLEAVE share the same `q_dtype_a`/`q_dtype_w` (e.g. Swiglu MXFP4 small-M where both may be bf16+fp4), this can cause the wrong tuned kernel to be selected or make it impossible to keep separate tuned entries. Consider threading `gate_mode` through the config index (and logging) so the selected kernel is unambiguous across gate layouts.

    def get_2stage_cfgs(
        token,
        model_dim,
        inter_dim,
        expert,
        topk,
        dtype,
        q_dtype_a,
        q_dtype_w,
        q_type,
        use_g1u1,
        activation,
        doweight_stage1,
        hidden_pad,
        intermediate_pad,
        is_shuffled=True,
        gate_mode=GateMode.SEPARATED.value,
    ):
        gate_mode = GateMode(gate_mode)
        _INDEX_COLS = [
            "cu_num",
            "token",
            "model_dim",

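The concern in the suppressed comment can be sketched as a plain dictionary lookup. Everything here is illustrative: the `GateMode` enum is a stand-in with hypothetical values, `make_key` is a hypothetical helper, and the kernel names and dtype strings are placeholders rather than entries from any real tuned CSV.

```python
from enum import Enum


class GateMode(Enum):
    # Stand-in for the project's enum; member values are hypothetical.
    SEPARATED = "separated"
    INTERLEAVE = "interleave"


def make_key(token, q_dtype_a, q_dtype_w, gate_mode=None):
    """Build a lookup key; gate_mode is appended only when it is provided."""
    key = (token, q_dtype_a, q_dtype_w)
    return key if gate_mode is None else key + (gate_mode.value,)


# Index without gate_mode: both layouts map to the same key, so the
# second tuned entry silently replaces the first.
without_mode = {}
without_mode[make_key(16, "bf16", "fp4x2")] = "kernel_tuned_for_separated"
without_mode[make_key(16, "bf16", "fp4x2")] = "kernel_tuned_for_interleave"

# Index with gate_mode: the two entries stay distinct.
with_mode = {}
with_mode[make_key(16, "bf16", "fp4x2", GateMode.SEPARATED)] = "kernel_tuned_for_separated"
with_mode[make_key(16, "bf16", "fp4x2", GateMode.INTERLEAVE)] = "kernel_tuned_for_interleave"

print(len(without_mode), len(with_mode))  # 1 2
```

Whether the real index should gain a column or infer the mode from other fields is a design decision for the PR; the sketch only shows why identical dtype pairs make the two layouts indistinguishable when the key omits the gate mode.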
            q_dtype_a = dtypes.bf16
        elif quant_type == QuantType.per_1x32:
    -       if activation == ActivationType.Swiglu and _USE_GENERIC_SWIGLU_MXFP4_LAYOUT:
    +       if activation == ActivationType.Swiglu and gate_mode == GateMode.SEPARATED:

Summary
- Align `test_moe_2stage.py` references with runtime Swiglu MXFP4 fused quant semantics by using an f32 stage1 reference for FP4 fused-quant cases.
- Infer `gateMode` from dtype/layout because tuned rows do not carry an explicit `gateMode` field.

Test plan
- `podman exec zxb_vllm_gptoss bash -lc 'cd /workdir/aiter_main && python3 -m py_compile op_tests/test_moe_2stage.py aiter/ops/flydsl/kernels/mixed_moe_gemm_2stage.py aiter/ops/flydsl/kernels/silu_and_mul_fq.py && git diff --check'`
- `podman exec zxb_vllm_gptoss bash -lc 'cd /workdir/aiter_main && HIP_VISIBLE_DEVICES=1 FLYDSL_RUNTIME_CACHE_DIR=/tmp/flydsl_pr3123_test_fp4 AITER_CONFIG_FMOE=/workdir/aiter_main/aiter/configs/model_configs/gptoss_fp4_tuned_fmoe.csv python3 -m op_tests.test_moe_2stage --no-legacy'`
- `podman exec zxb_vllm_gptoss bash -lc 'cd /workdir/aiter_main && HIP_VISIBLE_DEVICES=1 FLYDSL_RUNTIME_CACHE_DIR=/tmp/flydsl_pr3123_test_fp8fp4 AITER_CONFIG_FMOE=/workdir/aiter_main/aiter/configs/model_configs/gptoss_fp8fp4_tuned_fmoe.csv python3 -m op_tests.test_moe_2stage --no-legacy'`
- `podman exec zxb_vllm_gptoss bash -lc 'cd /workdir/aiter_main && HIP_VISIBLE_DEVICES=1 FLYDSL_RUNTIME_CACHE_DIR=/tmp/flydsl_pr3123_test_legacy python3 -m op_tests.test_moe_2stage --no-flydsl-csv -t 1024 -dim 3072,3072 -e 128 -k 4 -q 4 -a swiglu -s f -p t -hip 0,0'`

Test result
- `gptoss_fp4_tuned_fmoe.csv` with `--no-legacy`: passed 8 strict CSV cases, command exit code 0.
- `gptoss_fp8fp4_tuned_fmoe.csv` with `--no-legacy`: passed 7 strict CSV cases, command exit code 0.

Made with Cursor