[MoE] adapt to triton_kernels matmul_ogs -> matmul rename #763

Open

Liang-jianhao97 wants to merge 3 commits into main from jianlian/triton-kernels-matmul-rename

Conversation

Liang-jianhao97 commented May 12, 2026

Motivation

Fix the DeepSeek-V4 accuracy error tracked in https://github.com/ROCm/triton-internal/issues/1823.

Technical Details

Upstream triton_kernels merged the matmul_ogs module into matmul and the matmul_ogs_details package into matmul_details. The PrecisionConfig dataclass was also reshaped: weight_scale is now b_mx_scale, and setting it requires b_microblock_size to be provided explicitly (enforced by an assert in the new matmul()).

  • fused_moe_triton: try importing FnSpecs / FusedActivation / PrecisionConfig / matmul from triton_kernels.matmul first, then fall back to the old triton_kernels.matmul_ogs path. Alias matmul as matmul_ogs so existing call sites stay unchanged (see the import sketch after this list).
  • moe (Mxfp4MoEMethod.process_weights_after_loading): same dual-path import for FlexCtx / PrecisionConfig; detect the kwarg name via dataclasses.fields so the old weight_scale=... path keeps working while the new API takes b_mx_scale=... and b_microblock_size=....
  • Drop the _amd_smem_safe_tile workaround that pinned block_m / block_n on gfx950: the underlying LDS spill is no longer reproducible against current triton / triton_kernels.
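A minimal sketch of the dual-path import from the first bullet, assuming only the module layouts named above (the actual change lives in fused_moe_triton.py):

```python
try:
    # New layout: matmul_ogs was merged into triton_kernels.matmul.
    # Alias matmul as matmul_ogs so existing call sites stay unchanged.
    from triton_kernels.matmul import (
        FnSpecs,
        FusedActivation,
        PrecisionConfig,
        matmul as matmul_ogs,
    )
except ImportError:
    # Old layout: fall back to the pre-rename module.
    from triton_kernels.matmul_ogs import (
        FnSpecs,
        FusedActivation,
        PrecisionConfig,
        matmul_ogs,
    )
```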

Test Plan

ATOM Test

Test Result

PASS

Submission Checklist

jianlian and others added 3 commits May 12, 2026 04:41
Upstream triton_kernels merged the `matmul_ogs` module into `matmul`
and the `matmul_ogs_details` package into `matmul_details`. The
`PrecisionConfig` dataclass was also reshaped: `weight_scale` is now
`b_mx_scale`, and setting it requires `b_microblock_size` to be
provided explicitly (enforced by an assert in the new `matmul()`).

- fused_moe_triton: try importing `FnSpecs / FusedActivation /
  PrecisionConfig / matmul` from `triton_kernels.matmul` first, fall
  back to the old `triton_kernels.matmul_ogs` path. Alias `matmul as
  matmul_ogs` so existing call sites stay unchanged.
- moe (Mxfp4MoEMethod.process_weights_after_loading): same dual-path
  import for `FlexCtx / PrecisionConfig`; detect the kwarg name via
  `dataclasses.fields` so the old `weight_scale=` path keeps working
  while the new API takes `b_mx_scale=` + `b_microblock_size=`.
- Drop the `_amd_smem_safe_tile` workaround that pinned
  block_m / block_n on gfx950: the underlying LDS-spill is no longer
  reproducible against current triton / triton_kernels.
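Illustrative only: a hypothetical helper showing the dataclasses.fields-based kwarg detection described in the second bullet. The real logic sits inline in Mxfp4MoEMethod.process_weights_after_loading; the helper name and the bare two-kwarg construction are assumptions, while b_mx_scale / b_microblock_size / weight_scale are the field names from this commit message.

```python
import dataclasses


def make_precision_config(precision_config_cls, weight_scale, microblock_size):
    """Build a PrecisionConfig across both triton_kernels APIs (hypothetical helper)."""
    field_names = {f.name for f in dataclasses.fields(precision_config_cls)}
    if "b_mx_scale" in field_names:
        # New API: weight_scale was renamed to b_mx_scale, and the new matmul()
        # asserts that b_microblock_size is provided alongside it.
        return precision_config_cls(
            b_mx_scale=weight_scale, b_microblock_size=microblock_size
        )
    # Old API: the scale kwarg is still called weight_scale.
    return precision_config_cls(weight_scale=weight_scale)
```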

Co-authored-by: Cursor <cursoragent@cursor.com>
Pre Checkin's `psf/black@stable` flagged the `import triton_kernels.swiglu`
+ nested `try:` block inside the top-level guard. Add the required blank
line so Black treats the inner try as a new statement group.
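A rough sketch of the guarded import structure this refers to. The surrounding fallback code is hypothetical; only the blank line between the swiglu import and the nested try is the formatting change described here.

```python
try:
    import triton_kernels.swiglu  # noqa: F401

    try:  # blank line above is what psf/black@stable requires here
        from triton_kernels.matmul import matmul as matmul_ogs
    except ImportError:
        from triton_kernels.matmul_ogs import matmul_ogs
except ImportError:
    matmul_ogs = None  # hypothetical fallback when triton_kernels is unavailable
```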

Co-authored-by: Cursor <cursoragent@cursor.com>
Temporarily swap the DeepSeek-V4-Pro accuracy entry from the AITER MoE
env (AITER_BF16_FP8_MOE_BOUND=0 + ATOM_MOE_GU_ITLV=1) to the triton MoE
env (ATOM_USE_TRITON_MOE=1) so this PR's CI exercises the triton path
adapted in fused_moe_triton.py / moe.py. Revert before merge.

Co-authored-by: Cursor <cursoragent@cursor.com>