[Klaud Cold] Add minimaxm3-fp4-mi355x-atom by indianspeedster · Pull Request #1812 · SemiAnalysisAI/InferenceX

indianspeedster · 2026-06-17T19:20:10Z

Summary

Adds the minimaxm3-fp4-mi355x-atom config — MiniMax-M3 MXFP4 (amd/MiniMax-M3-MXFP4) on MI355X, single-node atom engine — for the 1k/1k and 8k/1k fixed-seq-len cells, TP4.

Follows the ROCm/ATOM MiniMax-M3 recipe (FP4 on 4×MI355 section).

.github/configs/amd-master.yaml: new config entry + search space (TP4, conc 1→128, image rocm/atom-dev:M3).
benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_atom.sh: atom serve script — --block-size 128 (mandatory for MiniMax MSA), --gpu-memory-utilization 0.8, --trust-remote-code. KV cache left at the default dtype: this MXFP4 checkpoint ships no calibrated FP8 KV scales, so --kv_cache_dtype fp8 asserts (k_scale is None) in the MSA fused_qknorm kernel during init.
runners/launch_mi355x-amds.sh: route amd/MiniMax-M3* weights to the NFS cache (alongside the existing MiniMaxAI/MiniMax-M3* rule).
perf-changelog entry.
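
For readers unfamiliar with the launch-routing change, here is a minimal Python sketch of the glob-style match it describes. The two patterns come from the description above; `NFS_CACHE_ROOT`, `uses_nfs_cache`, and `weights_location` are hypothetical names for illustration only, and the real logic lives in runners/launch_mi355x-amds.sh (shell) and may look different.

```python
from fnmatch import fnmatchcase

# Patterns named in the PR description: the existing MiniMaxAI rule
# plus the new amd/ rule added by this PR.
NFS_CACHED_PATTERNS = ("MiniMaxAI/MiniMax-M3*", "amd/MiniMax-M3*")

# Hypothetical cache root; the real mount point is not given in the PR.
NFS_CACHE_ROOT = "/nfs/model-cache"


def uses_nfs_cache(model_id: str) -> bool:
    """Return True when the model id matches one of the cached patterns."""
    return any(fnmatchcase(model_id, pattern) for pattern in NFS_CACHED_PATTERNS)


def weights_location(model_id: str) -> str:
    """Map a model id to the NFS cache path if routed, else leave it as-is."""
    if uses_nfs_cache(model_id):
        return f"{NFS_CACHE_ROOT}/{model_id}"
    return model_id


if __name__ == "__main__":
    for model in ("amd/MiniMax-M3-MXFP4", "MiniMaxAI/MiniMax-M3", "amd/Other-Model"):
        print(model, "->", weights_location(model))
```

Without the second pattern, the amd/MiniMax-M3-MXFP4 checkpoint would fall through to the unrouted branch, which is the gap this PR closes.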

Validation

generate_sweep_configs.py test-config → 16 configs: minimaxm3_1k1k and minimaxm3_8k1k, each TP4 at conc {1,2,4,8,16,32,64,128}; max-model-len = 2304 (1k1k) / 9472 (8k1k); framework atom.
Smoke-tested on real MI355X hardware (TP4 / conc-1 / 1k1k): atom server came up across 4 ranks, served, and the benchmark wrote a well-formed result JSON.
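
As a cross-check on the numbers above, the following sketch enumerates the same matrix in plain Python. It is not generate_sweep_configs.py; the function and field names are hypothetical. It assumes 1k = 1024 and 8k = 8192 tokens and uses the isl+osl+256 rule named in commit a68c303, which reproduces the 2304 and 9472 values reported here.

```python
from itertools import product

# Fixed-seq-len cells as (input length, output length); 1k = 1024 and
# 8k = 8192 are assumptions that match the reported max-model-len values.
CELLS = {"1k1k": (1024, 1024), "8k1k": (8192, 1024)}
CONCURRENCIES = [2**i for i in range(8)]  # 1, 2, 4, ..., 128
TP = 4
HEADROOM = 256  # from the commit title: MAX_MODEL_LEN = isl + osl + 256


def max_model_len(isl: int, osl: int) -> int:
    """Context length needed for one cell, including the fixed headroom."""
    return isl + osl + HEADROOM


def sweep_configs() -> list[dict]:
    """Enumerate one config per (cell, concurrency) pair."""
    return [
        {
            "cell": name,
            "tp": TP,
            "conc": conc,
            "max_model_len": max_model_len(isl, osl),
            "framework": "atom",
        }
        for (name, (isl, osl)), conc in product(CELLS.items(), CONCURRENCIES)
    ]


if __name__ == "__main__":
    configs = sweep_configs()
    print(len(configs), "configs")
    for cfg in configs:
        print(cfg)
```

Two cells times eight concurrency levels gives the 16 configs listed in the validation output.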

🤖 Generated with Claude Code

Note

Low Risk
Benchmark and launch-routing changes only; no production serving or auth paths touched.

Overview
Adds day-zero fixed-seq-len benchmarking for MiniMax-M3 MXFP4 (amd/MiniMax-M3-MXFP4) on MI355X using the ATOM engine (minimaxm3-fp4-mi355x-atom).

The new amd-master.yaml entry uses image rocm/atom-dev:M3, TP4, concurrency 1→128, and 1k/1k and 8k/1k cells per the ROCm/ATOM recipe. A new serve script starts atom.entrypoints.openai_server with --block-size 128 (MSA requirement), 0.8 GPU memory utilization, and default KV cache dtype (no FP8 KV — the MXFP4 checkpoint has no calibrated scales). launch_mi355x-amds.sh now NFS-mounts weights for amd/MiniMax-M3* as well as MiniMaxAI/MiniMax-M3*. perf-changelog documents the new config key.
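
The KV cache decision can be sketched as a small argument builder. This is only an illustration: the flags are the ones named in this PR, the `python3 -m` invocation form and the `build_serve_args` helper are assumptions, and the model and tensor-parallel arguments are left out because the PR text does not spell them out.

```python
def build_serve_args(has_fp8_kv_scales: bool = False) -> list[str]:
    """Assemble the serve arguments described in the PR.

    has_fp8_kv_scales is a hypothetical switch: the amd/MiniMax-M3-MXFP4
    checkpoint ships no calibrated FP8 KV scales, so it stays False here
    and the KV cache keeps its default dtype.
    """
    args = [
        "python3", "-m", "atom.entrypoints.openai_server",  # invocation form assumed
        "--block-size", "128",               # required for MiniMax MSA per the PR
        "--gpu-memory-utilization", "0.8",
        "--trust-remote-code",
    ]
    if has_fp8_kv_scales:
        # Only safe when the checkpoint provides k_scale / v_scale values;
        # otherwise the PR reports an assert during init.
        args += ["--kv_cache_dtype", "fp8"]
    return args


if __name__ == "__main__":
    print(" ".join(build_serve_args()))
```

Keeping the FP8 KV flag behind an explicit switch makes it easy to re-enable if a later checkpoint ships calibrated scales.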

Reviewed by Cursor Bugbot for commit a68c303.

Smoke-tested on MI355X (mia1-p01-g07): TP4 conc-1 1k1k served and benched clean (mean TPOT 6.8ms). KV cache left at default dtype — amd/MiniMax-M3-MXFP4 has no calibrated FP8 KV scales, so --kv_cache_dtype fp8 asserts in the MSA fused_qknorm kernel.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 9a2b0f4.

functionstackx · 2026-06-17T19:48:32Z

@indianspeedster thanks for the contribution, can u or one of ur teammates create an upstream branch & add full-sweep-enabled PR validation

andyluo7 · 2026-06-17T20:38:42Z

@functionstackx done — mirrored this to an upstream branch so the GPU sweep can run (fork PRs can't access the self-hosted runners): #1813 (feat/minimaxm3-fp4-mi355x-atom on the upstream repo, same commits, credits @indianspeedster). Added the full-sweep-enabled label and kicked off full-sweep PR validation there — it's running now across the full matrix (1k1k + 8k1k, TP4, conc 1→128) on MI355X: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27718177974

(Also: the earlier Cursor Bugbot MAX_MODEL_LEN comment is already resolved by commit a68c303, which uses the matrix $MAX_MODEL_LEN.)

functionstackx · 2026-06-17T20:55:02Z

thanks @andyluo7

superseded by #1813

indianspeedster added 2 commits June 17, 2026 19:16

minimaxm3-fp4-mi355x-atom: route amd/MiniMax-M3* weights to NFS cache

2d7158a

indianspeedster requested a review from a team June 17, 2026 19:20

indianspeedster requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners June 17, 2026 19:20

github-project-automation Bot added this to InferenceMAX Board Jun 17, 2026

minimaxm3-fp4-mi355x-atom: fill perf-changelog pr-link

9a2b0f4

indianspeedster changed the title ~~[Klaud Cold] Add minimaxm3-fp4-mi355x-atom single-node atom benchmark~~ [Klaud Cold] Add minimaxm3-fp4-mi355x-atom Jun 17, 2026

cursor Bot reviewed Jun 17, 2026

Comment thread benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_atom.sh

minimaxm3-fp4-mi355x-atom: use matrix MAX_MODEL_LEN (isl+osl+256)

a68c303

andyluo7 mentioned this pull request Jun 17, 2026

[Klaud Cold] Add minimaxm3-fp4-mi355x-atom (upstream branch for full-sweep validation) #1813

Merged

functionstackx closed this Jun 17, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 17, 2026

indianspeedster wants to merge 4 commits into SemiAnalysisAI:main from indianspeedster:feat/minimaxm3-fp4-mi355x-atom

3 participants
