[Klaud Cold] Add minimaxm3-fp4-mi355x-atom #1812
Smoke-tested on MI355X (mia1-p01-g07): TP4 conc-1 1k1k served and benched clean (mean TPOT 6.8 ms). KV cache is left at the default dtype: amd/MiniMax-M3-MXFP4 has no calibrated FP8 KV scales, so --kv_cache_dtype fp8 trips an assertion in the MSA fused_qknorm kernel.
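The KV-cache decision above can be sketched as a small flag builder. This is a hypothetical Python helper (serve_flags is not part of the PR, whose serve script is shell); the flag strings are the ones quoted in this PR, and the conditional FP8 branch is an assumption about how a checkpoint that does ship calibrated scales could opt in.

```python
from typing import List

# Flag values quoted in this PR for the MiniMax-M3 MXFP4 serve script.
MSA_BLOCK_SIZE = 128  # MiniMax MSA requires --block-size 128


def serve_flags(has_fp8_kv_scales: bool) -> List[str]:
    """Hypothetical helper: engine flags the serve script would pass.

    FP8 KV cache is only requested when the checkpoint ships calibrated
    FP8 KV scales; otherwise the MSA fused_qknorm kernel asserts on a
    missing k_scale, so the engine's default KV dtype is kept.
    """
    flags = [
        "--block-size", str(MSA_BLOCK_SIZE),
        "--gpu-memory-utilization", "0.8",
        "--trust-remote-code",
    ]
    if has_fp8_kv_scales:  # assumption: opt-in path, not exercised by this PR
        flags += ["--kv_cache_dtype", "fp8"]
    return flags


# amd/MiniMax-M3-MXFP4 ships no calibrated FP8 KV scales.
print(serve_flags(has_fp8_kv_scales=False))
# ['--block-size', '128', '--gpu-memory-utilization', '0.8', '--trust-remote-code']
```

Keeping the FP8 flag behind an explicit boolean makes the "no scales, no FP8 KV" rule visible at the call site instead of surfacing as a kernel assertion during init.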
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 9a2b0f4.
@indianspeedster thanks for the contribution! Can you or one of your teammates create an upstream branch and add full-sweep-enabled PR validation?
@functionstackx Done. Mirrored this to an upstream branch so the GPU sweep can run, since fork PRs can't access the self-hosted runners: #1813. (Also: the earlier Cursor Bugbot …)

Summary
Adds the minimaxm3-fp4-mi355x-atom config: MiniMax-M3 MXFP4 (amd/MiniMax-M3-MXFP4) on MI355X with the single-node atom engine, covering the 1k/1k and 8k/1k fixed-seq-len cells at TP4. Follows the ROCm/ATOM MiniMax-M3 recipe (FP4 on 4×MI355 section).
- .github/configs/amd-master.yaml: new config entry + search space (TP4, conc 1→128, image rocm/atom-dev:M3).
- benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_atom.sh: atom serve script with --block-size 128 (mandatory for MiniMax MSA), --gpu-memory-utilization 0.8, and --trust-remote-code. KV cache is left at the default dtype: this MXFP4 checkpoint ships no calibrated FP8 KV scales, so --kv_cache_dtype fp8 asserts (k_scale is None) in the MSA fused_qknorm kernel during init.
- runners/launch_mi355x-amds.sh: route amd/MiniMax-M3* weights to the NFS cache (alongside the existing MiniMaxAI/MiniMax-M3* rule).

Validation
generate_sweep_configs.py test-config → 16 configs: minimaxm3_1k1k and minimaxm3_8k1k, each TP4 at conc {1,2,4,8,16,32,64,128}; max-model-len = 2304 (1k1k) / 9472 (8k1k); framework atom.

🤖 Generated with Claude Code
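The validated config count follows from 2 sequence-length cells × 8 concurrency levels = 16. A minimal Python re-derivation, assuming 1k = 1024 tokens and a 256-token headroom on max-model-len (inferred only because 2304 = 1024 + 1024 + 256 and 9472 = 8192 + 1024 + 256); expand_sweep, SEQ_CELLS and HEADROOM are hypothetical names, not the code in generate_sweep_configs.py:

```python
from itertools import product
from typing import Dict, List

# Hypothetical re-derivation of the test-config output.
SEQ_CELLS = {  # config name -> (input tokens, output tokens); assumes 1k = 1024
    "minimaxm3_1k1k": (1024, 1024),
    "minimaxm3_8k1k": (8192, 1024),
}
CONCURRENCIES = [2**i for i in range(8)]  # 1, 2, 4, ..., 128
HEADROOM = 256  # assumed: matches 2304 - 2048 and 9472 - 9216


def expand_sweep() -> List[Dict[str, object]]:
    """Cross every sequence-length cell with every concurrency level at TP4."""
    configs = []
    for (name, (isl, osl)), conc in product(SEQ_CELLS.items(), CONCURRENCIES):
        configs.append({
            "name": name,
            "tp": 4,
            "conc": conc,
            "max_model_len": isl + osl + HEADROOM,
            "framework": "atom",
        })
    return configs


configs = expand_sweep()
print(len(configs))  # 16
print(sorted({c["max_model_len"] for c in configs}))  # [2304, 9472]
```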
Note
Low Risk
Benchmark and launch-routing changes only; no production serving or auth paths touched.
Overview
Adds day-zero fixed-seq-len benchmarking for MiniMax-M3 MXFP4 (amd/MiniMax-M3-MXFP4) on MI355X using the ATOM engine (minimaxm3-fp4-mi355x-atom).

The new amd-master.yaml entry uses image rocm/atom-dev:M3, TP4, concurrency 1→128, and the 1k/1k and 8k/1k cells per the ROCm/ATOM recipe. A new serve script starts atom.entrypoints.openai_server with --block-size 128 (MSA requirement), 0.8 GPU memory utilization, and the default KV cache dtype (no FP8 KV, because the MXFP4 checkpoint has no calibrated scales). launch_mi355x-amds.sh now NFS-mounts weights for amd/MiniMax-M3* as well as MiniMaxAI/MiniMax-M3*. perf-changelog documents the new config key.

Reviewed by Cursor Bugbot for commit a68c303.
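The launch-routing change amounts to a glob match on the model ID. A minimal Python sketch, assuming the rule behaves like a shell case glob; uses_nfs_cache and NFS_CACHED_PATTERNS are hypothetical names, not taken from launch_mi355x-amds.sh, and amd/Other-Model is a made-up ID used only as a non-matching example:

```python
from fnmatch import fnmatchcase

# Hypothetical mirror of the routing rule in runners/launch_mi355x-amds.sh.
NFS_CACHED_PATTERNS = (
    "MiniMaxAI/MiniMax-M3*",  # existing rule
    "amd/MiniMax-M3*",        # rule added by this PR
)


def uses_nfs_cache(model_id: str) -> bool:
    """Return True if this model's weights should come from the NFS cache."""
    return any(fnmatchcase(model_id, pat) for pat in NFS_CACHED_PATTERNS)


for model in ("amd/MiniMax-M3-MXFP4", "MiniMaxAI/MiniMax-M3", "amd/Other-Model"):
    print(model, uses_nfs_cache(model))
# amd/MiniMax-M3-MXFP4 True
# MiniMaxAI/MiniMax-M3 True
# amd/Other-Model False
```

fnmatchcase is used so the match stays case-sensitive, like a shell case pattern, regardless of the host OS.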