[Klaud Cold] Add minimaxm3-fp4-mi355x-atom (upstream branch for full-sweep validation)#1813

Merged: seungrokj merged 7 commits into main from feat/minimaxm3-fp4-mi355x-atom on Jun 18, 2026

@andyluo7 (Collaborator) commented Jun 17, 2026

Upstream-branch mirror of #1812 (originally from indianspeedster/InferenceMAX:feat/minimaxm3-fp4-mi355x-atom) so the GPU full-sweep PR validation can run — fork PRs can't access the self-hosted runners/secrets that run-sweep.yml needs.

Original author: @indianspeedster. Mirrors PR #1812 at commit a68c303 (includes the Bugbot fix "use matrix MAX_MODEL_LEN").

Summary

Adds the minimaxm3-fp4-mi355x-atom config — MiniMax-M3 MXFP4 (amd/MiniMax-M3-MXFP4) on MI355X, single-node atom engine — for the 1k/1k and 8k/1k fixed-seq-len cells, TP4. Follows the ROCm/ATOM MiniMax-M3 recipe (FP4 on 4×MI355 section).

  • .github/configs/amd-master.yaml: new config entry + search space (TP4, conc 1→128, image rocm/atom-dev:M3).
  • benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_atom.sh: atom serve script — --block-size 128 (mandatory for MiniMax MSA), --gpu-memory-utilization 0.8, --trust-remote-code, --max-model-len $MAX_MODEL_LEN. KV cache left at default dtype (MXFP4 checkpoint ships no calibrated FP8 KV scales).
  • runners/launch_mi355x-amds.sh: route amd/MiniMax-M3* weights to the NFS cache.
  • perf-changelog entry.
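
A minimal sketch of how the serve flags listed above fit together, written in Python for illustration only (the real script is a shell script). `build_atom_serve_flags` is a hypothetical helper, and the fallback value for `MAX_MODEL_LEN` is a placeholder. Only the four flags named in this PR appear; the model and tensor-parallel flags are omitted because the PR text does not spell them out.

```python
import os

# Values taken from the PR description.
BLOCK_SIZE = 128               # described as mandatory for MiniMax MSA
GPU_MEMORY_UTILIZATION = 0.8


def build_atom_serve_flags(max_model_len: int) -> list[str]:
    """Return the serve flags named in the PR (hypothetical helper).

    No KV cache dtype flag is emitted, so the server keeps its default;
    the PR notes the MXFP4 checkpoint ships no calibrated FP8 KV scales.
    """
    if max_model_len <= 0:
        raise ValueError("max_model_len must be positive")
    return [
        "--block-size", str(BLOCK_SIZE),
        "--gpu-memory-utilization", str(GPU_MEMORY_UTILIZATION),
        "--trust-remote-code",
        "--max-model-len", str(max_model_len),
    ]


if __name__ == "__main__":
    # MAX_MODEL_LEN comes from the sweep matrix; 2048 is only a placeholder default.
    max_model_len = int(os.environ.get("MAX_MODEL_LEN", "2048"))
    print(" ".join(build_atom_serve_flags(max_model_len)))
```

Reading `MAX_MODEL_LEN` from the environment rather than hard-coding it mirrors the Bugbot fix mentioned above ("use matrix MAX_MODEL_LEN").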

Validation

  • generate_sweep_configs.py test-config → 16 configs (minimaxm3_1k1k, minimaxm3_8k1k, TP4 conc 1–128).
  • Smoke-tested on real MI355X (TP4 / conc-1 / 1k1k): atom server came up across 4 ranks, served, wrote a well-formed result JSON.
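
The count of 16 is consistent with two sequence-length cells, one TP size, and eight concurrency levels. The sketch below reproduces that arithmetic under the assumption that concurrency doubles from 1 to 128; the PR only states the range "1→128", and `expand_sweep` is a hypothetical stand-in, not the logic of `generate_sweep_configs.py`.

```python
from itertools import product


def expand_sweep(cells, tp_sizes, max_conc):
    """Expand a search space into one config per (cell, TP, concurrency).

    Assumes concurrency steps in powers of two up to max_conc.
    """
    concurrencies = []
    conc = 1
    while conc <= max_conc:
        concurrencies.append(conc)
        conc *= 2
    return [
        {"cell": cell, "tp": tp, "conc": c}
        for cell, tp, c in product(cells, tp_sizes, concurrencies)
    ]


configs = expand_sweep(["minimaxm3_1k1k", "minimaxm3_8k1k"], [4], 128)
print(len(configs))  # 16 (2 cells x 1 TP size x 8 concurrency levels)
```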

Adding full-sweep-enabled to run the full PR validation sweep.

Closes/supersedes #1812 once validated.


Note

Low Risk
Benchmark and CI config only; no changes to auth, data handling, or production serving paths.

Overview
Adds a day-zero minimaxm3-fp4-mi355x-atom sweep for MiniMax-M3 MXFP4 (amd/MiniMax-M3-MXFP4) on MI355X using the ATOM engine, aligned with the ROCm/ATOM MiniMax-M3 recipe (TP4, --block-size 128 for MSA).

.github/configs/amd-master.yaml defines fixed-seq-len cells at 1k/1k and 8k/1k with TP4 and concurrency 1→128, image rocm/atom-dev:M3.

benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_atom.sh starts atom.entrypoints.openai_server with matrix MAX_MODEL_LEN, mandatory block size 128, and default KV cache dtype (no FP8 KV — the MXFP4 checkpoint has no calibrated scales). Optional eval and standard serving benchmark follow.

runners/launch_mi355x-amds.sh routes amd/MiniMax-M3* weights to the NFS Hugging Face cache, same as existing MiniMaxAI M3 paths.
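
As an illustration of the routing rule, here is a small Python sketch of glob-style matching on the model ID. The real launcher is a shell script, so this is only an analogy: `uses_nfs_cache` is a hypothetical helper, and the `MiniMaxAI/MiniMax-M3*` pattern is an assumed spelling of the existing MiniMaxAI paths.

```python
from fnmatch import fnmatchcase

# Patterns for weights served from the NFS Hugging Face cache.
# "amd/MiniMax-M3*" is the pattern this PR adds; the MiniMaxAI pattern is assumed.
NFS_CACHE_PATTERNS = [
    "MiniMaxAI/MiniMax-M3*",
    "amd/MiniMax-M3*",
]


def uses_nfs_cache(model_id: str) -> bool:
    """Return True if the model ID matches any NFS-cached pattern (case-sensitive)."""
    return any(fnmatchcase(model_id, pattern) for pattern in NFS_CACHE_PATTERNS)


print(uses_nfs_cache("amd/MiniMax-M3-MXFP4"))  # True
```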

perf-changelog.yaml documents the new config key.

Reviewed by Cursor Bugbot for commit c251a03.

Smoke-tested on MI355X (mia1-p01-g07): TP4 conc-1 1k1k served and benched
clean (mean TPOT 6.8ms). KV cache left at default dtype — amd/MiniMax-M3-MXFP4
has no calibrated FP8 KV scales, so --kv_cache_dtype fp8 asserts in the MSA
fused_qknorm kernel.
@andyluo7 marked this pull request as draft June 17, 2026 20:32
@andyluo7 marked this pull request as ready for review June 17, 2026 20:33
@andyluo7 (Collaborator, Author) commented

/sweep test-config --config-keys minimaxm3-fp4-mi355x-atom --config-files .github/configs/amd-master.yaml

@github-actions (Contributor) commented

@andyluo7 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/27718177974
Command: test-config --config-keys minimaxm3-fp4-mi355x-atom --config-files .github/configs/amd-master.yaml
Pinned ref: 61a6a94
Approval: not required (trusted collaborator).

@seungrokj (Collaborator) left a comment

LGTM

@seungrokj (Collaborator) commented

/reuse-sweep-run

@seungrokj (Collaborator) commented

/merge-prs

@seungrokj (Collaborator) commented

@functionstackx @cquil11 can you please approve this?

@Oseltamivir (Collaborator) left a comment

lgtm

@seungrokj merged commit cc78fc9 into main Jun 18, 2026
40 checks passed
@seungrokj deleted the feat/minimaxm3-fp4-mi355x-atom branch June 18, 2026 04:06