9 changes: 5 additions & 4 deletions .github/configs/amd-master.yaml
@@ -2851,10 +2851,11 @@ minimaxm3-fp8-mi355x-vllm-mtp:
- { tp: 4, conc-start: 1, conc-end: 64, spec-decoding: mtp }
- { tp: 8, ep: 8, dp-attn: true, conc-start: 128, conc-end: 256, spec-decoding: mtp }

-# MiniMax-M3 MXFP8 MI300X day-zero recipe. Reuse the dedicated ROCm image and
-# MI355X serving shape, but retain the default BF16 KV cache because this
-# checkpoint lacks calibrated ROCm FP8 attention scales. Use the TP8-only H100
-# search space: TP8 for latency and TP8+EP8 (TEP) at high concurrency.
+# MiniMax-M3 MXFP8 MI300X recipe. Convert the checkpoint's MXFP8 MoE weights to
+# 128x128 block FP8 at load time and use the regular Triton block-FP8 backend.
+# Retain the default BF16 KV cache because this checkpoint lacks calibrated
+# ROCm FP8 attention scales. Use TP8 for latency and TP8+EP8 at high
+# concurrency.
minimaxm3-fp8-mi300x-vllm:
  image: vllm/vllm-openai-rocm:minimax-m3
  model: MiniMaxAI/MiniMax-M3-MXFP8
50 changes: 46 additions & 4 deletions benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi300x.sh
@@ -1,10 +1,12 @@
#!/usr/bin/env bash

# MiniMax-M3 MXFP8 MI300X (gfx942) single-node vLLM recipe.
-# Reuses the dedicated ROCm image and the MI355X serving shape. Block size 128
-# is mandatory for MSA sparse attention. Keep the default BF16 KV cache on
-# gfx942: the checkpoint has no calibrated q/prob scales for ROCm FP8
-# attention, and vLLM's fallback scale of 1.0 corrupts model accuracy.
+# Reuses the dedicated ROCm image and converts MXFP8 MoE weights to 128x128
+# block FP8 at load time. Block size 128 is mandatory for MSA sparse attention.
+# Keep the default BF16 KV cache on gfx942: the checkpoint has no calibrated
+# q/prob scales for ROCm FP8 attention, and vLLM's fallback scale of 1.0
+# corrupts model accuracy.
+# Target image vLLM revision: 4a560dd8db67c270f5e2afb614558271b76f2294.

source "$(dirname "$0")/../../benchmark_lib.sh"

@@ -24,6 +26,46 @@ if [[ -n "$SLURM_JOB_ID" ]]; then
    echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
fi

+if ! VLLM_PACKAGE_ROOT="$(
+    python3 - <<'PY'
+from pathlib import Path
+
+import vllm
+
+print(Path(vllm.__file__).resolve().parent.parent)
+PY
+)"; then
+    echo "Failed to locate the installed vLLM package" >&2
+    exit 1
+fi
+if [[ -z "$VLLM_PACKAGE_ROOT" || ! -d "$VLLM_PACKAGE_ROOT/vllm" ]]; then
+    echo "Invalid installed vLLM package root: $VLLM_PACKAGE_ROOT" >&2
+    exit 1
+fi
+
+MXFP8_PATCH="$(dirname "$0")/minimaxm3_mi300x_mxfp8.patch"
+if [[ ! -f "$MXFP8_PATCH" ]]; then
+    echo "MI300X MXFP8 patch is missing: $MXFP8_PATCH" >&2
+    exit 1
+fi
+
+PATCH_CHECK_ARGS=(--batch --silent -d "$VLLM_PACKAGE_ROOT" -p1 --dry-run)
+if patch "${PATCH_CHECK_ARGS[@]}" --reverse --forward < "$MXFP8_PATCH"; then
+    echo "MI300X MXFP8 patch is already fully applied"
+elif patch "${PATCH_CHECK_ARGS[@]}" --forward < "$MXFP8_PATCH"; then
+    if ! patch --batch --forward -d "$VLLM_PACKAGE_ROOT" -p1 < "$MXFP8_PATCH"; then
+        echo "Failed to apply the MI300X MXFP8 patch" >&2
+        exit 1
+    fi
+else
+    echo "Installed vLLM is neither cleanly patchable nor fully patched" >&2
+    exit 1
+fi
+if ! patch "${PATCH_CHECK_ARGS[@]}" --reverse --forward < "$MXFP8_PATCH"; then
+    echo "MI300X MXFP8 patch verification failed" >&2
+    exit 1
+fi

MTP script skips MXFP8 patch

Medium Severity

Runtime MXFP8 patching was added only to the non-MTP MI300X benchmark script. launch_mi300x-amds.sh runs minimaxm3_fp8_mi300x_mtp.sh for spec-decoding: mtp configs, so those jobs never apply minimaxm3_mi300x_mxfp8.patch despite the MTP script claiming it mirrors this recipe.


Reviewed by Cursor Bugbot for commit c3cdc37.
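
One possible way to address the comment above, sketched under stated assumptions: move the patch step into a shared function so that both the non-MTP and MTP scripts run the same logic. The function name `apply_patch_idempotent` and the idea of placing it in `benchmark_lib.sh` are hypothetical; nothing in this PR defines them. The dry-run checks mirror the ones added to `minimaxm3_fp8_mi300x.sh` and rely on GNU `patch` semantics for `--reverse --forward`.

```shell
# Hypothetical shared helper: the name and its placement in benchmark_lib.sh
# are assumptions, not part of this PR.
# Applies a -p1 patch under a directory at most once.
# Returns 0 when the patch is applied (now or on an earlier run), 1 otherwise.
apply_patch_idempotent() {
    local root patch_file
    root="$1"
    patch_file="$2"

    if [ ! -d "$root" ]; then
        echo "Patch target directory is missing: $root" >&2
        return 1
    fi
    if [ ! -f "$patch_file" ]; then
        echo "Patch file is missing: $patch_file" >&2
        return 1
    fi

    # A reverse dry run succeeds only when every hunk is already present.
    if patch --batch --silent -d "$root" -p1 --dry-run --reverse --forward < "$patch_file"; then
        echo "Patch is already fully applied: $patch_file"
        return 0
    fi

    # A forward dry run succeeds only when every hunk applies cleanly.
    if ! patch --batch --silent -d "$root" -p1 --dry-run --forward < "$patch_file"; then
        echo "Target is neither cleanly patchable nor fully patched: $root" >&2
        return 1
    fi

    # Apply for real, then fail loudly if the tree could not be modified.
    if ! patch --batch --silent --forward -d "$root" -p1 < "$patch_file"; then
        echo "Failed to apply patch: $patch_file" >&2
        return 1
    fi

    # Verify the result by repeating the reverse dry run.
    patch --batch --silent -d "$root" -p1 --dry-run --reverse --forward < "$patch_file"
}
```

With a helper like this, each script could call `apply_patch_idempotent "$VLLM_PACKAGE_ROOT" "$(dirname "$0")/minimaxm3_mi300x_mxfp8.patch" || exit 1` after resolving `VLLM_PACKAGE_ROOT`, so the MTP recipe cannot drift from the non-MTP one.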


if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi

if [ -n "$ROCR_VISIBLE_DEVICES" ]; then