[Experimental][DNM till upstream PR merges][AMD] perf: load-time block FP8 MoE for MiniMax M3 on MI300X #1753
27510c4
7521394
6c29d32
6f5a399
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,12 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| # MiniMax-M3 MXFP8 MI300X (gfx942) single-node vLLM recipe. | ||
| # Reuses the dedicated ROCm image and the MI355X serving shape. Block size 128 | ||
| # is mandatory for MSA sparse attention. Keep the default BF16 KV cache on | ||
| # gfx942: the checkpoint has no calibrated q/prob scales for ROCm FP8 | ||
| # attention, and vLLM's fallback scale of 1.0 corrupts model accuracy. | ||
| # Reuses the dedicated ROCm image and converts MXFP8 MoE weights to 128x128 | ||
| # block FP8 at load time. Block size 128 is mandatory for MSA sparse attention. | ||
| # Keep the default BF16 KV cache on gfx942: the checkpoint has no calibrated | ||
| # q/prob scales for ROCm FP8 attention, and vLLM's fallback scale of 1.0 | ||
| # corrupts model accuracy. | ||
| # Target image vLLM revision: 4a560dd8db67c270f5e2afb614558271b76f2294. | ||
|
|
||
| source "$(dirname "$0")/../../benchmark_lib.sh" | ||
|
|
||
|
|
| @@ -24,6 +26,46 @@ if [[ -n "$SLURM_JOB_ID" ]]; then | ||
| echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME" | ||
| fi | ||
|
|
||
| if ! VLLM_PACKAGE_ROOT="$( | ||
| python3 - <<'PY' | ||
| from pathlib import Path | ||
|
|
||
| import vllm | ||
|
|
||
| print(Path(vllm.__file__).resolve().parent.parent) | ||
| PY | ||
| )"; then | ||
| echo "Failed to locate the installed vLLM package" >&2 | ||
| exit 1 | ||
| fi | ||
| if [[ -z "$VLLM_PACKAGE_ROOT" || ! -d "$VLLM_PACKAGE_ROOT/vllm" ]]; then | ||
| echo "Invalid installed vLLM package root: $VLLM_PACKAGE_ROOT" >&2 | ||
| exit 1 | ||
| fi | ||
|
|
||
| MXFP8_PATCH="$(dirname "$0")/minimaxm3_mi300x_mxfp8.patch" | ||
| if [[ ! -f "$MXFP8_PATCH" ]]; then | ||
| echo "MI300X MXFP8 patch is missing: $MXFP8_PATCH" >&2 | ||
| exit 1 | ||
| fi | ||
|
|
||
| PATCH_CHECK_ARGS=(--batch --silent -d "$VLLM_PACKAGE_ROOT" -p1 --dry-run) | ||
| if patch "${PATCH_CHECK_ARGS[@]}" --reverse --forward < "$MXFP8_PATCH"; then | ||
| echo "MI300X MXFP8 patch is already fully applied" | ||
| elif patch "${PATCH_CHECK_ARGS[@]}" --forward < "$MXFP8_PATCH"; then | ||
| if ! patch --batch --forward -d "$VLLM_PACKAGE_ROOT" -p1 < "$MXFP8_PATCH"; then | ||
| echo "Failed to apply the MI300X MXFP8 patch" >&2 | ||
| exit 1 | ||
| fi | ||
| else | ||
| echo "Installed vLLM is neither cleanly patchable nor fully patched" >&2 | ||
| exit 1 | ||
| fi | ||
| if ! patch "${PATCH_CHECK_ARGS[@]}" --reverse --forward < "$MXFP8_PATCH"; then | ||
| echo "MI300X MXFP8 patch verification failed" >&2 | ||
| exit 1 | ||
| fi | ||
|
cursor[bot] marked this conversation as resolved.
Review comment (Cursor Bugbot, Medium Severity): MTP script skips MXFP8 patch. Runtime MXFP8 patching was added only to the non-MTP MI300X benchmark script. Reviewed by Cursor Bugbot for commit c3cdc37.
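One way to address the review note about the MTP script would be to move the check/apply/verify sequence into a function that both benchmark scripts can call. The sketch below is only an illustration: the function name `apply_patch_once` and the idea of placing it in a shared file are assumptions, not part of this PR. It mirrors the `patch` flags used in the diff above, but sends the output of the dry-run probes to `/dev/null` so that the expected "reversed patch" message from a failed probe does not clutter the job log.

```shell
#!/usr/bin/env bash

# Hypothetical shared helper (name and location are assumptions).
# Usage: apply_patch_once <target_root> <patch_file>
# Returns 0 if the patch is applied afterwards (newly or previously), 1 otherwise.
apply_patch_once() {
    local root="$1" patch_file="$2"
    # Same probe flags as the recipe: non-interactive, quiet, no writes.
    local check=(--batch --silent -d "$root" -p1 --dry-run)

    if [[ ! -d "$root" || ! -f "$patch_file" ]]; then
        echo "Missing patch target or patch file" >&2
        return 1
    fi

    if patch "${check[@]}" --reverse --forward < "$patch_file" > /dev/null 2>&1; then
        # The reversed patch applies cleanly, so the tree is already patched.
        echo "Patch is already fully applied"
    elif patch "${check[@]}" --forward < "$patch_file" > /dev/null 2>&1; then
        # The forward dry run succeeded, so apply for real.
        if ! patch --batch --silent --forward -d "$root" -p1 < "$patch_file"; then
            echo "Failed to apply the patch" >&2
            return 1
        fi
        echo "Patch applied"
    else
        echo "Target is neither cleanly patchable nor fully patched" >&2
        return 1
    fi

    # Final verification: the reversed patch must now apply cleanly.
    if ! patch "${check[@]}" --reverse --forward < "$patch_file" > /dev/null 2>&1; then
        echo "Patch verification failed" >&2
        return 1
    fi
}
```

Each script would then call something like `apply_patch_once "$VLLM_PACKAGE_ROOT" "$MXFP8_PATCH" || exit 1` after resolving the package root, which keeps the MTP and non-MTP recipes from drifting apart. Calling it twice is harmless because the second call takes the "already fully applied" branch.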
|
||
|
|
||
| if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi | ||
|
|
||
| if [ -n "$ROCR_VISIBLE_DEVICES" ]; then | ||
|
|
||

