Commit ca8f30f

make changes to perf changelog
1 parent 9cc728c commit ca8f30f

1 file changed

Lines changed: 9 additions & 0 deletions

perf-changelog.yaml

@@ -81,3 +81,12 @@
     - Update vLLM image for NVIDIA configs from vLLM 0.11.0 to vLLM 0.11.2
     - Adds kv-cache-dtype: fp8 to benchmarks/gptoss_fp4_b200_docker.sh
   PR: https://github.com/InferenceMAX/InferenceMAX/pull/273
+- config-keys:
+    - gptoss-fp4-b200-vllm
+    - gptoss-fp4-h100-vllm
+    - gptoss-fp4-h200-vllm
+  description: |
+    - Update vLLM image for NVIDIA configs from vLLM 0.11.2 to vLLM 0.12.0
+    - Adds VLLM_MXFP4_USE_MARLIN=1 to benchmarks/gptoss_fp4_h100_docker.sh and benchmarks/gptoss_fp4_h200_slurm.sh
+    - Adds VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8=1 to benchmarks/gptoss_fp4_h100_slurm.sh
+  PR: https://github.com/InferenceMAX/InferenceMAX/pull/327
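
The entry added in this hunk has the same three keys as the entry before it: `config-keys`, `description`, and `PR`. As a minimal sketch, assuming those three keys are required and that `PR` should link to a pull request in the same repository (the diff confirms neither), a check along these lines could flag malformed entries before they are committed. The function `validate_entry` and the `new_entry` dict are hypothetical illustrations, not part of InferenceMAX.

```python
import re

# Assumption: every changelog entry carries these keys, inferred only
# from the two entries visible in this diff.
REQUIRED_KEYS = {"config-keys", "description", "PR"}

# Assumption: PR links point at a pull request in the InferenceMAX repo.
PR_PATTERN = re.compile(
    r"^https://github\.com/InferenceMAX/InferenceMAX/pull/\d+$"
)


def validate_entry(entry):
    """Return a list of problems found in one changelog entry (empty if none)."""
    problems = []

    missing = REQUIRED_KEYS - set(entry)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))

    if "config-keys" in entry:
        keys = entry["config-keys"]
        if not isinstance(keys, list) or not keys:
            problems.append("config-keys must be a non-empty list")
        elif len(set(keys)) != len(keys):
            problems.append("config-keys contains duplicates")

    if "description" in entry and not str(entry["description"]).strip():
        problems.append("description is empty")

    if "PR" in entry and not PR_PATTERN.match(str(entry["PR"])):
        problems.append("PR is not a pull-request URL")

    return problems


# The entry added by this commit, written as the dict a YAML loader would return.
new_entry = {
    "config-keys": [
        "gptoss-fp4-b200-vllm",
        "gptoss-fp4-h100-vllm",
        "gptoss-fp4-h200-vllm",
    ],
    "description": (
        "- Update vLLM image for NVIDIA configs from vLLM 0.11.2 to vLLM 0.12.0\n"
        "- Adds VLLM_MXFP4_USE_MARLIN=1 to benchmarks/gptoss_fp4_h100_docker.sh "
        "and benchmarks/gptoss_fp4_h200_slurm.sh\n"
        "- Adds VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8=1 to "
        "benchmarks/gptoss_fp4_h100_slurm.sh\n"
    ),
    "PR": "https://github.com/InferenceMAX/InferenceMAX/pull/327",
}

print(validate_entry(new_entry))
```

The check only inspects the shape of an entry. It does not verify that the listed config keys exist or that the named benchmark scripts were actually changed.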
