1 parent 9cc728c commit ca8f30f
1 file changed
perf-changelog.yaml
@@ -81,3 +81,12 @@
- Update vLLM image for NVIDIA configs from vLLM 0.11.0 to vLLM 0.11.2
- Adds kv-cache-dtype: fp8 to benchmarks/gptoss_fp4_b200_docker.sh
PR: https://github.com/InferenceMAX/InferenceMAX/pull/273
+- config-keys:
+ - gptoss-fp4-b200-vllm
+ - gptoss-fp4-h100-vllm
+ - gptoss-fp4-h200-vllm
+ description: |
+ - Update vLLM image for NVIDIA configs from vLLM 0.11.2 to vLLM 0.12.0
+ - Adds VLLM_MXFP4_USE_MARLIN=1 to benchmarks/gptoss_fp4_h100_docker.sh and benchmarks/gptoss_fp4_h200_slurm.sh
+ - Adds VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8=1 to benchmarks/gptoss_fp4_h100_slurm.sh
+ PR: https://github.com/InferenceMAX/InferenceMAX/pull/327