Commit 5a70e5a

test 4
1 parent c0838d4 commit 5a70e5a

1 file changed

Lines changed: 2 additions & 91 deletions

perf-changelog.yaml

Lines changed: 2 additions & 91 deletions
Original file line numberDiff line numberDiff line change
@@ -1,97 +1,8 @@
-# - config-keys:
-#     - 70b-fp8-*-vllm
-#   description:
-#     - 'Add compilation-config ''{"custom_ops": ["-rms_norm", "-quant_fp8", "-silu_and_mul"]}'' as extra config to all benchmarks/70b_fp8_mi*.sh scripts'
-#     - "6-7% uplift for llama for 6/8 configs"
-#   pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/95
-
-- config-keys:
-    - gptoss-fp4-*-trt
-  description:
-    - "Upgrade GPT-OSS TRT images from 'release:1.1.0rc2.post2' to '1.2.0rc0.post1'"
-    - "Add NCCL_GRAPH_REGISTER=0 to benchmarks/gptoss_fp4_b200_trt_slurm.sh"
-    - "Change kv_cache_config.dtype from 'auto' to 'fp8' in benchmarks/gptoss_fp4_b200_trt_slurm.sh"
-    - "Remove MOE_BACKEND=CUTLASS, now just defaults to TRTLLM"
-  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/110
-
-- config-keys:
-    - gptoss*
-    - dsr1*
-  description:
-    - "Remove Llama 70B runs to make room for multi-node disagg prefill+wideEP on h100/h200/b200/mi300/mi325/mi355"
-  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/149
-
-- config-keys:
-    - gptoss-fp4-b200-vllm
-    - gptoss-fp4-h100-vllm
-    - gptoss-fp4-h200-vllm
-  description:
-    - "Upgrade vLLM from 0.10.2 to 0.11.0 for GPT-OSS NVIDIA single-node configs"
-    - 'Add compilation-config ''{"cudagraph_mode":"PIECEWISE"}'' since vLLM 0.11.0 now defaults to FULL_AND_PIECEWISE'
-  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/159
-
-- config-keys:
-    - dsr1*
-  description:
-    - "Fix bug where 1k8k and 8k1k full sweeps had incorrect max-model-len for DeepSeek"
-  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/163
-
-- config-keys:
-    - dsr1-fp4-b200-sglang
-    - dsr1-fp8-b200-sglang
-    - dsr1-fp8-h200-sglang
-  description:
-    - "Consolidate H200 and B200 SGLang configurations to use unified v0.5.5-cu129-amd64 image tag"
-    - "Update deprecated SGLang server arguments to current equivalents"
-    - "Replace --enable-ep-moe with --ep-size $EP_SIZE"
-    - "Replace --enable-flashinfer-trtllm-moe with --moe-runner-backend flashinfer_trtllm"
-    - "Add -e EP_SIZE to Docker run commands in launch scripts"
-    - "Set ep:4 for all tp:4 entries, ep:8 for all tp:8 entries"
-  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/204
-
-- config-keys:
-    - gptoss-fp4-mi355x-vllm
-    - gptoss-fp4-b200-vllm
-  description:
-    - "Extend concurrency to 128 for gptoss mi355x/b200 vllm configurations"
-  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/209
-
-- config-keys:
-    - gptoss-fp4-b200-trt
-  description:
-    - "Extend concurrency to 128 for gptoss b200 TRT configurations"
-  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/233
-
-- config-keys:
-    - "*gb200-dynamo-sglang"
-  description:
-    - "Introduce improvements in GB200 SGLang DSR1 submission"
-  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/257
-
-- config-keys:
-    - dsr1-fp8-h200-trt
-  description:
-    - "Update TRT image from nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc0.post1 to nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc2"
-    - "Increase concurrency for some configurations"
-  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/266
-
 - config-keys:
     - gptoss-fp4-b200-vllm
     - gptoss-fp4-h100-vllm
     - gptoss-fp4-h200-vllm
   description:
     - "Update vLLM image for NVIDIA configs from vLLM 0.11.0 to vLLM 0.11.2"
-    - "Add kv-cache-dtype: fp8 to benchmarks/gptoss_fp4_b200_docker.sh"
-  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/273
-
-- config-keys:
-    - dsr1-fp4-mi355x-sglang
-  description:
-    - "Update MI355x Deepseek-R1 FP4 SGLang Image to upstream v0.5.6.post1"
-  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/330
-
-- config-keys:
-    - gptoss-fp4-b200-trt
-  description:
-    - "Add benchmark script for GPTOSS FP4 B200 TRT-LLM"
-  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/256
+    - "Adds kv-cache-dtype: fp8 to benchmarks/gptoss_fp4_b200_docker.sh"
+  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/273
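
Several config-keys entries in this file contain wildcards (gptoss-fp4-*-trt, dsr1*, "*gb200-dynamo-sglang"), while the single entry that remains after this commit lists three exact names. The sketch below shows one way such keys could be expanded against a list of config names. It assumes shell-style glob semantics via Python's fnmatch module; whether the repository's tooling matches keys this way is not confirmed by the diff. The helper matching_configs and the list known_configs are hypothetical and exist only for illustration.

```python
from fnmatch import fnmatchcase


def matching_configs(config_keys, config_names):
    """Return the config names matched by at least one glob-style key.

    Assumption: keys follow shell-style wildcard rules (fnmatch), which the
    changelog itself does not state.
    """
    return [
        name
        for name in config_names
        if any(fnmatchcase(name, key) for key in config_keys)
    ]


# Hypothetical set of config names, taken from keys that appear in the diff.
known_configs = [
    "gptoss-fp4-b200-vllm",
    "gptoss-fp4-h100-vllm",
    "gptoss-fp4-h200-vllm",
    "gptoss-fp4-b200-trt",
    "dsr1-fp8-h200-trt",
]

# Keys of the one entry left in perf-changelog.yaml after this commit.
kept_keys = [
    "gptoss-fp4-b200-vllm",
    "gptoss-fp4-h100-vllm",
    "gptoss-fp4-h200-vllm",
]

selected = matching_configs(kept_keys, known_configs)

# A wildcard key from one of the removed entries, for comparison.
wildcard_selected = matching_configs(["gptoss-fp4-*-trt"], known_configs)
```

Under these assumptions, the remaining entry selects only the three GPT-OSS vLLM configs, and the removed wildcard key would have selected only the B200 TRT config from the hypothetical list.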