
Commit 2731ccb

Merge branch 'main' into ishan/moreconfigs
2 parents a6cc157 + abdb40a commit 2731ccb

2 files changed

Lines changed: 66 additions & 61 deletions

perf-changelog.yaml

Lines changed: 64 additions & 60 deletions
@@ -1,93 +1,97 @@
 - config-keys:
     - 70b-fp8-*-vllm
-  description: |
-    - Add compilation-config: '{"custom_ops": ["-rms_norm", "-quant_fp8", "-silu_and_mul"]}' as
-      extra config to all benchmarks/70b_fp8_mi*.sh scripts
-    - 6-7% uplift for llama for 6/8 configs
-    PR: https://github.com/InferenceMAX/InferenceMAX/pull/95
+  description:
+    - 'Add compilation-config ''{"custom_ops": ["-rms_norm", "-quant_fp8", "-silu_and_mul"]}'' as extra config to all benchmarks/70b_fp8_mi*.sh scripts'
+    - "6-7% uplift for llama for 6/8 configs"
+  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/95
+
 - config-keys:
     - gptoss-fp4-*-trt
-  description: |
-    - Upgrade GPT-OSS TRT images from 'release:1.1.0rc2.post2' to '1.2.0rc0.post1'
-    - Add NCCL_GRAPH_REGISTER=0 to benchmarks/gptoss_fp4_b200_trt_slurm.sh
-    - Change kv_cache_config.dtype from 'auto' to 'fp8' in benchmarks/gptoss_fp4_b200_trt_slurm.sh
-    - Remove MOE_BACKEND=CUTLASS, now just defaults to TRTLLM
-    PR: https://github.com/InferenceMAX/InferenceMAX/pull/110
+  description:
+    - "Upgrade GPT-OSS TRT images from 'release:1.1.0rc2.post2' to '1.2.0rc0.post1'"
+    - "Add NCCL_GRAPH_REGISTER=0 to benchmarks/gptoss_fp4_b200_trt_slurm.sh"
+    - "Change kv_cache_config.dtype from 'auto' to 'fp8' in benchmarks/gptoss_fp4_b200_trt_slurm.sh"
+    - "Remove MOE_BACKEND=CUTLASS, now just defaults to TRTLLM"
+  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/110
+
 - config-keys:
     - gptoss*
     - dsr1*
-  description: |
-    - Remove Llama 70B runs to make room for multi-node disagg prefill+wideEP on
-      h100/h200/b200/mi300/mi325/mi355
-    PR: https://github.com/InferenceMAX/InferenceMAX/pull/149
+  description:
+    - "Remove Llama 70B runs to make room for multi-node disagg prefill+wideEP on h100/h200/b200/mi300/mi325/mi355"
+  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/149
+
 - config-keys:
     - gptoss-fp4-b200-vllm
     - gptoss-fp4-h100-vllm
     - gptoss-fp4-h200-vllm
-  description: |
-    - Upgrade vLLM from 0.10.2 to 0.11.0 for GPT-OSS NVIDIA single-node configs
-    - Adds compilation-config: '{"cudagraph_mode":"PIECEWISE"} accordingly since vLLM 0.11.0
-      requires now defaults to FULL_AND_PIECEWISE
-    PR: https://github.com/InferenceMAX/InferenceMAX/pull/159
+  description:
+    - "Upgrade vLLM from 0.10.2 to 0.11.0 for GPT-OSS NVIDIA single-node configs"
+    - 'Add compilation-config ''{"cudagraph_mode":"PIECEWISE"}'' since vLLM 0.11.0 now defaults to FULL_AND_PIECEWISE'
+  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/159
+
 - config-keys:
     - dsr1*
-  description: |
-    - Fixes bug where 1k8k and 8k1k full sweeps had incorrect max-model-len for DeepSeek
-    PR: https://github.com/InferenceMAX/InferenceMAX/pull/163
+  description:
+    - "Fix bug where 1k8k and 8k1k full sweeps had incorrect max-model-len for DeepSeek"
+  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/163
+
 - config-keys:
     - dsr1-fp4-b200-sglang
     - dsr1-fp8-b200-sglang
     - dsr1-fp8-h200-sglang
-  description: |
-    - Consolidates H200 and B200 SGLang configurations to use unified v0.5.5-cu129-amd64
-      image tag and updates deprecated SGLang server arguments to their current equivalents.
-    - --enable-flashinfer-trtllm-moe & --enable-ep-moe is no longer available in sglang so we needed to change it
-    - ep: 4 for all tp: 4 entries (3 occurrences in dsr1-fp4-b200-sglang)
-    - ep: 8 for all tp: 8 entries (6 occurrences across dsr1-fp4-b200-sglang and dsr1-fp8-b200-sglang)
-    - dsr1_fp4_b200_docker.sh: Replaced --enable-ep-moe with --ep-size $EP_SIZE and --enable-flashinfer-trtllm-moe with
-      --moe-runner-backend flashinfer_trtllm
-    - dsr1_fp8_b200_docker.sh: Replaced --enable-flashinfer-trtllm-moe with --moe-runner-backend flashinfer_trtllm and
-      added --ep-size $EP_SIZE
-    - launch_b200-nvd.sh: Added -e EP_SIZE to Docker run command to pass environment variable to container
-    - launch_b200-tg.sh: Added -e EP_SIZE to Docker run command to pass environment variable to container
-    PR: https://github.com/InferenceMAX/InferenceMAX/pull/204
+  description:
+    - "Consolidate H200 and B200 SGLang configurations to use unified v0.5.5-cu129-amd64 image tag"
+    - "Update deprecated SGLang server arguments to current equivalents"
+    - "Replace --enable-ep-moe with --ep-size $EP_SIZE"
+    - "Replace --enable-flashinfer-trtllm-moe with --moe-runner-backend flashinfer_trtllm"
+    - "Add -e EP_SIZE to Docker run commands in launch scripts"
+    - "Set ep:4 for all tp:4 entries, ep:8 for all tp:8 entries"
+  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/204
+
 - config-keys:
     - gptoss-fp4-mi355x-vllm
     - gptoss-fp4-b200-vllm
-  description: |
-    - Extend concurrency to 128 for gptoss mi355x/b200 vllm configurations
-    PR: https://github.com/InferenceMAX/InferenceMAX/pull/209
+  description:
+    - "Extend concurrency to 128 for gptoss mi355x/b200 vllm configurations"
+  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/209
+
 - config-keys:
     - gptoss-fp4-b200-trt
-  description: |
-    - Extend concurrency to 128 for gptoss b200 TRT configurations
-    PR: https://github.com/InferenceMAX/InferenceMAX/pull/233
+  description:
+    - "Extend concurrency to 128 for gptoss b200 TRT configurations"
+  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/233
+
 - config-keys:
-    - "*gb200-sglang"
-  description: |
-    - Introducing some improvements in GB200 SGLang DSR1 submission
-    PR: https://github.com/InferenceMAX/InferenceMAX/pull/257
+    - "*gb200-dynamo-sglang"
+  description:
+    - "Introduce improvements in GB200 SGLang DSR1 submission"
+  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/257
+
 - config-keys:
     - dsr1-fp8-h200-trt
-  description: |
-    - Update TRT image from nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc0.post1 to nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc2
-    - Increase concurrency for some configurations
-    PR: https://github.com/InferenceMAX/InferenceMAX/pull/266
+  description:
+    - "Update TRT image from nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc0.post1 to nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc2"
+    - "Increase concurrency for some configurations"
+  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/266
+
 - config-keys:
     - gptoss-fp4-b200-vllm
     - gptoss-fp4-h100-vllm
     - gptoss-fp4-h200-vllm
-  description: |
-    - Update vLLM image for NVIDIA configs from vLLM 0.11.0 to vLLM 0.11.2
-    - Adds kv-cache-dtype: fp8 to benchmarks/gptoss_fp4_b200_docker.sh
-    PR: https://github.com/InferenceMAX/InferenceMAX/pull/273
+  description:
+    - "Update vLLM image for NVIDIA configs from vLLM 0.11.0 to vLLM 0.11.2"
+    - "Add kv-cache-dtype: fp8 to benchmarks/gptoss_fp4_b200_docker.sh"
+  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/273
+
 - config-keys:
     - dsr1-fp4-mi355x-sglang
-  description: |
-    - Updating MI355x Deepseek-R1 FP4 SGLang Image to upstream v0.5.6.post1
-    PR: https://github.com/InferenceMAX/InferenceMAX/pull/330
+  description:
+    - "Update MI355x Deepseek-R1 FP4 SGLang Image to upstream v0.5.6.post1"
+  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/330
+
 - config-keys:
     - gptoss-fp4-b200-trt
-  description: |
-    - Add benchmark script for GPTOSS FP4 B200 TRT-LLM
-    PR: https://github.com/InferenceMAX/InferenceMAX/pull/256
+  description:
+    - "Add benchmark script for GPTOSS FP4 B200 TRT-LLM"
+  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/256
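
The change to perf-changelog.yaml moves each entry from a single block-string description with a trailing `PR:` line to a list of description items plus a separate `pr-link` key. A minimal sketch of that reshaping is below. It assumes the entry has already been loaded into a plain dict; `migrate_entry` is a hypothetical helper written for illustration and is not part of this commit. The commit also reworded several items by hand, which a mechanical conversion like this would not reproduce.

```python
def migrate_entry(old: dict) -> dict:
    """Convert an old-style entry (description as one block string that ends
    with a 'PR: <url>' line) into the new list + pr-link shape.

    Hypothetical helper for illustration; not taken from the repository.
    """
    items: list[str] = []
    pr_link = None
    for raw in old["description"].splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("PR:"):
            # The PR URL becomes its own top-level key.
            pr_link = line[len("PR:"):].strip()
        elif line.startswith("- "):
            # A new bullet starts a new description item.
            items.append(line[2:].strip())
        elif items:
            # A wrapped continuation line is joined onto the previous bullet.
            items[-1] += " " + line
        else:
            items.append(line)
    if pr_link is None:
        raise ValueError("old entry has no 'PR:' line")
    return {
        "config-keys": list(old["config-keys"]),
        "description": items,
        "pr-link": pr_link,
    }


old_entry = {
    "config-keys": ["gptoss*", "dsr1*"],
    "description": (
        "- Remove Llama 70B runs to make room for multi-node disagg prefill+wideEP on\n"
        "  h100/h200/b200/mi300/mi325/mi355\n"
        "PR: https://github.com/InferenceMAX/InferenceMAX/pull/149\n"
    ),
}
new_entry = migrate_entry(old_entry)
```

For this entry the wrapped bullet is joined into a single item, matching the one-line form used on the added side of the diff.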

utils/matrix_logic/validation.py

Lines changed: 2 additions & 1 deletion
@@ -330,7 +330,8 @@ class ChangelogEntry(BaseModel):
     model_config = ConfigDict(extra="forbid", populate_by_name=True)
 
     config_keys: list[str] = Field(alias="config-keys", min_length=1)
-    description: str
+    description: list[str] = Field(min_length=1)
+    pr_link: str = Field(alias="pr-link")
 
 
 class ChangelogMetadata(BaseModel):
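
To illustrate the effect of the validation.py change, the sketch below rebuilds only the three `ChangelogEntry` fields visible in this hunk and checks a few entries against them. It assumes Pydantic v2 (implied by `ConfigDict` and `model_config`). The `is_valid` helper and the sample dicts are illustrative assumptions, and the real class in utils/matrix_logic/validation.py may differ outside the lines shown.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class ChangelogEntry(BaseModel):
    # Fields copied from the hunk above; anything outside the hunk is omitted.
    model_config = ConfigDict(extra="forbid", populate_by_name=True)

    config_keys: list[str] = Field(alias="config-keys", min_length=1)
    description: list[str] = Field(min_length=1)
    pr_link: str = Field(alias="pr-link")


def is_valid(raw: dict) -> bool:
    """Hypothetical helper: True if `raw` passes ChangelogEntry validation."""
    try:
        ChangelogEntry.model_validate(raw)
        return True
    except ValidationError:
        return False


# New-style entry: description is a list and pr-link is its own key.
new_style = {
    "config-keys": ["gptoss-fp4-b200-trt"],
    "description": ["Extend concurrency to 128 for gptoss b200 TRT configurations"],
    "pr-link": "https://github.com/InferenceMAX/InferenceMAX/pull/233",
}

# Old-style entry: one block string with the PR URL embedded, no pr-link key.
old_style = {
    "config-keys": ["gptoss-fp4-b200-trt"],
    "description": (
        "- Extend concurrency to 128 for gptoss b200 TRT configurations\n"
        "PR: https://github.com/InferenceMAX/InferenceMAX/pull/233\n"
    ),
}

# min_length=1 on the list rejects an empty description.
empty_description = {**new_style, "description": []}

results = {
    "new_style": is_valid(new_style),
    "old_style": is_valid(old_style),
    "empty_description": is_valid(empty_description),
}
```

Under these assumptions only `new_style` validates. The old shape fails twice: its description is a string rather than a list, and the required `pr-link` is missing.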
