
Commit 6e01f1e

[Klaud Cold] qwen3.5-fp4-mi355x-mtp: enable --use-chat-template (#1554)
* [Klaud Cold] qwen3.5-fp4-mi355x-mtp: enable --use-chat-template

  Aligns with other Qwen MTP recipes (e.g. qwen3.5_fp8_mi355x_mtp.sh), which apply the chat template during benchmark serving.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: qwen3.5-fp4-mi355x-sglang-mtp chat-template entry

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 6a4fe2a commit 6e01f1e

2 files changed

Lines changed: 8 additions & 2 deletions

benchmarks/single_node/qwen3.5_fp4_mi355x_mtp.sh

Lines changed: 2 additions & 1 deletion
@@ -60,7 +60,8 @@ run_benchmark_serving \
     --num-prompts "$((CONC * 10))" \
     --max-concurrency "$CONC" \
     --result-filename "$RESULT_FILENAME" \
-    --result-dir /workspace/
+    --result-dir /workspace/ \
+    --use-chat-template
 
 # After throughput, run evaluation only if RUN_EVAL is true
 if [ "${RUN_EVAL}" = "true" ]; then
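
Reviewer note: the hunk above has to touch the `--result-dir` line only to add a trailing backslash before the new flag. One possible way to avoid that kind of churn is to collect the arguments first and append optional flags afterwards. The sketch below is not part of this commit. `launch_benchmark` and its third parameter are hypothetical names, and `run_benchmark_serving` is replaced by a stand-in that only prints its arguments, since the real helper is not shown on this page.

```shell
# Stand-in for the repo's run_benchmark_serving helper (assumption:
# the real one accepts these flags); it prints one argument per line.
run_benchmark_serving() {
    printf '%s\n' "$@"
}

# Hypothetical wrapper, for illustration only.
#   $1 = concurrency
#   $2 = result filename
#   $3 = "true" to append --use-chat-template
launch_benchmark() {
    conc=$1
    result_filename=$2
    use_chat_template=$3

    # Rebuild the positional parameters as the argument list.
    # Inside a function this does not affect the caller's "$@".
    set -- \
        --num-prompts "$((conc * 10))" \
        --max-concurrency "$conc" \
        --result-filename "$result_filename" \
        --result-dir /workspace/

    # Appending a flag no longer requires editing the previous line.
    if [ "$use_chat_template" = "true" ]; then
        set -- "$@" --use-chat-template
    fi

    run_benchmark_serving "$@"
}

launch_benchmark 4 result_conc4 true
```

With the stand-in helper, the call prints the four original flag/value pairs followed by `--use-chat-template`. Passing `false` as the third argument leaves the flag out. The script in this commit hardcodes the flag instead, which matches the commit message's stated goal of aligning with the other Qwen MTP recipes.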

perf-changelog.yaml

Lines changed: 6 additions & 1 deletion
@@ -3063,7 +3063,6 @@
     - "Remove SGLANG_OPT_FP8_WO_A_GEMM=0 workaround (topk_v2 crash fixed upstream in sgl-project/sglang#25805)"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1528
 
-
 - config-keys:
     - qwen3.5-fp4-b300-sglang
     - qwen3.5-fp4-b300-sglang-mtp
@@ -3100,3 +3099,9 @@
   description:
     - "Truncate sweep to conc=1 and conc=2 only: set conc-start=1, conc-end=2 in every search-space across all six DSR1 SGLang agg configs"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1534
+
+- config-keys:
+    - qwen3.5-fp4-mi355x-sglang-mtp
+  description:
+    - "Add --use-chat-template to run_benchmark_serving so prompts are formatted with the Qwen chat template (matching the other Qwen MTP recipes)"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1554
