[Klaud Cold] qwen3.5-fp4-mi355x-mtp: enable --use-chat-template (#1554)

functionstackx · claude · web-flow · commit 6e01f1e25c81 · 2026-05-22T15:44:24.000-04:00
* [Klaud Cold] qwen3.5-fp4-mi355x-mtp: enable --use-chat-template

Aligns with other Qwen MTP recipes (e.g. qwen3.5_fp8_mi355x_mtp.sh)
which apply the chat template during benchmark serving.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: qwen3.5-fp4-mi355x-sglang-mtp chat-template entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/benchmarks/single_node/qwen3.5_fp4_mi355x_mtp.sh b/benchmarks/single_node/qwen3.5_fp4_mi355x_mtp.sh
@@ -60,7 +60,8 @@ run_benchmark_serving \
     --num-prompts "$((CONC * 10))" \
     --max-concurrency "$CONC" \
     --result-filename "$RESULT_FILENAME" \
-    --result-dir /workspace/
+    --result-dir /workspace/ \
+    --use-chat-template
 
 # After throughput, run evaluation only if RUN_EVAL is true
 if [ "${RUN_EVAL}" = "true" ]; then
diff --git a/perf-changelog.yaml b/perf-changelog.yaml
@@ -3063,7 +3063,6 @@
     - "Remove SGLANG_OPT_FP8_WO_A_GEMM=0 workaround (topk_v2 crash fixed upstream in sgl-project/sglang#25805)"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1528
 
-
 - config-keys:
     - qwen3.5-fp4-b300-sglang
     - qwen3.5-fp4-b300-sglang-mtp
@@ -3100,3 +3099,9 @@
   description:
     - "Truncate sweep to conc=1 and conc=2 only: set conc-start=1, conc-end=2 in every search-space across all six DSR1 SGLang agg configs"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1534
+
+- config-keys:
+    - qwen3.5-fp4-mi355x-sglang-mtp
+  description:
+    - "Add --use-chat-template to run_benchmark_serving so prompts are formatted with the Qwen chat template (matching the other Qwen MTP recipes)"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1554