[Klaud Cold] qwen3.5-fp8-mi355x-atom-mtp: enable --use-chat-template (#1555)

functionstackx · claude · web-flow · commit d4948f993748 · 2026-05-22T16:43:57.000-04:00
* [Klaud Cold] qwen3.5-fp8-mi355x-atom-mtp: enable --use-chat-template

Aligns with the other Qwen MTP recipes, which apply the chat template
during benchmark serving.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: qwen3.5-fp8-mi355x-atom-mtp chat-template entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/benchmarks/single_node/qwen3.5_fp8_mi355x_atom_mtp.sh b/benchmarks/single_node/qwen3.5_fp8_mi355x_atom_mtp.sh
@@ -71,7 +71,8 @@ run_benchmark_serving \
     --max-concurrency "$CONC" \
     --result-filename "$RESULT_FILENAME" \
     --result-dir /workspace/ \
-    --trust-remote-code
+    --trust-remote-code \
+    --use-chat-template
 
 # After throughput, run evaluation only if RUN_EVAL is true
 if [ "${RUN_EVAL}" = "true" ]; then
diff --git a/perf-changelog.yaml b/perf-changelog.yaml
@@ -3123,3 +3123,9 @@
     - "Search-space: tp:8 ep:8 (TEP=8), conc-end 128 chosen at saturation per local sweep"
     - "Local bench: TEP=8 peaks at C=128 with 26923 tot tps (+178% vs TEP=4 peak at C=32 in May 6 j11600242 sweep)"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1516
+
+- config-keys:
+    - qwen3.5-fp8-mi355x-atom-mtp
+  description:
+    - "Add --use-chat-template to run_benchmark_serving so prompts are formatted with the Qwen chat template (matching the other Qwen MTP recipes)"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1555