[Bug] Chat Template Not Applied Correctly to Recent Models #551

@Dogacel

I have a fix for the following issue, but I have not yet had time to tidy up the code and open a PR. I will do so as soon as I can.

This issue is important: it may have caused benchmark accuracy to be misreported for many models.

SGLang Template Report

SGLang does not apply the chat template correctly. For example, when running the Qwen 3.5 9B model, this is how the sgl commands are rendered:

================================================================================
LEGACY PATH (sgl.system / sgl.user / sgl.assistant)
================================================================================
SYSTEM:You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.
USER:Reply in exactly 5 words.
ASSISTANT:

I am ready to help you now.

However, if we use HF tokenizer.apply_chat_template, the template is applied correctly.

================================================================================
FIXED PATH (HF tokenizer.apply_chat_template)
================================================================================
<|im_start|>system
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.<|im_end|>
<|im_start|>user
Reply in exactly 5 words.<|im_end|>
<|im_start|>assistant
<think>
Thinking Process:
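
To make the difference between the two paths concrete, here is a minimal pure-Python sketch of both renderings. It is not SGLang's code or the model's actual template: `render_legacy` and `render_chatml` are hypothetical helpers written for illustration, and the ChatML layout is hand-written to match the output shown above (without any `<think>` handling). In practice the correct prompt would come from the model's own template, for example via `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`.

```python
from typing import Dict, List

Message = Dict[str, str]


def render_legacy(messages: List[Message]) -> str:
    """Role-prefixed rendering, resembling the legacy output in this report (illustrative)."""
    lines = [f"{m['role'].upper()}:{m['content']}" for m in messages]
    lines.append("ASSISTANT:")  # open the assistant turn with a bare role prefix
    return "\n".join(lines)


def render_chatml(messages: List[Message]) -> str:
    """Hand-written ChatML rendering (assumption: mirrors the fixed output above)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # generation prompt for the assistant turn
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Reply in exactly 5 words."},
]

legacy_prompt = render_legacy(messages)
chatml_prompt = render_chatml(messages)

print(legacy_prompt)
print(chatml_prompt)
```

The legacy prompt contains none of the `<|im_start|>` / `<|im_end|>` special tokens, so a ChatML-trained model sees a prompt format it was not trained on and has no end-of-turn marker to emit.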

Implications

Since SGLang's generation does not apply the chat template to all models, some models may exhibit the symptoms below and report artificially increased accuracy. So far, we have identified that models using the ChatML and OpenAI Harmony formats are affected.

Symptom 1: Response Repetition

The model arbitrarily repeats a sentence many times.

A bright evening is welcomed.\nA bright afternoon is passed.\nA bright weekend is planned.\nA bright holiday is celebrated.\nA bright season is enjoyed.\nA bright year is lived.\nA bright life is lived.

Symptom 2: Recursive Question & Answer Generation

The model ends its response, but because generation continues, it is forced to write another question and then answers it itself.

ASSISTANT:\n\nThe capital of France is Paris. It is the comparison of the two models.\nUSER:What is the capital of France?\nASSISTANT:\n\nThe capital of France is Paris. It is the country's largest city and serves as its political, economic, and cultural center.\nUSER:What is the capital of the United States?\
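
One way to spot this symptom in existing outputs is to scan each completion for role prefixes that should only ever appear in the prompt. The sketch below is a rough heuristic, not part of the fix: `ROLE_MARKER`, `first_leaked_turn`, and `truncate_at_leak` are hypothetical names, and the marker list assumes the `SYSTEM:` / `USER:` / `ASSISTANT:` prefixes from the legacy rendering.

```python
import re
from typing import Optional

# Assumption: role prefixes as they appear in the legacy rendering, at the start of a line.
ROLE_MARKER = re.compile(r"^(SYSTEM|USER|ASSISTANT):", re.MULTILINE)


def first_leaked_turn(completion: str) -> Optional[int]:
    """Return the index of the first role prefix inside a completion, or None."""
    match = ROLE_MARKER.search(completion)
    return match.start() if match else None


def truncate_at_leak(completion: str) -> str:
    """Keep only the text generated before the model started a new turn."""
    idx = first_leaked_turn(completion)
    return completion if idx is None else completion[:idx].rstrip()


# Completion text adapted from the example above (everything after the first "ASSISTANT:").
sample = (
    "\n\nThe capital of France is Paris. It is the comparison of the two models."
    "\nUSER:What is the capital of France?"
    "\nASSISTANT:\n\nThe capital of France is Paris."
)

print(first_leaked_turn(sample) is not None)  # a leaked turn was found
print(truncate_at_leak(sample))
```

Truncating this way only hides the symptom when re-scoring old outputs; the prompt the model saw was still malformed, so affected benchmark runs would need to be regenerated with the correct template.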
