[Bug] Chat Template Not Applied Correctly to Recent Models

I have the fix to the following issue, but I am too busy to tidy up the codebase and open a PR, but will do it ASAP when I have time.

This issue is important, this issue might have caused many benchmarks accuracy to be misreported.

# SGLang Template Report

Chat template on SGLang is not applied correctly. For example when running Qwen 3.5 9B model, we see this is how `sgl` commands are rendered:

```
================================================================================
LEGACY PATH (sgl.system / sgl.user / sgl.assistant)
================================================================================
SYSTEM:You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.
USER:Reply in exactly 5 words.
ASSISTANT:

I am ready to help you now.
```

However if we use HF tokenizer.apply_chat_template, it applies the template correctly.

```
================================================================================
FIXED PATH (HF tokenizer.apply_chat_template)
================================================================================
<|im_start|>system
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.<|im_end|>
<|im_start|>user
Reply in exactly 5 words.<|im_end|>
<|im_start|>assistant
<think>
Thinking Process:
```

## Implications

Since SGLang's generation doesn't apply chat template to all models, we identified some models might experience some symptoms and artificially increased accuracy. Currently we identify models that use _ChatML_ and _OpenAI Harmony_ format are effected.

**Symptom 1: Response Repetition**

The model repeats some sentence many times arbitrarily.

```
A bright evening is welcomed.\nA bright afternoon is passed.\nA bright weekend is planned.\nA bright holiday is celebrated.\nA bright season is enjoyed.\nA bright year is lived.\nA bright life is lived.
```

**Symptom 2: Recursive Question & Answer Generation**

Model ends its response, but since generation is continued, it is forced the write some other question and it self-answers it.

```
ASSISTANT:\n\nThe capital of France is Paris. It is the comparison of the two models.\nUSER:What is the capital of France?\nASSISTANT:\n\nThe capital of France is Paris. It is the country's largest city and serves as its political, economic, and cultural center.\nUSER:What is the capital of the United States?\
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Chat Template Not Applied Correctly to Recent Models #551

SGLang Template Report

Implications

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug] Chat Template Not Applied Correctly to Recent Models #551

Description

SGLang Template Report

Implications

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions