[Bugfix][Reasoning] Strip grouped think markers from streaming deltas by wuyingjun-lucky · Pull Request #40348 · vllm-project/vllm

wuyingjun-lucky · 2026-04-20T11:39:18Z

Summary

Fix streaming reasoning parsing when the thinking start marker is grouped into the same delta as reasoning text, including prefixed same-delta cases.

Strip grouped start markers in BaseThinkingReasoningParser before emitting reasoning text, matching the non-streaming partition behavior.
Apply the same normalization in KimiK2ReasoningParser before splitting on </think> or <|tool_calls_section_begin|>, while preserving any implicit reasoning prefix before <think>.
Add regression tests for grouped start-token streaming cases, including prefixed deltas and Kimi same-delta end/content and tool-section transitions.
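For illustration, the grouped-delta leak described above can be reproduced with a toy streaming splitter. This is a minimal sketch under stated assumptions, not the vLLM implementation: stream_reasoning, its strip_start flag, and the sample deltas are hypothetical, and the sketch works on text only, whereas the real parsers also inspect token ids.

```python
START, END = "<think>", "</think>"


def stream_reasoning(deltas, strip_start):
    """Toy splitter: accumulate (reasoning, content) over streamed text deltas.

    Hypothetical helper for illustration only; it does not mirror vLLM's API.
    """
    reasoning, content = [], []
    in_reasoning = True  # assumption: the stream begins in reasoning mode
    for delta in deltas:
        if not in_reasoning:
            content.append(delta)
            continue
        if strip_start and START in delta:
            # Drop the start marker (and anything before it) from this delta.
            delta = delta.partition(START)[2]
        if END in delta:
            # The end marker closes reasoning; the remainder is final content.
            before, _, after = delta.partition(END)
            reasoning.append(before)
            content.append(after)
            in_reasoning = False
        else:
            reasoning.append(delta)
    return "".join(reasoning), "".join(content)


# The first delta groups the start marker with reasoning text.
deltas = ["<think>Let me add", " the numbers.", "</think>1 + 1 = 2"]
leaky = stream_reasoning(deltas, strip_start=False)
fixed = stream_reasoning(deltas, strip_start=True)
print(leaky)
print(fixed)
```

With strip_start=False the marker survives inside the emitted reasoning text, which is the leak this PR targets; with strip_start=True only the reasoning text is emitted.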

Why this is not a duplicate

Not duplicating fix(reasoning): prevent streaming end-token desync in base and other parsers #39044. That PR fixes end-token / buffered stop-sequence desync; this PR fixes start-token leakage when <think> and reasoning text are grouped into one streaming delta.
Not duplicating [Bugfix][Tool Parser] Fix Kimi-K2.5 parser accuracy, buffer limits, and token leaks #37384. That PR rewrites the Kimi-K2.5 tool parser; this change is in the reasoning parsers and covers grouped start markers before tool-section routing.

Test plan

/home/wuyingjun/llm/vllm/.venv/bin/python -m pytest -q tests/reasoning/test_base_thinking_reasoning_parser.py tests/reasoning/test_kimi_k2_reasoning_parser.py
- Result: 37 passed
/home/wuyingjun/llm/vllm/.venv/bin/pre-commit run --files vllm/reasoning/basic_parsers.py vllm/reasoning/kimi_k2_reasoning_parser.py tests/reasoning/test_base_thinking_reasoning_parser.py tests/reasoning/test_kimi_k2_reasoning_parser.py
- Result: passed
Local GPU OpenAI server smoke test using /home/wuyingjun/code/model/Qwen3-0.6B
- Result: server started successfully, emitted streaming reasoning chunks, and returned final content 1 + 1 = 2.

AI assistance

This PR was prepared with AI assistance. I reviewed the diff and ran the validation steps above.

Signed-off-by: wuyingjun <wuyingjun_yewu@cmss.chinamobile.com>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request addresses an issue where reasoning start markers (e.g., <think>) could leak into the output when they are grouped with reasoning text in a single streaming delta. It adds comprehensive test cases for both the base and Kimi K2 reasoning parsers and updates the streaming extraction logic to strip the start token. Review feedback highlights that using removeprefix is too restrictive because the start token might not always appear at the very beginning of a delta (for instance, if preceded by whitespace). It is recommended to use more robust string operations like partition or replace to ensure the tag is correctly removed regardless of its position within the delta.

gemini-code-assist · 2026-04-20T11:41:01Z

-                return DeltaMessage(reasoning=delta_text)
+                # reasoning content continues. Strip the leading start token when
+                # it is grouped with reasoning text in the same delta.
+                return DeltaMessage(reasoning=delta_text.removeprefix(self.start_token))


The use of removeprefix is too restrictive for streaming deltas. If the model emits a delta where the start token is preceded by other characters (such as a space or a newline), removeprefix will fail to strip the tag, causing it to leak into the reasoning output. To maintain consistency with the non-streaming extract_reasoning implementation (lines 169-172) which discards the prefix and the tag, you should strip the tag regardless of its position in the delta. Using partition is a more robust approach here.

Suggested change

-                return DeltaMessage(reasoning=delta_text.removeprefix(self.start_token))
+                return DeltaMessage(reasoning=delta_text.partition(self.start_token)[2])

Updated in 30d3e81. The base parser now uses partition-based stripping so grouped start tags are removed even if they are prefixed in the delta, and I added a regression test covering that case.
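The difference between the two string operations discussed in this thread can be checked in isolation. The sketch below uses only built-in str methods; the helper names and sample deltas are hypothetical. One detail worth noting: str.partition returns the whole string as its first element when the separator is absent, so indexing [2] yields an empty string in that case. The sketch guards against that; whether the real code path needs such a guard depends on the surrounding conditions, which are not shown in this excerpt.

```python
START = "<think>"


def strip_with_removeprefix(delta: str) -> str:
    # Only strips the marker when it is the very first thing in the delta.
    return delta.removeprefix(START)  # requires Python 3.9+


def strip_with_partition(delta: str) -> str:
    # Discards the marker and everything before it, wherever the marker sits.
    before, sep, after = delta.partition(START)
    # If the marker is absent, sep is "" and the delta is returned unchanged.
    return after if sep else delta


leading = "<think>step one"
prefixed = "\n<think>step one"

print(repr(strip_with_removeprefix(leading)))
print(repr(strip_with_removeprefix(prefixed)))
print(repr(strip_with_partition(prefixed)))
```

For the prefixed delta, removeprefix leaves the marker in place while partition removes both the newline and the marker, which matches the reviewer's description of the non-streaming behavior discarding the prefix and the tag.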

gemini-code-assist · 2026-04-20T11:41:01Z

+        delta_reasoning = (
+            delta_text.removeprefix(self._start_token)
+            if (
+                self._start_token_id in delta_token_ids
+                and self._start_token_id not in previous_token_ids
+            )
+            else delta_text
+        )


Using removeprefix here is insufficient if the <think> tag appears in the middle of a streaming delta (e.g., after a leading space). If the tag is not at the very beginning of delta_text, it won't be removed, and will subsequently be included in the reasoning content or interfere with the end-token detection logic at line 225. Since Kimi supports implicit reasoning (where text before the tag is also reasoning), you should specifically remove the tag itself while preserving any preceding text in the delta.

Suggested change

-        delta_reasoning = (
-            delta_text.removeprefix(self._start_token)
-            if (
-                self._start_token_id in delta_token_ids
-                and self._start_token_id not in previous_token_ids
-            )
-            else delta_text
-        )
+        delta_reasoning = (
+            delta_text.replace(self._start_token, "", 1)
+            if (
+                self._start_token_id in delta_token_ids
+                and self._start_token_id not in previous_token_ids
+            )
+            else delta_text
+        )

Updated in 30d3e81. Kimi now removes only the first marker via replace(..., 1), preserving any implicit reasoning prefix before the tag, and there is a regression test for the prefixed same-delta case.
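To see why replace(..., 1) suits the implicit-reasoning case better than partition, consider a text-only sketch. The functions normalize_delta and split_reasoning are hypothetical and do not reflect KimiK2ReasoningParser's actual structure. In particular, keeping the tool-section marker in the remainder is an assumption made here so that a downstream tool parser could still see it.

```python
START = "<think>"
END = "</think>"
TOOL_SECTION = "<|tool_calls_section_begin|>"


def normalize_delta(delta: str) -> str:
    # Remove only the first start marker; text before it is kept as reasoning.
    return delta.replace(START, "", 1)


def split_reasoning(delta: str):
    """Return (reasoning, rest), splitting at the earliest terminating marker."""
    text = normalize_delta(delta)
    hits = [(text.find(m), m) for m in (END, TOOL_SECTION) if m in text]
    if not hits:
        return text, ""
    idx, marker = min(hits)
    if marker == END:
        rest = text[idx + len(marker):]  # drop the end marker itself
    else:
        rest = text[idx:]  # assumption: keep the tool marker for later routing
    return text[:idx], rest


prefixed = "plan: <think>add</think>2"
tool = "<think>call tool<|tool_calls_section_begin|>x"

print(split_reasoning(prefixed))
print(split_reasoning(tool))
# partition would discard the implicit reasoning prefix "plan: ".
print(repr(prefixed.partition(START)[2]))
```

The last line shows the trade-off: partition drops the text before the marker, which fits the base parser, while replace with a count of 1 keeps it, which fits a parser that treats pre-marker text as reasoning.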

Signed-off-by: wuyingjun <wuyingjun_yewu@cmss.chinamobile.com>

wuyingjun-lucky · 2026-04-20T11:46:47Z

Addressed the latest review feedback in 30d3e81 by handling prefixed grouped <think> tags in both streaming parsers and adding regression coverage for those cases. Local validation passed:

/home/wuyingjun/llm/vllm/.venv/bin/python -m pytest -q tests/reasoning/test_base_thinking_reasoning_parser.py tests/reasoning/test_kimi_k2_reasoning_parser.py -> 37 passed
/home/wuyingjun/llm/vllm/.venv/bin/pre-commit run --files vllm/reasoning/basic_parsers.py vllm/reasoning/kimi_k2_reasoning_parser.py tests/reasoning/test_base_thinking_reasoning_parser.py tests/reasoning/test_kimi_k2_reasoning_parser.py -> passed

The remaining failing check is pre-run-check, which is waiting on the contributor gate: the PR needs a maintainer-applied ready or verified label before the rest of CI will run.

mergify · 2026-05-23T09:02:55Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @wuyingjun-lucky.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

[Bugfix][Reasoning] Strip grouped think markers from streaming deltas

a2485b5

Signed-off-by: wuyingjun <wuyingjun_yewu@cmss.chinamobile.com>

wuyingjun-lucky requested review from aarnphm, bbrowning, chaunceyjiang and sfeng33 as code owners April 20, 2026 11:39

claude Bot reviewed Apr 20, 2026

mergify Bot added the bug Something isn't working label Apr 20, 2026

gemini-code-assist Bot reviewed Apr 20, 2026

[Bugfix][Reasoning] Handle prefixed grouped think tags in streaming

30d3e81

Signed-off-by: wuyingjun <wuyingjun_yewu@cmss.chinamobile.com>

mergify Bot added the needs-rebase label May 23, 2026

[Bugfix][Reasoning] Strip grouped think markers from streaming deltas #40348
wuyingjun-lucky wants to merge 2 commits into vllm-project:main from wuyingjun-lucky:fix/kimi-k2-streaming-reasoning-start-token
