Skip to content

[Bugfix][Reasoning] Strip grouped think markers from streaming deltas#40348

Open
wuyingjun-lucky wants to merge 2 commits into
vllm-project:mainfrom
wuyingjun-lucky:fix/kimi-k2-streaming-reasoning-start-token
Open

[Bugfix][Reasoning] Strip grouped think markers from streaming deltas#40348
wuyingjun-lucky wants to merge 2 commits into
vllm-project:mainfrom
wuyingjun-lucky:fix/kimi-k2-streaming-reasoning-start-token

Conversation

@wuyingjun-lucky
Copy link
Copy Markdown
Contributor

@wuyingjun-lucky wuyingjun-lucky commented Apr 20, 2026

Summary

Fix streaming reasoning parsing when the thinking start marker is grouped into the same delta as reasoning text, including prefixed same-delta cases.

  • Strip grouped start markers in BaseThinkingReasoningParser before emitting reasoning text, matching the non-streaming partition behavior.
  • Apply the same normalization in KimiK2ReasoningParser before splitting on </think> or <|tool_calls_section_begin|>, while preserving any implicit reasoning prefix before <think>.
  • Add regression tests for grouped start-token streaming cases, including prefixed deltas and Kimi same-delta end/content and tool-section transitions.

Why this is not a duplicate

Test plan

  • /home/wuyingjun/llm/vllm/.venv/bin/python -m pytest -q tests/reasoning/test_base_thinking_reasoning_parser.py tests/reasoning/test_kimi_k2_reasoning_parser.py
    • Result: 37 passed
  • /home/wuyingjun/llm/vllm/.venv/bin/pre-commit run --files vllm/reasoning/basic_parsers.py vllm/reasoning/kimi_k2_reasoning_parser.py tests/reasoning/test_base_thinking_reasoning_parser.py tests/reasoning/test_kimi_k2_reasoning_parser.py
    • Result: passed
  • Local GPU OpenAI server smoke test using /home/wuyingjun/code/model/Qwen3-0.6B
    • Result: server started successfully, emitted streaming reasoning chunks, and returned final content 1 + 1 = 2.

AI assistance

This PR was prepared with AI assistance. I reviewed the diff and ran the validation steps above.

Signed-off-by: wuyingjun <wuyingjun_yewu@cmss.chinamobile.com>
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added the bug Something isn't working label Apr 20, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses an issue where reasoning start markers (e.g., ) could leak into the output when they are grouped with reasoning text in a single streaming delta. It adds comprehensive test cases for both the base and Kimi K2 reasoning parsers and updates the streaming extraction logic to strip the start token. Review feedback highlights that using removeprefix is too restrictive because the start token might not always appear at the very beginning of a delta (for instance, if preceded by whitespace). It is recommended to use more robust string operations like partition or replace to ensure the tag is correctly removed regardless of its position within the delta.

Comment thread vllm/reasoning/basic_parsers.py Outdated
return DeltaMessage(reasoning=delta_text)
# reasoning content continues. Strip the leading start token when
# it is grouped with reasoning text in the same delta.
return DeltaMessage(reasoning=delta_text.removeprefix(self.start_token))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The use of removeprefix is too restrictive for streaming deltas. If the model emits a delta where the start token is preceded by other characters (such as a space or a newline), removeprefix will fail to strip the tag, causing it to leak into the reasoning output. To maintain consistency with the non-streaming extract_reasoning implementation (lines 169-172) which discards the prefix and the tag, you should strip the tag regardless of its position in the delta. Using partition is a more robust approach here.

Suggested change
return DeltaMessage(reasoning=delta_text.removeprefix(self.start_token))
return DeltaMessage(reasoning=delta_text.partition(self.start_token)[2])

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 30d3e81. The base parser now uses partition-based stripping so grouped start tags are removed even if they are prefixed in the delta, and I added a regression test covering that case.

Comment on lines +215 to +222
delta_reasoning = (
delta_text.removeprefix(self._start_token)
if (
self._start_token_id in delta_token_ids
and self._start_token_id not in previous_token_ids
)
else delta_text
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using removeprefix here is insufficient if the <think> tag appears in the middle of a streaming delta (e.g., after a leading space). If the tag is not at the very beginning of delta_text, it won't be removed, and will subsequently be included in the reasoning content or interfere with the end-token detection logic at line 225. Since Kimi supports implicit reasoning (where text before the tag is also reasoning), you should specifically remove the tag itself while preserving any preceding text in the delta.

Suggested change
delta_reasoning = (
delta_text.removeprefix(self._start_token)
if (
self._start_token_id in delta_token_ids
and self._start_token_id not in previous_token_ids
)
else delta_text
)
delta_reasoning = (
delta_text.replace(self._start_token, "", 1)
if (
self._start_token_id in delta_token_ids
and self._start_token_id not in previous_token_ids
)
else delta_text
)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 30d3e81. Kimi now removes only the first marker via replace(..., 1), preserving any implicit reasoning prefix before the tag, and there is a regression test for the prefixed same-delta case.

Signed-off-by: wuyingjun <wuyingjun_yewu@cmss.chinamobile.com>
@wuyingjun-lucky
Copy link
Copy Markdown
Contributor Author

Addressed the latest review feedback in 30d3e81 by handling prefixed grouped <think> tags in both streaming parsers and adding regression coverage for those cases. Local validation passed:

  • /home/wuyingjun/llm/vllm/.venv/bin/python -m pytest -q tests/reasoning/test_base_thinking_reasoning_parser.py tests/reasoning/test_kimi_k2_reasoning_parser.py -> 37 passed
  • /home/wuyingjun/llm/vllm/.venv/bin/pre-commit run --files vllm/reasoning/basic_parsers.py vllm/reasoning/kimi_k2_reasoning_parser.py tests/reasoning/test_base_thinking_reasoning_parser.py tests/reasoning/test_kimi_k2_reasoning_parser.py -> passed

The remaining failing check is pre-run-check, which is waiting on the contributor gate: the PR needs a maintainer-applied ready or verified label before the rest of CI will run.

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 23, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @wuyingjun-lucky.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label May 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working needs-rebase

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant