[Bugfix][Reasoning] Strip grouped think markers from streaming deltas#40348
[Bugfix][Reasoning] Strip grouped think markers from streaming deltas#40348wuyingjun-lucky wants to merge 2 commits into
Conversation
Signed-off-by: wuyingjun <wuyingjun_yewu@cmss.chinamobile.com>
There was a problem hiding this comment.
Code Review
This pull request addresses an issue where reasoning start markers (e.g., ) could leak into the output when they are grouped with reasoning text in a single streaming delta. It adds comprehensive test cases for both the base and Kimi K2 reasoning parsers and updates the streaming extraction logic to strip the start token. Review feedback highlights that using removeprefix is too restrictive because the start token might not always appear at the very beginning of a delta (for instance, if preceded by whitespace). It is recommended to use more robust string operations like partition or replace to ensure the tag is correctly removed regardless of its position within the delta.
| return DeltaMessage(reasoning=delta_text) | ||
| # reasoning content continues. Strip the leading start token when | ||
| # it is grouped with reasoning text in the same delta. | ||
| return DeltaMessage(reasoning=delta_text.removeprefix(self.start_token)) |
There was a problem hiding this comment.
The use of removeprefix is too restrictive for streaming deltas. If the model emits a delta where the start token is preceded by other characters (such as a space or a newline), removeprefix will fail to strip the tag, causing it to leak into the reasoning output. To maintain consistency with the non-streaming extract_reasoning implementation (lines 169-172) which discards the prefix and the tag, you should strip the tag regardless of its position in the delta. Using partition is a more robust approach here.
| return DeltaMessage(reasoning=delta_text.removeprefix(self.start_token)) | |
| return DeltaMessage(reasoning=delta_text.partition(self.start_token)[2]) |
There was a problem hiding this comment.
Updated in 30d3e81. The base parser now uses partition-based stripping so grouped start tags are removed even if they are prefixed in the delta, and I added a regression test covering that case.
| delta_reasoning = ( | ||
| delta_text.removeprefix(self._start_token) | ||
| if ( | ||
| self._start_token_id in delta_token_ids | ||
| and self._start_token_id not in previous_token_ids | ||
| ) | ||
| else delta_text | ||
| ) |
There was a problem hiding this comment.
Using removeprefix here is insufficient if the <think> tag appears in the middle of a streaming delta (e.g., after a leading space). If the tag is not at the very beginning of delta_text, it won't be removed, and will subsequently be included in the reasoning content or interfere with the end-token detection logic at line 225. Since Kimi supports implicit reasoning (where text before the tag is also reasoning), you should specifically remove the tag itself while preserving any preceding text in the delta.
| delta_reasoning = ( | |
| delta_text.removeprefix(self._start_token) | |
| if ( | |
| self._start_token_id in delta_token_ids | |
| and self._start_token_id not in previous_token_ids | |
| ) | |
| else delta_text | |
| ) | |
| delta_reasoning = ( | |
| delta_text.replace(self._start_token, "", 1) | |
| if ( | |
| self._start_token_id in delta_token_ids | |
| and self._start_token_id not in previous_token_ids | |
| ) | |
| else delta_text | |
| ) |
There was a problem hiding this comment.
Updated in 30d3e81. Kimi now removes only the first marker via replace(..., 1), preserving any implicit reasoning prefix before the tag, and there is a regression test for the prefixed same-delta case.
Signed-off-by: wuyingjun <wuyingjun_yewu@cmss.chinamobile.com>
|
Addressed the latest review feedback in 30d3e81 by handling prefixed grouped
The remaining failing check is |
|
This pull request has merge conflicts that must be resolved before it can be |
Summary
Fix streaming reasoning parsing when the thinking start marker is grouped into the same delta as reasoning text, including prefixed same-delta cases.
BaseThinkingReasoningParserbefore emitting reasoning text, matching the non-streamingpartitionbehavior.KimiK2ReasoningParserbefore splitting on</think>or<|tool_calls_section_begin|>, while preserving any implicit reasoning prefix before<think>.Why this is not a duplicate
<think>and reasoning text are grouped into one streaming delta.Test plan
/home/wuyingjun/llm/vllm/.venv/bin/python -m pytest -q tests/reasoning/test_base_thinking_reasoning_parser.py tests/reasoning/test_kimi_k2_reasoning_parser.py37 passed/home/wuyingjun/llm/vllm/.venv/bin/pre-commit run --files vllm/reasoning/basic_parsers.py vllm/reasoning/kimi_k2_reasoning_parser.py tests/reasoning/test_base_thinking_reasoning_parser.py tests/reasoning/test_kimi_k2_reasoning_parser.py/home/wuyingjun/code/model/Qwen3-0.6B1 + 1 = 2.AI assistance
This PR was prepared with AI assistance. I reviewed the diff and ran the validation steps above.