Skip to content

Fix config utils#1357

Merged
hiworldwzj merged 2 commits into
mainfrom
fix_config_utils
Jun 16, 2026
Merged

Fix config utils#1357
hiworldwzj merged 2 commits into
mainfrom
fix_config_utils

Conversation

@blueswhen

Copy link
Copy Markdown
Collaborator

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request optimizes recurrent processing in Qwen3next by reading q/k/v/a/b column views directly via per-token strides to avoid contiguous copies, removes profiling-related arguments and endpoints, updates Qwen3.5 EOS token ID retrieval logic, and adds corresponding unit tests. Feedback on the changes highlights three critical issues: removing the cleanup of req.linear_att_len_to_big_page_id in infer_batch.py can cause resource leaks and router crashes; calling int() on tokenizer.eos_token_id in config_utils.py may raise a TypeError if it is a list; and removing the root node check in linear_att_radix_cache.py could trigger assertion failures during eviction when the tree is empty.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines 227 to 229
if shared_kv_len <= req.cur_kv_len:
free_token_index.append(self.req_manager.req_to_token_indexs[req.req_idx][shared_kv_len : req.cur_kv_len])
# 该分支不会把 prefill 阶段累积的 big page id 插入 radix cache(典型为 pause/abort
# 在 prefill 跨过 big page 边界后、到达末尾前触发),需在此显式释放,避免泄漏。

# 释放本请求 prefill 阶段在 big page 边界上申请、但尚未插入 radix cache 的 big page
# state buffer。仅当请求未走 insert 分支(小页/大页插入)就被释放时才会有残留,典型场景:
# big page 模式下请求在 prefill 跨过 big page 边界后、到达末尾前被 pause / abort。
# 若不释放,会泄漏 big page state slot,并触发 free_a_req_mem 中 dict 为空的断言。
if req.linear_att_len_to_big_page_id:
self.radix_cache.linear_att_big_page_buffers.free_state_cache(
list(req.linear_att_len_to_big_page_id.values())
)
req.linear_att_len_to_big_page_id.clear()

req.cur_kv_len = shared_kv_len

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Removing the code that clears req.linear_att_len_to_big_page_id in _linear_att_free_req will cause a resource leak of big page state slots if a request is aborted or paused during prefill. Furthermore, it will cause the assertion assert len(req.linear_att_len_to_big_page_id) == 0 at line 132 to fail, crashing the router process. We should restore the cleanup logic to free the big page state buffers.

Suggested change
if shared_kv_len <= req.cur_kv_len:
free_token_index.append(self.req_manager.req_to_token_indexs[req.req_idx][shared_kv_len : req.cur_kv_len])
# 该分支不会把 prefill 阶段累积的 big page id 插入 radix cache(典型为 pause/abort
# 在 prefill 跨过 big page 边界后、到达末尾前触发),需在此显式释放,避免泄漏。
# 释放本请求 prefill 阶段在 big page 边界上申请、但尚未插入 radix cache 的 big page
# state buffer。仅当请求未走 insert 分支(小页/大页插入)就被释放时才会有残留,典型场景:
# big page 模式下请求在 prefill 跨过 big page 边界后、到达末尾前被 pause / abort。
# 若不释放,会泄漏 big page state slot,并触发 free_a_req_mem 中 dict 为空的断言。
if req.linear_att_len_to_big_page_id:
self.radix_cache.linear_att_big_page_buffers.free_state_cache(
list(req.linear_att_len_to_big_page_id.values())
)
req.linear_att_len_to_big_page_id.clear()
req.cur_kv_len = shared_kv_len
if shared_kv_len <= req.cur_kv_len:
free_token_index.append(self.req_manager.req_to_token_indexs[req.req_idx][shared_kv_len : req.cur_kv_len])
if req.linear_att_len_to_big_page_id:
self.radix_cache.linear_att_big_page_buffers.free_state_cache(
list(req.linear_att_len_to_big_page_id.values())
)
req.linear_att_len_to_big_page_id.clear()
req.cur_kv_len = shared_kv_len

Comment on lines 251 to +252
if tokenizer.eos_token_id is not None:
return [int(tokenizer.eos_token_id)]
eos_token_ids.append(int(tokenizer.eos_token_id))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If tokenizer.eos_token_id is a list (which is common for tokenizers with multiple end-of-sequence tokens), calling int(tokenizer.eos_token_id) will raise a TypeError. Since this block is wrapped in a try...except Exception block, the error will be silently caught, and the function will fall back to the default behavior, bypassing the Qwen3.5-specific logic entirely. We should handle both int and list/iterable types for tokenizer.eos_token_id.

Suggested change
if tokenizer.eos_token_id is not None:
return [int(tokenizer.eos_token_id)]
eos_token_ids.append(int(tokenizer.eos_token_id))
if tokenizer.eos_token_id is not None:
if isinstance(tokenizer.eos_token_id, list):
eos_token_ids.extend(int(eid) for eid in tokenizer.eos_token_id)
else:
eos_token_ids.append(int(tokenizer.eos_token_id))

Comment on lines 166 to 167
if node.is_leaf():
self._evict_tree_set.add(node)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Removing the and node is not self.root_node check allows the root_node to be added to _evict_tree_set when the tree is empty (as the root node then satisfies is_leaf()). This can contradict assertions in the eviction logic (e.g., assert node is not self.root_node in _evict). Unless there is a specific reason to allow the root node in the eviction set, it is safer to keep this guard to prevent potential assertion failures.

Suggested change
if node.is_leaf():
self._evict_tree_set.add(node)
if node.is_leaf() and node is not self.root_node:
self._evict_tree_set.add(node)

@hiworldwzj hiworldwzj merged commit 9e4f552 into main Jun 16, 2026
1 check passed
@hiworldwzj hiworldwzj deleted the fix_config_utils branch June 16, 2026 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants