Contributor
Code Review
This pull request introduces Multi-Token Prediction (MTP) support, including a new 'eagle3' mode, dynamic MTP verification, and optimized Triton kernels for diverse attention. It also adds profiling capabilities and updates benchmark scripts. My feedback focuses on improving code quality by moving local imports to the top of files, replacing print statements with proper logging, removing dead/commented-out code, and optimizing weight loading.
Comment on lines +116 to +117
    from lightllm.utils.envs_utils import get_env_start_args
    args_mtp_step = get_env_start_args().mtp_step
Contributor
Comment on lines +156 to +160
    if os.path.exists(os.path.join(draft_model_path[0], "pytorch_model.bin")):
        self.draft_model_weight_dict = torch.load(os.path.join(draft_model_path[0], "pytorch_model.bin"))
        self.hidden_proj_weight = self.draft_model_weight_dict["fc.weight"].to(torch.bfloat16).to("cuda")
        del self.draft_model_weight_dict
        gc.collect()
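One possible way to tighten the weight loading above, offered as a sketch rather than the project's actual fix: build the checkpoint path once, load onto the CPU, keep the state dict in a local variable so it is released when the function returns, and move only `fc.weight` to the target device and dtype in a single `.to()` call. The helper name `load_hidden_proj_weight` and its parameters are hypothetical; `weights_only=True` assumes a reasonably recent PyTorch release.

```python
import os

import torch


def load_hidden_proj_weight(model_dir, device="cuda", dtype=torch.bfloat16):
    """Return fc.weight from pytorch_model.bin, or None if the file is missing.

    Hypothetical helper: the name and signature are illustrative only.
    """
    ckpt_path = os.path.join(model_dir, "pytorch_model.bin")
    if not os.path.exists(ckpt_path):
        return None

    # Load onto the CPU so the full checkpoint never occupies GPU memory.
    # weights_only=True restricts unpickling to tensors and plain containers.
    state_dict = torch.load(ckpt_path, map_location="cpu", weights_only=True)

    # Convert dtype and move to the target device in one call.
    # state_dict is a local, so it is dropped when the function returns;
    # no attribute assignment, explicit del, or gc.collect() is needed.
    return state_dict["fc.weight"].to(device=device, dtype=dtype)
```

With such a helper, the call site would reduce to something like `self.hidden_proj_weight = load_hidden_proj_weight(draft_model_path[0])`.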
Contributor
Comment on lines +357 to +362
    # infer_state.b_mark_shared_group = F.pad(
    #     infer_state.b_mark_shared_group,
    #     (0, infer_state.input_ids.shape[0] - infer_state.b_mark_shared_group.shape[0]),
    #     mode="constant",
    #     value=0,
    # )
Contributor
Comment on lines +141 to +144
    try:
        attr_.copy_(attr_value, non_blocking=True)
    except Exception as e:
        print(f"Warning: copy tensor {attr_name} failed during cuda graph copy, error: {e}")
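A minimal sketch of replacing the `print` call with a logger, using only the standard-library `logging` module. The function name `copy_attr_for_cuda_graph` and its boolean return value are assumptions made for illustration; the project may already provide its own logger helper, which would be preferable to `logging.getLogger` here.

```python
import logging

logger = logging.getLogger(__name__)


def copy_attr_for_cuda_graph(dst, src, attr_name):
    """Copy src into dst in place and report whether the copy succeeded.

    Hypothetical wrapper around the try/except shown above; dst is expected
    to expose a tensor-like copy_(src, non_blocking=...) method.
    """
    try:
        dst.copy_(src, non_blocking=True)
        return True
    except Exception:
        # exc_info=True attaches the traceback, so the failure can be
        # diagnosed from the log instead of a bare stdout message.
        logger.warning(
            "copy tensor %s failed during cuda graph copy",
            attr_name,
            exc_info=True,
        )
        return False
```

Catching a narrower exception type than `Exception` would be stricter still, but which errors `copy_` can raise in this code path is not clear from the snippet, so the broad catch is kept.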
Contributor
Comment on lines +619 to +631
    # # 1. Split the original indexes into groups according to the current group_sizes
    # # group_sizes here should match the size of each group before processing
    # chunks = torch.split(draft_model_input.mem_indexes, mtp_group_sizes)
    # # 2. Process each chunk: drop the first element ([:, 1:]) and append the matching eagle_mem_indexes_i element
    # # Assume eagle_mem_indexes_i has shape (num_groups,)
    # new_chunks = []
    # for i, chunk in enumerate(chunks):
    #     # chunk[1:] mimics the original [:, 1:] operation
    #     # eagle_mem_indexes_i[i:i+1] yields a length-1 tensor for concatenation
    #     updated_chunk = torch.cat([chunk[1:], eagle_mem_indexes_i[i:i+1]], dim=0)
    #     new_chunks.append(updated_chunk)
    # # 3. Merge back into a 1-D tensor
    # draft_model_input.mem_indexes = torch.cat(new_chunks, dim=0)
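If the commented-out logic above is ever needed again rather than deleted, the per-group loop can be written without Python-level iteration. The sketch below is only an illustration of that idea: the function name `shift_groups_and_append` is hypothetical, and it assumes 1-D tensors and that every group size is at least 1.

```python
import torch


def shift_groups_and_append(mem_indexes, group_sizes, eagle_mem_indexes_i):
    """Drop the first element of each group and append one new element per group.

    mem_indexes:         1-D tensor, groups laid out back to back.
    group_sizes:         list of ints (each >= 1) summing to len(mem_indexes).
    eagle_mem_indexes_i: 1-D tensor of shape (num_groups,).
    """
    # Shifting left by one moves every element into its predecessor's slot.
    shifted = torch.roll(mem_indexes, shifts=-1)

    # The last slot of each group now holds a value from the next group
    # (or the wrapped-around first value); overwrite it with the new index.
    group_ends = torch.tensor(group_sizes, device=mem_indexes.device).cumsum(0) - 1
    shifted[group_ends] = eagle_mem_indexes_i.to(shifted.dtype)
    return shifted


mem = torch.tensor([10, 11, 12, 20, 21, 30])
new_mem = shift_groups_and_append(mem, [3, 2, 1], torch.tensor([100, 200, 300]))
# new_mem is tensor([11, 12, 100, 21, 200, 300])
```

The result has the same length as the input, matching what the `torch.split` / `torch.cat` loop would produce for the same groups.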
Contributor