【draft】Mtp optimization by hiworldwzj · Pull Request #1266 · ModelTC/LightLLM

hiworldwzj · 2026-04-09T01:31:47Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces Multi-Token Prediction (MTP) support, including a new 'eagle3' mode, dynamic MTP verification, and optimized Triton kernels for diverse attention. It also adds profiling capabilities and updates benchmark scripts. My feedback focuses on improving code quality by moving local imports to the top of files, replacing print statements with proper logging, removing dead/commented-out code, and optimizing weight loading.

gemini-code-assist · 2026-04-09T01:38:56Z

+            from lightllm.utils.envs_utils import get_env_start_args
+            args_mtp_step = get_env_start_args().mtp_step


Avoid importing inside a function. Move this import to the top of the file to improve readability and maintainability.

gemini-code-assist · 2026-04-09T01:38:56Z

+            if os.path.exists(os.path.join(draft_model_path[0], "pytorch_model.bin")):
+                self.draft_model_weight_dict = torch.load(os.path.join(draft_model_path[0], "pytorch_model.bin"))
+                self.hidden_proj_weight = self.draft_model_weight_dict["fc.weight"].to(torch.bfloat16).to("cuda")
+                del self.draft_model_weight_dict
+                gc.collect()


The use of torch.load with pytorch_model.bin can be slow and memory-intensive. Consider using safetensors for faster and safer model weight loading if possible.

gemini-code-assist · 2026-04-09T01:38:56Z

+                # infer_state.b_mark_shared_group = F.pad(
+                #     infer_state.b_mark_shared_group,
+                #     (0, infer_state.input_ids.shape[0] - infer_state.b_mark_shared_group.shape[0]),
+                #     mode="constant",
+                #     value=0,
+                # )


The commented-out code block should be removed if it is no longer needed, or uncommented if it is intended to be part of the logic. Leaving dead code reduces maintainability.

gemini-code-assist · 2026-04-09T01:38:56Z

+                    try:
+                        attr_.copy_(attr_value, non_blocking=True)
+                    except Exception as e:
+                        print(f"Warning: copy tensor {attr_name} failed during cuda graph copy, error: {e}")


Using print for error logging is not recommended in production code. Use the project's logger to ensure errors are captured in the standard logging infrastructure.
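
A minimal sketch of the suggested change, using the standard-library `logging` module as a stand-in for the project's logger. The helper `copy_attrs` and the list-based buffers are hypothetical; they only imitate the `attr_.copy_(attr_value, non_blocking=True)` call in the diff so the example runs without a GPU.

```python
import logging

# Stand-in for the project's logger; the real code would use the project's own logging helper.
logger = logging.getLogger(__name__)


def copy_attrs(dst, src):
    """Copy values from src into matching buffers in dst; return the names that failed."""
    failed = []
    for name, value in src.items():
        try:
            # Stand-in for attr_.copy_(attr_value, non_blocking=True).
            dst[name][:] = value
        except Exception:
            # exc_info=True records the traceback instead of only the message.
            logger.warning("copy tensor %s failed during cuda graph copy", name, exc_info=True)
            failed.append(name)
    return failed
```

Returning the failed names lets the caller decide whether a failed copy should abort graph replay, rather than silently continuing as the `print` version does.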

gemini-code-assist · 2026-04-09T01:38:56Z

+                # # 1. Split the original indexes into groups according to the current group_sizes
+                # # group_sizes here should be the size of each group before processing
+                # chunks = torch.split(draft_model_input.mem_indexes, mtp_group_sizes)
+                # # 2. Process each chunk: drop the first element ([:, 1:]) and append the matching eagle_mem_indexes_i element
+                # # Assume eagle_mem_indexes_i has shape (num_groups,)
+                # new_chunks = []
+                # for i, chunk in enumerate(chunks):
+                #     # chunk[1:] mimics the original [:, 1:] operation
+                #     # eagle_mem_indexes_i[i:i+1] yields a length-1 tensor for concatenation
+                #     updated_chunk = torch.cat([chunk[1:], eagle_mem_indexes_i[i:i+1]], dim=0)
+                #     new_chunks.append(updated_chunk)
+                # # 3. Merge the chunks back into a 1-D tensor
+                # draft_model_input.mem_indexes = torch.cat(new_chunks, dim=0)


The commented-out code block should be removed to keep the codebase clean and maintainable.

hiworldwzj and others added 16 commits April 9, 2026 01:22

fix

6cf58b4

add gitignore

f1251a3

finish usable mtp kernel

d9b1fdd

end-to-end finish

315366a

fix cudagraph support

73ea125

save runnable version of dynamic mtp

dc91e59

save runnable version of dynamic mtp

aefe67e

fix

3c28fb0

save fixed dynamic mtp

2dc933e

save

0b08de8

save

952ec15

add experiment script

3750118

update mtp kernel support BLOCK_BATCH < max_verify_group_size

4bf4287

fix implementation issues

05e0dfd

save

c2b7569

save

80219af

gemini-code-assist Bot reviewed Apr 9, 2026

hiworldwzj added 12 commits April 9, 2026 02:15

fix

41180d3

fix

8afd7a8

fix

2b277fa

fix

93c2ada

fix

6395447

fix

fc20624

fix

1b08d15

fix

775adbd

fix

d4830ff

fix

da944f8

fix

22c5996

fix

6fbe8d8

hiworldwzj force-pushed the mtp_optimization branch from 3a98a34 to 6fbe8d8 on April 9, 2026 05:09

flyinglandlord and others added 30 commits April 9, 2026 13:17

fix lightllm/server/router/model_infer/mode_backend/generic_pre_process.py

c4a9f74

update generic_pre_process.py

5eb4889

fix

379f256

fix

e943a43

refactor qwen3_eagle3

e121b9d

add stage1

1723230

fix

6890bc0

fix

1e1fb98

fix

29535b2

fix

5b925a6

fix

538200d

fix

2831c70

fix

158c7a3

fix

4c08120

fix

a837cbb

fix

17ed333

fix

7927e15

fix base_backend.py

53a7077

fix

322f713

fix

c3e46c9

fix

2f0c250

fix

8ab2ab4

fix

76ca4ce

fix

1d0f18e

fix

6e69701

fix

1565698

fix

5e857c8

fix

3272073

fix

1126013

fix

f591158

hiworldwzj wants to merge 58 commits into main from mtp_optimization

2 participants
