feat: modify kwai template by LeonDiao0427 · Pull Request #9117 · modelscope/ms-swift

LeonDiao0427 · 2026-04-15T12:46:48Z

PR type

Bug Fix

PR information

Fix the KeyeVLTemplate implementation with the following changes:

Add RoPE position_ids computation via model.get_rope_index_slowfast()
Remove incorrect do_resize=False in image preprocessing

gemini-code-assist

Code Review

This pull request refactors the Kwai template to support slow-fast multimodal inputs and modularizes visual embedding logic. It also integrates RoPE index calculation into the post-encoding phase. Review feedback indicates that the early return in _post_encode may incorrectly bypass necessary embedding and position ID logic during inference. A suggestion was also made to use a more robust method for calculating dimension products to avoid floating-point conversions.

gemini-code-assist · 2026-04-15T12:48:39Z

        if not self.is_training:
            return inputs


The early return if not self.is_training prevents the computation of inputs_embeds and the new position_ids (via get_rope_index_slowfast) during inference when using the transformers backend. For multimodal models like KeyeVL, replacing token embeddings with visual embeddings and using the correct RoPE indices is essential for correct inference results. Since this PR aims to fix the RoPE implementation, this logic should also be applied during inference.

gemini-code-assist · 2026-04-15T12:48:39Z

+        cu_seqlens = [0]
+        for idx, thw_tuple in enumerate(grid_hws):
+            numel = int(np.prod(thw_tuple))
+            media_position_ids = torch.arange(numel, device=device) % int(np.prod(thw_tuple[1:]))


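For integer inputs, np.prod on a slice of a tuple returns a NumPy integer scalar rather than a Python int (and a float 1.0 if the slice is empty), which is why the result is cast with int here. math.prod (available in Python 3.8+) is generally more idiomatic for calculating the product of dimensions in a shape tuple and returns a plain Python int directly. The suggested change below uses torch.prod instead, which also stays in integer arithmetic.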

Suggested change

media_position_ids = torch.arange(numel, device=device) % int(np.prod(thw_tuple[1:]))

media_position_ids = torch.arange(numel, device=device) % int(torch.prod(torch.tensor(thw_tuple[1:])))
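The math.prod alternative mentioned in the review comment can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes each entry of grid_hws is a tuple of plain Python ints in (t, h, w) order, and the helper name media_position_ids_for is hypothetical. A list comprehension stands in for torch.arange(numel) % per_frame so the snippet runs without torch.

```python
import math


def media_position_ids_for(thw_tuple):
    """Index of each patch within its own frame for one (t, h, w) grid.

    Hypothetical helper; assumes thw_tuple holds plain Python ints.
    """
    numel = math.prod(thw_tuple)          # total patches across all frames
    per_frame = math.prod(thw_tuple[1:])  # patches in one frame (h * w)
    # Pure-Python stand-in for: torch.arange(numel) % per_frame
    return [i % per_frame for i in range(numel)]


ids = media_position_ids_for((2, 2, 3))
print(ids)  # [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]
print(type(math.prod((2, 3))))  # <class 'int'>
```

With math.prod no int(...) cast is needed, because the product of Python ints is already a Python int. If the grid entries are tensors rather than ints, they would need converting first (for example with .tolist()).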

Jintao-Huang · 2026-04-19T09:43:41Z

Thanks!

Please pass the lint test:

pip install pre-commit
pre-commit run --all-files

feat: modify kwai template

340b96c

LeonDiao0427 wants to merge 1 commit into modelscope:main from LeonDiao0427:feature/update-kwai-template
