
npu qwen3.5 megatron padding_free fix #50

Merged
Jintao-Huang merged 5 commits into modelscope:main from addsubmuldiv:qwen3_5_npu on Apr 24, 2026

Conversation

@addsubmuldiv
Collaborator

Problem

For Qwen3.5 / Qwen3-Next on NPU, padding_free=False can fail in the GDN non-thd path, while padding_free=True works only because the thd path derives its mask from cu_seqlens_q internally and so bypasses the broken external mask chain.
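As a rough illustration of why the thd path sidesteps the problem, here is a minimal sketch (names and shapes assumed, not taken from this repo) of how a packed thd layout can recover per-sequence validity from cu_seqlens_q alone, with no external mask:

```python
import torch

# cu_seqlens_q marks cumulative sequence boundaries in a packed (thd) batch.
cu_seqlens_q = torch.tensor([0, 3, 7, 12])
seq_lens = cu_seqlens_q[1:] - cu_seqlens_q[:-1]   # tensor([3, 4, 5])

# Every token between two boundaries is valid by construction, so the kernel
# never needs (and never sees) the external attention-mask chain.
```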

Cause

The GDN non-thd branch reads the external mask from kwargs, but on NPU the correct mask for this path is attention_mask_2d. Without preferring that field, the model may consume an incompatible mask.

Fix

This PR updates the Qwen GDN non-thd path to prefer attention_mask_2d on NPU and fall back to the original attention_mask behavior otherwise.
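A minimal sketch of that resolution order, assuming a torch_npu-style environment; only the helper name resolve_gdn_attention_mask is confirmed by this PR, so the exact signature and device check here are assumptions:

```python
import torch

def resolve_gdn_attention_mask(kwargs):
    # On NPU, prefer the trainer-provided 2D mask of shape (batch, seq_len).
    attention_mask_2d = kwargs.get('attention_mask_2d')
    if attention_mask_2d is not None and hasattr(torch, 'npu') and torch.npu.is_available():
        return attention_mask_2d
    # Fallback: the original behavior, collapsing the (inverted) 4D mask to a
    # per-token boolean validity mask.
    attention_mask = kwargs.get('attention_mask')
    if attention_mask is not None:
        attention_mask = (~attention_mask).sum(dim=(1, 2)) > 0
    return attention_mask
```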

Notes

This is the model-side half of the fix. It is intended to work together with the ms-swift trainer-side change that preserves attention_mask_2d.
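For context, the trainer-side half could look roughly like the following; this is a hypothetical sketch, since the actual ms-swift change is referenced but not included in this PR:

```python
# Hypothetical: preserve the original 2D mask before the trainer/model stack
# expands attention_mask to 4D, so the GDN non-thd path can still reach it.
def preserve_attention_mask_2d(inputs):
    attention_mask = inputs.get('attention_mask')
    if attention_mask is not None and attention_mask.dim() == 2:
        inputs['attention_mask_2d'] = attention_mask  # shape (batch, seq_len)
    return inputs
```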

Copilot AI review requested due to automatic review settings on April 23, 2026 09:58

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a helper function, resolve_gdn_attention_mask, to centralize attention mask logic and add support for NPU devices. This function is then integrated into the forward passes of the Qwen3-Next and Qwen3.5 models. A critical typo was identified in qwen3_next.py where an undefined function resolve_hf_attention_mask is called instead of the new helper, which would lead to a runtime error.

```diff
-attention_mask = kwargs.get('attention_mask')
-if attention_mask is not None:
-    attention_mask = (~attention_mask).sum(dim=(1, 2)) > 0
+attention_mask = resolve_hf_attention_mask(kwargs)
```
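As a side note on the removed legacy lines in this hunk: assuming a Megatron-style boolean mask of shape (batch, 1, seq_len, seq_len) where True marks masked-out positions, the (~attention_mask).sum(dim=(1, 2)) > 0 reduction collapses the 4D mask to a per-token 2D validity mask:

```python
import torch

# batch=1, one mask "head", seq_len=4; the last token is padding (fully masked).
attention_mask = torch.ones(1, 1, 4, 4, dtype=torch.bool)
attention_mask[0, 0, :3, :3] = False   # first three tokens attend to each other

valid = (~attention_mask).sum(dim=(1, 2)) > 0
print(valid)   # tensor([[ True,  True,  True, False]])  -> shape (batch, seq_len)
```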


Severity: high

The function call resolve_hf_attention_mask(kwargs) appears to be a typo. The function defined earlier in this file (at line 62) is named resolve_gdn_attention_mask. Using the incorrect name will result in a NameError at runtime.

Suggested change

```diff
-attention_mask = resolve_hf_attention_mask(kwargs)
+attention_mask = resolve_gdn_attention_mask(kwargs)
```


Copilot AI left a comment


Pull request overview

Updates Qwen3.5 / Qwen3-Next GatedDeltaNet (GDN) non-THD attention-mask handling to avoid NPU failures when padding_free=False, by preferring attention_mask_2d on NPU and falling back to the existing attention_mask-derived behavior otherwise.

Changes:

  • Add a shared resolve_gdn_attention_mask() helper that selects attention_mask_2d on NPU (fallback to legacy mask logic otherwise).
  • Use resolve_gdn_attention_mask() in the Qwen3.5 MM-GPT GDN non-THD forward path (see the sketch after this list).
  • Update Qwen3-Next GDN non-THD forward path to use a resolver (but currently references an undefined function).
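A hedged sketch of what the non-THD call-site change looks like, reconstructed from the review hunks below rather than copied from the exact diff:

```diff
-        attention_mask = kwargs.get('attention_mask')
-        if attention_mask is not None:
-            attention_mask = (~attention_mask).sum(dim=(1, 2)) > 0
+        attention_mask = resolve_gdn_attention_mask(kwargs)
```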

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File                                        Description
src/mcore_bridge/model/mm_gpts/qwen3_5.py  Switch the non-THD GDN path to use the shared mask resolver.
src/mcore_bridge/model/gpts/qwen3_next.py  Introduce resolve_gdn_attention_mask() and update the non-THD GDN mask-resolution call site.


```diff
-attention_mask = kwargs.get('attention_mask')
-if attention_mask is not None:
-    attention_mask = (~attention_mask).sum(dim=(1, 2)) > 0
+attention_mask = resolve_hf_attention_mask(kwargs)
```

Copilot AI Apr 23, 2026


resolve_hf_attention_mask is referenced here but is not defined anywhere in this module (or elsewhere in the repo). This will raise a NameError at runtime. Use the newly introduced resolve_gdn_attention_mask(kwargs) here (or add/rename the helper consistently if a different resolver is intended).

Suggested change

```diff
-attention_mask = resolve_hf_attention_mask(kwargs)
+attention_mask = resolve_gdn_attention_mask(kwargs)
```

@Jintao-Huang
Collaborator

thanks! please run it through lint

Jintao-Huang merged commit f93cbdb into modelscope:main on Apr 24, 2026
1 check passed
