[bugfix] fix gdn sharded_state_dict lora #23

Jintao-Huang wants to merge 2 commits into modelscope:main from
Conversation
Code Review
This pull request implements the sharded_state_dict method in the GatedDeltaNet module to support distributed checkpointing. The implementation shards the module's parameters and submodules, with dedicated handling of the conv1d and in_proj layers for tensor-parallel sharding and tensor splitting. One critical issue was identified: the assertions reference an undefined attribute, self.conv_dim_local_tp, which should be replaced with a locally computed variable.
```python
conv_layer_name_list = ['conv1d.weight']
assert (sharded_state_dict[f'{prefix}conv1d.weight'].data.size(0) == self.conv_dim_local_tp), (
    self.conv_dim_local_tp, sharded_state_dict[f'{prefix}conv1d.weight'])
if self.conv_bias:
    conv_layer_name_list.append('conv1d.bias')
    assert (sharded_state_dict[f'{prefix}conv1d.bias'].data.size(0) == self.conv_dim_local_tp), (
        self.conv_dim_local_tp, sharded_state_dict[f'{prefix}conv1d.bias'])
```
The attribute self.conv_dim_local_tp is used in assertions but is not defined within the class or the method. It should be defined as a local variable, similar to in_proj_dim_local_tp on line 364, representing the local dimension of the convolution layers after tensor parallel sharding. Based on the forward pass logic, this dimension is (2 * self.qk_dim + self.v_dim) // self.tp_size.
Suggested change:

```diff
- conv_layer_name_list = ['conv1d.weight']
- assert (sharded_state_dict[f'{prefix}conv1d.weight'].data.size(0) == self.conv_dim_local_tp), (
-     self.conv_dim_local_tp, sharded_state_dict[f'{prefix}conv1d.weight'])
- if self.conv_bias:
-     conv_layer_name_list.append('conv1d.bias')
-     assert (sharded_state_dict[f'{prefix}conv1d.bias'].data.size(0) == self.conv_dim_local_tp), (
-         self.conv_dim_local_tp, sharded_state_dict[f'{prefix}conv1d.bias'])
+ conv_dim_local_tp = (2 * self.qk_dim + self.v_dim) // self.tp_size
+ conv_layer_name_list = ['conv1d.weight']
+ assert (sharded_state_dict[f'{prefix}conv1d.weight'].data.size(0) == conv_dim_local_tp), (
+     conv_dim_local_tp, sharded_state_dict[f'{prefix}conv1d.weight'])
+ if self.conv_bias:
+     conv_layer_name_list.append('conv1d.bias')
+     assert (sharded_state_dict[f'{prefix}conv1d.bias'].data.size(0) == conv_dim_local_tp), (
+         conv_dim_local_tp, sharded_state_dict[f'{prefix}conv1d.bias'])
```
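The dimension math behind the suggested fix can be sketched standalone. This is not the actual GatedDeltaNet class; qk_dim, v_dim, and tp_size are illustrative values chosen here, not taken from the PR:

```python
# Example values (assumptions for illustration only).
qk_dim, v_dim, tp_size = 128, 256, 4

# The conv1d channel dimension covers q and k (each qk_dim) plus v (v_dim),
# matching the (2 * self.qk_dim + self.v_dim) expression in the suggestion.
conv_dim = 2 * qk_dim + v_dim

# Tensor-parallel sharding splits that dimension evenly across ranks,
# so each rank's local conv1d weight has conv_dim_local_tp channels.
assert conv_dim % tp_size == 0, 'conv dim must divide evenly across TP ranks'
conv_dim_local_tp = conv_dim // tp_size

print(conv_dim_local_tp)  # 512 // 4 = 128
```

This is the local size the review's assertions compare against sharded_state_dict[f'{prefix}conv1d.weight'].data.size(0) on each rank.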
No description provided.