[Qwen3MoE] Potentially a bug on Qwen3MoeSparseMoeBlock #45208

@KbKuuhaku

Description

Hi,

I found a type annotation mismatch in Qwen3MoeSparseMoeBlock:

class Qwen3MoeSparseMoeBlock(nn.Module):
    def __init__(self, config: Qwen3MoeConfig):
        super().__init__()
        self.experts = Qwen3MoeExperts(config)
        self.gate = Qwen3MoeTopKRouter(config)

    def forward(self, hidden_states: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        batch_size, sequence_length, hidden_dim = hidden_states.shape
        hidden_states_reshaped = hidden_states.view(-1, hidden_dim)
        _, routing_weights, selected_experts = self.gate(hidden_states_reshaped)
        final_hidden_states = self.experts(hidden_states_reshaped, selected_experts, routing_weights)
        return final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)

If the code is correct, the return type annotation of forward should be torch.Tensor, not tuple[torch.Tensor, torch.Tensor]. However, I don't know whether routing_weights is also meant to be returned. Note that Qwen3MoeSparseMoeBlock is used as self.mlp in Qwen3MoeDecoderLayer, where there is a residual connection right after self.mlp(hidden_states):

# Fully Connected
residual = hidden_states
hidden_states = self.post_attention_layernorm(hidden_states)
hidden_states = self.mlp(hidden_states)
hidden_states = residual + hidden_states
return hidden_states

If forward actually returned a tuple of tensors, hidden_states = residual + hidden_states would raise this error:

TypeError: unsupported operand type(s) for +: 'Tensor' and 'tuple'
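The failure mode is easy to reproduce outside the model. In this minimal sketch (not the actual Qwen3 code) a plain float stands in for a Tensor, since + fails the same way for any non-tuple left operand:

```python
def add_residual(residual, hidden_states):
    # Mirrors `hidden_states = residual + hidden_states` in Qwen3MoeDecoderLayer.
    return residual + hidden_states

# What actually happens: forward returns a single value, so the addition works.
print(add_residual(1.0, 2.0))  # 3.0

# What the tuple[...] annotation would imply: adding a tuple fails.
try:
    add_residual(1.0, (1.0, 2.0))
except TypeError as e:
    print(e)  # unsupported operand type(s) for +: 'float' and 'tuple'
```

So the annotation and the decoder-layer call site cannot both be right as written.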

Did I miss something? Or should routing_weights also be returned, for computing the loss during training?
