Commit 08777af

make moe output dtype consistent on non-cuda backends
1 parent 15589f3 commit 08777af

1 file changed

Lines changed: 0 additions & 4 deletions

examples/models/qwen3_5_moe/model.py

@@ -613,10 +613,6 @@ def forward(
         for layer in self.layers:
             x = layer(x, input_pos)
         x = self.norm(x)
-        # When no sampling is requested, return the full ``[B, T, V]``
-        # logits so callers (eval, custom samplers) can inspect every
-        # position. Otherwise apply the prefill optimization and only
-        # materialize ``[B, V]`` for the last token.
         if temperature is None:
             return self.lm_head(x)  # [B, T, V] in model dtype
         logits = self.lm_head(x[:, -1, :]).float()  # [B, V] float32
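
For readers who want to see the two return paths in isolation, here is a minimal NumPy sketch of the behavior the removed comment described. It is not code from the repository: `toy_forward`, the plain matrix `w_head` standing in for `lm_head`, the tensor sizes, and the use of float16 as the "model dtype" are all assumptions made for illustration, and the sampling step that follows in the real method is omitted.

```python
import numpy as np


def toy_forward(x, w_head, temperature=None):
    """Hypothetical stand-in for the tail of ``forward`` (illustration only).

    x:      [B, T, D] hidden states in the model dtype
    w_head: [D, V] weights of a toy linear head (an assumption, not lm_head)
    """
    if temperature is None:
        # No sampling requested: project every position, keep the model dtype.
        return x @ w_head  # [B, T, V]
    # Sampling requested: project only the last position, then upcast.
    # The real method would go on to sample from these logits.
    return (x[:, -1, :] @ w_head).astype(np.float32)  # [B, V]


rng = np.random.default_rng(0)
B, T, D, V = 2, 5, 8, 11  # arbitrary toy sizes
x = rng.standard_normal((B, T, D)).astype(np.float16)
w_head = rng.standard_normal((D, V)).astype(np.float16)

full = toy_forward(x, w_head)                   # all positions
last = toy_forward(x, w_head, temperature=0.8)  # last position only

print(full.shape, full.dtype)
print(last.shape, last.dtype)
```

Under these assumptions, the last-token path skips the head projection for the first `T - 1` positions, which is where the prefill saving comes from; the two paths agree on the final position up to the float32 upcast.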
