
[fix] Add logits_to_keep and shift_labels support for Qwen3-VL and Qwen3-VL-MoE #1181

Merged
Mecoli1219 merged 3 commits into linkedin:main from luca-888:bugfix/qwen3-vl-logits-and-shift-labels
Apr 17, 2026

Conversation

@luca-888
Contributor

@luca-888 luca-888 commented Apr 2, 2026

Summary

This PR adds logits_to_keep and shift_labels support for both Qwen3-VL and Qwen3-VL-MoE in the Liger-patched forward path. The change aligns the patched implementation with the expected Hugging Face interface and enables selective logits materialization for long-context inference.
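The two mechanisms can be illustrated with a minimal sketch. This is a hypothetical illustration of the general technique, not Liger-Kernel's actual patched forward; `project_logits` and `shift_labels_for_next_token` are assumed names:

```python
import torch

def project_logits(hidden_states: torch.Tensor, lm_head_weight: torch.Tensor,
                   logits_to_keep: int = 0) -> torch.Tensor:
    # hidden_states: (batch, seq_len, hidden); lm_head_weight: (vocab, hidden)
    if logits_to_keep > 0:
        # Slice to the last `logits_to_keep` positions *before* the projection,
        # so the full (batch, seq_len, vocab) logits tensor is never materialized.
        hidden_states = hidden_states[:, -logits_to_keep:, :]
    return hidden_states @ lm_head_weight.T

def shift_labels_for_next_token(labels: torch.Tensor, ignore_index: int = -100) -> torch.Tensor:
    # Position i of the shifted labels holds token i+1, so the logits at position i
    # are scored against the token they predict; the final slot is ignored.
    shifted = labels.new_full(labels.shape, ignore_index)
    shifted[:, :-1] = labels[:, 1:]
    return shifted
```

For generation with `logits_to_keep=1`, only the last position's logits are computed, which is where the memory savings in the benchmark below come from.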

Testing Done

  • make test
    • not fully green
    • observed existing failures in GRPO, fused_neighborhood_attention, and gemma3 monkey patch tests
  • make test-convergence
    • not fully green
    • observed failure in test/convergence/bf16/test_mini_models_multimodal.py::test_mini_model_multimodal[mini_llama4-...]
  • make checkstyle
    • passed

Known limitation:

  • The failed make test / make test-convergence cases above do not directly exercise the Qwen3-VL or Qwen3-VL-MoE logits_to_keep / shift_labels change in this PR

@luca-888
Contributor Author

luca-888 commented Apr 2, 2026

Test Environment

  • GPU: single NVIDIA H20 (~95 GB VRAM)
  • transformers==5.4.0
  • Steady-state latency, model load time excluded
  • Each metric uses 1 warmup iteration and 2 measured iterations
  • Long-text prompt generated by repeating the same sentence with repeat counts 40 / 160 / 320 / 640
  • Measured with full logits vs logits_to_keep=1
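The warmup/measurement protocol above can be sketched as a small helper. This is a hypothetical sketch of the methodology, not the actual benchmark script; `measure_latency` and `build_prompt` are assumed names:

```python
import time

def measure_latency(fn, warmup: int = 1, iters: int = 2) -> float:
    """Mean steady-state latency of `fn` in seconds, warmup runs excluded."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

def build_prompt(sentence: str, repeat: int) -> str:
    # Long-text prompts built by repeating one sentence, as in the setup above.
    return sentence * repeat
```

On GPU, the timed region would additionally be bracketed with `torch.cuda.synchronize()`, and peak memory read via `torch.cuda.max_memory_allocated()` after `torch.cuda.reset_peak_memory_stats()`.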

Long-Sequence Benchmark

Dense: Qwen3-VL-8B-Instruct

| Sequence Length | Full Logits Latency | Keep1 Latency | Latency Delta | Full Peak Mem | Keep1 Peak Mem | Memory Delta |
|---|---|---|---|---|---|---|
| 755 | 0.1111 s | 0.1051 s | -5.4% | 16.809 GB | 16.674 GB | -0.135 GB |
| 2795 | 0.3733 s | 0.3484 s | -6.7% | 17.979 GB | 17.469 GB | -0.510 GB |
| 5515 | 0.7592 s | 0.7097 s | -6.5% | 19.539 GB | 18.531 GB | -1.008 GB |
| 10955 | 1.6723 s | 1.5753 s | -5.8% | 22.660 GB | 20.655 GB | -2.005 GB |

MoE: Qwen3-VL-30B-A3B-Instruct

| Sequence Length | Full Logits Latency | Keep1 Latency | Latency Delta | Full Peak Mem | Keep1 Peak Mem | Memory Delta |
|---|---|---|---|---|---|---|
| 755 | 0.2218 s | 0.2245 s | +1.2% | 58.348 GB | 58.313 GB | -0.035 GB |
| 2795 | 0.3642 s | 0.3510 s | -3.6% | 59.511 GB | 59.365 GB | -0.146 GB |
| 5515 | 0.6716 s | 0.6461 s | -3.8% | 61.061 GB | 60.772 GB | -0.289 GB |
| 10955 | 1.4846 s | 1.4352 s | -3.3% | 64.161 GB | 63.585 GB | -0.576 GB |

@luca-888
Contributor Author

luca-888 commented Apr 8, 2026

@Mecoli1219 @Tcc0403 Could you please review this PR and approve the pending workflows when you have a chance? The implementation follows the same approach currently used for qwen3, qwen3-moe, and qwen3.5 in the repository. Thank you!

Collaborator

@Mecoli1219 Mecoli1219 left a comment


Overall looks good to me. Thanks for the contribution!

@luca-888
Contributor Author

@Mecoli1219 Since it already has approval, could this be merged if there are no further concerns?

@Mecoli1219
Collaborator

Yes. Feel free to merge it!

@luca-888
Contributor Author

> Yes. Feel free to merge it!

If you have permission, could you please merge this PR when you have a moment? Thanks!

@Mecoli1219 Mecoli1219 added this pull request to the merge queue Apr 17, 2026
Merged via the queue into linkedin:main with commit 6fe4681 Apr 17, 2026
5 of 7 checks passed


2 participants