Conversation
Pull request overview
This PR fixes Qwen3-VL SwiGLU patching so apply_liger_kernel_to_qwen3_vl(..., swiglu=True) is effective (including for already-instantiated model instances), adds regression tests, and documents Qwen3-VL support.
Changes:
- Enable SwiGLU patching for Qwen3-VL by default and patch `Qwen3VLTextMLP` / existing `decoder_layer.mlp` instances.
- Add/extend monkey-patch tests to assert MLP forward patching behavior for Qwen3-VL.
- Update the README supported-models table to include Qwen3-VL.
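The "patch the module class + rebind already-instantiated `mlp` modules" strategy described above can be sketched in isolation. Everything below (`DummyMLP`, `LigerMLP`, `DummyLayer`, `apply_swiglu_patch`) is a hypothetical stand-in for illustration, not Liger's actual API:

```python
import types

# Hypothetical stand-ins; not Liger's real classes.
class DummyMLP:
    def forward(self, x):
        return x  # original, unfused implementation

class LigerMLP:
    def forward(self, x):
        return x * 2  # stand-in for the fused SwiGLU forward

class DummyLayer:
    def __init__(self):
        self.mlp = DummyMLP()

# Stand-in for the modeling module's namespace.
modeling = types.SimpleNamespace(DummyMLP=DummyMLP)

def apply_swiglu_patch(modeling_module, model_layers=None):
    # 1) Class-level patch: models instantiated after this point pick up LigerMLP.
    modeling_module.DummyMLP = LigerMLP
    # 2) Instance-level patch: rebind forward on already-created mlp modules,
    #    which is what makes the flag effective for existing model instances.
    if model_layers is not None:
        for layer in model_layers:
            layer.mlp.forward = types.MethodType(LigerMLP.forward, layer.mlp)

layers = [DummyLayer(), DummyLayer()]
apply_swiglu_patch(modeling, layers)
assert modeling.DummyMLP is LigerMLP
print([layer.mlp.forward(3) for layer in layers])  # [6, 6]
```

The instance-level pass is the part this PR adds: without it, only models constructed after the patch call would use the fused forward.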
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/liger_kernel/transformers/monkey_patch.py | Implements Qwen3-VL SwiGLU patching (module class + existing instances) and flips the default swiglu behavior. |
| test/transformers/test_monkey_patch.py | Adds assertions and a new test covering Qwen3-VL MLP patching via the swiglu flag. |
| README.md | Adds Qwen3-VL to the support matrix and adjusts Qwen-family rows. |
Comments suppressed due to low confidence (1)
src/liger_kernel/transformers/monkey_patch.py:1799
- The `apply_liger_kernel_to_qwen3_vl` docstring's `Args:` section doesn't document the `rope` parameter even though it is part of the public signature (and is patched inside the function). Please add a `rope (bool): ...` entry for clarity and to keep it consistent with the other `apply_liger_kernel_to_*` docstrings in this module.
"""
Apply Liger kernels to replace original implementation in HuggingFace Qwen3-VL models.
Args:
cross_entropy (bool): Whether to apply Liger's cross entropy loss. Default is False.
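A hedged sketch of the requested addition; the exact wording, signature, and default value of `rope` below are assumptions for illustration, not copied from the real function:

```python
def apply_liger_kernel_to_qwen3_vl(rope: bool = True, cross_entropy: bool = False) -> None:
    """
    Apply Liger kernels to replace original implementation in HuggingFace Qwen3-VL models.

    Args:
        rope (bool): Whether to apply Liger's rotary position embedding. Default is True.
        cross_entropy (bool): Whether to apply Liger's cross entropy loss. Default is False.
    """
```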
| Qwen2, Qwen2.5, & QwQ | `liger_kernel.transformers.apply_liger_kernel_to_qwen2` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen2-VL, & QVQ | `liger_kernel.transformers.apply_liger_kernel_to_qwen2_vl` | RMSNorm, LayerNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen2.5-VL | `liger_kernel.transformers.apply_liger_kernel_to_qwen2_5_vl` | RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen3-VL | `liger_kernel.transformers.apply_liger_kernel_to_qwen3_vl` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen3 | `liger_kernel.transformers.apply_liger_kernel_to_qwen3` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen3 MoE | `liger_kernel.transformers.apply_liger_kernel_to_qwen3_moe` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
This README table section appears to introduce mixed line endings (surrounding lines are CRLF, while the newly added/edited Qwen rows show as LF-only). Please normalize the line endings for the file (and ideally enforce via .editorconfig) to avoid noisy diffs and potential formatting issues on Windows tooling.
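One way to enforce this, as the comment suggests, is an `.editorconfig` at the repo root. The stanza below is a generic sketch of such a file, not taken from the Liger repository:

```ini
# Sketch: enforce LF line endings repo-wide via .editorconfig
root = true

[*]
end_of_line = lf
insert_final_newline = true
```

Most editors and IDEs pick this up automatically, so newly edited lines keep consistent endings regardless of the contributor's platform.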
Tcc0403 left a comment
Thank you, overall lgtm. Just need to remove a redundant test
```python
@pytest.mark.skipif(not is_qwen3_vl_available(), reason="qwen3_vl module not available")
def test_apply_liger_kernel_to_qwen3_vl_swiglu_flag_patches_mlp():
    with patch("transformers.models.qwen3_vl.modeling_qwen3_vl"):
        from transformers.models.qwen3_vl.modeling_qwen3_vl import Qwen3VLTextModel

        config = transformers.models.qwen3_vl.configuration_qwen3_vl.Qwen3VLTextConfig(
            vocab_size=32000,
            hidden_size=512,
            intermediate_size=2048,
            num_hidden_layers=2,
            num_attention_heads=8,
            num_key_value_heads=2,
            head_dim=64,
            hidden_act="silu",
            max_position_embeddings=32768,
            initializer_range=0.02,
            rms_norm_eps=1e-6,
            use_cache=False,
            tie_word_embeddings=True,
            attention_dropout=0.0,
            attention_bias=False,
            **get_qwen3_vl_rope_config(),
        )
        dummy_model_instance = Qwen3VLTextModel._from_config(config)

        for decoder_layer in dummy_model_instance.layers:
            assert inspect.getsource(decoder_layer.mlp.forward) != inspect.getsource(LigerSwiGLUMLP.forward)

        monkey_patch.apply_liger_kernel_to_qwen3_vl(
            model=dummy_model_instance,
            rope=False,
            cross_entropy=False,
            fused_linear_cross_entropy=False,
            rms_norm=False,
            swiglu=True,
        )

        for decoder_layer in dummy_model_instance.layers:
            assert inspect.getsource(decoder_layer.mlp.forward) == inspect.getsource(LigerSwiGLUMLP.forward)
```
redundant test? adding swiglu check in above test functions should be sufficient.
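For reference, the verification idea in the test above (compare the layer's bound `forward` against the Liger implementation) can be shown in miniature. The PR's test compares `inspect.getsource` output; this self-contained sketch checks bound-function identity instead, and `OriginalMLP`/`PatchedMLP` are hypothetical names:

```python
import types

class OriginalMLP:
    def forward(self, x):
        return x

class PatchedMLP:
    def forward(self, x):
        return x + 1

mlp = OriginalMLP()
# Before patching, the bound forward still wraps the original function.
assert mlp.forward.__func__ is OriginalMLP.forward

# Rebind the instance's forward, as the monkey patch does for decoder_layer.mlp.
mlp.forward = types.MethodType(PatchedMLP.forward, mlp)
assert mlp.forward.__func__ is PatchedMLP.forward
print(mlp.forward(1))  # 2
```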
```python
if rms_norm:
    _patch_qwen3_vl_rms_norm(text_model.norm)
```
| Qwen2, Qwen2.5, & QwQ | `liger_kernel.transformers.apply_liger_kernel_to_qwen2` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen2-VL, & QVQ | `liger_kernel.transformers.apply_liger_kernel_to_qwen2_vl` | RMSNorm, LayerNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen2.5-VL | `liger_kernel.transformers.apply_liger_kernel_to_qwen2_5_vl` | RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen3-VL | `liger_kernel.transformers.apply_liger_kernel_to_qwen3_vl` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
Summary

Fix `apply_liger_kernel_to_qwen3_vl(..., swiglu=True)` so it is no longer a no-op.

Changes

- Patch `transformers.models.qwen3_vl.modeling_qwen3_vl.Qwen3VLTextMLP` to `LigerSwiGLUMLP`
- Patch `decoder_layer.mlp` modules for existing Qwen3-VL model instances
- Enable `swiglu=True` by default for `apply_liger_kernel_to_qwen3_vl`
- Add regression tests covering `swiglu=True`

Validation