Conversation
Pull request overview
This PR fixes Qwen3-VL SwiGLU patching so apply_liger_kernel_to_qwen3_vl(..., swiglu=True) is effective (including for already-instantiated model instances), adds regression tests, and documents Qwen3-VL support.
Changes:
- Enable SwiGLU patching for Qwen3-VL by default and patch `Qwen3VLTextMLP` / existing `decoder_layer.mlp` instances.
- Add/extend monkey-patch tests to assert MLP forward patching behavior for Qwen3-VL.
- Update the README supported-models table to include Qwen3-VL.
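The "patch the module class + rebind already-instantiated `mlp` modules" strategy described above can be sketched in isolation. Everything below (`DummyMLP`, `LigerMLP`, `DummyLayer`, `apply_swiglu_patch`) is a hypothetical stand-in for illustration, not Liger's actual API:

```python
import types

# Hypothetical stand-ins; not Liger's real classes.
class DummyMLP:
    def forward(self, x):
        return x  # original, unfused implementation

class LigerMLP:
    def forward(self, x):
        return x * 2  # stand-in for the fused SwiGLU forward

class DummyLayer:
    def __init__(self):
        self.mlp = DummyMLP()

# Stand-in for the modeling module's namespace.
modeling = types.SimpleNamespace(DummyMLP=DummyMLP)

def apply_swiglu_patch(modeling_module, model_layers=None):
    # 1) Class-level patch: models instantiated after this point pick up LigerMLP.
    modeling_module.DummyMLP = LigerMLP
    # 2) Instance-level patch: rebind forward on already-created mlp modules,
    #    which is what makes the flag effective for existing model instances.
    if model_layers is not None:
        for layer in model_layers:
            layer.mlp.forward = types.MethodType(LigerMLP.forward, layer.mlp)

layers = [DummyLayer(), DummyLayer()]
apply_swiglu_patch(modeling, layers)
assert modeling.DummyMLP is LigerMLP
print([layer.mlp.forward(3) for layer in layers])  # [6, 6]
```

The instance-level pass is the part this PR adds: without it, only models constructed after the patch call would use the fused forward.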
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/liger_kernel/transformers/monkey_patch.py | Implements Qwen3-VL SwiGLU patching (module class + existing instances) and flips the default swiglu behavior. |
| test/transformers/test_monkey_patch.py | Adds assertions and a new test covering Qwen3-VL MLP patching via the swiglu flag. |
| README.md | Adds Qwen3-VL to the support matrix and adjusts Qwen-family rows. |
Comments suppressed due to low confidence (1)
src/liger_kernel/transformers/monkey_patch.py:1799
- The `apply_liger_kernel_to_qwen3_vl` docstring's `Args:` section doesn't document the `rope` parameter even though it is part of the public signature (and is patched inside the function). Please add a `rope (bool): ...` entry for clarity and to keep it consistent with the other `apply_liger_kernel_to_*` docstrings in this module.
"""
Apply Liger kernels to replace original implementation in HuggingFace Qwen3-VL models.
Args:
cross_entropy (bool): Whether to apply Liger's cross entropy loss. Default is False.
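A hedged sketch of the requested addition; the exact wording, signature, and default value of `rope` below are assumptions for illustration, not copied from the real function:

```python
def apply_liger_kernel_to_qwen3_vl(rope: bool = True, cross_entropy: bool = False) -> None:
    """
    Apply Liger kernels to replace original implementation in HuggingFace Qwen3-VL models.

    Args:
        rope (bool): Whether to apply Liger's rotary position embedding. Default is True.
        cross_entropy (bool): Whether to apply Liger's cross entropy loss. Default is False.
    """
```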
| Qwen2, Qwen2.5, & QwQ | `liger_kernel.transformers.apply_liger_kernel_to_qwen2` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen2-VL, & QVQ | `liger_kernel.transformers.apply_liger_kernel_to_qwen2_vl` | RMSNorm, LayerNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen2.5-VL | `liger_kernel.transformers.apply_liger_kernel_to_qwen2_5_vl` | RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen3-VL | `liger_kernel.transformers.apply_liger_kernel_to_qwen3_vl` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen3 | `liger_kernel.transformers.apply_liger_kernel_to_qwen3` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen3 MoE | `liger_kernel.transformers.apply_liger_kernel_to_qwen3_moe` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
This README table section appears to introduce mixed line endings (surrounding lines are CRLF, while the newly added/edited Qwen rows show as LF-only). Please normalize the line endings for the file (and ideally enforce via .editorconfig) to avoid noisy diffs and potential formatting issues on Windows tooling.
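One way to enforce this, as the comment suggests, is an `.editorconfig` at the repo root. The stanza below is a generic sketch of such a file, not taken from the Liger repository:

```ini
# Sketch: enforce LF line endings repo-wide via .editorconfig
root = true

[*]
end_of_line = lf
insert_final_newline = true
```

Most editors and IDEs pick this up automatically, so newly edited lines keep consistent endings regardless of the contributor's platform.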
Tcc0403 left a comment
Thank you, overall lgtm. Just need to remove a redundant test
```python
@pytest.mark.skipif(not is_qwen3_vl_available(), reason="qwen3_vl module not available")
def test_apply_liger_kernel_to_qwen3_vl_swiglu_flag_patches_mlp():
    with patch("transformers.models.qwen3_vl.modeling_qwen3_vl"):
        from transformers.models.qwen3_vl.modeling_qwen3_vl import Qwen3VLTextModel

        config = transformers.models.qwen3_vl.configuration_qwen3_vl.Qwen3VLTextConfig(
            vocab_size=32000,
            hidden_size=512,
            intermediate_size=2048,
            num_hidden_layers=2,
            num_attention_heads=8,
            num_key_value_heads=2,
            head_dim=64,
            hidden_act="silu",
            max_position_embeddings=32768,
            initializer_range=0.02,
            rms_norm_eps=1e-6,
            use_cache=False,
            tie_word_embeddings=True,
            attention_dropout=0.0,
            attention_bias=False,
            **get_qwen3_vl_rope_config(),
        )
        dummy_model_instance = Qwen3VLTextModel._from_config(config)

        for decoder_layer in dummy_model_instance.layers:
            assert inspect.getsource(decoder_layer.mlp.forward) != inspect.getsource(LigerSwiGLUMLP.forward)

        monkey_patch.apply_liger_kernel_to_qwen3_vl(
            model=dummy_model_instance,
            rope=False,
            cross_entropy=False,
            fused_linear_cross_entropy=False,
            rms_norm=False,
            swiglu=True,
        )

        for decoder_layer in dummy_model_instance.layers:
            assert inspect.getsource(decoder_layer.mlp.forward) == inspect.getsource(LigerSwiGLUMLP.forward)
```
redundant test? adding swiglu check in above test functions should be sufficient.
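For reference, the verification idea in the test above (compare the layer's bound `forward` against the Liger implementation) can be shown in miniature. The PR's test compares `inspect.getsource` output; this self-contained sketch checks bound-function identity instead, and `OriginalMLP`/`PatchedMLP` are hypothetical names:

```python
import types

class OriginalMLP:
    def forward(self, x):
        return x

class PatchedMLP:
    def forward(self, x):
        return x + 1

mlp = OriginalMLP()
# Before patching, the bound forward still wraps the original function.
assert mlp.forward.__func__ is OriginalMLP.forward

# Rebind the instance's forward, as the monkey patch does for decoder_layer.mlp.
mlp.forward = types.MethodType(PatchedMLP.forward, mlp)
assert mlp.forward.__func__ is PatchedMLP.forward
print(mlp.forward(1))  # 2
```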
```python
if rms_norm:
    _patch_qwen3_vl_rms_norm(text_model.norm)
```
| Qwen2, Qwen2.5, & QwQ | `liger_kernel.transformers.apply_liger_kernel_to_qwen2` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen2-VL, & QVQ | `liger_kernel.transformers.apply_liger_kernel_to_qwen2_vl` | RMSNorm, LayerNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen2.5-VL | `liger_kernel.transformers.apply_liger_kernel_to_qwen2_5_vl` | RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
| Qwen3-VL | `liger_kernel.transformers.apply_liger_kernel_to_qwen3_vl` | RoPE, RMSNorm, SwiGLU, CrossEntropyLoss, FusedLinearCrossEntropy |
Summary

Fix `apply_liger_kernel_to_qwen3_vl(..., swiglu=True)` so it is no longer a no-op.

Changes

- Patch `transformers.models.qwen3_vl.modeling_qwen3_vl.Qwen3VLTextMLP` to `LigerSwiGLUMLP`
- Patch `decoder_layer.mlp` modules for existing Qwen3-VL model instances
- Enable `swiglu=True` by default for `apply_liger_kernel_to_qwen3_vl`
- Add regression tests covering `swiglu=True`

Validation