Add OpenVINO export support for glm4_moe_lite (GLM-4.7-Flash) #1699

Draft

openvino-agent wants to merge 1 commit into huggingface:main from openvino-agent:support/glm4_moe_lite

Conversation

@openvino-agent

What does this PR do?

Adds native OpenVINO IR export support for the glm4_moe_lite model type, which powers the GLM-4.7-Flash family of models (e.g. THUDM/GLM-4.7-Flash).

Architecture overview

Glm4MoeLiteForCausalLM is a decoder-only transformer (available in Transformers ≥ 5.0) that combines:

  • Multi-head Latent Attention (MLA) — the same low-rank (LoRA-style) KV compression used in MiniCPM3/DeepSeek, with separate qk_nope_head_dim/qk_rope_head_dim key dimensions and a smaller v_head_dim (cache shapes sketched after this list)
  • Hybrid MLP layers — alternating dense (Glm4MoeLiteMLP) and sparse Mixture-of-Experts (Glm4MoeLiteMoE / Glm4MoeLiteNaiveMoe) layers
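
For export, the practical consequence of MLA is an asymmetric KV cache: keys and values have different head dimensions. A sketch of the per-layer cache shapes the dummy input generator has to produce (dimension names from the description above; the concrete values are placeholders, not taken from the real config):

```python
# Illustrative MLA KV-cache shapes; the numbers are placeholders,
# real values come from the model config.
batch_size, num_kv_heads, past_len = 1, 4, 16
qk_nope_head_dim, qk_rope_head_dim, v_head_dim = 128, 64, 128

k_head_dim = qk_nope_head_dim + qk_rope_head_dim  # nope and rope parts concatenated
key_cache_shape = (batch_size, num_kv_heads, past_len, k_head_dim)    # (1, 4, 16, 192)
value_cache_shape = (batch_size, num_kv_heads, past_len, v_head_dim)  # (1, 4, 16, 128)
```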

Changes

optimum/exporters/openvino/model_configs.py

  • Import Glm4MoeLitePatcher
  • Add Glm4MoeLiteOpenVINOConfig, registered under glm4_moe_lite for the text-generation and text-generation-with-past tasks. It inherits from MiniCPM3OpenVINOConfig to reuse OVMiniCPM3DummyPastKeyValuesGenerator, which already produces the MLA-style KV cache shapes (key head dim = qk_nope_head_dim + qk_rope_head_dim, value head dim = v_head_dim); a registration sketch follows below.
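
A minimal sketch of what the registration looks like, assuming the register_in_tasks_manager decorator and the patch_model_for_export hook used by the other configs in this file; this mirrors the file's pattern rather than reproducing the exact diff:

```python
# Sketch; assumes the existing imports of model_configs.py
# (MiniCPM3OpenVINOConfig, Glm4MoeLitePatcher) are in scope.
from optimum.exporters.tasks import TasksManager

register_in_tasks_manager = TasksManager.create_register("openvino", overwrite_existing=True)

@register_in_tasks_manager(
    "glm4_moe_lite", *["text-generation", "text-generation-with-past"], library_name="transformers"
)
class Glm4MoeLiteOpenVINOConfig(MiniCPM3OpenVINOConfig):
    # Reuses OVMiniCPM3DummyPastKeyValuesGenerator via the parent class,
    # so the MLA cache shapes above come for free.
    def patch_model_for_export(self, model, model_kwargs=None):
        return Glm4MoeLitePatcher(self, model, model_kwargs=model_kwargs)
```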

optimum/exporters/openvino/model_patcher.py

  • Add glm4_moe_lite_naive_moe_forward — a fully vectorized replacement for Glm4MoeLiteNaiveMoe.forward. The original implementation drives a Python for loop over the active experts via nonzero(), which is incompatible with torch.jit.trace because the loop count varies per input. The patch replaces it with batched matrix multiplications (torch.bmm) over all experts simultaneously, producing the same graph regardless of routing decisions (see the sketch after this list).
  • Add Glm4MoeLitePatcher — patches the experts sub-module inside every sparse MoE layer during export and restores the original on exit (sketched below).
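
The following is a self-contained sketch of the vectorization idea only — stacked 3-D expert weights, a single-activation MLP instead of the model's real expert MLP, and made-up names — not the actual replacement function:

```python
import torch
import torch.nn.functional as F

def vectorized_moe(hidden_states, w_up, w_down, router_logits, top_k):
    """Trace-friendly MoE: run every expert on every token, then mask.

    hidden_states: (T, H); w_up: (E, H, I); w_down: (E, I, H).
    The real Glm4MoeLiteNaiveMoe expert MLP is simplified here.
    """
    T, H = hidden_states.shape
    E = w_up.shape[0]

    # Top-k routing, renormalized over the selected experts.
    routing = F.softmax(router_logits, dim=-1)      # (T, E)
    topk_w, topk_idx = routing.topk(top_k, dim=-1)  # (T, k)
    topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)

    # Dense (T, E) routing matrix: zero where an expert was not picked.
    # No nonzero()/loop, so the traced graph is input-independent.
    dense_w = torch.zeros(T, E, dtype=hidden_states.dtype, device=hidden_states.device)
    dense_w.scatter_(1, topk_idx, topk_w)

    # All experts on all tokens at once: (E, T, H) x (E, H, I) -> (E, T, I).
    x = hidden_states.unsqueeze(0).expand(E, -1, -1)
    expert_out = torch.bmm(F.silu(torch.bmm(x, w_up)), w_down)  # (E, T, H)

    # Weighted sum over experts; unrouted experts contribute zero.
    return torch.einsum("te,eth->th", dense_w, expert_out)
```

Unrouted experts still incur compute, which is the usual price of a static graph; for tracing purposes that trade-off is intentional.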

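The patcher itself follows the enter/exit convention of the other patchers in model_patcher.py. In this sketch the base class, the layer.mlp.experts attribute path, and the MethodType binding are all assumptions about the actual code:

```python
import types
from optimum.exporters.onnx.model_patcher import DecoderModelPatcher  # assumed base class

class Glm4MoeLitePatcher(DecoderModelPatcher):
    def __enter__(self):
        super().__enter__()
        for layer in self._model.model.layers:
            experts = getattr(layer.mlp, "experts", None)  # only sparse MoE layers have experts
            if experts is not None:
                experts._orig_forward = experts.forward
                # Bind the vectorized forward from this PR in place of the original.
                experts.forward = types.MethodType(glm4_moe_lite_naive_moe_forward, experts)

    def __exit__(self, exc_type, exc_value, traceback):
        super().__exit__(exc_type, exc_value, traceback)
        for layer in self._model.model.layers:
            experts = getattr(layer.mlp, "experts", None)
            if experts is not None and hasattr(experts, "_orig_forward"):
                experts.forward = experts._orig_forward
                del experts._orig_forward
```
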
Tests

  • tests/openvino/utils_tests.py — model ID entry for glm4_moe_lite and expected INT8 node count (42)
  • tests/openvino/test_decoder.py — add glm4_moe_lite to SUPPORTED_ARCHITECTURES (Transformers ≥ 5.0) and EXPECTED_NUM_SDPA
  • tests/openvino/test_export.py — add glm4_moe_lite export test (Transformers ≥ 5.0)
  • tests/openvino/test_exporters_cli.py — add CLI export test for text-generation-with-past (an example command follows this list)
  • tests/openvino/test_quantization.py — add to SUPPORTED_ARCHITECTURES_WITH_AUTO_COMPRESSION
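
For reference, the kind of command the CLI test exercises (model ID from the description above; the output directory name is arbitrary):

```bash
optimum-cli export openvino --model THUDM/GLM-4.7-Flash --task text-generation-with-past glm4_moe_lite_ov
```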

Docs

  • docs/source/openvino/models.mdx — list Glm4MoeLite (GLM-4.7-Flash) as a supported model

Before submitting

  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

- Add Glm4MoeLiteOpenVINOConfig in model_configs.py that inherits from
  MiniCPM3OpenVINOConfig to reuse the MLA-style PKV dummy generator
  (key head dim = qk_nope_head_dim + qk_rope_head_dim, value head dim = v_head_dim)
- Add Glm4MoeLitePatcher in model_patcher.py with a vectorized replacement
  for Glm4MoeLiteNaiveMoe.forward to avoid dynamic control flow (nonzero +
  loop over experts) that breaks torch.jit.trace
- Add glm4_moe_lite test entries to test_decoder.py, test_export.py,
  test_exporters_cli.py, test_quantization.py, and utils_tests.py
- Update docs/source/openvino/models.mdx to list Glm4MoeLite support