
Commit ff33927

lucaslie authored and bmarimuthu-nv committed
[None][feat] Add AD custom model for GLM MoE DSA family (GLM-5) (#240)
* [None][feat] Add AD custom model for GLM MoE DSA family (GLM-5)

Add prefill-only AutoDeploy custom model for the glm_moe_dsa architecture (zai-org/GLM-5, zai-org/GLM-5-FP8). The model uses Multi-head Latent Attention (MLA) and Mixture of Experts (MoE) with noaux_tc-style sigmoid routing, similar to DeepSeek-V3.

Key implementation details:
- Bundled GlmMoeDsaConfig (not yet in transformers)
- Uses canonical AD ops: torch_rmsnorm, torch_mla, torch_moe, torch_rope_with_explicit_cos_sin
- Vanilla PyTorch noaux_tc router (sigmoid + group topk + normalize)
- Shared rotary embedding at model level with _ad_ buffer prefix
- RoPE weight de-interleaving via mla_rope_utils load hook
- TokenizersBackend alias for GLM-5-FP8 tokenizer compatibility
- DSA indexer and MTP layers skipped (not needed for prefill)

Includes hierarchical equivalence tests (MLP, MoE, Attention, Decoder Layer, Full Model, Export) against standalone HF-faithful reference implementations.

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

* Address PR review feedback

- Add num_hidden_layers_5.yaml to GLM-5 registry entries for dashboard runs
- Switch MoE gate to torch.ops.trtllm.noaux_tc_op (fused routing kernel)

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

---------

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
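
For readers unfamiliar with the "sigmoid + group topk + normalize" routing named in the commit message, the following is a minimal sketch of that scheme for a single token, written in plain Python so it runs without dependencies. It follows the DeepSeek-V3-style recipe as commonly described; it is not the code from this commit, which per the review note ends up calling the fused torch.ops.trtllm.noaux_tc_op kernel. The function name `noaux_tc_route` and all parameter names are hypothetical, and scoring each group by the sum of its two best experts is an assumption carried over from DeepSeek-V3.

```python
import math


def noaux_tc_route(logits, bias, n_group, topk_group, top_k, scale=1.0):
    """Route one token; returns (expert_ids, weights). Hypothetical helper."""
    n_experts = len(logits)
    group_size = n_experts // n_group

    # 1. Sigmoid scores. The bias only influences selection, not the final weights.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    biased = [s + b for s, b in zip(scores, bias)]

    # 2. Score each group by the sum of its two best biased scores
    #    (assumption: DeepSeek-V3 convention), then keep the best groups.
    group_scores = []
    for g in range(n_group):
        members = sorted(biased[g * group_size:(g + 1) * group_size], reverse=True)
        group_scores.append(sum(members[:2]))
    kept_groups = sorted(range(n_group), key=lambda g: group_scores[g], reverse=True)
    kept_groups = kept_groups[:topk_group]

    # 3. Pick the top-k experts among the kept groups, ranked by biased score.
    candidates = [e for e in range(n_experts) if e // group_size in kept_groups]
    expert_ids = sorted(candidates, key=lambda e: biased[e], reverse=True)[:top_k]

    # 4. Weights use the unbiased scores, normalized to sum to 1, then scaled.
    picked = [scores[e] for e in expert_ids]
    total = sum(picked) + 1e-20
    weights = [scale * p / total for p in picked]
    return expert_ids, weights


# Eight experts in four groups of two; keep two groups, then two experts.
ids, weights = noaux_tc_route(
    logits=[2.0, 1.0, -1.0, -2.0, 0.5, 0.0, 3.0, -3.0],
    bias=[0.0] * 8,
    n_group=4,
    topk_group=2,
    top_k=2,
)
```

In this toy input, expert 6 has the highest individual score but is not selected, because its group loses the group-level ranking. That group-first filtering is what distinguishes this scheme from a flat top-k.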
1 parent ee03147 commit ff33927

5 files changed

Lines changed: 1449 additions & 2 deletions


Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+# Configuration for GLM-5 (zai-org/GLM-5)
+# Workaround: extra_special_tokens is a list in the GLM-5 tokenizer config but
+# transformers 4.57.x expects a dict in _set_model_specific_special_tokens.
+# Passing extra_special_tokens={} overrides the problematic list with an empty dict.
+tokenizer_kwargs:
+  extra_special_tokens: {}
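
The workaround described in the comments above can be illustrated with a small stand-in, assuming the failure comes from code that calls dict methods on a value that is actually a list. Both functions below are hypothetical and only mimic the shape of the problem; they are not the transformers implementation, and the token strings are placeholders.

```python
def set_model_specific_special_tokens(special_tokens):
    """Hypothetical stand-in for a loader step that assumes a dict of name -> token."""
    return {name: token for name, token in special_tokens.items()}


def load_tokenizer_config(config, **overrides):
    """Hypothetical loader: keyword arguments from the caller override the stored config."""
    merged = {**config, **overrides}
    return set_model_specific_special_tokens(merged.get("extra_special_tokens", {}))


# Stored config carries a list, as the comments describe for GLM-5 (placeholder tokens).
stored = {"extra_special_tokens": ["<tok_a>", "<tok_b>"]}

try:
    load_tokenizer_config(stored)
    failed = False
except AttributeError:
    failed = True  # a list has no .items()

# Passing an empty dict replaces the list before the dict-only code path runs.
result = load_tokenizer_config(stored, extra_special_tokens={})
```

The trade-off in this sketch is that the override discards whatever the list contained, which is acceptable only if those extra tokens are not needed downstream.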

examples/auto_deploy/model_registry/models.yaml

Lines changed: 2 additions & 2 deletions
@@ -224,9 +224,9 @@ models:
   yaml_extra: ['qwen3.5_moe_400b.yaml']
 # --- GLM-5 (Feb 2026) ---
 - name: zai-org/GLM-5
-  yaml_extra: ['dashboard_default.yaml', 'world_size_8.yaml']
+  yaml_extra: ['dashboard_default.yaml', 'world_size_8.yaml', 'glm_5.yaml', 'num_hidden_layers_5.yaml']
 - name: zai-org/GLM-5-FP8
-  yaml_extra: ['dashboard_default.yaml', 'world_size_8.yaml']
+  yaml_extra: ['dashboard_default.yaml', 'world_size_8.yaml', 'glm_5.yaml', 'num_hidden_layers_5.yaml']
 # --- MiniMax-M2.5 (Feb 2026) ---
 - name: MiniMaxAI/MiniMax-M2.5
   yaml_extra: ['dashboard_default.yaml', 'world_size_8.yaml', 'minimax_m2.yaml']

tensorrt_llm/_torch/auto_deploy/models/custom/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,7 @@
 from .modeling_gemma2 import Gemma2ForCausalLM
 from .modeling_glm4_moe import Glm4MoeForCausalLM
 from .modeling_glm4_moe_lite import Glm4MoeLiteForCausalLM
+from .modeling_glm_moe_dsa import GlmMoeDsaForCausalLM
 from .modeling_granite import GraniteForCausalLM
 from .modeling_granite_moe_hybrid import GraniteMoeHybridForCausalLM
 from .modeling_hunyuan_dense import HunYuanDenseForCausalLM
@@ -39,6 +40,7 @@
     "Gemma2ForCausalLM",
     "Glm4MoeForCausalLM",
     "Glm4MoeLiteForCausalLM",
+    "GlmMoeDsaForCausalLM",
     "GraniteForCausalLM",
     "GraniteMoeHybridForCausalLM",
     "HunYuanDenseForCausalLM",
