Commit ff33927
[None][feat] Add AD custom model for GLM MoE DSA family (GLM-5) (#240)
* [None][feat] Add AD custom model for GLM MoE DSA family (GLM-5)
Add prefill-only AutoDeploy custom model for the glm_moe_dsa architecture
(zai-org/GLM-5, zai-org/GLM-5-FP8). The model uses Multi-head Latent
Attention (MLA) and Mixture of Experts (MoE) with noaux_tc-style sigmoid
routing, similar to DeepSeek-V3.
Key implementation details:
- Bundled GlmMoeDsaConfig (not yet in transformers)
- Uses canonical AD ops: torch_rmsnorm, torch_mla, torch_moe,
  torch_rope_with_explicit_cos_sin
- Vanilla PyTorch noaux_tc router (sigmoid + group topk + normalize)
- Shared rotary embedding at model level with _ad_ buffer prefix
- RoPE weight de-interleaving via mla_rope_utils load hook
- TokenizersBackend alias for GLM-5-FP8 tokenizer compatibility
- DSA indexer and MTP layers skipped (not needed for prefill)
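Why a one-time RoPE weight de-interleaving can be valid is shown by the following small sketch. The assumption here is that the checkpoint's rope dimensions are laid out for interleaved (GPT-J style) rotation of adjacent pairs, while the RoPE op in use rotates pairs `(x[i], x[i + d/2])` (half-split, `rotate_half` style). Under that assumption, reordering the rope rows of the query and key projections to `[evens | odds]` at load time makes half-split RoPE reproduce the interleaved result in permuted order. Because queries and keys get the same permutation, attention scores `q·k` are unchanged. All names below (`deinterleave_perm`, `deinterleave_rows`, `rope_cos_sin`, ...) are hypothetical plain-Python helpers, not the `mla_rope_utils` API. The cos/sin tables are computed once and passed in explicitly, loosely mirroring the shared model-level rotary embedding.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def deinterleave_perm(d: int) -> List[int]:
    """Row order [0, 2, 4, ..., 1, 3, 5, ...]: interleaved pairs -> [evens | odds]."""
    return list(range(0, d, 2)) + list(range(1, d, 2))


def deinterleave_rows(weight: Sequence[Sequence[float]]) -> List[Vector]:
    """Hypothetical load-time step: reorder the output rows of a rope projection once."""
    return [list(weight[p]) for p in deinterleave_perm(len(weight))]


def rope_cos_sin(d: int, max_pos: int, base: float = 10000.0) -> Tuple[List[Vector], List[Vector]]:
    """Compute per-position cos/sin tables once so that every layer can reuse them."""
    inv_freq = [base ** (-2.0 * i / d) for i in range(d // 2)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in range(max_pos)]
    sin = [[math.sin(p * f) for f in inv_freq] for p in range(max_pos)]
    return cos, sin


def rope_interleaved(x: Sequence[float], cos: Sequence[float], sin: Sequence[float]) -> Vector:
    """Interleaved layout: rotate adjacent pairs (x[2i], x[2i+1])."""
    out = list(x)
    for i, (c, s) in enumerate(zip(cos, sin)):
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i], out[2 * i + 1] = a * c - b * s, a * s + b * c
    return out


def rope_half_split(x: Sequence[float], cos: Sequence[float], sin: Sequence[float]) -> Vector:
    """Half-split layout: rotate pairs (x[i], x[i + d/2]) with explicit cos/sin."""
    h = len(x) // 2
    out = list(x)
    for i, (c, s) in enumerate(zip(cos, sin)):
        a, b = x[i], x[i + h]
        out[i], out[i + h] = a * c - b * s, a * s + b * c
    return out


def matvec(w: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    d, pos = 4, 3
    cos, sin = rope_cos_sin(d, max_pos=8)
    w = [[0.1 * (r + 1), -0.2 * r] for r in range(d)]  # toy rope projection, hidden=2
    x = [1.0, 0.5]
    q_ref = rope_interleaved(matvec(w, x), cos[pos], sin[pos])
    q_new = rope_half_split(matvec(deinterleave_rows(w), x), cos[pos], sin[pos])
    # q_new equals q_ref reordered by deinterleave_perm(d).
    print(max(abs(a - q_ref[p]) for a, p in zip(q_new, deinterleave_perm(d))))
```

Doing the permutation once on the weights avoids a per-token reshape/transpose in the forward pass, which also keeps the exported graph simpler.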
Includes hierarchical equivalence tests (MLP, MoE, Attention, Decoder
Layer, Full Model, Export) against standalone HF-faithful reference
implementations.
Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* Address PR review feedback
- Add num_hidden_layers_5.yaml to GLM-5 registry entries for dashboard runs
- Switch MoE gate to torch.ops.trtllm.noaux_tc_op (fused routing kernel)
Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
---------
Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

Parent commit: ee03147
5 files changed
Lines changed: 1449 additions & 2 deletions
Changed directories:
- examples/auto_deploy/model_registry
  - configs
- tensorrt_llm/_torch/auto_deploy/models/custom
- tests/unittest/auto_deploy/singlegpu/models