Commit b7cb88f

[None][feat] Add AutoDeploy custom model for OpenELM family
Onboard the OpenELM architecture (apple/OpenELM-270M/1_1B/3B-Instruct) as a custom AutoDeploy model. This is a heterogeneous transformer with:

- Per-layer varying query/KV head counts (GQA)
- Per-layer varying FFN intermediate sizes
- Fused QKV projection with Q/K normalization
- Shared input/output embeddings (no separate lm_head)
- GLU-style FFN (proj_1 = fused gate+up, proj_2 = down)

Uses canonical AD IR ops: torch_rmsnorm, torch_rope_with_explicit_cos_sin, torch_attention.

Verified numerically equivalent to HF reference (100% top-1 token match, RMSE < 0.05).

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
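The shape bookkeeping implied by the per-layer head counts and fused projections can be sketched as follows. This is a minimal illustration, not code from this commit: the `LayerShape` class, the helper names, and all numeric values are hypothetical and are not taken from any OpenELM config. It only shows how a fused QKV output and a fused gate+up output would be split when the head counts and FFN width differ from layer to layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerShape:
    """Hypothetical per-layer shape description (illustrative values only)."""

    num_q_heads: int
    num_kv_heads: int
    head_dim: int
    ffn_dim: int


def fused_qkv_split(layer: LayerShape) -> tuple[int, int, int]:
    """Return the widths of the Q, K and V slices of a fused QKV output."""
    q_width = layer.num_q_heads * layer.head_dim
    kv_width = layer.num_kv_heads * layer.head_dim
    return q_width, kv_width, kv_width


def fused_gate_up_split(layer: LayerShape) -> tuple[int, int]:
    """Return the widths of the two halves of a fused gate+up output.

    Both halves have the same width, so this sketch does not depend on
    which half comes first in the fused tensor.
    """
    return layer.ffn_dim, layer.ffn_dim


def gqa_group_size(layer: LayerShape) -> int:
    """Number of query heads that share one KV head under GQA."""
    if layer.num_q_heads % layer.num_kv_heads != 0:
        raise ValueError("query heads must be a multiple of KV heads")
    return layer.num_q_heads // layer.num_kv_heads


# Two made-up layers with different head counts and FFN widths.
layers = [
    LayerShape(num_q_heads=12, num_kv_heads=3, head_dim=64, ffn_dim=768),
    LayerShape(num_q_heads=16, num_kv_heads=4, head_dim=64, ffn_dim=2048),
]

for index, layer in enumerate(layers):
    q_width, k_width, v_width = fused_qkv_split(layer)
    print(
        f"layer {index}: fused QKV width = {q_width + k_width + v_width}, "
        f"split = ({q_width}, {k_width}, {v_width}), "
        f"GQA group size = {gqa_group_size(layer)}, "
        f"fused gate+up width = {sum(fused_gate_up_split(layer))}"
    )
```

Because every layer carries its own sizes, a model of this kind has to compute these split widths per layer rather than once for the whole network.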
1 parent 81d6090 commit b7cb88f

3 files changed

Lines changed: 935 additions & 1 deletion

tensorrt_llm/_torch/auto_deploy/models/custom/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -1,11 +1,12 @@
 from .modeling_decilm import DeciLMForCausalLM
 from .modeling_deepseek import DeepSeekV3ForCausalLM
 from .modeling_glm4_moe_lite import Glm4MoeLiteForCausalLM
-from .modeling_hunyuan_dense_v1 import HunYuanDenseV1ForCausalLM
 from .modeling_granite_moe_hybrid import GraniteMoeHybridForCausalLM
+from .modeling_hunyuan_dense_v1 import HunYuanDenseV1ForCausalLM
 from .modeling_kimi_k2 import KimiK2ForCausalLM, KimiK25ForConditionalGeneration
 from .modeling_nemotron_flash import NemotronFlashForCausalLM, NemotronFlashPreTrainedTokenizerFast
 from .modeling_nemotron_h import NemotronHForCausalLM
+from .modeling_openelm import OpenELMForCausalLM
 from .modeling_qwen3 import Qwen3ForCausalLM
 from .modeling_qwen3_5_moe import Qwen3_5MoeForCausalLM, Qwen3_5MoeForConditionalGeneration

@@ -20,6 +21,7 @@
 "NemotronFlashForCausalLM",
 "NemotronFlashPreTrainedTokenizerFast",
 "NemotronHForCausalLM",
+"OpenELMForCausalLM",
 "Qwen3ForCausalLM",
 "Qwen3_5MoeForCausalLM",
 "Qwen3_5MoeForConditionalGeneration",
