Commit b7cb88f
[None][feat] Add AutoDeploy custom model for OpenELM family
Onboard the OpenELM architecture (apple/OpenELM-270M/1_1B/3B-Instruct)
as a custom AutoDeploy model. This is a heterogeneous transformer with:
- Per-layer varying query/KV head counts (GQA)
- Per-layer varying FFN intermediate sizes
- Fused QKV projection with Q/K normalization
- Shared input/output embeddings (no separate lm_head)
- GLU-style FFN (proj_1 = fused gate+up, proj_2 = down)
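The shape consequences of the list above can be sketched in plain Python. This is a minimal illustration, not the code from this commit: `LayerSpec`, `fused_shapes`, `split_qkv`, the `qkv_proj`/`out_proj` keys, the Q-then-K-then-V ordering of the fused output, and all numeric sizes are assumptions made for the example. Only `proj_1` (fused gate+up) and `proj_2` (down) are named in the commit message.

```python
from dataclasses import dataclass


@dataclass
class LayerSpec:
    """Hypothetical per-layer configuration for a heterogeneous transformer."""
    num_q_heads: int
    num_kv_heads: int
    ffn_dim: int


def fused_shapes(spec, model_dim, head_dim):
    """Return (in_features, out_features) for each projection in one layer."""
    # GQA requires every KV head to serve a whole number of query heads.
    if spec.num_q_heads % spec.num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    # One fused projection emits Q, K and V back to back.
    qkv_out = (spec.num_q_heads + 2 * spec.num_kv_heads) * head_dim
    return {
        "qkv_proj": (model_dim, qkv_out),
        "out_proj": (spec.num_q_heads * head_dim, model_dim),
        # GLU-style FFN: proj_1 emits gate and up halves, proj_2 maps back down.
        "proj_1": (model_dim, 2 * spec.ffn_dim),
        "proj_2": (spec.ffn_dim, model_dim),
    }


def split_qkv(row, spec, head_dim):
    """Split one fused QKV output row into Q, K, V (assumed ordering)."""
    q_end = spec.num_q_heads * head_dim
    k_end = q_end + spec.num_kv_heads * head_dim
    return row[:q_end], row[q_end:k_end], row[k_end:]


# Illustrative sizes only; they are not taken from any OpenELM checkpoint.
MODEL_DIM, HEAD_DIM = 1280, 64
layers = [LayerSpec(12, 3, 768), LayerSpec(16, 4, 1024), LayerSpec(20, 5, 1536)]

for index, spec in enumerate(layers):
    print(index, fused_shapes(spec, MODEL_DIM, HEAD_DIM))
```

Because the head counts and FFN widths differ per layer, each layer needs its own spec rather than a single model-wide value, which is why the sketch takes a `LayerSpec` per layer.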
Uses canonical AD IR ops: torch_rmsnorm, torch_rope_with_explicit_cos_sin,
torch_attention. Verified numerically equivalent to HF reference (100%
top-1 token match, RMSE < 0.05).
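The two metrics quoted above (top-1 token match and RMSE) can be computed as in the following sketch. It assumes the comparison is made over per-position logits from the HF reference and the custom model; the function names and the toy logits are hypothetical and are not taken from the test file added in this commit.

```python
import math


def argmax(values):
    """Index of the largest value (first occurrence on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def top1_match_rate(ref_logits, test_logits):
    """Fraction of positions where both models pick the same top-1 token."""
    matches = sum(
        1
        for ref_row, test_row in zip(ref_logits, test_logits)
        if argmax(ref_row) == argmax(test_row)
    )
    return matches / len(ref_logits)


def rmse(ref_logits, test_logits):
    """Root-mean-square error over every logit entry."""
    squared = [
        (r - t) ** 2
        for ref_row, test_row in zip(ref_logits, test_logits)
        for r, t in zip(ref_row, test_row)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy logits for two positions over a three-token vocabulary (made up).
reference = [[0.10, 2.00, -1.00], [1.50, 0.20, 0.30]]
candidate = [[0.12, 1.97, -1.01], [1.48, 0.21, 0.33]]

print("top-1 match:", top1_match_rate(reference, candidate))
print("rmse:", rmse(reference, candidate))
```

A check in the spirit of the commit message would then require a match rate of 1.0 and an RMSE below 0.05.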
Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
1 parent 81d6090 commit b7cb88f
3 files changed
Lines changed: 935 additions & 1 deletion
File tree
- tensorrt_llm/_torch/auto_deploy/models/custom
- tests/unittest/auto_deploy/singlegpu/models