Commit 36e3724
[None][feat] Add AutoDeploy custom model for OpenELM family
Onboard the OpenELM architecture (apple/OpenELM-270M-Instruct, apple/OpenELM-1_1B-Instruct, apple/OpenELM-3B-Instruct)
as a custom AutoDeploy model. OpenELM is a heterogeneous transformer with:
- Per-layer varying query/KV head counts (GQA)
- Per-layer varying FFN intermediate sizes
- Fused QKV projection with Q/K normalization
- Shared input/output embeddings (no separate lm_head)
- GLU-style FFN (proj_1 = fused gate+up, proj_2 = down)
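The per-layer head counts and FFN sizes come from OpenELM's layer-wise scaling, where width multipliers change linearly from the first layer to the last. The sketch below shows how such multipliers could turn into per-layer shapes, including the output widths of the fused QKV projection and the fused gate+up `proj_1`. The rounding rule, the FFN divisor of 256, and every function and field name are assumptions for illustration. This is not the code in this commit or in the checkpoint's remote config.

```python
from dataclasses import dataclass


def linspace(start: float, stop: float, num: int) -> list[float]:
    # Evenly spaced values from start to stop, inclusive (like numpy.linspace).
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def round_to_multiple(value: float, divisor: int) -> int:
    # Hypothetical rounding: half up to the nearest multiple, never below divisor.
    return max(divisor, int(value / divisor + 0.5) * divisor)


@dataclass
class LayerShape:
    num_q_heads: int
    num_kv_heads: int
    qkv_out_features: int     # output width of the fused QKV projection
    ffn_intermediate: int
    proj_1_out_features: int  # fused gate + up, so twice the intermediate size


def layer_shapes(model_dim, head_dim, num_layers, num_gqa_groups,
                 qkv_mult, ffn_mult, ffn_divisor=256):
    # qkv_mult / ffn_mult are (first_layer, last_layer) multiplier pairs (assumed).
    shapes = []
    for q_m, f_m in zip(linspace(*qkv_mult, num_layers),
                        linspace(*ffn_mult, num_layers)):
        # Keep the query width divisible so KV heads split evenly into GQA groups.
        q_dim = round_to_multiple(model_dim * q_m, head_dim * num_gqa_groups)
        n_q = q_dim // head_dim
        n_kv = n_q // num_gqa_groups
        inter = round_to_multiple(model_dim * f_m, ffn_divisor)
        shapes.append(LayerShape(n_q, n_kv, (n_q + 2 * n_kv) * head_dim,
                                 inter, 2 * inter))
    return shapes


for shape in layer_shapes(1024, 64, 4, 4, (0.5, 1.0), (0.5, 4.0)):
    print(shape)
```

With these toy settings, layers 0 to 3 get 8, 12, 12, and 16 query heads and 2, 3, 3, and 4 KV heads. A model like this cannot assume one attention shape for every layer.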
Uses canonical AD IR ops: torch_rmsnorm, torch_rope_with_explicit_cos_sin,
torch_attention. Config loaded from checkpoint via trust_remote_code=True.
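The AD IR ops named above are not reproduced here, and their signatures are not shown in this commit. As a plain-Python reference for the math they are generally understood to express, here are standard RMSNorm and rotate-half RoPE with precomputed cos/sin. The function names and the `eps` and `theta` defaults are assumptions. Real implementations operate on tensors rather than lists.

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    # Scale x by the reciprocal of its root-mean-square, then by a learned weight.
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def rope_cos_sin(head_dim: int, position: int,
                 theta: float = 10000.0) -> tuple[list[float], list[float]]:
    # Precompute cos/sin for one position; the frequency list repeats for both halves.
    half = head_dim // 2
    freqs = [position / theta ** (2 * i / head_dim) for i in range(half)]
    angles = freqs + freqs
    return [math.cos(a) for a in angles], [math.sin(a) for a in angles]


def apply_rope(x: list[float], cos: list[float], sin: list[float]) -> list[float]:
    # Rotate-half RoPE: x * cos + rotate_half(x) * sin, with cos/sin passed in explicitly.
    half = len(x) // 2
    rotated = [-v for v in x[half:]] + x[:half]
    return [xi * c + ri * s for xi, ri, c, s in zip(x, rotated, cos, sin)]
```

Passing cos/sin explicitly keeps the position-dependent tables outside the op. A Q/K-normalized attention block would apply `rms_norm` to each query and key head and then `apply_rope` with the same tables.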
Updated openelm.yaml to use attn_backend=flashinfer, because the trtllm backend produces
degenerate output for OpenELM. Works with torch-cudagraph and with the default batch
settings from dashboard_default.yaml.
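The commit pairs a per-model openelm.yaml, which sets attn_backend=flashinfer, with default batch settings from dashboard_default.yaml. The commit does not show how the model registry combines the two files. The sketch below assumes a recursive dict merge in which model-specific keys win. Apart from `attn_backend: flashinfer`, the keys and values in both stand-in dicts are hypothetical.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    # Return a new dict: override's values win, nested dicts merge recursively,
    # and neither input is mutated.
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical stand-ins for the parsed YAML files.
dashboard_default = {"max_batch_size": 64, "compile_backend": "torch-cudagraph"}
openelm = {"attn_backend": "flashinfer"}

effective = deep_merge(dashboard_default, openelm)
print(effective)
```

Under this assumption, the model file only needs to state what differs from the defaults, here the attention backend.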
All 3 variants produce coherent generation via build_and_run_ad.py.
Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Parent: fcdea57
4 files changed, 874 additions, 2 deletions
Changed directories:
- examples/auto_deploy/model_registry/configs
- tensorrt_llm/_torch/auto_deploy/models/custom
- tests/unittest/auto_deploy/singlegpu/models