Commit dad10d9
[None][feat] Add AutoDeploy custom model for OpenELM family (#198)
Onboard the OpenELM architecture (apple/OpenELM-270M/1_1B/3B-Instruct)
as a custom AutoDeploy model. This is a heterogeneous transformer with:
- Per-layer varying query/KV head counts (GQA)
- Per-layer varying FFN intermediate sizes
- Fused QKV projection with Q/K normalization
- Shared input/output embeddings (no separate lm_head)
- GLU-style FFN (proj_1 = fused gate+up, proj_2 = down)
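The per-layer shape bookkeeping implied by the list above can be sketched in plain Python. This is a minimal illustration, not the code added by this commit: the `LayerShape` class, its field names, and the example head counts and FFN sizes are all hypothetical, and it assumes the fused QKV weight stacks Q, K, and V heads with an equal number of K and V heads.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LayerShape:
    """Hypothetical per-layer shape record for a heterogeneous transformer."""

    num_query_heads: int
    num_kv_heads: int
    head_dim: int
    ffn_intermediate: int

    @property
    def qkv_out_features(self) -> int:
        # Fused QKV projection: Q heads plus K heads plus V heads (assumed K == V count).
        return (self.num_query_heads + 2 * self.num_kv_heads) * self.head_dim

    def qkv_split_sizes(self) -> Tuple[int, int, int]:
        # Sizes used to slice the fused projection output into Q, K, V.
        q = self.num_query_heads * self.head_dim
        kv = self.num_kv_heads * self.head_dim
        return (q, kv, kv)

    @property
    def gqa_group_size(self) -> int:
        # GQA: each KV head is shared by this many query heads.
        if self.num_query_heads % self.num_kv_heads != 0:
            raise ValueError("query heads must be a multiple of KV heads")
        return self.num_query_heads // self.num_kv_heads

    @property
    def proj_1_out_features(self) -> int:
        # GLU-style FFN: proj_1 fuses gate and up, so it is twice the intermediate size.
        return 2 * self.ffn_intermediate


# Illustrative values only; real values come from the checkpoint config.
layers = [
    LayerShape(num_query_heads=12, num_kv_heads=3, head_dim=64, ffn_intermediate=768),
    LayerShape(num_query_heads=16, num_kv_heads=4, head_dim=64, ffn_intermediate=1024),
]

for i, layer in enumerate(layers):
    print(i, layer.qkv_out_features, layer.qkv_split_sizes(), layer.proj_1_out_features)
```

Because every layer carries its own record, nothing downstream can assume a single model-wide head count or FFN width.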
Uses canonical AD IR ops: torch_rmsnorm, torch_rope_with_explicit_cos_sin,
torch_attention. Config loaded from checkpoint via trust_remote_code=True.
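For readers unfamiliar with rotary embeddings driven by explicit cos/sin tables, the following pure-Python sketch shows the standard rotate-half formulation on a single head vector. It illustrates the math only: the helper names are hypothetical, the base of 10000 is an assumed default, and the actual signature and layout of torch_rope_with_explicit_cos_sin are not shown in this commit.

```python
import math
from typing import List, Tuple


def rope_cos_sin(position: int, head_dim: int, base: float = 10000.0) -> Tuple[List[float], List[float]]:
    """Build cos/sin tables for one position (hypothetical helper)."""
    half = head_dim // 2
    angles = [position / (base ** (2 * i / head_dim)) for i in range(half)]
    # Each table is duplicated so it lines up with the rotate-half layout.
    cos = [math.cos(a) for a in angles] * 2
    sin = [math.sin(a) for a in angles] * 2
    return cos, sin


def rotate_half(x: List[float]) -> List[float]:
    """Map [x1, x2] to [-x2, x1], where x1 and x2 are the two halves of x."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def apply_rope(x: List[float], cos: List[float], sin: List[float]) -> List[float]:
    """Apply the rotation x * cos + rotate_half(x) * sin elementwise."""
    rotated = rotate_half(x)
    return [xi * c + ri * s for xi, ri, c, s in zip(x, rotated, cos, sin)]


cos, sin = rope_cos_sin(position=1, head_dim=4)
out = apply_rope([1.0, 0.0, 0.0, 0.0], cos, sin)
print(out)
```

Passing cos and sin in explicitly, rather than recomputing them inside the op, lets the same tables be reused for both the query and key tensors.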
Updated openelm.yaml to set attn_backend=flashinfer, because the trtllm
backend produces degenerate output for OpenELM. Works with torch-cudagraph
and the default batch settings from dashboard_default.yaml.
All 3 variants produce coherent generation via build_and_run_ad.py.
Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
1 parent b868c10 commit dad10d9
4 files changed
Lines changed: 874 additions & 2 deletions
File tree
- examples/auto_deploy/model_registry/configs
- tensorrt_llm/_torch/auto_deploy/models/custom
- tests/unittest/auto_deploy/singlegpu/models