[None][feat] Add AD custom model for InternLM3 family #222
lucaslie left a comment:
please rebase and post RAW LOGS from running the build_and_run_ad with model registry - specifically ALL PROMPTS AND OUTPUTS
Add a lean prefill-only custom model implementation for the InternLM3 architecture (GQA + SwiGLU MLP + RMSNorm + dynamic NTK-scaled RoPE) using AutoDeploy canonical ops (torch_attention, torch_rmsnorm, torch_rope_with_explicit_cos_sin). Includes hierarchical equivalence tests (block, layer, full model, export) and bundles a minimal InternLM3Config since the model is not natively in transformers.
Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
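
To make the "dynamic NTK-scaled RoPE" term in the commit message concrete, here is a minimal pure-Python sketch of the commonly used dynamic NTK rule: the rotary base is enlarged once the sequence length exceeds the trained context length. The function names, default values, and the stdlib-only formulation are assumptions for illustration; this is not the PR's modeling code and the defaults are not InternLM3's actual config values.

```python
import math


def dynamic_ntk_inv_freq(head_dim, seq_len, base=10000.0,
                         max_position_embeddings=2048, factor=2.0):
    """Return RoPE inverse frequencies for one head (hypothetical helper).

    When seq_len exceeds max_position_embeddings, the base is rescaled
    following the dynamic NTK rule; otherwise plain RoPE is used.
    """
    if seq_len > max_position_embeddings:
        scale = (factor * seq_len / max_position_embeddings) - (factor - 1)
        base = base * scale ** (head_dim / (head_dim - 2))
    # One frequency per pair of channels.
    return [1.0 / (base ** (i / head_dim)) for i in range(0, head_dim, 2)]


def rope_cos_sin(position, inv_freq):
    """Compute explicit cos/sin values for a single token position."""
    angles = [position * f for f in inv_freq]
    return [math.cos(a) for a in angles], [math.sin(a) for a in angles]


# Illustrative values only: a short and a long sequence with a tiny head.
short = dynamic_ntk_inv_freq(8, seq_len=16, max_position_embeddings=32)
long_ = dynamic_ntk_inv_freq(8, seq_len=64, max_position_embeddings=32)
cos0, sin0 = rope_cos_sin(0, short)
```

For sequences within the trained context the frequencies match plain RoPE; beyond it, every non-constant frequency shrinks, which stretches the rotation period instead of extrapolating to unseen angles.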
Remove the bundled InternLM3Config from the modeling file. The AD pipeline loads the config from the HF checkpoint via trust_remote_code=True (same pattern as DeciLM). The test file now loads InternLM3Config dynamically from the HF cache. Inline HF reference classes are kept because the HF modeling_internlm3.py cannot be imported on the installed transformers version (requires LossKwargs from transformers >=4.48).
Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
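
The version constraint mentioned above (transformers >=4.48) can be checked before deciding whether to import an upstream reference or fall back to inline classes. The sketch below is one possible stdlib-only way to express that check; the helper names are hypothetical and it is not taken from the PR's test file.

```python
from importlib import metadata


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '4.48.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def hf_reference_importable(installed_version, minimum=(4, 48)):
    """Return True if the installed version meets the assumed minimum."""
    return parse_major_minor(installed_version) >= minimum


def installed_transformers_version():
    """Return the installed transformers version, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


version = installed_transformers_version()
# Fall back to inline reference classes when the check fails or the
# package is not installed at all.
use_inline_reference = version is None or not hf_reference_importable(version)
```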
[AGENT] Rebased onto
Raw generation logs (all 10 prompts + outputs)
[PROMPT 7] Compiler
[PROMPT 8] Primary Cause: Solar Wind Interactions
[PROMPT 9]
[03/11/2026-22:35:30] [TRT-LLM AUTO-DEPLOY] [RANK 0] [I] Destroying process group
* [None][feat] Add AD custom model for InternLM3 family
  Add a lean prefill-only custom model implementation for the InternLM3 architecture (GQA + SwiGLU MLP + RMSNorm + dynamic NTK-scaled RoPE) using AutoDeploy canonical ops (torch_attention, torch_rmsnorm, torch_rope_with_explicit_cos_sin). Includes hierarchical equivalence tests (block, layer, full model, export) and bundles a minimal InternLM3Config since the model is not natively in transformers.
  Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
  Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
* [None][feat] Address review: remove bundled config, document inline refs
  Remove the bundled InternLM3Config from the modeling file. The AD pipeline loads the config from the HF checkpoint via trust_remote_code=True (same pattern as DeciLM). The test file now loads InternLM3Config dynamically from the HF cache. Inline HF reference classes are kept because the HF modeling_internlm3.py cannot be imported on the installed transformers version (requires LossKwargs from transformers >=4.48).
  Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
  Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
---------
Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Summary
* AutoDeploy canonical ops (torch_attention, torch_rmsnorm, torch_rope_with_explicit_cos_sin)
* InternLM3Config, since the model is not natively in transformers (requires trust_remote_code)
Model Details
* internlm/internlm3-8b-instruct (already in model registry with world_size_2)
* bias (MLP) and qkv_bias (QKV projections)
Files Changed
* tensorrt_llm/_torch/auto_deploy/models/custom/modeling_internlm3.py
* tensorrt_llm/_torch/auto_deploy/models/custom/__init__.py: __all__ entry
* tests/unittest/auto_deploy/singlegpu/models/test_internlm3_modeling.py
AutoDeploy End-to-End Results
Reduced layers (2 layers): Compilation succeeded, bad generation (expected with truncated model)
Full layers (48 layers): Compilation succeeded, excellent coherent generation across all 10 test prompts
Reproduce
# Full model run (requires 2 GPUs)
python examples/auto_deploy/build_and_run_ad.py --model internlm/internlm3-8b-instruct --use-registry
Unit Tests
Test plan
/bot run
🤖 Generated with Claude Code