[None][feat] Add AD custom model for InternLM3 family by lucaslie · Pull Request #222 · nv-auto-deploy/TensorRT-LLM

lucaslie · 2026-03-12T05:38:00Z

Summary

Add lean prefill-only custom model for InternLM3 architecture (GQA + SwiGLU MLP + RMSNorm + dynamic NTK-scaled RoPE) using AutoDeploy canonical ops (torch_attention, torch_rmsnorm, torch_rope_with_explicit_cos_sin)
Bundle minimal InternLM3Config since the model is not natively in transformers (requires trust_remote_code)
Add hierarchical equivalence tests (RMSNorm, MLP, Attention, DecoderLayer, full model, export)

Model Details

Architecture: Dense GQA transformer, 32 Q heads / 2 KV heads, SwiGLU MLP, RMSNorm, dynamic NTK-scaled RoPE (factor=6.0)
Family members: internlm/internlm3-8b-instruct (already in model registry with world_size_2)
Config flags: Separate bias (MLP) and qkv_bias (QKV projections)

Files Changed

File	Change
`tensorrt_llm/_torch/auto_deploy/models/custom/modeling_internlm3.py`	New
`tensorrt_llm/_torch/auto_deploy/models/custom/__init__.py`	Added import + `__all__` entry
`tests/unittest/auto_deploy/singlegpu/models/test_internlm3_modeling.py`	New

AutoDeploy End-to-End Results

Reduced layers (2 layers): Compilation succeeded, bad generation (expected with truncated model)
Full layers (48 layers): Compilation succeeded, excellent coherent generation across all 10 test prompts

Reproduce

# Full model run (requires 2 GPUs)
python examples/auto_deploy/build_and_run_ad.py --model internlm/internlm3-8b-instruct --use-registry

Unit Tests

pytest tests/unittest/auto_deploy/singlegpu/models/test_internlm3_modeling.py -v

Test plan

Hierarchical equivalence tests (block → layer → model → export) all pass
Full model AD end-to-end run with coherent generation
CI tests via /bot run

🤖 Generated with Claude Code

lucaslie

please rebase and post RAW LOGS from running the build_and_run_ad with model registry - specifically ALL PROMPTS AND OUTPUTS

Add a lean prefill-only custom model implementation for the InternLM3 architecture (GQA + SwiGLU MLP + RMSNorm + dynamic NTK-scaled RoPE) using AutoDeploy canonical ops (torch_attention, torch_rmsnorm, torch_rope_with_explicit_cos_sin). Includes hierarchical equivalence tests (block, layer, full model, export) and bundles a minimal InternLM3Config since the model is not natively in transformers. Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

Remove the bundled InternLM3Config from the modeling file. The AD pipeline loads the config from the HF checkpoint via trust_remote_code=True (same pattern as DeciLM). The test file now loads InternLM3Config dynamically from the HF cache. Inline HF reference classes are kept because the HF modeling_internlm3.py cannot be imported on the installed transformers version (requires LossKwargs from transformers >=4.48). Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

lucaslie · 2026-03-13T01:54:25Z

[AGENT] Rebased onto feat/paperclip_maximizer. Here are the raw logs from build_and_run_ad.py --model internlm/internlm3-8b-instruct --use-registry (full 48 layers, world_size=2):

Raw generation logs (all 10 prompts + outputs)

[03/11/2026-22:35:29] [TRT-LLM AUTO-DEPLOY] [I] Running example prompts...
Processed requests: 100%|██████████| 10/10 [00:00<00:00, 10.50it/s]

[PROMPT 0] <s><|im_start|>user
How big is the universe? <|im_end|>
<|im_start|>assistant
: Determining the size of the universe is a complex and ongoing area of research in cosmology, and it's a topic that continues to evolve with new scientific discoveries. Here's a concise overview of our current understanding:

### Current Estimates and Recommendations

1. **Observable Universe:**
   - **Size:** The observable universe is the part of the universe that we can theoretically observe, given the time light has had to travel

[PROMPT 1] <s><|im_start|>user
In simple words and a single sentence, explain the concept of gravity: <|im_end|>
<|im_start|>assistant
: Gravity is a force that pulls all objects towards each other, causing things to fall and keeping planets in orbit around the Sun.

[PROMPT 2] <s><|im_start|>user
How to fix slicing in golf? <|im_end|>
<|im_start|>assistant
: Slicing, or slicing the golf ball to the right (for right-handed golfers) or left (for left-handed golfers), can be challenging to correct but is definitely manageable with the right techniques and practice. Here's a comprehensive guide to help you fix your slice:

### 1. **Analyze Your Swing and Set-Up**
- **Analyze Your Swing:** Start by identifying the root cause of your slice. Possible cul

[PROMPT 3] <s><|im_start|>user
Where is the capital of Iceland? <|im_end|>
<|im_start|>assistant
: The capital of Iceland is Reykjavík. Located on the southwestern coast of the country, on a quiet bay within the so-called Golden Circle, Reykjavík serves as the nation's political, cultural, economic, and social hub. While Iceland is a country with a single official capital, Reykjavík is not just a city but a major urban area encompassing the Greater Reykjavík area, which includes boroughs such

[PROMPT 4] <s><|im_start|>user
What are the three laws of thermodynamics? <|im_end|>
<|im_start|>assistant
: The three laws of thermodynamics are fundamental principles that describe the behavior of energy in a system. Each law builds upon the one before it, providing a deeper understanding of thermal and thermophysical processes. Here are the three laws of thermodynamics:

### 1. **Zeroth Law of Thermodynamics**
**Statement:** If two systems are each in thermal equilibrium with a third system, then they are in thermal equilibrium with each other.
**Implication:** This law introduces the concept of temperature

[PROMPT 5] <s><|im_start|>user
Summarize the plot of Romeo and Juliet in two sentences: <|im_end|>
<|im_start|>assistant
: In "Romeo and Juliet," the romantic tragedy unfolds as the young star-crossed lovers, Romeo Montague and Juliet Capulet, fall in love despite their families' bitter feud. Their passionate relationship ends in heartbreaking tragedy when their secret union is shattered by the deaths of both Romeo and Juliet, leading to the ultimate reconciliation of the Montague and Capulet families.

[PROMPT 6] <s><|im_start|>user
Write a Python function that checks if a number is prime.<|im_end|>
<|im_start|>assistant
: Here is a simple Python function that checks if a number is prime:

```python
def is_prime(num):
    # 1 and numbers less than 1 are not prime
    if num <= 1:
        return False
    # 2 is the only even prime number
    elif num == 2:
        return True
    # all other even numbers are not primes
    elif num % 2 == 0:
        return False

[PROMPT 7] <|im_start|>user
Explain the difference between a compiler and an interpreter: <|im_end|>
<|im_start|>assistant
: Certainly! Both compilers and interpreters are software tools used to convert high-level programming languages into machine code that can be executed by a computer, but they operate in fundamentally different ways. Here's a detailed explanation of their differences:

Compiler

Preprocessing:

The compiler reads the entire source code of the program and performs various preprocessing steps, such as macro expansion, header file inclusion, and removal of comments.

[PROMPT 8] <|im_start|>user
What causes the northern lights? <|im_end|>
<|im_start|>assistant
: The Northern Lights, also known as the Aurora Borealis, are a natural luminous display that occurs in the Earth's sky predominantly in high-latitude regions, regions around the Arctic Circle. The most common causes of the Northern Lights can be attributed to two main processes involving electromagnetic radiation from the Sun and the Earth's magnetic field.

Primary Cause: Solar Wind Interactions

Solar Wind: The Sun continuously emits a stream of

[PROMPT 9] <|im_start|>user
What are the health benefits of drinking green tea?<|im_end|>
<|im_start|>assistant
: Drinking green tea is associated with a wide range of health benefits, largely due to its rich antioxidant content, particularly catechins such as epigallocatechin gallate (EGCG). Here are some of the key health benefits of green tea:

Antioxidant Protection: Green tea is one of the richest sources of antioxidants, including catechins and polyphenols. These compounds help neutralize harmful free radicals that

[03/11/2026-22:35:30] [TRT-LLM AUTO-DEPLOY] [RANK 0] [I] Destroying process group
[03/11/2026-22:35:30] [TRT-LLM AUTO-DEPLOY] [RANK 1] [I] Destroying process group

</details> ### Reproduce ```bash python examples/auto_deploy/build_and_run_ad.py --model internlm/internlm3-8b-instruct --use-registry

* [None][feat] Add AD custom model for InternLM3 family Add a lean prefill-only custom model implementation for the InternLM3 architecture (GQA + SwiGLU MLP + RMSNorm + dynamic NTK-scaled RoPE) using AutoDeploy canonical ops (torch_attention, torch_rmsnorm, torch_rope_with_explicit_cos_sin). Includes hierarchical equivalence tests (block, layer, full model, export) and bundles a minimal InternLM3Config since the model is not natively in transformers. Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> * [None][feat] Address review: remove bundled config, document inline refs Remove the bundled InternLM3Config from the modeling file. The AD pipeline loads the config from the HF checkpoint via trust_remote_code=True (same pattern as DeciLM). The test file now loads InternLM3Config dynamically from the HF cache. Inline HF reference classes are kept because the HF modeling_internlm3.py cannot be imported on the installed transformers version (requires LossKwargs from transformers >=4.48). Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> --------- Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

github-actions Bot assigned lucaslie Mar 12, 2026

govind-ramnarayan reviewed Mar 12, 2026

View reviewed changes

Comment thread tensorrt_llm/_torch/auto_deploy/models/custom/modeling_internlm3.py Outdated

govind-ramnarayan reviewed Mar 12, 2026

View reviewed changes

Comment thread tests/unittest/auto_deploy/singlegpu/models/test_internlm3_modeling.py

lucaslie commented Mar 13, 2026

View reviewed changes

lucaslie added 2 commits March 12, 2026 18:53

lucaslie force-pushed the ll/pcm_119 branch from caad5ac to 37fc605 Compare March 13, 2026 01:53

lucaslie merged commit cb8dbb2 into feat/paperclip_maximizer Mar 13, 2026
1 of 3 checks passed

lucaslie mentioned this pull request Mar 13, 2026

[None][feat] Add AutoDeploy custom model for InternLM3 #218

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[None][feat] Add AD custom model for InternLM3 family#222

[None][feat] Add AD custom model for InternLM3 family#222
lucaslie merged 2 commits into
feat/paperclip_maximizerfrom
ll/pcm_119

lucaslie commented Mar 12, 2026

Uh oh!

Uh oh!

Uh oh!

lucaslie left a comment

Uh oh!

lucaslie commented Mar 13, 2026

Compiler

Primary Cause: Solar Wind Interactions

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

lucaslie commented Mar 12, 2026

Summary

Model Details

Files Changed

AutoDeploy End-to-End Results

Reproduce

Unit Tests

Test plan

Uh oh!

Uh oh!

Uh oh!

lucaslie left a comment

Choose a reason for hiding this comment

Uh oh!

lucaslie commented Mar 13, 2026

Compiler

Primary Cause: Solar Wind Interactions

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants