Skip to content

[None][feat] Add AD custom model for InternLM3 family#222

Merged
lucaslie merged 2 commits into
feat/paperclip_maximizerfrom
ll/pcm_119
Mar 13, 2026
Merged

[None][feat] Add AD custom model for InternLM3 family#222
lucaslie merged 2 commits into
feat/paperclip_maximizerfrom
ll/pcm_119

Conversation

@lucaslie

Copy link
Copy Markdown

Summary

  • Add lean prefill-only custom model for InternLM3 architecture (GQA + SwiGLU MLP + RMSNorm + dynamic NTK-scaled RoPE) using AutoDeploy canonical ops (torch_attention, torch_rmsnorm, torch_rope_with_explicit_cos_sin)
  • Bundle minimal InternLM3Config since the model is not natively in transformers (requires trust_remote_code)
  • Add hierarchical equivalence tests (RMSNorm, MLP, Attention, DecoderLayer, full model, export)

Model Details

  • Architecture: Dense GQA transformer, 32 Q heads / 2 KV heads, SwiGLU MLP, RMSNorm, dynamic NTK-scaled RoPE (factor=6.0)
  • Family members: internlm/internlm3-8b-instruct (already in model registry with world_size_2)
  • Config flags: Separate bias (MLP) and qkv_bias (QKV projections)

Files Changed

File Change
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_internlm3.py New
tensorrt_llm/_torch/auto_deploy/models/custom/__init__.py Added import + __all__ entry
tests/unittest/auto_deploy/singlegpu/models/test_internlm3_modeling.py New

AutoDeploy End-to-End Results

Reduced layers (2 layers): Compilation succeeded, bad generation (expected with truncated model)
Full layers (48 layers): Compilation succeeded, excellent coherent generation across all 10 test prompts

Reproduce

# Full model run (requires 2 GPUs)
python examples/auto_deploy/build_and_run_ad.py --model internlm/internlm3-8b-instruct --use-registry

Unit Tests

pytest tests/unittest/auto_deploy/singlegpu/models/test_internlm3_modeling.py -v

Test plan

  • Hierarchical equivalence tests (block → layer → model → export) all pass
  • Full model AD end-to-end run with coherent generation
  • CI tests via /bot run

🤖 Generated with Claude Code

Comment thread tensorrt_llm/_torch/auto_deploy/models/custom/modeling_internlm3.py Outdated

@lucaslie lucaslie left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please rebase and post RAW LOGS from running the build_and_run_ad with model registry - specifically ALL PROMPTS AND OUTPUTS

Add a lean prefill-only custom model implementation for the InternLM3
architecture (GQA + SwiGLU MLP + RMSNorm + dynamic NTK-scaled RoPE)
using AutoDeploy canonical ops (torch_attention, torch_rmsnorm,
torch_rope_with_explicit_cos_sin).

Includes hierarchical equivalence tests (block, layer, full model,
export) and bundles a minimal InternLM3Config since the model is not
natively in transformers.

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Remove the bundled InternLM3Config from the modeling file. The AD
pipeline loads the config from the HF checkpoint via trust_remote_code=True
(same pattern as DeciLM).

The test file now loads InternLM3Config dynamically from the HF cache.
Inline HF reference classes are kept because the HF modeling_internlm3.py
cannot be imported on the installed transformers version (requires
LossKwargs from transformers >=4.48).

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
@lucaslie

Copy link
Copy Markdown
Author

[AGENT] Rebased onto feat/paperclip_maximizer. Here are the raw logs from build_and_run_ad.py --model internlm/internlm3-8b-instruct --use-registry (full 48 layers, world_size=2):

Raw generation logs (all 10 prompts + outputs)
[03/11/2026-22:35:29] [TRT-LLM AUTO-DEPLOY] [I] Running example prompts...
Processed requests: 100%|██████████| 10/10 [00:00<00:00, 10.50it/s]

[PROMPT 0] <s><|im_start|>user
How big is the universe? <|im_end|>
<|im_start|>assistant
: Determining the size of the universe is a complex and ongoing area of research in cosmology, and it's a topic that continues to evolve with new scientific discoveries. Here's a concise overview of our current understanding:

### Current Estimates and Recommendations

1. **Observable Universe:**
   - **Size:** The observable universe is the part of the universe that we can theoretically observe, given the time light has had to travel

[PROMPT 1] <s><|im_start|>user
In simple words and a single sentence, explain the concept of gravity: <|im_end|>
<|im_start|>assistant
: Gravity is a force that pulls all objects towards each other, causing things to fall and keeping planets in orbit around the Sun.

[PROMPT 2] <s><|im_start|>user
How to fix slicing in golf? <|im_end|>
<|im_start|>assistant
: Slicing, or slicing the golf ball to the right (for right-handed golfers) or left (for left-handed golfers), can be challenging to correct but is definitely manageable with the right techniques and practice. Here's a comprehensive guide to help you fix your slice:

### 1. **Analyze Your Swing and Set-Up**
- **Analyze Your Swing:** Start by identifying the root cause of your slice. Possible cul

[PROMPT 3] <s><|im_start|>user
Where is the capital of Iceland? <|im_end|>
<|im_start|>assistant
: The capital of Iceland is Reykjavík. Located on the southwestern coast of the country, on a quiet bay within the so-called Golden Circle, Reykjavík serves as the nation's political, cultural, economic, and social hub. While Iceland is a country with a single official capital, Reykjavík is not just a city but a major urban area encompassing the Greater Reykjavík area, which includes boroughs such

[PROMPT 4] <s><|im_start|>user
What are the three laws of thermodynamics? <|im_end|>
<|im_start|>assistant
: The three laws of thermodynamics are fundamental principles that describe the behavior of energy in a system. Each law builds upon the one before it, providing a deeper understanding of thermal and thermophysical processes. Here are the three laws of thermodynamics:

### 1. **Zeroth Law of Thermodynamics**
**Statement:** If two systems are each in thermal equilibrium with a third system, then they are in thermal equilibrium with each other.
**Implication:** This law introduces the concept of temperature

[PROMPT 5] <s><|im_start|>user
Summarize the plot of Romeo and Juliet in two sentences: <|im_end|>
<|im_start|>assistant
: In "Romeo and Juliet," the romantic tragedy unfolds as the young star-crossed lovers, Romeo Montague and Juliet Capulet, fall in love despite their families' bitter feud. Their passionate relationship ends in heartbreaking tragedy when their secret union is shattered by the deaths of both Romeo and Juliet, leading to the ultimate reconciliation of the Montague and Capulet families.

[PROMPT 6] <s><|im_start|>user
Write a Python function that checks if a number is prime.<|im_end|>
<|im_start|>assistant
: Here is a simple Python function that checks if a number is prime:

```python
def is_prime(num):
    # 1 and numbers less than 1 are not prime
    if num <= 1:
        return False
    # 2 is the only even prime number
    elif num == 2:
        return True
    # all other even numbers are not primes
    elif num % 2 == 0:
        return False

[PROMPT 7] <|im_start|>user
Explain the difference between a compiler and an interpreter: <|im_end|>
<|im_start|>assistant
: Certainly! Both compilers and interpreters are software tools used to convert high-level programming languages into machine code that can be executed by a computer, but they operate in fundamentally different ways. Here's a detailed explanation of their differences:

Compiler

  1. Preprocessing:
    • The compiler reads the entire source code of the program and performs various preprocessing steps, such as macro expansion, header file inclusion, and removal of comments.

[PROMPT 8] <|im_start|>user
What causes the northern lights? <|im_end|>
<|im_start|>assistant
: The Northern Lights, also known as the Aurora Borealis, are a natural luminous display that occurs in the Earth's sky predominantly in high-latitude regions, regions around the Arctic Circle. The most common causes of the Northern Lights can be attributed to two main processes involving electromagnetic radiation from the Sun and the Earth's magnetic field.

Primary Cause: Solar Wind Interactions

  1. Solar Wind: The Sun continuously emits a stream of

[PROMPT 9] <|im_start|>user
What are the health benefits of drinking green tea?<|im_end|>
<|im_start|>assistant
: Drinking green tea is associated with a wide range of health benefits, largely due to its rich antioxidant content, particularly catechins such as epigallocatechin gallate (EGCG). Here are some of the key health benefits of green tea:

  1. Antioxidant Protection: Green tea is one of the richest sources of antioxidants, including catechins and polyphenols. These compounds help neutralize harmful free radicals that

[03/11/2026-22:35:30] [TRT-LLM AUTO-DEPLOY] [RANK 0] [I] Destroying process group
[03/11/2026-22:35:30] [TRT-LLM AUTO-DEPLOY] [RANK 1] [I] Destroying process group


</details>

### Reproduce
```bash
python examples/auto_deploy/build_and_run_ad.py --model internlm/internlm3-8b-instruct --use-registry

@lucaslie lucaslie merged commit cb8dbb2 into feat/paperclip_maximizer Mar 13, 2026
1 of 3 checks passed
bmarimuthu-nv pushed a commit that referenced this pull request Mar 13, 2026
* [None][feat] Add AD custom model for InternLM3 family

Add a lean prefill-only custom model implementation for the InternLM3
architecture (GQA + SwiGLU MLP + RMSNorm + dynamic NTK-scaled RoPE)
using AutoDeploy canonical ops (torch_attention, torch_rmsnorm,
torch_rope_with_explicit_cos_sin).

Includes hierarchical equivalence tests (block, layer, full model,
export) and bundles a minimal InternLM3Config since the model is not
natively in transformers.

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

* [None][feat] Address review: remove bundled config, document inline refs

Remove the bundled InternLM3Config from the modeling file. The AD
pipeline loads the config from the HF checkpoint via trust_remote_code=True
(same pattern as DeciLM).

The test file now loads InternLM3Config dynamically from the HF cache.
Inline HF reference classes are kept because the HF modeling_internlm3.py
cannot be imported on the installed transformers version (requires
LossKwargs from transformers >=4.48).

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

---------

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
bmarimuthu-nv pushed a commit that referenced this pull request Mar 13, 2026
* [None][feat] Add AD custom model for InternLM3 family

Add a lean prefill-only custom model implementation for the InternLM3
architecture (GQA + SwiGLU MLP + RMSNorm + dynamic NTK-scaled RoPE)
using AutoDeploy canonical ops (torch_attention, torch_rmsnorm,
torch_rope_with_explicit_cos_sin).

Includes hierarchical equivalence tests (block, layer, full model,
export) and bundles a minimal InternLM3Config since the model is not
natively in transformers.

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

* [None][feat] Address review: remove bundled config, document inline refs

Remove the bundled InternLM3Config from the modeling file. The AD
pipeline loads the config from the HF checkpoint via trust_remote_code=True
(same pattern as DeciLM).

The test file now loads InternLM3Config dynamically from the HF cache.
Inline HF reference classes are kept because the HF modeling_internlm3.py
cannot be imported on the installed transformers version (requires
LossKwargs from transformers >=4.48).

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

---------

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
bmarimuthu-nv pushed a commit that referenced this pull request Mar 14, 2026
* [None][feat] Add AD custom model for InternLM3 family

Add a lean prefill-only custom model implementation for the InternLM3
architecture (GQA + SwiGLU MLP + RMSNorm + dynamic NTK-scaled RoPE)
using AutoDeploy canonical ops (torch_attention, torch_rmsnorm,
torch_rope_with_explicit_cos_sin).

Includes hierarchical equivalence tests (block, layer, full model,
export) and bundles a minimal InternLM3Config since the model is not
natively in transformers.

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

* [None][feat] Address review: remove bundled config, document inline refs

Remove the bundled InternLM3Config from the modeling file. The AD
pipeline loads the config from the HF checkpoint via trust_remote_code=True
(same pattern as DeciLM).

The test file now loads InternLM3Config dynamically from the HF cache.
Inline HF reference classes are kept because the HF modeling_internlm3.py
cannot be imported on the installed transformers version (requires
LossKwargs from transformers >=4.48).

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

---------

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
bmarimuthu-nv pushed a commit that referenced this pull request Mar 18, 2026
* [None][feat] Add AD custom model for InternLM3 family

Add a lean prefill-only custom model implementation for the InternLM3
architecture (GQA + SwiGLU MLP + RMSNorm + dynamic NTK-scaled RoPE)
using AutoDeploy canonical ops (torch_attention, torch_rmsnorm,
torch_rope_with_explicit_cos_sin).

Includes hierarchical equivalence tests (block, layer, full model,
export) and bundles a minimal InternLM3Config since the model is not
natively in transformers.

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

* [None][feat] Address review: remove bundled config, document inline refs

Remove the bundled InternLM3Config from the modeling file. The AD
pipeline loads the config from the HF checkpoint via trust_remote_code=True
(same pattern as DeciLM).

The test file now loads InternLM3Config dynamically from the HF cache.
Inline HF reference classes are kept because the HF modeling_internlm3.py
cannot be imported on the installed transformers version (requires
LossKwargs from transformers >=4.48).

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

---------

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
bmarimuthu-nv pushed a commit that referenced this pull request Mar 25, 2026
* [None][feat] Add AD custom model for InternLM3 family

Add a lean prefill-only custom model implementation for the InternLM3
architecture (GQA + SwiGLU MLP + RMSNorm + dynamic NTK-scaled RoPE)
using AutoDeploy canonical ops (torch_attention, torch_rmsnorm,
torch_rope_with_explicit_cos_sin).

Includes hierarchical equivalence tests (block, layer, full model,
export) and bundles a minimal InternLM3Config since the model is not
natively in transformers.

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

* [None][feat] Address review: remove bundled config, document inline refs

Remove the bundled InternLM3Config from the modeling file. The AD
pipeline loads the config from the HF checkpoint via trust_remote_code=True
(same pattern as DeciLM).

The test file now loads InternLM3Config dynamically from the HF cache.
Inline HF reference classes are kept because the HF modeling_internlm3.py
cannot be imported on the installed transformers version (requires
LossKwargs from transformers >=4.48).

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

---------

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
bmarimuthu-nv pushed a commit that referenced this pull request Apr 1, 2026
* [None][feat] Add AD custom model for InternLM3 family

Add a lean prefill-only custom model implementation for the InternLM3
architecture (GQA + SwiGLU MLP + RMSNorm + dynamic NTK-scaled RoPE)
using AutoDeploy canonical ops (torch_attention, torch_rmsnorm,
torch_rope_with_explicit_cos_sin).

Includes hierarchical equivalence tests (block, layer, full model,
export) and bundles a minimal InternLM3Config since the model is not
natively in transformers.

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

* [None][feat] Address review: remove bundled config, document inline refs

Remove the bundled InternLM3Config from the modeling file. The AD
pipeline loads the config from the HF checkpoint via trust_remote_code=True
(same pattern as DeciLM).

The test file now loads InternLM3Config dynamically from the HF cache.
Inline HF reference classes are kept because the HF modeling_internlm3.py
cannot be imported on the installed transformers version (requires
LossKwargs from transformers >=4.48).

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

---------

Signed-off-by: Lucas Liebenwein <lliebenwein@nvidia.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants