[None][feat] Add AutoDeploy custom model for InternLM3 #218
Closed
govind-ramnarayan wants to merge 9 commits into
Force-pushed from 7c178af to b52a2dc.
Author
Unit Test Results: 16/16 passed ✅
Force-pushed from b52a2dc to 1c97346.
Adds a prefill-only AutoDeploy custom model for internlm/internlm3-8b-instruct
(and any InternLM3-family model sharing the same architecture).
Key implementation details:
- Bundles InternLM3Config since it is not part of standard transformers
  (the model uses auto_map; the checkpoint's modeling_internlm3.py imports
  LossKwargs, which is absent from transformers >=4.50)
- Registers the config via AutoConfig.register("internlm3", ..., exist_ok=True)
- Uses canonical AD ops: torch_rmsnorm, torch_rope_with_explicit_cos_sin,
  torch_attention
- GQA (32 Q / 2 KV heads) handled natively by torch_attention, with no repeat_kv
- Dynamic NTK RoPE (factor=6.0) precomputed at init; the full cos/sin table is
  returned from RotaryEmbedding.forward() and sliced per layer in attention
- Prefill-only: no KV cache, no attention mask, no training paths
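The RoPE bullet above can be illustrated with a small, dependency-free sketch. It assumes the dynamic NTK formula commonly used in Hugging Face-style implementations (the base is rescaled only when the sequence exceeds the original context length). The function names and the toy sizes are hypothetical and are not taken from the PR's code.

```python
import math


def dynamic_ntk_inv_freq(head_dim, base, factor, seq_len, max_pos):
    """Inverse frequencies under dynamic NTK scaling (assumed HF-style formula)."""
    # The base is only rescaled once seq_len exceeds the original context length.
    seq_len = max(seq_len, max_pos)
    scale = (factor * seq_len / max_pos) - (factor - 1)
    scaled_base = base * scale ** (head_dim / (head_dim - 2))
    return [1.0 / scaled_base ** (i / head_dim) for i in range(0, head_dim, 2)]


def build_cos_sin_table(head_dim, base, factor, max_pos):
    """Precompute full [max_pos, head_dim] cos/sin tables once, at init time."""
    inv_freq = dynamic_ntk_inv_freq(head_dim, base, factor, max_pos, max_pos)
    cos, sin = [], []
    for pos in range(max_pos):
        angles = [pos * f for f in inv_freq]
        angles = angles + angles  # duplicate the halves, as in rotate-half RoPE
        cos.append([math.cos(a) for a in angles])
        sin.append([math.sin(a) for a in angles])
    return cos, sin


def slice_for_sequence(cos, sin, seq_len):
    """What each attention layer would do: keep the first seq_len rows."""
    return cos[:seq_len], sin[:seq_len]


# Toy sizes for illustration only (not the real model dimensions).
cos, sin = build_cos_sin_table(head_dim=8, base=10000.0, factor=6.0, max_pos=32)
cos_s, sin_s = slice_for_sequence(cos, sin, seq_len=5)
```

Under this assumed formula the scale term equals 1 when the sequence length equals the original context length, so a table built for exactly that length coincides with plain RoPE. Which length the PR's model precomputes for is not stated in the commit message.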
Also adds hierarchical equivalence tests (RMSNorm, MLP, Attention, DecoderLayer,
full model, torch.export) using inline HF reference classes.
Verified with:
    python examples/auto_deploy/build_and_run_ad.py \
        --model internlm/internlm3-8b-instruct --use-registry
Coherent generation confirmed on all 10 prompts with 48 full layers on 2xGPU.
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
… masking

The _RefInternLM3Attention reference implementation used non-causal (full) attention, while the custom InternLM3Attention uses is_causal=True in torch_attention. Replace the manual matmul/softmax with F.scaled_dot_product_attention(is_causal=True) so the reference matches the causal behavior of the AD custom model.

Signed-off-by: Govind Ramnarayan <gramnarayan@nvidia.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
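To see why a non-causal reference cannot match a causal model, here is a minimal pure-Python sketch of single-head scaled dot-product attention with an optional causal restriction. It is an illustration only: the function and the toy inputs are hypothetical, and it does not use torch_attention or F.scaled_dot_product_attention.

```python
import math


def attention(q, k, v, causal):
    """Single-head scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # Causal: position i may only attend to positions 0..i.
        last = i + 1 if causal else len(k)
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(last)]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j][d] for j, w in enumerate(weights)) / total
            for d in range(len(v[0]))
        ])
    return out


q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

full_out = attention(q, k, v, causal=False)
causal_out = attention(q, k, v, causal=True)
```

Only the last position sees the same keys in both modes, so an equivalence test that compares every position would fail for all earlier rows if the reference were left non-causal.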
Force-pushed from 1c97346 to 6b1270e.
- Remove bundled InternLM3Config: AutoConfig.from_pretrained with trust_remote_code=True loads it from the HF snapshot, and the AD factory lookup uses type(config).__name__, which equals "InternLM3Config" in both cases. Also removes keys_to_ignore_at_inference, which is not needed for this prefill-only model.
- Update unit tests to load HF reference classes directly from the local HF snapshot via importlib.util (using a synthetic package to handle relative imports). Stubs LossKwargs in transformers.utils to work around the version mismatch in the installed transformers.

Signed-off-by: Govind Ramnarayan <gramnarayan@nvidia.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
…ons in InternLM3 tests

Remove the HF snapshot path dependency (importlib.util + hardcoded /lustre/.../snapshots/... path) and the LossKwargs stub. Replace them with self-contained inline reference classes (_RefInternLM3Config, _RefInternLM3RMSNorm, _RefInternLM3RotaryEmbedding, _RefInternLM3MLP, _RefInternLM3Attention, _RefInternLM3DecoderLayer, _RefInternLM3ForCausalLM) copied from the HF source. All 16 unit tests continue to pass.

Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
… use

Add InternLM3Config back to modeling_internlm3.py so tests and tooling can instantiate it directly without AutoConfig.from_pretrained or the HuggingFace Hub. The class matches the HF checkpoint config fields (vocab_size, hidden_size, qkv_bias, head_dim, rope_scaling, etc.) and runs rope_config_validation on init.

At inference time the config loaded via trust_remote_code also has __name__ == "InternLM3Config", so the AD factory registration is unaffected. There is no AutoConfig.register call; AutoConfig continues to work out of the box via trust_remote_code.

Update tests to import InternLM3Config from the modeling module directly, removing the standalone _RefInternLM3Config workaround.

Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
InternLM3Config does not belong in the modeling file: the AD factory looks up configs by class name at runtime (from the real checkpoint's trust_remote_code config), so bundling a duplicate config class in the modeling file is unnecessary and misleading. Move it to the test file, where it is only used to instantiate small synthetic configs for unit tests without hitting AutoConfig or the HuggingFace Hub.

Also update all type hints in the modeling file back to PretrainedConfig and restore config_class = PretrainedConfig on InternLM3PreTrainedModel.

Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
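The class-name lookup described in the commits above can be sketched as follows. This is a hypothetical registry written for illustration, not AutoDeploy's actual factory API; every name in it (register_custom_model, lookup_custom_model, the stand-in classes) is an assumption. It shows why two distinct config classes that share the name InternLM3Config resolve to the same custom model.

```python
# Hypothetical registry keyed by the config class *name*, not the class object.
_MODEL_REGISTRY = {}


def register_custom_model(config_cls_name):
    """Decorator that maps a config class name to a custom model class."""
    def decorator(model_cls):
        _MODEL_REGISTRY[config_cls_name] = model_cls
        return model_cls
    return decorator


def lookup_custom_model(config):
    """Resolve the model class from the runtime type name of the config."""
    return _MODEL_REGISTRY.get(type(config).__name__)


@register_custom_model("InternLM3Config")
class InternLM3ForCausalLM:  # stand-in for the real model class
    pass


def make_remote_config_class():
    # Simulates a config class defined by the checkpoint's remote code:
    # a different class object that happens to carry the same name.
    class InternLM3Config:
        pass
    return InternLM3Config


class InternLM3Config:  # stand-in for a test-local config class
    pass


remote_cfg = make_remote_config_class()()
local_cfg = InternLM3Config()
```

Both lookup_custom_model(remote_cfg) and lookup_custom_model(local_cfg) return the same model class here, which is consistent with the commit's point that a duplicate config class in the modeling file adds nothing to the lookup.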
…config)
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>

…needed)
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>

…ther AD custom models)
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
    custom_sd = {}
    for key, value in ref_sd.items():
        if key.startswith("lm_head"):
            custom_sd[key] = value
Author
Is this because the HF checkpoint is missing the lm_head attribute?
merged #222
Summary
- Adds a prefill-only AutoDeploy custom model for internlm/internlm3-8b-instruct (InternLM3 family)
- Bundles InternLM3Config, since the model uses auto_map and is not in standard transformers (the checkpoint's modeling_internlm3.py imports LossKwargs, which is absent in transformers ≥4.50)
- Uses canonical AD ops: torch_rmsnorm, torch_rope_with_explicit_cos_sin, torch_attention; GQA handled natively (no repeat_kv)
- Dynamic NTK RoPE: full cos/sin table returned from RotaryEmbedding.forward(), sliced per-layer in each attention block
- Registers the config via AutoConfig.register("internlm3", ..., exist_ok=True) so the existing registry entry works out of the box

End-to-end verification
Reproducible command:
    python examples/auto_deploy/build_and_run_ad.py --model internlm/internlm3-8b-instruct --use-registry
(Requires 2 GPUs; the model is registered with world_size_2.yaml)

Result: Coherent generation confirmed on all 10 prompts with 48 full layers on 2×GPU (H100/A100). All graph transforms (insert_cached_attention) matched all 48 attention layers. CUDA graphs captured for batch sizes 64, 48, 32, 16, 1.

Unit tests
Tests cover (hierarchical):
- torch_export_to_gm with dynamic batch and sequence dimensions, verified at two input shapes

Test plan
- pytest tests/unittest/auto_deploy/singlegpu/models/test_internlm3_modeling.py -v
- python examples/auto_deploy/build_and_run_ad.py --model internlm/internlm3-8b-instruct --use-registry on 2 GPUs

🤖 Generated with Claude Code