
[model] feat: Add YARN support for mamba_model from MCORE #3289

Draft
guihong-nv wants to merge 1 commit into NVIDIA-NeMo:main from guihong-nv:yarn_to_mamba

Conversation

@guihong-nv guihong-nv commented Apr 12, 2026

What does this PR do ?

Add YaRN support for mamba_model from MCORE.

Changelog

src/megatron/bridge/models/mamba/mamba_builder.py

  • Added "yarn" to the position_embedding_type Literal
  • Added 7 YaRN config fields (yarn_rotary_scaling_factor, yarn_original_max_position_embeddings, yarn_beta_fast, yarn_beta_slow, yarn_mscale, yarn_mscale_all_dim, yarn_correction_range_round_to_int)
  • In build_model(), injects YaRN fields onto the embedded TransformerConfig before constructing MCoreMambaModel
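The build_model() injection step can be sketched roughly as follows. This is a minimal illustration using the seven field names listed in the changelog; `inject_yarn_fields` and the plain-object configs are hypothetical stand-ins, not the actual builder or MCore code:

```python
# Hypothetical sketch of the builder-side YaRN injection described above.
# Field names come from the PR changelog; everything else is assumed.

_YARN_FIELDS = (
    "yarn_rotary_scaling_factor",
    "yarn_original_max_position_embeddings",
    "yarn_beta_fast",
    "yarn_beta_slow",
    "yarn_mscale",
    "yarn_mscale_all_dim",
    "yarn_correction_range_round_to_int",
)

def inject_yarn_fields(builder_cfg, transformer_cfg):
    """Copy YaRN settings from the builder config onto the embedded
    TransformerConfig before constructing MCoreMambaModel.

    No-op when the position embedding type is not "yarn", matching the
    "no injection for non-YaRN types" behavior exercised by the tests.
    """
    if getattr(builder_cfg, "position_embedding_type", None) != "yarn":
        return transformer_cfg
    for name in _YARN_FIELDS:
        setattr(transformer_cfg, name, getattr(builder_cfg, name))
    return transformer_cfg
```

The key design point is that the Mamba builder keeps its own dataclass fields and only copies them onto the TransformerConfig at build time, since that embedded config is what MCore's rotary-embedding code reads.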

src/megatron/bridge/models/mamba/mamba_provider.py

  • Same 7 YaRN fields added as dataclass fields (directly accessible by MCore since the provider IS the TransformerConfig)
  • In provide(), computes yarn_original_max_position_embeddings default from seq_length / yarn_rotary_scaling_factor when not explicitly set

tests/unit_tests/models/mamba/test_mamba_builder.py

  • TestMambaModelConfigYarnDefaults: field defaults, type acceptance, mutability
  • TestMambaModelBuilderBuildModelWithYarn: YaRN attr injection onto transformer config, default/explicit yarn_original_max_position_embeddings, no injection for non-YaRN types

tests/unit_tests/models/mamba/test_mamba_provider.py

  • TestMambaModelProviderYarnDefaults: field defaults and type acceptance
  • TestMambaModelProviderProvideWithYarn: default computation, explicit value preservation, no injection for non-YaRN types
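The provider-side checks listed above might look roughly like this pytest-style sketch. `FakeProvider` is a hypothetical stand-in (the real provider is itself the TransformerConfig), and the assertions mirror the "default computation" and "explicit value preservation" cases:

```python
# Hypothetical pytest-style sketch of the provider tests described above.
from dataclasses import dataclass
from typing import Optional

@dataclass
class FakeProvider:
    """Stand-in for the real MambaProvider dataclass fields."""
    seq_length: int = 8192
    yarn_rotary_scaling_factor: float = 4.0
    yarn_original_max_position_embeddings: Optional[int] = None

    def provide(self) -> "FakeProvider":
        # Derive the default only when the user did not set it explicitly.
        if self.yarn_original_max_position_embeddings is None:
            self.yarn_original_max_position_embeddings = int(
                self.seq_length / self.yarn_rotary_scaling_factor
            )
        return self

def test_default_computation():
    p = FakeProvider().provide()
    assert p.yarn_original_max_position_embeddings == 2048

def test_explicit_value_preserved():
    p = FakeProvider(yarn_original_max_position_embeddings=4096).provide()
    assert p.yarn_original_max_position_embeddings == 4096
```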

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information


copy-pr-bot Bot commented Apr 12, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Signed-off-by: guihong-nv <guihongl@nvidia.com>
@guihong-nv guihong-nv changed the title Add YARN support for mamba_model from MCORE [model] feat: Add YARN support for mamba_model from MCORE Apr 12, 2026
@yaoyu-33 yaoyu-33 added area:model Model implementations and HF bridge logic feature New capabilities, enhancements, or enablement work needs-author Author action is required before review or merge can continue labels Apr 13, 2026