
Bump 3rdparty/Megatron-LM from f8d3e2e to 188435a #1071

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/submodules/main/3rdparty/Megatron-LM-188435a

@dependabot dependabot Bot commented on behalf of github Aug 26, 2025

Contributor

Bumps 3rdparty/Megatron-LM from f8d3e2e to 188435a.

Commits
  • 188435a ci(hotfix): Increase non-determinism attempts
  • f8f6e9b ci(hotfix): Restart on zmq error
  • 4840669 chore: Version bump
  • c7fd91a ci(hotfix): Increase n_nondeterminism_attemps
  • f364164 Merge branch 'ko3n1g/ci/packaging' into 'main'
  • 1c29678 ADLR/megatron-lm!3876 - build: Bump packaging
  • 3d784cb Merge branch 'mblaz/dp_zero_model_space' into 'main'
  • 3d19693 ADLR/megatron-lm!3532 - Implement new optimizer checkpoint formats for DistOpt
  • f6a675a Merge branch 'entity' into 'main'
  • c40a446 ADLR/megatron-lm!3864 - add wandb_entity
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot dependabot Bot added dependencies Pull requests that update a dependency file submodules Pull requests that update Submodules code labels Aug 26, 2025
@dependabot dependabot Bot requested a review from dorotat-nv as a code owner August 26, 2025 04:20
copy-pr-bot Bot commented Aug 26, 2025


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@pstjohn pstjohn enabled auto-merge August 26, 2025 17:21
@pstjohn pstjohn disabled auto-merge August 26, 2025 17:22
@pstjohn pstjohn enabled auto-merge August 26, 2025 17:36
@codecov-commenter


❌ 3 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 1194 | 3 | 1191 | 48 |
View the top 3 failed test(s) by shortest run time
sub-packages/bionemo-evo2/tests/bionemo/evo2/test_evo2.py::test_forward[evo2/1b-8k:1.0-expected_matchpercents1]
Stack Traces | 30s run time
sequences = ['GAATAGGAACAGCTCCGGTCTACAGCTCCCAGCGTGAGCGACGCAGAAGACGGTGATTTCTGCATTTCCATCTGAGGTACCGGGTTCATCTCACTAGGGAGTGCCAGACAGTGGGC...CTCCATGACTTTTTCAAAAAGGTATTAGAAAAACCATTTCATAACTTTGTCAAAGTTAAATTATAGGCTAAATCCTATATATCTTAATGGCACATGCAGCGCAAGTAGGTCTACAAG']
ckpt_name = 'evo2/1b-8k:1.0'
expected_matchpercents = [96.27, 67.93, 77.5, 80.3]

    @pytest.mark.parametrize(
        "ckpt_name,expected_matchpercents",
        [
            ("evo2/1b-8k-bf16:1.0", [96.27, 67.93, 77.50, 80.30]),
            ("evo2/1b-8k:1.0", [96.27, 67.93, 77.50, 80.30]),
            ("evo2/7b-8k:1.0", [97.60, 89.63, 80.03, 84.57]),
            ("evo2/7b-1m:1.0", [97.60, 89.63, 80.03, 84.57]),
        ],
    )
    def test_forward(sequences: list[str], ckpt_name: str, expected_matchpercents: list[float]):
        assert len(sequences) > 0
        seq_len_cap = determine_memory_requirement_and_skip_if_not_met(ckpt_name)
    
        is_fp8_supported, compute_capability, device_info = check_fp8_support(torch.cuda.current_device())
        skip = "evo2/1b-8k:" in ckpt_name and not is_fp8_supported
        if skip:
            # This checkpoint is sensitive to FP8, so we skip it if it is not supported on the current device.
            pytest.skip(f"Skipping {ckpt_name} because it is not supported on {device_info} ({compute_capability})")
        vortex_style_fp8 = is_fp8_supported and "bf16" not in ckpt_name
        inference_wrapped_model, mcore_tokenizer = get_model_and_tokenizer(
            ckpt_name, vortex_style_fp8=vortex_style_fp8, flash_decode=True, enable_flash_decode=True
        )
        matchrates = []
        for seq in sequences:
            seq = seq[:seq_len_cap]  # TODO: artificial limit, megatron uses more memory. Vortex can process full sequences
            with torch.no_grad():
                device = torch.cuda.current_device()
                tokens = torch.tensor([mcore_tokenizer.tokenize(seq)], device=device)
                forward_args = {
                    "tokens": tokens,
                    "position_ids": None,
                    "attention_mask": None,
                }
    
                inference_wrapped_model.prep_model_for_inference(prompts_tokens=None)
                logits = inference_wrapped_model.run_one_forward_step(forward_args)
                inference_wrapped_model.inference_context.reset()
    
                from megatron.core.inference.communication_utils import broadcast_from_last_pipeline_stage
    
                batch_size, context_length, vocab_size = 1, len(seq), 512
>               logits = broadcast_from_last_pipeline_stage(
                    [batch_size, context_length, vocab_size],
                    dtype=inference_wrapped_model.inference_wrapper_config.params_dtype,
                    tensor=logits,
                )

.../bionemo/evo2/test_evo2.py:437: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

size = [1, 6000, 512], dtype = torch.bfloat16
tensor = tensor([[[ -4.2188, -46.0000, -46.0000, -46.0000, -46.0000, -46.0000, -46.0000,
          -46.0000, -46.0000, -46.0000...6.0000, -46.0000, -46.0000, -46.0000, -46.2500, -46.0000,
          -46.0000]]], device='cuda:0', dtype=torch.bfloat16)
pp_group = <torch.distributed.distributed_c10d.ProcessGroup object at 0x79ef081eb830>

    def broadcast_from_last_pipeline_stage(
        size: List[int],
        dtype: torch.dtype,
        tensor: Optional[torch.Tensor] = None,
        pp_group: Optional[ProcessGroup] = None,
    ):
        """Broadcast a tensor from last pipeline stage to all ranks.
    
        Args:
            size: Expected tensor size
            dtype: Expected tensor dtype
            tensor: Tensor to broadcast (only on last stage)
            pp_group: Custom process group (if None, uses global state)
        """
        # Use custom process group or fall back to global state
        if pp_group is None:
            pp_group = parallel_state.get_pipeline_model_parallel_group()
            last_rank = parallel_state.get_pipeline_model_parallel_last_rank()
    
            # add ignore_virtual=True since vpp is not used in inference
            is_last_stage = parallel_state.is_pipeline_last_stage(ignore_virtual=True)
        else:
            # Lists of ProcessGroups are used for multimodal inference but not supported here
            assert isinstance(
                pp_group, ProcessGroup
            ), "pp_group must be a single ProcessGroup, not a list of ProcessGroups"
            last_rank = torch.distributed.get_process_group_ranks(pp_group)[pp_group.size() - 1]
            is_last_stage = pp_group.rank() == pp_group.size() - 1
    
        if is_last_stage:
>           assert size == list(
                tensor.shape
            ), f"Expected tensor of shape {size} but got {list(tensor.shape)}"
E           AssertionError: Expected tensor of shape [1, 6000, 512] but got [1, 1, 512]

.../local/lib/python3.12.../core/inference/communication_utils.py:64: AssertionError
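
The assertion in the trace above compares the size the test passes in against the shape of the tensor it actually received. As a minimal sketch of that comparison, using plain lists in place of tensors (`check_expected_shape` is a hypothetical helper written for illustration, not a Megatron-LM function):

```python
def check_expected_shape(expected: list[int], actual: list[int]) -> None:
    """Mirror the shape assertion shown in the trace, on plain lists."""
    assert expected == actual, f"Expected tensor of shape {expected} but got {actual}"


# Values taken from the trace: the test expects one row of logits per input
# position, but the tensor it received has a sequence dimension of 1.
batch_size, context_length, vocab_size = 1, 6000, 512
try:
    check_expected_shape([batch_size, context_length, vocab_size], [1, 1, 512])
except AssertionError as err:
    message = str(err)
    print(message)
```

The sequence dimension is 1 rather than 6000. One possible reading, which nothing in this thread confirms, is that after the submodule bump the forward step returns logits for a single position instead of the full sequence.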
sub-packages/bionemo-evo2/tests/bionemo/evo2/test_evo2.py::test_forward_manual[evo2/1b-8k-bf16:1.0-expected_matchpercents0-True]
Stack Traces | 32.8s run time
sequences = ['GAATAGGAACAGCTCCGGTCTACAGCTCCCAGCGTGAGCGACGCAGAAGACGGTGATTTCTGCATTTCCATCTGAGGTACCGGGTTCATCTCACTAGGGAGTGCCAGACAGTGGGC...CTCCATGACTTTTTCAAAAAGGTATTAGAAAAACCATTTCATAACTTTGTCAAAGTTAAATTATAGGCTAAATCCTATATATCTTAATGGCACATGCAGCGCAAGTAGGTCTACAAG']
ckpt_name = 'evo2/1b-8k-bf16:1.0'
expected_matchpercents = [96.27, 67.93, 77.5, 80.3], flash_decode = True

    @pytest.mark.parametrize(
        "ckpt_name,expected_matchpercents,flash_decode",
        [
            # Try flash decode with one and not the other to verify that both paths work.
            ("evo2/1b-8k-bf16:1.0", [96.27, 67.93, 77.50, 80.30], True),
            ("evo2/1b-8k:1.0", [96.27, 67.93, 77.50, 80.30], False),
            ("evo2/7b-8k:1.0", [97.60, 89.63, 80.03, 84.57], False),
            ("evo2/7b-1m:1.0", [97.60, 89.63, 80.03, 84.57], False),
        ],
    )
    def test_forward_manual(sequences: list[str], ckpt_name: str, expected_matchpercents: list[float], flash_decode: bool):
        assert len(sequences) > 0
        seq_len_cap = determine_memory_requirement_and_skip_if_not_met(ckpt_name, flash_decode)
    
        is_fp8_supported, compute_capability, device_info = check_fp8_support(torch.cuda.current_device())
        skip = "evo2/1b-8k:" in ckpt_name and not is_fp8_supported
    
        vortex_style_fp8 = is_fp8_supported and "bf16" not in ckpt_name
        if skip:
            # This checkpoint is sensitive to FP8, so we skip it if it is not supported on the current device.
            pytest.skip(f"Skipping {ckpt_name} because it is not supported on {device_info} ({compute_capability})")
        with distributed_model_parallel_state(), torch.no_grad():
            tokenizer = get_nmt_tokenizer(
                "byte-level",
            )
            flash_decode_kwargs: dict[str, Any] = {"flash_decode": flash_decode}
            if flash_decode:
                flash_decode_kwargs["attention_backend"] = AttnBackend.flash
            if "1b-8k" in ckpt_name:
                model_config = llm.Hyena1bConfig(
                    use_te=True,
                    seq_length=8192,
                    vortex_style_fp8=vortex_style_fp8,
                    **flash_decode_kwargs,
                )
            elif "7b-8k" in ckpt_name:
                model_config = llm.Hyena7bConfig(
                    use_te=True,
                    seq_length=8192,
                    vortex_style_fp8=vortex_style_fp8,
                    **flash_decode_kwargs,
                )
            elif "7b-1m" in ckpt_name:
                model_config = llm.Hyena7bARCLongContextConfig(
                    use_te=True,
                    seq_length=8192,
                    vortex_style_fp8=vortex_style_fp8,
                    **flash_decode_kwargs,
                )
            else:
                raise NotImplementedError
            ckpt_weights: Path = load(ckpt_name) / "weights"
            raw_megatron_model = model_config.configure_model(tokenizer).eval().cuda()
            device = raw_megatron_model.parameters().__next__().device
            load_weights_sharded_inplace_nemo2_to_mcore(raw_megatron_model, ckpt_weights, {}, "torch_dist")
            model = Float16Module(model_config, raw_megatron_model)
            if flash_decode:
                inference_context = HyenaInferenceContext(max_batch_size=1, max_sequence_length=8192)
                forward_kwargs = {"runtime_gather_output": True, "inference_context": inference_context}
            else:
                forward_kwargs = {}
            matchrates = []
            for seq in sequences:
                seq = seq[
                    :seq_len_cap
                ]  # TODO: artificial limit, megatron uses more memory. Vortex can process full sequences
                with torch.no_grad():
                    device = torch.cuda.current_device()
                    # tokens = torch.tensor([tokenizer.tokenize(seq)], device=device)
                    input_ids = torch.tensor(tokenizer.text_to_ids(seq)).int().unsqueeze(0).to(device)
                    attention_mask = None
                    # when labels is None, the model returns logits
                    logits = model(
                        input_ids=input_ids,
                        position_ids=None,
                        attention_mask=attention_mask,
                        labels=None,
                        **forward_kwargs,
                    )
                    if flash_decode:
                        forward_kwargs["inference_context"].reset()
>                   matchrate = calc_matchrate(tokenizer=tokenizer, in_seq=seq, logits=logits)

.../bionemo/evo2/test_evo2.py:535: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def calc_matchrate(*, tokenizer, in_seq, logits):
        softmax_logprobs = torch.log_softmax(logits, dim=-1)
        softmax_logprobs = softmax_logprobs[:, :-1]
        o = softmax_logprobs.argmax(dim=-1)[0]
        if hasattr(tokenizer, "tokenize"):
            i = torch.tensor(tokenizer.tokenize(in_seq[1:]), device=o.device)
        else:
            i = torch.tensor(tokenizer.text_to_ids(in_seq[1:]), device=o.device)
>       return (i == o).sum().item() / (i.size()[0] - 1)
E       RuntimeError: The size of tensor a (5999) must match the size of tensor b (0) at non-singleton dimension 0

.../bionemo/evo2/test_evo2.py:377: RuntimeError
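
The size mismatch in `calc_matchrate` (5999 versus 0) is what results when the logits have a sequence dimension of 1, the same shape reported in the `test_forward` failure; this trace does not print the logits shape itself, so that is an inference from the error message. A sketch of the shape arithmetic, assuming one token per input character (`matchrate_operand_lengths` is a hypothetical helper, not part of the test file):

```python
def matchrate_operand_lengths(seq_len: int, logits_seq_len: int) -> tuple[int, int]:
    """Return (len(i), len(o)) as computed inside calc_matchrate.

    Assumes one token per input character (byte-level tokenizer, ASCII input).
    """
    # i: tokens of in_seq[1:], i.e. every position except the first.
    targets = max(seq_len - 1, 0)
    # o: argmax over logits[:, :-1], i.e. every position except the last.
    predictions = max(logits_seq_len - 1, 0)
    return targets, predictions


# Full-length logits give operands of equal length.
print(matchrate_operand_lengths(6000, 6000))  # (5999, 5999)
# Single-position logits leave nothing after the [:, :-1] slice,
# so the comparison `i == o` cannot broadcast 5999 against 0.
print(matchrate_operand_lengths(6000, 1))  # (5999, 0)
```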
sub-packages/bionemo-evo2/tests/bionemo/evo2/test_evo2.py::test_forward[evo2/1b-8k-bf16:1.0-expected_matchpercents0]
Stack Traces | 61.9s run time
sequences = ['GAATAGGAACAGCTCCGGTCTACAGCTCCCAGCGTGAGCGACGCAGAAGACGGTGATTTCTGCATTTCCATCTGAGGTACCGGGTTCATCTCACTAGGGAGTGCCAGACAGTGGGC...CTCCATGACTTTTTCAAAAAGGTATTAGAAAAACCATTTCATAACTTTGTCAAAGTTAAATTATAGGCTAAATCCTATATATCTTAATGGCACATGCAGCGCAAGTAGGTCTACAAG']
ckpt_name = 'evo2/1b-8k-bf16:1.0'
expected_matchpercents = [96.27, 67.93, 77.5, 80.3]

    @pytest.mark.parametrize(
        "ckpt_name,expected_matchpercents",
        [
            ("evo2/1b-8k-bf16:1.0", [96.27, 67.93, 77.50, 80.30]),
            ("evo2/1b-8k:1.0", [96.27, 67.93, 77.50, 80.30]),
            ("evo2/7b-8k:1.0", [97.60, 89.63, 80.03, 84.57]),
            ("evo2/7b-1m:1.0", [97.60, 89.63, 80.03, 84.57]),
        ],
    )
    def test_forward(sequences: list[str], ckpt_name: str, expected_matchpercents: list[float]):
        assert len(sequences) > 0
        seq_len_cap = determine_memory_requirement_and_skip_if_not_met(ckpt_name)
    
        is_fp8_supported, compute_capability, device_info = check_fp8_support(torch.cuda.current_device())
        skip = "evo2/1b-8k:" in ckpt_name and not is_fp8_supported
        if skip:
            # This checkpoint is sensitive to FP8, so we skip it if it is not supported on the current device.
            pytest.skip(f"Skipping {ckpt_name} because it is not supported on {device_info} ({compute_capability})")
        vortex_style_fp8 = is_fp8_supported and "bf16" not in ckpt_name
        inference_wrapped_model, mcore_tokenizer = get_model_and_tokenizer(
            ckpt_name, vortex_style_fp8=vortex_style_fp8, flash_decode=True, enable_flash_decode=True
        )
        matchrates = []
        for seq in sequences:
            seq = seq[:seq_len_cap]  # TODO: artificial limit, megatron uses more memory. Vortex can process full sequences
            with torch.no_grad():
                device = torch.cuda.current_device()
                tokens = torch.tensor([mcore_tokenizer.tokenize(seq)], device=device)
                forward_args = {
                    "tokens": tokens,
                    "position_ids": None,
                    "attention_mask": None,
                }
    
                inference_wrapped_model.prep_model_for_inference(prompts_tokens=None)
                logits = inference_wrapped_model.run_one_forward_step(forward_args)
                inference_wrapped_model.inference_context.reset()
    
                from megatron.core.inference.communication_utils import broadcast_from_last_pipeline_stage
    
                batch_size, context_length, vocab_size = 1, len(seq), 512
>               logits = broadcast_from_last_pipeline_stage(
                    [batch_size, context_length, vocab_size],
                    dtype=inference_wrapped_model.inference_wrapper_config.params_dtype,
                    tensor=logits,
                )

.../bionemo/evo2/test_evo2.py:437: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

size = [1, 6000, 512], dtype = torch.bfloat16
tensor = tensor([[[ -5.1250, -41.7500, -41.7500, -41.7500, -41.7500, -41.7500, -41.7500,
          -41.7500, -41.7500, -41.7500...1.7500, -41.7500, -41.7500, -41.7500, -41.7500, -41.7500,
          -41.7500]]], device='cuda:0', dtype=torch.bfloat16)
pp_group = <torch.distributed.distributed_c10d.ProcessGroup object at 0x79ef081eb830>

    def broadcast_from_last_pipeline_stage(
        size: List[int],
        dtype: torch.dtype,
        tensor: Optional[torch.Tensor] = None,
        pp_group: Optional[ProcessGroup] = None,
    ):
        """Broadcast a tensor from last pipeline stage to all ranks.
    
        Args:
            size: Expected tensor size
            dtype: Expected tensor dtype
            tensor: Tensor to broadcast (only on last stage)
            pp_group: Custom process group (if None, uses global state)
        """
        # Use custom process group or fall back to global state
        if pp_group is None:
            pp_group = parallel_state.get_pipeline_model_parallel_group()
            last_rank = parallel_state.get_pipeline_model_parallel_last_rank()
    
            # add ignore_virtual=True since vpp is not used in inference
            is_last_stage = parallel_state.is_pipeline_last_stage(ignore_virtual=True)
        else:
            # Lists of ProcessGroups are used for multimodal inference but not supported here
            assert isinstance(
                pp_group, ProcessGroup
            ), "pp_group must be a single ProcessGroup, not a list of ProcessGroups"
            last_rank = torch.distributed.get_process_group_ranks(pp_group)[pp_group.size() - 1]
            is_last_stage = pp_group.rank() == pp_group.size() - 1
    
        if is_last_stage:
>           assert size == list(
                tensor.shape
            ), f"Expected tensor of shape {size} but got {list(tensor.shape)}"
E           AssertionError: Expected tensor of shape [1, 6000, 512] but got [1, 1, 512]

.../local/lib/python3.12.../core/inference/communication_utils.py:64: AssertionError

To view more test analytics, go to the Test Analytics Dashboard

pstjohn commented Aug 27, 2025

Collaborator

@dependabot recreate

@dependabot dependabot Bot force-pushed the dependabot/submodules/main/3rdparty/Megatron-LM-188435a branch from a8144c9 to fee2564 Compare August 27, 2025 22:29
Bumps [3rdparty/Megatron-LM](https://github.com/NVIDIA/Megatron-LM) from `f8d3e2e` to `188435a`.
- [Release notes](https://github.com/NVIDIA/Megatron-LM/releases)
- [Commits](NVIDIA/Megatron-LM@f8d3e2e...188435a)

---
updated-dependencies:
- dependency-name: 3rdparty/Megatron-LM
  dependency-version: 188435a1d00a7ed29fdd169a17a36f75b496a558
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot Bot force-pushed the dependabot/submodules/main/3rdparty/Megatron-LM-188435a branch from fee2564 to b44e9c7 Compare August 28, 2025 23:55
coderabbitai Bot commented Aug 28, 2025

Contributor

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbit review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


dependabot Bot commented on behalf of github Sep 2, 2025

Contributor Author

Superseded by #1095.

@dependabot dependabot Bot closed this Sep 2, 2025
auto-merge was automatically disabled September 2, 2025 00:37

Pull request was closed

@dependabot dependabot Bot deleted the dependabot/submodules/main/3rdparty/Megatron-LM-188435a branch September 2, 2025 00:37