Commit 2d6fc65
Bump 3rdparty/Megatron-LM from fb7c3f8 to 5e79811
Bumps [3rdparty/Megatron-LM](https://github.com/NVIDIA/Megatron-LM) from `fb7c3f8` to `5e79811`.
- [Release notes](https://github.com/NVIDIA/Megatron-LM/releases)
- [Commits](https://github.com/NVIDIA/Megatron-LM/compare/fb7c3f8...5e79811)
---
updated-dependencies:
- dependency-name: 3rdparty/Megatron-LM
  dependency-version: 5e798111e60f45e82c336ef7b89d8d793c93208f
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
1 parent 7d8b81b commit 2d6fc65
1 file changed
Lines changed: 1 addition & 1 deletion
Submodule Megatron-LM updated 97 files
- docs/source/api-guide/encoder_decoder_parallelism.rst -54
- docs/source/api-guide/index.rst -1
- examples/gpt3/gpt_config.yaml -1
- examples/inference/gpt/gpt_dynamic_inference.py +25 -2
- examples/inference/gpt/gpt_static_inference.py +7 -15
- examples/inference/gpt/utils.py +48 -21
- examples/mimo/train.py +1 -1
- examples/multimodal/dataloader_provider.py +7 -13
- examples/multimodal/model.py +1 -34
- examples/multimodal/run_text_generation.py +3 -10
- examples/multimodal/train.py +7 -15
- megatron/core/enums.py +10 -3
- megatron/core/inference/contexts/dynamic_context.py +38 -2
- megatron/core/inference/contexts/static_context.py +1 -1
- megatron/core/inference/engines/dynamic_engine.py +18 -3
- megatron/core/inference/model_inference_wrappers/multimodal/vlm_inference_wrapper.py +2 -9
- megatron/core/inference/text_generation_controllers/text_generation_controller.py +17 -2
- megatron/core/model_parallel_config.py -5
- megatron/core/models/T5/t5_model.py +2 -7
- megatron/core/models/common/embeddings/rotary_pos_embedding.py +1 -1
- megatron/core/models/common/language_module/language_module.py +57 -17
- megatron/core/models/gpt/gpt_model.py +8 -1
- megatron/core/models/mamba/mamba_model.py +5 -10
- megatron/core/models/multimodal/llava_model.py +19 -4
- megatron/core/models/vision/clip_vit_model.py +9
- megatron/core/models/vision/multimodal_projector.py +10 -1
- megatron/core/models/vision/radio.py +7
- megatron/core/optimizer/__init__.py +14 -2
- megatron/core/optimizer/distrib_optimizer.py +54 -6
- megatron/core/optimizer/optimizer.py +27 -1
- megatron/core/parallel_state.py +40 -449
- megatron/core/pipeline_parallel/p2p_communication.py +25 -68
- megatron/core/pipeline_parallel/schedules.py +12 -73
- megatron/core/pipeline_parallel/utils.py +57 -1
- megatron/core/transformer/cuda_graphs.py +1 -2
- megatron/core/transformer/enums.py +8 -1
- megatron/core/transformer/moe/router.py +10
- megatron/core/transformer/multi_latent_attention.py +9 -3
- megatron/core/transformer/transformer_config.py +2 -2
- megatron/core/transformer/transformer_layer.py -4
- megatron/legacy/model/module.py -14
- megatron/legacy/model/transformer.py +1 -13
- megatron/post_training/algos/distillation.py -3
- megatron/training/arguments.py +8 -55
- megatron/training/checkpointing.py +2 -6
- megatron/training/initialize.py -18
- megatron/training/training.py +37 -23
- megatron/training/yaml_arguments.py +1 -7
- pretrain_t5.py +5 -48
- pretrain_vlm.py +14 -67
- pyproject.toml +3 -1
- tests/functional_tests/python_test_utils/test_inference_regular_pipeline.py +54 -20
- tests/functional_tests/test_cases/gpt/gpt3_mr_mcore_te_tp1_pp4_vp1_dgx_a100_1N8G/model_config.yaml +2 -1
- tests/functional_tests/test_cases/gpt/gpt3_mr_mcore_te_tp1_pp4_vp1_resume_torch_decoupled_lr_dgx_a100_1N8G/model_config.yaml +2 -1
- tests/functional_tests/test_cases/gpt/gpt3_mr_mcore_te_tp1_pp4_vp1_resume_torch_dist_dgx_a100_1N8G/model_config.yaml +2 -1
- tests/functional_tests/test_cases/gpt/gpt3_mr_mcore_te_tp1_pp4_vp1_resume_torch_dist_dist_optimizer_overlap_grad_reduce_param_gather_dgx_a100_1N8G/model_config.yaml +2 -1
- tests/functional_tests/test_cases/gpt/gpt3_mr_mcore_te_tp1_pp4_vp1_resume_torch_dist_dist_optimizer_overlap_grad_reduce_untied_dgx_a100_1N8G/model_config.yaml +2 -1
- tests/functional_tests/test_cases/gpt/gpt3_mr_mcore_te_tp1_pp4_vp1_resume_torch_dist_tunable_overlap_dgx_a100_1N8G/model_config.yaml +2 -2
- tests/functional_tests/test_cases/gpt/gpt3_mr_mcore_te_tp1_pp4_vp1_tunable_overlap_dgx_a100_1N8G/model_config.yaml +2 -2
- tests/functional_tests/test_cases/gpt/gpt3_mr_mcore_te_tp2_pp2_resume_torch_dist_no_create_attention_mask_in_dataloader_dgx_a100_1N8G/model_config.yaml +2 -1
- tests/functional_tests/test_cases/gpt/gpt_dynamic_inference_tp1_pp1_583m_logitsmatch/golden_values_dev_dgx_h100.json +1
- tests/functional_tests/test_cases/gpt/gpt_dynamic_inference_tp1_pp1_583m_logitsmatch/model_config.yaml +53
- tests/functional_tests/test_cases/gpt/gpt_inference_tp1_pp1_583m_cudagraphs/golden_values_dev_dgx_h100.json -155
- tests/functional_tests/test_cases/gpt/gpt_inference_tp1_pp1_583m_logitsmatch/golden_values_dev_dgx_h100.json -156
- tests/functional_tests/test_cases/gpt/gpt_static_inference_tp1_pp1_16b_multiprompt_tokensmatch/README.md +10
- tests/functional_tests/test_cases/gpt/gpt_static_inference_tp1_pp1_16b_multiprompt_tokensmatch/golden_values_dev_dgx_a100.json +10
- tests/functional_tests/test_cases/gpt/gpt_static_inference_tp1_pp1_16b_multiprompt_tokensmatch/golden_values_dev_dgx_h100.json +10
- tests/functional_tests/test_cases/gpt/gpt_static_inference_tp1_pp1_16b_multiprompt_tokensmatch/model_config.yaml +80
- tests/functional_tests/test_cases/gpt/gpt_static_inference_tp1_pp1_16b_multiprompt_tokensmatch/test_prompts.jsonl +2
- tests/functional_tests/test_cases/gpt/gpt_static_inference_tp1_pp1_583m_cudagraphs/golden_values_dev_dgx_h100.json +1
- tests/functional_tests/test_cases/gpt/gpt_static_inference_tp1_pp1_583m_cudagraphs/model_config.yaml
- tests/functional_tests/test_cases/gpt/gpt_static_inference_tp1_pp1_583m_logitsmatch/golden_values_dev_dgx_a100.json
- tests/functional_tests/test_cases/gpt/gpt_static_inference_tp1_pp1_583m_logitsmatch/golden_values_dev_dgx_h100.json +1
- tests/functional_tests/test_cases/gpt/gpt_static_inference_tp1_pp1_583m_logitsmatch/model_config.yaml
- tests/functional_tests/test_cases/hybrid/hybrid_inference_tp1_pp1_2B_cudagraphs/golden_values_dev_dgx_h100.json -1
- tests/functional_tests/test_cases/hybrid/hybrid_inference_tp1_pp1_2B_logitsmatch/golden_values_dev_dgx_h100.json -1
- tests/functional_tests/test_cases/hybrid/hybrid_static_inference_tp1_pp1_2B_cudagraphs/golden_values_dev_dgx_h100.json +1
- tests/functional_tests/test_cases/hybrid/hybrid_static_inference_tp1_pp1_2B_cudagraphs/model_config.yaml
- tests/functional_tests/test_cases/hybrid/hybrid_static_inference_tp1_pp1_2B_logitsmatch/golden_values_dev_dgx_h100.json +1
- tests/functional_tests/test_cases/hybrid/hybrid_static_inference_tp1_pp1_2B_logitsmatch/model_config.yaml
- tests/functional_tests/test_cases/moe/gpt_inference_tp1_pp1_ep1_16B_logitsmatch/golden_values_dev_dgx_h100.json -1
- tests/functional_tests/test_cases/moe/gpt_static_inference_tp1_pp1_ep1_16B_logitsmatch/golden_values_dev_dgx_h100.json +1
- tests/functional_tests/test_cases/moe/gpt_static_inference_tp1_pp1_ep1_16B_logitsmatch/model_config.yaml
- tests/test_utils/recipes/gpt-dynamic-inference.yaml +2 -7
- tests/test_utils/recipes/gpt-static-inference.yaml +75
- tests/test_utils/recipes/gpt.yaml +18 -18
- tests/test_utils/recipes/mamba-static-inference.yaml +2 -2
- tests/test_utils/recipes/moe-static-inference.yaml +1 -1
- tests/unit_tests/dist_checkpointing/test_optimizer.py +179 -1
- tests/unit_tests/inference/contexts/test_dynamic_context.py +125
- tests/unit_tests/models/test_gpt_model.py +84
- tests/unit_tests/models/test_llava_model.py +2 -5
- tests/unit_tests/models/test_mamba_model.py +12 -1
- tests/unit_tests/transformer/test_multi_latent_attention.py +88
- tools/autoformat.sh +5
- tools/checkpoint/loader_llava.py +2 -2
- tools/checkpoint/saver_llava.py -1