
[megatron] Support deepseek-v4 megatron#9386

Open
Jintao-Huang wants to merge 2 commits into modelscope:main from Jintao-Huang:support_deepseek_v4_megatron
Conversation

@Jintao-Huang
Collaborator

Contributor

@gemini-code-assist (Bot) left a comment

Code Review

This pull request updates the documentation to indicate support for DeepSeek-V4 models and adds new configuration arguments to the MegatronArguments class, such as csa_dense_mode and use_fused_mhc. A suggestion was made to improve code readability by adding a blank line before the new DeepSeek-V4 argument section in megatron_args.py to maintain consistent formatting.

# dsa
dsa_indexer_loss_coeff: Optional[float] = None
dsa_indexer_use_sparse_loss: bool = False
# deepseek-v4

medium

Add a blank line before the # deepseek-v4 comment to maintain consistent section separation and improve readability. The previous section (dsa) ends at line 626, and a blank line would help distinguish the new group of arguments, following the style used for the # other section below.

Suggested change
- # deepseek-v4
+
+ # deepseek-v4
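
As a rough illustration of the grouping the review asks for, the sketch below shows the two argument sections separated by a blank line. The class name `MegatronArgumentsSketch` is a hypothetical stand-in, not the real `MegatronArguments` class. The two `dsa_*` fields are copied from the diff above. The names `csa_dense_mode` and `use_fused_mhc` come from the review summary, but their types and defaults are assumptions, since the PR excerpt does not show them.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MegatronArgumentsSketch:
    """Hypothetical stand-in used only to show the section layout."""

    # dsa
    dsa_indexer_loss_coeff: Optional[float] = None
    dsa_indexer_use_sparse_loss: bool = False

    # deepseek-v4 (types and defaults below are assumptions, not taken from the PR)
    csa_dense_mode: Optional[str] = None
    use_fused_mhc: bool = False


args = MegatronArgumentsSketch()
# Dataclass fields keep their declaration order, so each group stays together.
field_names = [f.name for f in fields(args)]
print(field_names)
```

The blank line has no effect on behavior. It only makes the boundary between the `# dsa` and `# deepseek-v4` groups visible when scanning the file.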
