[model] support bailing #55
Code Review
This pull request introduces support for the bailing_moe model by adding it to the model constants, exporting it in the GPTs module, and implementing the BailingMoeBridge class. Additionally, the .gitignore was updated and the configuration parser now includes score_function in its mapping. Feedback indicates that the bailing_moe model type should be explicitly handled in the configuration conversion logic to ensure that qk_layernorm is enabled and the router score function is correctly set to sigmoid.
  'q_lora_rank': ['q_lora_rank'],
  'kv_lora_rank': ['kv_lora_rank'],
- 'moe_router_score_function': ['scoring_func', 'moe_router_use_sigmoid'],
+ 'moe_router_score_function': ['scoring_func', 'moe_router_use_sigmoid', 'score_function'],
While adding score_function to the config_mapping is correct, the bailing_moe model type should also be handled explicitly in the hf_to_mcore_config function (around lines 120 and 164) so that qk_layernorm is enabled and the router score function is set to sigmoid. The bridge definition in bailing_moe.py includes QK normalization keys and expert bias, which strongly suggests that these settings are required for the model to function correctly in Megatron-Core.
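To make the suggestion concrete, here is a minimal sketch of what the explicit handling might look like. It is not the project's actual hf_to_mcore_config: the function name sketch_hf_to_mcore, the one-entry CONFIG_MAPPING, the first-match alias lookup, and the exact placement of the bailing_moe branch are all assumptions for illustration. Only the key names and the two target values (qk_layernorm enabled, sigmoid score function) come from the review above.

```python
from typing import Any, Dict, List

# Hypothetical one-entry excerpt of the mapping touched by this PR:
# Megatron-Core key -> candidate Hugging Face config keys.
CONFIG_MAPPING: Dict[str, List[str]] = {
    'moe_router_score_function': ['scoring_func', 'moe_router_use_sigmoid', 'score_function'],
}


def sketch_hf_to_mcore(hf_config: Dict[str, Any]) -> Dict[str, Any]:
    """Illustrative stand-in for hf_to_mcore_config (assumed behavior)."""
    res: Dict[str, Any] = {}

    # Generic pass: copy the first alias found in the HF config (assumed lookup rule).
    for mcore_key, hf_keys in CONFIG_MAPPING.items():
        for hf_key in hf_keys:
            if hf_key in hf_config:
                res[mcore_key] = hf_config[hf_key]
                break

    # Suggested model-specific overrides for bailing_moe.
    if hf_config.get('model_type') == 'bailing_moe':
        res['qk_layernorm'] = True
        res['moe_router_score_function'] = 'sigmoid'

    return res
```

With this shape, a bailing_moe config receives both settings even when its Hugging Face config omits every alias listed in the mapping, while other model types keep whatever the generic pass resolved.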
No description provided.