Finetune GLM-5.1 (zai-org/GLM-5.1) in NeMo Automodel #1719
HuiyingLi started this conversation in Show and tell
GLM-5.1 is now supported in NeMo Automodel, thanks to @hemildesai!
GLM-5.1 (https://huggingface.co/zai-org/GLM-5.1) is Zhipu AI's latest open-source large Mixture-of-Experts model, featuring a DeepSeek-style MLA + DSA (Dynamic Sparse Attention) architecture.
Parallel Setup
We provide a fine-tuning recipe for GLM-5.1 (https://github.com/NVIDIA-NeMo/Automodel/blob/main/examples/llm_finetune/glm/glm_5_hellaswag_pp.yaml) that scales training using Expert Parallelism (EP) and Pipeline Parallelism (PP). The configuration runs with EP=64 and PP=4 across 32 nodes (8x H100 GPUs per node), for 256 GPUs in total.
distributed:
  strategy: fsdp2
  tp_size: 1
  cp_size: 1
  pp_size: 4
  ep_size: 64
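To see how these sizes fit on the hardware, here is a small back-of-the-envelope check in Python. It is only a sketch: the `ParallelLayout` class and its method names are hypothetical helpers, not part of NeMo Automodel. It also rests on two assumptions that the recipe itself does not state: that the data-parallel size is whatever remains after dividing the world size by TP x CP x PP, and that the expert-parallel group is carved out of the data-parallel and context-parallel ranks of a single pipeline stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper; defaults are copied from the recipe above."""

    nodes: int = 32
    gpus_per_node: int = 8
    tp_size: int = 1
    cp_size: int = 1
    pp_size: int = 4
    ep_size: int = 64

    @property
    def world_size(self) -> int:
        # Total number of GPUs (one rank per GPU).
        return self.nodes * self.gpus_per_node

    @property
    def dp_size(self) -> int:
        # ASSUMPTION: data-parallel size is the remainder after TP, CP and PP.
        model_parallel = self.tp_size * self.cp_size * self.pp_size
        if self.world_size % model_parallel != 0:
            raise ValueError("world size is not divisible by tp * cp * pp")
        return self.world_size // model_parallel

    def validate(self) -> None:
        # ASSUMPTION: the expert-parallel group lives inside the dp * cp ranks
        # of one pipeline stage, so ep_size must divide that count.
        ranks_per_stage = self.dp_size * self.cp_size
        if ranks_per_stage % self.ep_size != 0:
            raise ValueError("ep_size does not divide dp_size * cp_size")


layout = ParallelLayout()
layout.validate()
print(layout.world_size)  # 256 GPUs: 32 nodes * 8 GPUs
print(layout.dp_size)     # 64 ranks per pipeline stage: 256 / (1 * 1 * 4)
```

Under these assumptions, each of the 4 pipeline stages is replicated over 64 ranks, and EP=64 spreads the experts of a stage across all 64 of them.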
Data
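We use the HellaSwag dataset (https://huggingface.co/datasets/rowan/hellaswag) as an example. HellaSwag is a commonsense reasoning benchmark that evaluates language models on sentence completion: each example pairs a context with several candidate endings, exactly one of which is correct.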
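As a rough illustration of what a HellaSwag record looks like and how it could be turned into a fine-tuning pair, consider the sketch below. The field names `ctx`, `endings` and `label` come from the Hugging Face dataset; the function `hellaswag_to_text_pair`, the output keys, and the sample record are hypothetical and written only for illustration. This is not necessarily how the Automodel recipe formats its inputs.

```python
from typing import Any, Dict


def hellaswag_to_text_pair(example: Dict[str, Any]) -> Dict[str, str]:
    """Map one HellaSwag record to a (context, target) pair.

    Hypothetical helper: keeps the context and the gold ending only.
    """
    raw_label = example["label"]
    # The label is stored as a string; unlabeled records are rejected.
    if raw_label == "":
        raise ValueError("record has no gold label")
    label = int(raw_label)
    endings = example["endings"]
    if not 0 <= label < len(endings):
        raise ValueError("label is out of range for the endings list")
    return {
        "context": example["ctx"].strip(),
        "target": endings[label].strip(),
    }


# Made-up record in the dataset's schema (not taken from the real data).
SAMPLE = {
    "ctx": "A chef picks up a knife. She",
    "endings": [
        "throws the knife into the sea.",
        "starts slicing the onions on the board.",
        "paints the wall with it.",
        "reads the knife a story.",
    ],
    "label": "1",
}

pair = hellaswag_to_text_pair(SAMPLE)
print(pair["context"])  # A chef picks up a knife. She
print(pair["target"])   # starts slicing the onions on the board.

# With the `datasets` library installed, real records could be fetched with
# datasets.load_dataset("rowan/hellaswag", split="train") and mapped the same way.
```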
Below is the loss curve obtained when fine-tuning on HellaSwag with this recipe: