Finetune GLM-5.1 (zai-org/GLM-5.1) in NeMo Automodel #1719

HuiyingLi · 2026-04-07T19:32:40Z

HuiyingLi
Apr 7, 2026
Maintainer

GLM-5.1 is now supported in NeMo-Automodel, thanks to @hemildesai!
GLM-5.1 (https://huggingface.co/zai-org/GLM-5.1) is Zhipu AI's latest open-source large Mixture-of-Experts model, featuring a DeepSeek-style MLA + DSA (DeepSeek Sparse Attention) architecture:

Mixture of Experts (MoE): 256 routed experts with 8 active per token, enabling efficient scaling.
78 layers, hidden size 6144, with MLA (Multi-head Latent Attention) using KV compression (kv_lora_rank=512) and head_dim=64.
~200k context window (max_position_embeddings=202,752).
3 dense layers followed by MoE layers (first_k_dense_replace=3).
Shares the glm_moe_dsa architecture with GLM-5, with updated weights.
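
To make the numbers above concrete, here is a minimal Python sketch that derives the dense/MoE layer split and the fraction of routed experts active per token. The function name `moe_layout` is made up for illustration and is not part of NeMo Automodel or the model code; the constants are copied from the list above.

```python
# Values copied from the architecture summary above.
NUM_LAYERS = 78
FIRST_K_DENSE_REPLACE = 3
NUM_ROUTED_EXPERTS = 256
EXPERTS_PER_TOKEN = 8


def moe_layout(num_layers, first_k_dense, num_experts, active_experts):
    """Hypothetical helper: summarize how layers and experts are split."""
    if not 0 <= first_k_dense <= num_layers:
        raise ValueError("first_k_dense must be between 0 and num_layers")
    if not 0 < active_experts <= num_experts:
        raise ValueError("active_experts must be between 1 and num_experts")
    return {
        "dense_layers": first_k_dense,
        "moe_layers": num_layers - first_k_dense,
        # Share of routed experts that process any single token.
        "active_expert_fraction": active_experts / num_experts,
    }


summary = moe_layout(
    NUM_LAYERS, FIRST_K_DENSE_REPLACE, NUM_ROUTED_EXPERTS, EXPERTS_PER_TOKEN
)
print(summary)
```

With these values, 75 of the 78 layers are MoE layers, and each token is routed to 8/256 (about 3%) of the routed experts in each of them.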

Parallel Setup

We provide a fine-tuning recipe (https://github.com/NVIDIA-NeMo/Automodel/blob/main/examples/llm_finetune/glm/glm_5_hellaswag_pp.yaml) for GLM-5.1 that scales training using Expert Parallelism (EP) and Pipeline Parallelism (PP). The configuration runs with EP=64 and PP=4 across 32 nodes (8x H100 GPUs per node).

distributed:
  strategy: fsdp2
  tp_size: 1
  cp_size: 1
  pp_size: 4
  ep_size: 64

  sequence_parallel: false
  activation_checkpointing: true

  pipeline:
    pp_schedule: interleaved1f1b
    pp_microbatch_size: 1
    round_virtual_stages_to_pp_multiple: down
    scale_grads_in_schedule: false
    patch_inner_model: false
    patch_causal_lm_model: false
    layers_per_stage: 2

  moe:
    reshard_after_forward: false
    wrap_outer_model: false
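
As a sanity check on these sizes, the following Python sketch works through the GPU arithmetic implied by the setup: total GPUs, GPUs sharing each pipeline rank, and routed experts per expert-parallel rank. `ParallelLayout` is a hypothetical helper, not a NeMo Automodel API. The rule that the EP group must fit within the data-parallel and context-parallel ranks of one pipeline rank is an assumption made for illustration, and the framework's actual device-mesh construction may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper (not part of NeMo Automodel) for checking sizes."""

    nodes: int
    gpus_per_node: int
    tp_size: int
    cp_size: int
    pp_size: int
    ep_size: int
    num_routed_experts: int

    @property
    def world_size(self) -> int:
        return self.nodes * self.gpus_per_node

    @property
    def dp_size(self) -> int:
        # Ranks left over after pipeline, tensor, and context parallelism.
        return self.world_size // (self.pp_size * self.tp_size * self.cp_size)

    @property
    def gpus_per_pp_rank(self) -> int:
        # Number of GPUs that share the same pipeline rank.
        return self.world_size // self.pp_size

    @property
    def experts_per_ep_rank(self) -> int:
        return self.num_routed_experts // self.ep_size

    def validate(self) -> None:
        model_parallel = self.pp_size * self.tp_size * self.cp_size
        if self.world_size % model_parallel:
            raise ValueError("world size must be divisible by pp * tp * cp")
        # ASSUMPTION: the EP group is formed from the dp * cp ranks that
        # share one pipeline rank, so ep_size must divide that count.
        if (self.dp_size * self.cp_size) % self.ep_size:
            raise ValueError("ep_size must divide dp_size * cp_size")
        if self.num_routed_experts % self.ep_size:
            raise ValueError("experts must divide evenly across EP ranks")


layout = ParallelLayout(
    nodes=32, gpus_per_node=8,
    tp_size=1, cp_size=1, pp_size=4, ep_size=64,
    num_routed_experts=256,
)
layout.validate()
print(layout.world_size, layout.gpus_per_pp_rank, layout.experts_per_ep_rank)
```

Under these assumptions, the 256 GPUs split into 4 pipeline ranks of 64 GPUs each, and EP=64 places 4 of the 256 routed experts on each expert-parallel rank.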

Data

We use the HellaSwag dataset (https://huggingface.co/datasets/rowan/hellaswag) as an example. HellaSwag is a commonsense reasoning benchmark that evaluates language models on sentence completion tasks.
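
To illustrate what a sentence-completion example looks like, here is a minimal Python sketch that turns one HellaSwag-style record into a context/target pair. It assumes the record has the `ctx`, `endings`, and `label` fields described on the dataset card, with `label` stored as a string index. The function `format_hellaswag` and the sample record are invented for illustration, and the recipe's actual preprocessing in NeMo Automodel may differ.

```python
def format_hellaswag(record):
    """Hypothetical formatter: pair the context with its correct ending."""
    endings = record["endings"]
    label = int(record["label"])  # assumed to be a string index such as "2"
    if not 0 <= label < len(endings):
        raise ValueError(f"label {label} is out of range for {len(endings)} endings")
    return {
        "context": record["ctx"].strip(),
        "target": endings[label].strip(),
    }


# Invented sample record for illustration only (not taken from the dataset).
sample = {
    "ctx": "A man pours water into a kettle. He",
    "endings": [
        "throws the kettle out of the window.",
        "places the kettle on the stove and turns on the burner.",
        "paints the kettle bright green.",
        "starts swimming in the kettle.",
    ],
    "label": "1",
}

pair = format_hellaswag(sample)
print(pair["context"], "->", pair["target"])
```

For supervised fine-tuning, a pair like this would typically be tokenized with the loss applied to the target tokens only, though that detail is an assumption and is not stated in this post.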

Below is the loss curve obtained when fine-tuning on HellaSwag with this recipe:
