1 change: 1 addition & 0 deletions src/maxtext/configs/post_train/lora_module_path.yml
Collaborator

Are there any tests for Gemma 4 LoRA?

Collaborator Author

We ran the end-to-end LoRA training loop for Gemma 4, and it completed without any issues (log).

@@ -21,6 +21,7 @@ mistral: "decoder/layers/.*(attention/(query|key|value|out)|mlp/(wi_0|wi_1|wo))"
deepseek2: "decoder/(dense_layers|moe_stack)/self_attention/(query|out|wkv_a|wkv_b)|decoder/(dense_layers|moe_stack)/(mlp|shared_experts)/(wi_0|wi_1|wo)"
gemma2: "decoder/layers/(self_attention_local|self_attention_global)/(query|key|value|out)|decoder/layers/(mlp_local|mlp_global)/(wi_0|wi_1|wo)"
gemma3: "decoder/layers/.*(self_attention/(query|key|value|out)|mlp/(wi_0|wi_1|wo|gate|up|down))"
gemma4: "decoder/(scanned_blocks|layers_remainder)/layers.*/.*(self_attention/(query|key|value|out)|mlp/.*(wi_0|wi_1|wo|shared_experts/(wi_0|wi_1|wo)))"
olmo3: "decoder/layers/.*(attention/(query|key|value|out)|mlp/(wi_0|wi_1|wo))"
RexBearIU marked this conversation as resolved.
gpt3: "decoder/layers/(self_attention/(qkv_proj|out)|mlp/(wi|wo))"
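
A quick way to sanity-check the new `gemma4` pattern is to run it against a few module paths. The sketch below is illustrative only: the paths are hypothetical (real Gemma 4 parameter paths may be named differently), and the use of `re.fullmatch` is an assumption, since the diff does not show how the trainer applies the pattern.

```python
import re

# Pattern copied verbatim from the gemma4 entry in lora_module_path.yml.
GEMMA4_LORA_PATTERN = (
    r"decoder/(scanned_blocks|layers_remainder)/layers.*/.*"
    r"(self_attention/(query|key|value|out)"
    r"|mlp/.*(wi_0|wi_1|wo|shared_experts/(wi_0|wi_1|wo)))"
)


def is_lora_target(path: str) -> bool:
  """Returns True if the whole path matches the gemma4 pattern.

  Assumption: full-string matching; the real matching mode is not shown
  in this diff.
  """
  return re.fullmatch(GEMMA4_LORA_PATTERN, path) is not None


# Hypothetical module paths, invented for illustration.
HYPOTHETICAL_PATHS = [
    "decoder/scanned_blocks/layers_0/self_attention/query",
    "decoder/layers_remainder/layers_2/mlp/wi_0",
    "decoder/scanned_blocks/layers_1/mlp/shared_experts/wo",
    "decoder/scanned_blocks/layers_0/pre_self_attention_norm/scale",
    "token_embedder/embedding",
]

if __name__ == "__main__":
  for path in HYPOTHETICAL_PATHS:
    print(f"{path}: {is_lora_target(path)}")
```

Under these assumptions, the attention and MLP projections in both `scanned_blocks` and `layers_remainder` are selected, while normalization scales and embeddings are not.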

5 changes: 5 additions & 0 deletions src/maxtext/trainers/post_train/sft/train_sft.py
@@ -264,9 +264,14 @@ def setup_trainer_state(mt_config, goodput_recorder=None):
def train_model(mt_config, trainer, mesh):
  """Runs the SFT training loop in Tunix."""
  with mesh, nn_partitioning.axis_rules(mt_config.logical_axis_rules):
    # Disable NNX graph caching for MoE models (where experts > 1) to allow
    # necessary dynamic metadata synchronization during forward passes (e.g., in jax.lax.scan).
    enable_nnx_cache = mt_config.num_experts <= 1

    trainer.train(
        trainer.data_hooks.train_data_iterator,
        trainer.data_hooks.eval_data_iterator,
        cache_nnx_graph=enable_nnx_cache,
    )
  return trainer
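
The cache toggle above can be exercised in isolation. This is a minimal sketch with stand-ins: `StubConfig`, `StubTrainer`, and `should_cache_nnx_graph` are hypothetical names, not MaxText or Tunix APIs. Only the `num_experts <= 1` condition and the `cache_nnx_graph` keyword come from the diff.

```python
from dataclasses import dataclass


@dataclass
class StubConfig:
  """Hypothetical stand-in for mt_config; only num_experts is modeled."""
  num_experts: int


class StubTrainer:
  """Hypothetical stand-in for the trainer; records the flag it receives."""

  def __init__(self):
    self.cache_nnx_graph = None

  def train(self, train_iter, eval_iter, cache_nnx_graph=True):
    # A real trainer would run the training loop here.
    self.cache_nnx_graph = cache_nnx_graph


def should_cache_nnx_graph(config: StubConfig) -> bool:
  """Mirrors the diff: cache only when the model is not MoE."""
  return config.num_experts <= 1


def run(config: StubConfig) -> StubTrainer:
  trainer = StubTrainer()
  trainer.train(iter([]), iter([]), cache_nnx_graph=should_cache_nnx_graph(config))
  return trainer


if __name__ == "__main__":
  print(run(StubConfig(num_experts=1)).cache_nnx_graph)
  print(run(StubConfig(num_experts=8)).cache_nnx_graph)
```

Keeping the decision in one small predicate makes the dense-versus-MoE behavior easy to cover with a unit test, independent of the training loop itself.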
