maxtext-v0.2.2

Latest

SurbhiJainUSC released this 08 May 18:03

· 318 commits to main since this release

348355c

Changes

Upgraded JAX to version 0.9.2, improving support for both pre-training and post-training.
Introduced simplified APIs for accessing MaxText models.
Included maxtext_with_gepa.ipynb, a new notebook demonstrating AIME prompt optimization using the GEPA framework within MaxText.
Added support for Kimi-K2 models and the MuonClip optimizer. Users can explore this with the kimi-k2-1t config (see user guide for details).
Kimi-K2-Thinking, Kimi-K2.5 (text), and Kimi-K2.6 (text) are now supported. See Run_Kimi.md for details.
DeepSeek-V3.2 is now supported, including DeepSeek Sparse Attention for handling long contexts. Use the deepseek3.2-671b config to try it out (refer to the user guide for more information).
Support has been added for Gemma 4 multi-modal models (26B MoE and 31B dense). These can be used with the gemma4-26b and gemma4-31b configs. See Run_Gemma4.md for further details.
Support has been added for Gemma 4 inference using the MaxText on vLLM plugin.
Enhanced RL capabilities with support for the open-r1/OpenR1-Math-220k and nvidia/OpenMathReasoning datasets.
Added more RL evaluation modes, such as majority voting and pass@1 estimation.
Weights are now synced to vLLM before the pre-RL evaluation.
Made the use of math-verify in RL more robust.
MaxText's Supervised Fine-Tuning (SFT) now supports non-instruct models.
Added support for tensor parallelism using the Fused MoE kernel for MaxText on vLLM inference.
Added MaxText-to-vLLM converters for the Qwen3 and Gemma 4 model families.
validate_converter.py now runs in multislice environments to test larger models, and includes utilities to compare MaxText and vLLM weights.
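
For readers unfamiliar with the two RL evaluation modes named above, the following is a minimal, generic sketch of majority voting and pass@1 estimation over several sampled answers to one problem. It is not MaxText's implementation; the function names `majority_vote` and `pass_at_1` and the sample data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among the samples.

    Ties are resolved in favor of the answer that appeared first.
    (Hypothetical helper, not a MaxText API.)
    """
    if not answers:
        raise ValueError("answers must be non-empty")
    return Counter(answers).most_common(1)[0][0]


def pass_at_1(answers, reference):
    """Estimate pass@1 as the fraction of samples that match the reference.

    With n samples of which c are correct, c / n estimates the chance
    that a single sample is correct. (Hypothetical helper, not a MaxText API.)
    """
    if not answers:
        raise ValueError("answers must be non-empty")
    correct = sum(1 for a in answers if a == reference)
    return correct / len(answers)


# Illustrative data: four sampled final answers for one math problem.
samples = ["42", "41", "42", "7"]
voted = majority_vote(samples)          # "42" appears twice, others once
estimate = pass_at_1(samples, "42")     # 2 of 4 samples are correct
print(voted, estimate)
```

The two modes answer different questions: majority voting scores one aggregated answer per problem, while pass@1 estimation averages per-sample correctness. In practice the exact-match comparison would typically be replaced by a math-aware equivalence check.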

Deprecations

Legacy MaxText.* shims have been removed. Please refer to src/MaxText/README.md for details on the new command locations and how to migrate.
Sequence parallelism has been deprecated; please use context parallelism instead.
The flag expert_shard_attention_option is deprecated; use custom_mesh_and_rule=ep-as-cp for the same functionality.
