Commit 348355c

SurbhiJainUSC authored and Google-ML-Automation committed
Update MaxText version to v0.2.2 and add release notes.
PiperOrigin-RevId: 912602689
1 parent f44b423 commit 348355c

2 files changed

Lines changed: 30 additions & 1 deletion

docs/release_notes.md

Lines changed: 29 additions & 0 deletions
@@ -22,8 +22,37 @@ MaxText is [available in PyPI](https://pypi.org/project/maxtext/) and can be ins

 ## Releases

+### v0.2.2
+
+#### Changes
+
+- Upgraded JAX to version 0.9.2, improving support for both pre-training and post-training.
+- Introduced simplified APIs for accessing MaxText models.
+- Included [maxtext_with_gepa.ipynb](https://github.com/AI-Hypercomputer/maxtext/blob/3c7d8d27864fc12cccac07786f02bd0e5262c982/src/maxtext/examples/maxtext_with_gepa.ipynb), a new notebook demonstrating AIME prompt optimization using the GEPA framework within MaxText.
+- Added support for Kimi-K2 models and the MuonClip optimizer. Users can explore this with the [kimi-k2-1t](https://github.com/AI-Hypercomputer/maxtext/blob/fa5b5ebf9a8e4f7a33bd88eae051dc21f3147791/src/maxtext/configs/models/kimi-k2-1t.yml) config (see [user guide](https://github.com/AI-Hypercomputer/maxtext/blob/fa5b5ebf9a8e4f7a33bd88eae051dc21f3147791/tests/end_to_end/tpu/kimi/Run_Kimi.md) for details).
+- Kimi-K2-Thinking, Kimi-K2.5 (text), and Kimi-K2.6 (text) are now supported. See [Run_Kimi.md](https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/end_to_end/tpu/kimi/Run_Kimi.md#quantized-variants-k2-thinking-k25-k26) for details.
+- [DeepSeek-V3.2](https://arxiv.org/pdf/2512.02556) is now supported, including DeepSeek Sparse Attention for handling long contexts. Use the [deepseek3.2-671b](https://github.com/AI-Hypercomputer/maxtext/blob/20d93f62a91899dbbb8f23562973d75104411d3a/src/maxtext/configs/models/deepseek3.2-671b.yml) config to try it out (refer to the [user guide](https://github.com/AI-Hypercomputer/maxtext/blob/20d93f62a91899dbbb8f23562973d75104411d3a/tests/end_to_end/tpu/deepseek/Run_DeepSeek.md) for more information).
+- Support has been added for Gemma 4 multi-modal models (26B MoE and 31B dense). These can be used with the [gemma4-26b](https://github.com/AI-Hypercomputer/maxtext/blob/cdc587f0935a5e2d6f8287b96669cf2e87a0acdc/src/maxtext/configs/models/gemma4-26b.yml) and [gemma4-31b](https://github.com/AI-Hypercomputer/maxtext/blob/cdc587f0935a5e2d6f8287b96669cf2e87a0acdc/src/maxtext/configs/models/gemma4-31b.yml) configs. See [Run_Gemma4.md](https://github.com/AI-Hypercomputer/maxtext/blob/cdc587f0935a5e2d6f8287b96669cf2e87a0acdc/tests/end_to_end/tpu/gemma4/Run_Gemma4.md) for further details.
+- Support has been added for Gemma 4 inference using [MaxText on vLLM plugin](https://maxtext.readthedocs.io/en/maxtext-v0.2.2/tutorials/inference.html).
+- Enhanced RL capabilities with support for the `open-r1/OpenR1-Math-220k` dataset and `nvidia/OpenMathReasoning`.
+- Added more evaluation modes for RL like majority voting and pass@1 estimation.
+- Sync weights to vllm prior to pre RL evaluation.
+- More robust usage of math-verify in RL.
+- MaxText's Supervised Fine-Tuning (SFT) now supports non-instruct models.
+- Added support for tensor parallelism using the Fused MoE kernel for MaxText on vLLM inference.
+- Added support for MaxText to vllm converters for Qwen3 and Gemma4 family of models.
+- [validate_converter.py](https://github.com/AI-Hypercomputer/maxtext/blob/472f53b70089e661be399ad3905c05a53a172ec5/src/maxtext/integration/vllm/torchax_converter/validate_converter.py#L108) now runs on multislice environment to test larger models with utilities to compare maxtext and vllm weights.
+
+#### Deprecations
+
+- Legacy `MaxText.*` shims have been removed. Please refer to [src/MaxText/README.md](https://github.com/AI-Hypercomputer/maxtext/blob/0536605a8ca116087ed93178433a67e905be566c/src/MaxText/README.md) for details on the new command locations and how to migrate.
+- Sequence parallelism has been deprecated, please use context parallelism instead.
+- The flag `expert_shard_attention_option` is deprecated, use `custom_mesh_and_rule=ep-as-cp` for the same functionality.
+
 ### v0.2.1

+#### Changes
+
 - Use the new `maxtext[runner]` installation option to build Docker images without cloning the repository. This can be used for scheduling jobs through XPK. See the [MaxText installation instructions](https://maxtext.readthedocs.io/en/maxtext-v0.2.1/build_maxtext.html) for more info.
 - Config can now be inferred for most MaxText commands. If you choose not to provide a config, MaxText will now [select an appropriate one](https://github.com/AI-Hypercomputer/maxtext/blob/9e786c888cc7acdfc00a8f73064e285017e80b86/src/maxtext/configs/pyconfig.py#L51-L67).
 - Configs in MaxText PyPI will now be picked up without storing them locally.
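
Note on the deprecations: the release notes name `custom_mesh_and_rule=ep-as-cp` as the replacement for `expert_shard_attention_option`. A minimal sketch of a pre-flight check for launch scripts follows. It is not part of MaxText; the helper name `find_deprecated_overrides` is hypothetical, and it assumes overrides are passed as `key=value` strings, which is the form the note itself uses.

```python
# Hypothetical helper, not part of MaxText: scans key=value overrides for
# keys that the v0.2.2 release notes list as deprecated.

# Deprecated key -> replacement suggested in the v0.2.2 release notes.
DEPRECATED_OVERRIDES = {
    "expert_shard_attention_option": "custom_mesh_and_rule=ep-as-cp",
}


def find_deprecated_overrides(args):
    """Return (argument, suggestion) pairs for overrides that use a deprecated key."""
    hits = []
    for arg in args:
        # Split on the first "=" only, so values containing "=" stay intact.
        key, sep, _value = arg.partition("=")
        if sep and key in DEPRECATED_OVERRIDES:
            hits.append((arg, DEPRECATED_OVERRIDES[key]))
    return hits
```

Calling `find_deprecated_overrides(sys.argv[1:])` at the top of a launcher would surface the old flag before a job is scheduled. Whether every value of the old flag maps onto `ep-as-cp` is not stated in the notes, so the sketch only reports matches and does not rewrite them.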

src/maxtext/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@
 """

 __author__ = "Google LLC"
-__version__ = "0.2.1"
+__version__ = "0.2.2"
 __description__ = (
 "MaxText is a high performance, highly scalable, open-source LLM written in pure Python/Jax and "
 "targeting Google Cloud TPUs and GPUs for training and **inference."

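Note on the version bump: downstream scripts that rely on v0.2.2 behaviour may want to confirm the installed release first. The sketch below uses only the standard library's `importlib.metadata`. The helper names are hypothetical, and the comparison assumes plain dotted-integer versions such as `0.2.2`; pre-release suffixes would need a proper version parser.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def release_tuple(version_string):
    """Turn '0.2.2' into (0, 2, 2); assumes plain dotted integers."""
    return tuple(int(part) for part in version_string.split("."))


def meets_minimum(found, minimum):
    """True when the found version is at least the minimum version."""
    return release_tuple(found) >= release_tuple(minimum)
```

For example, `installed_version("maxtext")` returns `None` when the PyPI package is not installed; otherwise the result can be passed to `meets_minimum(found, "0.2.2")`.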