
Commit 2cbf7fd

Merge pull request #3900 from AI-Hypercomputer:update_maxtext_version
PiperOrigin-RevId: 915142180
2 parents 19c63a6 + 269ef70 commit 2cbf7fd

18 files changed

Lines changed: 49 additions & 69 deletions

docs/build_maxtext.md

Lines changed: 1 addition & 19 deletions
@@ -65,7 +65,7 @@ source ${VENV_NAME?}/bin/activate
 # This enables Docker image building and workload scheduling via XPK.
 # Once installed, you will have access to the `build_maxtext_docker_image`
 # and `upload_maxtext_docker_image` commands.
-uv pip install maxtext[runner]==0.2.1 --resolution=lowest
+uv pip install maxtext[runner]=={{version}} --resolution=lowest
 ```
 
 > **Note:** The `maxtext[runner]` extra includes all necessary dependencies for building MaxText Docker images and running workloads through XPK. It automatically installs XPK, so you do not need to install it separately to manage your clusters and workloads.
@@ -78,25 +78,7 @@ If you plan to contribute to MaxText or need the latest unreleased features, ins
 # Clone the repository
 git clone https://github.com/AI-Hypercomputer/maxtext.git
 cd maxtext
-```
-
-:::\{only} is_not_latest
-
-By default, cloning the repository provides the latest version (**HEAD**).
-If you wish to use the latest features, please follow the [latest guide](https://maxtext.readthedocs.io/en/latest/install_maxtext.html).
-If you want to ensure compatibility with the specific version of the documentation
-you are currently viewing, you must checkout the corresponding tag for that version
-before proceeding with the installation.
-
-```{eval-rst}
-.. parsed-literal::
 
-git checkout |version|
-```
-
-:::
-
-```bash
 # Create virtual environment
 export VENV_NAME=<VENV_NAME> # e.g., docker_venv
 uv venv --python 3.12 --seed ${VENV_NAME?}

docs/conf.py

Lines changed: 17 additions & 3 deletions
@@ -27,6 +27,7 @@
 
 import os
 import os.path
+import re
 import sys
 import logging
 from sphinx.util import logging as sphinx_logging
@@ -42,7 +43,15 @@
 # pylint: disable=redefined-builtin
 copyright = "2023–2026, Google LLC"
 author = "MaxText developers"
-version = os.environ.get("READTHEDOCS_VERSION", "latest")
+
+# Get version from the __init__.py file
+init_path = os.path.abspath(os.path.join(MAXTEXT_REPO_ROOT, "src", "maxtext", "__init__.py"))
+with open(init_path, "r", encoding="utf-8") as f:
+match = re.search(r"^__version__ = ['\"]([^'\"]*)['\"]", f.read(), re.MULTILINE)
+if match:
+version = match.group(1)
+else:
+raise RuntimeError("Unable to find version string.")
 
 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
@@ -247,6 +256,12 @@ def filter(self, record: logging.LogRecord) -> bool:
 return not msg.strip().startswith(filter_out)
 
 
+def substitute_placeholders(app, docname, source):
+result = source[0]
+result = result.replace("{{version}}", version)
+source[0] = result
+
+
 def setup(app):
 """Set up the Sphinx application with custom behavior."""
 
@@ -259,5 +274,4 @@ def setup(app):
 warning_handler, *_ = [h for h in logger.handlers if isinstance(h, sphinx_logging.WarningStreamHandler)]
 warning_handler.filters.insert(0, FilterSphinxWarnings(app))
 
-if version != "latest":
-app.tags.add("is_not_latest")
+app.connect("source-read", substitute_placeholders)
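
The `docs/conf.py` change swaps the `READTHEDOCS_VERSION` lookup for two steps: read `__version__` from `src/maxtext/__init__.py` with a regular expression, then rewrite every `{{version}}` placeholder in a page's raw source from a Sphinx `source-read` handler. The sketch below reproduces both steps outside Sphinx so they can be tried in isolation. It assumes a hypothetical `__init__.py` whose version is `0.2.1` (the value this diff removes from the install commands; the real file may hold a different one), and the helper names `read_version` and `render_placeholders` are illustrative, not part of MaxText.

```python
import re

# Hypothetical contents of src/maxtext/__init__.py; the real version may differ.
SAMPLE_INIT = '"""MaxText package."""\n__version__ = "0.2.1"\n'

# Same pattern the diff adds to docs/conf.py.
VERSION_RE = re.compile(r"^__version__ = ['\"]([^'\"]*)['\"]", re.MULTILINE)


def read_version(init_text: str) -> str:
    """Return the __version__ string, or fail loudly as conf.py does."""
    match = VERSION_RE.search(init_text)
    if match is None:
        raise RuntimeError("Unable to find version string.")
    return match.group(1)


def render_placeholders(source: list, version: str) -> None:
    """Replace {{version}} in source[0] in place.

    Sphinx's source-read event passes the page text as a one-element list
    (alongside app and docname, which this sketch omits) so that a handler
    can swap in new text.
    """
    source[0] = source[0].replace("{{version}}", version)


version = read_version(SAMPLE_INIT)
page = ["uv pip install maxtext[tpu]=={{version}} --resolution=lowest"]
render_placeholders(page, version)
print(page[0])  # uv pip install maxtext[tpu]==0.2.1 --resolution=lowest
```

Because the replacement runs on the raw Markdown before MyST parses it, the placeholder is also expanded inside fenced code blocks, where a reST substitution such as `|version|` is not; that is presumably why the removed `is_not_latest` block had to wrap `git checkout |version|` in a `parsed-literal` directive.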

docs/guides/checkpointing_solutions/convert_checkpoint.md

Lines changed: 5 additions & 5 deletions
@@ -23,7 +23,7 @@ The following models are supported:
 
 ## Prerequisites
 
-- MaxText must be installed in a Python virtual environment using the `maxtext[tpu]` option. For instructions on installing MaxText on your VM, please refer to the official [installation documentation](https://maxtext.readthedocs.io/en/maxtext-v0.2.1/install_maxtext.html).
+- MaxText must be installed in a Python virtual environment using the `maxtext[tpu]` option. For instructions on installing MaxText on your VM, please refer to the official [installation documentation](../../install_maxtext.md).
 - Hugging Face model checkpoints are cached locally at `$HOME/.cache/huggingface/hub` before conversion. Ensure you have sufficient disk space.
 - Authenticate via the [Hugging Face CLI](https://huggingface.co/docs/huggingface_hub/v0.21.2/guides/cli) if using private or gated models.
 
@@ -71,7 +71,7 @@ You can find your converted checkpoint files under `${BASE_OUTPUT_DIRECTORY}/0/i
 ### Key Parameters
 
 - `model_name`: The specific model identifier. It must match a supported entry in the MaxText [globals.py](https://github.com/AI-Hypercomputer/maxtext/blob/16b684840db9b96b19e24e84ac49f06af7204ae3/src/maxtext/utils/globals.py#L46C1-L46C7).
-- `scan_layers`: Controls whether the output uses a scanned (`scan_layers=true`) or unscanned (`scan_layers=false`) checkpoint format. Refer [here](https://maxtext.readthedocs.io/en/maxtext-v0.2.1/reference/core_concepts/checkpoints.html) for more information.
+- `scan_layers`: Controls whether the output uses a scanned (`scan_layers=true`) or unscanned (`scan_layers=false`) checkpoint format. Refer [here](../../reference/core_concepts/checkpoints.md) for more information.
 - `use_multimodal`: Indicates if multimodality is used, important for Gemma3.
 - `base_output_directory`: The path where the converted Orbax checkpoint will be stored; it can be Google Cloud Storage (GCS) or local.
 - `hardware=cpu`: The conversion script runs on a CPU machine.
@@ -118,7 +118,7 @@ python3 -m maxtext.checkpoint_conversion.to_huggingface \
 
 - `model_name`: The specific model identifier. It must match a supported entry in the MaxText [globals.py](https://github.com/AI-Hypercomputer/maxtext/blob/16b684840db9b96b19e24e84ac49f06af7204ae3/src/maxtext/utils/globals.py#L46C1-L46C7).
 - `load_parameters_path`: The path to the MaxText Orbax checkpoint.
-- `scan_layers`: Controls whether the output uses a scanned (`scan_layers=true`) or unscanned (`scan_layers=false`) checkpoint format. Refer [here](https://maxtext.readthedocs.io/en/maxtext-v0.2.1/reference/core_concepts/checkpoints.html) for more information.
+- `scan_layers`: Controls whether the output uses a scanned (`scan_layers=true`) or unscanned (`scan_layers=false`) checkpoint format. Refer [here](../../reference/core_concepts/checkpoints.md) for more information.
 - `use_multimodal`: Indicates if multimodality is used, important for Gemma3.
 - `hardware=cpu`: The conversion script runs on a CPU machine.
 - `base_output_directory`: The path where the converted checkpoint will be stored; it can be Google Cloud Storage (GCS), Hugging Face Hub or local.
@@ -128,7 +128,7 @@ python3 -m maxtext.checkpoint_conversion.to_huggingface \
 
 To ensure the conversion was successful, you can use the [test script](https://github.com/AI-Hypercomputer/maxtext/blob/main/tests/utils/forward_pass_logit_checker.py). It runs a forward pass on both the original and converted models and compares the output logits to verify conversion. It is used to verify the bidirectional conversion.
 
-> **Note:** This correctness test will only work when MaxText is installed from source by following the installation instructions [here](https://maxtext.readthedocs.io/en/maxtext-v0.2.1/install_maxtext.html#from-source).
+> **Note:** This correctness test will only work when MaxText is installed from source by following the installation instructions [here](../../install_maxtext.md#from-source).
 
 ### Setup Environment
 
@@ -159,7 +159,7 @@ python3 -m tests.utils.forward_pass_logit_checker src/maxtext/configs/base.yml \
 
 - `load_parameters_path`: The path to the MaxText Orbax checkpoint (e.g., `gs://your-bucket/maxtext-checkpoint/0/items`).
 - `model_name`: The corresponding model name in the MaxText configuration (e.g., `qwen3-4b`).
-- `scan_layers`: Controls whether the output uses a scanned (`scan_layers=true`) or unscanned (`scan_layers=false`) checkpoint format. Refer [here](https://maxtext.readthedocs.io/en/maxtext-v0.2.1/reference/core_concepts/checkpoints.html) for more information.
+- `scan_layers`: Controls whether the output uses a scanned (`scan_layers=true`) or unscanned (`scan_layers=false`) checkpoint format. Refer [here](../../reference/core_concepts/checkpoints.md) for more information.
 - `use_multimodal`: Indicates if multimodality is used.
 - `--run_hf_model` (Optional): Indicates if loading Hugging Face model from the hf_model_path. If not set, it will compare the maxtext logits with pre-saved golden logits.
 - `--hf_model_path` (Optional): The path to the Hugging Face checkpoint (if `--run_hf_model=True`).

docs/guides/optimization/custom_model.md

Lines changed: 1 addition & 1 deletion
@@ -254,7 +254,7 @@ Ironwood over ICI:
 - `3 * M * 8 / 2 > 12800`
 - `M > 1100`
 
-It is important to emphasize that this is a theoretical roofline analysis. Real-world performance will depend on the efficiency of the implementation and XLA compilation on the TPU. Refer to the [link](https://maxtext.readthedocs.io/en/maxtext-v0.2.1/guides/optimization/sharding.html) for specific challenges regarding PP + FSDP/DP.
+It is important to emphasize that this is a theoretical roofline analysis. Real-world performance will depend on the efficiency of the implementation and XLA compilation on the TPU. Refer to the [link](../optimization/sharding.md) for specific challenges regarding PP + FSDP/DP.
 
 ## Step 4. Analyze experiments
 
docs/install_maxtext.md

Lines changed: 5 additions & 21 deletions
@@ -1,5 +1,5 @@
 <!--
-Copyright 2023-2025 Google LLC
+Copyright 2023-2026 Google LLC
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
@@ -51,22 +51,22 @@ This is the easiest way to get started with the latest stable version.
 TPUs.
 
 ```bash
-uv pip install maxtext[tpu]==0.2.1 --resolution=lowest
+uv pip install maxtext[tpu]=={{version}} --resolution=lowest
 ```
 
 - **Option 2:** Install `maxtext[cuda12]`, used for pre-training and decoding
 on GPUs.
 
 ```bash
-uv pip install maxtext[cuda12]==0.2.1 --resolution=lowest
+uv pip install maxtext[cuda12]=={{version}} --resolution=lowest
 ```
 
 - **Option 3:** Install `maxtext[tpu-post-train]`, used for post-training on
 TPUs. Currently, this option should also be used for running `vllm_decode`
 on TPUs.
 
 ```bash
-uv pip install maxtext[tpu-post-train]==0.2.1 --resolution=lowest
+uv pip install maxtext[tpu-post-train]=={{version}} --resolution=lowest
 ```
 
 - **Option 4:** Install `maxtext[runner]`, used for building MaxText's Docker
@@ -78,7 +78,7 @@ This is the easiest way to get started with the latest stable version.
 guide.
 
 ```bash
-uv pip install maxtext[runner]==0.2.1 --resolution=lowest
+uv pip install maxtext[runner]=={{version}} --resolution=lowest
 ```
 
 ```{note}
@@ -112,22 +112,6 @@ environment to avoid dependency conflicts.
 cd maxtext
 ```
 
-:::\{only} is_not_latest
-
-By default, cloning the repository provides the latest version (**HEAD**).
-If you wish to use the latest features, please follow the [latest guide](https://maxtext.readthedocs.io/en/latest/install_maxtext.html).
-If you want to ensure compatibility with the specific version of the documentation
-you are currently viewing, you must checkout the corresponding tag for that version
-before proceeding with the installation.
-
-```{eval-rst}
-.. parsed-literal::
-
-git checkout |version|
-```
-
-:::
-
 2. Create virtual environment:
 
 ```bash

docs/reference/architecture/jax_ai_libraries_chosen.md

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ For more information on using Orbax, please refer to https://github.com/google/o
 
 Its APIs are explicitly designed for the multi-host paradigm, simplifying the process of ensuring that each host loads a unique shard of the global batch.
 
-For more information on using Grain, please refer to https://github.com/google/grain and the grain guide in maxtext located at https://maxtext.readthedocs.io/en/latest/guides/data_input_pipeline/data_input_grain.html
+For more information on using Grain, please refer to https://github.com/google/grain and the grain guide in maxtext located [here](../../guides/data_input_pipeline/data_input_grain.md).
 
 ## Qwix: For native JAX quantization
 
docs/reference/core_concepts/batch_size.md

Lines changed: 2 additions & 2 deletions
@@ -34,11 +34,11 @@ You can set `per_device_batch_size` and `gradient_accumulation_steps` in `config
 
 `global_batch_to_load` = `global_batch_size_to_train_on x expansion_factor_real_data`
 
-When `expansion_factor_real_data > 1`, only a subset of hosts read data from the source (e.g., a GCS bucket). These "loading hosts" read more data than they need for their own devices and distribute the surplus to other "non-loading" hosts. This reduces the number of concurrent connections to the data source, which can significantly improve I/O throughput. When set to between 0 and 1, it's for grain pipeline to use a smaller chip count to read checkpoint from a larger chip count job. Details in https://maxtext.readthedocs.io/en/maxtext-v0.2.1/guides/data_input_pipeline/data_input_grain.html#using-grain.
+When `expansion_factor_real_data > 1`, only a subset of hosts read data from the source (e.g., a GCS bucket). These "loading hosts" read more data than they need for their own devices and distribute the surplus to other "non-loading" hosts. This reduces the number of concurrent connections to the data source, which can significantly improve I/O throughput. When set to between 0 and 1, it's for grain pipeline to use a smaller chip count to read checkpoint from a larger chip count job. Details [here](../../guides/data_input_pipeline/data_input_grain.md#using-grain).
 
 ## Gradient Accumulation Steps
 
-`gradient_accumulation_steps` defines how many forward/backward passes are performed before the optimizer updates the model weights. The gradients from each pass are accumulated (summed). It is discussed in more detail [here](https://maxtext.readthedocs.io/en/latest/reference/core_concepts/tiling.html#gradient-accumulation).
+`gradient_accumulation_steps` defines how many forward/backward passes are performed before the optimizer updates the model weights. The gradients from each pass are accumulated (summed). It is discussed in more detail [here](../core_concepts/tiling.md#gradient-accumulation).
 
 For example, if `gradient_accumulation_steps` is set to `4`, the model will execute four forward and backward passes, sum the gradients, and then apply a single optimizer step. This achieves the same effective global batch size as quadrupling the `per_device_batch_size` with significantly less memory, but can potentially lead to lower MFU.
 
docs/reference/core_concepts/tiling.md

Lines changed: 1 addition & 1 deletion
@@ -80,4 +80,4 @@ Tiling is also crucial for managing data movement across the memory hierarchy (H
 
 **Tiling** and **sharding** are independent concepts that do not conflict; in fact, they are often used together. Sharding distributes a tensor across multiple devices, while tiling processes a tensor in chunks on the same device.
 
-To learn more about sharding in MaxText, please refer to the [sharding documentation](https://maxtext.readthedocs.io/en/maxtext-v0.2.1/guides/optimization/sharding.html).
+To learn more about sharding in MaxText, please refer to the [sharding documentation](../../guides/optimization/sharding.md).

docs/reference/models/supported_models_and_architectures.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ MaxText is an open-source, high-performance LLM framework written in Python/JAX.
 
 - **Supported Precisions**: FP32, BF16, INT8, and FP8.
 - **Ahead-of-Time Compilation (AOT)**: For faster model development/prototyping and earlier OOM detection.
-- **Quantization**: Via **Qwix** (recommended) and AQT. See Quantization [Guide](https://maxtext.readthedocs.io/en/maxtext-v0.2.1/reference/core_concepts/quantization.html).
+- **Quantization**: Via **Qwix** (recommended) and AQT. See Quantization [Guide](../reference/core_concepts/quantization.md).
 - **Diagnostics**: Structured error context via **`cloud_tpu_diagnostics`** (filters stack traces to user code), simple logging via `max_logging`, profiling in **XProf**, and visualization in **TensorBoard**.
 - **Multi-Token Prediction (MTP)**: Enables token efficient training with multi-token prediction.
 - **Elastic Training**: Fault-tolerant and dynamic scale-up/scale-down on Cloud TPUs with Pathways.

docs/tutorials/first_run.md

Lines changed: 2 additions & 2 deletions
@@ -36,7 +36,7 @@ Local development is a convenient way to run MaxText on a single host. It doesn'
 multiple hosts but is a good way to learn about MaxText.
 
 1. [Create and SSH to the single host VM of your choice](https://cloud.google.com/tpu/docs/managing-tpus-tpu-vm). You can use any available single host TPU, such as `v5litepod-8`, `v5p-8`, or `v4-8`.
-2. For instructions on installing MaxText on your VM, please refer to the [official documentation](https://maxtext.readthedocs.io/en/latest/install_maxtext.html). For this tutorial on TPUs, install `maxtext[tpu]`.
+2. For instructions on installing MaxText on your VM, please refer to the [official documentation](../install_maxtext.md). For this tutorial on TPUs, install `maxtext[tpu]`.
 3. After installation completes, run training on synthetic data with the following command:
 
 ```sh
@@ -70,7 +70,7 @@ You can use [demo_decoding.ipynb](https://github.com/AI-Hypercomputer/maxtext/bl
 
 ### Run MaxText on NVIDIA GPUs
 
-1. For instructions on installing MaxText on your VM, please refer to the [official documentation](https://maxtext.readthedocs.io/en/latest/install_maxtext.html). For this tutorial on GPUs, install `maxtext[cuda12]`.
+1. For instructions on installing MaxText on your VM, please refer to the [official documentation](../install_maxtext.md). For this tutorial on GPUs, install `maxtext[cuda12]`.
 2. After installation is complete, run training with the following command on synthetic data:
 
 ```sh
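
Most of this commit swaps absolute `maxtext.readthedocs.io` URLs for relative Markdown paths, which only work if each path resolves correctly from the directory of the page that contains it. The sketch below checks that with `posixpath`, using the file paths from the diff headers and the `+` lines above. It assumes the docs tree mirrors those headers and does not touch the filesystem, so it shows where each link points, not whether the target file exists; the `LINKS` table and `resolve` helper are illustrative names.

```python
import posixpath

# (page containing the link, relative link) pairs copied from the "+" lines above.
LINKS = [
    ("docs/guides/checkpointing_solutions/convert_checkpoint.md", "../../install_maxtext.md"),
    ("docs/guides/checkpointing_solutions/convert_checkpoint.md", "../../reference/core_concepts/checkpoints.md"),
    ("docs/guides/checkpointing_solutions/convert_checkpoint.md", "../../install_maxtext.md#from-source"),
    ("docs/guides/optimization/custom_model.md", "../optimization/sharding.md"),
    ("docs/reference/architecture/jax_ai_libraries_chosen.md", "../../guides/data_input_pipeline/data_input_grain.md"),
    ("docs/reference/core_concepts/batch_size.md", "../../guides/data_input_pipeline/data_input_grain.md#using-grain"),
    ("docs/reference/core_concepts/batch_size.md", "../core_concepts/tiling.md#gradient-accumulation"),
    ("docs/reference/core_concepts/tiling.md", "../../guides/optimization/sharding.md"),
    ("docs/reference/models/supported_models_and_architectures.md", "../reference/core_concepts/quantization.md"),
    ("docs/tutorials/first_run.md", "../install_maxtext.md"),
]


def resolve(page: str, link: str) -> str:
    """Resolve a relative Markdown link against the directory of the page containing it."""
    target = link.split("#", 1)[0]  # drop any "#anchor" fragment
    return posixpath.normpath(posixpath.join(posixpath.dirname(page), target))


for page, link in LINKS:
    print(f"{page}: {link} -> {resolve(page, link)}")
```

Under these assumptions, every target lands on a `docs/` path that matches its old readthedocs URL except the quantization link: from `docs/reference/models/`, `../reference/core_concepts/quantization.md` resolves to `docs/reference/reference/core_concepts/quantization.md`. If the page lives at `docs/reference/core_concepts/quantization.md`, as the old `reference/core_concepts/quantization.html` URL suggests, the link would need to be `../core_concepts/quantization.md`.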
