Finally, test that the new dependencies install correctly and that MaxText runs
as expected.
-
1. **Install MaxText and dependencies**: For instructions on installing MaxText on your VM, please refer to the [official documentation](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#from-source).
+
1. **Install MaxText and dependencies**: For instructions on installing MaxText on your VM, please refer to the [official documentation](../install_maxtext.md#from-source).
2. **Run tests:** Run MaxText tests to ensure there are no regressions.
docs/guides/model_bringup.md
+6 -6 (6 additions & 6 deletions)
@@ -20,15 +20,15 @@ This documentation acts as the primary resource for efficiently integrating new
## 1. Architecture Analysis
-
The first phase involves determining how the new model's architecture aligns with MaxText's existing capabilities. To facilitate this assessment, refer to the [MaxText architecture overview](https://maxtext.readthedocs.io/en/latest/reference/architecture/architecture_overview.html) and [list of supported models](https://maxtext.readthedocs.io/en/latest/reference/models/supported_models_and_architectures.html).
+
The first phase involves determining how the new model's architecture aligns with MaxText's existing capabilities. To facilitate this assessment, refer to the [MaxText architecture overview](../reference/architecture/architecture_overview.md) and [list of supported models](../reference/models/supported_models_and_architectures.md).
-
**Input Data Pipeline**: MaxText supports HuggingFace, Grain, and TFDS pipelines ([details](https://maxtext.readthedocs.io/en/latest/guides/data_input_pipeline.html)). While synthetic data is typically used for initial performance benchmarks, the framework supports multiple modalities including text and image (audio and video - work in progress).
+
**Input Data Pipeline**: MaxText supports HuggingFace, Grain, and TFDS pipelines ([details](data_input_pipeline.md)). While synthetic data is typically used for initial performance benchmarks, the framework supports multiple modalities including text and image (audio and video - work in progress).
**Tokenizer**: Supported [tokenizer options](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/maxtext/input_pipeline/tokenizer.py) include `TikTokenTokenizer`, `SentencePieceTokenizer`, and `HFTokenizer`.
**Self-Attention & RoPE**: Available mechanisms include optimized [Flash Attention](https://github.com/AI-Hypercomputer/maxtext/blob/62ee818144eb037ad3fe85ab8e789cd074776f46/src/maxtext/layers/attention_op.py#L1184) (supporting MHA, GQA, and MQA), Multi-head Latent Attention ([MLA](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/maxtext/layers/attention_mla.py)), and [Gated Delta Network](https://github.com/AI-Hypercomputer/maxtext/blob/62ee818144eb037ad3fe85ab8e789cd074776f46/src/maxtext/models/qwen3.py#L358). MaxText also supports [Regular](https://github.com/AI-Hypercomputer/maxtext/blob/88d2ffd34c0ace76f836c7ea9c2fe4cd2d271088/MaxText/layers/embeddings.py#L108), [Llama](https://github.com/AI-Hypercomputer/maxtext/blob/88d2ffd34c0ace76f836c7ea9c2fe4cd2d271088/MaxText/layers/embeddings.py#L178), and [YaRN](https://github.com/AI-Hypercomputer/maxtext/blob/88d2ffd34c0ace76f836c7ea9c2fe4cd2d271088/MaxText/layers/embeddings.py#L282) variations of Rotary Positional Embeddings (RoPE).
-
**Multi-Layer Perceptron (MLP)**: The framework supports both traditional dense models and Mixture of Experts (MoE) architectures, including [configurations](https://maxtext.readthedocs.io/en/latest/reference/core_concepts/moe_configuration.html) for routed and shared experts.
+
**Multi-Layer Perceptron (MLP)**: The framework supports both traditional dense models and Mixture of Experts (MoE) architectures, including [configurations](../reference/core_concepts/moe_configuration.md) for routed and shared experts.
**Normalization**: We support different [normalization strategies](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/maxtext/layers/normalizations.py), including RMSNorm and Gated RMSNorm. These can be configured before or after attention/MLP layers.
@@ -44,7 +44,7 @@ This step can be bypassed if the current MaxText codebase already supports all c
While most open-source models are distributed in Safetensors or PyTorch formats, MaxText requires conversion to the [Orbax](https://orbax.readthedocs.io/en/latest/) format.
-
There are [two primary formats](https://maxtext.readthedocs.io/en/latest/reference/core_concepts/checkpoints.html) for Orbax checkpoints within MaxText, and while both are technically compatible with training and inference, we recommend following these performance-optimized guidelines:
+
There are [two primary formats](../reference/core_concepts/checkpoints.md) for Orbax checkpoints within MaxText, and while both are technically compatible with training and inference, we recommend following these performance-optimized guidelines:
- **Scanned Format**: Recommended for **training** as it stacks layers for efficient processing via `jax.lax.scan`. To enable this, set `scan_layers=True`.
- **Unscanned Format**: Recommended for **inference** to simplify loading individual layer parameters. To enable this, set `scan_layers=False`.
@@ -58,7 +58,7 @@ Success starts with a clear map. You must align the parameter names from your so
### 3.2 Write Script
-
Use existing model scripts within the repository as templates to tailor the conversion logic for your specific architecture. We strongly recommended to use the [checkpoint conversion utility](https://maxtext.readthedocs.io/en/latest/guides/checkpointing_solutions/convert_checkpoint.html) rather than [standalone scripts](https://github.com/AI-Hypercomputer/maxtext/tree/main/src/maxtext/checkpoint_conversion/standalone_scripts).
+
Use existing model scripts within the repository as templates to tailor the conversion logic for your specific architecture. We strongly recommended to use the [checkpoint conversion utility](checkpointing_solutions/convert_checkpoint.md) rather than [standalone scripts](https://github.com/AI-Hypercomputer/maxtext/tree/main/src/maxtext/checkpoint_conversion/standalone_scripts).
### 3.3 Verify Compatibility
@@ -132,7 +132,7 @@ If you run the `forward_pass_logit_checker.py` to compare reference logits with
**Q: How to compile models for a target hardware without physical access?**
-
**A:** If you need to compile your training run ahead of time, use the train_compile.py tool. This utility allows you to compile the primary train_step for specific target hardware without needing the actual devices on hand. It’s particularly useful for verifying your implementation's functionality on a local Cloud VM or a standard CPU. Please refer [here](https://maxtext.readthedocs.io/en/latest/guides/monitoring_and_debugging/features_and_diagnostics.html#ahead-of-time-compilation-aot) for more examples.
+
**A:** If you need to compile your training run ahead of time, use the train_compile.py tool. This utility allows you to compile the primary train_step for specific target hardware without needing the actual devices on hand. It’s particularly useful for verifying your implementation's functionality on a local Cloud VM or a standard CPU. Please refer [here](monitoring_and_debugging/features_and_diagnostics.md#ahead-of-time-compilation-aot) for more examples.
**Q: My model is too large for my development machine. What should I do?**
docs/guides/run_python_notebook.md
+3 -3 (3 additions & 3 deletions)
@@ -86,7 +86,7 @@ To install, click the `Extensions` icon on the left sidebar (or press `Ctrl+Shif
### Step 3: Install MaxText and Dependencies
-
To execute post-training notebooks on your TPU-VM, follow the official [MaxText installation guides](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#from-source) and specifically follow `Option 3: Installing [tpu-post-train]`. This will ensure all post-training dependencies are installed inside your virtual environment.
+
To execute post-training notebooks on your TPU-VM, follow the official [MaxText installation guides](../install_maxtext.md#from-source) and specifically follow `Option 3: Installing [tpu-post-train]`. This will ensure all post-training dependencies are installed inside your virtual environment.
> **Note:** If you have previously installed MaxText with a different option (e.g., `maxtext[tpu]`), we strongly recommend using a fresh virtual environment for `maxtext[tpu-post-train]` to avoid potential library version conflicts.
@@ -139,7 +139,7 @@ pip3 install jupyterlab
### Step 3: Install MaxText and Dependencies
-
To execute post-training notebooks on your TPU-VM, follow the official [MaxText installation guides](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#from-source) and specifically follow `Option 3: Installing [tpu-post-train]`. This will ensure all post-training dependencies are installed inside your virtual environment.
+
To execute post-training notebooks on your TPU-VM, follow the official [MaxText installation guides](../install_maxtext.md#from-source) and specifically follow `Option 3: Installing [tpu-post-train]`. This will ensure all post-training dependencies are installed inside your virtual environment.
> **Note:** If you have previously installed MaxText with a different option (e.g., `maxtext[tpu]`), we strongly recommend using a fresh virtual environment for `maxtext[tpu-post-train]` to avoid potential library version conflicts.
docs/index.html
+1 -1 (1 addition & 1 deletion)
@@ -35,7 +35,7 @@ <h3>JAX AI Stack</h3>
<li><a href="https://optax.readthedocs.io/en/latest/">Optax</a> - For gradient processing and optimization</li>
<li><a href="https://tunix.readthedocs.io/en/latest/">Tunix</a> - A JAX Library with the latest experimental algorithms and post-training techniques</li>
<li><a href="https://github.com/jax-ml/ml_dtypes">ml_dtypes</a> - NumPy dtype extensions for machine learning.</li>
-
<li><a href="https://maxtext.readthedocs.io/en/latest/index.html#model-library">MaxText model library</a> for JAX LLMs highly optimized for TPUs</li>
+
<li><a href="reference/models.html">MaxText model library</a> for JAX LLMs highly optimized for TPUs</li>
<li><a href="https://blog.vllm.ai/2025/10/16/vllm-tpu.html">vLLM on TPU</a> for high performance sampling (inference) for Reinforcement Learning (RL)</li>
<li><a href="https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/pathways-intro">Pathways</a> for multi-host inference (sampling) and highly efficient weight transfer</li>
<li>Optional data loading libraries (<a href="https://google-grain.readthedocs.io/en/latest/">Grain</a> or <a href="https://www.tensorflow.org/guide/data">tf.data</a>)</li>
docs/reference/architecture/jax_ai_libraries_chosen.md
+1 -1 (1 addition & 1 deletion)
@@ -56,7 +56,7 @@ For more information on using Orbax, please refer to https://github.com/google/o
1. **Deterministic by Design**: Grain allows storing data loader states, provides strong guarantees about data ordering and sharding even with preemptions, which is critical for reproducibility.
2. **Global Shuffle**: Prevents local overfitting.
-
3. **Built for Multi-Host Training**: The using random access file format streamlines [data loading in the multi-host environments](https://maxtext.readthedocs.io/en/maxtext-v0.2.1/guides/data_input_pipeline.html#multihost-dataloading-best-practice).
+
3. **Built for Multi-Host Training**: The using random access file format streamlines [data loading in the multi-host environments](../../guides/data_input_pipeline.md#multihost-dataloading-best-practice).
Its APIs are explicitly designed for the multi-host paradigm, simplifying the process of ensuring that each host loads a unique shard of the global batch.
docs/reference/models/tiering.md
+1 -1 (1 addition & 1 deletion)
@@ -40,4 +40,4 @@ For each of the TPU platforms listed below, we present a list of optimized model
\[1\]: Performance results are subject to variations based on system configuration, software versions, and other factors. These benchmarks represent point-in-time measurements under specific conditions.
-
\[2\]: Some older TFLOPS/s results are impacted by an updated calculation for causal attention ([PR #1988](https://github.com/AI-Hypercomputer/maxtext/pull/1988)), which halves the attention FLOPs. This change particularly affects configurations with large sequence lengths. For more details, please refer to the [performance metrics guide](https://maxtext.readthedocs.io/en/latest/reference/performance_metrics.html).
+
\[2\]: Some older TFLOPS/s results are impacted by an updated calculation for causal attention ([PR #1988](https://github.com/AI-Hypercomputer/maxtext/pull/1988)), which halves the attention FLOPs. This change particularly affects configurations with large sequence lengths. For more details, please refer to the [performance metrics guide](../performance_metrics.md).
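
The Markdown link changes in this diff follow one regular pattern: an absolute `https://maxtext.readthedocs.io/en/<version>/….html` URL becomes a path to the matching `.md` source, relative to the file that contains the link. The sketch below reproduces that mapping for illustration only. The helper name `to_relative_link` is hypothetical, and it assumes the docs sources live under `docs/` and mirror the published URL layout, which is what the hunks above suggest but is not stated anywhere in the diff.

```python
import posixpath
from urllib.parse import urlsplit

DOCS_HOST = "maxtext.readthedocs.io"
DOCS_ROOT = "docs"  # assumption: Markdown sources mirror the published URL layout


def to_relative_link(url: str, source_file: str) -> str:
    """Map a published docs URL to a link relative to `source_file`.

    Hypothetical helper: URLs that do not match the expected
    /en/<version>/<page>.html shape are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != DOCS_HOST:
        return url
    # "/en/latest/guides/x.html" -> ["", "en", "latest", "guides/x.html"]
    segments = parts.path.split("/", 3)
    if len(segments) < 4 or segments[1] != "en" or not segments[3].endswith(".html"):
        return url
    # Swap the .html page for its .md source under the docs root.
    target = posixpath.join(DOCS_ROOT, segments[3][: -len(".html")] + ".md")
    relative = posixpath.relpath(target, start=posixpath.dirname(source_file))
    # Keep the #fragment so section anchors still resolve.
    return relative + ("#" + parts.fragment if parts.fragment else "")


if __name__ == "__main__":
    print(to_relative_link(
        "https://maxtext.readthedocs.io/en/latest/reference/performance_metrics.html",
        "docs/reference/models/tiering.md",
    ))
```

The one change in `docs/index.html` does not follow this pattern (it points at `reference/models.html`, a different page), so a mechanical rewrite like this would not cover it.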