Commit 3f9789f

Merge pull request #3958 from AI-Hypercomputer:darisoy-fix-rtd-links-relative
PiperOrigin-RevId: 918635575
2 parents eb22f3b + dc3a658 commit 3f9789f

16 files changed

Lines changed: 35 additions & 35 deletions

docs/development/contribute_docs.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ documentation site locally to ensure things work as expected before a deployment
 to [Read The Docs](https://about.readthedocs.com/?ref=app.readthedocs.org).

 First, make sure you
-[install MaxText from source](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#from-source)
+[install MaxText from source](../install_maxtext.md#from-source)
 and install the necessary dependencies. You can do this by navigating to your
 local clone of the MaxText repo and running:

docs/development/update_dependencies.md

Lines changed: 1 addition & 1 deletion
@@ -139,6 +139,6 @@ mv generated_artifacts/python3_12/cuda12-requirements.txt \
 Finally, test that the new dependencies install correctly and that MaxText runs
 as expected.

-1. **Install MaxText and dependencies**: For instructions on installing MaxText on your VM, please refer to the [official documentation](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#from-source).
+1. **Install MaxText and dependencies**: For instructions on installing MaxText on your VM, please refer to the [official documentation](../install_maxtext.md#from-source).

 2. **Run tests:** Run MaxText tests to ensure there are no regressions.

docs/guides/model_bringup.md

Lines changed: 6 additions & 6 deletions
@@ -20,15 +20,15 @@ This documentation acts as the primary resource for efficiently integrating new
 ## 1. Architecture Analysis

-The first phase involves determining how the new model's architecture aligns with MaxText's existing capabilities. To facilitate this assessment, refer to the [MaxText architecture overview](https://maxtext.readthedocs.io/en/latest/reference/architecture/architecture_overview.html) and [list of supported models](https://maxtext.readthedocs.io/en/latest/reference/models/supported_models_and_architectures.html).
+The first phase involves determining how the new model's architecture aligns with MaxText's existing capabilities. To facilitate this assessment, refer to the [MaxText architecture overview](../reference/architecture/architecture_overview.md) and [list of supported models](../reference/models/supported_models_and_architectures.md).

-**Input Data Pipeline**: MaxText supports HuggingFace, Grain, and TFDS pipelines ([details](https://maxtext.readthedocs.io/en/latest/guides/data_input_pipeline.html)). While synthetic data is typically used for initial performance benchmarks, the framework supports multiple modalities including text and image (audio and video - work in progress).
+**Input Data Pipeline**: MaxText supports HuggingFace, Grain, and TFDS pipelines ([details](data_input_pipeline.md)). While synthetic data is typically used for initial performance benchmarks, the framework supports multiple modalities including text and image (audio and video - work in progress).

 **Tokenizer**: Supported [tokenizer options](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/maxtext/input_pipeline/tokenizer.py) include `TikTokenTokenizer`, `SentencePieceTokenizer`, and `HFTokenizer`.

 **Self-Attention & RoPE**: Available mechanisms include optimized [Flash Attention](https://github.com/AI-Hypercomputer/maxtext/blob/62ee818144eb037ad3fe85ab8e789cd074776f46/src/maxtext/layers/attention_op.py#L1184) (supporting MHA, GQA, and MQA), Multi-head Latent Attention ([MLA](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/maxtext/layers/attention_mla.py)), and [Gated Delta Network](https://github.com/AI-Hypercomputer/maxtext/blob/62ee818144eb037ad3fe85ab8e789cd074776f46/src/maxtext/models/qwen3.py#L358). MaxText also supports [Regular](https://github.com/AI-Hypercomputer/maxtext/blob/88d2ffd34c0ace76f836c7ea9c2fe4cd2d271088/MaxText/layers/embeddings.py#L108), [Llama](https://github.com/AI-Hypercomputer/maxtext/blob/88d2ffd34c0ace76f836c7ea9c2fe4cd2d271088/MaxText/layers/embeddings.py#L178), and [YaRN](https://github.com/AI-Hypercomputer/maxtext/blob/88d2ffd34c0ace76f836c7ea9c2fe4cd2d271088/MaxText/layers/embeddings.py#L282) variations of Rotary Positional Embeddings (RoPE).

-**Multi-Layer Perceptron (MLP)**: The framework supports both traditional dense models and Mixture of Experts (MoE) architectures, including [configurations](https://maxtext.readthedocs.io/en/latest/reference/core_concepts/moe_configuration.html) for routed and shared experts.
+**Multi-Layer Perceptron (MLP)**: The framework supports both traditional dense models and Mixture of Experts (MoE) architectures, including [configurations](../reference/core_concepts/moe_configuration.md) for routed and shared experts.

 **Normalization**: We support different [normalization strategies](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/maxtext/layers/normalizations.py), including RMSNorm and Gated RMSNorm. These can be configured before or after attention/MLP layers.
@@ -44,7 +44,7 @@ This step can be bypassed if the current MaxText codebase already supports all c
 While most open-source models are distributed in Safetensors or PyTorch formats, MaxText requires conversion to the [Orbax](https://orbax.readthedocs.io/en/latest/) format.

-There are [two primary formats](https://maxtext.readthedocs.io/en/latest/reference/core_concepts/checkpoints.html) for Orbax checkpoints within MaxText, and while both are technically compatible with training and inference, we recommend following these performance-optimized guidelines:
+There are [two primary formats](../reference/core_concepts/checkpoints.md) for Orbax checkpoints within MaxText, and while both are technically compatible with training and inference, we recommend following these performance-optimized guidelines:

 - **Scanned Format**: Recommended for **training** as it stacks layers for efficient processing via `jax.lax.scan`. To enable this, set `scan_layers=True`.
 - **Unscanned Format**: Recommended for **inference** to simplify loading individual layer parameters. To enable this, set `scan_layers=False`.
@@ -58,7 +58,7 @@ Success starts with a clear map. You must align the parameter names from your so
 ### 3.2 Write Script

-Use existing model scripts within the repository as templates to tailor the conversion logic for your specific architecture. We strongly recommended to use the [checkpoint conversion utility](https://maxtext.readthedocs.io/en/latest/guides/checkpointing_solutions/convert_checkpoint.html) rather than [standalone scripts](https://github.com/AI-Hypercomputer/maxtext/tree/main/src/maxtext/checkpoint_conversion/standalone_scripts).
+Use existing model scripts within the repository as templates to tailor the conversion logic for your specific architecture. We strongly recommended to use the [checkpoint conversion utility](checkpointing_solutions/convert_checkpoint.md) rather than [standalone scripts](https://github.com/AI-Hypercomputer/maxtext/tree/main/src/maxtext/checkpoint_conversion/standalone_scripts).

 ### 3.3 Verify Compatibility

@@ -132,7 +132,7 @@ If you run the `forward_pass_logit_checker.py` to compare reference logits with
 **Q: How to compile models for a target hardware without physical access?**

-**A:** If you need to compile your training run ahead of time, use the train_compile.py tool. This utility allows you to compile the primary train_step for specific target hardware without needing the actual devices on hand. It’s particularly useful for verifying your implementation's functionality on a local Cloud VM or a standard CPU. Please refer [here](https://maxtext.readthedocs.io/en/latest/guides/monitoring_and_debugging/features_and_diagnostics.html#ahead-of-time-compilation-aot) for more examples.
+**A:** If you need to compile your training run ahead of time, use the train_compile.py tool. This utility allows you to compile the primary train_step for specific target hardware without needing the actual devices on hand. It’s particularly useful for verifying your implementation's functionality on a local Cloud VM or a standard CPU. Please refer [here](monitoring_and_debugging/features_and_diagnostics.md#ahead-of-time-compilation-aot) for more examples.

 **Q: My model is too large for my development machine. What should I do?**

docs/guides/run_python_notebook.md

Lines changed: 3 additions & 3 deletions
@@ -86,7 +86,7 @@ To install, click the `Extensions` icon on the left sidebar (or press `Ctrl+Shif
 ### Step 3: Install MaxText and Dependencies

-To execute post-training notebooks on your TPU-VM, follow the official [MaxText installation guides](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#from-source) and specifically follow `Option 3: Installing [tpu-post-train]`. This will ensure all post-training dependencies are installed inside your virtual environment.
+To execute post-training notebooks on your TPU-VM, follow the official [MaxText installation guides](../install_maxtext.md#from-source) and specifically follow `Option 3: Installing [tpu-post-train]`. This will ensure all post-training dependencies are installed inside your virtual environment.

 > **Note:** If you have previously installed MaxText with a different option (e.g., `maxtext[tpu]`), we strongly recommend using a fresh virtual environment for `maxtext[tpu-post-train]` to avoid potential library version conflicts.
@@ -139,7 +139,7 @@ pip3 install jupyterlab
 ### Step 3: Install MaxText and Dependencies

-To execute post-training notebooks on your TPU-VM, follow the official [MaxText installation guides](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#from-source) and specifically follow `Option 3: Installing [tpu-post-train]`. This will ensure all post-training dependencies are installed inside your virtual environment.
+To execute post-training notebooks on your TPU-VM, follow the official [MaxText installation guides](../install_maxtext.md#from-source) and specifically follow `Option 3: Installing [tpu-post-train]`. This will ensure all post-training dependencies are installed inside your virtual environment.

 > **Note:** If you have previously installed MaxText with a different option (e.g., `maxtext[tpu]`), we strongly recommend using a fresh virtual environment for `maxtext[tpu-post-train]` to avoid potential library version conflicts.
@@ -200,7 +200,7 @@ jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
 ## Support and Resources

-- 📘 [MaxText Documentation](https://maxtext.readthedocs.io/)
+- 📘 [MaxText Documentation](../index.md)
 - 💻 [Google Colab](https://colab.research.google.com)
 - [Cloud TPU Docs](https://cloud.google.com/tpu/docs)
 - 🧩 [Jupyter Lab](https://jupyterlab.readthedocs.io)

docs/index.html

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ <h3>JAX AI Stack</h3>
 <li><a href="https://optax.readthedocs.io/en/latest/">Optax</a> - For gradient processing and optimization</li>
 <li><a href="https://tunix.readthedocs.io/en/latest/">Tunix</a> - A JAX Library with the latest experimental algorithms and post-training techniques</li>
 <li><a href="https://github.com/jax-ml/ml_dtypes">ml_dtypes</a> - NumPy dtype extensions for machine learning.</li>
-<li><a href="https://maxtext.readthedocs.io/en/latest/index.html#model-library">MaxText model library</a> for JAX LLMs highly optimized for TPUs</li>
+<li><a href="reference/models.html">MaxText model library</a> for JAX LLMs highly optimized for TPUs</li>
 <li><a href="https://blog.vllm.ai/2025/10/16/vllm-tpu.html">vLLM on TPU</a> for high performance sampling (inference) for Reinforcement Learning (RL)</li>
 <li><a href="https://docs.cloud.google.com/ai-hypercomputer/docs/workloads/pathways-on-cloud/pathways-intro">Pathways</a> for multi-host inference (sampling) and highly efficient weight transfer</li>
 <li>Optional data loading libraries (<a href="https://google-grain.readthedocs.io/en/latest/">Grain</a> or <a href="https://www.tensorflow.org/guide/data">tf.data</a>)</li>

docs/install_maxtext.md

Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ This is the easiest way to get started with the latest stable version.
 access to the `build_maxtext_docker_image`, `upload_maxtext_docker_image`,
 and `xpk` commands. For more details on building and uploading Docker
 images, see the
-[Build MaxText Docker Image](https://maxtext.readthedocs.io/en/latest/build_maxtext.html)
+[Build MaxText Docker Image](build_maxtext.md)
 guide.

 ```bash

docs/reference/architecture/jax_ai_libraries_chosen.md

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ For more information on using Orbax, please refer to https://github.com/google/o
 1. **Deterministic by Design**: Grain allows storing data loader states, provides strong guarantees about data ordering and sharding even with preemptions, which is critical for reproducibility.
 2. **Global Shuffle**: Prevents local overfitting.
-3. **Built for Multi-Host Training**: The using random access file format streamlines [data loading in the multi-host environments](https://maxtext.readthedocs.io/en/maxtext-v0.2.1/guides/data_input_pipeline.html#multihost-dataloading-best-practice).
+3. **Built for Multi-Host Training**: The using random access file format streamlines [data loading in the multi-host environments](../../guides/data_input_pipeline.md#multihost-dataloading-best-practice).

 Its APIs are explicitly designed for the multi-host paradigm, simplifying the process of ensuring that each host loads a unique shard of the global batch.

docs/reference/models/tiering.md

Lines changed: 1 addition & 1 deletion
@@ -40,4 +40,4 @@ For each of the TPU platforms listed below, we present a list of optimized model
 \[1\]: Performance results are subject to variations based on system configuration, software versions, and other factors. These benchmarks represent point-in-time measurements under specific conditions.

-\[2\]: Some older TFLOPS/s results are impacted by an updated calculation for causal attention ([PR #1988](https://github.com/AI-Hypercomputer/maxtext/pull/1988)), which halves the attention FLOPs. This change particularly affects configurations with large sequence lengths. For more details, please refer to the [performance metrics guide](https://maxtext.readthedocs.io/en/latest/reference/performance_metrics.html).
+\[2\]: Some older TFLOPS/s results are impacted by an updated calculation for causal attention ([PR #1988](https://github.com/AI-Hypercomputer/maxtext/pull/1988)), which halves the attention FLOPs. This change particularly affects configurations with large sequence lengths. For more details, please refer to the [performance metrics guide](../performance_metrics.md).
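
Most of the Markdown hunks above follow one pattern: an absolute `https://maxtext.readthedocs.io/en/latest/….html` URL becomes a path to the matching `.md` source, relative to the file that contains the link. The commit does not say how the edits were produced. The following is only a minimal sketch of how that mapping could be computed, assuming the hosted site mirrors the `docs/` tree and each `.html` page is built from a `.md` file with the same name. The function name `rtd_to_relative` and the constants are hypothetical and do not come from the MaxText repository.

```python
import posixpath
from urllib.parse import urlsplit

# Assumptions (not taken from the commit): the hosted site mirrors the docs/
# tree under /en/latest/, and every .html page is built from a .md source.
RTD_HOST = "maxtext.readthedocs.io"
RTD_PREFIX = "/en/latest/"


def rtd_to_relative(url: str, source_doc: str) -> str:
    """Rewrite an absolute Read the Docs URL as a link relative to source_doc.

    source_doc is the linking file's path relative to docs/, for example
    "guides/model_bringup.md". URLs that do not match are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != RTD_HOST or not parts.path.startswith(RTD_PREFIX):
        return url
    target = parts.path[len(RTD_PREFIX):]
    if not target.endswith(".html"):
        return url  # e.g. the bare site root; leave for manual review
    target = target[: -len(".html")] + ".md"
    # Resolve the target against the directory of the linking file.
    start = posixpath.dirname(source_doc) or "."
    relative = posixpath.relpath(target, start=start)
    return f"{relative}#{parts.fragment}" if parts.fragment else relative


if __name__ == "__main__":
    print(
        rtd_to_relative(
            "https://maxtext.readthedocs.io/en/latest/install_maxtext.html#from-source",
            "development/contribute_docs.md",
        )
    )
```

This sketch deliberately leaves some cases alone: versioned URLs such as the `/en/maxtext-v0.2.1/` link in `jax_ai_libraries_chosen.md`, the bare site root, and the HTML link in `docs/index.html` are returned unchanged and would still need manual edits.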
