## docs/development.md

The MaxText documentation website is built using [Sphinx](https://www.sphinx-doc.org/en/master/).
If you are writing documentation for MaxText, you may want to preview the documentation site locally to ensure things work as expected before a deployment to Read The Docs.

First, make sure you install the necessary dependencies. You can do this by navigating to your local clone of the MaxText repo and following the [local installation instructions](install_maxtext.md).

Once the dependencies are installed and your `maxtext_venv` virtual environment is activated, you can navigate to the `docs/` folder and run:

```bash
uv run sphinx-build -b html . _build/html
```

This will generate the documentation in the `docs/_build/html` directory. These files can be opened in a web browser directly, or you can use a simple HTTP server to serve the files. For example, you can run:

```bash
uv run python -m http.server -d _build/html
```

Then, open your web browser and navigate to `http://localhost:8000` to view the documentation.

## docs/guides/model_bringup.md

### 5.2 Eval Benchmark

MaxText integrates with benchmark libraries like [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness) and [evalchemy](https://github.com/mlfoundations/evalchemy) to facilitate rapid verification of common inference scores ([guide](https://github.com/AI-Hypercomputer/maxtext/tree/main/benchmarks/api_server)). This is particularly useful for validating decoding outputs or assessing model performance when logits deviate slightly from reference values.

## docs/guides/optimization/benchmark_and_performance.md

Begin your benchmarking efforts by performing an arithmetic intensity analysis.

Arithmetic intensity is calculated as the ratio of floating-point operations (FLOPs) to memory (bytes) or communication (bytes).

- **Arithmetic Intensity = FLOPs / Bytes**

This metric helps determine whether a computation is MXU-bound (high arithmetic intensity) or memory-bound/communication-bound (low arithmetic intensity).
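As a back-of-the-envelope illustration of this classification (not MaxText code; the matrix shapes and the bf16 element size are invented for the example), the arithmetic intensity of a single matmul can be estimated as:

```python
def matmul_arithmetic_intensity(m, k, n, bytes_per_elem=2):
    """Arithmetic intensity of an (m, k) x (k, n) matmul in bf16.

    FLOPs: 2 per multiply-accumulate; bytes: read A and B, write C once
    (ignores caching and fusion, which only raise the real intensity).
    """
    flops = 2 * m * k * n
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

# A large square matmul has high intensity -> likely MXU-bound.
print(matmul_arithmetic_intensity(8192, 8192, 8192))  # ~2731 FLOPs/byte
# A batch-1 matvec has intensity ~1 -> memory-bound.
print(matmul_arithmetic_intensity(1, 8192, 8192))
```

Comparing this number against the accelerator's peak-FLOPs-to-memory-bandwidth ratio tells you which side of the roofline the operation sits on.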

For benchmarking purposes, we collect the step time for training. This step time is then used to calculate MFU and throughput, which provide insights into the utilization achieved for each benchmark workload.

- **Throughput = global tokens / step_time / number of devices**

More details are explained in [](performance-metrics).
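The metrics above can be sketched numerically; the step time, token count, and per-chip peak FLOP/s below are made-up numbers for illustration, not measurements:

```python
def throughput_per_device(global_tokens, step_time_s, num_devices):
    """Throughput = global tokens / step_time / number of devices (tokens/s/device)."""
    return global_tokens / step_time_s / num_devices

def mfu(model_flops_per_step, step_time_s, num_devices, peak_flops_per_device):
    """Model FLOPs Utilization: achieved FLOP/s divided by aggregate peak FLOP/s."""
    return (model_flops_per_step / step_time_s) / (num_devices * peak_flops_per_device)

# Hypothetical run: 2^20 global tokens per step, 2 s steps, 256 chips.
print(throughput_per_device(1_048_576, 2.0, 256))  # 2048.0 tokens/s/device
print(mfu(1.0e17, 2.0, 256, 9.0e14))               # fraction of aggregate peak
```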
For v6e and earlier generation TPUs, use the "int8" recipe. For v7x and later generation TPUs, use "fp8_full". GPUs should use "fp8_gpu" for NVIDIA and "nanoo_fp8" for AMD.

See [](quantization-doc).
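As a sketch, the recipe is typically selected through the `quantization` config override on the training command line; the script invocation, config path, and `run_name` below are assumptions for illustration, not a verified command:

```shell
# Hypothetical examples -- adapt the entry point and config path to your setup.
# v6e or earlier TPU:
python3 -m MaxText.train MaxText/configs/base.yml run_name=quant_test quantization=int8
# v7x or later TPU:
python3 -m MaxText.train MaxText/configs/base.yml run_name=quant_test quantization=fp8_full
```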
### Choose sharding strategy

There are two methods for asynchronous collective offloading:

1. Offload Collectives to Sparse Core:
This method is recommended for v7x. To enable it, set the following flags from [link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L70):
2. Overlap Collective Using Continuation Fusion:
This method is recommended for v5p and v6e. To enable it, set the following flags ([link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L39)):

## docs/guides/optimization/custom_model.md

Use these general runtime configurations to improve your model's performance.

## Step 3. Choose efficient sharding strategies using Roofline Analysis

To achieve good performance, it's often necessary to co-design the model's dimensions (like the MLP dimension) along with the sharding strategy. We have included examples for [v5p](https://docs.cloud.google.com/tpu/docs/v5p), [Trillium](https://docs.cloud.google.com/tpu/docs/v6e), and [Ironwood](https://docs.cloud.google.com/tpu/docs/tpu7x) that demonstrate which sharding approaches work well for specific models. We recommend reading [](sharding_on_TPUs) and Jax’s [scaling book](https://jax-ml.github.io/scaling-book/sharding/).
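To make the roofline idea concrete, here is a toy sketch comparing per-device compute time with all-reduce time for a tensor-parallel MLP block. The dimensions, per-chip peak FLOP/s, interconnect bandwidth, and the simplified cost model are all invented for the example, not taken from MaxText or any TPU spec sheet:

```python
def tp_mlp_roofline(batch_tokens, d_model, d_mlp, tp_degree,
                    peak_flops=9.0e14, link_bw=1.0e11, bytes_per_elem=2):
    """Rough compute-vs-communication estimate for a tensor-parallel MLP.

    Forward pass of the two MLP matmuls costs ~4 * tokens * d_model * d_mlp FLOPs,
    split across tp_degree devices; the output all-reduce moves roughly twice the
    activation payload per device (ring all-reduce approximation).
    """
    compute_s = (4 * batch_tokens * d_model * d_mlp / tp_degree) / peak_flops
    comm_s = (2 * bytes_per_elem * batch_tokens * d_model) / link_bw
    return compute_s, comm_s

compute_s, comm_s = tp_mlp_roofline(16384, 8192, 32768, tp_degree=4)
print(f"compute {compute_s*1e3:.2f} ms vs all-reduce {comm_s*1e3:.2f} ms")
# Under these made-up numbers the all-reduce slightly dominates at TP=4,
# suggesting a smaller TP degree or a wider d_mlp for this toy config.
```

Raising `tp_degree` shrinks per-device compute but leaves the all-reduce cost roughly flat, which is exactly the trade-off that motivates co-designing `d_mlp` with the sharding strategy.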

> **Note:** The `install_maxtext_tpu_github_deps`, `install_maxtext_cuda12_github_dep`, and
> `install_maxtext_tpu_post_train_extra_deps` commands are temporarily required to install dependencies directly from GitHub
> that are not yet available on PyPI. As shown above, choose the one that corresponds to your use case.

> **Note:** The maxtext package contains a comprehensive list of all direct and transitive dependencies, with lower bounds, generated by [seed-env](https://github.com/google-ml-infra/actions/tree/main/python_seed_env). We highly recommend the `--resolution=lowest` flag. It instructs `uv` to install the specific, tested versions of dependencies defined by MaxText, rather than the latest available ones. This ensures a consistent and reproducible environment, which is critical for stable performance and for running benchmarks.
## From Source
If you plan to contribute to MaxText or need the latest unreleased features, install from source.

Please keep dependencies updated throughout development.

To update dependencies, you will follow these general steps:

1. **Modify Base Requirements**: Update the desired dependencies in `base_requirements/requirements.txt` or the hardware-specific files (`base_requirements/tpu-base-requirements.txt`, `base_requirements/gpu-base-requirements.txt`).
2. **Generate New Files**: Run the `seed-env` CLI tool to generate new, fully-pinned requirements files based on your changes.
3. **Update Project Files**: Copy the newly generated files into the `generated_requirements/` directory.
4. **Handle GitHub Dependencies**: Move any dependencies that are installed directly from GitHub from the generated files to `src/install_maxtext_extra_deps/extra_deps_from_github.txt`.
5. **Verify**: Test the new dependencies to ensure the project installs and runs correctly.

The following sections provide detailed instructions for each step.
After generating the new requirements, you need to update the files in the MaxText repository.

1. **Copy the generated files:**

   - Move `generated_tpu_artifacts/tpu-requirements.txt` to `generated_requirements/tpu-requirements.txt`.
   - Move `generated_gpu_artifacts/cuda12-requirements.txt` to `generated_requirements/cuda12-requirements.txt`.

Currently, MaxText uses a few dependencies, such as `mlperf-logging` and `google-jetstream`, that are installed directly from GitHub source. These are defined in `base_requirements/requirements.txt`, and the `seed-env` tool will carry them over to the generated requirements files.
## Step 5: Verify the New Dependencies
Finally, test that the new dependencies install correctly and that MaxText runs as expected.

1. **Create a clean environment:** It's best to start with a fresh Python virtual environment.

   ```bash
   uv venv --python 3.12 --seed maxtext_venv
   source maxtext_venv/bin/activate
   ```

2. **Run the setup script:** Execute `bash setup.sh` to install the new dependencies.