Merged
6 changes: 3 additions & 3 deletions docs/development.md
@@ -9,13 +9,13 @@ The MaxText documentation website is built using [Sphinx](https://www.sphinx-doc.

If you are writing documentation for MaxText, you may want to preview the documentation site locally to ensure things work as expected before a deployment to Read The Docs.

-First, make sure you install the necessary dependencies. You can do this by navigating to your local clone of the MaxText repo and running:
+First, make sure you install the necessary dependencies. You can do this by navigating to your local clone of the MaxText repo, following the [local installation instructions](install_maxtext.md), and running:

```bash
-pip install -r src/dependencies/requirements/requirements_docs.txt
+uv pip install -r src/dependencies/requirements/requirements_docs.txt
```

-Once the dependencies are installed, navigate to the `docs/` directory and run the `sphinx-build`:
+Once the dependencies are installed and your `maxtext_venv` virtual environment is activated, you can navigate to the `docs/` folder and run:

```bash
cd docs
2 changes: 1 addition & 1 deletion docs/guides/data_input_pipeline/data_input_grain.md
@@ -109,7 +109,7 @@ Note that `FILE_PATH` is optional; when provided, the script runs `ls -R` for pr
4. Example command:

```sh
-bash tools/setup/setup_gcsfuse.sh \
+bash src/dependencies/scripts/setup_gcsfuse.sh \
DATASET_GCS_BUCKET=maxtext-dataset \
MOUNT_PATH=/tmp/gcsfuse && \
python3 -m maxtext.trainers.pre_train.train \
8 changes: 4 additions & 4 deletions docs/guides/optimization/benchmark_and_performance.md
@@ -69,7 +69,7 @@ Different quantization recipes are available, including "int8", "fp8", "fp8_ful

For v6e and earlier generation TPUs, use the "int8" recipe. For v7x and later generation TPUs, use "fp8_full". GPUs should use "fp8_gpu" for NVIDIA and "nanoo_fp8" for AMD.

-See [](quantization).
+See [](quantization-doc).
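
As a minimal sketch, the recipe choice above maps accelerator generation to a quantization value. The `ACCEL` labels below are illustrative shorthand, not MaxText configuration names; only the quantization strings come from the text:

```shell
# Sketch of the recipe selection described above. ACCEL values are
# invented labels for illustration; only the QUANT strings are real.
ACCEL="v6e"
case "$ACCEL" in
  v7x)        QUANT="fp8_full" ;;   # v7x and later TPUs
  nvidia-gpu) QUANT="fp8_gpu" ;;    # NVIDIA GPUs
  amd-gpu)    QUANT="nanoo_fp8" ;;  # AMD GPUs
  *)          QUANT="int8" ;;       # v6e and earlier TPUs
esac
echo "quantization=${QUANT}"
```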

### Choose sharding strategy

@@ -98,16 +98,16 @@ There are two methods for asynchronous collective offloading:

1. Offload Collectives to Sparse Core:

-This method is recommended for v7x. To enable it, set the following flags from \[[link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L70)\]:
+This method is recommended for v7x. To enable it, set the following flags from [link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L70):

- `ENABLE_SPARSECORE_OFFLOADING_FOR_RS_AG_AR`
- `ENABLE_SPARSECORE_OFFLOADING_FOR_REDUCE_SCATTER`
- `ENABLE_SPARSECORE_OFFLOADING_FOR_ALL_GATHER`
- `ENABLE_SPARSECORE_OFFLOADING_FOR_ALL_REDUCE`

-2. Overlap Collective Using Continuation Fusion:\*\*
+2. Overlap Collective Using Continuation Fusion:

-This method is recommended for v5p and v6e. To enable it, set the following flags \[[link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L39)\]:
+This method is recommended for v5p and v6e. To enable it, set the following flags ([link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L39)):

- `CF_FOR_ALL_GATHER`
- `CF_FOR_ALL_REDUCE`
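
In practice, XLA flags like these are joined into `LIBTPU_INIT_ARGS` before launching a run. The sketch below shows only that joining pattern; the flag strings are placeholders, not the real values, which live as constants in `benchmarks/xla_flags_library.py`:

```shell
# Sketch: composing continuation-fusion flags into LIBTPU_INIT_ARGS
# for a v5p/v6e launch. Both flag strings are placeholders -- see
# benchmarks/xla_flags_library.py for the authoritative values.
CF_FOR_ALL_GATHER="--xla_placeholder_cf_all_gather=true"  # placeholder
CF_FOR_ALL_REDUCE="--xla_placeholder_cf_all_reduce=true"  # placeholder
export LIBTPU_INIT_ARGS="${CF_FOR_ALL_GATHER} ${CF_FOR_ALL_REDUCE}"
echo "${LIBTPU_INIT_ARGS}"
```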
2 changes: 1 addition & 1 deletion docs/guides/optimization/custom_model.md
@@ -85,7 +85,7 @@ Use these general runtime configurations to improve your model's performance.

## Step 3. Choose efficient sharding strategies using Roofline Analysis

-To achieve good performance, it's often necessary to co-design the model's dimensions (like the MLP dimension) along with the sharding strategy. We have included examples for [v5p](https://docs.cloud.google.com/tpu/docs/v5p), [Trillium](https://docs.cloud.google.com/tpu/docs/v6e), and [Ironwood](https://docs.cloud.google.com/tpu/docs/tpu7x) that demonstrate which sharding approaches work well for specific models. We recommend reading [](sharding) and Jax’s [scaling book](https://jax-ml.github.io/scaling-book/sharding/).
+To achieve good performance, it's often necessary to co-design the model's dimensions (like the MLP dimension) along with the sharding strategy. We have included examples for [v5p](https://docs.cloud.google.com/tpu/docs/v5p), [Trillium](https://docs.cloud.google.com/tpu/docs/v6e), and [Ironwood](https://docs.cloud.google.com/tpu/docs/tpu7x) that demonstrate which sharding approaches work well for specific models. We recommend reading [](sharding_on_TPUs) and Jax’s [scaling book](https://jax-ml.github.io/scaling-book/sharding/).

| TPU Type | ICI Arithmetic Intensity |
| -------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
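
The roofline comparison behind the ICI arithmetic intensity table can be sketched numerically: an operation becomes ICI-communication-bound when its achieved FLOPs per byte moved over ICI falls below the hardware's ICI arithmetic intensity. Every number below is invented for illustration:

```shell
# Roofline sketch for the table above; all values are hypothetical.
OP_FLOPS=1000000   # FLOPs the sharded op performs (hypothetical)
ICI_BYTES=2000     # bytes it exchanges over ICI (hypothetical)
HW_ICI_AI=1000     # hardware ICI arithmetic intensity (hypothetical)
OP_AI=$(( OP_FLOPS / ICI_BYTES ))  # achieved FLOPs per ICI byte
if [ "$OP_AI" -lt "$HW_ICI_AI" ]; then
  echo "ICI-bound"       # communication dominates; rethink sharding
else
  echo "compute-bound"   # the ICI traffic can be hidden
fi
```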
30 changes: 18 additions & 12 deletions docs/reference.md
@@ -18,37 +18,42 @@

Deep dive into MaxText architecture, models, and core concepts.

-::::{grid} 1 2 2 2
-:gutter: 2

-:::{grid-item-card} 📊 Performance Metrics
+````{grid} 1 2 2 2
+---
+gutter: 2
+---
+```{grid-item-card} 📊 Performance Metrics
:link: reference/performance_metrics
:link-type: doc

Understanding Model Flops Utilization (MFU), calculation methods, and why it matters for performance optimization.
-:::
+```

-:::{grid-item-card} 🤖 Models
+```{grid-item-card} 🤖 Models
:link: reference/models
:link-type: doc

Supported models and architectures, including Llama, Qwen, and Mixtral. Details on tiering and new additions.
-:::
+```

-:::{grid-item-card} 🏗️ Architecture
+```{grid-item-card} 🏗️ Architecture
:link: reference/architecture
:link-type: doc

High-level overview of MaxText design, JAX/XLA choices, and how components interact.
-:::
+```

-:::{grid-item-card} 💡 Core Concepts
+```{grid-item-card} 💡 Core Concepts
:link: reference/core_concepts
:link-type: doc

Key concepts including checkpointing strategies, quantization, tiling, and Mixture of Experts (MoE) configuration.
-:::
-::::
+```
+````

## 📚 API Reference

Find comprehensive API documentation for MaxText modules, classes, and functions in the [API Reference page](reference/api.rst).

```{toctree}
---
@@ -59,4 +59,5 @@ reference/performance_metrics
reference/models
reference/architecture
reference/core_concepts
+reference/api.rst
```
2 changes: 1 addition & 1 deletion docs/reference/core_concepts/quantization.md
@@ -14,7 +14,7 @@
limitations under the License.
-->

-(quantization)=
+(quantization-doc)=

# Quantization
