
Commit b7dbba7

Merge pull request #3247 from melissawm:docs-update

PiperOrigin-RevId: 897696885

2 parents (b83ff02 + e353494), commit b7dbba7

6 files changed: 28 additions & 22 deletions


docs/development.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -9,13 +9,13 @@ The MaxText documentation website is built using [Sphinx](https://www.sphinx-doc
 
 If you are writing documentation for MaxText, you may want to preview the documentation site locally to ensure things work as expected before a deployment to Read The Docs.
 
-First, make sure you install the necessary dependencies. You can do this by navigating to your local clone of the MaxText repo and running:
+First, make sure you install the necessary dependencies. You can do this by navigating to your local clone of the MaxText repo, following the [local installation instructions](install_maxtext.md) and running:
 
 ```bash
-pip install -r src/dependencies/requirements/requirements_docs.txt
+uv pip install -r src/dependencies/requirements/requirements_docs.txt
 ```
 
-Once the dependencies are installed, navigate to the `docs/` directory and run the `sphinx-build`:
+Once the dependencies are installed and your `maxtext_venv` virtual environment is activated, you can navigate to the `docs/` folder and run:
 
 ```bash
 cd docs
````
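The hunk is truncated after `cd docs`, so the exact build invocation is not shown. As a hedged sketch, the full local preview flow likely looks like the following (the `sphinx-build` builder and output directory are assumptions; the commands are printed rather than executed, since running them requires a MaxText checkout):

```shell
# Hedged sketch of the docs preview flow from this hunk. The sphinx-build
# arguments are placeholders -- check docs/development.md for the exact
# command. Printed instead of executed, since it needs a MaxText clone.
DOCS_DEPS="src/dependencies/requirements/requirements_docs.txt"
PREVIEW_CMD="sphinx-build -b html . _build/html"
echo "uv pip install -r ${DOCS_DEPS}"
echo "cd docs && ${PREVIEW_CMD}"
```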

docs/guides/data_input_pipeline/data_input_grain.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -109,7 +109,7 @@ Note that `FILE_PATH` is optional; when provided, the script runs `ls -R` for pr
 4. Example command:
 
 ```sh
-bash tools/setup/setup_gcsfuse.sh \
+bash src/dependencies/scripts/setup_gcsfuse.sh \
 DATASET_GCS_BUCKET=maxtext-dataset \
 MOUNT_PATH=/tmp/gcsfuse && \
 python3 -m maxtext.trainers.pre_train.train \
````

docs/guides/optimization/benchmark_and_performance.md

Lines changed: 4 additions & 4 deletions
````diff
@@ -69,7 +69,7 @@ Different quantization recipes are available, including` "int8", "fp8", "fp8_ful
 
 For v6e and earlier generation TPUs, use the "int8" recipe. For v7x and later generation TPUs, use "fp8_full". GPUs should use “fp8_gpu” for NVIDIA and "nanoo_fp8" for AMD.
 
-See [](quantization).
+See [](quantization-doc).
 
 ### Choose sharding strategy
 
@@ -98,16 +98,16 @@ There are two methods for asynchronous collective offloading:
 
 1. Offload Collectives to Sparse Core:
 
-This method is recommended for v7x. To enable it, set the following flags from \[[link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L70)\]:
+This method is recommended for v7x. To enable it, set the following flags from [link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L70):
 
 - `ENABLE_SPARSECORE_OFFLOADING_FOR_RS_AG_AR`
 - `ENABLE_SPARSECORE_OFFLOADING_FOR_REDUCE_SCATTER`
 - `ENABLE_SPARSECORE_OFFLOADING_FOR_ALL_GATHER`
 - `ENABLE_SPARSECORE_OFFLOADING_FOR_ALL_REDUCE`
 
-2. Overlap Collective Using Continuation Fusion:\*\*
+2. Overlap Collective Using Continuation Fusion:
 
-This method is recommended for v5p and v6e. To enable it, set the following flags \[[link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L39)\]:
+This method is recommended for v5p and v6e. To enable it, set the following flags ([link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L39)):
 
 - `CF_FOR_ALL_GATHER`
 - `CF_FOR_ALL_REDUCE`
````
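The constants listed in these hunks come from `benchmarks/xla_flags_library.py`, which exposes them as strings of XLA/libtpu options. As a hedged sketch (the flag names below match the constants above, but the option strings are placeholders, not copied from that file), such flags are typically exported before launching training so the TPU runtime picks them up at initialization:

```shell
# Hedged sketch: the flag *names* match the constants in this diff, but the
# option strings are placeholders -- see benchmarks/xla_flags_library.py
# for the real values.
CF_FOR_ALL_GATHER="--xla_tpu_enable_async_collective_fusion_fuse_all_gather=true"
CF_FOR_ALL_REDUCE="--xla_tpu_enable_async_collective_fusion=true"

# XLA/libtpu flags are commonly passed through LIBTPU_INIT_ARGS before
# any JAX/TPU initialization happens.
export LIBTPU_INIT_ARGS="${CF_FOR_ALL_GATHER} ${CF_FOR_ALL_REDUCE}"
echo "${LIBTPU_INIT_ARGS}"
```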

docs/guides/optimization/custom_model.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -85,7 +85,7 @@ Use these general runtime configurations to improve your model's performance.
 
 ## Step 3. Choose efficient sharding strategies using Roofline Analysis
 
-To achieve good performance, it's often necessary to co-design the model's dimensions (like the MLP dimension) along with the sharding strategy. We have included examples for [v5p](https://docs.cloud.google.com/tpu/docs/v5p), [Trillium](https://docs.cloud.google.com/tpu/docs/v6e), and [Ironwood](https://docs.cloud.google.com/tpu/docs/tpu7x) that demonstrate which sharding approaches work well for specific models. We recommend reading [](sharding) and Jax’s [scaling book](https://jax-ml.github.io/scaling-book/sharding/).
+To achieve good performance, it's often necessary to co-design the model's dimensions (like the MLP dimension) along with the sharding strategy. We have included examples for [v5p](https://docs.cloud.google.com/tpu/docs/v5p), [Trillium](https://docs.cloud.google.com/tpu/docs/v6e), and [Ironwood](https://docs.cloud.google.com/tpu/docs/tpu7x) that demonstrate which sharding approaches work well for specific models. We recommend reading [](sharding_on_TPUs) and Jax’s [scaling book](https://jax-ml.github.io/scaling-book/sharding/).
 
 | TPU Type | ICI Arithmetic Intensity |
 | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
````

docs/reference.md

Lines changed: 18 additions & 12 deletions
`````diff
@@ -18,37 +18,42 @@
 
 Deep dive into MaxText architecture, models, and core concepts.
 
-::::{grid} 1 2 2 2
-:gutter: 2
-
-:::{grid-item-card} 📊 Performance Metrics
+````{grid} 1 2 2 2
+---
+gutter: 2
+---
+```{grid-item-card} 📊 Performance Metrics
 :link: reference/performance_metrics
 :link-type: doc
 
 Understanding Model Flops Utilization (MFU), calculation methods, and why it matters for performance optimization.
-:::
+```
 
-:::{grid-item-card} 🤖 Models
+```{grid-item-card} 🤖 Models
 :link: reference/models
 :link-type: doc
 
 Supported models and architectures, including Llama, Qwen, and Mixtral. Details on tiering and new additions.
-:::
+```
 
-:::{grid-item-card} 🏗️ Architecture
+```{grid-item-card} 🏗️ Architecture
 :link: reference/architecture
 :link-type: doc
 
 High-level overview of MaxText design, JAX/XLA choices, and how components interact.
-:::
+```
 
-:::{grid-item-card} 💡 Core Concepts
+```{grid-item-card} 💡 Core Concepts
 :link: reference/core_concepts
 :link-type: doc
 
 Key concepts including checkpointing strategies, quantization, tiling, and Mixture of Experts (MoE) configuration.
-:::
-::::
+```
+````
+
+## 📚 API Reference
+
+Find comprehensive API documentation for MaxText modules, classes, and functions in the [API Reference page](reference/api.rst).
 
 ```{toctree}
 ---
@@ -59,4 +64,5 @@ reference/performance_metrics
 reference/models
 reference/architecture
 reference/core_concepts
+reference/api.rst
 ```
`````
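For context on the markup change in this file: MyST accepts both colon fences (`::::{grid}` / `:::{grid-item-card}`) and backtick fences for sphinx-design directives, and directive options can be written either as `:gutter: 2` field lines or inside a `---` options block. A minimal sketch of the backtick form the diff adopts (the card title and link here are illustrative, not from the repo):

`````markdown
````{grid} 1 2 2 2
---
gutter: 2
---
```{grid-item-card} Example card
:link: reference/models
:link-type: doc

One-line card body.
```
````
`````

Nesting by fence length (four backticks for the grid, three for each card) is what lets the inner directives close without ending the outer one.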

docs/reference/core_concepts/quantization.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -14,7 +14,7 @@
 limitations under the License.
 -->
 
-(quantization)=
+(quantization-doc)=
 
 # Quantization
````
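This rename pairs with the `See [](quantization-doc)` change in benchmark_and_performance.md earlier in this commit: `(name)=` defines an explicit MyST cross-reference target, and an empty-text link `[](name)` renders using the target section's title. A minimal sketch of the mechanism (standard MyST syntax, shown out of context):

```markdown
(quantization-doc)=

# Quantization

<!-- elsewhere in the docs -->
See [](quantization-doc)  <!-- renders as a link titled "Quantization" -->
```

Renaming the target to `quantization-doc` presumably avoids a label collision, so every `[](quantization)` reference had to be updated in the same commit.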