
Commit 99dbb2f

Fix broken links and formatting for documentation

Also adds API documentation to ToC. Fix path to setup_gcsfuse.sh

Parent commit: d245255

7 files changed: 32 additions & 26 deletions


docs/development.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -9,13 +9,13 @@ The MaxText documentation website is built using [Sphinx](https://www.sphinx-doc
 
 If you are writing documentation for MaxText, you may want to preview the documentation site locally to ensure things work as expected before a deployment to Read The Docs.
 
-First, make sure you install the necessary dependencies. You can do this by navigating to your local clone of the MaxText repo and running:
+First, make sure you install the necessary dependencies. You can do this by navigating to your local clone of the MaxText repo, following the [local installation instructions](install_maxtext.md) and running:
 
 ```bash
-pip install -r src/dependencies/requirements/requirements_docs.txt
+uv pip install -r src/dependencies/requirements/requirements_docs.txt
 ```
 
-Once the dependencies are installed, navigate to the `docs/` directory and run the `sphinx-build`:
+Once the dependencies are installed and your `maxtext_venv` virtual environment is activated, you can navigate to the `docs/` folder and run:
 
 ```bash
 cd docs
````
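The preview workflow in the hunk above (install the docs requirements, then build with Sphinx from `docs/`) can be scripted. This is a hedged sketch: the `html` builder and `_build` output folder are common Sphinx conventions assumed here, not taken from the commit.

```python
import subprocess
import sys

def sphinx_build_cmd(source=".", out="_build", builder="html"):
    """Return the sphinx-build invocation as an argv list.

    `python -m sphinx` is equivalent to running sphinx-build directly.
    """
    return [sys.executable, "-m", "sphinx", "-b", builder, source, out]

def preview_docs():
    """Install the docs requirements, then build the site from docs/."""
    subprocess.run(
        ["uv", "pip", "install", "-r",
         "src/dependencies/requirements/requirements_docs.txt"],
        check=True,
    )
    subprocess.run(sphinx_build_cmd(), cwd="docs", check=True)
```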

docs/guides/data_input_pipeline/data_input_grain.md

Lines changed: 4 additions & 4 deletions

````diff
@@ -34,10 +34,10 @@ Grain ensures determinism in data input pipelines by saving the pipeline's state
 
 1. Grain currently supports three data formats: [ArrayRecord](https://github.com/google/array_record) (random access), [Parquet](https://arrow.apache.org/docs/python/parquet.html) (partial random access through row groups) and [TFRecord](https://www.tensorflow.org/tutorials/load_data/tfrecord) (sequential access). Only the ArrayRecord format supports the global shuffle mentioned above. For converting a dataset into ArrayRecord, see [Apache Beam Integration for ArrayRecord](https://github.com/google/array_record/tree/main/beam). Additionally, other random access data sources can be supported via a custom [data source](https://google-grain.readthedocs.io/en/latest/data_sources/protocol.html) class.
    - **Community Resource**: The MaxText community has created [ArrayRecord Documentation](https://array-record.readthedocs.io/). Note: we appreciate the contribution from the community, but it has not yet been verified by the MaxText or ArrayRecord developers.
-2. If the dataset is hosted on a Cloud Storage bucket, the path `gs://` can be provided directly. However, for the best performance, it's recommended to read the bucket through [Cloud Storage FUSE](https://cloud.google.com/storage/docs/gcs-fuse). This will significantly improve performance for the ArrayRecord format, as it allows metadata caching that speeds up random access. The installation of Cloud Storage FUSE is included in [setup.sh](https://github.com/google/maxtext/blob/main/src/dependencies/scripts/setup.sh). The user then needs to mount the Cloud Storage bucket to a local path for each worker, using the script [setup_gcsfuse.sh](https://github.com/google/maxtext/blob/main/tools/setup/setup_gcsfuse.sh). The script configures some parameters for the mount.
+2. If the dataset is hosted on a Cloud Storage bucket, the path `gs://` can be provided directly. However, for the best performance, it's recommended to read the bucket through [Cloud Storage FUSE](https://cloud.google.com/storage/docs/gcs-fuse). This will significantly improve performance for the ArrayRecord format, as it allows metadata caching that speeds up random access. The installation of Cloud Storage FUSE is included in [setup.sh](https://github.com/google/maxtext/blob/main/src/dependencies/scripts/setup.sh). The user then needs to mount the Cloud Storage bucket to a local path for each worker, using the script [setup_gcsfuse.sh](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/dependencies/scripts/setup_gcsfuse.sh). The script configures some parameters for the mount.
 
 ```sh
-bash tools/setup/setup_gcsfuse.sh \
+bash src/dependencies/scripts/setup_gcsfuse.sh \
 DATASET_GCS_BUCKET=${BUCKET_NAME?} \
 MOUNT_PATH=${MOUNT_PATH?} \
 [FILE_PATH=${MOUNT_PATH?}/my_dataset]
@@ -47,7 +47,7 @@ Note that `FILE_PATH` is optional; when provided, the script runs `ls -R` for pr
 
 1. Set `dataset_type=grain`, `grain_file_type={arrayrecord|parquet|tfrecord}`, `grain_train_files` in `src/maxtext/configs/base.yml` or through command line arguments to match the file pattern on the mounted local path.
 
-2. Tune `grain_worker_count` for performance. This parameter controls the number of child processes used by Grain (more details in [behind_the_scenes](https://google-grain.readthedocs.io/en/latest/behind_the_scenes.html)). If you use a large number of workers, check your config for gcsfuse in [setup_gcsfuse.sh](https://github.com/google/maxtext/blob/main/tools/setup/setup_gcsfuse.sh) to avoid gcsfuse throttling.
+2. Tune `grain_worker_count` for performance. This parameter controls the number of child processes used by Grain (more details in [behind_the_scenes](https://google-grain.readthedocs.io/en/latest/behind_the_scenes.html)). If you use a large number of workers, check your config for gcsfuse in [setup_gcsfuse.sh](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/dependencies/scripts/setup_gcsfuse.sh) to avoid gcsfuse throttling.
 
 3. ArrayRecord Only: For multi-source blending, you can specify multiple data sources with their respective weights using semicolon (;) as a separator and a comma (,) for weights. The weights will be automatically normalized to sum to 1.0. For example:
 
@@ -109,7 +109,7 @@ Note that `FILE_PATH` is optional; when provided, the script runs `ls -R` for pr
 4. Example command:
 
 ```sh
-bash tools/setup/setup_gcsfuse.sh \
+bash src/dependencies/scripts/setup_gcsfuse.sh \
 DATASET_GCS_BUCKET=maxtext-dataset \
 MOUNT_PATH=/tmp/gcsfuse && \
 python3 -m maxtext.trainers.pre_train.train \
````
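The blending rule in the hunks above (sources separated by `;`, weights attached with `,`, weights normalized to sum to 1.0) can be illustrated with a small sketch. `parse_blend` is a hypothetical helper written for this example, not MaxText's actual parser, and the exact `files,weight` layout is an assumption:

```python
def parse_blend(spec: str):
    """Parse 'files,weight;files,weight' into (files, normalized_weight) pairs.

    Illustrative only; MaxText's real config handling may differ.
    """
    pairs = []
    for part in spec.split(";"):
        # Split on the last comma so file patterns may themselves contain commas.
        files, _, weight = part.rpartition(",")
        pairs.append((files, float(weight)))
    total = sum(w for _, w in pairs)
    # Normalize so the weights sum to 1.0, as the docs describe.
    return [(f, w / total) for f, w in pairs]

# Weights 1 and 3 normalize to 0.25 and 0.75.
blend = parse_blend("/tmp/gcsfuse/ds1/*.array_record,1;/tmp/gcsfuse/ds2/*.array_record,3")
```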

docs/guides/optimization/benchmark_and_performance.md

Lines changed: 4 additions & 4 deletions

```diff
@@ -69,7 +69,7 @@ Different quantization recipes are available, including` "int8", "fp8", "fp8_ful
 
 For v6e and earlier generation TPUs, use the "int8" recipe. For v7x and later generation TPUs, use "fp8_full". GPUs should use "fp8_gpu" for NVIDIA and "nanoo_fp8" for AMD.
 
-See [](quantization).
+See [](quantization-doc).
 
 ### Choose sharding strategy
 
@@ -98,16 +98,16 @@ There are two methods for asynchronous collective offloading:
 
 1. Offload Collectives to Sparse Core:
 
-This method is recommended for v7x. To enable it, set the following flags from \[[link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L70)\]:
+This method is recommended for v7x. To enable it, set the following flags from [link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L70):
 
 - `ENABLE_SPARSECORE_OFFLOADING_FOR_RS_AG_AR`
 - `ENABLE_SPARSECORE_OFFLOADING_FOR_REDUCE_SCATTER`
 - `ENABLE_SPARSECORE_OFFLOADING_FOR_ALL_GATHER`
 - `ENABLE_SPARSECORE_OFFLOADING_FOR_ALL_REDUCE`
 
-2. Overlap Collective Using Continuation Fusion:\*\*
+2. Overlap Collective Using Continuation Fusion:
 
-This method is recommended for v5p and v6e. To enable it, set the following flags \[[link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L39)\]:
+This method is recommended for v5p and v6e. To enable it, set the following flags ([link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L39)):
 
 - `CF_FOR_ALL_GATHER`
 - `CF_FOR_ALL_REDUCE`
```
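The flag constants listed above are defined as strings of XLA options in `benchmarks/xla_flags_library.py`. Below is a hedged sketch of how a launch script might combine selected groups into the TPU runtime environment; the flag values are placeholders for illustration (not the real contents of the library), and `LIBTPU_INIT_ARGS` is assumed as the conventional env var for passing libtpu/XLA flags:

```python
import os

def combine_flags(*groups: str) -> str:
    """Join flag-group strings into one space-separated string,
    dropping duplicate flags while preserving order."""
    seen, out = set(), []
    for flag in " ".join(groups).split():
        if flag not in seen:
            seen.add(flag)
            out.append(flag)
    return " ".join(out)

# Placeholder values; the real strings live in xla_flags_library.py.
CF_FOR_ALL_GATHER = "--example_flag_a=true"
CF_FOR_ALL_REDUCE = "--example_flag_b=true"

os.environ["LIBTPU_INIT_ARGS"] = combine_flags(CF_FOR_ALL_GATHER, CF_FOR_ALL_REDUCE)
```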

docs/guides/optimization/custom_model.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -85,7 +85,7 @@ Use these general runtime configurations to improve your model's performance.
 
 ## Step 3. Choose efficient sharding strategies using Roofline Analysis
 
-To achieve good performance, it's often necessary to co-design the model's dimensions (like the MLP dimension) along with the sharding strategy. We have included examples for [v5p](https://docs.cloud.google.com/tpu/docs/v5p), [Trillium](https://docs.cloud.google.com/tpu/docs/v6e), and [Ironwood](https://docs.cloud.google.com/tpu/docs/tpu7x) that demonstrate which sharding approaches work well for specific models. We recommend reading [](sharding) and Jax's [scaling book](https://jax-ml.github.io/scaling-book/sharding/).
+To achieve good performance, it's often necessary to co-design the model's dimensions (like the MLP dimension) along with the sharding strategy. We have included examples for [v5p](https://docs.cloud.google.com/tpu/docs/v5p), [Trillium](https://docs.cloud.google.com/tpu/docs/v6e), and [Ironwood](https://docs.cloud.google.com/tpu/docs/tpu7x) that demonstrate which sharding approaches work well for specific models. We recommend reading [](sharding_on_TPUs) and Jax's [scaling book](https://jax-ml.github.io/scaling-book/sharding/).
 
 | TPU Type | ICI Arithmetic Intensity |
 | -------- | ------------------------ |
```

docs/reference.md

Lines changed: 18 additions & 12 deletions

````diff
@@ -18,37 +18,42 @@
 
 Deep dive into MaxText architecture, models, and core concepts.
 
-::::{grid} 1 2 2 2
-:gutter: 2
-
-:::{grid-item-card} 📊 Performance Metrics
+````{grid} 1 2 2 2
+---
+gutter: 2
+---
+```{grid-item-card} 📊 Performance Metrics
 :link: reference/performance_metrics
 :link-type: doc
 
 Understanding Model Flops Utilization (MFU), calculation methods, and why it matters for performance optimization.
-:::
+```
 
-:::{grid-item-card} 🤖 Models
+```{grid-item-card} 🤖 Models
 :link: reference/models
 :link-type: doc
 
 Supported models and architectures, including Llama, Qwen, and Mixtral. Details on tiering and new additions.
-:::
+```
 
-:::{grid-item-card} 🏗️ Architecture
+```{grid-item-card} 🏗️ Architecture
 :link: reference/architecture
 :link-type: doc
 
 High-level overview of MaxText design, JAX/XLA choices, and how components interact.
-:::
+```
 
-:::{grid-item-card} 💡 Core Concepts
+```{grid-item-card} 💡 Core Concepts
 :link: reference/core_concepts
 :link-type: doc
 
 Key concepts including checkpointing strategies, quantization, tiling, and Mixture of Experts (MoE) configuration.
-:::
-::::
+```
+````
+
+## 📚 API Reference
+
+Find comprehensive API documentation for MaxText modules, classes, and functions in the [API Reference page](reference/api.rst).
 
 ```{toctree}
 ---
@@ -59,4 +64,5 @@ reference/performance_metrics
 reference/models
 reference/architecture
 reference/core_concepts
+reference/api.rst
 ```
````

docs/reference/core_concepts/quantization.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -14,7 +14,7 @@
 limitations under the License.
 -->
 
-(quantization)=
+(quantization-doc)=
 
 # Quantization
 
```
docs/tutorials/pretraining.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -87,7 +87,7 @@ eval metrics after step: 9, loss=9.420, total_weights=75264.0
 
 Grain is a library for reading data for training and evaluating JAX models. It is the recommended input pipeline for determinism and resilience! It supports data formats like ArrayRecord and Parquet. You can check [Grain pipeline](../guides/data_input_pipeline/data_input_grain.md) for more details.
 
-**Data preparation**: You need to download data to a Cloud Storage bucket, and read data via Cloud Storage Fuse with [setup_gcsfuse.sh](https://github.com/AI-Hypercomputer/maxtext/blob/main/tools/setup/setup_gcsfuse.sh).
+**Data preparation**: You need to download data to a Cloud Storage bucket, and read data via Cloud Storage Fuse with [setup_gcsfuse.sh](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/dependencies/scripts/setup_gcsfuse.sh).
 
 - For example, we can mount the bucket `gs://maxtext-dataset` on the local path `/tmp/gcsfuse` before training
 ```bash
````
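Before launching training, it can help to verify that the gcsfuse mount actually exists. A minimal sketch, assuming a POSIX host; `check_mount` is written for this example and is not part of MaxText:

```python
import os

def check_mount(mount_path: str) -> bool:
    """Return True iff mount_path exists and is an active mount point."""
    return os.path.isdir(mount_path) and os.path.ismount(mount_path)

# The root filesystem is always a mount point; a missing path is not.
assert check_mount("/")
assert not check_mount("/no/such/gcsfuse/mount")
```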
