diff --git a/docs/development.md b/docs/development.md index 19705ebee0..7237229814 100644 --- a/docs/development.md +++ b/docs/development.md @@ -9,13 +9,13 @@ The MaxText documentation website is built using [Sphinx](https://www.sphinx-doc If you are writing documentation for MaxText, you may want to preview the documentation site locally to ensure things work as expected before a deployment to Read The Docs. -First, make sure you install the necessary dependencies. You can do this by navigating to your local clone of the MaxText repo and running: +First, make sure you install the necessary dependencies. You can do this by navigating to your local clone of the MaxText repo, following the [local installation instructions](install_maxtext.md), and running: ```bash -pip install -r src/dependencies/requirements/requirements_docs.txt +uv pip install -r src/dependencies/requirements/requirements_docs.txt ``` -Once the dependencies are installed, navigate to the `docs/` directory and run the `sphinx-build`: +Once the dependencies are installed and your `maxtext_venv` virtual environment is activated, you can navigate to the `docs/` folder and run: ```bash cd docs diff --git a/docs/guides/data_input_pipeline/data_input_grain.md b/docs/guides/data_input_pipeline/data_input_grain.md index 497f968125..5a7d66981d 100644 --- a/docs/guides/data_input_pipeline/data_input_grain.md +++ b/docs/guides/data_input_pipeline/data_input_grain.md @@ -109,7 +109,7 @@ Note that `FILE_PATH` is optional; when provided, the script runs `ls -R` for pr 4. 
Example command: ```sh -bash tools/setup/setup_gcsfuse.sh \ +bash src/dependencies/scripts/setup_gcsfuse.sh \ DATASET_GCS_BUCKET=maxtext-dataset \ MOUNT_PATH=/tmp/gcsfuse && \ python3 -m maxtext.trainers.pre_train.train \ diff --git a/docs/guides/optimization/benchmark_and_performance.md b/docs/guides/optimization/benchmark_and_performance.md index 858bcb9673..2f50feb644 100644 --- a/docs/guides/optimization/benchmark_and_performance.md +++ b/docs/guides/optimization/benchmark_and_performance.md @@ -69,7 +69,7 @@ Different quantization recipes are available, including` "int8", "fp8", "fp8_ful For v6e and earlier generation TPUs, use the "int8" recipe. For v7x and later generation TPUs, use "fp8_full". GPUs should use “fp8_gpu” for NVIDIA and "nanoo_fp8" for AMD. -See [](quantization). +See [](quantization-doc). ### Choose sharding strategy @@ -98,16 +98,16 @@ There are two methods for asynchronous collective offloading: 1. Offload Collectives to Sparse Core: - This method is recommended for v7x. To enable it, set the following flags from \[[link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L70)\]: + This method is recommended for v7x. To enable it, set the following flags ([link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L70)): - `ENABLE_SPARSECORE_OFFLOADING_FOR_RS_AG_AR` - `ENABLE_SPARSECORE_OFFLOADING_FOR_REDUCE_SCATTER` - `ENABLE_SPARSECORE_OFFLOADING_FOR_ALL_GATHER` - `ENABLE_SPARSECORE_OFFLOADING_FOR_ALL_REDUCE` -2. Overlap Collective Using Continuation Fusion:\*\* +2. Overlap Collective Using Continuation Fusion: - This method is recommended for v5p and v6e. To enable it, set the following flags \[[link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L39)\]: + This method is recommended for v5p and v6e. 
To enable it, set the following flags ([link](https://github.com/AI-Hypercomputer/maxtext/blob/main/benchmarks/xla_flags_library.py#L39)): - `CF_FOR_ALL_GATHER` - `CF_FOR_ALL_REDUCE` diff --git a/docs/guides/optimization/custom_model.md b/docs/guides/optimization/custom_model.md index 3ba6a1df59..991c322a99 100644 --- a/docs/guides/optimization/custom_model.md +++ b/docs/guides/optimization/custom_model.md @@ -85,7 +85,7 @@ Use these general runtime configurations to improve your model's performance. ## Step 3. Choose efficient sharding strategies using Roofline Analysis -To achieve good performance, it's often necessary to co-design the model's dimensions (like the MLP dimension) along with the sharding strategy. We have included examples for [v5p](https://docs.cloud.google.com/tpu/docs/v5p), [Trillium](https://docs.cloud.google.com/tpu/docs/v6e), and [Ironwood](https://docs.cloud.google.com/tpu/docs/tpu7x) that demonstrate which sharding approaches work well for specific models. We recommend reading [](sharding) and Jax’s [scaling book](https://jax-ml.github.io/scaling-book/sharding/). +To achieve good performance, it's often necessary to co-design the model's dimensions (like the MLP dimension) along with the sharding strategy. We have included examples for [v5p](https://docs.cloud.google.com/tpu/docs/v5p), [Trillium](https://docs.cloud.google.com/tpu/docs/v6e), and [Ironwood](https://docs.cloud.google.com/tpu/docs/tpu7x) that demonstrate which sharding approaches work well for specific models. We recommend reading [](sharding_on_TPUs) and JAX’s [scaling book](https://jax-ml.github.io/scaling-book/sharding/). 
| TPU Type | ICI Arithmetic Intensity | | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | diff --git a/docs/reference.md b/docs/reference.md index 3c8d8acc6e..fe8d74faa6 100644 --- a/docs/reference.md +++ b/docs/reference.md @@ -18,37 +18,42 @@ Deep dive into MaxText architecture, models, and core concepts. -::::{grid} 1 2 2 2 -:gutter: 2 - -:::{grid-item-card} 📊 Performance Metrics +````{grid} 1 2 2 2 +--- +gutter: 2 +--- +```{grid-item-card} 📊 Performance Metrics :link: reference/performance_metrics :link-type: doc Understanding Model Flops Utilization (MFU), calculation methods, and why it matters for performance optimization. -::: +``` -:::{grid-item-card} 🤖 Models +```{grid-item-card} 🤖 Models :link: reference/models :link-type: doc Supported models and architectures, including Llama, Qwen, and Mixtral. Details on tiering and new additions. -::: +``` -:::{grid-item-card} 🏗️ Architecture +```{grid-item-card} 🏗️ Architecture :link: reference/architecture :link-type: doc High-level overview of MaxText design, JAX/XLA choices, and how components interact. -::: +``` -:::{grid-item-card} 💡 Core Concepts +```{grid-item-card} 💡 Core Concepts :link: reference/core_concepts :link-type: doc Key concepts including checkpointing strategies, quantization, tiling, and Mixture of Experts (MoE) configuration. -::: -:::: +``` +```` + +## 📚 API Reference + +Find comprehensive API documentation for MaxText modules, classes, and functions in the [API Reference page](reference/api.rst). 
```{toctree} --- @@ -59,4 +64,5 @@ reference/performance_metrics reference/models reference/architecture reference/core_concepts +reference/api.rst ``` diff --git a/docs/reference/core_concepts/quantization.md b/docs/reference/core_concepts/quantization.md index 6f72da9ea9..dae117a85a 100644 --- a/docs/reference/core_concepts/quantization.md +++ b/docs/reference/core_concepts/quantization.md @@ -14,7 +14,7 @@ limitations under the License. --> -(quantization)= +(quantization-doc)= # Quantization
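
For reviewers who want to try the updated docs workflow locally, the `docs/development.md` changes above amount to roughly the following. This is a sketch, not part of the patch: the exact `sphinx-build` invocation and output directory are assumptions, since the hunk truncates the command after `cd docs`.

```shell
# Assumes a local MaxText clone with the maxtext_venv virtual environment
# created per the local installation instructions and already activated.
uv pip install -r src/dependencies/requirements/requirements_docs.txt

# Build the HTML site; the builder and output path are assumptions.
cd docs
sphinx-build -b html . _build/html

# Serve the built site for a local preview at http://localhost:8000
python3 -m http.server --directory _build/html 8000
```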