diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 0dd851418d..02ebb9a8fc 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -30,4 +30,4 @@ This project follows All submissions, including submissions by project members, require review. We use GitHub pull requests for this purpose. Consult [GitHub Help](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests) for more -information on using pull requests. \ No newline at end of file +information on using pull requests. diff --git a/DOCS.md b/DOCS.md deleted file mode 100644 index 7b7f419ab9..0000000000 --- a/DOCS.md +++ /dev/null @@ -1,45 +0,0 @@ - -Documentation… documentation! -============================= - -## Dependencies -First install the dependencies: -```sh -$ python3 -m pip install -r requirements_docs.txt -``` -(or `uv pip install` …) - -## Build -```sh -$ sphinx-build -M html docs out -``` - -## Serve -You can use any static file HTTP server, e.g.: -```sh -$ python3 -m http.server -d 'out/html' -``` - -## Build & server (watch for changes) -```sh -$ python3 -m pip install sphinx-autobuild -$ sphinx-autobuild docs out -``` - -## Release to readthedocs - -See GitHub Action diff --git a/docs/development.md b/docs/development.md index 7237229814..c4e671f44b 100644 --- a/docs/development.md +++ b/docs/development.md @@ -1,39 +1,10 @@ ```{include} ../CONTRIBUTING.md ``` -## Contribute to documentation - -The MaxText documentation website is built using [Sphinx](https://www.sphinx-doc.org) and [MyST](https://myst-parser.readthedocs.io/en/latest/). Documents are written in [MyST Markdown syntax](https://myst-parser.readthedocs.io/en/latest/syntax/typography.html#syntax-core). - -### Building the documentation locally (optional) - -If you are writing documentation for MaxText, you may want to preview the documentation site locally to ensure things work as expected before a deployment to Read The Docs. 
- -First, make sure you install the necessary dependencies. You can do this by navigating to your local clone of the MaxText repo, following the [local installation instructions](install_maxtext.md) and running: - -```bash -uv pip install -r src/dependencies/requirements/requirements_docs.txt -``` - -Once the dependencies are installed and your `maxtext_venv` virtual environment is activated, you can navigate to the `docs/` folder and run: - -```bash -cd docs -sphinx-build -b html . _build/html +```{toctree} +--- +hidden: +--- +development/update_dependencies.md +development/contribute_docs.md ``` - -This will generate the documentation in the `docs/_build/html` directory. These files can be opened in a web browser directly, or you can use a simple HTTP server to serve the files. For example, you can run: - -```bash -python -m http.server -d _build/html -``` - -Then, open your web browser and navigate to `http://localhost:8000` to view the documentation. - -### Adding new documentation files - -If you are adding a new document, make sure it is included in the `toctree` directive corresponding to the section where the new document should live. For example, if adding a new tutorial, make sure it is listed in [the `docs/tutorials.md`](https://github.com/AI-Hypercomputer/maxtext/blob/7070e8eecbea8951c8e5281219ce797c8df1441f/docs/tutorials.md?plain=1#L38). - -### Documentation deployment - -The MaxText documentation is deployed to [https://maxtext.readthedocs.io](https://maxtext.readthedocs.io) on any successful merge to the main branch. diff --git a/docs/development/contribute_docs.md b/docs/development/contribute_docs.md new file mode 100644 index 0000000000..d65fbcfe18 --- /dev/null +++ b/docs/development/contribute_docs.md @@ -0,0 +1,69 @@ + + +# Contribute to documentation + +The MaxText documentation website is built using [Sphinx](https://www.sphinx-doc.org) +and [MyST](https://myst-parser.readthedocs.io/en/latest/). 
Documents are written +in [MyST Markdown syntax](https://myst-parser.readthedocs.io/en/latest/syntax/typography.html#syntax-core). + +## Building the documentation locally (optional) + +If you are writing documentation for MaxText, you may want to preview the +documentation site locally to ensure things work as expected before a deployment +to [Read The Docs](https://readthedocs.org/). + +First, make sure you +[install MaxText from source](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#from-source) +and install the necessary dependencies. You can do this by navigating to your +local clone of the MaxText repo and running: + +```bash +uv pip install -r src/dependencies/requirements/requirements_docs.txt +``` + +Once the dependencies are installed and your `maxtext_venv` virtual environment +is activated, you can navigate to the `docs/` folder and run: + +```bash +sphinx-build -b html . _build/html +``` + +This will generate the documentation in the `docs/_build/html` directory. These +files can be opened in a web browser directly, or you can use a simple HTTP +server to serve the files. For example, you can run: + +```bash +python -m http.server -d _build/html +``` + +Then, open your web browser and navigate to `http://localhost:8000` to view the +documentation. + +## Adding new documentation files + +If you are adding a new document, make sure it is included in the `toctree` +directive corresponding to the section where the new document should live. For +example, if adding a new tutorial, make sure it is listed in +[the `docs/tutorials.md`](https://github.com/AI-Hypercomputer/maxtext/blob/7070e8eecbea8951c8e5281219ce797c8df1441f/docs/tutorials.md?plain=1#L38) +toctree. + +## Documentation deployment + +The latest version of the MaxText documentation, tracking the main branch of +development, is automatically deployed to +[https://maxtext.readthedocs.io/en/latest](https://maxtext.readthedocs.io/en/latest) +on any successful merge to the main branch. 
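For the "Adding new documentation files" step in `contribute_docs.md`, a quick pre-build check can catch a page that was never added to a `toctree`. The sketch below assumes toctree entries are written one per line as paths relative to `docs/`, as in the `docs/development.md` hunk above. The helper name `is_listed_in_toctree` is hypothetical and not part of MaxText.

```bash
# Hypothetical helper (not part of MaxText): succeeds if DOC_PATH appears
# literally somewhere in TOC_FILE, e.g. as a toctree entry.
is_listed_in_toctree() {
  local toc_file="$1"
  local doc_path="$2"
  # -F: treat the path as a fixed string; -q: report via exit status only.
  grep -qF -- "$doc_path" "$toc_file"
}
```

For example, `is_listed_in_toctree docs/development.md development/contribute_docs.md` returns success when the page is listed. The substring match is deliberately loose, so a local `sphinx-build` run remains the authoritative check: Sphinx warns about documents that are not included in any toctree.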
diff --git a/docs/development/update_dependencies.md b/docs/development/update_dependencies.md new file mode 100644 index 0000000000..4872ef788f --- /dev/null +++ b/docs/development/update_dependencies.md @@ -0,0 +1,148 @@ + + +# Update MaxText dependencies + +## Introduction + +This document provides a guide to updating dependencies in MaxText using the +[seed-env](https://github.com/google-ml-infra/actions/tree/main/python_seed_env) +tool. `seed-env` helps generate deterministic and reproducible Python +environments by creating fully-pinned `requirements.txt` files from a base set +of requirements. + +Please keep dependencies updated throughout development. This will allow each +commit to work properly from both a feature and dependency perspective. We +periodically publish stable releases to PyPI, but it is also critical +to keep dependencies in sync for users installing MaxText from source. + +## Overview of the process + +To update dependencies, you will follow these general steps: + +1. **Modify base requirements**: Update the desired dependencies in + `src/dependencies/requirements/base_requirements/requirements.txt` or the hardware-specific files + (`src/dependencies/requirements/base_requirements/tpu-base-requirements.txt`, + `src/dependencies/requirements/base_requirements/gpu-base-requirements.txt`). +2. **Find the JAX build commit hash**: The dependency generation process is + pinned to a specific nightly build of JAX. You need to find the commit hash + for the desired JAX build. +3. **Generate the requirements files**: Run the `seed-env` CLI tool to generate + new, fully-pinned requirements files based on your changes. +4. **Update project files**: Copy the newly generated files into the + `src/dependencies/requirements/generated_requirements/` directory. If + necessary, also move any dependencies that are installed directly from + GitHub from the generated files to `src/dependencies/extra_deps`. +5. 
**Verify the new dependencies**: Test the new dependencies to ensure the + project installs and runs correctly. + +The following sections provide detailed instructions for each step. + +## Step 0: Install `seed-env` + +First, you need to install the `seed-env` command-line tool. We recommend +installing `uv` first following +[uv's official installation instructions](https://docs.astral.sh/uv/getting-started/installation/) +and then using it to install `seed-env`: + +```bash +uv venv --python 3.12 --seed seed_venv +source seed_venv/bin/activate +uv pip install seed-env +``` + +Alternatively, follow the instructions in the +[seed-env repository](https://github.com/google-ml-infra/actions/tree/main/python_seed_env#install-the-seed-env-tool) +if you want to build `seed-env` from source. + +## Step 1: Modify base requirements + +Update the desired dependencies in +`src/dependencies/requirements/base_requirements/requirements.txt` or the +hardware-specific files +(`src/dependencies/requirements/base_requirements/tpu-base-requirements.txt`, +`src/dependencies/requirements/base_requirements/gpu-base-requirements.txt`). + +## Step 2: Find the JAX build commit hash + +The dependency generation process is pinned to a specific nightly build of JAX. +You need to find the commit hash for the desired JAX build. + +You can find the latest commit hashes in the +[JAX `build/` folder](https://github.com/jax-ml/jax/commits/main/build). Choose +a recent, successful build and copy its full commit hash. + +## Step 3: Generate the requirements files + +Next, run the `seed-env` CLI to generate the new requirements files. You will +need to do this separately for the TPU and GPU environments. The generated files +will be placed in a directory specified by `--output-dir`. + +### For TPU + +Run the following command, replacing `` with the hash you +copied in the previous step. 
+ +```bash +seed-env \ + --local-requirements=src/dependencies/requirements/base_requirements/tpu-base-requirements.txt \ + --host-name=MaxText \ + --seed-commit= \ + --python-version=3.12 \ + --requirements-txt=tpu-requirements.txt \ + --output-dir=generated_tpu_artifacts +``` + +### For GPU + +Similarly, run the command for the GPU requirements. + +```bash +seed-env \ + --local-requirements=src/dependencies/requirements/base_requirements/gpu-base-requirements.txt \ + --host-name=MaxText \ + --seed-commit= \ + --python-version=3.12 \ + --requirements-txt=cuda12-requirements.txt \ + --hardware=cuda12 \ + --output-dir=generated_gpu_artifacts +``` + +## Step 4: Update project files + +After generating the new requirements, you need to update the files in the +MaxText repository. + +1. **Copy the generated files:** + + - Move `generated_tpu_artifacts/tpu-requirements.txt` to `src/dependencies/requirements/generated_requirements/tpu-requirements.txt`. + - Move `generated_gpu_artifacts/cuda12-requirements.txt` to `src/dependencies/requirements/generated_requirements/cuda12-requirements.txt`. + +2. **Update `src/dependencies/extra_deps` (if necessary):** + Currently, MaxText uses a few dependencies, such as `mlperf-logging` and + `google-jetstream`, that are installed directly from GitHub source. These are + defined in `src/dependencies/requirements/base_requirements/requirements.txt`, and the `seed-env` tool will + carry them over to the generated requirements files. If the generated files + contain such entries, move them to `src/dependencies/extra_deps`. + +## Step 5: Verify the new dependencies + +Finally, test that the new dependencies install correctly and that MaxText runs +as expected. + +1. **Install MaxText:** Follow the instructions to + [install MaxText from source](install-from-source). + +2. **Run tests:** Run MaxText tests to ensure there are no regressions. diff --git a/docs/install_maxtext.md b/docs/install_maxtext.md index d3368801b3..0fb1528860 100644 --- a/docs/install_maxtext.md +++ b/docs/install_maxtext.md @@ -16,195 +16,139 @@ # Install MaxText -This document discusses how to install MaxText. 
We recommend installing MaxText inside a Python virtual environment. -MaxText offers following installation modes: +This document discusses how to install MaxText. -1. maxtext[tpu]. Used for pre-training and decode on TPUs. -2. maxtext[cuda12]. Used for pre-training and decode on GPUs. -3. maxtext[tpu-post-train]. Used for post-training on TPUs. Currently, this option should also be used for running vllm_decode on TPUs. -4. maxtext[runner]. Used for building MaxText's Docker images and scheduling workloads through XPK. +We recommend installing MaxText inside a Python virtual environment and using +the `uv` package manager following +[uv's official installation instructions](https://docs.astral.sh/uv/getting-started/installation/). -## From PyPI (Recommended) - -This is the easiest way to get started with the latest stable version. - -```bash -# 1. Install uv, a fast Python package installer -pip install uv -# Alternatively, if pip install fails: -# curl -LsSf https://astral.sh/uv/install.sh | sh - -# 2. Create virtual environment -uv venv --python 3.12 --seed maxtext_venv -source maxtext_venv/bin/activate - -# 3. Install MaxText and its dependencies. Choose a single -# installation option from this list to fit your use case. -# IMPORTANT: If you want to switch to a different installation option -# (e.g., from [tpu] to [tpu-post-train]), we strongly recommend -# starting with a fresh virtual environment to avoid dependency conflicts. 
- -# Option 1: Installing maxtext[tpu] -uv pip install maxtext[tpu]==0.2.1 --resolution=lowest -install_tpu_pre_train_extra_deps - -# Option 2: Installing maxtext[cuda12] -uv pip install maxtext[cuda12]==0.2.1 --resolution=lowest -install_cuda12_pre_train_extra_deps - -# Option 3: Installing maxtext[tpu-post-train] -uv pip install maxtext[tpu-post-train]==0.2.1 --resolution=lowest -install_tpu_post_train_extra_deps - -# Option 4: Installing maxtext[runner] -uv pip install maxtext[runner]==0.2.1 --resolution=lowest +```{note} +MaxText is only tested on Linux during releases. ``` -> **Note:** The `maxtext[runner]` extra is used for building MaxText Docker images and scheduling workloads through XPK. Once installed, you will have access to the `build_maxtext_docker_image`, `upload_maxtext_docker_image`, and `xpk` commands. For more details on building and uploading Docker images, see the [Build MaxText Docker Image](https://maxtext.readthedocs.io/en/latest/build_maxtext.html) guide. - -> **Note:** The `install_tpu_pre_train_extra_deps`, `install_cuda12_pre_train_extra_deps`, and -> `install_tpu_post_train_extra_deps` commands are temporarily required to install dependencies directly from GitHub -> that are not yet available on PyPI. As shown above, choose the one that corresponds to your use case. +## From PyPI (recommended) -> **Note:** The maxtext package contains a comprehensive list of all direct and transitive dependencies, with lower bounds, generated by [seed-env](https://github.com/google-ml-infra/actions/tree/main/python_seed_env). We highly recommend the `--resolution=lowest` flag. It instructs `uv` to install the specific, tested versions of dependencies defined by MaxText, rather than the latest available ones. This ensures a consistent and reproducible environment, which is critical for stable performance and for running benchmarks. - -> **Note:** MaxText is only tested on Linux during releases. 
- -## From Source - -If you plan to contribute to MaxText or need the latest unreleased features, install from source. - -```bash -# 1. Clone the repository -git clone https://github.com/AI-Hypercomputer/maxtext.git -cd maxtext - -# 2. Create virtual environment -pip install uv -# Alternatively, if pip install fails: -# curl -LsSf https://astral.sh/uv/install.sh | sh -uv venv --python 3.12 --seed maxtext_venv -source maxtext_venv/bin/activate - -# 3. Install dependencies in editable mode. Choose a single -# installation option from this list to fit your use case. -# IMPORTANT: If you want to switch to a different installation option -# (e.g., from [tpu] to [tpu-post-train]), we strongly recommend -# starting with a fresh virtual environment to avoid dependency conflicts. - -# Option 1: Installing .[tpu] -uv pip install -e .[tpu] --resolution=lowest -install_tpu_pre_train_extra_deps - -# Option 2: Installing .[cuda12] -uv pip install -e .[cuda12] --resolution=lowest -install_cuda12_pre_train_extra_deps - -# Option 3: Installing .[tpu-post-train] -uv pip install -e .[tpu-post-train] --resolution=lowest -install_tpu_post_train_extra_deps +This is the easiest way to get started with the latest stable version. -# Option 4: Installing maxtext[runner] -uv pip install -e .[runner] --resolution=lowest +1. **Create a virtual environment:** + + ```bash + uv venv --python 3.12 --seed + source /bin/activate + ``` + +2. **Install MaxText and its dependencies.** + + Choose a single installation option from this list to fit your use case. + + ```{important} + If you want to switch to a different installation option (e.g., from `[tpu]` + to `[tpu-post-train]`), we strongly recommend starting with a fresh virtual + environment to avoid dependency conflicts. + ``` + + - **Option 1:** Install `maxtext[tpu]`, used for pre-training and decoding on + TPUs. 
+ + ```bash + uv pip install maxtext[tpu]==0.2.1 --resolution=lowest + ``` + + - **Option 2:** Install `maxtext[cuda12]`, used for pre-training and decoding + on GPUs. + + ```bash + uv pip install maxtext[cuda12]==0.2.1 --resolution=lowest + ``` + + - **Option 3:** Install `maxtext[tpu-post-train]`, used for post-training on + TPUs. Currently, this option should also be used for running `vllm_decode` + on TPUs. + + ```bash + uv pip install maxtext[tpu-post-train]==0.2.1 --resolution=lowest + ``` + + - **Option 4:** Install `maxtext[runner]`, used for building MaxText's Docker + images and scheduling workloads through XPK. Once installed, you will have + access to the `build_maxtext_docker_image`, `upload_maxtext_docker_image`, + and `xpk` commands. For more details on building and uploading Docker + images, see the + [Build MaxText Docker Image](https://maxtext.readthedocs.io/en/latest/build_maxtext.html) + guide. + + ```bash + uv pip install maxtext[runner]==0.2.1 --resolution=lowest + ``` + +```{note} +The maxtext package contains a comprehensive list of all direct and transitive +dependencies, with lower bounds, generated by +[seed-env](https://github.com/google-ml-infra/actions/tree/main/python_seed_env). +We highly recommend the `--resolution=lowest` flag. It instructs `uv` to install +the specific, tested versions of dependencies defined by MaxText, rather than +the latest available ones. This ensures a consistent and reproducible +environment, which is critical for stable performance and for running +benchmarks. ``` -After installation, you can verify the package is available with `python3 -c "import maxtext"` and run training jobs with `python3 -m maxtext.trainers.pre_train.train ...`. +(install-from-source)= -# Update MaxText dependencies +## From source -## Introduction +If you plan to contribute to MaxText or need the latest unreleased features, +install from source. 
-This document provides a guide to updating dependencies in MaxText using the `seed-env` tool. `seed-env` helps generate deterministic and reproducible Python environments by creating fully-pinned `requirements.txt` files from a base set of requirements. - -Please keep dependencies updated throughout development. This will allow each commit to work properly from both a feature and dependency perspective. We will periodically upload commits to PyPI for stable releases. But it is also critical to keep dependencies in sync for users installing MaxText from source. - -## Overview of the Process - -To update dependencies, you will follow these general steps: - -1. **Modify Base Requirements**: Update the desired dependencies in `base_requirements/requirements.txt` or the hardware-specific files (`base_requirements/tpu-base-requirements.txt`, `base_requirements/gpu-base-requirements.txt`). -2. **Generate New Files**: Run the `seed-env` CLI tool to generate new, fully-pinned requirements files based on your changes. -3. **Update Project Files**: Copy the newly generated files into the `generated_requirements/` directory. -4. **Handle GitHub Dependencies**: Move any dependencies that are installed directly from GitHub from the generated files to `src/dependencies/github_deps/pre_train_deps.txt`. -5. **Verify**: Test the new dependencies to ensure the project installs and runs correctly. - -The following sections provide detailed instructions for each step. - -## Step 1: Install seed-env - -First, you need to install the `seed-env` command-line tool. 
We recommend installing `uv` first and then using it to install `seed-env`: - -```bash -# Install uv -pip install uv -# Alternatively, if pip install fails: -# curl -LsSf https://astral.sh/uv/install.sh | sh - -uv venv --python 3.12 --seed seed_venv -source seed_venv/bin/activate - -# Install seed-env using uv -uv pip install seed-env +```{important} +If you want to switch to a different installation option (e.g., from `[tpu]` to +`[tpu-post-train]`), we strongly recommend starting with a fresh virtual +environment to avoid dependency conflicts. ``` -Alternatively, follow the instructions in the -[seed-env repository](https://github.com/google-ml-infra/actions/tree/main/python_seed_env#install-the-seed-env-tool) if you want to build `seed-env` from source. +1. Clone the repository: -## Step 2: Find the JAX Build Commit Hash + ```bash + git clone https://github.com/AI-Hypercomputer/maxtext.git + cd maxtext + ``` -The dependency generation process is pinned to a specific nightly build of JAX. You need to find the commit hash for the desired JAX build. +2. Create virtual environment: -You can find the latest commit hashes in the [JAX `build/` folder](https://github.com/jax-ml/jax/commits/main/build). Choose a recent, successful build and copy its full commit hash. + ```bash + uv venv --python 3.12 --seed + source /bin/activate + ``` -## Step 3: Generate the Requirements Files - -Next, run the `seed-env` CLI to generate the new requirements files. You will need to do this separately for the TPU and GPU environments. The generated files will be placed in a directory specified by `--output-dir`. - -### For TPU - -Run the following command, replacing `` with the hash you copied in the previous step. 
- -```bash -seed-env \ - --local-requirements=src/dependencies/requirements/base_requirements/tpu-base-requirements.txt \ - --host-name=MaxText \ - --seed-commit= \ - --python-version=3.12 \ - --requirements-txt=tpu-requirements.txt \ - --output-dir=generated_tpu_artifacts -``` - -### For GPU - -Similarly, run the command for the GPU requirements. - -```bash -seed-env \ - --local-requirements=src/dependencies/requirements/base_requirements/cuda12-base-requirements.txt \ - --host-name=MaxText \ - --seed-commit= \ - --python-version=3.12 \ - --requirements-txt=cuda12-requirements.txt \ - --hardware=cuda12 \ - --output-dir=generated_gpu_artifacts -``` +3. Install dependencies in editable mode. Choose a single installation option + from this list to fit your use case. -## Step 4: Update Project Files + - **Option 1:** Install `.[tpu]`: -After generating the new requirements, you need to update the files in the MaxText repository. + ```bash + uv pip install -e .[tpu] --resolution=lowest + install_tpu_pre_train_extra_deps + ``` -1. **Copy the generated files:** + - **Option 2:** Install `.[cuda12]` - - Move `generated_tpu_artifacts/tpu-requirements.txt` to `generated_requirements/tpu-requirements.txt`. - - Move `generated_gpu_artifacts/cuda12-requirements.txt` to `generated_requirements/cuda12-requirements.txt`. + ```bash + uv pip install -e .[cuda12] --resolution=lowest + install_cuda12_pre_train_extra_deps + ``` -2. **Update `pre_train_deps.txt` (if necessary):** - Currently, MaxText uses a few dependencies, such as `mlperf-logging` and `google-jetstream`, that are installed directly from GitHub source. These are defined in `base_requirements/requirements.txt`, and the `seed-env` tool will carry them over to the generated requirements files. 
+ - **Option 3:** Install `.[tpu-post-train]` -## Step 5: Verify the New Dependencies + ```bash + uv pip install -e .[tpu-post-train] --resolution=lowest + install_tpu_post_train_extra_deps + ``` -Finally, test that the new dependencies install correctly and that MaxText runs as expected. + - **Option 4:** Install `.[runner]` -1. **Install MaxText and dependencies**: For instructions on installing MaxText on your VM, please refer to the [official documentation](https://maxtext.readthedocs.io/en/maxtext-v0.2.0/install_maxtext.html#from-source). + ```bash + uv pip install -e .[runner] --resolution=lowest + ``` -2. **Verify the installation**: Run MaxText tests to ensure everything is working as expected with the newly installed dependencies and there are no regressions. +After installation, you can verify the package is available with +`python3 -c "import maxtext"` and run training jobs with +`python3 -m maxtext.trainers.pre_train.train ...`.
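The import check mentioned above can be wrapped in a small function so it is easy to reuse in setup scripts. This is a minimal sketch: the function name `check_import` and the `PYTHON` override variable are illustrative assumptions, not something MaxText provides.

```bash
# Hypothetical helper (not part of MaxText): succeeds if MODULE can be
# imported by the interpreter named in $PYTHON (default: python3).
check_import() {
  local module="$1"
  # Discard the interpreter's stderr; only the exit status matters here.
  if "${PYTHON:-python3}" -c "import ${module}" 2>/dev/null; then
    echo "ok: ${module} is importable"
  else
    echo "error: ${module} is not importable; is the virtual environment activated?" >&2
    return 1
  fi
}
```

Running `check_import maxtext` inside the activated virtual environment should print the `ok:` line. A non-zero exit status suggests that a different environment is active or that the install did not complete.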