
Commit 16d567f

Merge branch 'main' of github.com:AI-Hypercomputer/maxtext into shuningjin-qwix1
2 parents 17800bf + ca7e2df commit 16d567f

39 files changed

Lines changed: 563 additions & 199 deletions

.github/workflows/run_jupyter_notebooks.yml

Lines changed: 0 additions & 2 deletions
@@ -64,8 +64,6 @@ jobs:
 
 # 2. Install MaxText package and all the post training dependencies
 uv pip install ${maxtext_wheel}[tpu-post-train] --resolution=lowest
-#TODO: @mazumdera: replace this with the following after release
-# uv pip install maxtext[tpu-post-train] --resolution=lowest
 install_maxtext_tpu_post_train_extra_deps
 .venv/bin/python3 -m ipykernel install --user --name maxtext_venv
 

PREFLIGHT.md

Lines changed: 4 additions & 4 deletions
@@ -7,12 +7,12 @@ Before you run ML workload on Multihost with GCE or GKE, simply apply `bash pref
 
 Here is an example for GCE:
 ```
-bash preflight.sh PLATFORM=GCE && python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
+bash preflight.sh PLATFORM=GCE && python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
 ```
 
 Here is an example for GKE:
 ```
-bash preflight.sh PLATFORM=GKE && python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
+bash preflight.sh PLATFORM=GKE && python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
 ```
 
 # Optimization 2: Numa binding (You can only apply this to v4 and v5p)
@@ -22,14 +22,14 @@ For GCE,
 [preflight.sh](https://github.com/google/maxtext/blob/main/preflight.sh) will help you install `numactl` dependency, so you can use it directly, here is an example:
 
 ```
-bash preflight.sh PLATFORM=GCE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
+bash preflight.sh PLATFORM=GCE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
 ```
 
 For GKE,
 `numactl` should be built into your docker image from [maxtext_tpu_dependencies.Dockerfile](https://github.com/google/maxtext/blob/main/src/dependencies/dockerfiles/maxtext_tpu_dependencies.Dockerfile), so you can use it directly if you built the maxtext docker image. Here is an example
 
 ```
-bash preflight.sh PLATFORM=GKE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
+bash preflight.sh PLATFORM=GKE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
 ```
 
 1. `numactl`: This is the command-line tool used for controlling NUMA policy for processes or shared memory. It's particularly useful on multi-socket systems where memory locality can impact performance.
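
Before choosing `--membind`/`--cpunodebind` values, it can help to confirm how many NUMA nodes the VM actually exposes. A minimal sketch, assuming `numactl` is already installed (for example by `preflight.sh`); the node index `0` mirrors the examples above and should be adjusted to your topology:

```bash
# Inspect the NUMA topology: node count, CPUs per node, and memory per node.
numactl --hardware

# Show the NUMA policy that would apply to child processes of this shell.
numactl --show

# Bind memory and CPUs to node 0 for the training process, as in the examples above.
numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
```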

docs/guides/checkpointing_solutions/convert_checkpoint.md

Lines changed: 3 additions & 3 deletions
@@ -70,7 +70,7 @@ Finally, run below command to complete the conversion
 # Optional: If run out of disk space when downloading HuggingFace safetensors,
 # customize your "HF_HOME" to redirect the cache to a larger or mounted disk (e.g., on a TPU VM).
 # export HF_HOME="/dev/shm/huggingface_tmp"
-python3 -m maxtext.checkpoint_conversion.to_maxtext maxtext/configs/base.yml \
+python3 -m maxtext.checkpoint_conversion.to_maxtext \
 model_name=${MODEL_NAME?} \
 hf_access_token=${HF_TOKEN?} \
 base_output_directory=${MODEL_CHECKPOINT_DIRECTORY?} \
@@ -108,7 +108,7 @@ Use the `to_huggingface.py` script to convert a MaxText checkpoint into the Hugg
 The following command converts a MaxText checkpoint and saves it locally, to GCS, or uploads it directly to the Hugging Face Hub.
 
 ```bash
-python3 -m maxtext.checkpoint_conversion.to_huggingface src/maxtext/configs/base.yml \
+python3 -m maxtext.checkpoint_conversion.to_huggingface \
 model_name=<MODEL_NAME> \
 load_parameters_path=<path-to-maxtext-checkpoint> \
 base_output_directory=<path-to-save-converted-checkpoint> \
@@ -221,7 +221,7 @@ To extend conversion support to a new model architecture, you must define its sp
 - In [`utils/param_mapping.py`](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/checkpoint_conversion/utils/param_mapping.py), add the `hook_fn` logic (`def {MODEL}_MAXTEXT_TO_HF_PARAM_HOOK_FN`). This is the transformation needed per layer.
 
 2. **Add Hugging Face weights Shape**: In [`utils/hf_shape.py`](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/checkpoint_conversion/utils/hf_shape.py), define the tensor shape of Hugging Face format (`def {MODEL}_HF_WEIGHTS_TO_SHAPE`). This is used to ensure the tensor shape is matched after to_huggingface conversion.
-3. **Register model key**: In [`utils/utils.py`](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/checkpoint_conversion/utils/utils.py), add the new model key in `HF_IDS`.
+3. **Register model key**: In [`utils/utils.py`](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/maxtext/utils/globals.py), add the new model key in `HF_IDS`.
 4. **Add transformer config**: In [`utils/hf_model_configs.py`](https://github.com/AI-Hypercomputer/maxtext/blob/main/src/MaxText/checkpoint_conversion/utils/hf_model_configs.py), add the `transformers.Config` object, describing the Hugging Face model configuration (defined in [`src/maxtext/configs/models`](https://github.com/AI-Hypercomputer/maxtext/tree/main/src/maxtext/configs/models)). **Note**: This configuration must precisely match the MaxText model's architecture.
 
 Here is an example [PR to add support for gemma3 multi-modal model](https://github.com/AI-Hypercomputer/maxtext/pull/1983)

docs/guides/run_python_notebook.md

Lines changed: 2 additions & 2 deletions
@@ -103,7 +103,7 @@ To install, click the `Extensions` icon on the left sidebar (or press `Ctrl+Shif
 
 ### Step 4: Install MaxText and Dependencies
 
-To execute post-training notebooks on your TPU-VM, follow the official [MaxText installation guides](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl.html#create-virtual-environment-and-install-maxtext-dependencies) to install MaxText and its dependencies inside a dedicated virtual environment.
+To execute post-training notebooks on your TPU-VM, follow the official [MaxText installation guides](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#from-source) and specifically follow `Option 3: Installing [tpu-post-train]`. This will ensure all post-training dependencies are installed inside your virtual environment.
 
 ### Step 5: Install the necessary library for Jupyter
 
@@ -162,7 +162,7 @@ pip3 install jupyterlab
 
 ### Step 4: Install MaxText and Dependencies
 
-To execute post-training notebooks on your TPU-VM, follow the official [MaxText installation guides](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl.html#create-virtual-environment-and-install-maxtext-dependencies) to install MaxText and its dependencies inside a dedicated virtual environment.
+To execute post-training notebooks on your TPU-VM, follow the official [MaxText installation guides](https://maxtext.readthedocs.io/en/latest/install_maxtext.html#from-source) and specifically follow `Option 3: Installing [tpu-post-train]`. This will ensure all post-training dependencies are installed inside your virtual environment.
 
 ### Step 5: Register virtual environment as a Jupyter Kernel
 
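
For Step 5 (registering the virtual environment as a Jupyter kernel), the pattern matches what the `run_jupyter_notebooks.yml` workflow above does in CI. A minimal sketch, assuming the venv created during installation is activated; the kernel name `maxtext_venv` is arbitrary:

```bash
# Install ipykernel inside the activated virtual environment, then register it
# as a named Jupyter kernel so notebooks can select it.
pip install ipykernel
python3 -m ipykernel install --user --name maxtext_venv
```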

docs/install_maxtext.md

Lines changed: 22 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@ MaxText offers three installation modes:
2424
3. maxtext[tpu-post-train]. Used for post-training on TPUs. Currently, this option should also be used for running vllm_decode on TPUs.
2525

2626
## From PyPI (Recommended)
27+
2728
This is the easiest way to get started with the latest stable version.
2829

2930
```bash
@@ -38,24 +39,26 @@ source maxtext_venv/bin/activate
3839
# installation option from this list to fit your use case.
3940

4041
# Option 1: Installing maxtext[tpu]
41-
uv pip install maxtext[tpu] --resolution=lowest
42+
uv pip install "maxtext[tpu]>=0.2.0" --resolution=lowest
4243
install_maxtext_tpu_github_deps
4344

4445
# Option 2: Installing maxtext[cuda12]
45-
uv pip install maxtext[cuda12] --resolution=lowest
46+
uv pip install "maxtext[cuda12]>=0.2.0" --resolution=lowest
4647
install_maxtext_cuda12_github_dep
4748

4849
# Option 3: Installing maxtext[tpu-post-train]
49-
uv pip install maxtext[tpu-post-train] --resolution=lowest
50+
uv pip install "maxtext[tpu-post-train]>=0.2.0" --resolution=lowest
5051
install_maxtext_tpu_post_train_extra_deps
5152
```
53+
5254
> **Note:** The `install_maxtext_tpu_github_deps`, `install_maxtext_cuda12_github_dep`, and
53-
`install_maxtext_tpu_post_train_extra_deps` commands are temporarily required to install dependencies directly from GitHub
54-
that are not yet available on PyPI. As shown above, choose the one that corresponds to your use case.
55+
> `install_maxtext_tpu_post_train_extra_deps` commands are temporarily required to install dependencies directly from GitHub
56+
> that are not yet available on PyPI. As shown above, choose the one that corresponds to your use case.
5557
5658
> **Note:** The maxtext package contains a comprehensive list of all direct and transitive dependencies, with lower bounds, generated by [seed-env](https://github.com/google-ml-infra/actions/tree/main/python_seed_env). We highly recommend the `--resolution=lowest` flag. It instructs `uv` to install the specific, tested versions of dependencies defined by MaxText, rather than the latest available ones. This ensures a consistent and reproducible environment, which is critical for stable performance and for running benchmarks.
5759
5860
## From Source
61+
5962
If you plan to contribute to MaxText or need the latest unreleased features, install from source.
6063

6164
```bash
@@ -98,11 +101,11 @@ Please keep dependencies updated throughout development. This will allow each co
98101

99102
To update dependencies, you will follow these general steps:
100103

101-
1. **Modify Base Requirements**: Update the desired dependencies in `base_requirements/requirements.txt` or the hardware-specific files (`base_requirements/tpu-base-requirements.txt`, `base_requirements/gpu-base-requirements.txt`).
102-
2. **Generate New Files**: Run the `seed-env` CLI tool to generate new, fully-pinned requirements files based on your changes.
103-
3. **Update Project Files**: Copy the newly generated files into the `generated_requirements/` directory.
104-
4. **Handle GitHub Dependencies**: Move any dependencies that are installed directly from GitHub from the generated files to `src/install_maxtext_extra_deps/extra_deps_from_github.txt`.
105-
5. **Verify**: Test the new dependencies to ensure the project installs and runs correctly.
104+
1. **Modify Base Requirements**: Update the desired dependencies in `base_requirements/requirements.txt` or the hardware-specific files (`base_requirements/tpu-base-requirements.txt`, `base_requirements/gpu-base-requirements.txt`).
105+
2. **Generate New Files**: Run the `seed-env` CLI tool to generate new, fully-pinned requirements files based on your changes.
106+
3. **Update Project Files**: Copy the newly generated files into the `generated_requirements/` directory.
107+
4. **Handle GitHub Dependencies**: Move any dependencies that are installed directly from GitHub from the generated files to `src/install_maxtext_extra_deps/extra_deps_from_github.txt`.
108+
5. **Verify**: Test the new dependencies to ensure the project installs and runs correctly.
106109

107110
The following sections provide detailed instructions for each step.
108111

@@ -154,25 +157,26 @@ seed-env \
154157

155158
After generating the new requirements, you need to update the files in the MaxText repository.
156159

157-
1. **Copy the generated files:**
158-
- Move `generated_tpu_artifacts/tpu-requirements.txt` to `generated_requirements/tpu-requirements.txt`.
159-
- Move `generated_gpu_artifacts/cuda12-requirements.txt` to `generated_requirements/cuda12-requirements.txt`.
160+
1. **Copy the generated files:**
161+
162+
- Move `generated_tpu_artifacts/tpu-requirements.txt` to `generated_requirements/tpu-requirements.txt`.
163+
- Move `generated_gpu_artifacts/cuda12-requirements.txt` to `generated_requirements/cuda12-requirements.txt`.
160164

161-
2. **Update `extra_deps_from_github.txt` (if necessary):**
162-
Currently, MaxText uses a few dependencies, such as `mlperf-logging` and `google-jetstream`, that are installed directly from GitHub source. These are defined in `base_requirements/requirements.txt`, and the `seed-env` tool will carry them over to the generated requirements files.
165+
2. **Update `extra_deps_from_github.txt` (if necessary):**
166+
Currently, MaxText uses a few dependencies, such as `mlperf-logging` and `google-jetstream`, that are installed directly from GitHub source. These are defined in `base_requirements/requirements.txt`, and the `seed-env` tool will carry them over to the generated requirements files.
163167

164168
## Step 5: Verify the New Dependencies
165169

166170
Finally, test that the new dependencies install correctly and that MaxText runs as expected.
167171

168-
1. **Create a clean environment:** It's best to start with a fresh Python virtual environment.
172+
1. **Create a clean environment:** It's best to start with a fresh Python virtual environment.
169173

170174
```bash
171175
uv venv --python 3.12 --seed maxtext_venv
172176
source maxtext_venv/bin/activate
173177
```
174178

175-
2. **Run the setup script:** Execute `bash setup.sh` to install the new dependencies.
179+
2. **Run the setup script:** Execute `bash setup.sh` to install the new dependencies.
176180

177181
```bash
178182
pip install uv
@@ -183,4 +187,4 @@ uv pip install -e .[tpu] --resolution=lowest
183187
install_maxtext_github_deps
184188
```
185189

186-
3. **Run tests:** Run MaxText tests to ensure there are no regressions.
190+
3. **Run tests:** Run MaxText tests to ensure there are no regressions.
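
For step 3, a convenient smoke test is the short synthetic-data run documented in `run_maxtext_localhost.md` (also updated in this commit). A minimal sketch, assuming a TPU host and a writable GCS bucket; the run name and bucket are placeholders:

```bash
# Fresh environment and editable install, following the steps above (TPU extra shown).
uv venv --python 3.12 --seed maxtext_venv
source maxtext_venv/bin/activate
pip install uv
uv pip install -e .[tpu] --resolution=lowest
install_maxtext_github_deps

# Smoke test: a 10-step training run on synthetic data.
python3 -m maxtext.trainers.pre_train.train \
  run_name=${YOUR_JOB_NAME?} \
  base_output_directory=gs://<my-bucket> \
  dataset_type=synthetic \
  steps=10
```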

docs/run_maxtext/run_maxtext_localhost.md

Lines changed: 4 additions & 4 deletions
@@ -58,7 +58,7 @@ bash tools/setup/setup.sh DEVICE={tpu|gpu}
 After the installation is complete, run a short training job using synthetic data to confirm everything is working correctly. This command trains a model for just 10 steps. Remember to replace `$YOUR_JOB_NAME` with a unique name for your run and `gs://<my-bucket>` with the path to the GCS bucket you configured in the prerequisites.
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 run_name=${YOUR_JOB_NAME?} \
 base_output_directory=gs://<my-bucket> \
 dataset_type=synthetic \
@@ -72,7 +72,7 @@ python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
 To demonstrate model output, run the following command:
 
 ```bash
-python3 -m maxtext.inference.decode src/maxtext/configs/base.yml \
+python3 -m maxtext.inference.decode \
 run_name=${YOUR_JOB_NAME?} \
 base_output_directory=gs://<my-bucket> \
 per_device_batch_size=1
@@ -92,7 +92,7 @@ To use a pre-configured model for TPUs, you override the `model_name` parameter,
 <summary><strong>llama3-8b (TPU)</strong></summary>
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 model_name=llama3-8b \
 run_name=${YOUR_JOB_NAME?} \
 base_output_directory=gs://<my-bucket> \
@@ -106,7 +106,7 @@ python3 -m maxtext.trainers.pre_train.train maxtext/configs/base.yml \
 <summary><strong>qwen3-4b (TPU)</strong></summary>
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 model_name=qwen3-4b \
 run_name=${YOUR_JOB_NAME?} \
 base_output_directory=gs://<my-bucket> \

docs/run_maxtext/run_maxtext_single_host_gpu.md

Lines changed: 1 addition & 1 deletion
@@ -148,7 +148,7 @@ Hardware: GPU
 ```
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=gpu01 base_output_directory=/deps/output \
+python3 -m maxtext.trainers.pre_train.train run_name=gpu01 base_output_directory=/deps/output \
 dataset_type=synthetic enable_checkpointing=True steps=10 attention=cudnn_flash_te scan_layers=False \
 use_iota_embed=True hardware=gpu per_device_batch_size=12
 ```

docs/run_maxtext/run_maxtext_via_multihost_job.md

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ The `multihost_job.py` script:
 
 ```sh
 RUN_NAME=${YOUR_JOB_NAME?} # You may set this to any unique name for a fresh run.
-python3 multihost_job.py --NUM_SLICES=${NODE_COUNT?} --RUN_NAME=${RUN_NAME?} --BUCKET_NAME=${BUCKET_NAME?} --CQR_EXTRA_ARGS="--reserved" --COMMAND="bash tools/setup/setup.sh && python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${RUN_NAME?}"
+python3 multihost_job.py --NUM_SLICES=${NODE_COUNT?} --RUN_NAME=${RUN_NAME?} --BUCKET_NAME=${BUCKET_NAME?} --CQR_EXTRA_ARGS="--reserved" --COMMAND="bash tools/setup/setup.sh && python3 -m maxtext.trainers.pre_train.train run_name=${RUN_NAME?}"
 ```
 
 We tell `multihost_job` to target the `reserved` pool by including `--reserved` as extra arguments to the CQR request, but you may instead target the `on-demand` pool by removing the `--CQR_EXTRA_ARGS` flag (on-demand is default), or the pre-emptible pool with `--CQR_EXTRA_ARGS="--best-effort"`, which may be necessary if your reservation is full.
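
As the paragraph above describes, switching capacity pools only changes the CQR extra args. For example, the same launch against the pre-emptible pool would look like this (sketch; all other flags unchanged):

```sh
# Request pre-emptible (best-effort) capacity instead of the reserved pool.
python3 multihost_job.py --NUM_SLICES=${NODE_COUNT?} --RUN_NAME=${RUN_NAME?} --BUCKET_NAME=${BUCKET_NAME?} --CQR_EXTRA_ARGS="--best-effort" --COMMAND="bash tools/setup/setup.sh && python3 -m maxtext.trainers.pre_train.train run_name=${RUN_NAME?}"
```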

docs/run_maxtext/run_maxtext_via_multihost_runner.md

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ Although there are several steps below, most are for the initial setup. Once set
 Set config values for `base_output_directory` and `dataset_path` in `configs/base.yml` if not set already.
 
 ```
-python3 multihost_runner.py --TPU_PREFIX=${TPU_PREFIX?} --COMMAND="python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${RUN_NAME?}"
+python3 multihost_runner.py --TPU_PREFIX=${TPU_PREFIX?} --COMMAND="python3 -m maxtext.trainers.pre_train.train run_name=${RUN_NAME?}"
 ```
 
 If you are running the `multihost_runner.py` script from a TPUVM, you will need to set `--INTERNAL_IP=true`.

docs/run_maxtext/run_maxtext_via_pathways.md

Lines changed: 2 additions & 2 deletions
@@ -96,7 +96,7 @@ xpk workload create-pathways \
 --project=${PROJECT?} \
 --zone=${ZONE?} \
 --docker-image=${DOCKER_IMAGE?} \
---command="python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+--command="python3 -m maxtext.trainers.pre_train.train \
 base_output_directory=gs://${BUCKET_NAME?} \
 per_device_batch_size=1 \
 enable_checkpointing=false \
@@ -154,7 +154,7 @@ export JAX_PLATFORMS=proxy
 export JAX_BACKEND_TARGET=grpc://127.0.0.1:29000
 
 # Run the training script
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 base_output_directory=gs://${BUCKET_NAME?} \
 per_device_batch_size=1 \
 enable_checkpointing=false \
