docs/guides/data_input_pipeline/data_input_grain.md (2 changes: 1 addition & 1 deletion)

@@ -112,7 +112,7 @@ Note that `FILE_PATH` is optional; when provided, the script runs `ls -R` for pr
 bash tools/setup/setup_gcsfuse.sh \
 DATASET_GCS_BUCKET=maxtext-dataset \
 MOUNT_PATH=/tmp/gcsfuse && \
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 run_name=<RUN_NAME> base_output_directory=gs://<MY_BUCKET> \
 dataset_type=grain \
 grain_file_type=arrayrecord # or parquet \
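
The visible portion of the command ends mid-continuation, and note that in `bash` the inline `# or parquet` comment also comments out its trailing `\`, which breaks the continuation. A minimal sketch of the parquet variant with the comment dropped (run and bucket names are illustrative; the full command in the guide carries additional Grain flags beyond those shown in this hunk):

```bash
bash tools/setup/setup_gcsfuse.sh \
  DATASET_GCS_BUCKET=maxtext-dataset \
  MOUNT_PATH=/tmp/gcsfuse && \
python3 -m maxtext.trainers.pre_train.train \
  run_name=my-grain-run base_output_directory=gs://my-bucket \
  dataset_type=grain \
  grain_file_type=parquet
```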

---

@@ -56,7 +56,7 @@ After installing the dependencies listed above, you are ready to compile ahead o

 ```sh
 # Run the below on a single machine, e.g. a CPU
-python3 -m maxtext.trainers.pre_train.train_compile src/maxtext/configs/base.yml compile_topology=v5e-256 compile_topology_num_slices=2 \
+python3 -m maxtext.trainers.pre_train.train_compile compile_topology=v5e-256 compile_topology_num_slices=2 \
 global_parameter_scale=16 per_device_batch_size=4
 ```

@@ -71,7 +71,7 @@ Here is an example that saves then loads the compiled `train_step`, starting wit
 ```sh
 # Run the below on a single machine, e.g. a CPU
 export LIBTPU_INIT_ARGS="--xla_enable_async_all_gather=true"
-python3 -m maxtext.trainers.pre_train.train_compile src/maxtext/configs/base.yml compile_topology=v5e-256 \
+python3 -m maxtext.trainers.pre_train.train_compile compile_topology=v5e-256 \
 compile_topology_num_slices=2 \
 compiled_trainstep_file=my_compiled_train.pickle global_parameter_scale=16 \
 per_device_batch_size=4 steps=10000 learning_rate=1e-3

@@ -84,7 +84,7 @@ To load the compiled train_step, you just need to pass `compiled_trainstep_file=
 ```sh
 # Run the below on each host of the target hardware, e.g. each host on 2 slices of v5e-256
 export LIBTPU_INIT_ARGS="--xla_enable_async_all_gather=true"
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=example_load_compile \
+python3 -m maxtext.trainers.pre_train.train run_name=example_load_compile \
 compiled_trainstep_file=my_compiled_train.pickle \
 global_parameter_scale=16 per_device_batch_size=4 steps=10000 learning_rate=1e-3 \
 base_output_directory=gs://my-output-bucket dataset_path=gs://my-dataset-bucket
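
Because the load step runs on every host, `my_compiled_train.pickle` has to be present on each of them. A minimal sketch for distributing it, assuming the hosts share access to a GCS bucket (paths are illustrative):

```sh
# On the machine that ran train_compile:
gsutil cp my_compiled_train.pickle gs://my-output-bucket/compiled/

# On each target host, before launching train:
gsutil cp gs://my-output-bucket/compiled/my_compiled_train.pickle .
```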

@@ -109,7 +109,7 @@ This example illustrates the flags to use for a multihost GPU compilation target
 ```sh
 # Run the below on a single A3 machine
 export XLA_FLAGS="--xla_gpu_enable_async_collectives=true"
-python3 -m maxtext.trainers.pre_train.train_compile src/maxtext/configs/base.yml compile_topology=a3 \
+python3 -m maxtext.trainers.pre_train.train_compile compile_topology=a3 \
 compile_topology_num_slices=4 \
 compiled_trainstep_file=my_compiled_train.pickle global_parameter_scale=16 \
 attention=dot_product per_device_batch_size=4 steps=10000 learning_rate=1e-3

@@ -122,7 +122,7 @@ To load the compiled `train_step`, you just need to pass `compiled_trainstep_fil
 ```sh
 # Run the below on each of the 4 target A3 hosts.
 export XLA_FLAGS="--xla_gpu_enable_async_collectives=true"
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=example_load_compile \
+python3 -m maxtext.trainers.pre_train.train run_name=example_load_compile \
 compiled_trainstep_file=my_compiled_train.pickle \
 attention=dot_product global_parameter_scale=16 per_device_batch_size=4 steps=10000 learning_rate=1e-3 \
 base_output_directory=gs://my-output-bucket dataset_path=gs://my-dataset-bucket
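
The compile machine and the target hosts must agree on the XLA flags, as the matching `export XLA_FLAGS` lines above show. One way to keep them in sync is to source a single snippet in both phases; a minimal sketch (the file name is illustrative):

```sh
# xla_env.sh -- source this on the compile machine and on every A3 host
export XLA_FLAGS="--xla_gpu_enable_async_collectives=true"
```

Run `source xla_env.sh` before both the `train_compile` and `train` commands.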

---

@@ -35,7 +35,7 @@ MaxText has integrated the ML Diagnostics [SDK](https://github.com/AI-Hypercompu
 1. Enable ML Diagnostics to just capture Maxtext metrics and configs
 
 ```
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 run_name=${USER}-tpu-job \
 base_output_directory="gs://your-output-bucket/" \
 dataset_path="gs://your-dataset-bucket/" \

@@ -47,7 +47,7 @@ MaxText has integrated the ML Diagnostics [SDK](https://github.com/AI-Hypercompu
 2. Enable ML Diagnostics to capture Maxtext metrics, configs and singlehost profiles (on the first TPU device)
 
 ```
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 run_name=${USER}-tpu-job \
 base_output_directory="gs://your-output-bucket/" \
 dataset_path="gs://your-dataset-bucket/" \

@@ -60,7 +60,7 @@ MaxText has integrated the ML Diagnostics [SDK](https://github.com/AI-Hypercompu
 3. Enable ML Diagnostics to capture Maxtext metrics, configs and multihost profiles (on all TPU devices)
 
 ```
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 run_name=${USER}-tpu-job \
 base_output_directory="gs://your-output-bucket/" \
 dataset_path="gs://your-dataset-bucket/" \

docs/guides/monitoring_and_debugging/monitor_goodput.md (8 changes: 4 additions & 4 deletions)

@@ -89,7 +89,7 @@ Please use a unique workload name, unless you intend to monitor cumulative Goodp
 MaxText enables Goodput recording and monitoring by default with `enable_goodput_recording=True` and `monitor_goodput=True`. You can configure the goodput upload frequency by setting `goodput_upload_interval_seconds`.
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} \
+python3 -m maxtext.trainers.pre_train.train base_output_directory=${OUTPUT_PATH?} \
 dataset_path=${DATA_PATH?} run_name=goodput-test-run steps=200 goodput_upload_interval_seconds=30
 ```

@@ -98,7 +98,7 @@ python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_ou
 MaxText enables step time deviation monitoring by default with `monitor_step_time_deviation=True`. You can configure the upload frequency by setting `step_deviation_interval_seconds`.
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} \
+python3 -m maxtext.trainers.pre_train.train base_output_directory=${OUTPUT_PATH?} \
 dataset_path=${DATA_PATH?} run_name=goodput-test-run steps=200 step_deviation_interval_seconds=30
 ```

@@ -111,7 +111,7 @@ Enabling `enable_pathways_goodput` turns on Goodput measurement for Pathways wor
 ```
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} \
+python3 -m maxtext.trainers.pre_train.train base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} \
 run_name=goodput-test-run steps=200 goodput_upload_interval_seconds=30 enable_pathways_goodput=True
 ```

@@ -168,7 +168,7 @@ and `enable_gcp_step_deviation_metrics` to `False` for disabling step deviation
 metrics.
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} \
+python3 -m maxtext.trainers.pre_train.train base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} \
 run_name=goodput-test-run steps=200 goodput_upload_interval_seconds=30 enable_gcp_goodput_metrics=False \
 enable_gcp_step_deviation_metrics=False
 ```
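
The flags above only stop uploads to GCP; per the defaults noted earlier (`enable_goodput_recording=True`, `monitor_goodput=True`), recording itself is a separate switch. A minimal sketch for turning the Goodput machinery off entirely, assuming those two flags accept `False`:

```bash
python3 -m maxtext.trainers.pre_train.train base_output_directory=${OUTPUT_PATH?} \
dataset_path=${DATA_PATH?} run_name=goodput-test-run steps=200 \
monitor_goodput=False enable_goodput_recording=False
```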

---

@@ -23,7 +23,7 @@ When you run a training job, MaxText produces detailed output logs. This guide s
 To start, run a simple pretraining job on a single-host TPU. For instance, we can run the following command on TPU v5p-8. The resulting log is used as an example throughout this guide.
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 base_output_directory=gs://runner-maxtext-logs run_name=demo \
 model_name=deepseek2-16b \
 per_device_batch_size=24 max_target_length=2048 steps=10 dataset_type=synthetic enable_checkpointing=false
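
To study the resulting log rather than scrolling the terminal, the same command can be piped through `tee`. A minimal sketch, assuming MaxText's per-step lines contain the phrase `completed step` (worth confirming against your build's output):

```bash
python3 -m maxtext.trainers.pre_train.train \
base_output_directory=gs://runner-maxtext-logs run_name=demo \
model_name=deepseek2-16b \
per_device_batch_size=24 max_target_length=2048 steps=10 dataset_type=synthetic \
enable_checkpointing=false 2>&1 | tee /tmp/demo_train.log

# Pull out the per-step timing/loss lines afterwards:
grep "completed step" /tmp/demo_train.log
```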

@@ -123,7 +123,7 @@ To generate all optional artifacts in one run, you can set the corresponding fla
 This command enables tensorboard, profiler, text metrics, config saving, and checkpointing:
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 base_output_directory=gs://runner-maxtext-logs run_name=demo2 \
 model_name=deepseek2-16b \
 per_device_batch_size=24 max_target_length=2048 steps=10 dataset_type=synthetic \

docs/reference/core_concepts/quantization.md (4 changes: 2 additions & 2 deletions)

@@ -87,7 +87,7 @@ Common options for the `quantization` flag when using Qwix include:
 Here is an example of how to run a training job with int8 quantization enabled via Qwix:
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?} base_output_directory=gs://<my-bucket> dataset_type=synthetic use_qwix_quantization=true quantization='int8'
+python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?} base_output_directory=gs://<my-bucket> dataset_type=synthetic use_qwix_quantization=true quantization='int8'
 ```
 
 #### The Qwix Interception API

@@ -142,7 +142,7 @@ When using AQT, you can pass one of the following values to the `quantization` f
 #### Example command for AQT
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?} base_output_directory=gs://<my-bucket> dataset_type=synthetic use_qwix_quantization=false quantization='int8'
+python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?} base_output_directory=gs://<my-bucket> dataset_type=synthetic use_qwix_quantization=false quantization='int8'
 ```
 
 Note that `use_qwix_quantization` is not set to `True`.
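
Both paths take the precision string from the same `quantization` flag, so swapping precisions is a one-token change. A hedged sketch of an fp8 run via Qwix, assuming `'fp8'` appears in the list of supported values referenced above (the list itself is cut off in this hunk):

```bash
python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?} base_output_directory=gs://<my-bucket> dataset_type=synthetic use_qwix_quantization=true quantization='fp8'
```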

docs/tutorials/inference.md (4 changes: 2 additions & 2 deletions)

@@ -63,7 +63,7 @@ We include a script for convenient offline inference of MaxText models in `src/m
 An example of how to run this script can be found below:
 
 ```bash
-python3 -m maxtext.inference.vllm_decode src/maxtext/configs/base.yml \
+python3 -m maxtext.inference.vllm_decode \
 model_name=qwen3-30b-a3b \
 tokenizer_path=Qwen/Qwen3-30B-A3B \
 load_parameters_path=$CHECKPOINT_PATH \

@@ -133,7 +133,7 @@ curl http://localhost:8000/v1/completions \
 To use a MaxText model architecture for samplers in reinforcement learning algorithms like GRPO, we can override the vLLM model architecture and pass in MaxText specific config arguments similar to the [online inference](online-inference) use-case. An example of an RL command using the MaxText model for samplers can be found below:
 
 ```bash
-python3 -m src.maxtext.trainers.post_train.rl.train_rl src/maxtext/configs/post_train/rl.yml \
+python3 -m src.maxtext.trainers.post_train.rl.train_rl \
 model_name=qwen3-0.6b \
 tokenizer_path=Qwen/Qwen3-0.6B \
 run_name=$WORKLOAD \

docs/tutorials/posttraining/multimodal.md (2 changes: 0 additions & 2 deletions)

@@ -73,7 +73,6 @@ To run a forward pass and verify the model's output, use the following command:
 ```shell
 # Gemma3 decode
 python -m maxtext.inference.decode \
-maxtext/configs/base.yml \
 model_name=gemma3-4b \
 hf_access_token=${HF_ACCESS_TOKEN?} \
 tokenizer_path=src/maxtext/assets/tokenizers/tokenizer.gemma3 \

@@ -109,7 +108,6 @@ export TARGET_LENGTH=... # Adjust to fit expected output length
 export PREDICT_LENGTH=... # Adjust to fit image tokens + text prompt
 
 python -m maxtext.inference.decode \
-maxtext/configs/base.yml \
 model_name=gemma3-4b \
 ... \
 max_prefill_predict_length=${PREDICT_LENGTH?} # Adjust to fit image tokens + text prompt \
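
A minimal sketch of how those two lengths might be budgeted, assuming a fixed per-image token count for Gemma3 (256 is an assumption; check the model config) and a short text prompt:

```shell
export IMAGE_TOKENS=256                        # assumed per-image token budget for Gemma3
export PROMPT_TOKENS=64                        # rough estimate for the text prompt
export PREDICT_LENGTH=$((IMAGE_TOKENS + PROMPT_TOKENS))
export TARGET_LENGTH=$((PREDICT_LENGTH + 128)) # headroom for the generated output
```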