diff --git a/docs/guides/data_input_pipeline/data_input_grain.md b/docs/guides/data_input_pipeline/data_input_grain.md
index 1191d2ff7b..6b061cc1a1 100644
--- a/docs/guides/data_input_pipeline/data_input_grain.md
+++ b/docs/guides/data_input_pipeline/data_input_grain.md
@@ -112,7 +112,7 @@ Note that `FILE_PATH` is optional; when provided, the script runs `ls -R` for pr
 bash tools/setup/setup_gcsfuse.sh \
 DATASET_GCS_BUCKET=maxtext-dataset \
 MOUNT_PATH=/tmp/gcsfuse && \
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 run_name= base_output_directory=gs:// \
 dataset_type=grain \
 grain_file_type=arrayrecord # or parquet \
diff --git a/docs/guides/monitoring_and_debugging/features_and_diagnostics.md b/docs/guides/monitoring_and_debugging/features_and_diagnostics.md
index 586eb06efa..a6952fae04 100644
--- a/docs/guides/monitoring_and_debugging/features_and_diagnostics.md
+++ b/docs/guides/monitoring_and_debugging/features_and_diagnostics.md
@@ -56,7 +56,7 @@ After installing the dependencies listed above, you are ready to compile ahead o
 
 ```sh
 # Run the below on a single machine, e.g. a CPU
-python3 -m maxtext.trainers.pre_train.train_compile src/maxtext/configs/base.yml compile_topology=v5e-256 compile_topology_num_slices=2 \
+python3 -m maxtext.trainers.pre_train.train_compile compile_topology=v5e-256 compile_topology_num_slices=2 \
 global_parameter_scale=16 per_device_batch_size=4
 ```
 
@@ -71,7 +71,7 @@ Here is an example that saves then loads the compiled `train_step`, starting wit
 ```sh
 # Run the below on a single machine, e.g. a CPU
 export LIBTPU_INIT_ARGS="--xla_enable_async_all_gather=true"
-python3 -m maxtext.trainers.pre_train.train_compile src/maxtext/configs/base.yml compile_topology=v5e-256 \
+python3 -m maxtext.trainers.pre_train.train_compile compile_topology=v5e-256 \
 compile_topology_num_slices=2 \
 compiled_trainstep_file=my_compiled_train.pickle global_parameter_scale=16 \
 per_device_batch_size=4 steps=10000 learning_rate=1e-3
@@ -84,7 +84,7 @@ To load the compiled train_step, you just need to pass `compiled_trainstep_file=
 ```sh
 # Run the below on each host of the target hardware, e.g. each host on 2 slices of v5e-256
 export LIBTPU_INIT_ARGS="--xla_enable_async_all_gather=true"
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=example_load_compile \
+python3 -m maxtext.trainers.pre_train.train run_name=example_load_compile \
 compiled_trainstep_file=my_compiled_train.pickle \
 global_parameter_scale=16 per_device_batch_size=4 steps=10000 learning_rate=1e-3 \
 base_output_directory=gs://my-output-bucket dataset_path=gs://my-dataset-bucket
@@ -109,7 +109,7 @@ This example illustrates the flags to use for a multihost GPU compilation target
 ```sh
 # Run the below on a single A3 machine
 export XLA_FLAGS="--xla_gpu_enable_async_collectives=true"
-python3 -m maxtext.trainers.pre_train.train_compile src/maxtext/configs/base.yml compile_topology=a3 \
+python3 -m maxtext.trainers.pre_train.train_compile compile_topology=a3 \
 compile_topology_num_slices=4 \
 compiled_trainstep_file=my_compiled_train.pickle global_parameter_scale=16 \
 attention=dot_product per_device_batch_size=4 steps=10000 learning_rate=1e-3
@@ -122,7 +122,7 @@ To load the compiled `train_step`, you just need to pass `compiled_trainstep_fil
 ```sh
 # Run the below on each of the 4 target A3 hosts.
 export XLA_FLAGS="--xla_gpu_enable_async_collectives=true"
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=example_load_compile \
+python3 -m maxtext.trainers.pre_train.train run_name=example_load_compile \
 compiled_trainstep_file=my_compiled_train.pickle \
 attention=dot_product global_parameter_scale=16 per_device_batch_size=4 steps=10000 learning_rate=1e-3 \
 base_output_directory=gs://my-output-bucket dataset_path=gs://my-dataset-bucket
diff --git a/docs/guides/monitoring_and_debugging/ml_workload_diagnostics.md b/docs/guides/monitoring_and_debugging/ml_workload_diagnostics.md
index 327b69f240..81206bff6b 100644
--- a/docs/guides/monitoring_and_debugging/ml_workload_diagnostics.md
+++ b/docs/guides/monitoring_and_debugging/ml_workload_diagnostics.md
@@ -35,7 +35,7 @@ MaxText has integrated the ML Diagnostics [SDK](https://github.com/AI-Hypercompu
 1. Enable ML Diagnostics to just capture Maxtext metrics and configs
 
    ```
-   python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+   python3 -m maxtext.trainers.pre_train.train \
    run_name=${USER}-tpu-job \
    base_output_directory="gs://your-output-bucket/" \
    dataset_path="gs://your-dataset-bucket/" \
@@ -47,7 +47,7 @@ MaxText has integrated the ML Diagnostics [SDK](https://github.com/AI-Hypercompu
 2. Enable ML Diagnostics to capture Maxtext metrics, configs and singlehost profiles (on the first TPU device)
 
   ```
-   python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+   python3 -m maxtext.trainers.pre_train.train \
    run_name=${USER}-tpu-job \
    base_output_directory="gs://your-output-bucket/" \
    dataset_path="gs://your-dataset-bucket/" \
@@ -60,7 +60,7 @@ MaxText has integrated the ML Diagnostics [SDK](https://github.com/AI-Hypercompu
 3. Enable ML Diagnostics to capture Maxtext metrics, configs and multihost profiles (on all TPU devices)
 
   ```
-   python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+   python3 -m maxtext.trainers.pre_train.train \
    run_name=${USER}-tpu-job \
    base_output_directory="gs://your-output-bucket/" \
    dataset_path="gs://your-dataset-bucket/" \
diff --git a/docs/guides/monitoring_and_debugging/monitor_goodput.md b/docs/guides/monitoring_and_debugging/monitor_goodput.md
index 62bbb7e04a..ca949d6079 100644
--- a/docs/guides/monitoring_and_debugging/monitor_goodput.md
+++ b/docs/guides/monitoring_and_debugging/monitor_goodput.md
@@ -89,7 +89,7 @@ Please use a unique workload name, unless you intend to monitor cumulative Goodp
 MaxText enables Goodput recording and monitoring by default with `enable_goodput_recording=True` and `monitor_goodput=True`. You can configure the goodput upload frequency by setting `goodput_upload_interval_seconds`.
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} \
+python3 -m maxtext.trainers.pre_train.train base_output_directory=${OUTPUT_PATH?} \
 dataset_path=${DATA_PATH?} run_name=goodput-test-run steps=200 goodput_upload_interval_seconds=30
 ```
 
@@ -98,7 +98,7 @@ python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_ou
 MaxText enables step time deviation monitoring by default with `monitor_step_time_deviation=True`. You can configure the upload frequency by setting `step_deviation_interval_seconds`.
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} \
+python3 -m maxtext.trainers.pre_train.train base_output_directory=${OUTPUT_PATH?} \
 dataset_path=${DATA_PATH?} run_name=goodput-test-run steps=200 step_deviation_interval_seconds=30
 ```
 
@@ -111,7 +111,7 @@ Enabling `enable_pathways_goodput` turns on Goodput measurement for Pathways wor
 ```
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} \
+python3 -m maxtext.trainers.pre_train.train base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} \
 run_name=goodput-test-run steps=200 goodput_upload_interval_seconds=30 enable_pathways_goodput=True
 ```
 
@@ -168,7 +168,7 @@ and `enable_gcp_step_deviation_metrics` to `False` for disabling step deviation
 metrics.
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} \
+python3 -m maxtext.trainers.pre_train.train base_output_directory=${OUTPUT_PATH?} dataset_path=${DATA_PATH?} \
 run_name=goodput-test-run steps=200 goodput_upload_interval_seconds=30 enable_gcp_goodput_metrics=False \
 enable_gcp_step_deviation_metrics=False
 ```
diff --git a/docs/guides/monitoring_and_debugging/understand_logs_and_metrics.md b/docs/guides/monitoring_and_debugging/understand_logs_and_metrics.md
index 679d0afc42..e381e7e8ac 100644
--- a/docs/guides/monitoring_and_debugging/understand_logs_and_metrics.md
+++ b/docs/guides/monitoring_and_debugging/understand_logs_and_metrics.md
@@ -23,7 +23,7 @@ When you run a training job, MaxText produces detailed output logs. This guide s
 To start, run a simple pretraining job on a single-host TPU. For instance, we can run the following command on TPU v5p-8. The resulting log is used as an example throughout this guide.
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 base_output_directory=gs://runner-maxtext-logs run_name=demo \
 model_name=deepseek2-16b \
 per_device_batch_size=24 max_target_length=2048 steps=10 dataset_type=synthetic enable_checkpointing=false
@@ -123,7 +123,7 @@ To generate all optional artifacts in one run, you can set the corresponding fla
 This command enables tensorboard, profiler, text metrics, config saving, and checkpointing:
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml \
+python3 -m maxtext.trainers.pre_train.train \
 base_output_directory=gs://runner-maxtext-logs run_name=demo2 \
 model_name=deepseek2-16b \
 per_device_batch_size=24 max_target_length=2048 steps=10 dataset_type=synthetic \
diff --git a/docs/reference/core_concepts/quantization.md b/docs/reference/core_concepts/quantization.md
index 5312d696a8..6f72da9ea9 100644
--- a/docs/reference/core_concepts/quantization.md
+++ b/docs/reference/core_concepts/quantization.md
@@ -87,7 +87,7 @@ Common options for the `quantization` flag when using Qwix include:
 Here is an example of how to run a training job with int8 quantization enabled via Qwix:
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?} base_output_directory=gs:// dataset_type=synthetic use_qwix_quantization=true quantization='int8'
+python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?} base_output_directory=gs:// dataset_type=synthetic use_qwix_quantization=true quantization='int8'
 ```
 
 #### The Qwix Interception API
@@ -142,7 +142,7 @@ When using AQT, you can pass one of the following values to the `quantization` f
 #### Example command for AQT
 
 ```bash
-python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?} base_output_directory=gs:// dataset_type=synthetic use_qwix_quantization=false quantization='int8'
+python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?} base_output_directory=gs:// dataset_type=synthetic use_qwix_quantization=false quantization='int8'
 ```
 
 Note that `use_qwix_quantization` is not set to `True`.
diff --git a/docs/tutorials/inference.md b/docs/tutorials/inference.md
index 5a5a28fde8..6b7bf432d1 100644
--- a/docs/tutorials/inference.md
+++ b/docs/tutorials/inference.md
@@ -63,7 +63,7 @@ We include a script for convenient offline inference of MaxText models in `src/m
 An example of how to run this script can be found below:
 
 ```bash
-  python3 -m maxtext.inference.vllm_decode src/maxtext/configs/base.yml \
+  python3 -m maxtext.inference.vllm_decode \
   model_name=qwen3-30b-a3b \
   tokenizer_path=Qwen/Qwen3-30B-A3B \
   load_parameters_path=$CHECKPOINT_PATH \
@@ -133,7 +133,7 @@ To use a MaxText model architecture for samplers in reinforcement learning algorithms like GRPO, we can override the vLLM model architecture and pass in MaxText specific config arguments similar to the [online inference](online-inference) use-case.
 An example of an RL command using the MaxText model for samplers can be found below:
 
 ```bash
-python3 -m src.maxtext.trainers.post_train.rl.train_rl src/maxtext/configs/post_train/rl.yml \
+python3 -m src.maxtext.trainers.post_train.rl.train_rl \
 model_name=qwen3-0.6b \
 tokenizer_path=Qwen/Qwen3-0.6B \
 run_name=$WORKLOAD \
diff --git a/docs/tutorials/posttraining/multimodal.md b/docs/tutorials/posttraining/multimodal.md
index 0e867f55b5..65bbc1a78d 100644
--- a/docs/tutorials/posttraining/multimodal.md
+++ b/docs/tutorials/posttraining/multimodal.md
@@ -73,7 +73,6 @@ To run a forward pass and verify the model's output, use the following command:
 ```shell
 # Gemma3 decode
 python -m maxtext.inference.decode \
-    maxtext/configs/base.yml \
     model_name=gemma3-4b \
     hf_access_token=${HF_ACCESS_TOKEN?} \
     tokenizer_path=src/maxtext/assets/tokenizers/tokenizer.gemma3 \
@@ -109,7 +108,6 @@ export TARGET_LENGTH=... # Adjust to fit expected output length
 export PREDICT_LENGTH=... # Adjust to fit image tokens + text prompt
 
 python -m maxtext.inference.decode \
-    maxtext/configs/base.yml \
     model_name=gemma3-4b \
     ... \
     max_prefill_predict_length=${PREDICT_LENGTH?} # Adjust to fit image tokens + text prompt \