Commit 76c2589

Refactor: Replace placeholder GCS bucket name (#187)

Replaced the placeholder GCS bucket name "prefix-artifact-repository" with "your-prefix-artifact-repository" in all relevant files. This change affects configuration files and example scripts for TPU training on GKE.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

1 parent 4dd2cf6 commit 76c2589

4 files changed

Lines changed: 6 additions & 6 deletions

ai-infrastructure/tpu-training-on-gke/examples/jobset/maxtext/parameters.env.multi_slice_8B

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@ ICI_PARALLELISM=16
 JOB_PARALLELISM=4
 NUM_SLICES=2
 RUN_NAME=maxtext-single-slice-101
-BASE_OUTPUT_DIRECTORY=gs://prefix-artifact-repository/runs
-DATASET_PATH=gs://prefix-artifact-repository/datasets
+BASE_OUTPUT_DIRECTORY=gs://your-prefix-artifact-repository/runs
+DATASET_PATH=gs://your-prefix-artifact-repository/datasets
 TENSORBOARD_NAME=projects/project-id/locations/us-central1/tensorboards/910xxxxx16
 WID_KSA=wid-sa
 ARGS=steps=150 per_device_batch_size=6 enable_checkpointing=false enable_profiler=false remat_policy=full max_target_length=2048 log_period=50 global_parameter_scale=8

ai-infrastructure/tpu-training-on-gke/examples/jobset/maxtext/parameters.env.single_slice_8B

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@ ICI_PARALLELISM=16
 JOB_PARALLELISM=4
 NUM_SLICES=1
 RUN_NAME=maxtext-single-slice-101
-BASE_OUTPUT_DIRECTORY=gs://prefix-artifact-repository/runs
-DATASET_PATH=gs://prefix-artifact-repository/datasets
+BASE_OUTPUT_DIRECTORY=gs://your-prefix-artifact-repository/runs
+DATASET_PATH=gs://your-prefix-artifact-repository/datasets
 TENSORBOARD_NAME=projects/project-id/locations/us-central1/tensorboards/910xxxxx16
 WID_KSA=wid-sa
 ARGS=steps=150 per_device_batch_size=6 enable_checkpointing=false enable_profiler=false remat_policy=full max_target_length=2048 log_period=50 global_parameter_scale=8

ai-infrastructure/tpu-training-on-gke/examples/xpk/multi-slice-8b.sh.tmpl

Lines changed: 1 addition & 1 deletion
@@ -3,4 +3,4 @@ set -e
 
 export LIBTPU_INIT_ARGS="--xla_tpu_enable_data_parallel_all_reduce_opt=true --xla_tpu_data_parallel_opt_different_sized_ops=true --xla_tpu_enable_async_collective_fusion=true --xla_tpu_enable_async_collective_fusion_fuse_all_gather=true --xla_tpu_enable_async_collective_fusion_multiple_steps=true --xla_tpu_overlap_compute_collective_tc=true --xla_enable_async_all_gather=true"
 
-python3 MaxText/train.py MaxText/configs/base.yml run_name=maxtext-single-slice-201 dataset_path=gs://prefix-artifact-repository/datasets base_output_directory=gs://prefix-artifact-repository/runs steps=150 log_period=50 per_device_batch_size=6 global_parameter_scale=8 enable_checkpointing=false enable_profiler=false remat_policy=full dcn_data_parallelism=2 ici_fsdp_parallelism=16
+python3 MaxText/train.py MaxText/configs/base.yml run_name=maxtext-single-slice-201 dataset_path=gs://your-prefix-artifact-repository/datasets base_output_directory=gs://your-prefix-artifact-repository/runs steps=150 log_period=50 per_device_batch_size=6 global_parameter_scale=8 enable_checkpointing=false enable_profiler=false remat_policy=full dcn_data_parallelism=2 ici_fsdp_parallelism=16

ai-infrastructure/tpu-training-on-gke/examples/xpk/single-slice-8b.sh.tmpl

Lines changed: 1 addition & 1 deletion
@@ -3,4 +3,4 @@ set -e
 
 export LIBTPU_INIT_ARGS="--xla_tpu_enable_data_parallel_all_reduce_opt=true --xla_tpu_data_parallel_opt_different_sized_ops=true --xla_tpu_enable_async_collective_fusion=true --xla_tpu_enable_async_collective_fusion_fuse_all_gather=true --xla_tpu_enable_async_collective_fusion_multiple_steps=true --xla_tpu_overlap_compute_collective_tc=true --xla_enable_async_all_gather=true"
 
-python3 MaxText/train.py MaxText/configs/base.yml run_name=maxtext-single-slice-201 dataset_path=gs://prefix-artifact-repository/datasets base_output_directory=gs://prefix-artifact-repository/runs steps=150 log_period=50 per_device_batch_size=6 global_parameter_scale=8 enable_checkpointing=false enable_profiler=false remat_policy=full dcn_data_parallelism=1 ici_fsdp_parallelism=16
+python3 MaxText/train.py MaxText/configs/base.yml run_name=maxtext-single-slice-201 dataset_path=gs://your-prefix-artifact-repository/datasets base_output_directory=gs://your-prefix-artifact-repository/runs steps=150 log_period=50 per_device_batch_size=6 global_parameter_scale=8 enable_checkpointing=false enable_profiler=false remat_policy=full dcn_data_parallelism=1 ici_fsdp_parallelism=16
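
The "your-prefix-artifact-repository" name in the files above is a placeholder, so it presumably has to be swapped for a real bucket before a run. The sketch below shows one way that substitution might be done. It is not part of this commit: the bucket name `my-team-artifact-repository`, the scratch directory, and the `.resolved` output suffix are all assumptions made for illustration, and the sample file contains only the two changed variables.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical bucket name; override by exporting BUCKET before running.
BUCKET="${BUCKET:-my-team-artifact-repository}"
# Placeholder introduced by this commit.
PLACEHOLDER="your-prefix-artifact-repository"

# Work on a scratch copy so the example never touches a real checkout.
workdir="$(mktemp -d)"
cat > "$workdir/parameters.env" <<'EOF'
BASE_OUTPUT_DIRECTORY=gs://your-prefix-artifact-repository/runs
DATASET_PATH=gs://your-prefix-artifact-repository/datasets
EOF

# Rewrite only gs:// URLs that point at the placeholder bucket.
# "|" is used as the sed delimiter because the pattern contains slashes.
sed "s|gs://${PLACEHOLDER}/|gs://${BUCKET}/|g" \
  "$workdir/parameters.env" > "$workdir/parameters.env.resolved"

# Fail loudly if any placeholder survived the substitution.
if grep -q "$PLACEHOLDER" "$workdir/parameters.env.resolved"; then
  echo "placeholder still present in resolved file" >&2
  exit 1
fi

cat "$workdir/parameters.env.resolved"
```

Writing to a separate output file instead of using in-place editing avoids the differences between GNU and BSD `sed -i`, and leaves the template untouched for the next run.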
