Commit 39de2c9

Address PR reviews: rename directories and update launcher commands
1 parent d3672e2 commit 39de2c9

20 files changed

Lines changed: 26 additions & 12 deletions

training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256/recipe/Chart.yaml renamed to training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256-NEMO25.11/recipe/Chart.yaml

File renamed without changes.

training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256/recipe/README.md renamed to training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256-NEMO25.11/recipe/README.md

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ Clone the `gpu-recipes` repository and set a reference to the recipe folder.
 git clone https://github.com/ai-hypercomputer/gpu-recipes.git
 cd gpu-recipes
 export REPO_ROOT=`git rev-parse --show-toplevel`
-export RECIPE_ROOT=$REPO_ROOT/training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256/recipe
+export RECIPE_ROOT=$REPO_ROOT/training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256-NEMO25.11/recipe
 cd $RECIPE_ROOT
 ```
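
Because this commit renames the recipe directory, an environment that still exports the old `RECIPE_ROOT` would make the README's `cd $RECIPE_ROOT` step fail. A minimal sketch of a guard for that step follows. The function name `check_recipe_root` is hypothetical and not part of the README, and it assumes a recipe folder can be recognized by the `Chart.yaml` it contains.

```shell
# Hypothetical guard; not part of the README or the recipe.
# Returns non-zero if the given folder is missing or does not look like a
# recipe folder (assumption: a recipe folder contains Chart.yaml).
check_recipe_root() {
  if [ ! -d "$1" ]; then
    echo "Recipe folder not found: $1" >&2
    return 1
  fi
  if [ ! -f "$1/Chart.yaml" ]; then
    echo "No Chart.yaml in: $1" >&2
    return 1
  fi
  return 0
}

# Usage, after exporting RECIPE_ROOT as in the README:
#   check_recipe_root "$RECIPE_ROOT" && cd "$RECIPE_ROOT"
```

With the guard in place, a stale path from before the rename produces an explicit message instead of a bare `cd` error.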

training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256/recipe/custom_setup_experiment.py renamed to training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256-NEMO25.11/recipe/custom_setup_experiment.py

File renamed without changes.

training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256/recipe/launcher.sh renamed to training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256-NEMO25.11/recipe/launcher.sh

Lines changed: 7 additions & 0 deletions
@@ -112,7 +112,14 @@ worker_command=$(cat <<- EOM
 --model_recipe_name deepseek_v3 \
 --gpus_per_node 8 \
 --num_gpus 256 \
+--global_batch_size 2048 \
+--micro_batch_size 1 \
 --seq_length 4096 \
+--tensor_model_parallel_size 1 \
+--pipeline_model_parallel_size 16 \
+--context_parallel_size 1 \
+--virtual_pipeline_model_parallel_size None \
+--expert_model_parallel_size 8 \
 --compute_dtype bf16 \
 --max_steps 30
 EOM
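
The flags added to `launcher.sh` imply a data-parallel size and a gradient-accumulation count that can be checked before launching. A minimal sketch of that arithmetic follows. It assumes the common Megatron-style convention that the data-parallel size is `num_gpus / (TP * PP * CP)` and that the global batch size must be a multiple of `micro_batch_size * data_parallel`. The helper names are hypothetical and not part of `launcher.sh`, and the expert-parallel setting is not checked here.

```shell
# Hypothetical helpers; not part of launcher.sh.
# Assumed convention (Megatron-style):
#   data_parallel = num_gpus / (TP * PP * CP)
#   grad_accum    = global_batch / (micro_batch * data_parallel)

# Print the data-parallel size, or fail if the GPUs do not divide evenly.
data_parallel_size() {
  local num_gpus="$1" tp="$2" pp="$3" cp="$4"
  local model_parallel=$(( $tp * $pp * $cp ))
  if [ $(( $num_gpus % $model_parallel )) -ne 0 ]; then
    echo "num_gpus ($num_gpus) is not divisible by TP*PP*CP ($model_parallel)" >&2
    return 1
  fi
  echo $(( $num_gpus / $model_parallel ))
}

# Print the number of gradient-accumulation steps, or fail if the global
# batch size is not a multiple of micro_batch * data_parallel.
grad_accum_steps() {
  local gbs="$1" mbs="$2" dp="$3"
  local per_step=$(( $mbs * $dp ))
  if [ $(( $gbs % $per_step )) -ne 0 ]; then
    echo "global batch ($gbs) is not divisible by micro_batch*DP ($per_step)" >&2
    return 1
  fi
  echo $(( $gbs / $per_step ))
}

# Values taken from the flags in the launcher.sh hunk.
dp=$(data_parallel_size 256 1 16 1)
accum=$(grad_accum_steps 2048 1 "$dp")
echo "data_parallel=$dp grad_accum=$accum"
```

Under these assumptions, 256 GPUs with a pipeline-parallel size of 16 give a data-parallel size of 16, and a global batch of 2048 with a micro batch of 1 gives 128 accumulation steps.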

training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256/recipe/recipe_launch_command.sh renamed to training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256-NEMO25.11/recipe/recipe_launch_command.sh

File renamed without changes.

training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256/recipe/templates/workload-config-configmap.yaml renamed to training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256-NEMO25.11/recipe/templates/workload-config-configmap.yaml

File renamed without changes.

training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256/recipe/templates/workload-job.yaml renamed to training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256-NEMO25.11/recipe/templates/workload-job.yaml

File renamed without changes.

training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256/recipe/templates/workload-launcher-configmap.yaml renamed to training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256-NEMO25.11/recipe/templates/workload-launcher-configmap.yaml

File renamed without changes.

training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256/recipe/templates/workload-svc.yaml renamed to training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256-NEMO25.11/recipe/templates/workload-svc.yaml

File renamed without changes.

training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256/recipe/values.yaml renamed to training/a4/deepseek_v3/megatron-bridge-pretraining-gke/32node-BF16-SEQ4096-GBS256-NEMO25.11/recipe/values.yaml

File renamed without changes.
