@@ -20,32 +20,53 @@ Welcome to the reproducible benchmark recipes repository for GPUs! This reposito
 
 Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe
 ----------------- | --------------------------------------------------------------------------------------------------------- | --------- | ------------- | ------------ | ------------------
-**GPT3-175B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/gpt3-175b/nemo-pretraining-gke/README.md)
-**Llama-3-70B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/llama3-70b/nemo-pretraining-gke/README.md)
-**Llama-3.1-70B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/llama3-1-70b/nemo-pretraining-gke/README.md)
-**Mixtral-8-7B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/mixtral-8x7b/nemo-pretraining-gke/README.md)
+**GPT3-175B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo (25.07) | Pre-training | GKE | [Link](./training/a3mega/gpt3_175b/nemo-gke/nemo2507/recipe/)
+**Llama-3-70B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo (25.07) | Pre-training | GKE | [Link](./training/a3mega/llama3_70b/nemo-gke/nemo2507/128gpus-bf16/recipe/)
+**Mixtral-8-7B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo (25.07) | Pre-training | GKE | [Link](./training/a3mega/mixtral_8x7b/nemo-gke/nemo2507/recipe/)
 
 
 ### Training benchmarks A3 Ultra
 
 Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe
 ------------------ | ----------------------------------------------------------------------------------------------------------- | --------- | ------------- | ------------ | ------------------
-**Llama-3.1-70B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | MaxText | Pre-training | GKE | [Link](./training/a3ultra/llama3-1-70b/maxtext-pretraining-gke/README.md)
-**Llama-3.1-70B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo | Pre-training | GKE | [Link](./training/a3ultra/llama3-1-70b/nemo-pretraining-gke/README.md)
-**Llama-3.1-405B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | MaxText | Pre-training | GKE | [Link](./training/a3ultra/llama3-1-405b/maxtext-pretraining-gke/README.md)
-**Llama-3.1-405B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo. | Pre-training | GKE | [Link](./training/a3ultra/llama3-1-405b/nemo-pretraining-gke/README.md)
-**Mixtral-8-7B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo | Pre-training | GKE | [Link](./training/a3ultra/mixtral-8x7b/nemo-pretraining-gke/README.md)
+**Llama-3.1-70B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | MaxText | Pre-training | GKE | [Link](./training/a3ultra/llama3-1-70b/maxtext-pretraining-gke/README.md)
+**Llama-3.1-70B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo (24.07) | Pre-training | GKE | [Link](./training/a3ultra/llama3_70b/nemo-gke/nemo2407/recipe/)
+**Llama-3-70B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | Megatron-Bridge (26.02) | Pre-training | GKE | [Link](./training/a3ultra/llama3_70b/megatron-bridge-gke/nemo2602/)
+**Llama-3-70B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | Megatron-Bridge (25.11) | Pre-training | Slurm | [Link](./training/a3ultra/llama3_70b/megatron-bridge-slurm/nemo2511/)
+**Llama-3-8B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | Megatron-Bridge (25.11) | Pre-training | Slurm | [Link](./training/a3ultra/llama3_8b/megatron-bridge-slurm/nemo2511/)
+**Llama-3.1-405B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | MaxText | Pre-training | GKE | [Link](./training/a3ultra/llama3-1-405b/maxtext-pretraining-gke/README.md)
+**Llama-3.1-405B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo (24.12) | Pre-training | GKE | [Link](./training/a3ultra/llama31_405b/nemo-gke/nemo2412/recipe/)
+**Mixtral-8-7B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo (24.07) | Pre-training | GKE | [Link](./training/a3ultra/mixtral_8x7b/nemo-gke/nemo2407/recipe/)
+**DeepSeek-V3** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | Megatron-Bridge (26.02) | Pre-training | GKE | [Link](./training/a3ultra/deepseek_v3/megatron-bridge-gke/nemo2602/)
+**GPT OSS 120B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo (26.02) | Pre-training | GKE | [Link](./training/a3ultra/gpt_oss_120b/nemo-gke/nemo2602/)
+**Qwen-3-30B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo (26.02) | Pre-training | GKE | [Link](./training/a3ultra/qwen3_30b_a3b/nemo-gke/nemo2602/)
+**Wan-2.1** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | Megatron-Bridge (26.02) | Pre-training | GKE | [Link](./training/a3ultra/wan/megatron-bridge-gke/nemo2602/)
+
 
 ### Training benchmarks A4
 
 Models | GPU Machine Type | Framework / Library | Workload Type | Orchestrator | Link to the recipe
------------------- | ---------------------------------------------------------------------------------------------------- | --------- | ------------- | ------------ | ------------------
-**Llama-3.1-70B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | MaxText | Pre-training | GKE | [Link](./training/a4/llama3-1-70b/maxtext-pretraining-gke/README.md)
-**Llama-3.1-70B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | NeMo | Pre-training | GKE | [Link](./training/a4/llama3-1-70b/nemo-pretraining-gke)
-**Llama-3.1-405B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | MaxText | Pre-training | GKE | [Link](./training/a4/llama3-1-405b/maxtext-pretraining-gke/README.md)
-**Llama-3.1-405B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | NeMo | Pre-training | GKE | [Link](./training/a4/llama3-1-405b/nemo-pretraining-gke/README.md)
-**Mixtral-8-7B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | NeMo | Pre-training | GKE | [Link](./training/a4/mixtral-8x7b/nemo-pretraining-gke/README.md)
-**PaliGemma2** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | Hugging Face Accelerate | Finetuning | GKE | [Link](./training/a4/paligemma2/README.md)
+------------------ | ---------------------------------------------------------------------------------------------------- | ------------------- | ------------- | ------------ | ------------------
+**Llama-3.1-70B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | MaxText | Pre-training | GKE | [Link](./training/a4/llama3-1-70b/maxtext-pretraining-gke/README.md)
+**Llama-3.1-70B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | NeMo (25.07) | Pre-training | GKE | [Link](./training/a4/llama3_70b/nemo-gke/nemo2507/)
+**Llama-3.1-70B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | NeMo (26.02) | Pre-training | GKE | [Link](./training/a4/llama3_70b/nemo-gke/nemo2602/)
+**Llama-3.1-70B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | Megatron-Bridge (25.09) | Pre-training | Slurm | [Link](./training/a4/llama3_70b/megatron-bridge-slurm/nemo2509/)
+**Llama-3.1-405B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | MaxText | Pre-training | GKE | [Link](./training/a4/llama3-1-405b/maxtext-pretraining-gke/README.md)
+**Llama-3.1-405B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | NeMo (25.07) | Pre-training | GKE | [Link](./training/a4/llama31_405b/nemo-gke/nemo2507/)
+**Llama-3.1-405B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | NeMo (26.02) | Pre-training | GKE | [Link](./training/a4/llama31_405b/nemo-gke/nemo2602/)
+**Llama-3.1-405B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | Megatron-Bridge (25.09) | Pre-training | Slurm | [Link](./training/a4/llama31_405b/megatron-bridge-slurm/nemo2509/)
+**Mixtral-8-7B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | NeMo (25.07) | Pre-training | GKE | [Link](./training/a4/mixtral_8x7b/nemo-gke/nemo2507/recipe/)
+**PaliGemma2** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | Hugging Face Accelerate | Finetuning | GKE | [Link](./training/a4/paligemma2/README.md)
+**DeepSeek-V3** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | Megatron-Bridge (25.11) | Pre-training | GKE | [Link](./training/a4/deepseek_v3/megatron-bridge-gke/nemo2511/)
+**DeepSeek-V3** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | Megatron-Bridge (26.02) | Pre-training | GKE | [Link](./training/a4/deepseek_v3/megatron-bridge-gke/nemo2602/)
+**GPT OSS 120B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | Megatron-Bridge (26.02) | Pre-training | GKE | [Link](./training/a4/gpt_oss_120b/megatron-bridge-gke/nemo2602/)
+**Llama-3-8B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | Megatron-Bridge (26.02) | Pre-training | GKE | [Link](./training/a4/llama3-8b/megatron-bridge-gke/nemo2602/)
+**Qwen-3-235B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | Megatron-Bridge (25.11) | Pre-training | GKE | [Link](./training/a4/qwen3_235b_a22b/megatron-bridge-gke/nemo2511/)
+**Qwen-3-235B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | Megatron-Bridge (26.02) | Pre-training | GKE | [Link](./training/a4/qwen3_235b_a22b/megatron-bridge-gke/nemo2602/)
+**Qwen-3-235B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | Megatron-Bridge (25.11) | Pre-training | Slurm | [Link](./training/a4/qwen3_235b_a22b/megatron-bridge-slurm/nemo2511/)
+**Qwen-3-30B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | NeMo (26.02) | Pre-training | GKE | [Link](./training/a4/qwen3_30b_a3b/nemo-gke/nemo2602/)
+**Wan-2.1-14B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms) | NeMo (25.11) | Pre-training | GKE | [Link](./training/a4/wan_14b/nemo-gke/nemo2511/)
+
 
 ### Training benchmarks A4X
 