# Optimization 1: Multihost recommended network settings

We included all the recommended network settings in [rto_setup.sh](https://github.com/google/maxtext/blob/main/src/dependencies/scripts/rto_setup.sh).

[preflight.sh](https://github.com/google/maxtext/blob/main/src/dependencies/scripts/preflight.sh) will help you apply them on either the GCE or GKE platform.

Before you run an ML workload on multiple hosts with GCE or GKE, simply run `bash preflight.sh PLATFORM=[GCE or GKE]` to get the best DCN network performance.

Here is an example for GCE:
```
bash preflight.sh PLATFORM=GCE && python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
```
Here is an example for GKE:

```
bash preflight.sh PLATFORM=GKE && python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
```
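If you want to confirm that the settings took effect, one quick check is to look for the lowered TCP retransmission timeout on the host's routes. This is a sketch under an assumption: it presumes rto_setup.sh applies an `rto_min` option via `ip route change`, so verify against the script itself.

```
# Hedged check: look for rto_min on any route after running preflight.sh.
# Assumes rto_setup.sh lowers rto_min via `ip route change` (verify in the script).
if ip route show 2>/dev/null | grep -q rto_min; then
  echo "rto_min is set on at least one route"
else
  echo "no rto_min found; preflight.sh may not have run on this host"
fi
```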
# Optimization 2: NUMA binding (you can only apply this to v4 and v5p)

NUMA binding is recommended for enhanced performance: it reduces memory latency and maximizes data throughput, so your high-performance applications run more efficiently.
For GCE, [preflight.sh](https://github.com/google/maxtext/blob/main/src/dependencies/scripts/preflight.sh) will help you install the `numactl` dependency, so you can use it directly. Here is an example:
```
bash preflight.sh PLATFORM=GCE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
```
For GKE, `numactl` should be built into your Docker image from [maxtext_tpu_dependencies.Dockerfile](https://github.com/google/maxtext/blob/main/src/dependencies/dockerfiles/maxtext_tpu_dependencies.Dockerfile), so you can use it directly if you built the MaxText Docker image. Here is an example:
```
bash preflight.sh PLATFORM=GKE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
```
1. `numactl`: This is the command-line tool used for controlling NUMA policy for processes or shared memory. It's particularly useful on multi-socket systems where memory locality can impact performance.
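Before picking values for `--membind` and `--cpunodebind`, it helps to inspect the host's NUMA topology so the node numbers you bind to actually exist. A minimal sketch, guarded in case `numactl` is not installed yet:

```
# Inspect NUMA topology to choose sensible node numbers for binding.
if command -v numactl >/dev/null 2>&1; then
  numactl --hardware   # lists NUMA nodes, their CPUs, and per-node memory
  numactl --show       # shows the NUMA policy of the current shell
else
  echo "numactl not installed; run preflight.sh (GCE) or use the MaxText image (GKE)"
fi
```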