Commit 44c0cf6

Merge branch 'main' into ajkv/offline-distillation-soft
2 parents d7e658d + 8b86c71 commit 44c0cf6

116 files changed

Lines changed: 4155 additions & 703 deletions

.github/CODEOWNERS

Lines changed: 16 additions & 18 deletions
@@ -1,36 +1,34 @@
 * @gobbleturk @khatwanimohit @bvandermoon @vipannalla @RissyRan @richjames0 @gagika @shralex @SurbhiJainUSC @hengtaoguo @A9isha @aireenmei @NuojCheng @jiangjy1982 @suexu1025 @NicoGrande @jesselu-google @dipannita08 @igorts-git

 # Model bring-up
-src/MaxText/assets @parambole @shuningjin @RissyRan @suexu1025 @jiangjy1982 @gobbleturk @bvandermoon @gagika @shralex @richjames0 @NicoGrande
-src/MaxText/configs/models @parambole @shuningjin @RissyRan @suexu1025 @jiangjy1982 @gobbleturk @bvandermoon @gagika @shralex @richjames0 @NicoGrande @suexu1025 @jesselu-google @NuojCheng
+src/maxtext/assets @parambole @shuningjin @RissyRan @suexu1025 @jiangjy1982 @gobbleturk @bvandermoon @gagika @shralex @richjames0 @NicoGrande
+src/maxtext/configs/models @parambole @shuningjin @RissyRan @suexu1025 @jiangjy1982 @gobbleturk @bvandermoon @gagika @shralex @richjames0 @NicoGrande @jesselu-google @NuojCheng
 src/maxtext/checkpoint_conversion @parambole @shuningjin @RissyRan @suexu1025 @jiangjy1982 @gobbleturk @bvandermoon @hengtaoguo @gagika @shralex @richjames0 @NicoGrande
-src/MaxText/layers @parambole @shuningjin @RissyRan @suexu1025 @jiangjy1982 @gobbleturk @bvandermoon @gagika @shralex @richjames0 @NicoGrande @suexu1025 @jesselu-google @NuojCheng
+src/maxtext/layers @parambole @shuningjin @RissyRan @suexu1025 @jiangjy1982 @gobbleturk @bvandermoon @gagika @shralex @richjames0 @NicoGrande @jesselu-google @NuojCheng
+src/maxtext/models @parambole @shuningjin @RissyRan @suexu1025 @jiangjy1982 @gobbleturk @bvandermoon @gagika @shralex @richjames0 @NicoGrande @jesselu-google @NuojCheng

 # Features
 src/maxtext/experimental/rl @A9isha @khatwanimohit @xuefgu @gagika @richjames0 @shralex @NicoGrande
-src/MaxText/input_pipeline @aireenmei @SurbhiJainUSC @richjames0 @shralex @NicoGrande
-src/MaxText/kernels/megablox @RissyRan @michelle-yooh @gagika @richjames0 @shralex @suexu1025 @jesselu-google
-src/MaxText/kernels/ragged_attention.py @patemotter @vipannalla @richjames0 @shralex
-src/MaxText/layers/pipeline.py @gobbleturk @richjames0 @shralex @NuojCheng
-src/MaxText/layers/moe.py @RissyRan @michelle-yooh @gagika @richjames0 @shralex @suexu1025 @jesselu-google
-src/MaxText/layers/multi_token_prediction.py @parambole @RissyRan @gagika @richjames0 @shralex
-src/MaxText/elastic_train.py @lukebaumann @shauryagup @richjames0 @shralex
-src/MaxText/layers/quantizations.py @khatwanimohit @jshin1394 @liudangyi @richjames0 @shralex
+src/maxtext/input_pipeline @aireenmei @SurbhiJainUSC @richjames0 @shralex @NicoGrande
+src/maxtext/kernels/megablox @RissyRan @michelle-yooh @gagika @richjames0 @shralex @suexu1025 @jesselu-google
+src/maxtext/kernels/ragged_attention.py @patemotter @vipannalla @richjames0 @shralex
+src/maxtext/layers/pipeline.py @gobbleturk @richjames0 @shralex @NuojCheng
+src/maxtext/layers/moe.py @RissyRan @michelle-yooh @gagika @richjames0 @shralex @suexu1025 @jesselu-google
+src/maxtext/layers/multi_token_prediction.py @parambole @RissyRan @gagika @richjames0 @shralex
+src/maxtext/layers/quantizations.py @khatwanimohit @jshin1394 @liudangyi @richjames0 @shralex

 # Inference
-src/maxtext/tests/inference @vipannalla @mitalisi @gpolovets1 @mailvijayasingh @jrplatin @patemotter @lumosis @richjames0
+tests/inference/ @vipannalla @mitalisi @gpolovets1 @mailvijayasingh @jrplatin @patemotter @lumosis @richjames0
 src/maxtext/inference @vipannalla @mitalisi @gpolovets1 @mailvijayasingh @jrplatin @patemotter @lumosis @richjames0
-src/maxtext/inference_mlperf @vipannalla @mitalisi @gpolovets1 @mailvijayasingh @jrplatin @patemotter @lumosis @richjames0

 # Dockerfiles and dependencies
-*.Dockerfile @bvandermoon @parambole @richjames0 @shralex
-*.txt @bvandermoon @parambole @richjames0 @shralex
+src/dependencies/ @bvandermoon @parambole @richjames0 @shralex

 # Docs
-*.md @jacoguzo @bvandermoon @richjames0 @shralex @gobbleturk @RissyRan @gagika @A9isha @jiangjy1982 @vipannalla
+docs/ @jacoguzo @bvandermoon @richjames0 @shralex @gobbleturk @RissyRan @gagika @A9isha @jiangjy1982 @vipannalla

 # Workflow files
-.github/workflows @gobbleturk @khatwanimohit @shralex @parambole @bvandermoon @richjames0
+.github/workflows/ @gobbleturk @khatwanimohit @shralex @parambole @bvandermoon @richjames0

 # Benchmarking/Recipes
-benchmarks @SujeethJinesh @bvandermoon @richjames0 @shralex @vipannalla @mitalisi @RissyRan @shauryagup @NuojCheng @gobbleturk @khatwanimohit @Obliviour @notabee @suexu1025
+benchmarks/ @SujeethJinesh @bvandermoon @richjames0 @shralex @vipannalla @mitalisi @RissyRan @shauryagup @NuojCheng @gobbleturk @khatwanimohit @Obliviour @notabee @suexu1025

.github/workflows/UploadDockerImages.yml

Lines changed: 6 additions & 20 deletions
@@ -65,11 +65,11 @@ jobs:
 - device: tpu
 build_mode: stable
 image_name: maxtext_jax_stable
-dockerfile: ./dependencies/dockerfiles/maxtext_tpu_dependencies.Dockerfile
+dockerfile: ./src/dependencies/dockerfiles/maxtext_tpu_dependencies.Dockerfile
 - device: tpu
 build_mode: nightly
 image_name: maxtext_jax_nightly
-dockerfile: ./dependencies/dockerfiles/maxtext_tpu_dependencies.Dockerfile
+dockerfile: ./src/dependencies/dockerfiles/maxtext_tpu_dependencies.Dockerfile
 uses: ./.github/workflows/build_and_push_docker_image.yml
 with:
 image_name: ${{ matrix.image_name }}
@@ -79,32 +79,18 @@ jobs:
 maxtext_sha: ${{ needs.setup.outputs.maxtext_sha }}
 image_date: ${{ needs.setup.outputs.image_date }}

-tpu-post-training-stable:
-name: tpu-post-training-stable
-needs: setup
-uses: ./.github/workflows/build_and_push_docker_image.yml
-with:
-image_name: maxtext_post_training_stable
-device: tpu
-build_mode: stable
-workflow: post-training
-dockerfile: ./dependencies/dockerfiles/maxtext_tpu_dependencies.Dockerfile
-maxtext_sha: ${{ needs.setup.outputs.maxtext_sha }}
-image_date: ${{ needs.setup.outputs.image_date }}
-
 tpu-post-training-nightly:
 name: tpu-post-training-nightly
-needs: [setup, tpu-post-training-stable]
+needs: [setup]
 uses: ./.github/workflows/build_and_push_docker_image.yml
 with:
 image_name: maxtext_post_training_nightly
 device: tpu
 build_mode: nightly
 workflow: post-training
-dockerfile: ./dependencies/dockerfiles/maxtext_post_training_local_dependencies.Dockerfile
+dockerfile: ./src/dependencies/dockerfiles/maxtext_tpu_dependencies.Dockerfile
 maxtext_sha: ${{ needs.setup.outputs.maxtext_sha }}
 image_date: ${{ needs.setup.outputs.image_date }}
-base_image: gcr.io/tpu-prod-env-multipod/maxtext_post_training_stable:${{ needs.setup.outputs.image_date }}

 gpu-pre-training:
 name: ${{ matrix.image_name }}
@@ -116,11 +102,11 @@ jobs:
 - device: gpu
 build_mode: stable
 image_name: maxtext_gpu_jax_stable
-dockerfile: ./dependencies/dockerfiles/maxtext_gpu_dependencies.Dockerfile
+dockerfile: ./src/dependencies/dockerfiles/maxtext_gpu_dependencies.Dockerfile
 - device: gpu
 build_mode: nightly
 image_name: maxtext_gpu_jax_nightly
-dockerfile: ./dependencies/dockerfiles/maxtext_gpu_dependencies.Dockerfile
+dockerfile: ./src/dependencies/dockerfiles/maxtext_gpu_dependencies.Dockerfile
 uses: ./.github/workflows/build_and_push_docker_image.yml
 with:
 image_name: ${{ matrix.image_name }}

.github/workflows/build_and_push_docker_image.yml

Lines changed: 22 additions & 16 deletions
@@ -35,16 +35,16 @@ on:
 required: true
 type: string
 image_date:
-required: true
-type: string
-base_image:
 required: false
 type: string
-default: ''
 workflow:
 required: false
 type: string
 default: 'pre-training'
+version_name:
+required: false
+type: string
+default: ''

 permissions:
 contents: read
@@ -115,7 +115,7 @@ jobs:
 push: true
 context: .
 file: ${{ inputs.dockerfile }}
-tags: gcr.io/tpu-prod-env-multipod/${{ inputs.image_name }}:latest
+tags: gcr.io/tpu-prod-env-multipod/${{ inputs.image_name }}:${{ github.run_id }}
 cache-from: type=gha
 outputs: type=image,compression=zstd,force-compression=true
 build-args: |
@@ -125,34 +125,40 @@ jobs:
 JAX_VERSION=NONE
 LIBTPU_VERSION=NONE
 INCLUDE_TEST_ASSETS=true
-${{ inputs.base_image != '' && format('BASEIMAGE={0}', inputs.base_image) || '' }}

 - name: Add tags to Docker image
 if: steps.check.outputs.should_run == 'true'
 shell: bash
 run: |
 SOURCE_IMAGE="gcr.io/tpu-prod-env-multipod/${INPUTS_IMAGE_NAME}"

-# Add date tag
-gcloud container images add-tag "$SOURCE_IMAGE:latest" "$SOURCE_IMAGE:${INPUTS_IMAGE_DATE}" --quiet
+if [[ $INPUTS_VERSION_NAME ]]; then
+echo "Tagging docker images corresponding to PyPI release..."
+gcloud container images add-tag "$SOURCE_IMAGE:${{ github.run_id }}" "$SOURCE_IMAGE:${INPUTS_VERSION_NAME}" --quiet
+else
+echo "Tagging docker images corresponding to nightly release..."

-# Convert date to YYYYMMDD format
-clean_date=$(echo "${INPUTS_IMAGE_DATE}" | sed 's/[-:]//g' | cut -c1-8)
+# Add date tag
+gcloud container images add-tag "$SOURCE_IMAGE:${{ github.run_id }}" "$SOURCE_IMAGE:${INPUTS_IMAGE_DATE}" --quiet

-# Add MaxText tag
-maxtext_hash=$(git rev-parse --short HEAD)
-gcloud container images add-tag "$SOURCE_IMAGE:latest" "$SOURCE_IMAGE:maxtext_${maxtext_hash}_${clean_date}" --quiet
+# Convert date to YYYYMMDD format
+clean_date=$(echo "${INPUTS_IMAGE_DATE}" | sed 's/[-:]//g' | cut -c1-8)

+# Add MaxText tag
+maxtext_hash=$(git rev-parse --short HEAD)
+gcloud container images add-tag "$SOURCE_IMAGE:${{ github.run_id }}" "$SOURCE_IMAGE:maxtext_${maxtext_hash}_${clean_date}" --quiet

 # Add post-training dependencies tags
 if [ "${{ inputs.workflow }}" == "post-training" ]; then
 for dir in tunix vllm tpu-inference; do
 if [ -d "./$dir" ]; then
 dir_hash=$(git -C "$dir" rev-parse --short HEAD)
-gcloud container images add-tag "$SOURCE_IMAGE:latest" "$SOURCE_IMAGE:${dir}_${dir_hash}_${clean_date}" --quiet
-fi
-done
+gcloud container images add-tag "$SOURCE_IMAGE:${{ github.run_id }}" "$SOURCE_IMAGE:${dir}_${dir_hash}_${clean_date}" --quiet
+fi
+done
+fi
 fi
 env:
 INPUTS_IMAGE_NAME: ${{ inputs.image_name }}
 INPUTS_IMAGE_DATE: ${{ inputs.image_date }}
+INPUTS_VERSION_NAME: ${{ inputs.version_name }}
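
Review note: the reworked "Add tags to Docker image" step now branches on `version_name`. A release build receives only the version tag, while a nightly build receives the date tag, the MaxText hash tag, and (for `post-training`) one tag per dependency checkout that exists. The Python sketch below restates that branching so the resulting tag set is easy to check by eye. It is an illustration only, not code from this commit: the function names, the `dep_hashes` parameter, and the sample date, hash, and version values are all hypothetical.

```python
import re

# Directory names taken from the `for dir in tunix vllm tpu-inference` loop.
POST_TRAINING_DIRS = ("tunix", "vllm", "tpu-inference")


def clean_date(image_date: str) -> str:
  """Mirror `sed 's/[-:]//g' | cut -c1-8`: drop '-' and ':' and keep 8 chars."""
  return re.sub(r"[-:]", "", image_date)[:8]


def tags_to_add(version_name, image_date, maxtext_hash, workflow, dep_hashes=None):
  """Return the tags the step would add on top of the run-id tag.

  `dep_hashes` (hypothetical) maps a dependency directory that exists in the
  workspace to its short commit hash, standing in for the `-d` test and
  `git -C "$dir" rev-parse --short HEAD`.
  """
  if version_name:
    # Release path: only the PyPI version tag is added.
    return [version_name]

  # Nightly path: date tag, then MaxText hash tag.
  date8 = clean_date(image_date)
  tags = [image_date, f"maxtext_{maxtext_hash}_{date8}"]

  # Dependency tags are added only for the post-training workflow.
  if workflow == "post-training":
    for name in POST_TRAINING_DIRS:
      if dep_hashes and name in dep_hashes:
        tags.append(f"{name}_{dep_hashes[name]}_{date8}")
  return tags
```

Under these assumptions, a non-empty `version_name` short-circuits everything else, so a `post-training` release image would not receive the per-dependency tags that the nightly image does.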

.github/workflows/build_and_upload_images.sh

Lines changed: 2 additions & 2 deletions
@@ -49,7 +49,7 @@ if [[ ! -v CLOUD_IMAGE_NAME ]] || [[ ! -v PROJECT ]] || [[ ! -v MODE ]] || [[ !
 fi

 gcloud auth configure-docker us-docker.pkg.dev --quiet
-bash "$MAXTEXT_REPO_ROOT"'/dependencies/scripts/docker_build_dependency_image.sh' LOCAL_IMAGE_NAME=$LOCAL_IMAGE_NAME MODE="$MODE" DEVICE="$DEVICE"
+bash "$MAXTEXT_REPO_ROOT"'/src/dependencies/scripts/docker_build_dependency_image.sh' LOCAL_IMAGE_NAME=$LOCAL_IMAGE_NAME MODE="$MODE" DEVICE="$DEVICE"
 image_date=$(date +%Y-%m-%d)

 # Upload only dependencies image
@@ -65,7 +65,7 @@ if ! gcloud storage cp gs://maxtext-test-assets/* "${MAXTEXT_TEST_ASSETS_ROOT:-$
 fi

 # Build then upload "dependencies + code" image
-docker build --build-arg BASEIMAGE=${LOCAL_IMAGE_NAME} -f "$MAXTEXT_REPO_ROOT"'/dependencies/dockerfiles/maxtext_runner.Dockerfile' -t ${LOCAL_IMAGE_NAME}_runner .
+docker build --build-arg BASEIMAGE=${LOCAL_IMAGE_NAME} -f "$MAXTEXT_REPO_ROOT"'/src/dependencies/dockerfiles/maxtext_runner.Dockerfile' -t ${LOCAL_IMAGE_NAME}_runner .
 docker tag ${LOCAL_IMAGE_NAME}_runner gcr.io/$PROJECT/${CLOUD_IMAGE_NAME}:latest
 docker push gcr.io/$PROJECT/${CLOUD_IMAGE_NAME}:latest
 docker tag ${LOCAL_IMAGE_NAME}_runner gcr.io/$PROJECT/${CLOUD_IMAGE_NAME}:${image_date}

.github/workflows/check_docs_build.yml

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ jobs:
 run: uv venv --python 3.12 $GITHUB_WORKSPACE/venv

 - name: Install dependencies
-run: . $GITHUB_WORKSPACE/venv/bin/activate && uv pip install -r dependencies/requirements/requirements_docs.txt
+run: . $GITHUB_WORKSPACE/venv/bin/activate && uv pip install -r src/dependencies/requirements/requirements_docs.txt

 - name: Build documentation
 run: |

.github/workflows/pypi_release.yml

Lines changed: 65 additions & 3 deletions
@@ -41,12 +41,11 @@ jobs:
 name: Build and Test MaxText Package
 needs: [release_approval]
 uses: ./.github/workflows/build_and_test_maxtext.yml
+secrets: inherit

 publish_maxtext_package_to_pypi:
 name: Publish MaxText package to PyPI
-# Temporarily only require release_approval for a one-time upload.
-# Immediately revert this to `needs: [build_and_test_maxtext_package]`.
-needs: [release_approval]
+needs: [build_and_test_maxtext_package]
 runs-on: ubuntu-latest
 environment: release
 steps:
@@ -61,3 +60,66 @@ jobs:
 uses: pypa/gh-action-pypi-publish@release/v1
 with:
 packages-dir: dist/
+
+get_latest_maxtext_pypi_version:
+name: Get latest MaxText PyPI version
+needs: [publish_maxtext_package_to_pypi]
+runs-on: ubuntu-latest
+outputs:
+latest_pypi_version: ${{ steps.get_version.outputs.version }}
+steps:
+- name: Install jq
+run: sudo apt-get update && sudo apt-get install -y jq
+- name: Fetch latest version of maxtext from PyPI
+id: get_version
+run: |
+# Fetch JSON from PyPI for 'maxtext'
+echo "Fetching latest version from https://pypi.org/pypi/maxtext/json"
+pypi_json=$(curl -s https://pypi.org/pypi/maxtext/json)
+
+# Extract the version from the "info" section using jq
+latest_version=$(echo "$pypi_json" | jq -r ".info.version")
+
+if [ -z "$latest_version" ] || [ "$latest_version" == "null" ]; then
+echo "Error: Could not parse latest version from PyPI JSON."
+exit 1
+fi
+
+echo "Successfully fetched latest MaxText version on PyPI: $latest_version"
+# Set the output variable for other jobs to consume
+echo "version=$latest_version" >> "$GITHUB_OUTPUT"
+
+# This job builds and pushes MaxText stable Docker images for both TPU and GPU devices.
+# It runs only after a new release is published to PyPI.
+# Creates docker image for MaxText commit corresponding to the release.
+upload_maxtext_docker_images:
+name: ${{ matrix.image_name }}
+needs: [get_latest_maxtext_pypi_version]
+strategy:
+fail-fast: false
+matrix:
+include:
+- device: tpu
+build_mode: stable
+image_name: maxtext_jax_stable
+workflow: pre-training
+dockerfile: ./src/dependencies/dockerfiles/maxtext_tpu_dependencies.Dockerfile
+- device: gpu
+build_mode: stable
+image_name: maxtext_gpu_jax_stable
+workflow: pre-training
+dockerfile: ./src/dependencies/dockerfiles/maxtext_gpu_dependencies.Dockerfile
+- device: tpu
+build_mode: stable
+image_name: maxtext_post_training_stable
+workflow: post-training
+dockerfile: ./src/dependencies/dockerfiles/maxtext_tpu_dependencies.Dockerfile
+uses: ./.github/workflows/build_and_push_docker_image.yml
+with:
+image_name: ${{ matrix.image_name }}
+device: ${{ matrix.device }}
+build_mode: ${{ matrix.build_mode }}
+workflow: ${{ matrix.workflow }}
+dockerfile: ${{ matrix.dockerfile }}
+maxtext_sha: ${{ github.sha }}
+version_name: ${{ needs.get_latest_maxtext_pypi_version.outputs.latest_pypi_version }}
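
Review note: the new `get_latest_maxtext_pypi_version` job reads `.info.version` from the PyPI JSON response and fails if the value is empty or `null`. The Python sketch below mirrors that parse-and-validate step on an already-fetched payload, which makes the failure cases easy to exercise without a network call. It is a hedged illustration rather than part of the workflow: the function name and the sample version string are hypothetical.

```python
import json


def latest_version_from_pypi_json(payload: str) -> str:
  """Return `info.version` from a PyPI JSON document.

  Raises ValueError when the field is missing, null, or empty, matching the
  `-z "$latest_version"` / `== "null"` guard in the workflow step. Malformed
  JSON also raises ValueError, since json.JSONDecodeError subclasses it.
  """
  data = json.loads(payload)
  version = (data.get("info") or {}).get("version")
  if not version:
    raise ValueError("Could not parse latest version from PyPI JSON.")
  return version


# Hypothetical payload; the real response carries many more fields.
sample = '{"info": {"name": "maxtext", "version": "1.2.3"}}'
print(latest_version_from_pypi_json(sample))
```

The returned string is what the workflow passes on as `version_name`, which in turn becomes the Docker image tag in `build_and_push_docker_image.yml`.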

.pre-commit-config.yaml

Lines changed: 3 additions & 1 deletion
@@ -8,7 +8,7 @@ repos:
 - id: codespell
 args:
 - '-w'
-- '--skip="*.txt,pylintrc,.*,src/maxtext/assets/*"'
+- '--skip="*.txt,pylintrc,.*,src/maxtext/assets/*,src/maxtext/input_pipeline/protos/*"'
 - '-L ND,nd,sems,TE,ROUGE,rouge,astroid,ags,dout'
 - '.'
 additional_dependencies:
@@ -30,6 +30,7 @@ repos:
 args:
 - '--disable=R0401,R0917,W0201,W0613'
 - "--ignore-patterns='.pytype,.*pyi$'"
+- '--ignore-paths=src/maxtext/input_pipeline/protos'
 - 'benchmarks'
 - 'src'
 - 'tests'
@@ -47,6 +48,7 @@ repos:
 rev: 24.10.1
 hooks:
 - id: pyink
+exclude: src/maxtext/input_pipeline/protos/
 args:
 - '--pyink-indentation=2'
 - '--line-length=122'

.readthedocs.yml

Lines changed: 1 addition & 1 deletion
@@ -21,4 +21,4 @@ sphinx:
 # See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
 python:
 install:
-- requirements: dependencies/requirements/requirements_docs.txt
+- requirements: src/dependencies/requirements/requirements_docs.txt

PREFLIGHT.md

Lines changed: 5 additions & 5 deletions
@@ -7,12 +7,12 @@ Before you run ML workload on Multihost with GCE or GKE, simply apply `bash pref

 Here is an example for GCE:
 ```
-bash preflight.sh PLATFORM=GCE && python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
+bash preflight.sh PLATFORM=GCE && python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
 ```

 Here is an example for GKE:
 ```
-bash preflight.sh PLATFORM=GKE && python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
+bash preflight.sh PLATFORM=GKE && python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
 ```

 # Optimization 2: Numa binding (You can only apply this to v4 and v5p)
@@ -22,14 +22,14 @@ For GCE,
 [preflight.sh](https://github.com/google/maxtext/blob/main/preflight.sh) will help you install `numactl` dependency, so you can use it directly, here is an example:

 ```
-bash preflight.sh PLATFORM=GCE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
+bash preflight.sh PLATFORM=GCE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
 ```

 For GKE,
-`numactl` should be built into your docker image from [maxtext_tpu_dependencies.Dockerfile](https://github.com/google/maxtext/blob/main/dependencies/dockerfiles/maxtext_tpu_dependencies.Dockerfile), so you can use it directly if you built the maxtext docker image. Here is an example
+`numactl` should be built into your docker image from [maxtext_tpu_dependencies.Dockerfile](https://github.com/google/maxtext/blob/main/src/dependencies/dockerfiles/maxtext_tpu_dependencies.Dockerfile), so you can use it directly if you built the maxtext docker image. Here is an example

 ```
-bash preflight.sh PLATFORM=GKE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train src/maxtext/configs/base.yml run_name=${YOUR_JOB_NAME?}
+bash preflight.sh PLATFORM=GKE && numactl --membind 0 --cpunodebind=0 python3 -m maxtext.trainers.pre_train.train run_name=${YOUR_JOB_NAME?}
 ```

 1. `numactl`: This is the command-line tool used for controlling NUMA policy for processes or shared memory. It's particularly useful on multi-socket systems where memory locality can impact performance.

benchmarks/maxtext_xpk_runner.py

Lines changed: 1 addition & 0 deletions
@@ -746,6 +746,7 @@ def xpk_benchmark_runner(
 command, name = generate_xpk_workload_cmd(
 cluster_config=cluster_config,
 wl_config=wl_config,
+workload_name=wl_config.run_name,
 user=user,
 exp_name=exp_name,
 )
