
Commit ebb7729

Switch from pytest-split to pytest-xdist for parallel test execution
Previously, CPU tests were distributed across multiple workers using pytest-split, which assigns the same number of tests to each worker. Because test runtimes vary, some workers finish quickly and sit idle while others take much longer, so the workers are not fully utilized. This change replaces pytest-split with pytest-xdist, which assigns tests to workers dynamically as they become free.
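A small bash model can make the idle-worker argument concrete. It compares the total wall-clock time (makespan) of two ways of spreading tests over a fixed number of workers: a static split into contiguous chunks with an equal number of tests each (the equal-count behavior described above), and a simplified dynamic scheme that hands each test to whichever worker becomes free first. The function names and the per-test durations are hypothetical, and the dynamic model is only an approximation, not pytest-xdist's actual scheduler.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Static split: contiguous chunks with an equal number of tests per worker.
# Prints the runtime of the slowest worker.
static_makespan() {
  local workers=$1; shift
  local -a d=("$@")
  local n=${#d[@]}
  local per=$(( (n + workers - 1) / workers ))  # tests per worker, rounded up
  local max=0 w i sum
  for (( w = 0; w < workers; w++ )); do
    sum=0
    for (( i = w * per; i < (w + 1) * per && i < n; i++ )); do
      sum=$(( sum + d[i] ))
    done
    if (( sum > max )); then max=$sum; fi
  done
  echo "$max"
}

# Simplified dynamic assignment: each test goes to the worker that
# becomes idle earliest. Prints the time at which the last worker finishes.
dynamic_makespan() {
  local workers=$1; shift
  local -a free_at=()
  local w t best f
  for (( w = 0; w < workers; w++ )); do free_at[w]=0; done
  for t in "$@"; do
    best=0
    for (( w = 1; w < workers; w++ )); do
      if (( free_at[w] < free_at[best] )); then best=$w; fi
    done
    free_at[best]=$(( free_at[best] + t ))
  done
  local max=0
  for f in "${free_at[@]}"; do
    if (( f > max )); then max=$f; fi
  done
  echo "$max"
}

durations=(10 10 1 1 1 1 1 1)   # hypothetical per-test runtimes
echo "static:  $(static_makespan 4 "${durations[@]}")"   # prints "static:  20"
echo "dynamic: $(dynamic_makespan 4 "${durations[@]}")"  # prints "dynamic: 10"
```

With these numbers the static split puts both slow tests on the first worker, which runs for 20 units while the other three finish after 2; dynamic assignment spreads the slow tests across two workers and finishes after 10.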
1 parent c2574ab commit ebb7729

2 files changed

Lines changed: 3 additions & 17 deletions


.github/workflows/build_and_test_maxtext.yml

Lines changed: 1 addition & 8 deletions
@@ -47,23 +47,16 @@ jobs:
   maxtext_cpu_unit_tests:
     needs: build_and_upload_maxtext_package
     uses: ./.github/workflows/run_tests_against_package.yml
-    strategy:
-      fail-fast: false # don't cancel all jobs on failure
-      matrix:
-        image_type: ["py312"]
-        worker_group: [1, 2, 3, 4]
     with:
       device_type: cpu
       device_name: X64
       cloud_runner: linux-x86-n2-16
-      image_type: ${{ matrix.image_type }}
+      image_type: "py312"
       pytest_marker: 'cpu_only'
       xla_python_client_mem_fraction: 0.75
       tf_force_gpu_allow_growth: false
       container_resource_option: "--privileged"
       is_scheduled_run: ${{ github.event_name == 'schedule' }}
-      worker_group: ${{ matrix.worker_group }}
-      total_workers: 4
 
   maxtext_tpu_unit_tests:
     needs: build_and_upload_maxtext_package

.github/workflows/run_tests_against_package.yml

Lines changed: 2 additions & 9 deletions
@@ -50,14 +50,6 @@ on:
       cloud_runner:
         required: false
         type: string
-      worker_group:
-        required: false
-        type: number
-        default: 1
-      total_workers:
-        required: false
-        type: number
-        default: 1
 
 permissions:
   contents: read
@@ -71,6 +63,7 @@ jobs:
         TF_FORCE_GPU_ALLOW_GROWTH: ${{ inputs.tf_force_gpu_allow_growth }}
         TPU_SKIP_MDS_QUERY: ${{ inputs.device_type == 'cpu' && '1' || '' }}
         MAXTEXT_PACKAGE_EXTRA: ${{ inputs.device_type == 'cpu' && 'tpu' || inputs.device_type }}
+        ALLOW_MULTIPLE_LIBTPU_LOAD: ${{ inputs.device_type == 'cpu' && 'true' || '' }} # bypass /tmp/libtpu_lockfile check for cpu tests, which don't actually use accelerators (to allow concurrency)
       options: ${{ inputs.container_resource_option }}
     steps:
       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
@@ -108,5 +101,5 @@ jobs:
             export LIBTPU_INIT_ARGS='--xla_tpu_scoped_vmem_limit_kib=65536'
           fi
           # TODO: Fix the skipped tests and remove the deselect flags
-          [ "${{ inputs.total_workers }}" -gt 1 ] && .venv/bin/python3 -m pip install --quiet pytest-split && SPLIT_ARGS="--splits ${{ inputs.total_workers }} --group ${{ inputs.worker_group }}" || SPLIT_ARGS=""
+          [ "${{ inputs.device_type }}" == "cpu" ] && .venv/bin/python3 -m pip install --quiet pytest-xdist && SPLIT_ARGS="-n auto" || SPLIT_ARGS=""
           .venv/bin/python3 -m pytest ${{ inputs.pytest_addopts }} -v -m "${FINAL_PYTEST_MARKER}" --durations=0 --deselect "tests/tokenizer_test.py::TokenizerTest::test_detokenize" $SPLIT_ARGS
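The new one-liner uses bash's `A && B && C || D` list form, so `SPLIT_ARGS=""` runs not only when the device is not `cpu` but also whenever the `pip install` of pytest-xdist fails; in that case pytest simply runs serially. The sketch below wraps the same logic in a function so all three paths can be exercised. `choose_split_args`, `install_ok`, and `install_fail` are hypothetical names used for illustration; the stubs stand in for the real `pip install` and are not part of the workflow.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring the workflow one-liner:
#   [ "$device" == "cpu" ] && <install pytest-xdist> && SPLIT_ARGS="-n auto" || SPLIT_ARGS=""
# $1 = device type, $2 = command standing in for the pytest-xdist install.
choose_split_args() {
  local device=$1 installer=$2
  local SPLIT_ARGS
  # The final "|| SPLIT_ARGS=''" runs if ANY earlier command in the chain fails.
  [ "$device" == "cpu" ] && "$installer" && SPLIT_ARGS="-n auto" || SPLIT_ARGS=""
  printf '%s\n' "$SPLIT_ARGS"
}

install_ok()   { return 0; }   # stub: install succeeded
install_fail() { return 1; }   # stub: install failed

echo "cpu, install ok:     '$(choose_split_args cpu install_ok)'"    # '-n auto'
echo "tpu, install ok:     '$(choose_split_args tpu install_ok)'"    # ''
echo "cpu, install failed: '$(choose_split_args cpu install_fail)'"  # ''
```

Whether a silent fallback to a serial run is the right behavior is a judgment call; an explicit `if`/`else` around the install would let a failed install stop the job instead.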
