2 changes: 1 addition & 1 deletion .github/workflows/build_and_test_maxtext.yml
@@ -218,7 +218,7 @@ jobs:
base_image: maxtext-unit-test-tpu:py312
cloud_runner: linux-x86-ct6e-180-4tpu
pytest_marker: 'not cpu_only and not gpu_only and integration_test and not post_training'
pytest_addopts: '--ignore=tests/post_training'
pytest_addopts: '--ignore=tests/post_training --ignore=tests/integration/hlo_diff_test.py'
xla_python_client_mem_fraction: 0.75
tf_force_gpu_allow_growth: false
container_resource_option: "--privileged"
32 changes: 25 additions & 7 deletions .github/workflows/run_tests_against_package.yml
@@ -70,6 +70,10 @@ on:
description: 'If false, maxtext_sha must be provided for checkout'
type: boolean
default: false
is_update_hlo:
required: false
type: boolean
default: false

permissions:
contents: read
@@ -167,13 +171,19 @@ jobs:
else
SPLIT_ARGS=""
fi
$PYTHON_EXE -m pytest ${INPUTS_PYTEST_ADDOPTS} \
-v \
-m "${FINAL_PYTEST_MARKER}" \
--durations=0 \
$PYTEST_COV_ARGS \
$SPLIT_ARGS \
${INPUTS_PYTEST_EXTRA_ARGS}

# When updating reference HLO, skip the test run and only regenerate the reference files.
if [ "${INPUTS_IS_UPDATE_HLO}" == "true" ]; then
$PYTHON_EXE tests/utils/update_hlo_references.py
else
$PYTHON_EXE -m pytest ${INPUTS_PYTEST_ADDOPTS} \
-v \
-m "${FINAL_PYTEST_MARKER}" \
--durations=0 \
$PYTEST_COV_ARGS \
$SPLIT_ARGS \
${INPUTS_PYTEST_EXTRA_ARGS}
fi

env:
PYTHONPATH: "${{ github.workspace }}/src"
@@ -185,6 +195,14 @@ jobs:
INPUTS_WORKER_GROUP: ${{ inputs.worker_group }}
INPUTS_PYTEST_EXTRA_ARGS: ${{ inputs.pytest_extra_args }}
INPUTS_MAXTEXT_INSTALLED: ${{ inputs.maxtext_installed }}
INPUTS_IS_UPDATE_HLO: ${{ inputs.is_update_hlo }}
- name: Upload Reference HLO
if: ${{ inputs.is_update_hlo }}
uses: actions/upload-artifact@v4
with:
name: reference-hlo
path: tests/utils/reference_hlo_*.txt
if-no-files-found: ignore
- name: Upload results to Codecov
if: ${{ !inputs.maxtext_installed }} # Skip code coverage upload for maxtext image testing
uses: codecov/codecov-action@v5
5 changes: 5 additions & 0 deletions .github/workflows/run_tests_coordinator.yml
@@ -47,6 +47,10 @@ on:
description: 'If false, maxtext_sha must be provided for checkout'
type: boolean
default: false
is_update_hlo:
required: false
type: boolean
default: false

permissions:
contents: read
@@ -150,3 +154,4 @@ jobs:
worker_group: ${{ matrix.worker_group }}
total_workers: ${{ contains(inputs.flavor, 'cpu-unit') && 2 || 1 }}
maxtext_sha: ${{ inputs.maxtext_sha }}
is_update_hlo: ${{ inputs.is_update_hlo }}
49 changes: 49 additions & 0 deletions .github/workflows/update_reference_hlo.yml
@@ -0,0 +1,49 @@
name: "Update HLO References (for hlo_diff_test.py)"

on:
workflow_dispatch:
permissions:
contents: read

jobs:
build-wheel:
uses: ./.github/workflows/build_package.yml
with:
device_type: tpu
device_name: v6e-4
cloud_runner: linux-x86-n2-16-buildkit

run-tests:
needs: build-wheel
uses: ./.github/workflows/run_tests_coordinator.yml
with:
flavor: tpu-integration
base_image: maxtext-unit-test-tpu:py312
is_scheduled_run: false
maxtext_sha: ${{ github.sha }}
is_update_hlo: true

commit-changes:
needs: run-tests # Wait for tests to finish
runs-on: ubuntu-latest
permissions:
contents: write
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
ref: ${{ github.ref }}

- name: Download Reference HLO
uses: actions/download-artifact@v4
with:
name: reference-hlo
path: tests/utils/

- name: Commit and Push changes
run: |
git config --global user.name "github-actions[bot]"
git config --global user.email "github-actions[bot]@users.noreply.github.com"
git add tests/utils/reference_hlo_*.txt
git diff --cached --quiet || git commit -m "Update reference HLO from CI artifact"
git push
90 changes: 90 additions & 0 deletions docs/development/hlo_diff_testing.md
@@ -0,0 +1,90 @@
# HLO Graph Diff Verification Testing

This document explains what HLO is, what the HLO graph diff test checks, and how to update its reference baselines.

## Related Files

- **Test logic**: `tests/integration/hlo_diff_test.py`
- **Reference baselines**: `tests/utils/reference_hlo_*.txt`
- **Update helper script**: `tests/utils/update_hlo_references.py`
- **GitHub Actions workflow for updates**: `.github/workflows/update_reference_hlo.yml`

## What is HLO?

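**HLO (High-Level Optimizer)** is the intermediate representation that XLA (Accelerated Linear Algebra) uses for the computation graphs it optimizes and compiles. Frameworks such as JAX lower their programs to it, and XLA then optimizes the graph and generates code for the target device.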

An HLO module records:

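- The sequence of operations the program performs (for example dot products, convolutions, and additions).
- Tensor shapes and element types (numerical precision, such as `f32` or `bf16`).
- How arrays are sharded (partitioned) across devices, for example across the chips of a multi-pod TPU deployment.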
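To see what such a module looks like, you can lower a small jitted function with JAX and print its HLO text. This is a minimal sketch, assuming a recent JAX release in which `Lowered.as_text()` accepts a `dialect` argument; the function `layer` and its shapes are arbitrary illustrations unrelated to MaxText. The modules MaxText produces are far larger and also carry sharding annotations, and the exact text varies between JAX/XLA versions.

```python
import jax
import jax.numpy as jnp


def layer(x, w):
    # Illustrative "model": a bfloat16 matmul followed by a ReLU.
    return jax.nn.relu(jnp.dot(x.astype(jnp.bfloat16), w.astype(jnp.bfloat16)))


x = jnp.ones((8, 16), dtype=jnp.float32)
w = jnp.ones((16, 4), dtype=jnp.float32)

# Lower (but do not run) the function; dialect="hlo" requests classic HLO text
# instead of the default StableHLO (MLIR) form.
hlo_text = jax.jit(layer).lower(x, w).as_text(dialect="hlo")
print(hlo_text)  # shows the ops (convert, dot, maximum), shapes, and dtypes
```

Each instruction in the printed module names its operation and its result shape and element type, which is exactly the information a baseline comparison can detect changes in.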

## Purpose of HloDiffTest

The main purpose of the `TestHloDiff` checks is to ensure that **refactoring PRs are pure refactors**: they must not unintentionally change the compiled graph or its performance.

- **For pure refactors:** The generated HLO should remain *strictly identical*. Any deviation indicates that the operations, their ordering, or their sharding may have changed under the hood.
- **For dependency updates:** Changes to framework dependencies (such as new JAX or XLA versions) *are expected* to alter the generated HLO, often slightly, so updating the baselines is appropriate in those cases.

______________________________________________________________________

## How the Test Works

This test runs automatically as part of the [`tpu-integration`](https://github.com/AI-Hypercomputer/maxtext/actions/workflows/build_and_test_maxtext.yml) CI test suite on every Pull Request.

When it runs, the test performs the following steps:

1. **Triggers compilation**: It runs only the compilation phase of training (invoking `train_compile.main()`), without allocating hardware compute nodes or executing any training steps.
2. **Dumps HLO modules**: It instructs the XLA compiler to write the compiled HLO modules to text files.
3. **Compares strictly against the baselines**: It compares the structural lines of the generated HLO directly against the baseline `.txt` files stored in `tests/utils/`.
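The comparison step can be pictured roughly as follows. This is a hypothetical sketch, not the code in `tests/integration/hlo_diff_test.py`: it assumes that "structural lines" means the HLO text with blank lines and XLA `metadata={...}` annotations (which record op names and source locations) removed, and the helpers `structural_lines` and `hlo_diff` are illustrative names.

```python
import difflib
import re

# Assumed normalization: drop XLA metadata annotations, which can change on
# edits that do not affect the graph itself (e.g. renamed Python functions).
_METADATA = re.compile(r",?\s*metadata=\{[^}]*\}")


def structural_lines(hlo_text: str) -> list[str]:
    """Return the non-blank HLO lines with metadata and trailing spaces removed."""
    lines = []
    for line in hlo_text.splitlines():
        line = _METADATA.sub("", line).rstrip()
        if line:
            lines.append(line)
    return lines


def hlo_diff(reference: str, generated: str) -> list[str]:
    """Return a unified diff of the structural lines; empty means identical."""
    return list(
        difflib.unified_diff(
            structural_lines(reference),
            structural_lines(generated),
            fromfile="reference",
            tofile="generated",
            lineterm="",
        )
    )
```

An empty result means the refactor left the graph untouched; any `-`/`+` lines point at the HLO instructions that changed, for example a dtype switching from `f32` to `bf16`.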

______________________________________________________________________

## Updating HLO reference files

When an intended change (such as a model architecture change or a dependency upgrade) alters the generated HLO, the reference baselines must be updated.

> [!IMPORTANT]
> You can run the update script locally, but **baselines generated locally can make the CI test fail.**
> CI runs the tests in a locked-down GitHub Actions environment, and even small differences in locally installed library versions can change the generated HLO. If baselines you generated locally cause a CI check failure, regenerate them with the GitHub Actions workflow described below.

### Method 1: Run the manual GitHub Actions workflow (highly recommended)

Running the update in CI guarantees that the baselines are generated in the same environment in which the test runs.

#### Option A: Using the GitHub UI

1. Open the **Actions** tab of the repository.
2. Select the manual workflow `Update HLO References (for hlo_diff_test.py)`.
3. Run it on your PR branch. The workflow compiles the model, regenerates the reference files, and commits them back to the branch automatically.

#### Option B: Using the GitHub CLI (`gh`)

Alternatively, trigger the workflow from a terminal:

```bash
gh workflow run update_reference_hlo.yml --ref <branch>
```

> [!NOTE]
> A successful run of the manual update workflow will add a new commit to your Pull Request branch. Once complete, you must:
>
> 1. Pull the new commit from remote.
> 2. Squash the commits in your branch once again to keep your PR history clean.
> 3. Push the squashed commit to remote.
> 4. Retry the `tpu-integration` workflow to verify tests pass on your PR.

### Method 2: Local Execution

To run the test locally during development:

```bash
source .venv/bin/activate
pytest tests/integration/hlo_diff_test.py -v
```

To regenerate the local baselines instead:

```bash
python3 tests/utils/update_hlo_references.py
```
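Conceptually, regenerating the baselines amounts to overwriting each `reference_hlo_*.txt` file with freshly generated HLO. The sketch below is hypothetical and is not the contents of `tests/utils/update_hlo_references.py`: it assumes the generated HLO is already available as a mapping from a baseline name to its text, and the helper `write_reference_hlo` is an illustrative name. It returns the baselines that were created or changed, which is handy for logging what an update touched.

```python
from pathlib import Path


def write_reference_hlo(generated: dict[str, str], ref_dir: Path) -> list[Path]:
    """Write each HLO text to ref_dir/reference_hlo_<name>.txt.

    Hypothetical helper: returns the paths that were created or modified and
    leaves baselines whose contents already match untouched.
    """
    ref_dir.mkdir(parents=True, exist_ok=True)
    updated = []
    for name, text in sorted(generated.items()):
        path = ref_dir / f"reference_hlo_{name}.txt"
        if path.exists() and path.read_text() == text:
            continue  # baseline already matches; nothing to write
        path.write_text(text)
        updated.append(path)
    return updated
```

The file-name pattern mirrors the `tests/utils/reference_hlo_*.txt` glob that the update workflow uploads and commits.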