HLO Deviation Unit Tests#3713

Open
darisoy wants to merge 1 commit into main from hlo-identical-tests

Collaborator

@darisoy darisoy commented Apr 21, 2026

Description

Goal

This PR introduces an automated check system that detects unintended compiler transformations and model graph deviations without weakening the isolation and security constraints of the CI runners.

The PR consists of two core components:

  1. Compiler performance validation: Ensures the PR does not unintentionally change compiler performance or model graph generation. This is done via parameterized checks in tests/integration/hlo_diff_test.py, which compare the generated HLO against known-good reference files (stored in tests/utils/reference_hlo_*.txt).
  2. Automatic PR Extraction and Updates: If a PR author is intentionally modifying the compiler graph, they can manually trigger an update workflow from GitHub: Update HLO References (for hlo_diff_test.py). This workflow runs tests/utils/update_hlo_references.py in a secure, isolated runner environment to regenerate all parameterized reference files and push them back to the PR branch.
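The comparison in component 1 can be pictured with a minimal sketch. It assumes the check boils down to a text diff between the generated HLO and the stored reference; the function name `diff_against_reference` and the sample HLO text are hypothetical and are not taken from hlo_diff_test.py.

```python
import difflib


def diff_against_reference(generated: str, reference: str, name: str = "model") -> str:
    """Return a unified diff of reference vs. generated HLO text ('' if identical).

    Hypothetical helper for illustration; the real test may compare differently.
    """
    diff = difflib.unified_diff(
        reference.splitlines(),
        generated.splitlines(),
        fromfile=f"reference_hlo_{name}.txt",
        tofile=f"generated_hlo_{name}.txt",
        lineterm="",
    )
    return "\n".join(diff)


# Made-up HLO-like text, used only to exercise the helper.
reference = "ENTRY main {\n  add.1 = f32[4] add(x, y)\n}"
unchanged = diff_against_reference(reference, reference, "llama3_8b")
changed = diff_against_reference(reference.replace("add", "multiply"), reference, "llama3_8b")
```

An empty diff means the graph matches the reference. A non-empty diff is what a failing test could print so the author can decide whether the change is intended and, if so, trigger the update workflow.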

Changes Integrated

Parameterized HLO Graph Diff Validations

  • tests/integration/hlo_diff_test.py:

    • Uses @pytest.mark.parametrize to scale HLO validation across multiple model configurations.
    • Adds out-of-the-box test cases for DeepSeek v3 (deepseek3), Llama 3 8B (llama3_8b), and Qwen 3 1.7B (qwen3_1.7b).
    • Filters the HLO dynamically: ignores operation numbers (stack_frame_id) and sharding hints, and normalizes trailing differences in operation names.
    • Wraps compilation in a try...finally block so that the compilation output directories are cleared even when an assertion fails.
    • Enforces a line limit via a MAX_LINES = 2000 constant.
  • tests/utils/update_hlo_references.py:

    • A helper script that scans and purges existing local reference checkpoint files (reference_hlo_*.txt) before orchestrating the pytest suite to regenerate them.
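The filtering described for hlo_diff_test.py could look roughly like the sketch below. The regular expressions, the function name `normalize_hlo`, and the sample HLO lines are assumptions made for illustration; only the MAX_LINES = 2000 constant comes from the description above, and it is interpreted here as truncating to the first 2000 lines, which may not match the actual test.

```python
import re

MAX_LINES = 2000  # constant named in the PR description

# Hypothetical patterns; the real filters in hlo_diff_test.py may differ.
_STACK_FRAME_ID = re.compile(r"\s*stack_frame_id=\d+")
# Handles only non-nested sharding annotations such as sharding={replicated}.
_SHARDING = re.compile(r",?\s*sharding=\{[^{}]*\}")
# Drops numeric suffixes such as the ".12" in "%add.12".
_TRAILING_OP_SUFFIX = re.compile(r"(%?[A-Za-z_][\w\-]*)\.\d+\b")


def normalize_hlo(text: str, max_lines: int = MAX_LINES) -> str:
    """Strip details that vary between otherwise identical compilations."""
    lines = []
    for line in text.splitlines()[:max_lines]:
        line = _STACK_FRAME_ID.sub("", line)
        line = _SHARDING.sub("", line)
        line = _TRAILING_OP_SUFFIX.sub(r"\1", line)
        lines.append(line.rstrip())
    return "\n".join(lines)


# Two made-up lines that differ only in numbering, sharding, and stack frame id.
line_a = ('%add.12 = f32[8]{0} add(%x.3, %y.4), sharding={replicated}, '
          'metadata={op_name="jit(f)/add" stack_frame_id=7}')
line_b = ('%add.99 = f32[8]{0} add(%x.1, %y.2), sharding={devices=[2,1]0,1}, '
          'metadata={op_name="jit(f)/add" stack_frame_id=31}')
same_after_normalizing = normalize_hlo(line_a) == normalize_hlo(line_b)
```

Normalizing both the generated text and the reference before comparing keeps the test sensitive to structural graph changes while ignoring numbering noise.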

Secure CI Automation Workflows

  • .github/workflows/update_reference_hlo.yml:
    • The manual update workflow file: Update HLO References (for hlo_diff_test.py).
    • Starts execution through an intermediate run_tests_coordinator.yml layer, passing is_update_hlo: true.
    • Limits the token scope with a strict top-level contents: read permission.
  • Pathways & Suite isolation:
    • In run_tests_against_package.yml, conditionally branches the workload step to run the reference update script instead of the normal pytest check loop.
    • Excludes the file entirely from execution in maxtext_tpu_pathways_unit_tests via the job's ignore list in .github/workflows/build_and_test_maxtext.yml.

Auto PR Updates Extractions

  • Captures the generated HLO files and stores them securely as a runner artifact (reference-hlo).
  • Downloads and unpacks the artifact directly onto the workspace tree, staging only files that match the tests/utils/reference_hlo_*.txt glob.
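Restricting what gets staged to the reference glob can be sketched as follows. This is an illustration only: the helper name `select_reference_files` and the candidate paths (other than those named in this PR) are hypothetical, and the actual workflow may stage files with a shell glob instead.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Directory and file pattern taken from the tests/utils/reference_hlo_*.txt glob.
REFERENCE_DIR = PurePosixPath("tests/utils")
REFERENCE_PATTERN = "reference_hlo_*.txt"


def select_reference_files(paths):
    """Keep only paths directly under tests/utils that match the reference pattern."""
    selected = []
    for raw in paths:
        path = PurePosixPath(raw)
        if path.parent == REFERENCE_DIR and fnmatchcase(path.name, REFERENCE_PATTERN):
            selected.append(str(path))
    return sorted(selected)


# Hypothetical list of files found after unpacking the artifact.
candidates = [
    "tests/utils/reference_hlo_deepseek3.txt",
    "tests/utils/reference_hlo_qwen3_1.7b.txt",
    "tests/utils/update_hlo_references.py",
    ".github/workflows/update_reference_hlo.yml",
    "other/tests/utils/reference_hlo_x.txt",
]
to_stage = select_reference_files(candidates)
```

Checking the parent directory and the file name separately avoids a pitfall of plain `fnmatch` on the whole path, where `*` would also match path separators and let files in nested directories through.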

FIXES: b/502981577

Tests

Manually ran the "Update HLO References" workflow on GitHub Actions to verify that it generates a new reference HLO file and creates a new commit in the PR: http://screen/BwgqGAkWGpqqiE7

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

Comment thread .github/workflows/update_reference_hlo.yml Fixed
@darisoy darisoy force-pushed the hlo-identical-tests branch from 800a846 to d68d824 Compare April 21, 2026 22:51

codecov Bot commented Apr 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@darisoy darisoy marked this pull request as draft April 21, 2026 23:14
@darisoy darisoy force-pushed the hlo-identical-tests branch from d68d824 to 472cb8d Compare April 21, 2026 23:16
Comment thread .github/workflows/update_reference_hlo.yml Fixed
@darisoy darisoy force-pushed the hlo-identical-tests branch from 472cb8d to d247e90 Compare April 21, 2026 23:51
Comment thread .github/workflows/update_reference_hlo.yml Fixed
@darisoy darisoy force-pushed the hlo-identical-tests branch from d247e90 to 4e16cdc Compare April 22, 2026 17:05
Comment thread tests/integration/hlo_diff_test.py Outdated
Comment thread tests/utils/reference_hlo_deepseek3.txt
Comment thread tests/utils/update_hlo_references.py
@darisoy darisoy force-pushed the hlo-identical-tests branch from 6fec229 to 99fa017 Compare April 23, 2026 16:06
Collaborator

@NuojCheng NuojCheng left a comment

Overall, LGTM! Moving forward, is there a way we can streamline the process for adding new test cases (e.g., for models other than DeepSeek, or different sharding configs)?

@darisoy darisoy force-pushed the hlo-identical-tests branch 6 times, most recently from 552a1d3 to 120edf7 Compare April 23, 2026 17:08
Comment thread .github/workflows/build_and_test_maxtext.yml Outdated
Comment thread .github/workflows/update_reference_hlo.yml Outdated
Comment thread tests/integration/hlo_diff_test.py
@darisoy darisoy force-pushed the hlo-identical-tests branch 11 times, most recently from 5777e7a to cd0562b Compare April 23, 2026 18:33
Comment thread tests/integration/hlo_diff_test.py Outdated
Collaborator

@gobbleturk gobbleturk left a comment

This looks great! Can you also add some documentation in a .md file about this test (e.g. example commands for how to update the HLO)? Some users are unfamiliar with what HLO even is, so a brief explanation of the point of this test (to protect against performance regressions) would help as well.

…tion updates pipelines integrations execution rules

5 participants