
Add test filter standardisation for hip-tests #5524

Draft
dileepr1 wants to merge 3 commits into main from users/dravindr/tr_hiptests


dileepr1 (Contributor) commented May 29, 2026

Summary

Add test-filter standardisation for hip-tests, mirroring the pattern
established by TheRock#4694 (rccl).
Paired with the rocm-systems branch
users/dravindr/tf_hiptests (rocm-systems#6315),
which wires tier (quick / standard / comprehensive / full) and
arch-exclude tags into hip_test_config.hh at build time and exposes
them as CTest labels via
catch_discover_tests(ADD_TAGS_AS_LABELS ... DISCOVERY_MODE PRE_TEST).

Three commits:

  1. Add test filter standardisation for hip-tests

    • fetch_test_configurations.py: route hip-tests test_script to the
      generic test_runner.py, which now drives ctest -L <tier> from
      the installed Catch2 directory using the labels baked into the
      binaries.
    • test_runner.py: add hip-tests COMPONENT_OVERRIDES:
      • test_dir = share/hip/catch_tests (not bin/<component>/).
      • LD_LIBRARY_PATH = ROCM_PATH/lib so the Catch2 binaries can
        dlopen HIP.
    • artifact-core-hiptests.toml already includes share/hip/**, so
      the install-tree CTestTestfile.cmake and Catch2 binaries are
      already bundled — no toml change needed.
  2. `test_runner`: honour `<gfx_pattern>[_<os>]_exclude` labels

    • Generalise exclude-label handling so per-arch / per-OS
      known-broken tests can be expressed purely in
      test_categories.yaml without baking the convention into
      individual component runners.
    • Recognised label shapes:
      • `gfx950_exclude` — OS-agnostic, exact arch
      • `gfx94X_linux_exclude` — Linux-only, wildcard arch
      • `gfx110X_windows_exclude` — Windows-only, wildcard arch
    • New helper `find_matching_arch_exclude_labels()` parses each
      `*_exclude` label with a strict regex anchored to `gfx<...>`,
      skips labels whose OS suffix does not match the current host, and
      uses the existing `find_matching_gpu_arch()` wildcard semantics.
    • `build_ctest_command()` OR's matching labels into the same `-LE`
      regex used for `{category}_exclude`, preserving
      "any-match excludes" semantics.
    • Plain `<tier>_exclude` labels and other components' existing
      `*_exclude` labels are unaffected.
  3. `test_runner`: strip `(variant)` suffix from `TEST_COMPONENT`

    • The hip-tests Windows CI job sets `TEST_COMPONENT="hip-tests (PAL)"`
      (and optionally `"hip-tests (ROCR)"`) from
      `fetch_test_configurations.py`, where `(PAL)` / `(ROCR)` are
      matrix-variant tags rather than distinct test components. The
      runner passed those literal strings into both
      `COMPONENT_DIR_MAPPING` and `COMPONENT_OVERRIDES` lookups,
      neither of which keys on the variant suffix, so the override
      silently did not apply and `TEST_DIR` fell back to the
      non-existent `build\bin\hip-tests (PAL)`.
    • Normalise the lookup by stripping a trailing parenthesised suffix
      (`\s*\([^)]*\)\s*$`) before the mapping / overrides lookups.

Companion PR

rocm-systems#6315 (branch users/dravindr/tf_hiptests)

Test plan

  • Linux CI green on `hip-tests` job (gfx94X / gfx950 paths exercise
    the new `<arch>_<os>_exclude` label handling).
  • Windows CI green on `hip-tests (PAL)` and `hip-tests (ROCR)`
    jobs (exercises the variant-suffix strip).
  • Sanity-check that other components' existing `*_exclude` labels
    still parse the same way.

Made with Cursor

dileepr1 and others added 3 commits May 26, 2026 15:32
Mirrors the test-filter standardisation pattern from TheRock#4694 (rccl)
for hip-tests, paired with the rocm-systems branch users/dravindr/tf_hiptests
(ROCm/rocm-systems#6315), which wires tier (quick/standard/comprehensive/full)
and arch-exclude tags into hip_test_config.hh at build time and exposes
them as CTest labels via catch_discover_tests(ADD_TAGS_AS_LABELS ...
DISCOVERY_MODE PRE_TEST).

- fetch_test_configurations.py: hip-tests test_script ->
  test_runner.py so the generic runner drives `ctest -L <tier>`
  from the installed Catch2 directory using the labels baked into
  the binaries.
- test_runner.py: add hip-tests COMPONENT_OVERRIDES with
    test_dir = share/hip/catch_tests (not bin/<component>/)
    LD_LIBRARY_PATH = ROCM_PATH/lib (so the Catch2 binaries dlopen HIP).

artifact-core-hiptests.toml already includes `share/hip/**`, so the
install-tree `CTestTestfile.cmake` and Catch2 binaries are already
bundled - no toml change needed.
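
A minimal sketch of how such an override could be resolved. The table layout, the `env` key, and the helper names `resolve_test_dir` / `build_env` are assumptions for illustration; only the `share/hip/catch_tests` directory, the `bin/<component>/` default, and `LD_LIBRARY_PATH = ROCM_PATH/lib` come from the description above.

```python
import os
from pathlib import Path

# Hypothetical shape of the override table; the real test_runner.py may differ.
COMPONENT_OVERRIDES = {
    "hip-tests": {
        # Relative to the ROCm install root, instead of bin/<component>/.
        "test_dir": "share/hip/catch_tests",
        # Extra environment entries; {rocm_path} is filled in at run time.
        "env": {"LD_LIBRARY_PATH": "{rocm_path}/lib"},
    },
}


def resolve_test_dir(component: str, rocm_path: Path) -> Path:
    """Return the override directory, falling back to bin/<component>."""
    override = COMPONENT_OVERRIDES.get(component, {})
    return rocm_path / override.get("test_dir", f"bin/{component}")


def build_env(component: str, rocm_path: Path, base_env: dict) -> dict:
    """Copy base_env and prepend any override paths for the component."""
    env = dict(base_env)
    overrides = COMPONENT_OVERRIDES.get(component, {}).get("env", {})
    for key, template in overrides.items():
        value = template.format(rocm_path=str(rocm_path))
        existing = env.get(key)
        env[key] = f"{value}{os.pathsep}{existing}" if existing else value
    return env
```

Prepending rather than replacing `LD_LIBRARY_PATH` is a design choice of this sketch, so that any path already set by the CI job is preserved.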

Follow-ups (not blocking):
- The legacy `test_hiptests.py` is kept as a fallback per the
  test-filter standardisation skill; remove it once the new runner
  has soaked.
- The rocm-systems YAML uses `<arch>_<os>_exclude` labels
  (e.g. `gfx950_linux_exclude`) instead of the runner's `ex_gpu_<arch>`
  convention, so on gfx94X / gfx950 Linux those known-broken tests
  will not yet be auto-excluded. Either rename the YAML labels to
  `ex_gpu_<arch>` (with the usual inclusive semantics) or extend
  `test_runner.py` to recognise the new convention - tracked separately.

Co-authored-by: Cursor <cursoragent@cursor.com>

Generalise the exclude-label handling so per-arch / per-OS known-broken
tests can be expressed purely in test_categories.yaml without baking the
convention into individual component runners.

Recognised label shapes:
  gfx950_exclude            -> OS-agnostic, exact arch
  gfx94X_linux_exclude      -> Linux-only,  wildcard arch
  gfx110X_windows_exclude   -> Windows-only, wildcard arch

A new helper find_matching_arch_exclude_labels() parses each discovered
*_exclude label with a strict regex anchored to `gfx<...>`, skips labels
whose OS suffix does not match the current host, and uses the existing
find_matching_gpu_arch() wildcard semantics, so `gfx94X` matches
gfx940/941/942 but not gfx950.
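
The parsing described above might look roughly like the following sketch. The function signature, the exact regex, and `_arch_matches` (a stand-in for the existing `find_matching_gpu_arch()`, whose real behaviour is not shown here) are assumptions; the sketch also assumes the GPU arch arrives as a plain string such as `gfx942`.

```python
import platform
import re

# Assumed label grammar: gfx<pattern>[_<os>]_exclude, <os> in {linux, windows}.
_ARCH_EXCLUDE_RE = re.compile(r"^(gfx[0-9a-zA-Z]+)(?:_(linux|windows))?_exclude$")


def _arch_matches(pattern: str, gpu_arch: str) -> bool:
    """Stand-in wildcard match: an upper-case 'X' matches one character."""
    regex = "".join("[0-9a-z]" if ch == "X" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, gpu_arch) is not None


def find_matching_arch_exclude_labels(labels, gpu_arch, host_os=None):
    """Return the arch-exclude labels that apply to this GPU arch and host OS."""
    host_os = (host_os or platform.system()).lower()
    if not gpu_arch:
        return []
    matches = []
    for label in labels:
        parsed = _ARCH_EXCLUDE_RE.match(label)
        if parsed is None:
            continue  # e.g. a plain <tier>_exclude label; handled elsewhere
        pattern, label_os = parsed.groups()
        if label_os is not None and label_os != host_os:
            continue  # label targets a different OS
        if _arch_matches(pattern, gpu_arch):
            matches.append(label)
    return matches
```

For example, with the labels `quick_exclude`, `gfx950_exclude`, `gfx94X_linux_exclude` and `gfx110X_windows_exclude`, a `gfx942` Linux host would select only `gfx94X_linux_exclude` under these assumptions.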

build_ctest_command() OR's matching labels into the same -LE regex used
for {category}_exclude, preserving "any-match excludes" semantics.
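
As an illustration of the "any-match excludes" behaviour, the sketch below assembles a `ctest` argument list. The parameter names and the anchored alternation are assumptions rather than the actual `build_ctest_command()` code; `-L` and `-LE` are the standard CTest include / exclude label regex options.

```python
import re


def build_ctest_command(category, arch_exclude_labels):
    """Build a ctest argv that runs one tier and skips excluded tests.

    A test is skipped if it carries ANY label matched by the -LE regex,
    so OR-ing labels together keeps "any-match excludes" semantics.
    """
    exclude_labels = [f"{category}_exclude", *arch_exclude_labels]
    # Anchor the alternation so "quick_exclude" cannot match a longer label.
    alternation = "|".join(re.escape(label) for label in exclude_labels)
    exclude_regex = f"^({alternation})$"
    return ["ctest", "-L", category, "-LE", exclude_regex, "--output-on-failure"]
```

The resulting list would be passed to something like `subprocess.run(cmd, cwd=test_dir)` so that `ctest` picks up the installed `CTestTestfile.cmake`.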

Motivation: hip-tests (rocm-systems#6315) injects tier and arch-exclude
tags into hip_test_config.hh at compile time via
catch_discover_tests(ADD_TAGS_AS_LABELS ... PRE_TEST). The tier labels
worked out of the box, but `gfx950_linux_exclude` /
`gfx110X_windows_exclude` were collected then dropped on the floor,
which would have regressed gfx950 / gfx94X Linux runs vs the legacy
test_hiptests.py TEST_TO_IGNORE table.

The regex is strictly anchored to a `gfx` prefix and a known OS suffix,
so plain `<tier>_exclude` labels and any other component's existing
`*_exclude` labels are unaffected. Verified against eight scenarios
covering exact-arch, wildcard-arch, OS-mismatched, plain tier-exclude
and empty-gpu-arch shapes.

Co-authored-by: Cursor <cursoragent@cursor.com>

The hip-tests Windows CI job sets TEST_COMPONENT="hip-tests (PAL)" (and
optionally "hip-tests (ROCR)") from fetch_test_configurations.py, where
"(PAL)" / "(ROCR)" are matrix-variant tags rather than distinct test
components. The runner was passing those literal strings into both
COMPONENT_DIR_MAPPING and COMPONENT_OVERRIDES lookups, neither of which
keys on the variant suffix, so the "hip-tests" override silently did
not apply and TEST_DIR fell back to the non-existent
`build/bin/hip-tests (PAL)`:

    Error: Test directory does not exist: build\bin\hip-tests (PAL)

    https://github.com/ROCm/rocm-systems/actions/runs/26480955488/job/78042511831?pr=6315

Normalise the lookup by stripping a trailing parenthesised suffix
(`\s*\([^)]*\)\s*$`) before COMPONENT_DIR_MAPPING / COMPONENT_OVERRIDES
lookups. Both `hip-tests (PAL)` and `hip-tests (ROCR)` now resolve via
the single `hip-tests` entry to `share/hip/catch_tests`; component
names without a parenthesised suffix (miopen, rocprofiler-compute,
rocprofiler-systems, ...) keep their existing behaviour.
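
The normalisation can be sketched as below. The regex is the one quoted above; the helper name `normalise_component` and the one-entry `COMPONENT_OVERRIDES` table are hypothetical and only illustrate the lookup.

```python
import re

# Trailing parenthesised suffix, e.g. " (PAL)" or " (ROCR)".
_VARIANT_SUFFIX_RE = re.compile(r"\s*\([^)]*\)\s*$")

# Hypothetical one-entry table standing in for the real mapping.
COMPONENT_OVERRIDES = {"hip-tests": {"test_dir": "share/hip/catch_tests"}}


def normalise_component(test_component: str) -> str:
    """Strip a trailing "(variant)" tag so lookups key on the base name."""
    return _VARIANT_SUFFIX_RE.sub("", test_component)


for raw in ("hip-tests (PAL)", "hip-tests (ROCR)", "miopen"):
    name = normalise_component(raw)
    override = COMPONENT_OVERRIDES.get(name, {})
    print(f"{raw!r} -> {name!r} -> {override.get('test_dir', 'default dir')}")
```

Names without a parenthesised suffix pass through unchanged, so existing components are not affected by the extra step.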

Co-authored-by: Cursor <cursoragent@cursor.com>
