ci: targeted PR tests, runtime summary, and config-consistency check #802

Merged
Xiaoming-AMD merged 6 commits into main from cicd/ci-targeted-runtime-version
Jul 1, 2026

@WangLingxun WangLingxun commented Jun 26, 2026

Collaborator

Summary

Additive, fail-safe CI/CD improvements. Fail-safe is the only invariant: every
selection can over-select but never under-select, and push/release/dispatch
always run everything (main's coverage baseline is unaffected).

Cross-file config consistency check (tools/ci/check_version_consistency.py)

Runs in the lint job and fails when any of the following holds:

  • ci.yaml BASE_IMAGE differs from the Dockerfile's ARG BASE_IMAGE default;
  • primus.__version__ is not a valid PEP 440 version;
  • a pyproject runtime dep is missing from, or mismatched against,
    requirements.txt (requirements.txt may add dev/CI-only extras);
  • any workflow uses: is not pinned to a 40-hex commit SHA (floating-tag guard);
  • workflow python-version values disagree with each other or fall below
    requires-python.
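For illustration, two of these checks can be sketched with the standard library alone. This is a minimal sketch, not the code in tools/ci/check_version_consistency.py: the helper names `unpinned_actions` and `dockerfile_base_image` are hypothetical, and skipping local `./` actions is an assumption about how the guard treats references that have no git ref.

```python
import re

# Hypothetical helpers for illustration; the real checker may be structured differently.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)", re.MULTILINE)
SHA_RE = re.compile(r"[0-9a-f]{40}")
ARG_RE = re.compile(r"^\s*ARG\s+BASE_IMAGE=(\S+)", re.MULTILINE)


def unpinned_actions(workflow_text):
    """Return `uses:` references whose ref is not a 40-hex commit SHA."""
    bad = []
    for ref in USES_RE.findall(workflow_text):
        ref = ref.strip("'\"")
        # Assumption: local actions (./path) carry no ref and are exempt.
        if ref.startswith("./"):
            continue
        _, _, pin = ref.partition("@")
        if not SHA_RE.fullmatch(pin):
            bad.append(ref)
    return bad


def dockerfile_base_image(dockerfile_text):
    """Return the default of `ARG BASE_IMAGE=...`, or None if it is absent."""
    match = ARG_RE.search(dockerfile_text)
    return match.group(1).strip("'\"") if match else None
```

A caller would compare `dockerfile_base_image(...)` with the BASE_IMAGE value read from ci.yaml and fail the lint job if they differ or if `unpinned_actions(...)` is non-empty.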

Per-job CI runtime summary (tools/ci/runtime_summary.py)

Build/install timers (aiter, primus-turbo) append stage<TAB>seconds to
$RUNNER_TEMP/runtime.tsv, rendered as a Markdown table next to the
test/coverage summaries in the torch and jax jobs (complements junit, which
only reports test time).
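The rendering step can be sketched roughly as follows. This is an assumption-laden illustration rather than the contents of tools/ci/runtime_summary.py: the function name `render_runtime_table`, the column headers, the total row, and the choice to skip malformed lines are all hypothetical.

```python
def render_runtime_table(tsv_text):
    """Turn `stage<TAB>seconds` lines into a Markdown table (empty string if no rows)."""
    rows = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        stage, _, seconds = line.partition("\t")
        try:
            rows.append((stage.strip(), float(seconds)))
        except ValueError:
            # Assumption: a malformed line is skipped instead of failing the job.
            continue
    if not rows:
        return ""
    out = ["| Stage | Seconds |", "| --- | ---: |"]
    out += [f"| {stage} | {secs:.1f} |" for stage, secs in rows]
    out.append(f"| **total** | {sum(secs for _, secs in rows):.1f} |")
    return "\n".join(out)
```

In a workflow step, the returned text would typically be appended to the file named by `$GITHUB_STEP_SUMMARY`; whether the real script does exactly that is not stated here.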

Change-aware test selection (tools/ci/select_tests.py)

On pull_request, a single classify(path) decides each changed file's blast
radius; both unit and E2E selection build on it.

  • Unit: unit dirs mirror the source tree (primus/<x> ->
    tests/unit_tests/<x>, resolved to the nearest existing dir); a backend maps
    to tests/unit_tests/backends/<X>/. Anything unlocatable (non-.py under
    primus/, a backend with no unit dir, etc.) falls back to the full suite.
  • E2E: suites are auto-discovered from tests/trainer/test_<name>_trainer.py
    (adding a trainer wires it in automatically). A backend change runs that
    backend's suite; a backend without a trainer, a non-backend source change,
    runner/, or anything global runs all E2E. The torch/jax jobs gate each
    model-test step on it.
  • The only hard-coded list is GLOBAL_TRIGGERS (whole-repo blast radius);
    everything else is convention + auto-discovery, and being unlisted only ever
    falls back to "run more".
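The unit-selection rules above can be sketched as below. This is a simplified model, not the code in tools/ci/select_tests.py: the contents of `GLOBAL_TRIGGERS`, the return shape of `classify`, and the helpers `nearest_existing` and `select_unit_dirs` are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical entries; the real GLOBAL_TRIGGERS list is not shown in this description.
GLOBAL_TRIGGERS = (".github/", "runner/", "pyproject.toml", "requirements.txt")
FULL = "tests/unit_tests"


def classify(path):
    """Return (kind, key), where kind is "global", "backend", "source" or "unknown"."""
    if any(path == t or path.startswith(t) for t in GLOBAL_TRIGGERS):
        return ("global", None)
    parts = PurePosixPath(path).parts
    if len(parts) >= 3 and parts[:2] == ("primus", "backends"):
        return ("backend", parts[2])
    if len(parts) >= 3 and parts[0] == "examples":
        return ("backend", parts[1])
    if parts and parts[0] == "primus" and path.endswith(".py"):
        return ("source", "/".join(parts[1:-1]))
    return ("unknown", None)


def nearest_existing(subpath, existing_dirs):
    """Walk up from tests/unit_tests/<subpath> to the nearest existing directory."""
    parts = subpath.split("/") if subpath else []
    while parts:
        candidate = f"{FULL}/" + "/".join(parts)
        if candidate in existing_dirs:
            return candidate
        parts.pop()
    return None


def select_unit_dirs(changed, existing_dirs):
    """Return the unit-test paths to run; any doubt expands to the full suite."""
    selected = set()
    for path in changed:
        kind, key = classify(path)
        if kind == "backend":
            target = f"{FULL}/backends/{key}"
            target = target if target in existing_dirs else None
        elif kind == "source":
            target = nearest_existing(key, existing_dirs)
        else:
            target = None  # global or unknown: nothing narrower than everything
        if target is None:
            return [FULL]
        selected.add(target)
    return sorted(selected) or [FULL]
```

Every branch that cannot name a narrower directory returns the full suite, which is what keeps the sketch over-selecting rather than under-selecting.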

Unit-tested in tests/unit_tests/ci/test_select_tests.py.
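The E2E auto-discovery convention can be illustrated in the same spirit. The sketch below assumes the caller has already decided whether a change is global (`run_all`) and which backends were touched; the function names and the backend names used in the usage note are hypothetical.

```python
import re

# Matches the trainer-test naming convention tests/trainer/test_<name>_trainer.py.
TRAINER_RE = re.compile(r"^test_(?P<name>\w+)_trainer\.py$")


def discover_suites(filenames):
    """Map backend name -> trainer test path, given file names in tests/trainer/."""
    suites = {}
    for filename in filenames:
        match = TRAINER_RE.match(filename)
        if match:
            suites[match.group("name")] = f"tests/trainer/{filename}"
    return suites


def select_e2e(changed_backends, run_all, suites):
    """Return the E2E suites to run.

    run_all is True for global or non-backend source changes; a changed
    backend without a trainer suite also expands to every suite.
    """
    if run_all or any(name not in suites for name in changed_backends):
        return sorted(suites.values())
    return sorted(suites[name] for name in changed_backends)
```

With files `test_megatron_trainer.py` and `test_torchtitan_trainer.py` present, a change confined to the `megatron` backend selects only the first suite, while a change to a backend with no trainer file selects both.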

@WangLingxun WangLingxun marked this pull request as draft June 26, 2026 09:22
@WangLingxun WangLingxun changed the title ci: targeted PR tests, runtime summary, version-pin check (P0 #2/#5/#8) ci: targeted PR tests, runtime summary, and version-pin consistency check Jun 26, 2026
@WangLingxun WangLingxun force-pushed the cicd/ci-targeted-runtime-version branch 2 times, most recently from 214e460 to 7aee925 Compare June 26, 2026 09:53
@WangLingxun WangLingxun changed the title ci: targeted PR tests, runtime summary, and version-pin consistency check ci: targeted PR tests, runtime summary, config-consistency check, and CodeQL Jun 26, 2026
@WangLingxun WangLingxun force-pushed the cicd/ci-targeted-runtime-version branch 2 times, most recently from cb8589e to 189e99f Compare June 26, 2026 10:52
@WangLingxun WangLingxun changed the title ci: targeted PR tests, runtime summary, config-consistency check, and CodeQL ci: targeted PR tests, runtime summary, and config-consistency check Jun 26, 2026
Fail CI when ci.yaml's BASE_IMAGE and the Dockerfile's ARG default drift,
or when primus.__version__ is not a valid PEP 440 version.

Render per-job stage wall-clock (aiter, primus-turbo) as a Markdown table
next to the test/coverage summaries, complementing junit test time.

Map a PR's changed files to the minimal tests/unit_tests paths in the torch
job; fail-safe to the full suite on shared/unknown/CI changes. Other events
and the CLI/E2E/model tests are unchanged. Unit-tested.
@WangLingxun WangLingxun force-pushed the cicd/ci-targeted-runtime-version branch from 189e99f to 1992775 Compare June 30, 2026 02:38
tests/unit_tests/tools/test_utils.py and
tests/unit_tests/core/patches/test_utils.py share a basename. With no
__init__.py in the tests tree, pytest maps both to module `test_utils` and
aborts collection ("import file mismatch"). Rename the tools copy to
test_tools_utils.py so component-scoped PR test selection can run the tools
tests cleanly.
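A collision like this can be caught before pytest runs. The following is a hypothetical helper, not part of this PR, that groups test files by basename and reports any name used more than once.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def duplicate_test_basenames(paths):
    """Return {basename: [paths]} for every basename shared by two or more files."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[PurePosixPath(path).name].append(path)
    return {name: sorted(group) for name, group in by_name.items() if len(group) > 1}
```

Fed the two paths named above, it reports `test_utils.py` with both locations; after the rename it returns an empty dict.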
@WangLingxun WangLingxun force-pushed the cicd/ci-targeted-runtime-version branch from 1992775 to 1b90be8 Compare June 30, 2026 02:56
Beyond BASE_IMAGE/__version__, also fail on: pyproject runtime deps not
matching requirements.txt (requirements may add dev/CI-only extras), any
workflow action not pinned to a 40-hex SHA, and workflow python-version
disagreeing or below requires-python.
@WangLingxun WangLingxun force-pushed the cicd/ci-targeted-runtime-version branch from fed6b15 to 6feb249 Compare June 30, 2026 06:56
Unify unit + E2E selection behind one classify(): GLOBAL_TRIGGERS is the only
hard-coded list (whole-repo blast radius incl. runner/); unit dirs are resolved
by source-tree convention (primus/<x> -> tests/unit_tests/<x>, nearest existing
dir); E2E suites are auto-discovered from tests/trainer/test_<name>_trainer.py;
a backend is named by its dir (primus/backends/<X> or examples/<X>).

Fail-safe is the only invariant -- anything global, unlocatable, or a backend
without a trainer expands to everything, so it over-selects, never under-selects.
The torch/jax jobs gate each model-test step on --e2e; push/release/dispatch are
unaffected. Unit-tested.
@WangLingxun WangLingxun force-pushed the cicd/ci-targeted-runtime-version branch from 6feb249 to 5002267 Compare June 30, 2026 07:46
@WangLingxun WangLingxun marked this pull request as ready for review June 30, 2026 08:30
@Xiaoming-AMD Xiaoming-AMD merged commit f2c79dd into main Jul 1, 2026
8 checks passed