
Commit 41812a1

Authored by zhangyue207, UsamaRana3444, and zkjh
ci: CI online test (#596)
* ci: add ci_test GitHub workflows
* fix: avoid cross-platform CUDA probing in tests
* ci: target master and setup Python in CI
* ci: use python module pip for CI dependency
* ci: update CI submodule for failure logs
* ci: update CI submodule for ci_ref scheduler
* ci: update CI submodule for source-mounted scheduler
* ci: update CI submodule for Unit generator fix
* ci: update CI submodule for tag model configs
* ci: update CI submodule for Unit failure logs
* ci: pass explicit devices for XPU unit jobs
* ci: standardize CI config extension to yml
* ci: update CI submodule for concise job names
* ci: update CI submodule for skipped job names
* Remove obsolete CI and lint config files
* ci: add manual platform dispatch
* ci: remove smoke and performance pipeline jobs
* Update Moore CI deployment fixes
* ci: rerun PR checks
* ci: default PR tests to nvidia
* ci: rerun nvidia check
* ci: update nvidia unit workflow
* ci: run PR checks on active platforms
* ci: register iluvatar platform
* ci: trigger checks on ci online branch
* ci: enable ascend online runner
* ci: rerun with metax scheduler fix
* ci: rerun metax after cancel
* ci: skip ascend image rebuild
* ci: rerun ascend with encoded args
* ci: rerun ascend after runner cleanup
* ci: rerun iluvatar with timeout guard
* ci: cancel stale online runs
* ci: cap metax unit runtime
* ci: match ascend runner label
* ci: avoid queued platforms blocking ascend
* ci: rerun ascend with runner proxy
* ci: rerun ascend after python compatibility fix
* ci: rerun ascend with scheduler image
* ci: rerun ascend locally
* ci: run metax quick operator subset
* ci: install ascend build dependencies
* ci: rerun iluvatar after scheduler fix
* ci: rerun metax quick subset
* ci: rerun with safe matrix output
* ci: rerun after matrix output fix
* ci: rerun after matrix output fix
* ci: rerun iluvatar after report fix
* ci: rerun ascend accepting docker 137
* ci: limit metax online smoke cases
* ci: rerun metax after busy gpu filter
* ci: rerun full ci online
* ci: address pr feedback
* ci: use prebuilt ascend test image
* test: generate fallback randint data on cpu
* test: format gemm skip reason as markdown
* ci: build ascend test image from dockerfile
* ci: update ci tooling submodule
* ci: opt ascend into buildkit
* ci: keep default repo branch on master
* ci: run ascend tests on free npu
* ci: let ascend pick an available npu
* ci: update dynamic ascend allocation tooling
* ci: update ascend npu allocation parser
* ci: update ascend logical device mapping
* ci: use nvidia base compatible with runner
* ci: align nvidia test command with master
* ci: run nvidia tests on compatible base image
* ci: address review comments
* ci: update moore resource locking
* ci: update scheduler stale lock cleanup
* ci: update nvidia gpu allocation
* ci: add v2 shadow workflow
* ci: handle unavailable v2 shadow agents
* ci: add v2 agent installer
* ci: match v2 runner labels
* ci: enforce v2 shadow checks
* ci: update v2 runner user agent
* ci: default v2 shadow to active platforms
* ci: limit v2 agent queue wait to ten minutes
* ci: use self-healing v2 agent workflow
* ci: use transient state dir fallback
* ci: use platform lock probe workflow
* ci: use checkout-free self-hosted workflow
* ci: use nested junit result detection
* ci: use per-job checked-out agent
* ci: use metax resource allocation fix
* test: keep tests aligned with master
* ci: update iluvatar ci tooling
* ci: use early-exit v2 queue watchdog
* ci: handle queued runners and update platform sets
* ci: add iluvatar runner filesystem repair script
* ci: enable iluvatar in legacy workflow
* ci: skip host gpu probing for iluvatar
* ci: pin iluvatar local runner support
* ci: pin shadow workflow ci ref
* Fix Iluvatar CI container setup
* Include Iluvatar CI build backend dependency
* ci: remove local iluvatar repair script
* ci: preflight runner availability before jobs
* ci: pin runner preflight token fix
* ci: pin best-effort runner preflight

---------

Co-authored-by: zhangyue207 <zhangyue207@users.noreply.github.com>
Co-authored-by: Vincent777 <140055255+Vincent777@users.noreply.github.com>
Co-authored-by: zkjh <zkjh@localhost.localdomain>
1 parent fc5aecb commit 41812a1

27 files changed

Lines changed: 132 additions & 5474 deletions

.ci

Submodule .ci added at c6bf369

.ci/README.md

Lines changed: 0 additions & 388 deletions
This file was deleted.
