[XPU] Expand Stage B with proven AMD/CUDA tests#25543
Conversation
Add register_xpu_ci() to the CI registry system and migrate existing XPU tests from test/srt/xpu/ to test/registered/xpu/, aligning XPU CI architecture with AMD and Nvidia. Changes: - python/sglang/test/ci/ci_register.py: add HWBackend.XPU, register_xpu_ci() function, and REGISTER_MAPPING entry - test/run_suite.py: add "xpu" to HW_MAPPING, XPU suites to PER_COMMIT_SUITES (stage-a-test-1-gpu-xpu, stage-b-test-1-gpu-xpu), XPU to _SUITE_CHECKED_BACKENDS - test/srt/run_suite.py: clear legacy suite_xpu dict (tests now registered via register_xpu_ci) - test/registered/xpu/: move tests from test/srt/xpu/ and add register_xpu_ci() decorators with est_time and suite assignments - .github/workflows/pr-test-xpu.yml: replace single flat job with 3-stage pipeline (stage-a → wait → stage-b) matching AMD/Nvidia structure; run_suite.py --hw xpu replaces hardcoded suite invocation Adding a new XPU test now only requires creating a file in test/registered/xpu/ with register_xpu_ci(est_time=N, suite=...) -- no workflow changes needed.
Expand XPU Stage B test coverage using tests proven passing in WW21 HTML report from AMD/CUDA Stage B suites. Hardware: 2x BMG (Battlemage) GPUs. **Test Selection Methodology:** All Stage B tests are selected from AMD+CUDA Stage B tests that are PASSING in HTML report, ensuring proven reliability. **Stage A (Smoke - 1 GPU, ~120s)** - test_xpu_basic.py (120s) - Quick validation gate **Stage B (Functional - 1 GPU, ~785s)** - test_intel_xpu_backend.py (600s) - ✅ PROVEN PASSING (3 tests in HTML) * test_latency_qwen_model * test_attention_backend * test_mla_decode_attention_backend - test_torch_native_attention_xpu.py (140s) - ✅ Passing in HTML * MMLU benchmark with torch native attention * Adapted from test/registered/attention/test_torch_native_attention_backend.py * AMD Stage B (150s), CUDA Stage B (140s) - test_hidden_states_xpu.py (45s) - ✅ Passing in HTML * Hidden states extraction API * Adapted from test/registered/core/test_hidden_states.py * AMD Stage B (55s), CUDA Stage B (45s) **Stage C**: Skipped for this PR. No suitable AMD/CUDA Stage C tests exist that: - Run on 2 GPUs (all require 4+ GPUs) - Are model inference tests (not unit tests) - Are passing in HTML report Will add Stage C in future PR after infrastructure validation. **Workflow changes (.github/workflows/pr-test-xpu.yml)**: - Stage A and B jobs remain unchanged - No Stage C jobs added **Test infrastructure (test/run_suite.py)**: - XPU suites: stage-a-test-1-gpu-xpu, stage-b-test-1-gpu-xpu **Models used (verified passing)**: - Llama-3.1-8B-Instruct (DEFAULT_MODEL_NAME_FOR_TEST) - Llama-3.2-1B-Instruct (DEFAULT_SMALL_MODEL_NAME_FOR_TEST) **Total CI time: ~55 minutes** - Stage A: ~22 min (20 min build + 2 min test) - Stage B: ~33 min (20 min build + 13 min test) Depends on PR sgl-project#25405 (XPU registry system).
There was a problem hiding this comment.
Code Review
This pull request migrates Intel XPU tests to a registry-based CI system by introducing the register_xpu_ci function and updating the HWBackend enumeration. New test cases for hidden states, torch native attention, and basic generation on XPU were added, while existing tests were updated to use the new registration mechanism. Feedback was provided to use idiomatic unittest assertions instead of raw assert statements to improve error reporting in CI.
| )() | ||
|
|
||
| metrics = run_eval(args) | ||
| assert metrics["accuracy"] >= 0.5 |
There was a problem hiding this comment.
Use self.assertGreaterEqual instead of a raw assert statement. This is more idiomatic when using unittest.TestCase and provides a more descriptive error message if the assertion fails, which is helpful for debugging CI failures.
| assert metrics["accuracy"] >= 0.5 | |
| self.assertGreaterEqual(metrics["accuracy"], 0.5) |
Summary
Expands XPU CI Stage B with additional proven tests from AMD/CUDA CI suites:
Both tests are:
Also removes Stage C infrastructure (no suitable 2-GPU tests found in AMD/CUDA that are passing).
Test Plan
Total CI time: ~55 minutes (Stage A: 22 min, Stage B: 33 min)
Dependencies
Depends on #25405 (XPU basic CI infrastructure)
🤖 Generated with Claude Code
CI States
Latest PR Test (Base): ❌ Missing
run-cilabel — add it to run CI tests.Latest PR Test (Extra): ❌ Blocked —
run-ciis required first.