[Cherry-Pick][CI] Sync dev optimizations to release/online/20260415(#7602) #7857
Merged: freeliuzc merged 1 commit into PaddlePaddle:release/online/20260415 from EmmonsCurse:ci_optimize_online_0415 (May 19, 2026)
    @@ -42,30 +42,18 @@ classify_tests() {
             fi
         fi

    -    # Rule 5: high-risk OOM tests (treat as multi_gpu for sequential execution)
    -    if [[ "$test_file" == "tests/entrypoints/cli/test_main.py" ||
    -          "$test_file" == "tests/entrypoints/cli/test_serve.py" ||
    -          "$test_file" == "tests/operators/test_group_swiglu_with_masked.py" ||
    -          "$test_file" == "tests/operators/test_hybrid_mtp_ngram.py" ||
    -          "$test_file" == "tests/operators/test_moe_top_k_select.py" ||
    -          "$test_file" == "tests/operators/test_noaux_tc.py" ||
    -          "$test_file" == "tests/output/test_get_save_output_v1.py" ||
    -          "$test_file" == "tests/output/test_process_batch_draft_tokens.py" ||
    -          "$test_file" == "tests/output/test_process_batch_output.py" ]]; then
    -        echo "multi_gpu"
    -        return
    -    fi
    -
         # ========== Single-GPU tests (no port required, can run in parallel) ==========
         echo "single_gpu"
     }

     # ============================================================
    -# Run Test With Logging
    +# Run Test With Logging (with retry for OOM/Kill)
     # ============================================================
     run_test_with_logging() {
         local test_file=$1
         local log_prefix=$2
    +    local max_retries=3  # Max retries for OOM/Kill issues
    +    local retry_count=0
         local status

         echo "Running pytest file: $test_file"

    @@ -81,14 +69,37 @@ run_test_with_logging() {
         # Set FD_LOG_DIR to isolate logs for each test
         export FD_LOG_DIR="$isolated_log_dir"

    -    # Run test
    -    timeout 600 python -m coverage run -m pytest -c ${PYTEST_INI} "$test_file" -vv -s
    -    status=$?
    +    # Retry loop for OOM/Kill issues (only handle "Killed" / SIGKILL)
    +    while [ $retry_count -le $max_retries ]; do
    +        if [ $retry_count -gt 0 ]; then
    +            echo ""
    +            echo "==================== Retrying (${retry_count}/${max_retries}) ===================="
    +            echo "Previous attempt was Killed, retrying..."
    +            # Clean up before retry
    +            sleep 5  # Wait a bit to let resources be released

Review comment:

🟡 Suggestion: the OOM retry interval may be too short for GPU memory to be released.

After an OOM/SIGKILL, GPU memory is reclaimed asynchronously by the CUDA driver. When a CI machine runs several jobs concurrently, 5 seconds is often not enough, so the next retry may hit OOM again. Consider lengthening the wait (for example, to 30 seconds), or checking GPU memory usage before retrying: `sleep 30  # give the CUDA driver more time to release memory`
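The second option in the comment above (checking GPU memory usage before a retry) could look roughly like the sketch below. It is an illustration only, not code from this PR: the helper names `gpu_used_mib` and `wait_for_gpu_memory`, the threshold, and the timeout values are hypothetical, and it assumes `nvidia-smi` is available on the CI machine.

```shell
# Hypothetical helper: print the highest memory.used value (MiB) across
# all visible GPUs. Assumes nvidia-smi is on PATH.
gpu_used_mib() {
    nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits \
        | sort -n | tail -n 1
}

# Hypothetical helper: poll until used GPU memory drops to max_used_mib
# or below. Returns 0 once memory is low enough, 1 if timeout_s elapses.
wait_for_gpu_memory() {
    local max_used_mib=$1
    local timeout_s=${2:-60}
    local interval_s=${3:-5}
    local waited=0
    local used

    while true; do
        used=$(gpu_used_mib)
        if [ -n "$used" ] && [ "$used" -le "$max_used_mib" ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep "$interval_s"
        waited=$((waited + interval_s))
    done
}
```

In the retry branch, a call such as `wait_for_gpu_memory 2048 60 5 || echo "GPU memory still busy, retrying anyway"` could stand in for the fixed `sleep 5`; the 2048 MiB threshold is an arbitrary example and would need tuning per machine.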
    +        fi
    +
    +        # Run test
    +        timeout 600 python -m coverage run -m pytest -c ${PYTEST_INI} "$test_file" -vv -s
    +        status=$?
    +
    +        # Exit code 137 = SIGKILL (Killed / OOM)
    +        if [ "$status" -eq 137 ] && [ $retry_count -lt $max_retries ]; then
    +            retry_count=$((retry_count + 1))
    +            continue
    +        fi
    +
    +        # Break loop on success or non-Kill error or max retries reached
    +        break
    +    done

         if [ "$status" -ne 0 ]; then
             echo "$test_file" >> "$log_prefix"
             echo ""
             echo "==================== Test Failed: $test_file ===================="
    +        if [ $retry_count -gt 0 ]; then
    +            echo "Total attempts: $((retry_count + 1))"
    +        fi

             # Use isolated log directory for this test
             if [ -d "$isolated_log_dir" ]; then

    @@ -108,7 +119,7 @@ run_test_with_logging() {
                 fi

                 echo ">>> grep error in ${isolated_log_dir}"
    -            grep -Rni --color=auto "error" "${isolated_log_dir}" || true
    +            grep -Rni --color=auto "error" "${isolated_log_dir}" --exclude="pytest_*_error.log" || true
             fi

             # print all server logs

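The retry logic added to `run_test_with_logging` can be read in isolation as the sketch below. It is a standalone illustration rather than the PR's code: `retry_on_sigkill` and the `RETRY_DELAY` variable are hypothetical names. It relies on the shell convention that a process terminated by signal N is reported with exit status 128 + N, so SIGKILL (9) shows up as 137.

```shell
# Hypothetical helper: run a command and rerun it only when it exits
# with 137 (128 + 9, i.e. SIGKILL), up to max_retries extra attempts.
# Any other exit status, including 0, ends the loop immediately.
retry_on_sigkill() {
    local max_retries=$1
    shift
    local retry_count=0
    local status

    while true; do
        "$@"
        status=$?
        if [ "$status" -eq 137 ] && [ "$retry_count" -lt "$max_retries" ]; then
            retry_count=$((retry_count + 1))
            # RETRY_DELAY is an assumed knob; defaults to the PR's 5 seconds
            sleep "${RETRY_DELAY:-5}"
            continue
        fi
        break
    done
    return "$status"
}
```

A call such as `retry_on_sigkill 3 timeout 600 python -m pytest "$test_file"` would mirror the loop in the diff: ordinary test failures are reported on the first attempt, and only killed runs are repeated.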
Review comment:

❓ Question: does a package for this date exist in the cu129/cu130 nightly indexes?

All CUDA versions (cu126/cu129/cu130) and XPU are pinned to the same version, 3.5.0.dev20260508, but nightly packages are not guaranteed to have a build with the same date for every CUDA variant. If the cu129 or cu130 index has no package for that date, the corresponding pipeline will fail immediately. It is worth verifying this before merging:
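The reviewer's own verification command is not preserved in this page. As a rough stand-in, the sketch below shows one possible check. Everything in it beyond the version string and the variant names is an assumption: the helper names, the `BASE/VARIANT/PACKAGE/` index layout, and the package name passed in by the caller are hypothetical and must be adapted to the real nightly index.

```shell
# Hypothetical fetcher: download one index page. Assumes curl is available
# and that each variant exposes a simple file listing at the given URL.
fetch_index() {
    curl -fsSL "$1"
}

# Read an index listing on stdin; succeed only if it names a file for the
# given version. Wheel names embed the version as name-VERSION-tags.whl.
index_lists_version() {
    grep -qF -- "-$1-"
}

# Report, per variant, whether the pinned version is listed.
# Returns 1 if any variant is missing the build.
check_nightly_variants() {
    local version=$1
    local base_url=$2
    local package=$3
    shift 3
    local variant
    local missing=0

    for variant in "$@"; do
        if fetch_index "${base_url}/${variant}/${package}/" \
                | index_lists_version "$version"; then
            echo "ok: ${variant}"
        else
            echo "missing: ${variant}"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage would be along the lines of `check_nightly_variants 3.5.0.dev20260508 "$NIGHTLY_INDEX_BASE" "$PACKAGE" cu126 cu129 cu130`, where `NIGHTLY_INDEX_BASE` and `PACKAGE` are placeholders for the real index URL and package name. A non-zero exit means at least one variant lacks a build for that date.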