[None][fix] Add AutoDeploy post-merge stages #15418
37f9c2a to df21810
/bot run --post-merge
📝 Walkthrough: Three new post-merge AutoDeploy stage mappings are added to …
Changes: AutoDeploy Post-Merge CI Configuration
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
🚥 Pre-merge checks: ✅ 5 passed
PR_Github #54654 [ run ] triggered by Bot. Commit:
/bot run --stage-list "H100_PCIe-AutoDeploy-Post-Merge-1,DGX_B200-AutoDeploy-Post-Merge-1,DGX_B200-4_GPUs-AutoDeploy-Post-Merge-1"
#15418 (comment) would launch the entire post-merge pipeline, but we only need to run these stages.
PR_Github #54660 [ run ] triggered by Bot. Commit:
I wanted to make sure the normal post-merge pipeline picked the stages up. While it should be enough that the stages exist, it seemed possible that something else was mis-wired so that they could be invoked directly but would not auto-run. But if you think it looks OK and we just need to check that the tests are still green (they were off for a week or two), then that's fine with me; we can check manually next week or so that they ran.
PR_Github #54654 [ run ] completed with state
PR_Github #54660 [ run ] completed with state
df21810 to 0b1f948
/bot run --disable-fail-fast
👎 Promotion blocked, new vulnerability found
/bot run --stage-list "H100_PCIe-AutoDeploy-Post-Merge-1,DGX_B200-AutoDeploy-Post-Merge-1,DGX_B200-4_GPUs-AutoDeploy-Post-Merge-1"
PR_Github #55073 [ run ] triggered by Bot. Commit:
PR_Github #55073 [ run ] completed with state
/bot run --stage-list "H100_PCIe-AutoDeploy-Post-Merge-1,DGX_B200-AutoDeploy-Post-Merge-1,DGX_B200-4_GPUs-AutoDeploy-Post-Merge-1"
/bot run --stage-list "H100_PCIe-AutoDeploy-Post-Merge-1,DGX_B200-AutoDeploy-Post-Merge-1,DGX_B200-4_GPUs-AutoDeploy-Post-Merge-1"
PR_Github #55152 [ run ] triggered by Bot. Commit:
Let's try this again!
/bot run --stage-list "H100_PCIe-AutoDeploy-Post-Merge-1,DGX_B200-AutoDeploy-Post-Merge-1,DGX_B200-4_GPUs-AutoDeploy-Post-Merge-1"
PR_Github #56127 [ run ] triggered by Bot. Commit:
PR_Github #56127 [ run ] completed with state
CI passed, but only as "partly tested". I don't know what that means, but GitHub seems to be treating it as not a pass. I'll try rerunning without the post-merge stages and see what happens, I guess.
/bot run
/bot run
PR_Github #56281 [ run ] triggered by Bot. Commit:
PR_Github #56282 [ run ] triggered by Bot. Commit:
PR_Github #56281 [ run ] completed with state
PR_Github #56282 [ run ] completed with state
/bot run
PR_Github #56303 [ run ] triggered by Bot. Commit:
PR_Github #56303 [ run ] completed with state
/bot run
PR_Github #56436 [ run ] triggered by Bot. Commit:
/bot kill |
124f6e8 to cd4fe54
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
cd4fe54 to 0644801
PR_Github #56440 [ kill ] triggered by Bot. Commit:
/bot help
GitHub Bot Help
Provide a user-friendly way for developers to interact with a Jenkins server. Run …; see details below for each supported subcommand.
run
Launch build/test pipelines. All previously running jobs will be killed.
kill
Kill all running builds associated with the pull request.
skip
Skip testing for the latest commit on the pull request.
reuse-pipeline
Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
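The subcommands above, plus the run flags used earlier in this thread (--stage-list, --post-merge, --disable-fail-fast), can be modeled as a small command grammar. The following is a hypothetical sketch using Python's argparse, not the bot's actual parser; parse_bot_comment, the stages attribute, and the comma splitting of --stage-list are illustrative assumptions.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical model of the "/bot" comment grammar; not the real bot implementation.
    parser = argparse.ArgumentParser(prog="/bot", add_help=False)
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", add_help=False)
    # Assumed: a single comma-separated string of Jenkins stage names.
    run.add_argument("--stage-list", default=None)
    run.add_argument("--post-merge", action="store_true")
    run.add_argument("--disable-fail-fast", action="store_true")

    # Subcommands that take no flags in this sketch.
    for name in ("kill", "skip", "reuse-pipeline", "help"):
        sub.add_parser(name, add_help=False)
    return parser


def parse_bot_comment(comment: str) -> argparse.Namespace:
    # shlex.split honors the double quotes around the stage list.
    tokens = shlex.split(comment)
    if not tokens or tokens[0] != "/bot":
        raise ValueError("not a /bot comment")
    args = build_parser().parse_args(tokens[1:])
    # Only "run" defines stage_list; other subcommands get an empty list.
    stage_list = getattr(args, "stage_list", None)
    args.stages = stage_list.split(",") if stage_list else []
    return args
```

Under this model, /bot run --post-merge and /bot run --stage-list "..." are the same subcommand with different flags, which is the distinction drawn earlier in the thread between launching the whole post-merge pipeline and running only the three new stages.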
PR_Github #56436 [ run ] completed with state
PR_Github #56440 [ kill ] completed with state
/bot run --stage-list "H100_PCIe-AutoDeploy-Post-Merge-1,DGX_B200-AutoDeploy-Post-Merge-1,DGX_B200-4_GPUs-AutoDeploy-Post-Merge-1"
PR_Github #56443 [ run ] triggered by Bot. Commit:
PR_Github #56443 [ run ] completed with state
Description
Add missing AutoDeploy post-merge Jenkins stages for test-db blocks that already declare stage: post_merge and backend: autodeploy:
- H100_PCIe-AutoDeploy-Post-Merge-1 for l0_h100
- DGX_B200-AutoDeploy-Post-Merge-1 for l0_b200
- DGX_B200-4_GPUs-AutoDeploy-Post-Merge-1 for l0_dgx_b200

These mirror the existing pre-merge AutoDeploy stage configurations for the same YAML files and GPU counts. Without these stages, the corresponding post-merge AutoDeploy test-db blocks are present but not selected by Jenkins unless the same tests also appear in another reachable block.
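The gap described above can be pictured as a lookup: a test-db block only runs if some Jenkins stage selects its YAML file, GPU count, stage, and backend. The sketch below is a hypothetical Python model of that relationship, not the logic in jenkins/L0_Test.groovy; the Block and StageConfig types and the exact-tuple matching rule are assumptions, while the stage names, YAML files, and GPU counts come from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    # One condition block in a test-db YAML file (hypothetical simplified model).
    yml: str
    gpus: int
    stage: str    # "pre_merge" or "post_merge"
    backend: str  # e.g. "autodeploy"


@dataclass(frozen=True)
class StageConfig:
    # One Jenkins stage entry and the block attributes it selects (assumed fields).
    name: str
    yml: str
    gpus: int
    stage: str
    backend: str

    def selects(self, block: Block) -> bool:
        # Assumed rule: every attribute must match exactly.
        return (self.yml, self.gpus, self.stage, self.backend) == (
            block.yml, block.gpus, block.stage, block.backend)


def unreachable(blocks, stages):
    """Return the blocks that no configured stage selects."""
    return [b for b in blocks if not any(s.selects(b) for s in stages)]


BLOCKS = [
    Block("l0_h100", 1, "post_merge", "autodeploy"),
    Block("l0_b200", 1, "post_merge", "autodeploy"),
    Block("l0_dgx_b200", 4, "post_merge", "autodeploy"),
]

NEW_STAGES = [
    StageConfig("H100_PCIe-AutoDeploy-Post-Merge-1", "l0_h100", 1, "post_merge", "autodeploy"),
    StageConfig("DGX_B200-AutoDeploy-Post-Merge-1", "l0_b200", 1, "post_merge", "autodeploy"),
    StageConfig("DGX_B200-4_GPUs-AutoDeploy-Post-Merge-1", "l0_dgx_b200", 4, "post_merge", "autodeploy"),
]
```

With no post-merge stages configured, unreachable(BLOCKS, []) returns all three blocks; with NEW_STAGES it returns an empty list.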
Also remove one duplicate 4-GPU B200 AutoDeploy perf sanity entry from the post-merge block. That test remains in the pre-merge AutoDeploy block, which is also run during post-merge, so keeping it in both places would make it run twice after this PR makes the post-merge block reachable.
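The double-run concern in the paragraph above can be checked by counting how many reachable blocks list each test during post-merge. A minimal sketch, assuming (as that paragraph states) that post-merge runs both the pre-merge and the post-merge AutoDeploy blocks; the test ids are hypothetical placeholders, not the real perf sanity entry, and duplicated_tests is an illustrative helper, not part of the repository.

```python
from collections import Counter


def duplicated_tests(reachable_blocks):
    """Map each test id listed in more than one reachable block to its block count."""
    # set(block) so a test repeated inside one block is counted once for that block.
    counts = Counter(test for block in reachable_blocks for test in set(block))
    return {test: n for test, n in counts.items() if n > 1}


# Placeholder ids standing in for the real 4-GPU B200 entries.
pre_merge = ["perf_sanity_4gpu", "accuracy_a"]
post_merge_before = ["perf_sanity_4gpu", "accuracy_b"]
post_merge_after = ["accuracy_b"]

# Once the post-merge block is reachable, both blocks run during post-merge.
print(duplicated_tests([pre_merge, post_merge_before]))  # {'perf_sanity_4gpu': 2}
print(duplicated_tests([pre_merge, post_merge_after]))   # {}
```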
After enabling the previously unreachable B200 AutoDeploy post-merge blocks, PR CI exposed existing Nano NVFP4 failures:
- accuracy/test_llm_api_autodeploy.py::TestNemotronNanoV3::test_accuracy[nvfp4-1-attn_dp_off-trtllm]: GSM8K accuracy below threshold.
- accuracy/test_llm_api_autodeploy.py::TestNemotronNanoV3::test_accuracy[nvfp4-4-attn_dp_off-trtllm]: timeout / hang during executor shutdown.

This PR temporarily waives those exact tests so the stage wiring fix can land while the accuracy/hang regressions are investigated separately. The comparable pre-move single-GPU test previously ran as test_accuracy[nvfp4-1-trtllm]; the newer pytest ids come from the later attention-DP parameter split.

Test Coverage
- jenkins/L0_Test.groovy and tests/integration/test_lists/test-db/*.yml for AutoDeploy post-merge blocks.
- l0_h100.yml, 1 GPU: 0 overlapping tests.
- l0_b200.yml, 1 GPU: 0 overlapping tests.
- l0_dgx_b200.yml, 4 GPUs: 0 overlapping tests after removing the duplicate perf sanity entry.
- tests/integration/test_lists/test-db/l0_dgx_b200.yml with PyYAML after editing.
- ./scripts/check_test_list.py --check-duplicate-waives
- /bot run --stage-list "H100_PCIe-AutoDeploy-Post-Merge-1,DGX_B200-AutoDeploy-Post-Merge-1,DGX_B200-4_GPUs-AutoDeploy-Post-Merge-1"

No runtime CI was run locally; this is a Jenkins stage mapping, test-list cleanup, and temporary waive change.
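Two properties of the waive list matter in this PR: the new waives target exact pytest ids, and the --check-duplicate-waives option, as its name suggests, looks for tests waived more than once. The sketch below illustrates both under stated assumptions; it is not the implementation of check_test_list.py or of the test harness. It assumes a hypothetical waive-list format in which each non-empty, non-comment line starts with a test id followed by whitespace and a free-form reason, and that waives match by exact string equality; parse_waives, duplicate_waives, and is_waived are illustrative helpers.

```python
from collections import defaultdict

PREFIX = "accuracy/test_llm_api_autodeploy.py::TestNemotronNanoV3::test_accuracy"


def parse_waives(lines):
    """Map each waived test id to the 1-based line numbers where it appears.

    Hypothetical format: "<test id> <free-form reason>"; blank lines and
    lines starting with "#" are ignored.
    """
    seen = defaultdict(list)
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        seen[line.split(maxsplit=1)[0]].append(lineno)
    return dict(seen)


def duplicate_waives(waives):
    """Keep only test ids that are waived on more than one line."""
    return {tid: nums for tid, nums in waives.items() if len(nums) > 1}


def is_waived(node_id, waives):
    # Exact match: a parametrized id must agree character for character.
    return node_id in waives


waive_lines = [
    "# hypothetical waive list",
    PREFIX + "[nvfp4-1-attn_dp_off-trtllm] (GSM8K accuracy below threshold)",
    PREFIX + "[nvfp4-4-attn_dp_off-trtllm] (hang during executor shutdown)",
]
waives = parse_waives(waive_lines)
print(duplicate_waives(waives))                                    # {}
print(is_waived(PREFIX + "[nvfp4-1-attn_dp_off-trtllm]", waives))  # True
print(is_waived(PREFIX + "[nvfp4-1-trtllm]", waives))              # False: older id from before the split
```

Appending waive_lines[1] a second time and re-parsing would make duplicate_waives report that id on lines 2 and 4.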
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If the PR introduces API changes, an appropriate PR label is added: either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.