[ci] add Qwen3.5 Dense/MoE models accuracy validation and benchmark tests for atom-plugined sglang #700
Open
wanzhenchn wants to merge 8 commits into
Since we are adding Qwen3.5-397B-A17B-FP8 TP4/TP8 to the benchmark, how about also adding these benchmark model configs to the nightly check, so that the accuracy of the benchmark cases is verified? The nightly run needs to cover every case.
Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
Pull request overview
This PR expands ATOM’s SGLang CI and nightly pipelines to cover additional Qwen3.5 Dense/MoE models, aligning launch/accuracy/benchmark settings and adding scheduled benchmark rotation to reduce nightly load.
Changes:
- Add Qwen3.5 model coverage to PR CI GSM8K smoke tests and nightly GSM8K accuracy validation.
- Add Qwen3.5-397B benchmark model configs and implement weekday-based benchmark group rotation (DeepSeek / Qwen / All).
- Add support for overriding default SGLang server args via SGLANG_DEFAULT_SERVER_ARGS in the shared SGLang test script.
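The override described in the last bullet could look roughly like the sketch below. It is an illustration only, not the contents of .github/scripts/atom_sglang_test.sh: the names FALLBACK_ARGS and build_server_args are hypothetical, and the fallback string simply reuses the baseline flags quoted in the PR description.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: resolve baseline server args, letting the caller override them.
# FALLBACK_ARGS reuses the baseline flags quoted in this PR; the variable and
# function names are illustrative and not taken from the real script.
FALLBACK_ARGS="--trust-remote-code --kv-cache-dtype fp8_e4m3 --mem-fraction-static 0.8 --page-size 1 --disable-radix-cache"

build_server_args() {
  local tp="$1"
  # Use SGLANG_DEFAULT_SERVER_ARGS when it is set and non-empty, else the fallback.
  local base="${SGLANG_DEFAULT_SERVER_ARGS:-$FALLBACK_ARGS}"
  # Append the TP flag only if the baseline does not already carry one.
  case "$base" in
    *--tensor-parallel-size*) printf '%s\n' "$base" ;;
    *) printf '%s --tensor-parallel-size %s\n' "$base" "$tp" ;;
  esac
}
```

With the variable unset, build_server_args 8 yields the fallback flags plus --tensor-parallel-size 8. With SGLANG_DEFAULT_SERVER_ARGS exported, the exported string is used as the baseline, and the TP flag is appended only if that string lacks one.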
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| recipes/atom_sglang/Qwen3_5.md | Updates example benchmark + GSM8K commands for Qwen3.5 workflows. |
| .github/workflows/atom-sglang-test.yaml | Adds Qwen3.5-35B-A3B-FP8 TP2 to PR-level SGLang GSM8K smoke CI. |
| .github/workflows/atom-sglang-benchmark.yaml | Adds Qwen3.5 benchmark toggles and rotates scheduled benchmark model groups by weekday. |
| .github/workflows/atom-sglang-accuracy-validation.yaml | Adds Qwen3.5 models to nightly accuracy matrix and to manual dispatch toggles. |
| .github/scripts/atom_sglang_test.sh | Introduces SGLANG_DEFAULT_SERVER_ARGS to control baseline server args per model family. |
| .github/benchmark/sglang_models_accuracy.json | Adds Qwen3.5 entries for dashboard accuracy thresholds/baselines. |
| .github/benchmark/sglang_benchmark_models.json | Adds Qwen3.5-397B benchmark models and nightly grouping metadata. |
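The weekday-based rotation mentioned for atom-sglang-benchmark.yaml can be pictured with a small sketch. The group names A-DEEPSEEK, B-QWEN35, and C-ALL appear in the PR description, but the weekday-to-group mapping below is an assumption made for illustration, as is the function name select_benchmark_group.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of weekday-based benchmark group rotation.
# Group names come from the PR description; the weekday mapping is assumed.
select_benchmark_group() {
  local dow="$1"   # ISO weekday: 1 = Monday ... 7 = Sunday (as printed by `date -u +%u`)
  case "$dow" in
    1|3|5) echo "A-DEEPSEEK" ;;
    2|4|6) echo "B-QWEN35" ;;
    7)     echo "C-ALL" ;;
    *)     echo "invalid weekday: $dow" >&2; return 1 ;;
  esac
}

# A scheduled run would pass the current UTC weekday.
group="$(select_benchmark_group "$(date -u +%u)")"
echo "benchmark group: ${group}"
```

Keeping the mapping in one function makes it easy for a manual dispatch to bypass the rotation and pick a group explicitly.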
Comments suppressed due to low confidence (1)
recipes/atom_sglang/Qwen3_5.md:92
- In the GSM8K lm_eval example, base_url=http://localhost:30000/... doesn’t match the server port shown earlier in this doc (--port 8000). To avoid confusion, the accuracy command should use the same port as the server launch command (or the server command should be updated to match).
lm_eval --model local-completions \
--model_args model=${model_path},base_url=http://localhost:30000/v1/completions,num_concurrent=65,max_retries=1,tokenized_requests=False,trust_remote_code=True \
--tasks gsm8k \
--num_fewshot 3 \
--trust_remote_code
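One way to avoid the port mismatch flagged above is to define the host and port once and derive the client URL from them. The sketch below is a dry run that only prints the resulting lm_eval command. The variable names SERVER_HOST, SERVER_PORT, BASE_URL, and MODEL_ARGS are illustrative, and the model path default is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: derive the lm_eval base_url from a single host/port definition so the
# server launch and the accuracy client cannot drift apart.
# All variable names are illustrative; the model path default is a placeholder.
model_path="${model_path:-/path/to/model}"
SERVER_HOST="${SERVER_HOST:-localhost}"
SERVER_PORT="${SERVER_PORT:-8000}"   # the port the review says the doc's launch command uses
BASE_URL="http://${SERVER_HOST}:${SERVER_PORT}/v1/completions"

# Same model_args as the example above, with only base_url parameterized.
MODEL_ARGS="model=${model_path},base_url=${BASE_URL},num_concurrent=65,max_retries=1,tokenized_requests=False,trust_remote_code=True"

# Dry run: print the command instead of executing it.
echo "lm_eval --model local-completions --model_args ${MODEL_ARGS} --tasks gsm8k --num_fewshot 3 --trust_remote_code"
```

The server launch command would reuse the same SERVER_PORT value for its --port flag, so a port change is made in one place.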
Comment on lines 55 to +59
python3 -m sglang.bench_serving --backend sglang-oai-chat \
--model ${model_path} \
--base-url=http://127.0.0.1:30000 \
--max-concurrency 16 \
--num-prompts "$(( CONC * 5 ))" \
Motivation
- Update recipes/atom_sglang/Qwen3_5.md with server, benchmark, and GSM8K commands.

ATOM SGLang CI / Nightly / Benchmark Scope

CI
- Workflow: .github/workflows/atom-sglang-test.yaml
- Trigger: pull requests to main, non-draft, non-closed

| Model | Runner | GSM8K threshold |
|---|---|---|
| deepseek-ai/DeepSeek-R1-0528 | linux-atom-mi35x-4 | 0.91 |
| amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | linux-atom-mi35x-4 | 0.91 |
| Qwen/Qwen3.5-35B-A3B-FP8 | linux-atom-mi35x-4 | 0.76 |

Nightly Accuracy
- Workflow: .github/workflows/atom-sglang-accuracy-validation.yaml
- Schedule: 18:00 UTC / Beijing 02:00, or manual dispatch
- Task: gsm8k
- Metric: results.gsm8k["exact_match,flexible-extract"]
- num_fewshot 3, num_concurrent 65, max_retries 1
- v0.5.10

| Model | Runner | GSM8K threshold |
|---|---|---|
| deepseek-ai/DeepSeek-R1-0528 | linux-atom-mi35x-4 | 0.91 |
| deepseek-ai/DeepSeek-R1-0528 | linux-atom-mi35x-8 | 0.93 |
| amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | linux-atom-mi35x-4 | 0.91 |
| amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | linux-atom-mi35x-8 | 0.93 |
| Qwen/Qwen3.5-35B-A3B-FP8 | linux-atom-mi35x-4 | 0.76 |
| Qwen/Qwen3.5-35B-A3B | linux-atom-mi35x-4 | 0.83 |
| Qwen/Qwen3.5-397B-A17B-FP8 | linux-atom-mi35x-4 | 0.83 |
| Qwen/Qwen3.5-397B-A17B-FP8 | linux-atom-mi35x-8 | 0.83 |

Server Args
- --trust-remote-code --kv-cache-dtype fp8_e4m3 --mem-fraction-static 0.8 --page-size 1 --disable-radix-cache
- --tensor-parallel-size <tp>; EP case adds --expert-parallel-size 8
- AITER_QUICK_REDUCE_QUANTIZATION=INT4, SGLANG_AITER_FP8_PREFILL_ATTN=0, SGLANG_USE_AITER=1, ATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1
- SGLANG_DEFAULT_SERVER_ARGS=--tensor-parallel-size <tp> --mem-fraction-static 0.9 --reasoning-parser qwen3 --disable-radix-cache
- SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.models, ATOM_ENABLE_QK_NORM_ROPE_CACHE_QUANT_FUSION=0

Nightly Benchmark
- Workflow: .github/workflows/atom-sglang-benchmark.yaml
- Schedule: 15:00 UTC / Beijing 23:00, or manual dispatch
- param_lists, publish_to_dashboard

| Group | Cases |
|---|---|
| A-DEEPSEEK | 5 × 10 = 50 |
| B-QWEN35 | 2 × 10 = 20 |
| C-ALL | 7 × 10 = 70 |

- 4, 8, 16, 32, 64; 0.8
- 4, 8, 16, 32, 64; 0.8

| Model | Runner | Server args |
|---|---|---|
| deepseek-ai/DeepSeek-R1-0528 | atom-mi355-8gpu-aac-runner | --trust-remote-code --tensor-parallel-size 8 |
| deepseek-ai/DeepSeek-R1-0528 | atom-mi355-8gpu-aac-runner | --trust-remote-code --tensor-parallel-size 4 |
| amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | atom-mi355-8gpu-aac-runner | --trust-remote-code --tensor-parallel-size 8 |
| amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | atom-mi355-8gpu-aac-runner | --trust-remote-code --tensor-parallel-size 4 |
| amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 | atom-mi355-8gpu-aac-runner | --trust-remote-code --tensor-parallel-size 8 --expert-parallel-size 8 |
| Qwen/Qwen3.5-397B-A17B-FP8 | atom-mi355-8gpu-aac-runner | --tensor-parallel-size 4 --mem-fraction-static 0.9 --reasoning-parser qwen3 --disable-radix-cache |
| Qwen/Qwen3.5-397B-A17B-FP8 | atom-mi355-8gpu-aac-runner | --tensor-parallel-size 8 --mem-fraction-static 0.9 --reasoning-parser qwen3 --disable-radix-cache |
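The per-group case counts quoted above (50, 20, 70) can be reproduced with a short sketch. It assumes that the factor of 10 per model is 2 parameter sets times the 5 concurrency levels 4, 8, 16, 32, 64. That decomposition is an inference from the numbers in this description, not something the PR states, and the names PARAM_SETS and cases_for_group are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the per-group benchmark case counts.
# Assumption: the "10" per model is 2 parameter sets x 5 concurrency levels.
CONCURRENCIES=(4 8 16 32 64)
PARAM_SETS=2   # assumed, one per concurrency list in the description

cases_for_group() {
  local num_models="$1"
  echo $(( num_models * PARAM_SETS * ${#CONCURRENCIES[@]} ))
}

echo "A-DEEPSEEK: $(cases_for_group 5)"   # 5 DeepSeek configs
echo "B-QWEN35:   $(cases_for_group 2)"   # 2 Qwen3.5 configs
echo "C-ALL:      $(cases_for_group 7)"   # all 7 configs
```

Under that assumption, rotating between the two smaller groups on most nights keeps the run at 50 or 20 cases instead of the full 70.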