
Commit e08ba71

cquil11 and claude committed
config(dsv4-fp4 agentic): run offloading=none with expanded concurrency sweeps
Comment out the cpu entries (and the previous none entries) for dsv4-fp4-b200/b300-vllm-agentic and run only offloading=none, with expanded concurrency lists: a dense low-to-mid ramp on the plain-TP entries and a cliff-spanning high-end ramp on the DEP entries (B300 mirrors B200).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
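The two ramps overlap: the plain-TP list tops out at 72 and the DEP list starts at 52, so concurrencies 52, 64 and 72 are measured under both layouts. A minimal Python sketch of that overlap, using the lists copied from the new `offloading: none` entries; `handoff_points` and `is_strictly_increasing` are hypothetical helpers, not part of the config or the benchmark harness.

```python
# Concurrency lists copied from the new offloading=none entries in this commit.
# "Plain-TP" entries set only tp; "DEP" entries also set ep and dp-attn: true.
TP_RAMP = [1, 4, 8, 16, 32, 40, 48, 52, 64, 72]
DEP_RAMP = [52, 64, 72, 84, 100, 128, 196, 256, 512]


def handoff_points(tp_ramp: list[int], dep_ramp: list[int]) -> list[int]:
    """Concurrency values swept under both layouts, sorted ascending."""
    return sorted(set(tp_ramp) & set(dep_ramp))


def is_strictly_increasing(values: list[int]) -> bool:
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    shared = handoff_points(TP_RAMP, DEP_RAMP)
    print("shared points:", shared)  # [52, 64, 72]
    tp_only = set(TP_RAMP) - set(shared)
    dep_only = set(DEP_RAMP) - set(shared)
    print("TP-only range:", min(tp_only), "to", max(tp_only))
    print("DEP-only range:", min(dep_only), "to", max(dep_only))
```

Sweeping the shared points under both layouts gives a direct TP-versus-DEP comparison at the same load.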
1 parent 97576fa commit e08ba71

1 file changed

Lines changed: 18 additions & 10 deletions


.github/configs/nvidia-master.yaml

@@ -9270,11 +9270,14 @@ dsv4-fp4-b200-vllm-agentic:
 agentic-coding:
 - duration: 1800
 search-space:
+# TEMPORARILY COMMENTED OUT — running offloading=none only this iteration.
 # cpu offload only this iteration — none entries already validated in
 # earlier runs (B200 25332045030: TP=8 1..32 + DEP=8 16..128 all 100%).
 # Re-add when investigating regressions in offload=none.
-- { tp: 8, offloading: cpu, conc-list: [16, 32, 64] }
-- { tp: 8, ep: 8, dp-attn: true, offloading: cpu, conc-list: [64, 128, 256] }
+# - { tp: 8, offloading: cpu, conc-list: [16, 32, 64] }
+# - { tp: 8, ep: 8, dp-attn: true, offloading: cpu, conc-list: [64, 128, 256] }
+- { tp: 8, offloading: none, conc-list: [1, 4, 8, 16, 32, 40, 48, 52, 64, 72] }
+- { tp: 8, ep: 8, dp-attn: true, offloading: none, conc-list: [52, 64, 72, 84, 100, 128, 196, 256, 512] }
 
 qwen3.5-fp8-b200-sglang-agentic:
 image: lmsysorg/sglang:nightly-dev-20260422-de962f32
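To gauge what the expanded B200 sweep costs, here is a minimal Python sketch that flattens the search space into individual benchmark runs. It assumes, without confirmation from the harness, that every value in `conc-list` becomes one run lasting `duration` seconds and that runs execute back to back; `Run` and `expand` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Search-space entries for dsv4-fp4-b200-vllm-agentic, transcribed from the diff.
# Python dicts stand in for the YAML flow mappings; an absent key means "not set".
OLD_B200 = [
    {"tp": 8, "offloading": "cpu", "conc-list": [16, 32, 64]},
    {"tp": 8, "ep": 8, "dp-attn": True, "offloading": "cpu", "conc-list": [64, 128, 256]},
]
NEW_B200 = [
    {"tp": 8, "offloading": "none", "conc-list": [1, 4, 8, 16, 32, 40, 48, 52, 64, 72]},
    {"tp": 8, "ep": 8, "dp-attn": True, "offloading": "none",
     "conc-list": [52, 64, 72, 84, 100, 128, 196, 256, 512]},
]
DURATION_S = 1800  # from "- duration: 1800" under agentic-coding


@dataclass(frozen=True)
class Run:  # hypothetical run record, not the harness's own type
    tp: int
    ep: Optional[int]
    dp_attn: bool
    offloading: str
    conc: int


def expand(search_space: list[dict]) -> list[Run]:
    """ASSUMPTION: the harness launches one run per (entry, concurrency) pair."""
    return [
        Run(
            tp=entry["tp"],
            ep=entry.get("ep"),
            dp_attn=entry.get("dp-attn", False),
            offloading=entry["offloading"],
            conc=conc,
        )
        for entry in search_space
        for conc in entry["conc-list"]
    ]


if __name__ == "__main__":
    for label, space in (("before", OLD_B200), ("after", NEW_B200)):
        runs = expand(space)
        # ASSUMPTION: runs execute back to back, so time is runs x duration.
        hours = len(runs) * DURATION_S / 3600
        print(f"{label}: {len(runs)} runs, about {hours:.1f} h of benchmark time")
```

Under those assumptions the B200 sweep grows from 6 to 19 runs, roughly 3.0 h to 9.5 h of measured time, not counting server startup.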
@@ -9416,16 +9419,21 @@ dsv4-fp4-b300-vllm-agentic:
 agentic-coding:
 - duration: 1800
 search-space:
+# TEMPORARILY COMMENTED OUT — running offloading=none only this iteration.
 # cpu offload only this iteration — none entries already validated in
 # earlier runs. Re-add when investigating regressions in offload=none.
-- { tp: 4, offloading: cpu, conc-list: [16, 32, 64] }
-- { tp: 8, offloading: cpu, conc-list: [16, 32, 64] }
-- { tp: 4, ep: 4, dp-attn: true, offloading: cpu, conc-list: [64, 128, 256] }
-- { tp: 8, ep: 8, dp-attn: true, offloading: cpu, conc-list: [128, 256, 512] }
-- { tp: 4, offloading: none, conc-list: [16, 32, 64] }
-- { tp: 8, offloading: none, conc-list: [16, 32, 64] }
-- { tp: 4, ep: 4, dp-attn: true, offloading: none, conc-list: [64, 128, 256] }
-- { tp: 8, ep: 8, dp-attn: true, offloading: none, conc-list: [128, 256, 512] }
+# - { tp: 4, offloading: cpu, conc-list: [16, 32, 64] }
+# - { tp: 8, offloading: cpu, conc-list: [16, 32, 64] }
+# - { tp: 4, ep: 4, dp-attn: true, offloading: cpu, conc-list: [64, 128, 256] }
+# - { tp: 8, ep: 8, dp-attn: true, offloading: cpu, conc-list: [128, 256, 512] }
+# - { tp: 4, offloading: none, conc-list: [16, 32, 64] }
+# - { tp: 8, offloading: none, conc-list: [16, 32, 64] }
+# - { tp: 4, ep: 4, dp-attn: true, offloading: none, conc-list: [64, 128, 256] }
+# - { tp: 8, ep: 8, dp-attn: true, offloading: none, conc-list: [128, 256, 512] }
+- { tp: 4, offloading: none, conc-list: [1, 4, 8, 16, 32, 40, 48, 52, 64, 72] }
+- { tp: 8, offloading: none, conc-list: [1, 4, 8, 16, 32, 40, 48, 52, 64, 72] }
+- { tp: 4, ep: 4, dp-attn: true, offloading: none, conc-list: [52, 64, 72, 84, 100, 128, 196, 256, 512] }
+- { tp: 8, ep: 8, dp-attn: true, offloading: none, conc-list: [52, 64, 72, 84, 100, 128, 196, 256, 512] }
 
 gptoss-fp4-b200-vllm-agentic:
 image: vllm/vllm-openai:v0.22.0
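The message says B300 mirrors B200, yet B300 keeps both TP=4 and TP=8 variants. One reading is that each B300 entry reuses the B200 ramp for its layout. A short Python check of that reading follows; `layout` and `mirrors_b200` are hypothetical helpers, and treating any entry with `dp-attn: true` as DEP is an assumption drawn from the commit message, not something the config states.

```python
# New offloading=none search space for dsv4-fp4-b300-vllm-agentic, from the diff.
NEW_B300 = [
    {"tp": 4, "offloading": "none", "conc-list": [1, 4, 8, 16, 32, 40, 48, 52, 64, 72]},
    {"tp": 8, "offloading": "none", "conc-list": [1, 4, 8, 16, 32, 40, 48, 52, 64, 72]},
    {"tp": 4, "ep": 4, "dp-attn": True, "offloading": "none",
     "conc-list": [52, 64, 72, 84, 100, 128, 196, 256, 512]},
    {"tp": 8, "ep": 8, "dp-attn": True, "offloading": "none",
     "conc-list": [52, 64, 72, 84, 100, 128, 196, 256, 512]},
]
# B200 ramps keyed by layout, copied from the B200 hunk.
B200_RAMPS = {
    "TP": [1, 4, 8, 16, 32, 40, 48, 52, 64, 72],
    "DEP": [52, 64, 72, 84, 100, 128, 196, 256, 512],
}


def layout(entry: dict) -> str:
    """ASSUMPTION: an entry is DEP when dp-attn is set, plain TP otherwise."""
    return "DEP" if entry.get("dp-attn", False) else "TP"


def mirrors_b200(search_space: list[dict]) -> bool:
    """True if every entry sweeps exactly the B200 ramp for its layout."""
    return all(entry["conc-list"] == B200_RAMPS[layout(entry)] for entry in search_space)


if __name__ == "__main__":
    print("B300 mirrors B200:", mirrors_b200(NEW_B300))
    print("runs:", sum(len(entry["conc-list"]) for entry in NEW_B300))
```

If each concurrency value is one run, the four B300 entries add up to 38 runs, compared with 19 for B200, because B300 sweeps both TP sizes.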
