@@ -9420,11 +9420,13 @@ dsv4-fp4-b300-vllm-agentic:
   agentic-coding:
     - duration: 1800
       search-space:
-        - { tp: 4, offloading: none, conc-list: [1, 4, 8, 16, 32] }
-        - { tp: 8, offloading: none, conc-list: [1, 4, 8, 16, 32, 40, 48, 52, 64, 72] }
-        - { tp: 4, ep: 4, dp-attn: true, offloading: none, conc-list: [8, 16, 32, 64, 128] }
+        # TEMPORARY: run only native CPU-offload scenarios while diagnosing
+        # asynchronous CUDA failures.
+        # - { tp: 4, offloading: none, conc-list: [1, 4, 8, 16, 32] }
+        # - { tp: 8, offloading: none, conc-list: [1, 4, 8, 16, 32, 40, 48, 52, 64, 72] }
+        # - { tp: 4, ep: 4, dp-attn: true, offloading: none, conc-list: [8, 16, 32, 64, 128] }
         - { tp: 4, ep: 4, dp-attn: true, offloading: cpu, conc-list: [32, 48, 64, 96, 128, 192, 256] }
-        - { tp: 8, ep: 8, dp-attn: true, offloading: none, conc-list: [52, 64, 72, 84, 100, 128, 196, 256, 512] }
+        # - { tp: 8, ep: 8, dp-attn: true, offloading: none, conc-list: [52, 64, 72, 84, 100, 128, 196, 256, 512] }
 
 gptoss-fp4-b200-vllm-agentic:
   image: vllm/vllm-openai:v0.22.0