Sm90 mega moe on sgl dev#36
@qiushixiaoyu Can we upstream this change to the original DeepGEMM, so that we can use it more conveniently in the future?
@Fridge003
When --run-low-latency-baseline is enabled, will there be a performance degradation?
067fc03 to 78772d1
78772d1 to fce68b3
I don’t think so. The flag is only for comparing performance against the low-latency baseline. While testing, I found that performance at small batch sizes is not very stable. I’m still investigating it.
@qiushixiaoyu Can you share your SGLang launch command? I tried this PR, but SGLang's output is erroneous. My command:
SGLANG_DSV4_FP4_EXPERTS=0 \
sglang serve \
--trust-remote-code \
--model-path /data1/DeepSeek-V4-Flash-FP8 \
--tp 8 \
--moe-a2a-backend megamoe \
--tool-call-parser deepseekv4 \
--reasoning-parser deepseek-v4 \
--host 0.0.0.0 \
--port 8055
@yz-tang I still have an SGLang change PR that hasn’t been merged yet. |
MegaMoE SM90 Perf Summary
Flash vs normal baseline
Flash vs low-latency baseline
Pro vs normal baseline
Pro vs low-latency baseline
Benchmark DeepSeekV4Flash
OP Accuracy
Correctness: 28 scenarios PASS, max diff 0.0006
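For readers unfamiliar with this kind of report, a "max diff" figure is typically the largest element-wise absolute difference between the kernel output and a reference output, taken over all scenarios. The PR does not show how its 28 scenarios are checked, so the following is only a minimal sketch of that idea. The function names, the scenario names, the sample values, and the 1e-3 tolerance are all hypothetical and not taken from this PR.

```python
def max_abs_diff(actual, expected):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(actual) != len(expected):
        raise ValueError("length mismatch between actual and expected outputs")
    return max((abs(a - e) for a, e in zip(actual, expected)), default=0.0)


def run_scenarios(scenarios, tol):
    """Check every scenario against a tolerance.

    scenarios maps a name to an (actual, expected) pair.
    Returns (all_passed, worst_diff_over_all_scenarios).
    """
    all_passed = True
    worst = 0.0
    for name, (actual, expected) in scenarios.items():
        diff = max_abs_diff(actual, expected)
        worst = max(worst, diff)
        passed = diff <= tol
        all_passed = all_passed and passed
        print(f"{name}: {'PASS' if passed else 'FAIL'}, max diff {diff:.4f}")
    return all_passed, worst


# Hypothetical data: in a real test, `actual` would be the MegaMoE kernel output
# and `expected` a reference implementation's output for the same inputs.
scenarios = {
    "tokens=1": ([0.5000, -1.2500], [0.5004, -1.2500]),
    "tokens=8": ([2.0, 3.0], [2.0, 3.0]),
}
ok, worst = run_scenarios(scenarios, tol=1e-3)  # tolerance is an assumption
```

A single worst-case number across all scenarios is convenient for a summary line, but per-scenario output like the above makes it easier to see which shape or batch size is closest to the tolerance.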
E2E Accuracy