Commit 93e1b3c

feat: multinode first class integration (#251)
* adding initial changes to master configs; adding initial updates to validation logic and config parser
* adding new gb200 script
* adding integration to gb200 runner script and workflow files
* revert and correct name of 1k1k scheduler workflow
* adding runners.yaml to workflow invocation
* toJson on conc since it is now a list
* correctly sending conc list to multnode
* hotfix
* correct env var to MAX batch size
* set -x
* debugging with dynmao fork
* debugging with dynmao fork pt 2
* experiment
* adding separate script for launching
* changing filenames
* ntasks per node
* making the spec-decoding output required
* updating ntasks per node gp
* test
* test
* conc list quoted
* get rid of debug code
* testing support for dsr1
* testing support for dsr1 test
* testing support for dsr1 test
* testing support for dsr1 test
* testing support for dsr1 test
* testing
* some changes to generate sweeps
* testing and debugging
* adding new file code for sglang
* adding new file code for sglang
* changing file path
* updating multinode fn hash
* updating multinode fn hash
* dynamo trtllm to dynamo trt
* changing process result
* add is multinode
* bug fix
* bug fix
* bug fix
* bug fix
* polishing
* polishing pt 2
* polishing pt 3
* polishing pt 4
* fixing summarize.py
* polishing
* testing
* testing
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding tests
* adding tests
* adding tests
* adding tests
* adding tests
* adding tests
* adding tests
* add updates for newest gb200 merge
* add updates for newest gb200 merge pt 2
* move ntasks per node to framework level instead of runner level
* nexp hard coded to 1:
* add AMD configs to full sweep
* shut the line counter workflow up haha
* shut the line counter workflow up haha
* shut the line counter workflow up haha pt 2
* updating testing logic
* add model prefix to label validator
* add more descriptive name to tests
* update test for process results
* add script mode
* fix bug
* sglang: add fp8 8k1k and fp4 1k1k (#274)
* go
* typo
* typo...
* more
* Revert "sglang: add fp8 8k1k and fp4 1k1k (#274)" (#283) This reverts commit efcb4e4.
* get rid of ntasks per node required env var for sglang
* bug fix
* bug fix missing amd
* bug fix missing amd pt 2
* add served model name to summary
* add served model name to summary pt 2
* add served model name to summary pt 3
* fix max model len bug
* add readme
* add image to json result

---------

Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
1 parent 598eb8b commit 93e1b3c

31 files changed

Lines changed: 4724 additions & 4013 deletions

.github/configs/amd-master.yaml

Lines changed: 7 additions & 0 deletions
@@ -5,6 +5,7 @@ dsr1-fp4-mi355x-sglang:
   runner: mi355x
   precision: fp4
   framework: sglang
+  multinode: false
   seq-len-configs:
   - isl: 1024
     osl: 1024
@@ -27,6 +28,7 @@ dsr1-fp8-mi300x-sglang:
   runner: mi300x
   precision: fp8
   framework: sglang
+  multinode: false
   seq-len-configs:
   - isl: 1024
     osl: 1024
@@ -48,6 +50,7 @@ dsr1-fp8-mi325x-sglang:
   runner: mi325x
   precision: fp8
   framework: sglang
+  multinode: false
   seq-len-configs:
   - isl: 1024
     osl: 1024
@@ -69,6 +72,7 @@ dsr1-fp8-mi355x-sglang:
   runner: mi355x
   precision: fp8
   framework: sglang
+  multinode: false
   seq-len-configs:
   - isl: 1024
     osl: 1024
@@ -90,6 +94,7 @@ gptoss-fp4-mi300x-vllm:
   runner: mi300x
   precision: fp4
   framework: vllm
+  multinode: false
   seq-len-configs:
   - isl: 1024
     osl: 1024
@@ -120,6 +125,7 @@ gptoss-fp4-mi325x-vllm:
   runner: mi325x
   precision: fp4
   framework: vllm
+  multinode: false
   seq-len-configs:
   - isl: 1024
     osl: 1024
@@ -150,6 +156,7 @@ gptoss-fp4-mi355x-vllm:
   runner: mi355x
   precision: fp4
   framework: vllm
+  multinode: false
   seq-len-configs:
   - isl: 1024
     osl: 1024
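
The hunks above add an explicit `multinode: false` to every AMD entry, and the commit message mentions updates to validation logic and the config parser. As a rough illustration of what such a check might look like, here is a minimal sketch. It is not the repository's actual validator: the function names `validate_entry` and `split_by_multinode`, and the choice of required keys, are assumptions made for this example, and the entries are plain dicts standing in for parsed YAML.

```python
from typing import Any

# Assumed set of required keys, taken from the fields visible in the diff.
REQUIRED_KEYS = ("runner", "precision", "framework", "multinode")


def validate_entry(name: str, entry: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one config entry (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in entry:
            problems.append(f"{name}: missing required key '{key}'")
    # Strict type check: YAML `false` parses to a bool, but a quoted
    # "false" would arrive as a non-empty (truthy) string.
    if "multinode" in entry and not isinstance(entry["multinode"], bool):
        kind = type(entry["multinode"]).__name__
        problems.append(f"{name}: 'multinode' must be a boolean, got {kind}")
    return problems


def split_by_multinode(configs: dict[str, dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Partition validated entry names into (single-node, multinode)."""
    single, multi = [], []
    for name, entry in configs.items():
        (multi if entry["multinode"] else single).append(name)
    return single, multi


# Two entries mirroring the fields shown in the diff.
configs = {
    "dsr1-fp4-mi355x-sglang": {
        "runner": "mi355x", "precision": "fp4",
        "framework": "sglang", "multinode": False,
    },
    "gptoss-fp4-mi300x-vllm": {
        "runner": "mi300x", "precision": "fp4",
        "framework": "vllm", "multinode": False,
    },
}

for entry_name, entry in configs.items():
    for problem in validate_entry(entry_name, entry):
        print(problem)

single_node, multi_node = split_by_multinode(configs)
```

Making the key required rather than defaulting it to `false` would explain why the commit touches every existing entry: each config then states its mode explicitly, and a missing or mistyped value fails validation instead of silently selecting the single-node path.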