Commit 93e1b3c
feat: multinode first class integration (#251)
* adding initial changes to master configs; adding initial updates to validation logic and config parser
* adding new gb200 script
* adding integration to gb200 runner script and workflow files
* revert and correct name of 1k1k scheduler workflow
* adding runners.yaml to workflow invocation
* toJson on conc since it is now a list
* correctly sending conc list to multinode
* hotfix
* correct env var to MAX batch size
* set -x
* debugging with dynamo fork
* debugging with dynamo fork pt 2
* experiment
* adding separate script for launching
* changing filenames
* ntasks per node
* making the spec-decoding output required
* updating ntasks per node
gp
* test
* test
* conc list quoted
* get rid of debug code
* testing support for dsr1
* testing support for dsr1 test
* testing support for dsr1 test
* testing support for dsr1 test
* testing support for dsr1 test
* testing
* some changes to generate sweeps
* testing and debugging
* adding new file code for sglang
* adding new file code for sglang
* changing file path
* updating multinode fn hash
* updating multinode fn hash
* dynamo trtllm to dynamo trt
* changing process result
* add is multinode
* bug fix
* bug fix
* bug fix
* bug fix
* polishing
* polishing pt 2
* polishing pt 3
* polishing pt 4
* fixing summarize.py
* polishing
* testing
* testing
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding testing workflows
* adding tests
* adding tests
* adding tests
* adding tests
* adding tests
* adding tests
* adding tests
* add updates for newest gb200 merge
* add updates for newest gb200 merge pt 2
* move ntasks per node to framework level instead of runner level
* nexp hard-coded to 1
* add AMD configs to full sweep
* shut the line counter workflow up haha
* shut the line counter workflow up haha
* shut the line counter workflow up haha pt 2
* updating testing logic
* add model prefix to label validator
* add more descriptive name to tests
* update test for process results
* add script mode
* fix bug
* sglang: add fp8 8k1k and fp4 1k1k (#274)
* go
* typo
* typo...
* more
* Revert "sglang: add fp8 8k1k and fp4 1k1k (#274)" (#283)
This reverts commit efcb4e4.
* get rid of ntasks per node required env var for sglang
* bug fix
* bug fix missing amd
* bug fix missing amd pt 2
* add served model name to summary
* add served model name to summary pt 2
* add served model name to summary pt 3
* fix max model len bug
* add readme
* add image to json result
---------
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
1 parent 598eb8b commit 93e1b3c
31 files changed
Lines changed: 4724 additions & 4013 deletions
File tree
- .github
- configs
- workflows
- benchmarks
- runners
- utils
- matrix-logic
- matrix_logic