Commit 64af5a7
Adding evals after throughput benchmarks (#258)
* initial poc
* remove -d flag when launching docker container
* syntax error
* compatibility fixes
* add correct endpoint prefix
* remove reference env var
* run vllm serve in background
* unescape sequences
* stop vllm to stdout after it stops
* stop vllm to stdout after it stops pt 2
* get rid of docker stop as no longer in detached mode
* clone bench serving to tmp dir
* clone bench serving to tmp dir pt 2
* add explanatory comment
* cleaning up
* cleaning up
* adding mi355x refactor
* adding h200 initial refactor
* different way to see server logs
* cleanup
* now fail if server fails
* starting on b200
* doing b200
* reverting erroneous change
* fixing b200
* fixing b200 pt 2
* updating mi300
* updating mi300 pt 2
* updating mi300 pt 3 -- remove detached mode
* cleaning up mi355x
* fixing mi300x and updating 325x
* reverting max conc to 512 on gptoss fp4 b200 docker
* mi325x debug
* add back correct launch script for new mi325x slurm cluster (#231)
* fixing mi300x and updating 325x
* cleaning up
* add wait for h200 slurm dsr1
* max num seqs back to 512 for gptoss fp4 b200 docker
* fix port issue for dsr1 mi300x docker
* fix mi355x docker NUM_PROMPTS
* adding prop of failure for server logs
* add utils function for benchmark
* add utils function for benchmark
* function-ize the waiting for server to start
* don't show arg parsing set -x
* don't show arg parsing set +x oops
* don't show arg parsing set +x oops
* capture server pid
* Squash-merge bryan/eval into refactor-docker-runner-launch
* evals h100-cr
* evals h100-cw
* evals h200-nb
* move eval script here
* evals mi300x-amd
* evals mi325x-amd
* evals mi300x-tw
* evals mi300x-oci
* evals mi325x-tw
* evals mi325x-tw summary
* evals mi325x-tw summary
* evals mi355x-amd
* evals mi325x-tw summary
* evals mi325x-tw summary
* evals mi325x-tw summary
* all summary
* evals b200-nvd
* evals b200-nvd 2
* evals b200-nvd 3
* evals h100-cr
* evals b200-nvd 1
* evals h200-trt-cw
* evals h200-trt-cw 2
* evals h200-trt-cw 3
* evals h100-cr 2
* evals h200-trt-cw 4
* evals h200-trt-cw 5 (EP/TP HARD)
* evals h200-trt-cw 6 (EP/TP HARD)
* evals h200-trt-cw 6 (EP/TP HARD)
* evals h200-cw dsr1
* evals mi300x-cr dsr1
* evals mi300x-cr dsr1 2
* evals mi325x-cr dsr1
* evals mi325x-cr dsr1 2
* evals mi355x-amd dsr1
* evals mi355x-amd dsr1 2
* evals mi355x-amd dsr1 3
* evals mi355x-amd dsr1 4
* evals b200-nvd dsr1
* evals b200-nvd fp8 dsr1
* Lighteval 1
* Lighteval 1.75
* Lighteval Mi325x
* Lighteval Mi300x CR
* Lighteval Mi355x amd
* Lighteval b200_nvd
* Lighteval h200_cr0
* Lighteval h200-nb_1
* Lighteval h100-cw_1
* Error reproduction
* Error file removal
* error reproducibility
* should NOT error reproduce
* should NOT error reproduce
* should NOT error reproduce
* should NOT error reproduce
* Double check other runner
* Cleanup MI300x_AMD
* Cleanup MI300x_AMD
* Cleanup MI300x_AMD
* Cleanup MI300x_AMD MUST WORK
* works
* Working lighteval
* lighteval fix
* lighteval test h100-cw_1
* lighteval test h100-cr_1 + parsing
* lighteval test b200_nvd
* lighteval test b200_nvd
* lighteval test mi300x-amd_0
* lighteval test h100-cw_1
* lighteval test mi300x-cr_0
* lighteval test mi325x-tw_1
* lighteval test mi355x-amd_4
* lighteval test b200-nvd_3
* lighteval test h100-cw_1 sudo test
* b200 fix check
* b200 fix check
* b200 fix check
* b200 fix check
* b200 fix check
* b200 fix check
* b200 fix check
* b200 fix check
* b200 fix check
* Preliminary lighteval for all
* Preliminary lighteval for all 2 - fixed TP
* Preliminary lighteval for all 3
* Fix lighteval 1
* Check both
* lm-eval check
* lm-eval check
* lm-eval check
* lm-eval optimization
* mi325x test
* mi325x test
* all change, test deepseek
* all change, test deepseek
* retest mi325x
* test b200
* clean b200
* test h200
* H200 test
* B200-nvd2 sleep
* B200-nvd2 sleep
* B200-nvd2 sleep
* mi325x test
* mi325x test, no text, no empty fix
* h100, tmp eval_out
* h100, tmp eval_out, sweep integration
* touch up sweep naming, remove funny triton error
* touch up sweep summary
* touch up run name
* Missing eval env var docker
* Typo
* Add proper coverage
* Add evals
* Cam's solution
* b200 scancel fix
* Change to 2 fewshot, forgot eval env var in b200
* Resolve issues
* Resolve issues/nits
* fix summary table hardware
* fix summary table hardware
* fix summary table hardware 2
* final touches
* Cleanup comments, amend lighteval
* pt 1 manual merge conflict fixes
* pt 2 manual merge conflict fixes
* use double quotes for gha parsing
* getting rid of full sweep sched changes
* add back spec decoding and disagg env vars
* add an option to ONLY run evals
* remove full-sweep-test workflow and add collect-evals job to run sweep and e2e test
* add run-eval to e2e tests
* math500 prompt and h200 trt evals
* remove run prefix
* add result-prefix to benchmark tmpl uploaded artifacts
* Evals summary refactor
* Evals summary refactor 2
* Evals summary aesthetics
* TRT package fix, trt testing
* trt testing 2
* max_num_tokens
* unbounded gen len
* Fix tmpl args, add isl/osl to table
* add isl/osl
* set max tokens
* remove nvd
* In case of multiple evals
* diagnostic
* test dp_attn
* DP_ATTENTION back
* REMOVE LIGHTEVAL
* Add evals for atom, trt_mtp
* remove tokenizer from benchmarkserving
* remove model_name
* More evals for spec decode
* claude pr comments
* chore(deps): bump the github-actions group with 2 updates (#488)
* fix: update ep metadata in gb200 dynamo sglang configs to match comments (#486)
Update ep values to use the formula: EP = (NODES × 4 GPUs) / num-workers
for both dsr1-fp8-gb200-dynamo-sglang and dsr1-fp4-gb200-dynamo-sglang
configurations.
The metadata isn't used by sglang dynamo scripts (values are hardcoded),
but the frontend uses these values.
Fixes #485
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
* Experimental folder (increasing researcher/developer velocity) (#489)
* summary table
* Remove git installation and repository cloning
Removed git installation check and cloning of bench_serving repository.
* evals final
* more retries, lower conc, for stability
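The EP formula in the #486 entry above (EP = (NODES × 4 GPUs) / num-workers) is plain arithmetic; a minimal Python sketch of it follows. The function name `expert_parallel_size`, its parameters, and the even-divisibility check are hypothetical and for illustration only, not code from the repository; the default of 4 GPUs per node is taken directly from the formula.

```python
def expert_parallel_size(nodes: int, num_workers: int, gpus_per_node: int = 4) -> int:
    """Hypothetical helper: EP = (nodes * gpus_per_node) / num_workers.

    Assumption: the GPU pool must split evenly across workers, so a
    non-integer EP is rejected instead of being rounded.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    total_gpus = nodes * gpus_per_node  # 4 GPUs per node, per the #486 formula
    if total_gpus % num_workers != 0:
        raise ValueError(
            f"{total_gpus} GPUs cannot be split evenly across {num_workers} workers"
        )
    return total_gpus // num_workers


if __name__ == "__main__":
    # 2 nodes x 4 GPUs shared by 1 worker gives EP = 8.
    print(expert_parallel_size(nodes=2, num_workers=1))
```

For example, 4 nodes served by 2 workers also gives EP = 8, while 4 nodes served by 4 workers gives EP = 4.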
---------
Co-authored-by: Oseltamivir <bryansg2013@gmail.com>
Co-authored-by: Bryan Shan <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
1 parent cb23cd9 commit 64af5a7
53 files changed
Lines changed: 1196 additions & 119 deletions
File tree
- .github/workflows
- benchmarks
- runners
- utils
- evals
- matrix_logic