Commit 17d5e20

cquil11functionstackxkimbo@semianalysis.com
authored
feat: candidate search space (no perf changes for amd/nvidia) (#145)
* initial commit based on kimbos edits * adding config and python script: * adding runner field * finishing up script, ready for testing * testing purposes * testing purposes * refactoring more * refactoring more * refactoring more * refactoring more * refactoring more * refactoring more * refactoring more * refactoring more * refactoring more * refactoring more * refactoring more * updating the benchmark files with logic * updating the benchmark files with logic * updating the benchmark files with logic * updating the benchmark files with logic * updating the benchmark files with logic * updating the benchmark files with logic * testing concurrency * updating the benchmark files with logic * updating the benchmark files with logic * updating the benchmark files with logic * updating the benchmark files with logic * updating the benchmark files with logic * adding pytests * adding other isl osl * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding more workflows * adding script * removing extraneous files * removing extraneous files * removing plottingh * removing plottingh * removing plottingh * removing plottingh * removing plottingh * removing plotting python script * bmk-space -> search-space * updating exp name for full sweep * pip install pydantic * add filtered sweep * allow multiple filter values * reverse seq len mapping * less verbose * deleting files * list tp ep dpa then conc * removing 70b stuff * temp fix (#148) * remove: llama 70b * revert remove: llama 70b * remove llama 70b (#149) * testing concurrency * adding more 
workflows * deleting files * cleaning up after rebase * adding docs for configs; adding field to configs * hash on dpa too * debug * debug * debug * debug * update hashing * deleting extraneous file * adding gb200 * adding gb200 pt 2 * adding gb200 pt 3 * adding gb200 to other isl osl sweeps * adding gb200 to other isl osl sweeps * adding gb200 test * adding gb200 test * adding gb200 test * adding full sweep test * adding full sweep test pt 2 * adding full sweep test pt 2 * adding full sweep test pt 2 * adding full sweep test pt 2 * adding full sweep test pt 2 * adding full sweep test pt 2 * adding full sweep test pt 2 * adding full sweep test pt 2 * adding full sweep test pt 2 * adding full sweep test pt 2 * adding full sweep test pt 2 * adding full sweep test pt 2 * reverting title * adding full sweep test pt 2 * adding full sweep test pt 2 * reverting title * fixing test files * fixing gha syntax error * fixing gha syntax error * fixing error in multinode script * bug fxes * debug * debug * celaning up the full sweep sched * celaning up other workflows * docs * remove concurrency locks * add dpa to results filename * add back plotting * testing concurrency * adding more workflows * deleting files * temp fix (#148) * testing concurrency * update random range ratio default * get process results vals from env vars instead of argv * get process results vals from env vars instead of argv pt 2 * editing runners yaml * testing concurrency * adding more workflows * deleting files * testing concurrency * testing concurrency * testing concurrency * remove 70b * cleaning up after rebase * changing name of files from XkYk to shceduler * double check and update master configs * double check and update master configs pt 2 * add pydantic pip install * bug fix * update cron trigger to 9:00 PM CDT * runner name bug in process result python script --------- Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com> Co-authored-by: kimbo@semianalysis.com 
<you@example.com>
1 parent e06929c commit 17d5e20

32 files changed

Lines changed: 4486 additions & 1716 deletions

.github/README.md

Lines changed: 124 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,124 @@
1+
# How to Test Workflows
2+
3+
In order to test configurations described in `.github/configs`, the primary workflow file used is `.github/workflows/e2e-tests.yml`. As input, this workflow takes in the CLI arguments for the `utils/matrix-logic/generate_sweep_configs.py` script. The usage for this script is shown below:
4+
5+
```
usage: generate_sweep_configs.py [-h] {full-sweep,test-config,runner-model-sweep,runner-sweep,custom} ...

Generate benchmark configurations from YAML config files

positional arguments:
  {full-sweep,test-config,runner-model-sweep,runner-sweep,custom}
                        Available commands
    full-sweep          Generate full sweep configurations with optional filtering by model, precision, framework, runner type, and sequence lengths
    test-config         Given a config key, run that configuration as specified. Optionally specify --test-mode to only run one parallelism-concurrency pair for the config.
    runner-model-sweep  Given a runner type, find all configurations matching the type, and run that configuration on all individual runner nodes for the specified runner type. This is meant to validate
                        that all runner nodes work on all configurations for a runner type. For instance, to validate that all configs that specify an h200 runner successfully run across all h200 runner
                        nodes.
    runner-sweep        Given a model (and optionally a precision and framework), find all configurations matching the inputs, and run those configurations across all compatible runner nodes. This is
                        meant to validate all runner nodes that should run a particular model can. For instance, this should be used to validate that all runners nodes that should run gptoss-120b
                        actually do so successfully.
    custom              Enter custom values

options:
  -h, --help            show this help message and exit
```
26+
27+
Instead of explaining each command at a high level, let's just walk through some common testing scenarios and describe how to run them.
28+
29+
**Scenario 1**: I want to increase the concurrency from 128 to 256 in the 1k1k scenario for the `dsr1-fp4-b200-sglang` config (from `.github/configs/nvidia-master.yaml`) and then test it.
30+
31+
Go to the GitHub Actions UI, click on the `End-to-End Tests` workflow, and enter the following command as the text input:
32+
```
33+
test-config --key dsr1-fp4-b200-sglang --seq-len 1k1k --config-files .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml
34+
```
35+
36+
Workflow Run Example: https://github.com/InferenceMAX/InferenceMAX/actions/runs/18986046399
37+
38+
If we wanted to also test 1k8k or 8k1k scenarios, we would simply append `1k8k` or `8k1k` to `--seq-len`, respectively.
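The sequence-length labels map onto the `isl`/`osl` pairs in the config files (`1k1k` corresponds to `isl: 1024`, `osl: 1024`). A minimal sketch of that mapping follows. It assumes `k` always means 1024 tokens, and the helper name `parse_seq_len` is hypothetical rather than taken from `generate_sweep_configs.py`:

```python
import re

# Assumption: labels look like "<N>k<M>k" and "k" means 1024 tokens,
# matching 1k1k -> isl 1024 / osl 1024 in the master configs.
_SEQ_LEN_RE = re.compile(r"^(\d+)k(\d+)k$")


def parse_seq_len(label: str) -> tuple[int, int]:
    """Return (isl, osl) in tokens for a label such as '1k8k' (hypothetical helper)."""
    match = _SEQ_LEN_RE.match(label)
    if match is None:
        raise ValueError(f"Unrecognized sequence length label: {label!r}")
    return int(match.group(1)) * 1024, int(match.group(2)) * 1024


for label in ("1k1k", "1k8k", "8k1k"):
    print(label, parse_seq_len(label))
```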
39+
40+
Further, if we wanted to run that config on *one specific* runner node, we could specify that by appending `--runner-node` to the argument list. Note that if the specified runner node is not compatible with the specified config key (as dictated by `.github/configs/runners.yaml`), then the workflow will error:
41+
42+
```
43+
test-config --config-files .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml --key dsr1-fp4-b200-sglang --seq-len 1k1k --runner-node mi300x-amd_0
44+
45+
ValueError: Runner node 'mi300x-amd_0' is not compatible with config 'dsr1-fp4-b200-sglang' which runs on runner type 'b200'. Available runner nodes for this config are 'b200-nb_0, b200-nb_1, b200-nvd_0, b200-nvd_1, b200-nvd_2, b200-nvd_3, b200-tg_0'.
46+
```
47+
48+
Workflow Run Example: https://github.com/InferenceMAX/InferenceMAX/actions/runs/18986053019/job/54229839736
49+
50+
**Scenario 2**: I just made a change to `benchmarks/dsr1_fp8_b200_docker.sh` and need to verify that the change works across all B200 runners.
51+
52+
Go to the GitHub Actions UI, click on the `End-to-End Tests` workflow, and enter the following command as the text input:
53+
```
54+
runner-sweep --runner-type b200 --model-prefix dsr1 --precision fp8 --config-files .github/configs/amd-master.yaml .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml
55+
```
56+
57+
Workflow Run Example: https://github.com/InferenceMAX/InferenceMAX/actions/runs/18986283169
58+
59+
This will run a test (just the highest available parallelism and lowest available concurrency) for each B200 runner node for each DeepSeek config that runs on B200 with fp8 precision. In other words, this can be used to "sweep" across runners for a particular model to check that all runners still work after a change.
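To make "highest available parallelism and lowest available concurrency" concrete, here is a rough sketch of how a single test point might be picked from a `search-space` list. It assumes the lowest concurrency is taken from the highest-`tp` entry; the actual selection logic in the script may differ, and `pick_test_point` is a hypothetical name:

```python
def pick_test_point(search_space: list[dict]) -> dict:
    """Pick one (tp, conc) pair: the highest tp and that entry's conc-start.

    Hypothetical helper; assumes 'lowest concurrency' means the conc-start
    of the entry with the highest tensor parallelism.
    """
    entry = max(search_space, key=lambda e: e["tp"])
    return {"tp": entry["tp"], "conc": entry["conc-start"]}


# 1k1k search space of gptoss-fp4-mi300x-vllm from amd-master.yaml
search_space = [
    {"tp": 1, "conc-start": 64, "conc-end": 64},
    {"tp": 2, "conc-start": 4, "conc-end": 64},
    {"tp": 4, "conc-start": 4, "conc-end": 64},
    {"tp": 8, "conc-start": 4, "conc-end": 16},
]
print(pick_test_point(search_space))  # {'tp': 8, 'conc': 4}
```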
60+
61+
**Scenario 3**: I just upgraded the CUDA drivers on all H200 runners and need to verify that all models that use H200 still work correctly across all H200 nodes.
62+
63+
Go to the GitHub Actions UI, click on the `End-to-End Tests` workflow, and enter the following command as the text input:
64+
```
65+
runner-model-sweep --runner-type h200 --config-files .github/configs/amd-master.yaml .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml
66+
```
67+
68+
Workflow Run Example: https://github.com/InferenceMAX/InferenceMAX/actions/runs/18986292917
69+
70+
This will run a test (just the highest available parallelism and lowest available concurrency) for each configuration that specifies the `h200` runner type, across all H200 runner nodes defined in `.github/configs/runners.yaml`.
71+
72+
For example, if you have configs `dsr1-fp8-h200-sglang`, `dsr1-fp8-h200-trt`, and `gptoss-fp4-h200-vllm` that all use `runner: h200`, and you have 8 H200 nodes (`h200-cw_0`, `h200-cw_1`, etc.), this will run all 3 configs on all 8 nodes (24 total test runs).
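The 24-run count in the example above is a cross product of matching configs and runner nodes. The sketch below reproduces that arithmetic with toy data; the dictionaries only mimic the shape of the master configs and `runners.yaml`, and `runner_model_sweep` is a hypothetical function name:

```python
from itertools import product

# Toy data shaped like the master configs and runners.yaml (illustrative only).
configs = {
    "dsr1-fp8-h200-sglang": {"runner": "h200"},
    "dsr1-fp8-h200-trt": {"runner": "h200"},
    "gptoss-fp4-h200-vllm": {"runner": "h200"},
    "dsr1-fp4-b200-sglang": {"runner": "b200"},  # filtered out below
}
runners = {"h200": [f"h200-cw_{i}" for i in range(8)]}


def runner_model_sweep(runner_type: str, configs: dict, runners: dict) -> list[tuple[str, str]]:
    """Pair every config that uses runner_type with every node of that type."""
    keys = [key for key, cfg in configs.items() if cfg["runner"] == runner_type]
    return list(product(keys, runners[runner_type]))


jobs = runner_model_sweep("h200", configs, runners)
print(len(jobs))  # 24 (3 configs x 8 nodes)
```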
73+
74+
This is particularly useful when:
75+
- You've made infrastructure changes to a specific runner type (driver updates, system configuration, Docker setup)
76+
- You've added new runner nodes and want to validate they work with all existing model configurations
77+
- You want to verify that all models remain compatible with a specific GPU type after system updates
78+
79+
**Key difference from Scenario 2**:
80+
- `runner-sweep`: Fix a **model**, sweep across runners → "Does this model work on all its runners?"
81+
- `runner-model-sweep`: Fix a **runner type**, sweep across models → "Do all models work on this runner type?"
82+
83+
## Additional Use Cases with `full-sweep`
84+
85+
The `full-sweep` command supports multiple filters that can be combined for targeted testing:
86+
87+
**Test all gptoss configurations on B200 with 1k1k sequence lengths:**
88+
```
89+
full-sweep --model-prefix gptoss --runner-type b200 --seq-lens 1k1k --config-files .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml
90+
```
91+
92+
**Test all fp8 precision configs across all runners for 1k8k workloads:**
93+
```
94+
full-sweep --precision fp8 --seq-lens 1k8k --config-files .github/configs/nvidia-master.yaml .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml
95+
```
96+
97+
**Test all TRT configs on H200 runners:**
98+
```
99+
full-sweep --framework trt --runner-type h200 h200-trt --config-files .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml
100+
```
101+
102+
**Quick smoke test of all configs (highest TP, lowest concurrency only):**
103+
```
104+
full-sweep --test-mode --config-files .github/configs/nvidia-master.yaml .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml
105+
```
106+
107+
**Test specific model on specific hardware with specific sequence lengths:**
108+
```
109+
full-sweep --model-prefix dsr1 --runner-type b200 --precision fp4 --framework sglang --seq-lens 1k1k 8k1k --config-files .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml
110+
```
111+
112+
## Custom One-off Tests
113+
114+
**Scenario 4**: I want to run a quick test with a custom image, model, or configuration that isn't in the config files yet.
115+
116+
Use the `custom` command to specify all parameters manually:
117+
```
118+
custom --runner-label b200-nb_0 --image vllm/vllm-openai:v0.11.0 --model meta-llama/Llama-3.1-70B --framework vllm --precision fp8 --exp-name llama70b_test --config-files .github/configs/nvidia-master.yaml --runner-config .github/configs/runners.yaml
119+
```
120+
121+
This runs a single 1k1k test job with your custom parameters on the specified runner node. Useful for:
122+
- Testing new images before adding them to config files
123+
- Quick validation of new models
124+
- Experimenting with different frameworks or precisions

.github/configs/CONFIGS.md

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,52 @@
1+
# Configs
2+
3+
The config files in this directory are meant to be a "source of truth" for what benchmark configurations can/should be run. As such, they must follow a precise format which is described below.
4+
5+
## Master Configs (AMD, NVIDIA, etc.)
6+
7+
```yaml
entry-name:
  image: string
  model: string
  model-prefix: string
  runner: string
  precision: string
  framework: string
  seq-len-configs:
    - isl: int
      osl: int
      search-space:
        - { tp: int, conc-start: int, conc-end: int }
        # Optionally, specify 'ep' (expert-parallelism) and 'dp-attn' (data parallel attention)
        - { tp: int, ep: int, dp-attn: bool, conc-start: int, conc-end: int }
        - ...
    - ...
```
25+
Note: while not required, `entry-name` typically takes the format `<INFMAX_MODEL_PREFIX>-<PRECISION>-<GPU>-<FRAMEWORK>`.
26+
27+
The list below describes each field:
28+
29+
- `image`: The image used to serve the benchmark, e.g., `vllm/vllm-openai:v0.10.2`
30+
- `model`: The model to serve, e.g., `openai/gpt-oss-120b`
31+
- `model-prefix`: The canonical InferenceMAX model prefix, e.g., `dsr1` for DeepSeek, `gptoss` for gptoss-120b, etc. This value is used to determine which script in `benchmarks/` should be used to launch the benchmark.
32+
- `runner`: This is the runner on which to run the benchmark. This must be a valid runner (key or value) from `runners.yaml`.
33+
- `precision`: The precision to run the benchmark. Again, this is used to find which script to run in `benchmarks/`.
34+
- `framework`: The framework (serving runtime) to serve the benchmark, e.g., `vllm`, `sglang`, `trt`.
35+
- `seq-len-configs`: A list of possible sequence lengths to benchmark. Each entry must have the following fields:
36+
  - `isl`: An integer representing the input sequence length, e.g., `1024`
37+
  - `osl`: An integer representing the output sequence length, e.g., `8192`
38+
  - `search-space`: A list of configurations to run with the respective `isl` and `osl`. Each entry must be a dict with the following fields:
39+
    - `tp`: An integer representing the tensor parallelism level that the configuration will be served at.
40+
    - `conc-start`: An integer representing the starting level of concurrency, e.g., `4`
41+
    - `conc-end`: An integer representing the ending level of concurrency (inclusive), e.g., `128`
42+
    - Note: the step factor between `conc-start` and `conc-end` is 2, so if `conc-start` is 4 and `conc-end` is 128, all concurrencies `4, 8, 16, 32, ..., 128` will be run.
43+
    - (Optional) `ep`: An integer representing the expert parallelism level that the configuration will be served at. Default is 1 (no expert parallelism) when not specified.
44+
    - (Optional) `dp-attn`: A boolean representing whether or not to activate data parallel attention for the configuration. Default is false when not specified.
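The doubling rule for concurrency can be sketched as follows. This is an illustration of the rule as described in the field list, not the script's actual code. `expand_concurrency` is a hypothetical name, and the sketch assumes that a `conc-end` not reachable by doubling is simply not included:

```python
def expand_concurrency(conc_start: int, conc_end: int) -> list[int]:
    """List concurrencies from conc_start to conc_end (inclusive), doubling each step."""
    if conc_start < 1 or conc_end < conc_start:
        raise ValueError("expected 1 <= conc-start <= conc-end")
    values = []
    conc = conc_start
    while conc <= conc_end:
        values.append(conc)
        conc *= 2  # step factor of 2
    return values


print(expand_concurrency(4, 128))  # [4, 8, 16, 32, 64, 128]
print(expand_concurrency(64, 64))  # [64]
```

An entry such as `{ tp: 1, conc-start: 64, conc-end: 64 }` therefore yields a single run at concurrency 64.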
45+
46+
Notes:
47+
- No extra fields besides the ones listed may be specified, or else the benchmarks will fail to run.
48+
- Setting the fields above, particularly `ep` and `dp-attn`, only guarantees that the respective values will be passed as environment variables to the benchmark scripts! Actually using those environment variables is an implementation detail at the level of the benchmark Bash script.
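As an illustration of the "no extra fields" rule and the documented defaults for `ep` and `dp-attn`, here is a dependency-free sketch of validating one `search-space` entry. The commit log mentions installing pydantic, so the real validation likely uses it; this plain-Python version is a stand-in, and `normalize_entry` is a hypothetical name:

```python
# Field names and defaults come from the list above; everything else is illustrative.
REQUIRED = {"tp": int, "conc-start": int, "conc-end": int}
OPTIONAL = {"ep": (int, 1), "dp-attn": (bool, False)}


def _check_type(name: str, value, expected: type) -> None:
    # bool is a subclass of int in Python, so reject it explicitly for int fields.
    if (expected is int and isinstance(value, bool)) or not isinstance(value, expected):
        raise TypeError(f"'{name}' must be of type {expected.__name__}")


def normalize_entry(entry: dict) -> dict:
    """Reject unknown or missing fields and fill in the optional defaults."""
    unknown = set(entry) - set(REQUIRED) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"Unexpected fields: {sorted(unknown)}")
    missing = set(REQUIRED) - set(entry)
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")

    result = {}
    for name, expected in REQUIRED.items():
        _check_type(name, entry[name], expected)
        result[name] = entry[name]
    for name, (expected, default) in OPTIONAL.items():
        value = entry.get(name, default)
        _check_type(name, value, expected)
        result[name] = value
    return result


print(normalize_entry({"tp": 8, "conc-start": 4, "conc-end": 64}))
```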
49+
50+
## Runners
51+
52+
The `runners.yaml` config represents the available runners in the repository. The keys are the runner *types* (i.e., the GPUs as well as some specific combinations like `h200-trt`), and each value is a list of *runner nodes* of that type. This config is used to verify the master configs.
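A rough sketch of how such a verification might look is shown below. The `runners` mapping is a hand-written subset based on the node names that appear in the README's example error message, and `check_runner_node` is a hypothetical helper, not the function used by `generate_sweep_configs.py`. It only handles configs whose `runner` is a runner type:

```python
# Illustrative subset of runners.yaml: runner type -> list of runner nodes.
runners = {
    "b200": ["b200-nb_0", "b200-nb_1", "b200-nvd_0", "b200-tg_0"],
    "mi300x": ["mi300x-amd_0"],
}


def check_runner_node(config_key: str, config: dict, runner_node: str, runners: dict) -> str:
    """Return runner_node if it belongs to the config's runner type, else raise."""
    runner_type = config["runner"]
    if runner_type not in runners:
        raise ValueError(f"Config '{config_key}' names unknown runner type '{runner_type}'.")
    nodes = runners[runner_type]
    if runner_node not in nodes:
        raise ValueError(
            f"Runner node '{runner_node}' is not compatible with config "
            f"'{config_key}' which runs on runner type '{runner_type}'. "
            f"Available runner nodes for this config are '{', '.join(nodes)}'."
        )
    return runner_node


config = {"runner": "b200"}
print(check_runner_node("dsr1-fp4-b200-sglang", config, "b200-nb_0", runners))
```

Passing `mi300x-amd_0` for the same config raises a `ValueError`, mirroring the failure shown in the testing guide.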

.github/configs/amd-master.yaml

Lines changed: 171 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,171 @@
dsr1-fp4-mi355x-sglang:
  image: rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi35x-20250915
  model: amd/DeepSeek-R1-0528-MXFP4-Preview
  model-prefix: dsr1
  runner: mi355x
  precision: fp4
  framework: sglang
  seq-len-configs:
    - isl: 1024
      osl: 1024
      search-space:
        - { tp: 4, conc-start: 4, conc-end: 64 }
        - { tp: 8, conc-start: 4, conc-end: 64 }
    - isl: 1024
      osl: 8192
      search-space:
        - { tp: 8, conc-start: 4, conc-end: 64 }
    - isl: 8192
      osl: 1024
      search-space:
        - { tp: 8, conc-start: 4, conc-end: 64 }

dsr1-fp8-mi300x-sglang:
  image: rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi30x-20250915
  model: deepseek-ai/DeepSeek-R1-0528
  model-prefix: dsr1
  runner: mi300x
  precision: fp8
  framework: sglang
  seq-len-configs:
    - isl: 1024
      osl: 1024
      search-space:
        - { tp: 8, conc-start: 4, conc-end: 64 }
    - isl: 1024
      osl: 8192
      search-space:
        - { tp: 8, conc-start: 4, conc-end: 64 }
    - isl: 8192
      osl: 1024
      search-space:
        - { tp: 8, conc-start: 4, conc-end: 64 }

dsr1-fp8-mi325x-sglang:
  image: rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi30x-20250915
  model: deepseek-ai/DeepSeek-R1-0528
  model-prefix: dsr1
  runner: mi325x
  precision: fp8
  framework: sglang
  seq-len-configs:
    - isl: 1024
      osl: 1024
      search-space:
        - { tp: 8, conc-start: 4, conc-end: 64 }
    - isl: 1024
      osl: 8192
      search-space:
        - { tp: 8, conc-start: 4, conc-end: 64 }
    - isl: 8192
      osl: 1024
      search-space:
        - { tp: 8, conc-start: 4, conc-end: 64 }

dsr1-fp8-mi355x-sglang:
  image: rocm/7.0:rocm7.0_ubuntu_22.04_sgl-dev-v0.5.2-rocm7.0-mi35x-20250915
  model: deepseek-ai/DeepSeek-R1-0528
  model-prefix: dsr1
  runner: mi355x
  precision: fp8
  framework: sglang
  seq-len-configs:
    - isl: 1024
      osl: 1024
      search-space:
        - { tp: 8, conc-start: 4, conc-end: 64 }
    - isl: 1024
      osl: 8192
      search-space:
        - { tp: 8, conc-start: 4, conc-end: 64 }
    - isl: 8192
      osl: 1024
      search-space:
        - { tp: 8, conc-start: 4, conc-end: 64 }

gptoss-fp4-mi300x-vllm:
  image: rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250927_rc1
  model: openai/gpt-oss-120b
  model-prefix: gptoss
  runner: mi300x
  precision: fp4
  framework: vllm
  seq-len-configs:
    - isl: 1024
      osl: 1024
      search-space:
        - { tp: 1, conc-start: 64, conc-end: 64 }
        - { tp: 2, conc-start: 4, conc-end: 64 }
        - { tp: 4, conc-start: 4, conc-end: 64 }
        - { tp: 8, conc-start: 4, conc-end: 16 }
    - isl: 1024
      osl: 8192
      search-space:
        - { tp: 1, conc-start: 64, conc-end: 64 }
        - { tp: 2, conc-start: 4, conc-end: 64 }
        - { tp: 4, conc-start: 4, conc-end: 64 }
        - { tp: 8, conc-start: 4, conc-end: 16 }
    - isl: 8192
      osl: 1024
      search-space:
        - { tp: 1, conc-start: 4, conc-end: 64 }
        - { tp: 2, conc-start: 4, conc-end: 64 }
        - { tp: 4, conc-start: 4, conc-end: 64 }
        - { tp: 8, conc-start: 4, conc-end: 16 }

gptoss-fp4-mi325x-vllm:
  image: rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250927_rc1
  model: openai/gpt-oss-120b
  model-prefix: gptoss
  runner: mi325x
  precision: fp4
  framework: vllm
  seq-len-configs:
    - isl: 1024
      osl: 1024
      search-space:
        - { tp: 1, conc-start: 4, conc-end: 64 }
        - { tp: 2, conc-start: 4, conc-end: 64 }
        - { tp: 4, conc-start: 4, conc-end: 64 }
        - { tp: 8, conc-start: 4, conc-end: 64 }
    - isl: 1024
      osl: 8192
      search-space:
        - { tp: 1, conc-start: 64, conc-end: 64 }
        - { tp: 2, conc-start: 4, conc-end: 64 }
        - { tp: 4, conc-start: 64, conc-end: 64 }
        - { tp: 8, conc-start: 4, conc-end: 64 }
    - isl: 8192
      osl: 1024
      search-space:
        - { tp: 1, conc-start: 4, conc-end: 64 }
        - { tp: 2, conc-start: 4, conc-end: 8 }
        - { tp: 4, conc-start: 4, conc-end: 8 }
        - { tp: 8, conc-start: 4, conc-end: 16 }

gptoss-fp4-mi355x-vllm:
  image: rocm/7.0:rocm7.0_ubuntu_22.04_vllm_0.10.1_instinct_20250927_rc1
  model: openai/gpt-oss-120b
  model-prefix: gptoss
  runner: mi355x
  precision: fp4
  framework: vllm
  seq-len-configs:
    - isl: 1024
      osl: 1024
      search-space:
        - { tp: 1, conc-start: 4, conc-end: 64 }
        - { tp: 4, conc-start: 4, conc-end: 8 }
        - { tp: 8, conc-start: 4, conc-end: 16 }
    - isl: 1024
      osl: 8192
      search-space:
        - { tp: 1, conc-start: 4, conc-end: 64 }
        - { tp: 4, conc-start: 4, conc-end: 8 }
        - { tp: 8, conc-start: 4, conc-end: 16 }
    - isl: 8192
      osl: 1024
      search-space:
        - { tp: 1, conc-start: 4, conc-end: 64 }
        - { tp: 4, conc-start: 4, conc-end: 4 }
        - { tp: 8, conc-start: 4, conc-end: 8 }
