Commit 41dd9e0

fsaady and venkywonka authored
[None][test] Add tests for all database configs. (NVIDIA#11653)
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Fadi Saady <fsaady@nvidia.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
1 parent e56397d commit 41dd9e0

183 files changed

Lines changed: 504 additions & 487 deletions

examples/configs/curated/deepseek-r1-deepgemm.yaml

Lines changed: 0 additions & 1 deletion
@@ -6,7 +6,6 @@ trust_remote_code: true
 enable_attention_dp: true
 cuda_graph_config:
   enable_padding: true
-  max_batch_size: 128
 kv_cache_config:
   dtype: fp8
   free_gpu_memory_fraction: 0.8

examples/configs/curated/deepseek-r1-latency.yaml

Lines changed: 2 additions & 2 deletions
@@ -3,10 +3,10 @@ tensor_parallel_size: 8
 moe_expert_parallel_size: 2
 max_num_tokens: 32768
 trust_remote_code: true
-moe_backend: TRTLLM
-use_cuda_graph: true
 kv_cache_config:
   free_gpu_memory_fraction: 0.75
+moe_config:
+  backend: TRTLLM
 speculative_config:
   decoding_type: MTP
   num_nextn_predict_layers: 3
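
The deepseek-r1-latency.yaml hunk replaces the flat `moe_backend` key with a nested `moe_config.backend` key and deletes `use_cuda_graph` without adding a replacement. A minimal Python sketch of that same rewrite applied to an already-loaded config dict is shown here. The helper name `migrate_legacy_keys` is hypothetical and is not part of the commit or of any library; it only mirrors the edit visible in the diff.

```python
import copy


def migrate_legacy_keys(config: dict) -> dict:
    """Return a copy of `config` with the legacy flat keys rewritten.

    Hypothetical helper: it reproduces only the change shown in the
    deepseek-r1-latency.yaml hunk and makes no claim about other keys.
    """
    migrated = copy.deepcopy(config)

    # moe_backend: X  ->  moe_config: {backend: X}
    if "moe_backend" in migrated:
        backend = migrated.pop("moe_backend")
        migrated.setdefault("moe_config", {})["backend"] = backend

    # The hunk removes use_cuda_graph and adds nothing in its place,
    # so this sketch simply drops the key.
    migrated.pop("use_cuda_graph", None)
    return migrated


# Values taken from the removed and context lines of the hunk.
legacy = {
    "trust_remote_code": True,
    "moe_backend": "TRTLLM",
    "use_cuda_graph": True,
    "kv_cache_config": {"free_gpu_memory_fraction": 0.75},
}
migrated = migrate_legacy_keys(legacy)
```

The input dict is deep-copied, so the original mapping is left untouched and the two versions can be compared side by side.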

examples/configs/curated/deepseek-r1-throughput.yaml

Lines changed: 0 additions & 1 deletion
@@ -6,7 +6,6 @@ trust_remote_code: true
 enable_attention_dp: true
 cuda_graph_config:
   enable_padding: true
-  max_batch_size: 128
 kv_cache_config:
   dtype: fp8
   free_gpu_memory_fraction: 0.8

examples/configs/curated/gpt-oss-120b-latency.yaml

Lines changed: 0 additions & 3 deletions
@@ -3,9 +3,6 @@ max_num_tokens: 16384
 tensor_parallel_size: 8
 moe_expert_parallel_size: 1
 trust_remote_code: true
-enable_attention_dp: false
-kv_cache_config:
-  free_gpu_memory_fraction: 0.9
 cuda_graph_config:
   enable_padding: true
   max_batch_size: 64

examples/configs/curated/gpt-oss-120b-throughput.yaml

Lines changed: 0 additions & 2 deletions
@@ -4,8 +4,6 @@ tensor_parallel_size: 2
 moe_expert_parallel_size: 2
 trust_remote_code: true
 enable_attention_dp: true
-kv_cache_config:
-  free_gpu_memory_fraction: 0.9
 cuda_graph_config:
   enable_padding: true
   max_batch_size: 720

examples/configs/curated/kimi-k2-thinking.yaml

Lines changed: 0 additions & 1 deletion
@@ -4,7 +4,6 @@ max_seq_len: 8212
 tensor_parallel_size: 8
 moe_expert_parallel_size: 8
 enable_attention_dp: true
-pipeline_parallel_size: 1
 print_iter_log: true
 kv_cache_config:
   free_gpu_memory_fraction: 0.75
Lines changed: 0 additions & 3 deletions
@@ -1,12 +1,9 @@
 max_batch_size: 1024
 max_num_tokens: 2048
-tensor_parallel_size: 1
 moe_expert_parallel_size: 1
 trust_remote_code: true
-enable_attention_dp: false
 cuda_graph_config:
   enable_padding: true
   max_batch_size: 1024
 kv_cache_config:
   dtype: fp8
-  free_gpu_memory_fraction: 0.9
Lines changed: 0 additions & 3 deletions
@@ -1,12 +1,9 @@
 max_batch_size: 1024
 max_num_tokens: 2048
-tensor_parallel_size: 1
 moe_expert_parallel_size: 1
 trust_remote_code: true
-enable_attention_dp: false
 cuda_graph_config:
   enable_padding: true
   max_batch_size: 1024
 kv_cache_config:
   dtype: fp8
-  free_gpu_memory_fraction: 0.9
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+# arch: MODEL_CLASS_MAPPING key; required when model has get_model_defaults. Add when adding entries.
+- model: Qwen/Qwen3-Next-80B-A3B-Thinking
+  arch: Qwen3NextForCausalLM
+  config_path: examples/configs/curated/qwen3-next.yaml
+- model: Qwen/Qwen3-30B-A3B
+  arch: Qwen3MoeForCausalLM
+  config_path: examples/configs/curated/qwen3.yaml
+- model: Qwen/Qwen3-30B-A3B
+  arch: Qwen3MoeForCausalLM
+  config_path: examples/configs/curated/qwen3-disagg-prefill.yaml
+- model: deepseek-ai/DeepSeek-R1-0528
+  arch: DeepseekV3ForCausalLM
+  config_path: examples/configs/curated/deepseek-r1-latency.yaml
+- model: deepseek-ai/DeepSeek-R1-0528
+  arch: DeepseekV3ForCausalLM
+  config_path: examples/configs/curated/deepseek-r1-throughput.yaml
+- model: deepseek-ai/DeepSeek-R1-0528
+  arch: DeepseekV3ForCausalLM
+  config_path: examples/configs/curated/deepseek-r1-deepgemm.yaml
+- model: openai/gpt-oss-120b
+  arch: GptOssForCausalLM
+  config_path: examples/configs/curated/gpt-oss-120b-latency.yaml
+- model: openai/gpt-oss-120b
+  arch: GptOssForCausalLM
+  config_path: examples/configs/curated/gpt-oss-120b-throughput.yaml
+- model: nvidia/Llama-3.3-70B-Instruct-FP8
+  arch: LlamaForCausalLM
+  config_path: examples/configs/curated/llama-3.3-70b.yaml
+- model: nvidia/Llama-4-Scout-17B-16E-Instruct-FP8
+  arch: Llama4ForConditionalGeneration
+  config_path: examples/configs/curated/llama-4-scout.yaml
+- model: nvidia/Kimi-K2-Thinking-NVFP4
+  arch: DeepseekV3ForCausalLM
+  config_path: examples/configs/curated/kimi-k2-thinking.yaml
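
Each entry in the new lookup file carries a `model`, an `arch`, and a `config_path`. The following Python sketch shows the kind of consistency check a test over such entries could run. It is not the commit's actual test code: the function name `validate_entries`, the required-key set, and the path rules are assumptions made for illustration. The sample entries are copied from the file above and written as Python dicts, so the sketch needs no YAML parser.

```python
from collections import Counter

# Assumed rules, inferred from the entries in the diff and not from any test code.
REQUIRED_KEYS = {"model", "arch", "config_path"}
CURATED_PREFIX = "examples/configs/curated/"


def validate_entries(entries):
    """Return a list of problem descriptions; an empty list means no issues found."""
    problems = []
    for index, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {index}: missing keys {sorted(missing)}")
            continue
        path = entry["config_path"]
        if not (path.startswith(CURATED_PREFIX) and path.endswith(".yaml")):
            problems.append(f"entry {index}: unexpected config_path {path!r}")

    # A model may appear several times, but each config file should be listed once.
    counts = Counter(e["config_path"] for e in entries if "config_path" in e)
    for path, count in counts.items():
        if count > 1:
            problems.append(f"config_path listed {count} times: {path!r}")
    return problems


# Three entries copied from the file in the diff.
entries = [
    {
        "model": "deepseek-ai/DeepSeek-R1-0528",
        "arch": "DeepseekV3ForCausalLM",
        "config_path": "examples/configs/curated/deepseek-r1-latency.yaml",
    },
    {
        "model": "deepseek-ai/DeepSeek-R1-0528",
        "arch": "DeepseekV3ForCausalLM",
        "config_path": "examples/configs/curated/deepseek-r1-throughput.yaml",
    },
    {
        "model": "openai/gpt-oss-120b",
        "arch": "GptOssForCausalLM",
        "config_path": "examples/configs/curated/gpt-oss-120b-latency.yaml",
    },
]
problems = validate_entries(entries)
```

For the three entries above, `validate_entries` returns an empty list. An entry without `arch`, or two entries pointing at the same `config_path`, would each add one message to the result.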
Lines changed: 2 additions & 4 deletions
@@ -1,9 +1,7 @@
 max_batch_size: 161
 max_num_tokens: 1160
-kv_cache_config:
-  free_gpu_memory_fraction: 0.8
-tensor_parallel_size: 1
 moe_expert_parallel_size: 1
 trust_remote_code: true
-print_iter_log: true
 enable_attention_dp: true
+kv_cache_config:
+  free_gpu_memory_fraction: 0.8
