Commit ce5f6a0
[TRTLLM-12154][test] Add Qwen3-32B FP8 disagg stress test
Initial wire-up for a Qwen3-32B FP8 disagg stress test on 8x H200 DGX
(4x TP1 prefill + 1x TP4 decode).
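The GPU budget behind this layout is simple arithmetic: each instance occupies `tensor_parallel_size` GPUs, so 4 x TP1 + 1 x TP4 = 8 GPUs, exactly one 8x H200 DGX node. The sketch below encodes that check. The `ServerGroup` dataclass and `total_gpus` helper are hypothetical illustrations, not part of the test harness, and they assume no pipeline or expert parallelism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerGroup:
    """Hypothetical description of one server role in the disagg layout."""
    role: str
    num_instances: int
    tensor_parallel_size: int

    def gpus(self) -> int:
        # Each instance uses tensor_parallel_size GPUs (no PP/EP assumed).
        return self.num_instances * self.tensor_parallel_size


def total_gpus(groups: list[ServerGroup]) -> int:
    return sum(g.gpus() for g in groups)


# Layout from the commit message: 4x TP1 prefill + 1x TP4 decode.
layout = [
    ServerGroup("context", num_instances=4, tensor_parallel_size=1),
    ServerGroup("generation", num_instances=1, tensor_parallel_size=4),
]

NODE_GPUS = 8  # one 8x H200 DGX node
used = total_gpus(layout)
assert used <= NODE_GPUS, f"layout needs {used} GPUs, node has {NODE_GPUS}"
print(f"{used}/{NODE_GPUS} GPUs used")  # prints "8/8 GPUs used"
```

Because the layout fills the node exactly, adding a fifth ctx instance or widening decode to TP8 would not fit on a single DGX.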
A new disagg config (disagg_config_ctxtp1_gentp4_qwen3_32b_fp8.yaml)
exercises chunked prefill, KV block reuse across the 4 ctx instances
(kv_cache_aware router + event buffer), FP8 KV cache, disagg KV cache
transfer, and structured-output backend selection
(guided_decoding_backend: xgrammar).
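KV block reuse across several ctx instances only pays off if requests sharing a prefix land on the instance that already holds those blocks. The sketch below shows that idea under a simplified model. It assumes the router mirrors per-instance "block stored/removed" events into a set of chained block hashes and picks the instance with the longest cached prefix, breaking ties by in-flight load. `KvAwareRouter`, `block_hashes`, the SHA-256 chaining and the 32-token block size are all hypothetical; this is not TensorRT-LLM's kv_cache_aware router implementation.

```python
import hashlib
from typing import Sequence

BLOCK_SIZE = 32  # tokens per KV block; illustrative value, not taken from the YAML


def block_hashes(tokens: Sequence[int], block_size: int = BLOCK_SIZE) -> list[str]:
    """Chain-hash each full block so each hash identifies the whole prefix up to it."""
    hashes: list[str] = []
    prev = ""
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        digest = hashlib.sha256((prev + "|" + ",".join(map(str, block))).encode())
        prev = digest.hexdigest()
        hashes.append(prev)
    return hashes


class KvAwareRouter:
    """Hypothetical router: send each request to the ctx instance with the longest cached prefix."""

    def __init__(self, num_instances: int) -> None:
        self.cached: list[set[str]] = [set() for _ in range(num_instances)]
        self.in_flight = [0] * num_instances  # never decremented in this sketch

    def on_kv_event(self, instance: int, stored: list[str], removed: list[str]) -> None:
        # Mirror "block stored" / "block removed" events reported by an instance.
        self.cached[instance].update(stored)
        self.cached[instance].difference_update(removed)

    def matched_blocks(self, instance: int, hashes: list[str]) -> int:
        # Count leading blocks already cached on this instance; stop at the first miss.
        count = 0
        for h in hashes:
            if h not in self.cached[instance]:
                break
            count += 1
        return count

    def route(self, tokens: Sequence[int]) -> int:
        hashes = block_hashes(tokens)
        # Longest cached prefix wins; ties go to the least-loaded instance,
        # then to the lowest index (max() keeps the first maximum).
        best = max(
            range(len(self.cached)),
            key=lambda i: (self.matched_blocks(i, hashes), -self.in_flight[i]),
        )
        self.in_flight[best] += 1
        return best


router = KvAwareRouter(num_instances=4)  # 4 ctx instances, as in the commit
prompt = list(range(100))
first = router.route(prompt)             # nothing cached yet -> instance 0
router.on_kv_event(first, stored=block_hashes(prompt), removed=[])
second = router.route(prompt + [7, 8])   # same 3-block prefix -> instance 0 again
third = router.route(list(range(1000, 1100)))  # no match -> least-loaded instance 1
```

Chaining each block hash to its predecessor means a match on block k implies the whole prefix up to k matches, which is why `matched_blocks` can stop at the first miss.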
Two test entries share the same YAML:
- test_disaggregated_qwen3_32b_fp8 (light): exercises the config end-to-end
via the standard prompts.json client loop. Wired into
l0_dgx_h200.yml post-merge so each merge to main verifies the config
still loads and serves. A local pytest run completes in ~5-10 minutes.
- test_disaggregated_stress_test::qwen3_32b_fp8_stress: the long-running
variant for the QA weekly stress lane (request_count=10000,
accuracy_threshold=0.30 as conservative initial defaults; expect to
tighten after the first baseline run). Wired into
qa/llm_function_stress.txt alongside the existing deepseek/gpt-oss
stress entries.
Both entries are marked skip_pre_hopper (unlike the existing Blackwell-only
entries) because the target is H200.
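The marker choice comes down to a compute-capability comparison: H200 is Hopper (SM90, capability 9.0), so a Blackwell-only gate (capability 10.0 and up) would skip it, while a pre-Hopper gate lets it run. The sketch below illustrates only that rule. `skip_pre_hopper` and `blackwell_only_skip` here are hypothetical pure functions, not the repo's actual pytest markers; on real hardware the `(major, minor)` tuple could come from `torch.cuda.get_device_capability()`.

```python
# Compute capabilities (major, minor) of the GPU generations involved.
A100 = (8, 0)   # Ampere: pre-Hopper
H200 = (9, 0)   # Hopper (SM90), the target of this test
B200 = (10, 0)  # Blackwell (SM100)

HOPPER = (9, 0)
BLACKWELL = (10, 0)


def skip_pre_hopper(capability: tuple[int, int]) -> bool:
    """Hypothetical gate: True (skip) when the device predates Hopper."""
    return capability < HOPPER


def blackwell_only_skip(capability: tuple[int, int]) -> bool:
    """Hypothetical Blackwell-only gate: True (skip) for anything older than SM100."""
    return capability < BLACKWELL


for name, cc in [("A100", A100), ("H200", H200), ("B200", B200)]:
    print(name, "skip_pre_hopper:", skip_pre_hopper(cc),
          "blackwell_only:", blackwell_only_skip(cc))
```

Tuple comparison orders by major then minor version, so `(9, 0) < (9, 0)` is False and H200 runs under the pre-Hopper gate.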
Eagle3 is deferred (TODO in YAML): NVIDIA's HF speculative-decoding
collection doesn't currently ship a draft model for dense Qwen3-32B, and,
per examples/models/core/qwen/README.md, Eagle3 is mutually exclusive
with enable_block_reuse when the KV cache is FP8.
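When Eagle3 is eventually wired in, the exclusivity rule above could be enforced as a config lint. The sketch below is a hypothetical check over a parsed server config dict. The key names (`speculative_config.decoding_type`, `kv_cache_config.dtype`, `kv_cache_config.enable_block_reuse`) and the assumed default of block reuse being on are modeled loosely on the YAML and must be verified against the real schema before reuse.

```python
def eagle3_conflict(server_cfg: dict) -> bool:
    """True if a server config combines Eagle3, an FP8 KV cache and block reuse.

    All key names are assumptions, not a confirmed config schema.
    """
    spec = server_cfg.get("speculative_config") or {}
    kv = server_cfg.get("kv_cache_config") or {}
    uses_eagle3 = str(spec.get("decoding_type", "")).lower() == "eagle3"
    fp8_kv = str(kv.get("dtype", "auto")).lower() == "fp8"
    # Assumed default: block reuse is on unless explicitly disabled.
    block_reuse = bool(kv.get("enable_block_reuse", True))
    return uses_eagle3 and fp8_kv and block_reuse


def check_eagle3_compat(server_cfg: dict) -> None:
    if eagle3_conflict(server_cfg):
        raise ValueError(
            "Eagle3 cannot be combined with enable_block_reuse when the KV cache "
            "is FP8; disable block reuse or use a non-FP8 KV cache."
        )


# Current state of this commit: FP8 KV + block reuse, Eagle3 deferred.
current = {"kv_cache_config": {"dtype": "fp8", "enable_block_reuse": True}}
check_eagle3_compat(current)  # passes: no speculative decoding configured

with_eagle3 = {**current, "speculative_config": {"decoding_type": "Eagle3"}}
assert eagle3_conflict(with_eagle3)  # the combination the README rules out
```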
Signed-off-by: Brian Nguyen <brnguyen@nvidia.com>

1 parent 22bafe4
4 files changed
Lines changed: 70 additions & 0 deletions
File tree
- tests/integration
  - defs/disaggregated
    - test_configs
  - test_lists
    - qa
    - test-db
Lines changed: 45 additions & 0 deletions
(New file: lines 1-45 added. Diff content not captured.)
Lines changed: 23 additions & 0 deletions
(Added lines, new-file numbering: 264-265, 2092-2107, 2120-2124. Diff content not captured.)
Lines changed: 1 addition & 0 deletions
(Added line, new-file numbering: 10. Diff content not captured.)
Lines changed: 1 addition & 0 deletions
(Added line, new-file numbering: 45. Diff content not captured.)