Commit 9ff8446
Disable overlap scheduler for DSv4 B200 TRT (pin max_num_requests=256)
The 2dd03e6 build sizes the slot pool as max_num_requests = max_batch_size *
num_micro_batches, with num_micro_batches=2 under the overlap scheduler -> 512
at --max_batch_size 256 (tensorrt_llm/_torch/pyexecutor/_util.py on
feat/deepseek_v4). The older 9aa3715 build used 256. That extra headroom pushed
the conc-256 dpa=true 8k1k prefill-warmup ~0.3 GiB over B200's 178 GiB and OOM'd
(run 26987679137, job 79643136619).
Setting disable_overlap_scheduler: true makes num_micro_batches=1 ->
max_num_requests=256, matching the 9aa3715 footprint that fit conc-256 on B200.
Trade-off: this turns off the overlap scheduler (a throughput optimization), so
these B200 numbers are not directly comparable to overlap-on configs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 7baa914 commit 9ff8446
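The slot-pool arithmetic described in the commit message can be checked with a small sketch. The helper name `slot_pool_size` is hypothetical and is not the function in tensorrt_llm/_torch/pyexecutor/_util.py; it only encodes the rule stated in the message (two micro-batches with the overlap scheduler on, one with it off).

```python
def slot_pool_size(max_batch_size: int, disable_overlap_scheduler: bool) -> int:
    """Hypothetical helper mirroring the sizing rule quoted in the commit message."""
    # Assumption taken from the message: the overlap scheduler uses 2 micro-batches,
    # and disabling it drops that to 1.
    num_micro_batches = 1 if disable_overlap_scheduler else 2
    return max_batch_size * num_micro_batches


# --max_batch_size 256, as in the failing conc-256 run
overlap_on = slot_pool_size(256, disable_overlap_scheduler=False)
overlap_off = slot_pool_size(256, disable_overlap_scheduler=True)
print(f"overlap on:  max_num_requests={overlap_on}")
print(f"overlap off: max_num_requests={overlap_off}")
```

With the overlap scheduler disabled the pool is back to 256 slots, which is the footprint the message attributes to the 9aa3715 build.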
3 files changed
Lines changed: 3 additions & 0 deletions
File tree
- benchmarks/single_node/fixed_seq_len
One line added at line 79 (diff content not preserved).
Lines changed: 1 addition & 0 deletions
One line added at line 79 (diff content not preserved).
One line added at line 3467 (diff content not preserved).