Commit 3e18d9c

fix: use full cudagraph mode for gb200 profile
1 parent 38f3597 commit 3e18d9c

1 file changed

Lines changed: 1 addition & 1 deletion

benchmarks/multi_node/srt-slurm-recipes/vllm/deepseek-v4/8k1k/agg-gb200-flash-profile-dep16-conc16-mtp3.yaml

@@ -74,7 +74,7 @@ backend:
 no-enable-prefix-caching: true
 no-enable-flashinfer-autotune: true
 block-size: 256
-compilation-config: '{"cudagraph_mode":"FULL_AND_PIECEWISE","mode":3}'
+compilation-config: '{"cudagraph_mode":"FULL","mode":3}'
 gpu-memory-utilization: 0.9
 stream-interval: 50
 no-disable-hybrid-kv-cache-manager: true
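
The `compilation-config` value is a JSON object embedded in a YAML string, so a typo inside the quotes would not be caught by a YAML parser. As a minimal sketch (not part of the commit), the two values from the diff can be parsed and compared with the Python standard library. The helper name `changed_keys` is hypothetical and used only for illustration.

```python
import json

# The two quoted values from the diff, copied verbatim.
OLD = '{"cudagraph_mode":"FULL_AND_PIECEWISE","mode":3}'
NEW = '{"cudagraph_mode":"FULL","mode":3}'


def changed_keys(old_json: str, new_json: str) -> dict:
    """Return {key: (old_value, new_value)} for every key whose value differs.

    json.loads raises json.JSONDecodeError if either string is not valid JSON,
    which doubles as a syntax check on the embedded config.
    """
    old, new = json.loads(old_json), json.loads(new_json)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


if __name__ == "__main__":
    # Only cudagraph_mode differs; "mode" stays at 3 in both versions.
    print(changed_keys(OLD, NEW))
```

Running the comparison confirms that the change is confined to `cudagraph_mode` and that both strings are well-formed JSON.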
