Commit 1563b10
Name the OOM-skip threshold and explain the 128*bHSS workspace observation
Address review nits on the deterministic THD-backward OOM guard:
1. Replace the magic number 1_000_000_000 with the named constant
SM90_DET_FUSED_THD_BWD_MAX_BHSS = 1 << 30, so the value is searchable
and labeled.
2. Replace the prefatory comment with a short note tying the number to
cuDNN's actual workspace request (~128 * bHSS bytes, measured on
cuDNN 9.21.0 sm90 — see local sweep). At bHSS = 1<<30 the request is
128 GiB, which doesn't fit on H100's 80 GB.
3. Flag the b>=3 caveat for future readers: cuDNN rounds the batch up
internally so workspace grows super-linearly past b=2 (b=4 asks for
4x the b=2 workspace, not 2x). The current fused-essential matrix is
all b=2, so the threshold stays correct for what the test exercises;
the note is there so the next person doesn't have to rediscover it.
Skip set is unchanged — cp_2_0, cp_2_1, cp_3_1, cp_4_2, cp_4_3.
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
1 parent 3b1e4ce commit 1563b10
1 file changed
Lines changed: 6 additions & 4 deletions
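The arithmetic behind the guard described in the commit message can be sketched as follows. This is a minimal illustration, not the code from the changed file: only the constant name SM90_DET_FUSED_THD_BWD_MAX_BHSS, its value 1 << 30, and the ~128 bytes per bHSS figure come from the message. The helper names, the reading of bHSS as batch * heads * seq_q * seq_kv, and the >= comparison are assumptions. The sketch does not model the b>=3 batch rounding noted in item 3.

```python
# Named threshold and value as given in the commit message.
SM90_DET_FUSED_THD_BWD_MAX_BHSS = 1 << 30

# Measured figure quoted in the commit message (cuDNN 9.21.0, sm90).
# The constant name is hypothetical.
WORKSPACE_BYTES_PER_BHSS = 128


def bhss(b: int, h: int, s_q: int, s_kv: int) -> int:
    """Assumed meaning of bHSS: batch * heads * seq_q * seq_kv."""
    return b * h * s_q * s_kv


def estimated_workspace_bytes(b: int, h: int, s_q: int, s_kv: int) -> int:
    """Approximate cuDNN workspace request, linear in bHSS.

    Only meaningful for b <= 2; the message notes that larger batches
    grow faster than this linear estimate.
    """
    return WORKSPACE_BYTES_PER_BHSS * bhss(b, h, s_q, s_kv)


def should_skip(b: int, h: int, s_q: int, s_kv: int) -> bool:
    """Hypothetical guard: skip once bHSS reaches the named threshold.

    Whether the real check uses >= or > is an assumption here.
    """
    return bhss(b, h, s_q, s_kv) >= SM90_DET_FUSED_THD_BWD_MAX_BHSS


if __name__ == "__main__":
    # Illustrative shape (not taken from the test matrix) with bHSS == 1 << 30.
    shape = (2, 32, 4096, 4096)
    gib = estimated_workspace_bytes(*shape) / (1 << 30)
    print(f"bHSS={bhss(*shape)} workspace={gib:.0f} GiB skip={should_skip(*shape)}")
```

With the illustrative shape above, bHSS is exactly 1 << 30 and the estimate is 128 GiB, which matches the figure in the message and exceeds an 80 GB H100.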