Commit 7ccb11d
fix(pt): Treat cuBLAS allocation failures as PyTorch OOM during auto batch sizing
PyTorch inference can raise a cuBLAS allocation failure after an oversized
batch attempt, especially during . Previously this was
treated as a generic RuntimeError, so auto batch sizing stopped after the first
batch size reduction instead of continuing to shrink the inference batch.
Add this cuBLAS allocation failure to the PyTorch auto-batch OOM markers and
cover it with a unit test, allowing auto batch sizing to continue retrying with
smaller batch sizes.
1 parent 57f870f commit 7ccb11d
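The retry behavior described in the commit message can be sketched roughly as follows. This is an illustration, not the code from this commit: `OOM_MARKERS`, `is_oom_error`, `run_with_auto_batch`, and `fake_inference` are hypothetical names, the marker strings are assumed from typical PyTorch error messages, and halving the batch size is only one possible shrink policy.

```python
# Hypothetical marker list; the strings actually used by the commit are not shown here.
OOM_MARKERS = (
    "CUDA out of memory",
    "CUBLAS_STATUS_ALLOC_FAILED",  # assumed text of the cuBLAS allocation failure
)


def is_oom_error(exc: Exception) -> bool:
    """Return True if exc is a RuntimeError whose message contains an OOM marker."""
    return isinstance(exc, RuntimeError) and any(m in str(exc) for m in OOM_MARKERS)


def run_with_auto_batch(fn, batch_size: int, min_batch_size: int = 1):
    """Call fn(batch_size), halving batch_size after each OOM-like failure."""
    while True:
        try:
            return fn(batch_size), batch_size
        except RuntimeError as exc:
            if not is_oom_error(exc) or batch_size <= min_batch_size:
                raise  # not an OOM, or nothing left to shrink
            batch_size = max(min_batch_size, batch_size // 2)


def fake_inference(batch_size: int) -> str:
    """Stand-in for model inference that only fits batches of 4 or fewer."""
    if batch_size > 8:
        raise RuntimeError("CUDA out of memory")
    if batch_size > 4:
        # Without the cuBLAS marker, this error would end the retry loop early.
        raise RuntimeError(
            "CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`"
        )
    return "ok"


result, final_batch = run_with_auto_batch(fake_inference, batch_size=32)
print(result, final_batch)  # ok 4
```

In this sketch the batch size goes 32, 16, 8, 4. The attempt at 8 fails with the cuBLAS message, and the loop only continues to 4 because that message is listed as an OOM marker.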
2 files changed
Lines changed: 15 additions & 0 deletions
| File | Lines added | New line numbers |
|---|---|---|
| 1 | 1 | 81 |
| 2 | 14 | 24-37 |