Commit 84ec712
perf: reduce register pressure in dyn dispatch
We decrease the number of values per tile that each GPU thread
processes in the output stage, and limit the register count to 32
via the launch bounds.
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>

1 parent 5e5475a
1 file changed
Lines changed: 6 additions & 4 deletions
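
The commit message mentions capping the register count at 32 per thread. As a rough illustration of why such a cap can matter, the sketch below estimates how many threads can be resident on one streaming multiprocessor for a given per-thread register count. The hardware limits used here (65,536 registers and 2,048 resident threads per multiprocessor) are assumptions that hold for many but not all NVIDIA GPUs and are not stated in the commit. The helper `resident_threads` is hypothetical, and the estimate ignores register allocation granularity and shared-memory limits.

```cpp
#include <algorithm>

// Assumed per-multiprocessor limits (NOT taken from the commit; they vary by GPU).
constexpr int kRegistersPerSm = 65536;  // 32-bit registers per multiprocessor
constexpr int kMaxThreadsPerSm = 2048;  // maximum resident threads per multiprocessor

// Hypothetical helper: upper bound on resident threads when every thread
// uses `regs_per_thread` registers. Ignores allocation granularity.
constexpr int resident_threads(int regs_per_thread) {
    return std::min(kMaxThreadsPerSm, kRegistersPerSm / regs_per_thread);
}

// Under these assumptions, 32 registers per thread is the largest budget
// that still allows the full 2,048 resident threads.
static_assert(resident_threads(32) == 2048, "full occupancy at 32 registers");
static_assert(resident_threads(64) == 1024, "half occupancy at 64 registers");
static_assert(resident_threads(128) == 512, "quarter occupancy at 128 registers");
```

Whether higher occupancy was the motivation for this particular change is not stated in the commit. Holding fewer values per thread is one common way to make a kernel fit within such a register budget without spilling.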
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 279 | 279 | |
| 280 | 280 | |
| 281 | 281 | |
| 282 | | - |
| | 282 | + |
| | 283 | + |
| 283 | 284 | |
| 284 | 285 | |
| 285 | 286 | |
| | | |
| 472 | 473 | |
| 473 | 474 | |
| 474 | 475 | |
| 475 | | - |
| 476 | | - |
| 477 | | - |
| | 476 | + |
| | 477 | + |
| | 478 | + |
| | 479 | + |
| 478 | 480 | |
| 479 | 481 | |
| 480 | 482 | |