Commit 7b150ad
chore[gpu]: don't inline bitunpack lane impls
Fully inlining the lane implementations adds enough register pressure
that the dynamic dispatch kernel operates at the maximum of 32
registers per thread. Not inlining the bit-unpack lane implementations
drops the register count to 24 per thread. This matters for future
changes that will require more registers, such as patches support on
the GPU.
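A minimal sketch of the non-inlining idea, under stated assumptions: the names `LANE_FN`, `unpack_lane_value`, and `unpack_all` are hypothetical and do not come from this commit, and the layout is a simplified contiguous bit-packing, not necessarily the one the real kernels use. Under nvcc the macro expands to the CUDA `__noinline__` qualifier; on a plain host compiler it falls back to the compiler's own noinline attribute so the sketch still builds.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one macro selects the noinline spelling for the active compiler.
#if defined(__CUDACC__)
#define LANE_FN __host__ __device__ __noinline__
#elif defined(_MSC_VER)
#define LANE_FN __declspec(noinline)
#else
#define LANE_FN __attribute__((noinline))
#endif

// Hypothetical lane implementation: extract the `index`-th value of
// `bit_width` bits from a contiguous stream of packed 32-bit words.
// Because it is not inlined, its temporaries do not have to stay live in the
// caller's registers across the call.
LANE_FN uint32_t unpack_lane_value(const uint32_t* packed,
                                   uint32_t bit_width,
                                   uint32_t index) {
    assert(bit_width >= 1 && bit_width <= 32);
    const uint64_t bit_pos = static_cast<uint64_t>(index) * bit_width;
    const uint32_t word = static_cast<uint32_t>(bit_pos / 32);
    const uint32_t shift = static_cast<uint32_t>(bit_pos % 32);

    // Load the word holding the first bit; pull in the next word only when
    // the value straddles a 32-bit boundary.
    uint64_t window = packed[word];
    if (shift + bit_width > 32) {
        window |= static_cast<uint64_t>(packed[word + 1]) << 32;
    }

    const uint64_t mask =
        (bit_width == 32) ? 0xFFFFFFFFull : ((1ull << bit_width) - 1);
    return static_cast<uint32_t>((window >> shift) & mask);
}

// Hypothetical caller standing in for the body of a dispatch kernel.
void unpack_all(const uint32_t* packed, uint32_t bit_width,
                uint32_t count, uint32_t* out) {
    for (uint32_t i = 0; i < count; ++i) {
        out[i] = unpack_lane_value(packed, bit_width, i);
    }
}
```

When building real kernels, `nvcc -Xptxas -v` prints the registers used per kernel, which is one way to compare the inlined and non-inlined variants.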
Splitting the lane implementations out into headers is needed so that
the dynamic dispatch kernel PTX can be compiled without the standalone
bitunpack kernels. This reduces the amount of assembly for the dynamic
dispatch kernel from 128k lines to 48k lines. Besides static compile
times, this matters for the dynamic dispatch kernel because
PTX-to-device compilation should be as fast as possible. For full
static JIT compilation in the background, longer compile times are fine.
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
1 parent 4a5b7d7 commit 7b150ad
12 files changed
Lines changed: 17007 additions & 16951 deletions
File tree
- .github/workflows
- vortex-cuda
- kernels/src
- src