Commit 373f23b
[ROCm] Replace compile-time warp size with runtime query in host code (#1885)
* Replace compile-time warp size with runtime query in host code
Add bnb_host_warp_size(), which queries hipDeviceGetAttribute at runtime
with per-device caching (up to 32 GPUs), replacing the compile-time
BNB_WARP_SIZE macro in host-side dispatch. This fixes the incorrect
default of warp size 64 on RDNA and ensures kernels are dispatched with
the proper parameters.
* Fix kernel dispatching for RDNA
* Fix linting issues
* Fix linting issues
* Fix linting issues
* Revert device array caching and instead only do device 0
* Use atomics to avoid a race condition
* Fix linting issues
1 parent e63e29c commit 373f23b
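The diff contents are not visible here, so the following is only a minimal sketch of the approach the commit message describes in its final form: a host-side warp-size query for device 0, cached in an atomic so that concurrent first calls do not race. All names (`host_warp_size_sketch`, `query_warp_size_device0`, `warps_for_threads`) are hypothetical, and the device query is stubbed so the example compiles without ROCm. A real build would presumably call `hipDeviceGetAttribute` with `hipDeviceAttributeWarpSize` at that point.

```cpp
#include <atomic>
#include <cassert>

// Counts how often the (stubbed) device query runs, for illustration only.
std::atomic<int> g_query_count{0};

// Hypothetical stand-in for the runtime query. In a ROCm build this would be
// roughly: hipDeviceGetAttribute(&ws, hipDeviceAttributeWarpSize, 0);
int query_warp_size_device0() {
    g_query_count.fetch_add(1, std::memory_order_relaxed);
    return 32;  // assumed value for an RDNA-class GPU; CDNA parts report 64
}

// Returns the warp size of device 0, querying once and caching the result.
// A cached value of 0 means "not queried yet". Two threads may both run the
// query on first use, but they store the same value, so the overlap is benign.
int host_warp_size_sketch() {
    static std::atomic<int> cached{0};
    int ws = cached.load(std::memory_order_acquire);
    if (ws == 0) {
        ws = query_warp_size_device0();
        assert(ws > 0 && "device query returned an invalid warp size");
        cached.store(ws, std::memory_order_release);
    }
    return ws;
}

// Hypothetical host-side dispatch helper: warps needed to cover n threads,
// computed from the runtime value instead of a compile-time macro.
int warps_for_threads(int n) {
    const int ws = host_warp_size_sketch();
    return (n + ws - 1) / ws;
}
```

Host dispatch code would call `host_warp_size_sketch()` wherever it previously relied on the compile-time macro. Device code can keep a compile-time constant, since each kernel is compiled for a specific architecture.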
2 files changed: +38 -7 lines changed
File 1 (name not preserved): +8 -2, one hunk at original lines 7-17
File 2 (name not preserved): +30 -5, hunks at original lines 10-15, 35-44, and 407-414