Commit fa211b0
Add grid-stride loop and ROCm cap to index_add_2d_with_unique_indices_kernel (#5934)
Summary:
Pull Request resolved: #5934
X-link: https://github.com/facebookresearch/FBGEMM/pull/2852
Tier-2 fix for HIP grid-overflow in `sparse_ops/sparse_index_add.cu`.
`index_add_2d_with_unique_indices_kernel` previously used `blockIdx.x` directly to index unique indices. Capping the host-side grid without first adding a grid-stride loop would silently drop work.
Changes:
- Add `const int num_unique_indices` as a new kernel parameter.
- Convert the kernel to a grid-stride loop over `u = blockIdx.x; u < num_unique_indices; u += gridDim.x` (Pattern C). Replace all `blockIdx.x` references with `u`. Hoist `start_D` and `has_remainder` outside the loop since they depend only on `blockIdx.y` / `threadIdx.x`.
- Reset per-iteration register state at the top of each iteration: re-zero `sum[MAX_ELEMENTS_PER_THREAD]` and set `sum_remainder = 0`.
- Apply the standard `#ifdef USE_ROCM min(blocks_x_uncapped, get_max_thread_blocks(stream)) #else blocks_x_uncapped #endif` cap to the x-dim of the launch grid. The y-dim is bounded by D/stride_D and needs no cap.
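The changes above can be illustrated with a minimal, self-contained sketch. It is not the FBGEMM kernel: `segment_sum_grid_stride`, `run_demo`, `kElemsPerThread`, and the dense `[num_unique_indices, rows_per_index, D]` input layout are hypothetical simplifications, the `max_blocks` argument stands in for `get_max_thread_blocks(stream)`, and the cap is applied unconditionally rather than under `USE_ROCM`. The sketch only shows why the grid-stride loop and the per-iteration reset must accompany the host-side cap.

```cuda
#include <algorithm>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for MAX_ELEMENTS_PER_THREAD.
constexpr int kElemsPerThread = 2;

// Sums rows_per_index rows of width D for each unique index u.
// Assumed layout: in is [num_unique_indices, rows_per_index, D], out is [num_unique_indices, D].
__global__ void segment_sum_grid_stride(
    const float* in, float* out, int rows_per_index, int D,
    const int num_unique_indices) {
  // Hoisted out of the loop: depends only on threadIdx.x, not on u.
  const int start_D = threadIdx.x * kElemsPerThread;
  float sum[kElemsPerThread];

  // Grid-stride loop: stays correct when gridDim.x < num_unique_indices.
  for (int u = blockIdx.x; u < num_unique_indices; u += gridDim.x) {
    // Reset register state each iteration so sums do not leak across u.
    for (int i = 0; i < kElemsPerThread; ++i) sum[i] = 0.0f;

    for (int r = 0; r < rows_per_index; ++r) {
      const float* row =
          in + (static_cast<size_t>(u) * rows_per_index + r) * D;
      for (int i = 0; i < kElemsPerThread; ++i) {
        if (start_D + i < D) sum[i] += row[start_D + i];
      }
    }
    for (int i = 0; i < kElemsPerThread; ++i) {
      if (start_D + i < D) out[static_cast<size_t>(u) * D + start_D + i] = sum[i];
    }
  }
}

// Launches with a capped x-dim and checks every output on the host.
// max_blocks is an assumed stand-in for get_max_thread_blocks(stream).
bool run_demo(int num_unique_indices, int max_blocks) {
  const int rows_per_index = 3;
  const int threads = 32;
  const int D = threads * kElemsPerThread;
  const size_t in_elems = static_cast<size_t>(num_unique_indices) * rows_per_index * D;
  const size_t out_elems = static_cast<size_t>(num_unique_indices) * D;

  std::vector<float> h_in(in_elems, 1.0f);
  std::vector<float> h_out(out_elems, -1.0f);
  float* d_in = nullptr;
  float* d_out = nullptr;
  if (cudaMalloc(&d_in, in_elems * sizeof(float)) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, out_elems * sizeof(float)) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }
  cudaMemcpy(d_in, h_in.data(), in_elems * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_out, h_out.data(), out_elems * sizeof(float), cudaMemcpyHostToDevice);

  // Host-side cap: safe only because the kernel loops over u.
  const int blocks_x_uncapped = num_unique_indices;
  const int blocks_x = std::min(blocks_x_uncapped, max_blocks);
  segment_sum_grid_stride<<<blocks_x, threads>>>(
      d_in, d_out, rows_per_index, D, num_unique_indices);

  const cudaError_t err = cudaMemcpy(
      h_out.data(), d_out, out_elems * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return false;

  // Every output should equal rows_per_index, since all inputs are 1.0f.
  for (float v : h_out) {
    if (v != static_cast<float>(rows_per_index)) return false;
  }
  return true;
}
```

With, say, `run_demo(1000, 8)`, each of the 8 blocks processes about 125 values of `u`. If the loop were removed and `u` fixed to `blockIdx.x`, only the first 8 outputs would be written; if the reset were removed, later iterations would accumulate on top of earlier sums.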
Stacked on top of D105029028 (Tier-2 Diff 5/7). Plan:
`/home/bensonma415/.llms/plans/sparse_ops_rocm_grid_overflow_tier2_fix.plan.md` (Diff 6/7).
Reviewed By: henrylhtsang
Differential Revision: D105029511
fbshipit-source-id: 2a33c6218d6b3d1c9c39ca301a1d451f09a39308
1 parent 4bb8e6f commit fa211b0
2 files changed
Lines changed: 162 additions & 35 deletions