Commit e1cf8fb
Fix HIP grid overflow in permute_1D_sparse_data_cuda (#5763)
Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2693
Pull Request resolved: #5763
A MAST training job (`fire-zzt-ESFM-MI350X-20260508-1651-3488ad1a`) was
failing on ROCm in `permute_1D_sparse_data_cuda` with:
```
sparse_permute_1d.hip(339:8) [(permute_1D_data_kernel_vec<false, offsets_t, indices_t, std::nullptr_t>)]
[grid dim 6252106 x 1 x 1] [block dim 64 x 16 x 1]:
Total number of threads 6402156544 is greater than the HIP limit of 2^32
```
The launch site uses `dim3(64, BT_blocks=16)` (block size 1024) and
`blocks = cuda_calc_xblock_count(permuted_lengths_size, BT_blocks)`, which
works out to about 64 threads per segment. Once
`permuted_lengths_size > 2^26 ≈ 67M` segments, total threads exceed `2^32`
and HIP refuses the launch. CUDA's runtime silently handles the wrap; ROCm
does not (see ROCm/hip#2253). The MAST log shows ~100M segments, well past
the limit.
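The arithmetic behind the failure can be checked with a small host-only sketch. The helper names (`total_threads`, `ceil_div`, `kHipThreadLimit`) are hypothetical and not part of FBGEMM, and the sketch assumes the x-block count is `ceil(permuted_lengths_size / BT_blocks)`, which is consistent with the grid size in the log:

```cpp
#include <cstdint>

// Hypothetical helper (not from FBGEMM): total threads requested by a grid of
// `grid_x` blocks, each with `block_x` x `block_y` threads, in 64-bit math.
constexpr std::uint64_t total_threads(std::uint64_t grid_x,
                                      std::uint64_t block_x,
                                      std::uint64_t block_y) {
  return grid_x * block_x * block_y;
}

// Assumption: the x-block count is ceil(num_segments / segments_per_block).
constexpr std::uint64_t ceil_div(std::uint64_t a, std::uint64_t b) {
  return (a + b - 1) / b;
}

// The limit quoted in the HIP error message.
constexpr std::uint64_t kHipThreadLimit = std::uint64_t{1} << 32;

// Grid and block dimensions taken from the error message.
static_assert(total_threads(6252106, 64, 16) == 6402156544ULL,
              "matches the thread count reported by HIP");
static_assert(total_threads(6252106, 64, 16) > kHipThreadLimit,
              "the failing launch is past the limit");

// 2^26 segments land exactly on 2^32 threads; one more segment goes past it.
static_assert(total_threads(ceil_div(1ULL << 26, 16), 64, 16) ==
                  kHipThreadLimit,
              "2^26 segments request exactly 2^32 threads");
static_assert(total_threads(ceil_div((1ULL << 26) + 1, 16), 64, 16) >
                  kHipThreadLimit,
              "2^26 + 1 segments exceed the limit");
```

With 16 segments per block and 1024 threads per block, each segment accounts for 64 threads, which is where the `2^26` threshold comes from.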
The kernels `permute_1D_data_kernel_vec` and `permute_1D_data_kernel`
already implement a grid-stride loop over `b_t`, so no kernel-side changes
are needed — only the launch site needs to cap the grid. The lengths
kernel uses `CUDA_KERNEL_LOOP`, which also already grid-strides.
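To illustrate why a grid-stride loop makes the kernel indifferent to the grid size, here is a host-side emulation. It is a sketch only: the function names are hypothetical, and the indexing (`b_t` starting at `blockIdx.x * blockDim.y + threadIdx.y` and advancing by `gridDim.x * blockDim.y`) is an assumption about the usual shape of such a loop, not a copy of the FBGEMM kernels.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Emulates a grid-stride loop over `n` items on the host and returns how many
// times each item was visited. `grid_x` stands in for gridDim.x and `block_y`
// for blockDim.y (assumed indexing, see the lead-in).
std::vector<int> visit_counts(std::int64_t n,
                              std::int64_t grid_x,
                              std::int64_t block_y) {
  std::vector<int> visits(static_cast<std::size_t>(n), 0);
  const std::int64_t stride = grid_x * block_y;  // items covered per sweep
  for (std::int64_t bx = 0; bx < grid_x; ++bx) {
    for (std::int64_t ty = 0; ty < block_y; ++ty) {
      // Each (block, thread row) starts at its own item and strides forward.
      for (std::int64_t b_t = bx * block_y + ty; b_t < n; b_t += stride) {
        ++visits[static_cast<std::size_t>(b_t)];
      }
    }
  }
  return visits;
}

// True when every item is processed exactly once.
bool covers_each_item_once(std::int64_t n,
                           std::int64_t grid_x,
                           std::int64_t block_y) {
  for (int v : visit_counts(n, grid_x, block_y)) {
    if (v != 1) {
      return false;
    }
  }
  return true;
}
```

For example, `covers_each_item_once(1000, 63, 16)` (an uncapped grid) and `covers_each_item_once(1000, 4, 16)` (a capped one) both hold: a smaller grid only means more iterations per thread row.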
Apply the D94944619 conditional-cap pattern at both kernel launch sites in
`permute_1D_sparse_data_cuda`:
- Compute `total_threads` as a `uint64_t` from the unconstrained grid.
- If `total_threads >= numeric_limits<uint32_t>::max()`, cap the grid to
`min(num_threadblocks, utils::cuda::get_max_thread_blocks(stream))`.
- Otherwise pass through the existing value (no perf change for the common
case, including NVIDIA — the generated launch is bit-identical).
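The three steps above can be sketched as a standalone host function. This is an illustration under stated assumptions, not the code from the diff: `capped_grid_x` is a hypothetical name, and `max_thread_blocks` is a plain parameter standing in for `utils::cuda::get_max_thread_blocks(stream)`, whose real value depends on the device.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the launch-site logic described above.
// `max_thread_blocks` replaces utils::cuda::get_max_thread_blocks(stream).
std::uint32_t capped_grid_x(std::uint32_t num_threadblocks,
                            std::uint32_t threads_per_block,
                            std::uint32_t max_thread_blocks) {
  // Widen before multiplying so the product cannot wrap in 32 bits.
  const std::uint64_t total_threads =
      static_cast<std::uint64_t>(num_threadblocks) * threads_per_block;

  // Cap only when the unconstrained grid would reach the 32-bit range limit.
  if (total_threads >= std::numeric_limits<std::uint32_t>::max()) {
    return std::min(num_threadblocks, max_thread_blocks);
  }

  // Common case: pass the original grid size through unchanged.
  return num_threadblocks;
}
```

With an illustrative cap of 8192 blocks, `capped_grid_x(6252106, 1024, 8192)` returns 8192, while a small launch such as `capped_grid_x(1000, 1024, 8192)` returns 1000 untouched. Because the kernels grid-stride, the capped launch still processes every segment.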
Same family of fix as:
- D65009966 (bounds_check_indices)
- D75543767 (TBE forward)
- D94944619 (TBE forward V2 — conditional cap)
Out of scope: `sparse_permute_2d.cu` (`permute_2D_data_kernel_vec`) has
the same pattern at line 253 with `dim3(32, 32)` and is a candidate for
the same fix as a follow-up.
Reviewed By: spcyppt
Differential Revision: D104903707
fbshipit-source-id: 049a7f70ceacd6d7cfd63fa305976e9a95978e01
1 parent 3203889 commit e1cf8fb
3 files changed
Lines changed: 117 additions & 6 deletions
File tree
- fbgemm_gpu
  - fbgemm_gpu
  - src/sparse_ops
  - test/sparse
Diff bodies and file paths were not captured; only the hunk positions survive.

| File (in page order) | Original lines | New lines | Change |
|---|---|---|---|
| 1 | 300-301 | 300-309 | 2 lines replaced by 10 |
| 2 | 236 | 236 | 1 line replaced |
| 2 | after 237 | 238-248 | 11 lines added |
| 2 | 265 | 276 | 1 line replaced |
| 2 | after 266 | 278-288 | 11 lines added |
| 3 | 30 | 30-35 | 1 line replaced by 6 |
| 3 | 33 | 38-43 | 1 line replaced by 6 |
| 3 | after 788 | 799-869 | 71 lines added |