Commit 7f8a801
Fix HIP grid overflow in pack_segments_cuda{,_v2} (#5907)
Summary:
Pull Request resolved: #5907
Apply the same `#ifdef USE_ROCM` cap pattern used in D104903707 /
D104937969 / parent diffs to the two launch sites in
`pack_segments_forward_cuda` and `pack_segments_forward_cuda_v2` in
`sparse_pack_segments_forward.cu`.
Both ops launch their kernels with block size 128 and grid
`cuda_calc_xblock_count(num_seq * max_length * cell_size, 128)`. Once
the product `num_seq * max_length * cell_size` exceeds `2^32`, the
total thread count (grid size times block size) exceeds the HIP `2^32`
limit and the launch is rejected on ROCm. Both
`pack_segments_cuda_kernel` (uses `CUDA_KERNEL_LOOP`) and
`pack_segments_cuda_v2_kernel` (uses `CUDA_KERNEL_LOOP_TYPE`) already
use grid-stride loops, so capping the grid preserves correctness: each
thread simply iterates over more elements.
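
The arithmetic above can be checked on the host. The following is a
minimal C++ sketch, not the FBGEMM code: `blocks_needed`,
`capped_blocks`, `grid_stride_visits` and the `max_total_threads`
parameter are hypothetical names introduced for illustration, and the
exact cap value used by the commit is not shown in this message. The
sketch simulates a grid-stride loop on the CPU to show that a capped
grid still visits every element exactly once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a ceil-division block count
// (assumption: not the real cuda_calc_xblock_count).
std::uint64_t blocks_needed(std::uint64_t num_items,
                            std::uint64_t threads_per_block) {
  assert(threads_per_block > 0);
  return (num_items + threads_per_block - 1) / threads_per_block;
}

// Cap the grid so that grid * threads_per_block <= max_total_threads.
// max_total_threads is an assumed parameter, not a value from the commit.
std::uint64_t capped_blocks(std::uint64_t num_items,
                            std::uint64_t threads_per_block,
                            std::uint64_t max_total_threads) {
  const std::uint64_t cap =
      std::max<std::uint64_t>(1, max_total_threads / threads_per_block);
  return std::min(blocks_needed(num_items, threads_per_block), cap);
}

// CPU simulation of a grid-stride loop: every simulated thread starts at
// its global id and advances by the total number of launched threads.
// Returns how many times each element was visited.
std::vector<int> grid_stride_visits(std::uint64_t num_items,
                                    std::uint64_t grid,
                                    std::uint64_t threads_per_block) {
  std::vector<int> visits(num_items, 0);
  const std::uint64_t stride = grid * threads_per_block;
  for (std::uint64_t tid = 0; tid < stride; ++tid) {
    for (std::uint64_t i = tid; i < num_items; i += stride) {
      ++visits[i];
    }
  }
  return visits;
}
```

For example, with 1000 elements, block size 128 and an assumed cap of
256 total threads, `capped_blocks` returns 2 instead of 8, and
`grid_stride_visits(1000, 2, 128)` still reports one visit per element.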
The `#ifdef USE_ROCM / #else / #endif` selector keeps NVIDIA codegen
bit-identical and unconditionally caps on ROCm.
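
A rough sketch of what such a selector looks like, written as plain
host-side C++ so it can be compiled without a GPU toolchain. The
function `select_grid` and the constant `kAssumedMaxTotalThreads` are
hypothetical; the real launch sites use `cuda_calc_xblock_count`, and
the actual cap expression is not reproduced in this message. The
non-ROCm branch is simplified here to a plain ceil-division.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed upper bound on grid * block for the ROCm path (illustrative only).
constexpr std::uint64_t kAssumedMaxTotalThreads = (1ULL << 32) - 1;

// Pick the number of blocks for a 1D launch over num_items elements.
std::uint64_t select_grid(std::uint64_t num_items,
                          std::uint64_t threads_per_block) {
  // Ceil-division: enough blocks to give every element its own thread.
  const std::uint64_t uncapped =
      (num_items + threads_per_block - 1) / threads_per_block;
#ifdef USE_ROCM
  // ROCm path: always cap, relying on the kernel's grid-stride loop
  // to cover the elements beyond the capped thread count.
  return std::min(uncapped, kAssumedMaxTotalThreads / threads_per_block);
#else
  // Non-ROCm path: left exactly as before, so generated code is unchanged.
  return uncapped;
#endif
}
```

Because the choice is made by the preprocessor, a build without
`USE_ROCM` never sees the capping expression at all, which is what
keeps the NVIDIA path unchanged.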
Same family of fix as:
- D104903707 (permute_1D_sparse_data — parent diff)
- D104937969 (permute_2D_sparse_data — parent diff)
Reviewed By: henrylhtsang
Differential Revision: D104950916
fbshipit-source-id: e8999860e7b7f64250e61daffbfae00ae71ee36a
1 parent d334a2f commit 7f8a801
2 files changed
Lines changed: 124 additions & 4 deletions
Lines changed: 22 additions & 2 deletions
[Diff content not captured. Two hunks: at original lines 136-144, 10 lines added after line 138 and line 141 replaced; at original lines 233-241, 10 lines added after line 235 and line 238 replaced.]
[Diff content not captured. Two hunks: at original lines 21-29, line 24 replaced and line 26 replaced by 5 lines; at original lines 623-628, 96 lines added after line 625.]