Add grid-stride loop and ROCm cap to index_add_2d_with_unique_indices_kernel (#5934) by q10 · Pull Request #5934 · pytorch/FBGEMM

q10 · 2026-06-18T20:14:33Z

Summary:

X-link: https://github.com/facebookresearch/FBGEMM/pull/2852

Tier-2 fix for HIP grid-overflow in `sparse_ops/sparse_index_add.cu`.

`index_add_2d_with_unique_indices_kernel` previously used `blockIdx.x` directly to index unique indices. Capping the host-side grid without first adding a grid-stride loop would silently drop work.

Changes:

- Add `const int num_unique_indices` as a new kernel parameter.
- Convert the kernel to a grid-stride loop over `u = blockIdx.x; u < num_unique_indices; u += gridDim.x` (Pattern C). All `blockIdx.x` references are replaced with `u`. `start_D` and `has_remainder` are hoisted outside the loop, since they depend only on `blockIdx.y` / `threadIdx.x`.
- Reset per-iteration register state at the top of each iteration: `sum[MAX_ELEMENTS_PER_THREAD]` is re-zeroed and `sum_remainder = 0`.
- Apply the standard `#ifdef USE_ROCM min(blocks_x_uncapped, get_max_thread_blocks(stream)) #else blocks_x_uncapped #endif` cap to the x-dim of the launch grid. The y-dim is bounded by `D/stride_D` and needs no cap.
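
For readers unfamiliar with the pattern, here is a minimal sketch of a grid-stride loop combined with a capped launch grid. It is not the code from this PR: the kernel `segment_sum_sketch`, the helper `run_capped_launch`, the constant `kElemsPerThread` (a stand-in for `MAX_ELEMENTS_PER_THREAD`), the `offsets` layout, and the `max_blocks_x` parameter (a stand-in for `get_max_thread_blocks(stream)`) are all hypothetical, and the real kernel's remainder handling is omitted. The sketch also applies the cap unconditionally rather than only under `USE_ROCM`.

```cuda
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

constexpr int kElemsPerThread = 2;  // hypothetical stand-in for MAX_ELEMENTS_PER_THREAD

// Hypothetical, simplified kernel: for each unique index u, sum the input rows
// in [offsets[u], offsets[u + 1]) and add the result into row unique_idx[u].
__global__ void segment_sum_sketch(
    const float* in, const int* offsets, const int* unique_idx,
    float* out, int D, int num_unique_indices) {
  // Depends only on blockIdx.y / threadIdx.x, so it is hoisted out of the loop.
  const int start_D = (blockIdx.y * blockDim.x + threadIdx.x) * kElemsPerThread;

  // Grid-stride loop: every u is visited for any gridDim.x >= 1.
  for (int u = blockIdx.x; u < num_unique_indices; u += gridDim.x) {
    // Per-iteration register state must start from zero for each u.
    float sum[kElemsPerThread];
    for (int e = 0; e < kElemsPerThread; ++e) sum[e] = 0.0f;

    for (int r = offsets[u]; r < offsets[u + 1]; ++r)
      for (int e = 0; e < kElemsPerThread; ++e)
        if (start_D + e < D) sum[e] += in[r * D + start_D + e];

    // Destination rows are unique, so no atomics are needed here.
    for (int e = 0; e < kElemsPerThread; ++e)
      if (start_D + e < D) out[unique_idx[u] * D + start_D + e] += sum[e];
  }
}

// Launches the sketch with the x-dim capped at max_blocks_x (assumed >= 1) and
// reports whether every output element received its full sum.
// CUDA error checking is omitted for brevity.
bool run_capped_launch(int max_blocks_x) {
  const int D = 4, num_unique = 5, rows_per_segment = 2;
  std::vector<float> h_in(num_unique * rows_per_segment * D, 1.0f);
  std::vector<int> h_offsets = {0, 2, 4, 6, 8, 10};
  std::vector<int> h_unique = {4, 0, 3, 1, 2};
  std::vector<float> h_out(num_unique * D, 0.0f);

  float *d_in = nullptr, *d_out = nullptr;
  int *d_offsets = nullptr, *d_unique = nullptr;
  cudaMalloc(&d_in, h_in.size() * sizeof(float));
  cudaMalloc(&d_out, h_out.size() * sizeof(float));
  cudaMalloc(&d_offsets, h_offsets.size() * sizeof(int));
  cudaMalloc(&d_unique, h_unique.size() * sizeof(int));
  cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_offsets, h_offsets.data(), h_offsets.size() * sizeof(int), cudaMemcpyHostToDevice);
  cudaMemcpy(d_unique, h_unique.data(), h_unique.size() * sizeof(int), cudaMemcpyHostToDevice);
  cudaMemset(d_out, 0, h_out.size() * sizeof(float));

  const int threads = 2;
  const int cols_per_block = threads * kElemsPerThread;
  const int blocks_y = (D + cols_per_block - 1) / cols_per_block;
  const int blocks_x_uncapped = num_unique;
  const int blocks_x = std::min(blocks_x_uncapped, max_blocks_x);  // the cap

  segment_sum_sketch<<<dim3(blocks_x, blocks_y), threads>>>(
      d_in, d_offsets, d_unique, d_out, D, num_unique);
  // This blocking copy on the default stream waits for the kernel to finish.
  cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(float), cudaMemcpyDeviceToHost);

  cudaFree(d_in); cudaFree(d_out); cudaFree(d_offsets); cudaFree(d_unique);
  // Each segment holds two rows of ones, so every output element should be 2.
  return std::all_of(h_out.begin(), h_out.end(), [](float v) { return v == 2.0f; });
}
```

Calling `run_capped_launch(2)` launches only two blocks in x for five unique indices, yet the loop still covers all of them. If the kernel body instead used `blockIdx.x` as `u` with no loop, the same capped launch would leave the segments for `u = 2, 3, 4` unprocessed, which is the silent work loss described in the summary.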

Stacked on top of D105029028 (Tier-2 Diff 5/7). Plan: `/home/bensonma415/.llms/plans/sparse_ops_rocm_grid_overflow_tier2_fix.plan.md` (Diff 6/7).

Reviewed By: henrylhtsang

Differential Revision: D105029511

meta-codesync · 2026-06-18T20:14:50Z

@q10 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D105029511.

meta-codesync · 2026-06-23T05:37:28Z

This pull request has been merged in fa211b0.

pytorch-bot Bot added ciflow/rocm module: rocm labels Jun 18, 2026

meta-cla Bot added the cla signed label Jun 18, 2026

meta-codesync Bot added the meta-exported label Jun 18, 2026

meta-codesync Bot changed the title ~~Add grid-stride loop and ROCm cap to index_add_2d_with_unique_indices_kernel~~ Add grid-stride loop and ROCm cap to index_add_2d_with_unique_indices_kernel (#5934) Jun 21, 2026

q10 force-pushed the export-D105029511 branch from 60bd207 to 768131a Compare June 21, 2026 23:18

q10 force-pushed the export-D105029511 branch from 768131a to a9e0cf2 Compare June 22, 2026 18:09

meta-codesync Bot closed this in fa211b0 Jun 23, 2026

meta-codesync Bot added the Merged label Jun 23, 2026

q10 wants to merge 1 commit into pytorch:main from q10:export-D105029511
