Enable use_subwarp_shuffle=True for CTA kernel on ROCm#5917

Open
spcyppt wants to merge 1 commit into pytorch:main from spcyppt:export-D108830772

@spcyppt spcyppt commented Jun 17, 2026

Contributor

Summary: Instantiate the CTA backward kernel template with `use_subwarp_shuffle=True` on ROCm, enabling subwarp shuffle-based reductions instead of shared memory reductions. The warp kernel template is unchanged (`use_subwarp_shuffle=False`). This is a CTA-only change.

Differential Revision: D108830772
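
For readers unfamiliar with the two reduction styles named in the summary, here is a minimal host-side sketch of the general idea. It is not the kernel code touched by this PR: every name in it (`kWarpSize`, `shuffle_xor`, `subwarp_reduce_sum`, `scratch_reduce_sum`) is hypothetical, the lane count of 64 is an assumption, and plain C++ arrays stand in for per-lane registers and for shared memory. On a GPU the shuffle step would map to a cross-lane intrinsic rather than a loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumption for illustration: 64 lanes per warp/wavefront.
constexpr std::size_t kWarpSize = 64;

// One slot per lane; emulates the register each lane holds.
using Warp = std::array<float, kWarpSize>;

// Emulates one XOR shuffle: every lane reads the value held by lane (id ^ mask).
// Reads go against the input snapshot, so all lanes exchange "simultaneously".
Warp shuffle_xor(const Warp& regs, std::size_t mask) {
  Warp out{};
  for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
    out[lane] = regs[lane ^ mask];
  }
  return out;
}

// Shuffle-style reduction: butterfly sum inside each aligned group of
// subwarp_width lanes. No scratch buffer is needed, and after log2(width)
// steps every lane in a group holds that group's total.
Warp subwarp_reduce_sum(Warp regs, std::size_t subwarp_width) {
  assert(subwarp_width > 0 && subwarp_width <= kWarpSize &&
         (subwarp_width & (subwarp_width - 1)) == 0);  // power of two
  for (std::size_t mask = subwarp_width / 2; mask > 0; mask /= 2) {
    const Warp peer = shuffle_xor(regs, mask);
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
      regs[lane] += peer[lane];
    }
  }
  return regs;
}

// Scratch-buffer reduction for comparison: lanes first publish their values
// to a buffer (standing in for shared memory), then each group's total is
// accumulated from that buffer and handed back to every lane in the group.
Warp scratch_reduce_sum(const Warp& regs, std::size_t subwarp_width) {
  assert(subwarp_width > 0 && kWarpSize % subwarp_width == 0);
  const Warp scratch = regs;  // "write to shared memory"
  Warp out{};
  for (std::size_t base = 0; base < kWarpSize; base += subwarp_width) {
    float total = 0.0f;
    for (std::size_t i = 0; i < subwarp_width; ++i) {
      total += scratch[base + i];
    }
    for (std::size_t i = 0; i < subwarp_width; ++i) {
      out[base + i] = total;
    }
  }
  return out;
}
```

Both functions produce the same per-group sums; the shuffle variant does so without the intermediate buffer, which is the trade-off the summary refers to.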
meta-codesync Bot commented Jun 17, 2026

Contributor

@spcyppt has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108830772.
