Skip numeric drop-out when PComputeWindow is a null_tile_window in Bl…#7256
Open
poyenc
reviewed
May 11, 2026
poyenc
approved these changes
May 11, 2026
The BlockDropout implementation already provides complete logic for generating random numbers and applying dropout to the P tensor after the first attention Gemm. It supports both 32x32 and 16x16 Warp-Gemm and runs on both wave32 and wave64 architectures.
In some situations, however, we only need the block-level process to generate random numbers, without also applying dropout to the VGPR tile in real time. For example, xformers' `test_mem_eff_attention.py::test_dropout_ck` requires the host reference implementation of `attention forward with dropout` to use the same random numbers as the device-side implementation of `attention forward with dropout`, so that the two can be compared and verified. A standalone kernel that only generates random numbers is therefore required.
This PR enables xformers' random_val generating kernel (in file `ck_tiled_rand_uniform_kernel.h`) to rely entirely on BlockDropout's `Run()` operator to generate random numbers for a `[MPerBlock, NPerBlock]` tile during the tile iteration, so the xformers kernel no longer needs to replicate BlockDropout's logic.
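The idea of skipping the numeric drop-out for a null P window can be illustrated with a small host-side sketch. This is not the actual BlockDropout code: `NullTileWindow`, `ToyBlockDropout`, `toy_random_u8`, and the simplified `Run()` signature are hypothetical stand-ins, and the generator is an arbitrary hash-based placeholder rather than the RNG the library uses. The sketch only shows that a compile-time check on the P tile type lets one `Run()` serve both the "generate and drop" path and the "generate only" path while producing identical random values.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical tag type meaning "no P tile was supplied".
// It only mimics the role of a null tile window; it is not the library type.
struct NullTileWindow {};

// Placeholder counter-based generator (a splitmix64-style mix), assumed here
// purely for illustration. The same (seed, index) always gives the same byte.
inline std::uint8_t toy_random_u8(std::uint64_t seed, std::uint64_t index) {
    std::uint64_t x = seed + 0x9E3779B97F4A7C15ull * (index + 1);
    x ^= x >> 30;
    x *= 0xBF58476D1CE4E5B9ull;
    x ^= x >> 27;
    x *= 0x94D049BB133111EBull;
    x ^= x >> 31;
    return static_cast<std::uint8_t>(x & 0xFF);
}

// Toy block-level dropout over a flat tile of N elements.
template <std::size_t N>
struct ToyBlockDropout {
    std::uint64_t seed;
    std::uint8_t  p_undrop_u8; // an element is kept when its random byte <= this
    float         rp_undrop;   // rescale factor applied to kept elements

    // Always writes the random bytes into `randval`.
    // Applies dropout to `p_tile` only when a real tile is passed.
    template <typename PTile>
    void Run(std::array<std::uint8_t, N>& randval,
             PTile& p_tile,
             std::uint64_t offset) const {
        for (std::size_t i = 0; i < N; ++i) {
            randval[i] = toy_random_u8(seed, offset + i);
        }

        // Compile-time branch: for NullTileWindow the dropout loop is
        // discarded, so no arithmetic on P is ever instantiated.
        if constexpr (!std::is_same_v<std::remove_cv_t<PTile>, NullTileWindow>) {
            for (std::size_t i = 0; i < N; ++i) {
                p_tile[i] = (randval[i] <= p_undrop_u8) ? p_tile[i] * rp_undrop
                                                        : 0.0f;
            }
        }
    }
};
```

In this sketch, a random-value-only kernel would call `Run(randval, null_window, offset)` with a `NullTileWindow` object, while the attention kernel would pass its real P tile. Because both calls go through the same generation loop, a host reference that consumes `randval` sees exactly the bytes the dropout path used.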