Commit ae9f29b

Thomas Ning authored
add the sync barrier for persistent kernel (#2977)
1 parent 19415d0 commit ae9f29b

1 file changed

Lines changed: 1 addition & 0 deletions

include/ck_tile/ops/gemm/kernel/universal_gemm_kernel.hpp

@@ -1134,6 +1134,7 @@ struct UniversalGemmKernel

         while(block_id < num_work)
         {
+            s_waitcnt_barrier();
             // Get the tile index for this block
             const auto tile_idx = amd_wave_read_first_lane(block_id % num_tiles);
             const auto [iM, iN] = TilePartitioner{kargs.M, kargs.N}.GetOutputTileIndex(tile_idx);
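
For readers unfamiliar with the loop in this hunk, the following is a minimal host-side C++ sketch of how a persistent-kernel work loop walks over tiles, with a marker where the new barrier call sits. It is not the ck_tile implementation. The names `run_block`, `tile_to_mn`, and `sync_barrier_placeholder` are hypothetical, the row-major tile ordering is an assumption (the real `TilePartitioner` may order tiles differently), and the `block_id += grid_size` stride is assumed because the diff does not show how `block_id` advances. The commit does not state why the barrier is needed; one plausible reading is that it keeps all waves of a workgroup in step before the next tile begins, so that shared resources from the previous iteration are no longer in use.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the device-side barrier added by the commit.
// On the host it is a no-op; it only marks where the call sits in the loop.
inline void sync_barrier_placeholder() {}

// Assumed row-major mapping from a linear tile index to (iM, iN).
inline std::pair<int, int> tile_to_mn(int tile_idx, int tiles_n)
{
    return {tile_idx / tiles_n, tile_idx % tiles_n};
}

// Simulates one persistent block and returns the tiles it visits, in order.
std::vector<std::pair<int, int>> run_block(int first_block_id,
                                           int grid_size,
                                           int num_work,
                                           int tiles_m,
                                           int tiles_n)
{
    assert(grid_size > 0 && tiles_m > 0 && tiles_n > 0);
    const int num_tiles = tiles_m * tiles_n;

    std::vector<std::pair<int, int>> visited;
    int block_id = first_block_id;
    while(block_id < num_work)
    {
        // Position of the new barrier: first statement of every iteration.
        sync_barrier_placeholder();

        // Get the tile index for this block (same arithmetic as the hunk).
        const int tile_idx = block_id % num_tiles;
        visited.push_back(tile_to_mn(tile_idx, tiles_n));

        // Assumed stride: jump ahead by the number of launched blocks.
        block_id += grid_size;
    }
    return visited;
}
```

With two blocks and a 2 x 3 tile grid, block 1 would handle work items 1, 3, and 5 under these assumptions, so it passes through the barrier position three times, once per tile.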
