[CuTe][SM70] Add comment clarifying signed cast requirement for blockIdx coords #3203

Open
Flink-ddd wants to merge 1 commit into NVIDIA:main from Flink-ddd:fix/cute-signed-coord-comment
@Flink-ddd
Problem

blockIdx is a uint3, so the m_coord, n_coord, and l_coord values read from it are unsigned. When these are passed directly to make_coord without a cast, the tile residue calculations:

auto m_max_coord = size<0>(shape_MNKL) - size<0>(gA) * get<0>(cta_coord);
auto n_max_coord = size<1>(shape_MNKL) - size<0>(gB) * get<1>(cta_coord);

can produce negative values for small problem shapes (e.g. M=8, N=8 with TileShape=128x128). In unsigned arithmetic a negative result wraps around to a large positive value, so bounds predicates that should fail pass instead, leading to incorrect predication and wrong results.
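The wraparound can be reproduced on the host with plain integers. The sketch below is illustrative only: `residue_unsigned`, `residue_signed`, and `check_residue_example` are hypothetical helpers, not CUTLASS or CuTe functions, and the chosen values (extent 8, tile 128, coordinate 1) are an assumed case in which the tile offset exceeds the problem extent.

```cpp
#include <cassert>

// Hypothetical stand-ins for the residue expression; not CUTLASS/CuTe API.

// Coordinate left unsigned, as it would be if taken from blockIdx without a cast.
// `tile * coord` is evaluated as unsigned, and so is the subtraction.
constexpr unsigned residue_unsigned(int extent, int tile, unsigned coord) {
    return extent - tile * coord;
}

// Coordinate cast to int first, mirroring the int() cast passed to make_coord.
constexpr int residue_signed(int extent, int tile, unsigned coord) {
    return extent - tile * int(coord);
}

// Assumed example: extent 8, tile 128, coordinate 1.
inline void check_residue_example() {
    // Signed path: 8 - 128 * 1 is negative, so "row < residue" fails for row 0.
    assert(residue_signed(8, 128, 1) == -120);
    assert(!(0 < residue_signed(8, 128, 1)));

    // Unsigned path: the same expression wraps to a huge positive value,
    // so "row < residue" passes for row 0 even though it is out of bounds.
    assert(residue_unsigned(8, 128, 1) == 0u - 120u);
    assert(0u < residue_unsigned(8, 128, 1));
}
```

The only difference between the two helpers is the `int(coord)` conversion, which is the role the existing cast plays in the kernel.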

Fix

The int() cast in make_coord was already present in sm70_gemm.hpp and sm70_gemm_array.hpp, but without explanation. This PR adds a comment to clarify why the cast is necessary, so users writing custom kernels based on these files do not accidentally omit it.

Reported in #3190.

@Flink-ddd Flink-ddd force-pushed the fix/cute-signed-coord-comment branch from f3e2f05 to e6aeff4 Compare May 2, 2026 15:31