Commit 858a8c4

TimDettmers and claude committed
fix: Check _LATEST_CAPABILITY for SM_120a GEMM kernel detection
When only "120" is specified as the compute capability, POP_BACK removes it from COMPUTE_CAPABILITY (since it is the latest entry), so the SM_120 check finds nothing. Also check _LATEST_CAPABILITY.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 45bdcce commit 858a8c4

1 file changed (+4, -0 lines changed)


CMakeLists.txt

Lines changed: 4 additions & 0 deletions
@@ -229,12 +229,16 @@ if(BUILD_CUDA)
 
     # SM_120a NVFP4 GEMM kernel: requires compute_120a for block-scaled MMA
     # Only include if 120 or 121 is in the target architectures
+    # Check both COMPUTE_CAPABILITY (may have been popped) and _LATEST_CAPABILITY
     set(_HAS_SM120 FALSE)
     foreach(_cap IN LISTS COMPUTE_CAPABILITY)
         if(_cap MATCHES "^12[01]$")
             set(_HAS_SM120 TRUE)
         endif()
     endforeach()
+    if(_LATEST_CAPABILITY MATCHES "^12[01]$")
+        set(_HAS_SM120 TRUE)
+    endif()
     if(_HAS_SM120)
         set(SM120A_FILE csrc/kernels_nvfp4_sm120.cu)
         list(APPEND SRC_FILES ${SM120A_FILE})
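
The failure mode described in the commit message can be reproduced outside the project with a small script. This is only a sketch: the `list(POP_BACK ...)` line is an assumption about how the surrounding CMakeLists.txt fills `_LATEST_CAPABILITY` (that code is not part of this diff), and the `_HAS_SM120_OLD` / `_HAS_SM120_NEW` names and the file name `repro.cmake` are hypothetical, chosen to compare the two checks side by side.

```cmake
# Hypothetical standalone reproduction; save as repro.cmake and run: cmake -P repro.cmake
cmake_minimum_required(VERSION 3.15)  # list(POP_BACK) requires CMake >= 3.15

# Assumption: the user requested a single architecture.
set(COMPUTE_CAPABILITY 120)

# Assumption: the build script moves the newest entry into _LATEST_CAPABILITY,
# leaving COMPUTE_CAPABILITY empty when only one value was given.
list(POP_BACK COMPUTE_CAPABILITY _LATEST_CAPABILITY)

# Check as it was before this commit: scans only the remaining list.
set(_HAS_SM120_OLD FALSE)
foreach(_cap IN LISTS COMPUTE_CAPABILITY)
    if(_cap MATCHES "^12[01]$")
        set(_HAS_SM120_OLD TRUE)
    endif()
endforeach()

# Check as it is after this commit: also inspects the popped value.
set(_HAS_SM120_NEW ${_HAS_SM120_OLD})
if(_LATEST_CAPABILITY MATCHES "^12[01]$")
    set(_HAS_SM120_NEW TRUE)
endif()

message(STATUS "list-only check: ${_HAS_SM120_OLD}")
message(STATUS "with _LATEST_CAPABILITY: ${_HAS_SM120_NEW}")
```

Under these assumptions the list is empty after the pop, so the `foreach` body never runs and the list-only result stays FALSE, while the extra `_LATEST_CAPABILITY` test sets the flag. With two or more capabilities where 120 or 121 is not the last entry, both checks would agree.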
