[CUDA] Add sm_121/Blackwell to known target #24523
Open
Charlie-Tsai1123 wants to merge 3 commits into
Add an initial NVIDIA GB10 / sm_121 CUDA target description. The CUDA execution limits are based on local cudaDeviceProp results from an sm_121 device. Existing NVIDIA MMA ops are reused as a conservative baseline until Blackwell-specific MMA intrinsics are modeled. Signed-off-by: Charlie-Tsai1123 <charlie1123tsai@gmail.com>
80db37a to
5b5a04a
Compare
AGindinson
reviewed
May 21, 2026
Contributor
A LIT test would be nice to have, same as in PR #24525, once there's an alignment on the conflicts / order of merging these PRs.
Signed-off-by: Charlie-Tsai1123 <charlie1123tsai@gmail.com>
Signed-off-by: Charlie-Tsai1123 <charlie1123tsai@gmail.com>
Summary
Add initial CUDA known target support for sm_121 / Blackwell NVIDIA GB10.
The CUDA execution limits are based on local cudaDeviceProp results from an sm_121 device. Existing NVIDIA MMA ops are reused as a conservative baseline until Blackwell-specific MMA intrinsics are modeled.
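For readers unfamiliar with the naming, here is a minimal sketch of how the two strings used in this PR are formed: the target name sm_121 and the PTX feature string +ptx88 mentioned under Testing. The helpers `sm_name` and `ptx_feature` are hypothetical and exist only for illustration. They are not IREE or LLVM APIs, and they ignore suffixed variants such as architecture-specific targets.

```python
# Hypothetical helpers, not IREE or LLVM APIs: they only illustrate the
# naming convention behind strings such as "sm_121" and "+ptx88".


def sm_name(cc_major: int, cc_minor: int) -> str:
    """Build a target name from a CUDA compute capability, e.g. (12, 1)."""
    return f"sm_{cc_major}{cc_minor}"


def ptx_feature(ptx_major: int, ptx_minor: int) -> str:
    """Build a target-feature string from a PTX ISA version, e.g. (8, 8)."""
    return f"+ptx{ptx_major}{ptx_minor}"


# Compute capability 12.1 corresponds to sm_121; the `major` and `minor`
# fields of cudaDeviceProp carry these two numbers on a real device.
target = sm_name(12, 1)
# PTX 8.8 is the version this PR reports as required for sm_121.
feature = ptx_feature(8, 8)
print(target, feature)
```

The point of the sketch is only that the target name is derived from the compute capability and the feature string from the PTX version, so the two have to be chosen consistently.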
Related to #24477.
#24477 reports that IREE does not currently recognize newer Blackwell CUDA targets such as sm_120. This PR addresses the same target-enablement path for sm_121, which is the Blackwell target I can validate locally on NVIDIA GB10.
It intentionally does not add sm_120 support because I do not have sm_120 hardware to confirm the device limits or runtime behavior.
Testing
Tested locally on NVIDIA GB10 / sm_121.
sm_121 requires PTX 8.8. Using +ptx88 compiles successfully.
Compiled and ran a local abs.mlir smoke test:
Results:
Compiled and ran a local matmul.mlir smoke test:
Result: 128x128xf32 values are 256 as expected.
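The expected value for a constant-filled matmul can be checked by hand: if every element of the left operand is a and every element of the right operand is b, each output element equals K * a * b, where K is the inner dimension. The PR text does not show the contents of matmul.mlir, so the inner dimension and fill values below are assumptions (all-ones operands with K = 256) chosen only because they reproduce the reported 128x128 result of 256. This is a host-side reference sketch in plain Python, not the test that was run.

```python
# Host-side reference for a constant-filled matmul smoke test.
# Assumption: all-ones operands with inner dimension K = 256; the actual
# inputs of matmul.mlir are not shown in the PR.


def matmul(lhs, rhs):
    """Naive matrix multiply on lists of rows."""
    rhs_columns = list(zip(*rhs))  # transpose once so columns are cheap to read
    return [
        [sum(x * y for x, y in zip(row, col)) for col in rhs_columns]
        for row in lhs
    ]


M, N = 128, 128  # output shape reported in the PR
K = 256          # assumed inner dimension

lhs = [[1.0] * K for _ in range(M)]
rhs = [[1.0] * N for _ in range(K)]
out = matmul(lhs, rhs)

# Every element should equal K * 1.0 * 1.0.
all_match = all(value == float(K) for row in out for value in row)
print(len(out), len(out[0]), all_match)
```

Other input choices give the same result, for example operands filled with 1.0 and 2.0 and K = 128, so the reported value alone does not pin down the test inputs.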