Commit 9950ef9

ggml-cuda : fix CDNA2 compute capability constant for gfx90a (MI210)
GGML_CUDA_CC_CDNA2 was set to 0x910, which does not match gfx90a (CDNA2/MI210). As a result, CDNA2 GPUs were misidentified and CDNA2-specific code paths, such as MFMA acc register renaming, were skipped. Fix by setting the constant to 0x90a to match the actual gfx90a ISA.
1 parent c08d28d commit 9950ef9

1 file changed, +1 -1 lines changed
ggml/src/ggml-cuda/common.cuh

Lines changed: 1 addition & 1 deletion
@@ -65,7 +65,7 @@
 #define GGML_CUDA_CC_VEGA (GGML_CUDA_CC_OFFSET_AMD + 0x900) // Vega56/64, minimum for fp16 dual issue
 #define GGML_CUDA_CC_VEGA20 (GGML_CUDA_CC_OFFSET_AMD + 0x906) // MI50/Radeon VII, minimum for dp4a
 #define GGML_CUDA_CC_CDNA1 (GGML_CUDA_CC_OFFSET_AMD + 0x908) // MI100, minimum for MFMA, acc registers
-#define GGML_CUDA_CC_CDNA2 (GGML_CUDA_CC_OFFSET_AMD + 0x910) // MI210, minimum acc register renameing
+#define GGML_CUDA_CC_CDNA2 (GGML_CUDA_CC_OFFSET_AMD + 0x90a) // MI210 (gfx90a), minimum acc register renaming
 #define GGML_CUDA_CC_CDNA3 (GGML_CUDA_CC_OFFSET_AMD + 0x942) // MI300
 
 // RDNA removes MFMA, dp4a, xnack, acc registers, wave size is 32
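
To illustrate why the threshold value matters, here is a minimal, self-contained sketch of a range check against the old and new constants. It rests on assumptions that are not part of this commit: `cc_from_gfx_name` and `is_cdna2` are hypothetical helpers, the offset value is a placeholder, and the sketch assumes the compute capability is obtained by reading the digits after `gfx` as hexadecimal. The real detection code in ggml may differ.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Placeholder offset for illustration only (assumption); the real value is defined in common.cuh.
constexpr int CC_OFFSET_AMD = 0x1000000;
constexpr int CC_CDNA2_OLD  = CC_OFFSET_AMD + 0x910; // value before this commit
constexpr int CC_CDNA2_NEW  = CC_OFFSET_AMD + 0x90a; // value after this commit
constexpr int CC_CDNA3      = CC_OFFSET_AMD + 0x942;

// Hypothetical helper: interpret the suffix of a "gfxNNN" name as a hex number.
// Returns -1 if the name does not start with "gfx".
int cc_from_gfx_name(const char * name) {
    if (std::strncmp(name, "gfx", 3) != 0) {
        return -1;
    }
    return CC_OFFSET_AMD + (int) std::strtol(name + 3, nullptr, 16);
}

// Hypothetical range check: a device counts as CDNA2 if its compute capability
// is at or above the CDNA2 threshold and below the CDNA3 threshold.
bool is_cdna2(int cc, int cdna2_threshold) {
    return cc >= cdna2_threshold && cc < CC_CDNA3;
}
```

Under these assumptions, `is_cdna2(cc_from_gfx_name("gfx90a"), CC_CDNA2_OLD)` is false because 0x90a is below 0x910, while the same call with `CC_CDNA2_NEW` is true. An MI100 (`gfx908`) stays below the threshold in both cases.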