When called with CblasNonUnit on ARM, cblas_strmm appears to leave its output matrix B unchanged. This can be reproduced with the following C program:
#include <cblas.h>
#include <stdint.h>  /* uint8_t */
#include <stdio.h>

/* Print a row-major rows x cols matrix, one row per line. */
void print_matrix(float* mat, uint8_t rows, uint8_t cols) {
    for (uint8_t row = 0; row < rows; row++) {
        for (uint8_t col = 0; col < cols; col++) {
            printf("%.0f ", mat[cols * row + col]);
        }
        printf("\n");
    }
}

int main(void) {
    /* 4x4 triangular matrix; with CblasUpper only the upper triangle is read. */
    float A[] = {
        1403, 1215, 883 , 235 ,
        434 , 1903, 624 , 10 ,
        740 , 305 , 1196, 326 ,
        718 , 133 , 878 , 1633,
    };
    /* 4x3 input matrix, overwritten in place with the result. */
    float B[] = {
        146 , 1851, 942,
        940 , 1327, 503,
        852 , 696 , 364,
        1568, 1701, 130,
    };
    print_matrix(A, 4, 4);
    printf("\n");
    print_matrix(B, 4, 3);
    printf("\n");
    /* B := 1.0 * A^T * B, A upper triangular with a non-unit diagonal. */
    cblas_strmm(CblasRowMajor, CblasLeft, CblasUpper, CblasTrans, CblasNonUnit, 4, 3, 1.0, A, 4, B, 3);
    print_matrix(B, 4, 3);
    return 0;
}
(Edit: the inner loop of print_matrix previously used col < rows by mistake.)
With OpenBLAS 0.3.30, the program outputs the following; the last matrix is A^T * B, as expected:
1403 1215 883 235
434 1903 624 10
740 305 1196 326
718 133 878 1633

146 1851 942
940 1327 503
852 696 364
1568 1701 130

204838 2596953 1321626
1966210 4774246 2101739
1734470 3294897 1581002
2882006 3452884 557354
With OpenBLAS 0.3.31 (or commit 02dc625), the program outputs the following; the last matrix is identical to the input B:
1403 1215 883 235
434 1903 624 10
740 305 1196 326
718 133 878 1633

146 1851 942
940 1327 503
852 696 364
1568 1701 130

146 1851 942
940 1327 503
852 696 364
1568 1701 130
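To double-check which of the two outputs is correct, here is a minimal naive reference for this one parameter combination (row-major, Left, Upper, Trans, NonUnit), i.e. B := alpha * A^T * B reading only the upper triangle of A. The function name ref_strmm_left_upper_trans is hypothetical and this is only a sketch for cross-checking, not how OpenBLAS implements strmm.

```c
#include <assert.h>

/* Hypothetical naive reference, NOT an OpenBLAS function.
 * Computes B := alpha * A^T * B in place, where A is m x m, row-major,
 * upper triangular with a non-unit diagonal (strictly lower part ignored),
 * and B is m x n, row-major. */
void ref_strmm_left_upper_trans(int m, int n, float alpha,
                                const float *A, int lda,
                                float *B, int ldb) {
    assert(lda >= m && ldb >= n);
    /* A^T is lower triangular: (A^T)[i][k] = A[k][i], nonzero only for k <= i.
     * Row i of the result depends only on rows 0..i of B, so going from the
     * bottom row up keeps those rows unmodified while row i is computed. */
    for (int i = m - 1; i >= 0; i--) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k <= i; k++) {
                sum += A[k * lda + i] * B[k * ldb + j];
            }
            B[i * ldb + j] = alpha * sum;
        }
    }
}
```

Called with the A and B from the reproducer, m = 4, n = 3, alpha = 1, lda = 4 and ldb = 3, this reproduces the 0.3.30 output exactly (all intermediate sums are integers below 2^24, so float arithmetic is exact here). For example, the first row becomes 1403 * [146, 1851, 942] = [204838, 2596953, 1321626].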
I bisected the regression to #5450, so I suspect the SME1 implementation is to blame.
This issue came up because of a test failure in https://github.com/linbox-team/linbox/blob/master/tests/test-blas-domain.C. This was tested on an M1 Pro MacBook Pro running NixOS 26.05.