Merged
342 commits
0b3c569
Merge pull request #5135 from martin-frbg/ghwf-n2
martin-frbg Feb 16, 2025
86cf9d8
Merge pull request #5133 from OpenMathLib/revert-4920-issue4917
martin-frbg Feb 16, 2025
b9ae246
define USE_TRMM for RISCV64 targets as well
martin-frbg Feb 16, 2025
ed15846
Merge pull request #5137 from martin-frbg/issue5136
martin-frbg Feb 17, 2025
ebcab90
Handle flang-new runtime library linking on Linux like classic-flang
martin-frbg Feb 17, 2025
abbd78a
Merge pull request #5138 from martin-frbg/issue5131
martin-frbg Feb 18, 2025
eb84aac
Merge pull request #5084 from quic/topic/sgemm_direct_sme1
martin-frbg Feb 19, 2025
6d1444b
Add ARM64 options for NVIDIA HPC
martin-frbg Feb 19, 2025
f1fa370
fix missing endif
martin-frbg Feb 19, 2025
ceb8f1e
Merge pull request #5140 from martin-frbg/issue5139
martin-frbg Feb 19, 2025
b723c1b
Add thread throttling profile for SGEMM on `NEOVERSEV2`
michalowski-arm Feb 20, 2025
650a062
Add thread throttling profile for SGEMV on `NEOVERSEV2`
michalowski-arm Feb 20, 2025
75b958a
Transform the B array back if necessary before returning
martin-frbg Feb 20, 2025
20d1118
Merge pull request #5143 from martin-frbg/issue5111
martin-frbg Feb 21, 2025
f0bea79
dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting
taoye9 Feb 21, 2025
77fba0f
Fix "dummy2" flag handling
martin-frbg Feb 22, 2025
643966d
Merge pull request #5146 from martin-frbg/issue5123
martin-frbg Feb 22, 2025
c03a81b
Merge pull request #5141 from michalowski-arm/fork-throttle
martin-frbg Feb 23, 2025
1533fe4
Merge pull request #5144 from taoye9/dispatch_neoversve2_to_neoversven2
martin-frbg Feb 24, 2025
030ae1f
Redefined threading logic for WoA
Harishmcw Feb 25, 2025
09ba099
make throttling code conditional on SMP
martin-frbg Feb 25, 2025
ef9e3f7
Merge pull request #5149 from martin-frbg/fixup5077-5088
martin-frbg Feb 25, 2025
edaf51d
Add sbgemv_t_bfdot kernel for ARM64
annop-w Feb 26, 2025
35bdbca
Add sbgemv_n_neon kernel for arm64.
taoye9 Feb 27, 2025
4346b91
add beta and alpha testcase for sbgemv
taoye9 Feb 28, 2025
c797e27
Merge pull request #5159 from annop-w/sbgemv_t_bfdot
martin-frbg Mar 2, 2025
2b941c4
Merge branch 'develop' into sbgemv_n_neon
martin-frbg Mar 2, 2025
35914aa
Expose the option to build without LAPACKE to ccmake
martin-frbg Mar 2, 2025
e4630ed
Merge pull request #5160 from taoye9/sbgemv_n_neon
martin-frbg Mar 2, 2025
217324d
Merge pull request #5162 from taoye9/add_sbgemv_tests
martin-frbg Mar 3, 2025
38ee7c9
Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2
taoye9 Mar 3, 2025
6b8b35c
fix minior issues of redeclaration of float x0,x1 in sbgemv_n_neon.c
taoye9 Mar 3, 2025
8b98db1
Merge pull request #5167 from taoye9/fix_sbgemv_n_kernel_typo
martin-frbg Mar 3, 2025
5f200dc
Merge pull request #5166 from martin-frbg/issue5158
martin-frbg Mar 3, 2025
7338a47
Merge pull request #5150 from Harishmcw/WoA-Experiments
martin-frbg Mar 3, 2025
1d5ed5c
Merge pull request #5168 from taoye9/add_sbgemvn_on_neonversen2
martin-frbg Mar 4, 2025
39eb43d
Improve thread safety of pthreads builds that rely on C11 atomic oper…
martin-frbg Mar 7, 2025
5c4e38a
Optimize gemv_n_sve kernel
manaalmj Feb 27, 2025
80d3c2a
Add Improving Load Imbalance in Thread-Parallel GEMM
nakagawa-fj Mar 11, 2025
a085b6c
Fix aarch64 sbgemv_t compilation error for GCC < 13
annop-w Mar 12, 2025
4c00099
replace customize bf16_to_fp32 with arm neon vcvtah_f32_bf16
taoye9 Mar 12, 2025
8865850
Merge pull request #5176 from annop-w/fix_sbgemv_t
martin-frbg Mar 12, 2025
a3e7b16
Merge pull request #5157 from manaalmj/feature
martin-frbg Mar 12, 2025
37b8547
Merge pull request #5173 from nakagawa-fj/gemm_load_imbalance
martin-frbg Mar 12, 2025
b34235c
Fix inclusion of deprecated interfaces and cgesvdq/strsyl3
martin-frbg Mar 12, 2025
8a418b1
Add dummy implementations for the LAPACK_COMPLEX_CUSTOM case
martin-frbg Mar 12, 2025
1ba0265
Merge pull request #5177 from martin-frbg/cmakelapacke
martin-frbg Mar 13, 2025
9807f56
Optimize aarch64 sgemm_ncopy
annop-w Mar 12, 2025
66e0f1e
Merge pull request #5178 from martin-frbg/lapack_cplx_dummy
martin-frbg Mar 13, 2025
2f77855
Merge pull request #5181 from taoye9/change_sbgemn_cast_bf16
martin-frbg Mar 13, 2025
b55ca71
Merge pull request #5182 from annop-w/sgemm_ncopy
martin-frbg Mar 13, 2025
edef2e4
Fix bug in ARM64 sbgemv_t
annop-w Mar 13, 2025
e9fbe0a
Merge pull request #5183 from annop-w/fix_sbgemv_t
martin-frbg Mar 13, 2025
f27ba5e
fix bugs in aarch64 sbgemv_n kernel
taoye9 Mar 14, 2025
51c244a
Merge pull request #5184 from taoye9/fix_sbgemv_n_bug
martin-frbg Mar 15, 2025
b6cb5ec
Add thread throttling profile for DGEMV on NEOVERSEV1
shubham-fujitsu Feb 28, 2025
189dbbc
Add thread throttling for dynamic arch neoversev1
shubham-fujitsu Mar 4, 2025
8e289ec
Simplified thread throttling function in gemv
shubham-fujitsu Mar 18, 2025
c0a5c96
Fix missing commas in gensymbol.pl
Harishmcw Mar 24, 2025
4e3afa7
Merge pull request #5175 from shubhamsvc/dgemv_thread_throttling
martin-frbg Mar 25, 2025
2007710
Merge pull request #5190 from Harishmcw/develop
martin-frbg Mar 25, 2025
c2e7ab5
DLL symbol pre/postfixing in CMake builds
Harishmcw Mar 26, 2025
1724b3f
DLL symbol pre/postfixing in CMake builds
Harishmcw Mar 26, 2025
72f0abe
Merge pull request #5191 from Harishmcw/CMake_Symbol_Fix
martin-frbg Mar 26, 2025
3ca1ba1
resynchronize with the posix shell version
martin-frbg Mar 26, 2025
51c1fb1
Fix ?spmv build and misinterpretation of NO_LAPACK=0
martin-frbg Mar 26, 2025
8b35534
Merge pull request #5195 from martin-frbg/update-gensymbolpl
martin-frbg Mar 26, 2025
02fd1df
CMake: Pass `OpenMP` compiler and linker flags through CMake targets
ywwry66 Mar 13, 2025
1b0c0f0
CMake: Avoid mixed OpenMP linkage
ywwry66 Mar 13, 2025
251c3f8
gh m1: fix mixed linkage when built with OpenMP and clang+gfortran
ywwry66 Mar 27, 2025
f33943d
Merge pull request #5196 from martin-frbg/issue5193
martin-frbg Mar 27, 2025
ea6515c
On zarch don't produce objects from assembler with a writable stack s…
e4t Mar 26, 2025
61b9339
getarch/cpuid.S: Fix warning about executable stack
e4t Mar 28, 2025
3fc15ad
Fix pdb file creation in debug dll builds with CMake on Windows/WoA
martin-frbg Mar 30, 2025
04915be
Add vector registers to clobber list to prevent compiler optimization.
vaiskv Apr 3, 2025
f90eff3
Merge pull request #5197 from e4t/z-arch-exec-stack
martin-frbg Apr 3, 2025
0aa5ef2
Repeat the libs target's "ln" in the all target to ensure completeness
martin-frbg Apr 3, 2025
7bf8484
Update zsum.c -- fixed spelling error to successfully compile
ColumbusAI Apr 5, 2025
f0008f5
Merge pull request #5206 from ColumbusAI/develop
martin-frbg Apr 5, 2025
1ed962d
Fix compilation with xcode16.3/clang17/gcc14
martin-frbg Apr 6, 2025
67c5bdd
Azure CI: Update flang call in OSX_LLVM_flangnew job (#5208)
martin-frbg Apr 7, 2025
1c5d0d5
move libomp to extralib
martin-frbg Apr 8, 2025
fc8090b
Move additional omp dependency to EXTRALIB
martin-frbg Apr 8, 2025
1ff303f
Optimizing the Implementation of GEMV on the RISC-V V Extension
lglglglgy Apr 8, 2025
94fb703
Fix incomplete error message (Reference-LAPACK PR 1119)
martin-frbg Apr 8, 2025
f0f2747
Merge pull request #5207 from martin-frbg/issue5202
martin-frbg Apr 8, 2025
70865a8
Merge pull request #5180 from ywwry66/openmp_use_cmake
martin-frbg Apr 8, 2025
880e43e
Merge pull request #5198 from martin-frbg/woadlldebug
martin-frbg Apr 8, 2025
4270d5b
Merge pull request #5204 from martin-frbg/issue4692
martin-frbg Apr 9, 2025
1b3e7cc
Merge pull request #5212 from martin-frbg/lapack1119
martin-frbg Apr 9, 2025
a34b487
Remove spurious cast from Alpha and Cell's DEFAULT_ALIGN
martin-frbg Apr 9, 2025
de2380e
Merge pull request #5214 from martin-frbg/issue5200
martin-frbg Apr 9, 2025
ec14615
Use SVE kernel for S/DGEMVT for SVE machines
annop-w Apr 2, 2025
51ba70f
test_potrs.c: remove pragma darwin-aarch64 support
haampie Apr 10, 2025
3d6d026
no-gcse when loongarch64
haampie Apr 10, 2025
ed1e470
Merge pull request #5217 from haampie/hs/fix/darwin-gcc
martin-frbg Apr 10, 2025
2893d0a
Merge pull request #5211 from guoyuanplct/develop
martin-frbg Apr 10, 2025
b30dc97
Merge pull request #5215 from annop-w/gemv_t
martin-frbg Apr 10, 2025
fd3afef
lapacke_mangling.h is no longer generated, so don't delete on make clean
martin-frbg Apr 10, 2025
211dfd0
disable the CooperLake microkernel as it produces wrong results
martin-frbg Apr 10, 2025
39718cd
Merge pull request #5218 from martin-frbg/lapacke_mangling
martin-frbg Apr 11, 2025
f1e628b
Further performance improvements to [SD]GEMV.
iha-taisei Apr 11, 2025
d711906
Add symv kernels for arm64
Apr 11, 2025
7b66330
hw.perflevel[01].cpusperl changed to hw.perflevel[01].cpusperl2
Apr 16, 2025
d1c2528
Add L1_DATA_LINESIZE for ifdef __APPLE__
Apr 16, 2025
acef78c
Reset buffer length before every call to sysctlbyname.
Apr 16, 2025
d9369bd
Update and amend parameters for Neoverse cpus
martin-frbg Apr 16, 2025
d535728
Improve performance for SGEMVN on NEONVERSEN1
annop-w Apr 9, 2025
afb6645
Merge pull request #5221 from tetsuzo-usui/tune_symv_for_arm64
martin-frbg Apr 16, 2025
0241d51
Merge pull request #5220 from iha-taisei/sdgemv_n_unroll
martin-frbg Apr 16, 2025
3a088de
Merge pull request #5228 from martin-frbg/cmakecrossarm
martin-frbg Apr 17, 2025
dd38b4e
Merge pull request #5225 from annop-w/gemv_n
martin-frbg Apr 17, 2025
1f687b2
Bump xuantie qemu for c910v
RevySR Apr 20, 2025
0cc2485
Explicit unaligned vector load/stores in PPC64LE GEMV kernels
quickwritereader Apr 20, 2025
afc1dc6
Merge pull request #5234 from RevySR/bump-xuantie-qemu
martin-frbg Apr 20, 2025
2e43093
Merge pull request #5219 from martin-frbg/sbgemvn_cooper
martin-frbg Apr 20, 2025
d659f3c
Fix "Argument list too long" compilation error for Intel macOS
ywwry66 Apr 16, 2025
94fceae
Merge pull request #5233 from ywwry66/apple_workaround
martin-frbg Apr 20, 2025
9aa7a0b
Follow-up to d659f3c
ywwry66 Apr 21, 2025
050c3b2
Merge pull request #5236 from ywwry66/apple_workaround
martin-frbg Apr 21, 2025
f5bc97c
Merge pull request #5227 from zanpeeters/develop
martin-frbg Apr 21, 2025
96d8080
Reinstate the CooperLake microkernel
martin-frbg Apr 21, 2025
99d9f1f
Fix conditional
martin-frbg Apr 21, 2025
1df8738
Merge pull request #5235 from quickwritereader/issue_unaligned_ppc64le
martin-frbg Apr 21, 2025
4ec62d7
remove non-vectorized code path for power8, restoring PR4880
martin-frbg Apr 21, 2025
7389b6c
Merge pull request #5237 from martin-frbg/revert5219
martin-frbg Apr 22, 2025
db0abfa
Merge pull request #5238 from martin-frbg/revert5125
martin-frbg Apr 22, 2025
e11744a
Use SVE kernel for S/DGEMVN for SVE machines
annop-w Apr 22, 2025
08b5c18
fixed a potential out-of-bounds on gemv.
iha-taisei Apr 22, 2025
ddfefd9
Merge pull request #5240 from iha-taisei/fixedIssue5231
martin-frbg Apr 22, 2025
d0e8fd6
Merge pull request #5239 from annop-w/gemv_n_sve
martin-frbg Apr 22, 2025
9c02cdb
optimise dot using thread throttling for NEOVERSE V1
abhishek-iitmadras Mar 23, 2025
0c239c9
update contribution list
abhishek-iitmadras Apr 22, 2025
70dff3b
Merge pull request #5242 from abhishek-iitmadras/abhishekk_dot
martin-frbg Apr 23, 2025
e1bd631
allow the use of LAPACK_COMPLEX_CPP when using MSVC compiler
chitao1234 Apr 24, 2025
7616c42
Optimized RVV_ZVL256B Implementation of zgemv_n
guoyuanplct Apr 24, 2025
11ffc86
Format the code
guoyuanplct Apr 24, 2025
0cd5ca5
Loongarch64: fixed amax_lasx
ErnstPeng Apr 30, 2025
be52552
Loongarch64: fixed asum_lasx
ErnstPeng Apr 30, 2025
74c97ef
Loongarch64: fixed cdot_lasx
ErnstPeng Apr 30, 2025
d49319c
Loongarch64: fixed cnrm2_lasx
ErnstPeng Apr 30, 2025
a98dd6d
Loongarch64: fixed copy_lasx
ErnstPeng Apr 30, 2025
dc5fa29
Loongarch64: fixed cscal_lasx
ErnstPeng Apr 30, 2025
ba9569e
Loongarch64: fixed dot_lasx
ErnstPeng Apr 30, 2025
b528b1b
Loongarch64: fixed iamax_lasx
ErnstPeng Apr 30, 2025
6dc4ca2
Loongarch64: fixed icamax_lasx
ErnstPeng Apr 30, 2025
57bb46b
Loongarch64: fixed rot_lasx
ErnstPeng Apr 30, 2025
b471fa3
Loongarch64: fixed snrm2_lasx
ErnstPeng Apr 30, 2025
f19e72c
Loongarch64: fixed swap_lasx
ErnstPeng Apr 30, 2025
4bee135
cpuid_x86: improve Intel Arrow Lake detection
scottt Apr 30, 2025
52367ea
Merge pull request #5248 from ErnstPeng/fix-lasx
martin-frbg May 1, 2025
47b4305
Avoid out of bounds accesses in SCAL when INFO<0
martin-frbg May 1, 2025
d48a2fc
Avoid out of bounds accesses in SCAL when INFO<0
martin-frbg May 1, 2025
4c0445a
Avoid out of bounds accesses in SCAL when INFO <0
martin-frbg May 1, 2025
5c958df
Avoid of out of bounds accesses in SCAL when INFO<0
martin-frbg May 1, 2025
cba32d0
Merge pull request #5245 from guoyuanplct/develop
martin-frbg May 1, 2025
0ea9205
Merge pull request #5249 from scottt/fix-build-on-intel-arrow-lake
martin-frbg May 1, 2025
3e961c2
Merge pull request #5251 from martin-frbg/issue5250
martin-frbg May 5, 2025
3c878f3
Cirrus CI: Update xcode version in the Apple crossbuilds (#5254)
martin-frbg May 9, 2025
151b742
Merge pull request #5203 from quic/fix-sgemmdirect-sme1
martin-frbg May 9, 2025
ebbe682
Fix function prototypes
martin-frbg May 10, 2025
0d69a29
Fix empty prototypes of select/selctg
martin-frbg May 10, 2025
2320e0b
Merge pull request #5244 from chitao1234/develop
martin-frbg May 10, 2025
5141a90
Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS …
martin-frbg May 10, 2025
cf9e34c
Merge pull request #5258 from martin-frbg/issue5255
martin-frbg May 11, 2025
7321444
enable sbgemm to be forward to sbgemv on arm64
taoye9 May 12, 2025
0ccb050
Loongarch64: fixed cgemm_ncopy_16_lasx
ErnstPeng May 13, 2025
a978ad3
Loongarch64: add C functions of zgemm_ncopy_16
ErnstPeng May 13, 2025
5366902
Merge pull request #5261 from ErnstPeng/fix-lasx
martin-frbg May 13, 2025
9a7e3f1
kernel/riscv64:Fixed the bug of openblas_utest_ext failing in c/zgemv…
guoyuanplct May 13, 2025
8afddc1
Merge pull request #5262 from guoyuanplct/develop
martin-frbg May 14, 2025
4d21365
kernel/riscv64:Added support for omatcopy on riscv64.
guoyuanplct May 15, 2025
be9f755
Format Code
guoyuanplct May 15, 2025
7732a55
Add retry mechanism after deadlock timeout for c910v.
guoyuanplct May 16, 2025
0b0bb99
Merge pull request #5265 from guoyuanplct/develop
martin-frbg May 17, 2025
6680e05
Fix conditional inclusion of SGEMM_KERNEL_DIRECT
martin-frbg May 17, 2025
5a322f2
Merge pull request #5268 from martin-frbg/fix-dyn-sgemmdirect
martin-frbg May 17, 2025
b5456c1
Merge pull request #5260 from taoye9/enable_bf16_gemm_gemv_forward_on…
martin-frbg May 18, 2025
f2022c2
Remove sve capability from NeoverseN1 and specify CortexX2/A?10 as ar…
martin-frbg May 19, 2025
3473118
Merge pull request #5272 from martin-frbg/issue5271
martin-frbg May 19, 2025
8779eac
Do not add a 64 suffix to the library name if the user-provided suffi…
martin-frbg May 19, 2025
846a543
Merge pull request #5273 from martin-frbg/issue5259
martin-frbg May 19, 2025
4ca76d9
Expressly provide a shared libs option
martin-frbg May 19, 2025
a5f701c
Merge pull request #5274 from martin-frbg/issue5247
martin-frbg May 20, 2025
2351a98
Update 2D thread-partitioned GEMM for M << N case.
nakagawa-fj May 21, 2025
bd573a9
Expand mingw32 gfortran workaround to all versions after 14.1
martin-frbg May 21, 2025
42b7d1f
Fix addressing of alpha in CBLAS
martin-frbg May 21, 2025
9ef5995
Merge pull request #5277 from martin-frbg/fixmingw32
martin-frbg May 21, 2025
e2e6a4d
Merge pull request #5276 from nakagawa-fj/gemm_2d_thread_partitioning
martin-frbg May 21, 2025
20f2ba0
Move declaration of i for pre-C99 compilers
martin-frbg May 21, 2025
0163143
Merge pull request #5278 from martin-frbg/fixup5276
martin-frbg May 22, 2025
669c847
support extra flag for NaN handling
martin-frbg May 23, 2025
28f8fda
support flag for NaN/Inf handling and fix scaling of NaN/Inf values
martin-frbg May 23, 2025
cf06250
add handling of dummy2 flag
martin-frbg May 24, 2025
fb8dc8f
Add dummy2 flag handling
martin-frbg May 25, 2025
45fd2d9
Optimized the axpby function.
guoyuanplct May 29, 2025
d2003dc
del lines
guoyuanplct May 29, 2025
02267d8
Merge pull request #5288 from guoyuanplct/develop
martin-frbg May 29, 2025
2ae0191
fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small
guoyuanplct Jun 5, 2025
83fcab7
Merge branch 'develop' of https://github.com/guoyuanplct/OpenBLAS int…
guoyuanplct Jun 5, 2025
5442aff
Accumulate results in output register explicitly
arnej27959 Jun 8, 2025
bbdc265
Merge pull request #5294 from arnej27959/arnej/fix-arm64-register
martin-frbg Jun 10, 2025
fe220a0
Merge pull request #5291 from guoyuanplct/develop
martin-frbg Jun 10, 2025
f18b7a4
add dummy2 flag handling for inf/nan agnostic zeroing
martin-frbg Jun 11, 2025
1cefbea
Use generic SCAL kernels to address inf/nan handling for now
martin-frbg Jun 11, 2025
e12132a
Use generic C/ZSCAL kernels to address inf/nan handling for now
martin-frbg Jun 11, 2025
f4194fc
Merge branch 'develop' into la64_fixed_cscal_zscal
martin-frbg Jun 11, 2025
2e2691b
Merge pull request #5078 from XiWeiGu/la64_fixed_cscal_zscal
martin-frbg Jun 12, 2025
11ff18b
Merge pull request #5081 from XiWeiGu/kernel_generic_fixed_cscal_zscal
martin-frbg Jun 12, 2025
a86419f
Merge pull request #5280 from martin-frbg/zscal_x86_64
martin-frbg Jun 12, 2025
1589d0b
Merge pull request #5281 from martin-frbg/zscal_arm64
martin-frbg Jun 12, 2025
1408be5
Merge pull request #5282 from martin-frbg/zscal_power
martin-frbg Jun 12, 2025
d2855d3
Merge pull request #5285 from martin-frbg/zscal_zarch
martin-frbg Jun 12, 2025
63287e1
Merge pull request #5296 from martin-frbg/zscal_riscv
martin-frbg Jun 12, 2025
7c77537
Merge pull request #5297 from martin-frbg/zscal_x86_sparc
martin-frbg Jun 12, 2025
58eeb90
fix handling of dummy2
martin-frbg Jun 12, 2025
ca1ce84
Merge pull request #5298 from martin-frbg/fixup5281
martin-frbg Jun 12, 2025
549a9f1
Disable the default SSE kernels for CSCAL/ZSCAL for now
martin-frbg Jun 12, 2025
73af02b
use dummy2 as Inf/NAN handling flag
martin-frbg Jun 12, 2025
63272b6
Merge pull request #5299 from martin-frbg/x86_64-ssezscal
martin-frbg Jun 12, 2025
6bdc7f9
Merge pull request #5300 from martin-frbg/fixup5296
martin-frbg Jun 12, 2025
b3c9056
resync with the generic arm version for inf/nan handling
martin-frbg Jun 13, 2025
cc4b04a
Merge pull request #5301 from martin-frbg/zscal_mips_2
martin-frbg Jun 13, 2025
d36093d
temporarily change default C/ZSCAL to the non-asm implementation
martin-frbg Jun 13, 2025
e338d34
fix path
martin-frbg Jun 13, 2025
dbd5643
Merge pull request #5302 from martin-frbg/zscal_mips_3
martin-frbg Jun 13, 2025
5e393f2
fix source file used for sbgemmt/sbgemmtr
martin-frbg Jun 14, 2025
0ea173e
Merge pull request #5304 from martin-frbg/fixgemmtr_if
martin-frbg Jun 15, 2025
8747449
fix dimension used in nancheck (Reference-LAPACK PR 1135)
martin-frbg Jun 15, 2025
d8a2324
fix dimension used in nancheck (Reference-LAPACK PR 1135)
martin-frbg Jun 15, 2025
2a6beac
fix dimension used in transposition (Reference-LAPACK PR 1135)
martin-frbg Jun 15, 2025
f4e5177
fix dimension used in nancheck (Reference-LAPACK PR 1135)
martin-frbg Jun 15, 2025
906b9df
fix missing initialization
martin-frbg Jun 15, 2025
1804ff5
fix missing initialization
martin-frbg Jun 15, 2025
7f3093a
Merge pull request #5305 from martin-frbg/lapack1135
martin-frbg Jun 15, 2025
bad47bd
Fix too strict leading dimensions check in LAPACKE_?gesdd_work (Refer…
martin-frbg Jun 15, 2025
3fe7f19
Update the Changelog for version 0.3.30
martin-frbg Jun 15, 2025
f1097d1
Merge pull request #5306 from martin-frbg/lapack1131
martin-frbg Jun 15, 2025
1dd3960
Fix:Problem with identifying some ARM64 processors.
nakagawa-fj Jun 16, 2025
53cd6e7
Update Changelog.txt
martin-frbg Jun 16, 2025
85337c5
Merge pull request #5310 from nakagawa-fj/bugfix/identify_cpu_part_fo…
martin-frbg Jun 16, 2025
3318a2b
override CDOT and ZDOT with the generic C kernel
martin-frbg Jun 17, 2025
e684e36
Add 32bit manylinux to match what python wheel build tests use
martin-frbg Jun 17, 2025
5ad6435
Merge pull request #5312 from martin-frbg/x86cdot
martin-frbg Jun 18, 2025
e541bf6
support AmpereOne/OneA as NeoverseN1
martin-frbg Jun 18, 2025
c2342fc
Merge pull request #5314 from martin-frbg/dynampere1
martin-frbg Jun 18, 2025
79b4dd0
fix(arm): add .note.GNU-stack to ARM assembly to prevent writable-sta…
loss-and-quick Jun 18, 2025
1546599
Merge pull request #5315 from loss-and-quick/arm-exec-stack
martin-frbg Jun 18, 2025
157273f
another round of last minute updates for 0.3.30
martin-frbg Jun 18, 2025
d339bd5
Merge pull request #5308 from martin-frbg/changelog0330
martin-frbg Jun 19, 2025
14 changes: 7 additions & 7 deletions .cirrus.yml
@@ -58,8 +58,8 @@ task:
  - export VALID_ARCHS="i386 x86_64"
  - xcrun --sdk macosx --show-sdk-path
  - xcodebuild -version
- - export CC=/Applications/Xcode_15.4.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
- - export CFLAGS="-O2 -unwindlib=none -Wno-macro-redefined -isysroot /Applications/Xcode_15.4.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.5.sdk -arch x86_64"
+ - export CC=/Applications/Xcode_16.3.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
+ - export CFLAGS="-O2 -unwindlib=none -Wno-macro-redefined -isysroot /Applications/Xcode_16.3.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.4.sdk -arch x86_64"
  - make TARGET=CORE2 DYNAMIC_ARCH=1 NUM_THREADS=32 HOSTCC=clang NOFORTRAN=1 RANLIB="ls -l"
  always:
  config_artifacts:
@@ -78,8 +78,8 @@ task:
  - export #PATH=/opt/homebrew/opt/llvm/bin:$PATH
  - export #LDFLAGS="-L/opt/homebrew/opt/llvm/lib"
  - export #CPPFLAGS="-I/opt/homebrew/opt/llvm/include"
- - export CC=/Applications/Xcode_15.4.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
- - export CFLAGS="-O2 -unwindlib=none -Wno-macro-redefined -isysroot /Applications/Xcode_15.4.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS17.5.sdk -arch arm64 -miphoneos-version-min=10.0"
+ - export CC=/Applications/Xcode_16.3.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
+ - export CFLAGS="-O2 -unwindlib=none -Wno-macro-redefined -isysroot /Applications/Xcode_16.3.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS18.4.sdk -arch arm64 -miphoneos-version-min=10.0"
  - xcrun --sdk iphoneos --show-sdk-path
  - ls -l /Applications
  - make TARGET=ARMV8 NUM_THREADS=32 HOSTCC=clang NOFORTRAN=1 CROSS=1
@@ -127,7 +127,7 @@ task:
  FreeBSD_task:
  name: FreeBSD-gcc
  freebsd_instance:
- image_family: freebsd-14-1
+ image_family: freebsd-14-2
  install_script:
  - pkg update -f && pkg upgrade -y && pkg install -y gmake gcc
  compile_script:
@@ -138,7 +138,7 @@ FreeBSD_task:
  FreeBSD_task:
  name: freebsd-gcc-ilp64
  freebsd_instance:
- image_family: freebsd-14-1
+ image_family: freebsd-14-2
  install_script:
  - pkg update -f && pkg upgrade -y && pkg install -y gmake gcc
  compile_script:
@@ -148,7 +148,7 @@ FreeBSD_task:
  FreeBSD_task:
  name: FreeBSD-clang-openmp
  freebsd_instance:
- image_family: freebsd-14-1
+ image_family: freebsd-14-2
  install_script:
  - pkg update -f && pkg upgrade -y && pkg install -y gmake gcc
  - ln -s /usr/local/lib/gcc13/libgfortran.so.5.0.0 /usr/lib/libgfortran.so
1 change: 1 addition & 0 deletions .github/workflows/apple_m.yml
@@ -102,6 +102,7 @@ jobs:
  mkdir build && cd build
  cmake -DDYNAMIC_ARCH=1 \
  -DUSE_OPENMP=${{matrix.openmp}} \
+ -DOpenMP_Fortran_LIB_NAMES=omp \
  -DINTERFACE64=${{matrix.ilp64}} \
  -DNOFORTRAN=0 \
  -DBUILD_WITHOUT_LAPACK=0 \
51 changes: 41 additions & 10 deletions .github/workflows/c910v.yml
@@ -31,27 +31,28 @@ jobs:

  steps:
  - name: Checkout repository
- uses: actions/checkout@v3
+ uses: actions/checkout@v4

  - name: install build deps
  run: |
  sudo apt-get update
  sudo apt-get install autoconf automake autotools-dev ninja-build make ccache \
- gcc-${{ matrix.apt_triple }} gfortran-${{ matrix.apt_triple }} libgomp1-riscv64-cross
+ gcc-${{ matrix.apt_triple }} gfortran-${{ matrix.apt_triple }} libgomp1-riscv64-cross libglib2.0-dev

  - name: checkout qemu
- uses: actions/checkout@v3
+ uses: actions/checkout@v4
  with:
- repository: T-head-Semi/qemu
+ repository: XUANTIE-RV/qemu
  path: qemu
- ref: 1e692ebb43d396c52352406323fc782c1ac99a42
+ ref: e0ace167effcd36d1f82c7ccb4522b3126011479 # xuantie-qemu-9.0

  - name: build qemu
  run: |
  # Force use c910v qemu-user
- wget https://github.com/revyos/qemu/commit/5164bca5a4bcde4534dc1a9aa3a7f619719874cf.patch
+ wget https://github.com/revyos/qemu/commit/222729c7455784dd855216d7a2bec4bd8f2a6800.patch
  cd qemu
- patch -p1 < ../5164bca5a4bcde4534dc1a9aa3a7f619719874cf.patch
+ patch -p1 < ../222729c7455784dd855216d7a2bec4bd8f2a6800.patch
+ export CXXFLAGS="-Wno-error"; export CFLAGS="-Wno-error"
  ./configure --prefix=$GITHUB_WORKSPACE/qemu-install --target-list=riscv64-linux-user --disable-system
  make -j$(nproc)
  make install
@@ -82,9 +83,39 @@ jobs:

  - name: test
  run: |
- export PATH=$GITHUB_WORKSPACE/qemu-install/bin/:$PATH
- qemu-riscv64 ./utest/openblas_utest
- qemu-riscv64 ./utest/openblas_utest_ext
+ run_with_retry() {
+     local cmd="$1"
+     local time_out=10
+     local retries=10
+     local attempt=0
+
+     for ((i=1; i<=retries; i++)); do
+         attempt=$((i))
+         if timeout -s 12 --preserve-status $time_out $cmd; then
+             echo "Command succeeded on attempt $i."
+             return 0
+         else
+             local exit_code=$?
+             if [ $exit_code -eq 140 ]; then
+                 echo "Attempt $i timed out (retrying...)"
+                 time_out=$((time_out + 5))
+             else
+                 echo "Attempt $i failed with exit code $exit_code. Aborting workflow."
+                 exit $exit_code
+             fi
+         fi
+     done
+     echo "All $retries attempts failed, giving up."
+     echo "Final failure was due to timeout."
+     echo "Aborting workflow."
+     exit $exit_code
+ }
+ export PATH=$GITHUB_WORKSPACE/qemu-install/bin:$PATH
+ which qemu-riscv64
+ export QEMU_BIN=$(which qemu-riscv64)
+ run_with_retry "$QEMU_BIN ./utest/openblas_utest"
+ run_with_retry "$QEMU_BIN ./utest/openblas_utest_ext"
+
  OPENBLAS_NUM_THREADS=2 qemu-riscv64 ./ctest/xscblat1
  OPENBLAS_NUM_THREADS=2 qemu-riscv64 ./ctest/xdcblat1
  OPENBLAS_NUM_THREADS=2 qemu-riscv64 ./ctest/xccblat1
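
The retry helper in the c910v test step depends on `timeout -s 12 --preserve-status` reporting a timed-out command as exit status 140, that is, 128 plus the signal number 12. The sketch below isolates that classification and the growing timeout. The helper names `classify_exit` and `next_timeout` are hypothetical and do not appear in the workflow; they only restate its logic in a form that can be run on its own.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, for illustration only; not part of c910v.yml.

# Map an exit status to the three cases the workflow distinguishes.
classify_exit() {
    local code="$1"
    if [ "$code" -eq 0 ]; then
        echo "success"
    elif [ "$code" -eq 140 ]; then
        # 128 + 12: the command was terminated by signal 12 sent by timeout
        echo "timeout"
    else
        echo "failure"
    fi
}

# The workflow starts at 10 seconds and adds 5 after each timed-out attempt.
next_timeout() {
    local current="$1"
    echo $((current + 5))
}

classify_exit 0
classify_exit 140
classify_exit 1
next_timeout 10
```

Under these rules, with `retries=10` the allowed time grows from 10 seconds on the first attempt to 55 seconds on the tenth, and any exit status other than 0 or 140 aborts immediately instead of retrying.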
4 changes: 2 additions & 2 deletions .github/workflows/codspeed-bench.yml
@@ -15,7 +15,7 @@ jobs:
  strategy:
  fail-fast: false
  matrix:
- os: [ubuntu-latest]
+ os: [ubuntu-22.04]
  fortran: [gfortran]
  build: [make]
  pyver: ["3.12"]
@@ -147,7 +147,7 @@ jobs:
  OPENBLAS_NUM_THREADS=1 pytest benchmarks/bench_blas.py -k 'gesdd'

  - name: Run benchmarks
- uses: CodSpeedHQ/action@v2
+ uses: CodSpeedHQ/action@v3
  with:
  token: ${{ secrets.CODSPEED_TOKEN }}
  run: |
24 changes: 23 additions & 1 deletion .github/workflows/dynamic_arch.yml
@@ -43,7 +43,9 @@ jobs:
  run: |
  if [ "$RUNNER_OS" == "Linux" ]; then
  sudo apt-get update
- sudo apt-get install -y gfortran cmake ccache libtinfo5
+ sudo apt-get install -y gfortran cmake ccache
+ wget http://security.ubuntu.com/ubuntu/pool/universe/n/ncurses/libtinfo5_6.3-2ubuntu0.1_amd64.deb
+ sudo apt install ./libtinfo5_6.3-2ubuntu0.1_amd64.deb
  elif [ "$RUNNER_OS" == "macOS" ]; then
  # It looks like "gfortran" isn't working correctly unless "gcc" is re-installed.
  brew reinstall gcc
@@ -354,3 +356,23 @@ jobs:
  - name: Build OpenBLAS
  run: |
  make -j$(nproc) HOSTCC="ccache gcc" CC="ccache ${{ matrix.triple }}-gcc" FC="ccache ${{ matrix.triple }}-gfortran" ARCH=${{ matrix.target }} ${{ matrix.opts }}
+
+ neoverse_build:
+ if: "github.repository == 'OpenMathLib/OpenBLAS'"
+ runs-on: ubuntu-24.04-arm
+
+ steps:
+ - name: Checkout repository
+ uses: actions/checkout@v3
+
+ - name: Install Dependencies
+ run: |
+ sudo apt-get update
+ sudo apt-get install -y gcc gfortran make
+
+ - name: Build OpenBLAS
+ run: |
+ make -j${nproc}
+ make -j${nproc} lapack-test
+
+
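
The build step of the new `neoverse_build` job writes `${nproc}`, which is parameter expansion of a shell variable named `nproc`, whereas the existing build step in the same file uses `$(nproc)`, which is command substitution. A small sketch of the difference follows. It assumes no variable named `nproc` is set in the runner environment, which this page does not confirm.

```shell
#!/usr/bin/env bash
# Illustration only; assumes no variable named nproc is set.
unset nproc

# Parameter expansion: looks up a *variable* called nproc (empty here).
param_form="make -j${nproc}"

# Command substitution would be written "make -j$(nproc)": it runs the
# nproc utility and inserts the processor count that it prints.

echo "$param_form"
```

When the expansion is empty, GNU make receives a bare `-j`, which places no limit on the number of jobs it runs in parallel.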
2 changes: 1 addition & 1 deletion .github/workflows/loongarch64_clang.yml
@@ -41,7 +41,7 @@ jobs:
  - name: Install APT deps
  run: |
  sudo apt-get update
- sudo apt-get install autoconf automake autotools-dev ninja-build make ccache
+ sudo apt-get install autoconf automake autotools-dev ninja-build make ccache libglib2.0-dev

  - name: Download and install loongarch64-toolchain
  run: |
4 changes: 2 additions & 2 deletions .github/workflows/mips64.yml
@@ -41,14 +41,14 @@ jobs:
  run: |
  sudo apt-get update
  sudo apt-get install autoconf automake autotools-dev ninja-build make ccache \
- gcc-${{ matrix.triple }} gfortran-${{ matrix.triple }} libgomp1-mips64el-cross
+ gcc-${{ matrix.triple }} gfortran-${{ matrix.triple }} libgomp1-mips64el-cross libglib2.0-dev

  - name: checkout qemu
  uses: actions/checkout@v3
  with:
  repository: qemu/qemu
  path: qemu
- ref: 79dfa177ae348bb5ab5f97c0915359b13d6186e2
+ ref: ae35f033b874c627d81d51070187fbf55f0bf1a7

  - name: build qemu
  run: |