Use clang instead of gcc to build on Linux #238

Merged
mattip merged 10 commits into MacPython:main from mayeut:use-clang on Nov 30, 2025

@mayeut

@mayeut mayeut commented Nov 22, 2025

Contributor
  • I updated the package version in pyproject.toml and made sure the first 3 numbers match git describe --tags --abbrev=8 in OpenBLAS at the OPENBLAS_COMMIT. If I did not update OPENBLAS_COMMIT, I incremented the wheel build number (i.e. 0.3.29.0.0 to 0.3.29.0.1)
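
A minimal sketch of the version check described in the checklist item above, assuming the "first 3 numbers" are the dot-separated components of the OpenBLAS tag at the start of the `git describe` output. The helper names `first_three` and `versions_match` are hypothetical and not part of this repository.

```bash
# Print the first three dot-separated numbers of a version-like string.
# Accepts both `git describe` output (v0.3.30-359-g29fab2b9) and a
# package version (0.3.29.0.1).
first_three() {
    local s="${1#v}"   # drop a leading "v" if present
    s="${s%%-*}"       # drop a "-<count>-g<hash>" suffix if present
    printf '%s\n' "$s" | cut -d. -f1-3
}

# Succeed when both strings share the same first three numbers.
versions_match() {
    [ "$(first_three "$1")" = "$(first_three "$2")" ]
}
```

Possible usage, assuming the OpenBLAS checkout lives in a directory named `OpenBLAS` and `0.3.30.0.1` stands in for the package version: `versions_match "$(git -C OpenBLAS describe --tags --abbrev=8)" 0.3.30.0.1 && echo ok`.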

Builds on top of #230

The use of clang instead of gcc allows:

  • to get a very recent compiler that supports recent SIMD extensions without having to wait for a new gcc-toolset or update the manylinux base image.
  • to get faster builds when using QEMU

The clang install script might end up being included in manylinux images (see pypa/manylinux#1871). For now, it has been copied directly from https://github.com/scikit-build/ninja-python-distributions/blob/master/scripts/install-static-clang.sh.

@mayeut

mayeut commented Nov 23, 2025

Contributor Author

I thought the fork test hang would disappear after an OpenBLAS update (it looked like the issue mentioned in #229), but there are still random deadlocks in the fork test under QEMU. Whether it's a QEMU bug, or running under QEMU just increases the chance of hitting an existing race condition, is yet to be determined.

It seems that aarch64 runners are much faster than x86_64 (for this workload) with QEMU builds going down from 1 hour to 40 minutes.

@mattip

mattip commented Nov 23, 2025

Collaborator

I thought the fork tests hang would disappear

One of the ppc64le runs succeeds, the other fails. The failed run prints

2025-11-23T08:09:19.4979912Z TEST 122/127 zgemv:2_0_nan_1_inf_1_incy_2 [OK]
2025-11-23T08:09:19.5031453Z TEST 123/127 potrf:bug_695 [OK]
2025-11-23T08:09:19.5115065Z TEST 124/127 potrf:smoketest_trivial [OK]
2025-11-23T08:09:19.7959236Z TEST 125/127 kernel_regress:skx_avx [OK]
2025-11-23T08:35:46.7868483Z ##[error]The action has timed out.

The successful run prints

2025-11-23T07:58:44.4121838Z TEST 122/127 zgemv:2_0_nan_1_inf_1_incy_2 [OK]
2025-11-23T07:58:44.4172979Z TEST 123/127 potrf:bug_695 [OK]
2025-11-23T07:58:44.4255760Z TEST 124/127 potrf:smoketest_trivial [OK]
2025-11-23T07:58:44.7111789Z TEST 125/127 kernel_regress:skx_avx [OK]
2025-11-23T07:59:10.0551932Z TEST 126/127 fork:safety [OK]
2025-11-23T07:59:10.0740217Z TEST 127/127 fork:safety_after_fork_in_parent [OK]

which suggests the problem is in fork:safety.
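
The comparison of the two logs can be sketched as a small filter that reports the last test that printed `[OK]`. It assumes each log line carries a leading timestamp followed by `TEST n/m name [OK]`, as in the excerpts above; the function name `last_ok_test` is hypothetical.

```bash
# Read a CI log on stdin and print "<n>/<m> <name>" for the last test
# that reported [OK]. Lines are expected to look like:
#   <timestamp> TEST 125/127 kernel_regress:skx_avx [OK]
last_ok_test() {
    awk '$2 == "TEST" && $NF == "[OK]" { last = $3 " " $4 }
         END { if (last != "") print last }'
}
```

Run as `last_ok_test < failed.log` (the file name is a placeholder). For the failed run above this reports test 125/127, so the hang is in the test that follows it, which the successful run shows to be fork:safety.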

The test itself is the one from the scipy issue which is also the test in #229. I will try to debug it in a qemu docker container.

@mattip

mattip commented Nov 23, 2025

Collaborator

Another problem: It seems this compiled shared object from the wheels-macos-latest-arm64-1-macosx- artifact suffers from the same segfault from issue #233 when testing the zladiv interface. Did something change in the way gfortran exports functions?

@mayeut

mayeut commented Nov 23, 2025

Contributor Author

It seems this compiled shared object from the wheels-macos-latest-arm64-1-macosx- artifact suffers from the same segfault from issue #233 when testing the zladiv interface. Did something change in the way gfortran exports functions?

This PR does not touch the macOS build except for the OpenBLAS update, which only has a limited diff compared to what's in main. The only Fortran-related change is OpenMathLib/OpenBLAS#5540, which seems right. Does main pass (or the current nightly build, which uses the latest develop)?

As a side note, this PR still uses gfortran on Linux.

@mattip

mattip commented Nov 23, 2025

Collaborator

I will try to debug it in a qemu docker container.

It is a little convoluted to reproduce the cibuildwheel build since it uses build isolation, so maybe I am not 1:1 accurate but:

When I run the make command without QUIET_MAKE=1, I see it is using cc as the C compiler. Even when setting PATH=/opt/clang/bin:$PATH, cc is /opt/rh/gcc-toolset-14/root/usr/bin/cc.

@mayeut

mayeut commented Nov 23, 2025

Contributor Author

I see it is using cc as the C compiler

CC, CXX & LDFLAGS are overridden by cibuildwheel at the end of pyproject.toml. That may not be the best way to do this for openblas-libs given how the install script is called, but it allows for easy overriding in pyproject.toml if needed.
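
A minimal sketch for checking which C compiler a given shell would hand to the build, assuming the build uses `$CC` when it is set and falls back to plain `cc` otherwise. The function name `effective_cc` is hypothetical.

```bash
# Print the program a build would use as its C compiler and where it
# resolves on PATH: $CC if set, otherwise plain `cc`.
effective_cc() {
    local tool="${CC:-cc}" path
    # CC may carry flags (e.g. "clang -m64"); only the first word is the program.
    tool="${tool%% *}"
    if path="$(command -v "$tool")"; then
        printf '%s -> %s\n' "$tool" "$path"
    else
        printf '%s -> not found\n' "$tool"
        return 1
    fi
}
```

Running `effective_cc` in the container with and without `CC` exported shows whether a manual `make` sees the same compiler as the cibuildwheel-driven build. Appending `"$tool" --version` would also print the compiler version.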

@mattip

mattip commented Nov 23, 2025

Collaborator

Maybe we could patch the Makefile to print out the compiler locations and versions just to be sure we are using the right ones

@mayeut

mayeut commented Nov 23, 2025

Contributor Author

Maybe we could patch the Makefile to print out the compiler locations and versions just to be sure we are using the right ones

It's done at the end, once the build succeeds; that's how I found out gfortran was not found on macOS arm64. We might want to ask for this to also be printed early on.

@mattip

mattip commented Nov 23, 2025

Collaborator

It's done at the end - once the build succeeds

+1, thanks

I reproduced the build locally on an x86_64 VM host and ran the test 100 times. It doesn't segfault.

@mayeut

mayeut commented Nov 24, 2025

Contributor Author

It is a little convoluted to reproduce the cibuildwheel build since it uses build isolation, so maybe I am not 1:1 accurate but

You can keep the cibuildwheel container from being removed, which helps with local debugging, by setting CIBW_DEBUG_KEEP_CONTAINER=1:

RUNNER_ARCH=aarch64 \
CIBW_MANYLINUX_X86_64_IMAGE=manylinux2014 \
NIGHTLY=false PLAT=x86_64 INTERFACE64=1 \
MB_ML_VER=2014 MB_ML_LIBC=manylinux \
OPENBLAS_COMMIT=v0.3.30-359-g29fab2b9 \
CIBW_DEBUG_KEEP_CONTAINER=1 \
    cibuildwheel --only cp312-manylinux_x86_64

I reproduced the build locally on an x86_64 VM host and ran the test 100 times.

The build above does not deadlock when running with Rosetta 2.

for i in {1..10000}; do echo $i; /project/OpenBLAS/utest/openblas_utest; done

If I run with qemu user, I see the same random deadlocks as in CI:

curl -fsSLO https://github.com/tonistiigi/binfmt/releases/download/deploy%2Fv10.0.4-56/qemu_v10.0.4_linux-arm64.tar.gz
tar -xf qemu_v10.0.4_linux-arm64.tar.gz
for i in {1..100}; do echo $i; ./qemu-x86_64 /project/OpenBLAS/utest/openblas_utest; done
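
The loops above block forever on the first deadlock. A minimal sketch that bounds each run and counts the hangs instead, assuming GNU coreutils `timeout` is available (it exits with status 124 when the time limit is hit). The function name `count_hangs` and its argument order are hypothetical.

```bash
# Run a command repeatedly under a time limit and print how many runs
# hit the limit.
# Usage: count_hangs <runs> <limit-seconds> <command> [args...]
count_hangs() {
    local runs="$1" limit="$2" hangs=0 i status
    shift 2
    for ((i = 1; i <= runs; i++)); do
        status=0
        timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
        # GNU timeout reports an expired limit with exit status 124.
        if [ "$status" -eq 124 ]; then
            hangs=$((hangs + 1))
        fi
    done
    printf '%s\n' "$hangs"
}
```

For example, `count_hangs 100 120 ./qemu-x86_64 /project/OpenBLAS/utest/openblas_utest`, where the 120-second limit is an arbitrary value that should be chosen to exceed a normal run of the test binary. Ordinary test failures (non-zero exits other than 124) are not counted as hangs.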

@mattip

mattip commented Nov 24, 2025

Collaborator

What platform are you running on?

@mattip

mattip commented Nov 24, 2025

Collaborator

It might be a qemu bug, though I don't see anything relevant in the qemu issue tracker. Can you try running on an x86_64 host even though it is slower?

@mayeut

mayeut commented Nov 24, 2025

Contributor Author

What platform are you running on?

host is macOS M1
qemu-user used in the command line above is the one used in CI. They do not provide binaries targeting the host architecture so using Rosetta2 vs QEMU is the closest I can get to running the same openblas binary "natively" (Rosetta 2) vs under emulation with QEMU.

@mayeut

mayeut commented Nov 24, 2025

Contributor Author

Can you try running on an x86_64 host even though it is slower?

Already done in CI, the same happened: https://github.com/MacPython/openblas-libs/actions/runs/19600277946/job/56130767510 running on x86_64 (all 4 QEMU builds running on x86_64 deadlocked)

@mattip

mattip commented Nov 24, 2025

Collaborator

Ahh, I am using docker directly

sudo docker run --rm -it --platform linux/ppc64le -v$PWD:/build \
    quay.io/pypa/manylinux_2_28_ppc64le:2025.11.09-2 /bin/bash

@mattip

mattip commented Nov 24, 2025

Collaborator

Could we separate this into two:

  • use clang compilers on linux
  • build ppc64le and s390x via qemu

@mayeut

mayeut commented Nov 24, 2025

Contributor Author

Could we separate this into two

This builds on top of #230. If you cherry-pick 18336d8 or merge my suggestions there, that will get us the QEMU part.

Or do you want to do it the other way around, i.e. first clang with Travis CI, then QEMU?

@mattip

mattip commented Nov 24, 2025

Collaborator

I don't really know. Why do you think this is failing?

@mattip

mattip commented Nov 24, 2025

Collaborator

The whole move to cibuildwheel has made this less stable, harder to understand, and harder to debug, starting with the inconvenience of looking at compacted build logs. Maybe we should go back to multibuild.

@mayeut

mayeut commented Nov 24, 2025

Contributor Author

Why do you think this is failing?

At the moment, I'm not sure whether this hang is a QEMU bug or an OpenBLAS bug. The fact that it was never reproduced when running the same binary under Rosetta 2 translation suggests it's likely an issue with QEMU, but I can't rule out a race condition in OpenBLAS. I won't look into this further before the weekend.
The hang happens randomly with both gcc & clang when running under QEMU. IMHO, it's definitely not related to clang.

What binary were you running on your x86_64 VM?

@mattip

mattip commented Nov 24, 2025

Collaborator

Ubuntu 24.04 with docker.io, running sudo docker run --rm -it --platform linux/ppc64le -v$PWD:/build quay.io/pypa/manylinux_2_28_ppc64le:2025.11.09-2 /bin/bash. It would be hard to convince me this is not a qemu problem, since the docker, native, and other linux tests (which also use pthreads) all succeed.

@mayeut

mayeut commented Nov 24, 2025

Contributor Author

What version of qemu-user is registered on your x86_64 host, the one that did not reproduce the issue?

@mattip

mattip commented Nov 25, 2025

Collaborator

I don't have qemu-user installed on the machine. I used docker --platform.

$ docker --version
Docker version 23.0.0, build e92dd87

I do have

$ qemu-ppc64le-static --version
qemu-ppc64le version 8.2.2 (Debian 1:8.2.2+ds-0ubuntu1.10)
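
On a Linux host, `docker run --platform` for a foreign architecture typically relies on an emulator registered through the kernel's binfmt_misc mechanism, so the QEMU that actually runs the container may differ from a `qemu-*-static` binary found on `PATH`. A minimal sketch for reading the registered interpreter, assuming an entry named `qemu-ppc64le` exists under `/proc/sys/fs/binfmt_misc` (the entry name and the helper name `binfmt_interpreter` are assumptions):

```bash
# Print the interpreter path recorded in a binfmt_misc entry file.
# $1: path to the entry, e.g. /proc/sys/fs/binfmt_misc/qemu-ppc64le
binfmt_interpreter() {
    awk '$1 == "interpreter" { print $2 }' "$1"
}
```

For example, `binfmt_interpreter /proc/sys/fs/binfmt_misc/qemu-ppc64le` prints the emulator path, which can then be invoked with `--version`. If the entry was registered with the `F` flag, the emulator was loaded at registration time and the printed path may not exist on the host filesystem.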

@mayeut

mayeut commented Nov 29, 2025

Contributor Author

For this test, the bug is indeed in QEMU: https://gitlab.com/qemu-project/qemu/-/issues/3226
However, the fix for #229 / OpenMathLib/OpenBLAS#5520, which I did not understand, does not seem to fix the issue reported upstream when running the test case mentioned there. That one does not require a race condition or even QEMU to reproduce: the same thread tries to acquire the lock twice, as described in the issue, and it still deadlocks.

@mattip

mattip commented Nov 29, 2025

Collaborator

However the fix I did not understand for #229 / OpenMathLib/OpenBLAS#5520 does not seem to fix the issue reported upstream

What are you trying?

@mayeut

mayeut commented Nov 30, 2025

Contributor Author

What are you trying?

The reproducer you provided in the upstream issue.

I opened a PR adding this reproducer as a test case: OpenMathLib/OpenBLAS#5556

@mattip

mattip commented Nov 30, 2025

Collaborator

Ahh, right, the patch in #229 was not a revert of the incorrect OpenMathLib/OpenBLAS#5170. Good catch.

Comment thread pyproject.toml
@mattip

mattip commented Nov 30, 2025

Collaborator

Cool, CI is passing.

@mattip mattip merged commit 202309a into MacPython:main Nov 30, 2025
22 checks passed
@mattip

mattip commented Nov 30, 2025

Collaborator

Thanks @mayeut

@mattip

mattip commented Nov 30, 2025

Collaborator

I wonder how hard it would be to cache the compiler tarballs and manylinux images

@mayeut mayeut deleted the use-clang branch December 2, 2025 22:12