ROCm 7.2.2 on Strix Halo (gfx1151) hangs on process exit for a tiny MLX autograd C++ repro.
This is not a generic MLX failure; it appears to be a HIP runtime teardown-path hang.
Repro environment
Host:
- Machine: Strix Halo system (framework.lan)
- Kernel: 6.19.12-artix1-1
- Firmware pkg: linux-firmware 20260410-1
Container:
- Image: artixlinux/artixlinux:base-devel
- GPU devices passed: /dev/dri, /dev/kfd
- Security/runtime flags: --group-add video --security-opt seccomp=unconfined
- OS in container: Artix Linux (the same distro as the host)
ROCm-related package versions in container:
- rocm-core 7.2.2-1
- rocm-device-libs 2:7.2.2-1
- rocm-llvm 2:7.2.2-1
- hsa-rocr 7.2.2-1
- rocblas 7.2.2-1
- hipblas 7.2.2-1
- hipblaslt 7.2.2-1
- hiprand 7.2.2-1
- rocsolver 7.2.2-1
- rocsparse 7.2.2-1
- rocthrust 7.2.2-1
- rocprim 7.2.2-1
- rocwmma 7.2.2-1
- rocminfo 7.2.2-1
MLX:
- Repo: https://github.com/NripeshN/mlx
- Commit: 39fac95d901c72175fce4baf973e375d4a054ba7
- Build flags: -DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES=gfx1151 -DMLX_BUILD_PYTHON_BINDINGS=OFF -DMLX_BUILD_EXAMPLES=ON -DMLX_BUILD_TESTS=OFF
Minimal repro source
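#include "mlx/mlx.h"

int main() {
  namespace mx = mlx::core;
  // f(x) = sum(x^2), so grad f(x) = 2x
  auto fn = [](mx::array x) { return mx::sum(mx::square(x)); };
  auto grad_fn = mx::grad(fn);
  auto x = mx::array({1.5f, 2.0f, 3.0f});
  auto d = grad_fn(x);
  mx::eval(d);        // force evaluation of the lazy graph
  mx::synchronize();  // wait for outstanding GPU work to finish
  return 0;           // the hang occurs after this point, during process exit
}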
Repro command: timeout 20s env MLX_DEFAULT_DEVICE=gpu ./grad_exit_repro ; echo $?
Result: an intermittent hang at process exit. It is not consistent, so you may need to run the command several times to hit it.
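Since the hang is intermittent, a small loop makes it easier to measure how often it happens. The sketch below is only an illustration: `count_timeouts` is a hypothetical helper name, not part of MLX or ROCm. It assumes GNU coreutils `timeout`, which exits with status 124 when the command exceeds the time limit, and it treats a timeout as a hang. A process that ignores SIGTERM would need `timeout -k` and would report a different status.

```shell
# count_timeouts RUNS LIMIT CMD...
# Hypothetical helper: run CMD RUNS times under `timeout LIMIT` and print
# how many runs hit the time limit (GNU timeout exit status 124).
count_timeouts() {
  runs=$1
  limit=$2
  shift 2
  hung=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    rc=0
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 124 ]; then
      hung=$((hung + 1))
    fi
    i=$((i + 1))
  done
  echo "$hung"
}

# Example (uses the repro binary from this report):
#   count_timeouts 100 20s env MLX_DEFAULT_DEVICE=gpu ./grad_exit_repro
```

The printed number is the count of runs that timed out, which gives a rough hang rate to compare between machines or ROCm versions.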
I ran the same test 100 times on my desktop with an RX 7900 XT (gfx1100, Artix, ROCm 7.2.2) and saw no hang.
The same hang also occurs with the MLX "tutorial" example.