[ROCm/HIP] gfx1151: MLX tiny autograd program hangs on process exit in __hipUnregisterFatBinary (ROCm 7.2.2) #24

@flashburns

Description

ROCm 7.2.2 on Strix Halo (gfx1151) hangs on process exit for a tiny MLX autograd C++ repro.
This is not a generic MLX failure; it appears to be a HIP runtime teardown-path hang.

Repro environment

Host:

  • Machine: Strix Halo system (framework.lan)
  • Kernel: 6.19.12-artix1-1
  • Firmware pkg: linux-firmware 20260410-1

Container:

  • Image: artixlinux/artixlinux:base-devel
  • GPU devices passed: /dev/dri, /dev/kfd
  • Security/runtime flags: --group-add video --security-opt seccomp=unconfined
  • OS in container: Artix Linux (the same distro as the host)

ROCm-related package versions in container:

  • rocm-core 7.2.2-1
  • rocm-device-libs 2:7.2.2-1
  • rocm-llvm 2:7.2.2-1
  • hsa-rocr 7.2.2-1
  • rocblas 7.2.2-1
  • hipblas 7.2.2-1
  • hipblaslt 7.2.2-1
  • hiprand 7.2.2-1
  • rocsolver 7.2.2-1
  • rocsparse 7.2.2-1
  • rocthrust 7.2.2-1
  • rocprim 7.2.2-1
  • rocwmma 7.2.2-1
  • rocminfo 7.2.2-1

MLX:

  • Repo: https://github.com/NripeshN/mlx
  • Commit: 39fac95d901c72175fce4baf973e375d4a054ba7
  • Build flags:
    • -DMLX_BUILD_ROCM=ON
    • -DMLX_ROCM_ARCHITECTURES=gfx1151
    • -DMLX_BUILD_PYTHON_BINDINGS=OFF
    • -DMLX_BUILD_EXAMPLES=ON
    • -DMLX_BUILD_TESTS=OFF

Minimal repro source

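#include "mlx/mlx.h"

int main() {
  namespace mx = mlx::core;
  // Scalar function f(x) = sum(x^2); its gradient is 2x.
  auto fn = [](mx::array x) { return mx::sum(mx::square(x)); };
  auto grad_fn = mx::grad(fn);
  auto x = mx::array({1.5f, 2.0f, 3.0f});
  auto d = grad_fn(x);
  // Force evaluation and wait for outstanding GPU work before returning.
  mx::eval(d);
  mx::synchronize();
  // The hang occurs after this point, during process teardown.
  return 0;
}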

Repro command: timeout 20s env MLX_DEFAULT_DEVICE=gpu ./grad_exit_repro ; echo $?

The hang at process exit is intermittent, so you may need to run the command several times to hit it. A hung run is killed by timeout after 20 seconds and reports exit status 124.
I ran the same test on my desktop with an RX 7900 XT (gfx1100, Artix, ROCm 7.2.2): no hang in 100 runs.
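
Since the hang is intermittent, a small harness that repeats the repro and counts timeouts can turn "a few times" into a hang rate. The following is only a sketch under assumptions: the names run_with_timeout and count_hangs are hypothetical helpers (not part of MLX or ROCm), it is POSIX-only, and a run that is still alive at the deadline is simply treated as a suspected hang and killed.

```cpp
#include <chrono>
#include <csignal>
#include <string>
#include <thread>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Outcome of one run of the child process.
enum class RunResult { Exited, TimedOut, SpawnFailed };

// Hypothetical helper: runs argv[0] with its arguments and waits up to
// `timeout` for it to exit. A child still alive at the deadline is killed
// with SIGKILL and reported as TimedOut (a suspected exit hang).
// Note: a failed exec shows up as Exited (child exit status 127).
RunResult run_with_timeout(const std::vector<std::string>& argv,
                           std::chrono::milliseconds timeout) {
  if (argv.empty()) return RunResult::SpawnFailed;
  std::vector<char*> cargv;
  for (const auto& a : argv) cargv.push_back(const_cast<char*>(a.c_str()));
  cargv.push_back(nullptr);

  pid_t pid = fork();
  if (pid < 0) return RunResult::SpawnFailed;
  if (pid == 0) {
    execvp(cargv[0], cargv.data());
    _exit(127);  // only reached if exec failed
  }

  const auto deadline = std::chrono::steady_clock::now() + timeout;
  int status = 0;
  for (;;) {
    pid_t r = waitpid(pid, &status, WNOHANG);  // poll without blocking
    if (r == pid) return RunResult::Exited;
    if (r < 0) return RunResult::SpawnFailed;
    if (std::chrono::steady_clock::now() >= deadline) break;
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  kill(pid, SIGKILL);
  waitpid(pid, &status, 0);  // reap the killed child
  return RunResult::TimedOut;
}

// Hypothetical helper: runs the command `runs` times and returns how many
// of those runs timed out.
int count_hangs(const std::vector<std::string>& argv, int runs,
                std::chrono::milliseconds timeout) {
  int hangs = 0;
  for (int i = 0; i < runs; ++i) {
    if (run_with_timeout(argv, timeout) == RunResult::TimedOut) ++hangs;
  }
  return hangs;
}
```

To mirror the repro command, one could call count_hangs({"env", "MLX_DEFAULT_DEVICE=gpu", "./grad_exit_repro"}, 100, std::chrono::seconds(20)) from a small main and print the result; the count of 100 is just an example value.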

This bug also exists in the MLX "tutorial" example.
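
One way to probe the teardown-path hypothesis from the description is to end the process without running exit-time hooks and see whether the hang disappears. The sketch below assumes the __hipUnregisterFatBinary call is reached through the normal exit path (atexit handlers or static destructors), which std::_Exit skips. exit_without_teardown and the self-check around it are hypothetical names for illustration. This is a diagnostic, not a fix, since it also skips every other cleanup handler.

```cpp
#include <cstdio>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: flushes stdio buffers, then terminates immediately.
// std::_Exit does not run atexit handlers or static destructors, so any
// library teardown registered that way is skipped.
[[noreturn]] void exit_without_teardown(int code) {
  std::fflush(nullptr);
  std::_Exit(code);
}

namespace {
// Stand-in for an exit-time hook: if it ever runs, the child exits with 99.
void teardown_marker() { _exit(99); }
}  // namespace

// Self-check: fork a child that registers the marker hook and then leaves
// via exit_without_teardown(7). Returns the child's exit code, or -1 on
// error. A result of 7 means the hook was skipped.
int child_exit_code_when_skipping_teardown() {
  std::fflush(nullptr);  // avoid duplicating buffered output in the child
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    std::atexit(teardown_marker);
    exit_without_teardown(7);
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
  return WEXITSTATUS(status);
}
```

In the repro, this would mean replacing the final return 0; with exit_without_teardown(0); after mx::synchronize(). If the intermittent hang no longer reproduces with that change, that points at exit-time teardown rather than the compute path.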
