Commit 3ab1216

Release GIL when calling mkl_lapack::orgqr (#2850)
This PR fixes a deadlock in the QR decomposition tests by releasing the GIL before the `mkl_lapack::orgqr` call.

The hang occurred because:
1. dpnp/dpctl submits a `host_task` to manage Python object lifetimes
2. that `host_task` needs to acquire the GIL to decrement reference counts
3. if the main thread holds the GIL during queue submission, the result is a deadlock
4. `orgqr` is currently implemented in oneMKL as GPU-to-Host reverse offload, so the calling thread blocks in `.wait()` while the GIL is still held:
```cpp
exec_q.submit([&](sycl::handler& cgh) {
    cgh.depends_on(depends);
    cgh.host_task([=]() { orgqr_host(...); });
}).wait();
```

As a solution, the PR releases the GIL with `py::gil_scoped_release` before calling the oneMKL operations. The GIL is automatically reacquired when the guard goes out of scope (RAII).
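The release-then-block pattern described above can be sketched without Python or SYCL. In the following minimal illustration, a plain `std::mutex` stands in for the GIL and a `std::thread` stands in for the `host_task`. The names `fake_gil`, `ScopedRelease`, `blocking_call_with_host_task` and `run_with_release` are hypothetical and are not part of dpnp, pybind11 or oneMKL. It is a sketch of the idea under those assumptions, not the actual implementation.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Assumption: a plain mutex stands in for the Python GIL.
static std::mutex fake_gil;

// Hypothetical analogue of py::gil_scoped_release:
// unlocks on construction, re-locks on destruction (RAII).
class ScopedRelease {
public:
    explicit ScopedRelease(std::mutex &m) : m_(m) { m_.unlock(); }
    ~ScopedRelease() { m_.lock(); }
    ScopedRelease(const ScopedRelease &) = delete;
    ScopedRelease &operator=(const ScopedRelease &) = delete;

private:
    std::mutex &m_;
};

// Simulates a blocking library call whose worker needs the lock,
// like a host_task that must take the GIL before it can finish.
int blocking_call_with_host_task() {
    int result = 0;
    std::thread worker([&result] {
        std::lock_guard<std::mutex> guard(fake_gil); // "host task" takes the lock
        result = 42;
    });
    worker.join(); // plays the role of .wait()
    return result;
}

// Simulates a function entered with the lock held, as an extension
// function called from Python would be.
int run_with_release() {
    std::lock_guard<std::mutex> hold(fake_gil); // caller holds the "GIL"
    int r = 0;
    {
        ScopedRelease release(fake_gil); // drop the lock before blocking
        r = blocking_call_with_host_task();
    } // lock is re-acquired here, before returning to the caller
    return r;
}
```

Calling `run_with_release()` completes because the worker can take the lock while the caller is blocked. Removing the `ScopedRelease` line would leave the worker waiting for a lock that the joining thread never gives up, which is the shape of the hang described in the commit message.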
1 parent 206d85d commit 3ab1216

2 files changed

Lines changed: 10 additions & 0 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -88,6 +88,7 @@ Also, that release drops support for Python 3.9, making Python 3.10 the minimum
 * Fixed test tolerance issues for float16 intermediate precision that became visible when testing against conda-forge's NumPy [#2828](https://github.com/IntelPython/dpnp/pull/2828)
 * Ensured device aware dtype handling in `dpnp.identity` and `dpnp.gradient` [#2835](https://github.com/IntelPython/dpnp/pull/2835)
 * Fixed `dpnp.tensor.round` to use device-aware output dtype for boolean input [#2851](https://github.com/IntelPython/dpnp/pull/2851)
+* Resolved a deadlock in `dpnp.linalg.qr` by releasing the GIL before OneMKL `orgqr` call to prevent host tasks contention [#2850](https://github.com/IntelPython/dpnp/pull/2850)

 ### Security

dpnp/backend/extensions/lapack/orgqr.cpp

Lines changed: 9 additions & 0 deletions
@@ -87,8 +87,17 @@ static sycl::event orgqr_impl(sycl::queue &exec_q,

     sycl::event orgqr_event;
     try {
+        // Release GIL to avoid serialization of host task submissions
+        // to the same queue in OneMKL
+        py::gil_scoped_release lock{};
+
         scratchpad = sycl::malloc_device<T>(scratchpad_size, exec_q);

+        // mkl_lapack::orgqr() is done through GPU-to-Host reverse offload:
+        // exec_q.submit([&](sycl::handler& cgh) {
+        //     cgh.depends_on(depends);
+        //     cgh.host_task([=]() { orgqr_host(...); });
+        // }).wait();
         orgqr_event = mkl_lapack::orgqr(
             exec_q,
             m, // The number of rows in the matrix; (0 ≤ m).
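One consequence of declaring the guard inside the `try` block is worth spelling out: in C++, objects local to a `try` block are destroyed during stack unwinding, before the matching `catch` handler runs. The sketch below illustrates that ordering with a boolean flag standing in for the GIL state. The names `gil_held`, `ScopedReleaseFlag` and `held_in_handler_after_throw` are hypothetical, and the code makes no claim about what the real `catch` handlers in `orgqr.cpp` do.

```cpp
#include <cassert>
#include <stdexcept>

// Assumption: a bool flag stands in for "this thread holds the GIL".
static bool gil_held = true;

// Hypothetical RAII guard: marks the lock released while it is alive.
struct ScopedReleaseFlag {
    ScopedReleaseFlag() { gil_held = false; }
    ~ScopedReleaseFlag() { gil_held = true; }
};

// Reports whether the "GIL" is held again by the time the handler runs.
bool held_in_handler_after_throw() {
    bool observed = false;
    try {
        ScopedReleaseFlag release; // released for the rest of the try block
        throw std::runtime_error("simulated library failure");
    } catch (const std::exception &) {
        // The guard's destructor has already run during unwinding.
        observed = gil_held;
    }
    return observed;
}
```

Under these assumptions, `held_in_handler_after_throw()` returns `true`: the flag is restored before the handler body executes, so any code in a handler sees the lock as held again.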
