
fix: Prevent deadlock in decoupled BLS and use-after-free in async_exec pybind callback #439

Merged: pskiran1 merged 5 commits into main from spolisetty/tri-1315-fix-ci-test-l0_backend_python-base on Jun 16, 2026.
pskiran1 commented on Jun 12, 2026:


Fixes the following bugs in pb_stub.cc:

Deadlock in decoupled BLS with GPU tensors:
Stub::GetCUDAMemoryPoolAddress runs on the ParentToStubMQMonitor thread, which also delivers decoupled BLS responses (ProcessBLSResponseDecoupled). After notifying the parent, the function unconditionally blocked while waiting for an acknowledgment (waiting_on_stub). On the success path, the parent may itself be blocked waiting for a decoupled response that this now-blocked thread is supposed to deliver, so each side waits on the other: a circular wait. This caused occasional hangs the first time a BLS model returned GPU tensors, which is what triggers CUDA memory pool initialization.
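
A minimal sketch of the circular wait, under stated assumptions: Msg, Queue, Event and RunScenario are hypothetical names, the single monitor thread stands in for ParentToStubMQMonitor, and the non-blocking branch models the general idea of not parking that thread on the acknowledgment. It is not the code from pb_stub.cc, and the exact shape of the fix lives in the PR diff, which is not reproduced here. The parent's wait is bounded so the demo returns instead of hanging.

```cpp
#include <cassert>  // assert() in the usage checks
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical message kinds; the real stub uses its own IPC message types.
enum class Msg { kCudaPoolInit, kBlsResponse, kStop };

// Minimal blocking FIFO standing in for the parent-to-stub message queue.
class Queue {
 public:
  void Push(Msg m) {
    {
      std::lock_guard<std::mutex> lk(mu_);
      q_.push_back(m);
    }
    cv_.notify_one();
  }
  Msg Pop() {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [this] { return !q_.empty(); });
    Msg m = q_.front();
    q_.pop_front();
    return m;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<Msg> q_;
};

// One-shot flag with a timed wait (used for the ack and for BLS delivery).
class Event {
 public:
  void Set() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      set_ = true;
    }
    cv_.notify_all();
  }
  bool WaitFor(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(mu_);
    return cv_.wait_for(lk, timeout, [this] { return set_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool set_ = false;
};

// Returns true if the decoupled BLS response reached the parent in time.
bool RunScenario(bool monitor_blocks_on_ack) {
  using namespace std::chrono_literals;
  Queue to_stub;
  Event pool_ack, bls_delivered;

  // One thread both handles CUDA pool setup and delivers BLS responses.
  std::thread monitor([&] {
    for (;;) {
      Msg m = to_stub.Pop();
      if (m == Msg::kStop) return;
      if (m == Msg::kCudaPoolInit) {
        // (Notifying the parent is omitted in this sketch.)
        if (monitor_blocks_on_ack) {
          // Old pattern: park the monitor thread until the parent acks.
          pool_ack.WaitFor(5s);
        }
      } else if (m == Msg::kBlsResponse) {
        bls_delivered.Set();
      }
    }
  });

  // Parent: waits for the BLS response before it sends the pool ack.
  to_stub.Push(Msg::kCudaPoolInit);
  to_stub.Push(Msg::kBlsResponse);
  bool delivered = bls_delivered.WaitFor(500ms);  // bounded so the demo ends
  pool_ack.Set();
  to_stub.Push(Msg::kStop);
  monitor.join();
  return delivered;
}
```

With these timeouts, RunScenario(false) should return true and RunScenario(true) should return false: in the blocking variant the pool acknowledgment is only sent after the parent gives up waiting for the BLS response that the parked monitor thread was holding.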

Use-after-free in async_exec pybind callback:
The async_exec callback lambda captured the local stub shared_ptr by reference ([&stub, ...]). Because stub is a local variable, it could be destroyed before the callback ran on the event loop, leaving the lambda with a dangling reference and causing a use-after-free.
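
A minimal sketch of the lifetime bug and the usual remedy, capturing the shared_ptr by value. FakeStub, EventLoop and ScheduleCompletion are hypothetical; the real callback is a pybind11 callable run on the Python event loop, and this sketch assumes (rather than reproduces) that the fix takes this by-value form.

```cpp
#include <cassert>  // assert() in the usage checks
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the stub object; not the real Stub class.
struct FakeStub {
  int completed = 0;
  void OnComplete() { ++completed; }
};

// Hypothetical event loop: callbacks are queued now and run later,
// after the function that scheduled them has already returned.
class EventLoop {
 public:
  void Schedule(std::function<void()> cb) { pending_.push_back(std::move(cb)); }
  void RunPending() {
    std::vector<std::function<void()>> batch;
    batch.swap(pending_);
    for (auto& cb : batch) cb();
  }  // batch, and every lambda (with its captures) in it, is destroyed here

 private:
  std::vector<std::function<void()>> pending_;
};

// Schedules a completion callback on `loop` and returns a weak_ptr so the
// caller can observe how long the stub object stays alive.
std::weak_ptr<FakeStub> ScheduleCompletion(EventLoop& loop) {
  auto stub = std::make_shared<FakeStub>();  // local, like `stub` in the PR

  // BUG pattern (do not run): [&stub] stores a reference to the local
  // shared_ptr, which is destroyed when this function returns, so the
  // callback would later dereference a dangling reference.
  //   loop.Schedule([&stub] { stub->OnComplete(); });

  // Safe pattern: capture the shared_ptr by value. The lambda owns a copy,
  // which keeps the FakeStub alive until the callback itself is destroyed.
  loop.Schedule([stub] { stub->OnComplete(); });
  return stub;
}
```

The weak_ptr returned by ScheduleCompletion stays unexpired until RunPending has run the callback and destroyed it, which is the lifetime guarantee the [&stub] capture lacked.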

CI: triton-inference-server/server#8830

pskiran1 merged commit 4b3337c into main on Jun 16, 2026.
3 checks passed.
pskiran1 deleted the spolisetty/tri-1315-fix-ci-test-l0_backend_python-base branch on June 16, 2026 at 08:45.