
Commit 413e058

[UR][L0] Fix event refcount bugs in L0 v1 adapter barrier handling
Fixes three issues in the Level Zero v1 adapter that caused "urEventWait must not be called for an internal event" crashes and segfaults when multiple in-order queues are mixed with ext_oneapi_submit_barrier:

1. Fix insertBarrierIntoCmdList passing InterruptBasedEventsEnabled as the IsMultiDevice argument of createEventAndAssociateQueue. Barrier events do not need multi-device visibility, so pass false instead.
2. In urEventWait, replace die() with continue when hasExternalRefs() is false. A recycled event (RefCountExternal == 0) had already completed before it was recycled, so skipping the wait is safe.
3. In urEventRelease, call CleanupCompletedEvent (a no-op when CleanedUp is true) instead of calling urEventReleaseInternal directly. The old code double-released the internal refcount when CleanupEventListFromResetCmdList had already cleaned up the event.

UR_L0_SERIALIZE=2 masked the race by forcing synchronous execution. UR_L0_DISABLE_EVENTS_CACHING=1 turned the recycling into a delete, escalating the bug from a stale-data read to a segfault.

Fixes: #21704
Signed-off-by: Zhang, Winston <winston.zhang@intel.com>
1 parent 57aa8ab commit 413e058

1 file changed

Lines changed: 8 additions & 11 deletions


unified-runtime/source/adapters/level_zero/event.cpp

@@ -199,9 +199,10 @@ ur_result_t urEnqueueEventsWaitWithBarrierExt(
 [&Queue](ur_command_list_ptr_t CmdList, ur_ze_event_list_t &EventWaitList,
 ur_event_handle_t &Event, bool IsInternal,
 bool InterruptBasedEventsEnabled) {
+(void)InterruptBasedEventsEnabled;
 UR_CALL(createEventAndAssociateQueue(
 Queue, &Event, UR_COMMAND_EVENTS_WAIT_WITH_BARRIER, CmdList,
-IsInternal, InterruptBasedEventsEnabled));
+IsInternal, /* IsMultiDevice */ false));
 
 Event->WaitList = EventWaitList;
 
@@ -796,7 +797,7 @@ urEventWait(uint32_t NumEvents,
 //
 ur_event_handle_t_ *Event = ur_cast<ur_event_handle_t_ *>(e);
 if (!Event->hasExternalRefs())
-die("urEventWait must not be called for an internal event");
+continue;
 
 ze_event_handle_t ZeHostVisibleEvent;
 if (auto Res = Event->getOrCreateHostVisibleEvent(ZeHostVisibleEvent))
@@ -822,7 +823,7 @@ urEventWait(uint32_t NumEvents,
 {
 std::shared_lock<ur_shared_mutex> EventLock(Event->Mutex);
 if (!Event->hasExternalRefs())
-die("urEventWait must not be called for an internal event");
+continue;
 
 if (!Event->Completed) {
 auto HostVisibleEvent = Event->HostVisibleEvent;
@@ -894,15 +895,11 @@ urEventRelease(/** [in] handle of the event object */ ur_event_handle_t Event) {
 UR_CALL(urEventReleaseInternal(Event, &isEventDeleted));
 // If this is a Completed Event Wait Out Event, then we need to cleanup the
 // event at user release and not at the time of completion.
-// If the event is labelled as completed and no additional references are
-// removed, then we still need to decrement the event, but not mark as
-// completed.
+// Use CleanupCompletedEvent which is a no-op if the event was already
+// cleaned up (e.g. by CleanupEventListFromResetCmdList), preventing a
+// double-release of the internal reference count.
 if (isEventsWaitCompleted & !isEventDeleted) {
-if (Event->CleanedUp) {
-UR_CALL(urEventReleaseInternal(Event));
-} else {
-UR_CALL(CleanupCompletedEvent((Event), false, false));
-}
+UR_CALL(CleanupCompletedEvent((Event), false, false));
 }
 
 return UR_RESULT_SUCCESS;
