[UR][L0] Fix event refcount bugs in L0 v1 adapter barrier handling#21871

Open
winstonzhang-intel wants to merge 1 commit into intel:sycl from winstonzhang-intel:urlza-739

@winstonzhang-intel
Contributor

Fixes three issues in the Level Zero v1 adapter that caused "urEventWait must not be called for an internal event" crashes and segfaults when mixing multiple in-order queues with ext_oneapi_submit_barrier.

  1. Fix insertBarrierIntoCmdList passing InterruptBasedEventsEnabled as IsMultiDevice to createEventAndAssociateQueue. Barrier events do not need multi-device visibility; pass false instead.

  2. In urEventWait, replace die() with continue when hasExternalRefs() is false. A recycled event (RefCountExternal == 0) was already completed before recycling, so skipping the wait is safe.

  3. In urEventRelease, use CleanupCompletedEvent (which is a no-op when CleanedUp is true) instead of calling urEventReleaseInternal directly. The old code double-released the internal refcount when CleanupEventListFromResetCmdList had already cleaned the event.
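
The double release described in item 3 can be pictured with a toy model. This is a minimal sketch, not the adapter's code: `RefEvent`, `releaseInternal`, and `cleanupCompleted` are hypothetical stand-ins that only mirror the roles the description gives to the event, urEventReleaseInternal, and CleanupCompletedEvent.

```cpp
#include <cassert>

// Hypothetical stand-in for an adapter event. The field names echo the PR
// description; the type itself is an assumption made for illustration.
struct RefEvent {
  int RefCountInternal = 1; // reference the adapter holds until cleanup
  bool CleanedUp = false;   // set once cleanup has run
};

// Models the role of urEventReleaseInternal: drops one internal reference.
inline void releaseInternal(RefEvent &E) {
  assert(E.RefCountInternal > 0 && "internal refcount underflow");
  --E.RefCountInternal;
}

// Models the role of CleanupCompletedEvent: the CleanedUp guard makes a
// second call a no-op, so the internal reference is dropped at most once.
inline void cleanupCompleted(RefEvent &E) {
  if (E.CleanedUp)
    return;
  E.CleanedUp = true;
  releaseInternal(E);
}
```

In this model, calling `cleanupCompleted` once from a command-list reset path and once from a release path leaves `RefCountInternal` at 0, whereas calling `releaseInternal` directly on the second path would trip the underflow assertion.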

UR_L0_SERIALIZE=2 masked the race by forcing synchronous execution. UR_L0_DISABLE_EVENTS_CACHING=1 turned the recycling into a delete, escalating the bug from a stale-data read to a segfault.
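
As an illustration of item 2, the wait loop's new behavior might look roughly like the sketch below. It assumes a simplified `WaitEvent` type and a hypothetical `waitAll` helper; only `hasExternalRefs` and `RefCountExternal` are names taken from the description, and the host synchronization is reduced to setting a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified event: only the fields this sketch needs.
struct WaitEvent {
  int RefCountExternal = 0; // 0 models a recycled (internal-only) event
  bool Completed = false;
  bool hasExternalRefs() const { return RefCountExternal != 0; }
};

// Waits on every event that still has external references and returns how
// many were actually waited on. Recycled events are skipped instead of
// aborting the process.
inline std::size_t waitAll(const std::vector<WaitEvent *> &Events) {
  std::size_t Waited = 0;
  for (WaitEvent *E : Events) {
    if (!E->hasExternalRefs())
      continue;          // recycled: it completed before being recycled
    E->Completed = true; // stand-in for the real host synchronization
    ++Waited;
  }
  return Waited;
}
```

The trade-off is the one raised in review: skipping silently keeps mixed-queue barrier workloads running, but it also stops flagging a caller that passes an internal event by mistake.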

Fixes: #21704

@winstonzhang-intel winstonzhang-intel requested a review from a team as a code owner April 24, 2026 18:02

Signed-off-by: Zhang, Winston <winston.zhang@intel.com>
  ur_event_handle_t_ *Event = ur_cast<ur_event_handle_t_ *>(e);
  if (!Event->hasExternalRefs())
-   die("urEventWait must not be called for an internal event");
+   continue;
Contributor

Is the invariant "urEventWait must not be called for an internal event" no longer valid?

Development

Successfully merging this pull request may close these issues.

[SYCL][L0] urEventWait must not be called for an internal event

2 participants