Skip to content

[UR][L0] Add L0 teardown notification support and simply teardown with proxy loader support#18496

Merged
sarnex merged 1 commit into
intel:syclfrom
nrspruit:L0_teardown_notification
May 19, 2025
Merged

[UR][L0] Add L0 teardown notification support and simply teardown with proxy loader support#18496
sarnex merged 1 commit into
intel:syclfrom
nrspruit:L0_teardown_notification

Conversation

@nrspruit

Copy link
Copy Markdown
Contributor
  • Add support for L0 teardown notification in the static L0 loader.
  • Add loading of the L0 dynamic loader in the windows proxy loader to ensure that the L0 dynamic loader is usable during teardown.

@nrspruit

Copy link
Copy Markdown
Contributor Author

related to oneapi-src/level-zero#333

@nrspruit nrspruit temporarily deployed to WindowsCILock May 15, 2025 21:55 — with GitHub Actions Inactive
@nrspruit nrspruit temporarily deployed to WindowsCILock May 15, 2025 22:20 — with GitHub Actions Inactive
@nrspruit nrspruit temporarily deployed to WindowsCILock May 15, 2025 22:20 — with GitHub Actions Inactive
@nrspruit nrspruit force-pushed the L0_teardown_notification branch from 8f1f54e to a21160d Compare May 15, 2025 22:37
@nrspruit nrspruit force-pushed the L0_teardown_notification branch from a21160d to ecefdbe Compare May 15, 2025 22:44
@nrspruit nrspruit temporarily deployed to WindowsCILock May 15, 2025 22:45 — with GitHub Actions Inactive
@cperkinsintel

cperkinsintel commented May 16, 2025

Copy link
Copy Markdown
Contributor

@nrspruit - the test failures reported here seem legit. But they are strange.

The buffer_create.cpp test works fine for me if I build and run it manually. Everything working exactly as it should. But when run from llvm-lit, I can get this output:

# | 
# | ZE ---> zeDeviceGet(ZeDrivers[I], &ZeDeviceCount, nullptr)
# | ZE ---> zeDeviceGet(ZeDrivers[I], &ZeDeviceCount, ZeDevices.data())
# | ZE ---> zeDeviceGetProperties(ZeDevices[D], &device_properties)
# | ZE ---> zeDriverGetApiVersion(ZeDriver, &ZeApiVersion)
# | ZE ---> zeDriverGetExtensionProperties(ZeDriver, &Count, nullptr)
# | ZE ---> zeDriverGetExtensionProperties(ZeDriver, &Count, ZeExtensions.data())
# | ZE ---> zelLoaderTranslateHandle(ZEL_HANDLE_DRIVE
# | ...
# `---data was truncated--------

The data was truncated comes from llvm-lit. Essentially, there seemed to have been an error when zelLoaderTranslateHandle was called. I think the next argument it is supposed to log is the "ZeDriver", so there might be something going on. It didn't even output the last "R"

The test failed on Arc, I think. But I can repro the failure on pvc-01.

@nrspruit nrspruit force-pushed the L0_teardown_notification branch from ecefdbe to df6d57d Compare May 16, 2025 20:21
@nrspruit

Copy link
Copy Markdown
Contributor Author

@nrspruit - the test failures reported here seem legit. But they are strange.

The buffer_create.cpp test works fine for me if I build and run it manually. Everything working exactly as it should. But when run from llvm-lit, I can get this output:

# | 
# | ZE ---> zeDeviceGet(ZeDrivers[I], &ZeDeviceCount, nullptr)
# | ZE ---> zeDeviceGet(ZeDrivers[I], &ZeDeviceCount, ZeDevices.data())
# | ZE ---> zeDeviceGetProperties(ZeDevices[D], &device_properties)
# | ZE ---> zeDriverGetApiVersion(ZeDriver, &ZeApiVersion)
# | ZE ---> zeDriverGetExtensionProperties(ZeDriver, &Count, nullptr)
# | ZE ---> zeDriverGetExtensionProperties(ZeDriver, &Count, ZeExtensions.data())
# | ZE ---> zelLoaderTranslateHandle(ZEL_HANDLE_DRIVE
# | ...
# `---data was truncated--------

The data was truncated comes from llvm-lit. Essentially, there seemed to have been an error when zelLoaderTranslateHandle was called. I think the next argument it is supposed to log is the "ZeDriver", so there might be something going on. It didn't even output the last "R"

The test failed on Arc, I think. But I can repro the failure on pvc-01.

that error has existed since before this change, so it may be a real issue, but it is not part of this PR.

See here:
https://github.com/intel/llvm/actions/runs/15042177600/job/42277602962

@nrspruit nrspruit marked this pull request as ready for review May 16, 2025 20:24
@nrspruit nrspruit requested review from a team as code owners May 16, 2025 20:24
@nrspruit nrspruit requested a review from maarquitos14 May 16, 2025 20:25
@cperkinsintel

Copy link
Copy Markdown
Contributor

Test failures are known: #18463

@nrspruit nrspruit force-pushed the L0_teardown_notification branch from df6d57d to 7229f1d Compare May 16, 2025 20:47
@nrspruit nrspruit temporarily deployed to WindowsCILock May 16, 2025 20:48 — with GitHub Actions Inactive
@sarnex sarnex requested review from a team as code owners May 16, 2025 21:00
@sarnex sarnex requested a review from keyradical May 16, 2025 21:00
@sarnex sarnex requested a review from jchlanda May 16, 2025 21:00
@nrspruit nrspruit temporarily deployed to WindowsCILock May 16, 2025 21:28 — with GitHub Actions Inactive
@nrspruit nrspruit temporarily deployed to WindowsCILock May 16, 2025 21:28 — with GitHub Actions Inactive
@gmlueck

gmlueck commented May 16, 2025

Copy link
Copy Markdown
Contributor

I assume this PR has some sort of merge problem? Seems like a lot of unrelated things changed.

@nrspruit

Copy link
Copy Markdown
Contributor Author

I assume this PR has some sort of merge problem? Seems like a lot of unrelated things changed.

the rebase failed, let me fix

…h proxy loader support

- Add support for L0 teardown notification in the static L0 loader.
- Add loading of the L0 dynamic loader in the windows proxy loader
  to ensure that the L0 dynamic loader is usable during teardown.

Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>
@nrspruit nrspruit force-pushed the L0_teardown_notification branch from 7229f1d to 0987d29 Compare May 16, 2025 22:43
@nrspruit nrspruit temporarily deployed to WindowsCILock May 16, 2025 22:44 — with GitHub Actions Inactive
@nrspruit

Copy link
Copy Markdown
Contributor Author

I assume this PR has some sort of merge problem? Seems like a lot of unrelated things changed.

the rebase failed, let me fix

fixed, for some reason the pulldown was included with I rebased.

loadAdapter(UR_LIBRARY_NAME(adapter_native_cpu));
// Load the Level Zero loader dynamic library to ensure it is loaded during
// the runtime. This is necessary to avoid the level zero loader from being
// unloaded prematurely. the Only trusted loader is the one that is loaded

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// unloaded prematurely. the Only trusted loader is the one that is loaded
// unloaded prematurely. The only trusted loader is the one that is loaded

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess cuda reviewers were pulled in because of the rebase problem, but still, LGTM :)

@maarquitos14 maarquitos14 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@nrspruit

Copy link
Copy Markdown
Contributor Author

@intel/llvm-gatekeepers , please merge when available, this enables usage of an improved short term and long term solution to the teardown issues in SYCL when using L0.

@sarnex sarnex merged commit 7540f3d into intel:sycl May 19, 2025
1 check was pending
KornevNikita pushed a commit that referenced this pull request May 27, 2025
…h proxy loader support (#18496)

- Add support for L0 teardown notification in the static L0 loader.
- Add loading of the L0 dynamic loader in the windows proxy loader to
ensure that the L0 dynamic loader is usable during teardown.

Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants