UCT/ZE/ZE_IPC: enable level zero ipc support for Intel GPUs #11218
zhangxiaoli73 wants to merge 14 commits into
@zhangxiaoli73 Please fix the commit titles and rebase to resolve the conflicts. Consider consolidating the two commits at the same time, since a force push is inevitable. @yosefe I am assuming a force push is allowed in this case. Is that right? If not, what is the recommended way to fix the commit titles?
Got it. Let me know if I can force-push to change the commit.
Force-pushed from 118b1f6 to 8b1525f
# Conflicts:
#   src/uct/ze/base/ze_base.c
#   src/uct/ze/base/ze_base.h
- Remove iface_mem_element_pack from uct_iface_internal_ops (field
  removed in upstream)
- Add uct_ze_base_get_device(int ordinal) helper that returns the root
  device handle by ordinal, replacing the legacy direct lookup via
  uct_ze_base_info.devices[i], which was renamed to
  uct_ze_base.devices[i].root_device

Signed-off-by: yuanwu <yuan.wu@intel.com>
- test_ze_base: validates uct_ze_base_get_device / get_num_devices /
  get_device_ordinal helpers added during the upstream-master merge.
- test_ze_ipc_md: smoke-tests component registration, MD resource query,
  MD open/close (verifies ze_device + ze_context populated per
  sub-device after the merge), and md_attr.reg_mem_types.
- Wire HAVE_ZE block in test/gtest/Makefile.am: appends sources,
  ZE_CPPFLAGS, ZE_LDFLAGS, ZE_LIBS and libuct_ze.la.

Signed-off-by: yuanwu <yuan.wu@intel.com>
Add end-to-end mem_reg/mkey_pack/mem_dereg test for ze_ipc_md (NIXL KV
path), ze_ipc_cache lifecycle tests, and full ze_copy_md coverage
(open/close, alloc/free, detect_memory_type, mem_query). 21 tests total
under HAVE_ZE; all pass on Intel PVC.

Signed-off-by: yuanwu <yuan.wu@intel.com>
Ze ipc updated
Signed-off-by: yuanwu <yuan.wu@intel.com>
AUTHORS: add yuanwu
@yosefe Please help to review.
Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
@zhangxiaoli73 @yafshar Can we add a build with ZE to builds.sh so it runs in CI with at least a basic compilation check?
Yes, we can add a ZE compile check to builds.sh and run it in CI. For this to be a meaningful ZE check, the selected Linux job must have Level Zero headers and the ze_loader development package available (or an equivalent module/path setup). We should also force configuration with --with-ze so the job fails when ZE dependencies are missing. If we rely on the default auto-detect mode, ZE may be silently skipped and the build could pass without actually validating ZE compilation.
In parallel, I will investigate a production-grade ZE mock/simulation option and follow up with a separate PR, as that work is broader in scope.
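To make the silent-skip concern concrete, here is a minimal sketch of a post-configure check. The helper name `check_have_ze` and its argument are hypothetical and not part of this PR; it assumes the generated config.h contains `#define HAVE_ZE 1` when ZE is enabled and autoconf's usual `/* #undef HAVE_ZE */` otherwise.

```shell
#!/bin/sh
# Hypothetical helper (not from this PR): succeed only if the generated
# config.h defines HAVE_ZE to 1, so a silently skipped ZE build fails the job.
check_have_ze() {
    config_h="$1"

    # A missing config.h means configure never ran or ran elsewhere.
    if [ ! -f "$config_h" ]; then
        echo "error: $config_h not found" >&2
        return 2
    fi

    # Match only an active definition; a commented-out #undef line does not count.
    if grep -Eq '^[[:space:]]*#[[:space:]]*define[[:space:]]+HAVE_ZE[[:space:]]+1[[:space:]]*$' "$config_h"; then
        echo "ZE enabled"
        return 0
    fi

    echo "error: HAVE_ZE is not set in $config_h" >&2
    return 1
}
```

A CI step could call `check_have_ze config.h` right after running configure with --with-ze, so that the job fails loudly instead of passing without compiling any ZE code.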
Hi @yafshar,
Add a dedicated 'ze' build_mode that runs configure-devel --with-ze and
verifies HAVE_ZE=1 in config.h. Mirrors the compile-only pattern used by
build_cuda / build_rocm: PR-CI only checks that ZE code keeps compiling
and linking; device gtests run on hardware lanes.

Changes:
- buildlib/tools/builds.sh:
  * new build_ze(): strict path when require_ze=yes (used by the
    dedicated lane), otherwise auto-skip when level_zero/ze_api.h
    is missing so short/long flows on non-ZE containers stay green
  * register 'build_ze' in base_tests and add a 'ze' build_mode
  * thread require_ze through the Azure env-var unset guard
- buildlib/pr/main.yml: new container alias ubuntu2404_ze reusing the
  existing doca-2.9.0 ubuntu24.04 image
- buildlib/pr/build_job.yml:
  * new x86_64 matrix row ubuntu2404_ze (build_mode=ze,
    require_ze=yes, install_ze_deps=yes)
  * pre-build step that apt-installs libze-dev (falls back to
    level-zero-dev) only when install_ze_deps=yes
  * pass require_ze into the builds.sh env block

Locally validated: configure-devel --with-ze on Ubuntu 24.04 (libze-dev
1.27.0) produces UCT/UCM/Perf 'ze' modules, HAVE_ZE=1, and a full
make -j succeeds, linking libuct_ze.so / libucm_ze.so / libucx_perftest_ze
against -lze_loader.
Signed-off-by: yuanwu <yuan.wu@intel.com>
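
The strict versus auto-skip behaviour described in this commit message can be sketched roughly as follows. This is an illustration only, not the actual build_ze() from builds.sh: the function name `ze_build_decision` and the `include_root` parameter are hypothetical, and the real function may locate the header differently.

```shell
#!/bin/sh
# Hypothetical sketch of the decision made before a ZE build is attempted.
#   $1: require_ze   ("yes" for the dedicated lane, anything else otherwise)
#   $2: include_root (directory searched for level_zero/ze_api.h; assumed)
# Prints one of: strict, build, skip.
ze_build_decision() {
    require_ze="$1"
    include_root="$2"

    if [ "$require_ze" = "yes" ]; then
        # Dedicated lane: always configure with --with-ze so that missing
        # dependencies fail the job instead of being skipped.
        echo "strict"
    elif [ -f "$include_root/level_zero/ze_api.h" ]; then
        # Header is present: build ZE even though it is not mandatory.
        echo "build"
    else
        # Non-ZE container: skip so the short/long flows stay green.
        echo "skip"
    fi
}
```

For example, `ze_build_decision "$require_ze" /usr/include` would print `skip` on a container without the Level Zero headers unless require_ze=yes is set.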
UCT/ZE/IPC: ze ipc updated


What?
Enable Level Zero IPC support for Intel GPUs. This PR adds:
Why?
We want to provide IPC transport for users within a single node.