get memory info by mayuyuace · Pull Request #249 · vllm-project/vllm-xpu-kernels

mayuyuace · 2026-04-02T09:04:58Z

Should update l0 as below:

wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null && \
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | tee /etc/apt/sources.list.d/oneAPI.list && \
add-apt-repository -y ppa:kobuk-team/intel-graphics
apt-get remove level-zero
apt install -y libze1 libze-dev

and update NEO after this version:
https://github.com/intel/compute-runtime/releases/tag/26.18.38308.1

Signed-off-by: mayuyuace <qiming1.zhang@intel.com>

Copilot

Pull request overview

Adds a new XPU memory info operator backed by Level Zero and exposes it through Torch bindings, with an accompanying pytest.

Changes:

Implement getMemoryInfo(device_index) using Level Zero device properties.
Register the new op in the Torch extension and wire it into the build.
Add a pytest validating returned values against torch.xpu.mem_get_info.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
tests/test_get_memory_info.py	Adds coverage for the new `getMemoryInfo` op vs PyTorch’s XPU memory API
csrc/utils/mem_info.cpp	Implements Level Zero-based total/usable memory queries
csrc/torch_bindings.cpp	Registers the new Torch op schema + implementation
csrc/ops.h	Declares `getMemoryInfo` for binding
CMakeLists.txt	Adds build/link flags and compiles the new source file

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-02T09:12:22Z

+#include <level_zero/ze_api.h>
+#include <sycl/sycl.hpp>
+
+#include <iostream>


std::tuple and std::numeric_limits are used in this file but the required standard headers are not included. This can fail to compile depending on transitive includes. Add the missing includes (<tuple> and <limits>) explicitly in this file.

Suggested change

#include <iostream>

#include <iostream>

#include <tuple>

#include <limits>

Copilot · 2026-04-02T09:12:23Z

+  zeDeviceGetMemoryProperties(device, &memoryCount, nullptr);
+  auto pMemoryProperties = new ze_device_memory_properties_t[memoryCount];


The Level Zero APIs return a ze_result_t, but the results are currently ignored. If any of these calls fail, the code may return incorrect sizes (e.g., memoryCount staying 0) without surfacing an error to Python. Capture the return values and raise a proper Torch error (e.g., TORCH_CHECK(result == ZE_RESULT_SUCCESS, ...)) so failures are actionable.

Copilot · 2026-04-02T09:12:23Z

+    pMemoryProperties[mem].stype = ZE_STRUCTURE_TYPE_DEVICE_MEMORY_PROPERTIES;
+    pMemoryProperties[mem].pNext = nullptr;
+  }
+  zeDeviceGetMemoryProperties(device, &memoryCount, pMemoryProperties);


The Level Zero APIs return a ze_result_t, but the results are currently ignored. If any of these calls fail, the code may return incorrect sizes (e.g., memoryCount staying 0) without surfacing an error to Python. Capture the return values and raise a proper Torch error (e.g., TORCH_CHECK(result == ZE_RESULT_SUCCESS, ...)) so failures are actionable.

Copilot · 2026-04-02T09:12:23Z

+  deviceProperties.stype = ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES;
+  deviceProperties.pNext = &usableMemProps;
+
+  zeDeviceGetProperties(device, &deviceProperties);


The Level Zero APIs return a ze_result_t, but the results are currently ignored. If any of these calls fail, the code may return incorrect sizes (e.g., memoryCount staying 0) without surfacing an error to Python. Capture the return values and raise a proper Torch error (e.g., TORCH_CHECK(result == ZE_RESULT_SUCCESS, ...)) so failures are actionable.

Copilot · 2026-04-02T09:12:24Z

+
+size_t getTotalMemory(ze_device_handle_t& device) {
+  uint32_t memoryCount = 0;
+  zeDeviceGetMemoryProperties(device, &memoryCount, nullptr);
+  auto pMemoryProperties = new ze_device_memory_properties_t[memoryCount];
+  for (uint32_t mem = 0; mem < memoryCount; ++mem) {
+    pMemoryProperties[mem].stype = ZE_STRUCTURE_TYPE_DEVICE_MEMORY_PROPERTIES;
+    pMemoryProperties[mem].pNext = nullptr;
+  }
+  zeDeviceGetMemoryProperties(device, &memoryCount, pMemoryProperties);
+  size_t totalMemory = 0;
+  for (uint32_t mem = 0; mem < memoryCount; ++mem) {
+    totalMemory += pMemoryProperties[mem].totalSize;
+  }
+  delete[] pMemoryProperties;


Manual new[]/delete[] makes this code harder to maintain and less exception-safe. Prefer std::vector<ze_device_memory_properties_t> sized to memoryCount and pass .data() to Level Zero. This also simplifies cleanup and reduces the chance of leaks on early returns/errors.

Suggested change

size_t getTotalMemory(ze_device_handle_t& device) {

uint32_t memoryCount = 0;

zeDeviceGetMemoryProperties(device, &memoryCount, nullptr);

auto pMemoryProperties = new ze_device_memory_properties_t[memoryCount];

for (uint32_t mem = 0; mem < memoryCount; ++mem) {

pMemoryProperties[mem].stype = ZE_STRUCTURE_TYPE_DEVICE_MEMORY_PROPERTIES;

pMemoryProperties[mem].pNext = nullptr;

}

zeDeviceGetMemoryProperties(device, &memoryCount, pMemoryProperties);

size_t totalMemory = 0;

for (uint32_t mem = 0; mem < memoryCount; ++mem) {

totalMemory += pMemoryProperties[mem].totalSize;

}

delete[] pMemoryProperties;

#include <vector>

size_t getTotalMemory(ze_device_handle_t& device) {

uint32_t memoryCount = 0;

zeDeviceGetMemoryProperties(device, &memoryCount, nullptr);

std::vector<ze_device_memory_properties_t> memoryProperties(memoryCount);

for (uint32_t mem = 0; mem < memoryCount; ++mem) {

memoryProperties[mem].stype = ZE_STRUCTURE_TYPE_DEVICE_MEMORY_PROPERTIES;

memoryProperties[mem].pNext = nullptr;

}

zeDeviceGetMemoryProperties(device, &memoryCount, memoryProperties.data());

size_t totalMemory = 0;

for (uint32_t mem = 0; mem < memoryCount; ++mem) {

totalMemory += memoryProperties[mem].totalSize;

}

Copilot · 2026-04-02T09:12:24Z

+
+size_t getTotalMemory(ze_device_handle_t& device) {
+  uint32_t memoryCount = 0;
+  zeDeviceGetMemoryProperties(device, &memoryCount, nullptr);
+  auto pMemoryProperties = new ze_device_memory_properties_t[memoryCount];
+  for (uint32_t mem = 0; mem < memoryCount; ++mem) {
+    pMemoryProperties[mem].stype = ZE_STRUCTURE_TYPE_DEVICE_MEMORY_PROPERTIES;
+    pMemoryProperties[mem].pNext = nullptr;
+  }
+  zeDeviceGetMemoryProperties(device, &memoryCount, pMemoryProperties);
+  size_t totalMemory = 0;
+  for (uint32_t mem = 0; mem < memoryCount; ++mem) {
+    totalMemory += pMemoryProperties[mem].totalSize;
+  }
+  delete[] pMemoryProperties;


Manual new[]/delete[] makes this code harder to maintain and less exception-safe. Prefer std::vector<ze_device_memory_properties_t> sized to memoryCount and pass .data() to Level Zero. This also simplifies cleanup and reduces the chance of leaks on early returns/errors.

Suggested change

size_t getTotalMemory(ze_device_handle_t& device) {

uint32_t memoryCount = 0;

zeDeviceGetMemoryProperties(device, &memoryCount, nullptr);

auto pMemoryProperties = new ze_device_memory_properties_t[memoryCount];

for (uint32_t mem = 0; mem < memoryCount; ++mem) {

pMemoryProperties[mem].stype = ZE_STRUCTURE_TYPE_DEVICE_MEMORY_PROPERTIES;

pMemoryProperties[mem].pNext = nullptr;

}

zeDeviceGetMemoryProperties(device, &memoryCount, pMemoryProperties);

size_t totalMemory = 0;

for (uint32_t mem = 0; mem < memoryCount; ++mem) {

totalMemory += pMemoryProperties[mem].totalSize;

}

delete[] pMemoryProperties;

#include <vector>

size_t getTotalMemory(ze_device_handle_t& device) {

uint32_t memoryCount = 0;

zeDeviceGetMemoryProperties(device, &memoryCount, nullptr);

std::vector<ze_device_memory_properties_t> memoryProperties(memoryCount);

for (uint32_t mem = 0; mem < memoryCount; ++mem) {

memoryProperties[mem].stype = ZE_STRUCTURE_TYPE_DEVICE_MEMORY_PROPERTIES;

memoryProperties[mem].pNext = nullptr;

}

zeDeviceGetMemoryProperties(device, &memoryCount, memoryProperties.data());

size_t totalMemory = 0;

for (uint32_t mem = 0; mem < memoryCount; ++mem) {

totalMemory += memoryProperties[mem].totalSize;

}

Copilot · 2026-04-02T09:12:24Z

+  if (total > static_cast<size_t>(std::numeric_limits<int64_t>::max()) ||
+      free > static_cast<size_t>(std::numeric_limits<int64_t>::max())) {
+    std::cerr << "Memory size exceeds int64_t max value!" << std::endl;
+    return {-1, -1};  // or handle this case as appropriate
+  }


Returning (-1, -1) on overflow and printing to stderr is not a great contract for a Torch op (callers can’t reliably distinguish errors vs real values, and stderr may be lost). Prefer raising a Torch exception (e.g., TORCH_CHECK(false, ...)) so Python callers get a clear failure with a message.

Copilot · 2026-04-02T09:12:24Z

  list(APPEND SYCL_FLAGS "-fsycl")
+  # For l0 ops like getMemoryInfo
+  list(APPEND SYCL_FLAGS "-lze_loader")
+  list(APPEND SYCL_FLAGS "-Wno-unused-command-line-argument")


Linker flags (-lze_loader) should not be added to compile flags (SYCL_FLAGS). The follow-up -Wno-unused-command-line-argument looks like it’s masking the resulting warning rather than fixing the root cause. Move -lze_loader to link settings only (or preferably use target_link_libraries(... ze_loader) / target_link_options) and drop the unused-command-line suppression if it becomes unnecessary.

Copilot · 2026-04-02T09:12:25Z

+
+    if not torch.ops._xpu_C.is_pvc(device):
+        assert total == ref_total
+    assert free == 0


The test asserts free == 0, but getMemoryInfo() returns the device’s usable/free memory, which should generally be non-zero on an idle device. This makes the test fail (or pass only by accident if the API returns 0 due to unsupported properties). Instead, assert free against ref_free (and total against ref_total, including the PVC case if behavior is expected to match), or assert a sensible relation like 0 <= free <= total if exact match isn’t guaranteed.

Suggested change

assert free == 0

assert free == ref_free

else:

assert 0 <= free <= total

jikunshang · 2026-04-02T10:17:22Z

+
+@pytest.mark.parametrize("device", DEVICES)
+def test_get_memory_info(device) -> None:
+    free, total = torch.ops._C_cache_ops.getMemoryInfo(device)


why place it in _C_cache_ops? I prefer _xpu_C

Three files named mem_XXX are put at csrc/utils and placed in _C_cache_ops or _C.
What should it be?

I feel such mem api is for memory copy which will be used in cache context. @chaojun-zhang any comments?

yma11 · 2026-04-17T01:04:19Z

+
+    if not torch.ops._xpu_C.is_pvc(device):
+        assert total == ref_total
+    assert free == 0


free==0 because current UMD still return 0 and it will return correct usable memory after UMD upgrade?

Yes, and this PR should not be merge until UMD is ready.

Signed-off-by: mayuyuace <qiming1.zhang@intel.com>

mayuyuace added 2 commits April 2, 2026 09:01

getMemoryInfo

2f02cdf

Signed-off-by: mayuyuace <qiming1.zhang@intel.com>

file mode

534d8da

Signed-off-by: mayuyuace <qiming1.zhang@intel.com>

Copilot AI review requested due to automatic review settings April 2, 2026 09:04

Copilot started reviewing on behalf of mayuyuace April 2, 2026 09:11 View session

Copilot AI reviewed Apr 2, 2026

View reviewed changes

jikunshang reviewed Apr 2, 2026

View reviewed changes

yma11 reviewed Apr 17, 2026

View reviewed changes

mayuyuace marked this pull request as draft April 17, 2026 01:07

jikunshang and others added 3 commits May 13, 2026 11:55

Merge branch 'main' into qiming/getMemoryInfo

20f99fc

update UT

a5716d7

Signed-off-by: mayuyuace <qiming1.zhang@intel.com>

pre-commit mode

fea1026

Signed-off-by: mayuyuace <qiming1.zhang@intel.com>

		zeDeviceGetMemoryProperties(device, &memoryCount, nullptr);
		auto pMemoryProperties = new ze_device_memory_properties_t[memoryCount];

-    assert free == 0
+        assert free == ref_free
+    else:
+        assert 0 <= free <= total

Conversation

mayuyuace commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

jikunshang Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

mayuyuace Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

jikunshang Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

yma11 Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

mayuyuace Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mayuyuace commented Apr 2, 2026 •

edited

Loading