Commit f88d0f8

Fix async_io ops building error on Huawei Ascend NPU (#7894)
### Summary

Fixes the async_io op build error on Huawei Ascend NPU.

### Environment

| Item                       | Version            |
| -------------------------- | ------------------ |
| kernel version             | 5.15.0-101-generic |
| torch version              | 2.8.0+cpu          |
| deepspeed info             | 0.18.7             |
| deepspeed wheel compiled w | torch 2.8          |
| torch_npu version          | 2.8.0              |
| ascend_cann version        | 8.1.RC1            |

The DeepSpeed config sets `zero_optimization.offload_optimizer.device` to `"nvme"` (`"cpu"` works).

### Error Messages

When offloading from NPU to NVMe, the following error occurs:

```text
ImportError: /.../async_io.so: undefined symbol: _ZN21deepspeed_io_handle_t18_create_io_op_descEbRKN2at6TensorEiPKcbl
```

`nm` reports the symbol as undefined (`U`) in the built library, even though it is defined in `csrc/aio/py_lib/deepspeed_py_io_handle.cpp`:

```sh
nm async_io.so | rg _ZN21deepspeed_io_handle_t18_create_io_op_descEbRKN2at6TensorEiPKcbl
#                  U _ZN21deepspeed_io_handle_t18_create_io_op_descEbRKN2at6TensorEiPKcbl
```

### Solution

1. `op_builder/npu/async_io.py`:

   ```python
   class AsyncIOBuilder(NPUOpBuilder):

       def sources(self):
           return [
               'csrc/aio/py_lib/deepspeed_py_copy.cpp',
               'csrc/aio/py_lib/py_ds_aio.cpp',
               'csrc/aio/py_lib/deepspeed_py_aio.cpp',
               'csrc/aio/py_lib/deepspeed_py_aio_handle.cpp',
               'csrc/aio/py_lib/deepspeed_aio_thread.cpp',
               'csrc/aio/common/deepspeed_aio_utils.cpp',
               'csrc/aio/common/deepspeed_aio_common.cpp',
               'csrc/aio/common/deepspeed_aio_types.cpp',
               'csrc/aio/py_lib/deepspeed_pin_tensor.cpp',
               # Adds 3 source files:
               'csrc/aio/py_lib/deepspeed_py_io_handle.cpp',
               'csrc/aio/py_lib/deepspeed_aio_op_desc.cpp',
               'csrc/aio/py_lib/deepspeed_cpu_op.cpp'
           ]
   ```

2. `csrc/aio/py_lib/deepspeed_cpu_op.cpp`:

   ```cpp
   #if defined(__ENABLE_CANN__)
       // `DS_BUILD_OPS=1 install.sh` complains that ‘torch_npu’ has not
       // been declared, so inline `torch_npu::utils::is_npu`.
       if (_buffer.is_privateuseone()) {
           auto device = at::Device("npu:0");
           _buffer.copy_(_cpu_buffer.to(device));
       }
   #endif
   ```

Signed-off-by: Huang Yifan <yifan0610@foxmail.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
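
The `nm` check from the error report can also be scripted. The following is only a sketch, assuming `nm`'s default output layout in which undefined entries appear as `U <symbol>` with no address column. The helper name `undefined_symbols` and the sample text (including `example_defined_symbol`) are made up for illustration and are not part of DeepSpeed.

```python
def undefined_symbols(nm_output: str, needle: str = "") -> list[str]:
    """Return symbols that nm marks as undefined ('U') and that contain `needle`."""
    found = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries carry no address, so they split into exactly
        # two fields: the type letter 'U' and the symbol name.
        if len(parts) == 2 and parts[0] == "U" and needle in parts[1]:
            found.append(parts[1])
    return found


# Hypothetical sample shaped like `nm async_io.so` output.
SAMPLE = """\
                 U _ZN21deepspeed_io_handle_t18_create_io_op_descEbRKN2at6TensorEiPKcbl
0000000000012340 T example_defined_symbol
                 U malloc
"""

# Print only the undefined symbols related to _create_io_op_desc.
print(undefined_symbols(SAMPLE, "_create_io_op_desc"))
```

To run it against a real build, the sample string could be replaced with the output of `subprocess.run(["nm", "async_io.so"], capture_output=True, text=True, check=True).stdout`.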
1 parent b6346bf commit f88d0f8

2 files changed

Lines changed: 5 additions & 2 deletions


csrc/aio/py_lib/deepspeed_cpu_op.cpp

Lines changed: 3 additions & 1 deletion
@@ -45,7 +45,9 @@ void cpu_op_desc_t::finish()
     if (_buffer.is_xpu()) { _buffer.copy_(_cpu_buffer.to(torch::kXPU)); }
     if (_buffer.is_cpu()) { _buffer.copy_(_cpu_buffer); }
 #if defined(__ENABLE_CANN__)
-    if (torch_npu::utils::is_npu(_buffer)) {
+    // `DS_BUILD_OPS=1 install.sh` complains that ‘torch_npu’ has not
+    // been declared, so inline `torch_npu::utils::is_npu`.
+    if (_buffer.is_privateuseone()) {
         auto device = at::Device("npu:0");
         _buffer.copy_(_cpu_buffer.to(device));
     }

op_builder/npu/async_io.py

Lines changed: 2 additions & 1 deletion
@@ -25,7 +25,8 @@ def sources(self):
             'csrc/aio/py_lib/deepspeed_py_aio.cpp', 'csrc/aio/py_lib/deepspeed_py_aio_handle.cpp',
             'csrc/aio/py_lib/deepspeed_aio_thread.cpp', 'csrc/aio/common/deepspeed_aio_utils.cpp',
             'csrc/aio/common/deepspeed_aio_common.cpp', 'csrc/aio/common/deepspeed_aio_types.cpp',
-            'csrc/aio/py_lib/deepspeed_pin_tensor.cpp'
+            'csrc/aio/py_lib/deepspeed_pin_tensor.cpp', 'csrc/aio/py_lib/deepspeed_py_io_handle.cpp',
+            'csrc/aio/py_lib/deepspeed_aio_op_desc.cpp', 'csrc/aio/py_lib/deepspeed_cpu_op.cpp'
         ]
 
     def include_paths(self):
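
One way to catch this class of link error before building is to compare a builder's source list against the files known to define the required symbols. A rough sketch, using plain lists in place of the real `AsyncIOBuilder.sources()` return value; `missing_sources` and `REQUIRED` are illustrative names, not DeepSpeed APIs:

```python
# The three files this commit adds to the NPU builder's source list.
REQUIRED = [
    "csrc/aio/py_lib/deepspeed_py_io_handle.cpp",
    "csrc/aio/py_lib/deepspeed_aio_op_desc.cpp",
    "csrc/aio/py_lib/deepspeed_cpu_op.cpp",
]


def missing_sources(sources, required=REQUIRED):
    """Return the required paths that are absent from `sources`, in order."""
    present = set(sources)
    return [path for path in required if path not in present]


# Tail of the source list before the change, and the same list after it.
before = [
    "csrc/aio/common/deepspeed_aio_types.cpp",
    "csrc/aio/py_lib/deepspeed_pin_tensor.cpp",
]
after = before + REQUIRED

print(missing_sources(before))  # lists the three files that were not compiled
print(missing_sources(after))   # empty once the files are added
```

A check like this could run in a unit test, so a missing translation unit shows up as a test failure rather than as an `ImportError` at load time.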
