Skip to content

Commit ba5ffab

Browse files
authored
Address review feedback on device tensor helpers (pytorch#20078) (pytorch#20078)
Summary: Follow-up to D99913077 applying review feedback on the TensorPtr device tensor helpers: aliasing make_tensor_ptr now preserves device metadata, clone_tensor_ptr requires a CPU source, device alloc/copy failures report their error codes, and the device test is pinned to its abort messages and built in non-aten Buck/CMake/OSS configs. device_allocator moves to exported_deps so the exported header compiles for aten consumers. Mirrored in fbcode and xplat. Also replaces the two device-transfer helpers `clone_tensor_ptr_to_device` and `clone_tensor_ptr_to_cpu` with a single `clone_tensor_ptr_to(tensor, target)` keyed on the target device. The direction (host-to-device or device-to-host) is inferred from the source and target, which removes the asymmetry where one helper named the device and the other inferred it, and removes the footgun where `clone_tensor_ptr_to_device(t, CPU)` aborted. CPU-to-CPU and device-to-device are rejected with clear messages; `clone_tensor_ptr` remains the same-device copy and the `make_tensor_ptr` device tag is unchanged. This mirrors ATen's single `to(device)` and keeps the public surface minimal. The `extension-tensor.md` guide and its ATen equivalence table are updated to match. This also fixes a pre-existing portable-build break: the aliasing `make_tensor_ptr(const Tensor&)` overload passed `device_type()` and `device_index()` as two separate arguments to a primary factory that takes a single `Device`, so the non-`USE_ATEN_LIB` build did not compile; it now wraps them in a `Device`. Reviewed By: Gasoonjia Differential Revision: D106842466
1 parent c4e3db0 commit ba5ffab

7 files changed

Lines changed: 164 additions & 150 deletions

File tree

docs/source/extension-tensor.md

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -199,6 +199,22 @@ auto tensor = clone_tensor_ptr(original_tensor);
199199

200200
Note that, regardless of whether the original `TensorPtr` owns the data or not, the newly created `TensorPtr` will own a copy of the data.
201201

202+
#### Cloning To or From a Device
203+
204+
If a tensor lives on CPU and you want a copy on an accelerator, or the other way around, use `clone_tensor_ptr_to` with the device you want. It allocates memory on the target device, copies the data for you, and the returned `TensorPtr` owns that memory.
205+
206+
```cpp
207+
auto cpu_tensor = make_tensor_ptr({2, 3}, {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f});
208+
209+
// CPU to device:
210+
auto device_tensor = clone_tensor_ptr_to(cpu_tensor, DeviceType::CUDA);
211+
212+
// Device back to CPU:
213+
auto host_tensor = clone_tensor_ptr_to(device_tensor, DeviceType::CPU);
214+
```
215+
216+
The direction is chosen from the source and target device. This needs a `DeviceAllocator` registered for the device, so it is available only in the portable (non-`USE_ATEN_LIB`) build. For a plain CPU-to-CPU copy, use `clone_tensor_ptr` instead.
217+
202218
### Resizing Tensors
203219
204220
The `TensorShapeDynamism` enum specifies the mutability of a tensor's shape:
@@ -375,6 +391,7 @@ Here's a table matching `TensorPtr` creation functions with their corresponding
375391
| `at::tensor(data, type)` | `make_tensor_ptr(data, type)` |
376392
| `at::tensor(data, type).reshape(sizes)` | `make_tensor_ptr(sizes, data, type)` |
377393
| `tensor.clone()` | `clone_tensor_ptr(tensor)` |
394+
| `tensor.to(device)` | `clone_tensor_ptr_to(tensor, device)` |
378395
| `tensor.resize_(new_sizes)` | `resize_tensor_ptr(tensor, new_sizes)` |
379396
| `at::scalar_tensor(value)` | `scalar_tensor(value)` |
380397
| `at::from_blob(data, sizes, type)` | `from_blob(data, sizes, type)` |

extension/tensor/targets.bzl

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,11 +24,11 @@ def define_common_targets():
2424
],
2525
visibility = ["PUBLIC"],
2626
deps = [
27-
"//executorch/runtime/core:device_allocator",
2827
"//executorch/runtime/core/exec_aten/util:dim_order_util" + aten_suffix,
2928
"//executorch/runtime/core/exec_aten/util:tensor_util" + aten_suffix,
3029
],
3130
exported_deps = [
31+
"//executorch/runtime/core:device_allocator",
3232
"//executorch/runtime/core/exec_aten:lib" + aten_suffix,
3333
"//executorch/runtime/core/exec_aten/util:scalar_type_util" + aten_suffix,
3434
],

extension/tensor/tensor_ptr.cpp

Lines changed: 65 additions & 70 deletions
Original file line numberDiff line numberDiff line change
@@ -198,6 +198,15 @@ TensorPtr make_tensor_ptr(
198198
TensorPtr clone_tensor_ptr(
199199
const executorch::aten::Tensor& tensor,
200200
executorch::aten::ScalarType type) {
201+
#ifndef USE_ATEN_LIB
202+
ET_CHECK_MSG(
203+
tensor.device_type() == runtime::etensor::DeviceType::CPU,
204+
"clone_tensor_ptr only supports CPU tensors; use clone_tensor_ptr_to with a CPU target first.");
205+
#else // USE_ATEN_LIB
206+
ET_CHECK_MSG(
207+
tensor.is_cpu(),
208+
"clone_tensor_ptr only supports CPU tensors; move it to CPU first (e.g. tensor.to(torch::kCPU)).");
209+
#endif // USE_ATEN_LIB
201210
std::vector<executorch::aten::SizesType> sizes(
202211
tensor.sizes().begin(), tensor.sizes().end());
203212
std::vector<executorch::aten::DimOrderType> dim_order{
@@ -252,11 +261,11 @@ TensorPtr clone_tensor_ptr(
252261
} ctx;
253262

254263
ET_SWITCH_REALHBBF16_AND_UINT_TYPES(
255-
tensor_type, ctx, "clone_tensor_ptr_from", CTYPE_FROM, [&] {
264+
tensor_type, ctx, "clone_tensor_ptr_cast_from", CTYPE_FROM, [&] {
256265
const CTYPE_FROM* tensor_data_ptr =
257266
static_cast<const CTYPE_FROM*>(tensor_data);
258267
ET_SWITCH_REALHBBF16_AND_UINT_TYPES(
259-
type, ctx, "clone_tensor_ptr_to", CTYPE_TO, [&] {
268+
type, ctx, "clone_tensor_ptr_cast_to", CTYPE_TO, [&] {
260269
CTYPE_TO* data_ptr = reinterpret_cast<CTYPE_TO*>(data.data());
261270
std::transform(
262271
tensor_data_ptr,
@@ -285,98 +294,84 @@ runtime::Error resize_tensor_ptr(
285294
sizes.data(), sizes.size()));
286295
}
287296

288-
// ---- Device tensor helpers ----
297+
// ---- Device tensor helper ----
289298
//
290-
// These helpers rely on the ExecuTorch DeviceAllocator and the portable tensor
299+
// This helper relies on the ExecuTorch DeviceAllocator and the portable tensor
291300
// metadata APIs (dim_order, shape_dynamism, device), which have no equivalent
292-
// in USE_ATEN_LIB builds, so they are compiled out there.
301+
// in USE_ATEN_LIB builds, so it is compiled out there.
293302

294303
#ifndef USE_ATEN_LIB
295304

296-
TensorPtr clone_tensor_ptr_to_device(
297-
const TensorPtr& cpu_tensor,
298-
executorch::aten::Device device) {
305+
TensorPtr clone_tensor_ptr_to(
306+
const TensorPtr& tensor,
307+
executorch::aten::Device target) {
308+
const auto source = tensor->device();
299309
ET_CHECK_MSG(
300-
cpu_tensor->device().is_cpu(),
301-
"Source tensor must reside on CPU; got device type %d.",
302-
static_cast<int>(cpu_tensor->device_type()));
303-
310+
!(source.is_cpu() && target.is_cpu()),
311+
"clone_tensor_ptr_to does not copy CPU-to-CPU; use clone_tensor_ptr.");
304312
ET_CHECK_MSG(
305-
!device.is_cpu(),
306-
"Target device must not be CPU; use clone_tensor_ptr for CPU-to-CPU copies.");
313+
source.is_cpu() || target.is_cpu(),
314+
"Device-to-device copy is not supported; route through CPU.");
307315

316+
const auto nbytes = tensor->nbytes();
317+
const auto* src_data = tensor->const_data_ptr();
318+
ET_CHECK_MSG(src_data != nullptr, "Source tensor has no data.");
319+
320+
// Whichever end is not CPU provides the allocator.
321+
const auto device = target.is_cpu() ? source : target;
308322
auto* allocator = runtime::get_device_allocator(device.type());
309323
ET_CHECK_MSG(
310324
allocator != nullptr,
311325
"No device allocator registered for device type %d",
312326
static_cast<int>(device.type()));
313327

314-
const auto nbytes = cpu_tensor->nbytes();
315-
const auto* cpu_data = cpu_tensor->const_data_ptr();
316-
ET_CHECK_MSG(cpu_data != nullptr, "Source tensor has no data.");
317-
318-
auto result = allocator->allocate(nbytes, device.index());
319-
ET_CHECK_MSG(result.ok(), "Failed to allocate device memory.");
320-
void* device_data = result.get();
321-
322-
auto err = allocator->copy_host_to_device(
323-
device_data, cpu_data, nbytes, device.index());
324-
ET_CHECK_MSG(err == runtime::Error::Ok, "Host-to-device copy failed.");
325-
326328
std::vector<executorch::aten::SizesType> sizes(
327-
cpu_tensor->sizes().begin(), cpu_tensor->sizes().end());
329+
tensor->sizes().begin(), tensor->sizes().end());
328330
std::vector<executorch::aten::DimOrderType> dim_order(
329-
cpu_tensor->dim_order().begin(), cpu_tensor->dim_order().end());
331+
tensor->dim_order().begin(), tensor->dim_order().end());
330332
std::vector<executorch::aten::StridesType> strides(
331-
cpu_tensor->strides().begin(), cpu_tensor->strides().end());
333+
tensor->strides().begin(), tensor->strides().end());
334+
335+
if (target.is_cpu()) {
336+
std::vector<uint8_t> cpu_data(nbytes);
337+
auto err = allocator->copy_device_to_host(
338+
cpu_data.data(), src_data, nbytes, source.index());
339+
ET_CHECK_MSG(
340+
err == runtime::Error::Ok,
341+
"Device-to-host copy failed: error %d",
342+
static_cast<int>(err));
343+
return make_tensor_ptr(
344+
std::move(sizes),
345+
std::move(cpu_data),
346+
std::move(dim_order),
347+
std::move(strides),
348+
tensor->scalar_type(),
349+
tensor->shape_dynamism());
350+
}
332351

352+
auto result = allocator->allocate(nbytes, target.index());
353+
ET_CHECK_MSG(
354+
result.ok(),
355+
"Failed to allocate device memory: error %d",
356+
static_cast<int>(result.error()));
357+
void* device_data = result.get();
358+
auto err = allocator->copy_host_to_device(
359+
device_data, src_data, nbytes, target.index());
360+
ET_CHECK_MSG(
361+
err == runtime::Error::Ok,
362+
"Host-to-device copy failed: error %d",
363+
static_cast<int>(err));
333364
return make_tensor_ptr(
334365
std::move(sizes),
335366
device_data,
336367
std::move(dim_order),
337368
std::move(strides),
338-
cpu_tensor->scalar_type(),
339-
cpu_tensor->shape_dynamism(),
340-
[allocator, device](void* ptr) {
341-
allocator->deallocate(ptr, device.index());
369+
tensor->scalar_type(),
370+
tensor->shape_dynamism(),
371+
[allocator, target](void* ptr) {
372+
allocator->deallocate(ptr, target.index());
342373
},
343-
device);
344-
}
345-
346-
TensorPtr clone_tensor_ptr_to_cpu(const TensorPtr& device_tensor) {
347-
const auto nbytes = device_tensor->nbytes();
348-
const auto* device_data = device_tensor->const_data_ptr();
349-
ET_CHECK_MSG(device_data != nullptr, "Source device tensor has no data.");
350-
351-
const auto device = device_tensor->device();
352-
ET_CHECK_MSG(!device.is_cpu(), "Source tensor is already on CPU.");
353-
354-
auto* allocator = runtime::get_device_allocator(device.type());
355-
ET_CHECK_MSG(
356-
allocator != nullptr,
357-
"No device allocator registered for device type %d",
358-
static_cast<int>(device.type()));
359-
360-
std::vector<uint8_t> cpu_data(nbytes);
361-
362-
auto err = allocator->copy_device_to_host(
363-
cpu_data.data(), device_data, nbytes, device.index());
364-
ET_CHECK_MSG(err == runtime::Error::Ok, "Device-to-host copy failed.");
365-
366-
std::vector<executorch::aten::SizesType> sizes(
367-
device_tensor->sizes().begin(), device_tensor->sizes().end());
368-
std::vector<executorch::aten::DimOrderType> dim_order(
369-
device_tensor->dim_order().begin(), device_tensor->dim_order().end());
370-
std::vector<executorch::aten::StridesType> strides(
371-
device_tensor->strides().begin(), device_tensor->strides().end());
372-
373-
return make_tensor_ptr(
374-
std::move(sizes),
375-
std::move(cpu_data),
376-
std::move(dim_order),
377-
std::move(strides),
378-
device_tensor->scalar_type(),
379-
device_tensor->shape_dynamism());
374+
target);
380375
}
381376

382377
#endif // USE_ATEN_LIB

extension/tensor/tensor_ptr.h

Lines changed: 29 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -36,7 +36,7 @@ using TensorPtr = std::shared_ptr<executorch::aten::Tensor>;
3636
* allocated or copied. The caller is responsible for ensuring `data` already
3737
* lives on the requested device; construct the `executorch::aten::Device` from
3838
* the runtime environment and pass it in. To copy CPU data to a device, use
39-
* `clone_tensor_ptr_to_device` instead.
39+
* `clone_tensor_ptr_to` instead.
4040
*
4141
* @param sizes A vector specifying the size of each dimension.
4242
* @param data A pointer to the data buffer (CPU or device, see device).
@@ -110,7 +110,7 @@ inline TensorPtr make_tensor_ptr(
110110
* vectors of one type and a different scalar type.
111111
*
112112
* The result is always a CPU tensor. To move it to a device, use
113-
* `clone_tensor_ptr_to_device`.
113+
* `clone_tensor_ptr_to`.
114114
*
115115
* @tparam T The C++ type of the tensor elements, deduced from the vector.
116116
* @param sizes A vector specifying the size of each dimension.
@@ -204,7 +204,7 @@ inline TensorPtr make_tensor_ptr(
204204
* vector's data type.
205205
*
206206
* The result is always a CPU tensor. To move it to a device, use
207-
* `clone_tensor_ptr_to_device`.
207+
* `clone_tensor_ptr_to`.
208208
*
209209
* @tparam T The C++ type of the tensor elements, deduced from the vector.
210210
* @param data A vector containing the tensor's data.
@@ -236,7 +236,7 @@ inline TensorPtr make_tensor_ptr(
236236
* from the initializer list's data type.
237237
*
238238
* The result is always a CPU tensor. To move it to a device, use
239-
* `clone_tensor_ptr_to_device`.
239+
* `clone_tensor_ptr_to`.
240240
*
241241
* @tparam T The C++ type of the tensor elements, deduced from the initializer
242242
* list.
@@ -278,7 +278,7 @@ inline TensorPtr make_tensor_ptr(
278278
* initializer list's elements.
279279
*
280280
* The result is always a CPU tensor. To move it to a device, use
281-
* `clone_tensor_ptr_to_device`.
281+
* `clone_tensor_ptr_to`.
282282
*
283283
* @tparam T The C++ type of the tensor elements, deduced from the initializer
284284
* list.
@@ -375,7 +375,7 @@ inline TensorPtr make_tensor_ptr(
375375
* is left empty so the core may infer it from the provided strides.
376376
*
377377
* This overload always aliases — it never copies. To copy a tensor's data to
378-
* a device, use `clone_tensor_ptr_to_device`.
378+
* a device, use `clone_tensor_ptr_to`.
379379
*
380380
* @param tensor The source tensor to alias.
381381
* @param sizes Optional sizes override.
@@ -426,18 +426,21 @@ inline TensorPtr make_tensor_ptr(
426426
tensor.scalar_type(),
427427
#ifndef USE_ATEN_LIB
428428
tensor.shape_dynamism(),
429+
std::move(deleter),
430+
executorch::aten::Device(tensor.device_type(), tensor.device_index()));
429431
#else // USE_ATEN_LIB
430432
executorch::aten::TensorShapeDynamism::DYNAMIC_BOUND,
433+
std::move(deleter),
434+
tensor.device());
431435
#endif // USE_ATEN_LIB
432-
std::move(deleter));
433436
}
434437

435438
/**
436439
* Convenience overload identical to make_tensor_ptr(*tensor_ptr, ...).
437440
* Keeps the original TensorPtr alive until the returned TensorPtr is destroyed.
438441
*
439442
* This overload always aliases — it never copies. To copy a tensor's data to
440-
* a device, use `clone_tensor_ptr_to_device`.
443+
* a device, use `clone_tensor_ptr_to`.
441444
*
442445
* @param tensor_ptr The source tensor pointer to alias.
443446
* @param sizes Optional sizes override.
@@ -527,38 +530,29 @@ runtime::Error resize_tensor_ptr(
527530
const std::vector<executorch::aten::SizesType>& sizes);
528531

529532
/**
530-
* Clones a CPU TensorPtr to a device TensorPtr.
531-
*
532-
* Allocates memory on the specified device and copies the tensor data from
533-
* host to device using the DeviceAllocator registered for the given device
534-
* type. The returned TensorPtr owns the device memory and will free it via
535-
* the allocator when destroyed.
533+
* Clones a TensorPtr's data onto the given target device, allocating and
534+
* copying as needed.
536535
*
537-
* Only available in the ExecuTorch portable build: cloning relies on the
538-
* ExecuTorch DeviceAllocator, which has no equivalent in USE_ATEN_LIB builds.
539-
*
540-
* @param cpu_tensor The source CPU tensor whose data will be copied.
541-
* @param device The target device (must not be CPU).
542-
* @return A TensorPtr backed by device memory containing the copied data.
543-
*/
544-
#ifndef USE_ATEN_LIB
545-
TensorPtr clone_tensor_ptr_to_device(
546-
const TensorPtr& cpu_tensor,
547-
executorch::aten::Device device);
548-
549-
/**
550-
* Clones a device TensorPtr to a CPU TensorPtr.
536+
* The transfer direction is inferred from the source and target device:
537+
* host-to-device when `target` is an accelerator, and device-to-host when
538+
* `target` is CPU. Copies use the DeviceAllocator registered for the
539+
* accelerator side; a device-backed result owns its memory and frees it via
540+
* that allocator when destroyed.
551541
*
552-
* Allocates host memory and copies the tensor data from device to host using
553-
* the DeviceAllocator registered for the source tensor's device type. The
554-
* device is determined from the source tensor's metadata.
542+
* Source and target must differ in device domain: for a CPU-to-CPU copy use
543+
* clone_tensor_ptr, and device-to-device transfers are not supported.
555544
*
556-
* Only available in the ExecuTorch portable build.
545+
* Only available in the ExecuTorch portable build: it relies on the ExecuTorch
546+
* DeviceAllocator, which has no equivalent in USE_ATEN_LIB builds.
557547
*
558-
* @param device_tensor The source device tensor whose data will be copied.
559-
* @return A TensorPtr backed by CPU memory containing the copied data.
548+
* @param tensor The source tensor whose data will be copied.
549+
* @param target The destination device (CPU or an accelerator).
550+
* @return A TensorPtr backed by `target` memory containing the copied data.
560551
*/
561-
TensorPtr clone_tensor_ptr_to_cpu(const TensorPtr& device_tensor);
552+
#ifndef USE_ATEN_LIB
553+
TensorPtr clone_tensor_ptr_to(
554+
const TensorPtr& tensor,
555+
executorch::aten::Device target);
562556
#endif // USE_ATEN_LIB
563557

564558
} // namespace extension

extension/tensor/test/CMakeLists.txt

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -19,7 +19,9 @@ set(EXECUTORCH_ROOT ${CMAKE_CURRENT_SOURCE_DIR}/../../..)
1919

2020
include(${EXECUTORCH_ROOT}/tools/cmake/Test.cmake)
2121

22-
set(_test_srcs tensor_ptr_maker_test.cpp tensor_ptr_test.cpp)
22+
set(_test_srcs tensor_ptr_maker_test.cpp tensor_ptr_test.cpp
23+
tensor_ptr_device_test.cpp
24+
)
2325

2426
et_cxx_test(
2527
extension_tensor_test SOURCES ${_test_srcs} EXTRA_LIBS extension_tensor

0 commit comments

Comments
 (0)