Summary:
Follow-up to D99913077 applying review feedback on the TensorPtr device
tensor helpers: aliasing make_tensor_ptr now preserves device metadata,
clone_tensor_ptr requires a CPU source, device alloc/copy failures
report their error codes, and the device test is pinned to its abort
messages and built in non-aten Buck/CMake/OSS configs. device_allocator
moves to exported_deps so the exported header compiles for aten
consumers. Mirrored in fbcode and xplat.
Also replaces the two device-transfer helpers
`clone_tensor_ptr_to_device` and `clone_tensor_ptr_to_cpu` with a single
`clone_tensor_ptr_to(tensor, target)` keyed on the target device. The
direction (host-to-device or device-to-host) is inferred from the source
and target, which removes the asymmetry where one helper named the
device and the other inferred it, and removes the footgun where
`clone_tensor_ptr_to_device(t, CPU)` aborted. CPU-to-CPU and
device-to-device are rejected with clear messages; `clone_tensor_ptr`
remains the same-device copy and the `make_tensor_ptr` device tag is
unchanged. This mirrors ATen's single `to(device)` and keeps the public
surface minimal. The `extension-tensor.md` guide and its ATen
equivalence table are updated to match.
This also fixes a pre-existing portable-build break: the aliasing
`make_tensor_ptr(const Tensor&)` overload passed `device_type()` and
`device_index()` as two separate arguments to a primary factory that
takes a single `Device`, so the non-`USE_ATEN_LIB` build did not
compile; it now wraps them in a `Device`.
Reviewed By: Gasoonjia
Differential Revision: D106842466
docs/source/extension-tensor.md (17 additions, 0 deletions)
@@ -199,6 +199,22 @@ auto tensor = clone_tensor_ptr(original_tensor);
Note that, regardless of whether the original `TensorPtr` owns the data or not, the newly created `TensorPtr` will own a copy of the data.

#### Cloning To or From a Device

If a tensor lives on CPU and you want a copy on an accelerator, or the other way around, use `clone_tensor_ptr_to` with the device you want. It allocates memory on the target device, copies the data for you, and the returned `TensorPtr` owns that memory.

```cpp
auto device_tensor = clone_tensor_ptr_to(cpu_tensor, DeviceType::CUDA);

// Device back to CPU:
auto host_tensor = clone_tensor_ptr_to(device_tensor, DeviceType::CPU);
```

The direction is chosen from the source and target device. This needs a `DeviceAllocator` registered for the device, so it is available only in the portable (non-`USE_ATEN_LIB`) build. For a plain CPU-to-CPU copy, use `clone_tensor_ptr` instead.
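
To make the direction rule concrete, here is a minimal, self-contained sketch of how a transfer direction could be inferred from a source and a target device. It is not the ExecuTorch implementation: `DeviceKind`, `CopyDirection`, and `infer_direction` are hypothetical names invented for illustration, and the sketch throws an exception where the commit message says the real helper rejects the call with a clear message.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for illustration only; not the ExecuTorch types.
enum class DeviceKind { CPU, CUDA };
enum class CopyDirection { HostToDevice, DeviceToHost };

// Infers the copy direction from the source and target devices.
// Assumption: the two same-side cases (CPU-to-CPU, device-to-device) are
// rejected; here that is modeled with an exception so it can be tested.
inline CopyDirection infer_direction(DeviceKind source, DeviceKind target) {
  const bool source_is_cpu = (source == DeviceKind::CPU);
  const bool target_is_cpu = (target == DeviceKind::CPU);

  if (source_is_cpu && target_is_cpu) {
    throw std::invalid_argument(
        "CPU-to-CPU requested; use a same-device clone instead");
  }
  if (!source_is_cpu && !target_is_cpu) {
    throw std::invalid_argument("device-to-device copy is not supported");
  }
  // Exactly one side is the CPU at this point.
  return source_is_cpu ? CopyDirection::HostToDevice
                       : CopyDirection::DeviceToHost;
}
```

Under these assumptions, `infer_direction(DeviceKind::CPU, DeviceKind::CUDA)` yields `CopyDirection::HostToDevice`, the reverse pair yields `CopyDirection::DeviceToHost`, and passing the CPU for both arguments is rejected.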
### Resizing Tensors
The `TensorShapeDynamism` enum specifies the mutability of a tensor's shape:
@@ -375,6 +391,7 @@ Here's a table matching `TensorPtr` creation functions with their corresponding