Fix integer overflows in tensor byte-size computations #19055
Conversation
…H-19) Three tensor-byte-size multiplications had no overflow check, letting a malicious PTE trigger wrap-to-small `size_t` values while kernels iterate on the un-wrapped element count, producing heap buffer overflows. Fixed here:

- `extension/tensor/tensor_ptr.h`: `data.size() * elementSize(type)` in the `make_tensor_ptr` cast path.
- `extension/tensor/tensor_ptr_maker.cpp`: `compute_numel(...) * elementSize(type)` in `empty_strided`.
- `runtime/core/tensor_layout.cpp`: the dim-product loop and the final `* elementSize(scalar_type)` in `calculate_nbytes`; now returns `Error::InvalidArgument` on overflow, since the function already returns `Result<size_t>`.

All guards use `c10::mul_overflows`, matching the existing pattern in `MethodMeta::calculate_nbytes`, the data loaders, and `PlatformMemoryAllocator`.

`runtime/core/portable_type/tensor_impl.cpp` is intentionally left alone in this branch; guarding the `nbytes()` / `compute_numel` multiplications there breaks internal callers and will be handled separately.

Authored with Claude.
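For reference, a minimal sketch of the guard pattern these fixes follow, assuming `c10::mul_overflows(a, b, &out)` writes the product to `out` and returns true on overflow (the convention at the existing call sites named above). The function name and shape types here are illustrative, not the PR's code:

```cpp
#include <cstddef>
#include <cstdint>
// Header assumed for c10::mul_overflows:
#include <c10/util/safe_numerics.h>

// Overflow-checked nbytes: dim product first, then a final guarded
// multiply with the element size. Returns false instead of a wrapped value.
bool checked_nbytes(
    const int32_t* sizes, size_t dim, size_t elem_size, size_t* out) {
  size_t n = 1;
  for (size_t i = 0; i < dim; ++i) {
    size_t next = 0;
    if (sizes[i] < 0 ||
        c10::mul_overflows(n, static_cast<size_t>(sizes[i]), &next)) {
      return false; // invalid dim, or the product would wrap
    }
    n = next;
  }
  return !c10::mul_overflows(n, elem_size, out);
}
```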
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19055

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.

❌ 4 New Failures, 2 Unrelated Failures (as of commit a879332 with merge base 217ad45):

- NEW FAILURES: the following jobs have failed.
- FLAKY: the following job failed but was likely due to flakiness present on trunk.
- BROKEN TRUNK: the following job failed but was present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR hardens tensor byte-size computations against integer overflow (wrap-to-small) scenarios that could otherwise lead to under-allocation and subsequent heap buffer overflows when kernels iterate using the true element count.
Changes:

- Add an overflow-checked dimension product and `* elementSize` handling in the `TensorLayout` byte-size calculation (returning `Error::InvalidArgument` on overflow).
- Add `c10::mul_overflows` guards for `numel * elementSize` in `empty_strided()`.
- Add `c10::mul_overflows` guards for `data.size() * elementSize` when allocating cast buffers in `make_tensor_ptr()`.
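To make the wrap-to-small failure mode concrete, a small standalone example (not from the PR): on a 64-bit target an unchecked byte-size product can wrap to a tiny value, so the allocation "succeeds" while kernels still iterate the full element count.

```cpp
#include <cstddef>
#include <cstdio>

int main() {
  // 2^32 elements per dim: the product 2^32 * 2^32 = 2^64 wraps to 0
  // in 64-bit size_t arithmetic.
  size_t dim = size_t{1} << 32;
  size_t nbytes = dim * dim; // wraps to 0; an allocator would "succeed"
  printf("nbytes = %zu\n", nbytes); // prints: nbytes = 0
  return 0;
}
```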
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| runtime/core/tensor_layout.cpp | Uses c10::mul_overflows to safely compute nbytes and returns InvalidArgument on overflow. |
| extension/tensor/tensor_ptr_maker.cpp | Adds overflow checking before allocating the backing std::vector<uint8_t> in empty_strided(). |
| extension/tensor/tensor_ptr.h | Adds overflow checking for cast-buffer allocation when casting vector-backed tensor data. |
```cpp
  if (type != deduced_type) {
    ET_CHECK_MSG(
        runtime::canCast(deduced_type, type),
        "Cannot cast deduced type to specified type.");
    std::vector<uint8_t> casted_data(data.size() * aten::elementSize(type));
    size_t casted_bytes = 0;
```
In executor (non-ATen) builds, executorch::aten::compute_numel() ultimately multiplies sizes using signed ssize_t arithmetic without overflow checks. That means the earlier data.size() == compute_numel(...) validation can itself invoke signed-overflow UB for large shapes, and may incorrectly succeed with a wrapped result. Since this block then allocates based on data.size(), you can end up with a tensor whose declared shape is inconsistent with its backing storage.
To make the new overflow-checked allocation meaningful, consider replacing the compute_numel(...) usage in this overload with a local, overflow-checked size_t product over sizes (similar to the mul_overflows loop used elsewhere), and compare against data.size() only after that succeeds.
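A sketch of the proposed reordering, assuming the overload sees the shape as `sizes` and the payload as `data` (names and types here are illustrative, not the actual overload signature):

```cpp
// Illustrative only: compute the element count with unsigned, checked
// arithmetic first, and compare against data.size() only if that succeeds.
size_t expected_numel = 1;
for (const auto s : sizes) {
  size_t next = 0;
  ET_CHECK_MSG(
      s >= 0 &&
          !c10::mul_overflows(expected_numel, static_cast<size_t>(s), &next),
      "Invalid or overflowing tensor sizes.");
  expected_numel = next;
}
ET_CHECK_MSG(
    expected_numel == data.size(),
    "Data size does not match tensor shape.");
```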
```cpp
      size_t next = 0;
      if (c10::mul_overflows(n, static_cast<size_t>(sizes[i]), &next)) {
        return Error::InvalidArgument;
      }
      n = next;
```
The new overflow handling in calculate_nbytes() changes behavior to return Error::InvalidArgument on overflow, but runtime/core/test/tensor_layout_test.cpp currently has no coverage for this path. Add a unit test that uses sizes whose product (or product * elementSize) overflows size_t (e.g., {INT32_MAX, INT32_MAX} with ScalarType::Double on 64-bit) and assert TensorLayout::create() returns InvalidArgument.
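A hedged sketch of such a test, assuming the gtest setup and usings of the existing runtime/core/test/tensor_layout_test.cpp and a `TensorLayout::create(sizes, dim_order, scalar_type)` factory returning a `Result` (the exact factory signature may differ):

```cpp
TEST(TensorLayoutTest, CreateFailsOnNbytesOverflow) {
  // On 64-bit: INT32_MAX * INT32_MAX fits in size_t, but the final
  // multiply by elementSize(Double) == 8 wraps past SIZE_MAX, so
  // calculate_nbytes should report overflow instead of wrapping.
  std::array<int32_t, 2> sizes = {INT32_MAX, INT32_MAX};
  std::array<uint8_t, 2> dim_order = {0, 1};
  auto layout = TensorLayout::create(
      Span<const int32_t>(sizes.data(), sizes.size()),
      Span<const uint8_t>(dim_order.data(), dim_order.size()),
      ScalarType::Double);
  ASSERT_FALSE(layout.ok());
  EXPECT_EQ(layout.error(), Error::InvalidArgument);
}
```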
```cpp
  const auto numel = static_cast<size_t>(
      executorch::aten::compute_numel(sizes.data(), sizes.size()));
```
empty_strided() still relies on executorch::aten::compute_numel() (which returns ssize_t and, in executor mode, multiplies sizes using signed arithmetic without overflow checks). If the size product overflows inside compute_numel, that's undefined behavior and may yield a wrapped numel before you even reach the mul_overflows(numel, elem_size, ...) guard.
Consider computing numel locally using size_t + c10::mul_overflows over the sizes vector (and validating non-negative sizes) so the overflow is caught deterministically before any UB occurs, then multiply by elem_size as you do now.
Suggested change:

```cpp
// Replace:
const auto numel = static_cast<size_t>(
    executorch::aten::compute_numel(sizes.data(), sizes.size()));

// With:
size_t numel = 1;
for (const auto dim : sizes) {
  ET_CHECK_MSG(
      dim >= 0,
      "empty_strided requires non-negative sizes, got %zd",
      static_cast<ssize_t>(dim));
  size_t next_numel = 0;
  ET_CHECK_MSG(
      !c10::mul_overflows(
          numel, static_cast<size_t>(dim), &next_numel),
      "empty_strided size overflow while computing numel: %zu * %zu",
      numel,
      static_cast<size_t>(dim));
  numel = next_numel;
}
```
@lucylq has imported this pull request. If you are a Meta employee, you can view this in D102058400.
```cpp
// Before:
      executorch::aten::compute_numel(sizes.data(), sizes.size()) *
      executorch::aten::elementSize(type));

// After:
  const auto numel = static_cast<size_t>(
      executorch::aten::compute_numel(sizes.data(), sizes.size()));
```
Need to use `safe_numel` here.
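For context, a hypothetical call-site shape if a shared `safe_numel` helper were used here (the name comes from the comment above; the signature is illustrative):

```cpp
// Illustrative only: report overflow from the size product before any
// multiply with the element size can wrap.
size_t numel = 0;
ET_CHECK_MSG(
    safe_numel(sizes.data(), sizes.size(), &numel), // hypothetical helper
    "empty_strided: tensor size product overflows size_t");
```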