Fix GatherND sentinel-value collision enabling OOB memory disclosure#27963
Fix GatherND sentinel-value collision enabling OOB memory disclosure#27963bmehta001 wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR tightens runtime validation to prevent out-of-bounds reads and invalid shape creation, and adds targeted regression tests to ensure failures are surfaced consistently across relevant execution providers.
Changes:
- Fix GatherND invalid-index detection so index
0can’t be misinterpreted as a “no error” sentinel. - Reject negative runtime split sizes (Split) and negative output dimensions caused by excessive negative padding/slicing (Pad/Slice).
- Add regression tests for GatherND (zero-dim), Split (negative split from input), and Pad (negative output dim).
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| onnxruntime/test/providers/cpu/tensor/split_op_test.cc | Adds a failure test for negative values in runtime split input. |
| onnxruntime/test/providers/cpu/tensor/pad_test.cc | Adds a failure test for negative output dimension from dynamic pads. |
| onnxruntime/test/providers/cpu/tensor/gather_nd_op_test.cc | Adds regression tests for zero-sized dimension index validation and a valid index-0 case. |
| onnxruntime/core/providers/webgpu/tensor/slice.cc | Adds axis range validation and an output-dim sanity check for WebGPU Slice. |
| onnxruntime/core/providers/webgpu/tensor/pad.cc | Adds runtime check to reject negative output dimensions in WebGPU Pad. |
| onnxruntime/core/providers/cuda/tensor/split.cc | Adds runtime check to reject negative split sizes from split input. |
| onnxruntime/core/providers/cuda/tensor/pad.cc | Adds runtime check to reject negative output dimensions in CUDA Pad. |
| onnxruntime/core/providers/cpu/tensor/split.h | Adds runtime check to reject negative split sizes when split is provided as input. |
| onnxruntime/core/providers/cpu/tensor/pad.cc | Adds runtime checks to reject negative output dimensions during Pad shape computation. |
| onnxruntime/core/providers/cpu/tensor/gather_nd.cc | Reworks GatherND invalid-index tracking to avoid “0 as sentinel” collision. |
| js/web/lib/wasm/jsep/webgpu/ops/slice.ts | Adds a JS WebGPU Slice output-dimension negativity check. |
| js/web/lib/wasm/jsep/util.ts | Fixes axis normalization range check and adds negative-dimension protection in padShape(). |
| js/web/lib/onnxjs/util.ts | Same axis normalization fix and padShape() negative-dimension protection for onnxjs. |
| js/web/lib/onnxjs/backends/webgl/ops/slice.ts | Adds a JS WebGL Slice output-dimension negativity check. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
3fc2e9b to
7be69dc
Compare
GatherND used err_index==0 as a sentinel for 'no validation error', but 0 is also a valid tensor index. When a dimension has size 0, index 0 correctly fails the bounds check (0 >= 0), but err_index is set to 0 which is indistinguishable from success. This allowed the operator to proceed with an out-of-bounds memcpy, leaking process heap memory. Replace the sentinel pattern with a std::atomic<bool> error flag that is independent of the index value. The err_index variable is retained solely for the error message. Files changed: - onnxruntime/core/providers/cpu/tensor/gather_nd.cc - onnxruntime/test/providers/cpu/tensor/gather_nd_op_test.cc Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Other EPs may not support 1D GatherND with scalar output, causing test failures unrelated to the sentinel-value fix. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
7be69dc to
4be3f33
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const auto upper_limit = input_shape[SafeInt<size_t>(batch_dims_) + dim_idx]; | ||
| const auto lower_limit = -upper_limit; | ||
| if (index < lower_limit || index >= upper_limit) { | ||
| err_index = index; | ||
| has_invalid_index.store(true, std::memory_order_relaxed); | ||
| err_index.store(index, std::memory_order_relaxed); | ||
| break; |
There was a problem hiding this comment.
has_invalid_index and err_index are updated via separate memory_order_relaxed stores. With relaxed ordering on two different atomics, the final has_invalid_index==true load can observe err_index still at its default (0), producing a misleading error message (the PR notes err_index is retained for messaging). Consider ordering the updates via release/acquire (e.g., store err_index then store has_invalid_index with memory_order_release, and load the flag with memory_order_acquire before loading err_index), or collapsing to a single atomic sentinel that cannot collide with a valid index value.
Address review feedback: with relaxed ordering on separate atomics, the acquire of has_invalid_index could observe true while err_index is still at its default. Store err_index first (relaxed), then publish has_invalid_index with release. Load the flag with acquire to guarantee err_index visibility. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
67841e9 to
9756ff0
Compare
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Description
Fix a logic flaw in
GatherNDthat allowed out-of-bounds memory reads due to a validation-state collision between a sentinel value and a valid tensor index.Root Cause
GatherNDBase::PrepareForComputeusederr_index == 0as a sentinel for "no validation error", but0is also a valid tensor index. When a dimension has size 0, index 0 correctly fails the bounds check (0 >= 0is true), buterr_indexis set to0— indistinguishable from the "no error" state. This allowed the operator to proceed with an out-of-boundsmemcpy, leaking process heap memory in the output tensor.This is a deterministic memory disclosure primitive: the leaked data is stable across executions, the amount is controllable via tensor shape, and the leaked values include process memory addresses.
Fix
std::atomic<bool> has_invalid_indexflag that is independent of the index valueerr_indexalsostd::atomic<int64_t>to eliminate a data race in the parallelTryParallelForloop (review feedback)err_indexis retained solely for the error messageTesting
GatherND_zero_dim_index_zero_rejected: verifies index 0 into a zero-sized dimension is rejectedGatherND_valid_index_zero: verifies index 0 into a valid dimension still works correctlyMotivation and Context
Security fix — prevents out-of-bounds memory disclosure when processing untrusted ONNX models with crafted
GatherNDinputs on the CPUExecutionProvider.