Skip to content

Fix GatherND sentinel-value collision enabling OOB memory disclosure#27963

Open
bmehta001 wants to merge 4 commits into
mainfrom
fix/gathernd-sentinel-oob-read
Open

Fix GatherND sentinel-value collision enabling OOB memory disclosure#27963
bmehta001 wants to merge 4 commits into
mainfrom
fix/gathernd-sentinel-oob-read

Conversation

@bmehta001

@bmehta001 bmehta001 commented Apr 3, 2026

Copy link
Copy Markdown
Contributor

Description

Fix a logic flaw in GatherND that allowed out-of-bounds memory reads due to a validation-state collision between a sentinel value and a valid tensor index.

Root Cause

GatherNDBase::PrepareForCompute used err_index == 0 as a sentinel for "no validation error", but 0 is also a valid tensor index. When a dimension has size 0, index 0 correctly fails the bounds check (0 >= 0 is true), but err_index is set to 0 — indistinguishable from the "no error" state. This allowed the operator to proceed with an out-of-bounds memcpy, leaking process heap memory in the output tensor.

This is a deterministic memory disclosure primitive: the leaked data is stable across executions, the amount is controllable via tensor shape, and the leaked values include process memory addresses.

Fix

  • Replaced the sentinel pattern with a std::atomic<bool> has_invalid_index flag that is independent of the index value
  • Made err_index also std::atomic<int64_t> to eliminate a data race in the parallel TryParallelFor loop (review feedback)
  • err_index is retained solely for the error message

Testing

  • GatherND_zero_dim_index_zero_rejected: verifies index 0 into a zero-sized dimension is rejected
  • GatherND_valid_index_zero: verifies index 0 into a valid dimension still works correctly
  • Both tests restricted to CPU EP only

Motivation and Context

Security fix — prevents out-of-bounds memory disclosure when processing untrusted ONNX models with crafted GatherND inputs on the CPUExecutionProvider.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens runtime validation to prevent out-of-bounds reads and invalid shape creation, and adds targeted regression tests to ensure failures are surfaced consistently across relevant execution providers.

Changes:

  • Fix GatherND invalid-index detection so index 0 can’t be misinterpreted as a “no error” sentinel.
  • Reject negative runtime split sizes (Split) and negative output dimensions caused by excessive negative padding/slicing (Pad/Slice).
  • Add regression tests for GatherND (zero-dim), Split (negative split from input), and Pad (negative output dim).

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
onnxruntime/test/providers/cpu/tensor/split_op_test.cc Adds a failure test for negative values in runtime split input.
onnxruntime/test/providers/cpu/tensor/pad_test.cc Adds a failure test for negative output dimension from dynamic pads.
onnxruntime/test/providers/cpu/tensor/gather_nd_op_test.cc Adds regression tests for zero-sized dimension index validation and a valid index-0 case.
onnxruntime/core/providers/webgpu/tensor/slice.cc Adds axis range validation and an output-dim sanity check for WebGPU Slice.
onnxruntime/core/providers/webgpu/tensor/pad.cc Adds runtime check to reject negative output dimensions in WebGPU Pad.
onnxruntime/core/providers/cuda/tensor/split.cc Adds runtime check to reject negative split sizes from split input.
onnxruntime/core/providers/cuda/tensor/pad.cc Adds runtime check to reject negative output dimensions in CUDA Pad.
onnxruntime/core/providers/cpu/tensor/split.h Adds runtime check to reject negative split sizes when split is provided as input.
onnxruntime/core/providers/cpu/tensor/pad.cc Adds runtime checks to reject negative output dimensions during Pad shape computation.
onnxruntime/core/providers/cpu/tensor/gather_nd.cc Reworks GatherND invalid-index tracking to avoid “0 as sentinel” collision.
js/web/lib/wasm/jsep/webgpu/ops/slice.ts Adds a JS WebGPU Slice output-dimension negativity check.
js/web/lib/wasm/jsep/util.ts Fixes axis normalization range check and adds negative-dimension protection in padShape().
js/web/lib/onnxjs/util.ts Same axis normalization fix and padShape() negative-dimension protection for onnxjs.
js/web/lib/onnxjs/backends/webgl/ops/slice.ts Adds a JS WebGL Slice output-dimension negativity check.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/providers/cpu/tensor/gather_nd.cc
Comment thread onnxruntime/core/providers/webgpu/tensor/slice.cc
Comment thread js/web/lib/wasm/jsep/webgpu/ops/slice.ts Outdated
Comment thread js/web/lib/onnxjs/backends/webgl/ops/slice.ts Outdated
Comment thread onnxruntime/core/providers/cpu/tensor/split.h Outdated
@bmehta001 bmehta001 force-pushed the fix/gathernd-sentinel-oob-read branch from 3fc2e9b to 7be69dc Compare April 4, 2026 05:55
bmehta001 and others added 2 commits April 4, 2026 01:44
GatherND used err_index==0 as a sentinel for 'no validation error',
but 0 is also a valid tensor index. When a dimension has size 0, index
0 correctly fails the bounds check (0 >= 0), but err_index is set to 0
which is indistinguishable from success. This allowed the operator to
proceed with an out-of-bounds memcpy, leaking process heap memory.

Replace the sentinel pattern with a std::atomic<bool> error flag that
is independent of the index value. The err_index variable is retained
solely for the error message.

Files changed:
- onnxruntime/core/providers/cpu/tensor/gather_nd.cc
- onnxruntime/test/providers/cpu/tensor/gather_nd_op_test.cc

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Other EPs may not support 1D GatherND with scalar output, causing
test failures unrelated to the sentinel-value fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bmehta001 bmehta001 force-pushed the fix/gathernd-sentinel-oob-read branch from 7be69dc to 4be3f33 Compare April 4, 2026 06:45

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can commit the suggested changes from lintrunner.

Comment thread onnxruntime/core/providers/cpu/tensor/gather_nd.cc Outdated
@bmehta001 bmehta001 self-assigned this Apr 4, 2026
@bmehta001 bmehta001 changed the title Fix gathernd sentinel oob read Fix GatherND sentinel-value collision enabling OOB memory disclosure Apr 4, 2026
@bmehta001 bmehta001 requested a review from Copilot April 5, 2026 18:27

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 110 to 115
const auto upper_limit = input_shape[SafeInt<size_t>(batch_dims_) + dim_idx];
const auto lower_limit = -upper_limit;
if (index < lower_limit || index >= upper_limit) {
err_index = index;
has_invalid_index.store(true, std::memory_order_relaxed);
err_index.store(index, std::memory_order_relaxed);
break;

Copilot AI Apr 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has_invalid_index and err_index are updated via separate memory_order_relaxed stores. With relaxed ordering on two different atomics, the final has_invalid_index==true load can observe err_index still at its default (0), producing a misleading error message (the PR notes err_index is retained for messaging). Consider ordering the updates via release/acquire (e.g., store err_index then store has_invalid_index with memory_order_release, and load the flag with memory_order_acquire before loading err_index), or collapsing to a single atomic sentinel that cannot collide with a valid index value.

Copilot uses AI. Check for mistakes.
Address review feedback: with relaxed ordering on separate atomics,
the acquire of has_invalid_index could observe true while err_index
is still at its default. Store err_index first (relaxed), then
publish has_invalid_index with release. Load the flag with acquire
to guarantee err_index visibility.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bmehta001 bmehta001 force-pushed the fix/gathernd-sentinel-oob-read branch from 67841e9 to 9756ff0 Compare April 6, 2026 16:05

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can commit the suggested changes from lintrunner.

Comment thread onnxruntime/core/providers/cpu/tensor/gather_nd.cc Outdated
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants