Fix GatherND sentinel-value collision enabling OOB memory disclosure by bmehta001 · Pull Request #27963 · microsoft/onnxruntime

bmehta001 · 2026-04-03T08:26:28Z

Description

Fix a logic flaw in GatherND that allowed out-of-bounds memory reads due to a validation-state collision between a sentinel value and a valid tensor index.

Root Cause

GatherNDBase::PrepareForCompute used err_index == 0 as a sentinel for "no validation error", but 0 is also a valid tensor index. When a dimension has size 0, index 0 correctly fails the bounds check (0 >= 0 is true), but err_index is then set to 0, which is indistinguishable from the "no error" state. The operator therefore proceeded with an out-of-bounds memcpy, leaking process heap memory into the output tensor.
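To make the collision concrete, here is a minimal, hypothetical C++ model of the pre-fix check. It is not the actual `GatherNDBase::PrepareForCompute` code: it assumes a single indexed dimension and reuses the `[-dim, dim)` bounds test from the diff. The names `ValidateWithSentinel` and `LooksValidWithSentinel` are invented for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of the pre-fix validation: err_index == 0
// doubles as the "no error" sentinel. Not the actual ONNX Runtime code.
int64_t ValidateWithSentinel(const std::vector<int64_t>& indices, int64_t dim_size) {
  int64_t err_index = 0;  // 0 means "no error"... but 0 is also a possible index value
  for (int64_t index : indices) {
    const int64_t upper_limit = dim_size;
    const int64_t lower_limit = -upper_limit;
    if (index < lower_limit || index >= upper_limit) {
      err_index = index;  // records the offending value, which may itself be 0
      break;
    }
  }
  return err_index;
}

// Caller-side check in the buggy style: only a nonzero err_index counts as failure.
bool LooksValidWithSentinel(const std::vector<int64_t>& indices, int64_t dim_size) {
  return ValidateWithSentinel(indices, dim_size) == 0;
}
```

With `dim_size == 0`, the out-of-range index 0 is stored in `err_index`, and the caller's `== 0` test then reports success, so the copy would proceed.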

This is a deterministic memory disclosure primitive: the leaked data is stable across executions, the amount is controllable via tensor shape, and the leaked values include process memory addresses.

Fix

Replaced the sentinel pattern with a std::atomic<bool> has_invalid_index flag that is independent of the index value
Made err_index also std::atomic<int64_t> to eliminate a data race in the parallel TryParallelFor loop (review feedback)
err_index is retained solely for the error message
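A single-threaded sketch of the flag-based pattern, under the same simplifying assumptions as a model of the bounds check. `ValidateWithFlag`, `IndexCheckResult`, and the message text are hypothetical. Plain `bool`/`int64_t` are enough here; the PR uses `std::atomic` because the real loop runs inside `TryParallelFor`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct IndexCheckResult {
  bool ok;
  std::string message;
};

// Flag-based validation: has_invalid_index answers "did anything fail?",
// while err_index only records the offending value for the message, so an
// invalid index of 0 can no longer be mistaken for success.
IndexCheckResult ValidateWithFlag(const std::vector<int64_t>& indices, int64_t dim_size) {
  bool has_invalid_index = false;
  int64_t err_index = 0;
  for (int64_t index : indices) {
    const int64_t upper_limit = dim_size;
    const int64_t lower_limit = -upper_limit;
    if (index < lower_limit || index >= upper_limit) {
      has_invalid_index = true;
      err_index = index;
      break;
    }
  }
  if (has_invalid_index) {
    return {false, "invalid index: " + std::to_string(err_index)};
  }
  return {true, ""};
}
```

Index 0 into a zero-sized dimension is now rejected, while index 0 into a non-empty dimension still passes, which is the pair of cases the regression tests target.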

Testing

GatherND_zero_dim_index_zero_rejected: verifies index 0 into a zero-sized dimension is rejected
GatherND_valid_index_zero: verifies index 0 into a valid dimension still works correctly
Both tests restricted to CPU EP only

Motivation and Context

Security fix: prevents out-of-bounds memory disclosure when the CPUExecutionProvider processes untrusted ONNX models with crafted GatherND inputs.

Copilot

Pull request overview

This PR tightens runtime validation to prevent out-of-bounds reads and invalid shape creation, and adds targeted regression tests to ensure failures are surfaced consistently across relevant execution providers.

Changes:

Fix GatherND invalid-index detection so index 0 can’t be misinterpreted as a “no error” sentinel.
Reject negative runtime split sizes (Split) and negative output dimensions caused by excessive negative padding/slicing (Pad/Slice).
Add regression tests for GatherND (zero-dim), Split (negative split from input), and Pad (negative output dim).
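The Pad and Split checks follow the same idea: validate runtime-derived sizes before they become a shape. The sketch below is hedged. It assumes the ONNX Pad `pads` layout `[x1_begin, ..., xn_begin, x1_end, ..., xn_end]`, and `ComputePadOutputShape` and `SplitSizesAreValid` are hypothetical helpers, not ONNX Runtime functions. It also does not handle integer overflow. The Split sum check comes from the ONNX Split spec and is included to show why the negativity check is needed on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: Pad output dim = input dim + pad at begin + pad at end.
// Negative pads crop; cropping more than the input holds yields a negative
// dimension, which must be rejected rather than used as a tensor shape.
// (Integer overflow is not handled in this sketch.)
std::optional<std::vector<int64_t>> ComputePadOutputShape(
    const std::vector<int64_t>& input_shape,
    const std::vector<int64_t>& pads) {  // [x1_begin..xn_begin, x1_end..xn_end]
  const size_t rank = input_shape.size();
  if (pads.size() != 2 * rank) return std::nullopt;
  std::vector<int64_t> output_shape(rank);
  for (size_t i = 0; i < rank; ++i) {
    const int64_t dim = input_shape[i] + pads[i] + pads[i + rank];
    if (dim < 0) return std::nullopt;  // excessive negative padding
    output_shape[i] = dim;
  }
  return output_shape;
}

// Hypothetical helper: every runtime split size must be non-negative, and the
// sizes must sum to the dimension being split. A sum-only check is not enough:
// {6, -1} sums to 5 but is still invalid.
bool SplitSizesAreValid(const std::vector<int64_t>& split, int64_t axis_dim) {
  int64_t total = 0;
  for (int64_t s : split) {
    if (s < 0) return false;
    total += s;
  }
  return total == axis_dim;
}
```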

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.


File	Description
onnxruntime/test/providers/cpu/tensor/split_op_test.cc	Adds a failure test for negative values in runtime `split` input.
onnxruntime/test/providers/cpu/tensor/pad_test.cc	Adds a failure test for negative output dimension from dynamic pads.
onnxruntime/test/providers/cpu/tensor/gather_nd_op_test.cc	Adds regression tests for zero-sized dimension index validation and a valid index-0 case.
onnxruntime/core/providers/webgpu/tensor/slice.cc	Adds axis range validation and an output-dim sanity check for WebGPU Slice.
onnxruntime/core/providers/webgpu/tensor/pad.cc	Adds runtime check to reject negative output dimensions in WebGPU Pad.
onnxruntime/core/providers/cuda/tensor/split.cc	Adds runtime check to reject negative split sizes from `split` input.
onnxruntime/core/providers/cuda/tensor/pad.cc	Adds runtime check to reject negative output dimensions in CUDA Pad.
onnxruntime/core/providers/cpu/tensor/split.h	Adds runtime check to reject negative split sizes when `split` is provided as input.
onnxruntime/core/providers/cpu/tensor/pad.cc	Adds runtime checks to reject negative output dimensions during Pad shape computation.
onnxruntime/core/providers/cpu/tensor/gather_nd.cc	Reworks GatherND invalid-index tracking to avoid “0 as sentinel” collision.
js/web/lib/wasm/jsep/webgpu/ops/slice.ts	Adds a JS WebGPU Slice output-dimension negativity check.
js/web/lib/wasm/jsep/util.ts	Fixes axis normalization range check and adds negative-dimension protection in padShape().
js/web/lib/onnxjs/util.ts	Same axis normalization fix and padShape() negative-dimension protection for onnxjs.
js/web/lib/onnxjs/backends/webgl/ops/slice.ts	Adds a JS WebGL Slice output-dimension negativity check.
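For the axis normalization fix, ONNX axes may be negative and count from the back, so the accepted range for a tensor of rank r is [-r, r - 1]. Below is a minimal C++ sketch of that range check. `NormalizeAxis` is a hypothetical name, and the actual fix is in TypeScript in `js/web/lib/wasm/jsep/util.ts` and `js/web/lib/onnxjs/util.ts`.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: accepts axis in [-rank, rank - 1] and maps negative
// axes to their non-negative equivalent. For rank 0 the range is empty, so
// every axis is rejected.
std::optional<int64_t> NormalizeAxis(int64_t axis, int64_t rank) {
  if (axis < -rank || axis >= rank) return std::nullopt;
  return axis < 0 ? axis + rank : axis;
}
```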


GatherND used err_index==0 as a sentinel for 'no validation error', but 0 is also a valid tensor index. When a dimension has size 0, index 0 correctly fails the bounds check (0 >= 0), but err_index is set to 0, which is indistinguishable from success. This allowed the operator to proceed with an out-of-bounds memcpy, leaking process heap memory. Replace the sentinel pattern with a std::atomic<bool> error flag that is independent of the index value. The err_index variable is retained solely for the error message.

Files changed:
- onnxruntime/core/providers/cpu/tensor/gather_nd.cc
- onnxruntime/test/providers/cpu/tensor/gather_nd_op_test.cc

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Other EPs may not support 1D GatherND with scalar output, causing test failures unrelated to the sentinel-value fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions

You can commit the suggested changes from lintrunner.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


Copilot · 2026-04-06T05:57:36Z

      const auto upper_limit = input_shape[SafeInt<size_t>(batch_dims_) + dim_idx];
      const auto lower_limit = -upper_limit;
      if (index < lower_limit || index >= upper_limit) {
-        err_index = index;
+        has_invalid_index.store(true, std::memory_order_relaxed);
+        err_index.store(index, std::memory_order_relaxed);
        break;


has_invalid_index and err_index are updated via separate memory_order_relaxed stores. With relaxed ordering on two different atomics, a load that observes has_invalid_index == true can still observe err_index at its default (0), producing a misleading error message (the PR notes err_index is retained for messaging). Consider ordering the updates with release/acquire (e.g., store err_index, then store has_invalid_index with memory_order_release, and load the flag with memory_order_acquire before loading err_index), or collapsing to a single atomic sentinel that cannot collide with a valid index value.

Address review feedback: with relaxed ordering on separate atomics, a load of has_invalid_index could observe true while err_index is still at its default. Store err_index first (relaxed), then publish has_invalid_index with release, and load the flag with acquire to guarantee err_index visibility.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
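Here is a self-contained sketch of that ordering. `std::thread` stands in for `TryParallelFor` (an assumption for illustration), and `ParallelFindInvalidIndex` is a hypothetical name. Each worker writes `err_index` first and then publishes `has_invalid_index` with `memory_order_release`. The reader loads the flag with `memory_order_acquire` before reading `err_index`.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the TryParallelFor loop: each worker checks a
// contiguous chunk of indices against [-dim_size, dim_size).
// Returns true if any index is out of range; *bad_index receives one offending value.
bool ParallelFindInvalidIndex(const std::vector<int64_t>& indices, int64_t dim_size,
                              unsigned num_threads, int64_t* bad_index) {
  if (num_threads == 0) num_threads = 1;
  std::atomic<bool> has_invalid_index{false};
  std::atomic<int64_t> err_index{0};

  const size_t n = indices.size();
  const size_t chunk = (n + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    const size_t begin = t * chunk;
    const size_t end = std::min(n, begin + chunk);
    if (begin >= end) break;
    workers.emplace_back([&, begin, end] {
      for (size_t i = begin; i < end; ++i) {
        const int64_t index = indices[i];
        if (index < -dim_size || index >= dim_size) {
          // Write the payload first, then publish the flag with release.
          err_index.store(index, std::memory_order_relaxed);
          has_invalid_index.store(true, std::memory_order_release);
          break;
        }
      }
    });
  }
  for (auto& w : workers) w.join();

  // join() already makes worker writes visible here; the acquire load mirrors
  // the PR and would keep err_index consistent with the flag even if the flag
  // were read while workers were still running.
  if (has_invalid_index.load(std::memory_order_acquire)) {
    *bad_index = err_index.load(std::memory_order_relaxed);
    return true;
  }
  return false;
}
```

If several workers hit bad indices, `err_index` holds one of their values, not necessarily the first in index order. That is acceptable here because it only feeds the error message. The flag alone decides whether the operator fails.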

github-actions

You can commit the suggested changes from lintrunner.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

tianleiwu requested a review from Copilot April 3, 2026 18:44

Copilot started reviewing on behalf of tianleiwu April 3, 2026 18:45

Copilot AI reviewed Apr 3, 2026

bmehta001 force-pushed the fix/gathernd-sentinel-oob-read branch from 3fc2e9b to 7be69dc April 4, 2026 05:55

bmehta001 and others added 2 commits April 4, 2026 01:44

Restrict GatherND valid-index-zero test to CPU EP only

4921cde


bmehta001 force-pushed the fix/gathernd-sentinel-oob-read branch from 7be69dc to 4be3f33 April 4, 2026 06:45

github-actions Bot reviewed Apr 4, 2026

Comment thread on onnxruntime/core/providers/cpu/tensor/gather_nd.cc (outdated)

bmehta001 self-assigned this Apr 4, 2026

bmehta001 changed the title from "Fix gathernd sentinel oob read" to "Fix GatherND sentinel-value collision enabling OOB memory disclosure" Apr 4, 2026

bmehta001 requested a review from Copilot April 5, 2026 18:27

Copilot started reviewing on behalf of bmehta001 April 5, 2026 18:28

Copilot AI reviewed Apr 5, 2026

bmehta001 requested a review from Copilot April 6, 2026 05:52

Copilot started reviewing on behalf of bmehta001 April 6, 2026 05:53

Copilot AI reviewed Apr 6, 2026

bmehta001 force-pushed the fix/gathernd-sentinel-oob-read branch from 67841e9 to 9756ff0 April 6, 2026 16:05

github-actions Bot reviewed Apr 6, 2026

Comment thread on onnxruntime/core/providers/cpu/tensor/gather_nd.cc (outdated)

Update onnxruntime/core/providers/cpu/tensor/gather_nd.cc

ff2ea8d

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

bmehta001 wants to merge 4 commits into main from fix/gathernd-sentinel-oob-read
