Fix OOB reads in SoftmaxCrossEntropyLoss via label bounds validation#28004

Open

vraspar wants to merge 1 commit into main from vraspar/fix-softmax-ce-loss-oob-read

Conversation

Contributor

@vraspar vraspar commented Apr 7, 2026

Description

Fix out-of-bounds reads in SoftmaxCrossEntropyLoss and SoftmaxCrossEntropyLossGrad when label values are outside [0, C).

This is the same class of bug that #27568 fixed for SparseSoftmaxCrossEntropy. The CUDA kernels already perform this check.

Changes

  • Validate label values are in [0, C) before indexing (forward + backward), skipping ignore_index
  • Validate weight_shape[0] == C
  • Move weight_data[label] access after ignore_index check in grad weighted paths

Tests

Adds 4 regression tests: label too large, negative label, OOB with weights, and grad OOB.
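As a rough standalone illustration of the rule described in the Changes list (not the actual kernel code), the bounds check can be sketched as below; the function name `labels_in_range`, the vector-based signature, and the boolean return are hypothetical simplifications of the `ORT_RETURN_IF` status handling used in the real operator:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the check this PR adds: every label must lie in
// [0, C) unless it equals ignore_index, in which case any value is allowed
// because the row is skipped entirely. The real kernel returns a Status via
// ORT_RETURN_IF instead of a bool.
bool labels_in_range(const std::vector<int64_t>& labels, int64_t C,
                     int64_t ignore_index) {
  for (int64_t label : labels) {
    if (label == ignore_index) continue;       // ignored rows may hold any value
    if (label < 0 || label >= C) return false;  // would read OOB from log_prob_data
  }
  return true;
}
```

With C = 3 and ignore_index = -100, a label of 3 or -1 fails while -100 passes, matching the regression cases listed above.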

Contributor

Copilot AI left a comment


Pull request overview

This PR hardens runtime validation to prevent out-of-bounds reads by ensuring indices/sizes are validated before being used for tensor indexing in training loss ops (and also adds additional validation for MatMulBnb4 inputs).

Changes:

  • Add label-value bounds validation for SoftmaxCrossEntropyLoss and SoftmaxCrossEntropyLossGrad (skipping ignore_index) and validate weight_shape[0] == C on CPU.
  • Add regression tests that expect failures for out-of-range labels for SoftmaxCrossEntropyLoss (forward + grad).
  • Add K/N/block_size positivity checks and minimum-size validation for MatMulBnb4 B and absmax inputs (CPU + CUDA), plus new negative tests.
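The MatMulBnb4 minimum-size checks imply some ceiling arithmetic. As a sketch under the assumption (not verified against the kernel source) that B packs two 4-bit values per byte and absmax holds one scale per block_size-sized block of the flattened K*N weight buffer, the minimums would be:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimum-size math for MatMulBnb4 inputs. The packing layout
// assumed here (two 4-bit weights per byte; quantization blocks tiling the
// flattened K*N buffer) is an illustration, not taken from the kernel.
int64_t min_b_bytes(int64_t K, int64_t N) {
  return (K * N + 1) / 2;  // two 4-bit weights per byte, rounded up
}

int64_t min_absmax_elems(int64_t K, int64_t N, int64_t block_size) {
  return (K * N + block_size - 1) / block_size;  // ceil(K*N / block_size)
}
```

An undersized B or absmax tensor, as in the new negative tests, would fail these minimums before any dequantization touches the buffers.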

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

Summary per file:

  • orttraining/orttraining/training_ops/cpu/loss/softmax_cross_entropy_loss.cc: Adds CPU-side bounds validation for labels and weight shape in SoftmaxCrossEntropyLoss forward/grad paths.
  • orttraining/orttraining/test/training_ops/cuda/cross_entropy_test.cc: Adds new failure-mode tests for out-of-range labels (currently duplicated with the new CPU test file).
  • orttraining/orttraining/test/training_ops/cpu/loss/cross_entropy_test.cc: Adds new CPU regression tests for out-of-range labels (currently duplicates CUDA test names).
  • onnxruntime/contrib_ops/cpu/quantization/matmul_bnb4.cc: Adds attribute positivity checks and input-size validation for MatMulBnb4 on CPU.
  • onnxruntime/contrib_ops/cuda/quantization/matmul_bnb4.cc: Adds attribute positivity checks and input-size validation for MatMulBnb4 on CUDA.
  • onnxruntime/test/contrib_ops/matmul_bnb4_test.cc: Adds new negative tests for undersized MatMulBnb4 inputs.


Comment threads:

  • orttraining/orttraining/test/training_ops/cuda/cross_entropy_test.cc (outdated)
  • orttraining/orttraining/test/training_ops/cuda/cross_entropy_test.cc (outdated)
  • onnxruntime/contrib_ops/cpu/quantization/matmul_bnb4.cc
  • onnxruntime/contrib_ops/cuda/quantization/matmul_bnb4.cc (outdated)
  • onnxruntime/test/contrib_ops/matmul_bnb4_test.cc
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.



Comment threads:

  • onnxruntime/contrib_ops/cpu/quantization/matmul_bnb4.cc (outdated)
  • onnxruntime/contrib_ops/cuda/quantization/matmul_bnb4.cc (outdated)
@vraspar vraspar requested a review from adrianlizarraga April 13, 2026 22:13
Add bounds checking for label tensor values in SoftmaxCrossEntropyLoss
and SoftmaxCrossEntropyLossGrad to prevent out-of-bounds memory reads
when processing untrusted ONNX models.

The operators used label_data values as array indices into log_prob_data
and weight_data buffers without validating they fall within [0, C) where
C is the number of classes. A malicious model could embed arbitrary label
values causing heap OOB reads.

Changes:
- Add label value validation in both forward and backward Compute methods
- Add weight tensor size validation (weight_shape[0] == C)
- Move weight_data[label] access after ignore_index check in grad path
  to prevent OOB when ignore_index is outside [0, C)
- Add regression tests for both forward and backward paths

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@vraspar vraspar force-pushed the vraspar/fix-softmax-ce-loss-oob-read branch from f2c15e3 to f20400c on April 16, 2026 18:38
@vraspar vraspar requested a review from Copilot April 16, 2026 18:38
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.



Comment on lines 154 to +163

```cpp
const T2* label_data = label.template Data<T2>();

// Validate label values are within [0, C) to prevent out-of-bounds reads.
for (int64_t i = 0; i < N_D; i++) {
  if (ignore_index != label_data[i]) {
    ORT_RETURN_IF(label_data[i] < 0 || label_data[i] >= C,
                  "SoftmaxCrossEntropyLoss: label value ", label_data[i],
                  " at index ", i, " is out of range [0, ", C, ")");
  }
}
```

Copilot AI Apr 16, 2026


These new full-tensor validation passes add an extra O(N*D) scan in both forward and backward. Since the kernels already iterate labels during the main computation, consider folding the bounds check into the existing per-sample/per-row loops where label_data is consumed (still skipping ignore_index). This preserves the safety fix while avoiding an additional memory pass on large batches.

Suggested change

Remove:

```cpp
const T2* label_data = label.template Data<T2>();

// Validate label values are within [0, C) to prevent out-of-bounds reads.
for (int64_t i = 0; i < N_D; i++) {
  if (ignore_index != label_data[i]) {
    ORT_RETURN_IF(label_data[i] < 0 || label_data[i] >= C,
                  "SoftmaxCrossEntropyLoss: label value ", label_data[i],
                  " at index ", i, " is out of range [0, ", C, ")");
  }
}
```

Replace with:

```cpp
struct CheckedLabelData {
  const T2* data;
  int64_t class_count;
  T2 ignored_label;
  T2 operator[](int64_t index) const {
    const T2 value = data[index];
    ORT_ENFORCE(value == ignored_label || (value >= 0 && value < class_count),
                "SoftmaxCrossEntropyLoss: label value ", value,
                " at index ", index, " is out of range [0, ", class_count, ")");
    return value;
  }
} label_data{label.template Data<T2>(), C, ignore_index};
```

Comment on lines +286 to +293

```cpp
// Validate label values are within [0, C) to prevent out-of-bounds reads.
for (int64_t i = 0; i < N_D; i++) {
  if (ignore_index != label_data[i]) {
    ORT_RETURN_IF(label_data[i] < 0 || label_data[i] >= C,
                  "SoftmaxCrossEntropyLossGrad: label value ", label_data[i],
                  " at index ", i, " is out of range [0, ", C, ")");
  }
}
```

Copilot AI Apr 16, 2026


These new full-tensor validation passes add an extra O(N*D) scan in both forward and backward. Since the kernels already iterate labels during the main computation, consider folding the bounds check into the existing per-sample/per-row loops where label_data is consumed (still skipping ignore_index). This preserves the safety fix while avoiding an additional memory pass on large batches.

Suggested change

Remove:

```cpp
// Validate label values are within [0, C) to prevent out-of-bounds reads.
for (int64_t i = 0; i < N_D; i++) {
  if (ignore_index != label_data[i]) {
    ORT_RETURN_IF(label_data[i] < 0 || label_data[i] >= C,
                  "SoftmaxCrossEntropyLossGrad: label value ", label_data[i],
                  " at index ", i, " is out of range [0, ", C, ")");
  }
}
```

Replace with:

```cpp
// Validate labels inline in the main backward loop where label_data is consumed,
// so we preserve bounds checking without adding an extra full-tensor memory pass.
const auto validate_label = [ignore_index, C](T2 label_value, int64_t index) -> Status {
  if (ignore_index != label_value) {
    ORT_RETURN_IF(label_value < 0 || label_value >= C,
                  "SoftmaxCrossEntropyLossGrad: label value ", label_value,
                  " at index ", index, " is out of range [0, ", C, ")");
  }
  return Status::OK();
};
```

@@ -0,0 +1,99 @@

```cpp
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.
```

Copilot AI Apr 16, 2026


This test file uses std::vector and std::string but does not include <vector> / <string> directly. Please add the proper standard headers to avoid relying on transitive includes, which can break with toolchain or include-order changes.

Suggested change

```cpp
#include <string>
#include <vector>
```

```cpp
// Validate label values are within [0, C) to prevent out-of-bounds reads.
for (int64_t i = 0; i < N_D; i++) {
  if (ignore_index != label_data[i]) {
    ORT_RETURN_IF(label_data[i] < 0 || label_data[i] >= C,
```
Member


Should we move that test in one of the loops below where the label is compared a second time to ignore_index?

4 participants