Fix OOB reads in SoftmaxCrossEntropyLoss via label bounds validation#28004
Conversation
Pull request overview
This PR hardens runtime validation to prevent out-of-bounds reads by ensuring indices/sizes are validated before being used for tensor indexing in training loss ops (and also adds additional validation for MatMulBnb4 inputs).
Changes:
- Add label-value bounds validation for `SoftmaxCrossEntropyLoss` and `SoftmaxCrossEntropyLossGrad` (skipping `ignore_index`) and validate `weight_shape[0] == C` on CPU.
- Add regression tests that expect failures for out-of-range labels for `SoftmaxCrossEntropyLoss` (forward + grad).
- Add `K`/`N`/`block_size` positivity checks and minimum-size validation for the MatMulBnb4 `B` and `absmax` inputs (CPU + CUDA), plus new negative tests.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| orttraining/orttraining/training_ops/cpu/loss/softmax_cross_entropy_loss.cc | Adds CPU-side bounds validation for labels and weight shape in SoftmaxCrossEntropyLoss forward/grad paths. |
| orttraining/orttraining/test/training_ops/cuda/cross_entropy_test.cc | Adds new failure-mode tests for out-of-range labels (currently duplicated with new CPU test file). |
| orttraining/orttraining/test/training_ops/cpu/loss/cross_entropy_test.cc | Adds new CPU regression tests for out-of-range labels (currently duplicates CUDA test names). |
| onnxruntime/contrib_ops/cpu/quantization/matmul_bnb4.cc | Adds attribute positivity checks and input-size validation for MatMulBnb4 on CPU. |
| onnxruntime/contrib_ops/cuda/quantization/matmul_bnb4.cc | Adds attribute positivity checks and input-size validation for MatMulBnb4 on CUDA. |
| onnxruntime/test/contrib_ops/matmul_bnb4_test.cc | Adds new negative tests for undersized MatMulBnb4 inputs. |
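The MatMulBnb4 validation listed above amounts to checking that the packed weight tensor `B` and the scale tensor `absmax` are at least as large as `K`, `N`, and `block_size` imply. A minimal sketch of that idea, with a hypothetical helper name and an assumed packing layout (two 4-bit values per byte, one scale per block) that may not match ORT's exact layout:

```cpp
#include <cstdint>

// Illustrative sketch only: checks attribute positivity plus minimum
// sizes for B (packed 4-bit weights) and absmax (per-block scales),
// assuming 2 quantized values per byte and ceil(K*N / block_size) blocks.
bool MatMulBnb4SizesValid(int64_t K, int64_t N, int64_t block_size,
                          int64_t b_size_bytes, int64_t absmax_elems) {
  if (K <= 0 || N <= 0 || block_size <= 0) return false;  // positivity checks
  const int64_t total = K * N;
  const int64_t min_b_bytes = (total + 1) / 2;                       // 2 values per byte
  const int64_t num_blocks = (total + block_size - 1) / block_size;  // ceil-div
  return b_size_bytes >= min_b_bytes && absmax_elems >= num_blocks;
}
```

Rejecting undersized inputs up front means the dequantization loop never reads past the end of either buffer.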
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Add bounds checking for label tensor values in SoftmaxCrossEntropyLoss and SoftmaxCrossEntropyLossGrad to prevent out-of-bounds memory reads when processing untrusted ONNX models. The operators used `label_data` values as array indices into the `log_prob_data` and `weight_data` buffers without validating that they fall within `[0, C)`, where `C` is the number of classes. A malicious model could embed arbitrary label values, causing heap OOB reads.

Changes:
- Add label value validation in both forward and backward Compute methods
- Add weight tensor size validation (`weight_shape[0] == C`)
- Move the `weight_data[label]` access after the `ignore_index` check in the grad path to prevent OOB when `ignore_index` is outside `[0, C)`
- Add regression tests for both forward and backward paths

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
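The core check can be expressed as a standalone predicate (illustrative names; the real fix uses `ORT_RETURN_IF` inside the kernel rather than a helper like this): every label must lie in `[0, C)` unless it equals `ignore_index`, which is allowed to be any value.

```cpp
#include <cstdint>
#include <vector>

// Sketch of the label validation described above. Returns false if any
// non-ignored label would index out of bounds into a C-element buffer.
bool LabelsInBounds(const std::vector<int64_t>& labels,
                    int64_t C, int64_t ignore_index) {
  for (int64_t label : labels) {
    if (label == ignore_index) continue;        // ignore_index may be outside [0, C)
    if (label < 0 || label >= C) return false;  // would be an OOB read
  }
  return true;
}
```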
Force-pushed from f2c15e3 to f20400c.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
```cpp
// Validate label values are within [0, C) to prevent out-of-bounds reads.
for (int64_t i = 0; i < N_D; i++) {
  if (ignore_index != label_data[i]) {
    ORT_RETURN_IF(label_data[i] < 0 || label_data[i] >= C,
```
Should we move that test in one of the loops below where the label is compared a second time to ignore_index?
> Should we move that test in one of the loops below where the label is compared a second time to ignore_index?
I think the separate validation pass is the better approach here, for a few reasons:
- Fail-fast: validates all labels before any computation begins, so we never produce partial/corrupt output on bad input.
- Parallel safety: the Grad path uses `TryParallelFor`, and propagating errors out of parallel lambdas is tricky. A separate upfront check avoids that complexity.
- Negligible cost: the label scan is O(N_D) vs O(N_D × C) for the main computation, so the extra pass should not be measurable in practice.
That said, I'm open to folding it in if you feel strongly about it. Just wanted to flag the tradeoffs.
Description

Fix out-of-bounds reads in `SoftmaxCrossEntropyLoss` and `SoftmaxCrossEntropyLossGrad` when label values are outside `[0, C)`. Same class of bug fixed in #27568 for `SparseSoftmaxCrossEntropy`. CUDA kernels already have this check.

Changes

- Validate label values are within `[0, C)` before indexing (forward + backward), skipping `ignore_index`
- Validate `weight_shape[0] == C`
- Move the `weight_data[label]` access after the `ignore_index` check in the grad weighted paths

Tests
4 regression tests: label too large, negative label, OOB with weights, grad OOB.