Address CUDA ReduceL2 negative results and add test coverage by yuslepukhin · Pull Request #28681 · microsoft/onnxruntime

yuslepukhin · 2026-05-26T22:27:07Z

This pull request improves the correctness and robustness of L1 and L2 reduction operations (norms) for both CPU and CUDA backends, especially for integer types. It addresses issues with undefined behavior and overflow when computing absolute values and sums/squares, particularly for edge cases like the minimum representable integer value and large accumulations. The changes also introduce comprehensive tests to ensure correct handling of these cases.

The most important changes are:

CPU backend: safer and more robust norm reductions

Added a saturating_abs helper for integer types to safely compute absolute values, returning the maximum representable value instead of overflowing for values like INT_MIN. (onnxruntime/core/providers/cpu/reduction/reduction_ops.h, onnxruntime/core/providers/cpu/reduction/reduction_ops.hR722-R782)
Modified ReduceAggregatorL1 and ReduceAggregatorL2 to accumulate in double for integer types, avoiding overflow during sum or sum-of-squares, and saturate the result to the type's maximum value if necessary. (onnxruntime/core/providers/cpu/reduction/reduction_ops.h, [1] [2]

CUDA backend: saturating absolute value and correct no-op reduction handling

Introduced Impl_SaturatingAbs for CUDA, which safely computes absolute values for norm reductions, including handling of the minimum signed integer. (onnxruntime/core/providers/cuda/reduction/reduction_functions.cu, [1]; onnxruntime/core/providers/cuda/reduction/reduction_functions.h, [2]
Updated reduction kernels to use Impl_SaturatingAbs for norm operations (L1/L2) when the reduction is a no-op (input and output shapes are the same), ensuring the result is always non-negative and avoiding undefined behavior. (onnxruntime/core/providers/cuda/reduction/reduction_ops.cc, [1] [2] [3]

Testing: improved coverage for edge cases

Added new tests for L1 reductions with negative values, singleton axes, and integer overflow scenarios, including cases for INT_MIN and summation overflow, to verify correct saturation and behavior across data types and backends. (onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc, onnxruntime/test/providers/cpu/reduction/reduction_ops_test.ccR350-R449)

Copilot

Pull request overview

This PR fixes CUDA ReduceL1/ReduceL2 correctness for “no-op” reductions where the input and output element counts are the same (e.g., reducing a singleton axis, or scalar-like cases), ensuring norm semantics by producing non-negative results. It also adds regression tests targeting negative inputs for singleton-axis reductions across several data types.

Changes:

CUDA: for ReduceL1/ReduceL2 no-op reductions (input_count == output_count), compute elementwise absolute value instead of memcpy/copy-through behavior.
CUDA: apply the same no-op norm handling across multiple reduction code paths (shared helper, core compute, and specialized int path).
Tests: add new ReduceL1/ReduceL2 cases for negative inputs with singleton axes (keepdims and non-keepdims; multiple dtypes).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`onnxruntime/core/providers/cuda/reduction/reduction_ops.cc`	Applies `Abs` for CUDA no-op `ReduceL1`/`ReduceL2` to enforce non-negative norm results instead of copying negative values.
`onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc`	Adds regression tests for singleton-axis reductions with negative inputs across multiple types (including FP16 when CUDA is enabled).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

+  // For int64 inputs, values > 2^53 lose precision, but the final truncation to int64
+  // makes the error at most ±1.


+  test.AddInput<int32_t>("data", {1}, {std::numeric_limits<int32_t>::min()});
+  // abs(INT_MIN) overflows; we expect saturation to INT_MAX.
+  test.AddOutput<int32_t>("reduced", {}, {std::numeric_limits<int32_t>::max()});
+  test.Run(OpTester::ExpectResult::kExpectSuccess, "", {kTensorrtExecutionProvider});


+  test.AddOutput<int32_t>("reduced", {}, {std::numeric_limits<int32_t>::max()});
+  // Only CPU handles saturation; CUDA casts to float (loses precision but doesn't overflow).
+  test.Run(OpTester::ExpectResult::kExpectSuccess, "",
+           {kTensorrtExecutionProvider, kCudaExecutionProvider, kRocmExecutionProvider});


tianleiwu

Review Summary

The approach is sound — using double accumulators for CPU integer reductions and saturating abs for CUDA no-op paths correctly eliminates UB and overflow. Tests are comprehensive.

One blocking issue: missing CUDA template instantiations will cause linker failures (see inline comment).

tianleiwu · 2026-05-26T23:49:34Z

+template void Impl_SaturatingAbs<double>(cudaStream_t, const double*, double*, size_t);
+template void Impl_SaturatingAbs<half>(cudaStream_t, const half*, half*, size_t);
+template void Impl_SaturatingAbs<BFloat16>(cudaStream_t, const BFloat16*, BFloat16*, size_t);
+template void Impl_SaturatingAbs<int32_t>(cudaStream_t, const int32_t*, int32_t*, size_t);


⚠️ Missing explicit instantiations for int64_t and int8_t — will cause a linker error.

The SPECIALIZED_REDUCEKERNEL_COMPUTEIMPL macro (expanded for int32_t, int64_t, int8_t at lines 834-836 of reduction_ops.cc) now calls Impl_SaturatingAbs<CudaT> guarded by a runtime cudnn_reduce_op check. Since the check isn't compile-time, the compiler emits the call for all three expansions, and the linker needs the symbols.

Even though ReduceL1/L2 aren't registered for int64_t/int8_t on CUDA (dead code at runtime), the linker still requires the instantiations.

template void Impl_SaturatingAbs<int64_t>(cudaStream_t, const int64_t*, int64_t*, size_t); template void Impl_SaturatingAbs<int8_t>(cudaStream_t, const int8_t*, int8_t*, size_t);

Address the issue and add test coverage

2e86226

yuslepukhin requested a review from Copilot May 26, 2026 22:27

Copilot started reviewing on behalf of yuslepukhin May 26, 2026 22:27 View session

Copilot AI reviewed May 26, 2026

View reviewed changes

Comment thread onnxruntime/core/providers/cuda/reduction/reduction_ops.cc Outdated

Comment thread onnxruntime/core/providers/cuda/reduction/reduction_ops.cc Outdated

Comment thread onnxruntime/test/providers/cpu/reduction/reduction_ops_test.cc

yuslepukhin added 2 commits May 26, 2026 16:36

Address review comments

febdd3e

Add test coverage

5308513

yuslepukhin requested a review from Copilot May 26, 2026 23:39

Copilot started reviewing on behalf of yuslepukhin May 26, 2026 23:39 View session

Copilot AI reviewed May 26, 2026

View reviewed changes

tianleiwu requested changes May 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Address CUDA ReduceL2 negative results and add test coverage#28681

Address CUDA ReduceL2 negative results and add test coverage#28681
yuslepukhin wants to merge 3 commits into
mainfrom
yuslepukhin/reducel2_negative

yuslepukhin commented May 26, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

tianleiwu left a comment

Uh oh!

tianleiwu May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		// For int64 inputs, values > 2^53 lose precision, but the final truncation to int64
		// makes the error at most ±1.

Conversation

yuslepukhin commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

tianleiwu left a comment

Choose a reason for hiding this comment

Review Summary

Uh oh!

tianleiwu May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yuslepukhin commented May 26, 2026 •

edited

Loading