Add LpNormalization support for CUDA Execution Provider by apsonawane · Pull Request #28724 · microsoft/onnxruntime

apsonawane · 2026-05-29T20:25:17Z

This pull request adds CUDA (GPU) support for the LpNormalization ONNX operator in ONNX Runtime, including implementation, kernel registration, and new unit tests (notably for FP16). The main changes involve adding the CUDA kernel, wiring it up for opsets 1–22, and extending the test suite to cover new scenarios and datatypes.

CUDA LpNormalization Operator Support:

Implemented CUDA kernel for LpNormalization supporting float, double, and MLFloat16 datatypes, with efficient handling for both L1 and L2 normalization.
Registered the CUDA kernel for LpNormalization for opsets 1–21 (versioned) and opset 22 (current), for all supported datatypes (float, double, MLFloat16).
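
For readers unfamiliar with the operator, here is a minimal CPU reference sketch in C++ of what LpNormalization computes along one axis. It is not the CUDA kernel from this PR. The function name `LpNormalizeRef` and the outer/inner index decomposition are assumptions made for illustration, and writing zeros for a zero-norm slice follows the behavior described in the review discussion on this PR rather than the ONNX spec.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for LpNormalization over one axis of a
// row-major tensor. Illustration only; not the ONNX Runtime implementation.
std::vector<float> LpNormalizeRef(const std::vector<float>& x,
                                  const std::vector<int64_t>& shape,
                                  int64_t axis, int p) {
  const int64_t rank = static_cast<int64_t>(shape.size());
  if (axis < 0) axis += rank;  // negative axes count from the end

  const int64_t norm_size = shape[static_cast<std::size_t>(axis)];
  int64_t outer = 1;  // number of slices before the axis
  for (int64_t d = 0; d < axis; ++d) outer *= shape[static_cast<std::size_t>(d)];
  int64_t inner = 1;  // stride between consecutive elements along the axis
  for (int64_t d = axis + 1; d < rank; ++d) inner *= shape[static_cast<std::size_t>(d)];

  std::vector<float> y(x.size(), 0.0f);
  for (int64_t o = 0; o < outer; ++o) {
    for (int64_t i = 0; i < inner; ++i) {
      const int64_t base = o * norm_size * inner + i;

      // Accumulate in double so the reference itself does not overflow.
      double norm = 0.0;
      for (int64_t k = 0; k < norm_size; ++k) {
        const double v = x[static_cast<std::size_t>(base + k * inner)];
        norm += (p == 1) ? std::fabs(v) : v * v;
      }
      if (p == 2) norm = std::sqrt(norm);

      // Assumption: a zero-norm slice is written as zeros (y is pre-zeroed).
      if (norm == 0.0) continue;

      for (int64_t k = 0; k < norm_size; ++k) {
        const std::size_t idx = static_cast<std::size_t>(base + k * inner);
        y[idx] = static_cast<float>(x[idx] / norm);
      }
    }
  }
  return y;
}
```

For example, calling `LpNormalizeRef({3.0f, 4.0f}, {2}, 0, 2)` normalizes the vector (3, 4) by its L2 norm of 5, and a column that is entirely zero stays zero under the assumption above.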

Testing and Validation:

Added new unit tests for LpNormalization covering FP16, various axes, and both L1/L2 normalization, ensuring CUDA kernel correctness and excluding unsupported providers.
Updated the backend test filters (onnx_backend_test_series_filters.jsonc) so that only the specific zero-norm l2normalization cases remain skipped.

These changes collectively enable and validate GPU-accelerated LpNormalization in ONNX Runtime for a wide range of models and datatypes.

Copilot

Pull request overview

This PR adds a CUDA implementation of the ONNX LpNormalization operator to ONNX Runtime (opsets 1–22), and extends the unit tests and backend-test filters to validate/track the new support (including FP16 scenarios).

Changes:

Added CUDA kernel implementation for LpNormalization (float/double/MLFloat16) and wired it into the CUDA EP kernel registration for opsets 1–22.
Added new unit tests covering FP16 and additional axis scenarios.
Updated ONNX backend test series filters to narrow the currently-skipped l2normalization zero-norm cases.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

File	Description
onnxruntime/test/testdata/onnx_backend_test_series_filters.jsonc	Refines skipped `l2normalization` backend tests to specific zero-norm cases.
onnxruntime/test/providers/cpu/nn/lp_norm_op_test.cc	Adds new `LpNormalization` tests (including FP16 and additional axes).
onnxruntime/core/providers/cuda/nn/lp_norm.h	Introduces CUDA kernel class wrapper for `LpNormalization`.
onnxruntime/core/providers/cuda/nn/lp_norm.cc	Implements CUDA kernel registration + `ComputeInternal` calling into the CUDA impl.
onnxruntime/core/providers/cuda/nn/lp_norm_impl.h / .cu	Adds CUDA device implementation for L1/L2 normalization.
onnxruntime/core/providers/cuda/cuda_execution_provider.cc	Registers the new `LpNormalization` CUDA kernels for opsets 1–22.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

tianleiwu

Solid, well-scoped addition of the CUDA LpNormalization kernel. The element indexing exactly mirrors the existing CPU kernel, the block reduction uses a power-of-two thread count via NextPowerOfTwo, and FP16 accumulation is correctly done in float via AccumulationType_t<T>. The major concerns from earlier review rounds (non-power-of-two reduction, FP16 accumulation overflow, empty-tensor division-by-zero, and test EP setup) are all addressed in the current head.
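
To illustrate why the power-of-two thread count matters, here is a small CPU simulation in C++ of a halving (tree) reduction over a shared buffer. It is a sketch under stated assumptions: `NextPowerOfTwoSketch` and `BlockReduceSumSketch` are hypothetical names, and the loop structure is a common textbook pattern rather than the code in lp_norm_impl.cu.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a NextPowerOfTwo helper: the smallest power of
// two that is >= n. Assumes 1 <= n <= 2^31 so the shift cannot wrap.
inline uint32_t NextPowerOfTwoSketch(uint32_t n) {
  uint32_t p = 1;
  while (p < n) p <<= 1;
  return p;
}

// CPU simulation of a block-wide sum with `num_threads` simulated threads.
// The halving loop is only correct when num_threads is a power of two.
inline float BlockReduceSumSketch(const std::vector<float>& values,
                                  uint32_t num_threads) {
  std::vector<float> shared(num_threads, 0.0f);

  // Each simulated thread t accumulates elements t, t + num_threads, ...
  for (uint32_t t = 0; t < num_threads; ++t) {
    for (std::size_t k = t; k < values.size(); k += num_threads) {
      shared[t] += values[k];
    }
  }

  // Halve the active range each step; on a GPU a barrier would separate steps.
  for (uint32_t stride = num_threads / 2; stride > 0; stride >>= 1) {
    for (uint32_t t = 0; t < stride; ++t) {
      shared[t] += shared[t + stride];
    }
  }
  return shared[0];
}
```

With five inputs 1..5, rounding the thread count up to 8 yields the full sum of 15, while running the same loop with 5 threads silently drops the last slot and yields 10. That is one way a non-power-of-two reduction can go wrong.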

A couple of minor, non-blocking suggestions are left inline. One clarification on the prior automated review: the zero-norm branch writing 0 actually matches the ORT CPU kernel (yVec.setZero()); both diverge from the ONNX spec equally, which is why the corresponding backend tests remain in current_failing_tests. So the kernel is consistent with the CPU path here.

Remaining items:

A coverage gap: the FP16 tests use norm_size = 3 with small magnitudes, so they pass with or without the accumulation fix. A test with a longer axis (128 to 256 elements) and larger magnitudes would actually exercise the float-accumulation overflow protection.
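
The arithmetic behind this suggestion can be sketched in C++ as follows. The helper names are hypothetical, and the check only compares a float-accumulated sum of squares against 65504, the largest finite IEEE binary16 value. It does not model FP16 rounding of partial sums.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Largest finite value representable in IEEE 754 binary16 (FP16).
constexpr float kHalfMax = 65504.0f;

// Sum of squares of `n` copies of `magnitude`, accumulated in float.
inline float SumOfSquares(std::size_t n, float magnitude) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < n; ++i) acc += magnitude * magnitude;
  return acc;
}

// True when the L2 sum of squares would exceed the FP16 range, i.e. an
// FP16 accumulator would overflow even though a float accumulator is fine.
inline bool WouldOverflowHalfAccumulator(std::size_t n, float magnitude) {
  return SumOfSquares(n, magnitude) > kHalfMax;
}
```

With norm_size = 3 and magnitude 2 the sum of squares is 12, which fits comfortably in FP16, so such a test cannot distinguish the two accumulation strategies. With 256 elements of magnitude 100, each square (10000) still fits in FP16, but the sum (2,560,000) does not, while the final norm of 1600 is representable again.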

apsonawane added 2 commits May 29, 2026 18:30

Add LpNormalization support for CUDA EP

d92b994

Fix lint

e620ccf

apsonawane requested a review from Copilot May 29, 2026 20:28

Copilot started reviewing on behalf of apsonawane May 29, 2026 20:28

Copilot AI reviewed May 29, 2026

apsonawane requested a review from Copilot June 1, 2026 16:39

Copilot started reviewing on behalf of apsonawane June 1, 2026 16:39

Copilot AI reviewed Jun 1, 2026

Comment thread onnxruntime/core/providers/cuda/nn/lp_norm_impl.cu Outdated

Comment thread onnxruntime/core/providers/cuda/nn/lp_norm_impl.cu Outdated

Comment thread onnxruntime/core/providers/cuda/nn/lp_norm_impl.cu Outdated

Fix comments

9020124

tianleiwu reviewed Jun 1, 2026

Comment thread onnxruntime/core/providers/cuda/nn/lp_norm_impl.cu Outdated

Comment thread onnxruntime/core/providers/cuda/nn/lp_norm_impl.cu Outdated

tianleiwu reviewed Jun 1, 2026

Comment thread onnxruntime/core/providers/cuda/nn/lp_norm.cc Outdated

Fix pipelines

6a67de8

apsonawane wants to merge 4 commits into main from asonawane/lp

3 participants
