
Commit 8c4245e

CUDA Plugin Cleanup for Shared Kernel Helpers (microsoft#27915)
## Description

This PR reduces the amount of CUDA plugin-specific compatibility code by moving reusable validation and attribute-reading logic into shared helper paths that work for both bundled and plugin builds. It also fills in a missing allocator hook in the EP adapter so plugin kernels can reuse the same initialization path as the in-tree CUDA EP, which simplifies maintenance and improves behavior parity. The follow-up changes update the CUDA plugin design doc to reflect the new shared-helper model and add focused plugin regression tests for the two runtime paths that changed most materially.

## Summary of Changes

### EP adapter and shared helper extraction

| File | Change |
|------|--------|
| `ep/adapter/op_kernel_info.h` | Adds `OpKernelInfo::GetAllocator(OrtMemType)` so adapter-based kernels can request device or CPU temp allocators in plugin builds. |
| `cpu/tensor/scatter_nd.h` | Extracts shape validation into `scatter_nd_internal::ValidateShapes` so the same logic can be reused outside the CPU `ScatterND` class. |
| `cpu/tensor/space_depth_ops.h` | Moves blocksize parsing, mode parsing, and dimension validation into `space_depth_internal` helpers that can be shared by CUDA kernels. |

### CUDA kernel cleanup and plugin parity

| File | Change |
|------|--------|
| `cuda/tensor/scatter_nd.cc` | Removes the plugin-only `ScatterND` validation duplicate and reuses the shared helper implementation. |
| `cuda/tensor/scatter_nd.h` | Drops the old conditional include split now that validation is shared through the common helper path. |
| `cuda/tensor/space_depth_ops.h` | Deletes the plugin-only `SpaceToDepth`/`DepthToSpace` reimplementation and inherits from the shared base/helper logic in all builds. |
| `cuda/tensor/upsample.cc` | Reuses the normal antialias lookup-table allocation/caching path in plugin builds via the new allocator adapter support. |
| `cuda/tensor/upsample.h` | Keeps the persistent device lookup-table member available in plugin builds as well. |

### Shared-provider and diagnostics alignment

| File | Change |
|------|--------|
| `cpu/cpu_provider_shared.cc` | Routes shared-provider `ScatterND` shape validation through the extracted helper. |
| `provider_bridge_provider.cc` | Updates the bridge-side `ScatterND::ValidateShapes` implementation to call the shared helper directly. |
| `cuda/cudnn_common.h` | Preserves the batch-norm epsilon warning path in plugin builds instead of suppressing it. |
| `cuda/nn/conv.cc` | Removes plugin-specific shortened cuDNN frontend errors so bundled and plugin builds both include frontend JSON in failures. |
| `cuda/nn/conv_transpose.cc` | Extends cuDNN frontend failures to include frontend JSON for easier debugging, matching the `Conv` behavior. |

### Documentation and regression coverage

| File | Change |
|------|--------|
| `cuda_plugin_ep_design.md` | Updates the design doc to reflect that `ScatterND`, `SpaceDepth`, and `Upsample` now use shared adapter-safe helper paths instead of plugin-only fallback branches. |
| `test_cuda_plugin_ep.py` | Adds plugin regression coverage for antialias `Resize`/`Upsample` and `ScatterND`, covering the new allocator-backed lookup-table path and the shared `ScatterND` validation helper. |

## Testing

- Build with `onnxruntime_BUILD_CUDA_EP_AS_PLUGIN=ON` and verify the affected CUDA provider sources compile without the removed plugin-only fallback paths.
- Run targeted CUDA provider coverage for `ScatterND`, `SpaceToDepth`/`DepthToSpace`, `Resize`/`Upsample`, `Conv`, and `ConvTranspose` in both plugin and bundled CUDA configurations.
- Confirm antialias upsample still initializes and uses the shared lookup table correctly in plugin builds.
- Run the new plugin tests for antialias `Resize` and `ScatterND` in `onnxruntime/test/python/transformers/test_cuda_plugin_ep.py`.
- Confirm cuDNN frontend failure paths now emit the same diagnostic detail in plugin and non-plugin builds.

## Motivation and Context

The initial CUDA plugin enablement introduced several localized `#ifdef BUILD_CUDA_EP_AS_PLUGIN` branches and helper copies to get kernels compiling under the adapter path. This cleanup pays down that compatibility debt by extracting the truly shared pieces into reusable helpers and by teaching the adapter `OpKernelInfo` how to provide the allocators those kernels already expect. The result is less duplicated logic, fewer plugin-only code paths to keep in sync, and better debugging consistency between the plugin EP and the built-in CUDA EP.

## Checklist

- [x] Tests added/updated
- [x] Documentation updated (if applicable)
- [x] No breaking changes (or documented in description)
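
As a closing illustration of the new adapter hook, here is a minimal, hypothetical sketch (not code from this PR) of how a plugin-build kernel can request allocators through `ep::adapter::OpKernelInfo`; the type name `AntialiasLookupTableHolder` is invented for the example:

```cpp
// Hypothetical sketch, not part of this diff. `info` is an
// ep::adapter::OpKernelInfo; its new GetAllocator(OrtMemType) delegates to the
// core framework OpKernelInfo, so plugin builds receive the same allocators as
// the bundled CUDA EP.
struct AntialiasLookupTableHolder {
  explicit AntialiasLookupTableHolder(const OpKernelInfo& info) {
    // Device allocator, e.g. for a persistent antialias lookup table.
    AllocatorPtr device_alloc = info.GetAllocator(OrtMemTypeDefault);
    // CPU-accessible allocator for staging/temp buffers.
    AllocatorPtr cpu_alloc = info.GetAllocator(OrtMemTypeCPU);
    // ... allocate and cache the lookup table with device_alloc ...
  }
};
```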
1 parent e688ef1 commit 8c4245e

15 files changed

Lines changed: 243 additions & 318 deletions


docs/cuda_plugin_ep/cuda_plugin_ep_design.md

Lines changed: 4 additions & 3 deletions
@@ -217,7 +217,7 @@ The primary approach moves pure-computation helpers from CPU `.cc` files to head
 - `roialign.h` — `CheckROIAlignValidInput`, `RoiAlignBase` constructor (templatized on info type)
 - `upsamplebase.h` — `UpsampleBase::AdjustOutputSizeAsPolicy`
 - `crop.h` — `CropBase` constructor (templatized on info type)
-- `space_depth_ops.h` — `SpaceDepthBase` constructor (templatized on info type)
+- `space_depth_ops.h` — `SpaceDepthBase` constructor plus shared `ReadBlocksize`, `ReadIsDCR`, and dimension-validation helpers (templatized on info/context type where needed)
 - `clip.h` — Clip min/max attribute handling (removed `Clip_6Base` CPU dependency)
 - `cuda_common_type_helpers.h` — CUDA type conversion and handle error string helpers (moved from `cuda_common.cc`)
@@ -253,7 +253,8 @@ This allows the base class constructor to work with both the framework `OpKernel
 Some CPU base classes have heavy dependencies (protobuf, `UnpackTensor`) that make inlining impractical:
 
 - **`ConstantOfShapeBase`** — depends on `TensorProto` and `UnpackTensor`. The plugin path in `constant_of_shape.h` stays self-contained: it reuses `ConstantOfShapeCore` but fetches the `value` attribute through the ORT C++ API instead of depending on the full CPU base implementation.
-- **`UpsampleBase`** — partially addressed: `AdjustOutputSizeAsPolicy` moved to header (#27628). Still depends on `InputDefs()` and `OpKernelInfo::GetAllocator()` which are not in the adapter.
+`UpsampleBase` no longer belongs in this category: the adapter now exposes `OpKernelInfo::GetAllocator(OrtMemType)`, and the remaining shape-rank query already has an adapter-safe fallback when `Node::InputDefs()` is unavailable.
+That lets the CUDA `Upsample` antialias path reuse the same persistent device lookup-table initialization in both bundled and plugin builds instead of keeping a plugin-only scratch-buffer fallback.
 
 ---
@@ -619,7 +620,7 @@ The branch still contains a small set of plugin guards in both infrastructure an
 - `generator/constant_of_shape.h` still needs a plugin-specific path because `ConstantOfShapeBase` depends on framework-only tensor-attribute helpers.
 - Tunable kernels such as `math/matmul.cc` still gate framework-only registration paths.
 - `tensor/identity_op.h` guards the `TensorSeq` code path and `context->InputType()` call with `#ifndef BUILD_CUDA_EP_AS_PLUGIN` — the plugin build handles only the `Tensor` path. `identity_op.cc` uses conditional macros (`IDENTITY_V_TYPES` / `IDENTITY_V_TYPES_IRv9`) so opset 14+ registrations use `AllFixedSizeTensorTypes()` in the plugin build. Additionally, old Dropout opset 7–9 and 10–11 kernel registrations were moved from `identity_op.cc` to `nn/dropout.cc` so that each op's registrations live in that op's own source file.
-- A few tensor kernels (`pad.cc`, `tile.cc`, `unsqueeze.cc`, `upsample.*`, `space_depth_ops.h`, `scatter_nd.*`) still contain localized plugin guards where adapter and framework paths have not fully converged.
+- A few tensor kernels (`pad.cc`, `tile.cc`, `unsqueeze.cc`) still contain localized plugin guards where adapter and framework paths have not fully converged. Recent cleanup removed the plugin-only branches from `upsample.*`, `space_depth_ops.h`, and `scatter_nd.*` by moving reusable logic into shared adapter-safe helpers and by adding allocator access to `ep::adapter::OpKernelInfo`.
 
 The broad trend remains positive: most operator-level plugin conditionals were removed by moving reusable CPU/helper logic into shared headers and by centralizing stream bridging in `CudaKernel` helpers.

include/onnxruntime/ep/adapter/op_kernel_info.h

Lines changed: 8 additions & 0 deletions
@@ -73,6 +73,14 @@ struct OpKernelInfo {
   const DataTransferManager& GetDataTransferManager() const noexcept {
     return (static_cast<const Ep*>(cache_->ort_ep_))->GetDataTransferManager();
   }
+
+  // Delegates to the core OpKernelInfo::GetAllocator so the adapter returns
+  // exactly the same allocator the framework would provide for each OrtMemType.
+  AllocatorPtr GetAllocator(OrtMemType mem_type) const {
+    const auto* core_kernel_info = reinterpret_cast<const ::onnxruntime::OpKernelInfo*>(cache_->kernel_info_);
+    return core_kernel_info->GetAllocator(mem_type);
+  }
+
   Node node() const noexcept {
     return Node{cache_->kernel_info_};
   }

onnxruntime/core/providers/cpu/cpu_provider_shared.cc

Lines changed: 1 addition & 1 deletion
@@ -89,7 +89,7 @@ struct ProviderHostCPUImpl : ProviderHostCPU {
   // From cpu/tensor/scatter_nd.h (direct)
   Status ScatterNDBase__ValidateShapes(const TensorShape& input_shape,
                                        const TensorShape& indice_shape,
-                                       const TensorShape& update_shape) override { return ScatterND::ValidateShapes(input_shape, indice_shape, update_shape); }
+                                       const TensorShape& update_shape) override { return scatter_nd_internal::ValidateShapes(input_shape, indice_shape, update_shape); }
   // From cpu/tensor/padbase.h (direct)
   Status PadBase__HandleDimValueZero(const Mode& mode, const TensorShape& input_shape, const TensorShape& output_shape) override { return PadBase::HandleDimValueZero(mode, input_shape, output_shape); }

onnxruntime/core/providers/cpu/tensor/scatter_nd.h

Lines changed: 47 additions & 37 deletions
@@ -5,7 +5,7 @@
 
 #include "core/common/narrow.h"
 
-#ifndef SHARED_PROVIDER
+#if !defined(SHARED_PROVIDER) && !defined(BUILD_CUDA_EP_AS_PLUGIN)
 #include "core/common/common.h"
 #include "core/framework/op_kernel.h"
 #endif
@@ -15,6 +15,51 @@ namespace concurrency {
 class ThreadPool;
 }
 
+namespace scatter_nd_internal {
+
+inline Status ValidateShapes(const TensorShape& input_shape,
+                             const TensorShape& indice_shape,
+                             const TensorShape& update_shape) {
+  auto input_rank = input_shape.NumDimensions();
+  auto indice_rank = indice_shape.NumDimensions();
+  auto update_rank = update_shape.NumDimensions();
+
+  if (input_rank == 0 || indice_rank == 0) {
+    return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
+                           "input tensor and indices tensor must have rank larger than 0. ",
+                           "input shape: ", input_shape, ", indices shape: ", indice_shape);
+  }
+
+  auto last_indice_dimension = indice_shape[indice_rank - 1];
+  if (last_indice_dimension > static_cast<int64_t>(input_rank)) {
+    return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
+                           "last dimension of indices must not be larger than rank of input tensor");
+  }
+
+  bool is_update_shape_invalid = [&]() {
+    if (update_rank != (input_rank + indice_rank - 1 - static_cast<ptrdiff_t>(last_indice_dimension))) {
+      return true;
+    }
+    if (indice_shape.Slice(0, indice_rank - 1) != update_shape.Slice(0, indice_rank - 1)) {
+      return true;
+    }
+    if (input_shape.Slice(onnxruntime::narrow<size_t>(last_indice_dimension)) != update_shape.Slice(indice_rank - 1)) {
+      return true;
+    }
+    return false;
+  }();
+
+  if (is_update_shape_invalid) {
+    return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
+                           "updates tensor should have shape equal to indices.shape[:-1] + data.shape[indices.shape[-1]:]. ",
+                           "updates shape: ", update_shape, ", indices shape: ", indice_shape, ", data shape: ", input_shape);
+  }
+
+  return Status::OK();
+}
+
+}  // namespace scatter_nd_internal
+
 class ScatterND final : public OpKernel {
  public:
   enum class Reduction : int {
@@ -51,42 +96,7 @@ class ScatterND final : public OpKernel {
   static inline Status ValidateShapes(const TensorShape& input_shape,
                                       const TensorShape& indice_shape,
                                       const TensorShape& update_shape) {
-    auto input_rank = input_shape.NumDimensions();
-    auto indice_rank = indice_shape.NumDimensions();
-    auto update_rank = update_shape.NumDimensions();
-
-    if (input_rank == 0 || indice_rank == 0) {
-      return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
-                             "input tensor and indices tensor must has rank larger than 0. ",
-                             "input shape: ", input_shape, ", indices shape: ", indice_shape);
-    }
-
-    auto last_indice_dimension = indice_shape[indice_rank - 1];
-    if (last_indice_dimension > static_cast<int64_t>(input_rank)) {
-      return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
-                             "last dimension of indices must not be larger than rank of input tensor");
-    }
-
-    bool is_update_shape_invalid = [&]() {
-      if (update_rank != (input_rank + indice_rank - 1 - static_cast<ptrdiff_t>(last_indice_dimension))) {
-        return true;
-      }
-      if (indice_shape.Slice(0, indice_rank - 1) != update_shape.Slice(0, indice_rank - 1)) {
-        return true;
-      }
-      if (input_shape.Slice(onnxruntime::narrow<size_t>(last_indice_dimension)) != update_shape.Slice(indice_rank - 1)) {
-        return true;
-      }
-      return false;
-    }();
-
-    if (is_update_shape_invalid) {
-      return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
-                             "updates tensor should have shape equal to indices.shape[:-1] + data.shape[indices.shape[-1]:]. ",
-                             "updates shape: ", update_shape, ", indices shape: ", indice_shape, ", data shape: ", input_shape);
-    }
-
-    return Status::OK();
+    return scatter_nd_internal::ValidateShapes(input_shape, indice_shape, update_shape);
   }
 #endif  // SHARED_PROVIDER
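
For reference, a small illustrative sketch of the shape contract the extracted `scatter_nd_internal::ValidateShapes` helper enforces; the concrete shapes below are example values chosen here, not taken from the PR:

```cpp
#include "core/providers/cpu/tensor/scatter_nd.h"

// Illustrative example: data (4,4,4), indices (2,1). The last indices dim is 1,
// so updates must have shape indices.shape[:-1] + data.shape[1:] = (2,4,4).
onnxruntime::Status CheckExampleShapes() {
  onnxruntime::TensorShape data({4, 4, 4});
  onnxruntime::TensorShape indices({2, 1});
  onnxruntime::TensorShape updates({2, 4, 4});
  // Returns Status::OK(); an updates shape of e.g. (2,4) would be rejected.
  return onnxruntime::scatter_nd_internal::ValidateShapes(data, indices, updates);
}
```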

onnxruntime/core/providers/cpu/tensor/space_depth_ops.h

Lines changed: 93 additions & 57 deletions
@@ -3,72 +3,116 @@
 
 #pragma once
 
+#include <string>
+
+#if !defined(SHARED_PROVIDER) && !defined(BUILD_CUDA_EP_AS_PLUGIN)
 #include "core/framework/op_kernel.h"
+#endif
 
 namespace onnxruntime {
 
-class SpaceDepthBase {
- protected:
-  template <typename KernelInfoType>
-  explicit SpaceDepthBase(const KernelInfoType& info) {
-    ORT_ENFORCE(info.template GetAttr<int64_t>("blocksize", &blocksize_).IsOK(),
-                "Attribute blocksize is not set.");
+namespace space_depth_internal {
+
+template <typename KernelInfoType>
+inline int64_t ReadBlocksize(const KernelInfoType& info) {
+  int64_t blocksize = 0;
+  ORT_ENFORCE(info.template GetAttr<int64_t>("blocksize", &blocksize).IsOK(),
+              "Attribute blocksize is not set.");
+  return blocksize;
+}
+
+template <typename KernelInfoType>
+inline bool ReadIsDCR(const KernelInfoType& info) {
+  bool is_dcr = true;
+  std::string mode;
+  // If mode doesn't exist, then it is the default "DCR" mode
+  // (or) it is an opset < 11 model for which the only mode is "DCR" mode.
+  if (info.GetAttr("mode", &mode).IsOK()) {
+    if (mode == "CRD") {
+      is_dcr = false;
+    } else if (mode != "DCR") {
+      ORT_THROW("DepthToSpace op: only 'DCR' and 'CRD' modes are supported");
+    }
   }
 
-  template <bool IsNHWC = false>
-  Status InputValidationsAndOutputDimsCalc(const Tensor& input,
-                                           int64_t& batch,
-                                           int64_t& input_depth, int64_t& input_height, int64_t& input_width,
-                                           int64_t& output_depth, int64_t& output_height, int64_t& output_width,
-                                           bool is_space_to_depth) const {
-    const TensorShape& input_shape = input.Shape();
+  return is_dcr;
+}
+
+template <bool IsNHWC = false>
+inline Status InputValidationsAndOutputDimsCalc(int64_t blocksize,
+                                                const Tensor& input,
+                                                int64_t& batch,
+                                                int64_t& input_depth, int64_t& input_height, int64_t& input_width,
+                                                int64_t& output_depth, int64_t& output_height, int64_t& output_width,
+                                                bool is_space_to_depth) {
+  const TensorShape& input_shape = input.Shape();
+
+  if (input_shape.NumDimensions() != 4) {
+    return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "SpaceDepth ops require a 4-D input. Provided rank: ",
+                           input_shape.NumDimensions());
+  }
 
-    if (input_shape.NumDimensions() != 4) {
-      return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "SpaceDepth ops require a 4-D input. Provided rank: ",
-                             input_shape.NumDimensions());
+  batch = input_shape[0];
+  if constexpr (IsNHWC) {
+    input_depth = input_shape[3];
+    input_height = input_shape[1];
+    input_width = input_shape[2];
+  } else {
+    input_depth = input_shape[1];
+    input_height = input_shape[2];
+    input_width = input_shape[3];
+  }
+
+  if (is_space_to_depth) {  // SpaceToDepth op
+    if ((input_height % blocksize) != 0) {
+      return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "SpaceToDepth requires input height to be a multiple of block_size");
     }
 
-    batch = input_shape[0];
-    if constexpr (IsNHWC) {
-      input_depth = input_shape[3];
-      input_height = input_shape[1];
-      input_width = input_shape[2];
-    } else {
-      input_depth = input_shape[1];
-      input_height = input_shape[2];
-      input_width = input_shape[3];
+    if ((input_width % blocksize) != 0) {
+      return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "SpaceToDepth requires input width to be a multiple of block_size");
    }
 
-    if (is_space_to_depth) {  // SpaceToDepth op
-      if ((input_height % this->blocksize_) != 0) {
-        return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "SpaceToDepth requires input height to be a multiple of block_size");
-      }
+    output_depth = input_depth * blocksize * blocksize;
+    output_height = input_height / blocksize;
+    output_width = input_width / blocksize;
 
-      if ((input_width % this->blocksize_) != 0) {
-        return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "SpaceToDepth requires input width to be a multiple of block_size");
-      }
+  } else {  // DepthToSpace op
+    if ((input_depth % (blocksize * blocksize) != 0)) {
+      return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
+                             "DepthToSpace requires input depth to be a multiple of (block_size * block_size)");
+    }
 
-      output_depth = input_depth * blocksize_ * blocksize_;
-      output_height = input_height / blocksize_;
-      output_width = input_width / blocksize_;
+    output_depth = input_depth / blocksize / blocksize;
+    output_height = input_height * blocksize;
+    output_width = input_width * blocksize;
+  }
 
-    } else {  // DepthToSpace op
-      if ((input_depth % (blocksize_ * blocksize_) != 0)) {
-        return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT,
-                               "DepthToSpace requires input depth to be a multiple of (block_size * block_size)");
-      }
+  return Status::OK();
+}
 
-      output_depth = input_depth / blocksize_ / blocksize_;
-      output_height = input_height * blocksize_;
-      output_width = input_width * blocksize_;
-    }
+}  // namespace space_depth_internal
+
+class SpaceDepthBase {
+ protected:
+  template <typename KernelInfoType>
+  explicit SpaceDepthBase(const KernelInfoType& info) : blocksize_(space_depth_internal::ReadBlocksize(info)) {}
 
-    return Status::OK();
+  template <bool IsNHWC = false>
+  Status InputValidationsAndOutputDimsCalc(const Tensor& input,
+                                           int64_t& batch,
+                                           int64_t& input_depth, int64_t& input_height, int64_t& input_width,
+                                           int64_t& output_depth, int64_t& output_height, int64_t& output_width,
+                                           bool is_space_to_depth) const {
+    return space_depth_internal::InputValidationsAndOutputDimsCalc<IsNHWC>(
+        blocksize_, input, batch, input_depth, input_height, input_width,
+        output_depth, output_height, output_width, is_space_to_depth);
   }
 
   int64_t blocksize_;
 };
 
+#if !defined(SHARED_PROVIDER) && !defined(BUILD_CUDA_EP_AS_PLUGIN)
+
 class SpaceToDepth final : public OpKernel, SpaceDepthBase {
  public:
   explicit SpaceToDepth(const OpKernelInfo& info) : OpKernel(info), SpaceDepthBase(info) {
@@ -79,23 +123,15 @@ class SpaceToDepth final : public OpKernel, SpaceDepthBase {
 
 class DepthToSpace final : public OpKernel, SpaceDepthBase {
  public:
-  explicit DepthToSpace(const OpKernelInfo& info) : OpKernel(info), SpaceDepthBase(info) {
-    std::string mode;
-    // if mode doesn't exist, then it is the default "DCR" mode
-    // (or) it is an opset < 11 model for which the only mode is "DCR" mode
-    if (info.GetAttr("mode", &mode).IsOK()) {
-      if (mode == "CRD")
-        is_dcr_ = false;
-
-      else if (mode != "DCR")
-        ORT_THROW("DepthToSpace op: only 'DCR' and 'CRD' modes are supported");
-    }
-  }
+  explicit DepthToSpace(const OpKernelInfo& info)
+      : OpKernel(info), SpaceDepthBase(info), is_dcr_(space_depth_internal::ReadIsDCR(info)) {}
 
   Status Compute(OpKernelContext* context) const override;
 
  private:
   bool is_dcr_ = true;
 };
 
+#endif  // !defined(SHARED_PROVIDER) && !defined(BUILD_CUDA_EP_AS_PLUGIN)
+
 }  // namespace onnxruntime
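
A note on the helper design above: the `space_depth_internal` functions are duck-typed on `KernelInfoType`, so both the framework `OpKernelInfo` and `ep::adapter::OpKernelInfo` satisfy them without any plugin guard. A hypothetical minimal stand-in for that interface, plus a worked dimension example:

```cpp
#include <string>
#include "core/common/status.h"

// Hypothetical stand-in sketching the minimal interface the shared helpers
// require. ReadBlocksize calls GetAttr<int64_t> explicitly; ReadIsDCR passes a
// std::string*, letting T be deduced as std::string.
struct MinimalKernelInfo {
  template <typename T>
  onnxruntime::Status GetAttr(const std::string& name, T* value) const;
};

// Worked example (DepthToSpace, blocksize 2, NCHW input 1x8x3x3):
//   output_depth  = 8 / 2 / 2 = 2
//   output_height = 3 * 2     = 6
//   output_width  = 3 * 2     = 6
// => output shape (1, 2, 6, 6).
```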

onnxruntime/core/providers/cuda/cudnn_common.h

Lines changed: 2 additions & 3 deletions
@@ -149,11 +149,10 @@ struct Consts<BFloat16> {
 
 inline double ClampCudnnBatchNormEpsilon(double epsilon) {
   if (epsilon < CUDNN_BN_MIN_EPSILON) {
-#ifndef BUILD_CUDA_EP_AS_PLUGIN
-    if (CUDNN_BN_MIN_EPSILON - epsilon > FLT_EPSILON)
+    if (CUDNN_BN_MIN_EPSILON - epsilon > FLT_EPSILON) {
       LOGS_DEFAULT(WARNING) << "Provided epsilon is smaller than CUDNN_BN_MIN_EPSILON. "
                             << "Setting it to CUDNN_BN_MIN_EPSILON";
-#endif
+    }
     return CUDNN_BN_MIN_EPSILON;
   }
   return epsilon;

onnxruntime/core/providers/cuda/nn/conv.cc

Lines changed: 0 additions & 8 deletions
@@ -237,12 +237,8 @@ Status Conv<T, Layout>::CreateCudnnFeExecutionPlan(const onnxruntime::TensorShap
     CUDNN_FE_CALL_THROW(s_.cudnn_fe_graph->build_operation_graph(handle));
     CUDNN_FE_CALL_THROW(s_.cudnn_fe_graph->create_execution_plans({heur_mode}));
   } catch (const std::exception& ex) {
-#ifndef BUILD_CUDA_EP_AS_PLUGIN
    std::string message = MakeString("Failed to initialize CUDNN Frontend: ", ex.what(),
                                     " with the cudnn frontend json:\n", s_.cudnn_fe_graph->print());
-#else
-    std::string message = MakeString("Failed to initialize CUDNN Frontend: ", ex.what());
-#endif
     return Status(common::StatusCategory::ONNXRUNTIME, common::StatusCode::EP_FAIL, message);
   }
 
@@ -253,12 +249,8 @@ Status Conv<T, Layout>::CreateCudnnFeExecutionPlan(const onnxruntime::TensorShap
     CUDNN_FE_CALL_THROW(s_.cudnn_fe_graph->build_plans(handle));
   } catch (const std::exception& ex) {
     if (!fuse_bias && !fuse_act && use_tf32) {
-#ifndef BUILD_CUDA_EP_AS_PLUGIN
       std::string message = MakeString("OP not supported by CUDNN Frontend: ", ex.what(),
                                        " with the cudnn frontend json:\n", s_.cudnn_fe_graph->print());
-#else
-      std::string message = MakeString("OP not supported by CUDNN Frontend: ", ex.what());
-#endif
       return Status(common::StatusCategory::ONNXRUNTIME, common::StatusCode::EP_FAIL, message);
     }
 