Lower CUDA plugin EP minimum ORT version to 1.24.4 with version-gated callbacks #28824

Merged: tianleiwu merged 10 commits into main from tlwu/20260605/cuda_plugin_min_ort_version on Jun 13, 2026

@tianleiwu

Contributor

Description

Lowers the minimum supported ONNX Runtime runtime version for the standalone CUDA plugin EP from 1.26.0 to 1.24.4, so the plugin binary (built against the latest ORT headers) can be loaded by older ORT runtimes. The plugin negotiates the API version at load time and only advertises EP callbacks the negotiated runtime actually supports, so newer features degrade gracefully on older runtimes instead of crashing.
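
The load-time negotiation described above can be pictured with a small model. This is only a sketch of one plausible strategy (probe downward from the header version until the runtime can supply an API table), not a description of what `onnxruntime::ep::ApiInit(...)` actually does. `FakeRuntime`, `NegotiateApiVersion`, and both constants are hypothetical, and the sketch assumes API version N corresponds to ORT 1.N, as `GetApi(26)` for 1.26 suggests.

```cpp
#include <cstdint>
#include <optional>

// Assumption: the plugin was built against 1.28 headers (hypothetical value).
constexpr uint32_t kHeaderApiVersion = 28;
// Floor taken from MIN_ONNXRUNTIME_VERSION (1.24.4); the patch level is ignored here.
constexpr uint32_t kFloorApiVersion = 24;

// Hypothetical model of a runtime: it can hand out any API version up to its own.
struct FakeRuntime {
  uint32_t max_api_version;
  constexpr bool HasApi(uint32_t version) const {
    return version >= 1 && version <= max_api_version;
  }
};

// Returns the highest API version both sides understand, or std::nullopt
// when the runtime is older than the plugin's floor.
constexpr std::optional<uint32_t> NegotiateApiVersion(const FakeRuntime& runtime) {
  for (uint32_t v = kHeaderApiVersion; v >= kFloorApiVersion; --v) {
    if (runtime.HasApi(v)) return v;
  }
  return std::nullopt;
}
```

Under these assumptions a 1.24 runtime negotiates version 24, a runtime newer than the headers is capped at 28, and a 1.23 runtime is rejected rather than loaded.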

Motivation

The plugin is shipped as a separate package and is intended to run against a range of base onnxruntime runtimes. The previous hard floor of 1.26.0 was stricter than necessary: an audit of the \since annotations shows the plugin only calls APIs introduced in 1.24 or earlier (apart from the optional EP profiler, which is now version-gated). 1.24.4 is also the floor already used by the WebGPU plugin EP, so this aligns the two.

Key Changes

| Area | Change |
|------|--------|
| `plugin-ep-cuda/MIN_ONNXRUNTIME_VERSION` | `1.26.0` → `1.24.4` (single source of truth for the floor) |
| `cmake/onnxruntime_providers_cuda_plugin.cmake` | Reads `MIN_ONNXRUNTIME_VERSION` and bakes it into the DLL as the `ORT_PLUGIN_EP_MIN_ORT_VERSION` compile definition |
| `cuda_plugin_ep.cc` | `CreateEpFactories()` negotiates the runtime API version via `onnxruntime::ep::ApiInit(...)` instead of hard-coding `GetApi(26)` |
| `cuda_plugin_utils.h` | Adds `CudaPluginEpOrtVersionSupported()` = `min(CurrentOrtApiVersion(), ORT_API_VERSION)`; removes the hard-coded min-version constant |
| 13 callback structs | Report `ort_version_supported`/`version` = `CudaPluginEpOrtVersionSupported()` |
| `cuda_ep.cc` | Defensive capability gating: installs each newer `OrtEp` callback only when the negotiated runtime is new enough. `Sync`/`CreateProfiler` require ≥1.25; the graph-capture set and `GetAvailableResource` require ≥1.26; otherwise the callback is left null |
| `plugin-linux-cuda-test-stage.yml` | Adds a CI step that installs the floor (`MIN_ONNXRUNTIME_VERSION`) base `onnxruntime` and runs the plugin test against it, catching any accidental dependency on a newer API |
| Docs | New §2.6 "API Version Audit and Defensive Capability Gating" in the design doc; QUICK_START min-version test recipe |
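
The `min(...)` rule and the callback gating in the table can be sketched as follows. This is a minimal illustration rather than the code in `cuda_ep.cc`: `EpCallbacks`, `MakeCallbacks`, `Noop`, and `kHeaderApiVersion` are hypothetical names, the real `OrtEp` struct has different fields and signatures, and the sketch assumes API version N corresponds to ORT 1.N.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the real OrtEp struct: only enough fields to
// illustrate gating. Real callback signatures differ.
struct EpCallbacks {
  uint32_t ort_version_supported = 0;
  int (*sync)() = nullptr;                    // gated at >= 1.25
  int (*create_profiler)() = nullptr;         // gated at >= 1.25
  int (*graph_capture)() = nullptr;           // stands in for the graph-capture set, >= 1.26
  int (*get_available_resource)() = nullptr;  // gated at >= 1.26
};

// Assumption: the plugin was built against 1.28 headers (hypothetical value).
constexpr uint32_t kHeaderApiVersion = 28;

// Mirrors min(CurrentOrtApiVersion(), ORT_API_VERSION) from the table.
constexpr uint32_t SupportedVersion(uint32_t runtime_api_version) {
  return std::min(runtime_api_version, kHeaderApiVersion);
}

// Placeholder callback body used for every slot in this sketch.
constexpr int Noop() { return 0; }

// Installs each newer callback only when the runtime is new enough;
// anything not installed stays null.
constexpr EpCallbacks MakeCallbacks(uint32_t runtime_api_version) {
  EpCallbacks ep{};
  ep.ort_version_supported = SupportedVersion(runtime_api_version);
  if (runtime_api_version >= 25) {
    ep.sync = &Noop;
    ep.create_profiler = &Noop;
  }
  if (runtime_api_version >= 26) {
    ep.graph_capture = &Noop;
    ep.get_available_resource = &Noop;
  }
  return ep;
}
```

With these assumptions, `MakeCallbacks(24)` leaves all four callbacks null, while `MakeCallbacks(25)` installs only the `Sync` and `CreateProfiler` stand-ins.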

API Version Audit

| API surface | Newest `\since` used |
|-------------|----------------------|
| `OrtApi` direct calls | 1.23 |
| `OrtEpApi` direct calls | 1.24 |
| EP profiler API (only with `ENABLE_CUDA_PROFILING`) | 1.25 |

Apart from the optional EP profiler, every API the plugin calls is \since 1.24 or older, justifying the 1.24.4 floor. The profiler's three \since 1.25 functions are made unreachable on older runtimes by gating the CreateProfiler callback.
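
The audit argument can be restated as a small check: a floor is justified when every API surface that is not version-gated was introduced at or before it. The sketch below encodes the three rows of the audit table; `ApiSurface`, `kAudit`, and `FloorIsJustified` are hypothetical names used only for illustration, and versions are reduced to their minor number.

```cpp
#include <array>
#include <cstdint>
#include <string_view>

struct ApiSurface {
  std::string_view name;
  uint32_t newest_since;  // minor version of the newest "since" annotation used
  bool version_gated;     // true when the calls are unreachable on older runtimes
};

// Values taken from the audit table in this PR.
constexpr std::array<ApiSurface, 3> kAudit{{
    {"OrtApi direct calls", 23, false},
    {"OrtEpApi direct calls", 24, false},
    {"EP profiler API", 25, true},
}};

// A floor is justified when no ungated surface needs a newer runtime.
constexpr bool FloorIsJustified(uint32_t floor_minor) {
  for (const ApiSurface& surface : kAudit) {
    if (!surface.version_gated && surface.newest_since > floor_minor) {
      return false;
    }
  }
  return true;
}
```

Here `FloorIsJustified(24)` holds, while `FloorIsJustified(23)` does not because the `OrtEpApi` calls need 1.24.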

Testing Notes

  • Incremental build on CUDA 12.8 / SM90 — clean, plugin .so relinked.
  • test_cuda_plugin_ep.py against the latest runtime (1.28): 87/87 tests pass.
  • Plugin (built against latest headers) loaded into onnxruntime==1.24.4: registers, enumerates all 8 GPUs, and runs inference correctly with the newer callbacks left null.
  • lintrunner clean on changed files.
  • New CI step validates the plugin against the declared floor automatically.

Copilot AI left a comment

Contributor

Pull request overview

This PR lowers the minimum supported ONNX Runtime version for the standalone CUDA plugin Execution Provider (EP) from 1.26.0 to 1.24.4 by negotiating the runtime API version at load time and version-gating newer EP callbacks so the same plugin binary can load into older ORT runtimes safely.

Changes:

  • Lowered the declared minimum supported ORT runtime version to 1.24.4 and used it as build-time input for the plugin.
  • Updated CUDA plugin EP initialization to use onnxruntime::ep::ApiInit(...) for runtime API negotiation, and updated callback structs to report min(runtime_api_version, ORT_API_VERSION).
  • Added CI coverage to run the plugin tests against the declared minimum ORT version, and documented the compatibility model and local test recipe.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.

| File | Description |
|------|-------------|
| `tools/ci_build/github/azure-pipelines/stages/plugin-linux-cuda-test-stage.yml` | Adds a CI step to install the floor ORT version and run plugin tests against it. |
| `plugin-ep-cuda/MIN_ONNXRUNTIME_VERSION` | Lowers plugin floor runtime version from 1.26.0 to 1.24.4. |
| `onnxruntime/core/providers/cuda/plugin/cuda_stream_plugin.cc` | Updates stream/notification structs to report negotiated supported API version. |
| `onnxruntime/core/providers/cuda/plugin/cuda_profiler_plugin.cc` | Updates profiler struct to report negotiated supported API version. |
| `onnxruntime/core/providers/cuda/plugin/cuda_plugin_utils.h` | Introduces `CudaPluginEpOrtVersionSupported()` using negotiated runtime API version. |
| `onnxruntime/core/providers/cuda/plugin/cuda_plugin_ep.cc` | Switches factory entrypoint to `ApiInit(...)` API negotiation with conservative error reporting. |
| `onnxruntime/core/providers/cuda/plugin/cuda_mempool_allocator_plugin.cc` | Updates allocator struct version to negotiated supported API version. |
| `onnxruntime/core/providers/cuda/plugin/cuda_ep.cc` | Version-gates newer `OrtEp` callbacks (Sync/profiler/graph capture/resource accounting). |
| `onnxruntime/core/providers/cuda/plugin/cuda_ep_factory.cc` | Updates factory struct to report negotiated supported API version. |
| `onnxruntime/core/providers/cuda/plugin/cuda_data_transfer_plugin.cc` | Updates data transfer struct to report negotiated supported API version. |
| `onnxruntime/core/providers/cuda/plugin/cuda_controlflow_plugin.cc` | Updates controlflow helper structs to report negotiated supported API version. |
| `onnxruntime/core/providers/cuda/plugin/cuda_arena.h` | Updates arena allocator struct version to negotiated supported API version. |
| `onnxruntime/core/providers/cuda/plugin/cuda_allocator_plugin.cc` | Updates allocator struct versions to negotiated supported API version. |
| `docs/cuda_plugin_ep/QUICK_START.md` | Documents minimum ORT version and how to test against it. |
| `docs/cuda_plugin_ep/cuda_plugin_ep_design.md` | Adds design documentation for API negotiation, audit, and defensive gating. |
| `cmake/onnxruntime_providers_cuda_plugin.cmake` | Reads and bakes `MIN_ONNXRUNTIME_VERSION` into `ORT_PLUGIN_EP_MIN_ORT_VERSION`. |

Comment thread cmake/onnxruntime_providers_cuda_plugin.cmake
Comment thread onnxruntime/core/providers/cuda/plugin/cuda_plugin_ep.cc Outdated
Comment thread onnxruntime/core/providers/cuda/plugin/cuda_ep.cc Outdated
Comment thread tools/ci_build/github/azure-pipelines/stages/plugin-linux-cuda-test-stage.yml Outdated
Comment thread onnxruntime/core/providers/cuda/plugin/cuda_ep.cc Outdated
Comment thread include/onnxruntime/core/session/onnxruntime_ep_c_api.h Outdated
Comment thread include/onnxruntime/core/session/onnxruntime_ep_c_api.h Outdated
Comment thread tools/ci_build/github/azure-pipelines/stages/plugin-linux-cuda-test-stage.yml Outdated
## Description

The CUDA plugin EP packages (`onnxruntime-ep-cuda12`/`onnxruntime-ep-cuda13` wheels and the
`Microsoft.ML.OnnxRuntime.EP.Cuda` NuGet package) currently declare a hard dependency on the core
`onnxruntime` package. That is wrong for `onnxruntime-gpu` users, who would otherwise be forced to
pull in the CPU `onnxruntime`/`Microsoft.ML.OnnxRuntime` package. This change drops the hard
dependency from both packages, documents the core-package prerequisite in the READMEs, and
installs/pins the core ORT package explicitly in CI, mirroring what was done for the WebGPU plugin
EP in #28384.

The minimum compatible ORT version remains the single source of truth in
[`plugin-ep-cuda/MIN_ONNXRUNTIME_VERSION`](plugin-ep-cuda/MIN_ONNXRUNTIME_VERSION);
it is now injected into each package's README at build/pack time, and the native plugin EP
validates compatibility at registration time.
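
A registration-time floor check of this kind could look roughly like the sketch below. It assumes the `ORT_PLUGIN_EP_MIN_ORT_VERSION` compile definition expands to a string literal such as `"1.24.4"` (the form is not shown in this PR), and `ParseVersion`, `MeetsFloor`, and `RuntimeMeetsFloor` are hypothetical helpers, not functions from the plugin. The point of the example is that versions must be compared numerically, component by component, rather than as strings.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Assumption: the build normally supplies this as a string literal.
#ifndef ORT_PLUGIN_EP_MIN_ORT_VERSION
#define ORT_PLUGIN_EP_MIN_ORT_VERSION "1.24.4"
#endif

// Parses "major.minor.patch" into three numbers, stopping at any suffix
// such as "-dev". Missing components are treated as 0.
constexpr std::array<uint32_t, 3> ParseVersion(std::string_view text) {
  std::array<uint32_t, 3> parts{0, 0, 0};
  std::size_t index = 0;
  for (char c : text) {
    if (c == '.') {
      if (++index == parts.size()) break;
    } else if (c >= '0' && c <= '9') {
      parts[index] = parts[index] * 10 + static_cast<uint32_t>(c - '0');
    } else {
      break;
    }
  }
  return parts;
}

// True when `runtime` is numerically greater than or equal to `floor`.
constexpr bool MeetsFloor(std::string_view runtime, std::string_view floor) {
  const auto r = ParseVersion(runtime);
  const auto f = ParseVersion(floor);
  for (std::size_t i = 0; i < r.size(); ++i) {
    if (r[i] != f[i]) return r[i] > f[i];
  }
  return true;
}

// Convenience wrapper that compares against the baked-in floor.
constexpr bool RuntimeMeetsFloor(std::string_view runtime_version) {
  return MeetsFloor(runtime_version, ORT_PLUGIN_EP_MIN_ORT_VERSION);
}
```

A plain string comparison would rank `"1.3.0"` above `"1.24.4"`; the numeric comparison here rejects it.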

## Summary of Changes

### Python wheel

| File | Change |
|------|--------|
| `plugin-ep-cuda/python/pyproject.toml.in` | Removed the `dependencies = ["onnxruntime>=..."]` block. |
| `plugin-ep-cuda/python/build_wheel.py` | Import the shared `gen_file_from_template` helper; render the staged package README with the minimum ORT version; stop injecting the version into `pyproject.toml`. |
| `plugin-ep-cuda/python/onnxruntime_ep_cuda/README.md` | Added a Prerequisites section and an explicit `pip install "onnxruntime>=@min_onnxruntime_version@"` step. |

### C# NuGet package

| File | Change |
|------|--------|
| `plugin-ep-cuda/csharp/Microsoft.ML.OnnxRuntime.EP.Cuda/Microsoft.ML.OnnxRuntime.EP.Cuda.csproj` | Removed the `OnnxRuntimeMinVersion` resolution/validation machinery and the hard `Microsoft.ML.OnnxRuntime` `PackageReference`. |
| `plugin-ep-cuda/csharp/Microsoft.ML.OnnxRuntime.EP.Cuda/README.md` | Added a Prerequisites section and templatized the version requirement. |
| `plugin-ep-cuda/csharp/pack_nuget.py` | Render the staged README via the shared helper; dropped the `-p:OnnxRuntimeMinVersionFile` plumbing from `dotnet build`/`pack`. |

### Shared / docs

| File | Change |
|------|--------|
| `plugin-ep-cuda/_packaging_utils.py` | New shared `gen_file_from_template` helper (matches the WebGPU plugin's utility). |
| `plugin-ep-cuda/README.md` | Documented the no-hard-dependency behavior and the C# package. |

### Tests & CI

| File | Change |
|------|--------|
| `plugin-ep-cuda/csharp/test/CudaEpNuGetTest/CudaEpNuGetTest.csproj` | Reference the core `Microsoft.ML.OnnxRuntime` package explicitly via `$(OrtCoreTestVersion)` (no longer transitively pulled in). |
| `tools/ci_build/github/azure-pipelines/stages/plugin-linux-cuda-test-stage.yml` | Dropped the now-unnecessary `--no-deps` install flag and updated the comment. |
| `tools/ci_build/github/azure-pipelines/stages/plugin-win-cuda-test-stage.yml` | Set `OrtCoreTestVersion` from `MIN_ONNXRUNTIME_VERSION` and pass it to `dotnet build`. |

## Testing

- `python -m ruff check` and `lintrunner` pass on the changed files.
- Python scripts compile (`python -m py_compile`).
- CI: the Linux/Windows CUDA plugin test stages install the core `onnxruntime` package explicitly
  and verify the plugin wheel installs without altering the pinned core version; the NuGet test
  project references the core package explicitly and pins it to the minimum supported version.

## Motivation and Context

Follow-up to PR #28824 (CUDA plugin EP minimum ORT version / version-gated callbacks), kept
separate to keep that PR focused on the version-gating change. Mirrors the WebGPU plugin EP
packaging change in #28384. The CUDA case is stronger: a hard dependency on the CPU `onnxruntime`
package is incorrect for `onnxruntime-gpu` users.

## Checklist

- [x] Tests added/updated (CI test stages and NuGet test project
updated)
- [x] Documentation updated (package and top-level READMEs)
- [x] No breaking changes (consumers must now install the core ORT
package explicitly, documented in READMEs)
- [ ] CI passes
Comment thread docs/cuda_plugin_ep/QUICK_START.md Outdated
Comment thread docs/cuda_plugin_ep/QUICK_START.md Outdated
Comment thread onnxruntime/core/providers/cuda/plugin/cuda_plugin_utils.h Outdated
Comment thread onnxruntime/core/providers/cuda/plugin/cuda_plugin_utils.h
@tianleiwu tianleiwu enabled auto-merge (squash) June 13, 2026 02:16
@tianleiwu tianleiwu merged commit 33b389a into main Jun 13, 2026
92 of 93 checks passed
@tianleiwu tianleiwu deleted the tlwu/20260605/cuda_plugin_min_ort_version branch June 13, 2026 03:29

3 participants