Lower CUDA plugin EP minimum ORT version to 1.24.4 with version-gated callbacks #28824
Merged
Pull request overview
This PR lowers the minimum supported ONNX Runtime version for the standalone CUDA plugin Execution Provider (EP) from 1.26.0 to 1.24.4 by negotiating the runtime API version at load time and version-gating newer EP callbacks so the same plugin binary can load into older ORT runtimes safely.
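The following is a minimal sketch of why leaving a version-gated callback null is safe, seen from the host runtime's side. It assumes the runtime consults the struct's reported version (`ort_version_supported` in this PR) and null-checks optional callbacks before calling them. `PluginEpCallbacks` and `HostShouldCallSync` are hypothetical names, not ONNX Runtime APIs, and `Sync` is assumed to be introduced in API version 25.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified callback table; the real OrtEp layout and
// signatures differ. Newer members are only ever appended at the end.
struct PluginEpCallbacks {
  uint32_t ort_version_supported;  // API version the plugin negotiated
  void (*Sync)();                  // assumed to be introduced in API version 25
};

// A host reads an optional member only if (1) the host itself knows the member,
// (2) the plugin's reported version says the member is valid, and (3) the
// plugin actually set it. Otherwise it falls back to default behavior.
bool HostShouldCallSync(const PluginEpCallbacks& ep, uint32_t host_api_version) {
  constexpr uint32_t kSyncSince = 25;
  return host_api_version >= kSyncSince && ep.ort_version_supported >= kSyncSince &&
         ep.Sync != nullptr;
}
```

All three conditions must hold, so a plugin that reports an old version or leaves the pointer null is simply treated as "callback not provided" rather than being called through an invalid pointer.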
Changes:
- Lowered the declared minimum supported ORT runtime version to 1.24.4 and used it as build-time input for the plugin.
- Updated CUDA plugin EP initialization to use `onnxruntime::ep::ApiInit(...)` for runtime API negotiation, and updated callback structs to report `min(runtime_api_version, ORT_API_VERSION)`.
- Added CI coverage to run the plugin tests against the declared minimum ORT version, and documented the compatibility model and local test recipe.
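The `min(runtime_api_version, ORT_API_VERSION)` rule above can be illustrated with a minimal C++ sketch. It assumes the runtime's API version equals its minor release number (ORT 1.24.x reports 24) and derives that number from a version string. `kCompiledApiVersion`, `ApiVersionFromOrtVersionString`, and `NegotiateApiVersion` are hypothetical names; the real plugin obtains the runtime version through `onnxruntime::ep::ApiInit(...)`, whose details are not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for ORT_API_VERSION from the (latest) headers the
// plugin was compiled against.
constexpr uint32_t kCompiledApiVersion = 28;

// Assumption: ORT's API version tracks the runtime's minor version,
// e.g. "1.24.4" -> 24. Returns std::nullopt for malformed input.
std::optional<uint32_t> ApiVersionFromOrtVersionString(const std::string& version) {
  const auto first_dot = version.find('.');
  if (first_dot == std::string::npos) return std::nullopt;
  const auto second_dot = version.find('.', first_dot + 1);
  const std::string minor = version.substr(
      first_dot + 1,
      second_dot == std::string::npos ? std::string::npos : second_dot - first_dot - 1);
  // Reject empty, non-numeric, or implausibly long minor components.
  if (minor.empty() || minor.size() > 9 ||
      minor.find_first_not_of("0123456789") != std::string::npos) {
    return std::nullopt;
  }
  return static_cast<uint32_t>(std::stoul(minor));
}

// The version the plugin reports in its structs: never newer than the runtime
// it was loaded into, and never newer than the headers it was built with.
uint32_t NegotiateApiVersion(uint32_t runtime_api_version) {
  return std::min(runtime_api_version, kCompiledApiVersion);
}
```

With these definitions, a plugin built against API 28 and loaded into 1.24.4 reports 24, while a runtime newer than the headers is capped at 28. Taking the minimum means the plugin never claims a struct layout that either side does not understand.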
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tools/ci_build/github/azure-pipelines/stages/plugin-linux-cuda-test-stage.yml | Adds a CI step to install the floor ORT version and run plugin tests against it. |
| plugin-ep-cuda/MIN_ONNXRUNTIME_VERSION | Lowers plugin floor runtime version from 1.26.0 to 1.24.4. |
| onnxruntime/core/providers/cuda/plugin/cuda_stream_plugin.cc | Updates stream/notification structs to report negotiated supported API version. |
| onnxruntime/core/providers/cuda/plugin/cuda_profiler_plugin.cc | Updates profiler struct to report negotiated supported API version. |
| onnxruntime/core/providers/cuda/plugin/cuda_plugin_utils.h | Introduces CudaPluginEpOrtVersionSupported() using negotiated runtime API version. |
| onnxruntime/core/providers/cuda/plugin/cuda_plugin_ep.cc | Switches factory entrypoint to ApiInit(...) API negotiation with conservative error reporting. |
| onnxruntime/core/providers/cuda/plugin/cuda_mempool_allocator_plugin.cc | Updates allocator struct version to negotiated supported API version. |
| onnxruntime/core/providers/cuda/plugin/cuda_ep.cc | Version-gates newer OrtEp callbacks (Sync/profiler/graph capture/resource accounting). |
| onnxruntime/core/providers/cuda/plugin/cuda_ep_factory.cc | Updates factory struct to report negotiated supported API version. |
| onnxruntime/core/providers/cuda/plugin/cuda_data_transfer_plugin.cc | Updates data transfer struct to report negotiated supported API version. |
| onnxruntime/core/providers/cuda/plugin/cuda_controlflow_plugin.cc | Updates controlflow helper structs to report negotiated supported API version. |
| onnxruntime/core/providers/cuda/plugin/cuda_arena.h | Updates arena allocator struct version to negotiated supported API version. |
| onnxruntime/core/providers/cuda/plugin/cuda_allocator_plugin.cc | Updates allocator struct versions to negotiated supported API version. |
| docs/cuda_plugin_ep/QUICK_START.md | Documents minimum ORT version and how to test against it. |
| docs/cuda_plugin_ep/cuda_plugin_ep_design.md | Adds design documentation for API negotiation, audit, and defensive gating. |
| cmake/onnxruntime_providers_cuda_plugin.cmake | Reads and bakes MIN_ONNXRUNTIME_VERSION into ORT_PLUGIN_EP_MIN_ORT_VERSION. |
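The last row bakes the floor into the binary. One way such a compile definition could be consumed for a compatibility check is sketched below, assuming CMake passes it as a quoted string literal (for example `-DORT_PLUGIN_EP_MIN_ORT_VERSION="1.24.4"`). `ParseVersion` and `IsRuntimeSupported` are hypothetical helpers, not functions from the PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Assumption: the floor arrives as a string literal compile definition.
// The fallback only keeps this sketch compilable on its own.
#ifndef ORT_PLUGIN_EP_MIN_ORT_VERSION
#define ORT_PLUGIN_EP_MIN_ORT_VERSION "1.24.4"
#endif

using Version = std::array<uint32_t, 3>;  // {major, minor, patch}

// Parses "X.Y.Z"; returns std::nullopt if a component is missing,
// non-numeric, or followed by trailing characters.
std::optional<Version> ParseVersion(const std::string& text) {
  Version v{};
  char dot1 = 0;
  char dot2 = 0;
  std::istringstream in(text);
  if (!(in >> v[0] >> dot1 >> v[1] >> dot2 >> v[2]) || dot1 != '.' || dot2 != '.') {
    return std::nullopt;
  }
  if (in.peek() != std::char_traits<char>::eof()) return std::nullopt;
  return v;
}

// True if the runtime version is at least the baked-in minimum.
bool IsRuntimeSupported(const std::string& runtime_version) {
  const auto runtime = ParseVersion(runtime_version);
  const auto minimum = ParseVersion(ORT_PLUGIN_EP_MIN_ORT_VERSION);
  return runtime && minimum && *runtime >= *minimum;
}
```

`IsRuntimeSupported("1.24.4")` and `IsRuntimeSupported("1.28.0")` return `true`; `"1.24.3"` and malformed strings return `false`. `std::array` compares lexicographically, which gives the usual major/minor/patch ordering without a hand-written comparison.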
edgchen1 reviewed Jun 9, 2026
… Sync gating and ort_version_supported docs
edgchen1 reviewed Jun 10, 2026
## Description

The CUDA plugin EP packages (`onnxruntime-ep-cuda12`/`onnxruntime-ep-cuda13` wheels and the `Microsoft.ML.OnnxRuntime.EP.Cuda` NuGet package) currently declare a hard dependency on the core `onnxruntime` package. That is wrong for `onnxruntime-gpu` users, who would otherwise be forced to pull in the CPU `onnxruntime`/`Microsoft.ML.OnnxRuntime` package.

This change drops the hard dependency from both packages, documents the core-package prerequisite in the READMEs, and installs/pins the core ORT package explicitly in CI, mirroring what was done for the WebGPU plugin EP in #28384.

The minimum compatible ORT version remains the single source of truth in [`plugin-ep-cuda/MIN_ONNXRUNTIME_VERSION`](plugin-ep-cuda/MIN_ONNXRUNTIME_VERSION); it is now injected into each package's README at build/pack time, and the native plugin EP validates compatibility at registration time.

## Summary of Changes

### Python wheel

| File | Change |
|------|--------|
| `plugin-ep-cuda/python/pyproject.toml.in` | Removed the `dependencies = ["onnxruntime>=..."]` block. |
| `plugin-ep-cuda/python/build_wheel.py` | Import the shared `gen_file_from_template` helper; render the staged package README with the minimum ORT version; stop injecting the version into `pyproject.toml`. |
| `plugin-ep-cuda/python/onnxruntime_ep_cuda/README.md` | Added a Prerequisites section and an explicit `pip install "onnxruntime>=@min_onnxruntime_version@"` step. |

### C# NuGet package

| File | Change |
|------|--------|
| `plugin-ep-cuda/csharp/Microsoft.ML.OnnxRuntime.EP.Cuda/Microsoft.ML.OnnxRuntime.EP.Cuda.csproj` | Removed the `OnnxRuntimeMinVersion` resolution/validation machinery and the hard `Microsoft.ML.OnnxRuntime` `PackageReference`. |
| `plugin-ep-cuda/csharp/Microsoft.ML.OnnxRuntime.EP.Cuda/README.md` | Added a Prerequisites section and templatized the version requirement. |
| `plugin-ep-cuda/csharp/pack_nuget.py` | Render the staged README via the shared helper; dropped the `-p:OnnxRuntimeMinVersionFile` plumbing from `dotnet build`/`pack`. |

### Shared / docs

| File | Change |
|------|--------|
| `plugin-ep-cuda/_packaging_utils.py` | New shared `gen_file_from_template` helper (matches the WebGPU plugin's utility). |
| `plugin-ep-cuda/README.md` | Documented the no-hard-dependency behavior and the C# package. |

### Tests & CI

| File | Change |
|------|--------|
| `plugin-ep-cuda/csharp/test/CudaEpNuGetTest/CudaEpNuGetTest.csproj` | Reference the core `Microsoft.ML.OnnxRuntime` package explicitly via `$(OrtCoreTestVersion)` (no longer transitively pulled in). |
| `tools/ci_build/github/azure-pipelines/stages/plugin-linux-cuda-test-stage.yml` | Dropped the now-unnecessary `--no-deps` install flag and updated the comment. |
| `tools/ci_build/github/azure-pipelines/stages/plugin-win-cuda-test-stage.yml` | Set `OrtCoreTestVersion` from `MIN_ONNXRUNTIME_VERSION` and pass it to `dotnet build`. |

## Testing

- `python -m ruff check` and `lintrunner` pass on the changed files.
- Python scripts compile (`python -m py_compile`).
- CI: the Linux/Windows CUDA plugin test stages install the core `onnxruntime` package explicitly and verify the plugin wheel installs without altering the pinned core version; the NuGet test project references the core package explicitly and pins it to the minimum supported version.

## Motivation and Context

Follow-up to PR #28824 (CUDA plugin EP minimum ORT version / version-gated callbacks), kept separate so that PR stays focused on the version-gating change. Mirrors the WebGPU plugin EP packaging change in #28384. The CUDA case is stronger: a hard dependency on the CPU `onnxruntime` package is incorrect for `onnxruntime-gpu` users.

## Checklist

- [x] Tests added/updated (CI test stages and NuGet test project updated)
- [x] Documentation updated (package and top-level READMEs)
- [x] No breaking changes (consumers must now install the core ORT package explicitly, documented in READMEs)
- [ ] CI passes
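The README templating described above replaces `@min_onnxruntime_version@` with the value from `MIN_ONNXRUNTIME_VERSION`. The shared `gen_file_from_template` helper is Python and its exact behavior is not shown here; the C++ sketch below only illustrates `@name@` substitution in the style of CMake's `configure_file(... @ONLY)`. `RenderTemplate` is a hypothetical name, and the sketch assumes a literal `@` never appears outside a token.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Replaces every @name@ token in `text` with vars.at(name).
// Throws std::out_of_range for an unknown token and std::invalid_argument
// for an unterminated '@'.
std::string RenderTemplate(const std::string& text,
                           const std::map<std::string, std::string>& vars) {
  std::string out;
  std::size_t pos = 0;
  while (true) {
    const std::size_t open = text.find('@', pos);
    if (open == std::string::npos) {
      out.append(text, pos, std::string::npos);  // copy the remaining literal text
      return out;
    }
    const std::size_t close = text.find('@', open + 1);
    if (close == std::string::npos) {
      throw std::invalid_argument("unterminated @token@");
    }
    out.append(text, pos, open - pos);                        // literal text before the token
    out += vars.at(text.substr(open + 1, close - open - 1));  // substituted value
    pos = close + 1;
  }
}
```

Rendering `pip install "onnxruntime>=@min_onnxruntime_version@"` with `min_onnxruntime_version` set to `1.24.4` yields `pip install "onnxruntime>=1.24.4"`. Throwing on an unknown token makes a typo in a README template fail the packaging step instead of shipping an unrendered placeholder.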
…ugin_min_ort_version
edgchen1 reviewed Jun 12, 2026
edgchen1 reviewed Jun 13, 2026
edgchen1 approved these changes Jun 13, 2026
Description
Lowers the minimum supported ONNX Runtime version for the standalone CUDA plugin EP from 1.26.0 to 1.24.4, so the plugin binary (built against the latest ORT headers) can be loaded by older ORT runtimes. The plugin negotiates the API version at load time and only advertises EP callbacks the negotiated runtime actually supports, so newer features degrade gracefully on older runtimes instead of crashing.
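The "only advertises callbacks the negotiated runtime supports" behavior can be sketched as follows, using the thresholds listed under Key Changes (`Sync`/`CreateProfiler` from 1.25, the graph-capture set and `GetAvailableResource` from 1.26) and assuming API version N corresponds to ORT 1.N. `EpCallbacks`, `MakeEpCallbacks`, `CaptureGraph`, and the `*Impl` functions are hypothetical simplifications; the real `OrtEp` struct and its signatures differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed-down mirror of the OrtEp callback table; the real struct
// has many more members and different signatures. CaptureGraph stands in for
// the whole graph-capture callback set.
struct EpCallbacks {
  uint32_t ort_version_supported = 0;
  int (*Sync)() = nullptr;
  int (*CreateProfiler)() = nullptr;
  int (*CaptureGraph)() = nullptr;
  int (*GetAvailableResource)() = nullptr;
};

// Placeholder implementations standing in for the CUDA EP's real callbacks.
int SyncImpl() { return 0; }
int CreateProfilerImpl() { return 0; }
int CaptureGraphImpl() { return 0; }
int GetAvailableResourceImpl() { return 0; }

// Populate only the callbacks the negotiated runtime understands. Anything newer
// stays null, so an older runtime never receives a function pointer it cannot use.
EpCallbacks MakeEpCallbacks(uint32_t negotiated_api_version) {
  EpCallbacks ep;
  ep.ort_version_supported = negotiated_api_version;
  if (negotiated_api_version >= 25) {  // ORT 1.25
    ep.Sync = &SyncImpl;
    ep.CreateProfiler = &CreateProfilerImpl;
  }
  if (negotiated_api_version >= 26) {  // ORT 1.26
    ep.CaptureGraph = &CaptureGraphImpl;
    ep.GetAvailableResource = &GetAvailableResourceImpl;
  }
  return ep;
}
```

Under these assumptions, loading into 1.24.4 yields a table with every newer callback null, which matches the PR's testing note that the plugin runs correctly on 1.24.4 "with the newer callbacks left null".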
Motivation
The plugin is shipped as a separate package and is intended to run against a range of base `onnxruntime` runtimes. The previous hard floor of 1.26.0 was stricter than necessary: an audit of the `\since` annotations shows the plugin only calls APIs introduced in 1.24 or earlier (apart from the optional EP profiler, which is now version-gated). 1.24.4 is also the floor already used by the WebGPU plugin EP, so this aligns the two.

Key Changes
| File | Change |
|------|--------|
| `plugin-ep-cuda/MIN_ONNXRUNTIME_VERSION` | `1.26.0` → `1.24.4` (single source of truth for the floor) |
| `cmake/onnxruntime_providers_cuda_plugin.cmake` | Reads `MIN_ONNXRUNTIME_VERSION` and bakes it into the DLL as the `ORT_PLUGIN_EP_MIN_ORT_VERSION` compile definition |
| `cuda_plugin_ep.cc` | `CreateEpFactories()` negotiates the runtime API version via `onnxruntime::ep::ApiInit(...)` instead of hard-coding `GetApi(26)` |
| `cuda_plugin_utils.h` | `CudaPluginEpOrtVersionSupported() = min(CurrentOrtApiVersion(), ORT_API_VERSION)`; removes the hard-coded min-version constant |
| Allocator, arena, data-transfer, stream, profiler, factory, and control-flow sources | `ort_version_supported`/`version` = `CudaPluginEpOrtVersionSupported()` |
| `cuda_ep.cc` | Sets each newer `OrtEp` callback only when the negotiated runtime is new enough: `Sync`/`CreateProfiler` require ≥1.25, and the graph-capture set plus `GetAvailableResource` require ≥1.26; otherwise they are left null |
| `plugin-linux-cuda-test-stage.yml` | Installs the floor (`MIN_ONNXRUNTIME_VERSION`) base `onnxruntime` and runs the plugin test against it, catching any accidental dependency on a newer API |

API Version Audit
[Table: `\since` version used, with the `OrtApi` and `OrtEpApi` functions called directly at each version; one entry is conditional on `ENABLE_CUDA_PROFILING`.]

Apart from the optional EP profiler, every API the plugin calls is `\since 1.24` or older, which justifies the 1.24.4 floor. The profiler's three `\since 1.25` functions are made unreachable on older runtimes by gating the `CreateProfiler` callback.

Testing Notes
- `.so` relinked.
- `test_cuda_plugin_ep.py` against the latest runtime (1.28): 87/87 tests pass.
- `onnxruntime==1.24.4`: registers, enumerates all 8 GPUs, and runs inference correctly with the newer callbacks left null.
- `lintrunner` clean on changed files.
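The reasoning behind the API version audit (the floor is the newest `\since` among always-reachable calls, and calls hidden behind a version-gated callback do not count) can be expressed as a small C++ sketch. `ApiUse` and `RequiredMinorVersion` are hypothetical, and any entries passed to it are placeholders, not the plugin's actual audit data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One direct API call found by an audit.
struct ApiUse {
  std::string name;  // API function called by the plugin
  uint32_t since;    // minor version from the header's \since annotation (24 = 1.24)
  bool gated;        // true if reachable only through a version-gated callback
};

// The minimum runtime minor version is the newest \since among always-reachable
// calls. Gated calls do not raise it, because the gate keeps them unreachable
// on runtimes that are too old.
uint32_t RequiredMinorVersion(const std::vector<ApiUse>& uses) {
  uint32_t min_minor = 0;
  for (const ApiUse& use : uses) {
    if (!use.gated) {
      min_minor = std::max(min_minor, use.since);
    }
  }
  return min_minor;
}
```

For example, placeholder calls at 22 and 24 plus a gated profiler call at 25 yield a floor of 24 (1.24); removing the gate would raise it to 25.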