Bump Microsoft.ML.OnnxRuntime from 1.18.1 to 1.26.0 by dependabot[bot] · Pull Request #3 · uav-simulator/uavsimulator

dependabot · 2026-05-17T16:56:11Z

Updated Microsoft.ML.OnnxRuntime from 1.18.1 to 1.26.0.
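
As a quick check on what kind of bump this is, here is a minimal sketch that classifies the jump from 1.18.1 to 1.26.0, assuming plain `major.minor.patch` version strings. The helper names `parse_version` and `classify_update` are hypothetical and are not part of Dependabot.

```python
def parse_version(text):
    """Split a 'major.minor.patch' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def classify_update(old, new):
    """Return which semver component changes first between two versions."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v[0] != old_v[0]:
        return "semver-major"
    if new_v[1] != old_v[1]:
        return "semver-minor"
    if new_v[2] != old_v[2]:
        return "semver-patch"
    return "none"


print(classify_update("1.18.1", "1.26.0"))  # semver-minor
```

The result agrees with the `update-type: version-update:semver-minor` entry in this PR's commit metadata. A minor bump can still span many releases: here it covers eight minor versions, several of which list breaking changes in the notes that follow.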

Release notes

Sourced from Microsoft.ML.OnnxRuntime's releases.

1.26.0

n.b. The following was generated via LLM from Git history. Only the contributor list has been verified.

ONNX Runtime Release 1.26.0

Announcement - Breaking Changes

Support for CUDA 12 will be removed in 1.27.0.
- CUDA 13 will continue to be published as onnxruntime-<os>-<arch>-gpu_cuda13-<version>.<ext>
CUDA runtime will be moving soon to a dedicated Execution Provider (EP) instead of a published package from ORT core.

Highlights

Added optional memory mapping for .ort model loads (#28164).
Added RISC-V Vector (RVV) support for CPU EP (#28261).
OpenVINO EP upgraded for 1.26.0 development release (#28297).
WebGPU gained GridSample support (#28264) and Split-K improvements (#28151).
CUDA plugin EP gained graph support (#28002), profiling API (#28216).
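
To illustrate the general idea behind memory-mapped model loading (#28164), the sketch below maps a file read-only with Python's standard `mmap` module instead of copying it into a buffer. It shows the operating-system technique only; it is not the ONNX Runtime API, and the function name `read_header_mapped` and the throwaway `.ort` file are assumptions made for the demo.

```python
import mmap
import os
import tempfile


def read_header_mapped(path, length):
    """Memory-map a file read-only and return its first `length` bytes."""
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; pages are loaded lazily on access.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return bytes(view[:length])


# Demo with a throwaway file standing in for a model; the bytes are arbitrary.
with tempfile.NamedTemporaryFile(suffix=".ort", delete=False) as tmp:
    tmp.write(b"placeholder model bytes")
    demo_path = tmp.name

header = read_header_mapped(demo_path, 11)
os.remove(demo_path)
print(header)  # b'placeholder'
```

With a mapping, only the pages that are actually touched are read from disk, which is why the approach can reduce load time and resident memory for large files.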

Security and Reliability Hardening

Replaced unrestricted Python setattr configuration with an allowlist (#28083).
Hardened multiple OOB and overflow scenarios across ML and core ops:
- Attention mask index OOB write (#27789).
- MaxPoolGrad indices bounds validation (#27903).
- SVM and TreeEnsemble bounds/security fixes (#27950, #27951, #27952, #27989).
- RNN sequence_lens OOB read and integer overflow handling (#28052, #28003).
- GroupQueryAttention seqlens_k bounds validation and compatibility follow-up (#28031, #28259).
- MatMulBnb4 and ML coefficient SafeInt checks (#27995, #28001).
- CUDA Gather int32 overflow fix (#28108).
- GridSample float->int64 cast hardening for NaN/Inf/out-of-range coords (#28302).
Fixed session logger use-after-free during EP teardown under verbose logging (#28274).
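
The allowlist change for Python `setattr` configuration (#28083) follows a common hardening pattern, sketched below. This is an illustration of the pattern only, not the ONNX Runtime code: `SessionConfig`, `ALLOWED_FIELDS`, and `apply_config` are hypothetical names.

```python
class SessionConfig:
    """Hypothetical configuration object used only for this illustration."""

    def __init__(self):
        self.log_level = 2
        self.thread_count = 1


# Only these attribute names may be set from caller-supplied input.
ALLOWED_FIELDS = frozenset({"log_level", "thread_count"})


def apply_config(target, updates):
    """Apply updates, rejecting any attribute name outside the allowlist."""
    for name, value in updates.items():
        if name not in ALLOWED_FIELDS:
            raise ValueError(f"unsupported configuration key: {name!r}")
        setattr(target, name, value)


config = SessionConfig()
apply_config(config, {"thread_count": 4})
print(config.thread_count)  # 4
```

Without the membership check, a caller could pass names such as `__class__` and overwrite attributes that were never meant to be configurable.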

CUDA, Attention, and MLAS

Filled CUDA opset/operator gaps and extended support:
- Transpose opset 23 -> 25 (#27740).
- QuantizeLinear/DequantizeLinear opset 25 (#28046).
- CUDA TopK INT8/INT16/UINT8 support (#27862).
- LabelEncoder CUDA support for numeric types (#28045).
Attention/GQA improvements:
- Fixed ONNX Attention min-bias alignment crash on SM<80 and masked-batch NaN behavior (#27831).
- Added FP32 QK accumulation path for unfused GQA attention (#28198).
- Added CUDART_VERSION reduction compatibility in GQA attention (#28296).
- Fixed CUDA 13 build error in GQA unfused attention (#28309).
- PagedAttention fallback for SM<80 fp16 (#28200).
MLAS updates:
- FP16 Gelu enablement (#26815).
- Arm64 BF16 fast-math conv kernels for NCHW/NCHWc paths (#27878).

WebGPU, WebNN, and JavaScript

... (truncated)

1.25.1

n.b. This changelog is LLM generated. Only the contributor listing has been verified.

ONNX Runtime Release 1.25.1

📢 Announcements & Breaking Changes

ONNX Op Updates

Enhanced ONNX operator support with new opset versions: Reshape (opset 25), Transpose (opset 24) (#27752)

✨ New Features

📊 New ONNX Ops & Model Support

LinearAttention and CausalConvState operators for Qwen3.5 model support (#27907)
RotaryEmbedding (RotEMB) and RMSNorm operators added (#27752)
Linear Attention signature support (#27842)

🌐 Web & JavaScript

WebGPU EP

Qwen3.5 model support on WebGPU execution provider (#27996)
QMoE 1-token decode path optimization — fused operations to reduce GPU dispatches for improved performance (#27998)

🐛 Bug Fixes

Core Runtime Fixes

Improved filesystem error messages during Linux device discovery for better debugging experience (#27289)
Fixed missing include for SetRawDataInTensorProto in NVIDIA TensorRT RTX tests (#28065)

🙏 Contributors

Thanks to our 7 contributors for this release:
@guschmue, @sanaa-hamel-microsoft, @apsonawane, @eserscor, @ishwar-raut1, @qjia7, @theHamsta

Full Changelog: microsoft/onnxruntime@v1.25.0...v1.25.1

1.25.0

📢 Announcements & Breaking Changes

Build & Platform

C++20 is now required to build ONNX Runtime from source. Minimum toolchains: MSVC 19.29+, GCC 10+, Clang 10+. Users of prebuilt packages are unaffected. (#27178)
CUDA minimum version raised to 12.0 — CUDA 11.x is no longer supported. Users pinned to CUDA 11.x should stay on ORT 1.24.x or upgrade their CUDA toolkit/driver. (#27570)
ONNX upgraded to 1.21.0 (#27601)
sympy is now an optional dependency for Python builds. (#27200)
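
For source builds, the minimum toolchains quoted above can be checked with a small helper. This is a sketch under the assumption that compiler versions are given as dotted numeric strings; `MIN_TOOLCHAINS`, `version_tuple`, and `meets_minimum` are hypothetical names, and the ONNX Runtime build scripts may perform this check differently.

```python
# Minimum compiler versions quoted in the 1.25.0 notes for source builds.
MIN_TOOLCHAINS = {"msvc": (19, 29), "gcc": (10,), "clang": (10,)}


def version_tuple(text):
    """Convert a dotted version string such as '10.2.1' to a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(compiler, version):
    """Return True if `version` is at least the quoted minimum for `compiler`."""
    required = MIN_TOOLCHAINS[compiler.lower()]
    return version_tuple(version) >= required


print(meets_minimum("gcc", "9.4.0"))    # False
print(meets_minimum("clang", "17.0.6")) # True
print(meets_minimum("msvc", "19.28"))   # False
```

As the notes state, this only matters when building from source; consumers of the prebuilt NuGet package are unaffected.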

Execution Provider Changes

ArmNN EP has been removed. Users should remove any --use_armnn build flags and migrate to the MLAS/KleidiAI-backed CPU EP or QNN EP for Qualcomm hardware. (#27447)

API Version

ORT_API_VERSION updated to 25. (#27280)

🔒 Security Fixes

Fixed potential integer truncation leading to heap out-of-bounds read/write (#27544)
Addressed Pad Reflect vulnerability (#27652)
Security fix for transpose optimizer (#27555)
Upgraded minimatch 3.1.2 → 3.1.4 for CVE-2026-27904 (#27667)
Hardened shell command handling for constant strings (#27840)
Added validation of onnx::TensorProto data size before allocation (#27547)
Cleaned up external data path validation (#27539)
Fixed misaligned address reads for tensor attributes from raw data buffers (#27312)
Fixed CPU Attention overflow issue (#27822)
Fixed CPU LRN integer overflow issues (#27886)
Additional input validation hardening:
- Tile kernel dim overflow (#27566)
- Out-of-bounds read in cross entropy (#27568)
- TreeEnsembleClassifier attributes (#27571)
- AffineGrid (#27572)
- EmbedLayerNorm position_ids (#27573)
- RotaryEmbedding position_ids (#27597)
- RoiAlign batch_indices (#27603)
- MaxUnpool indices (#27432)
- QMoECPU swiglu OOB (#27748)
- SVMClassifier initializer (#27699)
- Col2Im SafeInt (#27625)
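
Several of the fixes above share one idea: validate a declared size before allocating or indexing. The sketch below shows that idea for a tensor payload, checking that the byte size implied by the shape stays within the int64 range and matches the payload length. It is an illustration only, not the ONNX Runtime implementation; the function names and the `element_size` parameter are assumptions.

```python
INT64_MAX = 2**63 - 1


def expected_byte_size(dims, element_size):
    """Return the byte size implied by dims, or raise if it is invalid."""
    total = element_size
    for dim in dims:
        if dim < 0:
            raise ValueError(f"negative dimension: {dim}")
        total *= dim
        # Python ints do not overflow, so emulate a checked int64 multiply.
        if total > INT64_MAX:
            raise OverflowError("tensor byte size exceeds the int64 range")
    return total


def validate_raw_data(dims, element_size, raw_data):
    """Check the payload length against the declared shape before allocating."""
    expected = expected_byte_size(dims, element_size)
    if expected != len(raw_data):
        raise ValueError(f"declared {expected} bytes, payload has {len(raw_data)}")
    return expected


print(validate_raw_data([2, 3], 4, bytes(24)))  # 24
```

Rejecting the mismatch up front means a crafted model cannot make the loader allocate or read based on a shape its payload does not back.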

✨ New Features

🔌 Execution Provider Plugin API & CUDA Plugin EP

... (truncated)

1.24.4

This is a patch release for ONNX Runtime 1.24, containing bug fixes and execution provider updates.

Bug Fixes

Core: Added PCI bus fallback for Linux GPU device discovery in containerized environments (e.g., AKS/Kubernetes) where nvidia-drm is not loaded but GPU PCI devices are still exposed via sysfs. (#27591)
Plugin EP: Fixed null pointer dereference when iterating output spans in GetOutputIndex. (#27644)
Plugin EP: Fixed bug that incorrectly assigned duplicate MetaDef IDs to fused nodes in different GraphViews (e.g., then/else branches of an If node), causing session creation to fail with a conflicting kernel error. (#27666)
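
The PCI bus fallback (#27591) relies on GPU devices remaining visible in sysfs even when the nvidia-drm module is not loaded. A minimal sketch of that kind of scan is shown below, assuming the standard Linux layout in which each directory under `/sys/bus/pci/devices` exposes `vendor` and `class` files. It is not the ONNX Runtime implementation; the function names and the choice to match PCI base class 0x03 (display controller) are assumptions.

```python
import os

NVIDIA_VENDOR_ID = "0x10de"
DISPLAY_CLASS_PREFIX = "0x03"  # PCI base class 0x03 = display controller


def read_attr(device_dir, name):
    """Read one sysfs attribute file, returning '' if it is missing."""
    try:
        with open(os.path.join(device_dir, name)) as handle:
            return handle.read().strip().lower()
    except OSError:
        return ""


def find_nvidia_gpus(pci_root="/sys/bus/pci/devices"):
    """List PCI addresses whose vendor and class look like an NVIDIA GPU."""
    if not os.path.isdir(pci_root):
        return []
    found = []
    for address in sorted(os.listdir(pci_root)):
        device_dir = os.path.join(pci_root, address)
        vendor = read_attr(device_dir, "vendor")
        pci_class = read_attr(device_dir, "class")
        if vendor == NVIDIA_VENDOR_ID and pci_class.startswith(DISPLAY_CLASS_PREFIX):
            found.append(address)
    return found


# On a machine without sysfs (or without NVIDIA devices) this prints [].
print(find_nvidia_gpus())
```

Passing `pci_root` as a parameter keeps the scan testable against a temporary directory tree instead of the real sysfs.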

Execution Provider Updates

QNN EP: Enabled offline x64 compilation with memhandle IO type by deferring rpcmem library loading to inference time. (#27479)
QNN EP: Reverted QNN SDK logging verbosity changes that caused segmentation faults on backend destruction. (#27650)

Build and Infrastructure

Python: Updated python_requires from >=3.10 to >=3.11 to reflect dropped Python 3.10 support. (#27354)
Build: Replaced __builtin_ia32_tpause with the compiler-portable _tpause intrinsic to fix cross-compiler portability issues between GCC and LLVM. (#27607)

Full Changelog: v1.24.3...v1.24.4

Contributors

@derdeljan-msft, @adrianlizarraga, @apwojcik, @baijumeswani, @edgchen1, @mocknen, @tianleiwu, @XXXXRT666

1.24.3

This is a patch release for ONNX Runtime 1.24, containing bug fixes, security improvements, performance enhancements, and execution provider updates.

Security Fixes

Core: Fixed GatherCopyData integer truncation leading to heap out-of-bounds read/write. (#27444)
Core: Fixed RoiAlign heap out-of-bounds read via unchecked batch_indices. (#27543)
Core: Prevent heap OOB from maliciously crafted Lora Adapters. (#27518)
Core: Fixed out-of-bounds access for Resize operation. (#27419)
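
The integer truncation behind the GatherCopyData fix (#27444) is easy to reproduce in isolation. The sketch below narrows a 64-bit size to a signed 32-bit value, shows the silent wrap, and contrasts it with a checked conversion. It demonstrates the bug class only and is not the ONNX Runtime code; both function names are hypothetical.

```python
import ctypes


def truncate_to_int32(value):
    """Mimic a narrowing cast of a 64-bit value to a signed 32-bit integer."""
    return ctypes.c_int32(value).value


def checked_int32(value):
    """Narrow only when the value is representable, otherwise raise."""
    if not -(2**31) <= value <= 2**31 - 1:
        raise OverflowError(f"{value} does not fit in int32")
    return value


byte_count = 2**32 + 16               # a size slightly above 4 GiB
print(truncate_to_int32(byte_count))  # 16
```

A copy that trusts the truncated value of 16 while the buffer logic elsewhere uses the full size is exactly the mismatch that leads to out-of-bounds reads or writes; `checked_int32` fails loudly instead.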

Bug Fixes

Core: Fixed GatherND division by zero when batch dimensions mismatch. (#27090)
Core: Fixed validation for external data paths for models loaded from bytes. (#27430)
Core: Fixed SkipLayerNorm fusion incorrectly applied when gamma/beta are not 1D. (#27459)
Core: Fixed double-free in TRT EP custom op domain Release functions. (#27471)
Core: Fixed QMoE CPU Operator. (#27360)
Core: Fixed MatmulNBits prepacking scales. (#27412)
Python: Fixed refcount bug in map input conversion that caused shutdown segfault. (#27413)
NuGet: Fixed DllImportResolver. (#27397)
NuGet: Added OrtEnv.DisableDllImportResolver to prevent fatal error on resolver conflict. (#27535)

Performance Improvements

Core: QMoE CPU performance update (up to 4x on 4-bit). (#27364)
Core: Fixed O(n²) model load time for TreeEnsemble with categorical feature chains. (#27391)

Execution Provider Updates

NvTensorRtRtx EP:
- Avoid repetitive creation of fp4/fp8 native-custom-op domains. (#27192)
- Added missing override specifiers to suppress warnings. (#27288)
- DQ→MatMulNBits fusion transformer. (#27466)
WebGPU:
- Used embedded WASM module in Blob URL workers when wasmBinary is provided. (#27318)
- Fixed usage of wasmBinary together with a blob URL for .mjs. (#27411)
- Removed the unhelpful "Unknown CPU vendor" warning. (#27399)
- Allows new memory info name for WebGPU. (#27475)
MLAS:
- Added DynamicQGemm function pointers and ukernel interface. (#27403)
- Fixed error where bytes is not assigned for dynamic qgemm pack b size. (#27421)
VitisAI EP: Removed s_kernel_registry_vitisaiep.reset() in deinitialize_vitisai_ep(). (#27295)
Plugin EPs: Added "library_path" metadata entry to OrtEpDevice instances for plugin and provider bridge EPs. (#27522)

Build and Infrastructure

Pipelines:
- Build Windows ARM64X binaries as part of packaging pipeline. (#27316)
- Moved JAR testing pipelines to canonical pipeline template. (#27480)
Python: Enabled Python 3.14 CI and upgraded dependencies. (#27401)
Build: Suppressed spurious Array Out of Bounds warnings produced by GCC 14.2 compiler on Linux builds. (#27454)
Build: Fixed -Warray-bounds build error in MLAS on clang 17+. (#27499)
Telemetry: Added/Updated telemetry events. (#27356)
Config: Increased kMaxValueLength to 8192. (#27521)

... (truncated)

1.24.2

This is a patch release for ONNX Runtime 1.24, containing several bug fixes, security improvements, and execution provider updates.

Bug Fixes

NuGet: Fixed native library loading issues in the ONNX Runtime NuGet package on Linux and macOS. (#27266)
macOS: Fixed Java support and Jar testing on macOS ARM64. (#27271)
Core: Enable Robust Symlink Support for External Data for Huggingface Hub Cache. (#27374)
Core: Added boundary checks for SparseTensorProtoToDenseTensorProto to improve robustness. (#27323)
Security: Fixed an out-of-bounds read vulnerability in ArrayFeatureExtractor. (#27275)

Execution Provider Updates

MLAS: Fixed flakiness and accuracy issues in Lut GEMM (MatMulNBitsLutGemm). (#27216)
QNN: Enabled 64-bit UDMA mode for HTP target v81 or above. (#26677)
WebGPU:
- Used LazyRelease for prepack allocator. (#27077)
- Fixed ConvTranspose bias validation in both TypeScript and C++ implementations. (#27213)
OpenVINO (OVEP): Patch to reduce resident memory by reusing weight files across shared contexts. (#27238)
DNNL: Fixed DNNL build error by including missing files. (#27334)

Build and Infrastructure

CUDA:
- Added support for CUDA architecture family codes (suffix 'f') introduced in CUDA 12.9. (#27278)
- Fixed build errors and warnings for various CUDA versions (12.8, 13.0, 13.1.1). (#27276)
- Applied patches for Abseil CUDA warnings. (#27096, #27126)
Pipelines:
- Fixed Python packaging pipeline for Windows ARM64 and release. (#27339, #27350, #27299)
- Fixed DirectML NuGet pipeline to correctly bundle x64 and ARM64 binaries for release. (#27349)
- Updated Microsoft.ML.OnnxRuntime.Foundry package for Windows ARM64 support and NuGet signing. (#27294)
Testing: Updated BaseTester to support plugin EPs with both compiled nodes and registered kernels. (#27176)
Telemetry: Added service name and framework name to telemetry events for better usage understanding on Windows. (#27252, #27256)

Full Changelog: v1.24.1...v1.24.2

Contributors

@tianleiwu, @hariharans29, @edgchen1, @xiaofeihan1, @adrianlizarraga, @angelser, @angelserMS, @ankitm3k, @baijumeswani, @bmehta001, @ericcraw, @eserscor, @fs-eire, @guschmue, @mc-nv, @qjia7, @qti-monumeen, @titaiwangms, @yuslepukhin

1.24.1

📢 Announcements & Breaking Changes

Platform Support Changes

Python 3.10 wheels are no longer published — Please upgrade to Python 3.11+
Python 3.14 support added
Free-threaded Python (PEP 703) — Added support for Python 3.13t and 3.14t in Linux (#26786)
x86_64 binaries for macOS/iOS are no longer provided and minimum macOS is raised to 14.0

API Version

ORT_API_VERSION updated to 24 (#26418)

✨ New Features

🤖 Execution Provider (EP) Plugin API

A major infrastructure enhancement enabling plugin-based EPs with dynamic loading:

Initial kernel-based EP support (#26206)
Weight pre-packing support for plugin EPs (#26754)
EP Context model support (#25124)
Control flow kernel APIs (#26927)
OrtKernelInfo APIs for kernel-based plugin EPs (#26803)

🔧 Core APIs

OrtApi::CreateEnvWithOptions() and OrtEpApi::GetEnvConfigEntries() (#26971)
EP Device Compatibility APIs (#26922)
External Resource Importer API for D3D12 shared resources (#26828)
Session config access from KernelInfo (#26589)

📊 Dependencies & Integration

ONNX upgraded to 1.20.1 (#26579)
Protobuf updated from 3.20.3 → 4.25.8 (#26910)
CUDA Graph enabled by default (#26929)

🖥️ Execution Provider Updates

NVIDIA

CUDA EP: Flash Attention updates, GQA kernel fusion, BF16 support for MoE/qMoE/MatMulNBits, CUDA 13.0 support
TensorRT EP: Upgraded to TensorRT 10.14, automatic plugin loading, NVFP4 custom ops
TensorRT RTX EP: RTX runtime caching, CUDA graph support, BFloat16, memory-mapped engines

Qualcomm QNN EP

QNN SDK upgraded to 2.42.0 with new ops (RMSNorm, ScatterElements, GatherND, STFT, RandomUniformLike)
Gelu pattern fusion, LPBQ quantization support, ARM64 wheel builds, v81 device support

Intel & AMD

OpenVINO EP: Upgraded to 2025.4.1
VitisAI EP: External EP loader, compiled model compatibility API
... (truncated)

1.23.2

1.23.1

What's Changed

Fix Attention GQA implementation on CPU (#25966)
Address GetMemInfo edge cases (#26021)
Implement new Python APIs (#25999)
MemcpyFromHost and MemcpyToHost support for plugin EPs (#26088)
[TRT RTX EP] Fix bug for generating the correct subgraph in GetCapability (#26132)
add session_id_ to LogEvaluationStart/Stop, LogSessionCreationStart (#25590)
[build] fix WebAssembly build on macOS/arm64 (#25653)
[CPU] MoE Kernel (#25958)
[CPU] Block-wise QMoE kernel for CPU (#26009)
[C#] Implement missing APIs (#26101)
Regenerate test model with ONNX IR < 12 (#26149)
[CPU] Fix compilation errors because of unused variables (#26147)
[EP ABI] Check if nodes specified in GetCapability() have already been assigned (#26156)
[QNN EP] Add dynamic option to set HTP performance mode (#26135)

Full Changelog: microsoft/onnxruntime@v1.23.0...v1.23.1

1.23.0

Announcements

This release introduces the Execution Provider (EP) Plugin API, a new infrastructure for building plugin-based EPs. (#24887, #25137, #25124, #25147, #25127, #25159, #25191, #2524)
This release introduces the ability to dynamically download and install execution providers. This feature is exclusively available in the WinML build and requires Windows 11 version 25H2 or later. To use it, C/C++/C# users should use the builds distributed through the Windows App SDK, and Python users should install the onnxruntime-winml package (to be published soon). We encourage users who can upgrade to the latest Windows 11 to use the WinML build to take advantage of this enhancement.

Upcoming Changes

The next release will stop providing x86_64 binaries for macOS and iOS operating systems.
The next release will increase the minimum supported macOS version from 13.4 to 14.0.
The next release will stop providing python 3.10 wheels.

Execution & Core Optimizations

Shutdown logic on Windows is simplified

On Windows, some global objects are now left undestroyed if ONNX Runtime detects that the process is shutting down (#24891). This does not cause a memory leak, because all memory is returned to the operating system when a process ends. The change reduces the chance of crashes on process exit.

AutoEP/Device Management

ONNX Runtime can now automatically discover computing devices and select the best EPs to download and register. EP downloading currently works only on Windows 11 version 25H2 or later.

Execution Provider (EP) Updates

ROCm EP was removed from the source tree. Users are advised to use the MIGraphX or Vitis AI EPs from AMD.
A new EP, Nvidia TensorRT RTX, was added.

Web

EMSDK is upgraded from 4.0.4 to 4.0.8

WebGPU EP

Added WGSL template support.

QNN EP

SDK Update: Added support for QNN SDK 2.37.

KleidiAI

Enhanced performance for SGEMM, IGEMM, and Dynamic Quantized MatMul operations, especially for Conv2D operators on hardware that supports SME2 (Scalable Matrix Extension v2).

Known Problems

A change in build.py related to KleidiAI may cause build failures when cross-compiling (#26175).

Contributions

Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:

@1duo, @Akupadhye, @amarin16, @AndreyOrb, @ankan-ban, @ankitm3k, @anujj, @aparmp-quic, @arnej27959, @bachelor-dou, @benjamin-hodgson, @Bonoy0328, @chenweng-quic, @chuteng-quic, @clementperon, @co63oc, @daijh, @damdoo01-arm, @danyue333, @fanchenkong1, @gedoensmax, @genarks, @gnedanur, @Honry, @huaychou, @ianfhunter, @ishwar-raut1, @jing-bao, @joeyearsley, @johnpaultaken, @jordanozang, @JulienMaille, @keshavv27, @kevinch-nv, @khoover, @krahenbuhl, @kuanyul-quic, @mauriciocm9, @mc-nv, @minfhong-quic, @mingyueliuh, @MQ-mengqing, @NingW101, @notken12, @omarhass47, @peishenyan, @pkubaj, @qc-tbhardwa, @qti-jkilpatrick, @qti-yuduo, @quic-ankus, @quic-ashigarg, @quic-ashwshan, @quic-calvnguy, @quic-hungjuiw, @quic-tirupath, @qwu16, @ranjitshs, @saurabhkale17, @schuermans-slx, @sfatimar, @stefantalpalaru, @sunnyshu-intel, @TedThemistokleous, @thevishalagarwal, @toothache, @umangb-09, @vatlark, @VishalX, @wcy123, @xhcao, @xuke537, @zhaoxul-qti

1.22.2

What's new?

This release adds an optimized CPU/MLAS implementation of DequantizeLinear (8 bit) and introduces the build option client_package_build, which enables default options that are more appropriate for client/on-device workloads (e.g., disable thread spinning by default).

Build System & Packages

Add --client_package_build option (#25351) - @jywu-msft
Remove the python installation steps from win-qnn-arm64-ci-pipeline.yml (#25552) - @snnn

CPU EP

Add multithreaded/vectorized implementation of DequantizeLinear for int8 and uint8 inputs (SSE2, NEON) (#24818) - @adrianlizarraga

QNN EP

Add support for the Upsample, Einsum, LSTM, and CumSum operators (#24265, #24616, #24646, #24820) - @quic-zhaoxul, @1duo, @chenweng-quic, @Akupadhye
Fuse scale into Softmax (#24809) - @qti-yuduo
Enable DSP queue polling when performance is set to “burst” mode (#25361) - @quic-calvnguy
Update QNN SDK to version 2.36.1 (#25388) - @qti-jkilpatrick
Include the license file from QNN SDK in the Microsoft.ML.OnnxRuntime.QNN NuGet package (#25158) - @HectorSVC

1.22.1

What's new?

This release replaces static linking of dxcore.lib with optional runtime loading, lowering the minimum supported version from Windows 10 22H2 (10.0.22621) to 20H1 (10.0.19041). This enables compatibility with Windows Server 2019 (10.0.17763), where dxcore.dll may be absent.
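
The difference between static linking and optional runtime loading can be sketched in a few lines: a statically linked dependency must be present for the process to start, whereas a runtime-loaded one can be probed and skipped. The example below uses Python's `ctypes` to show the pattern; it is not how ONNX Runtime loads dxcore internally, and `try_load_library` is a hypothetical helper.

```python
import ctypes


def try_load_library(name):
    """Return a handle to the shared library, or None if it cannot be loaded."""
    try:
        return ctypes.CDLL(name)
    except OSError:
        # Raised when the library is absent or cannot be loaded on this system.
        return None


handle = try_load_library("dxcore.dll")
if handle is None:
    print("dxcore not available; continuing without it")
else:
    print("dxcore loaded")
```

On systems where the library is missing, the caller simply takes the fallback branch instead of failing at startup.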

change dependency from gitlab eigen to github eigen-mirror #24884 - @prathikr
Weaken dxcore dependency #24845 - @skottmckay
[DML] Restore compatibility with Windows Sdk 10.0.17134.0 #24950 - @JulienMaille
Disable VCPKG's binary cache #24889 - @snnn

1.22

Announcements

This release introduces new APIs for Model Editor, Auto EP infrastructure, and AOT Compile.
OnnxRuntime GPU packages require CUDA 12.x; packages built for CUDA 11.x are no longer published.
The min supported Windows version is now 10.0.19041.

GenAI & Advanced Model Features

Constrained Decoding: Introduced new capabilities for constrained decoding, offering more control over generative AI model outputs.

Execution & Core Optimizations

Core

Auto EP Selection Infrastructure: Added foundational infrastructure to enable automatic selection of Execution Providers via selection policies, aiming to simplify configuration and optimize performance. (Pull Request #24430)
Compile API: Introduced new APIs to support explicit compilation of ONNX models.
- See: OrtCompileApi Struct Reference (Assuming a similar link structure for future documentation)
- See: EP Context Design (Assuming a similar link structure for future documentation)
Model Editor API: APIs for creating or editing ONNX models.
- See: OrtModelEditorApi

Execution Provider (EP) Updates

CPU EP/MLAS

KleidiAI Integration: Integrated KleidiAI into ONNX Runtime/MLAS for enhanced performance on Arm architectures.
MatMulNBits Support: Added support for MatMulNBits, enabling matrix multiplication with weights quantized to 8 bits.
GroupQueryAttention optimizations and enhancements

OpenVINO EP

Added support up to OpenVINO 2025.1
Introduced Intel compiler level optimizations for QDQ models.
Added support to select Intel devices based on LUID
Load_config feature improvement to support AUTO, HETERO and MULTI plugin.
misc bugfixes/optimizations
For detailed updates, refer to Pull Request #24394: ONNXRuntime OpenVINO - Release 1.22

QNN EP

SDK Update: Added support for QNN SDK 2.33.2.
operator updates/support to Sum, Softmax, Upsample, Expand, ScatterND, Einsum
QNN EP can be built as shared or static library.
enable QnnGpu backend
For detailed updates, refer to recent QNN-tagged PRs.

TensorRT EP

TensorRT Version: Added support for TensorRT 10.9.
- Note for onnx-tensorrt open-source parser users: Please check here for specific requirements (Referencing 1.21 link as a placeholder, this should be updated for 1.22).
New Features:
- EP option to enable TRT Preview Feature
- Support to load TensorRT V3 plugin
Bug Fixes:
- Resolved an issue related to multithreading scenarios.
  ... (truncated)

1.21.1

What's new?

Extend CMAKE_CUDA_FLAGS with all Blackwell compute capacity #23928 - @yf711
[ARM CPU] Fix fp16 const initialization on no-fp16 platform #23978 - @fajin-corp
[TensorRT EP] Call cudaSetDevice at compute function for handling multithreading scenario #24010 - @chilo-ms
Fix attention bias broadcast #24017 - @tianleiwu
Deleted the constant SKIP_CUDA_TEST_WITH_DML #24113 - @CodingSeaotter
[QNN EP] ARM64EC python package remove --vcpkg in build #24174 - @jywu-msft
[wasm] remove --vcpkg in wasm build #24179 - @fs-eire

1.21.0

Announcements

No large announcements of note this release! We've made a lot of small refinements to streamline your ONNX Runtime experience.

GenAI & Advanced Model Features

Enhanced Decoding & Pipeline Support

Added "chat mode" support for CPU, GPU, and WebGPU.
Provided support for decoder model pipelines.
Added support for Java API for MultiLoRA.

API & Compatibility Updates

Chat mode introduced breaking changes in the API (see migration guide).

Bug Fixes for Model Output

Fixed Phi series garbage output issues with long prompts.
Resolved gibberish issues with top_k on CPU.

Execution & Core Optimizations

Core Refinements

Reduced default logger usage for improved efficiency (#23030).
Fixed a visibility issue in threadpool (#23098).

Execution Provider (EP) Updates

General

Removed TVM EP from the source tree (#22827).
Marked NNAPI EP for deprecation (following Google's deprecation of NNAPI).
Fixed a DLL delay loading issue that impacts WebGPU EP and DirectML EP's usability on Windows (#23111, #23227)

TensorRT EP Improvements

Added support for TensorRT 10.8.
- onnx-tensorrt open-source parser user: please check here for requirement.
Assigned DDS ops (NMS, RoiAlign, NonZero) to TensorRT by default.
Introduced option trt_op_types_to_exclude to exclude specific ops from TensorRT assignment.

CUDA EP Improvements

Added a python API preload_dlls to coexist with PyTorch.
Miscellaneous enhancements for Flux model inference.

QNN EP Improvements

Introduced QNN shared memory support.
Improved performance for AI Hub models.
Added support for QAIRT/QNN SDK 2.31.
Added Python 3.13 package.
Miscellaneous bug fixes and enhancements.
QNN EP is now built as a shared library/DLL by default. To retain previous build behavior, use build option --use_qnn static_lib.

DirectML EP Support & Upgrades

Updated DirectML version from 1.15.2 to 1.15.4 (#22635).

... (truncated)

1.20.2

What's new?

Build System & Packages

Merge Windows machine pools for Web CI pipeline to reduce maintenance costs (#23243) - @snnn
Update boost URL for React Native CI pipeline (#23281) - @jchen351
Move ORT Training pipeline to GitHub actions and enable CodeQL scan for the source code (#22543) - @snnn
Move Linux GitHub actions to a dedicated machine pool (#22566) - @snnn
Update Apple deployment target to iOS 15.1 and macOS 13.3 (#23308) - @snnn
Deprecate macOS 12 in packaging pipeline (#23017) - @mszhanyi
Remove net8.0-android MAUI target from MAUI test project (#23607) - @carzh

CUDA EP

Fixes use of numeric_limits that causes a compiler error in Visual Studio 2022 v17.12 Preview 5 (#22738, #22868) - @tianleiwu

QNN EP

Enable offloading graph input quantization and graph output dequantization to CPU by default. Improves inference latency by reducing the amount of I/O data copied between CPU and NPU. (#23368) - @adrianlizarraga

1.20.1

What's new?

Python Quantization Tool

Prevent int32 quantized bias from clipping by adjusting the weight's scale (#22020) - @adrianlizarraga
Update QDQ Pad, Slice, Softmax (#22676) - @adrianlizarraga
Introduce get_qdq_config() helper to get QDQ configurations (#22677) - @adrianlizarraga
Add reduce_range option to get_qdq_config() (#22782) - @adrianlizarraga
Flaky test due to Pad reflect bug (#22798) - @adrianlizarraga
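
The bias clipping fix (#22020) can be understood from the arithmetic alone. Under the usual convention that a bias is quantized to int32 with scale `input_scale * weight_scale`, a very small weight scale pushes the quantized bias beyond the int32 range. The sketch below computes the smallest weight scale that avoids this. It is a simplified scalar illustration under that assumption, not the quantization tool's code; the function names and sample numbers are hypothetical.

```python
INT32_MAX = 2**31 - 1


def quantize_bias(bias, input_scale, weight_scale):
    """Quantize a bias with the conventional scale input_scale * weight_scale."""
    return round(bias / (input_scale * weight_scale))


def adjusted_weight_scale(bias, input_scale, weight_scale):
    """Return a weight scale large enough for the quantized bias to fit int32."""
    smallest_ok = abs(bias) / (input_scale * INT32_MAX)
    return max(weight_scale, smallest_ok)


bias, input_scale, weight_scale = 1000.0, 1e-4, 1e-9
print(abs(quantize_bias(bias, input_scale, weight_scale)) > INT32_MAX)  # True
new_scale = adjusted_weight_scale(bias, input_scale, weight_scale)
print(abs(quantize_bias(bias, input_scale, new_scale)) <= INT32_MAX)    # True
```

The trade-off is that a larger weight scale gives the weights coarser resolution, so the adjustment is only applied when the original scale would otherwise clip the bias.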

CPU EP

Refactor SkipLayerNorm implementation to address issues (#22719, #22862) - @amarin16, @liqunfu

QNN EP

Add QNN SDK v2.28.2 support (#22724, #22844) - @HectorSVC, @adrianlizarraga

TensorRT EP

Exclude DDS ops from running on TRT (#22875) - @chilo-ms

Packaging

Rework the native library usage so that a pre-built ORT native package can be easily used (#22345) - @skottmckay
Fix Maven Sha256 Checksum Issue (#22600) - @idiskyle

Contributions

Big thank you to the release manager @yf711, along with @adrianlizarraga, @HectorSVC, @jywu-msft, and everyone else who helped to make this patch release process a smooth one!

1.20.0

Release Manager: @apsonawane

Announcements

All ONNX Runtime Training packages have been deprecated. ORT 1.19.2 was the last release for which onnxruntime-training (PyPI), onnxruntime-training-cpu (PyPI), Microsoft.ML.OnnxRuntime.Training (Nuget), onnxruntime-training-c (CocoaPods), onnxruntime-training-objc (CocoaPods), and onnxruntime-training-android (Maven Central) were published.
ONNX Runtime packages will stop supporting Python 3.8 and Python 3.9. This decision aligns with NumPy Python version support. To continue using ORT with Python 3.8 and Python 3.9, you can use ORT 1.19.2 and earlier.
ONNX Runtime 1.20 CUDA packages will include new dependencies that were not required in 1.19 packages. The following dependencies are new: libcudnn_adv.so.9, libcudnn_cnn.so.9, libcudnn_engines_precompiled.so.9, libcudnn_engines_runtime_compiled.so.9, libcudnn_graph.so.9, libcudnn_heuristic.so.9, libcudnn_ops.so.9, libnvrtc.so.12, and libz.so.1.

Build System & Packages

Python 3.13 support is included in PyPI packages.
ONNX 1.17 support will be delayed until a future release, but the ONNX version used by ONNX Runtime has been patched to include a shape inference change to the Einsum op.
DLLs in the Maven build are now digitally signed (fix for issue reported here).
(Experimental) vcpkg support added for the CPU EP. The DML EP does not yet support vcpkg, and other EPs have not been tested.

Core

MultiLoRA support.
Reduced memory utilization.
- Fixed alignment that was causing mmap to fail for external weights.
- Eliminated double allocations when deserializing external weights.
- Added ability to serialize pre-packed weights so that they don’t cause an increase in memory utilization when the model is loaded.
Support bfloat16 and float8 data types in python I/O binding API.

Performance

INT4 quantized embedding support on CPU and CUDA EPs.
Miscellaneous performance improvements and bug fixes.

EPs

CPU

FP16 support for MatMulNbits, Clip, and LayerNormalization ops.

CUDA

Cudnn frontend integration for convolution operators.
Added support of cuDNN Flash Attention and Lean Attention in MultiHeadAttention op.

TensorRT

TensorRT 10.4 and 10.5 support.

QNN

QNN HTP support for weight sharing across multiple ORT inference sessions. (See ORT QNN EP documentation for more information.)
Support for QNN SDK 2.27.

OpenVINO

Added support up to OpenVINO 2024.4.1.
Compile-time memory optimizations.
Enhancement of ORT EPContext Session option for optimized first inference latency.
Added remote tensors to ensure direct memory access for inferencing on NPU.

DirectML

DirectML 1.15.2 support.

... (truncated)

1.19.2

Announcements

ORT 1.19.2 is a small patch release, fixing some broken workflows and introducing bug fixes.

Build System & Packages

Fixed the signing of native DLLs.
Disabled absl symbolize in Windows Release build to avoid dependency on dbghelp.dll.

Training

Restored support for CUDA compute capability 7.0 and 7.5 with CUDA 12, and 6.0 and 6.1 with CUDA 11.
Several fixes for training CI pipelines.

Mobile

Fixed ArgMaxOpBuilder::AddToModelBuilderImpl() nullptr Node access for CoreML EP.

Generative AI

Added CUDA kernel for Phi3 MoE.
Added smooth softmax support in CUDA and CPU kernels for the GroupQueryAttention operator.
Fixed number of splits calculations in GroupQueryAttention CUDA operator.
Enabled causal support in the MultiHeadAttention CUDA operator.

Contributors

@prathikr, @mszhanyi, @edgchen1, @tianleiwu, @wangyems, @aciddelgado, @mindest, @snnn, @baijumeswani, @MaanavD

Thanks to everyone who helped ship this release smoothly!

Full Changelog: microsoft/onnxruntime@v1.19.0...v1.19.2

1.19.0

Announcements

Note that the wrong commit was initially tagged with v1.19.0. The final commit has since been correctly tagged: microsoft/onnxruntime@26250ae. This shouldn't affect much, but sorry for the inconvenience!

Build System & Packages

Numpy support for 2.x has been added
Qualcomm SDK has been upgraded to 2.25
ONNX has been upgraded from 1.16 → 1.16.1
Default GPU packages use CUDA 12.x and cuDNN 9.x (previously CUDA 11.x/cuDNN 8.x). CUDA 11.x/cuDNN 8.x packages have moved to the aiinfra VS feed.
TensorRT 10.2 support added
Introduced Java CUDA 12 packages on Maven.
Discontinued support for Xamarin. (Xamarin reached EOL on May 1, 2024)
Discontinued support for macOS 11 and increased the minimum supported macOS version to 12. (macOS 11 reached EOL in September 2023)
Discontinued support for iOS 12 and increased the minimum supported iOS version to 13.

Core

Implemented DeformConv
Fixed big-endian and support build on AIX

Performance

Added QDQ support for INT4 quantization in CPU and CUDA Execution Providers
Implemented FlashAttention on CPU to improve performance for GenAI prompt cases
Improved INT4 performance on CPU (X64, ARM64) and NVIDIA GPUs

Execution Providers

TensorRT
- Updated to support TensorRT 10.2
- Removed calls to deprecated APIs
- Enable refittable embedded engine when ONNX model provided as byte stream
CUDA
- Upgraded cutlass to 3.5.0 for performance improvement of memory efficient attention.
- Updated MultiHeadAttention and Attention operators to be thread-safe.
- Added sdpa_kernel provider option to choose kernel for Scaled Dot-Product Attention.
- Expanded op support - Tile (bf16)
CPU
- Expanded op support - GroupQueryAttention, SparseAttention (for Phi-3 small)
QNN
- Updated to support QNN SDK 2.25
- Expanded op support - HardSigmoid, ConvTranspose 3d, Clip (int32 data), MatMul (int4 weights), Conv (int4 weights), PRelu (fp16)
- Expanded fusion support - Conv + Clip/Relu fusion
OpenVINO
- Added support for OpenVINO 2024.3
- Added support for enabling EpContext using session options
DirectML
- Updated DirectML from 1.14.1 → 1.15.1
- Updated ONNX opset from 17 → 20
  ... (truncated)

Commits viewable in compare view.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

---
updated-dependencies:
- dependency-name: Microsoft.ML.OnnxRuntime
  dependency-version: 1.26.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot · 2026-05-17T16:56:12Z

Labels

The following labels could not be found: deps, dotnet. Please create them before Dependabot can add them to a pull request.

Please fix the above issues or remove invalid values from dependabot.yml.

Bump Microsoft.ML.OnnxRuntime from 1.18.1 to 1.26.0

af8d182

dependabot Bot requested a review from NMGorovenko as a code owner May 17, 2026 16:56

dependabot[bot] wants to merge 1 commit into develop from dependabot/nuget/src/ks0223-web-mac/backend/Microsoft.ML.OnnxRuntime-1.26.0
