Update optimizer opset version checks for latest ONNX opset 26 #28966
Pull request overview
This PR expands ONNX Runtime optimizer pattern matching and unit tests to recognize newer ONNX operator schema versions (opset 23–25), aiming to keep attention fusions and reshape fusion behavior compatible with opset 25 models.
Changes:
- Broadened supported operator-version allowlists in optimizer fusions (e.g., Transpose/Reshape/Squeeze/Unsqueeze/Shape) to include newer schema versions up to opset 25.
- Added opset 25 coverage for MobileCLIP attention fusion and GroupQueryAttentionPreNorm fusion unit tests.
- Extended `ReshapeFusionOpsetTest` to iterate additional opsets (19/21/23/24/25).
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| onnxruntime/core/optimizer/attention_fusion.cc | Updates MobileCLIP attention fusion pattern version checks for newer ONNX schemas. |
| onnxruntime/core/optimizer/attention_fusion_helper.h | Extends supported Transpose versions in GPT attention helper logic. |
| onnxruntime/core/optimizer/group_query_attention_pre_norm_fusion.cc | Expands supported Reshape versions in the GQA pre-norm fusion matcher. |
| onnxruntime/core/optimizer/reshape_fusion.cc | Updates Shape/Unsqueeze schema version handling in reshape fusion logic. |
| onnxruntime/test/optimizer/graph_transform_test.cc | Adds opset coverage (incl. 25) for attention and reshape fusion tests. |
| onnxruntime/test/optimizer/group_query_attention_pre_norm_fusion_test.cc | Adds opset 25 test for Qwen GQA pre-norm fusion. |
Comments suppressed due to low confidence (1)
onnxruntime/test/optimizer/graph_transform_test.cc:8241
`ReshapeFusionOpsetTest` now iterates opsets 19/21/23/24/25, but the `shape_test_for_opset15` flag is mutated inside `build_test_case` and then reused across iterations. After the first opset >= 15 run, subsequent iterations build a Shape with start=1, end=2 and also switch to the (pre, pre) checker branch, so the newly added opsets do not actually validate the fusion path this test is meant to cover.
const std::vector<int> opsets{11, 12, 13, 14, 15, 18, 19, 21, 23, 24, 25};
bool shape_test_for_opset15 = false;
for (auto& opset : opsets) {
auto build_test_case = [&](ModelTestBuilder& builder) {
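The flag-leak problem described in this comment can be reproduced in isolation. The following is a minimal sketch, not the ORT test: `Path`, `FirstRunPaths`, and the `reset_flag_per_opset` switch are hypothetical names introduced here, and the two-run structure per opset is an assumption about how the test alternates between the fusion path and the partial-Shape path.

```cpp
#include <vector>

// Hypothetical stand-in for the two branches a test iteration can exercise.
enum class Path { kFusion, kPartialShape };

// Records which path the FIRST run of each opset exercises.
// With reset_flag_per_opset == false the flag is shared across iterations,
// which mirrors the bug; with true it is reset per opset, which mirrors the fix.
std::vector<Path> FirstRunPaths(const std::vector<int>& opsets, bool reset_flag_per_opset) {
  std::vector<Path> first_runs;
  bool partial_shape = false;  // declared outside the loop, captured by reference below
  for (int opset : opsets) {
    if (reset_flag_per_opset) partial_shape = false;
    auto build = [&]() {
      return (opset >= 15 && partial_shape) ? Path::kPartialShape : Path::kFusion;
    };
    first_runs.push_back(build());  // intended to cover the fusion path
    if (opset >= 15) {
      partial_shape = true;  // second run covers the partial-Shape path
      (void)build();
    }
  }
  return first_runs;
}
```

With opsets `{14, 15, 18, 19}` and a shared flag, only 14 and 15 reach the fusion path on their first run; resetting the flag per opset restores fusion coverage for every entry.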
795f312 to
6907dd7
Add newer opset versions (19, 21, 23, 24, 25) to IsSupportedOptypeVersionAndDomain and MatchesOpSinceVersion checks in optimizers where the version bumps are type-constraint widenings only (no semantic changes):
- attention_fusion.cc: Reshape, Transpose, Squeeze
- attention_fusion_helper.h: Transpose
- group_query_attention_pre_norm_fusion.cc: Reshape
- reshape_fusion.cc: Unsqueeze, Shape

Add corresponding tests at opset 25 for attention fusion, GQA pre-norm fusion, and extend ReshapeFusionOpsetTest to cover opsets 19-25.

Fix ReshapeFusionOpsetTest to properly test the fusion path for all opsets including 19+. Previously, a mutable flag caused opsets after 18 to only test the no-fusion (partial Shape) path.
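To illustrate why an allowlist-style version check silently stops matching when a newer schema version appears, here is a small sketch. `MatchesSinceVersion` is a hypothetical helper written for this note; it is not the signature of the ORT functions named above, and the version lists in the usage note are illustrative only.

```cpp
#include <algorithm>
#include <initializer_list>

// Hypothetical allowlist check: a node matches only if the schema version it
// resolved to ("since version") appears in the list the fusion was written against.
bool MatchesSinceVersion(int node_since_version, std::initializer_list<int> allowed) {
  return std::find(allowed.begin(), allowed.end(), node_since_version) != allowed.end();
}
```

For example, `MatchesSinceVersion(25, {1, 13, 21})` is false, so a pattern built on that list would skip the node, while `MatchesSinceVersion(25, {1, 13, 21, 23, 24, 25})` is true. Extending the list is only safe when the newer schema version changes nothing the fusion relies on, which is the condition the commit message states.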
6907dd7 to
6a12338
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
onnxruntime/core/optimizer/reshape_fusion.cc:181
- The Shape(start/end) guard rejects any explicit `end` attribute, even if it is set to the default "no slicing" value (e.g., INT64_MAX). That can unnecessarily block reshape fusion for graphs that redundantly set `end` to the default. Consider treating an `end` attribute with a very large value (i.e., equivalent to full-shape) as acceptable, and only rejecting when start/end imply an actual slice.
// Opset 15+ added start/end attributes to Shape. Reject partial-shape queries.
if (shape.SinceVersion() >= 15) {
const ONNX_NAMESPACE::AttributeProto* start_attr = graph_utils::GetNodeAttribute(shape, "start");
const ONNX_NAMESPACE::AttributeProto* end_attr = graph_utils::GetNodeAttribute(shape, "end");
if (!((!start_attr || static_cast<int>(start_attr->i()) == 0) && (!end_attr))) {
return false;
}
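The relaxation suggested in this comment could look roughly like the sketch below. It is a standalone illustration, not the ORT code: `IsFullShapeQuery` is a hypothetical name, the attributes are modeled as `std::optional<int64_t>` rather than `AttributeProto`, and negative `start`/`end` values are conservatively treated as slicing.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Returns true when the Shape node's start/end attributes select the whole shape.
// An absent attribute means "default"; an explicit end equal to INT64_MAX is
// treated the same as an absent end (assumption taken from the review comment).
bool IsFullShapeQuery(std::optional<int64_t> start, std::optional<int64_t> end) {
  const bool start_is_default = !start.has_value() || *start == 0;
  const bool end_is_default =
      !end.has_value() || *end == std::numeric_limits<int64_t>::max();
  return start_is_default && end_is_default;
}
```

Under this sketch a node with `start=0` and `end=INT64_MAX` would still be eligible for fusion, while `start=1, end=2` would be rejected as a partial-shape query.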
Verdict: Approve, but
…nt-opset regression tests
- Update version lists in attention_fusion.cc, attention_fusion_helper.h, and embed_layer_norm_fusion.cc to include opset versions up to 25/26.
- Add programmatic current-opset regression tests that auto-detect when version lists need updating: Gelu, FastGelu, BiasGelu, LayerNorm, SkipLayerNorm, EmbedLayerNorm (3 formats), MobileClip MHA, GQA PreNorm.
- Tests check for fused node first and report remaining op counts with guidance to update version lists or skip the opset.
- Replace .at(ONNX_DOMAIN) with find + ASSERT_TRUE in GQA test to avoid potential throw on missing domain (Copilot review, high).
- Remove redundant TEST_RETURN_IF_NOT in DivMulFusionCurrentOpsetTest where the condition was already guaranteed by the enclosing if (Copilot review, low).
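The `.at()` to `find` change mentioned above follows a common pattern, sketched here outside of any test framework. `FindOpset` and the `"ai.onnx"` key are hypothetical; the map type only stands in for a domain-to-opset-version map.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical lookup that reports a missing domain as an empty optional.
// Calling .at(domain) instead would throw std::out_of_range on a missing key,
// which in a test surfaces as an uncaught exception rather than a clean failure.
std::optional<int> FindOpset(const std::unordered_map<std::string, int>& domain_to_version,
                             const std::string& domain) {
  const auto it = domain_to_version.find(domain);
  if (it == domain_to_version.end()) return std::nullopt;
  return it->second;
}
```

In a GoogleTest-based test the caller would typically assert that the result has a value before dereferencing it, which gives a readable failure message when the domain is absent.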
…usion with partial-shape queries
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
onnxruntime/core/optimizer/reshape_fusion.cc:181
- The new Shape start/end-attribute guard rejects any node that has an "end" attribute, even when end is the default full-range value. ORT’s Shape kernel treats end==std::numeric_limits<int64_t>::max() as the default (full shape), so this check can incorrectly block ReshapeFusion matching for models/exporters that explicitly set end to INT64_MAX.
// Opset 15+ added start/end attributes to Shape. Reject partial-shape queries.
if (shape.SinceVersion() >= 15) {
const ONNX_NAMESPACE::AttributeProto* start_attr = graph_utils::GetNodeAttribute(shape, "start");
const ONNX_NAMESPACE::AttributeProto* end_attr = graph_utils::GetNodeAttribute(shape, "end");
if (!((!start_attr || static_cast<int>(start_attr->i()) == 0) && (!end_attr))) {
return false;
}
Verdict: Approve
Five new commits since the prior review. The substantive concern (
Prior observations: status
1.
Add WHY comments + tracking issue refs (microsoft#28966, and microsoft#28969 on the WebGPU attention-fusion path) to the ModelOptions{allow_released_opsets_only=false} call sites in the *CurrentOpset fusion tests, so a future reader knows they can be removed once ONNX opset 27 ships. No test logic or ModelOptions args change.

Extend the onnx-opset-bump-checklist skill with three hard-won gotchas from the 1.22.0 integration:
(m) the vcpkg MS-internal asset mirror must be Terrapin-seeded with the new tag tarball or every --use_vcpkg leg 404s;
(n) a FINAL onnx release can still ship a map-max opset > last released opset (1.22.0: 27 > 26), leaving it under-development;
(o) prefer per-model ModelOptions{allow_released_opsets_only=false} over per-leg CI env flips or GTEST_SKIP.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### Integrate ONNX 1.22.0rc1 (opset 27)

Resolves #28752.

Pin: `onnx/onnx@bc3be77bec2f628788796dff60819186bacf49df` (VERSION_NUMBER `1.22.0rc1`). ONNX **1.21.0 → 1.22.0rc1**. Max ai.onnx opset **26 → 27**. IR version **unchanged (13 / `0x0D`)**.

This is the **RC validation phase** of an incremental integration (same strategy as the ONNX 1.21 bump, #27601). The formal `v1.22.0` GitHub release is still a **draft** (no git tag yet), so re-pinning to the released tag is deferred to **Phase 2** (see Follow-ups). Landing the RC now validates ONNX 1.22 against ORT before ONNX publishes the formal release.

---

### Update: ONNX 1.22.0 **FINAL** re-pin + rebase onto `upstream/main` + closes #28969

ONNX published the formal **`v1.22.0`** GitHub release, so this PR is re-pinned **rc2 → FINAL** (`onnx/onnx@v1.22.0`), the Phase-2 step deferred in the rc1 description below. The branch was also **rebased onto `upstream/main`** to pick up the intervening optimizer/opset-26 work. The released tag tarball is a different asset hash than the RCs, so the vcpkg MS-internal asset mirror was re-seeded for the final tag (otherwise `--use_vcpkg` legs 404).

**Also closes #28969** (WebGPU binary-elementwise broadcast `SIZE_MAX` underflow). ONNX 1.22's expanded-Attention reference tests exposed a latent WebGPU bug where a broadcast shape computed `dim - 1` on a zero/unit dimension and underflowed to `SIZE_MAX`; the fix is included here and the previously-skipped reference tests are re-enabled.

**Opset-27 `*CurrentOpset` test handling.** ONNX 1.22.0 FINAL ships `DomainToVersionRange` **map-max 27** while the last *released* opset is **26**, so **opset 27 stays under development** for the whole 1.22 cycle. Strict legs (the default, or `ALLOW_RELEASED_ONNX_OPSET_ONLY=1`) therefore throw *"Opset 27 under development"* at model load on every `*CurrentOpset` fusion test that builds at the max opset.
These tests now load with per-model `ModelOptions{/*allow_released_opsets_only*/ false, /*strict_shape_type_inference*/ false}`, extending the existing `38f17243b` / GatherToSlice precedent to the rest of the `*CurrentOpset` suite. This is **leg-agnostic** (exercises opset 27 on every CI leg, not just the relaxed ones) and **preserves opset coverage** (vs. `GTEST_SKIP`). Each call site is annotated with a one-line WHY + tracking issue (#28966) so the relaxation can be removed once opset 27 is released.

`Resolves #28752` (unchanged). Closes #28969.

### Update: ONNX 1.22.0rc2 re-pin + ConvTranspose conforms to ONNX `output_shape` spec

Since the original rc1 description below, this PR was re-pinned **rc1 → rc2** (`onnx/onnx@b124e0188a`, `VERSION_NUMBER 1.22.0rc2`) to pick up the upstream Xcode/iOS CMake fix (onnx#8056). rc2 also carries onnx#8051, which tightened `convTransposeShapeInference` to reject an `output_shape`/`output_padding` whose size does not match the number of spatial dimensions (per the ONNX spec clarification onnx#5400). **ONNX Runtime now conforms to that spec** instead of patching ONNX to preserve a non-standard form.

**⚠️ Breaking change: ConvTranspose `output_shape` now follows the ONNX spec (spatial dimensions only).** ORT previously also accepted a non-standard `rank + 2` form that included batch and channel, i.e. `(N, C, H, W)`. As of ONNX 1.22, a `rank + 2` `output_shape` on a ConvTranspose whose input has a **statically-known rank** is rejected at `Graph::Resolve` with *"Attribute output_shape has incorrect size"*.

**Migration:** specify `output_shape` with spatial dimensions only, e.g. `{1, 1, 1, 14}` → `{1, 14}` (batch and channel are always inferred from the input and weight, so results are identical; the kernel ignores `N, C`).
Models whose ConvTranspose input has a **dynamic/unknown rank are unaffected**: ONNX skips the size check and ORT computes the same result (covered by the new `ConvTranspose_RankPlus2_OutputShape_DynamicRankInput_Runtime` test).

**Patch inventory (supersedes "2 files, 3 hunks" below).** `cmake/patches/onnx/onnx.patch` (and its byte-identical `binskim.patch` mirror) carries **only** the `ONNX_MINIMAL_BUILD` option hunk and the GroupNormalization-18 `.Deprecate()` removal, with **no ConvTranspose hunks**. rc2's strict shape-inference check is kept as-is; ORT's own test models were conformed to the spec. The upstream archive hash, `deps.txt`, `portfile.cmake`, `vcpkg.json`, and the submodule pin are unchanged.

**Additional rc2 test conform.** rc2 also tightened `convPoolShapeInference` to reject `Conv` inputs with rank < 3 (*"Input tensor must have at least 3 dimensions"*). The hand-authored model in `onnxruntime/test/python/quantization/test_op_split.py` declared a spec-invalid rank-2 `Conv` input/weight; it was conformed to a valid NCHW shape (`[6, 3]` → `[1, 1, 6, 3]`, weight → `[2, 1, 1, 1]`), keeping the quantized-Split graph and expected outputs identical. No ORT source change.

> This note should also seed the GitHub Release notes for the ONNX 1.22 / opset 27 milestone and the squash-commit message.

---

### What changed (29 files)

**Version plumbing**
- `cmake/deps.txt`: onnx archive URL → rc1 commit zip + SHA1 `421e5a9afb6c41a54696e424e5b9a3796aab6821`.
- `cmake/external/onnx`: submodule → `bc3be77b`.
- `cmake/vcpkg-ports/onnx/portfile.cmake`: `REF` commit form + tar.gz SHA512 `e0c526f5…3ce467`.
- `cmake/vcpkg-ports/onnx/vcpkg.json`: `version-semver` `1.22.0`, `port-version` 0.
- `cmake/patches/onnx/onnx.patch` + `cmake/vcpkg-ports/onnx/binskim.patch`: **byte-identical** rebase onto 1.22 (2 files, 3 hunks): kept the `ONNX_MINIMAL_BUILD` option (restructured for 1.22's new `onnx_core` OBJECT-lib / `add_subdirectory(onnx)` layout) and the GroupNormalization-18 `.Deprecate()` removal; **dropped** the `Utils.cmake` protobuf-warnings hunk (already merged upstream in 1.22).

**Opset-27 op enablement (Range)**
- `onnxruntime/core/providers/cpu/generator/range.cc`: split into versioned `[11, 26]` + a new unversioned `27` registration. The opset-27 kernel natively supports the existing common numeric types (float/double/int16/int32/int64). **fp16 Range is covered** via ONNX's Range-27 **function body**, which ORT expands into primitive ops at partition time. **bf16 Range is deferred to that same function expansion**; there is no native bf16 kernel, and its bf16 reference node test (`test_range_bfloat16_type_positive_delta`, base + `_expanded`) is not exercised by the Python/numpy ONNX backend series, whose harness cannot materialize bf16 (`Numpy_type 256`); a native fp16/bf16 kernel + `stash_type` handling is a follow-up (efficiency, not correctness).
- `onnxruntime/core/providers/cpu/cpu_execution_provider.cc`: versioned the Range forward-declare + `BuildKernelCreateInfo` entries and added the opset-27 registration.
- **CUDA Range**: same versioned `[11, 26]` + opset-27 split as CPU (`onnxruntime/core/providers/cuda/generator/range.cc` + `cuda_execution_provider.cc`); GPU-verified locally: `onnx_test_runner -e cuda` 8/8 opset-27 Range node tests pass, native Range-27 placed on CUDAExecutionProvider (fp16/bf16 via function expansion).

**Optimizer / EP opset ceilings**
- `…/transpose_optimization/optimizer_api.h`: `kMaxSupportedOpset` **26 → 27**.
- `coreml`/`nnapi`/`vsinpu`/`webnn` `base_op_builder.h`: `GetMaxSupportedOpSet()` **25 → 27** (upper guard only; per-op support checks still gate; these EPs gain no new kernels here).
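
The versioned `[11, 26]` plus open-ended `27` Range split described above can be pictured with a simplified lookup. This is only a sketch of the idea of version-ranged registrations: `KernelEntry`, `PickKernel`, `kOpenEnded`, and the entry names are hypothetical and do not reflect ORT's actual kernel registry types or matching rules.

```cpp
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Hypothetical registration record: a kernel serves opsets in [since, end].
struct KernelEntry {
  std::string name;
  int since;  // first opset served
  int end;    // last opset served, inclusive
};

// Marks a registration with no upper bound.
constexpr int kOpenEnded = std::numeric_limits<int>::max();

// Returns the name of the first entry whose range contains the opset, if any.
std::optional<std::string> PickKernel(const std::vector<KernelEntry>& entries, int opset) {
  for (const auto& entry : entries) {
    if (opset >= entry.since && opset <= entry.end) return entry.name;
  }
  return std::nullopt;
}
```

With entries `{"Range_11_26", 11, 26}` and `{"Range_27", 27, kOpenEnded}`, opset 26 resolves to the older registration and opset 27 to the new one. Closing the older range is what keeps the two registrations from overlapping.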
**Fusion updates**
- `onnxruntime/core/optimizer/gather_fusion.cc`: GatherToSlice Range version list `{1,11}` → `{1,11,27}`.
- `onnxruntime/core/optimizer/embed_layer_norm_fusion.cc`: add `27` to the two Range path-matchers (`parent_path_3/4`) so embedding fusion still matches opset-27 models.
- `onnxruntime/test/optimizer/graph_transform_test.cc`: new opset-27 GatherToSliceFusion test.

**Requirements (7 bumped)**
- All 7 CI `requirements.txt` → `onnx==1.22.0rc1` (rc1 wheel is on PyPI). The 3 transformers pins remain frozen at `1.18.0` (unrelated to this bump; intentionally untouched).

**Generated docs / test data**
- `js/web/docs/webgl-operators.md`: regenerated.
- `docs/OperatorKernels.md`: **surgical** edit of the CPU EP **and** CUDA EP Range rows (`27+` + `[11, 26]` continuation each); see caveats.
- `onnxruntime/test/testdata/onnx_backend_test_series_filters.jsonc`: **comment-only**, documents why no opset-27 CPU exclusions are needed (all opset-27 node tests pass via function expansion).

**Docs**
- `.agents/skills/onnx-opset-bump-checklist/SKILL.md`: new reusable checklist skill distilled from this integration. Now also documents the "bump **all** execution providers together" tradition (CPU + CUDA + JS/DML assessment in one pass) so future opset bumps don't ship a partial EP set.

---

### Validation (CPU EP + CUDA EP, Linux x64)

- Full build ✅
- `--minimal_build extended` build ✅ (validates the rebased `ONNX_MINIMAL_BUILD` patch hunk independently of the vcpkg mirror path)
- `onnxruntime_test_all` ✅: **1595 passed / 0 failed**
- `onnx_test_runner -e cpu` on the ONNX 1.22 opset-27 node tests ✅: **62/62 pass** via ONNX function-body expansion (run with `ALLOW_RELEASED_ONNX_OPSET_ONLY=0`), including CausalConvWithState, LinearAttention, and fp16/bf16 Range, despite no native kernels for them.
- **CUDA EP (H100):** built `--use_cuda` clean in both **Debug** and **RelWithDebInfo** ✅; `onnx_test_runner -e cuda` on the opset-27 Range node tests ✅: **8/8 pass**, with native Range-27 placed on CUDAExecutionProvider (no CPU fallback) and fp16/bf16 covered via function-body expansion.

---

### Standing caveats (please read before reviewing)

1. **CUDA EP now locally verified for Range; other GPU EPs/ops still CI-only.** The CUDA EP was built and the opset-27 **Range** node tests run locally on an H100 (8/8 pass). DML and the remaining GPU EPs/ops were **not** exercised here. Function-body expansion is EP-agnostic, so other opset-27 models are expected to run on those EPs too, but broader GPU coverage remains a CI/follow-up item.
2. **`OperatorKernels.md` updated surgically** (CPU Range row only). A CPU-only *full* regen would destructively wipe the CUDA/DML/other-EP sections (the generator only emits rows for the EPs in the built module). A correct multi-EP regen needs a build per EP and is a follow-up.
3. **Opset 27 is "under development"** in ONNX's released-versions map. ORT's load-time validation rejects opset-27 models unless `ALLOW_RELEASED_ONNX_OPSET_ONLY=0` (ORT CI already sets this). The opset-27 **schemas are always compiled in from the submodule** regardless; this gate only affects model load-time acceptance, not schema availability.
4. **EP `GetMaxSupportedOpSet` jumped 25 → 27** (skips 26). This is an *upper* guard only; raising it merely lets opset-26/27 nodes reach the per-op support checks that still gate correctness. No regression; it also retroactively un-caps opset-26 for these EPs.
5. **iOS/macOS Xcode framework build is currently broken by an upstream ONNX CMake regression** (the `onnx_core` OBJECT-library split in onnx/onnx#7733 reintroduced the Xcode breakage originally fixed by onnx/onnx#7515 for onnx/onnx#7514). This is **NOT** caused by this opset bump. Tracked upstream at [onnx/onnx#8053](onnx/onnx#8053).
Non-Xcode builds (Linux/Windows/Android/WASM) and all CPU/CUDA validation are unaffected. This resolves at the **Phase 2** formal `v1.22.0` re-pin once ONNX ships the fix.

---

### Follow-ups (explicitly NOT in this PR)

- **GPU/multi-EP coverage:** run opset-27 CUDA/DML node tests; regenerate `OperatorKernels.md` across all EPs.
- **JS EP Range** `[11, 26]` + `27` split (currently registered open-ended at `11`; mirror the CPU/CUDA versioned split).
- **DML Range opset-27 assessment** (DML uses its own `REG_INFO` registration system; assess whether an opset-27 entry is needed).
- **WebGPU EP Range** opset-27 split: `range.cc` registers `Range` `.SinceVersion(11)` open-ended, so it already claims opset-27 Range; only the new bf16 type is unsupported and falls back via the `T` type-constraint (function expansion). Mirror the CPU/CUDA versioned `[11, 26]` + `27` split.
- **Native kernels:** implement CPU (and EP) `CausalConvWithState` and `LinearAttention` kernels, and a native fp16/bf16 + `stash_type` Range-27 kernel (replace today's function-expansion path with efficient kernels).
- **Phase 2 (formal `v1.22.0` re-pin):** re-pin `deps.txt`/submodule/portfile/requirements to the released tag once ONNX publishes it (currently blocked on ONNX tagging the release); upload the tag tarball to the vcpkg mirror. **This also restores the iOS/macOS Xcode framework build** once the upstream onnx OBJECT-library Xcode regression (caveat 5) is resolved and re-pinned.
- **Tooling:** fix the pre-existing crash in `find_optimizer_opset_version_updates_required.py` (placeholder `ver` parsed as int) so it can be relied on for future bumps.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
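
The ConvTranspose `output_shape` migration described in the rc2 update above amounts to dropping the leading batch and channel entries. A minimal sketch of that conversion follows; `ToSpatialOutputShape` is a hypothetical helper for model authors, not an ORT or ONNX API, and it assumes the caller knows the number of spatial dimensions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical migration helper: if output_shape is in the non-standard
// (N, C, spatial...) form, return only the spatial dimensions; a shape that
// is already spatial-only is returned unchanged.
std::vector<int64_t> ToSpatialOutputShape(const std::vector<int64_t>& output_shape,
                                          std::size_t num_spatial_dims) {
  if (output_shape.size() == num_spatial_dims + 2) {
    return std::vector<int64_t>(output_shape.begin() + 2, output_shape.end());
  }
  return output_shape;
}
```

For the example given in the description, a 2-D ConvTranspose with `output_shape` `{1, 1, 1, 14}` becomes `{1, 14}`.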
This pull request expands support for additional ONNX opset versions in the attention fusion optimization code, making the optimizer compatible with newer ONNX models. The changes primarily update the accepted opset versions for operators such as `Transpose`, `Reshape`, `Squeeze`, `Unsqueeze`, `Shape`, and others across multiple functions, which broadens model compatibility and makes the fusion logic more robust.

Expanded opset version support for attention fusion:
- Additional opset versions are accepted for operators (`Transpose`, `Reshape`, `Squeeze`, `Unsqueeze`, `Shape`, `Add`, `Mul`, `Sub`, `Div`, `Cast`, etc.) in the main attention fusion logic (`attention_fusion.cc`), allowing matching and fusion of newer ONNX models using these operators at opsets up to 25.

Helper and mask subgraph matching improvements:
These changes collectively future-proof the attention fusion optimizer for a wider range of ONNX models and operator versions, reducing the likelihood of unsupported patterns during optimization.