Update optimizer opset version checks for latest ONNX opset 26 by yuslepukhin · Pull Request #28966 · microsoft/onnxruntime

yuslepukhin · 2026-06-09T22:09:08Z

This pull request expands support for additional ONNX opset versions in the attention fusion optimization code, making the optimizer compatible with newer and more diverse ONNX models. The changes primarily update the accepted opset versions for various operators such as Transpose, Reshape, Squeeze, Unsqueeze, Shape, and others across multiple functions. This ensures broader model compatibility and improves the robustness of the fusion logic.

Expanded opset version support for attention fusion:

Updated accepted opset versions for key operators (Transpose, Reshape, Squeeze, Unsqueeze, Shape, Add, Mul, Sub, Div, Cast, etc.) in the main attention fusion logic (attention_fusion.cc), allowing matching and fusion of newer ONNX models using these operators at opsets up to 25.

Helper and mask subgraph matching improvements:

Broadened opset version checks for subgraph matching in helper functions, including those for Gemm subgraphs, unidirectional mask subgraphs, input mask subgraphs, and past subgraph matching, to support additional opset versions and operator variants.

These changes collectively future-proof the attention fusion optimizer for a wider range of ONNX models and operator versions, reducing the likelihood of unsupported patterns during optimization.

Copilot

Pull request overview

This PR expands ONNX Runtime optimizer pattern matching and unit tests to recognize newer ONNX operator schema versions (opset 23–25), aiming to keep attention fusions and reshape fusion behavior compatible with opset 25 models.

Changes:

Broadened supported operator-version allowlists in optimizer fusions (e.g., Transpose/Reshape/Squeeze/Unsqueeze/Shape) to include newer schema versions up to opset 25.
Added opset 25 coverage for MobileCLIP attention fusion and GroupQueryAttentionPreNorm fusion unit tests.
Extended ReshapeFusionOpsetTest to iterate additional opsets (19/21/23/24/25).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.


| File | Description |
| --- | --- |
| onnxruntime/core/optimizer/attention_fusion.cc | Updates MobileCLIP attention fusion pattern version checks for newer ONNX schemas. |
| onnxruntime/core/optimizer/attention_fusion_helper.h | Extends supported Transpose versions in GPT attention helper logic. |
| onnxruntime/core/optimizer/group_query_attention_pre_norm_fusion.cc | Expands supported Reshape versions in the GQA pre-norm fusion matcher. |
| onnxruntime/core/optimizer/reshape_fusion.cc | Updates Shape/Unsqueeze schema version handling in reshape fusion logic. |
| onnxruntime/test/optimizer/graph_transform_test.cc | Adds opset coverage (incl. 25) for attention and reshape fusion tests. |
| onnxruntime/test/optimizer/group_query_attention_pre_norm_fusion_test.cc | Adds opset 25 test for Qwen GQA pre-norm fusion. |

Comments suppressed due to low confidence (1)

onnxruntime/test/optimizer/graph_transform_test.cc:8241

ReshapeFusionOpsetTest now iterates opsets 19/21/23/24/25, but the shape_test_for_opset15 flag is mutated inside build_test_case and then reused across iterations. After the first opset>=15 run, subsequent iterations build a Shape with start=1,end=2 and also switch to the (pre,pre) checker branch, so the newly added opsets are not actually validating the fusion path this test is meant to cover.

  const std::vector<int> opsets{11, 12, 13, 14, 15, 18, 19, 21, 23, 24, 25};
  bool shape_test_for_opset15 = false;

  for (auto& opset : opsets) {
    auto build_test_case = [&](ModelTestBuilder& builder) {
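The failure mode described in this comment can be reproduced in isolation. The sketch below is a simplified stand-in rather than the actual test: `Coverage`, `RunWithSharedFlag`, and `RunWithExplicitCases` are hypothetical names, and the assumption that the flag flips during the first opset >= 15 run is inferred from the comment text.

```cpp
#include <vector>

// Counts how often each test path was exercised across the opset loop.
struct Coverage {
  int fusion_path_runs = 0;
  int partial_shape_runs = 0;
};

// Simplified model of the reported bug: the flag lives outside the loop, is
// captured by reference, and is mutated inside the lambda.
Coverage RunWithSharedFlag(const std::vector<int>& opsets) {
  Coverage coverage;
  bool shape_test_for_opset15 = false;  // declared once, shared by all iterations
  for (int opset : opsets) {
    auto build_test_case = [&]() {
      if (shape_test_for_opset15) {
        ++coverage.partial_shape_runs;
      } else {
        ++coverage.fusion_path_runs;
      }
      if (opset >= 15) {
        shape_test_for_opset15 = true;  // leaks into every later iteration
      }
    };
    build_test_case();
  }
  return coverage;
}

// One possible repair: pass the case in explicitly so iterations are independent.
// Every opset runs the fusion case; opsets >= 15 also run the partial-Shape case.
Coverage RunWithExplicitCases(const std::vector<int>& opsets) {
  Coverage coverage;
  for (int opset : opsets) {
    auto build_test_case = [&coverage](bool partial_shape) {
      if (partial_shape) {
        ++coverage.partial_shape_runs;
      } else {
        ++coverage.fusion_path_runs;
      }
    };
    build_test_case(/*partial_shape=*/false);
    if (opset >= 15) {
      build_test_case(/*partial_shape=*/true);
    }
  }
  return coverage;
}
```

In this model the shared flag stops exercising the fusion path as soon as one opset >= 15 has run, while the explicit-argument version covers the fusion path for every opset in the list.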


Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Add newer opset versions (19, 21, 23, 24, 25) to IsSupportedOptypeVersionAndDomain and MatchesOpSinceVersion checks in optimizers where the version bumps are type-constraint widenings only (no semantic changes):

- attention_fusion.cc: Reshape, Transpose, Squeeze
- attention_fusion_helper.h: Transpose
- group_query_attention_pre_norm_fusion.cc: Reshape
- reshape_fusion.cc: Unsqueeze, Shape

Add corresponding tests at opset 25 for attention fusion, GQA pre-norm fusion, and extend ReshapeFusionOpsetTest to cover opsets 19-25.

Fix ReshapeFusionOpsetTest to properly test the fusion path for all opsets including 19+. Previously, a mutable flag caused opsets after 18 to only test the no-fusion (partial Shape) path.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

onnxruntime/core/optimizer/reshape_fusion.cc:181

The Shape(start/end) guard rejects any explicit end attribute, even if it is set to the default "no slicing" value (e.g., INT64_MAX). That can unnecessarily block reshape-fusion for graphs that redundantly set end to the default. Consider treating an end attribute with a very large value (i.e., equivalent to full-shape) as acceptable, and only rejecting when start/end imply an actual slice.

    // Opset 15+ added start/end attributes to Shape. Reject partial-shape queries.
    if (shape.SinceVersion() >= 15) {
      const ONNX_NAMESPACE::AttributeProto* start_attr = graph_utils::GetNodeAttribute(shape, "start");
      const ONNX_NAMESPACE::AttributeProto* end_attr = graph_utils::GetNodeAttribute(shape, "end");
      if (!((!start_attr || static_cast<int>(start_attr->i()) == 0) && (!end_attr))) {
        return false;
      }

hariharans29 · 2026-06-10T18:45:44Z

Verdict: Approve, but `attention_fusion_helper.h` has an internal inconsistency worth fixing before merge

The mechanical opset-list expansions are fine, and two of the changes are actually meaningful correctness fixes hiding inside the "version bump" framing. One change in attention_fusion_helper.h is half-done in a way that defeats its own purpose — Copilot's second comment is accurate and worth acting on.

The two real fixes hiding in here

1. `Shape` SinceVersion check in `reshape_fusion.cc` — correctness fix

Pre-PR:

if (graph_utils::MatchesOpSinceVersion(shape, {15})) {
  // check start/end attributes that, if set, would block fusion
  ...
}

MatchesOpSinceVersion(shape, {15}) returns true only when shape.SinceVersion() == 15. For a model at opset 19 or 21 (where Shape has SinceVersion 19/21 respectively), this returned false, skipped the start/end attribute check, and would fuse a partial-Shape pattern that should not be fused. This is a latent correctness bug being silently fixed.

Post-PR:

if (shape.SinceVersion() >= 15) { ... }

is the right shape — once the schema added the attribute, every later version inherits it. This deserves a callout in the PR description because the framing "extend opset version checks" undersells it; "fix partial-Shape detection on opset ≥ 19" would be more accurate.
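The difference between the two gates can be shown with plain integers. This is a minimal sketch: `MatchesExactSinceVersion` is a hypothetical stand-in for the exact-membership behavior described above, not the real `graph_utils` function, and the two `ChecksStartEnd*` helpers only model which gate is taken.

```cpp
#include <algorithm>
#include <initializer_list>

// Hypothetical stand-in for an exact-match allowlist check: true only when
// since_version is literally one of the listed versions.
bool MatchesExactSinceVersion(int since_version, std::initializer_list<int> allowed) {
  return std::find(allowed.begin(), allowed.end(), since_version) != allowed.end();
}

// Pre-PR gate as described in the review: the start/end attribute check only
// runs when Shape's SinceVersion is exactly 15.
bool ChecksStartEndPrePr(int shape_since_version) {
  return MatchesExactSinceVersion(shape_since_version, {15});
}

// Post-PR gate: every schema version from 15 onward carries start/end.
bool ChecksStartEndPostPr(int shape_since_version) {
  return shape_since_version >= 15;
}
```

With a Shape node whose SinceVersion is 19, the pre-PR gate returns false and the attribute check is skipped, which is the latent bug; the threshold form keeps checking for any later version.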

The test side does pick this up — ReshapeFusionOpsetTest was previously contorted into a one-shot state machine (shape_test_for_opset15) so that the partial-shape path ran exactly once across all opsets, and the new test now runs the partial-shape negative case for every opset ≥ 15. That's the right refactor.

2. `Unsqueeze` axes-from-input in `reshape_fusion.cc` — same class of fix

Pre-PR:

} else if (graph_utils::MatchesOpSinceVersion(unsqueeze, {13})) {
  const NodeArg* axes_node_arg = unsqueeze.InputDefs()[1];
  ...
}

Same issue: only matched SinceVersion() == 13. For opset 21+ where Unsqueeze has SinceVersion 21 (or whichever version it was bumped to), this returned false → reshape fusion failed silently on those models. The new structural check InputDefs().size() > 1 fixes it.

Copilot suggested gating on SinceVersion() >= 13 instead. Two reasonable opinions:

Pro SinceVersion() >= 13: schema-aligned, matches the comment, defends against a hypothetical malformed Unsqueeze that has the wrong arity for its version.
Pro InputDefs().size() > 1: future-proof against any later opset bump (no need to revisit), and the "malformed Unsqueeze with wrong arity" case would fail schema validation before reaching this code anyway.

I'd take the structural check as you have it. The defensiveness Copilot is asking for is downstream of schema validation, and you'd otherwise need to update this site on every Unsqueeze bump going forward. Author's call — not blocking either way.
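To make the structural option concrete, here is a small sketch on a toy node type. `ToyUnsqueeze`, `AxesSource`, and `LocateAxes` are hypothetical names for illustration, and the ordering (attribute first, then second input) is an assumption, not a transcription of reshape_fusion.cc.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Toy stand-in for an Unsqueeze node (hypothetical type, not an ORT class).
struct ToyUnsqueeze {
  std::vector<std::string> inputs;                 // input names: data, optionally axes
  std::optional<std::vector<int64_t>> axes_attr;   // present only in attribute form
};

enum class AxesSource { kAttribute, kInput, kMissing };

// Structural lookup: no opset list to maintain. The axes come from the
// attribute if it exists, otherwise from the second input if there is one.
AxesSource LocateAxes(const ToyUnsqueeze& node) {
  if (node.axes_attr.has_value()) {
    return AxesSource::kAttribute;
  }
  if (node.inputs.size() > 1) {
    return AxesSource::kInput;
  }
  return AxesSource::kMissing;
}
```

The trade-off matches the two opinions above: this form never needs revisiting on an opset bump, but it relies on schema validation having already rejected nodes whose arity does not match their version.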

The one thing worth addressing before merge

`FuseGptAttention` in `attention_fusion_helper.h` is internally inconsistent

The PR updates one line in this function:

// line 1450
if (graph_utils::IsSupportedOptypeVersionAndDomain(*k_concat, "Transpose",
                                                   {1, 13, 21, 23, 24, 25}, kOnnxDomain)) {
  transpose_optimized_pattern = true;
  ...
}

But everything downstream of that gate is still locked to the old opsets:

// ~line 1468
if (!graph_utils::IsSupportedOptypeVersionAndDomain(*k_concat, "Concat",
                                                    {4, 11, 13}, kOnnxDomain)) {
  return false;
}

// ~line 1474
std::vector<graph_utils::EdgeEndToMatch> k_path{
    {0, 1, "Transpose", {1, 13},     kOnnxDomain},
    {0, 0, "Reshape",   {5, 13},     kOnnxDomain},
    {1, 0, "Split",     {2, 11, 13}, kOnnxDomain}};

Consequence: on an opset 23/24/25 GPT model the precheck succeeds, then FindPath fails because the Transpose has SinceVersion 21 (or whichever) and isn't in {1, 13}. Net result of the one-line change in this function: nothing. Either:

(a) Update the q/k/v path matchers, Reshape {5, 13} → {5, 13, 14, 19, 21, 23}, Split {2, 11, 13} → {2, 11, 13, 18, ...}, Concat {4, 11, 13} → matching set, and the inner Transpose {1, 13} → {1, 13, 21, 23, 24, 25} — consistent with the stated PR intent. Plus an opset-25 test for the GPT path the same way you did for MobileCLIP and the GQA pre-norm fusion.
(b) Or revert the line 1450 change and explicitly scope the PR to "opset 25 for MobileCLIP / GQA pre-norm / reshape fusion" only, since FuseGptAttention won't actually work end-to-end at opset 25 without the rest.

Either is fine. The current state is the one option that doesn't make sense.

Copilot's second comment captured this. Acting on it would close the gap.

Pattern-level observations on the version list extensions

Adding 24 and 25 to every list

Reshape {5, 13, 14, 19, 21, 23, 24, 25}, Transpose {1, 13, 21, 23, 24, 25}, etc.

IsSupportedOptypeVersionAndDomain checks SinceVersion(), which is the opset version at which the operator's schema last changed (as resolved for the model's opset), not the model's opset itself. So the only entries that have any effect are the versions where the operator was actually bumped. If Reshape was last bumped at v23, then {24, 25} in its list are no-ops (a model declared at opset 25 will still report Reshape.SinceVersion() == 23).

This is harmless future-proofing, and consistent with how similar extensions have been done in the repo before — I'd just flag in the PR description what was actually bumped at 24/25 vs. what was added defensively. Helps the next person doing the same exercise understand which entries are load-bearing.
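The "load-bearing vs. defensive" distinction can be checked mechanically. In the sketch below, `ResolveSinceVersion` and `IsLoadBearingEntry` are hypothetical helpers, and the schema version list passed in is illustrative data following the "if Reshape was last bumped at v23" hypothetical above; it is not read from the ONNX registry.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Given the ascending list of versions at which an operator's schema changed,
// return the version a node resolves to for a model declared at model_opset.
// Returns -1 if the operator did not exist yet at that opset.
int ResolveSinceVersion(const std::vector<int>& schema_versions, int model_opset) {
  auto it = std::upper_bound(schema_versions.begin(), schema_versions.end(), model_opset);
  if (it == schema_versions.begin()) {
    return -1;
  }
  return *std::prev(it);
}

// An allowlist entry can only ever match if it is an actual schema version.
bool IsLoadBearingEntry(const std::vector<int>& schema_versions, int allowlist_entry) {
  return std::binary_search(schema_versions.begin(), schema_versions.end(), allowlist_entry);
}
```

Under that hypothetical list, a model at opset 25 resolves to 23, so an allowlist entry of 25 never matches anything and is purely defensive.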

Hot-spot to harmonize as follow-up (out of scope here)

This file (attention_fusion.cc) has the same set of operator version lists repeated 9 times for Reshape and 6 times for Transpose. Every opset bump now triggers a search-and-replace across the file, with the FuseGptAttention mistake above being a natural consequence. A small refactor into named constants (kReshapeOpsetVersions, kTransposeOpsetVersions) would have made this PR a 4-line change and made the next one trivial. Not for this PR — file as a cleanup.
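A rough sketch of what that cleanup could look like follows. The constant names come from the suggestion above and the values are the lists quoted in this review; `IsSupportedVersion` and the two `*AcceptsTranspose` functions are hypothetical, and the real ORT helpers take a different parameter type, so this shows the idea rather than a drop-in change.

```cpp
#include <array>
#include <cstddef>

// One definition per operator, shared by every call site in the file.
inline constexpr std::array<int, 8> kReshapeOpsetVersions{5, 13, 14, 19, 21, 23, 24, 25};
inline constexpr std::array<int, 6> kTransposeOpsetVersions{1, 13, 21, 23, 24, 25};

// Hypothetical membership helper over a named version list.
template <std::size_t N>
constexpr bool IsSupportedVersion(const std::array<int, N>& versions, int since_version) {
  for (int v : versions) {
    if (v == since_version) {
      return true;
    }
  }
  return false;
}

// Both the precheck and the path matcher consult the same constant, so they
// cannot drift apart the way the FuseGptAttention lists did.
constexpr bool PrecheckAcceptsTranspose(int since_version) {
  return IsSupportedVersion(kTransposeOpsetVersions, since_version);
}
constexpr bool PathMatcherAcceptsTranspose(int since_version) {
  return IsSupportedVersion(kTransposeOpsetVersions, since_version);
}
```

With this layout an opset bump touches one line per operator instead of every repeated list.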

Tests look right

AttentionFusionMobileClipMhaOpset25Test is the parallel form of the existing AttentionFusionMobileClipMhaTest (opset 14 → 25), reusing the same helper and checker. Minimal and correct.
GroupQueryAttentionPreNormFusionFusesQwenPatternOpset25 likewise parallels the existing test.
ReshapeFusionOpsetTest refactor (drop the state machine, run the positive case always and the partial-Shape negative case for every opset ≥ 15) is cleaner and increases coverage. Good.

One small note: ReshapeFusionOpsetTest now iterates {11, 12, 13, 14, 15, 18, 19, 21, 23, 24, 25} but skips 16, 17, 20, 22. If those gaps are intentional (Reshape unchanged at those opsets so they collapse to the previous SinceVersion), fine. If not, adding them is one character each. Minor.

Bottom line

The two reshape_fusion.cc corrections are nice quiet wins. The mechanical version-list extensions in attention_fusion.cc are fine. The attention_fusion_helper.h change is currently a no-op due to the unchanged downstream matchers — either complete it (preferred, with a matching test) or drop it. Recommend addressing that one point and then this is good to land.

…nt-opset regression tests

- Update version lists in attention_fusion.cc, attention_fusion_helper.h, and embed_layer_norm_fusion.cc to include opset versions up to 25/26.
- Add programmatic current-opset regression tests that auto-detect when version lists need updating: Gelu, FastGelu, BiasGelu, LayerNorm, SkipLayerNorm, EmbedLayerNorm (3 formats), MobileClip MHA, GQA PreNorm.
- Tests check for fused node first and report remaining op counts with guidance to update version lists or skip the opset.

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

- Replace .at(ONNX_DOMAIN) with find + ASSERT_TRUE in GQA test to avoid potential throw on missing domain (Copilot review, high).
- Remove redundant TEST_RETURN_IF_NOT in DivMulFusionCurrentOpsetTest where the condition was already guaranteed by the enclosing if (Copilot review, low).

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

…usion with partial-shape queries

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

onnxruntime/core/optimizer/reshape_fusion.cc:181

The new Shape start/end-attribute guard rejects any node that has an "end" attribute, even when end is the default full-range value. ORT’s Shape kernel treats end==std::numeric_limits<int64_t>::max() as the default (full shape), so this check can incorrectly block ReshapeFusion matching for models/exporters that explicitly set end to INT64_MAX.

    // Opset 15+ added start/end attributes to Shape. Reject partial-shape queries.
    if (shape.SinceVersion() >= 15) {
      const ONNX_NAMESPACE::AttributeProto* start_attr = graph_utils::GetNodeAttribute(shape, "start");
      const ONNX_NAMESPACE::AttributeProto* end_attr = graph_utils::GetNodeAttribute(shape, "end");
      if (!((!start_attr || static_cast<int>(start_attr->i()) == 0) && (!end_attr))) {
        return false;
      }

hariharans29 · 2026-06-11T21:09:43Z

Verdict: Approve

Five new commits since the prior review. The substantive concern (FuseGptAttention internal inconsistency) is fixed, and the PR has grown two additional correctness fixes and a genuinely useful piece of test infrastructure. Ready to land.

Prior observations — status

1. `FuseGptAttention` precheck-vs-matcher gap — fixed

All three path matchers in attention_fusion_helper.h::FuseGptAttention are now extended to match the precheck:

// path1 (line ~1357)
{0, 0, "Reshape",   {5, 13, 14, 19, 21, 23, 24, 25}, kOnnxDomain},
{0, 0, "Transpose", {1, 13, 21, 23, 24, 25},         kOnnxDomain},
{0, 0, "MatMul",    {1, 9, 13},                       kOnnxDomain}};

// path2 (line ~1379) and q_path (line ~1431)
{2, 0, "Split", {2, 11, 13, 18}, kOnnxDomain}

// k_path (line ~1489)
{0, 1, "Transpose", {1, 13, 21, 23, 24, 25},         kOnnxDomain},
{0, 0, "Reshape",   {5, 13, 14, 19, 21, 23, 24, 25}, kOnnxDomain},
{1, 0, "Split",     {2, 11, 13, 18},                  kOnnxDomain}};

End-to-end coherent now. Bonus: Split {2, 11, 13, 18} also picks up the Split-18 bump (the num_outputs attribute change) that I flagged as a possible gap in the prior review.

2. `GetAxesFromUnsqueezeNode` — gating choice

reshape_fusion.cc still uses InputDefs().size() > 1. Fine — that was author's call and both forms are defensible. Not blocking.

3. Shape `(!end_attr)` guard — fixed and tightened

The original guard at reshape_fusion.cc:175 was:

if (!((!start_attr || static_cast<int>(start_attr->i()) == 0) && (!end_attr))) { ... }

which rejected any model that explicitly set end = INT64_MAX (the documented default) — a false negative for legitimate full-shape calls. The new form accepts both forms equivalently:

if (!((!start_attr || start_attr->i() == 0) &&
      (!end_attr   || end_attr->i()   == std::numeric_limits<int64_t>::max()))) { ... }

Correct. The commit message ("Fix Shape end-attr guard to accept INT64_MAX as full-shape default") is accurate.
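The accepted and rejected combinations are easy to enumerate once the attributes are modeled as optionals. This is a minimal sketch of the predicate described above; `IsFullShapeQuery` is a hypothetical name, and `std::nullopt` stands for "attribute not present on the node".

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Returns true when a Shape node's start/end attributes describe the whole
// shape: start is absent or 0, and end is absent or INT64_MAX.
bool IsFullShapeQuery(std::optional<int64_t> start, std::optional<int64_t> end) {
  const bool start_is_default = !start.has_value() || *start == 0;
  const bool end_is_default =
      !end.has_value() || *end == std::numeric_limits<int64_t>::max();
  return start_is_default && end_is_default;
}
```

A node with `start=1, end=2` is rejected, while a node that redundantly spells out `end = INT64_MAX` is accepted, which is the false negative the tightened guard removes.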

New material since prior review

A. Two more partial-Shape correctness guards

Same fix pattern (SinceVersion() >= 15 + start/end attribute check + INT64_MAX-default handling) added in two more places:

attention_fusion_helper.h::MatchGemmSubgraph (line ~101) — guards the GPT subgraph where Shape → Slice → Squeeze → Unsqueeze → Concat → Reshape → Gemm assumed full-shape semantics.
embed_layer_norm_fusion.cc::MatchInputToConcatSubgraph (lines ~143, ~188) — guards the two Shape → Gather paths feeding embed-layer-norm position-id construction.

Each carries a clear "why this guard exists" comment. These were latent correctness bugs on any opset-15+ model that used start/end to emit a partial shape. Quietly worth more than the headline opset bump.

B. Forward-proof test infrastructure — `GetCurrentOnnxOpset()`

static int GetCurrentOnnxOpset() {
  const auto& map = ONNX_NAMESPACE::OpSchemaRegistry::DomainToVersionRange::Instance().Map();
  auto it = map.find(ONNX_NAMESPACE::ONNX_DOMAIN);
  EXPECT_TRUE(it != map.end()) << "ONNX domain not found in OpSchemaRegistry";
  return it != map.end() ? it->second.second : 0;
}

Used by:

LayerNormFusionCurrentOpsetTest, SkipLayerNormFusionCurrentOpsetTest
EmbedLayerNormFusionFormat{1,2,3}CurrentOpset
GroupQueryAttentionPreNormFusionFusesQwenPatternCurrentOpset

This is the right approach. The next time someone bumps the bundled ONNX submodule, these tests will start failing on opset drift and tell you exactly which optimizer's version lists need updating. The failure messages even point at the offending file ("Update version lists in onnxruntime/core/optimizer/embed_layer_norm_fusion.cc"). Strictly better than hard-coded opset numbers.
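The lookup itself is a find-then-read on a domain-to-range map. The sketch below models it with standard containers only; `DomainVersionMap` and `MaxOpsetForDomain` are hypothetical names, and the map contents used with it are toy data rather than the real registry.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Toy model of a domain -> (min opset, max opset) table.
using DomainVersionMap = std::unordered_map<std::string, std::pair<int, int>>;

// Uses find instead of at, so a missing domain is reported to the caller
// rather than thrown from inside the helper.
std::optional<int> MaxOpsetForDomain(const DomainVersionMap& registry,
                                     const std::string& domain) {
  const auto it = registry.find(domain);
  if (it == registry.end()) {
    return std::nullopt;
  }
  return it->second.second;  // second element of the range is the max opset
}
```

A test can then build its model at whatever value this returns, so the opset under test follows the bundled ONNX version instead of a hard-coded number.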

C. `LoadModelAtCurrentOpset` model rewriting

Loads an existing .onnx test model and rewrites the proto in place:

Bumps the ai.onnx opset_import to current opset.
Converts attribute-form axes to input-form for Squeeze/Unsqueeze (opset 13+) and ReduceSum/ReduceMean/ReduceMax/ReduceMin/ReduceProd (opset 18+).

This works for the three embed_layer_norm_format*.onnx fixtures because they only use those operators with attribute-form axes. A few notes for future readers:

The conversion list is exhaustive for the fixtures it's used against. If anyone reuses LoadModelAtCurrentOpset for a model that also has, say, attribute-form axes on a Slice (was input from opset 10) or any other op that migrated attributes to inputs, it will silently fail to convert and produce an invalid graph at the bumped opset. A one-line comment on the helper noting "currently handles only Squeeze/Unsqueeze/Reduce*; extend as needed" would help.
Bumping opset_import does not re-validate the existing nodes against the new schemas. For operators that were bumped without input changes (e.g., Add at v14), the node-level SinceVersion() reported by the loader resolves to the latest schema version at or below the imported opset. CI confirms the rewritten models load and the tests pass (86/86 checks OK on 992c85e), so this is empirically fine.

Not blocking — useful infrastructure, just worth a comment.
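The attribute-to-input rewrite and its blind spot can be sketched on a toy graph representation. Everything here is hypothetical (`ToyNode`, `ToyInitializers`, `ConvertAxesAttributeToInput`, and the generated initializer name); the per-operator thresholds are passed in by the caller rather than hard-coded, and the sketch assumes the axes input is simply appended after the existing inputs.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy node: just enough structure to show the rewrite (not an ONNX proto).
struct ToyNode {
  std::string name;
  std::string op_type;
  std::vector<std::string> inputs;
  std::map<std::string, std::vector<int64_t>> int_list_attrs;
};

using ToyInitializers = std::map<std::string, std::vector<int64_t>>;

// Moves an "axes" attribute into a new initializer-backed input when the op is
// in the conversion table and the target opset has reached the input form.
// Returns false (and leaves the node untouched) for any op not in the table.
bool ConvertAxesAttributeToInput(ToyNode& node, int target_opset,
                                 const std::map<std::string, int>& input_form_since,
                                 ToyInitializers& initializers) {
  const auto since = input_form_since.find(node.op_type);
  if (since == input_form_since.end() || target_opset < since->second) {
    return false;
  }
  const auto attr = node.int_list_attrs.find("axes");
  if (attr == node.int_list_attrs.end()) {
    return false;
  }
  const std::string initializer_name = node.name + "_axes";
  initializers[initializer_name] = attr->second;
  node.inputs.push_back(initializer_name);
  node.int_list_attrs.erase(attr);
  return true;
}
```

An operator missing from the table keeps its attribute-form axes after the opset bump, which is the silent invalid-graph case the first note above warns about.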

D. Title vs scope

The PR title says "opset 25" but the version lists include {24, 25} and the third commit message references "opset 26". The actual entries are fine (defensive future-proofing past the last-bumped versions). Worth updating the PR title to match if 26 is the real target.

Minor observations on the new code

The SinceVersion() >= 15 + start/end attribute check is now duplicated five times (reshape_fusion.cc, attention_fusion_helper.h × 1, embed_layer_norm_fusion.cc × 2). A small helper —
```
// Returns true if shape_node represents the full tensor shape (no start/end slicing).
static bool IsFullShape(const Node& shape_node);
```
— would dedupe nicely. Out of scope here; file as a follow-up alongside the attention_fusion.cc named-constants refactor.
LayerNormFusionCurrentOpsetTest builds the LayerNorm pattern with Pow. Pow was last bumped at opset 15 (pow.SinceVersion() == 15 for any model at opset ≥ 15). Fine.
LayerNormFusionCurrentOpsetTest runs at TransformerLevel::Level1, but the standard LayerNormFusion registration is at Level2 with fuse_in_level_1=true in some pipelines. The test explicitly constructs LayerNormFusion(no_limit_empty_ep_list, TransformerLevel::Level1) to force the Level1 path, which is the right call for a unit test. Just noting it's intentional, not a bug.
The post-graph checker pattern in the new tests:
```
if (op_to_count["LayerNormalization"] == 1) { ...checks...; return Status::OK(); }
return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "...failed at opset ", current_opset, "...");
```
produces a much more useful failure message than TEST_RETURN_IF_NOT alone. Nice pattern — worth adopting for other current-opset tests in the codebase.
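A stripped-down version of that checker shape, without ORT's Status type, might look like the sketch below. `CheckLayerNormFused` is a hypothetical function that returns an empty string on success and a diagnostic otherwise; the message wording is illustrative, not the text used in the PR.

```cpp
#include <map>
#include <sstream>
#include <string>

// Checks for the fused node first; on failure, reports the opset and the
// remaining op counts so the reader can tell which pattern failed to match.
std::string CheckLayerNormFused(const std::map<std::string, int>& op_to_count,
                                int current_opset) {
  const auto fused = op_to_count.find("LayerNormalization");
  if (fused != op_to_count.end() && fused->second == 1) {
    return {};
  }
  std::ostringstream message;
  message << "LayerNorm fusion failed at opset " << current_opset
          << "; remaining ops:";
  for (const auto& [op_type, count] : op_to_count) {
    message << ' ' << op_type << '=' << count;
  }
  return message.str();
}
```

The point of the pattern is that a failure names the opset and the leftover operators, which a bare boolean assertion does not.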

Bottom line

The substantive concern from the prior review is fixed, the partial-Shape correctness fix has been generalized to two more sites, and the new GetCurrentOnnxOpset()-based tests are a genuinely useful piece of regression infrastructure that will catch the next opset drift automatically. Approve.

The follow-up cleanups (dedupe the five copies of the partial-Shape guard into an IsFullShape helper; pull attention_fusion.cc opset lists into named constants; broaden LoadModelAtCurrentOpset's attribute-to-input table if it gets reused) are all worth filing as separate issues but should not hold this PR.

Add WHY comments + tracking issue refs (microsoft#28966, and microsoft#28969 on the WebGPU attention-fusion path) to the ModelOptions{allow_released_opsets_only=false} call sites in the *CurrentOpset fusion tests, so a future reader knows they can be removed once ONNX opset 27 ships. No test logic or ModelOptions args change.

Extend the onnx-opset-bump-checklist skill with three hard-won gotchas from the 1.22.0 integration:

- (m) the vcpkg MS-internal asset mirror must be Terrapin-seeded with the new tag tarball or every --use_vcpkg leg 404s;
- (n) a FINAL onnx release can still ship a map-max opset > last released opset (1.22.0: 27 > 26), leaving it under-development;
- (o) prefer per-model ModelOptions{allow_released_opsets_only=false} over per-leg CI env flips or GTEST_SKIP.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

### Integrate ONNX 1.22.0rc1 (opset 27) Resolves #28752. Pin: `onnx/onnx@bc3be77bec2f628788796dff60819186bacf49df` (VERSION_NUMBER `1.22.0rc1`). ONNX **1.21.0 → 1.22.0rc1**. Max ai.onnx opset **26 → 27**. IR version **unchanged (13 / `0x0D`)**. This is the **RC validation phase** of an incremental integration (same strategy as the ONNX 1.21 bump, #27601). The formal `v1.22.0` GitHub release is still a **draft** (no git tag yet), so re-pinning to the released tag is deferred to **Phase 2** (see Follow-ups). Landing the RC now validates ONNX 1.22 against ORT before ONNX publishes the formal release. --- ### Update — ONNX 1.22.0 **FINAL** re-pin + rebase onto `upstream/main` + closes #28969 ONNX published the formal **`v1.22.0`** GitHub release, so this PR is re-pinned **rc2 → FINAL** (`onnx/onnx@v1.22.0`) — the Phase-2 step deferred in the rc1 description below. The branch was also **rebased onto `upstream/main`** to pick up the intervening optimizer/opset-26 work. The released tag tarball is a different asset hash than the RCs, so the vcpkg MS-internal asset mirror was re-seeded for the final tag (otherwise `--use_vcpkg` legs 404). **Also closes #28969** (WebGPU binary-elementwise broadcast `SIZE_MAX` underflow). ONNX 1.22's expanded-Attention reference tests exposed a latent WebGPU bug where a broadcast shape computed `dim - 1` on a zero/unit dimension and underflowed to `SIZE_MAX`; the fix is included here and the previously-skipped reference tests are re-enabled. **Opset-27 `*CurrentOpset` test handling.** ONNX 1.22.0 FINAL ships `DomainToVersionRange` **map-max 27** while the last *released* opset is **26**, so **opset 27 stays under development** for the whole 1.22 cycle. Strict legs (the default, or `ALLOW_RELEASED_ONNX_OPSET_ONLY=1`) therefore throw *"Opset 27 under development"* at model load on every `*CurrentOpset` fusion test that builds at the max opset. 
These tests now load with per-model `ModelOptions{/*allow_released_opsets_only*/ false, /*strict_shape_type_inference*/ false}`, extending the existing `38f17243b` / GatherToSlice precedent to the rest of the `*CurrentOpset` suite. This is **leg-agnostic** (exercises opset 27 on every CI leg, not just the relaxed ones) and **preserves opset coverage** (vs. `GTEST_SKIP`). Each call site is annotated with a one-line WHY + tracking issue (#28966) so the relaxation can be removed once opset 27 is released. `Resolves #28752` (unchanged). Closes #28969. ### Update — ONNX 1.22.0rc2 re-pin + ConvTranspose conforms to ONNX `output_shape` spec Since the original rc1 description below, this PR was re-pinned **rc1 → rc2** (`onnx/onnx@b124e0188a`, `VERSION_NUMBER 1.22.0rc2`) to pick up the upstream Xcode/iOS CMake fix (onnx#8056). rc2 also carries onnx#8051, which tightened `convTransposeShapeInference` to reject an `output_shape`/`output_padding` whose size does not match the number of spatial dimensions (per the ONNX spec clarification onnx#5400). **ONNX Runtime now conforms to that spec** instead of patching ONNX to preserve a non-standard form. **⚠️ Breaking change — ConvTranspose `output_shape` now follows the ONNX spec (spatial dimensions only).** ORT previously also accepted a non-standard `rank + 2` form that included batch and channel, i.e. `(N, C, H, W)`. As of ONNX 1.22, a `rank + 2` `output_shape` on a ConvTranspose whose input has a **statically-known rank** is rejected at `Graph::Resolve` with *"Attribute output_shape has incorrect size"*. **Migration:** specify `output_shape` with spatial dimensions only — e.g. `{1, 1, 1, 14}` → `{1, 14}` (batch and channel are always inferred from the input and weight, so results are identical; the kernel ignores `N, C`). 
Models whose ConvTranspose input has a **dynamic/unknown rank are unaffected** — ONNX skips the size check and ORT computes the same result (covered by the new `ConvTranspose_RankPlus2_OutputShape_DynamicRankInput_Runtime` test). **Patch inventory — supersedes "2 files, 3 hunks" below.** `cmake/patches/onnx/onnx.patch` (and its byte-identical `binskim.patch` mirror) carries **only** the `ONNX_MINIMAL_BUILD` option hunk and the GroupNormalization-18 `.Deprecate()` removal — **no ConvTranspose hunks**. rc2's strict shape-inference check is kept as-is; ORT's own test models were conformed to the spec. The upstream archive hash, `deps.txt`, `portfile.cmake`, `vcpkg.json`, and the submodule pin are unchanged. **Additional rc2 test conform.** rc2 also tightened `convPoolShapeInference` to reject `Conv` inputs with rank < 3 (*"Input tensor must have at least 3 dimensions"*). The hand-authored model in `onnxruntime/test/python/quantization/test_op_split.py` declared a spec-invalid rank-2 `Conv` input/weight; it was conformed to a valid NCHW shape (`[6, 3]` → `[1, 1, 6, 3]`, weight → `[2, 1, 1, 1]`), keeping the quantized-Split graph and expected outputs identical. No ORT source change. > This note should also seed the GitHub Release notes for the ONNX 1.22 / opset 27 milestone and the squash-commit message. --- ### What changed (29 files) **Version plumbing** - `cmake/deps.txt` — onnx archive URL → rc1 commit zip + SHA1 `421e5a9afb6c41a54696e424e5b9a3796aab6821`. - `cmake/external/onnx` — submodule → `bc3be77b`. - `cmake/vcpkg-ports/onnx/portfile.cmake` — `REF` commit form + tar.gz SHA512 `e0c526f5…3ce467`. - `cmake/vcpkg-ports/onnx/vcpkg.json` — `version-semver` `1.22.0`, `port-version` 0. 
- `cmake/patches/onnx/onnx.patch` + `cmake/vcpkg-ports/onnx/binskim.patch`: **byte-identical** rebase onto 1.22 (2 files, 3 hunks): kept the `ONNX_MINIMAL_BUILD` option (restructured for 1.22's new `onnx_core` OBJECT-lib / `add_subdirectory(onnx)` layout) and the GroupNormalization-18 `.Deprecate()` removal; **dropped** the `Utils.cmake` protobuf-warnings hunk (already merged upstream in 1.22).

**Opset-27 op enablement (Range)**

- `onnxruntime/core/providers/cpu/generator/range.cc`: split into versioned `[11, 26]` + a new unversioned `27` registration. The opset-27 kernel natively supports the existing common numeric types (float/double/int16/int32/int64). **fp16 Range is covered** via ONNX's Range-27 **function body**, which ORT expands into primitive ops at partition time. **bf16 Range is deferred to that same function expansion**: there is no native bf16 kernel, and its bf16 reference node test (`test_range_bfloat16_type_positive_delta`, base + `_expanded`) is not exercised by the Python/numpy ONNX backend series, whose harness cannot materialize bf16 (`Numpy_type 256`); a native fp16/bf16 kernel + `stash_type` handling is a follow-up (efficiency, not correctness).
- `onnxruntime/core/providers/cpu/cpu_execution_provider.cc`: versioned the Range forward-declare + `BuildKernelCreateInfo` entries and added the opset-27 registration.
- **CUDA Range**: same versioned `[11, 26]` + opset-27 split as CPU (`onnxruntime/core/providers/cuda/generator/range.cc` + `cuda_execution_provider.cc`); GPU-verified locally: `onnx_test_runner -e cuda` 8/8 opset-27 Range node tests pass, native Range-27 placed on CUDAExecutionProvider (fp16/bf16 via function expansion).

**Optimizer / EP opset ceilings**

- `…/transpose_optimization/optimizer_api.h`: `kMaxSupportedOpset` **26 → 27**.
- `coreml`/`nnapi`/`vsinpu`/`webnn` `base_op_builder.h`: `GetMaxSupportedOpSet()` **25 → 27** (upper guard only; per-op support checks still gate, and these EPs gain no new kernels here).

**Fusion updates**

- `onnxruntime/core/optimizer/gather_fusion.cc`: GatherToSlice Range version list `{1,11}` → `{1,11,27}`.
- `onnxruntime/core/optimizer/embed_layer_norm_fusion.cc`: add `27` to the two Range path-matchers (`parent_path_3/4`) so embedding fusion still matches opset-27 models.
- `onnxruntime/test/optimizer/graph_transform_test.cc`: new opset-27 GatherToSliceFusion test.

**Requirements (7 bumped)**

- All 7 CI `requirements.txt` → `onnx==1.22.0rc1` (rc1 wheel is on PyPI). The 3 transformers pins remain frozen at `1.18.0` (unrelated to this bump; intentionally untouched).

**Generated docs / test data**

- `js/web/docs/webgl-operators.md`: regenerated.
- `docs/OperatorKernels.md`: **surgical** edit of the CPU EP **and** CUDA EP Range rows (`27+` + `[11, 26]` continuation each); see caveats.
- `onnxruntime/test/testdata/onnx_backend_test_series_filters.jsonc`: **comment-only**; documents why no opset-27 CPU exclusions are needed (all opset-27 node tests pass via function expansion).

**Docs**

- `.agents/skills/onnx-opset-bump-checklist/SKILL.md`: new reusable checklist skill distilled from this integration. Now also documents the "bump **all** execution providers together" tradition (CPU + CUDA + JS/DML assessment in one pass) so future opset bumps don't ship a partial EP set.

---

### Validation (CPU EP + CUDA EP, Linux x64)

- Full build ✅
- `--minimal_build extended` build ✅ (validates the rebased `ONNX_MINIMAL_BUILD` patch hunk independently of the vcpkg mirror path)
- `onnxruntime_test_all` ✅: **1595 passed / 0 failed**
- `onnx_test_runner -e cpu` on the ONNX 1.22 opset-27 node tests ✅: **62/62 pass** via ONNX function-body expansion (run with `ALLOW_RELEASED_ONNX_OPSET_ONLY=0`), including CausalConvWithState, LinearAttention, and fp16/bf16 Range, despite no native kernels for them.
- **CUDA EP (H100):** built `--use_cuda` clean in both **Debug** and **RelWithDebInfo** ✅; `onnx_test_runner -e cuda` on the opset-27 Range node tests ✅: **8/8 pass**, with native Range-27 placed on CUDAExecutionProvider (no CPU fallback) and fp16/bf16 covered via function-body expansion.

---

### Standing caveats (please read before reviewing)

1. **CUDA EP now locally verified for Range; other GPU EPs/ops still CI-only.** The CUDA EP was built and the opset-27 **Range** node tests run locally on an H100 (8/8 pass). DML and the remaining GPU EPs/ops were **not** exercised here. Function-body expansion is EP-agnostic, so other opset-27 models are expected to run on those EPs too, but broader GPU coverage remains a CI/follow-up item.
2. **`OperatorKernels.md` updated surgically** (CPU Range row only). A CPU-only *full* regen would destructively wipe the CUDA/DML/other-EP sections (the generator only emits rows for the EPs in the built module). A correct multi-EP regen needs a build per EP and is a follow-up.
3. **Opset 27 is "under development"** in ONNX's released-versions map. ORT's load-time validation rejects opset-27 models unless `ALLOW_RELEASED_ONNX_OPSET_ONLY=0` (ORT CI already sets this). The opset-27 **schemas are always compiled in from the submodule** regardless; this gate only affects model load-time acceptance, not schema availability.
4. **EP `GetMaxSupportedOpSet` jumped 25 → 27** (skips 26). This is an *upper* guard only; raising it merely lets opset-26/27 nodes reach the per-op support checks that still gate correctness. No regression; it also retroactively un-caps opset-26 for these EPs.
5. **iOS/macOS Xcode framework build is currently broken by an upstream ONNX CMake regression** (the `onnx_core` OBJECT-library split in onnx/onnx#7733 reintroduced the Xcode breakage originally fixed by onnx/onnx#7515 for onnx/onnx#7514). This is **NOT** caused by this opset bump. Tracked upstream at [onnx/onnx#8053](onnx/onnx#8053). Non-Xcode builds (Linux/Windows/Android/WASM) and all CPU/CUDA validation are unaffected. This resolves at the **Phase 2** formal `v1.22.0` re-pin once ONNX ships the fix.

---

### Follow-ups (explicitly NOT in this PR)

- **GPU/multi-EP coverage:** run opset-27 CUDA/DML node tests; regenerate `OperatorKernels.md` across all EPs.
- **JS EP Range** `[11, 26]` + `27` split (currently registered open-ended at `11`; mirror the CPU/CUDA versioned split).
- **DML Range opset-27 assessment** (DML uses its own `REG_INFO` registration system; assess whether an opset-27 entry is needed).
- **WebGPU EP Range** opset-27 split: `range.cc` registers `Range` `.SinceVersion(11)` open-ended, so it already claims opset-27 Range; only the new bf16 type is unsupported and falls back via the `T` type-constraint (function expansion). Mirror the CPU/CUDA versioned `[11, 26]` + `27` split.
- **Native kernels:** implement CPU (and EP) `CausalConvWithState` and `LinearAttention` kernels, and a native fp16/bf16 + `stash_type` Range-27 kernel (replace today's function-expansion path with efficient kernels).
- **Phase 2 (formal `v1.22.0` re-pin):** re-pin `deps.txt`/submodule/portfile/requirements to the released tag once ONNX publishes it (currently blocked on ONNX tagging the release); upload the tag tarball to the vcpkg mirror. **This also restores the iOS/macOS Xcode framework build** once the upstream onnx OBJECT-library Xcode regression (caveat 5) is resolved and re-pinned.
- **Tooling:** fix the pre-existing crash in `find_optimizer_opset_version_updates_required.py` (placeholder `ver` parsed as int) so it can be relied on for future bumps.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
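The Range allowlist change listed under the fusion updates above (`{1,11}` → `{1,11,27}`) follows from how an ONNX node resolves to a schema version: it takes the newest schema version that does not exceed the model's opset import, and the optimizer's matchers compare that value against an explicit list. The following is a minimal Python sketch of that logic, not ORT's C++ implementation. The helper names `since_version` and `fusion_matches` are hypothetical, and the schema version list `[11, 27]` for Range is an assumption taken from the `[11, 26]` + `27` kernel split described above.

```python
from bisect import bisect_right


def since_version(schema_versions, model_opset):
    """Return the schema version a node resolves to under model_opset.

    schema_versions: sorted opsets at which the operator's schema changed.
    Returns None if the operator did not exist yet at model_opset.
    """
    idx = bisect_right(schema_versions, model_opset)
    return schema_versions[idx - 1] if idx else None


def fusion_matches(allowlist, schema_versions, model_opset):
    """Hypothetical matcher: accept only allowlisted since-versions."""
    return since_version(schema_versions, model_opset) in allowlist


# Assumption for illustration: Range schema versions 11 and 27.
RANGE_VERSIONS = [11, 27]

OLD_ALLOWLIST = {1, 11}      # list before the change
NEW_ALLOWLIST = {1, 11, 27}  # list after the change
```

Under these assumptions an opset-26 model still resolves Range to version 11 and matches either list, while an opset-27 model resolves to 27 and is only matched once `27` is added. This is why each schema bump needs a pass over the optimizer's version lists.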

yuslepukhin requested review from Copilot and hariharans29 June 9, 2026 22:09

Copilot started reviewing on behalf of yuslepukhin June 9, 2026 22:09

yuslepukhin added the release:1.27.0 label Jun 9, 2026

Copilot AI reviewed Jun 9, 2026

Comment thread onnxruntime/core/optimizer/attention_fusion.cc

yuslepukhin force-pushed the yuslepukhin/ruleset_support branch from 795f312 to 6907dd7 Compare June 10, 2026 17:48

yuslepukhin requested a review from Copilot June 10, 2026 17:48

Copilot started reviewing on behalf of yuslepukhin June 10, 2026 17:48

Copilot AI reviewed Jun 10, 2026

Comment thread onnxruntime/test/optimizer/graph_transform_test.cc

Comment thread onnxruntime/test/optimizer/graph_transform_test.cc Outdated

Comment thread onnxruntime/core/optimizer/reshape_fusion.cc

Comment thread onnxruntime/core/optimizer/reshape_fusion.cc Outdated

yuslepukhin force-pushed the yuslepukhin/ruleset_support branch from 6907dd7 to 6a12338 Compare June 10, 2026 18:24

yuslepukhin requested a review from Copilot June 10, 2026 18:26

Copilot started reviewing on behalf of yuslepukhin June 10, 2026 18:26

Copilot AI reviewed Jun 10, 2026

Comment thread onnxruntime/core/optimizer/reshape_fusion.cc

Comment thread onnxruntime/core/optimizer/attention_fusion_helper.h

yuslepukhin removed the release:1.27.0 label Jun 10, 2026

yuslepukhin requested a review from Copilot June 10, 2026 22:55

Copilot started reviewing on behalf of yuslepukhin June 10, 2026 22:56

Copilot AI reviewed Jun 10, 2026

Comment thread onnxruntime/test/optimizer/group_query_attention_pre_norm_fusion_test.cc Outdated

Comment thread onnxruntime/test/optimizer/graph_transform_test.cc

yuslepukhin marked this pull request as draft June 11, 2026 00:53

yuslepukhin requested a review from Copilot June 11, 2026 01:03

Copilot started reviewing on behalf of yuslepukhin June 11, 2026 01:03

Copilot AI reviewed Jun 11, 2026

Add Shape opset 15+ start/end attribute guards to prevent incorrect fusion with partial-shape queries

8ca6320

yuslepukhin requested a review from Copilot June 11, 2026 01:27

Copilot started reviewing on behalf of yuslepukhin June 11, 2026 01:27

yuslepukhin marked this pull request as ready for review June 11, 2026 01:27

Copilot AI reviewed Jun 11, 2026

Comment thread onnxruntime/core/optimizer/attention_fusion_helper.h Outdated

Comment thread onnxruntime/core/optimizer/embed_layer_norm_fusion.cc Outdated

Comment thread onnxruntime/core/optimizer/embed_layer_norm_fusion.cc Outdated

yuslepukhin added 2 commits June 11, 2026 10:10

Fix Shape end-attr guard to accept INT64_MAX as full-shape default

033ac63

Add comments explaining partial-shape guard logic

992c85e
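The Shape guard commits above concern `Shape` at opset 15 and later, where the optional `start` and `end` attributes let a node return only a slice of the input's shape. A fusion that assumes the full shape must reject such partial queries. The following Python sketch illustrates the idea under stated assumptions. It is not the C++ code from this PR, the function name `is_full_shape_query` is hypothetical, and it conservatively treats only `start == 0` with `end` absent or equal to `INT64_MAX` as a full-shape query, which is the case the commit titles describe.

```python
INT64_MAX = 2**63 - 1


def is_full_shape_query(shape_since_version, attrs):
    """Hypothetical guard: True when a Shape node returns the whole shape.

    shape_since_version: schema version the Shape node resolved to.
    attrs: dict of the node's integer attributes (may be empty).
    """
    if shape_since_version < 15:
        # Before Shape-15 there are no start/end attributes.
        return True
    start = attrs.get("start", 0)  # ONNX default for start is 0
    end = attrs.get("end")         # None models an absent attribute
    # Conservative: anything other than the explicit full-range forms
    # is treated as a partial-shape query and blocks the fusion.
    return start == 0 and (end is None or end == INT64_MAX)
```

A matcher would call such a guard before accepting a `Shape` node as part of a fusion pattern, for example rejecting a node with `{"start": 1}` while accepting one with no attributes.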

hariharans29 previously approved these changes Jun 11, 2026

yuslepukhin changed the title ~~Update optimizer opset version checks for latest ONNX opset 25~~ Update optimizer opset version checks for latest ONNX opset 26 Jun 11, 2026

Add scope comment to LoadModelAtCurrentOpset helper

7941263

yuslepukhin dismissed hariharans29’s stale review via 7941263 June 11, 2026 21:41

hariharans29 previously approved these changes Jun 11, 2026

Extract IsFullShapeNode helper into graph_utils

06ef3db

yuslepukhin dismissed hariharans29’s stale review via 06ef3db June 11, 2026 21:50

hariharans29 approved these changes Jun 11, 2026

yuslepukhin merged commit 0b278bb into main Jun 12, 2026
86 checks passed

yuslepukhin deleted the yuslepukhin/ruleset_support branch June 12, 2026 17:35

titaiwangms mentioned this pull request Jun 15, 2026

Integrate ONNX 1.22.0 (opset 27) — issue #28752 #28754

Merged

hariharans29 commented Jun 10, 2026

Verdict: Approve, but `attention_fusion_helper.h` has an internal inconsistency worth fixing before merge

The two real fixes hiding in here

1. `Shape` SinceVersion check in `reshape_fusion.cc` — correctness fix

2. `Unsqueeze` axes-from-input in `reshape_fusion.cc` — same class of fix

The one thing worth addressing before merge

`FuseGptAttention` in `attention_fusion_helper.h` is internally inconsistent

Pattern-level observations on the version list extensions

Adding 24 and 25 to every list

Hot-spot to harmonize as follow-up (out of scope here)

Tests look right

Bottom line

hariharans29 commented Jun 11, 2026

Verdict: Approve

Prior observations — status

1. `FuseGptAttention` precheck-vs-matcher gap — fixed

2. `GetAxesFromUnsqueezeNode` — gating choice

3. Shape `(!end_attr)` guard — fixed and tightened

New material since prior review

A. Two more partial-Shape correctness guards

B. Forward-proof test infrastructure — `GetCurrentOnnxOpset()`

C. `LoadModelAtCurrentOpset` model rewriting

D. Title vs scope

Minor observations on the new code

Bottom line
