Update optimizer opset version checks for latest ONNX opset 26 #28966

Merged: yuslepukhin merged 8 commits into main from yuslepukhin/ruleset_support on Jun 12, 2026
Conversation

@yuslepukhin yuslepukhin commented Jun 9, 2026

Member

This pull request expands support for additional ONNX opset versions in the attention fusion optimization code, making the optimizer compatible with newer and more diverse ONNX models. The changes primarily update the accepted opset versions for various operators such as Transpose, Reshape, Squeeze, Unsqueeze, Shape, and others across multiple functions. This ensures broader model compatibility and improves the robustness of the fusion logic.

Expanded opset version support for attention fusion:

  • Updated accepted opset versions for key operators (Transpose, Reshape, Squeeze, Unsqueeze, Shape, Add, Mul, Sub, Div, Cast, etc.) in the main attention fusion logic (attention_fusion.cc), allowing matching and fusion of newer ONNX models using these operators at opsets up to 25.

Helper and mask subgraph matching improvements:

  • Broadened opset version checks for subgraph matching in helper functions, including those for Gemm subgraphs, unidirectional mask subgraphs, input mask subgraphs, and past subgraph matching, to support additional opset versions and operator variants.

These changes collectively future-proof the attention fusion optimizer for a wider range of ONNX models and operator versions, reducing the likelihood of unsupported patterns during optimization.

Copilot AI left a comment

Contributor

Pull request overview

This PR expands ONNX Runtime optimizer pattern matching and unit tests to recognize newer ONNX operator schema versions (opset 23–25), aiming to keep attention fusions and reshape fusion behavior compatible with opset 25 models.

Changes:

  • Broadened supported operator-version allowlists in optimizer fusions (e.g., Transpose/Reshape/Squeeze/Unsqueeze/Shape) to include newer schema versions up to opset 25.
  • Added opset 25 coverage for MobileCLIP attention fusion and GroupQueryAttentionPreNorm fusion unit tests.
  • Extended ReshapeFusionOpsetTest to iterate additional opsets (19/21/23/24/25).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| onnxruntime/core/optimizer/attention_fusion.cc | Updates MobileCLIP attention fusion pattern version checks for newer ONNX schemas. |
| onnxruntime/core/optimizer/attention_fusion_helper.h | Extends supported Transpose versions in GPT attention helper logic. |
| onnxruntime/core/optimizer/group_query_attention_pre_norm_fusion.cc | Expands supported Reshape versions in the GQA pre-norm fusion matcher. |
| onnxruntime/core/optimizer/reshape_fusion.cc | Updates Shape/Unsqueeze schema version handling in reshape fusion logic. |
| onnxruntime/test/optimizer/graph_transform_test.cc | Adds opset coverage (incl. 25) for attention and reshape fusion tests. |
| onnxruntime/test/optimizer/group_query_attention_pre_norm_fusion_test.cc | Adds opset 25 test for Qwen GQA pre-norm fusion. |

Comments suppressed due to low confidence (1)

onnxruntime/test/optimizer/graph_transform_test.cc:8241

  • ReshapeFusionOpsetTest now iterates opsets 19/21/23/24/25, but the shape_test_for_opset15 flag is mutated inside build_test_case and then reused across iterations. After the first opset>=15 run, subsequent iterations build a Shape with start=1,end=2 and also switch to the (pre,pre) checker branch, so the newly added opsets are not actually validating the fusion path this test is meant to cover.
  const std::vector<int> opsets{11, 12, 13, 14, 15, 18, 19, 21, 23, 24, 25};
  bool shape_test_for_opset15 = false;

  for (auto& opset : opsets) {
    auto build_test_case = [&](ModelTestBuilder& builder) {

Comment thread onnxruntime/core/optimizer/attention_fusion.cc

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Comment thread onnxruntime/test/optimizer/graph_transform_test.cc
Comment thread onnxruntime/test/optimizer/graph_transform_test.cc Outdated
Comment thread onnxruntime/core/optimizer/reshape_fusion.cc
Comment thread onnxruntime/core/optimizer/reshape_fusion.cc Outdated
Add newer opset versions (19, 21, 23, 24, 25) to IsSupportedOptypeVersionAndDomain
and MatchesOpSinceVersion checks in optimizers where the version bumps are
type-constraint widenings only (no semantic changes):

- attention_fusion.cc: Reshape, Transpose, Squeeze
- attention_fusion_helper.h: Transpose
- group_query_attention_pre_norm_fusion.cc: Reshape
- reshape_fusion.cc: Unsqueeze, Shape

Add corresponding tests at opset 25 for attention fusion, GQA pre-norm
fusion, and extend ReshapeFusionOpsetTest to cover opsets 19-25.

Fix ReshapeFusionOpsetTest to properly test the fusion path for all opsets
including 19+. Previously, a mutable flag caused opsets after 18 to only
test the no-fusion (partial Shape) path.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

onnxruntime/core/optimizer/reshape_fusion.cc:181

  • The Shape(start/end) guard rejects any explicit end attribute, even if it is set to the default "no slicing" value (e.g., INT64_MAX). That can unnecessarily block reshape-fusion for graphs that redundantly set end to the default. Consider treating an end attribute with a very large value (i.e., equivalent to full-shape) as acceptable, and only rejecting when start/end imply an actual slice.
    // Opset 15+ added start/end attributes to Shape. Reject partial-shape queries.
    if (shape.SinceVersion() >= 15) {
      const ONNX_NAMESPACE::AttributeProto* start_attr = graph_utils::GetNodeAttribute(shape, "start");
      const ONNX_NAMESPACE::AttributeProto* end_attr = graph_utils::GetNodeAttribute(shape, "end");
      if (!((!start_attr || static_cast<int>(start_attr->i()) == 0) && (!end_attr))) {
        return false;
      }

Comment thread onnxruntime/core/optimizer/reshape_fusion.cc
Comment thread onnxruntime/core/optimizer/attention_fusion_helper.h
@hariharans29

Member

Verdict: Approve, but attention_fusion_helper.h has an internal inconsistency worth fixing before merge

The mechanical opset-list expansions are fine, and two of the changes are actually meaningful correctness fixes hiding inside the "version bump" framing. One change in attention_fusion_helper.h is half-done in a way that defeats its own purpose — Copilot's second comment is accurate and worth acting on.


The two real fixes hiding in here

1. Shape SinceVersion check in reshape_fusion.cc — correctness fix

Pre-PR:

if (graph_utils::MatchesOpSinceVersion(shape, {15})) {
  // check start/end attributes that, if set, would block fusion
  ...
}

MatchesOpSinceVersion(shape, {15}) returns true only when shape.SinceVersion() == 15. For a model at opset 19 or 21 (where Shape has SinceVersion 19/21 respectively), this returned false, skipped the start/end attribute check, and would fuse a partial-Shape pattern that should not be fused. This is a latent correctness bug being silently fixed.

Post-PR:

if (shape.SinceVersion() >= 15) { ... }

is the right shape — once the schema added the attribute, every later version inherits it. This deserves a callout in the PR description because the framing "extend opset version checks" undersells it; "fix partial-Shape detection on opset ≥ 19" would be more accurate.

The test side does pick this up — ReshapeFusionOpsetTest was previously contorted into a one-shot state machine (shape_test_for_opset15) so that the partial-shape path ran exactly once across all opsets, and the new test now runs the partial-shape negative case for every opset ≥ 15. That's the right refactor.
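
The difference between the two gates can be sketched in isolation. This is a minimal model, not ORT code: `FakeNode`, `MatchesExactSinceVersion`, and `HasStartEndAttributes` are hypothetical names, and the exact-match behavior is assumed from the description above.

```cpp
#include <algorithm>
#include <initializer_list>

// Hypothetical stand-in for a graph node; only the schema version matters here.
struct FakeNode {
  int since_version;  // schema version resolved for this node
};

// Models the exact-match semantics described above: true only when the node's
// since-version is literally one of the listed versions.
inline bool MatchesExactSinceVersion(const FakeNode& node,
                                     std::initializer_list<int> versions) {
  return std::find(versions.begin(), versions.end(), node.since_version) !=
         versions.end();
}

// Models the post-PR lower-bound gate: any Shape schema from version 15 onward
// carries the start/end attributes, so the attribute check must run.
inline bool HasStartEndAttributes(const FakeNode& shape) {
  return shape.since_version >= 15;
}
```

With a Shape node whose since-version is 19, the exact-match gate on `{15}` is false (the attribute check is skipped), while the lower-bound gate is true.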

2. Unsqueeze axes-from-input in reshape_fusion.cc — same class of fix

Pre-PR:

} else if (graph_utils::MatchesOpSinceVersion(unsqueeze, {13})) {
  const NodeArg* axes_node_arg = unsqueeze.InputDefs()[1];
  ...
}

Same issue: only matched SinceVersion() == 13. For opset 21+ where Unsqueeze has SinceVersion 21 (or whichever version it was bumped to), this returned false → reshape fusion failed silently on those models. The new structural check InputDefs().size() > 1 fixes it.

Copilot suggested gating on SinceVersion() >= 13 instead. Two reasonable opinions:

  • Pro SinceVersion() >= 13: schema-aligned, matches the comment, defends against a hypothetical malformed Unsqueeze that has the wrong arity for its version.
  • Pro InputDefs().size() > 1: future-proof against any later opset bump (no need to revisit), and the "malformed Unsqueeze with wrong arity" case would fail schema validation before reaching this code anyway.

I'd take the structural check as you have it. The defensiveness Copilot is asking for is downstream of schema validation, and you'd otherwise need to update this site on every Unsqueeze bump going forward. Author's call — not blocking either way.
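
To make the structural option concrete, here is a small sketch of choosing the axes source by arity rather than by version. `FakeUnsqueeze` and `GetAxes` are hypothetical illustrations; the real reshape_fusion.cc code works on ORT `Node` and `NodeArg` objects and is not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified view of an Unsqueeze node.
struct FakeUnsqueeze {
  std::optional<std::vector<int64_t>> axes_attribute;  // attribute form (older schemas)
  std::size_t input_count = 1;                         // number of node inputs
  std::vector<int64_t> axes_input;                     // constant axes, used only if input_count > 1
};

// Picks the axes source structurally: attribute first, then the second input.
// No version list is consulted, so a later schema bump needs no change here.
inline std::optional<std::vector<int64_t>> GetAxes(const FakeUnsqueeze& node) {
  if (node.axes_attribute.has_value()) {
    return node.axes_attribute;
  }
  if (node.input_count > 1) {
    return node.axes_input;
  }
  return std::nullopt;  // neither form present: caller should refuse to fuse
}
```

The trade-off is the one stated above: this form relies on schema validation having already rejected nodes with the wrong arity for their version.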


The one thing worth addressing before merge

FuseGptAttention in attention_fusion_helper.h is internally inconsistent

The PR updates one line in this function:

// line 1450
if (graph_utils::IsSupportedOptypeVersionAndDomain(*k_concat, "Transpose",
                                                   {1, 13, 21, 23, 24, 25}, kOnnxDomain)) {
  transpose_optimized_pattern = true;
  ...
}

But everything downstream of that gate is still locked to the old opsets:

// ~line 1468
if (!graph_utils::IsSupportedOptypeVersionAndDomain(*k_concat, "Concat",
                                                    {4, 11, 13}, kOnnxDomain)) {
  return false;
}

// ~line 1474
std::vector<graph_utils::EdgeEndToMatch> k_path{
    {0, 1, "Transpose", {1, 13},     kOnnxDomain},
    {0, 0, "Reshape",   {5, 13},     kOnnxDomain},
    {1, 0, "Split",     {2, 11, 13}, kOnnxDomain}};

Consequence: on an opset 23/24/25 GPT model the precheck succeeds, then FindPath fails because the Transpose has SinceVersion 21 (or whichever) and isn't in {1, 13}. Net result of the one-line change in this function: nothing. Either:

  • (a) Update the q/k/v path matchers: Reshape {5, 13} → {5, 13, 14, 19, 21, 23}, Split {2, 11, 13} → {2, 11, 13, 18, ...}, Concat {4, 11, 13} → matching set, and the inner Transpose {1, 13} → {1, 13, 21, 23, 24, 25}, consistent with the stated PR intent. Plus an opset-25 test for the GPT path the same way you did for MobileCLIP and the GQA pre-norm fusion.
  • (b) Or revert the line 1450 change and explicitly scope the PR to "opset 25 for MobileCLIP / GQA pre-norm / reshape fusion" only, since FuseGptAttention won't actually work end-to-end at opset 25 without the rest.

Either is fine. The current state is the one option that doesn't make sense.

Copilot's second comment captured this. Acting on it would close the gap.
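
The no-op effect described above can be reproduced with a toy two-stage matcher. The names `Accepts` and `GptPathWouldMatch` are hypothetical, and the model assumes only what the review states: the precheck and the path matcher each test the Transpose since-version against their own list.

```cpp
#include <algorithm>
#include <vector>

// True when since_version appears in the allow-list.
inline bool Accepts(const std::vector<int>& allowed, int since_version) {
  return std::find(allowed.begin(), allowed.end(), since_version) != allowed.end();
}

// Both the precheck and the downstream path matcher must accept the node;
// widening only one of the two lists cannot change the outcome.
inline bool GptPathWouldMatch(const std::vector<int>& precheck_versions,
                              const std::vector<int>& path_versions,
                              int transpose_since_version) {
  return Accepts(precheck_versions, transpose_since_version) &&
         Accepts(path_versions, transpose_since_version);
}
```

With a precheck list of `{1, 13, 21, 23, 24, 25}` and a stale path list of `{1, 13}`, a Transpose at since-version 21 still fails to match; it matches only once both lists are widened.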


Pattern-level observations on the version list extensions

Adding 24 and 25 to every list

Reshape {5, 13, 14, 19, 21, 23, 24, 25}, Transpose {1, 13, 21, 23, 24, 25}, etc.

IsSupportedOptypeVersionAndDomain checks SinceVersion(), which is the version at which the operator's schema was last changed at or before the model's opset, not the model's opset itself. So the entries that actually do anything are the versions where the operator was bumped. If Reshape was last bumped at v23, then {24, 25} in its list are no-ops (a model declared at opset 25 will still report Reshape.SinceVersion() == 23).

This is harmless future-proofing, and consistent with how similar extensions have been done in the repo before — I'd just flag in the PR description what was actually bumped at 24/25 vs. what was added defensively. Helps the next person doing the same exercise understand which entries are load-bearing.
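
A short sketch of how a since-version resolves from a model opset shows why such entries are inert. `ResolveSinceVersion` is a hypothetical helper, and the bump list used in the usage note is an assumption that follows the "if Reshape was last bumped at v23" premise above; it is not taken from the ONNX schema registry.

```cpp
#include <vector>

// Returns the largest schema version that does not exceed the model opset,
// or 0 if the operator did not exist at that opset.
// `schema_versions` must be sorted in ascending order.
inline int ResolveSinceVersion(const std::vector<int>& schema_versions,
                               int model_opset) {
  int resolved = 0;
  for (int version : schema_versions) {
    if (version > model_opset) {
      break;
    }
    resolved = version;
  }
  return resolved;
}
```

Under the assumed bump list `{5, 13, 14, 19, 21, 23}`, model opsets 23, 24, and 25 all resolve to 23, so allow-list entries 24 and 25 can never be the value that matches.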

Hot-spot to harmonize as follow-up (out of scope here)

This file (attention_fusion.cc) has the same set of operator version lists repeated 9 times for Reshape and 6 times for Transpose. Every opset bump now triggers a search-and-replace across the file, with the FuseGptAttention mistake above being a natural consequence. A small refactor into named constants (kReshapeOpsetVersions, kTransposeOpsetVersions) would have made this PR a 4-line change and made the next one trivial. Not for this PR — file as a cleanup.
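
One possible shape for that cleanup, as a sketch only: the constant names come from the suggestion above, the values copy the lists quoted in this review, and `IsVersionSupported` is a hypothetical helper rather than an existing ORT function.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical named constants; each list is defined once and reused by every matcher.
constexpr std::array<int, 8> kReshapeOpsetVersions{5, 13, 14, 19, 21, 23, 24, 25};
constexpr std::array<int, 6> kTransposeOpsetVersions{1, 13, 21, 23, 24, 25};

// True when since_version appears in the given version list.
template <std::size_t N>
bool IsVersionSupported(const std::array<int, N>& versions, int since_version) {
  return std::find(versions.begin(), versions.end(), since_version) != versions.end();
}
```

With this layout a future opset bump touches one line per operator instead of every call site.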


Tests look right

  • AttentionFusionMobileClipMhaOpset25Test is the parallel form of the existing AttentionFusionMobileClipMhaTest (opset 14 → 25), reusing the same helper and checker. Minimal and correct.
  • GroupQueryAttentionPreNormFusionFusesQwenPatternOpset25 likewise parallels the existing test.
  • ReshapeFusionOpsetTest refactor (drop the state machine, run the positive case always and the partial-Shape negative case for every opset ≥ 15) is cleaner and increases coverage. Good.

One small note: ReshapeFusionOpsetTest now iterates {11, 12, 13, 14, 15, 18, 19, 21, 23, 24, 25} but skips 16, 17, 20, 22. If those gaps are intentional (Reshape unchanged at those opsets so they collapse to the previous SinceVersion), fine. If not, adding them is one character each. Minor.


Bottom line

The two reshape_fusion.cc corrections are nice quiet wins. The mechanical version-list extensions in attention_fusion.cc are fine. The attention_fusion_helper.h change is currently a no-op due to the unchanged downstream matchers — either complete it (preferred, with a matching test) or drop it. Recommend addressing that one point and then this is good to land.

…nt-opset regression tests

- Update version lists in attention_fusion.cc, attention_fusion_helper.h,
  and embed_layer_norm_fusion.cc to include opset versions up to 25/26.
- Add programmatic current-opset regression tests that auto-detect when
  version lists need updating: Gelu, FastGelu, BiasGelu, LayerNorm,
  SkipLayerNorm, EmbedLayerNorm (3 formats), MobileClip MHA, GQA PreNorm.
- Tests check for fused node first and report remaining op counts with
  guidance to update version lists or skip the opset.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comment thread onnxruntime/test/optimizer/group_query_attention_pre_norm_fusion_test.cc Outdated
Comment thread onnxruntime/test/optimizer/graph_transform_test.cc
@yuslepukhin yuslepukhin marked this pull request as draft June 11, 2026 00:53
- Replace .at(ONNX_DOMAIN) with find + ASSERT_TRUE in GQA test to avoid
  potential throw on missing domain (Copilot review, high).
- Remove redundant TEST_RETURN_IF_NOT in DivMulFusionCurrentOpsetTest where
  the condition was already guaranteed by the enclosing if (Copilot review, low).

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Comment thread onnxruntime/test/optimizer/graph_transform_test.cc
Comment thread onnxruntime/test/optimizer/graph_transform_test_layernorm.cc
Comment thread onnxruntime/core/optimizer/embed_layer_norm_fusion.cc
Comment thread onnxruntime/core/optimizer/embed_layer_norm_fusion.cc
Comment thread onnxruntime/core/optimizer/attention_fusion_helper.h
@yuslepukhin yuslepukhin requested a review from Copilot June 11, 2026 01:27
@yuslepukhin yuslepukhin marked this pull request as ready for review June 11, 2026 01:27

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

onnxruntime/core/optimizer/reshape_fusion.cc:181

  • The new Shape start/end-attribute guard rejects any node that has an "end" attribute, even when end is the default full-range value. ORT’s Shape kernel treats end==std::numeric_limits<int64_t>::max() as the default (full shape), so this check can incorrectly block ReshapeFusion matching for models/exporters that explicitly set end to INT64_MAX.
    // Opset 15+ added start/end attributes to Shape. Reject partial-shape queries.
    if (shape.SinceVersion() >= 15) {
      const ONNX_NAMESPACE::AttributeProto* start_attr = graph_utils::GetNodeAttribute(shape, "start");
      const ONNX_NAMESPACE::AttributeProto* end_attr = graph_utils::GetNodeAttribute(shape, "end");
      if (!((!start_attr || static_cast<int>(start_attr->i()) == 0) && (!end_attr))) {
        return false;
      }

Comment thread onnxruntime/core/optimizer/attention_fusion_helper.h Outdated
Comment thread onnxruntime/core/optimizer/embed_layer_norm_fusion.cc Outdated
Comment thread onnxruntime/core/optimizer/embed_layer_norm_fusion.cc Outdated
@hariharans29

Member

Verdict: Approve

Five new commits since the prior review. The substantive concern (FuseGptAttention internal inconsistency) is fixed, and the PR has grown two additional correctness fixes and a genuinely useful piece of test infrastructure. Ready to land.


Prior observations — status

1. FuseGptAttention precheck-vs-matcher gap — fixed

All three path matchers in attention_fusion_helper.h::FuseGptAttention are now extended to match the precheck:

// path1 (line ~1357)
{0, 0, "Reshape",   {5, 13, 14, 19, 21, 23, 24, 25}, kOnnxDomain},
{0, 0, "Transpose", {1, 13, 21, 23, 24, 25},         kOnnxDomain},
{0, 0, "MatMul",    {1, 9, 13},                       kOnnxDomain}};

// path2 (line ~1379) and q_path (line ~1431)
{2, 0, "Split", {2, 11, 13, 18}, kOnnxDomain}

// k_path (line ~1489)
{0, 1, "Transpose", {1, 13, 21, 23, 24, 25},         kOnnxDomain},
{0, 0, "Reshape",   {5, 13, 14, 19, 21, 23, 24, 25}, kOnnxDomain},
{1, 0, "Split",     {2, 11, 13, 18},                  kOnnxDomain}};

End-to-end coherent now. Bonus: Split {2, 11, 13, 18} also picks up the Split-18 bump (the num_outputs attribute change) that I flagged as a possible gap in the prior review.

2. GetAxesFromUnsqueezeNode — gating choice

reshape_fusion.cc still uses InputDefs().size() > 1. Fine — that was author's call and both forms are defensible. Not blocking.

3. Shape (!end_attr) guard — fixed and tightened

The original guard at reshape_fusion.cc:175 was:

if (!((!start_attr || static_cast<int>(start_attr->i()) == 0) && (!end_attr))) { ... }

which rejected any model that explicitly set end = INT64_MAX (the documented default) — a false negative for legitimate full-shape calls. The new form accepts both forms equivalently:

if (!((!start_attr || start_attr->i() == 0) &&
      (!end_attr   || end_attr->i()   == std::numeric_limits<int64_t>::max()))) { ... }

Correct. The commit message ("Fix Shape end-attr guard to accept INT64_MAX as full-shape default") is accurate.
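
The accepted and rejected cases of the new guard can be checked with a standalone sketch. `ShapeAttrs` is a hypothetical stand-in for the node's optional attributes, and this `IsFullShape` is an illustration of the predicate, not the code in reshape_fusion.cc.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical view of a Shape node's optional start/end attributes.
struct ShapeAttrs {
  std::optional<int64_t> start;
  std::optional<int64_t> end;
};

// True when the node returns the full shape: start is absent or 0, and end is
// absent or explicitly set to INT64_MAX (treated here as the default).
inline bool IsFullShape(const ShapeAttrs& attrs) {
  const bool start_is_default = !attrs.start.has_value() || *attrs.start == 0;
  const bool end_is_default =
      !attrs.end.has_value() || *attrs.end == std::numeric_limits<int64_t>::max();
  return start_is_default && end_is_default;
}
```

A node with no attributes and a node with `end = INT64_MAX` both pass; `start = 1` or `end = 2` are rejected as partial-shape queries.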


New material since prior review

A. Two more partial-Shape correctness guards

Same fix pattern (SinceVersion() >= 15 + start/end attribute check + INT64_MAX-default handling) added in two more places:

  • attention_fusion_helper.h::MatchGemmSubgraph (line ~101) — guards the GPT subgraph where Shape → Slice → Squeeze → Unsqueeze → Concat → Reshape → Gemm assumed full-shape semantics.
  • embed_layer_norm_fusion.cc::MatchInputToConcatSubgraph (lines ~143, ~188) — guards the two Shape → Gather paths feeding embed-layer-norm position-id construction.

Each carries a clear "why this guard exists" comment. These were latent correctness bugs on any opset-15+ model that used start/end to emit a partial shape. Quietly worth more than the headline opset bump.

B. Forward-proof test infrastructure — GetCurrentOnnxOpset()

static int GetCurrentOnnxOpset() {
  const auto& map = ONNX_NAMESPACE::OpSchemaRegistry::DomainToVersionRange::Instance().Map();
  auto it = map.find(ONNX_NAMESPACE::ONNX_DOMAIN);
  EXPECT_TRUE(it != map.end()) << "ONNX domain not found in OpSchemaRegistry";
  return it != map.end() ? it->second.second : 0;
}

Used by:

  • LayerNormFusionCurrentOpsetTest, SkipLayerNormFusionCurrentOpsetTest
  • EmbedLayerNormFusionFormat{1,2,3}CurrentOpset
  • GroupQueryAttentionPreNormFusionFusesQwenPatternCurrentOpset

This is the right approach. The next time someone bumps the bundled ONNX submodule, these tests will start failing on opset drift and tell you exactly which optimizer's version lists need updating. The failure messages even point at the offending file ("Update version lists in onnxruntime/core/optimizer/embed_layer_norm_fusion.cc"). Strictly better than hard-coded opset numbers.

C. LoadModelAtCurrentOpset model rewriting

Loads an existing .onnx test model and rewrites the proto in place:

  1. Bumps the ai.onnx opset_import to current opset.
  2. Converts attribute-form axes to input-form for Squeeze/Unsqueeze (opset 13+) and ReduceSum/ReduceMean/ReduceMax/ReduceMin/ReduceProd (opset 18+).

This works for the three embed_layer_norm_format*.onnx fixtures because they only use those operators with attribute-form axes. A few notes for future readers:

  • The conversion list is exhaustive for the fixtures it's used against. If anyone reuses LoadModelAtCurrentOpset for a model that also has, say, attribute-form axes on a Slice (was input from opset 10) or any other op that migrated attributes to inputs, it will silently fail to convert and produce an invalid graph at the bumped opset. A one-line comment on the helper noting "currently handles only Squeeze/Unsqueeze/Reduce*; extend as needed" would help.
  • Bumping opset_import does not re-validate the existing nodes against the new schemas. For operators that were bumped without input changes (e.g., Add at v14), the node-level SinceVersion() reported by the loader will adjust to the imported opset. CI confirms the rewritten models load and the tests pass (86/86 checks OK on 992c85e), so this is empirically fine.

Not blocking — useful infrastructure, just worth a comment.
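
The attribute-to-input rewrite and its blind spot can be illustrated on a toy graph. Everything here (`ToyNode`, `ToyGraph`, `ConvertAxesAttributeToInput`, the `axes_const_` naming) is hypothetical and far simpler than rewriting a real ModelProto; it covers only the Squeeze/Unsqueeze half of the conversion and assumes each such node has exactly one input before conversion.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct ToyNode {
  std::string op_type;
  std::vector<std::string> inputs;
  std::map<std::string, std::vector<int64_t>> int_list_attrs;
};

struct ToyGraph {
  std::vector<ToyNode> nodes;
  std::map<std::string, std::vector<int64_t>> initializers;
};

// Moves the "axes" attribute of Squeeze/Unsqueeze nodes into a new constant
// input. Other operator types are left untouched. Returns the number of nodes converted.
inline int ConvertAxesAttributeToInput(ToyGraph& graph) {
  int converted = 0;
  for (std::size_t i = 0; i < graph.nodes.size(); ++i) {
    ToyNode& node = graph.nodes[i];
    if (node.op_type != "Squeeze" && node.op_type != "Unsqueeze") {
      continue;  // not in the conversion list
    }
    auto it = node.int_list_attrs.find("axes");
    if (it == node.int_list_attrs.end()) {
      continue;  // already input-form or no axes given
    }
    const std::string name = "axes_const_" + std::to_string(i);
    graph.initializers[name] = it->second;  // store the axes as a constant tensor
    node.inputs.push_back(name);            // becomes input 1 under the one-input assumption
    node.int_list_attrs.erase(it);
    ++converted;
  }
  return converted;
}
```

A node of any other type that still carries an attribute-form `axes` passes through unchanged, which is the silent-failure case noted above.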

D. Title vs scope

The PR title says "opset 25" but the version lists include {24, 25} and the third commit message references "opset 26". The actual entries are fine (defensive future-proofing past the last-bumped versions). Worth updating the PR title to match if 26 is the real target.


Minor observations on the new code

  • The SinceVersion() >= 15 + start/end attribute check is now duplicated five times (reshape_fusion.cc, attention_fusion_helper.h × 1, embed_layer_norm_fusion.cc × 2). A small helper —

    // Returns true if shape_node represents the full tensor shape (no start/end slicing).
    static bool IsFullShape(const Node& shape_node);

    — would dedupe nicely. Out of scope here; file as a follow-up alongside the attention_fusion.cc named-constants refactor.

  • LayerNormFusionCurrentOpsetTest builds the LayerNorm pattern with Pow. Pow was last bumped at opset 15 (pow.SinceVersion() == 15 for any model at opset ≥ 15). Fine.

  • LayerNormFusionCurrentOpsetTest runs at TransformerLevel::Level1, but the standard LayerNormFusion registration is at Level2 with fuse_in_level_1=true in some pipelines. The test explicitly constructs LayerNormFusion(no_limit_empty_ep_list, TransformerLevel::Level1) to force the Level1 path, which is the right call for a unit test. Just noting it's intentional, not a bug.

  • The post-graph checker pattern in the new tests:

    if (op_to_count["LayerNormalization"] == 1) { ...checks...; return Status::OK(); }
    return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "...failed at opset ", current_opset, "...");

    produces a much more useful failure message than TEST_RETURN_IF_NOT alone. Nice pattern — worth adopting for other current-opset tests in the codebase.


Bottom line

The substantive concern from the prior review is fixed, the partial-Shape correctness fix has been generalized to two more sites, and the new GetCurrentOnnxOpset()-based tests are a genuinely useful piece of regression infrastructure that will catch the next opset drift automatically. Approve.

The follow-up cleanups (dedupe the five copies of the partial-Shape guard into an IsFullShape helper; pull attention_fusion.cc opset lists into named constants; broaden LoadModelAtCurrentOpset's attribute-to-input table if it gets reused) are all worth filing as separate issues but should not hold this PR.

hariharans29 previously approved these changes Jun 11, 2026
@yuslepukhin yuslepukhin changed the title Update optimizer opset version checks for latest ONNX opset 25 Update optimizer opset version checks for latest ONNX opset 26 Jun 11, 2026
hariharans29 previously approved these changes Jun 11, 2026
@yuslepukhin yuslepukhin merged commit 0b278bb into main Jun 12, 2026
86 checks passed
@yuslepukhin yuslepukhin deleted the yuslepukhin/ruleset_support branch June 12, 2026 17:35
titaiwangms added a commit to titaiwangms/onnxruntime that referenced this pull request Jun 15, 2026
Add WHY comments + tracking issue refs (microsoft#28966, and microsoft#28969 on the WebGPU
attention-fusion path) to the ModelOptions{allow_released_opsets_only=false}
call sites in the *CurrentOpset fusion tests, so a future reader knows they can
be removed once ONNX opset 27 ships. No test logic or ModelOptions args change.

Extend the onnx-opset-bump-checklist skill with three hard-won gotchas from the
1.22.0 integration: (m) the vcpkg MS-internal asset mirror must be Terrapin-seeded
with the new tag tarball or every --use_vcpkg leg 404s; (n) a FINAL onnx release
can still ship a map-max opset > last released opset (1.22.0: 27 > 26), leaving it
under-development; (o) prefer per-model ModelOptions{allow_released_opsets_only=false}
over per-leg CI env flips or GTEST_SKIP.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
titaiwangms added a commit that referenced this pull request Jun 16, 2026
### Integrate ONNX 1.22.0rc1 (opset 27)

Resolves #28752.

Pin: `onnx/onnx@bc3be77bec2f628788796dff60819186bacf49df`
(VERSION_NUMBER `1.22.0rc1`).
ONNX **1.21.0 → 1.22.0rc1**. Max ai.onnx opset **26 → 27**. IR version
**unchanged (13 / `0x0D`)**.

This is the **RC validation phase** of an incremental integration (same
strategy as the ONNX 1.21 bump, #27601). The formal `v1.22.0` GitHub
release is still a **draft** (no git tag yet), so re-pinning to the
released tag is deferred to **Phase 2** (see Follow-ups). Landing the RC
now validates ONNX 1.22 against ORT before ONNX publishes the formal
release.

---

### Update — ONNX 1.22.0 **FINAL** re-pin + rebase onto `upstream/main`
+ closes #28969

ONNX published the formal **`v1.22.0`** GitHub release, so this PR is
re-pinned **rc2 → FINAL** (`onnx/onnx@v1.22.0`) — the Phase-2 step
deferred in the rc1 description below. The branch was also **rebased
onto `upstream/main`** to pick up the intervening optimizer/opset-26
work. The released tag tarball is a different asset hash than the RCs,
so the vcpkg MS-internal asset mirror was re-seeded for the final tag
(otherwise `--use_vcpkg` legs 404).

**Also closes #28969** (WebGPU binary-elementwise broadcast `SIZE_MAX`
underflow). ONNX 1.22's expanded-Attention reference tests exposed a
latent WebGPU bug where a broadcast shape computed `dim - 1` on a
zero/unit dimension and underflowed to `SIZE_MAX`; the fix is included
here and the previously-skipped reference tests are re-enabled.
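
For readers unfamiliar with this class of bug, the underflow mechanism itself is plain unsigned arithmetic. The sketch below is generic C++, not the WebGPU code; `LastIndexUnchecked` and `LastIndexGuarded` are hypothetical names, and the guarded variant is one possible mitigation, not a description of the actual fix.

```cpp
#include <cstddef>
#include <cstdint>

// Unsigned subtraction wraps around: for dim == 0 the result is SIZE_MAX.
inline std::size_t LastIndexUnchecked(std::size_t dim) {
  return dim - 1;
}

// Hypothetical guard: handle the zero case before subtracting.
inline std::size_t LastIndexGuarded(std::size_t dim) {
  return dim == 0 ? 0 : dim - 1;
}
```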

**Opset-27 `*CurrentOpset` test handling.** ONNX 1.22.0 FINAL ships
`DomainToVersionRange` **map-max 27** while the last *released* opset is
**26**, so **opset 27 stays under development** for the whole 1.22
cycle. Strict legs (the default, or `ALLOW_RELEASED_ONNX_OPSET_ONLY=1`)
therefore throw *"Opset 27 under development"* at model load on every
`*CurrentOpset` fusion test that builds at the max opset. These tests
now load with per-model `ModelOptions{/*allow_released_opsets_only*/
false, /*strict_shape_type_inference*/ false}`, extending the existing
`38f17243b` / GatherToSlice precedent to the rest of the `*CurrentOpset`
suite. This is **leg-agnostic** (exercises opset 27 on every CI leg, not
just the relaxed ones) and **preserves opset coverage** (vs.
`GTEST_SKIP`). Each call site is annotated with a one-line WHY +
tracking issue (#28966) so the relaxation can be removed once opset 27
is released.

`Resolves #28752` (unchanged). Closes #28969.


### Update — ONNX 1.22.0rc2 re-pin + ConvTranspose conforms to ONNX
`output_shape` spec

Since the original rc1 description below, this PR was re-pinned **rc1 →
rc2** (`onnx/onnx@b124e0188a`, `VERSION_NUMBER 1.22.0rc2`) to pick up
the upstream Xcode/iOS CMake fix (onnx#8056). rc2 also carries
onnx#8051, which tightened `convTransposeShapeInference` to reject an
`output_shape`/`output_padding` whose size does not match the number of
spatial dimensions (per the ONNX spec clarification onnx#5400). **ONNX
Runtime now conforms to that spec** instead of patching ONNX to preserve
a non-standard form.

**⚠️ Breaking change — ConvTranspose `output_shape` now follows the ONNX
spec (spatial dimensions only).** ORT previously also accepted a
non-standard `rank + 2` form that included batch and channel, i.e. `(N,
C, H, W)`. As of ONNX 1.22, a `rank + 2` `output_shape` on a
ConvTranspose whose input has a **statically-known rank** is rejected at
`Graph::Resolve` with *"Attribute output_shape has incorrect size"*.
**Migration:** specify `output_shape` with spatial dimensions only —
e.g. `{1, 1, 1, 14}` → `{1, 14}` (batch and channel are always inferred
from the input and weight, so results are identical; the kernel ignores
`N, C`). Models whose ConvTranspose input has a **dynamic/unknown rank
are unaffected** — ONNX skips the size check and ORT computes the same
result (covered by the new
`ConvTranspose_RankPlus2_OutputShape_DynamicRankInput_Runtime` test).

**Patch inventory — supersedes "2 files, 3 hunks" below.**
`cmake/patches/onnx/onnx.patch` (and its byte-identical `binskim.patch`
mirror) carries **only** the `ONNX_MINIMAL_BUILD` option hunk and the
GroupNormalization-18 `.Deprecate()` removal — **no ConvTranspose
hunks**. rc2's strict shape-inference check is kept as-is; ORT's own
test models were conformed to the spec. The upstream archive hash,
`deps.txt`, `portfile.cmake`, `vcpkg.json`, and the submodule pin are
unchanged.

**Additional rc2 test conform.** rc2 also tightened
`convPoolShapeInference` to reject `Conv` inputs with rank < 3 (*"Input
tensor must have at least 3 dimensions"*). The hand-authored model in
`onnxruntime/test/python/quantization/test_op_split.py` declared a
spec-invalid rank-2 `Conv` input/weight; it was conformed to a valid
NCHW shape (`[6, 3]` → `[1, 1, 6, 3]`, weight → `[2, 1, 1, 1]`), keeping
the quantized-Split graph and expected outputs identical. No ORT source
change.

> This note should also seed the GitHub Release notes for the ONNX 1.22
/ opset 27 milestone and the squash-commit message.


---

### What changed (29 files)

**Version plumbing**
- `cmake/deps.txt` — onnx archive URL → rc1 commit zip + SHA1
`421e5a9afb6c41a54696e424e5b9a3796aab6821`.
- `cmake/external/onnx` — submodule → `bc3be77b`.
- `cmake/vcpkg-ports/onnx/portfile.cmake` — `REF` commit form + tar.gz
SHA512 `e0c526f5…3ce467`.
- `cmake/vcpkg-ports/onnx/vcpkg.json` — `version-semver` `1.22.0`,
`port-version` 0.
- `cmake/patches/onnx/onnx.patch` +
`cmake/vcpkg-ports/onnx/binskim.patch` — **byte-identical** rebase onto
1.22 (2 files, 3 hunks): kept the `ONNX_MINIMAL_BUILD` option
(restructured for 1.22's new `onnx_core` OBJECT-lib /
`add_subdirectory(onnx)` layout) and the GroupNormalization-18
`.Deprecate()` removal; **dropped** the `Utils.cmake` protobuf-warnings
hunk (already merged upstream in 1.22).

**Opset-27 op enablement (Range)**
- `onnxruntime/core/providers/cpu/generator/range.cc` — split into
versioned `[11, 26]` + a new unversioned `27` registration. The opset-27
kernel natively supports the existing common numeric types
(float/double/int16/int32/int64). **fp16 Range is covered** via ONNX's
Range-27 **function body**, which ORT expands into primitive ops at
partition time. **bf16 Range is deferred to that same function
expansion**: there is no native bf16 kernel. Its reference node test
(`test_range_bfloat16_type_positive_delta`, base + `_expanded`) is not
exercised by the Python/numpy ONNX backend series, because that harness
cannot materialize bf16 (`Numpy_type 256`). A native fp16/bf16 kernel +
`stash_type` handling is a follow-up (efficiency, not correctness).
- `onnxruntime/core/providers/cpu/cpu_execution_provider.cc` — versioned
the Range forward-declare + `BuildKernelCreateInfo` entries and added
the opset-27 registration.
- **CUDA Range** — same versioned `[11, 26]` + opset-27 split as CPU
(`onnxruntime/core/providers/cuda/generator/range.cc` +
`cuda_execution_provider.cc`); GPU-verified locally: `onnx_test_runner
-e cuda` 8/8 opset-27 Range node tests pass, native Range-27 placed on
CUDAExecutionProvider (fp16/bf16 via function expansion).
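The versioned `[11, 26]` + `27` split can be pictured as two non-overlapping version intervals. The sketch below is a deliberately simplified model using plain interval containment; the `Registration` class and `find_registration` function are hypothetical, and ORT's real kernel-matching logic has additional rules that are not modelled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Registration:
    """Simplified stand-in for a kernel registration (hypothetical)."""
    op_type: str
    start: int
    end: Optional[int]  # None means open-ended, i.e. "27+"

    def covers(self, op_type: str, since_version: int) -> bool:
        if op_type != self.op_type or since_version < self.start:
            return False
        return self.end is None or since_version <= self.end


# The two Range registrations described above.
RANGE_REGISTRATIONS: List[Registration] = [
    Registration("Range", 11, 26),
    Registration("Range", 27, None),
]


def find_registration(op_type: str, since_version: int,
                      registrations: List[Registration] = RANGE_REGISTRATIONS
                      ) -> Optional[Registration]:
    """Return the first registration whose interval contains since_version."""
    for reg in registrations:
        if reg.covers(op_type, since_version):
            return reg
    return None
```

In this model a Range node with since-version 11 resolves to the `[11, 26]` entry and one with since-version 27 resolves to the open-ended entry, so the two never compete for the same node.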

**Optimizer / EP opset ceilings**
- `…/transpose_optimization/optimizer_api.h` — `kMaxSupportedOpset` **26
→ 27**.
- `coreml`/`nnapi`/`vsinpu`/`webnn` `base_op_builder.h` —
`GetMaxSupportedOpSet()` **25 → 27** (upper guard only; per-op support
checks still gate — these EPs gain no new kernels here).
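A minimal sketch of why raising the ceiling is safe: the ceiling is only the first of two gates, and the per-op check still has the final say. The function and the example per-op predicate are hypothetical and only model the two-stage decision described above.

```python
from typing import Callable


def ep_supports_node(since_version: int,
                     max_supported_opset: int,
                     per_op_check: Callable[[int], bool]) -> bool:
    """Two-stage gate: opset ceiling first, then the per-op support check."""
    if since_version > max_supported_opset:
        return False  # rejected by the ceiling; the per-op check never runs
    return per_op_check(since_version)


def example_per_op_check(since_version: int) -> bool:
    """Hypothetical op builder that understands since-versions up to 26 only."""
    return since_version <= 26
```

With a ceiling of 25, a since-version-26 node is rejected before `example_per_op_check` is consulted. With a ceiling of 27 it is accepted, while a since-version-27 node is still rejected by the per-op check.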

**Fusion updates**
- `onnxruntime/core/optimizer/gather_fusion.cc` — GatherToSlice Range
version list `{1,11}` → `{1,11,27}`.
- `onnxruntime/core/optimizer/embed_layer_norm_fusion.cc` — add `27` to
the two Range path-matchers (`parent_path_3/4`) so embedding fusion
still matches opset-27 models.
- `onnxruntime/test/optimizer/graph_transform_test.cc` — new opset-27
GatherToSliceFusion test.
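The reason the version lists need `27` can be sketched as follows, assuming the fusion matches a node's since-version by exact membership in the list. Both helpers are hypothetical, and the schema version list `[11, 27]` for Range is taken from the registrations described in this PR.

```python
from typing import Iterable, Sequence


def resolve_since_version(schema_versions: Sequence[int], model_opset: int) -> int:
    """Pick the newest schema version that does not exceed the model's opset."""
    candidates = [v for v in schema_versions if v <= model_opset]
    if not candidates:
        raise ValueError(f"no schema version available at opset {model_opset}")
    return max(candidates)


def fusion_matches(since_version: int, accepted_versions: Iterable[int]) -> bool:
    """Exact-membership match, as assumed for the fusion version lists."""
    return since_version in set(accepted_versions)


RANGE_SCHEMA_VERSIONS = [11, 27]  # assumption based on this PR's description
```

An opset-26 model resolves Range to since-version 11, which the old list `{1, 11}` accepts. An opset-27 model resolves to 27, which only the updated list `{1, 11, 27}` accepts, so without the update the fusion would silently stop matching.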

**Requirements (7 bumped)**
- All 7 CI `requirements.txt` → `onnx==1.22.0rc1` (rc1 wheel is on
PyPI). The 3 transformers pins remain frozen at `1.18.0` (unrelated to
this bump; intentionally untouched).

**Generated docs / test data**
- `js/web/docs/webgl-operators.md` — regenerated.
- `docs/OperatorKernels.md` — **surgical** edit: CPU EP **and** CUDA EP
Range rows (`27+` + `[11, 26]` continuation each); see caveats.
- `onnxruntime/test/testdata/onnx_backend_test_series_filters.jsonc` —
**comment-only**: documents why no opset-27 CPU exclusions are needed
(all opset-27 node tests pass via function expansion).

**Docs**
- `.agents/skills/onnx-opset-bump-checklist/SKILL.md` — new reusable
checklist skill distilled from this integration. Now also documents the
"bump **all** execution providers together" tradition (CPU + CUDA +
JS/DML assessment in one pass) so future opset bumps don't ship a
partial EP set.

---

### Validation (CPU EP + CUDA EP, Linux x64)

- Full build ✅
- `--minimal_build extended` build ✅ (validates the rebased
`ONNX_MINIMAL_BUILD` patch hunk independently of the vcpkg mirror path)
- `onnxruntime_test_all` ✅ — **1595 passed / 0 failed**
- `onnx_test_runner -e cpu` on the ONNX 1.22 opset-27 node tests ✅ —
**62/62 pass** via ONNX function-body expansion (run with
`ALLOW_RELEASED_ONNX_OPSET_ONLY=0`), including CausalConvWithState,
LinearAttention, and fp16/bf16 Range — despite no native kernels for
them.
- **CUDA EP (H100):** built `--use_cuda` clean in both **Debug** and
**RelWithDebInfo** ✅; `onnx_test_runner -e cuda` on the opset-27 Range
node tests ✅ — **8/8 pass**, with native Range-27 placed on
CUDAExecutionProvider (no CPU fallback) and fp16/bf16 covered via
function-body expansion.

---

### Standing caveats (please read before reviewing)

1. **CUDA EP now locally verified for Range; other GPU EPs/ops still
CI-only.** The CUDA EP was built and the opset-27 **Range** node tests
run locally on an H100 (8/8 pass). DML and the remaining GPU EPs/ops
were **not** exercised here. Function-body expansion is EP-agnostic, so
other opset-27 models are expected to run on those EPs too, but broader
GPU coverage remains a CI/follow-up item.
2. **`OperatorKernels.md` updated surgically** (CPU and CUDA Range rows only). A
CPU-only *full* regen would destructively wipe the CUDA/DML/other-EP
sections (the generator only emits rows for the EPs in the built
module). A correct multi-EP regen needs a build per EP and is a
follow-up.
3. **Opset 27 is "under development"** in ONNX's released-versions map.
ORT's load-time validation rejects opset-27 models unless
`ALLOW_RELEASED_ONNX_OPSET_ONLY=0` (ORT CI already sets this). The
opset-27 **schemas are always compiled in from the submodule**
regardless — this gate only affects model load-time acceptance, not
schema availability.
4. **EP `GetMaxSupportedOpSet` jumped 25 → 27** (skips 26). This is an
*upper* guard only; raising it merely lets opset-26/27 nodes reach the
per-op support checks that still gate correctness. No regression — it
also retroactively un-caps opset-26 for these EPs.
5. **iOS/macOS Xcode framework build is currently broken by an upstream
ONNX CMake regression** (the `onnx_core` OBJECT-library split in
onnx/onnx#7733 reintroduced the Xcode breakage originally fixed by
onnx/onnx#7515 for onnx/onnx#7514). This is **NOT** caused by this opset
bump. Tracked upstream at
onnx/onnx#8053. Non-Xcode
builds (Linux/Windows/Android/WASM) and all CPU/CUDA validation are
unaffected. This resolves at the **Phase 2** formal `v1.22.0` re-pin
once ONNX ships the fix.
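For the load-time gate in caveat 3, one way to reproduce the validation setup locally is to launch the test process with the variable set. The sketch below only prepares the environment; the helper name is hypothetical, and it assumes the variable is read from the process environment of whatever loads the model.

```python
import os
from typing import Dict, Mapping, Optional


def env_allowing_unreleased_opsets(base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the released-opset gate turned off.

    Hypothetical helper; the variable name and value come from the note above.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ALLOW_RELEASED_ONNX_OPSET_ONLY"] = "0"
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when invoking a runner such as `onnx_test_runner -e cpu` on a test directory of your choice. Working on a copy keeps the calling process's own environment untouched.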

---

### Follow-ups (explicitly NOT in this PR)

- **GPU/multi-EP coverage:** run opset-27 CUDA/DML node tests;
regenerate `OperatorKernels.md` across all EPs.
- **JS EP Range** `[11, 26]` + `27` split (currently registered
open-ended at `11`; mirror the CPU/CUDA versioned split).
- **DML Range opset-27 assessment** (DML uses its own `REG_INFO`
registration system — assess whether an opset-27 entry is needed).
- **WebGPU EP Range** opset-27 split — `range.cc` registers `Range`
`.SinceVersion(11)` open-ended, so it already claims opset-27 Range;
only the new bf16 type is unsupported and falls back via the `T`
type-constraint (function expansion). Mirror the CPU/CUDA versioned
`[11, 26]` + `27` split.
- **Native kernels:** implement CPU (and EP) `CausalConvWithState` and
`LinearAttention` kernels, and a native fp16/bf16 + `stash_type`
Range-27 kernel (replace today's function-expansion path with efficient
kernels).
- **Phase 2 — formal `v1.22.0` re-pin:** re-pin
`deps.txt`/submodule/portfile/requirements to the released tag once ONNX
publishes it (currently blocked on ONNX tagging the release); upload the
tag tarball to the vcpkg mirror. **This also restores the iOS/macOS
Xcode framework build** once the upstream onnx OBJECT-library Xcode
regression (caveat 5) is resolved and re-pinned.
- **Tooling:** fix the pre-existing crash in
`find_optimizer_opset_version_updates_required.py` (placeholder `ver`
parsed as int) so it can be relied on for future bumps.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>