Implement Name Based partitioning and update documents by yuslepukhin · Pull Request #28903 · microsoft/onnxruntime

yuslepukhin · 2026-06-08T21:49:02Z

This pull request introduces a new "name-based layer assignment" feature for ONNX Runtime, allowing device assignment of model nodes based on substring matching of node names, as an alternative to annotation-based matching. The implementation ensures that name-based and annotation-based assignment modes are mutually exclusive, and updates documentation, configuration keys, and core logic to support this new capability.

Key changes:

Name-based Layer Assignment Feature

Added support for a new session option, session.name_based_layer_assignment, which enables device assignment using substring matching against node names (rather than metadata annotations). The longest matching pattern wins, and all patterns are treated as substrings (the = exact-match qualifier is disallowed in this mode). [1] [2]
Implemented the SubstringMatcher class and integrated it into the LayeringIndex logic, enabling efficient substring-based rule matching for node assignment. The matcher sorts patterns by length and returns the first (longest) match. [1] [2]

Mutual Exclusivity and API Changes

Enforced mutual exclusivity between annotation-based (session.layer_assignment_settings) and name-based (session.name_based_layer_assignment) assignment options. Attempting to set both now results in an error, with clear messaging. [1] [2]
Updated the LayeringIndex::Create API and related logic to accept both configuration strings, select the active mode, and construct the appropriate matcher. [1] [2]

Documentation Updates

Expanded the documentation to describe the new name-based assignment feature, provide usage examples, highlight best practices for pattern writing, and explain the mutual exclusivity and lack of subgraph inheritance in name-based mode. [1] [2]

Core Logic and Maintenance

Refactored the LayeringIndex and related methods to support both matching modes, updating node assignment and update logic to branch appropriately based on the selected mode. [1] [2]
Added and documented the new configuration key kOrtSessionOptionsNameBasedLayerAssignment in the public API headers.

These changes provide a more flexible and user-friendly way to partition models across devices, especially for models with structured node names but lacking explicit annotations.

- Correct misleading claim about reusing trie-based LayeringRuleMatcher - Add SubstringMatcher class design (flat vector + std::string::find) - Document longest-match-first priority semantics - Update session option examples to use trailing-slash patterns - Add integration sketch with LayeringIndex::ProcessGraph

Add SubstringMatcher class that matches patterns anywhere in node names, unlike the existing trie-based LayeringRuleMatcher which only matches prefixes. - New session option: session.name_based_layer_assignment - SubstringMatcher with longest-match-first priority - Integrated into LayeringIndex::ProcessGraph as fallback after annotation matching - Annotation-based matching takes priority when both are configured - Updated PartitioningWithAnnotationsAndMemoryConstraints.md with usage docs - Tests: SubstringMatcher unit tests + LayeringIndex integration tests including annotation-priority-over-name-based test

Copilot

Pull request overview

This PR adds an alternative to annotation-based layered partitioning by allowing device assignment rules to match substrings of Node::Name(), enabling partitioning of models with structured node names without modifying model metadata. It updates session configuration parsing to support both annotation-based and name-based rules, and expands documentation accordingly.

Changes:

Add session.name_based_layer_assignment session option and integrate it into InferenceSession::TransformGraph.
Implement SubstringMatcher and refactor LayeringIndex creation/indexing to combine annotation-based (higher priority) and name-based (fallback) rules.
Add unit/integration tests plus documentation updates for the new behavior.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
onnxruntime/core/framework/layering_annotations.h	Adds `SubstringMatcher` and extends `LayeringIndex` to optionally use it.
onnxruntime/core/framework/layering_annotations.cc	Parses/merges name-based + annotation-based rules; performs substring matching fallback by node name.
onnxruntime/core/session/inference_session.cc	Wires new session option into graph transform/partitioning pipeline.
include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h	Adds the new config key constant and its API comment.
onnxruntime/test/framework/layering_annotations_test.cc	Adds SubstringMatcher tests and integration tests for name-based assignment + priority behavior.
onnxruntime/test/framework/session_state_test.cc	Updates `LayeringIndex::Create` callsite for new signature.
docs/annotated_partitioning/PartitioningWithAnnotationsAndMemoryConstraints.md	Documents `session.name_based_layer_assignment` and updates “combining features” guidance.
docs/annotated_partitioning/future_directions_constrained_env.md	Adds long-form design/future directions write-up (new).
docs/annotated_partitioning/cuda_kernel_workspace_inventory.md	Adds CUDA workspace inventory document (new).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

tianleiwu

Review: Name-based partitioning

The change is well-structured overall: SubstringMatcher cleanly encapsulates longest-pattern-wins substring matching, the new session.name_based_layer_assignment option is documented, and the annotation-vs-name priority is covered by focused unit tests.

One functional gap worth addressing before merge (see inline comment): the incremental LayeringIndex::Update() path does not apply substring_matcher_, so name-based assignment is inconsistent between the initial ProcessGraph pass and the per-partition Update() calls from graph_partitioner.cc.

I also agree with the existing automated comment on layering_annotations.h: building matcher_ from the merged rules_ lets a node's annotation string accidentally prefix/exact-match a name-based pattern. Scoping the annotation matcher to the annotation-only rule range (or discarding matcher_.Match() results in the name-based index range) would keep the two matching domains independent.

Verdict: COMMENT.

1. Scope annotation matcher to annotation-only rules (Comment 1): - LayeringRuleMatcher now accepts a rule_count parameter to limit which rules from the merged vector it indexes. Prevents a node's annotation string from accidentally matching a name-based pattern. 2. Reject '=' in name-based config (Comment 2): - Validate after parsing: if any name-based rule has prefix_match=false (meaning '=' was used), return an error with a clear message. - Updated doc comment from 'NOT supported' to 'rejected with an error'. 3. Update roadmap text (Comment 3): - Replace stale 'name: prefix qualifier' roadmap item with actual implementation description (separate session option + SubstringMatcher). 4. Add substring_matcher_ fallback to Update() (Comment 4): - The incremental Update() path (called after layout transforms) now tries substring_matcher_->Match(node->Name()) when annotation matching finds no match, consistent with ProcessGraph(). - Added UpdateAppliesSubstringMatcherToNewNodes test covering this path.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

…tcher Address additional review comment: the LayeringIndex::Create overload that accepts a SubstringMatcher now also takes annotation_rule_count so callers can properly scope the annotation matcher to annotation-only rules. Updated AnnotationTakesPriorityOverNameBased test to pass the correct count.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

…ove naming - LayeringRuleMatcher: rule_count → annotation_rule_count (std::optional<size_t>) nullopt = index all rules, 0 = index none (name-only configs) - SubstringMatcher: rule_index_offset → name_rules_start_index Clearer: it's the starting index of name-based rules in the merged vector - Updated all doc comments to match new semantics and naming - Fixed: name-only config (annotation_rule_count=0) no longer leaks name-based patterns into the annotation trie

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

…lly exclusive Annotation-based matching uses sparse metadata with subgraph inheritance. Name-based matching uses dense node names without inheritance. The two modes have incompatible semantics, so combining them is now rejected with INVALID_ARGUMENT. - Remove merged-rules/offset machinery (annotation_rule_count, name_rules_start_index) - ProcessGraph branches cleanly: name-based uses SubstringMatcher only, annotation-based uses LayeringRuleMatcher with inheritance - Update config key docs, design docs, and user-facing docs - Replace AnnotationTakesPriorityOverNameBased test with MutualExclusivityRejectsBothConfigs test

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

…based mode)

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

…buffers as proposed

tianleiwu

Re-reviewed the latest head. The previous round's concern about LayeringIndex::Update() not applying substring_matcher_ (causing inconsistent name-based assignment for nodes created during layout transforms) is now addressed and covered by UpdateAppliesSubstringMatcherToNewNodes — that thread is resolved.

The name-based partitioning implementation is clean and well-documented. A few minor items remain:

Test gap: No test exercises the new name-based = (exact-match) rejection error path. A small test asserting INVALID_ARGUMENT for a name-based config containing = (e.g. gpu(=Attention)) would lock in this validation, mirroring MutualExclusivityRejectsBothConfigs. (See inline comment.)
Nitpick: matcher_ (the LayeringRuleMatcher trie) is built unconditionally in the LayeringIndex constructor even in name-based mode, where it is never consulted. It could be made conditional if the construction cost ever matters for large rule sets.
Nitpick: The production factory inlines the LayeringIndex(...) constructor + ProcessGraph instead of delegating to the new static Create(graph, ep_map, rule_map, rules, substring_matcher) overload that currently exists only for tests. Reusing the overload would keep the two construction sites consistent.

None of these are blocking.

yuslepukhin added 6 commits June 8, 2026 12:32

History begins

2dff268

Add more design pieces

59f8a4e

Continue design iterations

320cf7a

Continue design iterations

d4fa801

yuslepukhin requested a review from Copilot June 8, 2026 21:49

Copilot started reviewing on behalf of yuslepukhin June 8, 2026 21:49 View session

yuslepukhin requested a review from tianleiwu June 8, 2026 21:49

Copilot AI reviewed Jun 8, 2026

View reviewed changes

Comment thread onnxruntime/core/framework/layering_annotations.h

Comment thread include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h Outdated

Comment thread docs/annotated_partitioning/future_directions_constrained_env.md Outdated

tianleiwu reviewed Jun 8, 2026

View reviewed changes

Comment thread onnxruntime/core/framework/layering_annotations.cc Outdated

yuslepukhin requested a review from Copilot June 8, 2026 23:57

Copilot started reviewing on behalf of yuslepukhin June 8, 2026 23:57 View session

Copilot AI reviewed Jun 9, 2026

View reviewed changes

Comment thread onnxruntime/core/framework/layering_annotations.h

Comment thread onnxruntime/core/framework/layering_annotations.cc

Comment thread onnxruntime/core/framework/layering_annotations.cc Outdated

Comment thread onnxruntime/core/framework/layering_annotations.cc Outdated

yuslepukhin requested a review from Copilot June 9, 2026 18:18

Copilot started reviewing on behalf of yuslepukhin June 9, 2026 18:19 View session

Copilot AI reviewed Jun 9, 2026

View reviewed changes

Comment thread onnxruntime/core/framework/layering_annotations.h Outdated

Comment thread onnxruntime/core/framework/layering_annotations.h Outdated

Comment thread onnxruntime/core/framework/layering_annotations.h Outdated

Comment thread onnxruntime/core/framework/layering_annotations.cc Outdated

yuslepukhin requested review from Copilot and tianleiwu June 9, 2026 21:09

Copilot started reviewing on behalf of yuslepukhin June 10, 2026 22:33 View session

Copilot AI reviewed Jun 10, 2026

View reviewed changes

Comment thread onnxruntime/core/framework/layering_annotations.cc Outdated

Comment thread onnxruntime/core/framework/layering_annotations.cc Outdated

yuslepukhin requested a review from Copilot June 11, 2026 00:52

Copilot started reviewing on behalf of yuslepukhin June 11, 2026 00:52 View session

Copilot AI reviewed Jun 11, 2026

View reviewed changes

Comment thread docs/annotated_partitioning/future_directions_constrained_env.md Outdated

Comment thread docs/annotated_partitioning/future_directions_constrained_env.md

fix: update design doc to reflect mutual exclusivity (no '=' in name-…

853f327

…based mode)

yuslepukhin requested a review from Copilot June 11, 2026 17:01

Copilot started reviewing on behalf of yuslepukhin June 11, 2026 17:01 View session

Copilot AI reviewed Jun 11, 2026

View reviewed changes

Comment thread docs/annotated_partitioning/future_directions_constrained_env.md Outdated

Comment thread docs/annotated_partitioning/future_directions_constrained_env.md Outdated

fix: clarify CUDA mempool config key and mark pre_allocate_execution_…

c6d60ed

…buffers as proposed

tianleiwu reviewed Jun 12, 2026

View reviewed changes

Comment thread onnxruntime/core/framework/layering_annotations.cc

tianleiwu approved these changes Jun 12, 2026

View reviewed changes

yuslepukhin merged commit 29616be into main Jun 12, 2026
89 of 90 checks passed

yuslepukhin deleted the yuslepukhin/future_directions branch June 12, 2026 20:20

Uh oh!

Conversation

yuslepukhin commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Name-based Layer Assignment Feature

Mutual Exclusivity and API Changes

Documentation Updates

Core Logic and Maintenance

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tianleiwu left a comment

Choose a reason for hiding this comment

Review: Name-based partitioning

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

tianleiwu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yuslepukhin commented Jun 8, 2026 •

edited

Loading