Implement Name Based partitioning and update documents#28903
Conversation
- Correct misleading claim about reusing trie-based LayeringRuleMatcher - Add SubstringMatcher class design (flat vector + std::string::find) - Document longest-match-first priority semantics - Update session option examples to use trailing-slash patterns - Add integration sketch with LayeringIndex::ProcessGraph
Add SubstringMatcher class that matches patterns anywhere in node names, unlike the existing trie-based LayeringRuleMatcher which only matches prefixes. - New session option: session.name_based_layer_assignment - SubstringMatcher with longest-match-first priority - Integrated into LayeringIndex::ProcessGraph as fallback after annotation matching - Annotation-based matching takes priority when both are configured - Updated PartitioningWithAnnotationsAndMemoryConstraints.md with usage docs - Tests: SubstringMatcher unit tests + LayeringIndex integration tests including annotation-priority-over-name-based test
There was a problem hiding this comment.
Pull request overview
This PR adds an alternative to annotation-based layered partitioning by allowing device assignment rules to match substrings of Node::Name(), enabling partitioning of models with structured node names without modifying model metadata. It updates session configuration parsing to support both annotation-based and name-based rules, and expands documentation accordingly.
Changes:
- Add
session.name_based_layer_assignmentsession option and integrate it intoInferenceSession::TransformGraph. - Implement
SubstringMatcherand refactorLayeringIndexcreation/indexing to combine annotation-based (higher priority) and name-based (fallback) rules. - Add unit/integration tests plus documentation updates for the new behavior.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| onnxruntime/core/framework/layering_annotations.h | Adds SubstringMatcher and extends LayeringIndex to optionally use it. |
| onnxruntime/core/framework/layering_annotations.cc | Parses/merges name-based + annotation-based rules; performs substring matching fallback by node name. |
| onnxruntime/core/session/inference_session.cc | Wires new session option into graph transform/partitioning pipeline. |
| include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h | Adds the new config key constant and its API comment. |
| onnxruntime/test/framework/layering_annotations_test.cc | Adds SubstringMatcher tests and integration tests for name-based assignment + priority behavior. |
| onnxruntime/test/framework/session_state_test.cc | Updates LayeringIndex::Create callsite for new signature. |
| docs/annotated_partitioning/PartitioningWithAnnotationsAndMemoryConstraints.md | Documents session.name_based_layer_assignment and updates “combining features” guidance. |
| docs/annotated_partitioning/future_directions_constrained_env.md | Adds long-form design/future directions write-up (new). |
| docs/annotated_partitioning/cuda_kernel_workspace_inventory.md | Adds CUDA workspace inventory document (new). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tianleiwu
left a comment
There was a problem hiding this comment.
Review: Name-based partitioning
The change is well-structured overall: SubstringMatcher cleanly encapsulates longest-pattern-wins substring matching, the new session.name_based_layer_assignment option is documented, and the annotation-vs-name priority is covered by focused unit tests.
One functional gap worth addressing before merge (see inline comment): the incremental LayeringIndex::Update() path does not apply substring_matcher_, so name-based assignment is inconsistent between the initial ProcessGraph pass and the per-partition Update() calls from graph_partitioner.cc.
I also agree with the existing automated comment on layering_annotations.h: building matcher_ from the merged rules_ lets a node's annotation string accidentally prefix/exact-match a name-based pattern. Scoping the annotation matcher to the annotation-only rule range (or discarding matcher_.Match() results in the name-based index range) would keep the two matching domains independent.
Verdict: COMMENT.
1. Scope annotation matcher to annotation-only rules (Comment 1):
- LayeringRuleMatcher now accepts a rule_count parameter to limit
which rules from the merged vector it indexes. Prevents a node's
annotation string from accidentally matching a name-based pattern.
2. Reject '=' in name-based config (Comment 2):
- Validate after parsing: if any name-based rule has prefix_match=false
(meaning '=' was used), return an error with a clear message.
- Updated doc comment from 'NOT supported' to 'rejected with an error'.
3. Update roadmap text (Comment 3):
- Replace stale 'name: prefix qualifier' roadmap item with actual
implementation description (separate session option + SubstringMatcher).
4. Add substring_matcher_ fallback to Update() (Comment 4):
- The incremental Update() path (called after layout transforms) now
tries substring_matcher_->Match(node->Name()) when annotation matching
finds no match, consistent with ProcessGraph().
- Added UpdateAppliesSubstringMatcherToNewNodes test covering this path.
…tcher Address additional review comment: the LayeringIndex::Create overload that accepts a SubstringMatcher now also takes annotation_rule_count so callers can properly scope the annotation matcher to annotation-only rules. Updated AnnotationTakesPriorityOverNameBased test to pass the correct count.
…ove naming - LayeringRuleMatcher: rule_count → annotation_rule_count (std::optional<size_t>) nullopt = index all rules, 0 = index none (name-only configs) - SubstringMatcher: rule_index_offset → name_rules_start_index Clearer: it's the starting index of name-based rules in the merged vector - Updated all doc comments to match new semantics and naming - Fixed: name-only config (annotation_rule_count=0) no longer leaks name-based patterns into the annotation trie
…lly exclusive Annotation-based matching uses sparse metadata with subgraph inheritance. Name-based matching uses dense node names without inheritance. The two modes have incompatible semantics, so combining them is now rejected with INVALID_ARGUMENT. - Remove merged-rules/offset machinery (annotation_rule_count, name_rules_start_index) - ProcessGraph branches cleanly: name-based uses SubstringMatcher only, annotation-based uses LayeringRuleMatcher with inheritance - Update config key docs, design docs, and user-facing docs - Replace AnnotationTakesPriorityOverNameBased test with MutualExclusivityRejectsBothConfigs test
…buffers as proposed
tianleiwu
left a comment
There was a problem hiding this comment.
Re-reviewed the latest head. The previous round's concern about LayeringIndex::Update() not applying substring_matcher_ (causing inconsistent name-based assignment for nodes created during layout transforms) is now addressed and covered by UpdateAppliesSubstringMatcherToNewNodes — that thread is resolved.
The name-based partitioning implementation is clean and well-documented. A few minor items remain:
- Test gap: No test exercises the new name-based
=(exact-match) rejection error path. A small test assertingINVALID_ARGUMENTfor a name-based config containing=(e.g.gpu(=Attention)) would lock in this validation, mirroringMutualExclusivityRejectsBothConfigs. (See inline comment.) - Nitpick:
matcher_(theLayeringRuleMatchertrie) is built unconditionally in theLayeringIndexconstructor even in name-based mode, where it is never consulted. It could be made conditional if the construction cost ever matters for large rule sets. - Nitpick: The production factory inlines the
LayeringIndex(...)constructor +ProcessGraphinstead of delegating to the new staticCreate(graph, ep_map, rule_map, rules, substring_matcher)overload that currently exists only for tests. Reusing the overload would keep the two construction sites consistent.
None of these are blocking.
This pull request introduces a new "name-based layer assignment" feature for ONNX Runtime, allowing device assignment of model nodes based on substring matching of node names, as an alternative to annotation-based matching. The implementation ensures that name-based and annotation-based assignment modes are mutually exclusive, and updates documentation, configuration keys, and core logic to support this new capability.
Key changes:
Name-based Layer Assignment Feature
Added support for a new session option,
session.name_based_layer_assignment, which enables device assignment using substring matching against node names (rather than metadata annotations). The longest matching pattern wins, and all patterns are treated as substrings (the=exact-match qualifier is disallowed in this mode). [1] [2]Implemented the
SubstringMatcherclass and integrated it into theLayeringIndexlogic, enabling efficient substring-based rule matching for node assignment. The matcher sorts patterns by length and returns the first (longest) match. [1] [2]Mutual Exclusivity and API Changes
Enforced mutual exclusivity between annotation-based (
session.layer_assignment_settings) and name-based (session.name_based_layer_assignment) assignment options. Attempting to set both now results in an error, with clear messaging. [1] [2]Updated the
LayeringIndex::CreateAPI and related logic to accept both configuration strings, select the active mode, and construct the appropriate matcher. [1] [2]Documentation Updates
Core Logic and Maintenance
Refactored the
LayeringIndexand related methods to support both matching modes, updating node assignment and update logic to branch appropriately based on the selected mode. [1] [2]Added and documented the new configuration key
kOrtSessionOptionsNameBasedLayerAssignmentin the public API headers.These changes provide a more flexible and user-friendly way to partition models across devices, especially for models with structured node names but lacking explicit annotations.