Commit fbba40a

Model Package Support (#27786)
### Description

To support the model package design, one of the goals for ORT is to automatically select the most suitable compiled EPContext binary from a collection of precompiled variants based on the EP, provider options, metadata, and available devices.

This PR is for ORT to support first phase model package. There could be other follow-up PRs in the future.

A model package is a collection of models, binaries, and metadata files organized in a hierarchically structured directory. The directory structure is not yet finalized, so the following is just a simple example of a model package directory:

````
<model>.ortpackage/
├── manifest.json
├── pipeline.json
├── configs/
|   ├── genai_config.json
|   └── chat_template.jinja
└── models/
    └── model_name/
        ├── metadata.json
        |   └── Contains general information on the component model,
        |       and specific information about each model variant
        |       such as data types, quantization algo, EP, etc. that
        |       is updated on add/remove of model variant
        └── shared_weights/ (shared weights from all variants)
            └── <checksum of weights file A>/
                └── model.data
            └── <checksum of weights file B>/
                └── model.data
            └── ...
        └── base model/
            ├── model.onnx
        └── variant A /
            ├── optimized model.onnx (contains EPContext nodes)
            └── [Compilation artifacts]
        └── variant B /
            ├── optimized model.onnx (contains EPContext nodes)
            └── [Compilation artifacts]
````

#### Spec and Format:

See [here](https://github.com/microsoft/onnxruntime/blob/07e55627e75da24099c582331a0f786090e6382a/onnxruntime/core/session/model_package/README.md)

#### Definitions:

- Model Package
  - A model package defines the overall logical ‘model’
  - A model package contains one or more ‘component models’
- Component Model
  - A component model comprises one or more ‘model variants’
- Model Variant
  - A ‘model variant’ is a single ONNX or ORT format model

#### manifest.json and metadata.json

A manifest.json may look like:

````
{
  "model_name": <logical_model_name>,
  "component_models": [
    <component_model_name_1>,
    <component_model_name_2>
  ]
}
````

A metadata.json for a component model may look like:

````
{
  "component_model_name": <component_model_name_1>,
  "model_variants": {
    <variant_name_1>: {
      "file": <ep_context_model_1 onnx file>,
      "constraints": {
        "ep": <ep_name>,
        "device": <device_type>,
        "architecture": <hardware_architecture>
      }
    },
    <variant_name_2>: {
      "file": <ep_context_model_2 onnx file>,
      "constraints": {
        "ep": <ep_name>,
        "device": <device_type>,
        "architecture": <hardware_architecture>
      }
    }
  }
}
````

#### Model Selection

The selection logic is implemented in `MatchesVariant()`, which evaluates the following constraints:

(Note: A constraint refers to a value under the "constraints" field in either manifest.json or metadata.json.)

- Check ep constraint
- Check device constraint
  - For some provider-bridge EPs, they may not implement `OrtEpFactory::GetSupportedDevices`, therefore ORT won't have the supported device information for those EPs. In that case, ORT will skip the device constraint validation for those EPs.
  - If provider option contains key related to device type, then the value must match the device constraint if any.
- Check ep_compatibility_info constraint
  - ORT does not directly evaluate the architecture constraint. Instead, it relies on the ep_compatibility_info constraint, which may encode architecture information if needed.
  - The ep_compatibility_info value is expected to match the EP compatibility string stored in the EPContext model metadata. (See OrtEp::GetCompiledModelCompatibilityInfo() for how this string is generated.)
  - The EP implementation of EpFactory::ValidateCompiledModelCompatibilityInfo() is responsible for validating the compatibility string against the target device (i.e. OrtHardwareDevice) and returning the compatibility result.

#### Note

Check the unit test [here](https://github.com/microsoft/onnxruntime/pull/27786/changes#diff-bfa4122a85543ae2d80bf4cf6d9f85248e51c2276a5956af32f9bd8c8983d23a) to better understand how to use model package.

#### Code Change

This pull request introduces significant enhancements to the execution provider (EP) selection and management infrastructure in ONNX Runtime. The main focus is on supporting more sophisticated device selection and manifest-based model packaging, as well as refactoring provider selection logic for modularity and future extensibility.

Key changes include:

- Introduction of model package context and manifest parsing to support selecting model components based on device and EP constraints.
- Refactoring of the execution provider interface and related classes to support multiple devices per provider.
- Modularization of EP/device selection, creation, and registration logic in the provider policy context.

The most important changes are:

**Model Package Context and Manifest Support**

- Added new files `model_package_context.h` and `model_package_context.cc` to implement manifest parsing, device/EP constraint matching, and component selection logic for model packages. This enables ONNX Runtime to select the most appropriate model variant based on available hardware and EP configuration.

**Execution Provider Interface Enhancements**

- Updated the `IExecutionProvider` class to support construction with a list of `OrtEpDevice` pointers, and added a `GetEpDevices()` method to retrieve the supported devices. This allows plugin and bridge EPs to expose multiple devices.
- Updated plugin EP construction to pass the list of supported devices to the base class.

**Provider Policy Context Refactoring**

- Refactored provider policy context logic to modularize device ordering, device selection, telemetry logging, EP creation, and registration. This includes splitting the monolithic `SelectEpsForSession` into smaller methods: `OrderDevices`, `SelectEpDevices`, `LogTelemetry`, `CreateExecutionProviders`, `RegisterExecutionProviders`, and a new flow for model package-based EP selection.

These changes collectively lay the groundwork for more flexible, robust, and extensible device and EP selection in ONNX Runtime, especially in scenarios involving packaged models with multiple variants and complex hardware environments.
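The EP and device checks listed under Model Selection can be pictured with a small Python model. This is only a sketch of the described rules, not the C++ `MatchesVariant()` implementation: the `EpDeviceInfo` class, the `device_type` option key, and the case-insensitive comparison are assumptions made for illustration, and the `ep_compatibility_info` check is left out because it is delegated to the EP.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EpDeviceInfo:
    """Hypothetical stand-in for the EP/device facts known at selection time."""
    ep_name: str
    # None models an EP for which no supported-device information is available.
    device_type: Optional[str] = None
    provider_options: dict = field(default_factory=dict)


def matches_variant(constraints: dict, ep: EpDeviceInfo,
                    device_option_key: str = "device_type") -> bool:
    """Return True if a variant's "constraints" object is satisfied by `ep`."""
    # 1. EP constraint: a variant compiled for another EP never matches.
    wanted_ep = constraints.get("ep")
    if wanted_ep is not None and wanted_ep != ep.ep_name:
        return False

    wanted_device = constraints.get("device")
    if wanted_device is not None:
        # 2. Device constraint: skipped when the EP exposes no device info.
        if ep.device_type is not None and ep.device_type.lower() != wanted_device.lower():
            return False
        # 3. A device-related provider option (assumed key), if set, must agree too.
        option_value = ep.provider_options.get(device_option_key)
        if option_value is not None and option_value.lower() != wanted_device.lower():
            return False

    # ep_compatibility_info is validated by the EP itself and is not modeled here.
    return True
```

For example, a variant constrained to `"device": "npu"` is rejected for an EP reporting a CPU device, but is still accepted for an EP that reports no device information at all, since the device check is skipped in that case.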
1 parent 2a54ef9 commit fbba40a

18 files changed

Lines changed: 2592 additions & 141 deletions

cmake/onnxruntime_session.cmake

Lines changed: 3 additions & 1 deletion
@@ -7,6 +7,8 @@ file(GLOB onnxruntime_session_srcs CONFIGURE_DEPENDS
     "${ONNXRUNTIME_ROOT}/core/session/*.cc"
     "${ONNXRUNTIME_ROOT}/core/session/plugin_ep/*.h"
     "${ONNXRUNTIME_ROOT}/core/session/plugin_ep/*.cc"
+    "${ONNXRUNTIME_ROOT}/core/session/model_package/*.h"
+    "${ONNXRUNTIME_ROOT}/core/session/model_package/*.cc"
     )
 
 if (onnxruntime_ENABLE_TRAINING_APIS)
@@ -25,6 +27,7 @@ endif()
 if (onnxruntime_MINIMAL_BUILD)
   file(GLOB autoep_srcs
     "${ONNXRUNTIME_ROOT}/core/session/plugin_ep/*.*"
+    "${ONNXRUNTIME_ROOT}/core/session/model_package/*.*"
   )
 
   set(onnxruntime_session_src_exclude
@@ -72,4 +75,3 @@ if (NOT onnxruntime_BUILD_SHARED_LIB)
           RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
           FRAMEWORK DESTINATION ${CMAKE_INSTALL_BINDIR})
 endif()
-

include/onnxruntime/core/framework/execution_provider.h

Lines changed: 18 additions & 0 deletions
@@ -84,11 +84,24 @@ class IExecutionProvider {
       : default_device_(device), type_{type}, logger_{&logger} {
   }
 
+  IExecutionProvider(const std::string& type, OrtDevice device,
+                     std::vector<const OrtEpDevice*> ep_devices, const logging::Logger& logger)
+      : default_device_(device), ep_devices_{ep_devices}, type_{type}, logger_{&logger} {
+  }
+
   /*
      default device for this ExecutionProvider
   */
   const OrtDevice default_device_;
 
+  /*
+     The OrtEpDevice list this execution provider supports.
+
+     It's mainly for plugin EP which implements this interface or provider-bridge EP that
+     implements OrtEpFactory as OrtEpDevice(s) are available for such scenarios.
+  */
+  const std::vector<const OrtEpDevice*> ep_devices_;
+
  public:
   virtual ~IExecutionProvider() = default;
 
@@ -187,6 +200,11 @@ class IExecutionProvider {
    */
   const OrtDevice& GetDevice() const { return default_device_; }
 
+  /**
+   * Get the OrtEpDevice list the execution provider was registered with.
+   */
+  const std::vector<const OrtEpDevice*>& GetEpDevices() const { return ep_devices_; }
+
   /**
      Get execution provider's configuration options.
    */

onnxruntime/core/session/abi_devices.h

Lines changed: 73 additions & 0 deletions
@@ -3,6 +3,7 @@
 
 #pragma once
 
+#include <sstream>
 #include <string>
 #include <unordered_map>
 
@@ -33,6 +34,44 @@ struct OrtHardwareDevice {
 
     return h;
   }
+
+  std::string ToString() const {
+    const char* type_str = "UNKNOWN";
+    switch (type) {
+      case OrtHardwareDeviceType_CPU:
+        type_str = "CPU";
+        break;
+      case OrtHardwareDeviceType_GPU:
+        type_str = "GPU";
+        break;
+      case OrtHardwareDeviceType_NPU:
+        type_str = "NPU";
+        break;
+      default:
+        break;
+    }
+
+    std::ostringstream oss;
+    oss << "OrtHardwareDevice{"
+        << "type=" << type_str
+        << ", vendor_id=" << vendor_id
+        << ", device_id=" << device_id
+        << ", vendor=\"" << vendor << "\"";
+
+    if (!metadata.Entries().empty()) {
+      oss << ", metadata={";
+      bool first = true;
+      for (const auto& [k, v] : metadata.Entries()) {
+        if (!first) oss << ", ";
+        first = false;
+        oss << k << "=" << v;
+      }
+      oss << "}";
+    }
+
+    oss << "}";
+    return oss.str();
+  }
 };
 
 // This is to make OrtHardwareDevice a valid key in hash tables
@@ -74,6 +113,40 @@ struct OrtEpDevice {
   // the user provides const OrtEpDevice instances, but the OrtEpFactory API takes non-const instances for all
   // get/create methods to be as flexible as possible. this helper converts to a non-const factory instance.
   OrtEpFactory* GetMutableFactory() const { return ep_factory; }
+
+  std::string ToString() const {
+    std::ostringstream oss;
+    oss << "OrtEpDevice{"
+        << "ep_name=\"" << ep_name << "\""
+        << ", ep_vendor=\"" << ep_vendor << "\""
+        << ", device=" << (device ? device->ToString() : "null");
+
+    if (!ep_metadata.Entries().empty()) {
+      oss << ", ep_metadata={";
+      bool first = true;
+      for (const auto& [k, v] : ep_metadata.Entries()) {
+        if (!first) oss << ", ";
+        first = false;
+        oss << k << "=" << v;
+      }
+      oss << "}";
+    }
+
+    if (!ep_options.Entries().empty()) {
+      oss << ", ep_options={";
+      bool first = true;
+      for (const auto& [k, v] : ep_options.Entries()) {
+        if (!first) oss << ", ";
+        first = false;
+        oss << k << "=" << v;
+      }
+      oss << "}";
+    }
+
+    oss << "}";
+
+    return oss.str();
+  }
 };
 
 struct OrtDeviceEpIncompatibilityDetails {
Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
# Model Package Format

This document describes the model package directory layout and the JSON files used by ONNX Runtime to discover and load model packages. All JSON files must be UTF-8 encoded.

## Definitions

- Model Package
  - A model package defines the overall logical ‘model’
  - A model package contains one or more ‘component models’
  - The component models are executed when running the model package to provide the overall functionality of the logical model
  - A model package may contain configuration information to support running multiple component models

- Component Model
  - A component model comprises one or more ‘model variants’
  - All variants have the same model inputs and outputs with the same shapes.
    - The data types may vary.

- Model Variant
  - A ‘model variant’ is a single ONNX or ORT format model.

## Directory layout

````
<model>.ortpackage/
├── manifest.json
├── pipeline.json
├── configs/
|   ├── genai_config.json
|   └── chat_template.jinja
└── models/
    └── model_name/
        ├── metadata.json
        |   └── Contains general information on the component model,
        |       and specific information about each model variant
        |       such as data types, quantization algo, EP, etc. that
        |       is updated on add/remove of model variant
        └── shared_weights/ (shared weights from all variants)
            └── <checksum of weights file A>/
                └── model.data
            └── <checksum of weights file B>/
                └── model.data
            └── ...
        └── base model /
            ├── model.onnx
        └── variant A /
            ├── optimized model.onnx (contains EPContext nodes)
            └── [Compilation artifacts]
        └── variant B /
            ├── optimized model.onnx (contains EPContext nodes)
            └── [Compilation artifacts]
````

## Notes:
- Shared weights is not yet supported, but the format allows for it in the future.

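To make the layout concrete, the following Python sketch creates an empty package skeleton on disk. It is a hypothetical helper, not part of ONNX Runtime: the function name `create_package_skeleton`, the choice to name each variant directory after the variant, and the placeholder `model.onnx` file name are assumptions for illustration. No model files or shared weights are written.

```python
import json
from pathlib import Path


def create_package_skeleton(root: Path, model_name: str, components: dict) -> Path:
    """Create `<root>/<model_name>.ortpackage` with manifest and metadata files.

    `components` maps a component model name to {variant_name: ep_name}.
    """
    package = root / f"{model_name}.ortpackage"
    (package / "configs").mkdir(parents=True, exist_ok=True)

    manifest = {"model_name": model_name, "component_models": sorted(components)}
    (package / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")

    for component, variants in components.items():
        component_dir = package / "models" / component
        component_dir.mkdir(parents=True, exist_ok=True)
        metadata = {"component_model_name": component, "model_variants": {}}
        for variant, ep_name in variants.items():
            # Assumption: one directory per variant, named after the variant.
            (component_dir / variant).mkdir(exist_ok=True)
            metadata["model_variants"][variant] = {
                "model_file": "model.onnx",  # placeholder file name
                "constraints": {"ep": ep_name},
            }
        (component_dir / "metadata.json").write_text(
            json.dumps(metadata, indent=2), encoding="utf-8")
    return package
```

Calling `create_package_skeleton(Path("."), "demo", {"decoder": {"variant_a": "QNNExecutionProvider"}})` would produce `demo.ortpackage/` with a `manifest.json`, an empty `configs/` directory, and `models/decoder/metadata.json` next to a `variant_a/` directory.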
## `manifest.json` (required)

Location: `<package_root>/manifest.json`

Purpose: Provides the overall package identity and (optionally) lists component models available in the package.

Schema:
- `model_name` (string, required): Logical package name.
- `model_version` (string, optional): Version of the model package.
- `component_models` (array of strings, optional): List of component model names. If this field is omitted, ONNX Runtime will discover component models by enumerating subdirectories under `models/`. If present, the names listed here must match the subdirectory names under `models/`.

### `manifest.json` example

```json
{
  "model_name": <logical_model_name>,
  "model_version": "1.0",
  "component_models": [
    <component_model_name_1>,
    <component_model_name_2>
  ]
}
```

82+
## `metadata.json` (required per component model)
83+
84+
Location: `<package_root>/models/<component_model>/metadata.json`
85+
86+
Purpose: Describes the variants available for a specific component model.
87+
88+
Schema:
89+
- `component_model_name` (string, required): Name of the component model.
90+
- `model_variants` (object, required): Map of variant names to variant descriptors.
91+
- `<variant_name>` (object, required):
92+
- `model_type` (string, optional): Type of the model (e.g., `"onnx"`, `"ORT"`). If omitted, ORT will treat it as an ONNX model by default.
93+
- `model_file` (string, optional): Path relative to the model variant directory. Can point to an ONNX model file or a directory. If it is a directory, or if `model_file` is omitted, ORT will discover the ONNX model file within that directory.
94+
- `model_id` (string, optional): Unique identifier for the model variant. It should match a catalog value if the model comes from a catalog. If `model_id` is present, the model will be in the <component_model_name>/`model_id`/ directory.
95+
- `constraints` (object, required):
96+
- `ep` (string, required (except base model)): Execution provider name (e.g., `"TensorrtExecutionProvider"`, `"QNNExecutionProvider"`, `"OpenVINOExecutionProvider"`).
97+
- `device` (string, optional): Target device type (e.g., `"cpu"`, `"gpu"`, `"npu"`). Must match a supported `OrtHardwareDevice`. If the EPContext model can support multiple device types, this field can be omitted and EP should record supported device types in `ep_compatibility_info` instead.
98+
- `architecture` (string, optional): Hardware architecture hint; interpreted by the EP if needed.
99+
- `ep_compatibility_info` (string, optional): EP-specific compatibility string (as produced by `OrtEp::GetCompiledModelCompatibilityInfo()`); validated by the EP when selecting a variant. **The compatibility value returned by the EP is critical—ORT uses it to rank and choose the model variant.**
100+
101+
### `metadata.json` example
102+
```json
103+
{
104+
"component_model_name": <component_model_name>,
105+
"model_variants": {
106+
<variant_1>: {
107+
"model_type": "onnx",
108+
"model_file": "model_ctx.onnx",
109+
"constraints": {
110+
"ep": "TensorrtExecutionProvider",
111+
"ep_compatibility_info": "..."
112+
}
113+
},
114+
<variant_2>: {
115+
"model_type": "onnx",
116+
"model_file": "model_ctx.onnx",
117+
"constraints": {
118+
"ep": "OpenVINOExecutionProvider",
119+
"device": "cpu",
120+
"ep_compatibility_info": "..."
121+
}
122+
}
123+
}
124+
}
125+
```
126+
127+
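How `model_file`, `model_id` and `model_type` might combine into a concrete path is sketched below. Several details are assumptions rather than documented behavior: that a variant without `model_id` lives in a directory named after the variant, that discovery looks for a single `*.onnx` (or `*.ort`) file, and the function name `resolve_model_file` itself.

```python
from pathlib import Path


def resolve_model_file(component_dir: Path, variant_name: str, descriptor: dict) -> Path:
    """Resolve the model file for one entry of `model_variants`."""
    model_file = descriptor.get("model_file", "")
    if Path(model_file).is_absolute():
        raise ValueError("model_file must be a relative path")

    # Assumption: without `model_id`, the variant directory is named after the variant.
    variant_dir = component_dir / descriptor.get("model_id", variant_name)
    target = variant_dir / model_file  # equals variant_dir when model_file is omitted
    if target.is_file():
        return target
    if target.is_dir():
        # Directory (or omitted model_file): discover the model inside it.
        is_ort = descriptor.get("model_type", "onnx").lower() == "ort"
        suffix = ".ort" if is_ort else ".onnx"  # assumed file extensions
        candidates = sorted(target.glob(f"*{suffix}"))
        if len(candidates) == 1:
            return candidates[0]
        raise FileNotFoundError(
            f"expected exactly one *{suffix} file in {target}, found {len(candidates)}")
    raise FileNotFoundError(f"{target} does not exist")
```

Rejecting absolute paths up front mirrors the portability rule that all file paths in a package are relative.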
## Processing rules (runtime expectations)

- ONNX Runtime reads `manifest.json` if the path passed in is the package root directory; if `component_models` is present, it uses that to determine which component models to load. If `component_models` is not present, ONNX Runtime discovers component models by enumerating subdirectories under `models/`. (In this case, ONNX Runtime expects only one component model to exist in the model package.)
- ONNX Runtime reads the component model's `metadata.json` and ignores `manifest.json` if the path passed in points directly to a component model directory.
- For each component model, `metadata.json` supplies the definitive list of variants and constraints.
- Variant selection is performed by matching constraints (EP, device, `ep_compatibility_info`, and optionally architecture). **The EP’s returned compatibility value (e.g., `EP_SUPPORTED_OPTIMAL`, `EP_SUPPORTED_PREFER_RECOMPILATION`) is used to score and pick the winning model variant.**
- All file paths must be relative paths; avoid absolute paths to keep packages portable.
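
The scoring step in the variant-selection rule can be illustrated with a short Python sketch. The numeric scores, the tie-breaking rule, and the function name `pick_variant` are assumptions; the text above only states that the compatibility value returned by the EP is used to rank variants.

```python
# Hypothetical scores: only the relative order matters in this sketch.
COMPATIBILITY_SCORE = {
    "EP_SUPPORTED_OPTIMAL": 2,
    "EP_SUPPORTED_PREFER_RECOMPILATION": 1,
}


def pick_variant(candidates: dict):
    """Pick the best variant name from {variant_name: compatibility_value}.

    Values missing from COMPATIBILITY_SCORE are treated as unusable.
    Ties keep the first candidate in iteration order.
    """
    best_name, best_score = None, 0
    for name, compatibility in candidates.items():
        score = COMPATIBILITY_SCORE.get(compatibility, 0)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

Here `candidates` would hold only the variants that already passed the EP and device constraint checks, each mapped to the value its EP reported. The function returns `None` when no variant is usable.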
