Commit 22787ae

[QNN EP] Add platform-agnostic EP option to specify QNN backend, backend_type (microsoft#24235)
Add a platform-agnostic EP option, `backend_type`, to specify the QNN backend. In typical usage, it should supersede the `backend_path` EP option, which requires a path to the QNN backend library whose file name differs between Windows and non-Windows platforms (e.g., QnnCpu.dll vs. libQnnCpu.so). `backend_path` is not removed: it is kept for backwards compatibility and for the flexibility to specify an arbitrary backend path.
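
To illustrate the platform difference that `backend_type` hides, here is a minimal, self-contained C++ sketch of how a backend type name could map to a library file name. `BackendLibraryFileName` is a hypothetical helper written for this illustration, not the function added by this commit. It assumes the four backend types listed in the API documentation ("cpu", "gpu", "htp", "saver") and takes the platform as a runtime flag so both naming schemes can be exercised, whereas the commit selects the scheme at compile time with `_WIN32`.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper for illustration only (not part of ONNX Runtime).
// Maps a backend type name to the QNN backend library file name for a platform.
std::optional<std::string> BackendLibraryFileName(std::string_view backend_type, bool is_windows) {
  std::string_view base;
  if (backend_type == "cpu") {
    base = "QnnCpu";
  } else if (backend_type == "gpu") {
    base = "QnnGpu";
  } else if (backend_type == "htp") {
    base = "QnnHtp";
  } else if (backend_type == "saver") {
    base = "QnnSaver";
  } else {
    return std::nullopt;  // Unknown backend type name.
  }

  // Windows: "<base>.dll"; other platforms: "lib<base>.so".
  std::string file_name = is_windows ? "" : "lib";
  file_name.append(base);
  file_name += is_windows ? ".dll" : ".so";
  return file_name;
}
```

With this sketch, `BackendLibraryFileName("htp", true)` yields "QnnHtp.dll" and `BackendLibraryFileName("htp", false)` yields "libQnnHtp.so", so the caller never has to spell out a platform-specific file name.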
1 parent 83650ed commit 22787ae

44 files changed

Lines changed: 356 additions & 575 deletions

include/onnxruntime/core/session/onnxruntime_c_api.h

Lines changed: 68 additions & 36 deletions
Original file line number | Diff line number | Diff line change
@@ -3646,64 +3646,96 @@ struct OrtApi {
36463646
* that should be used to add it.
36473647
*
36483648
* QNN supported keys:
3649-
* "backend_path": file path to QNN backend library.
3650-
* "profiling_level": QNN profiling level, options: "off", "basic", "detailed". Default to off.
3649+
* "backend_type": Type of QNN backend. Specifies a backend path that is the associated QNN backend library file
3650+
* name. E.g., given backend type "htp", on Windows, the backend path would be "QnnHtp.dll", and on other
3651+
* platforms, it would be "libQnnHtp.so". Mutually exclusive with "backend_path".
3652+
* Available options:
3653+
* - "cpu"
3654+
* - "gpu"
3655+
* - "htp": Default.
3656+
* - "saver"
3657+
* "backend_path": File path to QNN backend library. Mutually exclusive with "backend_type".
3658+
* "profiling_level": QNN profiling level.
3659+
* Available options:
3660+
* - "off": Default.
3661+
* - "basic"
3662+
* - "detailed"
36513663
* "profiling_file_path": QNN profiling file path if ETW not enabled.
36523664
* "rpc_control_latency": QNN RPC control latency.
36533665
* "vtcm_mb": QNN VTCM size in MB. default to 0(not set).
3654-
* "htp_performance_mode": QNN performance mode, options: "burst", "balanced", "default", "high_performance",
3655-
* "high_power_saver", "low_balanced", "extreme_power_saver", "low_power_saver", "power_saver", "sustained_high_performance". Default to "default".
3666+
* "htp_performance_mode": QNN performance mode.
3667+
* Available options:
3668+
* - "burst"
3669+
* - "balanced"
3670+
* - "default": Default.
3671+
* - "high_performance"
3672+
* - "high_power_saver"
3673+
* - "low_balanced"
3674+
* - "extreme_power_saver"
3675+
* - "low_power_saver"
3676+
* - "power_saver"
3677+
* - "sustained_high_performance"
36563678
* "qnn_saver_path": File path to the QNN Saver backend library. If specified, QNN Saver will be enabled and will
3657-
* dump QNN API calls to disk for replay/debugging. QNN Saver produces incorrect model inference results and
3658-
* may alter model/EP partitioning. Use only for debugging.
3659-
* "qnn_context_priority": QNN context priority, options: "low", "normal", "normal_high", "high". Default to "normal".
3660-
* "htp_graph_finalization_optimization_mode": Set the optimization mode for graph finalization on the HTP backend. Available options:
3661-
* - "0": Default.
3662-
* - "1": Faster preparation time, less optimal graph.
3663-
* - "2": Longer preparation time, more optimal graph.
3664-
* - "3": Longest preparation time, most likely even more optimal graph. See QNN SDK documentation for specific details.
3665-
* "soc_model": The SoC model number. Refer to the QNN SDK documentation for valid values. Defaults to "0" (unknown).
3666-
* "htp_arch": The minimum HTP architecture the driver will use to select compatible QNN operators. Available options:
3667-
* - "0": Default (none).
3668-
* - "68"
3669-
* - "69"
3670-
* - "73"
3671-
* - "75"
3679+
* dump QNN API calls to disk for replay/debugging. QNN Saver produces incorrect model inference results and
3680+
* may alter model/EP partitioning. Use only for debugging.
3681+
* "qnn_context_priority": QNN context priority.
3682+
* Available options:
3683+
* - "low"
3684+
* - "normal": Default.
3685+
* - "normal_high"
3686+
* - "high"
3687+
* "htp_graph_finalization_optimization_mode": Set the optimization mode for graph finalization on the HTP backend.
3688+
* Available options:
3689+
* - "0": Default.
3690+
* - "1": Faster preparation time, less optimal graph.
3691+
* - "2": Longer preparation time, more optimal graph.
3692+
* - "3": Longest preparation time, most likely even more optimal graph. See QNN SDK documentation for specific
3693+
* details.
3694+
* "soc_model": The SoC model number. Refer to the QNN SDK documentation for valid values.
3695+
* Defaults to "0" (unknown).
3696+
* "htp_arch": The minimum HTP architecture the driver will use to select compatible QNN operators.
3697+
* Available options:
3698+
* - "0": Default (none).
3699+
* - "68"
3700+
* - "69"
3701+
* - "73"
3702+
* - "75"
36723703
* "device_id": The ID of the device to use when setting 'htp_arch'. Defaults to "0" (for single device).
36733704
* "enable_htp_fp16_precision": Used for float32 model for HTP backend.
3674-
* Enable the float32 model to be inferenced with fp16 precision. Otherwise, it will be fp32 precision.
3705+
* Enable the float32 model to be inferenced with fp16 precision. Otherwise, it will be fp32 precision.
36753706
* - "0": With fp32 precision.
36763707
* - "1": Default. With fp16 precision.
36773708
* "offload_graph_io_quantization": Offload graph input quantization and graph output dequantization to another
3678-
* execution provider (typically CPU EP).
3679-
* - "0": Disabled. QNN EP will handle quantization and dequantization of graph I/O.
3680-
* - "1": Enabled. This is the default value.
3681-
* "enable_htp_spill_fill_buffer": Enable HTP spill fill buffer setting. The flag is used while generating context binary.
3682-
* - "0": Default. Disabled.
3683-
* - "1": Enabled.
3709+
* execution provider (typically CPU EP).
3710+
* - "0": Disabled. QNN EP will handle quantization and dequantization of graph I/O.
3711+
* - "1": Enabled. This is the default value.
3712+
* "enable_htp_spill_fill_buffer": Enable HTP spill fill buffer setting. The flag is used while generating context
3713+
* binary.
3714+
* - "0": Default. Disabled.
3715+
* - "1": Enabled.
36843716
* "enable_htp_shared_memory_allocator": Enable the QNN HTP shared memory allocator. Requires libcdsprpc.so/dll to
3685-
* be available.
3686-
* - "0": Default. Disabled.
3687-
* - "1": Enabled.
3717+
* be available.
3718+
* - "0": Default. Disabled.
3719+
* - "1": Enabled.
36883720
* "dump_json_qnn_graph": Set to "1" to dump QNN graphs generated by QNN EP as JSON files. Each graph partition
36893721
* assigned to QNN EP is dumped to a separate file.
36903722
* "json_qnn_graph_dir": Directory in which to dump QNN JSON graphs. If not specified, QNN graphs are dumped in the
36913723
* program's current working directory. Ignored if "dump_json_qnn_graph" is not set.
36923724
*
36933725
* SNPE supported keys:
36943726
* "runtime": SNPE runtime engine, options: "CPU", "CPU_FLOAT32", "GPU", "GPU_FLOAT32_16_HYBRID", "GPU_FLOAT16",
3695-
* "DSP", "DSP_FIXED8_TF", "AIP_FIXED_TF", "AIP_FIXED8_TF".
3696-
* Mapping to SNPE Runtime_t definition: CPU, CPU_FLOAT32 => zdl::DlSystem::Runtime_t::CPU;
3697-
* GPU, GPU_FLOAT32_16_HYBRID => zdl::DlSystem::Runtime_t::GPU;
3698-
* GPU_FLOAT16 => zdl::DlSystem::Runtime_t::GPU_FLOAT16;
3699-
* DSP, DSP_FIXED8_TF => zdl::DlSystem::Runtime_t::DSP.
3700-
* AIP_FIXED_TF, AIP_FIXED8_TF => zdl::DlSystem::Runtime_t::AIP_FIXED_TF.
3727+
* "DSP", "DSP_FIXED8_TF", "AIP_FIXED_TF", "AIP_FIXED8_TF".
3728+
* Mapping to SNPE Runtime_t definition:
3729+
* CPU, CPU_FLOAT32 => zdl::DlSystem::Runtime_t::CPU;
3730+
* GPU, GPU_FLOAT32_16_HYBRID => zdl::DlSystem::Runtime_t::GPU;
3731+
* GPU_FLOAT16 => zdl::DlSystem::Runtime_t::GPU_FLOAT16;
3732+
* DSP, DSP_FIXED8_TF => zdl::DlSystem::Runtime_t::DSP.
3733+
* AIP_FIXED_TF, AIP_FIXED8_TF => zdl::DlSystem::Runtime_t::AIP_FIXED_TF.
37013734
* "priority": execution priority, options: "low", "normal".
37023735
* "buffer_type": ITensor or user buffers, options: "ITENSOR", user buffer with different types - "TF8", "TF16", "UINT8", "FLOAT".
37033736
* "ITENSOR" -- default, ITensor which is float only.
37043737
* "TF8" -- quantized model required, "FLOAT" -- for both quantized or non-quantized model
37053738
* "enable_init_cache": enable SNPE init caching feature, set to 1 to enabled it. Disabled by default.
3706-
* If SNPE is not available (due to a non Snpe enabled build or its dependencies not being installed), this function will fail.
37073739
*
37083740
* XNNPACK supported keys:
37093741
* "intra_op_num_threads": number of thread-pool size to use for XNNPACK execution provider.

java/src/test/android/app/src/androidTest/java/ai/onnxruntime/example/javavalidator/SimpleTest.kt

Lines changed: 1 addition & 3 deletions
@@ -82,9 +82,7 @@ class SimpleTest {
8282

8383
OrtProvider.QNN -> {
8484
if (OrtEnvironment.getAvailableProviders().contains(OrtProvider.QNN)) {
85-
// Since this is running in an Android environment, we use the .so library
86-
val qnnLibrary = "libQnnHtp.so"
87-
val providerOptions = Collections.singletonMap("backend_path", qnnLibrary)
85+
val providerOptions = Collections.singletonMap("backend_type", "htp")
8886
opts.addQnn(providerOptions)
8987
} else {
9088
Log.println(Log.INFO, TAG, "NO QNN EP available, skip the test")

java/src/test/java/ai/onnxruntime/InferenceTest.java

Lines changed: 2 additions & 7 deletions
@@ -2125,13 +2125,8 @@ private static SqueezeNetTuple openSessionSqueezeNet(EnumSet<OrtProvider> provid
21252125
options.addXnnpack(Collections.emptyMap());
21262126
break;
21272127
case QNN:
2128-
{
2129-
String backendPath = OS.WINDOWS.isCurrentOs() ? "/QnnCpu.dll" : "/libQnnCpu.so";
2130-
options.addQnn(
2131-
Collections.singletonMap(
2132-
"backend_path", TestHelpers.getResourcePath(backendPath).toString()));
2133-
break;
2134-
}
2128+
options.addQnn(Collections.singletonMap("backend_type", "cpu"));
2129+
break;
21352130
case VITIS_AI:
21362131
case RK_NPU:
21372132
case MI_GRAPH_X:

js/common/lib/inference-session.ts

Lines changed: 8 additions & 2 deletions
@@ -310,9 +310,15 @@ export declare namespace InferenceSession {
310310
export interface QnnExecutionProviderOption extends ExecutionProviderOption {
311311
readonly name: 'qnn';
312312
/**
313-
* Specify a path to the QnnHtp.dll file.
313+
* Specify the QNN backend type. E.g., 'cpu' or 'htp'.
314+
* Mutually exclusive with `backendPath`.
314315
*
315-
* @default 'QnnHtp.dll'
316+
* @default 'htp'
317+
*/
318+
backendType?: string;
319+
/**
320+
* Specify a path to the QNN backend library.
321+
* Mutually exclusive with `backendType`.
316322
*/
317323
backendPath?: string;
318324
/**

js/node/src/session_options_helper.cc

Lines changed: 8 additions & 5 deletions
@@ -83,6 +83,14 @@ void ParseExecutionProviders(const Napi::Array epList, Ort::SessionOptions& sess
8383
#endif
8484
#ifdef USE_QNN
8585
if (name == "qnn") {
86+
Napi::Value backend_type = obj.Get("backendType");
87+
if (!backend_type.IsUndefined()) {
88+
if (backend_type.IsString()) {
89+
qnn_options["backend_type"] = backend_type.As<Napi::String>().Utf8Value();
90+
} else {
91+
ORT_NAPI_THROW_TYPEERROR(epList.Env(), "Invalid argument: backendType must be a string.");
92+
}
93+
}
8694
Napi::Value backend_path = obj.Get("backendPath");
8795
if (!backend_path.IsUndefined()) {
8896
if (backend_path.IsString()) {
@@ -136,11 +144,6 @@ void ParseExecutionProviders(const Napi::Array epList, Ort::SessionOptions& sess
136144
#endif
137145
#ifdef USE_QNN
138146
} else if (name == "qnn") {
139-
// Ensure that the backend_path option are set to default values if not provided.
140-
if (qnn_options.find("backend_path") == qnn_options.end()) {
141-
qnn_options["backend_path"] = "QnnHtp.dll";
142-
}
143-
144147
sessionOptions.AppendExecutionProvider("QNN", qnn_options);
145148
#endif
146149
} else {

onnxruntime/core/providers/qnn/qnn_execution_provider.cc

Lines changed: 88 additions & 8 deletions
@@ -4,6 +4,8 @@
44
#include "qnn_execution_provider.h"
55

66
#include <filesystem>
7+
#include <optional>
8+
#include <string_view>
79
#include <unordered_set>
810

911
#include "core/providers/qnn/ort_api.h"
@@ -22,6 +24,60 @@ namespace onnxruntime {
2224

2325
constexpr const char* QNN = "QNN";
2426

27+
static std::string MakeSharedLibraryPath(std::string_view name) {
28+
#if defined(_WIN32)
29+
return MakeString(name, ".dll");
30+
#else
31+
return MakeString("lib", name, ".so");
32+
#endif
33+
}
34+
35+
const std::string kDefaultCpuBackendPath = MakeSharedLibraryPath("QnnCpu");
36+
const std::string kDefaultGpuBackendPath = MakeSharedLibraryPath("QnnGpu");
37+
const std::string kDefaultHtpBackendPath = MakeSharedLibraryPath("QnnHtp");
38+
const std::string kDefaultSaverBackendPath = MakeSharedLibraryPath("QnnSaver");
39+
40+
static bool ParseBackendTypeName(std::string_view backend_type_name, std::string& backend_path) {
41+
constexpr std::string_view kCpuBackendTypeName{"cpu"};
42+
constexpr std::string_view kGpuBackendTypeName{"gpu"};
43+
constexpr std::string_view kHtpBackendTypeName{"htp"};
44+
constexpr std::string_view kSaverBackendTypeName{"saver"};
45+
46+
constexpr std::array kAllowedBackendTypeNames{
47+
kCpuBackendTypeName,
48+
kGpuBackendTypeName,
49+
kHtpBackendTypeName,
50+
kSaverBackendTypeName,
51+
};
52+
53+
std::optional<std::string> associated_backend_path{};
54+
if (backend_type_name == kCpuBackendTypeName) {
55+
associated_backend_path = kDefaultCpuBackendPath;
56+
} else if (backend_type_name == kGpuBackendTypeName) {
57+
associated_backend_path = kDefaultGpuBackendPath;
58+
} else if (backend_type_name == kHtpBackendTypeName) {
59+
associated_backend_path = kDefaultHtpBackendPath;
60+
} else if (backend_type_name == kSaverBackendTypeName) {
61+
associated_backend_path = kDefaultSaverBackendPath;
62+
}
63+
64+
if (associated_backend_path.has_value()) {
65+
backend_path = std::move(*associated_backend_path);
66+
return true;
67+
}
68+
69+
std::ostringstream warning{};
70+
warning << "Invalid backend type name: " << backend_type_name << ". Allowed backend type names: ";
71+
for (size_t i = 0; i < kAllowedBackendTypeNames.size(); ++i) {
72+
warning << kAllowedBackendTypeNames[i];
73+
if (i + 1 < kAllowedBackendTypeNames.size()) {
74+
warning << ", ";
75+
}
76+
}
77+
LOGS_DEFAULT(WARNING) << warning.str();
78+
return false;
79+
}
80+
2581
static void ParseProfilingLevel(std::string profiling_level_string,
2682
qnn::ProfilingLevel& profiling_level) {
2783
std::transform(profiling_level_string.begin(),
@@ -201,15 +257,39 @@ QNNExecutionProvider::QNNExecutionProvider(const ProviderOptions& provider_optio
201257
LOGS_DEFAULT(VERBOSE) << "User specified option - stop share EP contexts across sessions: " << stop_share_ep_contexts_;
202258
}
203259

204-
static const std::string BACKEND_PATH = "backend_path";
205-
auto backend_path_pos = provider_options_map.find(BACKEND_PATH);
260+
std::string backend_path{};
261+
{
262+
std::optional<std::string> backend_path_from_options{};
263+
264+
static const std::string BACKEND_TYPE = "backend_type";
265+
static const std::string BACKEND_PATH = "backend_path";
206266

207-
std::string backend_path;
208-
if (backend_path_pos != provider_options_map.end()) {
209-
backend_path = backend_path_pos->second;
210-
LOGS_DEFAULT(VERBOSE) << "Backend path: " << backend_path;
211-
} else {
212-
LOGS_DEFAULT(ERROR) << "No backend path provided.";
267+
auto backend_type_it = provider_options_map.find(BACKEND_TYPE);
268+
auto backend_path_it = provider_options_map.find(BACKEND_PATH);
269+
270+
if (backend_type_it != provider_options_map.end() && backend_path_it != provider_options_map.end()) {
271+
ORT_THROW("Only one of '", BACKEND_TYPE, "' and '", BACKEND_PATH, "' should be set.");
272+
}
273+
274+
if (backend_type_it != provider_options_map.end()) {
275+
if (std::string parsed_backend_path; ParseBackendTypeName(backend_type_it->second, parsed_backend_path)) {
276+
backend_path_from_options = parsed_backend_path;
277+
} else {
278+
LOGS_DEFAULT(ERROR) << "Failed to parse '" << BACKEND_TYPE << "' value.";
279+
}
280+
} else if (backend_path_it != provider_options_map.end()) {
281+
backend_path_from_options = backend_path_it->second;
282+
}
283+
284+
if (backend_path_from_options.has_value()) {
285+
backend_path = std::move(*backend_path_from_options);
286+
} else {
287+
const auto& default_backend_path = kDefaultHtpBackendPath;
288+
backend_path = default_backend_path;
289+
LOGS_DEFAULT(WARNING) << "Unable to determine backend path from provider options. Using default.";
290+
}
291+
292+
LOGS_DEFAULT(VERBOSE) << "Using backend path: " << backend_path;
213293
}
214294

215295
std::string profiling_file_path;
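
The option-resolution order in the constructor hunk above can be summarized with a self-contained sketch. `ResolveBackendPath` and `LibraryForType` are hypothetical names used only for this illustration. The sketch assumes a plain `std::unordered_map` in place of `ProviderOptions`, throws `std::invalid_argument` where the commit uses `ORT_THROW`, omits logging, and hard-codes the non-Windows file names for brevity.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>

using Options = std::unordered_map<std::string, std::string>;

// Hypothetical stand-in for the type-to-library mapping (non-Windows names only).
std::optional<std::string> LibraryForType(const std::string& type) {
  static const std::unordered_map<std::string, std::string> kLibraries{
      {"cpu", "libQnnCpu.so"},
      {"gpu", "libQnnGpu.so"},
      {"htp", "libQnnHtp.so"},
      {"saver", "libQnnSaver.so"}};
  auto it = kLibraries.find(type);
  if (it == kLibraries.end()) return std::nullopt;
  return it->second;
}

// Illustrative resolution order:
//   1. Both options set            -> error (they are mutually exclusive).
//   2. Valid "backend_type"        -> the associated library file name.
//   3. "backend_path" only         -> the path exactly as given.
//   4. Neither, or an invalid type -> the default HTP library.
std::string ResolveBackendPath(const Options& options) {
  auto type_it = options.find("backend_type");
  auto path_it = options.find("backend_path");

  if (type_it != options.end() && path_it != options.end()) {
    throw std::invalid_argument("Only one of 'backend_type' and 'backend_path' should be set.");
  }

  if (type_it != options.end()) {
    if (auto library = LibraryForType(type_it->second)) return *library;
    // Invalid type name: fall through to the default below.
  } else if (path_it != options.end()) {
    return path_it->second;
  }

  return "libQnnHtp.so";
}
```

Note that in this sketch, as in the hunk above, an unrecognized `backend_type` value does not abort: it falls back to the default HTP backend, whereas setting both options is treated as a hard error.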
