Gate weights cache on runtime option instead of compile-time macro (#19603)
hboyraz wants to merge 2 commits
@hboyraz has exported this pull request. If you are a Meta employee, you can view the originating Diff in D105123995.
@pytorchbot label "release notes: bug fix"
Didn't find following labels among repository labels: release notes: bug fix
Summary:
Replaces the compile-time `#ifdef ENABLE_XNNPACK_WEIGHTS_CACHE` gate in XNNCompiler.cpp with a runtime boolean plumbed from `XnnpackBackendOptions::resolve_weight_cache(context)` through `XNNPACKBackend::init` to `XNNCompiler::compileModel`.
This fixes a silent-disable bug: previously, runtime opt-in via `set_option(weight_cache_option_key, true)` was silently a no-op unless the build also set `-c executorch.xnnpack_weights_cache=1`, because the cache pointer handed to `xnn_create_runtime_v4` was hardcoded to nullptr when the macro was undefined. Multimethod LoRA models re-packed the entire backbone for every method load, costing hundreds of MB of resident memory.
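The silent-disable behavior described above can be illustrated with a minimal sketch. This is not the code from XNNCompiler.cpp: `FakeWeightsCache`, `select_cache_compile_time`, and `select_cache_runtime` are hypothetical names, and the behavior of the macro-defined branch is an assumption made only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the weights cache object; not an XNNPACK type.
struct FakeWeightsCache {};

// Old shape (sketch): the preprocessor decides which pointer is handed on.
// When the macro is undefined, the runtime opt-in is dropped without any error.
inline FakeWeightsCache* select_cache_compile_time(
    FakeWeightsCache* cache, bool runtime_opt_in) {
#ifdef ENABLE_XNNPACK_WEIGHTS_CACHE
  // Assumption for this sketch: the runtime flag is honored in cache-enabled builds.
  return runtime_opt_in ? cache : nullptr;
#else
  (void)cache;
  (void)runtime_opt_in;  // the caller's opt-in has no effect in this build
  return nullptr;
#endif
}

// New shape (sketch): the same decision is made from a runtime boolean,
// so the opt-in works regardless of how the binary was built.
inline FakeWeightsCache* select_cache_runtime(
    FakeWeightsCache* cache, bool use_weight_cache) {
  return use_weight_cache ? cache : nullptr;
}
```

In a build without the macro, the first helper returns nullptr even when the caller opts in, which is the no-op the summary describes; the second helper returns the cache pointer whenever the flag is true.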
The runtime path now keys all three cache-relevant code regions (unpacked-data load, cache pointer handoff to `xnn_create_runtime_v4`, and `finalize_for_runtime`) on `bool use_weight_cache`, resolved per-init from the BackendInitContext.
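A rough sketch of how one boolean can gate the three regions named above follows. The types and the function `compile_model_sketch` are hypothetical; the sketch only records which region ran and does not reproduce what the real regions in `compileModel` do.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the weights cache; not an XNNPACK type.
struct FakeCache {
  bool finalized = false;
};

// Records which regions ran, so the gating is observable.
struct CompileTrace {
  std::vector<std::string> steps;
  const FakeCache* cache_passed_to_runtime = nullptr;
};

// Sketch only: every cache-relevant region checks the same runtime flag.
inline CompileTrace compile_model_sketch(FakeCache& cache, bool use_weight_cache) {
  CompileTrace trace;

  // Region 1: unpacked-data load, taken only on the cache path.
  if (use_weight_cache) {
    trace.steps.push_back("load_unpacked_data");
  }

  // Region 2: pointer handoff; nullptr means "no cache" to the runtime.
  trace.cache_passed_to_runtime = use_weight_cache ? &cache : nullptr;
  trace.steps.push_back("create_runtime");

  // Region 3: finalize the cache once the runtime has been created.
  if (use_weight_cache) {
    cache.finalized = true;
    trace.steps.push_back("finalize_for_runtime");
  }
  return trace;
}
```

Keeping a single flag for all three regions avoids a state where, for example, the cache is finalized but was never handed to the runtime.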
The `Result<vector<string>>` declaration in compileModel was reshaped to plain `vector<string>` since `Result<>` is non-assignable, and the new runtime branch needs a variable that can be assigned after it is declared.
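The reshaping can be sketched as follows. `FakeResult` is a hypothetical wrapper that models only the non-assignable property stated above; it is not ExecuTorch's `Result`, and `load_names_sketch` and `collect_names_sketch` are illustrative names.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical wrapper that can be constructed and copied but never assigned.
template <typename T>
class FakeResult {
 public:
  explicit FakeResult(T value) : value_(std::move(value)) {}
  FakeResult(const FakeResult&) = default;
  FakeResult& operator=(const FakeResult&) = delete;
  FakeResult& operator=(FakeResult&&) = delete;
  const T& get() const { return value_; }

 private:
  T value_;
};

using Names = std::vector<std::string>;
static_assert(!std::is_copy_assignable<FakeResult<Names>>::value,
              "the wrapper is intentionally non-assignable");

// Hypothetical loader that returns its value inside the wrapper.
inline FakeResult<Names> load_names_sketch() {
  Names names{"packed_weight_a", "packed_weight_b"};
  return FakeResult<Names>(std::move(names));
}

// The variable is declared before the runtime branch and filled inside it,
// so it has to be a plain, assignable vector rather than the wrapper.
inline Names collect_names_sketch(bool use_weight_cache) {
  Names names;
  if (use_weight_cache) {
    names = load_names_sketch().get();  // copy out of the temporary wrapper
  }
  return names;
}
```

With an `#ifdef`, the wrapper could be initialized at its declaration inside the guarded region; with a runtime `if`, declaration and initialization are separated, which is where assignability starts to matter.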
Reviewed By: GregoryComer
Differential Revision: D105123995