
Commit 13a1032

WeightsCacheManager: per-cache-path instances (#20337) (#20337)
Summary: Split `XNNWeightsCache` out of its de facto singleton lifetime (one instance per `XnnpackBackend`, shared across every PTE that opted into the file-backed cache) into a per-cache-file-path instance dispensed by a new `XNNWeightsCacheManager`. This mirrors the `XNNWorkspaceManager` PerModel pattern:

- same path → same shared instance;
- different paths → independent instances;
- empty path → one shared heap-only instance, so XNNPACK's in-memory name dedup still works across PTEs.

The singleton design was forced until now because `XnnpackBackendOptions::weights_cache_` is a by-value member and `XnnpackBackend` itself is a namespace-scope global (`XNNPACKBackend.cpp:246`). For PTEs that genuinely share a cache file, the singleton's per-entry refcounting works. Two PTEs with **different** packed-cache paths, however, hit the same `XNNWeightsCache` instance, so the second PTE's `initialize_for_runtime` calls `ftruncate(0)` on the file under the first PTE's still-live mmap regions, and every subsequent access SIGBUSes (P2369924970 traces this). The prior fix added a warning in `XNNWeightsCache.h:147-151` ("the path MUST be unique per `XNNWeightsCache` instance"); that requirement is now enforced at the manager level rather than relying on each caller to honor it.

Design:

`XNNWeightsCacheManager` (new, mirrors `XNNWorkspaceManager`):

- `std::unordered_map<std::string, std::weak_ptr<XNNWeightsCache>> caches_` keyed by absolute cache file path.
- `get_or_create(path)` looks up the path under a single `meta_mutex_`; if a live `weak_ptr` exists, it returns the shared instance. Otherwise it constructs a new one, calls `set_packed_cache_path(path)` BEFORE registering it, and stores a `weak_ptr`.
- `meta_mutex_` is held only during the map op and never across a call into a published `XNNWeightsCache` (the only call made under it is `set_packed_cache_path` on the not-yet-registered instance), so different-path callers proceed in parallel after the brief window.
- `save_all()` snapshots live shared_ptrs under `meta_mutex_`, then iterates outside the meta lock and acquires each instance's own mutex around `save_packed_index()`.
- Empty path uses a separate `empty_path_cache_` (weak_ptr) + `empty_path_mutex_`: all heap-only callers (NGTTS sub-runners, FLLM classifier, PLLM methods when mmap MC is off) share one instance so XNNPACK's in-memory `look_up_or_insert` name dedup catches duplicate weights across PTEs and across methods within a PTE. Without this sharing, every `XnnpackBackend::init` allocated its own packed copy of every weight, regressing heap-only memory by ~500 MB on LoRA-multimethod PLLM (paste P2380809516: app_phys peak 1731 MB with per-instance vs ~1260 MB with the prior process-singleton). The hazard that motivates per-path keying (a non-opt-in PTE inheriting an opt-in PTE's path and writing into its mmap file) never applies to empty-path callers because they hold no path, no fd, and no mmap regions, so sharing among empty-path callers carries no isolation cost.
- Expired `weak_ptr` entries are erased opportunistically on the next `get_or_create` / `save_all` for that path. Stale entries from never-revisited paths linger; the cost is one string + weak_ptr per dead entry. Acceptable per `XNNWorkspaceManager` precedent.

`XNNWeightsCache`:

- New `std::mutex instance_mutex_` member + `mutex()` accessor. The class has no internal synchronization; callers are responsible for holding `mutex()` around every method invocation, INCLUDING the XNNPACK callback paths (`look_up`, `reserve_space`, `look_up_or_insert`) that fire during `xnn_create_runtime`.
- `set_packed_cache_path` is documented as call-once-before-publish: production callers go through the manager, which sets the path before installing the `shared_ptr` in the map, so no other thread can observe the instance yet. Tests that construct the class directly must respect this contract.
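The path-keyed `weak_ptr` bookkeeping described for the manager can be illustrated in isolation. The following is a minimal sketch, not the committed code: `FakeCache`, `PathKeyedRegistry`, and `entry_count` are hypothetical stand-ins introduced here, and the sketch assumes nothing about `XNNWeightsCache` beyond "has a path that is set before the instance is registered".

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for XNNWeightsCache: it only records its path.
struct FakeCache {
  std::string path;
};

// Hypothetical registry showing the same-path / different-path /
// empty-path behavior with weak_ptr entries.
class PathKeyedRegistry {
 public:
  std::shared_ptr<FakeCache> get_or_create(const std::string& path) {
    if (path.empty()) {
      // Empty path: one shared instance kept in its own slot.
      std::lock_guard<std::mutex> lock(empty_mutex_);
      if (auto live = empty_.lock()) {
        return live;
      }
      auto cache = std::make_shared<FakeCache>();
      empty_ = cache;
      return cache;
    }
    std::lock_guard<std::mutex> lock(meta_mutex_);
    auto it = caches_.find(path);
    if (it != caches_.end()) {
      if (auto live = it->second.lock()) {
        return live;  // A live instance exists for this path: share it.
      }
      caches_.erase(it);  // Expired entry: drop it and recreate below.
    }
    auto cache = std::make_shared<FakeCache>();
    cache->path = path;  // Configure before publishing into the map.
    caches_[path] = cache;
    return cache;
  }

  // Number of map entries, live or expired (empty-path slot excluded).
  std::size_t entry_count() const {
    std::lock_guard<std::mutex> lock(meta_mutex_);
    return caches_.size();
  }

 private:
  mutable std::mutex meta_mutex_;
  std::unordered_map<std::string, std::weak_ptr<FakeCache>> caches_;
  std::mutex empty_mutex_;
  std::weak_ptr<FakeCache> empty_;
};
```

Because the map stores only `weak_ptr`s, the registry never extends an instance's lifetime; the holders of the returned `shared_ptr`s do.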

`XnnpackBackendOptions`:

- Replaced `XNNWeightsCache weights_cache_` + `std::mutex weights_cache_mutex_` with `XNNWeightsCacheManager weights_cache_manager_`.
- New `get_or_create_weights_cache(path)` thin wrapper around the manager.
- `save_weights_cache_locked()` now walks every live cache via the manager's `save_all()`.
- `packed_cache_path_` keeps a small `path_mutex_` to serialize the `set_option(packed_cache_path_option_key)` → `init()` read; this is just transport for the option value, since the path's authoritative home is per-instance inside each cache.

`XNNPACKBackend`:

- `init`: pulls the path from per-PTE `runtime_spec` (see "Per-PTE caller signal" below), asks the manager for the shared `XNNWeightsCache`, locks its `mutex()` for the entire init→compileModel sequence, then publishes the `shared_ptr` into the executor via the new `XNNExecutor::set_weights_cache`. Same-path PTEs serialize on the same instance mutex; different-path PTEs hold different mutexes and proceed in parallel, whereas the singleton design forced full serialization here.
- `execute`: locks the per-executor cache's mutex (if any) instead of the global one. Concurrent execute on independent caches now runs in parallel.
- `destroy`: locks the per-executor cache's mutex and calls `delete_packed_data`. The local `shared_ptr` keeps the instance alive across `delete_packed_data` even if dropping it from the executor was the last outside reference.

`XNNExecutor`:

- New `std::shared_ptr<XNNWeightsCache> weights_cache_` member, set once after `compileModel`. Forward-declared (rather than including `XNNWeightsCache.h`) to keep the transitive `pte_data_map.h` dependency out of the executor's public header, which preserves the existing `xnnexecutor_test` build dep set.
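
The `execute` / `destroy` locking described above relies on two small idioms: a default-constructed `std::unique_lock` that is assigned only when a cache is present, and a local `shared_ptr` copy declared before the lock so the mutex is released before the cache can be destroyed. A minimal sketch follows; `LockedCache`, `FakeExecutor`, and `destroy_like` are hypothetical names for illustration, not the backend's types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a cache whose mutex is held by callers.
struct LockedCache {
  std::mutex& mutex() noexcept {
    return mu_;
  }
  void delete_packed_data(const std::vector<std::string>& names) {
    deleted += names.size();
  }
  std::size_t deleted = 0;

 private:
  std::mutex mu_;
};

// Hypothetical stand-in for the executor's per-PTE state.
struct FakeExecutor {
  std::shared_ptr<LockedCache> cache;
  std::vector<std::string> packed_names;
};

// Mimics the shape of destroy(): returns true if the cache lock was taken.
bool destroy_like(FakeExecutor& ex) {
  // Local copy first: it is destroyed last, after the lock below, so the
  // mutex is never unlocked on an already-destroyed cache.
  auto cache = ex.cache;
  std::unique_lock<std::mutex> lock;  // Default-constructed: owns nothing.
  if (cache) {
    lock = std::unique_lock<std::mutex>(cache->mutex());
  }
  const bool locked = lock.owns_lock();
  if (cache) {
    cache->delete_packed_data(ex.packed_names);
  }
  ex.cache.reset();  // The executor drops its reference; `cache` is still valid.
  return locked;
}
```

Local variables are destroyed in reverse declaration order, which is why declaring `cache` before `lock` matters in this sketch.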

Per-PTE caller signal (the FLLM / NGTTS isolation guarantee):

The manager's per-path dedup is necessary but not sufficient: if a non-opt-in PTE inherits an opt-in PTE's globally-set path, the manager hands it the same shared instance and the non-opt-in PTE's `reserve_space` writes into the opt-in model's mmap file. Investigation of the three on-device loaders confirms the concrete risk:

- cria PLLM pushes `packed_cache_path_option_key` globally (`runner_interface.h:365-373`);
- NGTTS sub-runners (`AcousticRunner` / `HfMimiRunner` / `SemanticLmRunner` at `executorch/examples/models/fb/llama4/runner/*.cpp`) bypass cria entirely and never push a path;
- the cria FLLM classifier path skips the push when `FactoryMetaData::useMmapPackedWeights = false` (`CriaHost.cpp:220-224`).

All three loaders run in the same process and share one `XnnpackBackend` global (`XNNPACKBackend.cpp:246`). Two complementary changes lock this down:

1. `XNNPACKBackend::init` no longer reads `options_.get_packed_cache_path()` (shared backend-singleton state). It reads the path strictly from `BackendInitContext::get_runtime_spec<const char*>(packed_cache_path_option_key)`, the only per-PTE signal that proves THIS PTE explicitly opted in. If `runtime_spec` carries no path, `cache_path` is empty and the manager hands out the shared `empty_path_cache_` (per the empty-path branch above). Non-opt-in PTEs are guaranteed isolated from the mmap-path file regardless of what the global path happens to hold; they still dedupe against one another in the shared empty-path cache.
2. cria `runner_interface.h::loadModel()` no longer pushes XNNPACK options globally via `executorch::runtime::set_option`. It now builds a `BackendOptions<3>` carrying path / `weight_cache_option_key` / `workspace_sharing_mode_option_key`, wraps it in a `LoadBackendOptionsMap`, and passes that map to every `Module::load_method` call (primary, multimethod loop, YOCO prefill/decode). The `BackendOptions` and map both live on the `loadModel` stack frame, which extends through every `load_method` call, so the Span lifetime requirements are satisfied.

Per-PTE options propagate into the backend's `BackendInitContext::runtime_spec` via `Method::init`'s `LoadBackendOptionsMap` path (`method.cpp:957-963`). Non-opt-in cria PTEs and non-cria loaders (NGTTS, direct-Module) simply don't pass a map → empty runtime_spec → init forces empty path → shared heap-only instance with dedup.

Lock hierarchy (updated):

- `weights_cache_manager_.meta_mutex_` (leaf: only during path-keyed map ops, never held across calls into instances)
- `weights_cache_manager_.empty_path_mutex_` (leaf: only during empty-path weak_ptr lookup/store)
- `XNNWeightsCache::instance_mutex_` (one per cache)
- `workspace_meta_mutex_`
- `workspace_mutex_` (owned by executor)

Race-condition / corner-case coverage:

- Same-path concurrent `get_or_create`: serialize on `meta_mutex_`, both return the same shared instance.
- Different-path concurrent `get_or_create`: parallel after the brief `meta_mutex_` window.
- Mid-load contention: same-path callers serialize on the instance mutex around `initialize_for_runtime`.
- Cross-PTE clobbering (the original bug): impossible, because each path owns its own instance.
- Cross-process same-path: existing `flock(LOCK_EX|LOCK_NB)` defense untouched.
- Cache file deleted on disk: existing mmap stays valid (unix unlink semantics); manager doesn't track disk state.
- Process shutdown mid-save: executor-held `shared_ptr` outlives the manager map; instance destruction follows the executor's normal teardown.
- XNNPACK seed mismatch / cache format bump: existing per-entry seed reject + v1-trailer reject paths untouched.
- Empty path: shared via `empty_path_cache_` weak_ptr; recreated when all shared_ptrs drop; never collides with any mmap-path instance.
- Concurrent same-cache execute + destroy: serialize on the instance mutex.
- Stale global path inherited by non-opt-in PTE: prevented by the runtime_spec-only path read in init.

Mirrored to `fbcode/executorch/backends/xnnpack/runtime/`. The cria change lives only under `xplat/cria/` (no fbcode mirror).

### Test plan

Built `fbsource//xplat/executorch/backends/xnnpack:xnnpack_backend` on linux, Apple, Android, and `fbcode//executorch/backends/xnnpack:xnnpack_backend` on linux: all green.

Built downstream consumers to verify the API change is binary-compatible: `fbsource//xplat/cria/core:cria{Apple,Android}`, `fbsource//xplat/sgr/ml_service/modules/llm:lib_sgr_llmApple`, `fbsource//xplat/assistant/oacr/trims/modules/ondevice_modules:mwa_ondevice_moduleApple`: all green.

```
buck2 test \
  fbcode//executorch/backends/xnnpack/test:test_xnn_weights_cache_manager \
  fbcode//executorch/backends/xnnpack/test:test_xnn_weights_cache \
  fbcode//executorch/backends/xnnpack/test:test_workspace_manager \
  fbcode//executorch/backends/xnnpack/test:xnnexecutor_test
→ Pass 38. Fail 0. Build failure 0.
```

The new `test_xnn_weights_cache_manager` exercises 13 hazard cases the manager handles: `SamePathReturnsSameInstance`, `DifferentPathsReturnDifferentInstances`, `EmptyPathSharedAcrossCallers`, `EmptyPathRecreatedAfterAllRefsDrop`, `EmptyPathDoesNotShareWithMmapPath`, `ExpiredEntryDoesNotLeak`, `ExpiredEntryRecreatedOnNextCall`, `ConcurrentSamePathSameInstance` (16-thread fan-in), `ConcurrentDifferentPathsIndependent` (8-thread fan-out), `SaveAllNoLiveInstancesIsOk`, `SaveAllWalksLiveCaches`, `SaveAllSkipsExpiredEntries`, `NonEmptyPathRegistersInMap`.

Cria runner tests (`fbsource//xplat/cria/core/runner/tests/...`): Pass 938, Fail 0. The 18 `Fatal` entries reported by buck2 (`PrefillReturnsLogits`, `PrefillMapsParams`, `PrefillStringPrompt`, etc.) reproduce identically on this commit's parent (605126226e, no cria change, no init guard) with the same OpenMP/MKL/ASan SEGV stack: `kmp_basic_flag_native::done_check` → `__kmp_hyper_barrier_release`, triggered by `mkl_blas_sgemm_omp_driver_v1` racing with `pthread_create` from `pthreadpool_create_v2`. These are pre-existing flakes in the asan-ubsan platform configuration, not caused by either the manager refactor or the runtime_spec migration.

`arc lint -a` clean across all 22 changed/added files (11 xplat + 11 fbcode mirrors; cria is xplat-only).

Differential Revision: D108431510
1 parent f61d7c1 commit 13a1032

10 files changed

Lines changed: 546 additions & 73 deletions


backends/xnnpack/runtime/XNNExecutor.h

Lines changed: 21 additions & 0 deletions
@@ -25,6 +25,9 @@ namespace backends {
 namespace xnnpack {
 namespace delegate {

+// Forward-declared to keep XNNWeightsCache.h out of this header.
+class XNNWeightsCache;
+
 class XNNExecutor {
  private:
   std::unique_ptr<xnn_runtime, decltype(&xnn_delete_runtime)> runtime_{
@@ -37,6 +40,10 @@ class XNNExecutor {
   std::vector<xnn_external_value> externals_;
   std::vector<std::string> packed_data_names_;
   std::shared_ptr<XNNWorkspace> workspace_;
+  // Owned so the cache outlives delete_packed_data in destroy(),
+  // even when every other executor sharing it is gone. Empty when no
+  // file-backed cache is in use.
+  std::shared_ptr<XNNWeightsCache> weights_cache_;
   std::atomic<bool> in_use_{false};
   std::atomic<bool> destroyed_{false};

@@ -71,6 +78,20 @@ class XNNExecutor {
     return workspace_;
   }

+  // Set once by XNNPACKBackend::init after compileModel succeeds. Pass
+  // an empty shared_ptr if no file-backed cache is in use for this PTE
+  // (treated identically to never calling this).
+  inline void set_weights_cache(std::shared_ptr<XNNWeightsCache> cache) {
+    weights_cache_ = std::move(cache);
+  }
+
+  // Returns the per-PTE weights cache shared_ptr (may be empty). Used
+  // by XNNPACKBackend::execute to lock the cache's mutex around runtime
+  // invocation, and by destroy() to invoke delete_packed_data.
+  inline std::shared_ptr<XNNWeightsCache> get_weights_cache() const {
+    return weights_cache_;
+  }
+
   /**
    * Initialize the XNNExecutor with a given runtime and input/output ids.
    * The input/output ids are expected to be sorted in order of their

backends/xnnpack/runtime/XNNPACKBackend.cpp

Lines changed: 45 additions & 21 deletions
@@ -91,18 +91,28 @@ class XnnpackBackend final
     auto workspace = workspace_result.get();

     bool use_weight_cache = options_.resolve_weight_cache(context);
-    // Hold the lock for the entire init-compile-finalize sequence to prevent
-    // concurrent inits from resetting is_finalized_ or overwriting
-    // named_data_map_ while compileModel is using the shared weights cache.
-    std::unique_lock<std::mutex> lock_weights_cache(
-        options_.weights_cache_mutex(), std::defer_lock);
+    // Per-path weights cache: same-path PTEs share an instance and
+    // serialize on its mutex; different paths run in parallel.
+    std::shared_ptr<xnnpack::delegate::XNNWeightsCache> weights_cache;
+    std::unique_lock<std::mutex> lock_weights_cache;
     if (use_weight_cache) {
-      lock_weights_cache.lock();
-
-      const auto& cache_path = options_.get_packed_cache_path();
-      options_.weights_cache().set_packed_cache_path(cache_path);
+      // Only honor a path coming through runtime_spec (per-PTE opt-in).
+      // Reading the backend-singleton global would let a non-opt-in PTE
+      // inherit another model's cache file.
+      std::string cache_path;
+      auto path_spec = context.get_runtime_spec<const char*>(
+          xnnpack::packed_cache_path_option_key);
+      if (path_spec.ok()) {
+        cache_path = path_spec.get();
+      }
+      auto wc_result = options_.get_or_create_weights_cache(cache_path);
+      if (!wc_result.ok()) {
+        return wc_result.error();
+      }
+      weights_cache = wc_result.get();
+      lock_weights_cache = std::unique_lock<std::mutex>(weights_cache->mutex());

-      options_.weights_cache().initialize_for_runtime(
+      weights_cache->initialize_for_runtime(
           context.get_runtime_allocator(), named_data_map);
       workspace->set_uses_weight_cache();
     }
@@ -118,7 +128,7 @@ class XnnpackBackend final
         processed->data(),
         processed->size(),
         executor,
-        &options_.weights_cache(),
+        weights_cache.get(),
         workspace_ptr,
         named_data_map,
         use_weight_cache);
@@ -135,6 +145,12 @@ class XnnpackBackend final
       return err;
     }

+    // Hand the cache to the executor (held by shared_ptr so it
+    // outlives any sibling executors that share it).
+    if (use_weight_cache) {
+      executor->set_weights_cache(std::move(weights_cache));
+    }
+
     return executor;
   }

@@ -146,10 +162,12 @@ class XnnpackBackend final

     auto workspace = executor->get_workspace();

-    std::unique_lock<std::mutex> lock_weights_cache(
-        options_.weights_cache_mutex(), std::defer_lock);
-    if (executor->uses_weight_cache() || workspace->uses_weight_cache()) {
-      lock_weights_cache.lock();
+    // Lock the cache shared with sibling executors at the same path.
+    // Empty cache → PTE didn't opt into file-backed mode.
+    auto cache = executor->get_weights_cache();
+    std::unique_lock<std::mutex> lock_weights_cache;
+    if (cache) {
+      lock_weights_cache = std::unique_lock<std::mutex>(cache->mutex());
     }

     auto [raii_lock, _] = workspace->acquire();
@@ -176,17 +194,21 @@ class XnnpackBackend final
     if (handle != nullptr) {
       auto executor = static_cast<xnnpack::delegate::XNNExecutor*>(handle);
       auto workspace = executor->get_workspace();
+      auto cache = executor->get_weights_cache();

-      const std::lock_guard<std::mutex> lock_weights_cache(
-          options_.weights_cache_mutex());
+      // Local shared_ptr keeps the instance alive through
+      // delete_packed_data even if the executor was the last holder.
+      std::unique_lock<std::mutex> lock_weights_cache;
+      if (cache) {
+        lock_weights_cache = std::unique_lock<std::mutex>(cache->mutex());
+      }

 #ifdef ENABLE_XNNPACK_PROFILING
       executor->print_avg_op_timings();
 #endif

-      if (executor->uses_weight_cache()) {
-        options_.weights_cache().delete_packed_data(
-            executor->get_packed_data_names());
+      if (cache && executor->uses_weight_cache()) {
+        cache->delete_packed_data(executor->get_packed_data_names());
       }

       // This is needed to serialize access to xnn_delete_runtime which is not
@@ -237,7 +259,9 @@ class XnnpackBackend final
   mutable xnnpack::XnnpackBackendOptions options_;

   // Lock hierarchy for mutexes:
-  // options_.weights_cache_mutex()
+  // weights_cache_manager_.meta_mutex_ (leaf — held only during
+  //   get_or_create map ops)
+  // XNNWeightsCache::instance_mutex_ (one per cache instance)
   // workspace_meta_mutex_
   // workspace_mutex_ (owned by executor)
 };
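
The path resolution in `init` above honors only the per-PTE `runtime_spec` and ignores the backend-wide option. A minimal sketch of that rule follows; `GlobalOptions` and `resolve_cache_path` are hypothetical names, and `std::optional<const char*>` stands in for the `Result<const char*>` that `get_runtime_spec` returns in the diff.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for backend-wide option state that any loader in
// the process may have set.
struct GlobalOptions {
  std::string packed_cache_path;
};

// Returns the cache path for one PTE. Only the per-PTE spec is consulted;
// the global value is deliberately ignored so that a PTE that did not opt
// in can never inherit another model's cache file.
std::string resolve_cache_path(
    const std::optional<const char*>& per_pte_spec,
    const GlobalOptions& /*global: intentionally unused*/) {
  std::string cache_path;
  if (per_pte_spec.has_value() && *per_pte_spec != nullptr) {
    cache_path = *per_pte_spec;
  }
  return cache_path;  // Empty means "use the shared heap-only cache".
}
```

With a global path set by one loader, a PTE that passes no spec still resolves to the empty path, and a PTE that passes its own spec gets exactly that path.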

backends/xnnpack/runtime/XNNWeightsCache.h

Lines changed: 20 additions & 8 deletions
@@ -14,6 +14,7 @@
 #include <executorch/runtime/core/memory_allocator.h>
 #include <executorch/runtime/core/result.h>
 #include <executorch/runtime/executor/pte_data_map.h>
+#include <mutex>
 #include <string>
 #include <unordered_map>
 #include <vector>
@@ -139,21 +140,28 @@ class XNNWeightsCache {
   Error delete_packed_data(const std::vector<std::string>& packed_names);

   /**
-   * Set the path for the file-backed packed weight storage.
-   * When set, reserve_space() allocates from a MAP_SHARED file instead
-   * of heap, and finalize_for_runtime() calls msync to make pages clean.
+   * Set the file-backed storage path. When set, reserve_space()
+   * allocates from a MAP_SHARED file instead of heap, and
+   * finalize_for_runtime() msyncs pages.
    *
-   * The path MUST be unique per XNNWeightsCache instance — sharing it
-   * across instances (or processes) would mean O_TRUNC corrupts the other
-   * holder's mappings (SIGBUS on access). initialize_for_runtime() takes
-   * an advisory exclusive flock on the file; if the lock fails the mmap
-   * path is disabled for this instance and allocations fall back to heap.
+   * Call once, before any other method, and never again. Two
+   * instances sharing the same path will corrupt each other on
+   * O_TRUNC (SIGBUS); the manager prevents this by per-path dedup.
    */
   void set_packed_cache_path(const std::string& path);

   /** Save packed weight index so subsequent loads skip packing. */
   Error save_packed_index();

+  /**
+   * Per-instance mutex. The cache has no internal synchronization;
+   * callers must hold this around every method call and every
+   * XNNPACK callback that touches the cache during xnn_create_runtime.
+   */
+  std::mutex& mutex() noexcept {
+    return instance_mutex_;
+  }
+
  private:
   static constexpr uint32_t kCacheMagic = 0x58505743; // "XPWC"
   // Bump when the on-disk layout (footer or per-entry record) changes.
@@ -215,6 +223,10 @@ class XNNWeightsCache {
   // in mmap_regions_, so delete_packed_data() can munmap when ref_count==0.
   std::unordered_map<void*, size_t> file_ptr_to_region_index_;

+  // See mutex() for the locking contract — caller-owned, no internal
+  // use within this class.
+  std::mutex instance_mutex_;
+
   // Function pointers to override XNNPACK's default xnn_weights_cache_provider
   // functions.
   static size_t look_up(

backends/xnnpack/runtime/XNNWeightsCacheManager.cpp

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+/*
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
+ * All rights reserved.
+ *
+ * This source code is licensed under the BSD-style license found in the
+ * LICENSE file in the root directory of this source tree.
+ */
+
+#include <executorch/backends/xnnpack/runtime/XNNWeightsCacheManager.h>
+
+#include <executorch/runtime/core/error.h>
+
+#include <utility>
+#include <vector>
+
+namespace executorch::backends::xnnpack {
+
+using executorch::runtime::Error;
+using executorch::runtime::Result;
+
+Result<std::shared_ptr<delegate::XNNWeightsCache>>
+XNNWeightsCacheManager::get_or_create(const std::string& cache_file_path) {
+  // Empty path → one shared heap-only instance. See header for why.
+  if (cache_file_path.empty()) {
+    std::scoped_lock<std::mutex> lock(empty_path_mutex_);
+    if (auto live = empty_path_cache_.lock()) {
+      return live;
+    }
+    auto cache = std::make_shared<delegate::XNNWeightsCache>();
+    empty_path_cache_ = cache;
+    return cache;
+  }
+
+  std::scoped_lock<std::mutex> lock(meta_mutex_);
+  auto it = caches_.find(cache_file_path);
+  if (it != caches_.end()) {
+    if (auto live = it->second.lock()) {
+      return live;
+    }
+    caches_.erase(it);
+  }
+
+  auto cache = std::make_shared<delegate::XNNWeightsCache>();
+  // Set path before publishing into the map so concurrent callers
+  // observe a fully initialized instance.
+  cache->set_packed_cache_path(cache_file_path);
+  caches_[cache_file_path] = cache;
+  return cache;
+}
+
+Error XNNWeightsCacheManager::save_all() {
+  // Snapshot live shared_ptrs under meta_mutex_, then release it
+  // before per-instance save (honors lock order, lets get_or_create
+  // on unrelated paths proceed during the save walk).
+  std::vector<std::shared_ptr<delegate::XNNWeightsCache>> live;
+  {
+    std::scoped_lock<std::mutex> lock(meta_mutex_);
+    live.reserve(caches_.size());
+    for (auto it = caches_.begin(); it != caches_.end();) {
+      if (auto cache = it->second.lock()) {
+        live.push_back(std::move(cache));
+        ++it;
+      } else {
+        it = caches_.erase(it);
+      }
+    }
+  }
+
+  Error first_err = Error::Ok;
+  for (auto& cache : live) {
+    std::lock_guard<std::mutex> lock(cache->mutex());
+    Error err = cache->save_packed_index();
+    if (err != Error::Ok && first_err == Error::Ok) {
+      first_err = err;
+    }
+  }
+  return first_err;
+}
+
+size_t XNNWeightsCacheManager::live_count() const {
+  std::scoped_lock<std::mutex> lock(meta_mutex_);
+  size_t count = 0;
+  for (const auto& entry : caches_) {
+    if (!entry.second.expired()) {
+      ++count;
+    }
+  }
+  return count;
+}
+
+} // namespace executorch::backends::xnnpack
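
The `save_all()` walk above combines three behaviors: snapshot under the registry lock, save outside it under each instance's own mutex, and keep going after a failure while remembering the first error. A minimal self-contained sketch of that shape follows; `Status`, `SavableCache`, and `SaveRegistry` are hypothetical stand-ins, not ExecuTorch types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical error code standing in for executorch::runtime::Error.
enum class Status { Ok, IoError };

// Hypothetical cache: save_packed_index() returns a preset result.
struct SavableCache {
  Status result = Status::Ok;
  int saves = 0;
  std::mutex& mutex() noexcept {
    return mu_;
  }
  Status save_packed_index() {
    ++saves;
    return result;
  }

 private:
  std::mutex mu_;
};

class SaveRegistry {
 public:
  void add(const std::string& key, const std::shared_ptr<SavableCache>& c) {
    std::lock_guard<std::mutex> lock(meta_mutex_);
    caches_[key] = c;
  }

  Status save_all() {
    // Phase 1: snapshot live instances and prune expired entries while
    // holding only the registry lock.
    std::vector<std::shared_ptr<SavableCache>> live;
    {
      std::lock_guard<std::mutex> lock(meta_mutex_);
      for (auto it = caches_.begin(); it != caches_.end();) {
        if (auto c = it->second.lock()) {
          live.push_back(std::move(c));
          ++it;
        } else {
          it = caches_.erase(it);
        }
      }
    }
    // Phase 2: registry lock released; take each instance lock in turn.
    Status first = Status::Ok;
    for (auto& c : live) {
      std::lock_guard<std::mutex> lock(c->mutex());
      const Status s = c->save_packed_index();
      if (s != Status::Ok && first == Status::Ok) {
        first = s;  // Remember the first failure but keep saving the rest.
      }
    }
    return first;
  }

  std::size_t entry_count() const {
    std::lock_guard<std::mutex> lock(meta_mutex_);
    return caches_.size();
  }

 private:
  mutable std::mutex meta_mutex_;
  std::unordered_map<std::string, std::weak_ptr<SavableCache>> caches_;
};
```

Holding the snapshot as `shared_ptr`s keeps every instance alive for the duration of the walk even if its last executor is destroyed concurrently.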

backends/xnnpack/runtime/XNNWeightsCacheManager.h

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+/*
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
+ * All rights reserved.
+ *
+ * This source code is licensed under the BSD-style license found in the
+ * LICENSE file in the root directory of this source tree.
+ */
+
+#pragma once
+
+#include <executorch/backends/xnnpack/runtime/XNNWeightsCache.h>
+#include <executorch/runtime/core/error.h>
+#include <executorch/runtime/core/result.h>
+
+#include <memory>
+#include <mutex>
+#include <string>
+#include <unordered_map>
+
+namespace executorch::backends::xnnpack {
+
+/**
+ * One `XNNWeightsCache` per cache file path. Mirrors
+ * `XNNWorkspaceManager`'s PerModel pattern with `weak_ptr` so
+ * instances live as long as the executors owning them.
+ *
+ * Per-path keying prevents `initialize_for_runtime` from a second
+ * path tearing down the first path's fd / mmap regions (SIGBUS).
+ *
+ * Empty path returns one shared heap-only instance so callers
+ * without a file still get XNNPACK's in-memory name dedup.
+ *
+ * Lock order: `meta_mutex_` → `XNNWeightsCache::mutex()` →
+ * `XNNWorkspaceManager::workspace_meta_mutex_` → `XNNWorkspace::mutex_`.
+ */
+class XNNWeightsCacheManager {
+ public:
+  XNNWeightsCacheManager() = default;
+  ~XNNWeightsCacheManager() = default;
+
+  XNNWeightsCacheManager(const XNNWeightsCacheManager&) = delete;
+  XNNWeightsCacheManager& operator=(const XNNWeightsCacheManager&) = delete;
+  XNNWeightsCacheManager(XNNWeightsCacheManager&&) = delete;
+  XNNWeightsCacheManager& operator=(XNNWeightsCacheManager&&) = delete;
+
+  /** Shared `XNNWeightsCache` for `cache_file_path`. Empty path
+   * returns one shared heap-only instance. Never null on success. */
+  runtime::Result<std::shared_ptr<delegate::XNNWeightsCache>> get_or_create(
+      const std::string& cache_file_path);
+
+  /** Walk live caches and call `save_packed_index()` on each under
+   * its per-instance mutex. Returns the first error; keeps going so
+   * one failure doesn't strand the others. Opportunistically erases
+   * expired weak_ptrs. */
+  runtime::Error save_all();
+
+  /** Test-only: count of live (non-expired) entries. */
+  size_t live_count() const;
+
+ private:
+  mutable std::mutex meta_mutex_;
+  std::unordered_map<std::string, std::weak_ptr<delegate::XNNWeightsCache>>
+      caches_;
+
+  // Separate slot for the empty-path (heap-only) cache to avoid
+  // string-hashing and contention with mmap-path callers.
+  mutable std::mutex empty_path_mutex_;
+  std::weak_ptr<delegate::XNNWeightsCache> empty_path_cache_;
+};
+
+} // namespace executorch::backends::xnnpack
