Commit 13a1032
Summary:
Split `XNNWeightsCache` out of its de-facto singleton lifetime (one
instance per `XnnpackBackend`, shared across every PTE that opted into
the file-backed cache) into a per-cache-file-path instance dispensed by
a new `XNNWeightsCacheManager`. Mirrors the `XNNWorkspaceManager`
PerModel pattern: same path → same shared instance; different paths →
independent instances; empty path → one shared heap-only instance so
XNNPACK's in-memory name dedup still works across PTEs.
Until now the singleton design was forced because
`XnnpackBackendOptions::weights_cache_` is a by-value member and
`XnnpackBackend` itself is a namespace-scope global
(`XNNPACKBackend.cpp:246`). For PTEs that genuinely share a cache file,
the singleton's per-entry refcounting works. But two PTEs with
**different** packed-cache paths hit the same `XNNWeightsCache`
instance, so the second PTE's `initialize_for_runtime` calls
`ftruncate(0)` on the file under the first PTE's still-live mmap regions,
and every subsequent access to those regions raises SIGBUS (P2369924970
traces this). The constraint the prior fix documented, the warning in
`XNNWeightsCache.h:147-151` that "the path MUST be unique per
`XNNWeightsCache` instance", is now enforced at the manager level rather
than relying on each caller to honor it.

Design:
`XNNWeightsCacheManager` (new, mirrors `XNNWorkspaceManager`):
- `std::unordered_map<std::string, std::weak_ptr<XNNWeightsCache>>
caches_` keyed by absolute cache file path.
- `get_or_create(path)` looks up the path under a single `meta_mutex_`;
if a live `weak_ptr` exists, returns the shared instance, otherwise
constructs a new one, calls `set_packed_cache_path(path)` BEFORE
registering it, and stores a `weak_ptr`.
- `meta_mutex_` is held only during the map op — never across any call
into `XNNWeightsCache`, so different-path callers proceed in parallel
after the brief window.
- `save_all()` snapshots live shared_ptrs under `meta_mutex_`, then
iterates outside the meta lock and acquires each instance's own mutex
around `save_packed_index()`.
- Empty path uses a separate `empty_path_cache_` (weak_ptr) +
`empty_path_mutex_`: all heap-only callers (NGTTS sub-runners, FLLM
classifier, PLLM methods when mmap MC is off) share one instance so
XNNPACK's in-memory `look_up_or_insert` name dedup catches duplicate
weights across PTEs and across methods within a PTE. Without this
sharing, every `XnnpackBackend::init` allocated its own packed copy of
every weight, regressing heap-only memory by roughly 470 MB on
LoRA-multimethod PLLM (paste P2380809516: app_phys peak 1731 MB with
per-instance caches vs ~1260 MB with the prior process-singleton). The
hazard that motivates per-path keying (a non-opt-in PTE inheriting an
opt-in PTE's path and writing into its mmap file) never applies to
empty-path callers because they hold no path, no fd, and no mmap regions,
so sharing among empty-path callers carries no isolation cost.
- Expired `weak_ptr` entries are erased opportunistically on the next
`get_or_create` / `save_all` for that path. Stale entries from
never-revisited paths linger; cost is one string + weak_ptr per dead
entry. Acceptable per `XNNWorkspaceManager` precedent.
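
The lookup rules above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `WeightsCacheSketch` and `WeightsCacheManagerSketch` are hypothetical stand-ins, and only the member names `caches_`, `meta_mutex_`, `empty_path_cache_`, and `empty_path_mutex_` are taken from the description.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for XNNWeightsCache; only the path setter matters here.
class WeightsCacheSketch {
 public:
  void set_packed_cache_path(std::string path) { path_ = std::move(path); }
  const std::string& path() const { return path_; }

 private:
  std::string path_;
};

// Hypothetical stand-in for XNNWeightsCacheManager.
class WeightsCacheManagerSketch {
 public:
  std::shared_ptr<WeightsCacheSketch> get_or_create(const std::string& path) {
    if (path.empty()) {
      // Heap-only callers share one instance, tracked outside the map.
      std::lock_guard<std::mutex> lock(empty_path_mutex_);
      std::shared_ptr<WeightsCacheSketch> cache = empty_path_cache_.lock();
      if (!cache) {
        cache = std::make_shared<WeightsCacheSketch>();
        empty_path_cache_ = cache;
      }
      return cache;
    }

    // The lock covers only the map operation, never a call into the cache
    // that other threads could already observe.
    std::lock_guard<std::mutex> lock(meta_mutex_);
    auto it = caches_.find(path);
    if (it != caches_.end()) {
      if (std::shared_ptr<WeightsCacheSketch> live = it->second.lock()) {
        return live;  // same path, same shared instance
      }
      caches_.erase(it);  // opportunistic cleanup of an expired entry
    }
    auto cache = std::make_shared<WeightsCacheSketch>();
    cache->set_packed_cache_path(path);  // set the path BEFORE registering
    caches_.emplace(path, cache);        // the map stores only a weak_ptr
    return cache;
  }

 private:
  std::mutex meta_mutex_;
  std::unordered_map<std::string, std::weak_ptr<WeightsCacheSketch>> caches_;
  std::mutex empty_path_mutex_;
  std::weak_ptr<WeightsCacheSketch> empty_path_cache_;
};
```

Because the map holds only `weak_ptr`s, the manager never extends a cache's lifetime; an instance goes away when the last holder of its `shared_ptr` releases it.
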
`XNNWeightsCache`:
- New `std::mutex instance_mutex_` member + `mutex()` accessor. The
class has no internal synchronization; callers are responsible for
holding `mutex()` around every method invocation, INCLUDING the XNNPACK
callback paths (`look_up`, `reserve_space`, `look_up_or_insert`) that
fire during `xnn_create_runtime`.
- `set_packed_cache_path` is documented as call-once-before-publish:
production callers go through the manager, which sets the path before
installing the `shared_ptr` in the map, so no other thread can observe
the instance yet. Tests that construct the class directly must respect
this contract.
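
A minimal sketch of the caller-side locking contract, assuming a simplified cache whose methods are not thread-safe on their own. `UnsyncedCacheSketch` and `look_up_or_insert_locked` are hypothetical names used only for illustration; the real class's method signatures differ.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical cache with no internal synchronization, mirroring the
// "callers hold mutex()" contract described above.
class UnsyncedCacheSketch {
 public:
  std::mutex& mutex() { return instance_mutex_; }

  // NOT thread-safe by itself: returns the slot for `name`, inserting a new
  // slot on first sight so repeated names dedupe to one entry.
  std::size_t look_up_or_insert(const std::string& name) {
    return slots_.try_emplace(name, slots_.size()).first->second;
  }

  std::size_t size() const { return slots_.size(); }

 private:
  std::mutex instance_mutex_;
  std::unordered_map<std::string, std::size_t> slots_;
};

// Every call site takes the instance mutex around the method invocation.
std::size_t look_up_or_insert_locked(
    UnsyncedCacheSketch& cache,
    const std::string& name) {
  std::lock_guard<std::mutex> lock(cache.mutex());
  return cache.look_up_or_insert(name);
}
```
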
`XnnpackBackendOptions`:
- Replaced `XNNWeightsCache weights_cache_` + `std::mutex
weights_cache_mutex_` with `XNNWeightsCacheManager
weights_cache_manager_`.
- New `get_or_create_weights_cache(path)` thin wrapper around the
manager.
- `save_weights_cache_locked()` now walks every live cache via the
manager's `save_all()`.
- `packed_cache_path_` keeps a small `path_mutex_` to serialize the
`set_option(packed_cache_path_option_key)` → `init()` read. This is only
transport for the option value; the path's authoritative home is
per-instance, inside each cache.
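
The snapshot-then-iterate shape of `save_all()` might look like the sketch below. The types `SavableCacheSketch` and `SaveAllSketch`, the `track` helper, and the return value are assumptions made for illustration; only `meta_mutex_`, `caches_`, and `save_packed_index()` come from the description.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical cache: counts how often its index was saved.
struct SavableCacheSketch {
  std::mutex instance_mutex;
  int save_calls = 0;
  void save_packed_index() { ++save_calls; }
};

class SaveAllSketch {
 public:
  // Hypothetical helper so the sketch can register caches directly.
  void track(const std::string& path,
             const std::shared_ptr<SavableCacheSketch>& cache) {
    std::lock_guard<std::mutex> lock(meta_mutex_);
    caches_[path] = cache;
  }

  // Returns how many live caches were saved.
  std::size_t save_all() {
    std::vector<std::shared_ptr<SavableCacheSketch>> live;
    {
      // Step 1: snapshot live instances under the meta lock, pruning
      // expired entries along the way.
      std::lock_guard<std::mutex> lock(meta_mutex_);
      for (auto it = caches_.begin(); it != caches_.end();) {
        if (std::shared_ptr<SavableCacheSketch> cache = it->second.lock()) {
          live.push_back(std::move(cache));
          ++it;
        } else {
          it = caches_.erase(it);
        }
      }
    }
    // Step 2: the meta lock is released; take each instance's own mutex.
    for (const auto& cache : live) {
      std::lock_guard<std::mutex> lock(cache->instance_mutex);
      cache->save_packed_index();
    }
    return live.size();
  }

  std::size_t tracked() {
    std::lock_guard<std::mutex> lock(meta_mutex_);
    return caches_.size();
  }

 private:
  std::mutex meta_mutex_;
  std::unordered_map<std::string, std::weak_ptr<SavableCacheSketch>> caches_;
};
```

Releasing the meta lock before touching any instance keeps it a leaf lock, so a slow save on one cache does not block `get_or_create` for unrelated paths.
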
`XNNPACKBackend`:
- `init`: pulls the path from per-PTE `runtime_spec` (see "Per-PTE
caller signal" below), asks the manager for the shared
`XNNWeightsCache`, locks its `mutex()` for the entire init→compileModel
sequence, then publishes the `shared_ptr` into the executor via the new
`XNNExecutor::set_weights_cache`. Same-path PTEs serialize on the same
instance mutex; different-path PTEs hold different mutexes and proceed
in parallel — the singleton design forced full serialization here.
- `execute`: lock the per-executor cache's mutex (if any) instead of the
global one. Concurrent execute on independent caches now runs in
parallel.
- `destroy`: lock the per-executor cache's mutex, call
`delete_packed_data`. The local `shared_ptr` keeps the instance alive
across `delete_packed_data` even if dropping it from the executor was
the last outside reference.
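
The `destroy` ordering can be illustrated with a small sketch. It assumes the executor's reference is dropped as part of destroy; `PackedCacheSketch`, `ExecutorSketch`, and `destroy_sketch` are hypothetical names, not the types in this commit.

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical cache that owns its own mutex.
struct PackedCacheSketch {
  std::mutex instance_mutex;
  bool packed_data_deleted = false;
  void delete_packed_data() { packed_data_deleted = true; }
};

// Hypothetical executor holding the shared cache reference.
struct ExecutorSketch {
  std::shared_ptr<PackedCacheSketch> weights_cache;
};

void destroy_sketch(ExecutorSketch& executor) {
  // Move the reference into a local so the instance stays alive for the
  // whole critical section, even if the executor held the last reference.
  std::shared_ptr<PackedCacheSketch> cache = std::move(executor.weights_cache);
  if (!cache) {
    return;  // nothing to release when no cache was attached
  }
  {
    std::lock_guard<std::mutex> lock(cache->instance_mutex);
    cache->delete_packed_data();
  }  // unlock first: the mutex lives inside the object `cache` may destroy
}
```

The inner scope matters: the lock must be released before the local `shared_ptr` goes out of scope, otherwise the last release would destroy a mutex that is still held.
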
`XNNExecutor`:
- New `std::shared_ptr<XNNWeightsCache> weights_cache_` member, set once
after `compileModel`. Forward-declared (rather than including
`XNNWeightsCache.h`) to keep the transitive `pte_data_map.h` dependency
out of the executor's public header — preserves the existing
`xnnexecutor_test` build dep set.

Per-PTE caller signal (the FLLM / NGTTS isolation guarantee):
The manager's per-path dedup is necessary but not sufficient — if a
non-opt-in PTE inherits an opt-in PTE's globally-set path, the manager
hands it the same shared instance and the non-opt-in PTE's
`reserve_space` writes into the opt-in model's mmap file. Investigation
of the three on-device loaders confirms the concrete risk: cria PLLM
pushes `packed_cache_path_option_key` globally
(`runner_interface.h:365-373`), but NGTTS sub-runners (`AcousticRunner`
/ `HfMimiRunner` / `SemanticLmRunner` at
`executorch/examples/models/fb/llama4/runner/*.cpp`) bypass cria
entirely and never push a path, and the cria FLLM classifier path skips
the push when `FactoryMetaData::useMmapPackedWeights = false`
(`CriaHost.cpp:220-224`). All three loaders run in the same process and
share one `XnnpackBackend` global (`XNNPACKBackend.cpp:246`).
Two complementary changes lock this down:
1. `XNNPACKBackend::init` no longer reads
`options_.get_packed_cache_path()` (shared backend-singleton state). It
reads the path strictly from `BackendInitContext::get_runtime_spec<const
char*>(packed_cache_path_option_key)` — the only per-PTE signal that
proves THIS PTE explicitly opted in. If `runtime_spec` carries no path,
`cache_path` is empty and the manager hands the shared
`empty_path_cache_` (per the empty-path branch above). Non-opt-in PTEs
are guaranteed isolated from the mmap-path file regardless of what the
global path happens to hold; they still dedupe against one another in
the shared empty-path cache.
2. cria `runner_interface.h::loadModel()` no longer pushes XNNPACK
options globally via `executorch::runtime::set_option`. It now builds a
`BackendOptions<3>` carrying path / `weight_cache_option_key` /
`workspace_sharing_mode_option_key`, wraps it in a
`LoadBackendOptionsMap`, and passes that map to every
`Module::load_method` call (primary, multimethod loop, YOCO
prefill/decode). The `BackendOptions` and map both live on the
`loadModel` stack frame, which outlives every `load_method` call, so
the Span lifetime requirements are satisfied. Per-PTE options propagate into
the backend's `BackendInitContext::runtime_spec` via `Method::init`'s
`LoadBackendOptionsMap` path (`method.cpp:957-963`). Non-opt-in cria
PTEs and non-cria loaders (NGTTS, direct-Module) simply don't pass a map
→ empty runtime_spec → init forces empty path → shared heap-only
instance with dedup.
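
The path-resolution rule in change 1 reduces to something like the sketch below. It deliberately does not use the real `BackendInitContext` or `BackendOptions` APIs: `RuntimeSpecSketch`, `kPackedCachePathKey`, and `resolve_cache_path` are hypothetical stand-ins, and the key string is an assumption.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the per-PTE runtime spec: option key -> value.
using RuntimeSpecSketch = std::unordered_map<std::string, std::string>;

// Hypothetical key string; the real key is packed_cache_path_option_key.
constexpr const char* kPackedCachePathKey = "packed_cache_path";

// Resolve the cache path for ONE PTE. The backend-global path is accepted
// only to show that it is deliberately ignored.
std::string resolve_cache_path(
    const RuntimeSpecSketch& runtime_spec,
    const std::string& global_path) {
  (void)global_path;  // shared singleton state is never consulted
  const auto it = runtime_spec.find(kPackedCachePathKey);
  // No per-PTE entry means "not opted in": an empty path selects the
  // shared heap-only cache.
  return it == runtime_spec.end() ? std::string() : it->second;
}
```

With this rule, a stale global path left behind by an opt-in loader cannot leak into a PTE that passed no map of its own.
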

Lock hierarchy (updated):
- `weights_cache_manager_.meta_mutex_` (leaf — only during path-keyed
map ops, never held across calls into instances)
- `weights_cache_manager_.empty_path_mutex_` (leaf — only during
empty-path weak_ptr lookup/store)
- `XNNWeightsCache::instance_mutex_` (one per cache)
- `workspace_meta_mutex_`
- `workspace_mutex_` (owned by executor)

Race-condition / corner-case coverage:
- Same-path concurrent `get_or_create`: serialize on `meta_mutex_`, both
return the same shared instance.
- Different-path concurrent `get_or_create`: parallel after the brief
`meta_mutex_` window.
- Mid-load contention: same-path callers serialize on the instance mutex
around `initialize_for_runtime`.
- Cross-PTE clobbering (the original bug): impossible — each path owns
its own instance.
- Cross-process same-path: existing `flock(LOCK_EX|LOCK_NB)` defense
untouched.
- Cache file deleted on disk: existing mmap stays valid (unix unlink
semantics); manager doesn't track disk state.
- Process shutdown mid-save: executor-held `shared_ptr` outlives the
manager map; instance destruction follows the executor's normal
teardown.
- XNNPACK seed mismatch / cache format bump: existing per-entry seed
reject + v1-trailer reject paths untouched.
- Empty path: shared via `empty_path_cache_` weak_ptr; recreated when
all shared_ptrs drop; never collides with any mmap-path instance.
- Concurrent same-cache execute + destroy: serialize on the instance
mutex.
- Stale global path inherited by non-opt-in PTE: prevented by the
runtime_spec-only path read in init.
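
The "cache file deleted on disk" case relies on standard POSIX behavior, which a small standalone check can demonstrate. This sketch is independent of the XNNPACK code; the temp-file template and the function name are illustrative assumptions.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Returns true if bytes written through a shared mapping are still readable
// after the backing file has been unlinked.
bool mapping_survives_unlink() {
  char path[] = "/tmp/weights_cache_sketch_XXXXXX";  // illustrative template
  const int fd = mkstemp(path);
  if (fd < 0) {
    return false;
  }
  const std::size_t size = 4096;
  if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
    close(fd);
    unlink(path);
    return false;
  }
  void* region =
      mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);  // the mapping keeps its own reference to the file
  if (region == MAP_FAILED) {
    unlink(path);
    return false;
  }
  std::memcpy(region, "packed", 7);
  // Removing the directory entry does not invalidate the live mapping.
  const bool unlinked = (unlink(path) == 0);
  const bool intact = (std::memcmp(region, "packed", 7) == 0);
  munmap(region, size);
  return unlinked && intact;
}
```

Contrast this with the original bug: unlinking leaves the mapped pages backed, whereas truncating the file to zero length removes the backing for them, which is why access after `ftruncate(0)` faults.
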
Mirrored to `fbcode/executorch/backends/xnnpack/runtime/`. The cria
change lives only under `xplat/cria/` (no fbcode mirror).
### Test plan
Built `fbsource//xplat/executorch/backends/xnnpack:xnnpack_backend` on
linux, Apple, Android, and
`fbcode//executorch/backends/xnnpack:xnnpack_backend` on linux — all
green. Built downstream consumers to verify that they still compile and
link against the changed API: `fbsource//xplat/cria/core:cria{Apple,Android}`,
`fbsource//xplat/sgr/ml_service/modules/llm:lib_sgr_llmApple`,
`fbsource//xplat/assistant/oacr/trims/modules/ondevice_modules:mwa_ondevice_moduleApple`
— all green.
```
buck2 test \
fbcode//executorch/backends/xnnpack/test:test_xnn_weights_cache_manager \
fbcode//executorch/backends/xnnpack/test:test_xnn_weights_cache \
fbcode//executorch/backends/xnnpack/test:test_workspace_manager \
fbcode//executorch/backends/xnnpack/test:xnnexecutor_test
→ Pass 38. Fail 0. Build failure 0.
```
The new `test_xnn_weights_cache_manager` exercises 13 hazard cases the
manager handles: `SamePathReturnsSameInstance`,
`DifferentPathsReturnDifferentInstances`,
`EmptyPathSharedAcrossCallers`, `EmptyPathRecreatedAfterAllRefsDrop`,
`EmptyPathDoesNotShareWithMmapPath`, `ExpiredEntryDoesNotLeak`,
`ExpiredEntryRecreatedOnNextCall`, `ConcurrentSamePathSameInstance`
(16-thread fan-in), `ConcurrentDifferentPathsIndependent` (8-thread
fan-out), `SaveAllNoLiveInstancesIsOk`, `SaveAllWalksLiveCaches`,
`SaveAllSkipsExpiredEntries`, `NonEmptyPathRegistersInMap`.
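
For readers without access to the test source, a fan-in check in the spirit of `ConcurrentSamePathSameInstance` might look like this. It is a sketch under assumptions, not the actual test: `FanInCacheSketch`, `FanInRegistrySketch`, and `fan_in_yields_one_instance` are hypothetical names, and the registry is a reduced version of the manager.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

struct FanInCacheSketch {};

// Reduced, hypothetical registry: path-keyed weak_ptr map under one mutex.
class FanInRegistrySketch {
 public:
  std::shared_ptr<FanInCacheSketch> get_or_create(const std::string& path) {
    std::lock_guard<std::mutex> lock(meta_mutex_);
    std::weak_ptr<FanInCacheSketch>& slot = caches_[path];
    std::shared_ptr<FanInCacheSketch> cache = slot.lock();
    if (!cache) {
      cache = std::make_shared<FanInCacheSketch>();
      slot = cache;
    }
    return cache;
  }

 private:
  std::mutex meta_mutex_;
  std::unordered_map<std::string, std::weak_ptr<FanInCacheSketch>> caches_;
};

// Launches n_threads callers on the same path and reports whether every
// caller received the same non-null instance.
bool fan_in_yields_one_instance(
    FanInRegistrySketch& registry,
    const std::string& path,
    std::size_t n_threads) {
  if (n_threads == 0) {
    return false;
  }
  std::vector<std::shared_ptr<FanInCacheSketch>> results(n_threads);
  std::vector<std::thread> threads;
  threads.reserve(n_threads);
  for (std::size_t i = 0; i < n_threads; ++i) {
    // Each thread writes only its own slot, so results needs no lock.
    threads.emplace_back([&registry, &path, &results, i] {
      results[i] = registry.get_or_create(path);
    });
  }
  for (std::thread& t : threads) {
    t.join();
  }
  for (const auto& r : results) {
    if (r == nullptr || r != results.front()) {
      return false;
    }
  }
  return true;
}
```
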
Cria runner tests (`fbsource//xplat/cria/core/runner/tests/...`): Pass
938, Fail 0. The 18 `Fatal` entries reported by buck2
(`PrefillReturnsLogits`, `PrefillMapsParams`, `PrefillStringPrompt`,
etc.) reproduce identically on this commit's parent (605126226e, no cria
change, no init guard) with the same OpenMP/MKL/ASan SEGV stack —
`kmp_basic_flag_native::done_check` → `__kmp_hyper_barrier_release`
triggered by `mkl_blas_sgemm_omp_driver_v1` racing with `pthread_create`
from `pthreadpool_create_v2`. These are pre-existing flakes in the
asan-ubsan platform configuration, not caused by either the manager
refactor or the runtime_spec migration.
`arc lint -a` clean across all 22 changed/added files (11 xplat + 11
fbcode mirrors; cria is xplat-only).
Differential Revision: D108431510
1 parent f61d7c1 commit 13a1032
10 files changed
Lines changed: 546 additions & 73 deletions