Commit 541da29
feat: iOS expert streaming via mmap page-cache for MoE models
ExpertStreamingConfig (new, MLXLMCommon):
- Replaces EXPERIMENTAL_SSD_STREAM env var with a proper Swift API
- .mmapPageCache mode: APFS page-cache (iOS + macOS without directIO)
- .directNVMe mode: pread() at 5GB/s NVMe (macOS default for MoE)
- activate(modelDirectory:useDirectIO:) + deactivate()
- legacyEnvPath shim for any remaining C-level consumers
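
For orientation, the API surface listed above might look roughly like the following. This is a minimal sketch, not the code from this commit: only the names `ExpertStreamingConfig`, `shared`, `isEnabled`, `.mmapPageCache`, `.directNVMe`, `activate(modelDirectory:useDirectIO:)`, `deactivate()` and `legacyEnvPath` come from the message; the `ExpertStreamingMode` enum name, the `mode` property, the lock, and the stored state are assumptions made for illustration.

```swift
import Foundation

// Hypothetical mode enum; the case names come from the commit message,
// the enum name itself is an assumption.
enum ExpertStreamingMode {
    case mmapPageCache   // rely on the OS page cache over mmap'd weights
    case directNVMe      // explicit pread() path, macOS only per the message
}

// Sketch of a process-wide switch replacing an environment-variable gate.
final class ExpertStreamingConfig: @unchecked Sendable {
    static let shared = ExpertStreamingConfig()

    private let lock = NSLock()
    private var state: (mode: ExpertStreamingMode, directory: URL)?

    // True while streaming is active (assumed semantics).
    var isEnabled: Bool {
        lock.lock(); defer { lock.unlock() }
        return state != nil
    }

    // Currently selected mode, or nil when inactive (assumed property).
    var mode: ExpertStreamingMode? {
        lock.lock(); defer { lock.unlock() }
        return state?.mode
    }

    // Assumed meaning of the shim: the model directory as a plain path string
    // for consumers that previously read it from the environment.
    var legacyEnvPath: String? {
        lock.lock(); defer { lock.unlock() }
        return state?.directory.path
    }

    func activate(modelDirectory: URL, useDirectIO: Bool) {
        #if os(macOS)
        let selected: ExpertStreamingMode = useDirectIO ? .directNVMe : .mmapPageCache
        #else
        // Off macOS the direct path is unavailable, so the flag is ignored.
        let selected: ExpertStreamingMode = .mmapPageCache
        #endif
        lock.lock(); defer { lock.unlock() }
        state = (mode: selected, directory: modelDirectory)
    }

    func deactivate() {
        lock.lock(); defer { lock.unlock() }
        state = nil
    }
}
```

Call sites that used to test the environment variable would then check `ExpertStreamingConfig.shared.isEnabled` instead, which keeps the gate type-checked and scoped to the process rather than to its environment.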
SwitchLayers.swift:
- ExpertStreamingConfig.shared.isEnabled replaces env var gate
- #if os(macOS) / #else: directNVMe path locked to macOS only
- iOS always routes to mmap prefault fallback (was dead code before)
Load.swift / LayerPartitioning.swift:
- Both env var gates replaced with ExpertStreamingConfig.shared.isEnabled
InferenceEngine.load():
- MoE models get config.lazyLoad = true + ExpertStreamingConfig.activate()
- macOS: useDirectIO=true (5GB/s NVMe pread)
- iOS: useDirectIO=false (APFS mmap, ~2-3GB/s, fits in sandbox)
- Deactivated on error or unload()
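
The load flow described above (activate for MoE models, pick the I/O path per platform, deactivate on failure) can be sketched as follows. The `StreamingControl` protocol, the `loadModel` function, and the `loader` closure are hypothetical stand-ins introduced for illustration; the actual `InferenceEngine.load()` is not shown in this commit page and may be structured differently.

```swift
import Foundation

// Hypothetical abstraction over the streaming switch so the flow is testable.
protocol StreamingControl {
    func activate(modelDirectory: URL, useDirectIO: Bool)
    func deactivate()
}

// Sketch of the described flow. `loader` stands in for the real model loader
// and receives the lazyLoad flag that the message says is set for MoE models.
func loadModel<T>(
    directory: URL,
    isMoE: Bool,
    streaming: StreamingControl,
    loader: (_ lazyLoad: Bool) throws -> T
) throws -> T {
    // Dense models load eagerly and never touch the streaming switch.
    guard isMoE else { return try loader(false) }

    #if os(macOS)
    let useDirectIO = true    // direct pread() path
    #else
    let useDirectIO = false   // mmap page-cache path
    #endif

    streaming.activate(modelDirectory: directory, useDirectIO: useDirectIO)
    do {
        return try loader(true)
    } catch {
        // Roll the switch back so a failed load does not leave streaming on.
        streaming.deactivate()
        throw error
    }
}
```

The matching `unload()` would call `deactivate()` as well, so the switch is only on while an MoE model is actually resident.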
ModelCatalog:
- ramRequiredGB for MoE = peak-resident (active experts only)
- Qwen3 30B MoE: ramRequired=4.5GB (targets iPad Pro M4 8GB+)
- DeepSeek R1 0528: ramRequired=8GB (targets iPad Pro M4 16GB+)
- Qwen3.5 122B: ramRequired=12GB (macOS / iPad Pro M4 Max 32GB)
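
As an illustration of how a peak-resident figure could be used, the sketch below filters catalog entries by device memory. The `CatalogEntry` type and `runnableModels` function are hypothetical; only the model names and the 4.5 / 8 / 12 GB figures come from the message. A plain comparison is assumed here, whereas the device targets listed above leave more headroom than that.

```swift
// Hypothetical catalog entry; the real ModelCatalog type is not shown here.
struct CatalogEntry {
    let name: String
    let ramRequiredGB: Double   // peak-resident memory, active experts only
}

// Figures taken from the commit message.
let moeEntries = [
    CatalogEntry(name: "Qwen3 30B MoE", ramRequiredGB: 4.5),
    CatalogEntry(name: "DeepSeek R1 0528", ramRequiredGB: 8),
    CatalogEntry(name: "Qwen3.5 122B", ramRequiredGB: 12),
]

// Returns the names of entries whose peak-resident requirement fits the device.
// Assumption: no extra headroom is reserved for the OS or other apps.
func runnableModels(in entries: [CatalogEntry], deviceRAMGB: Double) -> [String] {
    entries.filter { $0.ramRequiredGB <= deviceRAMGB }.map { $0.name }
}
```

A real catalog would likely subtract a safety margin before comparing, since the page cache that backs the streamed experts competes with everything else on the device.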
This enables 30B-class MoE reasoning models on iPad Pro M4
without any system swap, purely via OS page-cache eviction.
1 parent d454c0c commit 541da29
4 files changed
Lines changed: 75 additions & 21 deletions
File tree
- Sources/MLXInferenceCore