2. Install `cmake` via Homebrew (if not already installed)
3. Compile `mlx.metallib` from the Metal kernel sources
4. Build the `SwiftLM` binary in release mode

Then start the server (models download automatically if not cached):

```bash
.build/release/SwiftLM \
  --model mlx-community/Qwen3.5-122B-A10B-4bit \
  --stream-experts \
  --port 5413
```

*(Add `--stream-experts` when running oversized MoE models like Qwen3.5 122B to bypass macOS virtual memory swapping and stream expert layers directly from NVMe.)*

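The benefit of `--stream-experts` can be motivated with quick back-of-envelope arithmetic. This is a sketch under stated assumptions — 0.5 bytes per 4-bit parameter, and runtime overhead such as the KV cache and activations ignored — not a statement from the SwiftLM docs:

```python
# Rough memory estimate for a 4-bit-quantized MoE model.
# Assumption: 4-bit quantization ~ 0.5 bytes per parameter (ignores
# KV cache, activations, and quantization metadata overhead).

BYTES_PER_PARAM_4BIT = 0.5

def weight_gb(params_billions: float) -> float:
    """Approximate weight size in decimal GB for a 4-bit model."""
    return params_billions * 1e9 * BYTES_PER_PARAM_4BIT / 1e9

total = weight_gb(122)   # all experts resident: ~61 GB
active = weight_gb(10)   # active experts per token: ~5 GB

print(f"total weights ~ {total:.0f} GB, active per token ~ {active:.0f} GB")
```

Under these assumptions, the full 122B weight set alone approaches the 64 GB unified memory of the benchmark machine, while only ~5 GB of expert weights are touched per token — which is why streaming inactive experts from NVMe instead of swapping can keep the model usable.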
## 📊 Performance: Gemma 4-26B on Apple Silicon
Benchmark results for `gemma-4-26b-a4b-it-4bit` (26B MoE, 4-bit) on M5 Pro 64 GB.
---
## 🛠️ Quick Start (macOS Server)
### Fastest: Download Pre-built Binary
Download the latest release tarball from the [Releases page](https://github.com/SharpAI/SwiftLM/releases).
The archive is **self-contained** — `mlx.metallib` is bundled alongside the binary.
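Because the archive is self-contained, the only layout requirement is that `mlx.metallib` sits next to the binary. The sketch below simulates an unpacked release with dummy files to show the check you would run on a real download — the file names are taken from this README, but the tarball layout itself is an assumption:

```shell
#!/bin/sh
set -e
# Simulate the unpacked release tarball (dummy files stand in for the
# real binary and Metal library from the Releases page):
mkdir -p SwiftLM-demo && cd SwiftLM-demo
touch SwiftLM mlx.metallib

# The check to run on a real download: the Metal library must sit
# alongside the binary, so no separate metallib compile step is needed.
test -f mlx.metallib && test -f SwiftLM && echo "archive layout OK"
```

On a real release, replace the `touch` lines with extracting the downloaded tarball via `tar -xzf`.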