Commit 33e1511 (parent e6556fc)

docs: warn against Python mlx-metal metallib version mismatch

Using pip's mlx-metal package to source the metallib causes a 'freed pointer was not the last allocation' crash during inference because the Metal GPU kernel ABI doesn't match the compiled Swift binary. Document that LocalPackages/mlx-swift/ is the only version-matched source and must be used.
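
The commit message above boils down to a simple invariant: the metallib's MLX version must exactly match the MLX version the Swift binary was compiled against, or the GPU kernel ABI differs and inference crashes. A minimal illustrative sketch (the helper and the version strings below are hypothetical, not part of the repo):

```python
# Illustrative invariant behind this commit: only an exact MLX version match
# is safe -- "close enough" versions can still differ in GPU kernel ABI.
def metallib_is_safe(binary_mlx_version: str, metallib_mlx_version: str) -> bool:
    """True only when the metallib's MLX version exactly matches the binary's."""
    return binary_mlx_version == metallib_mlx_version

# Hypothetical versions: the submodule pin vs. a newer pip mlx-metal release.
print(metallib_is_safe("0.21.0", "0.21.0"))  # True  -> safe to copy
print(metallib_is_safe("0.21.0", "0.22.0"))  # False -> risks the 'freed pointer' crash
```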

File tree: 1 file changed (+7 −5)

README.md
````diff
@@ -117,19 +117,21 @@ cd SwiftLM
 swift build -c release
 ```
 
-`default.metallib` is a pre-built artifact inside the `mlx-swift` submodule at:
-`LocalPackages/mlx-swift/Source/Cmlx/mlx/mlx/backend/metal/kernels/default.metallib`
-
-Copy it next to the binary before running:
+`default.metallib` is a pre-built artifact inside the `mlx-swift` submodule, version-matched to the Swift binary. Copy it next to the binary before running:
 
 ```bash
-cp LocalPackages/mlx-swift/Source/Cmlx/mlx/mlx/backend/metal/kernels/default.metallib .build/release/
+cp LocalPackages/mlx-swift/Source/Cmlx/mlx/mlx/backend/metal/kernels/default.metallib \
+  .build/release/
+
 .build/release/SwiftLM \
   --model mlx-community/Qwen3.5-122B-A10B-4bit \
   --stream-experts \
   --port 5413
 ```
 
+> **⚠️ Do NOT use Python's `mlx-metal` package as a source for `mlx.metallib`.**
+> While `uv run --with mlx-metal python -c "...shutil.copy(metallib, ...)"` will get the server to start, the pip `mlx-metal` package is a **different version** of MLX than what this binary was compiled against. The version mismatch causes GPU kernel ABI corruption during inference, producing a `freed pointer was not the last allocation` crash. Always use the metallib from `LocalPackages/mlx-swift/` — it is the only version-matched artifact for this build.
+
 *(Add `--stream-experts` when running oversized MoE models like Qwen3.5 122B to bypass macOS virtual memory swapping and stream expert layers directly from NVMe.)*
 
 ---
````
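
One way to guard against silently running with the wrong artifact is to compare the deployed metallib byte-for-byte against the submodule copy before launching. A minimal sketch, assuming the `check_metallib` helper below (it is not part of the repo), demonstrated with stand-in files since the real paths require a checked-out submodule:

```shell
# Hypothetical pre-flight check: the copy next to the binary must be
# byte-identical to the version-matched artifact in LocalPackages/mlx-swift/.
check_metallib() {
  # $1 = trusted source (mlx-swift submodule), $2 = copy next to the binary
  if cmp -s "$1" "$2"; then
    echo "metallib matches the version-matched source"
  else
    echo "metallib MISMATCH: re-copy it from LocalPackages/mlx-swift/"
    return 1
  fi
}

# Demo with stand-in files; in practice pass the real submodule path and
# .build/release/default.metallib.
printf 'kernel-abi-v1' > /tmp/src.metallib
printf 'kernel-abi-v1' > /tmp/dst.metallib
check_metallib /tmp/src.metallib /tmp/dst.metallib
# prints: metallib matches the version-matched source
```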

0 commit comments