Commit f93da86

docs: remove vLLM completely from matrix
1 parent 17fb626 commit f93da86

1 file changed

Lines changed: 9 additions & 9 deletions

README.md

@@ -17,15 +17,15 @@ No Python runtime, no Global Interpreter Lock (GIL), no unnecessary memory copie
 
 ## 🆚 Why `mlx-server`? (vs. llama.cpp & python mlx-lm)
 
-| Feature | `mlx-server` (Swift) | `llama.cpp` (Metal) | `python mlx-lm` | `vLLM` (Flash-MoE) |
-| :--- | :--- | :--- | :--- | :--- |
-| **Backend Math** | Official Apple MLX (Metal) | Custom Metal Shaders | Official Apple MLX | NVIDIA CUDA / Triton |
-| **Target Hardware** | Consumer Apple Silicon | Universal (CPU/Mac) | Consumer Apple Silicon | Datacenter NVIDIA GPUs |
-| **Concurrency / GIL** | 🟢 **Zero GIL** (Swift async) | 🟢 **Zero GIL** (C++) | 🔴 **GIL Bottlenecked** (Python) | 🟢 **Zero GIL** (C++/Python) |
-| **Model Format** | Native HF (Safetensors) | GGUF (Requires Conversion) | Native HF (Safetensors) | Native HF (Safetensors) |
-| **MoE Memory Footprint**| 🟢 **Direct SSD Streaming** | 🟡 CPU `mmap` Swapping | 🔴 OS Swap (High pressure) | 🟢 **Flash-MoE** (High VRAM required) |
-| **KV Cache** | 🟢 **TurboQuantization** | 🟢 Aggressive Quantization | 🟡 Standard Python Hooks | 🟢 PagedAttention |
-| **Dependencies** | None (Single Native Binary) | None (Single Native Binary) | Python Runtime, `pip` | Heavy CUDA Python Environment |
+| Feature | `mlx-server` (Swift) | `llama.cpp` (Metal) | `python mlx-lm` |
+| :--- | :--- | :--- | :--- |
+| **Backend Math** | Official Apple MLX (Metal) | Custom Metal Shaders | Official Apple MLX |
+| **Target Hardware** | Consumer Apple Silicon | Universal (CPU/Mac) | Consumer Apple Silicon |
+| **Concurrency / GIL** | 🟢 **Zero GIL** (Swift async) | 🟢 **Zero GIL** (C++) | 🔴 **GIL Bottlenecked** (Python) |
+| **Model Format** | Native HF (Safetensors) | GGUF (Requires Conversion) | Native HF (Safetensors) |
+| **MoE Memory Footprint**| 🟢 **Direct SSD Streaming** | 🟡 CPU `mmap` Swapping | 🔴 OS Swap (High pressure) |
+| **KV Cache** | 🟢 **TurboQuantization** | 🟢 Aggressive Quantization | 🟡 Standard Python Hooks |
+| **Dependencies** | None (Single Native Binary) | None (Single Native Binary) | Python Runtime, `pip` |
 
 **The TL;DR:**
 - Use **`llama.cpp`** if you prefer GGUF formats and are running cross-platform on Windows/Linux.
