Commit 17fb626

docs: fix test hardware to M5 Pro 64GB
1 parent d6dc654 commit 17fb626

1 file changed

Lines changed: 5 additions & 6 deletions

README.md

@@ -36,13 +36,12 @@ No Python runtime, no Global Interpreter Lock (GIL), no unnecessary memory copie
 
 ## 💻 Tested Hardware & Benchmarks
 
-To reliably run massive 122B parameter MoE models over SSD streaming, `mlx-server` was designed and benchmarked on the following hardware:
+To reliably run massive 122B parameter MoE models over SSD streaming, `mlx-server` was designed and benchmarked natively on the following hardware:
 
-- **Machine**: MacBook Pro, Apple M3 Max
-- **Chip**: 16-core CPU (12P + 4E), 40-core GPU, 16-core ANE
-- **Memory**: 48 GB Unified (~400 GB/s bandwidth)
-- **SSD**: 1TB Apple Fabric, 17.5 GB/s sequential read (measured)
-- **OS**: macOS 26.2 (Darwin 25.2.0)
+- **Machine**: MacBook Pro, Apple M5 Pro
+- **Memory**: 64 GB Unified Memory
+- **Model**: Qwen3.5-122B-A10B-4bit
+- **SSD**: Internal Apple NVMe (Zero-Copy Streaming)
 
 > **⚠️ Quantization Disclaimer**: While heavier quantization shrinks the required memory footprint, **4-bit quantization** remains the strict production standard for MoE models. Our metrics indicated that aggressive 2-bit quantization heavily destabilizes JSON grammars—routinely producing broken keys like `\name\` instead of `"name"`—which systematically breaks OpenAI-compatible tool calling.
 
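The quantization disclaimer in the README says that keys emitted as `\name\` instead of `"name"` break tool calling. A minimal client-side sketch of why: tool-call arguments arrive as a JSON string, and a strict parser rejects any object whose keys are not double-quoted. The helper name `parse_tool_arguments` and the sample payloads are hypothetical and are not part of `mlx-server`.

```python
import json


def parse_tool_arguments(raw: str):
    """Return the decoded arguments dict, or None if `raw` is not a valid JSON object.

    Hypothetical helper for illustration only; not an mlx-server API.
    """
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        # Malformed output, e.g. keys wrapped in backslashes instead of quotes.
        return None
    # Tool-call arguments are expected to be a JSON object, not a list or scalar.
    return value if isinstance(value, dict) else None


# Sample payloads (assumed shapes, modelled on the README's example).
well_formed = '{"name": "get_weather"}'
broken = r'{\name\: "get_weather"}'

print(parse_tool_arguments(well_formed))
print(parse_tool_arguments(broken))
```

A check like this lets a client reject or retry a malformed tool call instead of passing a broken payload downstream.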
