# Bonsai 1-bit Benchmark Results — AMD Strix Halo

- **Hardware:** AMD Ryzen AI MAX+ 395, Radeon 8060S (gfx1151), 128GB unified RAM
- **OS:** Arch Linux, kernel 7.0.0-1-mainline
- **Build:** Prism llama.cpp b8796-e2d6742, Vulkan backend
- **Runs:** 3, stddev reported
## Results
| Model | Params | Size | pp512 t/s | tg128 t/s |
|---|---|---|---|---|
| Bonsai-1.7B | 1.72B | 231 MB | 3,120.8 ±33 | 136.8 ±0.2 |
| Bonsai-4B | 4.02B | 540 MB | 1,401.3 ±7 | 85.0 ±0.3 |
| Bonsai-8B | 8.19B | 1.07 GB | 831.4 ±2 | 63.8 ±0.1 |
| Qwen3-Coder-Next 80B.A3B | 79.67B MoE | 17.6 GB | 712.4 ±7 | 64.9 ±0.0 |
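As a sanity check on the table, the effective bits per weight can be derived from file size and parameter count. This is a sketch, and it assumes the reported sizes are decimal MB/GB (file sizes are sometimes reported in GiB instead, which would shift the results up by ~7%):

```shell
# Effective bits per weight = file_size_bytes * 8 / param_count,
# using the Size and Params columns from the table above.
for m in "Bonsai-1.7B 231e6 1.72e9" \
         "Bonsai-4B 540e6 4.02e9" \
         "Bonsai-8B 1.07e9 8.19e9" \
         "Qwen3-Coder-Next 17.6e9 79.67e9"; do
  set -- $m
  awk -v name="$1" -v bytes="$2" -v params="$3" \
    'BEGIN { printf "%s: %.2f bpw\n", name, bytes * 8 / params }'
done
# Bonsai-1.7B: 1.07 bpw
# Bonsai-4B: 1.07 bpw
# Bonsai-8B: 1.05 bpw
# Qwen3-Coder-Next: 1.77 bpw
```

The Bonsai files come out very close to 1 bpw, as expected for a ternary/1-bit quant. The Qwen3 file works out to ~1.77 bpw, above the nominal 1.5625 of IQ1_S; that gap is plausible if some tensors (typically embeddings and the output head) are stored at higher precision, as is common in llama.cpp quants.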
## Notes
- All models loaded fully on GPU (`-ngl 99`)
- Vulkan backend auto-selected: `zen4` CPU backend + Vulkan GPU
- Qwen3-Coder-Next detected as `IQ1_S - 1.5625 bpw` (TQ1_0 GGUF)
- Bonsai models detected as `Q1_0`
- Strix Halo has 512MB dedicated VRAM + 115GB shared GTT (unified memory architecture)
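For anyone wanting to compare numbers, the pp512/tg128/stddev columns match the shape of standard `llama-bench` output. A plausible invocation (the model path is hypothetical, and the exact binary location depends on your build) would be:

```shell
# Hypothetical reproduction command — adjust the model path to your setup.
# -p 512 / -n 128 correspond to the pp512 / tg128 columns,
# -r 3 gives three runs with stddev, -ngl 99 offloads all layers to GPU.
./llama-bench -m models/bonsai-8b.gguf -p 512 -n 128 -r 3 -ngl 99
```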
## What didn't work
- `i2_s` format (Microsoft BitNet 2B-4T, Falcon3-7B 1.58-bit) fails to load — "failed to load model"
  - Expected — `i2_s` needs dedicated BitNet kernel support not yet in Prism b8796
## Context
Running a multi-backend AI stack on this hardware. Full results across MLX ROCm, vLLM ROCm, Vulkan llama.cpp, and NPU (FLM) at: https://github.com/stampby/bleeding-edge
Raw CSV: https://github.com/stampby/bleeding-edge/blob/main/results/bench-1bit-20260415.csv
Happy to run additional benchmarks — different models, context sizes, batch sizes. Just let me know.