Bonsai Q1_0 benchmarks — Strix Halo Vulkan (gfx1151) #26

@stampby

Description

Bonsai 1-bit Benchmark Results — AMD Strix Halo

Hardware: AMD Ryzen AI MAX+ 395, Radeon 8060S (gfx1151), 128GB unified RAM
OS: Arch Linux, Kernel 7.0.0-1-mainline
Build: Prism llama.cpp b8796-e2d6742, Vulkan backend
Runs: 3, stddev reported

Results

| Model | Params | Size | pp512 t/s | tg128 t/s |
|---|---|---|---|---|
| Bonsai-1.7B | 1.72B | 231 MB | 3,120.8 ±33 | 136.8 ±0.2 |
| Bonsai-4B | 4.02B | 540 MB | 1,401.3 ±7 | 85.0 ±0.3 |
| Bonsai-8B | 8.19B | 1.07 GB | 831.4 ±2 | 63.8 ±0.1 |
| Qwen3-Coder-Next 80B.A3B | 79.67B (MoE) | 17.6 GB | 712.4 ±7 | 64.9 ±0.0 |
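As a rough sanity check, the file sizes line up with the detected bits-per-weight: params × bpw / 8 gives the quantized weight payload, with the remainder accounted for by embeddings and other tensors kept at higher precision. A minimal sketch (the IQ1_S bpw is from the notes below; Q1_0 at ~1.0 bpw is an assumption):

```python
# Rough size check: params * bits-per-weight / 8 ~ quantized weight payload.
# Q1_0 is assumed to be ~1.0 bpw; IQ1_S is 1.5625 bpw per the detection log.
models = [
    # (name, params, bpw, reported size in bytes)
    ("Bonsai-1.7B", 1.72e9, 1.0, 231e6),
    ("Bonsai-4B", 4.02e9, 1.0, 540e6),
    ("Bonsai-8B", 8.19e9, 1.0, 1.07e9),
    ("Qwen3-Coder-Next 80B.A3B", 79.67e9, 1.5625, 17.6e9),
]
for name, params, bpw, reported in models:
    weights = params * bpw / 8  # bytes for the quantized weights alone
    print(f"{name}: {weights / 1e9:.2f} GB weights vs {reported / 1e9:.2f} GB on disk")
```

In each case the estimate comes in somewhat under the on-disk size, as expected.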

Notes

  • All models loaded fully on GPU (-ngl 99)
  • llama.cpp auto-selected the zen4 CPU backend alongside the Vulkan GPU backend
  • Qwen3-Coder-Next detected as IQ1_S - 1.5625 bpw (TQ1_0 GGUF)
  • Bonsai models detected as Q1_0
  • Strix Halo has 512MB dedicated VRAM + 115GB shared GTT (unified memory architecture)
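For reference, an invocation along these lines should reproduce the table above; the build and model paths are placeholders, and the flags match the setup described here (`-ngl 99` for full GPU offload, `-r 3` for the three-run stddev; `llama-bench` defaults to pp512 and tg128):

```shell
# Hypothetical invocation matching the reported setup (paths are placeholders).
./build/bin/llama-bench -m models/bonsai-8b-q1_0.gguf -ngl 99 -r 3
```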

What didn't work

  • The i2_s format (Microsoft BitNet 2B-4T, Falcon3-7B 1.58-bit) fails to load with "failed to load model"
  • This is expected: i2_s needs dedicated BitNet kernel support, which is not yet in Prism b8796

Context

Running a multi-backend AI stack on this hardware. Full results across MLX ROCm, vLLM ROCm, Vulkan llama.cpp, and NPU (FLM) at: https://github.com/stampby/bleeding-edge

Raw CSV: https://github.com/stampby/bleeding-edge/blob/main/results/bench-1bit-20260415.csv

Happy to run additional benchmarks — different models, context sizes, batch sizes. Just let me know.
