Eval bug: Very slow inference of Q1_0 Bonsai model #21574

@nbkgroup

Description

Name and Version

version: 8685 (0988acc)
built with GNU 13.3.0 for Linux x86_64

Operating systems

Linux

GGML backends

CUDA, CPU, Vulkan

Hardware

Ryzen 7 5700X + GeForce RTX 5060 Ti Driver Version: 580.126.20 CUDA Version: 13.0

Models

Bonsai-8B.gguf

Problem description & steps to reproduce

After support for Bonsai models and Q1_0 quantization was added, Bonsai-8B inference runs at about 0.3 - 0.5 t/s on all recent versions of llama.cpp, on the CPU, Vulkan, and CUDA backends alike.

Tested on various hardware; the speed is the same everywhere, which suggests no hardware acceleration is being used.

For comparison, the original fork reaches about 130-140 t/s on the hardware reported above.
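A rough sanity check supports the "no acceleration" suspicion: token generation is normally memory-bandwidth-bound, so even a pessimistic bound sits far above 0.3 t/s. The sketch below is a back-of-envelope estimate only; the model size, effective bits per weight, and bandwidth figures are all assumptions, not measured values.

```python
# Back-of-envelope: expected t/s ~= memory bandwidth / bytes read per token,
# since each generated token reads roughly the whole weight matrix set once.
# All figures below are assumptions for illustration.

def expected_tps(model_bytes: float, bandwidth_bytes_s: float) -> float:
    """Upper-bound tokens/s if every weight byte is read once per token."""
    return bandwidth_bytes_s / model_bytes

GiB = 1024 ** 3
# Assume ~8B params at an effective ~2 bits/weight (1-bit quant plus scales).
model_size = 8e9 * 2 / 8            # ~2 GB of weights
vram_bw = 448 * GiB                 # assumed RTX 5060 Ti-class bandwidth
ddr4_bw = 50 * GiB                  # assumed dual-channel DDR4 bandwidth

print(f"GPU-bound estimate: {expected_tps(model_size, vram_bw):.1f} t/s")
print(f"CPU-bound estimate: {expected_tps(model_size, ddr4_bw):.1f} t/s")
# Observed 0.3-0.5 t/s is orders of magnitude below either bound,
# consistent with a scalar fallback path rather than a bandwidth limit.
```

Even the CPU-bound estimate is two orders of magnitude above the observed rate, which points at the Q1_0 matmul path falling back to unoptimized scalar code rather than at any bandwidth limit.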

First Bad Commit

No response

Relevant log output

llama-cli -m Bonsai-8B.gguf -ngl 99 -p "Tell me your name"
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 15825 MiB):
  Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15825 MiB

Loading model...  


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8685-0988accf8
model      : Bonsai-8B.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern


> Tell me your name

My name is Bonsai. I am an AI assistant developed by PrismML. How can I help you today?

[ Prompt: 0,3 t/s | Generation: 0,3 t/s ]

> 
